diff --git a/transcripts/ch8-performance/1.txt b/transcripts/ch8-performance/1.txt new file mode 100644 index 0000000..5ceb8d1 --- /dev/null +++ b/transcripts/ch8-performance/1.txt @@ -0,0 +1,53 @@ +00:01 Now that you know how to work MongoDB, you know how to work its shell, +00:04 what the query syntax is, you've seen PyMongo as well as MongoEngine, +00:07 it's time to turn our attention to tuning MongoDB +00:11 to be the best database it can possibly be. +00:14 We're going to focus on how to make our regular MongoDB server +00:18 a high performance MongoDB database +00:21 and you'll see there's no magic here, a lot of the things that you can do +00:24 are relatively straightforward, and there's a systematic way to go about it. +00:29 I want to start this section by maybe putting a little perspective on it. +00:34 I want to start this section, this chapter, by putting a little perspective out there. +00:41 When people come to NoSql and they start looking for alternative databases +00:44 often the allure of these databases is their performance +00:49 you hear about things like sharding, horizontally scaling them, +00:52 some incredible performance numbers, things like that. +00:55 That may be what you really need, that may be the most important thing +00:58 and certainly if you don't have performance out of your database it's a big problem. +01:03 We're going to certainly figure out how to make our databases faster +01:07 and the variety of techniques that we have available to us in MongoDB. +01:11 That said, your biggest problem probably isn't performance, +01:14 you may have a big data problem, you may have terabytes or petabytes of data +01:19 but most applications don't. +01:22 You may have a performance problem, it may be that you have so much data +01:26 or you are asking such complex queries that it really does take +01:30 very precise tuning and scaling to make it work. +01:33 So we're going to focus on some of these types of things. 
+01:36 That said, we all have a complexity problem with our application, +01:40 it's always a pain to maintain these databases +01:43 especially when we're working with relational databases, +01:46 you hear about things like migrations and updating your schema +01:49 adding, removing, transforming columns, all of this stuff is really complex +01:53 and it even makes deployment really, really challenging, +01:56 you want to release a new version of something based on SQLAlchemy +01:59 but you need to change the database schema before it will even run, +02:02 okay, that sounds like it could be a little bit of a problem. +02:05 What you'll see with MongoDB and these document databases is +02:10 one of their biggest benefits is the simplicity that they bring. +02:14 The document structure means there are fewer tables, +02:18 there are far fewer connections between these tables, +02:21 so when you think about the trade-offs and performance and things like that +02:24 keep in mind that probably the biggest benefit +02:27 that you are going to get from MongoDB is you are going to have +02:30 a simpler versioning, evolution, maintainability, development story. +02:33 I just want to put that out there, because I know sometimes +02:36 people will say well, I got MongoDB to perform at this speed +02:40 and I got this other database, and if I tweak it like this and adapt it like that +02:44 maybe I could get it to go a little faster, so maybe we should use that instead. +02:47 And maybe, I don't know, it depends on the situation, +02:50 and this is very abstract, so it's hard to say, but keep in mind +02:53 that one of the biggest things these document databases +02:55 bring to the table here is this simplicity. +02:59 It just so happens we can also make them really, really fast. +03:03 So simple and fast, sounds like a great combination, +03:05 so let's get into this section where we are going to make MongoDB much faster. 
\ No newline at end of file diff --git a/transcripts/ch8-performance/10.txt b/transcripts/ch8-performance/10.txt new file mode 100644 index 0000000..854c294 --- /dev/null +++ b/transcripts/ch8-performance/10.txt @@ -0,0 +1,92 @@ +00:01 One of the most important things you can do for performance +00:04 in your database and these document databases +00:06 is think about your document design, +00:08 should you embed stuff, should you not, what embeds where, +00:11 do you embed just ids, do you embed the whole thing; +00:14 all of these are really important questions +00:16 and it takes a little bit of experience to know what the right thing to do is. +00:20 It also really depends on your application's use case, +00:24 so something that's really obviously a thing we should consider +00:28 is this service history thing, this adds the most weight to these car objects, +00:34 so we've got this embedded document list field +00:38 so how often do we need these histories? +00:44 How many histories might a car have? +00:46 Should those maybe be in a separate collection +00:49 where it has all the stuff that service record, the class has, +00:52 plus car id, or something to that effect? 
+00:56 So this is a really important question, +00:59 and it really depends on how we're using this car object, this car document, +01:05 if almost all the time we want to work with the service history, +01:07 it's probably good to go in and put it here, +01:10 unless these can be really large or something to that effect, +01:13 but if you don't need them often, you might consider putting them in their own collection, +01:16 there's just a tension between complexity and separation, +01:20 safety and separation, the speed of having them separate +01:24 so you don't pull them back all the time; +01:26 you can also consider using the only keyword or only operator in MongoEngine +01:30 to say if I don't need it, exclude the service history, +01:34 it adds a little bit of complexity because you often need to know, +01:38 hey is this the car that came with service history +01:40 or is it a car where that was excluded, things like that, +01:42 but you could use performance profiling and tuning +01:45 to figure out where you might use only. +01:48 Let's look at one more thing around document design. 
+01:50 You want to consider the size of the document, +01:52 remember MongoDB has a limit on how large these documents can be, +01:56 that's 16 MB per record, that doesn't mean you should think +02:01 oh it's only 10 MB so everything is fine for my document design, +02:05 that might be terrible, this is like a hard upper bound, +02:07 like the database stops working after it hits 16 MB, +02:11 so you really want to think about what is the right size, +02:14 so let's look at a couple examples: +02:16 we can go to any collection and say .stats +02:18 and it will talk about the size of the documents and things like that, +02:21 so here we ran db.cars.stats in the shell, +02:25 and we see that the average object size is about 700 bytes, +02:29 there is information about how many there are, and all that kind of stuff, +02:33 but really the most interesting thing for this discussion is +02:35 what is the average object size, 700 bytes +02:38 that seems like a pretty good size to me, it's not huge by any means, +02:42 and these are the cars that contain those service histories, +02:45 so this is probably fine for what we're doing. +02:48 Let me give you a more realistic example. +02:50 Let's think about the Talk Python Training website, +02:52 and the courses and chapters, we talked about them before, +02:56 so here if we run that same thing, db.courses.stats +03:02 you can see that the average object size is 900 bytes for a course, +03:07 and remember the course has the description that shows on the page +03:10 and that's probably most of the size, it has a few other things as well, +03:13 like student testimonials and whatnot, +03:16 but basically it's the description and a few hyperlinks. +03:19 So I think this is again a totally good object, average object size. 
+03:23 Now one of the considerations was I could have taken the chapters +03:27 which themselves contain all the lectures, +03:29 and embedded those within the course, +03:32 would that have been a good idea? +03:34 I think I might have even had it created that way +03:36 in the very beginning, and it was a lot slower than I was hoping for, +03:38 so I redesigned the documents. +03:40 If we run this on this chapter section, you can see +03:43 that the average object size is 2.3 KB, +03:46 this is starting to get a little bit big, on its own it's fine, +03:50 but think about the fact that a course on average has like 10 to 20 chapters, +03:55 so if I embedded the chapters in the course +03:58 instead of putting them in a separate document like I do, +04:02 this is how it actually runs at the time of the recording, +04:04 then it would be something like these courses would be +04:07 24 up to maybe 50 KB of data per entry, +04:12 think about when you go to like the courses page +04:15 and it shows you a big list of all the courses +04:17 and there might be 10 or later 20 courses, +04:20 we're pulling back and deserializing like megabytes of data +04:24 to render a really, really common page, that is probably not ok, +04:28 so this is why I did not embed the chapters and lectures inside the course, +04:34 I just said okay, this is the breaking point +04:37 I looked at the object size, I looked at where the performance was +04:41 and I said you know what, really it's not that common +04:44 that we actually want more than one chapter at a time, +04:46 but it is common we want lectures, so it's probably the right partitioning, +04:51 but you build it one way, you try it, it doesn't work, +04:53 you just redesign your class structure, recreate the database and try it again, +04:57 but you do want to think about the average object size +05:00 and you can do it super easy with db.collection_name.stats. 
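As a rough check on the sizing argument above, the arithmetic can be sketched in a few lines of Python. This is only a back-of-the-envelope estimate: the 900 byte and 2.3 KB averages and the 10 to 20 chapters per course are the figures quoted above, the 20 courses on a listing page is the upper estimate mentioned above, and the function name is hypothetical.

```python
def embedded_course_size_kb(course_kb, chapter_kb, chapters_per_course):
    """Estimate the size of one course document if its chapters were embedded."""
    return course_kb + chapter_kb * chapters_per_course


# Average sizes quoted in the transcript (0.9 KB per course, 2.3 KB per chapter).
low = embedded_course_size_kb(0.9, 2.3, 10)
high = embedded_course_size_kb(0.9, 2.3, 20)
print(f"one embedded course: about {low:.1f} to {high:.1f} KB")

# A listing page that loads 20 such courses in one query.
courses_on_page = 20
print(f"courses page: about {low * courses_on_page:.0f} "
      f"to {high * courses_on_page:.0f} KB")
```

Under these assumptions a single embedded course lands at roughly 24 to 47 KB, and a page listing 20 of them approaches a megabyte, compared with about 18 KB when the chapters live in their own documents.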
\ No newline at end of file diff --git a/transcripts/ch8-performance/11.txt b/transcripts/ch8-performance/11.txt new file mode 100644 index 0000000..39f5620 --- /dev/null +++ b/transcripts/ch8-performance/11.txt @@ -0,0 +1,35 @@ +00:01 One of the last simple tools you have in your tool belt +00:04 when we're working with MongoEngine or even in PyMongo, just different api +00:08 is this ability to restrict the data returned from the document. +00:13 In our car object we've got the make, the model, the id, some other things, +00:17 we've got the engine which is a subdocument or an embedded document there +00:22 and then the biggest thing that contributes to the size +00:25 is actually the service history which might be many service record entries. +00:30 If really all we care about is the make, the model and the id of a car, +00:34 and we're going to create like a list or something like that, +00:36 we can use this .only operator here +00:39 and dramatically reduce the amount of data returned from MongoDB +00:43 so this is an operation that we saw when we first learned about the api +00:46 actually operates at the database level, +00:48 you're able to restrict the elements returned from the queries +00:52 so when it gets back to MongoEngine +00:54 basically it looks at what comes back and it says, +00:57 alright, I need to create some cars +00:59 and I need to set their make to this, the model to that +01:01 and their id to whatever comes back, +01:03 and then nothing else is transferred, deserialized, anything. +01:05 So you can, if you don't need them, exclude the heavyweight things +01:09 like the engine and the service histories for this particular use case. +01:12 So this is kind of like select make, model, id from table such and such in SQL, +01:20 and it really can improve the performance +01:22 especially when you have either large documents or many documents. 
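To make the effect of restricting fields concrete, here is a small pure-Python illustration. It does not talk to MongoDB at all: the sample car document and the `project` helper are made up for this sketch, and it only mimics what the database does when you ask for a subset of fields. In MongoEngine the real call would be something along the lines of `Car.objects().only('make', 'model')`, assuming a `Car` document class like the one in this course.

```python
import json


def project(document, fields):
    """Return a copy of document keeping only _id and the named top-level fields."""
    keep = {"_id", *fields}
    return {key: value for key, value in document.items() if key in keep}


# A made-up car document with an embedded engine and five service records.
car = {
    "_id": "car-1",
    "make": "Ferrari",
    "model": "F40",
    "engine": {"horsepower": 471, "liters": 2.9},
    "service_history": [
        {"description": "Change tires", "price": 1200.0, "customer_rating": 4}
        for _ in range(5)
    ],
}

slim = project(car, ["make", "model"])
full_size = len(json.dumps(car))
slim_size = len(json.dumps(slim))
print(f"full: {full_size} bytes, projected: {slim_size} bytes")
```

The JSON sizes are only a stand-in for the real BSON sizes, but the shape of the result is the same: the heavyweight embedded parts never leave the server, so there is less to transfer and less to deserialize.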
+01:27 So you've seen a lot of different ways to turn the knobs of MongoDB +01:31 to make it faster and to use MongoEngine to control those knobs. +01:35 Now this applies to a single individual database server +01:38 and if you use this to tune your database, +01:41 you can actually make the need for having a sharded cluster +01:45 and all these scaling things possibly go away, +01:48 but even if you do end up with one of these more interesting topologies, +01:52 all of these techniques still apply and they'll make your cluster go faster, +01:56 they'll make your replicas go faster, all of those things. +01:59 What you've learned here are really the foundational items of making MongoDB go fast. \ No newline at end of file diff --git a/transcripts/ch8-performance/2.txt b/transcripts/ch8-performance/2.txt new file mode 100644 index 0000000..2fa2e57 --- /dev/null +++ b/transcripts/ch8-performance/2.txt @@ -0,0 +1,124 @@ +00:01 You've heard MongoDB is fast, really fast, +00:03 and you've gone through setting up your documents and modeling things, +00:07 you inserted, you imported your data, and you're ready to go; +00:11 and you run a query and it comes back, +00:13 so okay, I want to find all the service histories +00:15 that have a certain price, greater than such and such, how many are there? +00:18 Apparently there's 989, but it took almost a second to answer that question. +00:22 So this is a new version of the database, so we are going to talk about it shortly. +00:27 Instead of having just a handful of cars and service histories +00:30 that we maybe entered in our little play-around app, +00:32 it has a quarter million cars with a million service histories, something to that effect. 
+00:38 And the fact that we were able to answer this query +00:41 of how many sort of nested documents had this property +00:44 in less than a second, on one hand that's kind of impressive, +00:48 but to be honest, it feels like MongoDB is just dragging, +00:51 this is not very special, this is not great. +00:55 So this is what you get out of the box, if you just follow what we've done so far +00:59 this is how MongoDB is going to perform. +01:02 However, in this chapter, we're going to make this better, a lot better. +01:06 How much— well, let's see, we're going to make it fast, +01:09 here's that same query after applying just some of the techniques of this chapter. +01:13 Notice now it runs in one millisecond, not 706 milliseconds. +01:17 So we've made our MongoDB just take off, +01:21 it's running over 700 times faster than what the default MongoDB does. +01:26 Well, how do we do it, how do we make this fast? +01:30 Let's have a look at the various knobs +01:33 that we can turn to control MongoDB performance. +01:35 Some of which we're going to cover in this course, +01:38 and some are well beyond the scope of what we're doing, +01:40 but it's still great to know about them. +01:42 The first knob are indexes, so it turns out +01:44 that there are not too many indexes added to MongoDB by default, +01:48 in fact, the only index that gets set up is on _id +01:52 which is basically an index as well as a uniqueness constraint, +01:55 but other than that, there are no indexes, +01:57 and it might be a little non intuitive at first, when you first hear about this, +02:02 but indexes and manually tuning and tweaking and understanding the indexes +02:06 in document databases is far more important +02:10 than understanding indexes in a third normal form designed relational database. +02:15 So why would that be? That seems really odd. 
+02:18 So think about a third normal form database, +02:21 you've broken everything up into little tiny tables that link back to each other +02:24 and they often have foreign key constraints traversing all of these relationships, +02:28 well, those foreign key constraints go back to primary keys on the main tables, +02:33 those are indexed, every time you have one of those relationships +02:35 it usually at least on one end has an index on that thing. +02:39 In document databases, because we take some of those external tables +02:43 and we embed them in documents, +02:45 those subdocuments while they kind of logically play the same role +02:49 there is no concept of an index being added to those. +02:52 So we have fewer tables, but we still have +02:55 basically the same amount of relationships +02:57 and because of the way documents work, +02:59 we actually have fewer indexes than we do in say a relational database. +03:04 So we're going to see that working with understanding +03:07 and basically exploring indexes is super, super important +03:09 and that's going to be the most important thing that we do. +03:12 In fact, the MongoDB folks, one of their things they do is +03:16 they sell like services, consulting and what not to help their customers +03:19 and you could hire them, say hey I got this big cluster and it's slow +03:24 can you help me make it faster— the single most dramatic thing that they do, +03:30 the thing that almost always is the problem is incorrect use of indexes. +03:34 So we're going to talk about how to use, discover and explore indexes for sure. +03:38 Next is document design, all that discussion about to embed or not to embed, +03:43 how should you relate documents, this is sort of the beginning of this conversation, +03:47 it turns out the document design has dramatic implications across the board +03:52 and we did talk quite a bit about this, but we'll touch on it again in this chapter. 
+03:56 Query style, how are you writing your queries, +04:01 is there a way that you could maybe restructure a query, +04:05 or ask the question differently and end up with +04:08 a more high performance query, maybe one example misses an index +04:12 and the other particular example uses a better index or something to this effect. +04:16 Projections and subsets are also something that we can control, +04:20 remember when we talked about the JavaScript api +04:23 we saw that you could limit your set of returned responses +04:26 and this can be super helpful for performance; +04:29 you could write a query where it returns 5 MB of data +04:32 but if you restrict that to just the few fields that you actually care about +04:36 maybe it's only a few KB instead of 5 MB, it could be really dramatic, +04:40 depending on how large and nested your documents might be. +04:43 We're going to talk about how we can do this, especially from MongoEngine. +04:46 These are the knobs that we're going to turn in this course, +04:49 these are the things that will work even if you have a single individual database, +04:53 so you should always think about these things, +04:56 some of them happen on the database side, document design, indexes, +04:59 and the other two maybe are in your application interacting with the database, +05:04 but MongoDB being a NoSQL database, allows for other types of interactions, +05:08 other configurations and network topologies and so on. +05:11 So, one of the things that it supports is something called replication, +05:14 now replication is largely responsible for redundancy and failover. 
+05:19 Instead of just having one server I could have three servers, +05:22 and they could work in triplicate, basically one is what's called the primary, +05:26 and you read and write from this database, +05:28 and the other two are just there ready to spring into action, +05:31 always getting themselves in sync with the primary, +05:34 and if one goes down, the other will spring in to be the primary +05:36 and they will sort of fix themselves as what used to be the primary comes back. +05:40 There is no performance benefit from that at all. +05:43 However, there are ways to configure your connection to say +05:46 allow me to read not just from the primary one, but also from the secondary, +05:50 so you can configure replication for a performance boost, +05:53 but mostly this is a durability thing. +05:55 The other type of network configuration you can do is what's called sharding. +05:59 This is where you take your data instead of putting all into one individual server, +06:02 you might spread this across 10 or 20 servers, +06:06 one 20th each, hopefully evenly balanced, +06:09 across all of them, and then when you issue a query, +06:12 it can either figure out, if it's based on the shard key, +06:15 which server to point that at and let that one +06:17 handle the query across the smaller set of data, +06:20 or if it's general like show me all the things with greater than this for the price, +06:23 it might need to fan that out to all 20 servers, +06:26 but it would run in parallel on 20 machines. +06:30 So sharding is all about speeding up performance, +06:32 especially write performance, but also queries as well, +06:35 so you can get tons of scalability out of sharding, +06:38 and you can even combine these like, when I said there are 20 shards, +06:41 each one of those could actually be a replica set, +06:43 so there is a lot of stuff you could do with network topology +06:46 and clustering and sharding and scaling and so on. 
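The difference between a targeted query and a fan-out query can be sketched with a toy router. To be clear about the assumptions: this is not how MongoDB's own router decides where data lives (it tracks ranges of shard key values per shard), the hash-modulo scheme, the `owner_id` shard key and both function names are invented for illustration, and the sketch only handles simple equality on the shard key.

```python
import zlib

SHARD_COUNT = 20


def shard_for(shard_key_value, shard_count=SHARD_COUNT):
    """Map a shard key value to one shard with a stable hash (toy scheme)."""
    data = str(shard_key_value).encode("utf-8")
    return zlib.crc32(data) % shard_count


def shards_to_query(query, shard_key, shard_count=SHARD_COUNT):
    """Return the shard numbers that would have to answer this query."""
    if shard_key in query:
        # The query names the shard key, so exactly one shard can answer it.
        return [shard_for(query[shard_key], shard_count)]
    # No shard key in the query: every shard has to be asked, in parallel.
    return list(range(shard_count))


targeted = shards_to_query({"owner_id": 10_000}, shard_key="owner_id")
fan_out = shards_to_query({"price": 16_800}, shard_key="owner_id")
print(f"query on the shard key touches {len(targeted)} shard")
print(f"query on another field touches {len(fan_out)} shards")
```

The point of the sketch is just the contrast: a query that includes the shard key touches one machine and its smaller slice of data, while anything else is broadcast to all 20.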
+06:48 We're not turning those knobs in this course, +06:50 I'll show you how to make individual pieces fast, +06:52 the same idea applies to these replicas and shards, +06:54 just on a much grander scale if you want to go look at them. \ No newline at end of file diff --git a/transcripts/ch8-performance/3.txt b/transcripts/ch8-performance/3.txt new file mode 100644 index 0000000..8c22450 --- /dev/null +++ b/transcripts/ch8-performance/3.txt @@ -0,0 +1,30 @@ +00:01 Let's return to our dealership. +00:03 This was the example we started back when we began the MongoEngine section, +00:05 and it turns out the dealership is super popular now. +00:08 Before we just had a couple of cars, now we have a quarter million cars +00:11 in our database, we have 100 thousand owners, +00:15 I don't believe we talked about owners before in terms of what that looks like in our code, +00:19 but I've added this concept of owners +00:21 so we can ask interesting like cross-document related type questions, +00:25 and we'll look at the details of them, when we get to the code, in just a moment. +00:28 Each one of these owners, these 100 thousand owners, +00:31 owns an average of 2.5 cars, this is kind of like collectors, right, +00:36 not a standard person that drives to work or whatever, these are Ferraris, +00:39 and each car has on average about 5 service records +00:43 and that could be like a new engine, +00:45 change the tires, change the spark plug, whatever; +00:48 in particular, there's about 1.25 million service histories, +00:51 so when we ask questions about like those nested documents +00:54 that have to do with service histories like customer ratings and price, +00:57 you can see that that is really quite impressive I think, +01:01 we got the quarter million cars and within those quarter million documents +01:04 interspersed are 1.25 million service histories. 
+01:07 So our job is to make a lot of the typical things that we might ask this database, +01:11 the queries we run to do so, take a couple of milliseconds, not seconds, +01:17 so that's going to be what the basic goal of this whole section is. +01:23 Now, the other thing you might want to know is +01:25 we've got about 180 megs of data +01:28 and on average each document of the various document kinds, +01:30 all averaged together is about 500 bytes per document. +01:33 So let's return to our example slightly transformed +01:37 and see how it's performing now and let's make it fast. \ No newline at end of file diff --git a/transcripts/ch8-performance/4.txt b/transcripts/ch8-performance/4.txt new file mode 100644 index 0000000..cbede0b --- /dev/null +++ b/transcripts/ch8-performance/4.txt @@ -0,0 +1,42 @@ +00:01 Here we are in the GitHub repository for this course +00:04 and notice we have this data section +00:06 and in here I have this thing called dealership db 250 K +00:09 that is this data that I just talked about, +00:12 with the 250 thousand cars, 100 thousand owners, that sort of thing. +00:16 So I'm going to put that over here on the desktop and unzip it +00:21 and if we look in here, you'll see that there's a cars collection and an owners collection, +00:29 and I don't believe we've spoken about how to get this data into MongoDB, +00:33 so let's go over here and I'll use RoboMongo, +00:37 notice we have these two dealership things that I have been playing with +00:42 and I want to create one called like test dealership or something to that effect. 
+00:46 We're going to restore this, how do we do that, +00:51 we'll go like this, we'll say mongorestore +00:55 and this is the way that we get this exported data imported into MongoDB, +01:00 now, the first thing you have to ask yourself is, is this additive to the database, +01:05 if it exists do you want to also insert this, +01:07 or do you want to have this be the database and replace anything that exists, +01:11 we want this one to replace existing data +01:14 so I'll say --drop and then I need to tell it what database +01:18 so I'll say --db, and what you should say is dealership, +01:23 but just because I don't want to wipe away what I currently have, +01:28 I'll say dealership example, but the code that you're going to run +01:31 expects the name of the database to be just dealership; +01:34 and then I need to give it the folder that it's going to work from, +01:37 so I am just going to give it this folder like so, all right. +01:41 So mongorestore, --drop to replace the data, --db to name it, and then the location, +01:46 we hit go, and it's going to go cranking away on this +01:50 and you can see it's inserting, inserting and done, +01:53 that was really fast for like close to 1.5 million records. +01:57 All right, so let's go over here and refresh +02:00 and here's our example and we can see that we have our collection, +02:03 here's our cars and we could just ask how many cars are there. +02:07 Notice, there is that many, and if we change this to owners, +02:11 remember you can also write it like this, owners like this, +02:16 Now notice, I think the restore data we got here, +02:19 you want to drop this index right here, so that it only has the id indexes, ok +02:26 so that's this example I just restored, +02:28 we're going to work with something you can imagine is exactly the same. 
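The restore command described above can be written down as a small Python sketch that only builds the argument list. The `--drop` and `--db` flags and the trailing folder argument are the ones used above; the folder path `./dealership_db_250k` and the helper name are placeholders for wherever you unzipped the data, and newer versions of the tool may steer you toward other options, so check `mongorestore --help` on your machine.

```python
import shlex


def build_restore_command(dump_dir, db_name="dealership", drop=True):
    """Build the mongorestore argument list described in the transcript."""
    args = ["mongorestore"]
    if drop:
        # --drop replaces existing collections instead of adding to them.
        args.append("--drop")
    # --db names the target database; the folder holds the exported collections.
    args += ["--db", db_name, dump_dir]
    return args


command = build_restore_command("./dealership_db_250k")
print(shlex.join(command))
```

To actually run it you would pass `command` to `subprocess.run(command, check=True)` with a MongoDB server running locally; that step is left out here so the sketch has no side effects.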
+02:33 So we're going to work with this dealership code +02:36 but the way it got there, I'll show you the app I used to originally create it, +02:39 and then I just restored it using mongorestore just as I showed you up here. +02:43 So the way to generate the data that goes into mongorestore, you say mongodump. \ No newline at end of file diff --git a/transcripts/ch8-performance/5.txt b/transcripts/ch8-performance/5.txt new file mode 100644 index 0000000..fae1172 --- /dev/null +++ b/transcripts/ch8-performance/5.txt @@ -0,0 +1,104 @@ +00:01 Let's explore this slightly updated version of our code. +00:04 Here we are in the GitHub repository, +00:07 and I am in the source folder and I've added an 08_perf section, +00:10 and we have the starter_big_dealership and we have the big_dealership +00:15 it even has instructions here to tell you basically how to restore +00:17 that database we did just in the previous video. +00:20 This one is going to be a snapshot of how this chapter starts, +00:24 it's what we're starting from now and will remain that way; +00:27 here we're going to take basically a copy of that one +00:29 and evolve it into the fast high performance version, +00:33 so let's go over here and see what we've got. +00:35 Now, we have a few things that are slightly different, +00:38 the car is basically unchanged from before +00:41 although I added a little comment about how do we get to the owners. 
+00:45 The one thing that is new here, in terms of the model is this owner idea, +00:50 so cars can now have an owner +00:52 and the way we know which cars are owned by this owner +00:57 is we have a list of object ids, those object ids are the object ids of the cars +01:02 so we're going to push the ids of the cars that are owned here +01:06 I guess we could run it as a many to many or one to many relationship, +01:10 just depending on how we treat the owner, but theoretically, +01:13 we can have owners where there is a single car that has multiple owners +01:17 and there are owners that own multiple cars, and we can manage it this way, +01:21 you almost never see like a car to owner intermediate table, +01:25 so you're almost always going to have something like +01:27 those ids are either embedded in the owner or in the car, +01:32 or under rare circumstances both. +01:35 So here's how we refer back to the cars, +01:38 then we have a few basic things like the name, +01:41 when was this owner created, how many times have they visited and things like that. +01:45 We want to call it owners in the database and it's just this core collection, +01:49 so other than that, there's not a whole lot going on here, +01:51 let's look over here, we now have these services, +01:54 I've taken all the car queries and moved them down here +01:57 do you want to create a car, you call this function, +01:59 do you want to record a customer visit, here we can go to the owner +02:03 and we can use this increment operator +02:06 to increment the number of visits in place. +02:09 Find cars by make, find owner by name and so on. 
+02:16 Number of cars with bad service, a lot of this stuff is what we wrote previously; +02:20 there was the program thing that we ran over here that was interactive +02:23 and I've replaced that with a few things, +02:25 one is this db stats and you can run this and it will tell you +02:28 like how many cars are there, how many owners are there, +02:31 what's the average number of histories, +02:33 this is basically those stats that I presented to you before, +02:36 this takes a while to run on this database, I don't recommend you run it +02:39 but if you want to just run it and see what you get you can. +02:42 The database was originally created using this script, +02:46 I am using something interesting you may not have heard about, +02:49 I am using this thing called Faker, so down here +02:53 Faker lets you create this thing and I'm seeding it +02:59 so it always generates exactly the same things, +03:01 I'm seeding random and fake and you can see down here +03:04 it's creating the owners and you can ask it for things like +03:06 give me a fake name, give me a fake date +03:08 between these two dates, things like that. +03:12 Similarly with cars, we're using random to get a hold of a lot of the numbers +03:15 then we can use fake for anything else we might. +03:18 We ran this, with the right amount of data, it'll build it all up for us, +03:25 so for some reason if you need to recreate it +03:27 run this low data thing, you can have it create a small one, +03:30 if you comment, uncomment that or a large one +03:32 if you only run it with those settings. +03:34 Those are all good, this is like the foundation and this is where we are. 
+03:37 Next, we're going to ask interesting questions of this database +03:43 and we want to know how long those questions take to answer, +03:46 so I've written this super simple function called time +03:48 you pass it a message and a function, +03:50 it will time how long the function takes to run +03:53 and then print out the message along with the time in terms of milliseconds. +03:57 And then we're going to go through +03:59 and we're going to ask interesting questions here +04:01 like how many owners, how many cars, who is the 10 thousandth owner, +04:05 notice the slicing here to give us a slice of length one +04:10 and then we'll just access it, +04:12 and then we can start asking interesting questions like +04:14 how many cars are owned by the 10 thousandth owner, +04:17 or if we go down here, how many owners own the 10 thousandth car, +04:21 so ask it in the reverse direction. +04:23 Here we want to find the 50 thousandth owner by name, +04:26 so yes, we technically have them but the idea is +04:30 we want to do a query based on the name field +04:32 and we originally won't have any performance +04:35 around these types of queries so it should be slow. +04:38 This one, how many cars are there with expensive service +04:40 this was the one with the snail +04:43 in one of the first videos in this chapter, +04:46 I showed you look this takes 700 milliseconds to run to ask this question +04:49 how many cars have a service history with a price greater than 16800. 
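A timing helper like the one described above might look roughly like this. It is a sketch, not the course's actual code: the name `time_it` (chosen so it does not shadow Python's `time` module), the returned tuple and the output format are all assumptions, and the summing lambda is just a stand-in for a real database query.

```python
import time


def time_it(msg, func):
    """Run func once, print msg with the elapsed milliseconds, return both."""
    start = time.perf_counter()
    result = func()
    elapsed_ms = (time.perf_counter() - start) * 1000
    print(f"{msg}: {elapsed_ms:,.3f} ms")
    return result, elapsed_ms


# Stand-in workload; in the course this would be a MongoEngine query.
total, elapsed = time_it("Sum of the first million integers",
                         lambda: sum(range(1_000_000)))
```

Passing the work in as a function (usually a lambda) keeps the timing code in one place, so every question asked of the database is measured the same way.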
+04:55 So we're going to be to be able to ask all of these questions +04:58 and this program will let us explore that +05:02 and we'll see how to add indexes +05:04 and I'll show you how to add indexes in the shell +05:06 and how to add them in MongoEngine, and MongoEngine is really nice +05:09 because as you evolve your indexes, as you add new ones +05:13 simply deploying your Python web app +05:15 will adapt the database that it goes and finds +05:18 to automatically upgrade to those indexes, so it's really really nice. +05:22 So here you can see we're going to run this code and ask a bunch of questions +05:25 we could load the data from here, we could generate the data, +05:28 but you're much better off importing the data from that zip file +05:32 because this takes like half an hour to run, +05:35 you saw that zip takes like five seconds. diff --git a/transcripts/ch8-performance/6.txt b/transcripts/ch8-performance/6.txt new file mode 100644 index 0000000..3b530e8 --- /dev/null +++ b/transcripts/ch8-performance/6.txt @@ -0,0 +1,165 @@ +00:01 Let's go ahead and run this code, you've seen the minor changes +00:04 like the addition of this concept of an owner, +00:06 and how we generated all this data, and how you can restore it. +00:09 Let's go ahead and run it, and see what's happening. +00:13 Let's look at this from two perspectives, let's begin over actually in Robomongo, +00:17 so we're going to ask the question, basically how many owners own a certain car +00:21 the idea is more or less we're going to call this function which goes right here, +00:25 really what we're looking for is this query, +00:28 find me all of the owners where this car id is in their car ids collection, +00:33 just generate and deserialize that. +00:37 The other one that we're going to focus on is +00:39 show me the cars with the expensive service history, +00:42 how many cars or what cars had some kind of service +00:46 that cost over 16800 dollars. 
+00:49 Let's begin by looking at those in Robomongo. +00:54 Here we have this concept, we could simplify this a little bit, but it doesn't matter, +00:57 cars here's the service history, let's go to the price +01:00 where that's greater than 16800, how many of them are there. +01:05 If I run this, notice, it took a while to come back, +01:08 run it again, here's the speed right there, 0.724 sec, 0.731, 0.733, +01:14 so it's pretty reliably taking around 700 milliseconds to answer that question. +01:19 We're going to come back to this. +01:22 Here's a more interesting example, like go and randomly grab a car +01:25 somewhere deep in the list, in this case I put 61600, +01:30 grab that car and then find me all the owners, +01:33 where that car id appears in their id list, and then we'll just dump that out, +01:38 by saying var it doesn't appear if you just state the name it will show up down here, +01:43 so make sure to deselect it and run this, +01:45 and this is actually surprisingly fast, given all the stuff that's going on here, +01:48 but it's taking still about 75, 80 milliseconds to run here, +01:53 which, I don't know, maybe in your database +01:55 going across a 100 thousand records 80 milliseconds seems decent, +01:59 I can tell you in MongoDB 80 milliseconds is terrible +02:02 you should really think about making something that's 80 milliseconds faster +02:06 it's not always possible you can do it, +02:08 but most of the queries as we'll see are possible. +02:11 Let's take this one and just try to understand what's happening here +02:16 and then we're going to go look at it in Python, +02:19 but let's just explore it here in the shell for just a moment. +02:21 Why is this taking 700 milliseconds? 
+02:24 MongoDB has this way to basically ask how are you running this query, +02:29 and the way you do that is you say explain, like so, +02:35 so I can say for this query, instead of giving me a result, tell me how you're running it, +02:38 if I unselect it, it just runs the selected stuff if there's something there, +02:42 so we can go and look at it in this mode, +02:44 so it says okay, here's what the query planner found for you, +02:47 we've parsed this query, and this is something +02:50 it's basically what went into the find, +02:52 it also might have something to the effect of like a sort +02:55 and other things that are happening, but this is a simple query. +02:58 Look down here, see this winning plan, stage COLLSCAN, a collection scan, +03:02 that is bad, that is really, really bad. +03:05 Also notice the rejected plans, so if there are multiple indexes +03:08 and other things it could have done +03:10 it might have attempted a bunch of them and said no, no, no this is the best, +03:13 let's see it doesn't seem to tell us any more about what it did there, +03:18 like sometimes it'll tell you how many records it scanned and things like this, +03:21 but it's just basically reading entirely in the forward direction +03:25 over this and just doing a comparison. +03:27 So that's why this was taking 700 milliseconds +03:32 as it was literally reading and comparing 100 thousand entries +03:36 or actually more, remember there are 1.2 million service histories +03:40 across those 250 thousand cars, so not 100 thousand, +03:43 1.2 million records it scanned over, that's bad, you don't want that.
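Reading the winning plan can be automated in a few lines. The sketch below walks hand-written sample dictionaries that only imitate the shape of explain output (queryPlanner, winningPlan, stage, inputStage); the samples and the helper names are assumptions, and real output carries many more fields. In PyMongo a document of this kind would come from calling explain() on a cursor.

```python
def plan_stages(explain_doc):
    """List the stage names of the winning plan, outermost stage first."""
    stages = []
    node = explain_doc["queryPlanner"]["winningPlan"]
    while node is not None:
        stages.append(node["stage"])
        node = node.get("inputStage")  # follows single-child plans only
    return stages


def uses_collection_scan(explain_doc):
    """True when the winning plan reads every document (COLLSCAN)."""
    return "COLLSCAN" in plan_stages(explain_doc)


# Illustrative samples, not captured from a server.
without_index = {"queryPlanner": {
    "winningPlan": {"stage": "COLLSCAN", "direction": "forward"},
    "rejectedPlans": [],
}}
with_index = {"queryPlanner": {
    "winningPlan": {"stage": "FETCH",
                    "inputStage": {"stage": "IXSCAN", "indexName": "example_index"}},
    "rejectedPlans": [],
}}

print(plan_stages(without_index), uses_collection_scan(without_index))
print(plan_stages(with_index), uses_collection_scan(with_index))
```

A check like uses_collection_scan could sit in a test suite so that a query that quietly loses its index gets noticed early.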
+03:47 So what we can do is we can actually add an index, +03:51 now there's two ways to add an index, +03:54 but before I add the index, let's go over here +03:58 just explain is super, super valuable, +04:00 any time something is slow we're going to explain +04:03 there's actually way to turn on profiling and say log all of the queries +04:07 that you see MongoDB that are slower than x, +04:11 you providing them like say 10 milliseconds might be great, +04:14 show me all the queries that take more than 10 milliseconds +04:17 and then you can drop them in here, put an explain +04:19 and then start creating indexes to make them faster. +04:22 So just google mongodb profile enable slow queries +04:26 or something like this, it's pretty straightforward. +04:29 Now let's run this code, we're asking a lot of questions +04:31 what we want to run is q and a, so we go over here and just right click and say run, +04:37 notice some of these things are taking time, +04:42 the database might be cold, it might have not loaded that stuff, +04:46 so let me run it one more time just to be fair, +04:49 there's a few things that are already really fast, and that's cool, +04:55 so let's go here and review, how many owners are there— +04:58 well, I can tell you it doesn't show the answer +05:01 it just sort of says this is the question I'm asking here is how long it takes. +05:04 Three milliseconds, that is solid, how many cars— half a millisecond. +05:07 That's pretty solid, I don't think we can improve the count on the entire collection +05:11 but this one, find the 10 thousandth owner— not good, +05:14 so let's see how many cars are owned by that person— +05:19 this is pretty fast actually, this is surprisingly fast, +05:23 how many owners this can have— 66 milliseconds +05:26 that's the one we were looking at in there. 
+05:29 I'm going to take these numbers and put them over here, +05:32 let's say, this will be Without indexes +05:36 we're going to get this, we don't really care about the exit code, do we? +05:41 With indexes, and we're going to kind of iterate on this a little bit +05:45 so let's begin over here, and we're going to talk about +05:49 how we can add an index in MongoDB and then for the most part +05:55 do this in MongoEngine because it's really part of the way our application works, +06:00 what the indexes are, and it's better to make that part of our document +06:03 than kind of do a separate database setup step; +06:07 we could create a script in JavaScript and run it, +06:09 it will do these things and that may be fine, but let's go over here and work on this. +06:14 Again we had the count, here's the almost 800 milliseconds, +06:19 let's go over here and just I'll take this, I'll make a copy, +06:24 +06:28 so here is what we can do, instead of doing the find operation +06:31 we can say create index, +06:35 and then we have the thing that we're doing the query on, +06:38 most of the time this is one item but you can have composite indexes +06:43 they are a little more nuanced so we'll talk about them later, +06:45 but let's just do this one, we want to be able to query by service history's price. +06:52 Here we can put one of two things, one or minus one, +06:56 which default sort do you want, ascending or descending? +06:59 A lot of times it doesn't really matter, +07:01 it can read from the back or it can read from the front, whatever, +07:04 you saw the forward direction on our collection scan for example.
+07:06 So over here we could say one, this creates an index, there's no count; +07:09 the other thing we can do is we can give it a name +07:13 so we can come over here and say name is search by service history price, +07:24 so if we go look in this little indexes, we'll see the name here, +07:27 we can also say run in the background, +07:30 if I don't say that it's going to block the database until the index is generated, +07:33 if you're doing this in production, and you have tons and tons of data +07:36 maybe background is the way to go. +07:38 Okay, anyway let's go ahead and run this and see what happens. +07:41 Notice the pause, this is it actually computing the index, +07:44 right now the database is effectively down, now it's back, +07:47 what do we get: ok, created collection automatically, no, it already existed, +07:51 the number of indexes before was one, now we have two +07:54 and everything was A-OK, so if I refresh, +07:57 +07:59 here's that index and I can actually edit this over here in Robomongo, +08:05 go for the advanced properties, here is the create index and background +08:09 whether it's sparse, how long it lives, +08:11 whether it's based on text search or whatever, but here's just the basic thing. +08:15 +08:18 We've added this index, remember this took 800 milliseconds +08:21 ask the same question now, boom, 8 milliseconds. +08:24 Ask it one more time, 2, here we go, 2, 2, 2, 3, 2, 2, +08:28 right, the screen sharing is probably putting a pretty heavy load on the server +08:32 that's also the database server, right but still, +08:35 we're getting it down 350, 400 times faster by adding that. +08:39 Now if I go back and I ask that question with explain +08:42 now we get something way better, winning plan is index scan +08:50 index name search by service history price, that is really awesome; +08:57 that means we're using our index which is so much faster.
+09:02 There were no rejected plans, so it only found one index +09:06 it tried to use it, it found that it was awesome, it's very happy. +09:09 +09:16 Go back to my count one more time, +09:21 boom 2 milliseconds, and that's a really good answer, +09:24 let's go run our Python code and see what answers we get now, +09:27 that was already faster, let's go over here +09:32 and load car name and ids with expensive prices and spark plugs, +09:38 20 milliseconds this is actually a pretty complicated query +09:43 we'll get into cars with expensive service, 1.9 milliseconds. +09:47 This is exactly what we saw in Robomongo, +09:51 so over here in MongoEngine, we're getting essentially the same results, how cool is that? +09:56 Very nice, we're going to go through and in Python from now on +10:02 we're going to add the necessary indexes to start making these +10:05 almost all of these run super fast, all of them run fast +10:09 some of them we can get incredibly fast, like one millisecond, +10:11 others not quite that fast, but we'll still do good on all of them. \ No newline at end of file diff --git a/transcripts/ch8-performance/7.txt b/transcripts/ch8-performance/7.txt new file mode 100644 index 0000000..a5b7f8c --- /dev/null +++ b/transcripts/ch8-performance/7.txt @@ -0,0 +1,298 @@ +00:01 Now that you've seen how to create indexes in the shell in JavaScript effectively, +00:04 let's go and see how to do this in MongoEngine. +00:07 I think it's preferable to do this in MongoEngine because that means +00:11 simply pushing your code into production will ensure +00:14 that the database has all the right indexes set up to operate correctly. +00:19 You theoretically could end up with too many, +00:21 if you have one in code and then you take it out +00:23 but you can always manage that from the shell, +00:26 this way at least the indexes that are required will be there.
+00:29 I dropped all the indexes again, let's go back through our questions here +00:33 and see how we're doing. +00:36 It says how many owners, how many cars, +00:38 this is just based on the natural sort however it's in the database +00:41 there's really nothing to do here, +00:44 but this one, find the 10 thousandth owner, let's look at that; +00:48 that is going to basically be this name, we'll use test, +00:55 it doesn't really matter what we put here +00:57 if we put explain, this should come back as a collection scan or something like that, +01:01 yeah, no indexes, okay, so how long did it take to answer that question? +01:06 Find the 10 thousandth owner by name, +01:12 it didn't say by name, I'll go and add by name, +01:16 well that took 300 milliseconds, well that seems bad +01:21 and look we're actually using sorting, +01:24 we're actually using paging, skip and limit, those types of things here, +01:27 but in order for that to mean anything, we have to sort it, +01:31 it's really the sort that we're running into. +01:34 Maybe I should change this, like so, +01:38 sort like so, we could just put one, I guess it's the way we're sorting it, +01:47 so here you can see down there the sort pattern name is one +01:49 and guess what, we're still doing a collection scan. +01:53 Any time you want to do a filter by, a greater than, an equality, +01:56 or you want to do a sort, you need an index to make it fast. +01:59 Let's go over to the owner here, this is the owner class +02:04 and let's add the ability to sort it by name or +02:08 equivalently also do a filter like find exactly by name, +02:12 so we're going to come down here +02:14 we're going to add another thing to this meta section, +02:16 and we're going to add indexes, +02:20 and indexes are a list of indexes, +02:25 now this is going to be simple strings +02:28 or they can be complex subdictionaries, +02:31 for composite indexes or uniqueness constraints, things like that, +02:34 but for name all we need is name.
+02:38 Let's run this, first of all, let's go over here +02:41 and notice, if I go to owners and refresh, no name index, +02:46 let's run this code, find the 10 thousandth owner by name, +02:52 19 milliseconds, that's pretty good, +02:55 let me run it one more time, +02:57 15 yeah okay, so that seems pretty stable, +03:00 and let's go over here and do a refresh, hey look there's one by name; +03:03 we can see it went from what was that, +03:08 something like 300 milliseconds to 15 milliseconds, so that's good. +03:11 How many cars are owned by the 10 thousandth owner, +03:15 so that's 3 milliseconds, but let's go ahead and have a look at this question anyway. +03:19 How many cars are owned by the 10 thousandth owner, +03:22 so here's this function right here that we're calling +03:25 it doesn't quite fit into a lambda expression, so we put it up here +03:28 so we want to go and find the owner by id, +03:30 that should be indexed right, that should be indexed right there +03:34 because it's the id, the id always has an index, +03:36 and now we are saying the id is in this set, +03:40 so we're doing two queries, but both of them are hitting the id thing, +03:44 so those should both be indexed and 3 milliseconds, +03:47 well that really seems to indicate that that's the case. +03:50 How many owners own the 10 thousandth car, that is right here. +03:54 So we'll go find the car, ask how many owners own it. +03:59 Now this one is interesting, so remember when we're doing this +04:02 basically this in query, let's do a quick print of car id here, +04:11 so if we go back over to this, we say let's go over to the owners +04:17 save your documents, so this is going to be car ids, +04:21 it's going to have an object id of that, +04:26 all right, so run this, zero records, apparently this person owns nothing, +04:33 but notice it's taking 77 milliseconds, we could do our explain again here +04:37 and collection scan, yet again, not the most amazing.
+04:43 So what we want is we want to have an index on car ids, right +04:48 because collection scan, not good, +04:50 I think it's not really telling us in our sort example +04:53 but for the find it definitely should be. +04:55 So we can come back to our owner over here, +04:58 let's add also like an index on car_ids, +05:02 If we run this once again, just the act of restarting it +05:05 should regenerate the indexes, how long did it take over here, +05:09 a little late now isn't it, because I did the explain, +05:13 I can look at this one, how many cars, +05:16 how many owners does the 10 thousandth car have, +05:19 66 milliseconds, if we look at it now, +05:22 how many owners own the 10 thousandth car, 1.9 milliseconds, +05:29 so more than 30 times faster by adding that index, excellent, +05:34 find the 50 thousandth owner by name, that's already done. +05:38 +05:40 Alright we already have an index on the owner's name so that goes nice and quick, +05:45 and how is this doing, one millisecond perfect, +05:48 this one is super bad, the cars with expensive service 712 milliseconds, +05:52 alright so here, we're looking at service history +05:56 and then we're navigating that relationship, that hierarchy, +06:00 with the double underscore, going to the price, +06:02 greater than, less than, equal it doesn't matter, +06:05 we're basically working with this value here, this subdocument. +06:08 Let's go over to the car and make that work, +06:11 now the car doesn't yet have any indexes but it will in a second, +06:14 so what we want to do is represent that here +06:17 and in the raw way of discussing this with MongoDB +06:21 we use . (dot) not double underscore, so . represents the hierarchy here.
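To make the double underscore versus dot point concrete, here is a deliberately simplified translation of MongoEngine-style keyword filters into a raw MongoDB filter document. It is an illustration only, not MongoEngine's real implementation: to_mongo_filter is a hypothetical helper and it handles just a handful of comparison operators.

```python
# Comparison suffixes this sketch understands (a small subset).
OPERATORS = {"gt", "gte", "lt", "lte", "ne", "in", "nin"}


def to_mongo_filter(**kwargs):
    """Turn service_history__price__gt=1 into {"service_history.price": {"$gt": 1}}."""
    query = {}
    for key, value in kwargs.items():
        parts = key.split("__")
        if parts[-1] in OPERATORS:
            operator = "$" + parts.pop()   # e.g. "gt" becomes "$gt"
            path = ".".join(parts)         # "__" in Python becomes "." in MongoDB
            # Assumes a field is not also given as a plain equality condition.
            query.setdefault(path, {})[operator] = value
        else:
            query[".".join(parts)] = value  # plain equality match
    return query


print(to_mongo_filter(service_history__price__gt=16800))
```

The dotted path on the left of the result is the same string that goes into an index definition, which is why the index list uses dots while the Python query uses double underscores.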
+06:25 Let's run that again, notice expensive service, 712, +06:30 cars with expensive service, instead of 712 we have 2.4 milliseconds, +06:39 now notice that first time I ran it there was a pause, +06:42 the second time it was like immediate, +06:45 and that's because it basically was recreating that index +06:47 and that pause time was how long that index took to create. +06:51 So here we have cars with expensive service, +06:53 now we're getting into something more interesting, look at this one with spark plugs, +06:58 we're querying on two things, we're querying on the service history's price and its description, +07:04 let's actually put this over in the shell so we can look at it. +07:07 +07:19 I've got to convert this over, do the dots there, +07:23 this is going to be the $gt operator, colon, like so, +07:30 all right, so we're comparing that service history.price +07:35 and this one, again because you can't put dots in an unquoted key, +07:39 do the dot here and quotes, and this one is just spark plugs, +07:46 alright, let's run this, okay 22 milliseconds, +07:52 how long is it taking over here, 20 milliseconds, +07:56 so that's actually pretty good and the reason I think it's pretty good is +07:59 we already have an index on this half +08:02 and so it has to just basically sort the result, let's find out. +08:05 +08:11 Winning plan, index on this one, yes, exactly +08:14 so this one is just going to crank across there +08:18 but we're going to use at least this index here, this by price +08:22 so that gets part of the query there. +08:25 Now maybe we want to be able to do a query just based on the description +08:30 show me all the spark plugs, well that's a collection scan, +08:33 so let's go back and add over here one for the description.
+08:40 Now how do I know what goes in this part, +08:44 see I have a service history here, if we actually look at the service record object +08:49 it has a price and description, right +08:52 so we know that that results in this hierarchy of +08:54 service history.price, service history.description. +08:57 If we'd run this again, it will regenerate those and let's go over here +09:01 and run this, and let's see, now we're doing index scan on price, +09:09 what else do we got, rejected plans, okay so we got this and query +09:18 and it looks like we're still using the— yes, oh my goodness, +09:24 how about that for a mistake, comma, so what did that do +09:28 that created, in Python you can wrap these lines and that just created this, +09:33 and obviously, that's not what we want, that comma is super important there. +09:38 So let me go over here and drop this nonsense thing, +09:41 try this again, I can see it's building index right now, +09:47 okay, once again we can explain this, okay great, +09:51 so now we're using price and actually we use the description this time +09:58 and you can see the rejected plan is the one that would have used the price, +10:04 so we're using description, not price, +10:06 and how long does it take to run that query— 7.9 milliseconds, that's better +10:13 but what would be even better still is if we could do +10:16 the description and price as a single thing. How do we do that? 
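The missing comma mistake is easy to reproduce in plain Python, because adjacent string literals are joined into one string. The two field paths below come from the discussion; their order in the list is an assumption.

```python
broken_indexes = [
    'service_history.price'  # no comma here, so Python glues this to the next string
    'service_history.description',
]

fixed_indexes = [
    'service_history.price',
    'service_history.description',
]

print(broken_indexes)      # ['service_history.priceservice_history.description']
print(len(fixed_indexes))  # 2
```

The broken list asks for a single index on a field that does not exist, and nothing complains, so it is worth checking the index list in the database after a change like this.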
+10:22 This gets to be a little trickier, if we look at the query we're running, +10:25 we're first asking for the price and then the description, +10:30 so we can actually create a composite index here as well, +10:35 and we do that by putting a little dictionary, saying fields +10:39 and putting a list of the names of the fields +10:44 and you can bet those go like this, +10:48 now this turns out to be really important, the order that you put them here +10:52 price and the description versus description price, for sorting, +10:56 not so much for matching, run it one more time, +11:00 alright, expensive cars with spark plugs, +11:04 +11:07 here we go, look at that, less than one millisecond, +11:10 so we added one index, it took it from like 66 milliseconds down to 15, +11:16 and then, we added the description one, it turns out that was a better index +11:21 and it took it from 15 to 9, we added the composite index, +11:24 and we took it from 9 to half a millisecond, a 0.6 milliseconds, that is really cool. +11:31 Notice over here, this got faster, let's go back and look at what that is. +11:36 Load cars, so this is the one we are optimizing +11:40 and what are we doing here— let me wrap this so you can see, +11:43 we're doing a count, okay, we're doing a count +11:46 and so it's basically having the database do all the work +11:48 but there's zero serialization. 
+11:52 Now in this one, we're actually calling list +11:55 so we're deserializing, we're actually pulling all of those records back +11:59 and let's just go over here and see how many there are, +12:03 +12:08 well that's not super interesting, to have just one, is it, +12:12 alright, that's good, but let's actually make this just this, +12:17 +12:23 let's drop this spark plug thing and just see +12:26 how many cars there are with this, +12:30 okay there we go, now we have some data to work with, +12:33 65 thousand cars had 15 thousand dollar service or higher, +12:36 after all, this is a Ferrari dealership, right. +12:39 Now, it turns out it's a really bad idea to pull back that many cars, +12:43 let me stop this, let's limit that to just a thousand here as well. +12:52 +12:54 Okay, so we're pulling back thousand cars because we're limited to this +13:00 and we're pulling back a thousand cars here. +13:03 But notice, this car name and id versus the entire car +13:08 so let's go over here cars with expensive service, car name and id, +13:13 so notice the time, so to pull back and serialize those thousand records +13:17 took actually a while, so it took one basically a second, +13:21 and if we don't ask for all the other pieces, +13:25 if we just say give me just the make, the model and the id, +13:29 here we're using the only keywords, it says don't pull back the other things +13:34 just give me the these three fields when you create them, +13:37 it makes it basically ten times faster, +13:40 let's turn this down to a 100 and see, maybe get a little more realistic set of data. +13:44 Okay, there we go, a 100 milliseconds down to 14 milliseconds, +13:49 so it turns out that the deserialization step in MongoEngine is a little bit expensive +13:55 so if you like blast a million cars into that list, it's going to take a little bit. 
+14:01 If we can express like I only want to pull back these items, +14:05 then it turns out to be quite a bit faster, +14:10 in this case not quite ten times faster, but definitely faster. +14:15 Let's round this out here and finish this up. +14:17 Here we're asking for the highly rated, highly priced cars, +14:20 we're asking like hey for all the people that come and spend a lot of money +14:26 how did they feel about it? +14:29 And then also what cars had a low price and also a low rating, +14:33 so maybe we could somehow change our service +14:37 for these sort of cheaper like oil change type people. +14:39 It turns out that that one is quite fast, +14:42 this one we could do some work and fixing one will really fix the other +14:46 so we have this customer rating thing that we probably want to have an index on, +14:52 and we already have one on the price, +14:54 so I think that that's why it's pretty quick actually. +14:57 Go over here, and we don't yet have one on the price, on the rating rather, +15:03 so we can do that and see if things get better, +15:07 not too much, it didn't really make too much of a difference, +15:12 it's probably better to use the price than it is the rating, +15:16 because we're kind of doing that together, so we're also going to go down here +15:19 and have the price and customer rating, +15:21 one of these composite indexes, once again, +15:24 and maybe if we change price one more time, +15:29 rating and price, it doesn't seem like we're getting much better, +15:36 so down here this is about as fast as we can get, 16 milliseconds +15:40 and this is less than one millisecond, so that's really good. +15:44 The final thing is, we are looking for high mileage cars, +15:47 so let's go down here and say find where the mileage of the car +15:51 is greater than 140 thousand miles, do we have an index on that, +15:55 you can bet the answer is no.
+15:58 Now we could go to the shell and see that, but no we don't have one, +16:01 so let's go up here and add one more, +16:04 and this is in fact the only index we have here in this thing +16:07 that is on like just plain field, not one of these nested ones like this; +16:13 so maybe we also want to be able to select by year, +16:16 so we could have one for year as well. I'm going to add those in. +16:21 Now this high mileage car goes from a hundred and something milliseconds +16:26 down to six, maybe one more time just to make sure, +16:28 yep, 5, 6, seems pretty stable around there. +16:32 So we've gone and we've added these indexes +16:34 to our models, our MongoEngine documents by adding indexes +16:40 and we can have flat ones like this, or we have these here, +16:48 and we also can have composite ones or richer things, +16:52 if we create a little dictionary and we have fields and things like that. +16:57 Similarly an owner, we didn't have as many things we were after +17:00 but we did want to find them by their name and by car id, +17:03 so we had those two indexes, +17:05 honestly this is just a simpler document than the cars. +17:08 So with these things added here, we can run this one more time +17:11 and see how we're doing that code all runs really quick, +17:14 if we kind of scan through here, there's nothing that stands out like super bad, +17:18 5 milliseconds, half, 18, 6, half, 1, 3, 1, let's say, +17:26 this one, I really wish we could do better, +17:29 it just turns out there is like so many records there +17:32 that if we run that here you can see that the whole thing runs in one millisecond, +17:38 super, super fast, we can't make it any faster than that. +17:41 The slowness is basically the allocation, +17:45 assignment, verification of 100 car objects. 
+17:48 I'd like to see a little better serialization time out of MongoEngine, +17:53 if you have some part of your code that has to load tons of these things +17:56 and it's super performance critical, you could drop down to PyMongo, +18:00 talk to it directly and probably in the case where you're doing that +18:05 you don't need to pull back many, many objects, +18:07 but also you can see that if we limit what we ask for down here, +18:12 that goes back to 14 milliseconds which is really great, +18:15 here we're looking at a lot of events, this is like 16 thousand +18:21 or no, 65 thousand, that's quite a bit, this one is really fast, +18:25 this one is really fast, so I feel like from an index perspective +18:28 we've done quite a good job, how do we know we're done? +18:32 I guess this is the final question, this has been a bit of a long one, +18:35 how do we know we're done with this performance bit? +18:39 We know we're done when all of these numbers come by +18:43 and they're all within reason of what we're willing to take. +18:47 Here I have set this up as these are the explicit queries +18:51 we're going to ask and then we'll just time them, +18:54 like your real application does not work that way. +18:56 How do you know what questions your application is asking and how long they're taking?
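One way to answer that question from Python is MongoDB's profiler. The sketch below assumes a PyMongo database object is passed in; the helper names and the 10 millisecond default are illustrative choices, and the database name in the usage comment is hypothetical. It relies on the server's profile command and on the system.profile collection that the profiler writes to.

```python
def enable_slow_query_profiling(db, slow_ms=10):
    """Ask MongoDB to record operations slower than slow_ms milliseconds."""
    # Level 1 profiles only slow operations; slowms sets the threshold.
    return db.command("profile", 1, slowms=slow_ms)


def slowest_queries(db, limit=5):
    """Return the slowest recorded operations, slowest first."""
    cursor = db["system.profile"].find().sort("millis", -1).limit(limit)
    return list(cursor)


# Usage against a real server (the database name is an assumption):
#   from pymongo import MongoClient
#   db = MongoClient()["dealership"]
#   enable_slow_query_profiling(db, slow_ms=10)
#   ... exercise the application ...
#   for op in slowest_queries(db):
#       print(op["millis"], op["ns"], op["op"])
```

Each slow entry can then be replayed with explain to see whether it is doing a collection scan.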
+19:01 So you want to set up profiling, so you can come over here +19:05 and definitely google how to do profiling in MongoDB, +19:08 so we can came over here and let's just say, db set profiling level +19:13 and you can use this function to say I'm looking for slow queries +19:18 and to me slow means 10 milliseconds, 20 milliseconds something like that, +19:23 it will generate a table called system.profile and you can just go look in there +19:29 and see what queries are slow, clear it out, +19:33 run your app, see what shows up in there +19:35 add a bunch of indexes, make them fast, clear that table, +19:38 then turn around and run your app again, +19:43 and just until stuff stops showing up in there, +19:46 you can basically find the slowest one, make it faster, clear out the profile +19:51 and just iterate on that process, and that will effectively like gather up +19:55 all of the meaningful queries that your app is going to do, +19:59 and then you can go through the same process here +20:01 to figure out what indexes you need to create. \ No newline at end of file diff --git a/transcripts/ch8-performance/8.txt b/transcripts/ch8-performance/8.txt new file mode 100644 index 0000000..00cc6a3 --- /dev/null +++ b/transcripts/ch8-performance/8.txt @@ -0,0 +1,33 @@ +00:01 We've seen how powerful adding indexes to MongoDB is +00:04 and I talked a little bit how the nested nature of these documents means +00:09 there's naturally fewer primary keys, +00:11 so there's fewer on average actual indexes +00:15 that get created just as part of working with the database; +00:18 so creating these indexes is even more important in document databases +00:22 than it is in relational databases. 
+00:24 So here we are in the shell, this would be Robomongo +00:27 or just the Mongo command line interface +00:30 and we can create an index on a collection by saying db.collection name +00:33 so here we have cars.createIndex +00:35 and then we pass it two things, first one required, second one optional +00:39 we pass it the actual fields we want to create the index on; +00:44 so here we have service_history.customer_rating +00:48 so we could traverse this hierarchy if necessary +00:51 we just use that dot like we have been in the shell the whole time +00:55 and then we say one or minus one, +00:57 so do you want to sort ascending or descending. +00:59 And this mostly matters for either what you might consider the natural sort +01:03 or if you're doing a composite key or a composite index +01:08 and that composite index is being used for sorting on both fields +01:12 and all the orders have to line up exactly for the sort to use that index. +01:17 Then we can pass additional information, +01:19 here we have background as true and the name, +01:21 I like to name my indexes if I'm doing this shell +01:24 because then it's easier to see like okay why did I create this index +01:28 here we want the customer ratings of service, +01:31 so that's pretty nice, background true, that's not the default +01:35 but that means it will run basically in the background +01:38 without blocking the database operations, +01:41 if you don't put that, when you hit go +01:43 the database will stop doing any sort of database stuff +01:46 until this index is generated so be aware. 
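The same index can also be created from Python with PyMongo instead of the shell. This is a sketch under assumptions: the function expects a PyMongo collection to be passed in, and the index name is made up for the example. Newer MongoDB servers (4.2 and later, as far as I know) ignore the background option, so treat that flag as relevant mainly to older deployments.

```python
def add_rating_index(cars):
    """Create an ascending index on the nested customer_rating field."""
    return cars.create_index(
        [("service_history.customer_rating", 1)],  # 1 = ascending, -1 = descending
        name="customer_ratings_of_service",         # hypothetical index name
        background=True,                            # avoid blocking on older servers
    )


# Usage against a real server (database and collection names are assumptions):
#   from pymongo import MongoClient
#   cars = MongoClient()["dealership"]["cars"]
#   print(add_rating_index(cars))  # create_index returns the index name
```

Passing the key as a list of (field, direction) pairs is the same form a composite index takes, with one pair per field.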
diff --git a/transcripts/ch8-performance/9.txt b/transcripts/ch8-performance/9.txt new file mode 100644 index 0000000..3c590b3 --- /dev/null +++ b/transcripts/ch8-performance/9.txt @@ -0,0 +1,39 @@ +00:01 Now if we're using MongoEngine, +00:03 we don't have to go to the shell and manually type all the indexes, +00:05 we basically go to each individual top level document, +00:08 so all the things that derive from mongoengine.Document, +00:11 not the embedded documents, and we go to the meta section +00:14 and we add an indexes entry, basically an array, +00:17 so here we want to have, you can see the blue stuff that's highlighted, +00:20 we want an index on make, we want an index on service history +00:23 and within service history, remember these are service records shown at the bottom, +00:27 we want an index on the description and price. +00:30 So for the first index we put 'make', that's straightforward +00:34 and then we have service_history.customer_rating +00:37 so service history is the field name +00:39 and then customer rating is the field name of service record +00:42 and for some reason I don't have it blue, it's that last one down there +00:45 but we also want this composite key +00:47 so service_history.price and service_history.description +00:50 we want to be able to find where both of those match +00:53 and we're going to do that by having +00:56 a more complicated entry in the indexes bit here, +00:58 this is going to be a dictionary where the fields are set +01:00 to be this array of strings and not just the flat string itself.
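The three shapes of index entry described here can be written out as plain Python data. This is a sketch under assumptions: the slide itself is not in the transcript, so the `Car` class in the comment is a reconstruction from the field names mentioned, and `indexed_fields` is a hypothetical helper that only illustrates how each entry maps to field paths (it is not part of MongoEngine).

```python
# Entries for meta['indexes'], in the three shapes the transcript mentions:
CAR_INDEXES = [
    "make",                             # simple top-level field
    "service_history.customer_rating",  # field inside the embedded documents
    {                                   # composite index: a dict with 'fields'
        "fields": ["service_history.price", "service_history.description"],
    },
]


def indexed_fields(entry):
    """Return the field paths covered by one meta['indexes'] entry."""
    if isinstance(entry, str):
        return [entry]
    return list(entry["fields"])


# With MongoEngine installed, the list would go on the top-level document only
# (ServiceRecord is assumed to be an EmbeddedDocument defined elsewhere):
#
#   class Car(mongoengine.Document):
#       make = mongoengine.StringField()
#       service_history = mongoengine.EmbeddedDocumentListField(ServiceRecord)
#       meta = {"indexes": CAR_INDEXES}

for entry in CAR_INDEXES:
    print(indexed_fields(entry))
```

The dictionary form is what signals a single composite index across both fields, as opposed to two separate single-field indexes.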
+01:04 So once we add this, when we run our code, +01:07 it's actually going to, the first time we work with that document, +01:10 ensure that all the indexes are there, +01:12 and remember that hung up our application for just a little bit, +01:16 but the real benefit here is our app is always going to be in sync, +01:21 we don't have to go oh oops, I forgot to add the index, +01:24 that one particular index to, say, the staging server, +01:27 or when I push to production are there new indexes, +01:30 I've got to go out to the database, +01:32 now you don't worry about that, you just push your code, +01:34 restart your web app or whatever kind of app it is, +01:36 and then as part of interacting with it, +01:38 it will make sure that those indexes are there. +01:41 If you don't want that pause to be there, +01:43 just go and create the indexes you know the thing is going to create, +01:48 put them on the production server and then push the new version of code +01:50 and it will just go great, these indexes exist. \ No newline at end of file