Transcripts for chapter 8

8 years ago · edf7c0ca1b
parent aac0d882b3
commit edf7c0ca1b
11 changed files with 1015 additions and 0 deletions
--- a/transcripts/ch8-performance/1.txt
+++ b/transcripts/ch8-performance/1.txt
@@ -0,0 +1,53 @@
 00:01 Now that you know how to work MongoDB, you know how to work its shell,
 00:04 what the query syntax is, you've seen PyMongo as well as MongoEngine,
 00:07 it's time to turn our attention to tuning MongoDB
 00:11 to be the best database it can possibly be.
 00:14 We're going to focus on how to make our regular MongoDB server
 00:18 a high performance MongoDB database
 00:21 and you'll see there's no magic here, a lot of the things that you can do
 00:24 are relatively straightforward, and there's a systematic way to go about it.
 00:34 I want to start this section, this chapter, by putting a little perspective out there.
 00:41 When people come to NoSQL and they start looking for alternative databases
 00:44 often the allure of these databases is their performance
 00:49 you hear about things like sharding, horizontally scaling them,
 00:52 some incredible performance numbers, things like that.
 00:55 That may be what you really need, that may be the most important thing
 00:58 and certainly if you don't have performance out of your database it's a big problem.
 01:03 We're going to certainly figure out how to make our databases faster
 01:07 and the variety of techniques that we have available to us in MongoDB.
 01:11 That said, your biggest problem probably isn't performance,
 01:14 you may have a big data problem, you may have terabytes or petabytes of data
 01:19 but most applications don't.
 01:22 You may have a performance problem, it may be that you have so much data
 01:26 or you are asking such complex queries that it really does take
 01:30 very precise tuning and scaling to make it work.
 01:33 So we're going to focus on some of these types of things.
 01:36 That said, we all have a complexity problem with our application,
 01:40 it's always a pain to maintain these databases
 01:43 especially when we're working with relational databases,
 01:46 you hear about things like migrations and updating your schema
 01:49 adding, removing, transforming columns, all of this stuff is really complex
 01:53 and it even makes deployment really, really challenging,
 01:56 you want to release a new version of something based on SQLAlchemy
 01:59 but you need to change the database schema before it will even run;
 02:02 okay, that sounds like it could be a little bit of a problem.
 02:05 What you'll see with MongoDB and these document databases is
 02:10 one of their biggest benefits is the simplicity that they bring.
 02:14 The document structure means there are fewer tables,
 02:18 there are far fewer connections between these tables,
 02:21 so when you think about the trade-offs and performance and things like that
 02:24 keep in mind that probably the biggest benefit
 02:27 that you are going to get from MongoDB is you are going to have
 02:30 simpler versioning, evolution, maintainability, development story.
 02:33 I just want to put that out there, because I know sometimes
 02:36 people will say well, I got MongoDB to perform at this speed
 02:40 and I got this other database, and if I tweak it like this and adapt it like that
 02:44 maybe I could get it to go a little faster, so maybe we should use that instead.
 02:47 And maybe, I don't know, it depends on the situation,
 02:50 and this is very abstract, so it's hard to say, but keep in mind
 02:53 that one of the biggest things these document databases
 02:55 bring to the table here is this simplicity.
 02:59 It just so happens we can also make them really, really fast.
 03:03 So simple and fast, sounds like a great combination,
 03:05 so let's get into this section where we are going to make MongoDB much faster.
--- a/transcripts/ch8-performance/10.txt
+++ b/transcripts/ch8-performance/10.txt
@@ -0,0 +1,92 @@
 00:01 One of the most important things you can do for performance
 00:04 in your database and these document databases
 00:06 is think about your document design,
 00:08 should you embed stuff, should you not, what embeds where,
 00:11 do you embed just ids, do you embed the whole thing;
 00:14 all of these are really important questions
 00:16 and it takes a little bit of experience to know what the right thing to do is.
 00:20 It also really depends on your application's use case,
 00:24 so something that's really obviously a thing we should consider
 00:28 is this service history thing, this adds the most weight to these car objects,
 00:34 so we've got this embedded document list field
 00:38 so how often do we need these histories?
 00:44 How many histories might a car have?
 00:46 Should those maybe be in a separate collection
 00:49 where it has all the stuff that the service record class has,
 00:52 plus car id, or something to that effect?
 00:56 So this is a really important question,
 00:59 and it really depends on how we're using this car object, this car document
 01:05 if almost all the time we want to work with the service history,
 01:07 it's probably good to go in and put it here,
 01:10 unless these can be really large or something to that effect,
 01:13 but if you don't need them often, you could consider putting them in their own collection,
 01:16 there's just a tension between complexity and separation,
 01:20 safety and separation, and the speed of having them separate
 01:24 so you don't pull them back all the time;
 01:26 you can also consider using the only keyword or only operator in MongoEngine
 01:30 to say if I don't need it, exclude the service history,
 01:34 it adds a little bit of complexity because you often need to know,
 01:38 hey is this the car that came with service history
 01:40 or is it a car where that was excluded, things like that,
 01:42 but you could use performance profiling and tuning
 01:45 to figure out where you might use only.
 01:48 Let's look at one more thing around document design.
 01:50 You want to consider the size of the document,
 01:52 remember MongoDB has a limit on how large these documents can be,
 01:56 that's 16 MB per record, that doesn't mean you should think
 02:01 oh it's only 10 MB so everything is fine for my document design,
 02:05 that might be terrible; this is like a hard upper bound,
 02:07 like the database refuses to store a document once it goes over 16 MB,
 02:11 so you really want to think about what is the right size,
 02:14 so let's look at a couple examples:
 02:16 we can go to any collection and say .stats
 02:18 and it will talk about the size of the documents and things like that,
 02:21 so here we ran db.cars.stats in the shell,
 02:25 and we see that the average object size is about 700 bytes,
 02:29 there is information about how many there are, and all that kind of stuff,
 02:33 but really the most interesting thing for this discussion is
 02:35 what is the average object size, 700 bytes
 02:38 that seems like a pretty good size to me, it's not huge by any means,
 02:42 and this is the cars that contain those service histories,
 02:45 so this is probably fine for what we're doing.
 02:48 Let me give you a more realistic example.
 02:50 Let's think about the Talk Python Training website,
 02:52 and the courses and chapters, we talked about them before,
 02:56 so here if we run that same thing, db.courses.stats
 03:02 you can see that the average object size is 900 bytes for a course,
 03:07 and remember the course has the description that shows on the page
 03:10 and that's probably most of the size, it has a few other things as well,
 03:13 like student testimonials and whatnot,
 03:16 but basically it's the description and a few hyperlinks.
 03:19 So I think this is again a totally good object, average object size.
 03:23 Now one of the considerations was I could have taken the chapters
 03:27 which themselves contain all the lectures,
 03:29 and embedded those within the course,
 03:32 would that have been a good idea—
 03:34 I think I might have even had it created that way
 03:36 in the very beginning, and it was a lot slower than I was hoping for,
 03:38 so I redesigned the documents.
 03:40 If we run this on this chapter section, you can see
 03:43 that the average object size is 2.3 KB,
 03:46 this is starting to get a little bit big, on its own it's fine,
 03:50 but think about the fact that a course on average has like 10 to 20 chapters,
 03:55 so if I embedded the chapters in the course
 03:58 instead of putting them to a separate document like I do,
 04:02 this is how it actually runs at the time of the recording,
 04:04 then it would be something like these courses would be
 04:07 24 up to maybe 50 KB of data per entry,
 04:12 think about that you go to like the courses page
 04:15 and it shows you a big list of all the courses
 04:17 and there might be 10 or later 20 courses,
 04:20 we're pulling back and deserializing like megabytes of data
 04:24 to render a really, really common page, that is probably not ok,
 04:28 so this is why I did not embed the chapters and lectures inside the course,
 04:34 I just said okay, this is the breaking point
 04:37 I looked at the objects' size I looked at where the performance was
 04:41 and I said you know what, really it's not that common
 04:44 that we actually want more than one chapter at a time,
 04:46 but it is common we want lectures, so it's probably the right partitioning,
 04:51 but you build it one way, you try it, it doesn't work,
 04:53 you just redesign your class structure, recreate the database and try it again,
 04:57 but you do want to think about the average object size
 05:00 and you can do it super easy with db.collection_name.stats().
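As a rough sketch of the sizing arithmetic above, the same numbers can be read from Python. This assumes a PyMongo `Database` object; `collStats` is the server command behind the shell's stats helper and reports `avgObjSize` in bytes (newer servers also offer a `$collStats` aggregation stage). The helper names are made up for illustration; the 900 byte, 2.3 KB, and 10 to 20 chapter figures are the ones quoted in the transcript.

```python
def avg_object_size(db, collection_name):
    """Return the average document size in bytes for one collection.

    `db` is assumed to be a pymongo Database; collStats is the command
    that the shell's db.<collection>.stats() helper runs.
    """
    stats = db.command("collStats", collection_name)
    return stats["avgObjSize"]


def embedded_size_estimate(parent_bytes, child_bytes, children_per_parent):
    """Estimate the parent size if every child were embedded in it."""
    return parent_bytes + child_bytes * children_per_parent


# Numbers quoted in the transcript: 900-byte courses, 2.3 KB chapters,
# 10 to 20 chapters per course.
low = embedded_size_estimate(900, 2300, 10)
high = embedded_size_estimate(900, 2300, 20)
print(f"embedded course: {low / 1000:.1f} KB to {high / 1000:.1f} KB")
```

The estimate lands in the same range as the "24 up to maybe 50 KB" mentioned above, which is the point at which embedding stopped looking attractive.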
--- a/transcripts/ch8-performance/11.txt
+++ b/transcripts/ch8-performance/11.txt
@@ -0,0 +1,35 @@
 00:01 One of the last simple tools you have in your tool belt
 00:04 when we're working with MongoEngine or even in PyMongo, just different api
 00:08 is this ability to restrict the data returned from the document.
 00:13 In our car object we've got the make, the model, the id, some other things,
 00:17 we've got the engine which is a subdocument or an embedded document there
 00:22 and then the biggest thing that contributes to the size
 00:25 is actually the service history which might be many service record entries.
 00:30 If really all we care about is the make, the model and the id of a car,
 00:34 and we're going to create like a list or something like that,
 00:36 we can use this .only operator here
 00:39 and dramatically reduce the amount of data returned from MongoDB
 00:43 so this operation, which we saw when we first learned about the api,
 00:46 actually operates at the database level,
 00:48 you're able to restrict the elements returned from the queries
 00:52 so when it gets back to MongoEngine
 00:54 basically it looks at what comes back and it says,
 00:57 alright, I need to create some cars
 00:59 and I need to set their make to this, the model to that
 01:01 and their id to whatever comes back,
 01:03 and then nothing else is transferred, deserialized, anything.
 01:05 So you can, if you don't need them, exclude the heavyweight things
 01:09 like the engine and the service histories for this particular use case.
 01:12 So this is kind of like select make, model, id from table such and such in SQL,
 01:20 and it really can improve the performance
 01:22 especially when you have either large documents or many documents.
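Here is a minimal sketch of the same idea at the PyMongo level, assuming a `cars` collection whose documents have `make` and `model` fields. In MongoEngine this corresponds to calling `.only(...)` on a query set; the function name below is hypothetical.

```python
def list_cars_lightweight(cars_collection):
    """Fetch only make and model (plus _id) for every car.

    `cars_collection` is assumed to be a pymongo Collection. The second
    argument to find() is the projection; _id is included by default.
    """
    projection = {"make": 1, "model": 1}
    return list(cars_collection.find({}, projection))
```

Because the projection is sent with the query, the engine and service history fields never leave the server for this call.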
 01:27 So you've seen a lot of different ways to turn the knobs of MongoDB
 01:31 to make it faster and to use MongoEngine to control those knobs.
 01:35 Now this applies to a single individual database server
 01:38 and if you use this to tune your database,
 01:41 you can actually make the need for having a sharded cluster
 01:45 and all these scaling things possibly go away,
 01:48 but even if you do end up with one of these more interesting topologies,
 01:52 all of these techniques still apply and they'll make your cluster go faster,
 01:56 they'll make your replicas go faster, all of those things.
 01:59 What you've learned here are really the foundational items of making MongoDB go fast.
--- a/transcripts/ch8-performance/2.txt
+++ b/transcripts/ch8-performance/2.txt
@@ -0,0 +1,124 @@
 00:01 You've heard MongoDB is fast, really fast,
 00:03 and you've gone through setting up your documents and modeling things,
 00:07 you inserted, you imported your data, and you're ready to go;
 00:11 and you run a query and it comes back,
 00:13 so okay, I want to find all the service histories
 00:15 that have a certain price, greater than such and such, how many are there—
 00:18 apparently there's 989, but it took almost a second to answer that question.
 00:22 So this is a new version of the database, so we are going to talk about it shortly.
 00:27 Instead of having just a handful of cars and service histories
 00:30 that we maybe entered in our little play-around app,
 00:32 it has a quarter million cars with a million service histories, something to that effect.
 00:38 And the fact that we were able to answer this query
 00:41 of how many sort of nested documents had this property
 00:44 in less than a second, on one hand that's kind of impressive,
 00:48 but to be honest, it feels like MongoDB is just dragging,
 00:51 this is not very special, this is not great.
 00:55 So this is what you get out of the box, if you just follow what we've done so far
 00:59 this is how MongoDB is going to perform.
 01:02 However, in this chapter, we're going to make this better, a lot better.
 01:06 How much— well, let's see, we're going to make it fast,
 01:09 here's that same query after applying just some of the techniques of this chapter.
 01:13 Notice now it runs in one millisecond, not 706 milliseconds.
 01:17 So we've made our MongoDB just take off,
 01:21 it's running over 700 times faster than what the default MongoDB does.
 01:26 Well, how do we do it, how do we make this fast?
 01:30 Let's have a look at the various knobs
 01:33 that we can turn to control MongoDB performance.
 01:35 Some of which we're going to cover in this course,
 01:38 and some are well beyond the scope of what we're doing,
 01:40 but it's still great to know about them.
 01:42 The first knob is indexes, so it turns out
 01:44 that there are not too many indexes added to MongoDB by default,
 01:48 in fact, the only index that gets set up is on _id
 01:52 which is basically an index as well as a uniqueness constraint,
 01:55 but other than that, there are no indexes,
 01:57 and it might be a little non intuitive at first, when you first hear about this,
 02:02 but indexes and manually tuning and tweaking and understanding the indexes
 02:06 in document databases is far more important
 02:10 than understanding indexes in a third normal form designed relational database.
 02:15 So why would that be? That seems really odd.
 02:18 So think about a third normal form database,
 02:21 you've broken everything up into little tiny tables that link back to each other
 02:24 and they often have foreign key constraints traversing all of these relationships,
 02:28 well, those foreign key constraints go back to primary keys on the main tables,
 02:33 those are indexed, every time you have one of those relationships
 02:35 it usually at least on one end has an index on that thing.
 02:39 In document databases, because we take some of those external tables
 02:43 and we embed them in documents,
 02:45 those subdocuments while they kind of logically play the same role
 02:49 there is no concept of an index being added to those.
 02:52 So we have fewer tables, but we still have
 02:55 basically the same amount of relationships
 02:57 and because of the way documents work,
 02:59 we actually have fewer indexes than we do in say a relational database.
 03:04 So we're going to see that working with understanding
 03:07 and basically exploring indexes is super, super important
 03:09 and that's going to be the most important thing that we do.
 03:12 In fact, the MongoDB folks, one of their things they do is
 03:16 they sell like services, consulting and what not to help their customers
 03:19 and you could hire them, say hey I got this big cluster and it's slow
 03:24 can you help me make it faster— the single most dramatic thing that they do,
 03:30 the thing that almost always is the problem is incorrect use of indexes.
 03:34 So we're going to talk about how to use, discover and explore indexes for sure.
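To make the index knob concrete, here is a hedged sketch using PyMongo calls. It assumes each car document keeps its service records in a `service_history` list with a `price` field; those field names and both function names are assumptions for illustration, not taken from the course code.

```python
def index_service_price(cars_collection):
    """Create an ascending index on the nested service price field.

    Assumes a pymongo Collection and that each car document stores its
    service records in a `service_history` list with a `price` field.
    """
    return cars_collection.create_index([("service_history.price", 1)])


def query_plan(cars_collection, min_price):
    """Ask the server how it would run the expensive-service query."""
    cursor = cars_collection.find({"service_history.price": {"$gt": min_price}})
    return cursor.explain()
```

Comparing the plan before and after creating the index is one way to check whether a query is actually using it.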
 03:38 Next is document design, all that discussion about to embed or not to embed,
 03:43 how should you relate documents, this is sort of the beginning of this conversation,
 03:47 it turns out the document design has dramatic implications across the board
 03:52 and we did talk quite a bit about this, but we'll touch on it again in this chapter.
 03:56 Query style, how are you writing your queries,
 04:01 is there a way that you could maybe restructure a query,
 04:05 or ask the question differently and end up with
 04:08 a more high performance query, maybe one example misses an index
 04:12 and the other particular example uses a better index or something to this effect.
 04:16 Projections and subsets are also something that we can control,
 04:20 remember when we talked about the Javascript api
 04:23 we saw that you could limit your set of returned responses
 04:26 and this can be super helpful for performance;
 04:29 you could write a query where it returns 5 MB of data
 04:32 but if you restrict that to just the few fields that you actually care about
 04:36 maybe it's just a few K instead of 5 MB, it could be really dramatic,
 04:40 depending on how large and nested your documents might be.
 04:43 We're going to talk about how we can do this, especially from MongoEngine.
 04:46 These are the knobs that we're going to turn in this course,
 04:49 these are the things that will work even if you have a single individual database,
 04:53 so you should always think about these things,
 04:56 some of them happen on the database side, document design, indexes,
 04:59 and the other, maybe is in your application interacting with the database, the other two,
 05:04 but MongoDB being a NoSQL database, allows for other types of interactions,
 05:08 other configurations and network topologies and so on.
 05:11 So, one of the things that it supports is something called replication,
 05:14 now replication is largely responsible for redundancy and failover.
 05:19 Instead of just having one server I could have three servers,
 05:22 and they could work in triplicate, basically one is what's called the primary,
 05:26 and you read and write from this database,
 05:28 and the other two are just there ready to spring into action,
 05:31 always getting themselves in sync with the primary,
 05:34 and if the primary goes down, one of the others will spring in to be the primary
 05:36 and they will sort of fix themselves as what used to be the primary comes back.
 05:40 There is no performance benefit from that at all.
 05:43 However, there are ways to configure your connection to say
 05:46 allow me to read not just from the primary one, but also from the secondary,
 05:50 so you can configure replication for a performance boost,
 05:53 but mostly this is a durability thing.
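One way reading from secondaries can be switched on is through the connection string. The sketch below only builds such a string; `replicaSet` and `readPreference` are standard connection string options, while the host names and the replica set name `rs0` are placeholders.

```python
from urllib.parse import urlencode


def replica_set_uri(hosts, replica_set, read_preference="secondaryPreferred"):
    """Build a MongoDB connection string for a replica set.

    `replicaSet` and `readPreference` are standard connection string
    options; the host names passed in are placeholders.
    """
    options = urlencode({"replicaSet": replica_set,
                         "readPreference": read_preference})
    return f"mongodb://{','.join(hosts)}/?{options}"


uri = replica_set_uri(["db1.example.com:27017", "db2.example.com:27017",
                       "db3.example.com:27017"], "rs0")
print(uri)
```

Keep in mind that a secondary can lag slightly behind the primary, so reads routed this way may return slightly older data.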
 05:55 The other type of network configuration you can do is what's called sharding.
 05:59 This is where you take your data instead of putting all into one individual server,
 06:02 you might spread this across 10 or 20 servers,
 06:06 with one 20th of the data on each, hopefully evenly balanced
 06:09 across all of them, and then when you issue a query,
 06:12 MongoDB can either figure out, if the query is based on the shard key,
 06:15 which server to point that at and let that one
 06:17 handle the query across the smaller set of data,
 06:20 or if it's general like show me all the things with greater than this for the price,
 06:23 it might need to fan that out to all 20 servers,
 06:26 but it would run in parallel on 20 machines.
 06:30 So sharding is all about speeding up performance,
 06:32 especially write performance, but also queries as well,
 06:35 so you can get tons of scalability out of sharding,
 06:38 and you can even combine these like, when I said there is 20 shards,
 06:41 each one of those could actually be a replica set,
 06:43 so there is a lot of stuff you could do with network topology
 06:46 and clustering and sharding and scaling and so on.
 06:48 We're not turning those knobs in this course,
 06:50 I'll show you how to make individual pieces fast,
 06:52 the same idea applies to these replicas and shards,
 06:54 just on a much grander scale if you want to go look at them.
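The difference between a targeted query and a fanned-out query can be illustrated with a toy model. This is not how MongoDB routes queries internally (the real router works from chunk metadata); it is only a sketch of the idea, with a hypothetical `owner_id` shard key.

```python
import hashlib


def shard_for(shard_key_value, shard_count):
    """Map a shard key value to a shard number (toy hash-based routing)."""
    digest = hashlib.md5(str(shard_key_value).encode()).hexdigest()
    return int(digest, 16) % shard_count


def shards_to_query(query, shard_key, shard_count):
    """Return the shards a query must visit.

    A query that pins the shard key goes to one shard; anything else is
    fanned out to all of them.
    """
    if shard_key in query:
        return [shard_for(query[shard_key], shard_count)]
    return list(range(shard_count))


targeted = shards_to_query({"owner_id": 42}, "owner_id", 20)
scattered = shards_to_query({"price": {"$gt": 16800}}, "owner_id", 20)
print(len(targeted), len(scattered))
```

The first query touches a single shard, the second has to visit all 20, which is the fan-out case described above.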
--- a/transcripts/ch8-performance/3.txt
+++ b/transcripts/ch8-performance/3.txt
@@ -0,0 +1,30 @@
 00:01 Let's return to our dealership.
 00:03 This was the example we started back when we began the MongoEngine section,
 00:05 and it turns out the dealership is super popular now.
 00:08 Before we just had a couple of cars, now we have a quarter million cars
 00:11 in our database, we have a 100 thousand owners,
 00:15 I don't believe we talked about owners before in terms of what that looks like in our code,
 00:19 but I've added this concept of owners
 00:21 so we can ask interesting like cross-document related type questions,
 00:25 and we'll look at the details of them, when we get to the code, in just a moment.
 00:28 Each one of these owners, these 100 thousand owners,
 00:31 owns an average of 2.5 cars, this is kind of like collectors, right,
 00:36 not a standard person that drives to work or whatever, these are Ferraris,
 00:39 and each car has on average about 5 service records
 00:43 and that could be like a new engine,
 00:45 change the tires, change the spark plug, whatever;
 00:48 in particular, there's about 1.25 million service histories,
 00:51 so when we ask questions about like those nested documents
 00:54 that have to do with service histories like customer ratings and price,
 00:57 you can see that that is really quite impressive I think,
 01:01 we got the quarter million cars and within those quarter million documents
 01:04 interspersed are 1.25 million service histories.
 01:07 So our job is to make a lot of the typical things that we might ask this database,
 01:11 the queries we run to do so, run in a couple of milliseconds, not in seconds,
 01:17 so that's going to be what the basic goal of this whole section is.
 01:23 Now, the other things you might want to know is
 01:25 we've got about 180 megs of data
 01:28 and on average each document of the various document kinds,
 01:30 all average together is about 500 bytes per document.
 01:33 So let's return to our example, slightly transformed,
 01:37 and see how it's performing now and let's make it fast.
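The figures quoted in this video hang together, and a few lines of Python show how. This is only a back-of-the-envelope check using the numbers from the transcript; service records are embedded in cars, so they are not counted as separate documents.

```python
owners = 100_000
cars_per_owner = 2.5
services_per_car = 5

cars = int(owners * cars_per_owner)            # 250,000
service_records = cars * services_per_car      # 1,250,000

documents = owners + cars                      # top-level documents only
avg_doc_bytes = 500
approx_mb = documents * avg_doc_bytes / 1_000_000

print(cars, service_records, documents, approx_mb)
```

That works out to roughly 175 MB, in line with the "about 180 megs" mentioned above.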
--- a/transcripts/ch8-performance/4.txt
+++ b/transcripts/ch8-performance/4.txt
@@ -0,0 +1,42 @@
 00:01 Here we are in the github repository for this course
 00:04 and notice we have this data section
 00:06 and in here I have this thing called dealership db 250 K
 00:09 that is this data that I just talked about,
 00:12 with the 250 thousand cars, 100 thousand owners, that sort of thing.
 00:16 So I'm going to put that over here on the desktop and unzip it
 00:21 and if we look in here, you'll see that there's a cars collection and an owners collection,
 00:29 and I don't believe we've spoken about how to get this data into MongoDB,
 00:33 so let's go over here and I'll use RoboMongo,
 00:37 notice we have these two dealership things that I have been playing with
 00:42 and I want to create one called like test dealership or something to that effect.
 00:46 We're going to restore this— how do we do that,
 00:51 we'll go like this, we'll say mongorestore
 00:55 and this is the way that we get this exported data imported into MongoDB,
 01:00 now, the first thing you have to ask yourself is this additive to the database,
 01:05 if it exists do you want to also insert this,
 01:07 or do you want to have this be the database and replace anything that exists,
 01:11 we want this one to replace existing data
 01:14 so I'll say --drop and then I need to tell it what database
 01:18 so I'll say --db, and what you should say is dealership,
 01:23 but just because I don't want to wipe away what I currently have,
 01:28 I'll say dealership example, but the code that you're going to run
 01:31 expects the name of the database to be just dealership;
 01:34 and then I need to give it the folder that it's going to work from,
 01:37 so I am just going to give it this folder like so, all right.
 01:41 So mongorestore, --drop to replace the data, --db to name it, and then the location,
 01:46 we hit go, and it's going to go cranking away on this
 01:50 and you can see it's inserting, inserting and done,
 01:53 that was really fast for like close to 1.5 million records.
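The restore command described above can be sketched from Python as follows. The `--drop` and `--db` flags and the trailing folder argument are the ones named in the video; the folder path and function names are placeholders, and newer versions of the MongoDB tools may print a deprecation notice suggesting `--nsInclude` instead of `--db`.

```python
import subprocess


def mongorestore_command(db_name, dump_folder, drop=True):
    """Build the argument list for restoring a dump into `db_name`."""
    command = ["mongorestore"]
    if drop:
        command.append("--drop")   # replace existing collections
    command += ["--db", db_name, dump_folder]
    return command


def restore(db_name, dump_folder, drop=True):
    """Run mongorestore; requires the MongoDB tools on the PATH."""
    subprocess.run(mongorestore_command(db_name, dump_folder, drop), check=True)


print(" ".join(mongorestore_command("dealership", "./dealership_db_250k")))
```

Building the argument list separately from running it makes it easy to inspect the exact command before pointing it at a database you care about.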
 01:57 All right, so let's go over here and refresh
 02:00 and here's our example and we can see that we have our collection,
 02:03 here's our cars and we could just ask how many cars are there.
 02:07 Notice, there is that many, and if we change this to owners,
 02:11 remember you can also write it like this, owners like this,
 02:16 Now notice, I think the restore data we got here,
 02:19 you want to drop this index right here so you only have the id indexes, ok
 02:26 so that's this example I just restored,
 02:28 we're going to work with something you can imagine is exactly the same.
 02:33 So we're going to work with this dealership code
 02:36 but the way it got there, I'll show you the app I used to originally create it,
 02:39 and then I just restored it using mongorestore just as I showed you up here.
 02:43 So the way to generate the data that goes into mongorestore, you say mongodump.
--- a/transcripts/ch8-performance/5.txt
+++ b/transcripts/ch8-performance/5.txt
@@ -0,0 +1,104 @@
 00:01 Let's explore this slightly updated version of our code.
 00:04 Here we are in the github repository,
 00:07 and I am in the source folder and I've added an 08_perf section,
 00:10 and we have the starter_big_dealership and we have the big_dealership
 00:15 it even has instructions here to tell you basically how to restore
 00:17 that database we did just in the previous video.
 00:20 This one is going to be a snapshot of how this chapter starts,
 00:24 it's what we're starting from now and will remain that way;
 00:27 here we're going to take basically a copy of that one
 00:29 and evolve it into the fast high performance version,
 00:33 so let's go over here and see what we've got.
 00:35 Now, we have a few things that are slightly different,
 00:38 the car is basically unchanged from before
 00:41 although I added a little comment about how do we get to the owners.
 00:45 The one thing that is new here, in terms of the model is this owner idea,
 00:50 so cars can now have an owner
 00:52 and how do we know which cars are owned by this owner
 00:57 is we have a list of object ids, those object ids are the object ids of the cars
 01:02 so we're going to push the ids of the cars that are owned here
 01:06 I guess we could run it as a many to many or one to many relationship,
 01:10 just depending on how we treat the owner, but theoretically,
 01:13 we can have a single car that has multiple owners
 01:17 and there are owners that own multiple cars, and we can manage it this way,
 01:21 you almost never see like a car to owner intermediate table,
 01:25 so you're almost always going to have something like
 01:27 those ids are either embedded in the owner or in the car,
 01:32 or under rare circumstances both.
 01:35 So here's how we refer back to the cars,
 01:38 then we have a few basic things like the name,
 01:41 when was this owner created, how many times have they visited and things like that.
 01:45 We want to call it owners in the database and it's just this core collection,
 01:49 so other than that, there's not a whole lot going on here,
 01:51 let's look over here, we now have these services,
 01:54 I've taken all the car queries and moved them down here
 01:57 do you want to create a car, you call this function,
 01:59 do you want to record a customer visit, here we can go to the owner
 02:03 and we can use this increment operator
 02:06 to increment the number of visits in place.
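As a minimal sketch of the in-place increment, here is the PyMongo form of the update. The field name `number_of_visits` is a guess at what the Owner model uses, and the function name is hypothetical; MongoEngine expresses the same operator with an `inc__` prefix on the field name in an update call.

```python
def record_visit(owners_collection, owner_id):
    """Increment one owner's visit counter in place on the server.

    Assumes a pymongo Collection; `number_of_visits` is a hypothetical
    field name standing in for whatever the Owner model uses.
    """
    result = owners_collection.update_one(
        {"_id": owner_id},
        {"$inc": {"number_of_visits": 1}},
    )
    return result.modified_count
```

The document is never pulled into Python here; the server applies the increment itself, which also avoids a read-modify-write race between two visits.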
 02:09 Find cars by make, find owner by name and so on.
 02:16 Number of cars with bad service, a lot of this stuff is what we wrote previously;
 02:20 there was the program thing that we ran over here that was interactive
 02:23 and I've replaced that with a few things,
 02:25 one is this db stats and you can run this and it will tell you
 02:28 like how many cars are there, how many owners are there,
 02:31 what's the average number of histories,
 02:33 this is basically those stats that I presented to you before,
 02:36 this takes a while to run on this database, I don't recommend you run it
 02:39 but if you want to just run it and see what you get you can.
 02:42 The database was originally created using this script,
 02:46 I am using something interesting you may not have heard about,
 02:49 I am using this thing called Faker, so down here
 02:53 Faker lets you create this thing and I'm seeding it
 02:59 so it always generates exactly the same things,
 03:01 I'm seeding random and fake and you can see down here
 03:04 it's creating the owners and you can ask it for things like
 03:06 give me a fake name, give me a fake date
 03:08 between these two dates, things like that.
 03:12 Similarly with cars, we're using random to get a hold of a lot of the numbers
 03:15 then we can use fake for anything else we might need.
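The reason for seeding is reproducibility: the same seed produces the same generated data on every run. The sketch below shows that with only the standard library's `random` module as a stand-in; Faker offers its own seeding for the fake names and dates. The function name and the 1 to 15 range are invented for illustration.

```python
import random


def fake_owner_visits(count, seed=42):
    """Generate `count` pseudo-random visit numbers, reproducibly."""
    rng = random.Random(seed)          # private, seeded generator
    return [rng.randint(1, 15) for _ in range(count)]


first_run = fake_owner_visits(5)
second_run = fake_owner_visits(5)
print(first_run == second_run)  # True: same seed, same data
```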
 03:18 We ran this, with the right amount of data, it'll build it all up for us,
 03:25 so for some reason if you need to recreate it
 03:27 run this load data thing, you can have it create a small one
 03:30 if you uncomment that, or a large one
 03:32 if you run it with those settings.
 03:34 Those are all good, this is like the foundation and this is where we are.
 03:37 Next, we're going to ask interesting questions of this database
 03:43 and we want to know how long those questions take to answer,
 03:46 so I've written this super simple function called time
 03:48 you pass it a message and a function,
 03:50 it will time how long the function takes to run
 03:53 and then print out the message along with the time in terms of milliseconds.
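A timing helper along those lines might look like the following sketch. The name `time_it` is an assumption (the course code may name and format things differently); it takes a message and a function, runs the function, and prints the elapsed time in milliseconds.

```python
import time


def time_it(message, function):
    """Run `function`, print `message` with elapsed milliseconds."""
    start = time.perf_counter()
    result = function()
    elapsed_ms = (time.perf_counter() - start) * 1000
    print(f"{message}: {elapsed_ms:,.1f} ms")
    return result


total = time_it("Summing a million integers", lambda: sum(range(1_000_000)))
```

Passing the work in as a function (often a lambda wrapping a query) keeps the timing code in one place and out of the queries themselves.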
 03:57 And then we're going to go through
 03:59 and we're going to ask interesting questions here
 04:01 like how many owners, how many cars, who is the 10 thousandth owner,
 04:05 notice the slicing here to give us a slice of length one
 04:10 and then we'll just access it,
 04:12 and then we can start asking interesting questions like
 04:14 how many cars are owned by the 10 thousandth owner,
 04:17 or if we go down here, how many owners own the 10 thousandth car,
 04:21 so ask it in the reverse direction.
 04:23 Here we want to find the 50 thousandth owner by name,
 04:26 so yes, we technically already have them, but the idea is
 04:30 we want to do a query based on the name field
 04:32 and we originally won't have any performance
 04:35 around these types of queries so it should be slow.
 04:38 This one, how many cars are there with expensive service
 04:40 this was the one with the snail
 04:43 and in one of the first videos in this chapter,
 04:46 I showed you look this takes 700 milliseconds to run to ask this question
 04:49 how many cars have a service history with a price greater than 16800.
 04:55 So we're going to be able to ask all of these questions
 04:58 and this program will let us explore that
 05:02 and we'll see how to add indexes
 05:04 and I'll show you how to add indexes in the shell
 05:06 and how to add them in MongoEngine, and MongoEngine is really nice
 05:09 because as you evolve your indexes, as you add new ones
 05:13 simply deploying your Python web app
 05:15 will adapt the database that it goes and finds
 05:18 to automatically upgrade to those indexes, so it's really really nice.
 05:22 So here you can see we're going to run this code and ask a bunch of questions
 05:25 we could load the data from here, we could generate the data,
 05:28 but you're much better off importing the data from that zip file
 05:32 because this takes like half an hour to run,
 05:35 you saw that zip takes like five seconds.
--- a/transcripts/ch8-performance/6.txt
+++ b/transcripts/ch8-performance/6.txt
@ -0,0 +1,165 @@
 00:01 Let's go ahead and run this code, you've seen the minor changes
 00:04 like the addition of this concept of an owner,
 00:06 and how we generated all this data, and how you can restore it.
 00:09 Let's go ahead and run it, and see what's happening.
 00:13 Let's look at this from two perspectives, let's begin over actually in Robomongo,
 00:17 so we're going to ask the question, basically how many owners own a certain car
 00:21 the idea is more or less we're going to call this function which goes right here,
 00:25 really what we're looking for is this query,
 00:28 find me all of the owners where this car id is in their car ids collection,
 00:33 just generate and deserialize that.
 00:37 The other one that we're going to focus on is
 00:39 show me the cars with the expensive service history,
 00:42 how many cars or what cars had some kind of service
 00:46 that cost over 16800 dollars.
 00:49 Let's begin by looking at those in Robomongo.
 00:54 Here we have this concept, we could simplify this a little bit, but it doesn't matter,
 00:57 cars here's the service history, let's go to the price
 01:00 where that's greater than 16800, how many of them are there.
 01:05 If I run this, notice, it took a while to come back,
 01:08 run it again, here's the speed right there, 0.724 sec, 0.731, 0.733,
 01:14 so it's pretty reliably taking around 700 milliseconds to answer that question.
 01:19 We're going to come back to this.
 01:22 Here's a more interesting example, like go and randomly grab a car
 01:25 somewhere deep in the list, in this case I put 61600,
 01:30 grab that car and then find me all the owners,
 01:33 where that car id appears in their id list, and then we'll just dump that out,
 01:38 by saying var it doesn't appear if you just state the name it will show up down here,
 01:43 so make sure to deselect it and run this,
 01:45 and this is actually surprisingly fast, given all the stuff that's going on here,
 01:48 but it's taking still about 75, 80 milliseconds to run here,
 01:53 which, I don't know, maybe in your database
 01:55 going across a 100 thousand records 80 milliseconds seems decent,
 01:59 I can tell you in MongoDB 80 milliseconds is terrible
 02:02 you should really think about making something that's 80 milliseconds faster
 02:06 it's not always possible to do that,
 02:08 but most of the queries as we'll see are possible.
 02:11 Let's take this one and just try to understand what's happening here
 02:16 and then we're going to go look at it in Python,
 02:19 but let's just explore it here in the shell for just a moment.
 02:21 Why is this taking 700 milliseconds?
 02:24 MongoDB has this way to basically ask how are you running this query,
 02:29 and the way you do that is you say explain, like so,
 02:35 so I can say this query instead of giving me a result tell me how you're running it,
 02:38 if I unselect it, it just runs the selected stuff if there's something there,
 02:42 so we can go and look at it in this mode,
 02:44 so it says okay, here's what the query planner found for you,
 02:47 we've parsed this query, and this is something
 02:50 it's basically what went into the find,
 02:52 it also might have something to the effect of like a sort
 02:55 and other things that are happening, but this is a simple query.
 02:58 Look down here, see this winning plan, stage COLLSCAN, a collection scan,
 03:02 that is bad, that is really, really bad.
 03:05 Also notice the rejected plan, so if there are multiple indexes
 03:08 and other things that it could have done
 03:10 it might have attempted a bunch of them and said no, no, no this is the best,
 03:13 let's see it doesn't seem to tell us any more about what it did there,
 03:18 like sometimes it'll tell you how many records it scanned and things like this,
 03:21 but it's just basically reading entirely in the forward direction
 03:25 over this and just doing a comparison.
 03:27 So that's why this was taking 700 milliseconds
 03:32 as it was literally reading and comparing 100 thousand entries
 03:36 or actually more, remember there are 1.2 million service histories
 03:40 across those 250 thousand cars, so not 100 thousand,
 03:43 1.2 million records it scanned over, that's bad, you don't want that.
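To get a feel for why scanning every record is so much slower than using an index, here is a toy comparison in plain Python. It is not how MongoDB is implemented: a sorted list with binary search stands in for the index, and the random prices and the 200 thousand record count are made up for the sketch.

```python
import bisect
import random

random.seed(8)
# Made-up service prices standing in for the real service history records.
prices = [random.randint(0, 20_000) for _ in range(200_000)]


def count_by_scan(values, threshold):
    """Collection-scan style: look at every value and compare it."""
    return sum(1 for v in values if v > threshold)


# Building the "index" costs time once, up front, like creating an index does.
sorted_prices = sorted(prices)


def count_by_index(sorted_values, threshold):
    """Index style: binary search for the first value above the threshold."""
    first_above = bisect.bisect_right(sorted_values, threshold)
    return len(sorted_values) - first_above


print(count_by_scan(prices, 16_800), count_by_index(sorted_prices, 16_800))
```

Both functions give the same count, but the scan does one comparison per record while the binary search needs only a handful of steps, which is the same kind of gap the timings in this chapter show.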
 03:47 So what we can do is we can actually add an index,
 03:51 now there's two ways to add an index,
 03:54 but before I add the index, let's go over here
 03:58 just explain is super, super valuable,
 04:00 any time something is slow we're going to explain
 04:03 there's actually a way to turn on profiling and say log all of the queries
 04:07 that MongoDB sees that are slower than x,
 04:11 you provide the threshold, like say 10 milliseconds might be great,
 04:14 show me all the queries that take more than 10 milliseconds
 04:17 and then you can drop them in here, put an explain
 04:19 and then start creating indexes to make them faster.
 04:22 So just google mongodb profile enable slow queries
 04:26 or something like this, it's pretty straightforward.
 04:29 Now let's run this code, we're asking a lot of questions
 04:31 what we want to run is q and a, so we go over here and just right click and say run,
 04:37 notice some of these things are taking time,
 04:42 the database might be cold, it might have not loaded that stuff,
 04:46 so let me run it one more time just to be fair,
 04:49 there's a few things that are already really fast, and that's cool,
 04:55 so let's go here and review, how many owners are there—
 04:58 well, I can tell you it doesn't show the answer
 05:01 it just sort of says this is the question I'm asking, and here is how long it takes.
 05:04 Three milliseconds, that is solid, how many cars— half a millisecond.
 05:07 That's pretty solid, I don't think we can improve the count on the entire collection
 05:11 but this one, find the 10 thousandth owner— not good,
 05:14 so let's see how many cars are owned by that person—
 05:19 this is pretty fast actually, this is surprisingly fast,
 05:23 how many owners this car has, 66 milliseconds,
 05:26 that's the one we were looking at in there.
 05:29 I'm going to take these numbers and put them over here,
 05:32 let's say, this will be Without indexes
 05:36 we're going to get this, we don't really care about the exit code, do we?
 05:41 With indexes, and we're going to kind of iterate on this a little bit
 05:45 so let's begin over here, and we're going to talk about
 05:49 how we can add an index in MongoDB and then for the most part
 05:55 do this in MongoEngine because it's really part of the way our application works,
 06:00 what the indexes are, and it's better to make that part of our document
 06:03 than kind of do a separate database setup step;
 06:07 we could create a script in Javascript and run it,
 06:09 it will do these things and that may be fine, but let's go over here and work on this.
 06:14 Again we had the count, here's the almost 800 milliseconds,
 06:19 let's go over here and just I'll take this, I'll make a copy,
 06:24
 06:28 so here is what we can do, instead of doing the find operation
 06:31 we can say create index,
 06:35 and then we have the thing that we're doing the query on,
 06:38 most of the time this is one item but you can have composite indexes
 06:43 they are a little more nuanced so we'll talk about them later,
 06:45 but let's just do this one, we want to be able to query by service history's price
 06:52 Here we can put one of two things, one or minus one,
 06:56 what do you want the default sort, descending or ascending?
 06:59 A lot of times it doesn't really matter,
 07:01 it can read from the back or it can read from the front, whatever,
 07:04 you saw the forward direction on our collection scan for example.
 07:06 So over here we could say one, this creates an index, there's no count;
 07:09 the other thing we can do is we can give it a name
 07:13 so we can come over here and say name is search by service history price,
 07:24 so if we go look in this little indexes, we'll see the name here,
 07:27 we can also say run in the background,
 07:30 if I don't say that it's going to block the database until the index is generated,
 07:33 if you're doing this in production, and you have tons and tons of data
 07:36 maybe background is the way to go.
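The same index can also be created from Python. The sketch below assumes a PyMongo collection object is passed in; create_index with a list of (field, direction) pairs plus name and background options is PyMongo's call, but the underscored index name and the function name are assumptions made for this example.

```python
ASCENDING = 1  # same value as pymongo.ASCENDING; use -1 for descending


def add_service_price_index(cars):
    """Create a named index on the nested service_history.price field.

    cars is expected to be a pymongo Collection, for example
    pymongo.MongoClient()["dealership"]["cars"] (hypothetical database name).
    """
    return cars.create_index(
        [("service_history.price", ASCENDING)],
        name="search_by_service_history_price",  # assumed spelling of the name
        background=True,  # ask the server not to block while building
    )
```

As far as I know, MongoDB 4.2 and later ignore the background option because index builds no longer block the whole database, so on a recent server it is harmless but unnecessary.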
 07:38 Okay, anyway let's go ahead and run this and see what happens.
 07:41 Notice the pause, this is it's actually computing the index
 07:44 right now the database is effectively down, now it's back,
 07:47 what do we get, ok: created collection automatically, no, it already existed,
 07:51 a number of indexes before was one, now we have two
 07:54 and everything was a ok so if I refresh,
 07:57
 07:59 here's that index and I can actually edit this over here in Robomongo,
 08:05 go for the advanced properties, here is the create index and background
 08:09 whether it's sparse, how long it lives,
 08:11 whether it's based on text search or whatever, but here's just the basic thing.
 08:15
 08:18 We've added this index, remember this took 800 milliseconds
 08:21 ask the same question now, boom, 8 milliseconds.
 08:24 Ask it one more time, 2, here we go, 2, 2, 2, 3, 2, 2,
 08:28 right, the screen sharing is probably putting a pretty heavy load on the server
 08:32 that's also the database server, right but still,
 08:35 we're getting it down 350, 400 times faster by adding that.
 08:39 Now if I go back and I ask that question explain
 08:42 now we get something way better, winning plan is index scan
 08:50 index name search by service history price, that is really awesome;
 08:57 that means we're using our index which is so much faster.
 09:02 There were no rejected plans, so it only found one index,
 09:06 it tried to use it, it found that it was awesome, it's very happy.
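If you want to check the winning plan from code rather than by eye, the explain output is just a nested document. The sketch below walks a plan and lists its stages; the two sample documents are hand-written and heavily trimmed, not real server output, and the index name is an assumed spelling. It only follows single inputStage links, so plans that branch would need more handling.

```python
def plan_stages(plan):
    """Return the stage names of a plan, from the outermost stage inward."""
    stages = []
    while plan is not None:
        stages.append(plan.get("stage"))
        plan = plan.get("inputStage")
    return stages


def uses_collection_scan(explain_output):
    """True if the winning plan reads the whole collection (COLLSCAN)."""
    winning = explain_output["queryPlanner"]["winningPlan"]
    return "COLLSCAN" in plan_stages(winning)


# Hand-written, trimmed examples of the two shapes discussed above.
before_index = {
    "queryPlanner": {
        "winningPlan": {"stage": "COLLSCAN", "direction": "forward"},
        "rejectedPlans": [],
    }
}
after_index = {
    "queryPlanner": {
        "winningPlan": {
            "stage": "FETCH",
            "inputStage": {
                "stage": "IXSCAN",
                "indexName": "search_by_service_history_price",
            },
        },
        "rejectedPlans": [],
    }
}

print(plan_stages(before_index["queryPlanner"]["winningPlan"]))
print(plan_stages(after_index["queryPlanner"]["winningPlan"]))
```

With PyMongo, the document to pass in would come from something like collection.find(query).explain().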
 09:09
 09:16 Go back to my count one more time,
 09:21 boom 2 milliseconds, and that's a really good answer,
 09:24 let's go run our Python code and see what answers we get now,
 09:27 that was already faster, let's go over here
 09:32 and load car name and ids with expensive prices and spark plugs,
 09:38 20 milliseconds this is actually a pretty complicated query
 09:43 we'll get into cars with expensive service, 1.9 milliseconds.
 09:47 This is exactly what we saw in Robomongo,
 09:51 so over here in MongoEngine, we're getting essentially the same results— how cool is that?
 09:56 Very nice, we're going to go through and in Python from now on
 10:02 we're going to add the necessary index to start making these
 10:05 almost all of these run super fast, all of them run fast
 10:09 some of them we can get incredibly fast, like one millisecond,
 10:11 others not quite that fast, but we'll still do good on all of them.
--- a/transcripts/ch8-performance/7.txt
+++ b/transcripts/ch8-performance/7.txt
@ -0,0 +1,298 @@
 00:01 Now that you've seen how to create indexes in the shell in Javascript effectively,
 00:04 let's go and see how to do this in MongoEngine.
 00:07 I think it's preferable to do this in MongoEngine because that means
 00:11 simply pushing your code into production will ensure
 00:14 that the database has all the right indexes set up to operate correctly.
 00:19 You theoretically could end up with too many,
 00:21 if you have one in code and then you take it out
 00:23 but you can always manage that from the shell,
 00:26 this way at least the indexes that are required will be there.
 00:29 I dropped all the indexes again, let's go back through our questions here
 00:33 and see how we're doing.
 00:36 It says how many owners, how many cars,
 00:38 this is just based on the natural sort however it's in the database
 00:41 there's really nothing to do here,
 00:44 but this one, find the 10 thousandth car by owner, let's look at that;
 00:48 that is going to basically be this name, we'll use test,
 00:55 it doesn't really matter what we put here
 00:57 if we put explain, this should come back as COLLSCAN or something like that,
 01:01 yeah, no indexes, okay, so how long did it take to answer that question?
 01:06 Find the 10 thousandth owner by name,
 01:12 it didn't say by name, I'll go and add by name,
 01:16 well that took 300 milliseconds, well that seems bad
 01:21 and look we're actually using sorting,
 01:24 we're actually using paging skip and limit those types of things here,
 01:27 but in order for that to mean anything, we have to sort it,
 01:31 it's really the sort that we're running into.
 01:34 Maybe I should change this, like so,
 01:38 sort like so, we could just put one, I guess it's the way we're sorting it,
 01:47 so here you can see down there the sort pattern name is one
 01:49 and guess what, we're still doing a collection scan.
 01:53 Any time you want to do a filter by, a greater than, an equality,
 01:56 or you want to do a sort, you need an index.
 01:59 Let's go over to the owner here, this is the owner class
 02:04 and let's add the ability to sort it by name or
 02:08 equivalently also do a filter like find exactly by name,
 02:12 so we're going to come down here
 02:14 we're going to add another thing to this meta section,
 02:16 and we're going to add indexes,
 02:20 and indexes are a list of indexes,
 02:25 now this is going to be simple strings
 02:28 or they can be complex subdictionaries,
 02:31 for composite indexes or uniqueness constraints, things like that,
 02:34 but for name all we need is name.
 02:38 Let's run this, first of all, let's go over here
 02:41 and notice, if I go to owners and refresh, no name,
 02:46 let's run this code, find the 10 thousandth owner by name,
 02:52 19 milliseconds, that's pretty good,
 02:55 let me run it one more time,
 02:57 15 yeah okay, so that seems pretty stable,
 03:00 and let's go over here and do a refresh, hey look there's one by name;
 03:03 we can see it went from what was that,
 03:08 something like 300 milliseconds to 15 milliseconds, so that's good.
 03:11 How many cars are owned by the 10 thousandth owner,
 03:15 so that's 3 milliseconds, but let's go ahead and have a look at this question anyway.
 03:19 How many cars are owned by the 10 thousandth owner,
 03:22 so here's this function right here that we're calling
 03:25 it doesn't quite fit into a lambda expression, so we put it up here
 03:28 so we want to go and find the owner by id,
 03:30 that should be indexed right, that should be indexed right there
 03:34 because it's the id, the id always has an index,
 03:36 and now we are saying the id is in this set,
 03:40 so we're doing two queries, but both of them are hitting the id thing,
 03:44 so those should both be indexed and 3 milliseconds,
 03:47 well that really seems to indicate that that's the case.
 03:50 How many owners own the 10 thousandth car, that is right here.
 03:54 So we'll go find the car, ask how many owners own it.
 03:59 Now this one is interesting, so remember when we're doing this
 04:02 basically this in query, let's do a quick print of car id here,
 04:11 so if we go back over to this, we say let's go over to the owners
 04:17 save your documents, so this is going to be car ids,
 04:21 it's going to have an object id of that,
 04:26 all right, so run this, zero records, apparently this person owns nothing,
 04:33 but notice it's taking 77 milliseconds, we could do our explain again here
 04:37 and COLLSCAN, yet again, not the most amazing.
 04:43 So what we want is we want to have an index on car ids, right
 04:48 because a collection scan is not good,
 04:50 I think it's not really telling us in our sort example
 04:53 but for the find it definitely should be.
 04:55 So we can come back to our owner over here,
 04:58 let's add also like an index on car_ids,
 05:02 If we run this once again, just the act of restarting it
 05:05 should regenerate the indexes, how long did it take over here?
 05:09 a little late now isn't it, because I did the explain,
 05:13 I can look at this one, how many cars,
 05:16 how many owners does the 10 thousandth car have,
 05:19 66 milliseconds, if we look at it now—
 05:22 how many owners own the 10 thousandth car, 1.9 milliseconds,
 05:29 so 33 times faster by adding that index, excellent,
 05:34 find the 50 thousandth owner by name, that's already done.
 05:38
 05:40 Alright we already have an index on owners name so that goes nice and quick,
 05:45 and how is this doing, one millisecond perfect,
 05:48 this one is super bad, the cars with expensive service 712 milliseconds,
 05:52 alright so here, we're looking at service history
 05:56 and then we're navigating that .relationship, that hierarchy,
 06:00 with the double underscore, going to the price,
 06:02 greater than, less than, equal it doesn't matter,
 06:05 we're basically working with this value here, this subdocument.
 06:08 Let's go over to the car and make that work,
 06:11 now the car doesn't yet have any indexes but it will in a second,
 06:14 so what we want to do is represent that here
 06:17 and in the raw way of discussing this with MongoDB
 06:21 we use . (dot) not double underscore, so . represents the hierarchy here.
 06:25 Let's run that again, notice expensive service, 712,
 06:30 cars with expensive service, instead of 712 we have 2.4 milliseconds,
 06:39 now notice that first time I ran it there was a pause,
 06:42 the second time it was like immediate,
 06:45 and that's because it basically was recreating that index
 06:47 and that pause time was how long that index took to create.
 06:51 So here we have cars with expensive service,
 06:53 now we're getting into something more interesting, look at this one with spark plugs,
 06:58 we're querying on two things, we're querying on the history and the service,
 07:04 let's actually put this over in the shell so we can look at it.
 07:07
 07:19 I've got to convert this over, do the dots there,
 07:23 this is going to be the $gt operator, colon, like so,
 07:30 all right, so we're comparing that service history.price
 07:35 and this one, again because you can't put dots in normal json,
 07:39 do the dot here and quotes, and this one is just spark plugs,
 07:46 alright, let's run this, okay 22 milliseconds,
 07:52 how long is it taking over here— 20 milliseconds,
 07:56 so that's actually pretty good and the reason I think it's pretty good is
 07:59 we already have an index on this half
 08:02 and so it has to just basically sort the result, let's find out.
 08:05
 08:11 Winning plan, index on this one, yes, exactly
 08:14 so this one is just going to be crank across there
 08:18 but we're going to use at least this index here, this by price
 08:22 so that gets part of the query there.
 08:25 Now maybe we want to be able to do a query just based on the description
 08:30 show me all the spark plugs, well that's a collection scan,
 08:33 so let's go back and add over here one for the description.
 08:40 Now how do I know what goes in this part,
 08:44 see I have a service history here, if we actually look at the service record object
 08:49 it has a price and description, right
 08:52 so we know that that results in this hierarchy of
 08:54 service history.price, service history.description.
 08:57 If we'd run this again, it will regenerate those and let's go over here
 09:01 and run this, and let's see, now we're doing index scan on price,
 09:09 what else do we got, rejected plans, okay so we got this and query
 09:18 and it looks like we're still using the— yes, oh my goodness,
 09:24 how about that for a mistake, comma, so what did that do
 09:28 that created, in Python you can wrap these lines and that just created this,
 09:33 and obviously, that's not what we want, that comma is super important there.
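The missing comma is worth seeing on its own, because Python does not complain about it: adjacent string literals are silently joined into one string. The field names below are the ones from this demo, but which two entries sat next to each other in the original bug is an assumption.

```python
# Missing comma after the first string: Python concatenates the two literals,
# so this list holds ONE nonsense field name instead of two index entries.
indexes_with_bug = [
    "service_history.price"
    "service_history.description",
]

# With the comma, the list holds the two intended entries.
indexes_fixed = [
    "service_history.price",
    "service_history.description",
]

print(len(indexes_with_bug), indexes_with_bug)
print(len(indexes_fixed), indexes_fixed)
```

Because the joined string is still a syntactically valid index entry, the result is an index on a field that does not exist, which then has to be dropped by hand.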
 09:38 So let me go over here and drop this nonsense thing,
 09:41 try this again, I can see it's building index right now,
 09:47 okay, once again we can explain this, okay great,
 09:51 so now we're using price and actually we use the description this time
 09:58 and you can see the rejected plan is the one that would have used the price,
 10:04 so we're using description, not price,
 10:06 and how long does it take to run that query— 7.9 milliseconds, that's better
 10:13 but what would be even better still is if we could do
 10:16 the description and price as a single thing. How do we do that?
 10:22 This gets to be a little trickier, if we look at the query we're running,
 10:25 we're first asking for the price and then the description,
 10:30 so we can actually create a composite index here as well,
 10:35 and we do that by putting a little dictionary, saying fields
 10:39 and putting a list of the names of the fields
 10:44 and you can bet those go like this,
 10:48 now this turns out to be really important, the order that you put them here
 10:52 price and the description versus description price, for sorting,
 10:56 not so much for matching, run it one more time,
 11:00 alright, expensive cars with spark plugs,
 11:04
 11:07 here we go, look at that, less than one millisecond,
 11:10 so we added one index, it took it from like 66 milliseconds down to 15,
 11:16 and then, we added the description one, it turns out that was a better index
 11:21 and it took it from 15 to 9, we added the composite index,
 11:24 and we took it from 9 to half a millisecond, a 0.6 milliseconds, that is really cool.
 11:31 Notice over here, this got faster, let's go back and look at what that is.
 11:36 Load cars, so this is the one we are optimizing
 11:40 and what are we doing here— let me wrap this so you can see,
 11:43 we're doing a count, okay, we're doing a count
 11:46 and so it's basically having the database do all the work
 11:48 but there's zero serialization.
 11:52 Now in this one, we're actually calling list
 11:55 so we're deserializing, we're actually pulling all of those records back
 11:59 and let's just go over here and see how many there are,
 12:03
 12:08 well that's not super interesting, to have just one, is it,
 12:12 alright, that's good, but let's actually make this just this,
 12:17
 12:23 let's drop this spark plug thing and just see
 12:26 how many cars there are with this,
 12:30 okay there we go, now we have some data to work with,
 12:33 65 thousand cars had 15 thousand dollar service or higher,
 12:36 after all, this is a Ferrari dealership, right.
 12:39 Now, it turns out it's a really bad idea to pull back that many cars,
 12:43 let me stop this, let's limit that to just a thousand here as well.
 12:52
 12:54 Okay, so we're pulling back thousand cars because we're limited to this
 13:00 and we're pulling back a thousand cars here.
 13:03 But notice, this car name and id versus the entire car
 13:08 so let's go over here cars with expensive service, car name and id,
 13:13 so notice the time, so to pull back and serialize those thousand records
 13:17 took actually a while, so it took basically one second,
 13:21 and if we don't ask for all the other pieces,
 13:25 if we just say give me just the make, the model and the id,
 13:29 here we're using the only keyword, it says don't pull back the other things
 13:34 just give me these three fields when you create them,
 13:37 it makes it basically ten times faster,
 13:40 let's turn this down to a 100 and see, maybe get a little more realistic set of data.
 13:44 Okay, there we go, a 100 milliseconds down to 14 milliseconds,
 13:49 so it turns out that the deserialization step in MongoEngine is a little bit expensive
 13:55 so if you like blast a million cars into that list, it's going to take a little bit.
 14:01 If we can express like I only want to pull back these items,
 14:05 then it turns out to be quite a bit faster,
 14:10 in this case not quite ten times faster, but definitely faster.
 14:15 Let's round this out here and finish this up.
 14:17 Here we're asking for the highly rated, highly priced cars,
 14:20 we're asking like hey for all the people that come and spend a lot of money
 14:26 how did they feel about it?
 14:29 And then also what cars had a low price and also a low rating,
 14:33 so maybe we could have just somehow changed our service
 14:37 for these sort of cheaper like oil change type people.
 14:39 It turns out that that one is quite fast,
 14:42 this one we could do some work and fixing one will really fix the other
 14:46 so we have this customer rating thing, we probably want to have an index on,
 14:52 and we already have one on the price,
 14:54 so I think that that's why it's pretty quick actually.
 14:57 Go over here, and we don't yet have one on the price, on the rating rather,
 15:03 so we can do that and see if things get better,
 15:07 not too much, it didn't really make too much of a difference,
 15:12 it's probably better to use the price than it is the rating,
 15:16 because we're kind of doing that together, so we're also going to go down here
 15:19 and have the price and customer rating,
 15:21 one of these composite indexes, once again,
 15:24 and maybe if we change price one more time,
 15:29 rating and price— it doesn't seem like we're getting much better,
 15:36 so down here this is about as fast as we can get, 16 milliseconds
 15:40 and this is less than one millisecond, so that's really good.
 15:44 The final thing is, we are looking for high mileage cars,
 15:47 so let's go down here and say find where the mileage of the car
 15:51 is greater than 140 thousand miles, do we have an index on that,
 15:55 you can bet the answer is no.
 15:58 Now we could go to the shell and see that, but no we don't have one,
 16:01 so let's go up here and add one more,
 16:04 and this is in fact the only index we have here in this thing
 16:07 that is on like just plain field, not one of these nested ones like this;
 16:13 so maybe we also want to be able to select by year,
 16:16 so we could have one for year as well. I'm going to add those in.
 16:21 Now this high mileage car goes from a hundred and something milliseconds
 16:26 down to six, maybe one more time just to make sure,
 16:28 yep, 5, 6, seems pretty stable around there.
 16:32 So we've gone and we've added these indexes
 16:34 to our models, our MongoEngine documents by adding indexes
 16:40 and we can have flat ones like this, or we have these here,
 16:48 and we also can have composite ones or richer things,
 16:52 if we create a little dictionary and we have fields and things like that.
 16:57 Similarly an owner, we didn't have as many things we were after
 17:00 but we did want to find them by their name and by car id,
 17:03 so we had those two indexes,
 17:05 honestly this is just a simpler document than the cars.
 17:08 So with these things added here, we can run this one more time
 17:11 and see how we're doing that code all runs really quick,
 17:14 if we kind of scan through here, there's nothing that stands out like super bad,
 17:18 5 milliseconds, half, 18, 6, half, 1, 3, 1, let's say,
 17:26 this one, I really wish we could do better,
 17:29 it just turns out there is like so many records there
 17:32 that if we run that here you can see that the whole thing runs in one millisecond,
 17:38 super, super fast, we can't make it any faster than that.
 17:41 The slowness is basically the allocation,
 17:45 assignment, verification of 100 car objects.
 17:48 I'd like to see a little better serialization time out of MongoEngine,
 17:53 if you have some part of your code that has to load tons of these things
 17:56 and it's super performance critical, you could drop down to PyMongo,
 18:00 talk to it directly and probably in the case where you're doing that
 18:05 you don't need to pull back many, many objects,
 18:07 but also you can see that if we limit what we ask for down here,
 18:12 that goes back to 14 milliseconds which is really great,
 18:15 here we're looking at a lot of events, this is like 16 thousand
 18:21 or no, 65 thousand, that's quite a bit, this one is really fast,
 18:25 this one is really fast, so I feel like from an index perspective
 18:28 we've done quite a good job, how do we know we're done?
 18:32 I guess this is the final question, this has been a bit of a long—
 18:35 how do we know we're done with this performance bit?
 18:39 We know we're done when all of these numbers come by
 18:43 and they're all within reason of what we're willing to take.
 18:47 Here I have set this up as these are the explicit queries
 18:51 we're going to ask and then we'll just time them,
 18:54 like your real application does not work that way.
 18:56 How do you know what questions your application is asking and how long they're taking?
 19:01 So you want to set up profiling, so you can come over here
 19:05 and definitely google how to do profiling in MongoDB,
 19:08 so we can come over here and let's just say db.setProfilingLevel
 19:13 and you can use this function to say I'm looking for slow queries
 19:18 and to me slow means 10 milliseconds, 20 milliseconds something like that,
 19:23 it will generate a collection called system.profile and you can just go look in there
 19:29 and see what queries are slow, clear it out,
 19:33 run your app, see what shows up in there
 19:35 add a bunch of indexes, make them fast, clear that table,
 19:38 then turn around and run your app again,
 19:43 and just until stuff stops showing up in there,
 19:46 you can basically find the slowest one, make it faster, clear out the profile
 19:51 and just iterate on that process, and that will effectively like gather up
 19:55 all of the meaningful queries that your app is going to do,
 19:59 and then you can go through the same process here
 20:01 to figure out what indexes you need to create.
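That profiling loop can also be driven from Python. This is a rough sketch assuming a PyMongo database object is passed in: it uses the server's profile command through db.command and reads the system.profile collection. The function names and the 10 millisecond default are choices made for this example.

```python
def enable_slow_query_profiling(db, slow_ms=10):
    """Profiling level 1: record only operations slower than slow_ms."""
    return db.command("profile", 1, slowms=slow_ms)


def slowest_operations(db, limit=5):
    """Return the slowest recorded operations, worst first."""
    cursor = db["system.profile"].find().sort("millis", -1).limit(limit)
    return list(cursor)


def clear_profile(db):
    """Turn profiling off, then drop the recorded entries to start fresh."""
    db.command("profile", 0)
    db["system.profile"].drop()
```

The loop is then: enable profiling, exercise the app, look at slowest_operations, add an index for the worst one, clear the profile, and repeat until nothing shows up. Profiling adds some overhead, so it is usually something to switch on for an investigation rather than leave running.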
--- a/transcripts/ch8-performance/8.txt
+++ b/transcripts/ch8-performance/8.txt
@ -0,0 +1,33 @@
 00:01 We've seen how powerful adding indexes to MongoDB is
 00:04 and I talked a little bit how the nested nature of these documents means
 00:09 there's naturally fewer primary keys,
 00:11 so there's fewer on average actual indexes
 00:15 that get created just as part of working with the database;
 00:18 so creating these indexes is even more important in document databases
 00:22 than it is in relational databases.
 00:24 So here we are in the shell, this would be Robomongo
 00:27 or just the Mongo command line interface
 00:30 and we can create an index on a collection by saying db.collection name
 00:33 so here we have cars.createIndex
 00:35 and then we pass it two things, first one required, second one optional
 00:39 we pass it the actual fields we want to create the index on;
 00:44 so here we have service_history.customer_rating
 00:48 so we could traverse this hierarchy if necessary
 00:51 we just use that dot like we have been in the shell the whole time
 00:55 and then we say one or minus one,
 00:57 so do you want to sort ascending or descending.
 00:59 And this mostly matters for either what you might consider the natural sort
 01:03 or if you're doing a composite key or a composite index
 01:08 and that composite index is being used for sorting on both fields
 01:12 and all the orders have to line up exactly for the sort to use that index.
 01:17 Then we can pass additional information,
 01:19 here we have background as true and the name,
 01:21 I like to name my indexes if I'm doing this in the shell
 01:24 because then it's easier to see like okay why did I create this index
 01:28 here we want the customer ratings of service,
 01:31 so that's pretty nice, background true, that's not the default
 01:35 but that means it will run basically in the background
 01:38 without blocking the database operations,
 01:41 if you don't put that, when you hit go
 01:43 the database will stop serving other operations
 01:46 until this index is generated, so be aware.
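
The same index can be requested from Python. The following is a minimal sketch, assuming a PyMongo-style collection object with a create_index(keys, **options) method; the index name and the RecordingCollection stand-in are hypothetical and exist only so the example runs without a server. Note that newer MongoDB server versions (4.2 and later) ignore the background option.

```python
# Direction constants; these match pymongo.ASCENDING and pymongo.DESCENDING.
ASCENDING = 1
DESCENDING = -1


def create_rating_index(collection):
    """Create the customer-rating index on a PyMongo-style collection.

    `collection` is assumed to expose create_index(keys, **options),
    as pymongo.collection.Collection does.
    """
    return collection.create_index(
        # Dotted path reaches into the embedded service_history records.
        [("service_history.customer_rating", ASCENDING)],
        name="customer_ratings_of_service",  # hypothetical index name
        background=True,  # not the default; ignored by MongoDB 4.2+
    )


class RecordingCollection:
    """Stand-in for a real collection so the sketch runs without a server."""

    def __init__(self):
        self.calls = []

    def create_index(self, keys, **options):
        self.calls.append((keys, options))
        # Like PyMongo, hand back the name of the index.
        return options.get("name")


if __name__ == "__main__":
    fake = RecordingCollection()
    print(create_rating_index(fake))
    print(fake.calls)
```

Against a live server, the fake would be replaced by a real collection object, for example the cars collection obtained from a pymongo.MongoClient.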
--- a/transcripts/ch8-performance/9.txt
+++ b/transcripts/ch8-performance/9.txt
@ -0,0 +1,39 @@
 00:01 Now if we're using MongoEngine,
 00:03 we don't have to go to the shell and manually type all the indexes
 00:05 we basically go to each individual top level document
 00:08 so all the things that derive from mongoengine.Document
 00:11 not the embedded documents, and we go to the meta section
 00:14 and we add an indexes entry, basically an array
 00:17 so here we want to have, you can see the blue stuff that's highlighted
 00:20 we want an index on make, we want an index on service history
 00:23 and within service history, remember these are service records showing on the bottom
 00:27 we want an index on the description and price.
 00:30 So for the first index we put 'make', that's straightforward
 00:34 and then we have service_history.customer_rating
 00:37 so service history is the field name
 00:39 and then customer rating is the field name of service record
 00:42 and for some reason I don't have it blue, it's that last one down there
 00:45 but we also want this composite key
 00:47 so service_history.price and service_history.description
 00:50 we want to be able to find where both of those match
 00:53 and we're going to do that by having
 00:56 a more complicated entry in the indexes bit here
 00:58 this is going to be a dictionary where the fields are set
 01:00 to be this array of strings and not just the flat string itself.
 01:04 So once we add this, when we run our code,
 01:07 the first time we work with that document, it's actually going to
 01:10 ensure that all the indexes are there,
 01:12 and remember that hung up our application for just a little bit,
 01:16 but the real benefit here is our app is always going to be in sync,
 01:21 we don't have to go oh oops, I forgot to add the index,
 01:24 that one particular index to say the staging server,
 01:27 or when I push to production are there new indexes,
 01:30 I've got to go create them on the database;
 01:32 now you don't have to worry about that, you just push your code,
 01:34 restart your web app or whatever kind of app it is,
 01:36 and then as part of interacting with it,
 01:38 it will make sure that those indexes are there.
 01:41 If you don't want that pause to be there,
 01:43 just go and create the indexes you know MongoEngine is going to create,
 01:48 put them on the production server and then push the new version of the code
 01:50 and it will just go great, these indexes exist.
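
The slide itself is not reproduced in this transcript, so here is a rough sketch of what the meta section described above might look like. The field types, the collection name, and the define_models helper are assumptions for illustration; only the three index entries come from the narration.

```python
# Index list in the form MongoEngine expects under meta["indexes"]:
# plain strings for single-field indexes, a dict with "fields" for a composite one.
CAR_INDEXES = [
    "make",
    "service_history.customer_rating",
    {"fields": ["service_history.price", "service_history.description"]},
]


def define_models():
    """Build the document classes; needs the mongoengine package installed."""
    import mongoengine  # imported here so the list above loads without it

    class ServiceRecord(mongoengine.EmbeddedDocument):
        # Embedded documents carry no indexes of their own (assumed field types).
        description = mongoengine.StringField()
        price = mongoengine.FloatField()
        customer_rating = mongoengine.IntField()

    class Car(mongoengine.Document):
        make = mongoengine.StringField()
        service_history = mongoengine.EmbeddedDocumentListField(ServiceRecord)

        # Indexes are declared on the top-level document only.
        meta = {"collection": "cars", "indexes": CAR_INDEXES}

    return Car, ServiceRecord
```

In a real project the classes would normally sit at module level; they are wrapped in a function here only so the index list can be inspected without MongoEngine installed.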