Transcripts for chapter 8

master
Michael Kennedy 7 years ago
parent aac0d882b3
commit edf7c0ca1b

@ -0,0 +1,53 @@
00:01 Now that you know how to work MongoDB, you know how to work its shell,
00:04 what the query syntax is, you've seen PyMongo as well as MongoEngine,
00:07 it's time to turn our attention to tuning MongoDB
00:11 to be the best database it can possibly be.
00:14 We're going to focus on how to make our regular MongoDB server
00:18 a high performance MongoDB database
00:21 and you'll see there's no magic here, a lot of the things that you can do
00:24 are relatively straightforward, and there's a systematic way to go about it.
00:29 I want to start this section, this chapter, by putting a little perspective out there.
00:41 When people come to NoSQL and they start looking for alternative databases
00:44 often the allure of these databases is their performance,
00:49 you hear about things like sharding, horizontally scaling them,
00:52 some incredible performance numbers, things like that.
00:55 That may be what you really need, that may be the most important thing
00:58 and certainly if you don't have performance out of your database it's a big problem.
01:03 We're going to certainly figure out how to make our databases faster
01:07 and the variety of techniques that we have available to us in MongoDB.
01:11 That said, your biggest problem probably isn't performance,
01:14 you may have a big data problem, you may have terabytes or petabytes of data
01:19 but most applications don't.
01:22 You may have a performance problem, it may be that you have so much data
01:26 or you are asking such complex queries that it really does take
01:30 very precise tuning and scaling to make it work.
01:33 So we're going to focus on some of these types of things.
01:36 That said, we all have a complexity problem with our application,
01:40 it's always a pain to maintain these databases
01:43 especially when we're working with relational databases,
01:46 you hear about things like migrations and updating your schema
01:49 adding, removing, transforming columns, all of this stuff is really complex
01:53 and it even makes deployment really, really challenging,
01:56 you want to release a new version of something based on SQLAlchemy
01:59 but you need to change the database schema before it will even run,
02:02 okay, that sounds like it could be a little bit of a problem.
02:05 What you'll see with MongoDB and these document databases is
02:10 one of their biggest benefits is the simplicity that they bring.
02:14 The document structure means there are fewer tables,
02:18 there are far fewer connections between these tables,
02:21 so when you think about the trade-offs and performance and things like that
02:24 keep in mind that probably the biggest benefit
02:27 that you are going to get from MongoDB is you are going to have
02:30 simpler versioning, evolution, maintainability, development story.
02:33 I just want to put that out there, because I know sometimes
02:36 people will say well, I got MongoDB to perform at this speed
02:40 and I got this other database, and if I tweak it like this and adapt it like that
02:44 maybe I could get it to go a little faster, so maybe we should use that instead.
02:47 And maybe, I don't know, it depends on the situation,
02:50 and this is very abstract, so it's hard to say, but keep in mind
02:53 that one of the biggest things these document databases
02:55 bring to the table here is this simplicity.
02:59 It just so happens we can also make them really, really fast.
03:03 So simple and fast, sounds like a great combination,
03:05 so let's get into this section where we are going to make MongoDB much faster.

@ -0,0 +1,92 @@
00:01 One of the most important things you can do for performance
00:04 in your database and these document databases
00:06 is think about your document design,
00:08 should you embed stuff, should you not, what embeds where,
00:11 do you embed just ids, do you embed the whole thing;
00:14 all of these are really important questions
00:16 and it takes a little bit of experience to know what the right thing to do is.
00:20 It also really depends on your application's use case,
00:24 so something that's really obviously a thing we should consider
00:28 is this service history thing, this adds the most weight to these car objects,
00:34 so we've got this embedded document list field
00:38 so how often do we need these histories?
00:44 How many histories might a car have?
00:46 Should those maybe be in a separate collection
00:49 where it has all the stuff that the service record class has,
00:52 plus car id, or something to that effect?
00:56 So this is a really important question,
00:59 and it really depends on how we're using this car object, this car document
01:05 if almost all the time we want to work with the service history,
01:07 it's probably good to go in and put it here,
01:10 unless these can be really large or something to that effect,
01:13 but if you don't need them often, you might consider putting them in their own collection,
01:16 there's just a tension between the complexity of separation,
01:20 the safety of separation, and the speed of having them separate
01:24 so you don't pull them back all the time;
01:26 you can also consider using the only keyword or only operator in MongoEngine
01:30 to say if I don't need it, exclude the service history,
01:34 it adds a little bit of complexity because you often need to know,
01:38 hey is this the car that came with service history
01:40 or is it a car where that was excluded, things like that,
01:42 but you could use performance profiling and tuning
01:45 to figure out where you might use only.
01:48 Let's look at one more thing around document design.
01:50 You want to consider the size of the document,
01:52 remember MongoDB has a limit on how large these documents can be,
01:56 that's 16 MB per record, that doesn't mean you should think
02:01 oh it's only 10 MB so everything is fine for my document design,
02:05 that might be terrible; this is like a hard upper bound,
02:07 like the database rejects the document once it goes over 16 MB,
02:11 so you really want to think about what is the right size,
02:14 so let's look at a couple examples:
02:16 we can go to any collection and say .stats
02:18 and it will talk about the size of the documents and things like that,
02:21 so here we ran db.cars.stats() in the MongoDB shell,
02:25 and we see that the average object size is about 700 bytes,
02:29 there is information about how many there are, and all that kind of stuff,
02:33 but really the most interesting thing for this discussion is
02:35 what is the average object size, 700 bytes
02:38 that seems like a pretty good size to me, it's not huge by any means,
02:42 and this is the cars that contain those service histories,
02:45 so this is probably fine for what we're doing.
02:48 Let me give you a more realistic example.
02:50 Let's think about the Talk Python Training website,
02:52 and the courses and chapters, we talked about them before,
02:56 so here if we run that same thing, db.courses.stats
03:02 you can see that the average object size is 900 bytes for a course,
03:07 and remember the course has the description that shows on the page
03:10 and that's probably most of the size, it has a few other things as well,
03:13 like student testimonials and whatnot,
03:16 but basically it's the description and a few hyperlinks.
03:19 So I think this is again a totally good object, average object size.
03:23 Now one of the considerations was I could have taken the chapters
03:27 which themselves contain all the lectures,
03:29 and embedded those within the course,
03:32 would that have been a good idea—
03:34 I think I might have even had it created that way
03:36 in the very beginning, and it was a lot slower than I was hoping for,
03:38 so I redesigned the documents.
03:40 If we run this on this chapter section, you can see
03:43 that the average object size is 2.3 KB,
03:46 this is starting to get a little bit big, on its own it's fine,
03:50 but think about the fact that a course on average has like 10 to 20 chapters,
03:55 so if I embedded the chapters in the course
03:58 instead of putting them to a separate document like I do,
04:02 this is how it actually runs at the time of the recording,
04:04 then it would be something like these courses would be
04:07 24 up to maybe 50 KB of data per entry,
04:12 think about that you go to like the courses page
04:15 and it shows you a big list of all the courses
04:17 and there might be 10 or later 20 courses,
04:20 we're pulling back and deserializing like megabytes of data
04:24 to render a really, really common page, that is probably not ok,
04:28 so this is why I did not embed the chapters and lectures inside the course,
04:34 I just said okay, this is the breaking point
04:37 I looked at the object size, I looked at where the performance was
04:41 and I said you know what, really it's not that common
04:44 that we actually want more than one chapter at a time,
04:46 but it is common we want lectures, so it's probably the right partitioning,
04:51 but you build it one way, you try it, it doesn't work,
04:53 you just redesign your class structure, recreate the database and try it again,
04:57 but you do want to think about the average object size
05:00 and you can do it super easily with db.collection_name.stats().
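The embed-or-separate arithmetic from this video can be sketched in a few lines. This is only a back-of-the-envelope estimate using the averages quoted above (about 900 bytes per course, about 2.3 KB per chapter, 10 to 20 chapters per course, up to 20 courses on the listing page). The helper name and the 1 KB = 1000 bytes convention are assumptions made for illustration.

```python
# Back-of-the-envelope estimate of course document size if chapters were
# embedded. The numbers are the averages quoted in the transcript; the helper
# name and the 1 KB = 1000 bytes convention are illustrative assumptions.

COURSE_BYTES = 900     # average course document
CHAPTER_BYTES = 2300   # average chapter document (about 2.3 KB)


def embedded_course_bytes(chapters_per_course):
    """Estimated size of one course document with its chapters embedded."""
    return COURSE_BYTES + chapters_per_course * CHAPTER_BYTES


small = embedded_course_bytes(10)   # a course with 10 chapters
large = embedded_course_bytes(20)   # a course with 20 chapters
listing_page = 20 * large           # 20 large courses on one page

print(f"per course: {small / 1000:.1f} KB to {large / 1000:.1f} KB")
print(f"courses page: about {listing_page / 1_000_000:.2f} MB")
```

With these inputs the estimate lands at roughly 24 to 47 KB per course and close to a megabyte for a page of 20 courses, which matches the reasoning for keeping chapters in their own collection. The real inputs would come from the average object size reported by the stats output for each collection.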

@ -0,0 +1,35 @@
00:01 One of the last simple tools you have in your tool belt
00:04 when we're working with MongoEngine or even in PyMongo, just different api
00:08 is this ability to restrict the data returned from the document.
00:13 In our car object we've got the make, the model, the id, some other things,
00:17 we've got the engine which is a subdocument or an embedded document there
00:22 and then the biggest thing that contributes to the size
00:25 is actually the service history which might be many service record entries.
00:30 If really all we care about is the make, the model and the id of a car,
00:34 and we're going to create like a list or something like that,
00:36 we can use this .only operator here
00:39 and dramatically reduce the amount of data returned from MongoDB
00:43 so this operation, which we saw when we first learned about the api,
00:46 actually operates at the database level,
00:48 you're able to restrict the elements returned from the queries
00:52 so when it gets back to MongoEngine
00:54 basically it looks at what comes back and it says,
00:57 alright, I need to create some cars
00:59 and I need to set their make to this, the model to that
01:01 and their id to whatever comes back,
01:03 and then nothing else is transferred, deserialized, anything.
01:05 So you can, if you don't need them, exclude the heavyweight things
01:09 like the engine and the service histories for this particular use case.
01:12 So this is kind of like select make, model, id from table such and such in SQL,
01:20 and it really can improve the performance
01:22 especially when you have either large documents or many documents.
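A minimal sketch of the restricted query described above, assuming a MongoEngine document class shaped like the course's Car model with make and model fields. The helper name car_list_rows is hypothetical, and the class is passed in as a parameter so the sketch does not guess at how the course code is laid out.

```python
# Sketch of a field-restricted query. `car_cls` is assumed to be a MongoEngine
# Document class like the course's Car model (with `make` and `model` fields).
# The helper name is hypothetical.


def car_list_rows(car_cls):
    """Return (id, make, model) tuples without loading the heavyweight fields."""
    # only() limits the fields MongoDB sends back, so the engine and the
    # service history are never transferred or deserialized.
    cars = car_cls.objects().only("make", "model", "id")
    return [(car.id, car.make, car.model) for car in cars]
```

The same idea in PyMongo is a projection passed as the second argument to find, for example {"make": 1, "model": 1}, where _id is included unless it is explicitly excluded.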
01:27 So you've seen a lot of different ways to turn the knobs of MongoDB
01:31 to make it faster and to use MongoEngine to control those knobs.
01:35 Now this applies to a single individual database server
01:38 and if you use this to tune your database,
01:41 you can actually make the need for having a sharded cluster
01:45 and all these scaling things possibly go away,
01:48 but even if you do end up with one of these more interesting topologies,
01:52 all of these techniques still apply and they'll make your cluster go faster,
01:56 they'll make your replicas go faster, all of those things.
01:59 What you've learned here are really the foundational items of making MongoDB go fast.

@ -0,0 +1,124 @@
00:01 You've heard MongoDB is fast, really fast,
00:03 and you've gone through setting up your documents and modeling things,
00:07 you inserted, you imported your data, and you're ready to go;
00:11 and you run a query and it comes back,
00:13 so okay, I want to find all the service histories
00:15 that have a certain price, greater than such and such, how many are there—
00:18 apparently there's 989, but it took almost a second to answer that question.
00:22 So this is a new version of the database, so we are going to talk about it shortly.
00:27 Instead of having just a handful of cars and service histories
00:30 that we maybe entered in our little play-around app,
00:32 it has a quarter million cars with a million service histories, something to that effect.
00:38 And the fact that we were able to answer this query
00:41 of how many sort of nested documents had this property
00:44 in less than a second, on one hand that's kind of impressive,
00:48 but to be honest, it feels like MongoDB is just dragging,
00:51 this is not very special, this is not great.
00:55 So this is what you get out of the box, if you just follow what we've done so far
00:59 this is how MongoDB is going to perform.
01:02 However, in this chapter, we're going to make this better, a lot better.
01:06 How much— well, let's see, we're going to make it fast,
01:09 here's that same query after applying just some of the techniques of this chapter.
01:13 Notice now it runs in one millisecond, not 706 milliseconds.
01:17 So we've made our MongoDB just take off,
01:21 it's running over 700 times faster than what the default MongoDB does.
01:26 Well, how do we do it, how do we make this fast?
01:30 Let's have a look at the various knobs
01:33 that we can turn to control MongoDB performance.
01:35 Some of which we're going to cover in this course,
01:38 and some are well beyond the scope of what we're doing,
01:40 but it's still great to know about them.
01:42 The first knob is indexes, so it turns out
01:44 that there are not too many indexes added to MongoDB by default,
01:48 in fact, the only index that gets set up is on _id
01:52 which is basically an index as well as a uniqueness constraint,
01:55 but other than that, there are no indexes,
01:57 and it might be a little non intuitive at first, when you first hear about this,
02:02 but indexes and manually tuning and tweaking and understanding the indexes
02:06 in document databases is far more important
02:10 than understanding indexes in a third normal form designed relational database.
02:15 So why would that be? That seems really odd.
02:18 So think about a third normal form database,
02:21 you've broken everything up into little tiny tables that link back to each other
02:24 and they often have foreign key constraints traversing all of these relationships,
02:28 well, those foreign key constraints go back to primary keys on the main tables,
02:33 those are indexed, every time you have one of those relationships
02:35 it usually at least on one end has an index on that thing.
02:39 In document databases, because we take some of those external tables
02:43 and we embed them in documents,
02:45 those subdocuments while they kind of logically play the same role
02:49 there is no concept of an index being added to those.
02:52 So we have fewer tables, but we still have
02:55 basically the same amount of relationships
02:57 and because of the way documents work,
02:59 we actually have fewer indexes than we do in say a relational database.
03:04 So we're going to see that working with understanding
03:07 and basically exploring indexes is super, super important
03:09 and that's going to be the most important thing that we do.
03:12 In fact, the MongoDB folks, one of their things they do is
03:16 they sell like services, consulting and what not to help their customers
03:19 and you could hire them, say hey I got this big cluster and it's slow
03:24 can you help me make it faster— the single most dramatic thing that they do,
03:30 the thing that almost always is the problem is incorrect use of indexes.
03:34 So we're going to talk about how to use, discover and explore indexes for sure.
03:38 Next is document design, all that discussion about to embed or not to embed,
03:43 how should you relate documents, this is sort of the beginning of this conversation,
03:47 it turns out the document design has dramatic implications across the board
03:52 and we did talk quite a bit about this, but we'll touch on it again in this chapter.
03:56 Query style, how are you writing your queries,
04:01 is there a way that you could maybe restructure a query,
04:05 or ask the question differently and end up with
04:08 a more high performance query, maybe one example misses an index
04:12 and the other particular example uses a better index or something to this effect.
04:16 Projections and subsets are also something that we can control,
04:20 remember when we talked about the JavaScript API
04:23 we saw that you could limit your set of returned responses
04:26 and this can be super helpful for performance;
04:29 you could write a query where it returns 5 MB of data
04:32 but if you restrict that to just the few fields that you actually care about
04:36 maybe it's only kilobytes instead of 5 MB, it could be really dramatic,
04:40 depending on how large and nested your documents might be.
04:43 We're going to talk about how we can do this, especially from MongoEngine.
04:46 These are the knobs that we're going to turn in this course,
04:49 these are the things that will work even if you have a single individual database,
04:53 so you should always think about these things,
04:56 some of them happen on the database side, document design, indexes,
04:59 and the other, maybe is in your application interacting with the database, the other two,
05:04 but MongoDB being a NoSQL database, allows for other types of interactions,
05:08 other configurations and network topologies and so on.
05:11 So, one of the things that it supports is something called replication,
05:14 now replication is largely responsible for redundancy and failover.
05:19 Instead of just having one server I could have three servers,
05:22 and they could work in triplicate, basically one is what's called the primary,
05:26 and you read and write from this database,
05:28 and the other two are just there ready to spring into action,
05:31 always getting themselves in sync with the primary,
05:34 and if the primary goes down, one of the others will spring in to be the primary
05:36 and they will sort of fix themselves as what used to be the primary comes back.
05:40 There is no performance benefit from that at all.
05:43 However, there are ways to configure your connection to say
05:46 allow me to read not just from the primary one, but also from the secondary,
05:50 so you can configure a replication for a performance boost,
05:53 but mostly this is a durability thing.
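As a rough illustration of configuring the connection to read from secondaries, here is a sketch that builds a replica set connection string. The host names and the replica set name rs0 are placeholders, not servers from the course. replicaSet and readPreference are standard connection string options, and secondaryPreferred is one of the standard read preference modes.

```python
# Builds a MongoDB connection string that allows reads from secondaries.
# The host names and the replica set name are placeholders, not real servers.
from urllib.parse import urlencode


def replica_set_uri(hosts, replica_set, read_preference="secondaryPreferred"):
    """Return a mongodb:// URI listing every member of the replica set."""
    options = urlencode({"replicaSet": replica_set, "readPreference": read_preference})
    return f"mongodb://{','.join(hosts)}/?{options}"


uri = replica_set_uri(
    ["db1.example.com:27017", "db2.example.com:27017", "db3.example.com:27017"],
    replica_set="rs0",
)
print(uri)
```

Keep in mind that a secondary can lag slightly behind the primary, so reads routed this way may return slightly stale data.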
05:55 The other type of network configuration you can do is what's called sharding.
05:59 This is where you take your data instead of putting all into one individual server,
06:02 you might spread this across 10 or 20 servers,
06:06 one 20th on each, hopefully evenly balanced,
06:09 across all of them, and then when you issue a query,
06:12 it can either figure out, if it's based on the shard key,
06:15 which server to point that at and let that one
06:17 handle the query across the smaller set of data,
06:20 or if it's general like show me all the things with greater than this for the price,
06:23 it might need to fan that out to all 20 servers,
06:26 but it would run in parallel on 20 machines.
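The targeted versus fan-out behavior just described can be pictured with a toy model. This is not how MongoDB routes queries internally. It is a simplified sketch that hashes a shard key value to pick one of 20 shards, and the owner_id shard key is an assumption chosen only for the example.

```python
# Toy model of query routing in a sharded cluster. This is NOT MongoDB's
# routing logic, only an illustration of targeted vs. fan-out queries.
# The shard key name `owner_id` is a made-up example.
import hashlib

SHARD_COUNT = 20  # the cluster size used in the transcript's example


def shard_for(value, shard_count=SHARD_COUNT):
    """Map a shard-key value to a shard number using a stable hash."""
    digest = hashlib.sha256(str(value).encode("utf-8")).hexdigest()
    return int(digest, 16) % shard_count


def shards_to_query(query, shard_key, shard_count=SHARD_COUNT):
    """Return the list of shard numbers that must handle the query."""
    value = query.get(shard_key)
    if value is not None and not isinstance(value, dict):
        # Exact match on the shard key: a single shard can answer.
        return [shard_for(value, shard_count)]
    # The query does not pin down the shard key: ask every shard in parallel.
    return list(range(shard_count))


targeted = shards_to_query({"owner_id": 10000}, shard_key="owner_id")
fan_out = shards_to_query({"service_history.price": {"$gt": 16800}}, shard_key="owner_id")
print(len(targeted), len(fan_out))  # 1 20
```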
06:30 So sharding is all about speeding up performance,
06:32 especially write performance, but also queries as well,
06:35 so you can get tons of scalability out of sharding,
06:38 and you can even combine these like, when I said there is 20 shards,
06:41 each one of those could actually be a replica set,
06:43 so there is a lot of stuff you could do with network topology
06:46 and clustering and sharding and scaling and so on.
06:48 We're not turning those knobs in this course,
06:50 I'll show you how to make individual pieces fast,
06:52 the same idea applies to these replicas and shards,
06:54 just on a much grander scale if you want to go look at them.

@ -0,0 +1,30 @@
00:01 Let's return to our dealership.
00:03 This was the example we started back when we began the MongoEngine section,
00:05 and it turns out the dealership is super popular now.
00:08 Before we just had a couple of cars, now we have a quarter million cars
00:11 in our database, we have a 100 thousand owners,
00:15 I don't believe we talked about owners before in terms of what that looks like in our code,
00:19 but I've added this concept of owners
00:21 so we can ask interesting like cross-document related type questions,
00:25 and we'll look at the details of them, when we get to the code, in just a moment.
00:28 Each one of these owners, these 100 thousand owners,
00:31 owns an average of 2.5 cars, this is kind of like collectors, right,
00:36 not a standard person that drives to work or whatever, these are Ferraris,
00:39 and each car has on average about 5 service records
00:43 and that could be like a new engine,
00:45 change the tires, change the spark plug, whatever;
00:48 in particular, there's about 1.25 million service histories,
00:51 so when we ask questions about like those nested documents
00:54 that have to do with service histories like customer ratings and price,
00:57 you can see that that is really quite impressive I think,
01:01 we got the quarter million cars and within those quarter million documents
01:04 interspersed are 1.25 million service histories.
01:07 So our job is to make a lot of the typical things that we might ask this database,
01:11 and the queries we'll run to do so, take a couple of milliseconds, not seconds,
01:17 so that's going to be what the basic goal of this whole section is.
01:23 Now, the other things you might want to know is
01:25 we've got about 180 megs of data
01:28 and on average each document of the various document kinds,
01:30 all average together is about 500 bytes per document.
01:33 So let's return to our example slightly transformed
01:37 and see how it's performing now and let's make it fast.

@ -0,0 +1,42 @@
00:01 Here we are in the github repository for this course
00:04 and notice we have this data section
00:06 and in here I have this thing called dealership db 250 K
00:09 that is this data that I just talked about,
00:12 with the 250 thousand cars, 100 thousand owners, that sort of thing.
00:16 So I'm going to put that over here on the desktop and unzip it
00:21 and if we look in here, you'll see that there's a cars collection and an owners collection,
00:29 and I don't believe we've spoken about how to get this data into MongoDB,
00:33 so let's go over here and I'll use Robomongo,
00:37 notice we have these two dealership things that I have been playing with
00:42 and I want to create one called like test dealership or something to that effect.
00:46 We're going to restore this— how do we do that,
00:51 we'll go like this, we'll say mongorestore
00:55 and this is the way that we get this exported data imported into MongoDB,
01:00 now, the first thing you have to ask yourself is this additive to the database,
01:05 if it exists do you want to also insert this,
01:07 or do you want to have this be the database and replace anything that exists,
01:11 we want this one to replace existing data
01:14 so I'll say --drop and then I need to tell it what database
01:18 so I'll say --db, and what you should say is dealership,
01:23 but just because I don't want to wipe away what I currently have,
01:28 I'll say dealership example, but the code that you're going to run
01:31 expects the name of the database to be just dealership;
01:34 and then I need to give it the folder that it's going to work from,
01:37 so I am just going to give it this folder like so, all right.
01:41 So mongorestore, --drop to replace the data, --db to name it, and then the location,
01:46 we hit go, and it's going to go cranking away on this
01:50 and you can see it's inserting, inserting and done,
01:53 that was really fast for like close to 1.5 million records.
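The restore command assembled in this video can be sketched as follows. The sketch only builds and prints the command; it does not run it. The folder path ./dealership_db_250k is a placeholder for wherever the archive was unzipped, and dealership is the database name the course code expects.

```python
# Builds the mongorestore command described in the video. The dump folder
# path is a placeholder; point it at the unzipped folder on your machine.
import shlex


def mongorestore_command(dump_folder, db_name="dealership"):
    """Return the argument list for restoring a dump into db_name."""
    return [
        "mongorestore",
        "--drop",          # replace existing collections instead of adding to them
        "--db", db_name,   # name of the target database
        dump_folder,       # folder containing the exported data
    ]


cmd = mongorestore_command("./dealership_db_250k")
print(shlex.join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```

Passing the arguments as a list, rather than one shell string, avoids quoting problems if the folder path contains spaces.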
01:57 All right, so let's go over here and refresh
02:00 and here's our example and we can see that we have our collection,
02:03 here's our cars and we could just ask how many cars are there.
02:07 Notice, there is that many, and if we change this to owners,
02:11 remember you can also write it like this, owners like this,
02:16 Now notice, I think the restore data we got here,
02:19 you want to drop this index right here, I want it to only have the id indexes, ok
02:26 so that's this example I just restored,
02:28 we're going to work with something you can imagine is exactly the same.
02:33 So we're going to work with this dealership code
02:36 but the way it got there, I'll show you the app I used to originally create it,
02:39 and then I just restored it using mongorestore just as I showed you up here.
02:43 So the way to generate the data that goes into mongorestore, you say mongodump.

@ -0,0 +1,104 @@
00:01 Let's explore this slightly updated version of our code.
00:04 Here we are in the github repository,
00:07 and I am in the source folder and I've added an 08_perf section,
00:10 and we have the starter_big_dealership and we have the big_dealership
00:15 it even has instructions here to tell you basically how to restore
00:17 that database we did just in the previous video.
00:20 This one is going to be a snapshot of how this chapter starts,
00:24 it's what we're starting from now and will remain that way;
00:27 here we're going to take basically a copy of that one
00:29 and evolve it into the fast high performance version,
00:33 so let's go over here and see what we've got.
00:35 Now, we have a few things that are slightly different,
00:38 the car is basically unchanged from before
00:41 although I added a little comment about how do we get to the owners.
00:45 The one thing that is new here, in terms of the model is this owner idea,
00:50 so cars can now have an owner
00:52 and how do we know which cars are owned by this owner
00:57 is we have a list of object ids, those object ids are the object ids of the cars
01:02 so we're going to push the ids of the cars that are owned here
01:06 I guess we could run it as a many to many or one to many relationship,
01:10 just depending on how we treat the owner, but theoretically,
01:13 we can have owners where there is a single car that has multiple owners
01:17 and there are owners that own multiple cars, and we can manage it this way,
01:21 you almost never see like a car to owner intermediate table,
01:25 so you're almost always going to have something like
01:27 those ids are either embedded in the owner or in the car,
01:32 or under rare circumstances both.
01:35 So here's how we refer back to the cars,
01:38 then we have a few basic things like the name,
01:41 when was this owner created, how many times have they visited and things like that.
01:45 We want to call it owners in the database and it's just this core collection,
01:49 so other than that, there's not a whole lot going on here,
01:51 let's look over here, we now have these services,
01:54 I've taken all the car queries and moved them down here
01:57 do you want to create a car, you call this function,
01:59 do you want to record a customer visit, here we can go to the owner
02:03 and we can use this increment operator
02:06 to increment the number of visits in place.
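An in-place increment like the one mentioned here might look like the following at the PyMongo level. It is a sketch under assumptions: owners is taken to be a PyMongo collection, and number_of_visits is a guessed field name, since the transcript does not spell out the actual one.

```python
# Sketch of an in-place increment. `owners` is assumed to be a PyMongo
# collection, and `number_of_visits` is an assumed field name.


def visit_update():
    """The update document: add 1 to the counter on the server."""
    return {"$inc": {"number_of_visits": 1}}


def record_visit(owners, owner_id):
    """Increment the visit counter without fetching the owner document first."""
    result = owners.update_one({"_id": owner_id}, visit_update())
    return result.modified_count == 1
```

Because the increment happens inside the database, there is no read-modify-write round trip. With the same assumed field name, MongoEngine expresses this as an update_one call on a queryset with the keyword argument inc__number_of_visits=1.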
02:09 Find cars by make, find owner by name and so on.
02:16 Number of cars with bad service, a lot of this stuff is what we wrote previously;
02:20 there was the program thing that we ran over here that was interactive
02:23 and I've replaced that with a few things,
02:25 one is this db stats and you can run this and it will tell you
02:28 like how many cars are there, how many owners are there,
02:31 what's the average number of histories,
02:33 this is basically those stats that I presented to you before,
02:36 this takes a while to run on this database, I don't recommend you run it
02:39 but if you want to just run it and see what you get you can.
02:42 The database was originally created using this script,
02:46 I am using something interesting you may not have heard about,
02:49 I am using this thing called Faker, so down here
02:53 Faker lets you create this thing and I'm seeding it
02:59 so it always generates exactly the same things,
03:01 I'm seeding random and fake and you can see down here
03:04 it's creating the owners and you can ask it for things like
03:06 give me a fake name, give me a fake date
03:08 between these two dates, things like that.
03:12 Similarly with cars, we're using random to get a hold of a lot of the numbers
03:15 then we can use fake for anything else we might need.
03:18 We ran this, with the right amount of data, it'll build it all up for us,
03:25 so for some reason if you need to recreate it
03:27 run this load data thing, you can have it create a small one
03:30 or a large one, depending on which settings you comment or uncomment
03:32 before you run it.
03:34 Those are all good, this is like the foundation and this is where we are.
03:37 Next, we're going to ask interesting questions of this database
03:43 and we want to know how long those questions take to answer,
03:46 so I've written this super simple function called time
03:48 you pass it a message and a function,
03:50 it will time how long the function takes to run
03:53 and then print out the message along with the time in terms of milliseconds.
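A timing helper along the lines just described could look like this. It is a sketch, not the course's code. It is named timed here rather than time to avoid shadowing the standard library module, and returning the function's result is an added convenience that the transcript does not mention.

```python
# Minimal timing helper: run a function, print a message with the elapsed
# milliseconds. The name `timed` and the returned result are assumptions.
import time


def timed(msg, func):
    """Call func(), print msg with the elapsed time in ms, return the result."""
    start = time.perf_counter()
    result = func()
    elapsed_ms = (time.perf_counter() - start) * 1000
    print(f"{msg} - {elapsed_ms:,.0f} ms")
    return result


# Example usage with a stand-in for a database query:
total = timed("Sum of a million numbers", lambda: sum(range(1_000_000)))
```

Wrapping each query in a lambda keeps the call sites short, so the same helper can time every question asked of the database.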
03:57 And then we're going to go through
03:59 and we're going to ask interesting questions here
04:01 like how many owners, how many cars, who is the 10 thousandth owner,
04:05 notice the slicing here to give us a slice of length one
04:10 and then we'll just access it,
04:12 and then we can start asking interesting questions like
04:14 how many cars are owned by the 10 thousandth owner,
04:17 or if we go down here, how many owners own the 10 thousandth car,
04:21 so ask it in the reverse direction.
04:23 Here we want to find the 50 thousandth owner by name,
04:26 so yes, we technically have them, but the idea is
04:30 we want to do a query based on the name field
04:32 and we originally won't have any performance
04:35 around these types of queries so it should be slow.
04:38 This one, how many cars are there with expensive service
04:40 this was the one with the snail
04:43 and in one of the first videos in this chapter,
04:46 I showed you look this takes 700 milliseconds to run to ask this question
04:49 how many cars have a service history with a price greater than 16800.
04:55 So we're going to be able to ask all of these questions
04:58 and this program will let us explore that
05:02 and we'll see how to add indexes
05:04 and I'll show you how to add indexes in the shell
05:06 and how to add them in MongoEngine, and MongoEngine is really nice
05:09 because as you evolve your indexes, as you add new ones
05:13 simply deploying your Python web app
05:15 will adapt the database that it goes and finds
05:18 to automatically upgrade to those indexes, so it's really really nice.
05:22 So here you can see we're going to run this code and ask a bunch of questions
05:25 we could load the data from here, we could generate the data,
05:28 but you're much better off importing the data from that zip file
05:32 because this takes like half an hour to run,
05:35 you saw that zip takes like five seconds.

@ -0,0 +1,165 @@
00:01 Let's go ahead and run this code, you've seen the minor changes
00:04 like the addition of this concept of an owner,
00:06 and how we generated all this data, and how you can restore it.
00:09 Let's go ahead and run it, and see what's happening.
00:13 Let's look at this from two perspectives, let's begin over actually in Robomongo,
00:17 so we're going to ask the question, basically how many owners own a certain car
00:21 the idea is more or less we're going to call this function which goes right here,
00:25 really what we're looking for is this query,
00:28 find me all of the owners where this car id is in their car ids collection,
00:33 just generate and deserialize that.
00:37 The other one that we're going to focus on is
00:39 show me the cars with the expensive service history,
00:42 how many cars or what cars had some kind of service
00:46 that cost over 16800 dollars.
00:49 Let's begin by looking at those in Robomongo.
00:54 Here we have this concept, we could simplify this a little bit, but it doesn't matter,
00:57 cars here's the service history, let's go to the price
01:00 where that's greater than 16800, how many of them are there.
01:05 If I run this, notice, it took a while to come back,
01:08 run it again, here's the speed right there, 0.724 sec, 0.731, 0.733,
01:14 so it's pretty reliably taking around 700 milliseconds to answer that question.
01:19 We're going to come back to this.
01:22 Here's a more interesting example, like go and randomly grab a car
01:25 somewhere deep in the list, in this case I put 61600,
01:30 grab that car and then find me all the owners,
01:33 where that car id appears in their id list, and then we'll just dump that out,
01:38 by saying var it doesn't appear if you just state the name it will show up down here,
01:43 so make sure to deselect it and run this,
01:45 and this is actually surprisingly fast, given all the stuff that's going on here,
01:48 but it's taking still about 75, 80 milliseconds to run here,
01:53 which, I don't know, maybe in your database
01:55 going across a 100 thousand records 80 milliseconds seems decent,
01:59 I can tell you in MongoDB 80 milliseconds is terrible
02:02 you should really think about making something that's 80 milliseconds faster
02:06 it's not always possible you can do it,
02:08 but most of the queries as we'll see are possible.
02:11 Let's take this one and just try to understand what's happening here
02:16 and then we're going to go look at it in Python,
02:19 but let's just explore it here in the shell for just a moment.
02:21 Why is this taking 700 milliseconds?
02:24 MongoDB has this way to basically ask how are you running this query,
02:29 and the way you do that is you say explain, like so,
02:35 so I can say this query instead of giving me a result tell me how you're running it,
02:38 if I unselect it, it just runs the selected stuff if there's something there,
02:42 so we can go and look at it in this mode,
02:44 so it says okay, here's what the query planner found for you,
02:47 we've parsed this query, and this is something
02:50 it's basically what went into the find,
02:52 it also might have something to the effect of like a sort
02:55 and other things that are happening, but this is a simple query.
02:58 Look down here, see this winning plan, stage COLLSCAN (collection scan),
03:02 that is bad, that is really, really bad.
03:05 Also notice the rejected plan, so if there are multiple indexes
03:08 and other things that could have done
03:10 it might have attempted a bunch of them and said no, no, no this is the best,
03:13 let's see it doesn't seem to tell us any more about what it did there,
03:18 like sometimes it'll tell you how many records it scanned and things like this,
03:21 but it's just basically reading entirely in the forward direction
03:25 over this and just doing a comparison.
03:27 So that's why this was taking 700 milliseconds
03:32 as it was literally reading and comparing 100 thousand entries
03:36 or actually more, remember there are 1.2 million service histories
03:40 across those 250 thousand cars, so not 100 thousand,
03:43 1.2 million records it scanned over, that's bad, you don't want that.
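
As a rough sketch of how the explain output can be read from Python: the helper below walks the `queryPlanner.winningPlan` section of an explain document and lists its stages. The two sample documents are hand-written to resemble the classic explain shape (they were not captured from a server), the index name is an assumed spelling of the one used in this demo, and `plan_stages` and `is_collection_scan` are hypothetical helper names.

```python
def plan_stages(explain_output):
    """Return the stage names of the winning plan, outermost stage first."""
    stage = explain_output["queryPlanner"]["winningPlan"]
    stages = []
    while stage is not None:
        stages.append(stage["stage"])
        # Single-child stages nest their input under "inputStage".
        stage = stage.get("inputStage")
    return stages


def is_collection_scan(explain_output):
    """True when the winning plan reads every document (COLLSCAN)."""
    return "COLLSCAN" in plan_stages(explain_output)


# Hand-written samples shaped like explain() output (illustrative only).
without_index = {
    "queryPlanner": {
        "winningPlan": {"stage": "COLLSCAN", "direction": "forward"},
        "rejectedPlans": [],
    }
}
with_index = {
    "queryPlanner": {
        "winningPlan": {
            "stage": "FETCH",
            "inputStage": {
                "stage": "IXSCAN",
                # Assumed spelling of the index name used in the demo.
                "indexName": "search_by_service_history_price",
            },
        },
        "rejectedPlans": [],
    }
}

print(plan_stages(without_index))
print(plan_stages(with_index))
print(is_collection_scan(without_index), is_collection_scan(with_index))
```

Seeing COLLSCAN anywhere in that list is the signal to go looking for a missing index.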
03:47 So what we can do is we can actually add an index,
03:51 now there's two ways to add an index,
03:54 but before I add the index, let's go over here
03:58 just explain is super, super valuable,
04:00 any time something is slow we're going to explain
04:03 there's actually a way to turn on profiling and say log all of the queries
04:07 that MongoDB sees that are slower than x,
04:11 you provide the threshold, like say 10 milliseconds might be great,
04:14 show me all the queries that take more than 10 milliseconds
04:17 and then you can drop them in here, put an explain
04:19 and then start creating indexes to make them faster.
04:22 So just google mongodb profile enable slow queries
04:26 or something like this, it's pretty straightforward.
04:29 Now let's run this code, we're asking a lot of questions
04:31 what we want to run is q and a, so we go over here and just right click and say run,
04:37 notice some of these things are taking time,
04:42 the database might be cold, it might have not loaded that stuff,
04:46 so let me run it one more time just to be fair,
04:49 there's a few things that are already really fast, and that's cool,
04:55 so let's go here and review, how many owners are there—
04:58 well, I can tell you it doesn't show the answer
05:01 it just sort of says this is the question I'm asking, here is how long it takes.
05:04 Three milliseconds, that is solid, how many cars— half a millisecond.
05:07 That's pretty solid, I don't think we can improve the count on the entire collection
05:11 but this one, find the 10 thousandth owner— not good,
05:14 so let's see how many cars are owned by that person—
05:19 this is pretty fast actually, this is surprisingly fast,
05:23 how many owners this can have— 66 milliseconds
05:26 that's the one we were looking at in there.
05:29 I'm going to take these numbers and put them over here,
05:32 let's say, this will be Without indexes
05:36 we're going to get this, we don't really care about the exit code, do we?
05:41 With indexes, and we're going to kind of iterate on this a little bit
05:45 so let's begin over here, and we're going to talk about
05:49 how we can add an index in MongoDB and then for the most part
05:55 do this in MongoEngine because it's really part of the way our application works,
06:00 what the indexes are, and it's better to make that part of our document
06:03 than kind of do a separate database setup step;
06:07 we could create a script in Javascript and run it,
06:09 it will do these things and that may be fine, but let's go over here and work on this.
06:14 Again we had the count, here's the almost 800 milliseconds,
06:19 let's go over here and just I'll take this, I'll make a copy,
06:24
06:28 so here is what we can do, instead of doing the find operation
06:31 we can say create index,
06:35 and then we have the thing that we're doing the query on,
06:38 most of the time this is one item but you can have composite indexes
06:43 they are a little more nuanced so we'll talk about them later,
06:45 but let's just do this one, we want to be able to query by service history's price
06:52 Here we can put one of two things, one or minus one,
06:56 what do you want the default sort, descending or ascending?
06:59 A lot of times it doesn't really matter,
07:01 it can read from the back or it can read from the front, whatever,
07:04 you saw the forward direction on our collection scan for example.
07:06 So over here we could say one, this creates an index, there's no count;
07:09 the other thing we can do is we can give it a name
07:13 so we can come over here and say name is search by service history price,
07:24 so if we go look in this little indexes, we'll see the name here,
07:27 we can also say run in the background,
07:30 if I don't say that it's going to block the database until the index is generated,
07:33 if you're doing this in production, and you have tons and tons of data
07:36 maybe background is the way to go.
07:38 Okay, anyway let's go ahead and run this and see what happens.
07:41 Notice the pause, this is it actually computing the index
07:44 right now the database is effectively down, now it's back,
07:47 what do we get, ok, created collection automatically, no, it already existed
07:51 a number of indexes before was one, now we have two
07:54 and everything was a ok so if I refresh,
07:57
07:59 here's that index and I can actually edit this over here in Robomongo,
08:05 go for the advanced properties, here is the create index and background
08:09 whether it's sparse, how long it lives,
08:11 whether it's based on text search or whatever, but here's just the basic thing.
08:15
08:18 We've added this index, remember this took 800 milliseconds
08:21 ask the same question now, boom, 8 milliseconds.
08:24 Ask it one more time, 2, here we go, 2, 2, 2, 3, 2, 2,
08:28 right, the screen sharing is probably putting a pretty heavy load on the server
08:32 that's also the database server, right but still,
08:35 we're getting it down 350, 400 times faster by adding that.
08:39 Now if I go back and I ask that question explain
08:42 now we get something way better, winning plan is index scan
08:50 index name search by service history price, that is really awesome;
08:57 that means we're using our index which is so much faster.
09:02 There were no rejected plans, so it only found one index
09:06 it tried to use it, it found that it was awesome, it's very happy.
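
To get a feel for why the index scan beats the collection scan, here is a toy model in plain Python. It is only an analogy under stated assumptions: the prices are made up, and a sorted list with binary search stands in for the index (MongoDB's real index structure is not a Python list). `count_by_scan` and `count_by_index` are hypothetical names.

```python
import bisect
import random

random.seed(8)  # fixed seed so repeated runs see the same made-up data
prices = [random.randint(0, 17000) for _ in range(100_000)]


def count_by_scan(values, threshold):
    """Collection-scan style: compare every entry against the threshold."""
    return sum(1 for v in values if v > threshold)


# "Creating the index": pay once, up front, to keep a sorted copy of the field.
price_index = sorted(prices)


def count_by_index(sorted_values, threshold):
    """Index style: binary-search to the first value above the threshold."""
    first_above = bisect.bisect_right(sorted_values, threshold)
    return len(sorted_values) - first_above


# Both approaches give the same answer; only the amount of work differs.
print(count_by_scan(prices, 16800) == count_by_index(price_index, 16800))
```

The scan touches all 100,000 values on every query, while the indexed lookup needs only a handful of comparisons, which mirrors the pause when the index is built followed by fast queries afterwards.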
09:09
09:16 Go back to my count one more time,
09:21 boom 2 milliseconds, and that's a really good answer,
09:24 let's go run our Python code and see what answers we get now,
09:27 that was already faster, let's go over here
09:32 and load car name and ids with expensive prices and spark plugs,
09:38 20 milliseconds this is actually a pretty complicated query
09:43 we'll get into cars with expensive service, 1.9 milliseconds.
09:47 This is exactly what we saw in Robomongo,
09:51 so over here in MongoEngine, we're getting essentially the same results— how cool is that?
09:56 Very nice, we're going to go through and in Python from now on
10:02 we're going to add the necessary index to start making these
10:05 almost all of these run super fast, all of them run fast
10:09 some of them we can get incredibly fast, like one millisecond,
10:11 others not quite that fast, but we'll still do good on all of them.

@ -0,0 +1,298 @@
00:01 Now that you've seen how to create indexes in the shell in Javascript effectively,
00:04 let's go and see how to do this in MongoEngine.
00:07 I think it's preferable to do this in MongoEngine because that means
00:11 simply pushing your code into production will ensure
00:14 that the database has all the right indexes set up to operate correctly.
00:19 You theoretically could end up with too many,
00:21 if you have one in code and then you take it out
00:23 but you can always manage that from the shell,
00:26 this way at least the indexes that are required will be there.
00:29 I dropped all the indexes again, let's go back through our questions here
00:33 and see how we're doing.
00:36 It says how many owners, how many cars,
00:38 this is just based on the natural sort however it's in the database
00:41 there's really nothing to do here,
00:44 but this one, find the 10 thousandth owner, let's look at that;
00:48 that is going to basically be this name, we'll use test,
00:55 it doesn't really matter what we put here
00:57 if we put explain, this should come back as COLLSCAN or something like that,
01:01 yeah, no indexes, okay, so how long did it take to answer that question?
01:06 Find the 10 thousandth owner by name,
01:12 it didn't say by name, I'll go and add by name,
01:16 well that took 300 milliseconds, well that seems bad
01:21 and look we're actually using sorting,
01:24 we're actually using paging skip and limit those types of things here,
01:27 but in order for that to mean anything, we have to sort it,
01:31 it's really the sort that we're running into.
01:34 Maybe I should change this, like so,
01:38 sort like so, we could just put one, I guess it's the way we're sorting it,
01:47 so here you can see down there the sort pattern name is one
01:49 and guess what, we're still doing a collection scan.
01:53 Any time you want to do a filter by, a greater than, an equality,
01:56 or you want to do a sort, you need an index.
01:59 Let's go over to the owner here, this is the owner class
02:04 and let's add the ability to sort it by name or
02:08 equivalently also do a filter like find exactly by name,
02:12 so we're going to come down here
02:14 we're going to add another thing to this meta section,
02:16 and we're going to add indexes,
02:20 and indexes are a list of indexes,
02:25 now this is going to be simple strings
02:28 or they can be complex subdictionaries,
02:31 for composite indexes or uniqueness constraints, things like that,
02:34 but for name all we need is name.
02:38 Let's run this, first of all, let's go over here
02:41 and notice, if I go to owners and refresh, no name,
02:46 let's run this code, find the 10 thousandth owner by name,
02:52 19 milliseconds, that's pretty good,
02:55 let me run it one more time,
02:57 15 yeah okay, so that seems pretty stable,
03:00 and let's go over here and do a refresh, hey look there's one by name;
03:03 we can see it went from what was that,
03:08 something like 300 milliseconds to 15 milliseconds, so that's good.
03:11 How many cars are owned by the 10 thousandth owner,
03:15 so that's 3 milliseconds, but let's go ahead and have a look at this question anyway.
03:19 How many cars are owned by the 10 thousandth owner,
03:22 so here's this function right here that we're calling
03:25 it doesn't quite fit into a lambda expression, so we put it up here
03:28 so we want to go and find the owner by id,
03:30 that should be indexed right, that should be indexed right there
03:34 because it's the id, the id always has an index,
03:36 and now we are saying the id is in this set,
03:40 so we're doing two queries, but both of them are hitting the id thing,
03:44 so those should both be indexed and 3 milliseconds,
03:47 well that really seems to indicate that that's the case.
03:50 How many owners own the 10 thousandth car, that is right here.
03:54 So we'll go find the car, ask how many owners own it.
03:59 Now this one is interesting, so remember when we're doing this
04:02 basically this in query, let's do a quick print of car id here,
04:11 so if we go back over to this, we say let's go over to the owners
04:17 save your documents, so this is going to be car ids,
04:21 it's going to have an object id of that,
04:26 all right, so run this, zero records, apparently this person owns nothing,
04:33 but notice it's taking 77 milliseconds, we could do our explain again here
04:37 and collection scan, yet again, not the most amazing.
04:43 So what we want is we want to have an index on car ids, right
04:48 because collection scan, not good,
04:50 I think it's not really telling us in our sort example
04:53 but for the find it definitely should be.
04:55 So we can come back to our owner over here,
04:58 let's add also like an index on car_ids,
05:02 If we'd run this once again, just the act of restarting it
05:05 should regenerate the database, how long did it take over here—
05:09 a little late now isn't it, because I did the explain,
05:13 I can look at this one, how many cars,
05:16 how many owners does the 10 thousandth car have,
05:19 66 milliseconds, if we look at it now—
05:22 how many owners own the 10 thousandth car, 1.9 milliseconds,
05:29 so 33 times faster by adding that index, excellent,
05:34 find the 50 thousandth owner by name, that's already done.
05:38
05:40 Alright we already have an index on owners name so that goes nice and quick,
05:45 and how is this doing, one millisecond perfect,
05:48 this one is super bad, the cars with expensive service 712 milliseconds,
05:52 alright so here, we're looking at service history
05:56 and then we're navigating that .relationship, that hierarchy,
06:00 with the double underscore, going to the price,
06:02 greater than, less than, equal it doesn't matter,
06:05 we're basically working with this value here, this subdocument.
06:08 Let's go over to the car and make that work,
06:11 now the car doesn't yet have any indexes but it will in a second,
06:14 so what we want to do is represent that here
06:17 and in the raw way of discussing this with MongoDB
06:21 we use . (dot) not double underscore, so . represents the hierarchy here.
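
The mapping between the two spellings can be sketched in a few lines of Python. This is a hypothetical helper written for illustration, not MongoEngine's own translation code, and it only understands a handful of comparison operators.

```python
def to_raw_filter(**mongoengine_style):
    """Translate keyword filters such as service_history__price__gt=16800
    into a raw MongoDB-style filter document (illustrative helper only)."""
    operators = {"gt", "gte", "lt", "lte", "ne"}
    raw = {}
    for key, value in mongoengine_style.items():
        parts = key.split("__")
        if parts[-1] in operators:
            # Trailing operator: the rest of the key is the dotted field path.
            field = ".".join(parts[:-1])
            raw.setdefault(field, {})["$" + parts[-1]] = value
        else:
            # No operator: plain equality on the dotted field path.
            raw[".".join(parts)] = value
    return raw


print(to_raw_filter(service_history__price__gt=16800))
# {'service_history.price': {'$gt': 16800}}
```

The dotted path on the left is the same string that goes into the index definition, which is why the index is written with dots even though the query in Python uses double underscores.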
06:25 Let's run that again, notice expensive service, 712,
06:30 cars with expensive service, instead of 712 we have 2.4 milliseconds,
06:39 now notice that first time I ran it there was a pause,
06:42 the second time it was like immediate,
06:45 and that's because it basically was recreating that index
06:47 and that pause time was how long that index took to create.
06:51 So here we have cars with expensive service,
06:53 now we're getting into something more interesting, look at this one with spark plugs,
06:58 we're querying on two things, we're querying on the history and the service,
07:04 let's actually put this over in the shell so we can look at it.
07:07
07:19 I've got to convert this over, do the dots there,
07:23 this is going to be the dollar greater operator, colon, like so,
07:30 all right, so we're comparing that service history.price
07:35 and this one, again because you can't put dots in normal json,
07:39 do the dot here and quotes, and this one is just spark plugs,
07:46 alright, let's run this, okay 22 milliseconds,
07:52 how long is it taking over here— 20 milliseconds,
07:56 so that's actually pretty good and the reason I think it's pretty good is
07:59 we already have an index on this half
08:02 and so it has to just basically sort the result, let's find out.
08:05
08:11 Winning plan, index on this one, yes, exactly
08:14 so this one is just going to be crank across there
08:18 but we're going to use at least this index here, this by price
08:22 so that gets part of the query there.
08:25 Now maybe we want to be able to do a query just based on the description
08:30 show me all the spark plugs, well that's a collection scan,
08:33 so let's go back and add over here one for the description.
08:40 Now how do I know what goes in this part,
08:44 see I have a service history here, if we actually look at the service record object
08:49 it has a price and description, right
08:52 so we know that that results in this hierarchy of
08:54 service history.price, service history.description.
08:57 If we'd run this again, it will regenerate those and let's go over here
09:01 and run this, and let's see, now we're doing index scan on price,
09:09 what else do we got, rejected plans, okay so we got this and query
09:18 and it looks like we're still using the— yes, oh my goodness,
09:24 how about that for a mistake, comma, so what did that do
09:28 that created, in Python you can wrap these lines and that just created this,
09:33 and obviously, that's not what we want, that comma is super important there.
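
The missing-comma mistake is easy to reproduce in plain Python, since adjacent string literals are joined into one string. The two field names below are the ones from this demo; nothing else is assumed.

```python
# Missing comma: Python joins adjacent string literals into ONE string.
broken = [
    'service_history.price'
    'service_history.description'
]

# With the comma there are two separate entries, one per index.
fixed = [
    'service_history.price',
    'service_history.description',
]

print(len(broken), broken[0])
# 1 service_history.priceservice_history.description
print(len(fixed))
# 2
```

So the broken list asks for a single index on a field that does not exist, which is the nonsense index that had to be dropped.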
09:38 So let me go over here and drop this nonsense thing,
09:41 try this again, I can see it's building index right now,
09:47 okay, once again we can explain this, okay great,
09:51 so now we're using price and actually we use the description this time
09:58 and you can see the rejected plan is the one that would have used the price,
10:04 so we're using description, not price,
10:06 and how long does it take to run that query— 7.9 milliseconds, that's better
10:13 but what would be even better still is if we could do
10:16 the description and price as a single thing. How do we do that?
10:22 This gets to be a little trickier, if we look at the query we're running,
10:25 we're first asking for the price and then the description,
10:30 so we can actually create a composite index here as well,
10:35 and we do that by putting a little dictionary, saying fields
10:39 and putting a list of the names of the fields
10:44 and you can bet those go like this,
10:48 now this turns out to be really important, the order that you put them here
10:52 price and the description versus description price, for sorting,
10:56 not so much for matching, run it one more time,
11:00 alright, expensive cars with spark plugs,
11:04
11:07 here we go, look at that, less than one millisecond,
11:10 so we added one index, it took it from like 66 milliseconds down to 15,
11:16 and then, we added the description one, it turns out that was a better index
11:21 and it took it from 15 to 9, we added the composite index,
11:24 and we took it from 9 to half a millisecond, a 0.6 milliseconds, that is really cool.
11:31 Notice over here, this got faster, let's go back and look at what that is.
11:36 Load cars, so this is the one we are optimizing
11:40 and what are we doing here— let me wrap this so you can see,
11:43 we're doing a count, okay, we're doing a count
11:46 and so it's basically having the database do all the work
11:48 but there's zero serialization.
11:52 Now in this one, we're actually calling list
11:55 so we're deserializing, we're actually pulling all of those records back
11:59 and let's just go over here and see how many there are,
12:03
12:08 well that's not super interesting, to have just one, is it,
12:12 alright, that's good, but let's actually make this just this,
12:17
12:23 let's drop this spark plug thing and just see
12:26 how many cars there are with this,
12:30 okay there we go, now we have some data to work with,
12:33 65 thousand cars had 15 thousand dollar service or higher,
12:36 after all, this is a Ferrari dealership, right.
12:39 Now, it turns out it's a really bad idea to pull back that many cars,
12:43 let me stop this, let's limit that to just a thousand here as well.
12:52
12:54 Okay, so we're pulling back thousand cars because we're limited to this
13:00 and we're pulling back a thousand cars here.
13:03 But notice, this car name and id versus the entire car
13:08 so let's go over here cars with expensive service, car name and id,
13:13 so notice the time, so to pull back and serialize those thousand records
13:17 took actually a while, so it took basically one second,
13:21 and if we don't ask for all the other pieces,
13:25 if we just say give me just the make, the model and the id,
13:29 here we're using the only keywords, it says don't pull back the other things
13:34 just give me these three fields when you create them,
13:37 it makes it basically ten times faster,
13:40 let's turn this down to a 100 and see, maybe get a little more realistic set of data.
13:44 Okay, there we go, a 100 milliseconds down to 14 milliseconds,
13:49 so it turns out that the deserialization step in MongoEngine is a little bit expensive
13:55 so if you like blast a million cars into that list, it's going to take a little bit.
14:01 If we can express like I only want to pull back these items,
14:05 then it turns out to be quite a bit faster,
14:10 in this case not quite ten times faster, but definitely faster.
14:15 Let's round this out here and finish this up.
14:17 Here we're asking for the highly rated, highly priced cars,
14:20 we're asking like hey for all the people that come and spend a lot of money
14:26 how did they feel about it?
14:29 And then also what cars had a low price and also a low rating,
14:33 so maybe we could have just somehow changed our service
14:37 for these sort of cheaper like oil change type people.
14:39 It turns out that that one is quite fast,
14:42 this one we could do some work and fixing one will really fix the other
14:46 so we have this customer rating thing, we probably want to have an index on,
14:52 and we already have one on the price,
14:54 so I think that that's why it's pretty quick actually.
14:57 Go over here, and we don't yet have one on the price, on the rating rather,
15:03 so we can do that and see if things get better,
15:07 not too much, it didn't really make too much of a difference,
15:12 it's probably better to use the price than it is the rating,
15:16 because we're kind of doing that together, so we're also going to go down here
15:19 and have the price and customer rating,
15:21 one of these composite indexes, once again,
15:24 and maybe if we change price one more time,
15:29 rating and price— it doesn't seem like we're getting much better,
15:36 so down here this is about as fast as we can get, 16 milliseconds
15:40 and this is less than one millisecond, so that's really good.
15:44 The final thing is, we are looking for high mileage cars,
15:47 so let's go down here and say find where the mileage of the car
15:51 is greater than 140 thousand miles, do we have an index on that,
15:55 you can bet the answer is no.
15:58 Now we could go to the shell and see that, but no we don't have one,
16:01 so let's go up here and add one more,
16:04 and this is in fact the only index we have here in this thing
16:07 that is on just a plain field, not one of these nested ones like this;
16:13 so maybe we also want to be able to select by year,
16:16 so we could have one for year as well. I'm going to add those in.
16:21 Now this high mileage car goes from a hundred and something milliseconds
16:26 down to six, maybe one more time just to make sure,
16:28 yep, 5, 6, seems pretty stable around there.
16:32 So we've gone and we've added these indexes
16:34 to our models, our MongoEngine documents by adding indexes
16:40 and we can have flat ones like this, or we have these here,
16:48 and we also can have composite ones or richer things,
16:52 if we create a little dictionary and we have fields and things like that.
16:57 Similarly an owner, we didn't have as many things we were after
17:00 but we did want to find them by their name and by car id,
17:03 so we had those two indexes,
17:05 honestly this is just a simpler document than the cars.
17:08 So with these things added here, we can run this one more time
17:11 and see how we're doing, that code all runs really quick,
17:14 if we kind of scan through here, there's nothing that stands out like super bad,
17:18 5 milliseconds, half, 18, 6, half, 1, 3, 1, let's say,
17:26 this one, I really wish we could do better,
17:29 it just turns out there is like so many records there
17:32 that if we run that here you can see that the whole thing runs in one millisecond,
17:38 super, super fast, we can't make it any faster than that.
17:41 The slowness is basically the allocation,
17:45 assignment, verification of 100 car objects.
17:48 I'd like to see a little better serialization time out of MongoEngine,
17:53 if you have some part of your code that has to load tons of these things
17:56 and it's super performance critical, you could drop down to PyMongo,
18:00 talk to it directly and probably in the case where you're doing that
18:05 you don't need to pull back many, many objects,
18:07 but also you can see that if we limit what we ask for down here,
18:12 that goes back to 14 milliseconds which is really great,
18:15 here we're looking at a lot of events, this is like 16 thousand
18:21 or no, 65 thousand, that's quite a bit, this one is really fast,
18:25 this one is really fast, so I feel like from an index perspective
18:28 we've done quite a good job, how do we know we're done?
18:32 I guess this is the final question, this has been a bit of a long—
18:35 how do we know we're done with this performance bit?
18:39 We know we're done when all of these numbers come by
18:43 and they're all within reason of what we're willing to take.
18:47 Here I have set this up as these are the explicit queries
18:51 we're going to ask and then we'll just time them,
18:54 like your real application does not work that way.
18:56 How do you know what questions your application is asking and how long they're taking?
19:01 So you want to set up profiling, so you can come over here
19:05 and definitely google how to do profiling in MongoDB,
19:08 so we can come over here and let's just say, db.setProfilingLevel
19:13 and you can use this function to say I'm looking for slow queries
19:18 and to me slow means 10 milliseconds, 20 milliseconds something like that,
19:23 it will generate a table called system.profile and you can just go look in there
19:29 and see what queries are slow, clear it out,
19:33 run your app, see what shows up in there
19:35 add a bunch of indexes, make them fast, clear that table,
19:38 then turn around and run your app again,
19:43 and just until stuff stops showing up in there,
19:46 you can basically find the slowest one, make it faster, clear out the profile
19:51 and just iterate on that process, and that will effectively like gather up
19:55 all of the meaningful queries that your app is going to do,
19:59 and then you can go through the same process here
20:01 to figure out what indexes you need to create.
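
For anyone driving this from Python rather than the shell, the same loop might look roughly like the sketch below. It assumes `db` is a PyMongo `Database` object (no connection is made here); `enable_slow_query_profiling` and `slowest_queries` are hypothetical helper names, and the 10 millisecond default is just the threshold mentioned above.

```python
def enable_slow_query_profiling(db, slow_ms=10):
    """Ask the server to record operations slower than slow_ms milliseconds.

    db is assumed to be a pymongo Database; profiling level 1 means
    "record slow operations only".
    """
    return db.command("profile", 1, slowms=slow_ms)


def slowest_queries(db, limit=5):
    """Return the slowest recorded operations, slowest first.

    Profiled operations land in the system.profile collection, and each
    entry carries its duration in the "millis" field.
    """
    cursor = db["system.profile"].find().sort("millis", -1).limit(limit)
    return list(cursor)
```

A typical round would be: enable profiling, exercise the app, call `slowest_queries`, run explain on the worst entry, add an index, and repeat until nothing slow shows up.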

@ -0,0 +1,33 @@
00:01 We've seen how powerful adding indexes to MongoDB is
00:04 and I talked a little bit about how the nested nature of these documents means
00:09 there's naturally fewer primary keys,
00:11 so there's fewer on average actual indexes
00:15 that get created just as part of working with the database;
00:18 so creating these indexes is even more important in document databases
00:22 than it is in relational databases.
00:24 So here we are in the shell, this would be Robomongo
00:27 or just the Mongo command line interface
00:30 and we can create an index on a collection by saying db.collection name
00:33 so here we have cars.createIndex
00:35 and then we pass it two things, first one required, second one optional
00:39 we pass it the actual fields we want to create the index on;
00:44 so here we have service_history.customer_rating
00:48 so we could traverse this hierarchy if necessary
00:51 we just use that dot like we have been in the shell the whole time
00:55 and then we say one or minus one,
00:57 so do you want to sort ascending or descending.
00:59 And this mostly matters for either what you might consider the natural sort
01:03 or if you're doing a composite key or a composite index
01:08 and that composite index is being used for sorting on both fields
01:12 and all the orders have to line up exactly for the sort to use that index.
01:17 Then we can pass additional information,
01:19 here we have background as true and the name,
01:21 I like to name my indexes if I'm doing this in the shell
01:24 because then it's easier to see like okay why did I create this index
01:28 here we want the customer ratings of service,
01:31 so that's pretty nice, background true, that's not the default
01:35 but that means it will run basically in the background
01:38 without blocking the database operations,
01:41 if you don't put that, when you hit go
01:43 the database will stop doing any sort of database stuff
01:46 until this index is generated so be aware.
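
The same index can also be requested from Python. The sketch below assumes `cars` is a PyMongo collection object; the index name is an assumed spelling, and `add_customer_rating_index` is a hypothetical helper. Note that, as far as I know, newer MongoDB servers (4.2 and later) ignore the `background` flag, so treat it as relevant mainly for older deployments.

```python
def add_customer_rating_index(cars):
    """Create the customer rating index on a PyMongo-style collection.

    Returns whatever create_index returns (for PyMongo, the index name).
    """
    return cars.create_index(
        [("service_history.customer_rating", 1)],  # 1 = ascending, -1 = descending
        name="customer_ratings_of_service",         # assumed name, for readability
        background=True,                            # avoid blocking on older servers
    )
```

Passing the keys as a list of `(field, direction)` pairs keeps the field order explicit, which matters once the index covers more than one field.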

@ -0,0 +1,39 @@
00:01 Now if we're using MongoEngine,
00:03 we don't have to go to the shell and manually type all the indexes
00:05 we basically go to each individual top level document
00:08 so all the things that derive from mongoengine.Document
00:11 not the embedded documents, and we go to the meta section
00:14 and we add an indexes entry, basically an array
00:17 so here we want to have, you can see the blue stuff that's highlighted
00:20 we want an index on make, we want an index on service history
00:23 and within service history, remember these are service records showing on the bottom
00:27 we want an index on the description and price.
00:30 So for index that we put 'make', that's straightforward
00:34 and then we have service_history.customer_rating
00:37 so service history is the field name
00:39 and then customer rating is the field name of service record
00:42 and for some reason I don't have it blue, it's that last one down there
00:45 but we also want this composite key
00:47 so service_history.price and service_history.description
00:50 we want to be able to find where both of those match
00:53 and we're going to do that by having
00:56 a more complicated entry in the indexes bit here
00:58 this is going to be a dictionary where the fields are set
01:00 to be this array of strings and not just the flat string itself.
01:04 So once we add this, when we run our code,
01:07 it's actually going to first time we work with that document
01:10 ensure that all the indexes are there,
01:12 and remember that like hung up our application for just a little bit,
01:16 but the real benefit here is our app is always going to be in sync,
01:21 we don't have to go oh oops, I forgot to add the index,
01:24 that one particular index to say the staging server,
01:27 or when I push to production are there new indexes,
01:30 I've got to go add them on the database,
01:32 now you don't worry about that, you just push your code,
01:34 restart your web app or whatever kind of app it is,
01:36 and then as part of interacting with it,
01:38 it will make sure that those indexes are there.
01:41 If you don't want that pause to be there,
01:43 just go and create the indexes you know the thing is going to create
01:48 put them on the production server and then push the new version of code
01:50 and it will just go great, these indexes exist.
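
Written out as data, the indexes list described here might look like the sketch below. It is kept as a plain dictionary so it runs without MongoEngine installed; in the real model it would be the `meta` attribute of the car document class. `CAR_META` and `index_fields` are hypothetical names used only to show the two shapes an entry can take.

```python
# Value intended for the `meta` attribute of the car document class.
CAR_META = {
    'indexes': [
        # Flat string: a single-field index.
        'make',
        # Dotted path: reaches into the embedded service records.
        'service_history.customer_rating',
        # Dictionary with 'fields': a composite index over both fields, in order.
        {'fields': ['service_history.price', 'service_history.description']},
    ],
}


def index_fields(spec):
    """Return the list of fields one entry of `indexes` covers."""
    if isinstance(spec, str):
        return [spec]
    return list(spec['fields'])


for spec in CAR_META['indexes']:
    print(index_fields(spec))
```

Keeping the list next to the model is what lets a plain code deploy bring the staging and production databases in line.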