00:01 Now that you've seen how to create indexes in the shell, effectively in JavaScript, 00:04 let's go and see how to do this in MongoEngine. 00:07 I think it's preferable to do this in MongoEngine because that means 00:11 simply pushing your code into production will ensure 00:14 that the database has all the right indexes set up to operate correctly. 00:19 You theoretically could end up with too many, 00:21 if you have one in code and then you take it out, 00:23 but you can always manage that from the shell; 00:26 this way at least the indexes that are required will be there. 00:29 I dropped all the indexes again, let's go back through our questions here 00:33 and see how we're doing. 00:36 It says how many owners, how many cars, 00:38 this is just based on the natural sort, however it's in the database, 00:41 there's really nothing to do here, 00:44 but this one, find the 10 thousandth car by owner, let's look at that; 00:48 that is going to basically be this name, we'll use test, 00:55 it doesn't really matter what we put here, 00:57 if we put explain, this should come back as a collection scan (COLLSCAN) or something like that, 01:01 yeah, no indexes, okay, so how long did it take to answer that question? 01:06 Find the 10 thousandth owner by name, 01:12 it didn't say by name, I'll go and add by name, 01:16 well that took 300 milliseconds, well that seems bad, 01:21 and look, we're actually using sorting, 01:24 we're actually using paging, skip and limit, those types of things here, 01:27 but in order for that to mean anything, we have to sort it, 01:31 it's really the sort that we're running into. 01:34 Maybe I should change this, like so, 01:38 sort like so, we could just put one, I guess it's the way we're sorting it, 01:47 so here you can see down there the sort pattern, name is one, 01:49 and guess what, we're still doing a collection scan. 01:53 Any time you want to do a filter, a greater than, an equality, 01:56 or you want to do a sort, you need an index for it to be fast.
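To make the "is it still scanning the whole collection?" check concrete, here is a small sketch that walks the winning plan of an explain result and reports whether a COLLSCAN stage appears in it. The two explain dictionaries are hand-written illustrations, not real server output, and the exact shape of explain output varies between MongoDB versions, so treat the key names as an assumption about the classic `queryPlanner.winningPlan` layout. The function names are made up for this example.

```python
def plan_stages(plan):
    """Return the stage names in a query plan, outermost stage first."""
    stages = [plan["stage"]]
    # Single-child stages nest under "inputStage"; multi-child ones use "inputStages".
    children = []
    if "inputStage" in plan:
        children.append(plan["inputStage"])
    children.extend(plan.get("inputStages", []))
    for child in children:
        stages.extend(plan_stages(child))
    return stages


def uses_collection_scan(explain_output):
    """True if the winning plan reads every document (COLLSCAN)."""
    winning = explain_output["queryPlanner"]["winningPlan"]
    return "COLLSCAN" in plan_stages(winning)


# Illustrative fragments only: a sort by name without and with an index.
sorted_without_index = {
    "queryPlanner": {
        "winningPlan": {
            "stage": "SORT",
            "sortPattern": {"name": 1},
            "inputStage": {"stage": "COLLSCAN"},
        }
    }
}
sorted_with_index = {
    "queryPlanner": {
        "winningPlan": {
            "stage": "FETCH",
            "inputStage": {"stage": "IXSCAN", "indexName": "name_1"},
        }
    }
}

print(uses_collection_scan(sorted_without_index))  # True
print(uses_collection_scan(sorted_with_index))     # False
```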
01:59 Let's go over to the owner here, this is the owner class, 02:04 and let's add the ability to sort it by name or, 02:08 equivalently, also do a filter like find exactly by name, 02:12 so we're going to come down here, 02:14 we're going to add another thing to this meta section, 02:16 and we're going to add indexes, 02:20 and indexes are a list of indexes, 02:25 now these can be simple strings 02:28 or they can be complex subdictionaries, 02:31 for composite indexes or uniqueness constraints, things like that, 02:34 but for name all we need is name. 02:38 Let's run this, first of all, let's go over here 02:41 and notice, if I go to owners and refresh, no name, 02:46 let's run this code, find the 10 thousandth owner by name, 02:52 19 milliseconds, that's pretty good, 02:55 let me run it one more time, 02:57 15, yeah okay, so that seems pretty stable, 03:00 and let's go over here and do a refresh, hey look, there's one by name; 03:03 we can see it went from, what was that, 03:08 something like 300 milliseconds to 15 milliseconds, so that's good. 03:11 How many cars are owned by the 10 thousandth owner, 03:15 so that's 3 milliseconds, but let's go ahead and have a look at this question anyway. 03:19 How many cars are owned by the 10 thousandth owner, 03:22 so here's this function right here that we're calling, 03:25 it doesn't quite fit into a lambda expression, so we put it up here, 03:28 so we want to go and find the owner by id, 03:30 that should be indexed, right, that should be indexed right there 03:34 because it's the id, the id always has an index, 03:36 and now we are saying the id is in this set, 03:40 so we're doing two queries, but both of them are hitting the id thing, 03:44 so those should both be indexed, and 3 milliseconds, 03:47 well that really seems to indicate that that's the case. 03:50 How many owners own the 10 thousandth car, that is right here. 03:54 So we'll go find the car, ask how many owners own it.
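As a rough picture of what the meta section looks like after this step, here is the dictionary on its own. In the real model it would be the `meta` attribute inside the owner class (a `mongoengine.Document` subclass); the `db_alias` and `collection` values are placeholders assumed for illustration, since the transcript only says that a meta section already exists.

```python
# Sketch of the owner document's meta section after adding the first index.
# In the model file this dict is assigned to `meta` inside the class body:
#
#     class Owner(mongoengine.Document):
#         name = mongoengine.StringField()
#         ...
#         meta = owner_meta
#
owner_meta = {
    "db_alias": "core",      # hypothetical value, not from the transcript
    "collection": "owners",  # hypothetical value, not from the transcript
    "indexes": [
        "name",  # a plain string means a single-field index on that field
    ],
}

print(owner_meta["indexes"])  # ['name']
```

Because the index lives next to the model, it gets ensured the next time the application touches the collection, which is the "just push your code" benefit described earlier.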
03:59 Now this one is interesting, so remember when we're doing this, 04:02 basically this in query, let's do a quick print of car id here, 04:11 so if we go back over to this, we say let's go over to the owners, 04:17 save your documents, so this is going to be car ids, 04:21 it's going to have an object id of that, 04:26 all right, so run this, zero records, apparently this person owns nothing, 04:33 but notice it's taking 77 milliseconds, we could do our explain again here 04:37 and collection scan, yet again, not the most amazing. 04:43 So what we want is we want to have an index on car ids, right, 04:48 because collection scan, not good, 04:50 I think it's not really telling us in our sort example 04:53 but for the find it definitely should be. 04:55 So we can come back to our owner over here, 04:58 let's also add an index on car_ids. 05:02 If we run this once again, just the act of restarting it 05:05 should regenerate the indexes in the database, how long did it take over here, 05:09 a little late now isn't it, because I did the explain, 05:13 I can look at this one, how many cars, 05:16 how many owners does the 10 thousandth car have, 05:19 66 milliseconds, if we look at it now, 05:22 how many owners own the 10 thousandth car, 1.9 milliseconds, 05:29 so roughly 35 times faster by adding that index, excellent, 05:34 find the 50 thousandth owner by name, that's already done. 05:38 05:40 Alright, we already have an index on the owner's name so that goes nice and quick, 05:45 and how is this doing, one millisecond, perfect, 05:48 this one is super bad, the cars with expensive service, 712 milliseconds, 05:52 alright so here, we're looking at service history 05:56 and then we're navigating that relationship, that hierarchy, 06:00 with the double underscore, going to the price, 06:02 greater than, less than, equal, it doesn't matter, 06:05 we're basically working with this value here, this subdocument.
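The before-and-after numbers above come from timing each question as it runs. The timing harness itself is not shown in the transcript, so the following is only a guess at its shape: a hypothetical `time_query` helper that takes a label and a zero-argument callable (for example a lambda wrapping a MongoEngine query) and reports the elapsed milliseconds. The stand-in workload is plain Python so the sketch runs without a database.

```python
import time


def time_query(label, query):
    """Run a zero-argument callable and report how long it took in milliseconds."""
    start = time.perf_counter()
    result = query()
    elapsed_ms = (time.perf_counter() - start) * 1000
    print(f"{label}: {elapsed_ms:,.2f} ms")
    return result, elapsed_ms


# Stand-in workload; in the app this would be a lambda around a database query.
result, elapsed_ms = time_query(
    "sum of squares",
    lambda: sum(i * i for i in range(10_000)),
)
```

Running the same question a second time, as done above, helps separate a stable timing from a one-off pause such as an index being built.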
06:08 Let's go over to the car and make that work, 06:11 now the car doesn't yet have any indexes but it will in a second, 06:14 so what we want to do is represent that here, 06:17 and in the raw way of discussing this with MongoDB 06:21 we use . (dot), not double underscore, so . represents the hierarchy here. 06:25 Let's run that again, notice expensive service, 712, 06:30 cars with expensive service, instead of 712 we have 2.4 milliseconds, 06:39 now notice that the first time I ran it there was a pause, 06:42 the second time it was like immediate, 06:45 and that's because it basically was recreating that index 06:47 and that pause time was how long that index took to create. 06:51 So here we have cars with expensive service, 06:53 now we're getting into something more interesting, look at this one with spark plugs, 06:58 we're querying on two things, we're querying on the history and the service, 07:04 let's actually put this over in the shell so we can look at it. 07:07 07:19 I've got to convert this over, do the dots there, 07:23 this is going to be the dollar greater-than operator ($gt), colon, like so, 07:30 all right, so we're comparing that service history.price 07:35 and this one, again because you can't put dots in normal JSON keys, 07:39 do the dot here and quotes, and this one is just spark plugs, 07:46 alright, let's run this, okay, 22 milliseconds, 07:52 how long is it taking over here, 20 milliseconds, 07:56 so that's actually pretty good and the reason I think it's pretty good is 07:59 we already have an index on this half 08:02 and so it has to just basically sort the result, let's find out. 08:05 08:11 Winning plan, index on this one, yes, exactly, 08:14 so this one is just going to crank across there 08:18 but we're going to use at least this index here, this by price, 08:22 so that gets part of the query there.
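The conversion done by hand above, double underscores becoming dots and the trailing operator becoming a dollar operator, can be sketched in a few lines. This is a simplified illustration written for this transcript, not MongoEngine's actual translation code; `to_raw_filter`, the operator list, and the `service_history` field name and values are assumptions.

```python
# Comparison suffixes this sketch understands (a small subset).
OPERATORS = {"gt", "gte", "lt", "lte", "ne", "in", "nin"}


def to_raw_filter(**conditions):
    """Turn double-underscore keyword filters into a raw MongoDB filter dict."""
    raw = {}
    for key, value in conditions.items():
        parts = key.split("__")
        if parts[-1] in OPERATORS:
            operator = "$" + parts.pop()
            # The remaining parts form the dotted path into the subdocument.
            raw.setdefault(".".join(parts), {})[operator] = value
        else:
            raw[".".join(parts)] = value
    return raw


print(to_raw_filter(
    service_history__price__gt=15000,
    service_history__description="Spark plugs",
))
# {'service_history.price': {'$gt': 15000}, 'service_history.description': 'Spark plugs'}
```

The same dotted path, for example `service_history.price`, is what goes into the indexes list on the model.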
08:25 Now maybe we want to be able to do a query just based on the description, 08:30 show me all the spark plugs, well that's a collection scan, 08:33 so let's go back and add over here one for the description. 08:40 Now how do I know what goes in this part, 08:44 see I have a service history here, if we actually look at the service record object 08:49 it has a price and description, right, 08:52 so we know that that results in this hierarchy of 08:54 service history.price, service history.description. 08:57 If we run this again, it will regenerate those, and let's go over here 09:01 and run this, and let's see, now we're doing an index scan on price, 09:09 what else do we got, rejected plans, okay so we got this and query 09:18 and it looks like we're still using the... yes, oh my goodness, 09:24 how about that for a mistake, comma, so what did that do, 09:28 in Python, adjacent string literals with no comma between them get joined into one string, so that just created this single combined name, 09:33 and obviously, that's not what we want, that comma is super important there. 09:38 So let me go over here and drop this nonsense thing, 09:41 try this again, I can see it's building the index right now, 09:47 okay, once again we can explain this, okay great, 09:51 so now we're using price and actually we use the description this time, 09:58 and you can see the rejected plan is the one that would have used the price, 10:04 so we're using description, not price, 10:06 and how long does it take to run that query, 7.9 milliseconds, that's better, 10:13 but what would be even better still is if we could do 10:16 the description and price as a single thing. How do we do that?
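The missing-comma mistake is easy to reproduce in plain Python, because two string literals that sit next to each other are concatenated into one. The sketch below uses `service_history.price` and `service_history.description` as assumed field names.

```python
# Missing comma: the two literals are joined into a single string.
broken = [
    "service_history.price"        # <- no comma here
    "service_history.description",
]

# With the comma, the list holds two separate index names.
fixed = [
    "service_history.price",
    "service_history.description",
]

print(len(broken), broken)
# 1 ['service_history.priceservice_history.description']
print(len(fixed))
# 2
```

Python raises no error here, so the only symptom is an index on a field that does not exist, which is why checking the explain output or the index list in the database is worth doing.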
10:22 This gets to be a little trickier, if we look at the query we're running, 10:25 we're first asking for the price and then the description, 10:30 so we can actually create a composite index here as well, 10:35 and we do that by putting a little dictionary, saying fields, 10:39 and putting a list of the names of the fields, 10:44 and you can bet those go like this, 10:48 now this turns out to be really important, the order that you put them here, 10:52 price and then description versus description and then price, for sorting, 10:56 not so much for matching, run it one more time, 11:00 alright, expensive cars with spark plugs, 11:04 11:07 here we go, look at that, less than one millisecond, 11:10 so we added one index, it took it from like 66 milliseconds down to 15, 11:16 and then we added the description one, it turns out that was a better index 11:21 and it took it from 15 to 9, we added the composite index, 11:24 and we took it from 9 to half a millisecond, 0.6 milliseconds, that is really cool. 11:31 Notice over here, this got faster, let's go back and look at what that is. 11:36 Load cars, so this is the one we are optimizing, 11:40 and what are we doing here, let me wrap this so you can see, 11:43 we're doing a count, okay, we're doing a count, 11:46 and so it's basically having the database do all the work 11:48 but there's zero serialization. 11:52 Now in this one, we're actually calling list, 11:55 so we're deserializing, we're actually pulling all of those records back, 11:59 and let's just go over here and see how many there are, 12:03 12:08 well that's not super interesting, to have just one, is it, 12:12 alright, that's good, but let's actually make this just this, 12:17 12:23 let's drop this spark plug thing and just see 12:26 how many cars there are with this, 12:30 okay there we go, now we have some data to work with, 12:33 65 thousand cars had a 15 thousand dollar service or higher, 12:36 after all, this is a Ferrari dealership, right.
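To show why the order inside a composite index is part of its definition, here is a sketch that expands a MongoEngine-style index spec into an ordered list of (field, direction) pairs, the form the database ultimately stores. The `key_pattern` helper is made up for this example and only handles plain names plus the `+`/`-` direction prefixes; the field names are assumptions.

```python
def key_pattern(spec):
    """Expand an index spec (string or {'fields': [...]}) into (field, direction) pairs."""
    fields = spec["fields"] if isinstance(spec, dict) else [spec]
    pattern = []
    for field in fields:
        # A leading "-" asks for descending order; "+" or nothing means ascending.
        direction = -1 if field.startswith("-") else 1
        pattern.append((field.lstrip("+-"), direction))
    return pattern


price_then_description = {
    "fields": ["service_history.price", "service_history.description"],
}
description_then_price = {
    "fields": ["service_history.description", "service_history.price"],
}

print(key_pattern(price_then_description))
# [('service_history.price', 1), ('service_history.description', 1)]
print(key_pattern(price_then_description) == key_pattern(description_then_price))
# False
```

The two specs cover the same fields but describe different indexes, which is where the difference for sorting comes from.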
12:39 Now, it turns out it's a really bad idea to pull back that many cars, 12:43 let me stop this, let's limit that to just a thousand here as well. 12:52 12:54 Okay, so we're pulling back a thousand cars because we're limited to this 13:00 and we're pulling back a thousand cars here. 13:03 But notice, this is car name and id versus the entire car, 13:08 so let's go over here, cars with expensive service, car name and id, 13:13 so notice the time, so to pull back and deserialize those thousand records 13:17 took actually a while, so it took basically one second, 13:21 and if we don't ask for all the other pieces, 13:25 if we just say give me just the make, the model and the id, 13:29 here we're using the only keyword, it says don't pull back the other things, 13:34 just give me these three fields when you create them, 13:37 it makes it basically ten times faster, 13:40 let's turn this down to 100 and see, maybe get a little more realistic set of data. 13:44 Okay, there we go, 100 milliseconds down to 14 milliseconds, 13:49 so it turns out that the deserialization step in MongoEngine is a little bit expensive, 13:55 so if you blast a million cars into that list, it's going to take a little bit. 14:01 If we can express that we only want to pull back these items, 14:05 then it turns out to be quite a bit faster, 14:10 in this case not quite ten times faster, but definitely faster. 14:15 Let's round this out here and finish this up. 14:17 Here we're asking for the highly rated, highly priced cars, 14:20 we're asking like, hey, for all the people that come and spend a lot of money, 14:26 how did they feel about it? 14:29 And then also what cars had a low price and also a low rating, 14:33 so maybe we could somehow change our service 14:37 for these sort of cheaper, oil change type people.
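Restricting the fields with only amounts to sending a projection along with the filter, so the server returns smaller documents and fewer values have to be turned into objects. A minimal sketch of that projection document follows; `only_projection` is a hypothetical helper, and the filter reuses the assumed `service_history.price` field and the 15 thousand dollar threshold from earlier.

```python
def only_projection(*fields):
    """Build a MongoDB projection document that returns just the named fields."""
    # _id is returned by default, so it does not need to be listed.
    return {field: 1 for field in fields}


query_filter = {"service_history.price": {"$gt": 15000}}
projection = only_projection("make", "model")

print(projection)
# {'make': 1, 'model': 1}

# With a PyMongo collection these would be used roughly as:
#     collection.find(query_filter, projection).limit(100)
```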
14:39 It turns out that that one is quite fast, 14:42 this one we could do some work on, and fixing one will really fix the other, 14:46 so we have this customer rating thing, we probably want to have an index on that, 14:52 and we already have one on the price, 14:54 so I think that that's why it's pretty quick actually. 14:57 Go over here, and we don't yet have one on the price, on the rating rather, 15:03 so we can do that and see if things get better, 15:07 not too much, it didn't really make too much of a difference, 15:12 it's probably better to use the price than it is the rating, 15:16 because we're kind of doing that together, so we're also going to go down here 15:19 and have the price and customer rating, 15:21 one of these composite indexes, once again, 15:24 and maybe if we change the order one more time, 15:29 rating and price, it doesn't seem like we're getting much better, 15:36 so down here this is about as fast as we can get, 16 milliseconds, 15:40 and this is less than one millisecond, so that's really good. 15:44 The final thing is, we are looking for high mileage cars, 15:47 so let's go down here and say find where the mileage of the car 15:51 is greater than 140 thousand miles, do we have an index on that, 15:55 you can bet the answer is no. 15:58 Now we could go to the shell and see that, but no, we don't have one, 16:01 so let's go up here and add one more, 16:04 and this is in fact the only index we have here in this thing 16:07 that is on just a plain field, not one of these nested ones like this; 16:13 so maybe we also want to be able to select by year, 16:16 so we could have one for year as well. I'm going to add those in. 16:21 Now this high mileage car goes from a hundred and something milliseconds 16:26 down to six, maybe one more time just to make sure, 16:28 yep, 5, 6, seems pretty stable around there.
16:32 So we've gone and we've added these indexes 16:34 to our models, our MongoEngine documents, by adding indexes to the meta section, 16:40 and we can have flat ones like this, or we have these nested ones here, 16:48 and we also can have composite ones or richer things, 16:52 if we create a little dictionary and we have fields and things like that. 16:57 Similarly on owner, we didn't have as many things we were after 17:00 but we did want to find them by their name and by car id, 17:03 so we had those two indexes, 17:05 honestly this is just a simpler document than the cars. 17:08 So with these things added here, we can run this one more time 17:11 and see how we're doing; that code all runs really quick, 17:14 if we kind of scan through here, there's nothing that stands out as super bad, 17:18 5 milliseconds, half, 18, 6, half, 1, 3, 1, let's say, 17:26 this one, I really wish we could do better, 17:29 it just turns out there are so many records there 17:32 that if we run that here you can see that the whole thing runs in one millisecond, 17:38 super, super fast, we can't make it any faster than that. 17:41 The slowness is basically the allocation, 17:45 assignment, verification of 100 car objects. 17:48 I'd like to see a little better serialization time out of MongoEngine, 17:53 if you have some part of your code that has to load tons of these things 17:56 and it's super performance critical, you could drop down to PyMongo, 18:00 talk to it directly, and probably in the case where you're doing that 18:05 you don't need to pull back many, many objects, 18:07 but also you can see that if we limit what we ask for down here, 18:12 that goes back to 14 milliseconds, which is really great, 18:15 here we're looking at a lot of events, this is like 16 thousand, 18:21 or no, 65 thousand, that's quite a bit, this one is really fast, 18:25 this one is really fast, so I feel like from an index perspective 18:28 we've done quite a good job, how do we know we're done?
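Pulling the whole walkthrough together, the index lists on the two documents might end up looking roughly like this. It is a reconstruction from the narration, not the actual source file: the field names (`service_history`, `customer_rating`, `mileage`, `year`, `car_ids`) and the order inside the price and rating composite are assumptions, since both orderings were tried on screen.

```python
# Owner: two flat indexes.
owner_indexes = [
    "name",
    "car_ids",
]

# Car: nested (dotted), composite (dict with "fields"), and flat indexes.
car_indexes = [
    "service_history.price",
    "service_history.description",
    "service_history.customer_rating",
    {"fields": ["service_history.price", "service_history.description"]},
    {"fields": ["service_history.price", "service_history.customer_rating"]},
    "mileage",
    "year",
]

# Quick inventory of what kind of index each entry describes.
for spec in car_indexes:
    if isinstance(spec, dict):
        kind = "composite"
    elif "." in spec:
        kind = "nested"
    else:
        kind = "flat"
    print(f"{kind:9} {spec}")
```

Each list would sit under the `indexes` key of the corresponding document's meta dictionary.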
18:32 I guess this is the final question, this has been a bit of a long... 18:35 how do we know we're done with this performance bit? 18:39 We know we're done when all of these numbers come by 18:43 and they're all within reason of what we're willing to take. 18:47 Here I have set this up as: these are the explicit queries 18:51 we're going to ask and then we'll just time them, 18:54 but your real application does not work that way. 18:56 How do you know what questions your application is asking and how long they're taking? 19:01 So you want to set up profiling, so you can come over here 19:05 and definitely google how to do profiling in MongoDB, 19:08 so we can come over here and let's just say, db.setProfilingLevel, 19:13 and you can use this function to say I'm looking for slow queries, 19:18 and to me slow means 10 milliseconds, 20 milliseconds, something like that, 19:23 it will generate a collection called system.profile and you can just go look in there 19:29 and see what queries are slow, clear it out, 19:33 run your app, see what shows up in there, 19:35 add a bunch of indexes, make them fast, clear that collection, 19:38 then turn around and run your app again, 19:43 and just keep going until stuff stops showing up in there, 19:46 you can basically find the slowest one, make it faster, clear out the profile 19:51 and just iterate on that process, and that will effectively gather up 19:55 all of the meaningful queries that your app is going to do, 19:59 and then you can go through the same process here 20:01 to figure out what indexes you need to create.
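The profile, inspect, clear loop described above can be sketched as three small functions. They expect `db` to be a PyMongo `Database` object (for example `MongoClient()["your_db"]`, where the database name is whatever your app uses); the function names and the 20 millisecond default are choices made for this example. In the shell, the first step corresponds to `db.setProfilingLevel(1, 20)`.

```python
def enable_slow_query_profiling(db, slow_ms=20):
    """Record operations slower than slow_ms into system.profile (level 1)."""
    return db.command("profile", 1, slowms=slow_ms)


def slowest_operations(db, limit=10):
    """Return the slowest recorded operations, slowest first."""
    cursor = db["system.profile"].find().sort("millis", -1).limit(limit)
    return list(cursor)


def clear_profile(db):
    """Turn profiling off and drop the recorded entries so the next run starts clean."""
    # Profiling has to be disabled before system.profile can be dropped.
    db.command("profile", 0)
    db["system.profile"].drop()


# Typical loop, assuming `db` is a connected PyMongo database:
#     enable_slow_query_profiling(db, slow_ms=20)
#     ... exercise the application ...
#     for op in slowest_operations(db):
#         print(op.get("millis"), op.get("ns"), op.get("op"))
#     clear_profile(db)
```

After adding an index for the slowest entry, enabling profiling again and repeating the run shows whether that query has dropped out of the list.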