You cannot select more than 25 topics Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.

165 lines
10 KiB
Plaintext

This file contains invisible Unicode characters!

This file contains invisible Unicode characters that may be processed differently from what appears below. If your use case is intentional and legitimate, you can safely ignore this warning. Use the Escape button to reveal hidden characters.

00:01 Let's go ahead and run this code, you've seen the minor changes
00:04 like the addition of this concept of an owner,
00:06 and how we generated all this data, and how you can restore it.
00:09 Let's go ahead and run it, and see what's happening.
00:13 Let's look at this from two perspectives, let's begin over actually in Robomongo,
00:17 so we're going to ask the question, basically how many owners own a certain car
00:21 the idea is more or less we're going to call this function which goes right here,
00:25 really what we're looking for is this query,
00:28 find me all of the owners where this car id is in their car ids collection,
00:33 just generate and deserialize that.
00:37 The other one that we're going to focus on is
00:39 show me the cars with the expensive service history,
00:42 how many cars or what cars had some kind of service
00:46 that cost over 16800 dollars.
00:49 Let's begin by looking at those in Robomongo.
00:54 Here we have this concept, we could simplify this a little bit, but it doesn't matter,
00:57 cars here's the service history, let's go to the price
01:00 where that's greater than 16800, how many of them are there.
01:05 If I run this, notice, it took a while to come back,
01:08 run it again, here's the speed right there, 0.724 sec, 0.731, 0.733,
01:14 so it's pretty reliably taking around 700 milliseconds to answer that question.
01:19 We're going to come back to this.
01:22 Here's a more interesting example, like go and randomly grab a car
01:25 somewhere deep in the list, in this case I put 61600,
01:30 grab that car and then find me all the owners,
01:33 where that car id appears in their id list, and then we'll just dump that out,
01:38 by saying var it doesn't appear if you just state the name it will show up down here,
01:43 so make sure to deselect it and run this,
01:45 and this is actually surprisingly fast, given all the stuff that's going on here,
01:48 but it's taking still about 75, 80 milliseconds to run here,
01:53 which, I don't know, maybe in your database
01:55 going across a 100 thousand records 80 milliseconds seems decent,
01:59 I can tell you in MongoDB 80 milliseconds is terrible
02:02 you should really think about making something that's 80 milliseconds faster
02:06 it's not always possible you can do it,
02:08 but most of the queries as we'll see are possible.
02:11 Let's take this one and just try to understand what's happening here
02:16 and then we're going to go look at it in Python,
02:19 but let's just explore it here in the shell for just a moment.
02:21 Why is this taking 700 milliseconds?
02:24 MongoDB has this way to basically ask how are you running this query,
02:29 and the way you do that is you say explain, like so,
02:35 so I can say this query instead of giving me a result tell me how you're running it,
02:38 if I unselect it, it just runs the selected stuff if there's something there,
02:42 so we can go and look at it in this mode,
02:44 so it says okay, here's what the query planner found for you,
02:47 we've parsed this query, and this is something
02:50 it's basically what went into the find,
02:52 it also might have something to the effect of like a sword
02:55 and other things that are happening, but this is a simple query.
02:58 Look down here, see this winning plan, stage column scan,
03:02 that is bad, that is really, really bad.
03:05 Also notice the rejected plan, so if there are multiple indexes
03:08 and other things that could have done
03:10 it might have attempted a bunch of them and said no, no, no this is the best,
03:13 let's see it doesn't seem to tell us any more about what it did there,
03:18 like sometimes it'll tell you how many records it scanned and things like this,
03:21 but it's just basically reading entirely in the forward direction
03:25 over this and just doing a comparison.
03:27 So that's why this was taking 700 milliseconds
03:32 as it was literally reading and comparing 100 thousand entries
03:36 or actually more, remember their is 1.2 million search histories
03:40 across those 250 thousand cars, so not 100 thousand,
03:43 1.2 million records it scanned over, that's bad, you don't want that.
03:47 So what we can do is we can actually add an index,
03:51 now there's two ways to add an index,
03:54 but before I add the index, let's go over here
03:58 just explain is super, super valuable,
04:00 any time something is slow we're going to explain
04:03 there's actually way to turn on profiling and say log all of the queries
04:07 that you see MongoDB that are slower than x,
04:11 you providing them like say 10 milliseconds might be great,
04:14 show me all the queries that take more than 10 milliseconds
04:17 and then you can drop them in here, put an explain
04:19 and then start creating indexes to make them faster.
04:22 So just google mongodb profile enable slow queries
04:26 or something like this, it's pretty straightforward.
04:29 Now let's run this code, we're asking a lot of questions
04:31 what we want to run is q and a, so we go over here and just right click and say run,
04:37 notice some of these things are taking time,
04:42 the database might be cold, it might have not loaded that stuff,
04:46 so let me run it one more time just to be fair,
04:49 there's a few things that are already really fast, and that's cool,
04:55 so let's go here and review, how many owners are there—
04:58 well, I can tell you it doesn't show the answer
05:01 it just sort of says this is the question I'm asking here is how long it takes.
05:04 Three milliseconds, that is solid, how many cars— half a millisecond.
05:07 That's pretty solid, I don't think we can improve the count on the entire collection
05:11 but this one, find the 10 thousandth owner— not good,
05:14 so let's see how many cars are owned by that person—
05:19 this is pretty fast actually, this is surprisingly fast,
05:23 how many owners this can have— 66 milliseconds
05:26 that's the one we were looking at in there.
05:29 I'm going to take these numbers and put them over here,
05:32 let's say, this will be Without indexes
05:36 we're going to get this, we don't really care about the exit code, do we?
05:41 With indexes, and we're going to kind of iterate on this a little bit
05:45 so let's begin over here, and we're going to talk about
05:49 how we can add an index in MongoDB and then for the most part
05:55 do this in MongoEngine because it's really part of the way our application works,
06:00 what the indexes are, and it's better to make that part of our document
06:03 then kind of do a separate database setup step;
06:07 we could create a script in Javascript and run it,
06:09 it will do these things and that may be fine, but let's go over here and work on this.
06:14 Again we had the count, here's the almost 800 milliseconds,
06:19 let's go over here and just I'll take this, I'll make a copy,
06:24
06:28 so here is what we can do, instead of doing the find operation
06:31 we can say create index,
06:35 and then we have the thing that we're doing the query on,
06:38 most the time this is one item but you can have composite indexes
06:43 they are a little more nuance so we'll talk about them later,
06:45 but let's just do this one, we want to be able to query by service history's price
06:52 Here we can put one of two things, one or minus one,
06:56 what do you want the default sort, descending or ascending?
06:59 A lot of times it doesn't really matter,
07:01 it can read from the back or it can read from the front, whatever,
07:04 you saw the forward direction on our column scan for example.
07:06 So over here we could say one, this creates an index, there's no count;
07:09 the other thing we can do is we can give it a name
07:13 so we can come over here and say name is search by service history price,
07:24 so if we go look in this little indexes, we'll see the name here,
07:27 we can also say run in the background,
07:30 if I don't say that it's going to block the database until the index is generated,
07:33 if you're doing this in production, and you have tons and tons of data
07:36 maybe background is the way to go.
07:38 Okay, anyway let's go ahead and run this and see what happens.
07:41 Notice the pause, this is it's actually computing the index
07:44 right now the database is effectively down, now it's back,
07:47 what do we get ok, we created collection automatically know it already existed
07:51 a number of indexes before was one, now we have two
07:54 and everything was a ok so if I refresh,
07:57
07:59 here's that index and I can actually edit this over here in Robomongo,
08:05 go for the advanced properties, here is the create index and background
08:09 whether it's sparse, how long it lives,
08:11 whether it's based on text search or whatever, but here's just the basic thing.
08:15
08:18 We've added this index, remember this took 800 milliseconds
08:21 ask the same question now, boom, 8 milliseconds.
08:24 Ask it one more time, 2, here we go, 2, 2, 2, 3, 2, 2,
08:28 right, the screen sharing is probably put in a pretty heavy load on the server
08:32 that's also the database server, right but still,
08:35 we're getting it down 350, 400 times faster by adding that.
08:39 Now if I go back and I ask that question explain
08:42 now we get something way better, winning plan is index scan
08:50 index name search by service history price, that is really awesome;
08:57 that means we're using our index which is so much faster.
09:02 There was no rejected plans, so it only found one index
09:06 it tried to use it if found that it was awesome, it's very happy.
09:09
09:16 Go back to my account more time,
09:21 boom 2 milliseconds, and that's a really good answer,
09:24 let's go run our Python code and see what answers we get now,
09:27 that was already faster, let's go over here
09:32 and load car name and ids with expensive prices and spark plugs,
09:38 20 milliseconds this is actually a pretty complicated query
09:43 we'll get into cars with expensive service, 1.9 milliseconds.
09:47 This is exactly what we saw in Robomongo,
09:51 so over here in MongoEngine, we're getting essentially the same results— how cool is that?
09:56 Very nice, we're going to go through and in Python from now on
10:02 we're going to add the necessary index to start making these
10:05 almost all of these run super fast, all of them run fast
10:09 some of them we can get incredibly fast, like one millisecond,
10:11 others not quite that fast, but we'll still do good on all of them.