diff --git a/transcripts/ch8-performance/1.txt b/transcripts/ch8-performance/1.txt
new file mode 100644
index 0000000..5ceb8d1
--- /dev/null
+++ b/transcripts/ch8-performance/1.txt
@@ -0,0 +1,53 @@
+00:01 Now that you know how to work with MongoDB, you know how to work with its shell,
+00:04 what the query syntax is, you've seen PyMongo as well as MongoEngine,
+00:07 it's time to turn our attention to tuning MongoDB
+00:11 to be the best database it can possibly be.
+00:14 We're going to focus on how to make our regular MongoDB server
+00:18 a high performance MongoDB database
+00:21 and you'll see there's no magic here, a lot of the things that you can do
+00:24 are relatively straightforward, and there's a systematic way to go about it.
+00:29 I want to start this section, this chapter,
+00:34 by putting a little perspective out there.
+00:41 When people come to NoSQL and they start looking for alternative databases,
+00:44 often the allure of these databases is their performance;
+00:49 you hear about things like sharding, horizontally scaling them,
+00:52 some incredible performance numbers, things like that.
+00:55 That may be what you really need, that may be the most important thing
+00:58 and certainly if you don't have performance out of your database it's a big problem.
+01:03 We're certainly going to figure out how to make our databases faster
+01:07 using the variety of techniques that we have available to us in MongoDB.
+01:11 That said, your biggest problem probably isn't performance,
+01:14 you may have a big data problem, you may have terabytes or petabytes of data
+01:19 but most applications don't.
+01:22 You may have a performance problem, it may be that you have so much data
+01:26 or you are asking such complex queries that it really does take
+01:30 very precise tuning and scaling to make it work.
+01:33 So we're going to focus on some of these types of things.
+01:36 That said, we all have a complexity problem with our application,
+01:40 it's always a pain to maintain these databases
+01:43 especially when we're working with relational databases,
+01:46 you hear about things like migrations and updating your schema
+01:49 adding, removing, transforming columns, all of this stuff is really complex
+01:53 and it even makes deployment really, really challenging,
+01:56 you want to release a new version of something based on SQLAlchemy
+01:59 but you need to change the database schema before it will even run.
+02:02 Okay, that sounds like it could be a little bit of a problem.
+02:05 What you'll see with MongoDB and these document databases is
+02:10 one of their biggest benefits is the simplicity that they bring.
+02:14 The document structure means there are fewer tables,
+02:18 and far fewer connections between these tables,
+02:21 so when you think about the trade-offs and performance and things like that
+02:24 keep in mind that probably the biggest benefit
+02:27 that you are going to get from MongoDB is you are going to have
+02:30 a simpler versioning, evolution, maintainability, and development story.
+02:33 I just want to put that out there, because I know sometimes
+02:36 people will say well, I got MongoDB to perform at this speed,
+02:40 and with this other database, if I tweak it like this and adapt it like that,
+02:44 maybe I could get it to go a little faster, so maybe we should use that instead.
+02:47 And maybe, I don't know, it depends on the situation,
+02:50 and this is very abstract, so it's hard to say, but keep in mind
+02:53 that one of the biggest things these document databases
+02:55 bring to the table is this simplicity.
+02:59 It just so happens we can also make them really, really fast.
+03:03 So simple and fast, sounds like a great combination,
+03:05 so let's get into this section where we are going to make MongoDB much faster.
\ No newline at end of file
diff --git a/transcripts/ch8-performance/10.txt b/transcripts/ch8-performance/10.txt
new file mode 100644
index 0000000..854c294
--- /dev/null
+++ b/transcripts/ch8-performance/10.txt
@@ -0,0 +1,92 @@
+00:01 One of the most important things you can do for performance
+00:04 in your database and these document databases
+00:06 is think about your document design,
+00:08 should you embed stuff, should you not, what embeds where,
+00:11 do you embed just ids, do you embed the whole thing;
+00:14 all of these are really important questions
+00:16 and it takes a little bit of experience to know what the right thing to do is.
+00:20 It also really depends on your application's use case,
+00:24 so something we should obviously consider
+00:28 is this service history, which adds the most weight to these car objects,
+00:34 so we've got this embedded document list field
+00:38 so how often do we need these histories?
+00:44 How many histories might a car have?
+00:46 Should those maybe be in a separate collection
+00:49 where each record has everything the service record class has,
+00:52 plus a car id, or something to that effect?
+00:56 So this is a really important question,
+00:59 and it really depends on how we're using this car object, this car document
+01:05 if almost all the time we want to work with the service history,
+01:07 it's probably good to go in and put it here,
+01:10 unless these can be really large or something to that effect,
+01:13 but if you don't need them often, you might consider putting them in their own collection;
+01:16 there's a tension between simplicity and safety on one hand
+01:20 and the speed of keeping them in a separate collection on the other,
+01:24 so you don't pull them back all the time;
+01:26 you can also consider using the only() operator in MongoEngine
+01:30 to say if I don't need it, exclude the service history,
+01:34 it adds a little bit of complexity because you often have to ask,
+01:38 hey is this the car that came with service history
+01:40 or is it a car where that was excluded, things like that,
+01:42 but you could use performance profiling and tuning
+01:45 to figure out where you might use only.
+01:48 Let's look at one more thing around document design.
+01:50 You want to consider the size of the document,
+01:52 remember MongoDB has a limit on how large these documents can be,
+01:56 that's 16 MB per record, that doesn't mean you should think
+02:01 oh it's only 10 MB so everything is fine for my document design,
+02:05 that might be terrible; this is a hard upper bound,
+02:07 MongoDB simply refuses to store documents larger than 16 MB,
+02:11 so you really want to think about what is the right size,
+02:14 so let's look at a couple examples:
+02:16 we can go to any collection and say .stats()
+02:18 and it will talk about the size of the documents and things like that,
+02:21 so here we ran db.cars.stats() in the shell,
+02:25 and we see that the average object size is about 700 bytes,
+02:29 there is information about how many there are, and all that kind of stuff,
+02:33 but really the most interesting thing for this discussion is
+02:35 what is the average object size, 700 bytes
+02:38 that seems like a pretty good size to me, it's not huge by any means,
+02:42 and this is the cars that contain those service histories,
+02:45 so this is probably fine for what we're doing.
+02:48 Let me give you a more realistic example.
+02:50 Let's think about the Talk Python Training website,
+02:52 and the courses and chapters, we talked about them before,
+02:56 so here if we run that same thing, db.courses.stats(),
+03:02 you can see that the average object size is 900 bytes for a course,
+03:07 and remember the course has the description that shows on the page
+03:10 and that's probably most of the size; it has a few other things as well,
+03:13 like student testimonials and whatnot,
+03:16 but basically it's the description and a few hyperlinks.
+03:19 So again, I think this is a totally reasonable average object size.
+03:23 Now one of the considerations was I could have taken the chapters
+03:27 which themselves contain all the lectures,
+03:29 and embedded those within the course,
+03:32 would that have been a good idea?
+03:34 I think I might have even had it created that way
+03:36 in the very beginning, and it was a lot slower than I was hoping for,
+03:38 so I redesigned the documents.
+03:40 If we run this on the chapters collection, you can see
+03:43 that the average object size is 2.3 KB,
+03:46 this is starting to get a little bit big, on its own it's fine,
+03:50 but think about the fact that a course on average has like 10 to 20 chapters,
+03:55 so if I embedded the chapters in the course
+03:58 instead of putting them in separate documents,
+04:02 which is how it actually runs at the time of the recording,
+04:04 then each course would be something like
+04:07 24 up to maybe 50 KB of data per entry;
+04:12 think about when you go to the courses page
+04:15 and it shows you a big list of all the courses
+04:17 and there might be 10 or later 20 courses,
+04:20 we're pulling back and deserializing around a megabyte of data
+04:24 to render a really, really common page, that is probably not ok,
+04:28 so this is why I did not embed the chapters and lectures inside the course,
+04:34 I just said okay, this is the breaking point
+04:37 I looked at the object sizes, I looked at where the performance was
+04:41 and I said you know what, really it's not that common
+04:44 that we actually want more than one chapter at a time,
+04:46 but it is common that we want its lectures, so it's probably the right partitioning;
+04:51 if you build it one way, try it, and it doesn't work,
+04:53 you just redesign your class structure, recreate the database and try it again,
+04:57 but you do want to think about the average object size
+05:00 and you can do it super easily with db.<collection name>.stats().
\ No newline at end of file
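The same average-object-size check can be run from Python. A minimal sketch, assuming `db` is a PyMongo `Database` handle: it issues the `collStats` command, which reports the average document size in bytes under `avgObjSize` (newer server versions steer toward the `$collStats` aggregation stage, but the command still answers). The helper `estimated_embedded_kb` is hypothetical; it only reproduces the back-of-the-envelope arithmetic from the transcript (a ~0.9 KB course plus 10 to 20 chapters of ~2.3 KB each).

```python
def average_object_size(db, collection_name):
    """Return the average document size in bytes for one collection.

    `db` is assumed to be a pymongo Database; the collStats command
    reports the average BSON document size under "avgObjSize".
    """
    stats = db.command("collStats", collection_name)
    return stats.get("avgObjSize", 0)


def estimated_embedded_kb(parent_kb, child_kb, child_count):
    """Hypothetical helper: approximate parent size if its children were embedded."""
    return parent_kb + child_kb * child_count


# Figures quoted in the transcript: ~0.9 KB per course, ~2.3 KB per chapter.
for chapters in (10, 20):
    size_kb = estimated_embedded_kb(0.9, 2.3, chapters)
    print(f"{chapters} chapters -> ~{size_kb:.1f} KB per course")
```

With a live server this would be called as `average_object_size(MongoClient().dealership, "cars")`; the estimate lands in the same 24 to 50 KB range the transcript describes, which is why the chapters ended up in their own collection.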
diff --git a/transcripts/ch8-performance/11.txt b/transcripts/ch8-performance/11.txt
new file mode 100644
index 0000000..39f5620
--- /dev/null
+++ b/transcripts/ch8-performance/11.txt
@@ -0,0 +1,35 @@
+00:01 One of the last simple tools you have in your tool belt
+00:04 when we're working with MongoEngine, or even PyMongo through a different API,
+00:08 is this ability to restrict the data returned from the document.
+00:13 In our car object we've got the make, the model, the id, some other things,
+00:17 we've got the engine which is a subdocument or an embedded document there
+00:22 and then the biggest thing that contributes to the size
+00:25 is actually the service history which might be many service record entries.
+00:30 If really all we care about is the make, the model and the id of a car,
+00:34 and we're going to create like a list or something like that,
+00:36 we can use this .only operator here
+00:39 and dramatically reduce the amount of data returned from MongoDB
+00:43 so this operation, which we saw when we first learned about the API,
+00:46 actually operates at the database level,
+00:48 you're able to restrict the elements returned from the queries
+00:52 so when it gets back to MongoEngine
+00:54 basically it looks at what comes back and it says,
+00:57 alright, I need to create some cars
+00:59 and I need to set their make to this, the model to that
+01:01 and their id to whatever comes back,
+01:03 and then nothing else is transferred, deserialized, anything.
+01:05 So you can, if you don't need them, exclude the heavyweight things
+01:09 like the engine and the service histories for this particular use case.
+01:12 So this is kind of like SELECT make, model, id FROM some_table in SQL,
+01:20 and it really can improve the performance
+01:22 especially when you have either large documents or many documents.
+01:27 So you've seen a lot of different ways to turn the knobs of MongoDB
+01:31 to make it faster and to use MongoEngine to control those knobs.
+01:35 Now this applies to a single individual database server
+01:38 and if you use this to tune your database,
+01:41 you can actually make the need for having a sharded cluster
+01:45 and all these scaling things possibly go away,
+01:48 but even if you do end up with one of these more interesting topologies,
+01:52 all of these techniques still apply and they'll make your cluster go faster,
+01:56 they'll make your replicas go faster, all of those things.
+01:59 What you've learned here are really the foundational items of making MongoDB go fast.
\ No newline at end of file
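The PyMongo counterpart of `.only` is a projection document passed as the second argument to `find`. A minimal sketch, assuming a `cars` collection with `make` and `model` fields as in the course's car class; `projection` and `car_summaries` are hypothetical helper names. In MongoEngine the equivalent call would be along the lines of `Car.objects().only('make', 'model', 'id')`.

```python
def projection(*fields):
    """Build a PyMongo projection that keeps only the named fields.

    MongoDB returns _id unless it is explicitly excluded, so it need not be listed.
    """
    return {name: 1 for name in fields}


def car_summaries(cars, limit=20):
    """Fetch lightweight (id, make, model) tuples from a pymongo Collection.

    Heavy fields such as the engine and the service history are filtered out
    on the server, so they are never transferred or deserialized.
    """
    cursor = cars.find({}, projection("make", "model")).limit(limit)
    return [(doc["_id"], doc.get("make"), doc.get("model")) for doc in cursor]
```

Because the filtering happens in the database, the savings grow with document size: the larger the embedded service histories, the bigger the difference.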
diff --git a/transcripts/ch8-performance/2.txt b/transcripts/ch8-performance/2.txt
new file mode 100644
index 0000000..2fa2e57
--- /dev/null
+++ b/transcripts/ch8-performance/2.txt
@@ -0,0 +1,124 @@
+00:01 You've heard MongoDB is fast, really fast,
+00:03 and you've gone through setting up your documents and modeling things,
+00:07 you inserted, you imported your data, and you're ready to go;
+00:11 and you run a query and it comes back,
+00:13 so okay, I want to find all the service histories
+00:15 that have a certain price, greater than such and such; how many are there?
+00:18 Apparently there are 989, but it took almost a second to answer that question.
+00:22 This is a new version of the database, which we are going to talk about shortly.
+00:27 Instead of having just a handful of cars and service histories
+00:30 that we maybe entered in our little play-around app,
+00:32 it has a quarter million cars with over a million service histories, something to that effect.
+00:38 And the fact that we were able to answer this query
+00:41 of how many sort of nested documents had this property
+00:44 in less than a second, on one hand that's kind of impressive,
+00:48 but to be honest, it feels like MongoDB is just dragging,
+00:51 this is not very special, this is not great.
+00:55 So this is what you get out of the box, if you just follow what we've done so far
+00:59 this is how MongoDB is going to perform.
+01:02 However, in this chapter, we're going to make this better, a lot better.
+01:06 How much? Well, let's see, we're going to make it fast;
+01:09 here's that same query after applying just some of the techniques of this chapter.
+01:13 Notice now it runs in one millisecond, not 706 milliseconds.
+01:17 So we've made our MongoDB just take off,
+01:21 it's running over 700 times faster than what the default MongoDB does.
+01:26 Well, how do we do it, how do we make this fast?
+01:30 Let's have a look at the various knobs
+01:33 that we can turn to control MongoDB performance.
+01:35 Some of which we're going to cover in this course,
+01:38 and some are well beyond the scope of what we're doing,
+01:40 but it's still great to know about them.
+01:42 The first knob is indexes; it turns out
+01:44 that MongoDB adds very few indexes by default,
+01:48 in fact, the only index that gets set up is on _id
+01:52 which is basically an index as well as a uniqueness constraint,
+01:55 but other than that, there are no indexes,
+01:57 and it might be a little non-intuitive at first, when you first hear about this,
+02:02 but indexes and manually tuning and tweaking and understanding the indexes
+02:06 in document databases is far more important
+02:10 than understanding indexes in a third normal form designed relational database.
+02:15 So why would that be? That seems really odd.
+02:18 So think about a third normal form database,
+02:21 you've broken everything up into little tiny tables that link back to each other
+02:24 and they often have foreign key constraints traversing all of these relationships,
+02:28 well, those foreign key constraints go back to primary keys on the main tables,
+02:33 those are indexed, every time you have one of those relationships
+02:35 it usually has an index on at least one end.
+02:39 In document databases, because we take some of those external tables
+02:43 and we embed them in documents,
+02:45 those subdocuments, while they logically play the same role,
+02:49 don't get an index added to them automatically.
+02:52 So we have fewer tables, but we still have
+02:55 basically the same amount of relationships
+02:57 and because of the way documents work,
+02:59 we actually have fewer indexes than we do in say a relational database.
+03:04 So we're going to see that understanding
+03:07 and exploring indexes is super, super important
+03:09 and that's going to be the most important thing that we do.
+03:12 In fact, the MongoDB folks, one of their things they do is
+03:16 they sell services, consulting and whatnot to help their customers,
+03:19 and you could hire them, say hey I got this big cluster and it's slow
+03:24 can you help me make it faster? The single most dramatic thing that they do,
+03:30 the thing that almost always is the problem is incorrect use of indexes.
+03:34 So we're going to talk about how to use, discover and explore indexes for sure.
+03:38 Next is document design, all that discussion about to embed or not to embed,
+03:43 how should you relate documents, this is sort of the beginning of this conversation,
+03:47 it turns out the document design has dramatic implications across the board
+03:52 and we did talk quite a bit about this, but we'll touch on it again in this chapter.
+03:56 Query style, how are you writing your queries,
+04:01 is there a way that you could maybe restructure a query,
+04:05 or ask the question differently and end up with
+04:08 a higher-performance query; maybe one example misses an index
+04:12 and the other particular example uses a better index or something to this effect.
+04:16 Projections and subsets are also something that we can control,
+04:20 remember when we talked about the JavaScript API
+04:23 we saw that you could limit your set of returned responses
+04:26 and this can be super helpful for performance;
+04:29 you could write a query where it returns 5 MB of data
+04:32 but if you restrict that to just the few fields that you actually care about
+04:36 maybe it's a few KB instead of 5 MB; it could be really dramatic,
+04:40 depending on how large and nested your documents might be.
+04:43 We're going to talk about how we can do this, especially from MongoEngine.
+04:46 These are the knobs that we're going to turn in this course,
+04:49 these are the things that will work even if you have a single individual database,
+04:53 so you should always think about these things,
+04:56 some of them happen on the database side (document design, indexes),
+04:59 and the other two happen in your application as it interacts with the database,
+05:04 but MongoDB, being a NoSQL database, allows for other types of interactions,
+05:08 other configurations and network topologies and so on.
+05:11 So, one of the things that it supports is something called replication,
+05:14 now replication is largely responsible for redundancy and failover.
+05:19 Instead of just having one server I could have three servers,
+05:22 and they could work in triplicate, basically one is what's called the primary,
+05:26 and you read and write from this database,
+05:28 and the other two are just there ready to spring into action,
+05:31 always getting themselves in sync with the primary,
+05:34 and if the primary goes down, one of the others will spring in to be the primary,
+05:36 and they will sort of fix themselves as what used to be the primary comes back.
+05:40 By default, there is no performance benefit from that at all.
+05:43 However, there are ways to configure your connection to say
+05:46 allow me to read not just from the primary, but also from the secondaries,
+05:50 so you can configure replication for a performance boost,
+05:53 but mostly this is a durability thing.
+05:55 The other type of network configuration you can do is what's called sharding.
+05:59 This is where you take your data instead of putting all into one individual server,
+06:02 you might spread this across 10 or 20 servers,
+06:06 with hopefully one twentieth of it, evenly balanced,
+06:09 on each of them, and then when you issue a query,
+06:12 it can either figure out, if the query is based on the shard key,
+06:15 which server to point that at and let that one
+06:17 handle the query across the smaller set of data,
+06:20 or if it's general like show me all the things with greater than this for the price,
+06:23 it might need to fan that out to all 20 servers,
+06:26 but it would run in parallel on 20 machines.
+06:30 So sharding is all about speeding up performance,
+06:32 especially write performance, but also queries as well,
+06:35 so you can get tons of scalability out of sharding,
+06:38 and you can even combine these; when I said there are 20 shards,
+06:41 each one of those could actually be a replica set,
+06:43 so there is a lot of stuff you could do with network topology
+06:46 and clustering and sharding and scaling and so on.
+06:48 We're not turning those knobs in this course,
+06:50 I'll show you how to make individual pieces fast,
+06:52 the same idea applies to these replicas and shards,
+06:54 just on a much grander scale if you want to go look at them.
\ No newline at end of file
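Indexes are the knob the chapter relies on most, and they can be created from PyMongo as well as from the shell. A minimal sketch, assuming the service records are embedded in each car under a `service_history` array whose elements carry a `price` field (that field path is an assumption, not quoted from the transcript): an index on the dotted path becomes a multikey index that the "price greater than 16,800" count can use instead of scanning every car. Running `cars.find(filter).explain()` on the same filter shows whether the winning plan is an `IXSCAN` or a `COLLSCAN`.

```python
PRICE_PATH = "service_history.price"  # assumed field path into the embedded records
ASCENDING = 1  # same value as pymongo.ASCENDING


def price_index_keys():
    """Key specification for an ascending index on the embedded price field."""
    return [(PRICE_PATH, ASCENDING)]


def expensive_service_filter(min_price=16800):
    """Filter matching cars with at least one service record above min_price."""
    return {PRICE_PATH: {"$gt": min_price}}


def count_cars_with_expensive_service(cars, min_price=16800):
    """Ensure the index exists, then count matching cars.

    `cars` is assumed to be a pymongo Collection; create_index does nothing
    if an identical index already exists, so calling it repeatedly is safe.
    """
    cars.create_index(price_index_keys())
    return cars.count_documents(expensive_service_filter(min_price))
```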
diff --git a/transcripts/ch8-performance/3.txt b/transcripts/ch8-performance/3.txt
new file mode 100644
index 0000000..8c22450
--- /dev/null
+++ b/transcripts/ch8-performance/3.txt
@@ -0,0 +1,30 @@
+00:01 Let's return to our dealership.
+00:03 This was the example we started back when we began the MongoEngine section,
+00:05 and it turns out the dealership is super popular now.
+00:08 Before we just had a couple of cars, now we have a quarter million cars
+00:11 in our database, we have 100 thousand owners;
+00:15 I don't believe we talked about owners before in terms of what that looks like in our code,
+00:19 but I've added this concept of owners
+00:21 so we can ask interesting cross-document questions,
+00:25 and we'll look at the details of them, when we get to the code, in just a moment.
+00:28 Each one of these owners, these 100 thousand owners,
+00:31 owns an average of 2.5 cars; these are kind of like collectors, right,
+00:36 not a standard person that drives to work or whatever, these are Ferraris,
+00:39 and each car has on average about 5 service records
+00:43 and that could be like a new engine,
+00:45 change the tires, change the spark plug, whatever;
+00:48 in total, there are about 1.25 million service histories,
+00:51 so when we ask questions about like those nested documents
+00:54 that have to do with service histories like customer ratings and price,
+00:57 you can see that that is really quite impressive I think,
+01:01 we got the quarter million cars and within those quarter million documents
+01:04 interspersed are 1.25 million service histories.
+01:07 So our job is to take the typical questions we might ask this database
+01:11 and make the queries that answer them run in a couple of milliseconds, not in seconds;
+01:17 that's the basic goal of this whole section.
+01:23 Now, other things you might want to know are that
+01:25 we've got about 180 megs of data
+01:28 and, averaged across the different kinds of documents,
+01:30 each document is about 500 bytes.
+01:33 So let's return to our example, slightly transformed,
+01:37 and see how it's performing now and let's make it fast.
\ No newline at end of file
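The dataset figures quoted above are consistent with each other. A quick check, assuming "180 megs" means roughly 180 × 10^6 bytes spread over the top-level car and owner documents (service records are embedded inside cars, so they are not separate documents):

```python
owners = 100_000
cars_per_owner = 2.5
records_per_car = 5

cars = int(owners * cars_per_owner)        # 250,000 cars
service_records = cars * records_per_car   # 1,250,000 embedded service records
top_level_documents = cars + owners        # 350,000 documents across both collections

data_bytes = 180 * 10**6                   # assumption: "180 megs" as decimal megabytes
avg_document_bytes = data_bytes / top_level_documents

print(f"{cars:,} cars, {service_records:,} service records")
print(f"~{avg_document_bytes:.0f} bytes per top-level document")
```

That comes out a little above 500 bytes per document, in line with the rough average given in the transcript; the exact figure depends on how the data size was measured.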
diff --git a/transcripts/ch8-performance/4.txt b/transcripts/ch8-performance/4.txt
new file mode 100644
index 0000000..cbede0b
--- /dev/null
+++ b/transcripts/ch8-performance/4.txt
@@ -0,0 +1,42 @@
+00:01 Here we are in the GitHub repository for this course
+00:04 and notice we have this data section
+00:06 and in here I have this thing called dealership db 250 K
+00:09 that is this data that I just talked about,
+00:12 with the 250 thousand cars, 100 thousand owners, that sort of thing.
+00:16 So I'm going to put that over here on the desktop and unzip it
+00:21 and if we look in here, you'll see that there's a cars collection and an owners collection,
+00:29 and I don't believe we've spoken about how to get this data into MongoDB,
+00:33 so let's go over here and I'll use Robomongo;
+00:37 notice we have these two dealership things that I have been playing with
+00:42 and I want to create one called like test dealership or something to that effect.
+00:46 We're going to restore this; how do we do that?
+00:51 we'll go like this, we'll say mongorestore
+00:55 and this is the way that we get this exported data imported into MongoDB,
+01:00 now, the first thing you have to ask yourself is: is this additive to the database?
+01:05 If the data exists, do you want to also insert this,
+01:07 or do you want this to become the database and replace anything that exists?
+01:11 we want this one to replace existing data
+01:14 so I'll say --drop, and then I need to tell it what database,
+01:18 so I'll say --db, and what you should actually use is dealership,
+01:23 but just because I don't want to wipe away what I currently have,
+01:28 I'll say dealership example, but the code that you're going to run
+01:31 expects the name of the database to be just dealership;
+01:34 and then I need to give it the folder that it's going to work from,
+01:37 so I am just going to give it this folder like so, all right.
+01:41 So mongorestore, --drop to replace the data, --db to name it, and then the location;
+01:46 we hit go, and it's going to go cranking away on this
+01:50 and you can see it's inserting, inserting and done,
+01:53 that was really fast for like close to 1.5 million records.
+01:57 All right, so let's go over here and refresh
+02:00 and here's our example and we can see that we have our collection,
+02:03 here's our cars and we could just ask how many cars are there.
+02:07 Notice, there is that many, and if we change this to owners,
+02:11 remember you can also write it like this, owners like this,
+02:16 Now notice, I think in the restored data we got here
+02:19 you want to drop this index right here, so that it only has the _id indexes, ok,
+02:26 so that's this example I just restored,
+02:28 we're going to work with something you can imagine is exactly the same.
+02:33 So we're going to work with this dealership code
+02:36 but the way it got there, I'll show you the app I used to originally create it,
+02:39 and then I just restored it using mongorestore just as I showed you up here.
+02:43 So the way to generate the data that goes into mongorestore is to use mongodump.
\ No newline at end of file
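The restore step can also be scripted. A minimal sketch using Python's `subprocess`, assuming the MongoDB Database Tools (`mongorestore`, `mongodump`) are on the PATH; folder names are placeholders and the helper names are hypothetical. It mirrors the flags used above: `--drop` replaces existing collections and `--db` names the target database. Recent tool versions mark `--db` as deprecated when restoring from a directory and point to the `--ns*` options instead, so treat the exact flags as version dependent.

```python
import subprocess


def restore_command(dump_folder, db_name="dealership", drop=True):
    """Build a mongorestore argument list for one dumped database folder."""
    cmd = ["mongorestore"]
    if drop:
        cmd.append("--drop")  # replace existing collections rather than adding to them
    cmd += ["--db", db_name, dump_folder]
    return cmd


def dump_command(db_name, out_folder):
    """Build the mongodump command that produces the folder mongorestore reads."""
    return ["mongodump", "--db", db_name, "--out", out_folder]


def run(cmd):
    """Run a tool, raising CalledProcessError if it exits non-zero."""
    subprocess.run(cmd, check=True)
```

Note that `mongodump` writes one subfolder per database under `--out`, so the folder handed to `restore_command` is that per-database subfolder, for example `run(restore_command("dump/dealership"))`.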
diff --git a/transcripts/ch8-performance/5.txt b/transcripts/ch8-performance/5.txt
new file mode 100644
index 0000000..fae1172
--- /dev/null
+++ b/transcripts/ch8-performance/5.txt
@@ -0,0 +1,104 @@
+00:01 Let's explore this slightly updated version of our code.
+00:04 Here we are in the GitHub repository,
+00:07 and I am in the source folder and I've added an 08_perf section,
+00:10 and we have the starter_big_dealership and we have the big_dealership
+00:15 it even has instructions here to tell you basically how to restore
+00:17 that database, as we just did in the previous video.
+00:20 This one is going to be a snapshot of how this chapter starts,
+00:24 it's what we're starting from now and will remain that way;
+00:27 here we're going to take basically a copy of that one
+00:29 and evolve it into the fast high performance version,
+00:33 so let's go over here and see what we've got.
+00:35 Now, we have a few things that are slightly different,
+00:38 the car is basically unchanged from before
+00:41 although I added a little comment about how do we get to the owners.
+00:45 The one thing that is new here, in terms of the model is this owner idea,
+00:50 so cars can now have an owner
+00:52 and the way we know which cars are owned by this owner
+00:57 is that we have a list of object ids; those object ids are the ids of the cars,
+01:02 so we're going to push the ids of the cars that are owned here
+01:06 I guess we could run it as a many-to-many or one-to-many relationship,
+01:10 just depending on how we treat the owner, but theoretically,
+01:13 we can have a single car that has multiple owners
+01:17 and there are owners that own multiple cars, and we can manage it this way,
+01:21 you almost never see a car-to-owner intermediate table,
+01:25 so you're almost always going to have something like
+01:27 those ids are either embedded in the owner or in the car,
+01:32 or under rare circumstances both.
+01:35 So here's how we refer back to the cars,
+01:38 then we have a few basic things like the name,
+01:41 when was this owner created, how many times have they visited and things like that.
+01:45 We want to call it owners in the database and it's just this core collection,
+01:49 so other than that, there's not a whole lot going on here,
+01:51 let's look over here, we now have these services,
+01:54 I've taken all the car queries and moved them down here
+01:57 do you want to create a car, you call this function,
+01:59 do you want to record a customer visit, here we can go to the owner
+02:03 and we can use this increment operator
+02:06 to increment the number of visits in place.
+02:09 Find cars by make, find owner by name and so on.
+02:16 Number of cars with bad service, a lot of this stuff is what we wrote previously;
+02:20 there was the program thing that we ran over here that was interactive
+02:23 and I've replaced that with a few things,
+02:25 one is this db stats and you can run this and it will tell you
+02:28 like how many cars are there, how many owners are there,
+02:31 what's the average number of histories,
+02:33 this is basically those stats that I presented to you before,
+02:36 this takes a while to run on this database, I don't recommend you run it
+02:39 but if you want to just run it and see what you get you can.
+02:42 The database was originally created using this script,
+02:46 I am using something interesting you may not have heard about,
+02:49 I am using this thing called Faker, so down here
+02:53 Faker lets you create this thing and I'm seeding it
+02:59 so it always generates exactly the same things,
+03:01 I'm seeding random and fake and you can see down here
+03:04 it's creating the owners and you can ask it for things like
+03:06 give me a fake name, give me a fake date
+03:08 between these two dates, things like that.
+03:12 Similarly with cars, we're using random to get a hold of a lot of the numbers
+03:15 then we can use fake for anything else we might need.
+03:18 We ran this, with the right amount of data, it'll build it all up for us,
+03:25 so if for some reason you need to recreate it,
+03:27 run this load data thing; you can have it create a small one
+03:30 if you uncomment that, or a large one
+03:32 if you run it with these settings.
+03:34 Those are all good, this is like the foundation and this is where we are.
+03:37 Next, we're going to ask interesting questions of this database
+03:43 and we want to know how long those questions take to answer,
+03:46 so I've written this super simple function called time
+03:48 you pass it a message and a function,
+03:50 it will time how long the function takes to run
+03:53 and then print out the message along with the time in terms of milliseconds.
+03:57 And then we're going to go through
+03:59 and we're going to ask interesting questions here
+04:01 like how many owners, how many cars, who is the 10 thousandth owner,
+04:05 notice the slicing here to give us a slice of length one
+04:10 and then we'll just access it,
+04:12 and then we can start asking interesting questions like
+04:14 how many cars are owned by the 10 thousandth owner,
+04:17 or if we go down here, how many owners own the 10 thousandth car,
+04:21 so ask it in the reverse direction.
+04:23 Here we want to find the 50 thousandth owner by name;
+04:26 yes, technically we already have them, but the idea is
+04:30 we want to do a query based on the name field
+04:32 and we originally won't have any performance
+04:35 around these types of queries so it should be slow.
+04:38 This one, how many cars are there with expensive service
+04:40 this was the one with the snail
+04:43 and in one of the first videos in this chapter,
+04:46 I showed you, look, this takes 700 milliseconds to answer this question:
+04:49 how many cars have a service history with a price greater than 16800.
+04:55 So we're going to be able to ask all of these questions
+04:58 and this program will let us explore that
+05:02 and we'll see how to add indexes
+05:04 and I'll show you how to add indexes in the shell
+05:06 and how to add them in MongoEngine, and MongoEngine is really nice
+05:09 because as you evolve your indexes, as you add new ones,
+05:13 simply deploying your Python web app
+05:15 will cause whatever database it connects to
+05:18 to automatically get those new indexes, so it's really, really nice.
+05:22 So here you can see we're going to run this code and ask a bunch of questions
+05:25 we could load the data from here, we could generate the data,
+05:28 but you're much better off importing the data from that zip file
+05:28 because this takes like half an hour to run,
+05:32 whereas, as you saw, restoring from that zip takes like five seconds.
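The timing helper described in this video (the course calls it `time`) can be sketched in a few lines. This is a minimal sketch, not the course's actual code. It is renamed `timed` here so it does not shadow Python's `time` module, and it returns the function's result along with the elapsed milliseconds, so you can keep using the answer.

```python
import time


def timed(message, func):
    """Run func(), print message with the elapsed time in ms, return (result, ms).

    Hypothetical stand-in for the course's `time` helper.
    """
    start = time.perf_counter()  # high-resolution, monotonic clock
    result = func()
    elapsed_ms = (time.perf_counter() - start) * 1000
    print(f"{message} {elapsed_ms:,.3f} ms")
    return result, elapsed_ms
```

Each question then becomes a one-liner such as `timed("How many owners?", lambda: Owner.objects().count())`. Here `Owner` stands in for the chapter's MongoEngine document class.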
diff --git a/transcripts/ch8-performance/6.txt b/transcripts/ch8-performance/6.txt
new file mode 100644
index 0000000..3b530e8
--- /dev/null
+++ b/transcripts/ch8-performance/6.txt
@@ -0,0 +1,165 @@
+00:01 Let's go ahead and run this code, you've seen the minor changes
+00:04 like the addition of this concept of an owner,
+00:06 and how we generated all this data, and how you can restore it.
+00:09 Let's go ahead and run it, and see what's happening.
+00:13 Let's look at this from two perspectives, let's begin over actually in Robomongo,
+00:17 so we're going to ask the question, basically how many owners own a certain car
+00:21 the idea is more or less we're going to call this function which goes right here,
+00:25 really what we're looking for is this query,
+00:28 find me all of the owners where this car id is in their car ids collection,
+00:33 just generate and deserialize that.
+00:37 The other one that we're going to focus on is
+00:39 show me the cars with the expensive service history,
+00:42 how many cars or what cars had some kind of service
+00:46 that cost over 16800 dollars.
+00:49 Let's begin by looking at those in Robomongo.
+00:54 Here we have this concept, we could simplify this a little bit, but it doesn't matter,
+00:57 cars here's the service history, let's go to the price
+01:00 where that's greater than 16800, how many of them are there.
+01:05 If I run this, notice, it took a while to come back,
+01:08 run it again, here's the speed right there, 0.724 sec, 0.731, 0.733,
+01:14 so it's pretty reliably taking around 700 milliseconds to answer that question.
+01:19 We're going to come back to this.
+01:22 Here's a more interesting example, like go and randomly grab a car
+01:25 somewhere deep in the list, in this case I put 61600,
+01:30 grab that car and then find me all the owners,
+01:33 where that car id appears in their id list, and then we'll just dump that out,
+01:38 by saying var, it doesn't appear; if you just state the name, it will show up down here,
+01:43 so make sure to deselect it and run this,
+01:45 and this is actually surprisingly fast, given all the stuff that's going on here,
+01:48 but it's taking still about 75, 80 milliseconds to run here,
+01:53 which, I don't know, maybe in your database
+01:55 going across 100 thousand records, 80 milliseconds seems decent;
+01:59 I can tell you in MongoDB 80 milliseconds is terrible,
+02:02 you should really think about making something that takes 80 milliseconds faster;
+02:06 it's not always possible,
+02:08 but as we'll see, for most queries it is.
+02:11 Let's take this one and just try to understand what's happening here
+02:16 and then we're going to go look at it in Python,
+02:19 but let's just explore it here in the shell for just a moment.
+02:21 Why is this taking 700 milliseconds?
+02:24 MongoDB has this way to basically ask how are you running this query,
+02:29 and the way you do that is you say explain, like so,
+02:35 so I can say this query instead of giving me a result tell me how you're running it,
+02:38 if I unselect it, it just runs the selected stuff if there's something there,
+02:42 so we can go and look at it in this mode,
+02:44 so it says okay, here's what the query planner found for you,
+02:47 we've parsed this query, and this is something
+02:50 it's basically what went into the find,
+02:52 it also might have something to the effect of like a sort
+02:55 and other things that are happening, but this is a simple query.
+02:58 Look down here, see this winning plan, stage COLLSCAN, a collection scan,
+03:02 that is bad, that is really, really bad.
+03:05 Also notice the rejected plan, so if there are multiple indexes
+03:08 and other things that could have done
+03:10 it might have attempted a bunch of them and said no, no, no this is the best,
+03:13 let's see it doesn't seem to tell us any more about what it did there,
+03:18 like sometimes it'll tell you how many records it scanned and things like this,
+03:21 but it's just basically reading entirely in the forward direction
+03:25 over this and just doing a comparison.
+03:27 So that's why this was taking 700 milliseconds
+03:32 as it was literally reading and comparing 100 thousand entries
+03:36 or actually more, remember there are 1.2 million service histories
+03:40 across those 250 thousand cars, so not 100 thousand,
+03:43 1.2 million records it scanned over, that's bad, you don't want that.
+03:47 So what we can do is we can actually add an index,
+03:51 now there are two ways to add an index,
+03:54 but before I add the index, let's go over here;
+03:58 just note that explain is super, super valuable,
+04:00 any time something is slow we're going to explain it;
+04:03 there's actually a way to turn on profiling and say log all of the queries
+04:07 that MongoDB sees that are slower than x,
+04:11 and providing something like, say, 10 milliseconds might be great,
+04:14 show me all the queries that take more than 10 milliseconds
+04:17 and then you can drop them in here, put an explain
+04:19 and then start creating indexes to make them faster.
+04:22 So just google mongodb profile enable slow queries
+04:26 or something like this, it's pretty straightforward.
+04:29 Now let's run this code, we're asking a lot of questions
+04:31 what we want to run is q and a, so we go over here and just right click and say run,
+04:37 notice some of these things are taking time,
+04:42 the database might be cold, it might have not loaded that stuff,
+04:46 so let me run it one more time just to be fair,
+04:49 there's a few things that are already really fast, and that's cool,
+04:55 so let's go here and review: how many owners are there?
+04:58 Well, I can tell you it doesn't show the answer,
+05:01 it just sort of says, here's the question I'm asking and here's how long it takes.
+05:04 Three milliseconds, that is solid; how many cars: half a millisecond.
+05:07 That's pretty solid, I don't think we can improve the count on the entire collection,
+05:11 but this one, find the 10 thousandth owner: not good;
+05:14 so let's see how many cars are owned by that person:
+05:19 this is pretty fast actually, this is surprisingly fast;
+05:23 how many owners this car has: 66 milliseconds,
+05:26 that's the one we were looking at in there.
+05:29 I'm going to take these numbers and put them over here,
+05:32 let's say, this will be Without indexes
+05:36 we're going to get this, we don't really care about the exit code, do we?
+05:41 With indexes, and we're going to kind of iterate on this a little bit
+05:45 so let's begin over here, and we're going to talk about
+05:49 how we can add an index in MongoDB and then for the most part
+05:55 do this in MongoEngine because it's really part of the way our application works,
+06:00 what the indexes are, and it's better to make that part of our document
+06:03 than kind of do a separate database setup step;
+06:07 we could create a script in Javascript and run it,
+06:09 it will do these things and that may be fine, but let's go over here and work on this.
+06:14 Again we had the count, here's the almost 800 milliseconds,
+06:19 let's go over here and just I'll take this, I'll make a copy,
+06:24
+06:28 so here is what we can do, instead of doing the find operation
+06:31 we can say create index,
+06:35 and then we have the thing that we're doing the query on,
+06:38 most of the time this is one item, but you can have composite indexes;
+06:43 they are a little more nuanced, so we'll talk about them later,
+06:45 but let's just do this one, we want to be able to query by service history's price
+06:52 Here we can put one of two things, one or minus one,
+06:56 which default sort order do you want, ascending or descending?
+06:59 A lot of times it doesn't really matter,
+07:01 it can read from the back or it can read from the front, whatever,
+07:04 you saw the forward direction on our collection scan, for example.
+07:06 So over here we could say one, this creates an index, there's no count;
+07:09 the other thing we can do is we can give it a name
+07:13 so we can come over here and say name is search by service history price,
+07:24 so if we go look in this little indexes folder, we'll see the name here,
+07:27 we can also say run in the background,
+07:30 if I don't say that it's going to block the database until the index is generated,
+07:33 if you're doing this in production, and you have tons and tons of data
+07:36 maybe background is the way to go.
+07:38 Okay, anyway let's go ahead and run this and see what happens.
+07:41 Notice the pause, this is it actually computing the index;
+07:44 right now the database is effectively down; now it's back.
+07:47 What do we get? OK, created collection automatically: no, it already existed;
+07:51 the number of indexes before was one, now we have two,
+07:54 and everything was A-OK, so if I refresh,
+07:57
+07:59 here's that index and I can actually edit this over here in Robomongo,
+08:05 go for the advanced properties, here is the create index and background
+08:09 whether it's sparse, how long it lives,
+08:11 whether it's based on text search or whatever, but here's just the basic thing.
+08:15
+08:18 We've added this index, remember this took 800 milliseconds
+08:21 ask the same question now, boom, 8 milliseconds.
+08:24 Ask it one more time, 2, here we go, 2, 2, 2, 3, 2, 2,
+08:28 right, the screen sharing is probably putting a pretty heavy load on the server,
+08:32 which is also the database server, right, but still,
+08:35 we're getting it to 350, 400 times faster by adding that.
+08:39 Now if I go back and ask that question with explain,
+08:42 now we get something way better, winning plan is index scan
+08:50 index name search by service history price, that is really awesome;
+08:57 that means we're using our index which is so much faster.
+09:02 There were no rejected plans, so it only found one index;
+09:06 it tried to use it, found that it was awesome, and it's very happy.
+09:09
+09:16 Go back to my count one more time,
+09:21 boom 2 milliseconds, and that's a really good answer,
+09:24 let's go run our Python code and see what answers we get now,
+09:27 that was already faster, let's go over here
+09:32 and load car name and ids with expensive prices and spark plugs,
+09:38 20 milliseconds; this is actually a pretty complicated query,
+09:43 we'll get into it. Cars with expensive service: 1.9 milliseconds.
+09:47 This is exactly what we saw in Robomongo,
+09:51 so over here in MongoEngine, we're getting essentially the same results. How cool is that?
+09:56 Very nice, we're going to go through and in Python from now on
+10:02 we're going to add the necessary indexes to start making
+10:05 almost all of these run super fast; all of them run fast,
+10:09 some of them we can get incredibly fast, like one millisecond,
+10:11 others not quite that fast, but we'll still do good on all of them.
\ No newline at end of file
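Reading the `winningPlan` by eye works, but it is easy to automate. The sketch below walks the dictionary that `explain()` returns (PyMongo's `Cursor.explain()`, for example) and collects every `stage` name in the winning plan. That lets you flag `COLLSCAN` queries like the 700 ms one above. The helper names are hypothetical. The exact nesting of explain output varies between MongoDB versions, so the helper searches recursively instead of assuming a fixed shape.

```python
def plan_stages(explain_doc):
    """Collect (stage, indexName) pairs from the winning plan of an explain() result."""
    found = []

    def walk(node):
        # Recurse through nested dicts/lists (inputStage, inputStages, queryPlan, ...).
        if isinstance(node, dict):
            if "stage" in node:
                found.append((node["stage"], node.get("indexName")))
            for value in node.values():
                walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)

    walk(explain_doc.get("queryPlanner", {}).get("winningPlan", {}))
    return found


def is_collection_scan(explain_doc):
    """True if any stage of the winning plan is a full collection scan."""
    return any(stage == "COLLSCAN" for stage, _ in plan_stages(explain_doc))
```

With a live server you would pass it something like `db.cars.find({"service_history.price": {"$gt": 16800}}).explain()`. Before the index exists, `is_collection_scan` should return true. After the index is created, the stages should include `IXSCAN` along with the index name.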
diff --git a/transcripts/ch8-performance/7.txt b/transcripts/ch8-performance/7.txt
new file mode 100644
index 0000000..a5b7f8c
--- /dev/null
+++ b/transcripts/ch8-performance/7.txt
@@ -0,0 +1,298 @@
+00:01 Now that you've seen how to create indexes in the shell, in JavaScript effectively,
+00:04 let's go and see how to do this in MongoEngine.
+00:07 I think it's preferable to do this in MongoEngine because that means
+00:11 simply pushing your code into production will ensure
+00:14 that the database has all the right indexes set up for to operate correctly.
+00:19 You theoretically could end up with too many,
+00:21 if you have one in code and then you take it out
+00:23 but you can always manage that from the shell,
+00:26 this way at least the indexes that are required will be there.
+00:29 I dropped all the indexes again, let's go back through our questions here
+00:33 and see how we're doing.
+00:36 It says how many owners, how many cars,
+00:38 this is just based on the natural sort however it's in the database
+00:41 there's really nothing to do here,
+00:44 but this one, find the 10 thousandth owner, let's look at that;
+00:48 that is going to basically be this name, we'll use test,
+00:55 it doesn't really matter what we put here
+00:57 if we put explain, this should come back as collection scan or something like that,
+01:01 yeah, no indexes, okay, so how long did it take to answer that question?
+01:06 Find the 10 thousandth owner by name,
+01:12 it didn't say by name, I'll go and add by name,
+01:16 well, that took 300 milliseconds, and that seems bad,
+01:21 and look, we're actually using sorting,
+01:24 we're actually using paging, skip and limit, those types of things here,
+01:27 but in order for that to mean anything, we have to sort it,
+01:31 it's really the sort that we're running into.
+01:34 Maybe I should change this, like so,
+01:38 sort like so, we could just put one, I guess it's the way we're sorting it,
+01:47 so here you can see down there the sort pattern name is one
+01:49 and guess what, we're still doing a collection scan.
+01:53 Any time you want to do a filter by, a greater than, an equality,
+01:56 or you want to do a sort, you need an index.
+01:59 Let's go over to the owner here, this is the owner class
+02:04 and let's add the ability to sort it by name or
+02:08 equivalently also do a filter like find exactly by name,
+02:12 so we're going to come down here
+02:14 we're going to add another thing to this meta section,
+02:16 and we're going to add indexes,
+02:20 and indexes are a list of indexes,
+02:25 now these can be simple strings
+02:28 or they can be complex subdictionaries,
+02:31 for composite indexes or uniqueness constraints, things like that,
+02:34 but for name all we need is name.
+02:38 Let's run this, first of all, let's go over here
+02:41 and notice, if I go to owners and refresh, no name index,
+02:46 let's run this code, find the 10 thousandth owner by name,
+02:52 19 milliseconds, that's pretty good,
+02:55 let me run it one more time,
+02:57 15 yeah okay, so that seems pretty stable,
+03:00 and let's go over here and do a refresh, hey look there's one by name;
+03:03 we can see it went from what was that,
+03:08 something like 300 milliseconds to 15 milliseconds, so that's good.
+03:11 How many cars are owned by the 10 thousandth owner,
+03:15 so that's 3 milliseconds, but let's go ahead and have a look at this question anyway.
+03:19 How many cars are owned by the 10 thousandth owner,
+03:22 so here's this function right here that we're calling
+03:25 it doesn't quite fit into a lambda expression, so we put it up here
+03:28 so we want to go and find the owner by id,
+03:30 that should be indexed right, that should be indexed right there
+03:34 because it's the id, the id always has an index,
+03:36 and now we are saying the id is in this set,
+03:40 so we're doing two queries, but both of them are hitting the id thing,
+03:44 so those should both be indexed and 3 milliseconds,
+03:47 well that really seems to indicate that that's the case.
+03:50 How many owners own the 10 thousandth car, that is right here.
+03:54 So we'll go find the car, ask how many owners own it.
+03:59 Now this one is interesting, so remember when we're doing this
+04:02 basically this in query, let's do a quick print of car id here,
+04:11 so if we go back over to this, we say let's go over to the owners
+04:17 save your documents, so this is going to be car ids,
+04:21 it's going to have an object id of that,
+04:26 all right, so run this, zero records, apparently nobody owns this car,
+04:33 but notice it's taking 77 milliseconds; we could do our explain again here,
+04:37 and it's a collection scan, yet again, not the most amazing.
+04:43 So what we want is an index on car ids, right,
+04:48 because a collection scan is not good;
+04:50 I think it's not really telling us in our sort example,
+04:53 but for the find it definitely should be.
+04:55 So we can come back to our owner over here,
+04:58 let's add also like an index on car_ids,
+05:02 If we run this once again, just the act of restarting it
+05:05 should add the index to the database. How long did it take over here?
+05:09 A little late now, isn't it, because I did the explain;
+05:13 I can look at this one, how many cars,
+05:16 how many owners does the 10 thousandth car have,
+05:19 66 milliseconds; if we look at it now:
+05:22 how many owners own the 10 thousandth car, 1.9 milliseconds,
+05:29 so 33 times faster by adding that index, excellent,
+05:34 find the 50 thousandth owner by name, that's already done.
+05:38
+05:40 Alright we already have an index on owners name so that goes nice and quick,
+05:45 and how is this doing, one millisecond perfect,
+05:48 this one is super bad, the cars with expensive service 712 milliseconds,
+05:52 alright so here, we're looking at service history
+05:56 and then we're navigating that relationship, that hierarchy,
+06:00 with the double underscore, going to the price,
+06:02 greater than, less than, equal it doesn't matter,
+06:05 we're basically working with this value here, this subdocument.
+06:08 Let's go over to the car and make that work,
+06:11 now the car doesn't yet have any indexes but it will in a second,
+06:14 so what we want to do is represent that here
+06:17 and in the raw way of discussing this with MongoDB
+06:21 we use . (dot) not double underscore, so . represents the hierarchy here.
+06:25 Let's run that again, notice expensive service, 712,
+06:30 cars with expensive service, instead of 712 we have 2.4 milliseconds,
+06:39 now notice that the first time I ran it there was a pause,
+06:42 the second time it was like immediate,
+06:45 and that's because it basically was recreating that index
+06:47 and that pause time was how long that index took to create.
+06:51 So here we have cars with expensive service,
+06:53 now we're getting into something more interesting, look at this one with spark plugs,
+06:58 we're querying on two things, the service history's price and its description,
+07:04 let's actually put this over in the shell so we can look at it.
+07:07
+07:19 I've got to convert this over, do the dots there,
+07:23 this is going to be the $gt (dollar greater than) operator, colon, like so,
+07:30 all right, so we're comparing that service_history.price
+07:35 and this one, again because you can't put dots in a bare, unquoted key,
+07:39 use the dot here, in quotes, and this one is just spark plugs,
+07:46 alright, let's run this, okay 22 milliseconds,
+07:52 how long is it taking over here— 20 milliseconds,
+07:56 so that's actually pretty good and the reason I think it's pretty good is
+07:59 we already have an index on this half
+08:02 and so it basically just has to filter the result; let's find out.
+08:05
+08:11 Winning plan, index on this one, yes, exactly
+08:14 so for this one it's just going to crank across there,
+08:18 but we're going to use at least this index here, this by price
+08:22 so that gets part of the query there.
+08:25 Now maybe we want to be able to do a query just based on the description
+08:30 show me all the spark plugs, well that's a collection scan,
+08:33 so let's go back and add over here one for the description.
+08:40 Now how do I know what goes in this part,
+08:44 see I have a service history here, if we actually look at the service record object
+08:49 it has a price and description, right
+08:52 so we know that that results in this hierarchy of
+08:54 service_history.price, service_history.description.
+08:57 If we run this again, it will regenerate those, and let's go over here
+09:01 and run this, and let's see, now we're doing index scan on price,
+09:09 what else do we got, rejected plans, okay so we got this and query
+09:18 and it looks like we're still using the... yes, oh my goodness,
+09:24 how about that for a mistake: a missing comma! So what did that do?
+09:28 Python joins adjacent string literals, even across wrapped lines, so that just created this,
+09:33 and obviously that's not what we want; that comma is super important there.
+09:38 So let me go over here and drop this nonsense thing,
+09:41 try this again, I can see it's building index right now,
+09:47 okay, once again we can explain this, okay great,
+09:51 so now we're using price and actually we use the description this time
+09:58 and you can see the rejected plan is the one that would have used the price,
+10:04 so we're using description, not price,
+10:06 and how long does it take to run that query? 7.9 milliseconds, that's better,
+10:13 but what would be even better still is if we could do
+10:16 the description and price as a single thing. How do we do that?
+10:22 This gets to be a little trickier, if we look at the query we're running,
+10:25 we're first asking for the price and then the description,
+10:30 so we can actually create a composite index here as well,
+10:35 and we do that by putting a little dictionary, saying fields
+10:39 and putting a list of the names of the fields
+10:44 and you can bet those go like this,
+10:48 now this turns out to be really important, the order that you put them here
+10:52 price and the description versus description price, for sorting,
+10:56 not so much for matching, run it one more time,
+11:00 alright, expensive cars with spark plugs,
+11:04
+11:07 here we go, look at that, less than one millisecond,
+11:10 so we added one index, it took it from like 66 milliseconds down to 15,
+11:16 and then, we added the description one, it turns out that was a better index
+11:21 and it took it from 15 to 9, we added the composite index,
+11:24 and we took it from 9 to half a millisecond, 0.6 milliseconds; that is really cool.
+11:31 Notice over here, this got faster, let's go back and look at what that is.
+11:36 Load cars, so this is the one we are optimizing
+11:40 and what are we doing here— let me wrap this so you can see,
+11:43 we're doing a count, okay, we're doing a count
+11:46 and so it's basically having the database do all the work
+11:48 but there's zero serialization.
+11:52 Now in this one, we're actually calling list
+11:55 so we're deserializing, we're actually pulling all of those records back
+11:59 and let's just go over here and see how many there are,
+12:03
+12:08 well that's not super interesting, to have just one, is it,
+12:12 alright, that's good, but let's actually make this just this,
+12:17
+12:23 let's drop this spark plug thing and just see
+12:26 how many cars there are with this,
+12:30 okay there we go, now we have some data to work with,
+12:33 65 thousand cars had 15 thousand dollar service or higher,
+12:36 after all, this is a Ferrari dealership, right.
+12:39 Now, it turns out it's a really bad idea to pull back that many cars,
+12:43 let me stop this, let's limit that to just a thousand here as well.
+12:52
+12:54 Okay, so we're pulling back a thousand cars because we limited it to this,
+13:00 and we're pulling back a thousand cars here.
+13:03 But notice, this car name and id versus the entire car
+13:08 so let's go over here cars with expensive service, car name and id,
+13:13 so notice the time, so to pull back and serialize those thousand records
+13:17 actually took a while, basically one second,
+13:21 and if we don't ask for all the other pieces,
+13:25 if we just say give me just the make, the model and the id,
+13:29 here we're using the only keywords, it says don't pull back the other things
+13:34 just give me these three fields when you create them,
+13:37 it makes it basically ten times faster,
+13:40 let's turn this down to 100 and see, maybe get a little more realistic set of data.
+13:44 Okay, there we go, a 100 milliseconds down to 14 milliseconds,
+13:49 so it turns out that the deserialization step in MongoEngine is a little bit expensive
+13:55 so if you like blast a million cars into that list, it's going to take a little bit.
+14:01 If we can express like I only want to pull back these items,
+14:05 then it turns out to be quite a bit faster;
+14:10 in this case not quite ten times faster, but definitely faster.
+14:15 Let's round this out here and finish this up.
+14:17 Here we're asking for the highly rated, highly priced cars,
+14:20 we're asking like hey for all the people that come and spend a lot of money
+14:26 how did they feel about it?
+14:29 And then also what cars had a low price and also a low rating,
+14:33 so maybe we could have just somehow changed our service
+14:37 for these sort of cheaper like oil change type people.
+14:39 It turns out that that one is quite fast,
+14:42 this one we could do some work and fixing one will really fix the other
+14:46 so we have this customer rating thing, we probably want to have an index on,
+14:52 and we already have one on the price,
+14:54 so I think that that's why it's pretty quick actually.
+14:57 Go over here, and we don't yet have one on the price, on the rating rather,
+15:03 so we can do that and see if things get better,
+15:07 not too much, it didn't really make too much of a difference,
+15:12 it's probably better to use the price than it is the rating,
+15:16 because we're kind of doing that together, so we're also going to go down here
+15:19 and have the price and customer rating,
+15:21 one of these composite indexes, once again,
+15:24 and maybe if we change price one more time,
+15:29 rating and price: it doesn't seem like we're getting much better,
+15:36 so down here this is about as fast as we can get, 16 milliseconds
+15:40 and this is less than one millisecond, so that's really good.
+15:44 The final thing is, we are looking for high mileage cars,
+15:47 so let's go down here and say find where the mileage of the car
+15:51 is greater than 140 thousand miles, do we have an index on that,
+15:55 you can bet the answer is no.
+15:58 Now we could go to the shell and see that, but no we don't have one,
+16:01 so let's go up here and add one more,
+16:04 and this is in fact the only index we have here in this thing
+16:07 that is on just a plain field, not one of these nested ones like this;
+16:13 so maybe we also want to be able to select by year,
+16:16 so we could have one for year as well. I'm going to add those in.
+16:21 Now this high mileage car goes from a hundred and something milliseconds
+16:26 down to six, maybe one more time just to make sure,
+16:28 yep, 5, 6, seems pretty stable around there.
+16:32 So we've gone and we've added these indexes
+16:34 to our models, our MongoEngine documents by adding indexes
+16:40 and we can have flat ones like this, or we have these here,
+16:48 and we also can have composite ones or richer things,
+16:52 if we create a little dictionary and we have fields and things like that.
+16:57 Similarly on owner, we didn't have as many things we were after,
+17:00 but we did want to find them by their name and by car id,
+17:03 so we had those two indexes,
+17:05 honestly this is just a simpler document than the cars.
+17:08 So with these things added here, we can run this one more time
+17:11 and see how we're doing; that code all runs really quick,
+17:14 if we kind of scan through here, there's nothing that stands out like super bad,
+17:18 5 milliseconds, half, 18, 6, half, 1, 3, 1, let's say,
+17:26 this one, I really wish we could do better,
+17:29 it just turns out there is like so many records there
+17:32 that if we run that here you can see that the whole thing runs in one millisecond,
+17:38 super, super fast, we can't make it any faster than that.
+17:41 The slowness is basically the allocation,
+17:45 assignment, verification of 100 car objects.
+17:48 I'd like to see a little better serialization time out of MongoEngine,
+17:53 if you have some part of your code that has to load tons of these things
+17:56 and it's super performance critical, you could drop down to PyMongo,
+18:00 talk to it directly and probably in the case where you're doing that
+18:05 you don't need to pull back many, many objects,
+18:07 but also you can see that if we limit what we ask for down here,
+18:12 that goes back to 14 milliseconds, which is really great,
+18:15 here we're looking at a lot of events, this is like 16 thousand
+18:21 or no, 65 thousand, that's quite a bit, this one is really fast,
+18:25 this one is really fast, so I feel like from an index perspective
+18:28 we've done quite a good job, how do we know we're done?
+18:32 I guess this is the final question, this has been a bit of a long one:
+18:35 how do we know we're done with this performance bit?
+18:39 We know we're done when all of these numbers come by
+18:43 and they're all within reason of what we're willing to take.
+18:47 Here I have set this up as these are the explicit queries
+18:51 we're going to ask and then we'll just time them,
+18:54 like your real application does not work that way.
+18:56 How do you know what questions your application is asking and how long they take?
+19:01 So you want to set up profiling, so you can come over here
+19:05 and definitely google how to do profiling in MongoDB,
+19:08 so we can come over here and just say db.setProfilingLevel,
+19:13 and you can use this function to say I'm looking for slow queries
+19:18 and to me slow means 10 milliseconds, 20 milliseconds something like that,
+19:23 it will generate a collection called system.profile, and you can just go look in there
+19:29 and see what queries are slow, clear it out,
+19:33 run your app, see what shows up in there
+19:35 add a bunch of indexes, make them fast, clear that table,
+19:38 then turn around and run your app again,
+19:43 and just repeat until stuff stops showing up in there;
+19:46 you can basically find the slowest one, make it faster, clear out the profile
+19:51 and just iterate on that process, and that will effectively like gather up
+19:55 all of the meaningful queries that your app is going to do,
+19:59 and then you can go through the same process here
+20:01 to figure out what indexes you need to create.
\ No newline at end of file
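The missing-comma mistake from earlier in this video is worth a closer look, because Python raises no error: adjacent string literals are simply concatenated. The sketch below reproduces the mistake; the exact lines in the course code may differ. It also adds a small, hypothetical sanity check, `unknown_index_fields`, which compares every field named in a MongoEngine-style `indexes` list against the field paths you expect. The `car_indexes` list is reconstructed from this transcript, not copied from the course code. In a real project it would sit under `meta = {'indexes': [...]}` on the `mongoengine.Document` subclass.

```python
# Field paths we expect on the Car document (reconstructed from the video; illustrative only).
KNOWN_CAR_FIELDS = {
    "mileage",
    "year",
    "service_history.price",
    "service_history.description",
    "service_history.customer_rating",
}

# The bug: no comma after the first string, so Python glues the two literals together.
buggy_indexes = [
    "service_history.price"
    "service_history.description",
]

# Intended list: plain fields, nested fields via dots, composite indexes via {"fields": [...]}.
car_indexes = [
    "mileage",
    "year",
    "service_history.price",
    "service_history.description",
    "service_history.customer_rating",
    {"fields": ["service_history.price", "service_history.description"]},
    {"fields": ["service_history.price", "service_history.customer_rating"]},
]


def unknown_index_fields(indexes, known_fields):
    """Return field names in an index list that are not known field paths.

    Handles plain strings and {"fields": [...]} dicts and strips the "+"/"-"
    direction prefixes. Other prefixes (e.g. "$" for text indexes) are not handled.
    """
    unknown = []
    for spec in indexes:
        fields = spec["fields"] if isinstance(spec, dict) else [spec]
        for field in fields:
            name = field.lstrip("+-")
            if name not in known_fields:
                unknown.append(name)
    return unknown
```

Running a check like this in a unit test would have caught the bogus `service_history.priceservice_history.description` index before it ever reached the database.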
diff --git a/transcripts/ch8-performance/8.txt b/transcripts/ch8-performance/8.txt
new file mode 100644
index 0000000..00cc6a3
--- /dev/null
+++ b/transcripts/ch8-performance/8.txt
@@ -0,0 +1,33 @@
+00:01 We've seen how powerful adding indexes to MongoDB is
+00:04 and I talked a little bit about how the nested nature of these documents means
+00:09 there's naturally fewer primary keys,
+00:11 so there's fewer on average actual indexes
+00:15 that get created just as part of working with the database;
+00:18 so creating these indexes is even more important in document databases
+00:22 than it is in relational databases.
+00:24 So here we are in the shell, this would be Robomongo
+00:27 or just the Mongo command line interface
+00:30 and we can create an index on a collection by saying db.<collection name>,
+00:33 so here we have db.cars.createIndex,
+00:35 and then we pass it two things, first one required, second one optional
+00:39 we pass it the actual fields we want to create the index on;
+00:44 so here we have service_history.customer_rating
+00:48 so we can traverse this hierarchy if necessary;
+00:51 we just use that dot like we have been in the shell the whole time
+00:55 and then we say one or minus one,
+00:57 so, do you want to sort ascending or descending?
+00:59 And this mostly matters for either what you might consider the natural sort
+01:03 or if you're doing a composite key, a composite index,
+01:08 and that composite index is being used for sorting on both fields;
+01:12 then the directions have to line up exactly (or be exactly reversed) for the sort to use that index.
+01:17 Then we can pass additional information,
+01:19 here we have background as true and the name,
+01:21 I like to name my indexes if I'm doing this in the shell
+01:24 because then it's easier to see, okay, why did I create this index?
+01:28 Here we want the customer ratings of service,
+01:31 so that's pretty nice, background true, that's not the default
+01:35 but that means it will run basically in the background
+01:38 without blocking the database operations,
+01:41 if you don't put that, when you hit go
+01:43 the database will stop doing any sort of database stuff
+01:46 until this index is generated so be aware.
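The same index from Python, as a hedged sketch. It assumes `db` is a PyMongo `Database` holding the `cars` collection, and the index name is a hypothetical stand-in for whatever you typed in the shell. PyMongo's `create_index` takes a list of `(field, direction)` pairs plus options such as `name` and `background`; note that MongoDB 4.2 and later ignore `background` and use an optimized build that only locks briefly at the start and end. The second function is a simplified, illustrative check of the ordering rule mentioned above: a compound index can serve a sort when the sort fields are a prefix of the index fields and the directions either all match or are all reversed.

```python
from typing import Any, List, Tuple

ASCENDING, DESCENDING = 1, -1  # same values as pymongo.ASCENDING / pymongo.DESCENDING

IndexKeys = List[Tuple[str, int]]


def create_rating_index(db: Any) -> str:
    """Python version of the shell's db.cars.createIndex(...) call; returns the index name."""
    return db.cars.create_index(
        [("service_history.customer_rating", ASCENDING)],
        background=True,  # ignored by MongoDB 4.2+ servers
        name="customer ratings of service",  # hypothetical name; pick one that explains the index
    )


def index_supports_sort(index_keys: IndexKeys, sort_keys: IndexKeys) -> bool:
    """Simplified rule (ignores equality filters on leading index fields).

    The sort fields must be a prefix of the index fields, and the directions must
    either all match the index or all be flipped (the index is then walked backwards).
    """
    if len(sort_keys) > len(index_keys):
        return False
    prefix = index_keys[: len(sort_keys)]
    if [field for field, _ in prefix] != [field for field, _ in sort_keys]:
        return False
    pairs = list(zip(prefix, sort_keys))
    same = all(i_dir == s_dir for (_, i_dir), (_, s_dir) in pairs)
    flipped = all(i_dir == -s_dir for (_, i_dir), (_, s_dir) in pairs)
    return same or flipped
```

For an index on `price` ascending and `description` descending, sorting by price descending then description ascending can walk the index backwards, but sorting both fields ascending cannot use it.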
diff --git a/transcripts/ch8-performance/9.txt b/transcripts/ch8-performance/9.txt
new file mode 100644
index 0000000..3c590b3
--- /dev/null
+++ b/transcripts/ch8-performance/9.txt
@@ -0,0 +1,39 @@
+00:01 Now if we're using MongoEngine,
+00:03 we don't have to go to the shell and manually type all the indexes
+00:05 we basically go to each individual top-level document,
+00:08 so all the classes that derive from mongoengine.Document,
+00:11 not the embedded documents, and we go to the meta section
+00:14 and we add an indexes entry, which is basically an array.
+00:17 So here, you can see the blue stuff that's highlighted,
+00:20 we want an index on make, we want an index on service history,
+00:23 and within service history, remember these are the service records shown at the bottom,
+00:27 we want an index on the description and price.
+00:30 So for the first index we just put 'make', that's straightforward,
+00:34 and then we have service_history.customer_rating
+00:37 so service history is the field name
+00:39 and then customer rating is the field name within the service record;
+00:42 for some reason I don't have it highlighted in blue, it's that last one down there,
+00:45 but we also want this composite key
+00:47 so service_history.price and service_history.description
+00:50 we want to be able to find where both of those match
+00:53 and we're going to do that by having
+00:56 a more complicated entry in the indexes list here:
+00:58 this is going to be a dictionary where 'fields' is set
+01:00 to this array of strings and not just a flat string.
+01:04 So once we add this, when we run our code,
+01:07 the first time we work with that document it's actually going to
+01:10 ensure that all the indexes are there,
+01:12 and remember that hung up our application for just a little bit,
+01:16 but the real benefit here is our app is always going to be in sync,
+01:21 we don't have to go, oh oops, I forgot to add the index,
+01:24 that one particular index, to, say, the staging server,
+01:27 or, when I push to production, are there new indexes
+01:30 I've got to go create on the database?
+01:32 Now you don't worry about that, you just push your code,
+01:34 restart your web app or whatever kind of app it is,
+01:36 and then as part of interacting with it,
+01:38 it will make sure that those indexes are there.
+01:41 If you don't want that pause to be there,
+01:43 just go and create the indexes you know the app is going to create,
+01:48 put them on the production server, and then push the new version of the code
+01:50 and it will just go, great, these indexes exist.
\ No newline at end of file
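To make the `indexes` entry concrete, here is a hedged sketch. In the real model the list sits in `meta` on `class Car(mongoengine.Document)`; it is written as a plain dict here so the snippet runs without MongoEngine or a server, and the `collection` value is an assumption. `to_key_list` is a simplified, hypothetical re-implementation of how such entries map to PyMongo key lists (a bare string is one ascending field, a leading `-` means descending, a dict's `'fields'` list becomes a compound index); it is not MongoEngine's own code. Its output can be passed to `create_index` to pre-build the indexes on production before deploying, as suggested above. MongoEngine also offers `Car.ensure_indexes()` for doing this from the model class itself.

```python
from typing import Any, Dict, List, Tuple, Union

IndexEntry = Union[str, Dict[str, Any]]

# Mirrors the meta section from the slide; in the app this is
# `meta = {...}` inside `class Car(mongoengine.Document)`.
CAR_META: Dict[str, Any] = {
    "collection": "cars",  # assumed collection name
    "indexes": [
        "make",
        "service_history.customer_rating",
        {"fields": ["service_history.price", "service_history.description"]},
    ],
}


def to_key_list(entry: IndexEntry) -> List[Tuple[str, int]]:
    """Convert one meta 'indexes' entry into PyMongo (field, direction) pairs (simplified)."""
    fields = [entry] if isinstance(entry, str) else entry["fields"]
    keys: List[Tuple[str, int]] = []
    for field in fields:
        if field.startswith("-"):
            keys.append((field[1:], -1))  # "-field" means descending
        else:
            keys.append((field.lstrip("+"), 1))  # bare or "+field" means ascending
    return keys


def precreate_indexes(db: Any, meta: Dict[str, Any]) -> List[str]:
    """Build every index ahead of a deploy; `db` is assumed to be a pymongo Database."""
    collection = db[meta["collection"]]
    return [collection.create_index(to_key_list(entry)) for entry in meta["indexes"]]
```

Running `precreate_indexes(db, CAR_META)` against production before the new code ships means the first request does not have to wait for the index builds.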