Transcripts for chapters 4, 5, and 6.

8 years ago · f8d917fcd1
parent 6c0f760eb3
commit f8d917fcd1
29 changed files with 1901 additions and 0 deletions
--- a/transcripts/ch4-mongo-shell/1.txt
+++ b/transcripts/ch4-mongo-shell/1.txt
@ -0,0 +1,71 @@
+00:01 So we've talked a lot about NoSQL document databases and MongoDB.
+00:05 Now it's time to actually start using MongoDB.
+00:08 So what we're going to learn in this chapter is twofold:
+00:11 one, how do you connect to it and manage it,
+00:13 with the management tools if you will,
+00:16 that is more or less the shell, and some additional tools,
+00:19 but also how do you query it from that shell.
+00:22 So maybe in Python in a traditional relational database
+00:25 you might be using, say, SQLAlchemy to talk to a relational database,
+00:29 so you wouldn't necessarily use SQL, the language, in Python
+00:32 but if you want to connect to the database directly and work with it
+00:35 then you need to use DDL and SQL and things like that,
+00:38 there is the same parallel here in that we're going to use the shell
+00:41 and we need to use MongoDB's native query syntax
+00:44 which turns out to be very similar to Python's under certain circumstances,
+00:47 so it's going to be serving dual purpose there.
+00:50 So the primary MongoDB shell is a command line tool, right,
+00:55 we just type mongo, the name of the server, and some connection string options,
+00:58 you can see all that in the title here in this terminal.
+01:02 And then we just issue commands like if I want to go and use
+01:05 the training database out of the server, I'd say use training;
+01:09 and if I want to, say, go to the courses and find
+01:12 the one with id 5 and display it not in a minimized, minified,
+01:16 but in a readable version, I would say db.courses.find
+01:20 and I'd give it the little JSON thing, _id is 5, and I'd say .pretty().
+01:25 So this is going to be entirely done in Javascript,
+01:28 so these statements that you type here,
+01:31 although you don't see any semicolons,
+01:33 these are either shell statements like use training
+01:36 otherwise, they're entirely pure Javascript.
+01:39 So what we're going to do is we're going to learn the Javascript api
+01:43 to talk to MongoDB, to query MongoDB,
+01:45 to do all the CRUD operations, there's a find, there's a delete,
+01:49 there's an insert, there's an update, of course there's sorts, there's upserts,
+01:52 there's all the things you would do in a standard database,
+01:55 the query syntax uses sort of a json model to help represent
+01:59 either operators or hierarchies and things like that.
+02:03 Now, you may be thinking, Michael, I came to a Python course,
+02:06 I don't want to learn the Javascript api, I want to learn the Python api—
+02:09 you will, you will learn the Python api for sure,
+02:12 and luckily, it's really, really similar, it's not identical,
+02:15 they made the Python api Pythonic
+02:18 and the Javascript one follows the idioms of Javascript,
+02:20 but nonetheless, other than the slight like variations
+02:23 in naming around those ideas, they're basically identical,
+02:26 in Python we would use {_id : 5 } as a dictionary,
+02:31 here we use it as a json object;
+02:34 so on one hand, learning the Javascript api
+02:36 is more or less learning the Python api.
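To make the "same thing as a Python dict" point concrete, here is a tiny pure-Python sketch that treats the shell's `{_id: 5}` filter as an ordinary dict and does simple equality matching against in-memory documents. It models plain equality filters only (no operators, dotted paths, or array rules) and is not how MongoDB or PyMongo work internally; the sample course documents are made up. With PyMongo, this same dict is what you would pass to `find_one`.

```python
from typing import Any


def matches(doc: dict[str, Any], query: dict[str, Any]) -> bool:
    """Return True if every field in the query equals the same field in doc.

    Simplified model of equality matching only: no $ operators,
    no dotted paths, no array semantics.
    """
    return all(field in doc and doc[field] == value for field, value in query.items())


# Hypothetical sample documents standing in for db.courses.
courses = [
    {"_id": 4, "title": "Course A"},
    {"_id": 5, "title": "Course B"},
]

# The shell's {_id: 5} prototype document is just a dict in Python.
query = {"_id": 5}
found = [c for c in courses if matches(c, query)]
print(found)
```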
+02:38 But on the other, if you work with MongoDB,
+02:41 if this drives your application and you actually work with Mongo, in a real way,
+02:45 you will have to go into the shell, you will have to talk to the database directly,
+02:49 you have to maintain it, and manage it, and back it up, and do all those things;
+02:52 in order to do that, you need to know the Javascript capabilities,
+02:56 the way to do this in Javascript, as much as you do the Python way.
+03:00 Ultimately, the end game is to use something like MongoEngine
+03:03 which is equivalent to SQLAlchemy, sort of analogous to SQLAlchemy,
+03:08 in that we won't even be speaking in this syntax,
+03:11 but still, you'll need to know how these translate down into these queries
+03:15 because you might want to say add an index
+03:17 to make your MongoEngine perform much, much faster, things like this.
+03:21 So we're going to focus on Javascript now, and then for the rest of the class,
+03:26 we're going to basically be doing Python, but like I said,
+03:29 in order to actually use, manage, run,
+03:32 work with an application that lives on MongoDB,
+03:34 you have to be able to use the shell, and to use the shell you do Javascript.
+03:38 So just like anybody who writes web apps, we're all Javascript developers,
+03:41 if we write any form of web app, similarly here,
+03:44 if you work with MongoDB, we're all Javascript developers
+03:47 and we got to do just a tiny bit, but you'll find it like I said,
+03:49 it's super, super similar to what we're going to do in Python.
--- a/transcripts/ch4-mongo-shell/10.txt
+++ b/transcripts/ch4-mongo-shell/10.txt
@ -0,0 +1,39 @@
+00:01 So here's an interesting question— what if I want to find all the books
+00:03 where user 720 has rated that book exactly a nine.
+00:09 You would think that this would do it, right,
+00:11 we're using both values in this prototypical object or this document here
+00:15 and it says that the book is going to have to have
+00:18 a rating of nine and user id 720 has rated it.
+00:21 However, when we run this, you'll see we get mixed results.
+00:24 The bottom one looks perfect, we got a book with the user id 720
+00:29 and a value of nine in the ratings, great;
+00:32 but this other one, what's up with this, the red one?
+00:34 Well, user 601 rated this as a nine,
+00:38 and user 720 actually hated the book, they gave it a one.
+00:41 However, taken as a whole, does the book have a rating by user id 720— yes,
+00:46 does it have a rating of nine— yes, so it matches this and clause.
+00:49 So, oftentimes if you're looking for this exact subdocument match
+00:54 and that thing you're looking in is an array
+00:56 so ratings is an array of documents, if ratings was one subdocument,
+01:00 this would work fine, but if it's an array and you want to say
+01:04 I need to make sure that the thing in that array is
+01:07 that subdocument itself matches value and user id as I've specified here
+01:11 you need a different query operator, and that is $elemMatch;
+01:15 so you can run this and it'll look down inside and say
+01:18 I want to find all the things in ratings,
+01:21 where both, the user id is 720 and the value is nine.
+01:25 So this is a slightly more complex version
+01:27 that you have to run and you have to use
+01:29 because you run into that problem we had before
+01:31 where somebody voted a 9, user 720 voted,
+01:33 but it was not user 720 who voted nine.
+01:35 So a little bit different than if you were working in
+01:38 say SQL, a traditional tabular language,
+01:41 because you don't ever have this kind of duplication within the one result,
+01:45 so it would be a lot simpler, but this is something
+01:48 that you kind of got to get your head around a little bit,
+01:50 you luckily don't use it very often, and if you are using the higher level of things
+01:54 like MongoEngine, you won't run into it,
+01:56 but down here at the shell or in PyMongo,
+01:58 you have to be really careful if this is actually
+02:00 the question you're trying to ask and answer.
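The difference between the two queries above comes down to "any element per condition" versus "one element for all conditions". The pure-Python sketch below models both rules over an in-memory array. The field names `user_id` and `value` are assumptions based on the narration, and the book data is made up to reproduce the "red one" case.

```python
from typing import Any


def match_any_element(doc: dict[str, Any], array_field: str, conditions: dict[str, Any]) -> bool:
    """Model of {'ratings.user_id': 720, 'ratings.value': 9}.

    Each condition may be satisfied by a *different* array element.
    """
    items = doc.get(array_field, [])
    return all(any(item.get(k) == v for item in items) for k, v in conditions.items())


def elem_match(doc: dict[str, Any], array_field: str, conditions: dict[str, Any]) -> bool:
    """Model of {'ratings': {'$elemMatch': {'user_id': 720, 'value': 9}}}.

    A *single* array element must satisfy every condition.
    """
    items = doc.get(array_field, [])
    return any(all(item.get(k) == v for k, v in conditions.items()) for item in items)


books = [
    # User 601 gave a 9, user 720 gave a 1: the misleading match.
    {"title": "red book", "ratings": [{"user_id": 601, "value": 9}, {"user_id": 720, "value": 1}]},
    {"title": "good match", "ratings": [{"user_id": 720, "value": 9}]},
]
cond = {"user_id": 720, "value": 9}

print([b["title"] for b in books if match_any_element(b, "ratings", cond)])
print([b["title"] for b in books if elem_match(b, "ratings", cond)])
```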
--- a/transcripts/ch4-mongo-shell/11.txt
+++ b/transcripts/ch4-mongo-shell/11.txt
@ -0,0 +1,56 @@
+00:01 So, we're pretty good at finding and filtering down our result sets.
+00:04 The other super important thing that databases do
+00:07 is to sort them, put them in order, so I would like the best selling book
+00:12 and then the second best, and then the third best in this category,
+00:15 that's a perfect sort by category, order by best selling this, right.
+00:20 So how do we do that in Mongo?
+00:22 Let's go over here and it turns out that there's a sort that we can run,
+00:25 and the sort takes something, right, kind of like our projection does here,
+00:30 so let me just show you before if I run this that this is not in order,
+00:33 so here we have c, c, d, f, and then t, p, w,
+00:40 and eventually we're back just, you know, something before w,
+00:43 it is not sorted by title, not sorted by published date either,
+00:47 these three seem to be descending but the next one is not, ok.
+00:50 So it's not sorted at all, it's just however it comes back,
+00:53 probably by object id or something like this.
+00:56 Anyway, let's go and sort it, so let's suppose I would like it sorted by title;
+01:00 so very much like our filter thing or maybe even closer, actually,
+01:05 like our projection here is I can come say I would like to sort
+01:09 and then this part that goes here, this one is ascending, right,
+01:13 so something that is positive means ascending, if it were negative,
+01:16 it would mean go in reverse order.
+01:18 So let's run this, now you can see, actually this is the beginning of the title,
+01:22 this exclamation mark and then some other exclamation marks,
+01:25 and then let's get past the symbols, a lot of symbols, anyway,
+01:30 you can see this is sorted by this, sorted by the title, not sorted by date,
+01:36 1994, 1993, 1996, we can also sort by date, let's comment this out,
+01:41 say .sort, published, let's sort in reverse order,
+01:47 newest which was 2050, I think we might have been fooling around with that
+01:52 or no actually I don't know where those came from.
+01:55 Anyway, 2050, 2038, 2037, 2030 and so on.
+01:58 Obviously, sorted in reverse order.
+02:01 What if I want to sort by the title and then any time the title matches
+02:05 I want to see the newest one of those.
+02:08 We can do that as well, so very very similarly we can say sort
+02:14 and then we just give it one of these objects with multiple values,
+02:16 so you want to sort by title, there's your sort by title ascending
+02:22 and then after that, if any of the titles match,
+02:26 let's show the newest one first, so sort by title ascending
+02:30 and then published descending, let's try that.
+02:33 Great, ok so here notice that these titles are the same,
+02:36 you might have noticed that before, but here's 1994 and here's 1993,
+02:40 so any time the title matches, we get the newest one first,
+02:44 I don't know if any others are in here with title matches.
+02:46 This first one proves it, this is how it works,
+02:50 sort by that and then by that, and you can have as many then-bys as you like
+02:54 and they can either be ascending or descending,
+02:57 so here we're sorting by title first and then by published.
+02:59 The other thing that's important to notice is
+03:02 everything in MongoDB is case sensitive when you're working with strings,
+03:05 so that's probably going to play into this somewhere along the way.
+03:09 All right, so sorting pretty straightforward, just use these field names
+03:13 and then the direction you want to sort.
+03:15 The other thing that's worth paying attention to is
+03:18 you are going to want to make sure that you have an index
+03:20 so this sorting is actually fast, and we'll talk about that
+03:22 when we get to the performance section.
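Here is a small pure-Python sketch of a single-field sort with a direction, `1` for ascending and `-1` for descending. Python's default string ordering compares code points. For plain ASCII titles, that gives the same case-sensitive order MongoDB uses when no collation is specified: symbols first, then uppercase, then lowercase. The helper name `sort_by` and the titles are made up for illustration.

```python
from typing import Any


def sort_by(docs: list[dict[str, Any]], field: str, direction: int) -> list[dict[str, Any]]:
    """Model of .sort({field: direction}): positive is ascending, negative is descending."""
    return sorted(docs, key=lambda d: d[field], reverse=direction < 0)


books = [{"title": "mongo"}, {"title": "Python"}, {"title": "!Bang"}, {"title": "apple"}]

# Case-sensitive: "Python" sorts before "apple" because 'P' < 'a'.
print([b["title"] for b in sort_by(books, "title", 1)])
print([b["title"] for b in sort_by(books, "title", -1)])
```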
--- a/transcripts/ch4-mongo-shell/12.txt
+++ b/transcripts/ch4-mongo-shell/12.txt
@ -0,0 +1,21 @@
+00:01 Let's review sorting as a concept.
+00:03 So there's a sort function right on the result set
+00:05 on the cursor that comes back from find,
+00:07 and the way it works is we pass it some prototypical json document;
+00:11 but now instead of equality meaning matching,
+00:14 it means tell me the thing and the direction that you want to sort.
+00:17 So here we want to say sort all the books descending
+00:21 show me the most recently published to the oldest, right,
+00:25 show me the most recent books basically.
+00:27 Now this works pretty well, we could put anything that has a direction
+00:29 like a minus one, or one, I think you could even put higher multiples
+00:33 like ten and 20, 50, -10, but use one and minus one, keep your sanity.
+00:37 So this works well for one field, if we want to sort just by published,
+00:40 but if I want to sort by one thing, and then another,
+00:43 well we just put more into this document that we passed to sort,
+00:46 so we're going to say sort by title ascending
+00:49 and then sort by published descending,
+00:51 we run this, we saw that we get the results in our demo,
+00:54 first we sorted ascending by the title and any time they matched
+00:59 we sorted descending by the publish date.
+01:02 So first the 1994, A Nutshell Handbook, and then the 1993 one.
--- a/transcripts/ch4-mongo-shell/13.txt
+++ b/transcripts/ch4-mongo-shell/13.txt
@ -0,0 +1,27 @@
+00:01 Inserts are one of the simpler operations in MongoDB actually.
+00:04 So we just go db.collection name, in this case db.book.insert
+00:08 and we give it the thing to insert.
+00:10 Now, if we don't specify an id, an _id
+00:13 which generally we want to let the database generate
+00:16 but it's not always true, like we could have people and their primary key,
+00:20 their id could be their social security number, and that you would provide,
+00:23 so in this case, we're not going to provide an id,
+00:26 we're going to type in title and isbn, and those kinds of things.
+00:29 And then if we just do a find, that would come back and get the first one
+00:32 maybe say this is our first insert, we'd get something back like this,
+00:35 let's say we specified the isbn, the title,
+00:37 the author, the published and the publisher,
+00:40 this is a relationship over to the publisher table,
+00:42 which we haven't played with yet.
+00:44 So those were all set by us, you can see "Winning With MongoDB"
+00:47 and down here we have "Winning With MongoDB",
+00:50 but the _id, because we didn't specify it was auto generated to an object id.
+00:54 So unless you have a good reason to pick another type of id,
+00:57 this is probably the best one for Mongo, but it could have been a string,
+01:00 like I said, it could have been a social security number
+01:03 or it could be just numerical if you want to have a 1234,
+01:06 all of those kind of put the burden on you to manage the uniqueness of that id
+01:10 and there is a unique constraint on _id for every table or collection.
+01:15 So that's how inserts work, you just give it this document
+01:17 and it stores it more or less directly in the database
+01:20 except for that it will generate this _id as an object id if needed.
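Here is a pure-Python sketch of the two insert rules just described: generate an `_id` when the document has none, and reject a duplicate `_id`. It is a model, not MongoDB's implementation. In particular, the generated id is only a stand-in: 12 random bytes shown as 24 hex characters, whereas a real ObjectId also encodes a timestamp and a counter. `insert_one` and `DuplicateKeyError` here are local, hypothetical names.

```python
import secrets
from typing import Any


class DuplicateKeyError(Exception):
    """Raised when an _id already exists (mirrors the unique constraint on _id)."""


def insert_one(collection: list[dict[str, Any]], doc: dict[str, Any]) -> Any:
    """Store a copy of doc, generating an _id if it is missing; return the _id."""
    doc = dict(doc)  # the model stores a copy rather than the caller's dict
    if "_id" not in doc:
        # Stand-in for an ObjectId: 24 random hex characters (not a real ObjectId).
        doc["_id"] = secrets.token_hex(12)
    if any(existing["_id"] == doc["_id"] for existing in collection):
        raise DuplicateKeyError(doc["_id"])
    collection.append(doc)
    return doc["_id"]


books: list[dict[str, Any]] = []
generated = insert_one(books, {"title": "Winning With MongoDB", "isbn": "hypothetical-isbn"})
print(generated)  # 24 hex characters, different on every run
insert_one(books, {"_id": "123-45-6789", "name": "someone"})  # caller-managed _id
```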
--- a/transcripts/ch4-mongo-shell/14.txt
+++ b/transcripts/ch4-mongo-shell/14.txt
@ -0,0 +1,39 @@
+00:01 If inserts are simple, updates maybe not so much.
+00:03 In fact, there are two types of updates that we're going to look at;
+00:06 first, we're going to look at what is the conceptually more simple one,
+00:09 but also slightly more problematic.
+00:11 So I'm going to call this the whole document update
+00:14 and the way you might use this is you might go to the database,
+00:17 do a query, get a document back, make a change to it
+00:19 and say here, push this whole document
+00:22 back over the top of the existing one in the database, kind of ORM style.
+00:26 The other one that we're not talking about here would be the in place updates,
+00:31 so you might say go increment the view count of this post
+00:34 without retrieving it, without changing the other parts,
+00:38 ok, so how does the whole document update work?
+00:40 Well, first of all, we're going to do an update
+00:43 if we come back and we look at it, we'll see maybe we've changed the title here,
+00:46 the author is still the same, but we had to pass the author,
+00:48 we had to pass the published and the isbn back,
+00:52 okay, in fact also the id, so all that stuff we had to put back,
+00:55 basically the way it works is we're going to do a where clause here
+00:58 so find it by the primary key, this great long object id
+01:02 and then here is the entire whole document
+01:05 we want to replace that document with.
+01:07 Now because of the way it's working here,
+01:09 there's a couple of features or settings you might want to control here,
+01:12 so you might need to set these, you might not depending on what you're doing,
+01:16 the default is if the where clause does not match, nothing will happen,
+01:20 there will be no kind of upsert, there will not be a new document added
+01:24 because we didn't find one, just nothing happens.
+01:26 So if you say upsert is true and you run this update,
+01:29 it will say I didn't find this document, so let me create it for you,
+01:32 so you could control that here.
+01:34 Similarly with multi equal true, normally unlike sql statements
+01:37 update only updates the first item it finds
+01:40 even if the where clause would match ten things, it only updates one of them.
+01:43 So that's a little bit funky, but if you think it's entirely replacing the record
+01:48 like why would that whole record be duplicated ten times,
+01:51 I don't know, it's kind of weird, but if you do want to update multiple objects,
+01:54 multiple documents in this collection, be sure to set multi to true
+01:57 (MongoDB only allows that with update operators like $set, not whole-document replacements); both of those orange values default to false.
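Here is a pure-Python sketch of a whole-document update with the upsert flag. It replaces the first match only and keeps its `_id`. With `upsert=True` and no match, it inserts the replacement instead. It deliberately has no `multi` option, because MongoDB rejects `multi: true` for replacement-style updates; `multi` applies to operator updates such as `$set` or `$inc`. The function name and return shape are hypothetical, and the upsert `_id` handling is simplified.

```python
import secrets
from typing import Any


def replace_one(
    collection: list[dict[str, Any]],
    where: dict[str, Any],
    replacement: dict[str, Any],
    upsert: bool = False,
) -> tuple[int, bool]:
    """Model of db.Book.update(where, replacement, {upsert: ...}).

    Returns (matched_count, upserted).
    """
    for i, doc in enumerate(collection):
        if all(doc.get(k) == v for k, v in where.items()):
            new_doc = dict(replacement)
            new_doc["_id"] = doc["_id"]  # _id is immutable: keep the original
            collection[i] = new_doc      # every other field comes from the replacement
            return 1, False
    if upsert:
        new_doc = dict(replacement)
        if "_id" not in new_doc:
            # Simplified: take _id from the where clause, else a random stand-in id.
            new_doc["_id"] = where.get("_id", secrets.token_hex(12))
        collection.append(new_doc)
        return 0, True
    return 0, False  # default: no match means nothing happens


books = [{"_id": 1, "title": "Old Title", "author": "Someone"}]
print(replace_one(books, {"_id": 1}, {"title": "New Title"}))  # author is gone now
print(replace_one(books, {"_id": 2}, {"title": "Other"}))      # no match, no upsert
print(replace_one(books, {"_id": 2}, {"title": "Other"}, upsert=True))
```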
--- a/transcripts/ch4-mongo-shell/15.txt
+++ b/transcripts/ch4-mongo-shell/15.txt
@ -0,0 +1,11 @@
+00:00 After you've inserted some documents and maybe updated a few,
+00:03 it might be time to get rid of the old ones, so let's talk about deleting them.
+00:06 So again, it's db.<collection name>, and we're going to apply a delete operation.
+00:11 And here we can say I'd like to delete one of them,
+00:13 delete one, or maybe I want to delete a whole set of them, right,
+00:18 the delete one we're passing in something that should be unique,
+00:20 like the primary key, and delete many, maybe a bunch of them have the same title,
+00:23 maybe there are a couple of editions
+00:25 like a kindle and a paperback version or something like that.
+00:27 So just get rid of all of them with the title being some title.
+00:30 So, delete one, delete many— pretty straightforward.
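The shell's `deleteOne` and `deleteMany` behave like the pure-Python sketch below: the first removes at most one matching document, the second removes every match, and both report how many were deleted. PyMongo's `delete_one` and `delete_many` expose the same count as `deleted_count`. The helpers and sample data here are hypothetical and use equality matching only.

```python
from typing import Any


def _matches(doc: dict[str, Any], where: dict[str, Any]) -> bool:
    return all(doc.get(k) == v for k, v in where.items())


def delete_one(collection: list[dict[str, Any]], where: dict[str, Any]) -> int:
    """Remove the first matching document; return 1 or 0."""
    for i, doc in enumerate(collection):
        if _matches(doc, where):
            del collection[i]
            return 1
    return 0


def delete_many(collection: list[dict[str, Any]], where: dict[str, Any]) -> int:
    """Remove every matching document; return how many were removed."""
    keep = [d for d in collection if not _matches(d, where)]
    deleted = len(collection) - len(keep)
    collection[:] = keep  # mutate in place, like deleting from the collection
    return deleted


books = [
    {"_id": 1, "title": "Some Title", "format": "kindle"},
    {"_id": 2, "title": "Some Title", "format": "paperback"},
    {"_id": 3, "title": "Another Title"},
]
print(delete_one(books, {"_id": 3}))              # by primary key
print(delete_many(books, {"title": "Some Title"}))  # every edition
```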
--- a/transcripts/ch4-mongo-shell/16.txt
+++ b/transcripts/ch4-mongo-shell/16.txt
@ -0,0 +1,90 @@
+00:01 It's time to look at the atomic updates.
+00:03 We already talked about the whole document updates and how they work,
+00:05 but sometimes it's not really what you want;
+00:08 in the beginning when we talked about NoSQL,
+00:10 we saw that the NoSQL databases gave up things
+00:12 that traditional relational databases embraced or considered sacred.
+00:17 One of those were the ACID properties, or some part of the ACID properties,
+00:21 and MongoDB does say look, things like joins and transactions,
+00:24 transactions mainly being part of the ACID properties,
+00:27 is something that MongoDB doesn't promise
+00:30 so these whole document updates really require an additional layer in the app tier
+00:35 called optimistic concurrency, and usually it's fine,
+00:38 sometimes it's not, and you can catch it and say hey look
+00:41 somebody saved this out from under you
+00:44 and you do want to keep your changes, their changes,
+00:45 there's things you can do about those types of situations,
+00:47 but in your app, not in the database.
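Optimistic concurrency, mentioned above, can be sketched with a version counter. You remember the version you read, save only if the stored version is unchanged, and otherwise raise an error the app can handle. This in-memory model uses hypothetical names (`version`, `save_if_unchanged`, `ConcurrencyError`). Against a real database, the compare and the write must happen in one step. One common approach is to put the version in the update's where clause and check whether any document matched.

```python
from typing import Any


class ConcurrencyError(Exception):
    """Someone else saved the document after we read it."""


def save_if_unchanged(store: dict[Any, dict[str, Any]], doc: dict[str, Any]) -> None:
    """Write doc back only if the stored version matches the version we read."""
    current = store[doc["_id"]]
    if current["version"] != doc["version"]:
        raise ConcurrencyError(f"document {doc['_id']} changed since it was read")
    store[doc["_id"]] = dict(doc, version=doc["version"] + 1)


store = {1: {"_id": 1, "title": "Original", "version": 0}}

mine = dict(store[1])    # we read the document...
theirs = dict(store[1])  # ...and so does someone else

theirs["title"] = "Their edit"
save_if_unchanged(store, theirs)  # succeeds, version becomes 1

mine["title"] = "My edit"
try:
    save_if_unchanged(store, mine)  # our copy still says version 0
except ConcurrencyError as err:
    print("conflict:", err)
```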
+00:50 On the other hand, MongoDB does support atomic transactional behavior
+00:54 as long as it happens on a single document,
+00:57 so if we have a document and let's go ahead and create
+01:00 a whole new collection here called BookReads
+01:03 notice it doesn't exist yet, and we're going to insert just an isbn
+01:05 and then how many times it's been read,
+01:08 I think of like the Goodreads service or something like that,
+01:10 like I want to know how many of my friends read this book,
+01:13 we'll use a simple, simple version of that.
+01:15 So let's go over here and notice we inserted one and if I refresh,
+01:18 we should now have that in here in our one record, like so.
+01:22 So we could go and we could do this for this whole document style things,
+01:26 I could say book and of course we will be doing this in Python very likely
+01:33 we're just about to leave the Javascript in the dust,
+01:36 so let's just print out our book that we got here,
+01:39 notice this has actually given us the same thing back,
+01:41 and we could say the read count += 1, we could increment that,
+01:47 and then we could say go over here to the same collection,
+01:50 we could say update, I would like to update with this,
+01:55 here's the where clause, and the thing I want to update with is the book,
+01:58 so let's say _id : book._id, okay, so this should do that like so,
+02:07 and let's run one more query here at the end to get it back, to see it again.
+02:12 Oh yes, find is not going to work, findOne however,
+02:19 we don't want to update a whole query,
+02:21 whatever that means it doesn't make any sense
+02:23 but let's get one of them back, we know this is really going to be unique
+02:25 and then let's make this change, ok
+02:27 so notice, now we've got a read count of one, we do this a few times, bam bam
+02:32 a read count is incrementing over and over and over down here,
+02:35 and we're updating one record,
+02:37 so this is cool but this is not part of the ACID property guarantees,
+02:40 this could be problematic in lots of ways
+02:43 so what we're going to look at now, are the operators that we can use
+02:46 to basically do almost transactional stuff
+02:49 and do it in a much more high performance way.
+02:52 So let's go over here again, and let me grab this little clause here,
+02:57 all right so we got our document back again
+02:59 and now what we're going to do, is we're going to do our db
+03:03 let me just grab this collection bit,
+03:06 and we're going to do our update, in fact update is going to look almost the same,
+03:09 we are going to do this, but instead of passing the whole document
+03:16 we're going to pass just an in place atomic operator,
+03:18 all right so what are we going to do, let's suppose we want somebody
+03:23 to basically do the same thing, increment that
+03:26 alright, I guess we could just use isbn, that works as well right;
+03:32
+03:37 we're going to need something in our little where clause here, isbn will do.
+03:41 Now by default, this is going to replace whatever's in there,
+03:44 that's going to be bad, but what we really want to do is
+03:47 we want to increment that one value, so we can use another operator,
+03:50 say $inc for increment, and then what do I want to increment,
+03:58 I want to increment let's see what is it called— ReadCount,
+04:04 so I want to increment ReadCount by one,
+04:07 I could increment it by negative one, I could increment it by ten.
+04:10 So let's run this, now notice we updated one record
+04:13 and let's put this in a way that looks better, nine, ten, eleven, twelve—
+04:18 there we go, check that out, isn't that cool?
+04:21 So what's happening here is it's actually going into Mongo,
+04:24 go find the document, just change that number right there,
+04:27 just add one to it for me, you don't have to pull the whole thing back,
+04:31 make changes and possibly try to put it back and someone else changed it,
+04:34 none of those things, this is entirely atomic and safe
+04:38 in a multi threaded, multi server environment,
+04:42 because MongoDB guarantees individual updates
+04:44 to individual documents are atomic
+04:47 and because we're not depending on the value,
+04:50 we're not reading it, changing it in memory and putting it back,
+04:53 changing it in our program's memory, not Mongo's, and putting it back,
+04:56 then we're not going to have any problems.
+04:58 There's a bunch of cool operators like this
+05:00 and we'll see that MongoEngine actually naturally behaves in this style
+05:04 not the document style, even though it's an object document mapper
+05:08 which is really really delightful.
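Here is a pure-Python illustration of why the read-modify-write version is risky and `$inc` is not. The first function replays a fixed interleaving in which two clients read the same count before either writes, so one increment is lost. The second uses a `threading.Lock` as a stand-in for MongoDB's per-document atomicity, so every increment is applied to the current value. Both are models, not MongoDB code; the `ReadCount` field name comes from the demo.

```python
import threading


def lost_update_demo(start: int = 0) -> int:
    """Two clients read, add one in their own memory, and write back."""
    db = {"ReadCount": start}
    a = db["ReadCount"]      # client A reads
    b = db["ReadCount"]      # client B reads the same value
    db["ReadCount"] = a + 1  # A writes its in-memory result
    db["ReadCount"] = b + 1  # B overwrites it: A's increment is lost
    return db["ReadCount"]


def atomic_inc_demo(clients: int = 50, start: int = 0) -> int:
    """Each client sends the equivalent of {'$inc': {'ReadCount': 1}}."""
    db = {"ReadCount": start}
    lock = threading.Lock()  # stand-in for per-document atomicity on the server

    def inc(field: str, amount: int) -> None:
        # Read and write happen as one step, like the server applying $inc.
        with lock:
            db[field] = db.get(field, 0) + amount

    threads = [threading.Thread(target=inc, args=("ReadCount", 1)) for _ in range(clients)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return db["ReadCount"]


print(lost_update_demo())  # 1, although two clients incremented
print(atomic_inc_demo())   # 50
```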
--- a/transcripts/ch4-mongo-shell/17.txt
+++ b/transcripts/ch4-mongo-shell/17.txt
@ -0,0 +1,59 @@
+00:01 Despite the fact that MongoDB is a NoSQL database
+00:03 it does adhere to the ACID properties under certain circumstances.
+00:07 Primarily that means updates to individual documents are guaranteed to be atomic,
+00:12 and along with those, we can get great performance
+00:15 as well as safety if we don't pull the document back for the database,
+00:19 make changes and push it back hoping no one else has changed it
+00:22 during that intervening time there,
+00:24 but in fact we can go to the database and go make this change here
+00:27 I don't care if it's a 100k document, don't pull anything back
+00:30 just make this little change and that happens atomically and safely.
+00:34 So the operators that we have to work with are $inc to increment, $mul to multiply,
+00:37 $rename to rename a field, $setOnInsert, $set, and $unset, which basically deletes a field,
+00:42 and $min and $max: with $min I would like to set the value
+00:45 but only if the value I'm passing is lower than the one in the document,
+00:48 and $max is the opposite,
+00:51 only set the value to this if this new value is bigger than the existing one.
+00:55 You can also use $currentDate to basically grab the server date and save it there as well.
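Here is a pure-Python sketch of several of these field operators applied to an in-memory document. It models `$set`, `$unset`, `$inc`, `$mul`, `$min`, `$max` and `$rename`, including the documented behavior for missing fields: `$inc` adds the field, and `$mul` creates it as 0. It leaves out `$setOnInsert` and `$currentDate`, and does not check for conflicting operators on the same field. `apply_update` and the field names are hypothetical.

```python
from typing import Any


def apply_update(doc: dict[str, Any], update: dict[str, dict[str, Any]]) -> dict[str, Any]:
    """Apply a subset of MongoDB's field update operators to a copy of doc."""
    out = dict(doc)
    for op, fields in update.items():
        for field, arg in fields.items():
            if op == "$set":
                out[field] = arg
            elif op == "$unset":
                out.pop(field, None)  # the argument value is ignored
            elif op == "$inc":
                out[field] = out.get(field, 0) + arg
            elif op == "$mul":
                out[field] = out.get(field, 0) * arg  # missing field becomes 0
            elif op == "$min":
                if field not in out or arg < out[field]:
                    out[field] = arg
            elif op == "$max":
                if field not in out or arg > out[field]:
                    out[field] = arg
            elif op == "$rename":
                if field in out:
                    out[arg] = out.pop(field)
            else:
                raise ValueError(f"operator not modeled: {op}")
    return out


book = {"_id": 1, "view_count": 0}
for _ in range(3):
    book = apply_update(book, {"$inc": {"view_count": 1}})
print(book)  # view_count went 0 -> 3 without ever replacing the document
```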
+01:00 So these are the in place individual updates and we can see how that works
+01:03 so we'll come over here and let's insert just a book
+01:06 and this time our book has a view count, right, the view count is zero,
+01:09 maybe every time somebody pulls up the book we want to increment that,
+01:13 so we can say test.update and give it the object id
+01:16 right here is a real simple one so it fits onto the screen basically
+01:20 you can say $inc increment view count by one,
+01:23 and we do this a few times, so we've done it three times
+01:26 it should go from zero to— well you guessed it, three
+01:29 and it all happened atomically in the database,
+01:32 without us ever pulling it back or worrying about any sort of concurrency whatsoever.
+01:36 So this is great for working with individual fields
+01:39 sometimes we need to work with arrays,
+01:42 so we saw like for example our ratings object
+01:44 maybe we want to work with that atomically.
+01:47 So MongoDB has operators for that as well,
+01:50 so we have things like $addToSet, so suppose it's got like a votes list,
+01:55 people who have voted on this book,
+01:57 not the values just keep it simple, just the users who have voted
+02:00 and that contains user id, so you could say add to set user id when they vote
+02:04 and that would actually only add them there, if they're not already in that list;
+02:08 what's cool about that is
+02:11 if they push the little vote button twice, it doesn't count twice,
+02:14 just either you add it there and the person has now voted for or they haven't.
+02:17 Another good example is tags, like think stack overflow, I want to tag a post
+02:21 so you could say add the tag python, add the tag mongo,
+02:24 and if it's already there, it's just going to leave it alone
+02:26 if it's new, if it's not there it will actually add the tag.
+02:29 So $addToSet is really cool for kind of uniqueness on these subarrays.
+02:33 We also have $pop and $pull for pulling things out,
+02:35 and $pullAll, say I want to remove all the votes by a particular user, things like that.
+02:40 Also $push, so $push is like $addToSet
+02:42 without the uniqueness constraint, and that's it,
+02:45 I definitely recommend you think about these atomic updates,
+02:47 they are not simple, but they are better performing
+02:51 and they are definitely safer as well.
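The array operators can be modeled the same way. In the pure-Python sketch below, `$push` always appends, `$addToSet` appends only if the value is absent, `$pull` removes every element equal to a value, `$pullAll` removes every element found in a list, and `$pop` removes the last element (1) or the first (-1). `$push` and `$addToSet` create a missing array, while the removal operators leave a missing field alone. Only plain values are modeled, not `$pull` conditions or `$each`; the helper and the `tags` field are hypothetical.

```python
from typing import Any

ARRAY_OPS = {"$push", "$addToSet", "$pull", "$pullAll", "$pop"}


def apply_array_update(doc: dict[str, Any], update: dict[str, dict[str, Any]]) -> dict[str, Any]:
    """Apply a subset of MongoDB's array update operators to a copy of doc."""
    out = {k: (list(v) if isinstance(v, list) else v) for k, v in doc.items()}
    for op, fields in update.items():
        if op not in ARRAY_OPS:
            raise ValueError(f"operator not modeled: {op}")
        for field, arg in fields.items():
            if op in ("$push", "$addToSet"):
                items = out.setdefault(field, [])
                if op == "$push" or arg not in items:
                    items.append(arg)
            elif field not in out:
                continue  # removing from a missing array is a no-op
            elif op == "$pull":
                out[field] = [x for x in out[field] if x != arg]
            elif op == "$pullAll":
                out[field] = [x for x in out[field] if x not in arg]
            elif op == "$pop" and out[field]:
                out[field].pop(-1 if arg == 1 else 0)
    return out


post = {"_id": 1, "tags": []}
post = apply_array_update(post, {"$addToSet": {"tags": "python"}})
post = apply_array_update(post, {"$addToSet": {"tags": "python"}})  # already there
post = apply_array_update(post, {"$push": {"tags": "mongo"}})
post = apply_array_update(post, {"$push": {"tags": "mongo"}})       # duplicates allowed
print(post["tags"])
```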
+02:54 Like I said before, it's great that the ODM, the object document mapper
+02:59 that we're going to look at, MongoEngine automatically does this behind the scenes,
+03:02 we don't ever have to even know how they work,
+03:05 but it's important that you know that they exist and why they're good for you
+03:08 when you look at the logs, and you look at the performance
+03:11 and think about things in that way.
--- a/transcripts/ch4-mongo-shell/2.txt
+++ b/transcripts/ch4-mongo-shell/2.txt
@ -0,0 +1,93 @@
+00:01 So let's connect to MongoDB,
+00:03 I already have it running as a separate process hidden away,
+00:05 we'll talk about how to run MongoDB later,
+00:08 you should have seen in the setup how to get it started
+00:11 and then we'll talk about the deployment side of things later in the class.
+00:14 So MongoDB is running, it's running on the local machine on default ports,
+00:18 no security, nothing like that for getting started,
+00:21 it's only listening on 127.0.0.1
+00:25 so it's not listening on the public network, on my machine,
+00:28 so for that reason, more or less plus firewalls,
+00:31 the authentication part we're going to turn off for a little bit,
+00:35 just so we can start from the beginning;
+00:37 okay, the other thing I have is I have set up MongoDB in my path,
+00:41 so I can ask which mongo, and it comes back with something,
+00:45 so what I actually did is I went to MongoDB
+00:48 and I just downloaded the tarball, and I unzipped it,
+00:51 and I sort of changed the naming around, so it's in this path here,
+00:53 so here's the actual executable.
+00:55 mongo is the name of the shell, mongod is the name of the server, d for daemon,
+00:59 so in order to connect to MongoDB, there's a ton of options we could give it
+01:03 and like I said, when we get to the deployment and production stuff at the end,
+01:06 we'll have to pass all sorts of things like authentication,
+01:09 and SSL flags, and whatnot, server names here
+01:12 but in the beginning, we can just type mongo.
+01:15 And you'll see, right here, we're running 3.4.4
+01:19 and it's connected to local host 27017,
+01:24 that's the default port for standalone servers,
+01:26 there's 27 thousand, 18, 19 and 20 are reserved
+01:30 or typically the default for other types of things.
+01:32 So my system is not exactly set up right,
+01:35 but it's not a production machine it's just my dev machine, okay.
+01:37 So now we're connected, what do you do?
+01:40 Well, probably the first thing you want to do is
+01:42 focus on a particular database, so you can say show dbs
+01:45 and it will show you the various databases, how large they are things like that,
+01:50 so we're going to work with the bookstore for our examples in this chapter.
+01:55 Later, we're going to work on something that maps over to a car dealership,
+01:59 so those are the two databases that we're going to be working with,
+02:02 you can see that I have got some for my various sites here and things like this,
+02:06 I have actually broken it apart so like Talk Python the core data
+02:09 it's not really zero gigs, it's just rounding down, it's like 20 MB or something,
+02:14 but the analytics is half a gig here, and it's actually much more if you export it.
+02:20 So we may have more than one database for our app like I have on my podcast,
+02:23 or you might just have one for the trading site, like we do here.
+02:27 Great, so now I want to maybe find a book in the bookstore,
+02:31 so how do I do that— the first thing you have to do is
+02:34 you have to activate the database, so you're going to say
+02:37 db.command, whatever that is, and give it some command here,
+02:40 where db refers to one of these databases, so the way we do that
+02:43 is we say use say bookstore, like this,
+02:47 now it says great, we switched to bookstore,
+02:49 and then we could say db. first of all what are the equivalent of tables
+02:53 in MongoDB these are called collections, because they're not tabular,
+02:56 so we can say show collections, and this is what is contained inside of bookstore,
+03:01 there's a Book, case sensitive, Publisher, Test and User, ok.
+03:06 So if I wanted to find the books let's say db.Book.find
+03:09 let's say just limit one, so it doesn't go crazy on our shell here,
+03:13 so basically, the way it works is we connect,
+03:16 we figure out what the database we want to work with is,
+03:19 we say use that database and then we say db.collection name
+03:22 and then we typically fire these commands at the collection.
+03:25 Now, what's interesting is what's missing here:
+03:28 there's no create database command, and inside of
+03:32 here there's no create table or create collection command,
+03:36 so like Python in some ways, MongoDB is very, very dynamic,
+03:41 so if we wanted to create a table, let's go and just create a collection
+03:45 and we won't create a whole new database,
+03:47 so what database do we have? We have the bookstore,
+03:49 and we have those four collections: Book, Publisher, Test and User,
+03:52 so if I want to create one called logins,
+03:56 or let's just say Log, for history,
+04:00 I could even issue a find command against that
+04:03 and there's just nothing, it's just empty.
+04:06 If we go up here and we say what's here, there's no log,
+04:09 but if I actually try to interact with this, we'll talk about inserts in a little bit,
+04:12 but let's just really quickly see how this works,
+04:17 I would just say let's say name or action is view, something like that,
+04:22 if I insert this, crazily enough it just works and something was inserted;
+04:26 if we look, there's now a Log, so db.Log (case sensitive).find, there,
+04:32 and it inserted this thing, action with the value view, and it gave it an _id, whatever it is,
+04:36 this is called an object id, we'll talk about that later.
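The "no create collection" behavior just described can be modeled in a few lines. This is a toy, in-memory sketch under stated assumptions: `ToyDatabase` and its methods are hypothetical names that only mirror the shell commands above (show collections, find, insert, drop); it is not the MongoDB or PyMongo API, and it hands out integer ids where MongoDB would generate ObjectIds.

```python
import itertools


class ToyDatabase:
    """Hypothetical in-memory stand-in for a MongoDB database.

    Models only the behavior shown above: a collection does not exist until
    the first insert, querying a missing collection simply returns nothing,
    and an _id is generated when the document lacks one.
    """

    def __init__(self):
        self._collections = {}              # collection name -> list of documents
        self._next_id = itertools.count(1)  # MongoDB itself generates ObjectIds

    def show_collections(self):
        return sorted(self._collections)

    def find(self, name):
        # No error for an unknown collection, just an empty result.
        return [dict(doc) for doc in self._collections.get(name, [])]

    def insert_one(self, name, doc):
        doc = dict(doc)
        doc.setdefault("_id", next(self._next_id))  # auto-generate _id if absent
        # The collection springs into existence on its first insert.
        self._collections.setdefault(name, []).append(doc)
        return doc["_id"]

    def drop(self, name):
        self._collections.pop(name, None)


db = ToyDatabase()
print(db.find("Log"))            # []
db.insert_one("Log", {"action": "view"})
print(db.show_collections())     # ['Log']
db.drop("Log")
print(db.show_collections())     # []
```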
+04:39 Okay, so this shell is how we work with MongoDB,
+04:41 if I want to get rid of it, I could go here and say drop collection,
+04:47
+04:50 just drop, right,
+04:54 and now log is gone again.
+04:56 So this is your base level admin tool
+04:59 and it works everywhere, so we could ssh into our Linux server
+05:02 on Digital Ocean, or on AWS, or whatever,
+05:06 and we could do this; we could even sort of tunnel this through there,
+05:10 but we're going to see that there are actually some better options
+05:13 any time we're running somewhere
+05:15 where we can just tunnel over to the server.
--- a/transcripts/ch4-mongo-shell/3.txt
+++ b/transcripts/ch4-mongo-shell/3.txt
@ -0,0 +1,31 @@
+00:01 So let's review the main concepts around using the shell.
+00:03 Remember, you just type mongo and hit enter
+00:05 and it will connect to your local server with everything default: default port,
+00:08 default localhost, no account, etc.,
+00:11 and once we're connected, we'll be in here,
+00:15 and it'll say connected to the server,
+00:17 what version of the shell, what version of the server,
+00:19 3.4.4 is the latest at the time of this recording,
+00:22 but maybe not at the time you are watching it,
+00:24 like with all servers, newer is better.
+00:26 Okay, so the first thing that we might want to do is see what databases are here,
+00:29 and we do that with the show dbs command, we hit enter,
+00:32 and it shows you the various databases that are listed.
+00:35 Then next we want to activate one,
+00:37 so that we can issue commands to it through the db.collection
+00:40 or other high level operations,
+00:42 so in this case let's work with talk_python, so we'd type use talk_python.
+00:46 And it'll say great, switched to database talk_python,
+00:50 and if you're wondering, you can always type, as you saw me do,
+00:53 db, hit enter, and it will say talk_python; cool, and then
+00:55 we could say well what collections exist in talk_python?
+00:57 This is actually pretty straightforward,
+00:59 the document design I think is pretty interesting
+01:01 but there's not many collections,
+01:04 so we have episodes, guest, reviews and then while developing it,
+01:08 I turned on profiling to see where it was slow and where it was fast,
+01:11 where I need indexes, we'll talk more about that near the end of the course.
+01:14 So we have these four collections, and now if we want to find an episode
+01:18 we'd say db.episodes.find and give it some search,
+01:23 or sort, or something to that effect.
+01:25 So this is how we get started and connect with the shell.
--- a/transcripts/ch4-mongo-shell/4.txt
+++ b/transcripts/ch4-mongo-shell/4.txt
@ -0,0 +1,182 @@
+00:01 Now let's see how we do probably the main thing
+00:04 that you do in databases and that is query.
+00:06 So here we are in the Mongo shell still,
+00:09 and I'm using the bookstore database,
+00:11 so what I want to do is find some particular books;
+00:14 remember, we have Book, Publisher, Test
+00:18 (we can really remove Test, it doesn't actually do anything), and then User,
+00:21 so those three are actually used.
+00:23 Let's go and remove test just so that it is gone.
+00:29 Now we have the ones we're actually using.
+00:31 Now, when we're getting started, it's probably worthwhile to just say db.Book.find
+00:36 as an empty query just like kind of select star if you will,
+00:39 you know show all of the things that are in there,
+00:42 there, that's totally obvious what that is, right,
+00:44 you see the structure, right if you can like kind of exist in the matrix
+00:47 you could entirely see the structure there, but let's do that better.
+00:50 Notice only a certain number of items were returned, 20 at a time by default in the shell,
+00:54 there's actually like a quarter million books,
+00:57 so we didn't get them all which is good,
+00:59 so if we want more, we just type "it" and it will actually get more and so on.
+01:03 Okay, so this is not super helpful, let's make this more helpful;
+01:06 so here we can go over and say I want this to be pretty,
+01:10 and in fact, if I just want one of them I could just limit this to the first one,
+01:14 or let's just limit it to two so we see a couple of examples.
+01:17 There, now we're starting to see the structure.
+01:21
+01:23 Let's go here; okay, so now we've got a book,
+01:26 right here you can see the top level document,
+01:29 it doesn't put the results in arrays,
+01:32 like it doesn't print out an array, it just prints
+01:34 a whole bunch of individual results, in this case two,
+01:36 so here we have our id, there's always an underscore id in the database
+01:40 like this is the name of the primary key,
+01:42 you can have it look different in Python,
+01:44 you can say this thing maps actually to the primary key
+01:47 when you are modeling this with classes and so on,
+01:50 but down at the Javascript and the MongoDB level,
+01:53 this is always the name of the primary key,
+01:55 if you don't give it one when you insert the thing, it's auto generated,
+01:58 and so if you don't have a great reason to care about what the id looks like,
+02:02 probably using this object id is the best bet.
+02:05 So our books have isbns, they have titles, they have authors,
+02:08 I kind of wish it were a little more Pythonic, with lowercase t's and a's,
+02:12 but this database came from somewhere else and it's like this
+02:15 so we're just going to roll with it.
+02:17 Okay, so we've got dates; notice, JSON doesn't support dates
+02:20 nor does it support object ids, but the results here do,
+02:23 and so dates and object ids are sort of extensions that BSON brings to JSON.
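To see why dates count as an extension, here is a small sketch using only Python's standard `json` module: plain JSON has no date type, so the encoder rejects a `datetime`, and the usual workaround (a string) loses the type on the way back. BSON, by contrast, has native date and ObjectId types. The field name `Published` and the date are made up for illustration.

```python
import json
from datetime import datetime, timezone

# Hypothetical field name and made-up date, just to have a date in the document.
book = {"Title": "From the Corner of His Eye",
        "Published": datetime(2001, 1, 1, tzinfo=timezone.utc)}

try:
    json.dumps(book)
except TypeError as exc:
    # Plain JSON has no date type, so the standard encoder refuses the datetime.
    print("not JSON serializable:", exc)

# A common workaround is to store the date as an ISO 8601 string...
as_text = json.dumps(book, default=lambda value: value.isoformat())
round_tripped = json.loads(as_text)
# ...but it comes back as a plain string, not a date.
print(type(round_tripped["Published"]).__name__)  # str
```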
+02:28 Alright, and then we have a list of these image url objects
+02:32 which have both the size and url, and so on,
+02:35 and then they also have ratings, this one has one rating, so not too many,
+02:39 let's look at the next one, it has a lot of ratings, right,
+02:42 so it has a user id that acts like a foreign key,
+02:45 a soft foreign key link, not enforced by the database,
+02:48 but a link over to the User collection, and then a value here;
+02:51 so this is what this database looks like,
+02:55 we have a title, we have an isbn, and these are like the flat things,
+03:00 and then, most importantly, we have the ratings, which we'll go play with a little bit,
+03:03 so let's start by asking this question about the books.
+03:06 So the way it works is db.Book.find, put some space in here;
+03:11 the way MongoDB queries work, there's no where clause,
+03:15 basically what you put in here is the where clause,
+03:18 and the way we do it is we pass what I think of as a prototypical json object,
+03:22 so the json object that we're going to put here,
+03:25 maybe would have something like this, let's say title, case sensitive remember,
+03:30
+03:34 is "From the Corner of His Eye", if I put this in, here we go,
+03:39 so "From the Corner of His Eye", now this is a book
+03:41 that should be in this database and we'll be able to do some queries for it;
+03:47 what this says to MongoDB is go to the Book collection
+03:49 and find every single document that has the title equal to
+03:54 "From the Corner of His Eye", and I think that there's more than one, let's see,
+03:57 yes, so we can come over here and we can do a .count,
+04:00 there's three, alright, so this is nice,
+04:03 however, what came back there, even if I did a pretty,
+04:07 is still a lot, because we've got the ratings and the image urls
+04:10 and this one has a crazy amount of ratings and so on, so we might want to get less;
+04:14 so with this find thing, let's put it here,
+04:18 this part where is this title, that is the where clause
+04:21 but in SQL, you could say something like select title, id, isbn from this table,
+04:28 so we can do that in MongoDB as well, we can do this sort of projection,
+04:31 so I can come down here and say I'm interested in title
+04:34 and anything that's truthy in Javascript, so I could put "hi",
+04:38 I could put one, I could put true; I like to put one, I don't know why,
+04:43 and let's say we want the isbn, this is case sensitive as well
+04:47 and watch what comes back now, okay, so there's our three records,
+04:51 now interestingly, each one has three keys and we specified two.
+04:55 So the way it works is Mongo is like
+04:57 you're probably going to need that primary key
+04:59 so unless you explicitly say you don't want it, you're getting it, right,
+05:02 so if we want to do this again, and I could come over here
+05:05 and I could explicitly suppress _id and put something falsy here like zero
+05:08 and then I just get isbn and title, okay.
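Here is a minimal Python sketch of the projection rule just shown, assuming plain dicts for documents. `project` is a hypothetical helper, not a PyMongo function, though PyMongo's `collection.find(filter, projection)` takes the same kind of projection document. The field names follow the Book collection as described (`Title`, `ISBN`, `Ratings`), and the ISBN and rating values are made up.

```python
def project(doc, projection):
    """Sketch of an inclusion projection such as {"Title": 1, "ISBN": 1}.

    _id comes back unless it is explicitly suppressed with a falsy value
    ({"_id": 0}); any other field comes back only if it is listed with a
    truthy value. Exclusion-style projections (including one that only
    suppresses _id) are not modeled in this sketch.
    """
    include_id = bool(projection.get("_id", 1))
    wanted = {key for key, flag in projection.items() if key != "_id" and flag}
    result = {}
    if include_id and "_id" in doc:
        result["_id"] = doc["_id"]
    for key, value in doc.items():  # keep the document's own field order
        if key in wanted:
            result[key] = value
    return result


book = {
    "_id": 1,
    "ISBN": "0000000041",                     # made-up ISBN
    "Title": "From the Corner of His Eye",
    "Ratings": [{"UserId": 7, "Value": 8}],   # made-up rating
}
print(project(book, {"Title": 1, "ISBN": 1}))
# {'_id': 1, 'ISBN': '0000000041', 'Title': 'From the Corner of His Eye'}
print(project(book, {"Title": 1, "ISBN": 1, "_id": 0}))
# {'ISBN': '0000000041', 'Title': 'From the Corner of His Eye'}
```

Any truthy flag works for inclusion, which is why `"hi"`, `1`, and `true` all behave the same in the shell.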
+05:11
+05:15 So let's go back to this. Now suppose I want to find the book with this title
+05:19 and this isbn, how do I do an and here?
+05:23 Well, the way these queries work is everything,
+05:26 basically every property of that little subdocument, must match;
+05:29 the query is a subset of the document it matches,
+05:31 so when I say title is "From the Corner of His Eye",
+05:33 that matches the title, but I could equally come up here
+05:36 and do this again and say oh also that isbn,
+05:39
+05:43 actually I don't know what it's supposed to be, let me run this real quick,
+05:46 let's say we're looking for this one, the one that ends in 41,
+05:50 so now I could come over here and say that isbn,
+05:55 in JSON or Javascript you don't technically need quotes around the key name,
+05:58 but this value is a string, so it goes in quotes like that, right,
+06:01 see, it starts with zero, it wouldn't just be a number.
+06:03 So now, if I run this, I just get the one,
+06:05 so this is the and clause, select star from book where title is this and isbn is that
+06:11 so you can create these documents
+06:14 to basically and together all the pieces that you need.
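The "prototypical document is an AND" idea can be written as a one-line predicate. This is a minimal sketch over plain dicts that covers exact-value fields only; `matches` is a hypothetical helper, and the books and ISBNs below are made up to mirror the three editions found above.

```python
_MISSING = object()  # sentinel so a missing field never equals a query value


def matches(doc, prototype):
    """True if every key/value pair in the prototype equals the document's (implicit AND)."""
    return all(doc.get(key, _MISSING) == value for key, value in prototype.items())


books = [  # made-up sample data: three editions with one title, plus another book
    {"_id": 1, "Title": "From the Corner of His Eye", "ISBN": "0000000041"},
    {"_id": 2, "Title": "From the Corner of His Eye", "ISBN": "0000000042"},
    {"_id": 3, "Title": "From the Corner of His Eye", "ISBN": "0000000043"},
    {"_id": 4, "Title": "Clara Callan", "ISBN": "0000000044"},
]

same_title = [b for b in books if matches(b, {"Title": "From the Corner of His Eye"})]
print(len(same_title))  # 3

one_edition = [b for b in books
               if matches(b, {"Title": "From the Corner of His Eye", "ISBN": "0000000041"})]
print(len(one_edition))  # 1
```

An empty prototype `{}` matches every document, which is why `db.Book.find({})` acts like "select star".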
+06:18 So this is all well and good, this looks a lot like a standard database,
+06:22 standard relational database type of thing
+06:25 but remember when I talked about documents,
+06:27 I said their superpower is they get this nested thing
+06:31 so let's go over here and just throw this back,
+06:34 we'll just get one of them so we can look at it again,
+06:38 their superpower is that they can reach, well, let's get the next one,
+06:43 (so for paging you would use skip and limit),
+06:46 so we can reach into like say the ratings and say
+06:49 I'd like to find all of the books that have a rating of let's say eight
+06:54 or all the books that have been rated by, let's do this one,
+06:58 I don't know how many books that person has rated
+07:00 but we can find out in a second, so I want to find all the books
+07:02 that have ratings where the user id was that particular id, right there,
+07:06 so how do we do that? Let's come up here again, we don't need this anymore,
+07:10 so in here we kind of want to say something like this
+07:14 like rating, and then if this was an object we would navigate it with dot syntax,
+07:18 but it's not going to work out so well here,
+07:20 so this would be user id like this, let me just paste this in
+07:26 so I can get my little object id out; when you're querying by object id
+07:29 you just wrap it in ObjectId,
+07:32 and the question is, is that dotted key valid Javascript? The answer is no, it is not.
+07:37 So any time you have this sort of hierarchy thing traversal
+07:41 you have to put quotes, right; if it's a single item, they're optional,
+07:44 if you're doing something funky like an operator or something like this
+07:47 then you're going to have to do like this.
+07:50 So let's just show, let's select back here
+07:53 we're just going to say give me the title is one
+07:55 and I don't even care about the id;
+07:58 if I can write a query like this, go down into the ratings,
+08:01 and show me all the ones that this user voted on,
+08:04 that means even though I've kind of pre-joined and embedded this ratings concept,
+08:08 I can still query it as if it was a separate table, separate collection
+08:12 and that's the document database's superpower;
+08:15 let's see if I can get it to work now;
+08:17 apparently I did not get it to work; what am I missing here?
+08:21 Oh, notice I think I said rating and the actual schema is ratings plural,
+08:26 I think that's good, it's representing a pluralized thing down there
+08:30 so the problem was I did this, now notice MongoDB didn't crash,
+08:35 it didn't go oh there's no such thing as a ratings field on this,
+08:39 it just said no nothing matches that,
+08:41 so it's really powerful, it means it's super easy
+08:43 to sort of evolve and work with the data
+08:46 and it doesn't break under the tiniest lightest of schema changes, pretty good,
+08:50 but you've just got to be careful, so let's try it again.
+08:53 There we go, so apparently we could even ask
+08:55 because that was not all of them, there are a lot of books this person has rated,
+08:58 so I think this data might be partly just generated
+09:01 okay, so here these are the books that that person rated,
+09:05 let's find another, let's try to do this again,
+09:08 come down here I will get this object id,
+09:11
+09:14 we can say I want to find the books rated by that person
+09:18 how many are there? 107.
+09:21 And if I actually wanted to see what they are, there's the titles of the first set of them,
+09:24 notice that's really, really fast, I think I have indexes set up right now
+09:28 we'll talk about indexes when we get to the performance part of this course,
+09:31 but we can do these queries down into the ratings embedded part
+09:35 the embedded documents into the books
+09:38 just as if they were their own table,
+09:41 I told you there's about a quarter million books, and there are 1.25 million ratings,
+09:45 so notice the response time here is almost instant, in fact it's like milliseconds.
+09:51 So not only can we do this query, we can do this query extraordinarily fast.
+09:56 All right, so this is one of the things that makes document databases interesting
+10:00 and also challenging, how do you define the documents,
+10:03 should you embed them, should you not,
+10:05 we'll get to that in a whole different chapter,
+10:07 but for now, just know it does have this super power
+10:09 to reach down in here and do these queries.
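To make the "query embedded documents as if they were their own table" idea concrete, here is a small pure-Python sketch of resolving a dotted path such as `Ratings.UserId`, fanning out over arrays of embedded documents. `values_at` and `matches_path` are hypothetical helpers, not MongoDB's implementation. Real user ids are ObjectIds; plain integers stand in for them here, and the books are made up.

```python
def values_at(value, path):
    """Collect every value reachable by a dotted path like "Ratings.UserId".

    Arrays are fanned out, so each embedded document in Ratings is visited.
    (Numeric array indexes such as "Ratings.0.UserId" are not modeled.)
    """
    if not path:
        return [value]
    if isinstance(value, list):
        found = []
        for item in value:
            found.extend(values_at(item, path))
        return found
    if isinstance(value, dict):
        head, _, rest = path.partition(".")
        if head in value:
            return values_at(value[head], rest)
    return []  # missing field: no error, just nothing to match


def matches_path(doc, path, target):
    return target in values_at(doc, path)


books = [  # made-up sample data
    {"Title": "Book A", "Ratings": [{"UserId": 101, "Value": 8}, {"UserId": 202, "Value": 5}]},
    {"Title": "Book B", "Ratings": [{"UserId": 202, "Value": 9}]},
    {"Title": "Book C", "Ratings": []},
]
print([b["Title"] for b in books if matches_path(b, "Ratings.UserId", 101)])  # ['Book A']
print([b["Title"] for b in books if matches_path(b, "Ratings.UserId", 202)])  # ['Book A', 'Book B']
# A misspelled path is not an error, it simply matches nothing:
print([b["Title"] for b in books if matches_path(b, "Rating.UserId", 101)])   # []
```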
--- a/transcripts/ch4-mongo-shell/5.txt
+++ b/transcripts/ch4-mongo-shell/5.txt
@ -0,0 +1,44 @@
+00:01 We've explored the shell a little bit, we've done some querying,
+00:03 let's look at the concepts behind it, so you have them nice and concise,
+00:06 in case you want to come back for a reference.
+00:09 So if we want to query say the Book collection in the bookstore database
+00:12 where the title is 'From the Corner of His Eye',
+00:17 we can type find and give it this little prototypical json object,
+00:20 hit enter, and boom everything comes back that has the same title,
+00:25 different isbns, different primary keys and so on,
+00:27 but different releases, different versions,
+00:29 maybe one is paperback, one is Kindle, who knows;
+00:31 so the idea is we're going to come up with these prototypical json objects,
+00:35 here title: whatever the title is.
+00:39 Now, if we want to do more than just what is the title here
+00:42 we want to say give me the book with the title this and the isbn that,
+00:47 given that the isbn is probably unique,
+00:50 we could maybe just search for it instead,
+00:52 but we want to demonstrate the and clause, right.
+00:54 So here we'll give it this prototypical sub document
+00:56 with the title being the title we're looking for, and the isbn being this one.
+01:00 And notice, now we only get one record back,
+01:03 so our prototypical document is basically an and clause, every field must match.
+01:09 We also saw that one of the excellent ways to group related data,
+01:14 this would be what you might call an aggregate in domain driven design,
+01:18 is to embed items into the document,
+01:22 so here we have ratings, and ratings have little sub objects,
+01:25 sub documents that have things like user ids and values
+01:28 and at the very beginning, and in the example you saw,
+01:30 the superpower of these document databases is that they can query into them,
+01:34 so I want to find all the books that have been rated by this highlighted user id;
+01:38 how do I do that? We just pretend we're traversing the objects,
+01:41 Ratings.UserId, so down here we'll say find Ratings.UserId
+01:46 and we give it the object id that we're looking for
+01:49 because Ratings.UserId is not a valid unquoted key or field name in a Javascript object,
+01:54 we have to put it in quotes, but other than that, it's basically the same idea
+01:58 and here we get back all the books that have been rated by this particular user.
+02:02 So we just use this dotted notation to traverse the hierarchy
+02:07 one other interesting point is, maybe ratings just contained the numbers,
+02:11 like it was 7, 5, and so on; then if I want to
+02:17 find all the books that have a rating of seven,
+02:20 I could just say find ratings: 7,
+02:23 I don't have to do this dot notation or anything like that,
+02:25 but because here I'm looking within the documents inside ratings,
+02:27 regardless of whether it's an array or a single rating thing,
+02:31 you do it like this, with that dot notation.
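The plain-numbers case can be sketched the same way: when a field holds an array, an equality query also matches if any single element equals the value, so something like `ratings: 7` needs no dot notation. `field_matches` is a hypothetical helper, and the data below is made up to fit that scenario.

```python
_MISSING = object()  # stands in for "field not present"


def field_matches(doc, field, target):
    """Equality on one field, MongoDB-style: if the field holds an array, the
    query also matches when any single element equals the target."""
    value = doc.get(field, _MISSING)
    if value == target:
        return True
    return isinstance(value, list) and target in value


books = [  # made-up sample data where Ratings is just a list of numbers
    {"Title": "Book A", "Ratings": [7, 5]},
    {"Title": "Book B", "Ratings": [9, 10]},
    {"Title": "Book C", "Ratings": 7},  # a single value, not an array
]
print([b["Title"] for b in books if field_matches(b, "Ratings", 7)])  # ['Book A', 'Book C']
```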
--- a/transcripts/ch4-mongo-shell/6.txt
+++ b/transcripts/ch4-mongo-shell/6.txt
@ -0,0 +1,89 @@
+00:01 The shell is pretty nice and it's ubiquitous
+00:03 in that you can run it anywhere you ssh to, and things like that,
+00:07 so that's really good, and these are more or less the tools
+00:09 that MongoDB ships; there's something else that's coming along,
+00:12 but there's a really great, better shell in my opinion
+00:16 much, much better, I really love it, it's called Robomongo,
+00:19 so we talked about Robomongo in the setup, how we installed it and so on,
+00:24 so let's see how it works and how it compares to the shell here.
+00:28 So here it is, you can see it hanging out down there
+00:31 and we click start, maybe it's empty let's go ahead and start from scratch,
+00:35 so now if we open it up it's empty, let's create a connection,
+00:38 I'll just call this local or whatever,
+00:42 and it's going to default to localhost:27017,
+00:45 all this stuff turned off, things like that, and we'll just say save and connect
+00:50 and now you can see, let's put these a little more side by side,
+00:53 you can see over here we have our bookstore or charge watcher and so on.
+00:58 And now we have the benefit that we can open this up
+01:02 we can look at the book, we could say explore the indexes
+01:04 we could even go over and say edit this index and make changes,
+01:08 make it unique, do some other things about sparseness and so on.
+01:12 We'll talk more about that later.
+01:15 Over here, we could say something like use bookstore and it switches there,
+01:19 the equivalent over here would be something like right click and say open shell,
+01:23 how interesting, so I know a lot of people prefer the command line interface
+01:27 but what's really awesome about Robomongo is
+01:31 you have the entire CLI right here, so I could say something like
+01:35 db.Book, notice the auto completion, book, publisher, user, auth, etc,
+01:42 then dot, and what do you want: find, find and modify, find one,
+01:44 let's find one where, what did we have before, we had something with the title
+01:48 and let me go back and find the title we were using,
+01:55 so here we can say title like this, and now if I run it, I get a result down here
+02:03 and I can explore it, I can see the ratings and so on,
+02:06 and, you know, if we run this over here,
+02:08 where I did the little projection, I could do that as well.
+02:12 So I get this tree version, and I actually don't love it too much,
+02:15 so you can actually just switch it to the text version here as well,
+02:18 and you get color coding, highlighting, all sorts of stuff.
+02:22 You also get this version which is kind of a flat version, I never use this
+02:27 but you can use it if you want.
+02:29 What is really cool is I can come over here and say
+02:31 I want to maybe edit this document,
+02:36 if I come over and do a find, I think,
+02:41 here I get three, now if I do a straight find, not a find one,
+02:44 I can actually go and edit this, so if I wanted to change
+02:46 the date that this was done on, so let's say 2011,
+02:52 save, rerun this, this is one with so many ratings,
+02:59 here, this is the one I changed, number 2, now it's 2011.
+03:05 So of course I could run an update command,
+03:08 but you can do all sorts of interesting sort of UI things
+03:10 so I really really like using Robomongo,
+03:12 because it's one hundred percent as capable as the shell
+03:15 so for example, I could come over here, this is like just typing Mongo
+03:19 you could create variables, I could say var page,
+03:25 let's do something with a paging here, so I come and say this
+03:27 now notice, this uses getCollection and it doesn't use .Book like this,
+03:32 I think it does that because it gets better intellisense or auto completion,
+03:36 not really sure, anyway, you can do it either way, they are equivalent.
+03:41 Now, let's go over here and imagine we're going to do some paging,
+03:45 so first of all, let's just select the titles
+03:47 remember the thing I did with the projection, exactly the same thing here,
+03:51 there we go, I forgot to rerun it, okay.
+03:53 So rerun it, now we get just the titles,
+03:56 there's "Classical Mythology", "Clara Callan" and "Decision in Normandy" and so on,
+04:00 so suppose we want to do paging,
+04:02 I basically want to show you that this is like a full Javascript shell
+04:05 plus kind of an editor, so watch this,
+04:07 so if I put some semicolons in here, I can type, let's say, var page size is three,
+04:12 var page num, like what number are we on, let's say we're on page two,
+04:17 then down here I could say okay, this is what I want to do,
+04:20 and I could do skip and limit to actually do the paging,
+04:24 so I could say skip, and we'll do page num minus one, times page size,
+04:30 that's how many we want to skip,
+04:34 and then we want to limit it to page size like this
+04:38 so now I should get, let's see, go back to the beginning,
+04:41 three things per page, we're going to be on page two,
+04:45 so it should be the Flu, the Mummies and the Kitchen God's Wife, and that's it.
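The paging arithmetic above (skip `(page num - 1) * page size`, then limit to `page size`) is the same as slicing a list. This is a minimal sketch with a hypothetical `page` helper. The titles are the ones read out in the demo, shortened as spoken, and listed in the order they appeared. In the shell it would look roughly like `db.Book.find({}, {Title: 1}).skip((pageNum - 1) * pageSize).limit(pageSize)`, where `pageNum` and `pageSize` are illustrative variable names.

```python
def page(items, page_num, page_size):
    """Return one page the way .skip(...).limit(...) does.

    page_num is 1-based, as in the shell example: skip the earlier pages,
    then take at most page_size items.
    """
    if page_num < 1 or page_size < 1:
        raise ValueError("page_num and page_size must be >= 1")
    skip = (page_num - 1) * page_size
    return items[skip:skip + page_size]


titles = ["Classical Mythology", "Clara Callan", "Decision in Normandy",
          "Flu", "Mummies", "The Kitchen God's Wife"]
print(page(titles, 2, 3))  # ['Flu', 'Mummies', "The Kitchen God's Wife"]
```

One design note: without a sort, MongoDB does not promise a stable order, so real paging code usually adds a `.sort(...)` before `skip` and `limit`.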
+04:52 Oh, by the way if you highlight something, it just runs that expression
+04:56 which apparently evaluates to two; run the whole thing,
+05:00 notice Flu, Mummies, Kitchen, so we can do this basically
+05:03 as much as we want to type up here,
+05:06 but it's also a little editor, I mean just in almost every way
+05:08 this is better than the shell and I could even use this
+05:13 to connect to my remote MongoDB server, using ssh tunneling,
+05:16 again, we'll talk about those kinds of things
+05:18 when we get to the deployment section
+05:20 but for pretty much the rest of the course, we're going to be using Robomongo
+05:24 because it's just better in every way in my opinion.
+05:28 All right, and as you saw Robomongo installs
+05:30 on Windows, Linux and macOS, so it's all good.
--- a/transcripts/ch4-mongo-shell/7.txt
+++ b/transcripts/ch4-mongo-shell/7.txt
@ -0,0 +1,69 @@
+00:01 Now, let's use Robomongo, our shiny new shell
+00:04 that I contend is better than just the CLI one,
+00:07 let's use it to explore some more advanced query filtering and sorting options.
+00:12 So here's just a blank find showing me all the records,
+00:16 how many are there in the database? There are 271 thousand books,
+00:20 so this is the same database we've been playing with for a while now.
+00:23 So let's ask some questions about the ratings.
+00:27
+00:31 So we're going to go into the ratings array
+00:32 which contains a bunch of objects, which have values,
+00:35 so I want to say how many of them have the value nine,
+00:40 so what's that actually answering, what question is that answering?
+00:44 It is answering how many books have been rated
+00:47 at some point by somebody with a nine,
+00:50 how about with ten? A little bit more,
+00:54 so there are some books that were really, really popular
+00:56 people loved them, this is a 1 to 10 type of scale,
+00:59 I think it might also include zero.
+01:01 So that's great, this is our prototypical json object here.
+01:04 However, what if I want to say show me all the books
+01:08 that have a moderately high rating, what does that mean,
+01:12 let's say it has an eight, a nine or a ten as a rating,
+01:15 how do I express that as a prototype? You can't do it,
+01:19 and so that's why MongoDB has something slightly more complex
+01:22 and nuanced than just straight comparison, right,
+01:27 so this is like an equality query, so instead of putting a value here
+01:29 we can put a little sub search document here
+01:33 and into this, we can say I'd like to apply an operator instead of an exact match,
+01:39 so the operator might be greater than operator >,
+01:42 so the way you know it's an operator is the dollar sign,
+01:46 and $gte, greater than or equal to, is going to be the thing,
+01:50 and then we're going to put the value of eight,
+01:52 so show me the books that have a rating of eight or above,
+01:55 tell me how many there are because we're doing a count,
+01:57 so let's run that, look at that 98 thousand books
+02:00 have a rating of eight, a nine or a ten.
+02:03 Does it mean their average rating is eight, nine or ten? No,
+02:06 it means somebody somewhere has rated it eight, nine or ten.
+02:09 So we also have things like greater than,
+02:12 without the equal, just flat up greater than so that's nine or ten right there,
+02:16 so we have a number of these operators,
+02:19 greater than, greater than or equal to, and so on.
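Here is a small pure-Python sketch of how a comparison operator document like `{"$gte": 8}` applies to `Ratings.Value`, assuming the embedded field is named `Value` (an assumption based on the structure shown earlier). It models a documented MongoDB subtlety: without `$elemMatch`, each operator only needs *some* array element to satisfy it, and different operators may be satisfied by different elements. `OPERATORS`, `rating_values`, and `matches_condition` are hypothetical names, and the books are made up.

```python
import operator

# Comparison operators covered by this sketch (MongoDB has more, e.g. $eq, $ne).
OPERATORS = {
    "$gt": operator.gt,
    "$gte": operator.ge,
    "$lt": operator.lt,
    "$lte": operator.le,
}


def rating_values(book):
    """All values reachable as Ratings.Value (field names assumed)."""
    return [r["Value"] for r in book.get("Ratings", []) if "Value" in r]


def matches_condition(values, condition):
    """Apply an operator document like {"$gte": 8} to an array of values.

    Each operator needs only some element to satisfy it; different
    operators may be satisfied by different elements (no $elemMatch).
    """
    return all(
        any(OPERATORS[op](v, bound) for v in values)
        for op, bound in condition.items()
    )


books = [  # made-up sample data
    {"Title": "Book A", "Ratings": [{"Value": 3}, {"Value": 9}]},
    {"Title": "Book B", "Ratings": [{"Value": 5}]},
    {"Title": "Book C", "Ratings": [{"Value": 8}]},
]
print([b["Title"] for b in books if matches_condition(rating_values(b), {"$gte": 8})])
# ['Book A', 'Book C']
print([b["Title"] for b in books if matches_condition(rating_values(b), {"$gt": 8})])
# ['Book A']
```

Because of that rule, Book A would even match `{"$gte": 8, "$lte": 4}`: 9 satisfies one bound and 3 satisfies the other. `$elemMatch` is MongoDB's way to require that a single element satisfy both.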
+02:22 Another one that's really interesting is in,
+02:24 this is super important for really powerful queries,
+02:27 so when we have documents that contain sub arrays of other documents
+02:33 you can think of those as basically being pre joined
+02:36 but when you normalize those, so they're not contained within each other,
+02:39 then you need a way to still go back and say
+02:42 basically do the join, and this in operator is the key to making that happen,
+02:47 this is not really what's happening here, because this is a sub document,
+02:50 but it's the operator that's involved, so what we can do is
+02:53 say I would like to find the ratings
+02:55 that have let's say prime numbers as ratings,
+02:58 it's kind of silly, but whatever,
+03:02 here we go, so those are the prime numbers between one and ten,
+03:05 and we could say I would like to find all the ratings where the value,
+03:08 one of the values right, remember they have multiple ratings
+03:11 but one of the values is actually in this set,
+03:14 so the way this usually manifests is like go to the database
+03:17 and maybe I pull back some items, and it's got like a sub array of let's say ids
+03:23 and then I can go back to the database
+03:26 and say give me all the items in this other collection
+03:28 where the id is in one of these sub ids,
+03:31 so an example might be in the Talk Python Training stuff
+03:34 remember, the course contains all the chapter ids,
+03:36 and I can go back with one single query
+03:39 that will give me all the chapters for a course; it's this in operator, so let's try that.
+03:43 So there we go, apparently 69 thousand have a prime rating at some point
+03:49 not that that means anything, but it shows you how these operators work.
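Here is a pure-Python sketch of the `$in` semantics, covering both uses mentioned above. The first is matching prime-valued ratings. The second is the two-step "join", where you load a course and then fetch all of its chapters in one query. `matches_in`, the `chapters` collection, and the `chapter_ids` field are hypothetical names for illustration, not the actual Talk Python Training schema. In the shell, the second step would be roughly `db.chapters.find({_id: {$in: course.chapter_ids}})`.

```python
def matches_in(values, candidates):
    """$in: true if any value (every array element counts) is one of the candidates."""
    wanted = set(candidates)
    return any(v in wanted for v in values)


# Ratings that are prime numbers between 1 and 10
PRIMES = [2, 3, 5, 7]
print(matches_in([8, 9, 3], PRIMES))   # True, because 3 is prime
print(matches_in([4, 6, 10], PRIMES))  # False

# Two-step "join": first load the course, then fetch its chapters by id.
course = {"_id": "course-1", "chapter_ids": [10, 12]}  # hypothetical schema
chapters = [
    {"_id": 10, "title": "Intro"},
    {"_id": 11, "title": "Another course's chapter"},
    {"_id": 12, "title": "Queries"},
]
course_chapters = [ch for ch in chapters if matches_in([ch["_id"]], course["chapter_ids"])]
print([ch["title"] for ch in course_chapters])  # ['Intro', 'Queries']
```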
--- a/transcripts/ch4-mongo-shell/8.txt
+++ b/transcripts/ch4-mongo-shell/8.txt
@ -0,0 +1,38 @@
+00:01 So here's the list of the querying operators, all the complex ones.
+00:04 So we saw that normally we pass these prototypical json objects,
+00:07 Ratings.Value is five, and that just does an exact match,
+00:11 but we saw that that doesn't really solve all our problems,
+00:14 often we want ranges or properties like
+00:17 I want all of the ratings that are greater than eight, things like that;
+00:20 so instead of putting a number into that prototypical json element,
+00:23 we're going to put an operator, so we might say $eq for equality
+00:27 that's kind of the same thing, but the others are not,
+00:30 so $gt for greater than, $gte for greater than or equal to,
+00:32 $lt for less than, $lte for less than or equal to,
+00:34 $ne for not equal to, so you could say I want to see all the ones where
+00:37 there's no vote, or no rating of value ten, right,
+00:42 there's no rating that has a value of ten.
+00:44 And we talked about the in operator, this is kind of your two step join process
+00:48 that we'll talk much more about when we get to the Python side of things,
+00:51 there's also the inverse of that, $nin, not in the set.
+00:54 So here's an example how we might use the greater than or equal to operator
+00:58 to find all the books that have a rating of nine or ten
+01:01 that are super highly rated by at least one person,
+01:04 remember this is not like every single one in there has to be this,
+01:07 but there exists a rating which is a nine or a ten.
+01:10 We also have some logical operators for combining queries,
+01:14 so we can go in and say and pretty easily by just having
+01:18 all the properties we're looking for in a single document,
+01:21 but if for some reason these are coming from multiple places
+01:23 you can actually combine them with the and operator
+01:26 so that's nice, but what you really sometimes need is the or clause,
+01:30 I want this or that, and there's no way to do that with those prototypical json objects,
+01:35 but the or operator will let you do this.
+01:38 You also have $not and $nor, so neither of these, which is
+01:41 sort of the negation of an or;
+01:43 now I recommend you check out this link at the bottom for each one of them,
+01:46 like where the operator appears: does it appear to the right hand side
+01:50 of the property or field name or the left hand side,
+01:53 it kind of depends on the type of operator you're using,
+01:56 so you can just click on this or the and and so on
+01:58 in the docs and it'll give you a little example.
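As the list above notes, logical operators sit at the top level and take a list of sub-queries, while `$ne` and `$nin` sit to the right of a field name. This tiny pure-Python evaluator sketches that placement. It is hypothetical: it handles flat, non-array fields only and is not MongoDB's engine. It follows the documented rule that `$ne` and `$nin` also match documents where the field is missing. The `Year` field and the books are made up.

```python
_MISSING = object()  # stands in for "field not present"


def field_ok(value, condition):
    """A plain value means equality; an operator document like {"$nin": [...]} applies each operator."""
    if isinstance(condition, dict) and condition and all(k.startswith("$") for k in condition):
        for op, arg in condition.items():
            if op == "$eq":
                ok = value == arg
            elif op == "$ne":
                ok = value != arg          # a missing field counts as "not equal"
            elif op == "$in":
                ok = value in arg
            elif op == "$nin":
                ok = value not in arg      # a missing field counts as "not in"
            else:
                raise ValueError(f"operator {op} not supported in this sketch")
            if not ok:
                return False
        return True
    return value == condition


def matches(doc, query):
    """Top-level $and / $or / $nor take a list of sub-queries; any other key is a field."""
    for key, condition in query.items():
        if key == "$and":
            ok = all(matches(doc, q) for q in condition)
        elif key == "$or":
            ok = any(matches(doc, q) for q in condition)
        elif key == "$nor":
            ok = not any(matches(doc, q) for q in condition)
        else:
            ok = field_ok(doc.get(key, _MISSING), condition)
        if not ok:
            return False
    return True


books = [  # made-up sample data; "Year" is a hypothetical field
    {"Title": "Book A", "Year": 1999},
    {"Title": "Book B", "Year": 2005},
    {"Title": "Book C", "Year": 2011},
]


def titles(query):
    return [b["Title"] for b in books if matches(b, query)]


print(titles({"$or": [{"Year": 1999}, {"Year": 2011}]}))   # ['Book A', 'Book C']
print(titles({"$nor": [{"Year": 1999}, {"Year": 2011}]}))  # ['Book B']
print(titles({"Year": {"$nin": [1999, 2005]}}))            # ['Book C']
```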
--- a/transcripts/ch4-mongo-shell/9.txt
+++ b/transcripts/ch4-mongo-shell/9.txt
@ -0,0 +1,25 @@
+00:01 Now sometimes you don't want all the data back,
+00:03 usually it doesn't really matter to you if it comes back or it doesn't come back,
+00:05 in the shell you're printing it out, it probably matters,
+00:08 but in practice, in your app, you rarely care
+00:10 from a display perspective or an interaction perspective,
+00:14 whether some field or list that you are not using has data or not
+00:19 but from a performance perspective, you very much may care.
+00:22 Suppose that you have a document that's 50 KB in size
+00:25 and all you want back is the isbn and the title, which together are 1 KB,
+00:30 and you're getting a bunch of them back,
+00:32 it turns out that that can make a really big difference in terms of performance.
+00:35 So whether it's for display purposes or for network performance purposes
+00:39 using this second argument here we can say
+00:44 only return the isbn and the title, and don't give me all of the ratings,
+00:48 don't give me the images, everything else that might be in this book.
+00:51 So we run this, and we get back these objects here, these documents,
+00:56 and notice, we have the isbn and the title, like we asked for
+00:59 but we also have the _id,
+01:01 so unless you explicitly forbid the _id from coming back,
+01:04 the _id always comes back, and everything else defaults to not appearing
+01:07 unless you include it, once you pass some document here
+01:10 for the projection, or the restriction of things that come back.
+01:14 If for some reason you don't want the id to come back,
+01:16 just say _id: 0 or false or something like this,
+01:19 and then it will just have isbn and title exactly.
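In PyMongo the projection is the second argument to `find()`, again as a plain dictionary. Below is a small sketch using the narration's `isbn` and `title` fields on a hypothetical `books` collection. The helper is a deliberately simplified local model of an inclusion projection (top-level fields only), written just to make the `_id` default visible; the real work happens on the server.

```python
# Projection documents, as passed to e.g. db.books.find({}, isbn_and_title_only).
isbn_and_title = {"isbn": 1, "title": 1}                 # _id still comes back
isbn_and_title_only = {"isbn": 1, "title": 1, "_id": 0}  # _id suppressed


def apply_inclusion_projection(doc: dict, projection: dict) -> dict:
    """Simplified local model of an inclusion projection on top-level fields:
    listed fields are kept, everything else is dropped, and _id is kept
    unless the projection sets it to 0/False."""
    keep = {k for k, v in projection.items() if v and k != "_id"}
    out = {k: v for k, v in doc.items() if k in keep}
    if projection.get("_id", 1) and "_id" in doc:
        out = {"_id": doc["_id"], **out}
    return out
```

Dropping large fields such as ratings or images this way is what saves the network transfer the narration describes.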
--- a/transcripts/ch5-connecting-with-python/1.txt
+++ b/transcripts/ch5-connecting-with-python/1.txt
@ -0,0 +1,57 @@
+00:01 All right, the moment you've probably been waiting for is finally here,
+00:04 we're going to start moving away from Javascript
+00:06 and doing Python for the rest of this course to talk to MongoDB.
+00:09 That doesn't mean we won't use the Javascript API in the shell
+00:12 just a little bit more, but for the most part
+00:14 we're going to focus now on writing applications
+00:17 that talk to and work with MongoDB.
+00:19 So we're going to look at what, in MongoDB's nomenclature,
+00:23 is called a driver, so a driver is the underlying library or framework
+00:27 that you use to talk between your application and MongoDB.
+00:30 So here we've got our web app
+00:33 and it's going to be using the database MongoDB here.
+00:35 A request is going to come in, into our web app
+00:38 and it's going to use a particular package, right,
+00:40 this is not built into Python, this is something we have to go out and get.
+00:43 So the package that we're going to work with
+00:45 is built and maintained by MongoDB themselves, and is called PyMongo.
+00:50 So this is the core, lowest level access to the database server
+00:54 and it does a ton of things for us,
+00:56 in fact if you look at many of the ODMs, the object-document mappers,
+01:00 the NoSQL equivalent of an ORM, they build upon PyMongo, right
+01:04 so PyMongo is almost always involved
+01:06 when you're talking to MongoDB from Python.
+01:09 And it does many things for us, it connects to the database
+01:12 whether it's local, remote, over ssl, with authentication, with certificates,
+01:16 all that kind of stuff, it actually manages replica sets
+01:20 so it knows how to find all the different servers participating in a replica set
+01:24 and do the failover if one fails,
+01:27 it knows how to go over to the other one, things like that;
+01:30 it also knows how to deal with sharding,
+01:32 so maybe you have a cluster of ten MongoDB servers
+01:36 that are all managing part of the data
+01:38 and then participate as a group in the queries, PyMongo does that for us,
+01:41 this is generally where you do the CRUD operations,
+01:44 the find, insert, update, delete, and those kinds of things;
+01:47 you do the other admin stuff as well,
+01:49 like dropping collections or creating indexes and so on,
+01:53 and it even does connection pooling,
+01:55 so really this does all the stuff that you need to talk to MongoDB
+01:58 and the api is very, very, very similar to what we saw with the Javascript API
+02:03 which is why I didn't skim over it, I wanted to say, okay,
+02:06 you really learned the Javascript api,
+02:08 now you basically also know the PyMongo api,
+02:11 findOne with a capital O, no spaces,
+02:13 is now find_one, with a lower case o, for example,
+02:17 there are a few variations for Pythonic naming
+02:20 but other than that, PyMongo is going to sound
+02:22 and feel very, very familiar to you at this point.
+02:26 Like many things from MongoDB, PyMongo is open source
+02:29 so you can come over here to github.com/mongodb/mongo-python-driver,
+02:35 and that is PyMongo.
+02:37 So you'll see that you can go look around,
+02:40 you can see it's under active development and things like that,
+02:43 a lot of stars, so this is like I said, the official driver
+02:45 but you also have access to the source, right here.
+02:48 So now that we know about PyMongo,
+02:50 I hope you're ready to go write some code.
--- a/transcripts/ch5-connecting-with-python/2.txt
+++ b/transcripts/ch5-connecting-with-python/2.txt
@ -0,0 +1,128 @@
+00:01 So finally we're here in our github repository for our demos,
+00:04 we have something to share, so I have the source folder here
+00:07 and let's start with this play around PyMongo.
+00:09 Now, throughout this course, we are going to build what I think
+00:12 is a pretty comprehensive demo that we're going to work on for a few hours,
+00:15 it's going to have tons of data, and we're going to consider
+00:18 both the design and the performance of the database.
+00:20 But for PyMongo, let's just sort of fool around a little bit here
+00:23 and then when we get to MongoEngine, we will take on our proper demo there.
+00:27 So we'll begin by opening this in PyCharm,
+00:30 do that little drag and drop trick in macOS,
+00:34 but on Windows and Linux you've got to say open folder.
+00:39 All right, everything is loaded up,
+00:42 and I have created a virtual environment in here
+00:45 a Python 3.6 virtual environment, you can run wherever,
+00:48 but that's the one I'm using;
+00:50 now, let's start by adding a file here, so we'll just call this program,
+00:53 we won't do too much structuring and refactoring
+00:56 and organizing for this particular demo, we will of course for our proper demo.
+01:02 So, before we can do anything, we just want to type import pymongo,
+01:06 this is not going to turn out well for us, we'll go over here and try to run this,
+01:10 nope, there's no module named pymongo, so let's go fix that.
+01:14 If we open up the terminal in PyCharm,
+01:17 it's going to automatically find that virtual environment and activate it for us,
+01:20 okay, you can see the prompt says .env,
+01:23 that means that we have our virtual environment active,
+01:27 so let's see what is here: not so much; just to be safe
+01:32 let's go ahead and upgrade setuptools.
+01:39 Why are we doing that? Because PyMongo actually uses C extensions
+01:43 and depending on your system, sometimes setuptools
+01:46 has a little better chance of compiling those, if you have the latest version.
+01:49 It doesn't always work that way, and it has a way to fall back to just pure Python
+01:54 but the C extensions do make it faster, so that's worth checking out.
+01:58 Alright, so we can pip install pymongo, now things are looking good,
+02:05 let's try the program again, exit code zero, that means happy, zero is happy.
+02:10 Alright, so we are able to create, or basically import the library,
+02:14 now the thing we've got to do is we could just go and create what's called a client
+02:17 and use all the default settings, but in a real app
+02:20 you're probably not going to talk to an unauthenticated local database server,
+02:25 you're probably talking to one on another machine,
+02:27 maybe there's security, maybe there's ssl, whatever.
+02:30 So let's go ahead and set up the connection string
+02:32 and if you have things like sharding or replication,
+02:34 those all require a connection string.
+02:35 So let's go over here and create a connection string
+02:37 and we'll just put the default values,
+02:39 so they always start with the scheme mongodb:// like so,
+02:43 and then localhost, and then 27017,
+02:47 so this is the default: localhost on the default port,
+02:52 it's running locally, and the scheme is always here.
+02:55 We'll talk about how you can add things like authentication and ssl and what not there.
+03:00 So the next thing we need to do is create what's called a mongo client.
+03:03 You can work with connections directly from PyMongo, but you shouldn't;
+03:08 why? Because PyMongo manages connection pooling and reconnecting for you
+03:13 and all these different things, so if you work with a client
+03:16 it goes through the connection pooling and that kind of stuff,
+03:19 if you work with the connection directly, you're kind of locking yourself
+03:21 into that single connection which is not the best.
+03:24 So we're going to create a pymongo.MongoClient, like this
+03:28 I want to give it the connection string like so;
+03:32 now, the way this works, this is basically the equivalent of opening up the shell
+03:36 the way it worked in Javascript was, we said use a database,
+03:40 in Python it's a little bit different, in Python we say
+03:44 the database is client dot, and then make up a database name,
+03:49 literally I could put TheFunBookStore here
+03:53 and now this would actually start working with the database called exactly that,
+03:57 and database names are case sensitive in MongoDB,
+04:00 so let's just call this the_small_bookstore,
+04:04 okay because we're just going to poke around at it
+04:06 we're not going to work with that big set of data that we had before yet
+04:08 and we're also not going to work with our main demo.
+04:10 So let's call it the_small_bookstore.
+04:13 Now let's go over here and say insert some data
+04:17 it's not fun to have a database with no data, right,
+04:22 in fact, let's just really quickly have a glance over here
+04:27 if I connect, notice there is no the_small_bookstore,
+04:30 refresh, no, no small bookstore, okay, so this line alone doesn't create it yet;
+04:36 when you do a modifying statement against it, you'll see that it does.
+04:40 So let's go over here to books, let's make it a little more explicit,
+04:43 I'll say db. so it looks like the Javascript api.
+04:46 So db.books is what we are going to call it,
+04:50 we'll say insert and what you want to insert, let's say title,
+04:54 now this is not Javascript, this is not json,
+04:56 this is Python dictionaries so you've got to make sure you have the quotes
+04:59 but otherwise it's really really simple.
+05:02 The first book, and let's say it has an isbn,
+05:06 let's just put some numbers in there like that
+05:10 and let's do another one, we'll say the second book
+05:14 it's going to have an entirely different isbn
+05:18 and while we're at it, let's say go over here and print out the results
+05:22 and let's do it again, we'll grab the value and let's print out
+05:32 r.inserted_id, so here let's take a look at the whole thing
+05:36 and we'll even print out the type of r,
+05:38 and then the thing that we are usually interested in here is
+05:42 when you're doing an insert, remember the _id thing was generated
+05:47 well, what was it? What if you want to actually say I inserted it
+05:50 and here's the id of the thing I created for you, somewhere in your app
+05:54 alright, so if we capture the response we can check out the inserted_id
+05:59 ok so let's go and run this real quick.
+06:02 Oh whoops, no, this actually just returns the id, sorry;
+06:06 if you do a bulk insert, I believe you get this,
+06:09 or we can come over here and say insert_one
+06:14 to be a little more focused, and with insert_one we'll have our inserted_id,
+06:19 let's make this third and the fourth book and make a little change here,
+06:25 there we go, one more time, perfect okay,
+06:29 so if you do an insert_one we get an InsertOneResult,
+06:32 which lives in pymongo.results, and here you can see the inserted_id,
+06:37 so we've inserted some stuff, let's go look back at our database here
+06:40 we should have now, if we refresh it we now have the_small_bookstore,
+06:45 if we go to the collections we have our books
+06:47 and we look in the books, that should not be super surprising right,
+06:50 those are the things we just inserted,
+06:53 okay so now, let's go over here and do a little test
+06:57 we'll say if db.books.count is zero, we'll print inserting data
+07:06 and like this, we'll say else print books already inserted skipping
+07:15 and maybe even spell that right huh?
+07:19 Now we run it, nope, there's already books in here
+07:23 we're not going to insert duplicate books, so that's all well and good,
+07:27 so we've gone over here and we've connected to the database,
+07:31 we've created a client using the connection string
+07:34 and trust me this can get way more complicated
+07:37 to handle all the various complications and features of MongoDB,
+07:42 and once we have a client we say the database name
+07:43 here I've aliased it to db so it looks like the Javascript api
+07:47 or the shell api you're used to working with, and then we work with the collection
+07:51 and we issue commands like find and count and insert, insert one and so on.
+07:56 So now we have some data, let's go maybe do a query against it,
+07:59 maybe make some in place updates things like that.
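Here is a compact sketch of this demo's flow. One caveat: `Collection.count()` and `Collection.insert()`, used in the narration, were deprecated and then removed in PyMongo 4.0, so this version uses `count_documents({})` and `insert_one()` instead. The sample isbn values are made up, and `pymongo` is imported inside `get_books_collection` only so that `seed_if_empty` can be reused with any object that offers those two methods.

```python
CONN_STR = "mongodb://localhost:27017"  # default scheme, host, and port
DB_NAME = "the_small_bookstore"         # database names are case sensitive


def get_books_collection(conn_str: str = CONN_STR):
    """Return the `books` collection; nothing is created on the server until a write."""
    import pymongo

    client = pymongo.MongoClient(conn_str)  # the client manages connection pooling
    db = client[DB_NAME]                    # equivalent to client.the_small_bookstore
    return db.books


def seed_if_empty(books) -> list:
    """Insert two sample books only when the collection is empty.
    `books` is expected to be a PyMongo Collection. Returns the new _ids."""
    if books.count_documents({}) > 0:
        print("Books already inserted, skipping")
        return []
    print("Inserting data")
    sample = [
        {"title": "The first book", "isbn": "1234567890"},   # hypothetical isbns
        {"title": "The second book", "isbn": "2345678901"},
    ]
    ids = []
    for doc in sample:
        result = books.insert_one(doc)  # a pymongo.results.InsertOneResult
        ids.append(result.inserted_id)  # the ObjectId generated for this document
    return ids
```

With a local server running, `seed_if_empty(get_books_collection())` inserts on the first run and prints the skip message on every later run, mirroring the guard in the narration.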
--- a/transcripts/ch5-connecting-with-python/3.txt
+++ b/transcripts/ch5-connecting-with-python/3.txt
@ -0,0 +1,66 @@
+00:01 Let's look at how we can do some basic CRUD operations
+00:03 and connect to MongoDB with Python via PyMongo.
+00:06 So if we're going to use PyMongo, let's start by importing PyMongo,
+00:10 and I'm going to not import the items or the classes out of this
+00:14 but actually just the module, and use the namespace style
+00:17 to make it really clear where this stuff comes from.
+00:19 Actually I like to do this in a lot of my programs, even in production.
+00:23 So we import PyMongo, and then we have to create a connection string
+00:26 and feed it off to the pymongo.MongoClient, right
+00:30 so this is a concrete class in PyMongo,
+00:33 and we can give it any sort of connection string,
+00:36 in fact if you give it no connection string, I think it'll use
+00:38 what I have written here basically, no auth, no ssl,
+00:41 localhost 27017, which is the default standalone MongoDB port.
+00:45 Alright, so this is cool, we've got our client here,
+00:47 and now then it gets a little bit trippy,
+00:49 a little bit dynamic here, which is kind of fun.
+00:51 So the next thing we're going to do,
+00:53 is we are going to go to the client and say client dot some database name,
+00:56 not table name, database name.
+00:58 Now, this thing doesn't even have to exist at this point
+01:01 this, as you saw on the demo, is actually how we created
+01:04 this database called the_small_bookstore,
+01:06 we just said db = client.the_small_bookstore
+01:09 and by basically saying that it exists, or implying that it exists,
+01:12 it's going to exist once we do some kind of write, or modifying operation, to it.
+01:16 Ok, so just be aware that this is case sensitive, right,
+01:19 so capital T capital S capital B, would not be
+01:22 the same database as lower case t s b.
+01:25 Right, so let's go, and now we're going to actually do
+01:27 a lot of things that look extremely similar to what we saw in the Javascript shell,
+01:31 that's why I spent so much time in that section
+01:33 it's because the apis are so, so similar at this level.
+01:36 So now we can just operate on the database via collections,
+01:40 so just like we said client.database name,
+01:42 we're going to say db . collection name
+01:44 and those collections also don't necessarily have to exist,
+01:47 even for queries; if they don't exist, you just get nothing back, and that's not an error.
+01:51 So for example, we can do a query against the books collection
+01:55 and ask how many there are, so db.books.count
+01:58 and that'll tell us how many books there are
+02:00 and like I said, even if the database doesn't exist,
+02:03 if the collection doesn't exist or both, it's still going to work,
+02:05 it will just return zero, because guess what,
+02:07 there are no books in the nonexistent database.
+02:10 We could do a find_one and this will pull back just one item
+02:14 by whatever default order MongoDB happens to be using,
+02:17 and we can say find_one and give it
+02:21 one of these prototypical not json but Python dictionary type of objects.
+02:25 Now this find one is the first place where we're seeing the Python api
+02:28 ever so slightly vary from the Javascript api;
+02:31 in Javascript it's findOne, and in Python it's find_one
+02:38 and they've adapted the api to be pythonic, right,
+02:41 it would look weird to say findOne,
+02:44 but just be aware that they're not identical, you kind of have to keep in mind
+02:47 which language you're working in, but other than that,
+02:49 what you feed to it and how they work it's more or less the same.
+02:52 If we want to insert something we say db.books.insert_one
+02:57 and then we give it the document to insert
+02:59 and we get a result and we saw that the result actually comes back
+03:03 and has an inserted_id, and the inserted_id is the generated id of the thing
+03:09 that was autogenerated in the database, notice we didn't pass _id,
+03:14 but if we care we can get it back for whatever purpose.
+03:17 When working at higher levels, with something like MongoEngine,
+03:19 this will automatically just happen on the class
+03:21 and get set; we won't have to worry about it.
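As a sketch of how `find_one` and `insert_one` combine in practice, here is a hypothetical get-or-insert helper keyed on isbn. It relies on `find_one` returning `None` when nothing matches and on `inserted_id` carrying the generated `_id`; `books` is assumed to be a PyMongo `Collection` such as `client.the_small_bookstore.books`.

```python
def get_or_insert_book(books, isbn: str, title: str):
    """Return the _id of the book with this isbn, inserting it first if needed.
    `books` is assumed to be a PyMongo Collection."""
    existing = books.find_one({"isbn": isbn})  # a dict, or None if nothing matches
    if existing is not None:
        return existing["_id"]
    result = books.insert_one({"title": title, "isbn": isbn})
    return result.inserted_id  # generated, since we did not pass an _id
```

This read-then-write is not atomic: two processes could both miss and both insert. If that matters, a unique index on `isbn` or a single `update_one(..., upsert=True)` call is the usual way to close the gap.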
--- a/transcripts/ch5-connecting-with-python/4.txt
+++ b/transcripts/ch5-connecting-with-python/4.txt
@ -0,0 +1,60 @@
+00:01 So in our example we saw that we pass a connection string
+00:03 to the Mongo client and it was super simple,
+00:05 it was just the MongoDB scheme and local host and the default port,
+00:09 like I said, we could even omit the connection string,
+00:12 and I believe it would still pick all the defaults.
+00:15 So let's look at some non default options.
+00:18 So here, if I want to connect to a remote server
+00:21 and I've either put some kind of dns records somewhere
+00:25 or I've just hacked my local hosts file to say
+00:27 there's a thing called mongo_server which is maybe within
+00:31 a virtual private network or at least in the same data center zone,
+00:35 if I'm doing cloud hosting with something like DigitalOcean,
+00:39 and if I want to connect to it on the default port, which is still 27017,
+00:44 I could just say mongodb://mongo_server, and then we could connect that way.
+00:48 Well, maybe you want to connect on an alternate port,
+00:52 so port 2000, instead of 27017, this is probably a good idea,
+00:56 there's a lot of people scanning the internet for open MongoDB ports,
+01:01 27017, 27018, up to 27020 I believe,
+01:06 it's probably the range that they're looking at,
+01:08 because different services run on different ports,
+01:11 like replication versus sharding versus whatever.
+01:13 So you probably don't want to run on that port,
+01:16 and when we get to the deployment section,
+01:18 we'll look at all the steps we need to take in order to make our server safe,
+01:21 so be sure you do not put MongoDB in production
+01:25 until you watch that chapter at the end of the course,
+01:27 but let's just assume that one of the things we might want to do is
+01:30 run on a non default port, we just obviously like any web address type thing,
+01:34 we just say mongodb://mongo_server:2000
+01:38 okay great, so now we have a separate server on a non default port
+01:42 we probably want to have authentication
+01:44 so if we had a user name and password
+01:47 again we'll talk about this in the deployment section at the end
+01:49 we would have jeff:supersecure, so user name jeff
+01:53 ultra secure password is supersecure, and then we can have everything else.
+01:57 And if we wanted to talk to a replica set, so this is a set of cooperating
+02:02 duplicated failover MongoDB servers that can be working together,
+02:07 so in case one of them goes down,
+02:10 or you have to take one offline for some reason,
+02:12 it will just switch over and a different server will become the primary
+02:16 and start to store the data.
+02:18 This doesn't lead to eventual consistency and things like that,
+02:20 there still is one primary place things go to,
+02:22 but depending on how the state of the cluster is,
+02:25 it could be any one of these replicas in the replica set.
+02:28 So here we would say server one port one, server two port two,
+02:31 server three port three; well, the first two are actually
+02:34 both running on the same machine, so in case the process dies
+02:37 but we also have a separate server, Mongo server two
+02:39 that is running on a different port as well,
+02:41 in fact, this might not list all of the servers in the replica set;
+02:44 this might just be sufficiently many,
+02:48 so that once it connects it finds all the others,
+02:50 and then it will start participating in all of them.
+02:52 And we also need to say replicaSet=prod
+02:55 or whatever we're calling our replica set.
+02:57 So we have all these options in terms of connection strings
+02:59 and then once you have this, well you pretty much use it the same way,
+03:02 you create a client by passing the connection string off to it
+03:05 and it figures out all the details for you.
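The connection strings described above can be assembled with a small helper. `build_mongo_uri` is a hypothetical name, and the hosts, ports, and credentials in the example are the narration's placeholders. Credentials go through `urllib.parse.quote_plus`, which the PyMongo documentation recommends for user names or passwords containing characters such as `@` or `:`.

```python
from urllib.parse import quote_plus


def build_mongo_uri(hosts, username=None, password=None, replica_set=None):
    """Assemble a mongodb:// connection string.

    hosts: list of "host" or "host:port" strings (27017 is implied when no port is given).
    """
    auth = ""
    if username is not None:
        auth = quote_plus(username)
        if password is not None:
            auth += ":" + quote_plus(password)
        auth += "@"
    uri = "mongodb://" + auth + ",".join(hosts)
    if replica_set:
        uri += "/?replicaSet=" + quote_plus(replica_set)
    return uri
```

For the replica set case, `build_mongo_uri(["mongo_server:2000", "mongo_server:2001", "mongo_server2:2002"], "jeff", "supersecure", "prod")` gives a string you would hand straight to `pymongo.MongoClient(...)`. The three ports there are made up, since the narration only says "port one, port two, port three".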
--- a/transcripts/ch5-connecting-with-python/5.txt
+++ b/transcripts/ch5-connecting-with-python/5.txt
@ -0,0 +1,88 @@
+00:00 So back to our example, we've inserted some data
+00:03 and we have this little guard here to say
+00:05 don't insert duplicate data, things like that,
+00:07 so let's make some changes to our book here.
+00:10 Let's first of all change the title of the third and fourth book,
+00:14 let's just mess with this book, for example,
+00:16 let's change this to, like, third book, like so, all right;
+00:20 so we have two ways to do this, one way would be to
+00:23 pull back the entire document, work on it and push it back,
+00:27 and this is what I think of as the ORM style of working.
+00:30 So we'll say book = db.books.find_one, let's do find_one here
+00:36 and we're just going to give it the isbn that we have there.
+00:39
+00:44 Let's just do a quick little print out of the book
+00:46 and just so you understand what we're working with
+00:48 we'll also print out the type, so if we run this,
+00:50 we obviously get the book back, super, and you can see it is a dictionary, cool;
+00:55 so, I said I want to change the name here,
+00:57 let's actually change something slightly different,
+01:00 so we can work with some more advanced features.
+01:02 What I want to do is add the ability to have a user favorite this book
+01:06 and this might not be a good way to do it,
+01:08 I haven't really thought it through because it's just a toy example,
+01:11 but let's suppose we want to have the book store the ids
+01:15 of the people who have favorited it, in practice maybe it's better
+01:17 to have the user accounts store the ids of the books
+01:20 that they individually favorited, but the mechanics would be identical.
+01:23 So how are we going to do that? Well, to this book, I'm going to add
+01:27 something called favorited_by, and this is just going to be an empty list here.
+01:36 Then any time we want to work with it, we can come over here and say
+01:41 .append the user 42 did this, and then we can say
+01:47 db.books.update and give it a little query here, so we would say _id,
+01:52 and once we're in Python that's got to be in quotes,
+01:56 and say book.get('_id'), that's going to be the value there,
+02:02 and then what we're going to put back is just this book,
+02:05 and let's just one more time after this get it back and print out book,
+02:08 this should make sure that everything went sort of round trip just fine.
+02:12 Ready? All right, look, oh yeah look at that, we got a favorited_by right there, 42.
+02:17 If we run it again, now we won't need to do this,
+02:23 we can run it again with 100, now we have two people,
+02:28 two user ids who have favorited this and so on.
+02:31 Okay, so this is all pretty well and good, but let's do something better,
+02:35 sometimes it makes sense to go and pull a whole document back,
+02:38 look at it, make changes to it and save it.
+02:40 In fact, that's something you'll do quite often,
+02:42 but in this case, we just kind of want to say add this little id here
+02:47 to this list called favorited_by and maybe it doesn't even exist.
+02:53 So let's do this, let's copy this again and change this,
+02:57 so now we're not going to use that, we'll use our isbn
+03:00 and let's modify book four here, so this does not even have a favorited_by yet.
+03:06 Let's put this in here, so we're going to modify that
+03:09 and then let's actually also get it back and print it out at the end;
+03:13
+03:25 there we go, so we're going to get the book back
+03:27 but we're not going to pass the whole book
+03:30 we're going to use one of these in place operators;
+03:32 remember $addToSet? That's what we're going to use.
+03:36 In the Javascript shell we can just type $addToSet as is,
+03:41 but obviously, PyCharm is telling us that's not valid Python,
+03:44 so what we've got to do is put that in quotes,
+03:48 and then the value; we can actually have multiple fields here,
+03:51 so we're going to say favorited_by, and then the thing let's add user id 101,
+03:58 now, this seems to be telling me I've got something a little bit off here,
+04:02 yes, so we need that to be the entire update document;
+04:06 ok, what we're going to do is we're going to say go find this document,
+04:10 this book with this isbn, which, notice, ends in 73,
+04:14 this is going to be book four; actually, let me comment this out really quick
+04:18 and we'll just print it out, and notice there's not even a favorited_by yet.
+04:23 So what we're going to do is we want to go add this id here
+04:28 so it should actually create this list
+04:31 and then put 101 in it let's see if that's going to work.
+04:34 Boom, favorited_by 101, and this time we did not pull it back
+04:38 we used one of our cool operators.
+04:41 Now, if this were $push, which is sort of the equivalent,
+04:45 this would have more and more and more 101s,
+04:49 but with $addToSet, I should be able to run this code over and over and over
+04:55 and 101 is already there so it's not going in,
+04:57 it's better if I say 120, now I run it, now we have those two right,
+05:02 so this add to set is super nice, I don't even need to go to the database and go
+05:05 well are they there, no they're not there, ok then I'm going to add them.
+05:08 All right, so I don't even need to do that check,
+05:10 I can just use this cool little add to set operator, very very nice.
+05:14 So here's how we use the in place operators,
+05:16 there's really not much difference other than we have to put more stuff in strings
+05:21 because it's not the shell, it doesn't have the special understanding
+05:26 of what those mean, and even over here,
+05:28 it's not Javascript, it's Python dictionaries,
+05:30 whose keys need to be strings in this case.
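The two styles from this demo look roughly like this with the current PyMongo API. The narration's `db.books.update(filter, book)` is the legacy form removed in PyMongo 4.0; `replace_one` is its whole-document counterpart and `update_one` takes the operator document. `books` is assumed to be a PyMongo `Collection`, and both function names are hypothetical.

```python
def favorite_by_replacing(books, isbn: str, user_id: int) -> None:
    """ORM style: read the whole document, change it in Python, write it all back."""
    book = books.find_one({"isbn": isbn})
    if book is None:
        return
    book.setdefault("favorited_by", []).append(user_id)
    books.replace_one({"_id": book["_id"]}, book)  # overwrites the stored document


def favorite_in_place(books, isbn: str, user_id: int) -> None:
    """In-place style: send only the change. $addToSet creates the list if it is
    missing and skips user ids that are already present."""
    books.update_one({"isbn": isbn}, {"$addToSet": {"favorited_by": user_id}})
```

Besides sending less data, the in-place version avoids a lost-update window: between the `find_one` and the `replace_one`, another process could modify the same book, and the replace would silently discard that change.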
--- a/transcripts/ch5-connecting-with-python/6.txt
+++ b/transcripts/ch5-connecting-with-python/6.txt
@ -0,0 +1,44 @@
+00:01 Let's review the ideas behind these in place updates.
+00:03 So here we have more or less a complete MongoDB Python program
+00:06 using PyMongo here, so we're going to import PyMongo,
+00:09 connect to the local database with all the default options,
+00:11 and we're going to either create or get access to the bookstore
+00:16 by saying client.bookstore, now we're going to insert an object
+00:19 that has no favorited by element, right no list, it just has a title and isbn,
+00:24 so after the insert, we're going to end up with an _id and a title and an isbn.
+00:28 And then maybe we want to add this idea of favorited by,
+00:34 maybe you want to design this already that way
+00:36 and have an empty list there, but whichever more or less would work the same,
+00:41 so we can say I would like to go find the book,
+00:43 the first part of our update statement is the where clause,
+00:46 so find by primary key and remember, that's when we call insert_one
+00:50 that's result.inserted_id, so that's going to find the one and only item,
+00:55 and then we're going to use the $addToSet operator,
+00:58 which we just pass as a string in PyMongo,
+01:00 and then we'll add such and such to favorited_by.
+01:03 We could also use $set, say {'$set': {'title': 'the new book with updated title'}},
+01:10 or something like this right, so you can use this all over the place
+01:14 and what's really cool, now you may be thinking oh this api is kind of crazy,
+01:18 we've got these dollar operators
+01:21 and it's a lot to learn if you're totally new to it, I realize
+01:24 but when we get to MongoEngine, you'll see that
+01:26 MongoEngine does this transparently under the covers for us,
+01:29 so you can actually not have to do this,
+01:31 you won't have to necessarily remember all of these
+01:34 but you'll get all the benefits that we're describing here.
+01:36 If you're using PyMongo, you have to know the api really intimately
+01:38 so we're going to push this 1001 user id on to favorited by
+01:42 and maybe we'll push 1002 as well
+01:46 if people signed up at the same time, they saw the same book, they loved it
+01:50 and let's go ahead and push this 1002 again,
+01:52 well not the push operator, but the add to set operator,
+01:54 do this again, because it's add to set
+01:56 we're going to get a new document that has new book title,
+02:00 the same isbn and two items and it's favorited by
+02:03 and it's going to be 1001 and 1002, because add to set is item potent
+02:07 calling it once or calling it a hundred thousand times,
+02:10 it has the same result, other than it might take longer
+02:13 to call it a hundred thousand times, right.
+02:15 So if it's already there it makes no difference,
+02:16 but if it's not there, it pushes it in; super cool operator,
+02:19 really taking advantage of the hierarchical nature of these documents.
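To make the idempotence argument concrete, here is a deliberately simplified, plain-Python model of `$set`, `$push`, and `$addToSet` on top-level fields. `apply_update` is a hypothetical helper, not part of PyMongo, and the real server supports much more (dotted paths, `$each`, and so on). Against a live database you would send the same update document with `books.update_one({"_id": result.inserted_id}, update)`.

```python
import copy


def apply_update(doc: dict, update: dict) -> dict:
    """Simplified local model of three update operators on top-level fields:
    $set assigns, $push always appends, $addToSet appends only if absent.
    Returns a new dict and leaves `doc` untouched."""
    new = copy.deepcopy(doc)
    for field, value in update.get("$set", {}).items():
        new[field] = value
    for field, value in update.get("$push", {}).items():
        new.setdefault(field, []).append(value)
    for field, value in update.get("$addToSet", {}).items():
        items = new.setdefault(field, [])
        if value not in items:
            items.append(value)
    return new
```

Applying `{"$set": {"title": "New"}, "$addToSet": {"favorited_by": 1001}}` once or twice yields the same document, whereas applying `{"$push": {"favorited_by": 1002}}` twice leaves two copies of 1002.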
--- a/transcripts/ch5-connecting-with-python/7.txt
+++ b/transcripts/ch5-connecting-with-python/7.txt
@ -0,0 +1,40 @@
+00:01 Now, when you go to mongodb.com and you look through the documentation
+00:04 so docs.mongodb.com, you will find stuff about updates and inserts,
+00:08 and queries and aggregation and so on, and so on;
+00:11 all of these are going to be in the Javascript api,
+00:14 notice at the bottom of this web page here, db.collection.insertOne
+00:18 is new in version 3.2, so if you're trying to look up these operations
+00:22 you will most likely find them in the Javascript style, and the Javascript api,
+00:28 that's how MongoDB talks about it, and that's how you'll probably find them on Stack Overflow.
+00:31 So, because that's the way the shell works, MongoDB is kind of standardized
+00:36 on here is how we're going to do our documentation in Javascript,
+00:39 once again, yet another reason we spent so much time on the Javascript api,
+00:42 even though none of us are necessarily Javascript developers.
+00:46 So, here we have the crud operations, now we have the query
+00:48 and projection operators and things like that,
+00:52 so if you want to know how to map these over to PyMongo,
+00:56 then there's one page really that you need for most things,
+01:00 and that's the collection documentation.
+01:03 So over here at api.mongodb.com/python/current/api/pymongo/collection.html
+01:10 you can see right at the top, we've got all of the stuff you can do
+01:13 on the collection itself, so for example, we were passing one and minus one
+01:18 as the sorting operators in the shell,
+01:20 here you could say pymongo.ASCENDING, pymongo.DESCENDING,
+01:22 a little bit more explicit, but this is a really good place to go
+01:25 because you'll find like the insert_one and the find_one and all the various ways
+01:30 in which you need to adapt the documentation you find in Javascript
+01:34 over to the PyMongo api, this is probably the biggest bang for the buck right here.
+01:38 Okay, so if you want to write an app,
+01:41 PyMongo could totally be your data access layer,
+01:44 it would completely solve the problem, it's really great,
+01:47 it's what a lot of applications use to talk to MongoDB from Python.
+01:50 We're going to talk about some additional things going forward
+01:53 but one of the bigger decisions you need to make is
+01:56 are you going to use an ODM (object-document mapper) that maps classes to MongoDB,
+02:00 with additional features as we'll see in a lot of interesting ways,
+02:03 or are you going to work down at the dictionary level,
+02:06 it's very similar to saying I'm going to work with the DB-API and SQL strings,
+02:10 versus SQLAlchemy or the Django ORM or something like that, right.
+02:15 So, you kind of got the low level way to talk to MongoDB,
+02:18 now, we're going to move on to talk about document design
+02:21 and mapping higher level objects like classes with MongoEngine later in the course.
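Translating between the Javascript documentation and PyMongo is mostly mechanical in two ways:

- Method names go from camelCase to snake_case.
- A shell sort document such as `{published: -1}` becomes a list of `(key, direction)` pairs passed to `Cursor.sort()`.

The sketch below uses only the standard library. `ASCENDING` and `DESCENDING` are defined locally with the same values as `pymongo.ASCENDING` (1) and `pymongo.DESCENDING` (-1), so it runs without PyMongo installed. The helper names are hypothetical. The name conversion covers the common CRUD methods. Some shell helpers have no same-named PyMongo method, so the collection documentation is still the reference.

```python
import re

# Same values as pymongo.ASCENDING and pymongo.DESCENDING; defined here so the
# sketch runs without PyMongo installed.
ASCENDING = 1
DESCENDING = -1


def shell_method_to_pymongo(name):
    """Map a shell (Javascript) method name such as 'insertOne' to PyMongo's 'insert_one'."""
    # Insert an underscore before every capital letter except a leading one, then lowercase.
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()


def shell_sort_to_pymongo(spec):
    """Turn a shell-style sort document like {"published": -1} into the list of
    (key, direction) pairs that PyMongo's Cursor.sort() accepts."""
    pairs = []
    for key, direction in spec.items():  # dicts keep insertion order, so sort priority is preserved
        if direction not in (ASCENDING, DESCENDING):
            raise ValueError(f"sort direction for {key!r} must be 1 or -1, got {direction!r}")
        pairs.append((key, direction))
    return pairs


for method in ["insertOne", "findOne", "updateMany", "deleteOne"]:
    print(method, "->", shell_method_to_pymongo(method))

print(shell_sort_to_pymongo({"published": -1, "title": 1}))  # [('published', -1), ('title', 1)]
# With a real (hypothetical) collection handle `books`, the list would be used as:
#   books.find({}).sort(shell_sort_to_pymongo({"published": -1, "title": 1}))
```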
--- a/transcripts/ch6-modeling-data/1.txt
+++ b/transcripts/ch6-modeling-data/1.txt
@ -0,0 +1,72 @@
+00:01 We've come to a pretty exciting part in the course,
+00:03 we're going to talk about document design
+00:05 and modeling with document databases.
+00:08 So let's take a step back and think about relational databases.
+00:11 There are in fact a couple of really systematic, well-known,
+00:15 widely taught ways of modeling with relational databases;
+00:20 there's still a bit of an art to it, but basically it comes down to
+00:24 third normal form, first normal form, some of these well known ways
+00:29 to take your data, break them apart, generate the relationships between them,
+00:33 so if we're going to model like a bookstore with publishers
+00:36 and users who buy books at the bookstore,
+00:38 and they rate books at the bookstore, it might look like this:
+00:41 we have a book, the book would have a publisher,
+00:44 so there is a one to many relationship from publisher to books,
+00:47 you can see the one and the star on the little relationship there,
+00:50 and we have some flat properties like title and published
+00:52 and publisher id for that relationship, and similarly,
+00:55 we have a navigational relationship over to the ratings,
+00:58 so a book is rated, so the ratings would be almost a normalization table
+01:03 or many-to-many table there that has the book id and the user id
+01:06 and then the value, and we just happen to have an auto-increment id there,
+01:10 it's not necessarily the way we have to do it,
+01:13 we could have a composite key, we've got our user
+01:15 and the user can go navigate to the ratings, and things like that.
+01:17 Now, of course, this is a very simplified model
+01:20 in a real bookstore with real ecommerce happening and all that
+01:23 and categories and pictures and all those things,
+01:26 this would be way more complicated,
+01:28 but the whole idea going forward is going to be pretty similar
+01:30 and I think keeping it simple enough that you quickly understand the model
+01:34 and don't get lost in the details, is the most important thing here.
+01:37 So this more or less follows third normal form here
+01:40 in terms of how we're modeling this in the relational database.
+01:44 Could we move this to MongoDB, could we move this to a document database?
+01:47 Sure, we could have exactly the same structure.
+01:50 Now those relationships, those are not full on foreign key constraints,
+01:52 those would be loosely enforced, not enforced in the database
+01:56 but enforced in the app: relationships between what would be collections;
+02:00 but certainly, we could do this, is it the best way though?
+02:03 The answer is usually not, maybe, but probably not.
+02:07 So what we're going to focus on now is how do we take our traditional knowledge
+02:12 of modeling databases and relational databases
+02:14 and how does that change, what are the trade-offs we have to deal with
+02:18 when we get to a document database.
+02:20 So the good news is, usually things get simpler in document databases
+02:24 in terms of the relationships, you might have
+02:27 what would have been four or five separate tables with relationships,
+02:31 it might get consumed into a single item,
+02:34 a single collection or single document really,
+02:36 so here this is how we're going to model our bookstore
+02:40 that we just looked at in third normal form, but now in a document database.
+02:43 And really, the right choice here comes down to
+02:46 how is your app using this data, what type of questions do you usually ask,
+02:50 what's the performance implications, things like this.
+02:53 So now we have books, publishers, and users
+02:56 and these have similar top level items,
+02:58 and we do have some traditional relationships.
+03:01 So there's a one to many relationship between publisher and books
+03:05 theoretically we can embed the book into the publisher
+03:08 but there's many, many books for some publishers
+03:10 and that would be really a bad idea;
+03:12 so we have this traditional relationship, like you might have in a relational database.
+03:15 Now again, not enforced by Mongo, but enforced by your app, so same basic idea.
+03:19 Next up, we have the ratings, remember we have that
+03:22 like many to many table from users to book ratings,
+03:26 now that has actually moved and now we're storing these items
+03:30 in an embedded array of objects inside the book table, or the book collection.
+03:35 So now each book has a ratings array, it has the number of ratings,
+03:39 those are just put right in there, so is this the right design? Maybe,
+03:43 it's certainly a possible design, and it's the design that we're going to go with
+03:47 for our examples, but we'll talk about when it's actually the right design.
+03:51 And I'll help you make those trade-offs next.
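Here is a small sketch that makes the move from the relational layout to the document layout concrete. It folds rows of a ratings table into each book document and leaves the publisher as an id reference. The field names (`publisher_id`, `ratings`, `rating_count`, `user_id`, `value`) and the sample rows are assumptions, not the course's actual schema. The ratings table's surrogate `id` is dropped, because an embedded rating is identified by its book and user. With PyMongo, the resulting dictionaries could be stored with `insert_many`.

```python
from collections import defaultdict


def embed_ratings(books, ratings):
    """Turn relational book rows plus a ratings join table into book documents
    that carry their ratings as an embedded array."""
    ratings_by_book = defaultdict(list)
    for row in ratings:
        ratings_by_book[row["book_id"]].append(
            {"user_id": row["user_id"], "value": row["value"]}
        )

    documents = []
    for book in books:
        embedded = ratings_by_book.get(book["id"], [])
        documents.append({
            "_id": book["id"],
            "title": book["title"],
            "published": book["published"],
            # One-to-many from publishers stays a reference; the app, not MongoDB, keeps it valid.
            "publisher_id": book["publisher_id"],
            "ratings": embedded,
            "rating_count": len(embedded),
        })
    return documents


# Hypothetical rows as they might come out of the relational tables.
books = [
    {"id": 1, "title": "Book A", "published": "2017-05-01", "publisher_id": 10},
    {"id": 2, "title": "Book B", "published": "2017-09-15", "publisher_id": 10},
]
ratings = [
    {"id": 100, "book_id": 1, "user_id": 7, "value": 5},
    {"id": 101, "book_id": 1, "user_id": 8, "value": 4},
    {"id": 102, "book_id": 2, "user_id": 7, "value": 3},
]

docs = embed_ratings(books, ratings)
print(docs[0]["rating_count"], docs[0]["ratings"])
# 2 [{'user_id': 7, 'value': 5}, {'user_id': 8, 'value': 4}]
```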
--- a/transcripts/ch6-modeling-data/2.txt
+++ b/transcripts/ch6-modeling-data/2.txt
@ -0,0 +1,121 @@
+00:01 When it comes down to modeling with document databases
+00:04 you apply a lot of the same thinking as you do with relational databases
+00:07 about what the entity should be, and so on.
+00:10 However, there's one fundamental question that you often ask
+00:13 that really does take some thinking about maybe working through
+00:18 some of the guidelines, and that is to embed or not to embed related items.
+00:23 So in our previous example, you saw that we had a book
+00:26 and the book had ratings embedded within it,
+00:28 but we could just as well have the ratings be a separate table
+00:30 or the ratings could have even gone into the user object
+00:33 with a reference back to the book, instead of the reverse.
+00:36 So should we embed that ratings, and if we do,
+00:40 does it go in books, does it go in users, or does it not go there at all.
+00:43 So what I'm going to do, is I'm going to give you some guidelines,
+00:46 these are soft rules, we don't have like a really prescriptive way of doing things
+00:51 like third normal form here, but some of the thinking there does help;
+00:54 so let's get into the rules.
+00:56 First of all, the question you want to ask is: is that embedded data
+00:59 wanted eighty percent of the time that you get the original object;
+01:02 do I usually want the rating information when I have the book?
+01:08 If it would have resulted in me doing a join in a traditional database
+01:11 or going back and doing a second query to Mongo to pull that data out,
+01:14 it's very beneficial to have that rating data embedded in the book.
+01:19 We designed it that way, so let's suppose like most of our query patterns
+01:22 and most the way our application works is
+01:25 we want to list the number of ratings, the average rating,
+01:29 things like this; we want to surface that almost all the time,
+01:32 we want that embedded data when we get a book.
+01:35 So that would guide us to embed the data, if this is not true,
+01:40 if you only very rarely want that data,
+01:42 then you most likely will not want to embed it,
+01:45 there's a serious performance cost for what you might think of as dead weight,
+01:48 other embedded stuff that comes along with the object
+01:51 that you generally don't care about most of the time,
+01:54 you can do things like suppress those items coming back,
+01:57 so you can basically suppress the ratings object,
+02:00 but if you are doing that, it's probably a sign like
+02:02 hey maybe I shouldn't really be designing it this way.
+02:04 A lot of considerations, but here's the first rule:
+02:07 do you want the embedded data most of the time?
+02:11 Next, how often do you want the embedded data without the containing document?
+02:15 The way our things are structured now is I cannot get the ratings
+02:19 without getting the books, I cannot get individual
+02:22 ratings without getting all of the ratings.
+02:24 So if what I wanted to do was on the user profile page
+02:27 show here are all of my individual ratings as a user
+02:31 listed on my like favorites page, or things I've rated or something like this,
+02:36 that's actually a little bit challenging the way things are written.
+02:39 We can definitely do it, and if there's just one
+02:41 query we do it that way it's totally fine,
+02:43 but this is one of the tensions, you can't get the ratings without getting the books
+02:47 you can't get individual ratings, without getting all the other ratings
+02:50 from that particular book; there's no way in MongoDB
+02:53 to actually suppress that, I don't think, like you can suppress the other fields
+02:56 using a projection, right? You get all the ratings, or none of the ratings.
+03:00 So how often is it necessary to get a rating without getting a book itself?
+03:04 Right, if that's something you want to do often
+03:07 or it's a very very hot spot in your application
+03:09 maybe again you do not want to embed it,
+03:11 if you want the object without the containing document.
+03:14 Another really important question to answer is
+03:17 is the embedded data a bounded set?
+03:19 If it is just a single nested item, fine, that's no problem,
+03:22 if it's a list or an array, like we have in the context of ratings,
+03:25 how big could the ratings get,
+03:28 how many ratings might a book have reasonably speaking;
+03:31 if there's ten ratings, it's probably totally fine
+03:34 to have the rating data embedded in the book,
+03:36 it's nicely self-contained, you get a little atomicity
+03:39 and some nice features of having it embedded there.
+03:41 If there's a hundred ratings, maybe it's good,
+03:45 if there's a thousand ratings, if there's an unbounded number of ratings
+03:48 you do not want to embed it, right so is it a bounded set, first of all
+03:53 and related to that, is the bounded set small,
+03:55 because every time you get the book back
+03:58 you're pulling all of that stuff off disk, possibly out of memory,
+04:01 over network for deserialization or serialization
+04:04 depending on the side that you're working with.
+04:06 So that comes with a cost, and in fact,
+04:08 MongoDB puts a limit on the size of these documents,
+04:12 you're not allowed to have a document larger than 16 MB,
+04:17 in fact, if you try to take a document that's larger than 16 MB
+04:20 and save it into MongoDB, or even if you pull one back,
+04:23 add something that makes it a little bit bigger, and call save,
+04:26 it's going to totally fail and say no, no, no, this is over the limit.
+04:29 So this should not be thought of as like a safe upper bound
+04:33 this should be thought of as like the absolute limit
+04:36 if you've got a document that's ten megabytes,
+04:38 it doesn't mean like wow, we're only halfway there, this is amazing or great,
+04:41 no, that's a huge performance cost to pull 10 MB over
+04:46 every time you need a little bit of something out of there.
+04:48 So really, you should aim for a much, much, much smaller thing
+04:51 than the upper limit of 16 MB, but the point here is
+04:53 there is actually a limit where if this embedded data outgrows that 16 MB
+04:59 you just cannot save it back to the database,
+05:02 that's a "will no longer operate" problem;
+05:04 "is the bound small" is more of a performance trade-off type of problem, right,
+05:08 but you want to think about these very, very carefully,
+05:10 average size of a document is definitely something worth keeping in mind.
+05:14 How varied are your queries?
+05:17 Do you have like a web app and it asks like maybe ten really common questions
+05:21 and you very much know the structure,
+05:24 like these are the types of queries my app asks,
+05:26 these are the really hot pages and here's what I want to optimize for,
+05:29 or is this more of a BI-type thing where people and analysts come along
+05:34 and they can ask like almost any sort of reporting question whatsoever;
+05:38 it turns out the more focused your queries are,
+05:41 the more likely you are to embed data in other things, right,
+05:44 if you know that you typically use these things together,
+05:47 then embedding them often makes a lot of sense.
+05:49 If you're not really sure about the use case,
+05:51 it's hard to answer the above questions,
+05:53 do you want the data eighty percent of the time, I have no idea,
+05:55 there's all sorts of queries, some of the time, right,
+05:58 and so the more varied your queries, the more likely you are going to
+06:00 tend towards the normalized data, not the embedded modeling data.
+06:06 And finally, related to how varied your queries are:
+06:09 are you working with an integration database that lives at the center
+06:14 and is almost used for inter-process, inter-application communication,
+06:17 or is it a very focused application database?
+06:19 We're going to dig into that idea next.
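The guidelines above are soft rules, but they can be written down as a checklist. The sketch below is one hypothetical way to encode them:

- The 80% threshold comes from the lecture.
- The 16 MB figure is MongoDB's maximum BSON document size.
- The byte estimates you feed in (such as roughly 60 bytes per rating) are assumptions you would measure for your own data.

The function flags concerns rather than returning a verdict, because whether a bounded set is "small enough" remains a performance judgment. On the projection point: MongoDB does have `$slice` and `$elemMatch` projection operators that return only part of an embedded array.

```python
from typing import List, Optional

# MongoDB rejects any single document larger than 16 MB (16 * 1024 * 1024 bytes of BSON).
MAX_DOCUMENT_BYTES = 16 * 1024 * 1024


def embedding_concerns(
    share_wanted_with_parent: float,
    often_needed_alone: bool,
    max_items: Optional[int],
    bytes_per_item: int,
    parent_bytes: int,
    varied_queries: bool,
) -> List[str]:
    """Return the guideline checks that argue against embedding (empty list = no red flags)."""
    concerns = []
    if share_wanted_with_parent < 0.8:
        concerns.append("not wanted with the parent ~80% of the time")
    if often_needed_alone:
        concerns.append("often needed without the containing document")
    if max_items is None:
        concerns.append("unbounded set")
    elif parent_bytes + max_items * bytes_per_item > MAX_DOCUMENT_BYTES:
        concerns.append("worst case exceeds the 16 MB document limit")
    if varied_queries:
        concerns.append("varied, ad hoc queries favor a normalized design")
    return concerns


# Ratings on a book: wanted most of the time, at most ~100 per book (assumed sizes).
print(embedding_concerns(0.9, False, 100, 60, 500, False))  # []
# Same ratings, but a popular book could collect a million of them.
print(embedding_concerns(0.9, False, 1_000_000, 60, 500, False))
# ['worst case exceeds the 16 MB document limit']
```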
--- a/transcripts/ch6-modeling-data/3.txt
+++ b/transcripts/ch6-modeling-data/3.txt
@ -0,0 +1,65 @@
+00:00 In order to answer this question about whether you have
+00:02 an integration database or an application database,
+00:04 let's do a quick compare and contrast,
+00:07 especially in large enterprises, you'll see that they use databases
+00:11 almost as a means of inter-application communication,
+00:15 so maybe you have this huge relational database that lives in the center
+00:18 with many, many constraints, many, many stored procedures,
+00:21 lots and lots of structures and rules, and so on,
+00:25 why? Well, because we have a bunch of different applications
+00:28 and they all need to access this data,
+00:30 maybe the one in the top left here it needs users
+00:33 but so does the one on the right, and their idea of users is slightly different
+00:36 so this user is not like a real simple thing, it's really quite complex
+00:40 it's kind of the thing that will solve the user problem for all of these apps
+00:43 and so on and so on, through the constraints and the way you use it.
+00:47 This is a decent, well, it's typically a good role for relational databases,
+00:51 you're better off with other architectural patterns anyway,
+00:55 but relational databases are good at guarding against this kind of use case:
+00:58 they have a fixed schema, they have lots of constraints and relationships
+01:03 and they are very good at enforcing and kicking it back to the app
+01:06 and go no, you got it wrong, you messed up the data.
+01:08 So they can be like this strong rock in the middle.
+01:11 The problem with rocks is they're not very adaptable,
+01:14 they can't be massaged into new and interesting things;
+01:18 a rock is a rock, and it's extremely hard to change.
+01:21 So that's partly why some of these major enterprises
+01:25 will have like weekends where they deploy a new version of an app,
+01:28 like we're going to take it down and everybody's going to come in
+01:30 and we're going to release it;
+01:32 that is not a super place to be, it's also not a great use case
+01:36 for document databases with their flexibility in schema design,
+01:40 their weaker enforcement at the database level
+01:43 and more enforcement inside the app,
+01:45 because how is the app on the left going to help
+01:47 enforce things for the app on the right, that's not great.
+01:49 So, this is an integration database, and it's generally not a good use case
+01:53 for document databases; if you're still using
+01:56 this sort of style with document databases, it means your queries will be more varied
+01:59 and you probably need to model in a more relational style,
+02:03 less embedded style, just as a rule of thumb.
+02:06 So what's the opposite? Well, it might look like this,
+02:09 we have all of our little apps again,
+02:11 and instead of them all sharing a single massive database
+02:13 you can maybe think of this as more like a microservice type of architecture;
+02:17 each one of them is going to have their own database
+02:20 and they're going to talk to it, and then when they need to exchange information
+02:22 we'll do that through some sort of web api,
+02:25 so they will exchange it through some kind of service broker way
+02:29 they like negotiate and locate the other services, right,
+02:32 maybe the one on the left is about orders,
+02:34 the one on the right is about users and accounts.
+02:37 So what that means though is each one of these little apps is much simpler,
+02:41 it can have its own database with its own focused query patterns,
+02:45 which is more focused, easier to understand,
+02:48 and the application can enforce the structure and the integrity at its api level,
+02:53 so this is a much better use case when you're sharing data with a document database.
+02:58 And in fact, this sort of whole pattern here means
+03:01 we don't have to make a NoSQL versus SQL choice,
+03:03 maybe three out of these six are using MongoDB,
+03:07 one is using a graph database and two are using MySQL,
+03:10 it's up to the individual application to decide what the best database
+03:13 and its underlying model is for it.
+03:18 So when we have an application database like this
+03:20 you are more likely to have slightly more embedded objects
+03:24 because the query patterns are going to be simpler and more focused and more constrained.
--- a/transcripts/ch6-modeling-data/4.txt
+++ b/transcripts/ch6-modeling-data/4.txt
@ -0,0 +1,152 @@
+00:01 So let's look inside the application that you're using right now
+00:03 to take this course as an example.
+00:06 So at the time of this recording, here's what the Talk Python training
+00:10 website database looks like for courses and users.
+00:14 So, first let's focus on the course side of things,
+00:17 there's a couple of interesting ideas here,
+00:19 one, we have an id which is not an object id, why is it not an object id,
+00:24 well, it was actually migrated from a relational database initially,
+00:27 this was using SQLAlchemy, and it was easier to keep this id here as a number
+00:33 rather than switch to MongoDB's object id,
+00:36 it's also easier to refer to it in other areas,
+00:39 like say in the e-commerce system, where I can put the id in;
+00:42 I don't have very much space in terms of the message
+00:46 that can go into the e-commerce system based on their API,
+00:49 so a number like 1 is much easier than the 24 characters of an object id,
+00:51 so we're using a non-standard id which is generated in the app
+00:55 but for these types of things, that is really no big deal,
+00:58 for the users, I think we might be using object ids.
+01:01 We have somewhat sort of flat things here, we have the url and the title
+01:04 and when it was published, things like that,
+01:07 so this is the Learn Python by Building Ten Apps Jumpstart Course
+01:10 and you can see a lot of the initial ideas here,
+01:13 and the initial pieces of data are totally straightforward
+01:16 and they would look exactly the same in a relational database.
+01:19 However, there's two things that are very different
+01:21 that I want to draw your attention to;
+01:24 the first is not actually the embedded stuff, but this duration in seconds,
+01:27 when I created the MongoDB version of this web app,
+01:31 I realized one of the things I do all the time on the home page,
+01:36 on the course listing page, and many many places,
+01:39 is I say how long is the course, this course is 6.5 hours,
+01:42 I think this one is 7.1 hours or something to that effect.
+01:46 Using quick math you can figure out the hours from the duration in seconds.
+01:48 So there was actually a pretty serious bottleneck
+01:51 where I'd have to go and in this case pull back 12 chapters
+01:55 and then from the chapters I could get the lectures
+01:58 and from the lectures I could get how long each individual one was,
+02:01 I'd add that all up and then I could print out that number.
+02:05 And then I would do that for say like on the course catalog page,
+02:08 there was like ten courses, I would have to go through so many of these chapters
+02:13 and then their subsequent lectures, and that was a huge huge bottleneck.
+02:15 So what I decided to do was in the application,
+02:18 any time I save or update the course, I'm going to compute this on save
+02:22 which is extremely rare, and then I'm going to stash this here,
+02:26 so this is actually computed from the chapters
+02:29 which are computed from the lectures themselves,
+02:31 and this is data duplication, but you'll find that a little bit of data duplication,
+02:36 I find most apps usually have one or two little pieces like this that
+02:40 just unlock a lot of performance
+02:42 because actually computing this turns out to be really really computationally expensive,
+02:47 but storing it here on this object made it super fast.
+02:50 So this is one thing, this data duplication
+02:53 which I try to stay away from as much as I can
+02:55 but the trade-off here was so worth it.
+02:57 Now, the other part we want to focus on is down here,
+02:59 we said I'd like to associate these chapter ids with a particular course,
+03:03 now if this was a relational database,
+03:05 I might have a course to chapter normalization table, right,
+03:09 it'd have the course id and the chapter id
+03:11 and I'd do some kind of join query on that;
+03:14 you almost never ever, ever see that in MongoDB and document databases.
+03:19 Usually, at least the ids are embedded on one side of a one-to-many relationship,
+03:23 so here we have the course, the course has some chapters,
+03:27 so we're just storing the ids here.
+03:29 Now, we also have the chapters, you can see chapter 1001 goes right here
+03:35 and this one is a little bit more interesting,
+03:37 we've got again our duration in seconds
+03:40 which is another thing computed from if you look at the individual lectures
+03:44 they've got duration in seconds, and that's the real raw number.
+03:48 So this is another duplication, because at many, many levels
+03:51 I need to show the time of a chapter,
+03:53 and that was turning out to be computationally expensive at many levels,
+03:56 so again, these two places, this is the one bit of duplicated data
+04:00 and you will see that this is more common
+04:03 in a document database than in a relational one.
+04:05 So here we've got our chapter which has this soft relationship
+04:08 from the course over to the id,
+04:10 we also have the course id down there and below it,
+04:12 so it's kind of this bidirectional relationship;
+04:15 then we have lectures, and lectures are interesting in that
+04:18 almost every time that we get a hold of a chapter
+04:22 we care about its lectures, we usually want to display them in a list
+04:27 any time that I get a lecture, this is the thing like you're watching right now,
+04:30 this is the lecture, right, an individual video let's say,
+04:32 any time you have one of those, you almost always need the other ones,
+04:36 at least the ones before and after it, so like if you look in this particular player
+04:40 you'll see there is a forward and a backward within the course button
+04:45 that you can skip ahead or skip back, that is the other lectures
+04:48 so what I find is grouping the chapter along with the lectures into one blob
+04:51 that makes it super fast and I almost always want the other lectures
+04:57 when I have one lecture, and if I have the lecture,
+04:59 I usually need to display the chapter title, and things like that.
+05:02 Anyway, so these are really well suited to be put together in this embedded style,
+05:06 so I don't have a lectures table, I have courses
+05:09 and I have chapters, and then in the chapters those are embedding the lectures,
+05:12 and we also saw that little bit of data duplication.
+05:15 So you can see down here is an individual embedded lecture,
+05:18 here's one that talks about doing the exercises
+05:20 in this course and it's apparently 202 seconds,
+05:24 so I hope this look behind the scenes has helped you understand
+05:28 how you might model this stuff, you can look at the course page
+05:30 and the player and think about some of the trade-offs,
+05:33 I don't know that this is perfect, but it is absolutely working well for the web app.
+05:37 Let's look at one more thing.
+05:39 Down here we have the users, and we have a couple of items
+05:41 that we're going to focus on when we get to the users,
+05:44 I have blurred some out, we're using object id now for the user id
+05:46 I covered the password and things like that,
+05:49 but we've got some flat stuff like whether or not you're opting out of email,
+05:52 what your user name is, what your email address is, things like that.
+05:55 And then, I have this concept of an origin,
+05:58 so if you come from like some particular marketing source
+06:01 it might record like hey this person created their account
+06:04 and they originally came from Facebook,
+06:06 this person originally came from the podcast or something like that,
+06:08 so that's pretty interesting, we also have the courses that you are taking,
+06:11 so right here, this particular person, this is me,
+06:14 so I gave myself basically all the courses,
+06:17 these are the ids of the courses that I am a student in,
+06:20 so again, there's not a users, a courses, and a user-courses
+06:23 sort of normalization thing; it's very common that when I as a user
+06:29 am loaded into the database, I very often need to know about the courses.
+06:32 Now I can't easily embed the course into the user, right,
+06:36 that'd be like insane levels of duplication,
+06:38 but closest thing I can do is I can get this list
+06:40 and then I can go back and do another query
+06:42 say give me all the courses where the course id is in this list of owned courses,
+06:46 so with basically two queries I have everything I need.
+06:49 We also have the bundle id and some other things going on here.
+06:52 So that embedded course id, that's actually a list
+06:55 one more thing to look at down here is this preferences,
+06:58 so this is short name, somewhat short name,
+07:02 this is the preferences for your player
+07:05 so when you're in the video player, you can choose different qualities,
+07:08 you can turn on captions or you can turn off captions,
+07:12 subtitles, transcripts basically and you can choose a playback speed,
+07:15 it could be like .75 up to two or three or something crazy like this.
+07:19 One of the primary actions a user does on this site is to go through the course,
+07:25 each course might have 150 lectures
+07:28 so as a user, you come in, you look around a little bit
+07:31 and then you go through 150 lectures,
+07:33 so this preferences thing needs to be pulled back frequently.
+07:36 And so we got to get the user anyway and embedding them together means
+07:39 it's basically instant access any time I'm in the player
+07:42 to figure out how to preconfigure the player
+07:46 to render your video the way that you like it.
+07:48 So this is an embedded item, but not an embedded list
+07:51 just an embedded preference object.
+07:53 So there you have it, a look inside Talk Python Training
+07:57 at least as it was when we recorded this,
+07:59 so hopefully this helps you think through some of the challenges
+08:03 of building a more realistic app.
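Two of the patterns described above can be sketched with plain dictionaries:

- Duration totals that are recomputed only on save.
- The two-query lookup of a user's courses.

The field names (`duration_in_seconds`, `lectures`, `chapter_ids`, `course_ids`) are assumptions rather than the site's real schema. So are the sample values, apart from chapter 1001 and the 202-second exercises lecture mentioned in the lecture. `owned_courses_filter` only builds the filter document that a PyMongo `find()` call would take.

```python
from typing import Dict, List


def chapter_duration(chapter: Dict) -> int:
    """Add up the raw per-lecture durations embedded in a chapter."""
    return sum(lecture["duration_in_seconds"] for lecture in chapter["lectures"])


def refresh_durations(course: Dict, chapters: List[Dict]) -> None:
    """Recompute the duplicated totals; meant to run only when content is saved (rare),
    so that listing pages can read course["duration_in_seconds"] directly."""
    for chapter in chapters:
        chapter["duration_in_seconds"] = chapter_duration(chapter)
    course["duration_in_seconds"] = sum(c["duration_in_seconds"] for c in chapters)


def owned_courses_filter(user: Dict) -> Dict:
    """Filter for the second query: every course whose _id is in the user's embedded list."""
    return {"_id": {"$in": user["course_ids"]}}


course = {"_id": 1, "chapter_ids": [1001, 1002]}
chapters = [
    {"_id": 1001, "course_id": 1, "lectures": [
        {"title": "Doing the exercises", "duration_in_seconds": 202},
        {"title": "Hypothetical lecture", "duration_in_seconds": 398},
    ]},
    {"_id": 1002, "course_id": 1, "lectures": [
        {"title": "Another hypothetical lecture", "duration_in_seconds": 1200},
    ]},
]
refresh_durations(course, chapters)
print(course["duration_in_seconds"], round(course["duration_in_seconds"] / 3600, 1))  # 1800 0.5

user = {"course_ids": [1, 2, 5]}
print(owned_courses_filter(user))  # {'_id': {'$in': [1, 2, 5]}}
```

With PyMongo, the second query would be something like `courses.find(owned_courses_filter(user))`, where `courses` is a hypothetical collection handle. The design choice is the one described in the lecture: pay a small, rare cost on save so the frequent listing pages never walk chapters and lectures.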
--- a/transcripts/ch6-modeling-data/5.txt
+++ b/transcripts/ch6-modeling-data/5.txt
@ -0,0 +1,24 @@
+00:01 Let's close out this chapter with a few more sources
+00:03 where you can get some patterns;
+00:05 so recently I had Rick Copeland who is in the MongoDB masters program
+00:10 along with myself, and I had him on the podcast on episode 109
+00:14 to talk about applied MongoDB design patterns.
+00:18 So this concept of embedding and modeling
+00:20 and data duplication and all these things,
+00:23 certainly we talked about on the podcast, and he talks about in his book,
+00:25 but he has a lot of really interesting use cases
+00:28 and actually some performance trade-offs,
+00:31 using some of the atomic update operators, one versus the other or not at all,
+00:37 just to see how that might work out.
+00:40 So he's got a bunch of use cases and you might flip through his book
+00:43 once you really get into things and say does one of the patterns he talks about
+00:47 really closely match what I'm doing? You might get a huge jumpstart
+00:50 on modeling your data with actual performance numbers behind it.
+00:54 So check out the podcast, it's free
+00:56 and check out his book if you find it to be helpful.
+00:59 And a final thought on modeling with these document databases is that
+01:02 there is no perfect answer, it's always this tension of
+01:06 I could model it this way and this part of my app gets better,
+01:09 I could model it another way, and that part is not quite as good,
+01:12 but another part becomes more flexible or becomes better,
+01:14 so it's really about balancing the trade-offs, not right versus wrong.