From f8d917fcd12dc19911c3caab09ebfeeea3545085 Mon Sep 17 00:00:00 2001 From: Michael Kennedy Date: Mon, 9 Oct 2017 14:06:25 -0700 Subject: [PATCH] Transcripts for chapters 4, 5, and 6. --- transcripts/ch4-mongo-shell/1.txt | 71 ++++++++ transcripts/ch4-mongo-shell/10.txt | 39 ++++ transcripts/ch4-mongo-shell/11.txt | 56 ++++++ transcripts/ch4-mongo-shell/12.txt | 21 +++ transcripts/ch4-mongo-shell/13.txt | 27 +++ transcripts/ch4-mongo-shell/14.txt | 39 ++++ transcripts/ch4-mongo-shell/15.txt | 11 ++ transcripts/ch4-mongo-shell/16.txt | 90 +++++++++ transcripts/ch4-mongo-shell/17.txt | 59 ++++++ transcripts/ch4-mongo-shell/2.txt | 93 ++++++++++ transcripts/ch4-mongo-shell/3.txt | 31 ++++ transcripts/ch4-mongo-shell/4.txt | 182 +++++++++++++++++++ transcripts/ch4-mongo-shell/5.txt | 44 +++++ transcripts/ch4-mongo-shell/6.txt | 89 +++++++++ transcripts/ch4-mongo-shell/7.txt | 69 +++++++ transcripts/ch4-mongo-shell/8.txt | 38 ++++ transcripts/ch4-mongo-shell/9.txt | 25 +++ transcripts/ch5-connecting-with-python/1.txt | 57 ++++++ transcripts/ch5-connecting-with-python/2.txt | 128 +++++++++++++ transcripts/ch5-connecting-with-python/3.txt | 66 +++++++ transcripts/ch5-connecting-with-python/4.txt | 60 ++++++ transcripts/ch5-connecting-with-python/5.txt | 88 +++++++++ transcripts/ch5-connecting-with-python/6.txt | 44 +++++ transcripts/ch5-connecting-with-python/7.txt | 40 ++++ transcripts/ch6-modeling-data/1.txt | 72 ++++++++ transcripts/ch6-modeling-data/2.txt | 121 ++++++++++++ transcripts/ch6-modeling-data/3.txt | 65 +++++++ transcripts/ch6-modeling-data/4.txt | 152 ++++++++++++++++ transcripts/ch6-modeling-data/5.txt | 24 +++ 29 files changed, 1901 insertions(+) create mode 100644 transcripts/ch4-mongo-shell/1.txt create mode 100644 transcripts/ch4-mongo-shell/10.txt create mode 100644 transcripts/ch4-mongo-shell/11.txt create mode 100644 transcripts/ch4-mongo-shell/12.txt create mode 100644 transcripts/ch4-mongo-shell/13.txt create mode 100644 
transcripts/ch4-mongo-shell/14.txt create mode 100644 transcripts/ch4-mongo-shell/15.txt create mode 100644 transcripts/ch4-mongo-shell/16.txt create mode 100644 transcripts/ch4-mongo-shell/17.txt create mode 100644 transcripts/ch4-mongo-shell/2.txt create mode 100644 transcripts/ch4-mongo-shell/3.txt create mode 100644 transcripts/ch4-mongo-shell/4.txt create mode 100644 transcripts/ch4-mongo-shell/5.txt create mode 100644 transcripts/ch4-mongo-shell/6.txt create mode 100644 transcripts/ch4-mongo-shell/7.txt create mode 100644 transcripts/ch4-mongo-shell/8.txt create mode 100644 transcripts/ch4-mongo-shell/9.txt create mode 100644 transcripts/ch5-connecting-with-python/1.txt create mode 100644 transcripts/ch5-connecting-with-python/2.txt create mode 100644 transcripts/ch5-connecting-with-python/3.txt create mode 100644 transcripts/ch5-connecting-with-python/4.txt create mode 100644 transcripts/ch5-connecting-with-python/5.txt create mode 100644 transcripts/ch5-connecting-with-python/6.txt create mode 100644 transcripts/ch5-connecting-with-python/7.txt create mode 100644 transcripts/ch6-modeling-data/1.txt create mode 100644 transcripts/ch6-modeling-data/2.txt create mode 100644 transcripts/ch6-modeling-data/3.txt create mode 100644 transcripts/ch6-modeling-data/4.txt create mode 100644 transcripts/ch6-modeling-data/5.txt diff --git a/transcripts/ch4-mongo-shell/1.txt b/transcripts/ch4-mongo-shell/1.txt new file mode 100644 index 0000000..2a2ee4f --- /dev/null +++ b/transcripts/ch4-mongo-shell/1.txt @@ -0,0 +1,71 @@ +00:01 So we've talked a lot about NoSQL document databases and MongoDB. +00:05 Now it's time to actually start using MongoDB. +00:08 So what we're going to learn in this chapter is twofold: +00:11 one, how do you connect to it and manage it, +00:13 with the management tools if you will, +00:16 that is more or less the shell, and some additional tools, +00:19 but also how do you query it from that shell. 
+00:22 So maybe in Python with a traditional relational database +00:25 you might be using say SQLAlchemy to talk to a relational database, +00:29 so you wouldn't necessarily use SQL, the language, in Python +00:32 but if you want to connect to the database directly and work with it +00:35 then you need to use DDL and SQL and things like that, +00:38 there is the same parallel here in that we're going to use the shell +00:41 and we need to use MongoDB's native query syntax +00:44 which turns out to be very similar to Python's under certain circumstances, +00:47 so it's going to be serving a dual purpose there. +00:50 So the primary MongoDB shell is a command line tool, right, +00:55 we just type mongo, the name of the server, some connection string options, +00:58 you can see all that in the title here in this terminal. +01:02 And then we just issue commands like if I want to go and use +01:05 the training database out of the server, I'd say use training; +01:09 and if I want to say go to the courses and find +01:12 the one with id 5 and display it not in a minimized, minified, +01:16 but in a readable version, I would say db.courses.find +01:20 and I'd give it the little json thing, id is 5 and I'd say pretty. +01:25 So this is going to be entirely done in Javascript, +01:28 so these statements that you type here, +01:31 although you don't see any semicolons, +01:33 these are either shell statements like use training +01:36 otherwise, they're entirely pure Javascript. +01:39 So what we're going to do is we're going to learn the Javascript api +01:43 to talk to MongoDB, to query MongoDB, +01:45 to do all the crud operations, there's a find, there's a delete, +01:49 there's an insert, there's an update, of course there's sorts, there's upserts, +01:52 there's all the things you would do in a standard database, +01:55 the query syntax uses sort of a json model to help represent +01:59 either operators or hierarchies and things like that. 
+02:03 Now, you may be thinking, Michael, I came to a Python course, +02:06 I don't want to learn the Javascript api, I want to learn the Python api. +02:09 You will, you will learn the Python api for sure, +02:12 and luckily, it's really, really similar, it's not identical, +02:15 they made the Python api Pythonic +02:18 and the Javascript one follow the idioms of Javascript, +02:20 but nonetheless, other than the slight variations +02:23 in naming around those ideas, they're basically identical, +02:26 in Python we would use {"_id": 5} as a dictionary, +02:31 here we use {_id: 5} as a json object; +02:34 so on one hand, learning the Javascript api +02:36 is more or less learning the Python api. +02:38 But on the other, if you work with MongoDB, +02:41 if this drives your application and you actually work with Mongo, in a real way, +02:45 you will have to go into the shell, you will have to talk to the database directly, +02:49 you have to maintain it, and manage it, and back it up, and do all those things; +02:52 in order to do that, you need to know the Javascript capabilities, +02:56 the way to do this in Javascript, as much as you do the Python way. +03:00 Ultimately, the end game is to use something like MongoEngine +03:03 which is equivalent to SQLAlchemy, sort of analogous to SQLAlchemy, +03:08 in that we won't even be speaking in this syntax, +03:11 but still, you'll need to know how these translate down into these queries +03:15 because you might want to say add an index +03:17 to make your MongoEngine queries perform much, much faster, things like this. +03:21 So we're going to focus on Javascript now, and then for the rest of the class, +03:26 we're going to basically be doing Python, but like I said, +03:29 in order to actually use, manage, run, +03:32 work with an application that lives on MongoDB, +03:34 you have to be able to use the shell, and to use the shell you do Javascript. 
+03:38 So just like anybody who writes web apps, we're all Javascript developers, +03:41 if we write any form of web app, similarly here, +03:44 if you work with MongoDB, we're all Javascript developers +03:47 and we've got to do just a tiny bit, but you'll find, like I said, +03:49 it's super, super similar to what we're going to do in Python. \ No newline at end of file diff --git a/transcripts/ch4-mongo-shell/10.txt b/transcripts/ch4-mongo-shell/10.txt new file mode 100644 index 0000000..adab2ef --- /dev/null +++ b/transcripts/ch4-mongo-shell/10.txt @@ -0,0 +1,39 @@ +00:01 So here's an interesting question: what if I want to find all the books +00:03 where user 720 has rated that book exactly a nine? +00:09 You would think that this would do it, right, +00:11 we're using both values in this prototypical object or this document here +00:15 and it says that the book is going to have to have +00:18 a rating of nine and user id 720 has rated it. +00:21 However, when we run this, you'll see we get mixed results. +00:24 The bottom one looks perfect, we got a book with the user id 720 +00:29 and a value of nine in the ratings, great; +00:32 but this other one, what's up with this, the red one? +00:34 Well, user 601 rated this as a nine, +00:38 and user 720 actually hated the book, they gave it a one. +00:41 However, taken as a whole, does the book have a rating by user id 720? Yes, +00:46 does it have a rating of nine? Yes, so it matches this and clause. 
+00:49 So, oftentimes if you're looking for this exact subdocument match +00:54 and that thing you're looking in is an array +00:56 so ratings is an array of documents, if ratings was one subdocument, +01:00 this would work fine, but if it's an array and you want to say +01:04 I need to make sure that the thing in that array, +01:07 that subdocument itself, matches value and user id as I've specified here +01:11 you need a different query operator, and that is $elemMatch; +01:15 so you can run this and it'll look down inside and say +01:18 I want to find all the things in ratings, +01:21 where both, the user id is 720 and the value is nine. +01:25 So this is a slightly more complex version +01:27 that you have to run and you have to use +01:29 because you run into that problem we had before +01:31 where somebody voted a 9, user 720 voted, +01:33 but it was not user 720 who voted nine. +01:35 So a little bit different than if you were working in +01:38 say a traditional tabular SQL language +01:41 because you don't ever have this kind of duplication within the one result, +01:45 so it would be a lot simpler, but this is something +01:48 that you kind of got to get your head around a little bit, +01:50 you luckily don't use it very often, and if you are using the higher level of things +01:54 like MongoEngine, you won't run into it, +01:56 but down here at the shell or in PyMongo, +01:58 you have to be really careful if this is actually +02:00 the question you're trying to ask and answer. \ No newline at end of file diff --git a/transcripts/ch4-mongo-shell/11.txt b/transcripts/ch4-mongo-shell/11.txt new file mode 100644 index 0000000..29d03fd --- /dev/null +++ b/transcripts/ch4-mongo-shell/11.txt @@ -0,0 +1,56 @@ +00:01 So, we're pretty good at finding and filtering down our result sets. 
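As a recap of the subdocument matching just covered, here is a minimal plain-JavaScript sketch of the difference between the two filters. It only illustrates the matching semantics and is not how MongoDB implements them. The collection name `Book`, the field names `Ratings`, `UserId` and `Value`, and the two sample books are assumptions made up for the example; the shell queries in the comments use those assumed names.

```javascript
// Hypothetical sample data: each book holds an array of rating subdocuments.
const ratedBooks = [
  { Title: "Book A", Ratings: [{ UserId: 601, Value: 9 }, { UserId: 720, Value: 1 }] },
  { Title: "Book B", Ratings: [{ UserId: 720, Value: 9 }] },
];

// Roughly what the first attempt asks (assumed field names):
//   db.Book.find({ "Ratings.UserId": 720, "Ratings.Value": 9 })
// Each condition may be satisfied by a different element of the array.
function matchesAcrossElements(book) {
  return book.Ratings.some((r) => r.UserId === 720) &&
         book.Ratings.some((r) => r.Value === 9);
}

// Roughly what $elemMatch asks:
//   db.Book.find({ Ratings: { $elemMatch: { UserId: 720, Value: 9 } } })
// Both conditions must hold for one and the same element.
function matchesSingleElement(book) {
  return book.Ratings.some((r) => r.UserId === 720 && r.Value === 9);
}

const looseTitles = ratedBooks.filter(matchesAcrossElements).map((b) => b.Title);
const strictTitles = ratedBooks.filter(matchesSingleElement).map((b) => b.Title);

console.log(looseTitles);  // [ 'Book A', 'Book B' ]
console.log(strictTitles); // [ 'Book B' ]
```

"Book A" is the mixed result from the demo: one user gave it a nine and user 720 gave it a one, so it passes the loose filter but not the per-element one.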
+00:04 The other super important things that databases do +00:07 is to sort them, put them in order, so I would like the best selling book +00:12 and then the second best, and then the third best in this category, +00:15 that's a perfect sort by category, order by best selling this, right. +00:20 So how do we do that in Mongo? +00:22 Let's go over here and it turns out that there's a sort that we can run, +00:25 and the sort takes something, right, kind of like our projection does here, +00:30 so let me just show you before if I run this that this is not in order, +00:33 so here we have c, c, d, f,  and then t, p, w, +00:40 and eventually we're back just, you know, something before w, +00:43 it is not sorted by title, not sorted by published date either, +00:47 these three seem to be descending but the next one is not, ok. +00:50 So it's not sorted at all, it's just however it comes back, +00:53 probably by object id or something like this. +00:56 Anyway, let's go and sort it, so let's suppose I would like sorted by title; +01:00 so very much like our filter thing or maybe even closer, actually, +01:05 like our projection here is I can come say I would like to sort +01:09 and then this part that goes here, this one is ascending, right, +01:13 so something that is positive means ascending, if it were negative, +01:16 it would mean go in reverse order. +01:18 So let's run this, now you can see, actually this is the beginning of the title, +01:22 this exclamation mark and then some other exclamation marks, +01:25 and then let's get past the symbols, a lot of symbols, anyway, +01:30 you can see this is sorted by this, sorted by the title, not sorted by date, +01:36 1994, 1993, 1996, we can also sort by date, let's comment this out, +01:41 say .sort, published, let's sort in reverse order, +01:47 newest which was 2050, I think we might have been fooling around with that +01:52 or no actually I don't know where those came from. 
+01:55 Anyway, 2050, 2038, 2037, 2030 and so on. +01:58 Obviously, sorted in reverse order. +02:01 What if I want to sort by the title and then any time the title matches +02:05 I want to see the newest one of those. +02:08 We can do that as well, so very very similarly we can say sort +02:14 and then we just give it one of these objects with multiple values, +02:16 so you want to sort by title, there's your sort by title ascending +02:22 and then after that, if any of the titles match, +02:26 let's show the newest one first, so sort by title ascending +02:30 and then published descending, let's try that. +02:33 Great, ok so here notice that these titles are the same, +02:36 you might have noticed that before, but here's 1994 and here's 1993, +02:40 so any time the title matches, we get the newest one first, +02:44 I don't know if any others are in here with title matches. +02:46 This first one proves it, right, this is how it works, +02:50 sort by that and then by this, and you can have as many then-bys as you like +02:54 and they can either be ascending or descending, +02:57 so here we're sorting by title first and then by published. +02:59 The other thing that's important to notice is +03:02 everything in MongoDB is case sensitive, when you're working with strings +03:05 so that's probably going to play into this somewhere along the way. +03:09 All right, so sorting is pretty straightforward, just use these field names +03:13 and then the direction you want to sort. +03:15 The other thing that's worth paying attention to is +03:18 you are going to want to make sure that you have an index +03:20 so this sorting is actually fast, and we'll talk about that +03:22 when we get to the performance section. \ No newline at end of file diff --git a/transcripts/ch4-mongo-shell/12.txt b/transcripts/ch4-mongo-shell/12.txt new file mode 100644 index 0000000..3cdfd0f --- /dev/null +++ b/transcripts/ch4-mongo-shell/12.txt @@ -0,0 +1,21 @@ +00:01 Let's review sorting as a concept. 
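The title-then-published sort from the demo can be sketched in plain JavaScript to show how a sort document with several keys behaves. This is an illustration only, not the server's implementation; the collection name `Book`, the field names `Title` and `Published`, the "B Guide" entry, and the helper `bySortDocument` are assumptions made up for the example.

```javascript
// Hypothetical sample documents with assumed field names.
const datedBooks = [
  { Title: "B Guide", Published: 1996 },
  { Title: "A Nutshell Handbook", Published: 1993 },
  { Title: "A Nutshell Handbook", Published: 1994 },
];

// Shell equivalent (assumed names):
//   db.Book.find().sort({ Title: 1, Published: -1 })
// Builds a comparator from a sort document: 1 means ascending, -1 descending,
// and later keys only break ties left by earlier keys.
function bySortDocument(sortDoc) {
  return (a, b) => {
    for (const [field, direction] of Object.entries(sortDoc)) {
      if (a[field] < b[field]) return -direction;
      if (a[field] > b[field]) return direction;
    }
    return 0;
  };
}

const sortedBooks = [...datedBooks].sort(bySortDocument({ Title: 1, Published: -1 }));
const labels = sortedBooks.map((b) => `${b.Title} (${b.Published})`);

console.log(labels);
// [ 'A Nutshell Handbook (1994)', 'A Nutshell Handbook (1993)', 'B Guide (1996)' ]
```

The string comparison here is case sensitive, like the default behavior mentioned in the demo, so an uppercase first letter sorts ahead of a lowercase one.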
+00:03 So there's a sort function right on the result set +00:05 on the cursor that comes back from find, +00:07 and the way it works is we pass it some prototypical json document; +00:11 but now instead of equality meaning matching, +00:14 it means tell me the thing and the direction that you want to sort. +00:17 So here we want to say sort all the books descending +00:21 show me the most recently published to the oldest, right, +00:25 show me the most recent books basically. +00:27 Now this works pretty well, we could put anything that has a direction +00:29 like a minus one, or one, I think you could even put higher multiples +00:33 like ten and 20, 50, -10, but use one and minus one, keep your sanity. +00:37 So this works well for one field, if we want to sort just by published, +00:40 but if I want to sort by one thing, and then another, +00:43 well we just put more into this document that we passed to sort, +00:46 so we're going to say sort by title ascending +00:49 and then sort by published descending, +00:51 we run this, we saw that we get the results in our demo, +00:54 first we sorted ascending by the title and any time they matched +00:59 we sorted descending by the publish date. +01:02 So first the 1994, A Nutshell Handbook, and then the 1993 one. \ No newline at end of file diff --git a/transcripts/ch4-mongo-shell/13.txt b/transcripts/ch4-mongo-shell/13.txt new file mode 100644 index 0000000..cb548b4 --- /dev/null +++ b/transcripts/ch4-mongo-shell/13.txt @@ -0,0 +1,27 @@ +00:01 Inserts are one of the simpler operations in MongoDB actually. +00:04 So we just go db.collection name, in this case db.book.insert +00:08 and we give it the thing to insert. 
+00:10 Now, if we don't specify an id, an _id +00:13 which generally we want to let the database generate +00:16 but it's not always true, like we could have people and their primary key, +00:20 their id could be their social security number, and that you would provide, +00:23 so in this case, we're not going to provide an id, +00:26 we're going to type in title and isbn, and those kinds of things. +00:29 And then if we just do a find, that would come back and get the first one +00:32 maybe say this is our first insert, we'd get something back like this, +00:35 let's say we specified the isbn, the title, +00:37 the author, the published and the publisher, +00:40 this is a relationship over to the publisher table, +00:42 which we haven't played with yet. +00:44 So those were all set by us, you can see "Winning With MongoDB" +00:47 and down here we have "Winning With MongoDB", +00:50 but the _id, because we didn't specify it, was auto generated as an object id. +00:54 So unless you have a good reason to pick another type of id, +00:57 this is probably the best one for Mongo, but it could have been a string, +01:00 like I said, it could have been a social security number +01:03 or it could be just numerical if you want to have a 1234, +01:06 all of those kind of put the burden on you to manage the uniqueness of that id +01:10 and there is a uniqueness constraint on _id for every table or collection. +01:15 So that's how inserts work, you just give it this document +01:17 and it stores it more or less directly in the database +01:20 except that it will generate this _id as an object id if needed. \ No newline at end of file diff --git a/transcripts/ch4-mongo-shell/14.txt b/transcripts/ch4-mongo-shell/14.txt new file mode 100644 index 0000000..87f5809 --- /dev/null +++ b/transcripts/ch4-mongo-shell/14.txt @@ -0,0 +1,39 @@ +00:01 If inserts are simple, updates maybe not so much. 
+00:03 In fact, there are two types of updates that we're going to look at; +00:06 first, we're going to look at what is the conceptually more simple one, +00:09 but also slightly more problematic. +00:11 So I'm going to call this the whole document update +00:14 and the way you might use this is you might go to the database, +00:17 do a query, get a document back, make a change to it +00:19 and say here, push this whole document +00:22 back over top the existing one in the database, kind of orm style. +00:26 The other one that we're not talking about here would be the in place updates, +00:31 so you might say go increment the view count of this post +00:34 without retrieving it, without changing the other parts, +00:38 ok, so how does the whole document update work? +00:40 Well, first of all, we're going to do an update +00:43 if we come back and we look at it, we'll see maybe we've changed the title here, +00:46 the author is still the same, but we had to pass the author, +00:48 we had to pass the published and the isbn back, +00:52 okay, in fact also the id, so all that stuff we had to put back, +00:55 basically the way it works is we're going to do a where clause here +00:58 so find it by the primary key, this great long object id +01:02 and then here is the entire whole document +01:05 we want to replace that document with. +01:07 Now because of the way it's working here, +01:09 there's a couple of features or settings you might want to control here, +01:12 so you might need to set these, you might not depending on what you're doing, +01:16 the default is if the where clause does not match, nothing will happen, +01:20 there will be no kind of upsert, there will not be a new document added +01:24 because we didn't find one, just nothing happens. +01:26 So if you say upsert is true and you run this update, +01:29 it will say I didn't find this document, so let me create it for you, +01:32 so you could control that here. 
+01:34 Similarly with multi equals true, normally unlike sql statements +01:37 update only updates the first item it finds +01:40 even if the where clause would match ten things, it only updates one of them. +01:43 So that's a little bit funky, but if you think it's entirely replacing the record +01:48 like why would that whole record be duplicated ten times, +01:51 I don't know, it's kind of weird, but if you do want to update multiple objects, +01:54 multiple documents in this collection, be sure to set multi to true, +01:57 both of those orange values, their default values are false. \ No newline at end of file diff --git a/transcripts/ch4-mongo-shell/15.txt b/transcripts/ch4-mongo-shell/15.txt new file mode 100644 index 0000000..e6865bd --- /dev/null +++ b/transcripts/ch4-mongo-shell/15.txt @@ -0,0 +1,11 @@ +00:00 After you've inserted some documents and maybe updated a few, +00:03 it might be time to get rid of the old ones, so let's talk about deleting them. +00:06 So again, it's db.collection name, and we're going to apply a delete operation. +00:11 And here we can say I'd like to delete one of them, +00:13 delete one, or maybe I want to delete a whole set of them, right, +00:18 with delete one we're passing in something that should be unique, +00:20 like the primary key, and delete many, maybe a bunch of them have the title, +00:23 maybe there is a couple of editions +00:25 like a kindle and a paperback version or something like that. +00:27 So just get rid of all of them with the title being some title. +00:30 So, delete one, delete many, pretty straightforward. \ No newline at end of file diff --git a/transcripts/ch4-mongo-shell/16.txt b/transcripts/ch4-mongo-shell/16.txt new file mode 100644 index 0000000..6c0f1ac --- /dev/null +++ b/transcripts/ch4-mongo-shell/16.txt @@ -0,0 +1,90 @@ +00:01 It's time to look at the atomic updates. 
+00:03 We already talked about the whole document updates and how they work, +00:05 but sometimes it's not really what you want; +00:08 in the beginning when we talked about NoSQL, +00:10 we saw that the NoSQL databases gave up things +00:12 that traditional relational databases embraced or considered sacred. +00:17 One of those were the acid properties, or some part of the acid properties +00:21 and MongoDB does say look, things like joins and transactions, +00:24 transactions mainly being part of the acid properties, +00:27 are something that MongoDB doesn't promise +00:30 so these whole document updates really require an additional layer in the app tier +00:35 called optimistic concurrency, and usually it's fine, +00:38 sometimes it's not, and you can catch it and say hey look +00:41 somebody saved this out from under you +00:44 and do you want to keep your changes, their changes, +00:45 there's things you can do about those types of situations, +00:47 but not in the database, in your app. +00:50 On the other hand, MongoDB does support atomic transactional behavior +00:54 as long as it happens on a single document, +00:57 so if we have a document and let's go ahead and create +01:00 a whole new collection here called BookReads +01:03 notice it doesn't exist yet, and we're going to insert just an isbn +01:05 and then how many times it's been read, +01:08 think of like the Goodreads service or something like that, +01:10 like I want to know how many of my friends read this book, +01:13 we'll do a simple, simple version of that. +01:15 So let's go over here and notice we inserted one and if I refresh, +01:18 we should now have that in here in our one record, like so. 
+01:22 So we could go and we could do this for this whole document style things, +01:26 I could say book and of course we will be doing this in Python very likely +01:33 we're just about to leave the Javascript in the dust, +01:36 so let's just print out our book that we got here, +01:39 notice this has actually given us the same thing back, +01:41 and we could say the read count += 1, we could increment that, +01:47 and then we could say go over here to the same collection, +01:50 we could say update, I would like to update with this, +01:55 here's the where clause, and the thing I want to update with is the book, +01:58 so let's say _id : book._id, okay, so this should do that like so, +02:07 and let's run one more query here at the end to get it back, to see it again. +02:12 Oh yes, find is not going to work, find one however, +02:19 we don't want to update a whole query, +02:21 whatever that means it doesn't make any sense +02:23 but let's get one of them back, we know this is really going to be unique +02:25 and then let's make this change, ok +02:27 so notice, now we've got a read count of one, we do this a few times, bam bam +02:32 a read count is incrementing over and over and over down here, +02:35 and we're updating one record, +02:37 so this is cool but this is not part of the acid property guarantees, +02:40 this could be problematic in lots of ways +02:43 so what we're going to look at now, are the operators that we can use +02:46 to basically do almost transactional stuff +02:49 and do it in a much more high performance way. 
+02:52 So let's go over here again, and let me grab this little clause here, +02:57 all right so we got our document back again +02:59 and now what we're going to do, is we're going to do our db +03:03 let me just grab this collection bit, +03:06 and we're going to do our update, in fact update is going to look almost the same, +03:09 we are going to do this, but instead of passing the whole document +03:16 we're going to pass just an in place atomic operator, +03:18 all right so what are we going to do, let's suppose we want somebody +03:23 to basically do the same thing, increment that +03:26 alright, I guess we could just use isbn, that works as well right; +03:32 +03:37 we're going to need something in our little where clause here, isbn will do. +03:41 Now by default, this is going to replace whatever's in there, +03:44 that's going to be bad, but what we really want to do is +03:47 we want to increment that one value, so we can use another operator, +03:50 say inc for increment, and then what do I want to increment, +03:58 I want to increment let's see what is it called— ReadCount, +04:04 so I want to increment ReadCount by one, +04:07 I could increment it by negative one, I could increment it by ten. +04:10 So let's run this, now notice we updated one record +04:13 and let's put this in a way that looks better, nine, ten, eleven, twelve— +04:18 there we go, check that out, isn't that cool? 
+04:21 So what's happening here is it's actually going into Mongo, +04:24 go find the document, just change that number right there, +04:27 just add one to it for me, you don't have to pull the whole thing back, +04:31 make changes and possibly try to put it back and someone else changed it, +04:34 none of those things, this is entirely atomic and safe +04:38 in a multi threaded, multi server environment, +04:42 because MongoDB guarantees individual updates +04:44 to individual documents are atomic +04:47 and because we're not depending on the value, +04:50 we're not like reading it changing in memory and putting it back +04:53 change it in our programs memory not Mongo's and put it back, +04:56 then we're not going to have any problems. +04:58 There's a bunch of cool operators like this +05:00 and we'll see that MongoEngine actually naturally behaves in this style +05:04 not the document style, even though it's an object document mapper +05:08 which is really really delightful. \ No newline at end of file diff --git a/transcripts/ch4-mongo-shell/17.txt b/transcripts/ch4-mongo-shell/17.txt new file mode 100644 index 0000000..bb6273b --- /dev/null +++ b/transcripts/ch4-mongo-shell/17.txt @@ -0,0 +1,59 @@ +00:01 Despite the fact that MongoDB is a NoSQL database +00:03 it does adhere to the acid properties under certain circumstances. +00:07 Primarily that means updates to individual documents are guaranteed to be atomic, +00:12 and along with those, we can get great performance +00:15 as well as safety if we don't pull the document back for the database, +00:19 make changes and push it back hoping no one else has changed it +00:22 during that intervening time there, +00:24 but in fact we can go to the database and go make this change here +00:27 I don't care if it's a 100k document, don't pull anything back +00:30 just make this little change and that happens atomically and safely. 
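To make the contrast concrete, here is a small plain-JavaScript sketch of the two update styles. It is a simulation under stated assumptions, not MongoDB code: `storedBook` stands in for the single document on the server, the two copies stand in for two clients, and the `inc` helper is a hypothetical stand-in for the `$inc` operator. The field name `ISBN` and its value are made up; `BookReads` and `ReadCount` follow the demo.

```javascript
// Stand-in for the one document stored on the server.
const storedBook = { ISBN: "123", ReadCount: 0 };

// Whole-document style: each client reads a copy, changes it in its own
// memory, then writes the whole copy back over the stored document.
const copyA = { ...storedBook };
const copyB = { ...storedBook };   // client B reads before client A writes
copyA.ReadCount += 1;
copyB.ReadCount += 1;
Object.assign(storedBook, copyA);  // A saves ReadCount = 1
Object.assign(storedBook, copyB);  // B saves ReadCount = 1, A's change is lost
const afterWholeDocument = storedBook.ReadCount;

// In-place style: the change is applied to the stored value itself.
// Shell equivalent (assumed field names):
//   db.BookReads.update({ ISBN: "123" }, { $inc: { ReadCount: 1 } })
function inc(doc, changes) {
  for (const [field, amount] of Object.entries(changes)) {
    doc[field] = (doc[field] || 0) + amount;
  }
}

storedBook.ReadCount = 0;          // reset for the second experiment
inc(storedBook, { ReadCount: 1 }); // client A
inc(storedBook, { ReadCount: 1 }); // client B
const afterInPlace = storedBook.ReadCount;

console.log(afterWholeDocument); // 1
console.log(afterInPlace);       // 2
```

With the whole-document style one of the two reads is silently lost, which is the situation optimistic concurrency in the app tier has to detect; with the in-place style neither client depends on a value it read earlier.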
+00:34 So the operators that we have to work with are increment, multiply, +00:37 rename a field, set on insert, set, unset, like basically delete a field, +00:42 min and max so I would like to set the value +00:45 but only if the value I'm passing is lower than +00:48 the one that's in the document, or set it to the max, +00:51 like only set the value to this if this new value is bigger than the existing one. +00:55 You can also use current date to basically grab the server date and save it there as well. +01:00 So these are the in place individual updates and we can see how that works +01:03 so we'll come over here and let's insert just a book +01:06 and this time our book has a view count, right, the view count is zero, +01:09 maybe every time somebody pulls up the book we want to increment that, +01:13 so we can say test.update and give it the object id +01:16 right here is a real simple one so it fits onto the screen basically +01:20 you can say $inc, increment view count by one, +01:23 and we do this a few times, so we've done it three times +01:26 it should go from zero to, well you guessed it, three +01:29 and it all happened atomically in the database, +01:32 without us ever pulling it back or worrying about any sort of concurrency whatsoever. +01:36 So this is great for working with individual fields +01:39 sometimes we need to work with arrays, +01:42 so we saw like for example our ratings object +01:44 maybe we want to work with that atomically. 
+01:47 So MongoDB has operators for that as well, +01:50 so we have things like add to set, so suppose it's got like a votes list, +01:55 people who have voted on this book, +01:57 not the values just keep it simple, just the users who have voted +02:00 and that contains user id, so you could say add to set user id when they vote +02:04 and that would actually only add them there, if they're not already in that list; +02:08 what's cool about that is +02:11 if they push the little vote button twice, it doesn't count twice, +02:14 just either you add it there and the person has now voted or they haven't. +02:17 Another good example is tags, like think Stack Overflow, I want to tag a post +02:21 so you could say add the tag python, add the tag mongo, +02:24 and if it's already there, it's just going to leave it alone +02:26 if it's new, if it's not there it will actually add the tag. +02:29 So add to set is really cool for kind of uniqueness on these subarrays. +02:33 We also have pop and pull for pulling things out, +02:35 pull all say I want to remove all the votes by a particular user, things like that. +02:40 Also push, so push is like add to set +02:42 without the uniqueness constraint, and that's it, +02:45 I definitely recommend you think about these atomic updates, +02:47 they are not simple, but they are better performing +02:51 and they are definitely safer as well. +02:54 Like I said before, it's great that the odm, the object document mapper +02:59 that we're going to look at, MongoEngine automatically does this behind the scenes, +03:02 we don't ever have to even know how they work, +03:05 but it's important that you know that they exist and why they're good for you +03:08 when you look at the logs, and you look at the performance +03:11 and think about things in that way. 
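The array operators described above can be sketched in plain JavaScript to show how they differ. These helpers are hypothetical stand-ins that mimic the behavior of `$push`, `$addToSet` and `$pull` on an in-memory object; they are not MongoDB's implementation, and the `post` document with its `Tags` and `Voters` fields is made up for the example.

```javascript
// Like { $push: { field: value } }: always appends, duplicates allowed.
function push(doc, field, value) {
  doc[field] = [...(doc[field] || []), value];
}

// Like { $addToSet: { field: value } }: appends only if the value is absent.
function addToSet(doc, field, value) {
  const current = doc[field] || [];
  if (!current.includes(value)) {
    doc[field] = [...current, value];
  }
}

// Like { $pull: { field: value } }: removes every matching element.
function pull(doc, field, value) {
  if (Array.isArray(doc[field])) {
    doc[field] = doc[field].filter((item) => item !== value);
  }
}

// Hypothetical post with tags and a list of voter ids.
const post = { Title: "Sorting in MongoDB", Tags: [], Voters: [] };

addToSet(post, "Tags", "python");
addToSet(post, "Tags", "mongo");
addToSet(post, "Tags", "python"); // already present, left alone

push(post, "Voters", 720);
push(post, "Voters", 720);        // push does not check for duplicates
const votersAfterPush = [...post.Voters];

pull(post, "Voters", 720);        // removes both entries for user 720

console.log(post.Tags);       // [ 'python', 'mongo' ]
console.log(votersAfterPush); // [ 720, 720 ]
console.log(post.Voters);     // []
```

In the shell each helper call would correspond to one update with the matching operator, for example an update whose second argument is `{ $addToSet: { Tags: "python" } }`, applied on the server without reading the document first.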
\ No newline at end of file diff --git a/transcripts/ch4-mongo-shell/2.txt b/transcripts/ch4-mongo-shell/2.txt new file mode 100644 index 0000000..82e10d8 --- /dev/null +++ b/transcripts/ch4-mongo-shell/2.txt @@ -0,0 +1,93 @@ +00:01 So let's connect to MongoDB, +00:03 I already have it running as a separate process hidden away, +00:05 we'll talk about how to run MongoDB later, +00:08 you should have seen in the setup how to get it started +00:11 and then we'll talk about the deployment side of things later in the class. +00:14 So MongoDB is running, it's running on the local machine under default ports, +00:18 no security, nothing like that for getting started, +00:21 it's only listening on 127.0.0.1 +00:25 so it's not listening on the public network, on my machine, +00:28 so for that reason, more or less plus firewalls, +00:31 the authentication part we're going to turn off for a little bit, +00:35 just so we can start from the beginning; +00:37 okay, the other thing I have is I have set up MongoDB in my path, +00:41 so I can ask which mongo, and it comes back with something, +00:45 so what I actually did is I went to MongoDB +00:48 and I just downloaded the tarball, and I unzipped it, +00:51 and I sort of changed the naming around, so it's in this path here, +00:53 so here's the actual executable. +00:55 mongo is the name of the shell, mongod is the name of the server, the d is for daemon, +00:59 so in order to connect to MongoDB, there's a ton of options we could give it +01:03 and like I said, when we get to the deployment and production stuff at the end, +01:06 we'll have to pass all sorts of things like authentication, +01:09 and ssl flags, and whatnot, server names here +01:12 but in the beginning, we can just type mongo.
+01:15 And you'll see, right here, we're running 3.4.4 +01:19 and it's connected to local host 27017, +01:24 that's the default port for standalone servers, +01:26 there's 27 thousand, 18, 19 and 20 are reserved +01:30 or typically the default for other types of things. +01:32 So my system is not exactly set up right, +01:35 but it's not a production machine it's just my dev machine, okay. +01:37 So now we're connected, what do you do? +01:40 Well, probably the first thing you want to do is +01:42 focus on a particular database, so you can say show dbs +01:45 and it will show you the various databases, how large they are things like that, +01:50 so we're going to work with the bookstore for our examples in this chapter. +01:55 Later, we're going to work on something that maps over to a car dealership, +01:59 so those are the two databases that we're going to be working with, +02:02 you can see that I have got some for my various sites here and things like this, +02:06 I have actually broken it apart so like Talk Python the core data +02:09 it's not really zero gigs, it's just rounding down, it's like 20 MB or something, +02:14 but the analytics is half a gig here, and it's actually much more if you export it. +02:20 So we may have more than one database for our app like I have on my podcast, +02:23 or you might just have one for the trading site, like we do here. +02:27 Great, so now I want to maybe find a book in the bookstore, +02:31 so how do I do that— the first thing you have to do is +02:34 you have to activate the database, so you're going to say +02:37 db.command, whatever that is, and give it some command here, +02:40 where db refers to one of these databases, so the way we do that +02:43 is we say use say bookstore, like this, +02:47 now it says great, we switched to bookstore, +02:49 and then we could say db. 
 first of all what are the equivalent of tables +02:53 in MongoDB these are called collections, because they're not tabular, +02:56 so we can say show collections, and this is what is contained inside of bookstore, +03:01 there's a Book, case sensitive, Publisher, Test and User, ok. +03:06 So if I wanted to find the books let's say db.Book.find +03:09 let's say just limit one, so it doesn't go crazy on our shell here, +03:13 so basically, the way it works is we connect, +03:16 we figure out what the database we want to work with is, +03:19 we say use that database and then we say db.collection name +03:22 and then we typically fire these commands at the collection. +03:25 Now, what's interesting that is missing here +03:28 is there's not like a create database or inside of +03:32 here there's not a create table or create collection command, +03:36 so like Python in some ways, MongoDB is very, very dynamic, +03:41 so if we wanted to create a table, let's go and just create a collection +03:45 and we won't create a whole new database, +03:47 so what database we have, we have a bookstore +03:49 and we have those four collections, book, publisher, test and user, +03:52 so if I want to create one called logins, +03:56 let's say just log for history +04:00 I could even issue a find command against that +04:03 and there's just nothing, it's just empty. +04:06 If we go up here and we say what's here, there's no log, +04:09 but if I actually try to interact with this, we'll talk about inserts in a little bit, +04:12 but let's just really quickly see how this works, +04:17 I would just say let's say name or action is view, something like that, +04:22 if I insert this, now, just crazily, this works and something was inserted, +04:26 if we look there's now a log, so db.Log, case sensitive .find, there +04:32 and it inserted this thing, action with a view, and it gave it the id, whatever it is, +04:36 this is called an object id, we'll talk about that later.
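The "no create collection command" behavior above can be sketched as follows. This is a tiny in-memory model written in plain JavaScript, not MongoDB itself; the class TinyDb and its method names are hypothetical, and the shell lines in the comments only restate what the transcript describes.

```javascript
// Shell session described above:
//   use bookstore
//   db.Log.find()                      // no such collection yet: empty result, no error
//   db.Log.insert({ action: "view" })
//   show collections                   // Log is now listed

// Tiny in-memory model of that behavior (not MongoDB itself).
class TinyDb {
  constructor() {
    this.collections = new Map();
  }
  find(name) {
    // Reading a collection that does not exist yields nothing rather than failing.
    return this.collections.get(name) || [];
  }
  insert(name, doc) {
    // The collection is created on first insert.
    if (!this.collections.has(name)) {
      this.collections.set(name, []);
    }
    this.collections.get(name).push(doc);
  }
  collectionNames() {
    return [...this.collections.keys()];
  }
}

const db = new TinyDb();
const before = db.find("Log").length;
db.insert("Log", { action: "view" });
console.log(before, db.collectionNames()); // 0 [ 'Log' ]
```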
+04:39 Okay, so this shell is how we work with MongoDB, +04:41 if I want to get rid of it, I could go here and say drop collection, +04:47 +04:50 just drop, right, +04:54 and now log is gone again. +04:56 So this is your base level admin tool +04:59 and it works everywhere, so we could ssh into our Linux server +05:02 Digital Ocean, or on aws or whatever, +05:06 and we could do this, we could even sort of tunnel this through there, +05:10 but we're going to see that there is actually some better options +05:13 any time we're running somewhere +05:15 where we can even just tunnel over to the server. \ No newline at end of file diff --git a/transcripts/ch4-mongo-shell/3.txt b/transcripts/ch4-mongo-shell/3.txt new file mode 100644 index 0000000..60ae546 --- /dev/null +++ b/transcripts/ch4-mongo-shell/3.txt @@ -0,0 +1,31 @@ +00:01 So let's review the main concepts around using the shell. +00:03 Remember you just typed mongo enter +00:05 and it will connect your local default, everything default port, +00:08 default local host, no account etc, +00:11 and once we're connected, we'll be in here, +00:15 and it'll say connected to the server, +00:17 what version of the shell, what version of the server, +00:19 3.4.4 is the latest at the time of this recording, +00:22 but maybe not at the time you are watching it, +00:24 like all things that are server's, newer is better. +00:26 Ok, so first thing that we might want to do is say what databases are here, +00:29 and we do that with the show dbs command, we hit enter, +00:32 and it shows you the various databases that are listed. +00:35 Then next we want to activate one, +00:37 so that we can issue commands to it through the db.collection +00:40 or other high level operations, +00:42 so we'd say in this case let's work with talk_python, so we'd use talk_python. 
+00:46 And it'll say great, we switched to database talk_python, +00:50 and if you're wondering, you can always try, as you saw me do, +00:53 db enter and it will say talk_python, cool, and then, +00:55 we could say well what collections exist in talk_python? +00:57 This is actually pretty straightforward, +00:59 the document design I think is pretty interesting +01:01 but there's not many collections, +01:04 so we have episodes, guest, reviews and then while developing it, +01:08 I turned on profiling to see where it was slow and where it was fast, +01:11 where I need indexes, we'll talk more about that near the end of the course. +01:14 So we have these four collections, and now if we want to find an episode +01:18 we'd say db.episodes.find and give it some search, +01:23 or sort, or something to that effect. +01:25 So this is how we get started and connect with the shell. \ No newline at end of file diff --git a/transcripts/ch4-mongo-shell/4.txt b/transcripts/ch4-mongo-shell/4.txt new file mode 100644 index 0000000..bb8c8de --- /dev/null +++ b/transcripts/ch4-mongo-shell/4.txt @@ -0,0 +1,182 @@ +00:01 Now let's see how we do probably the main thing +00:04 that you do in databases and that is query. +00:06 So here we are in the Mongo shell still, +00:09 and I'm using the bookstore database, +00:11 so what I want to do is find some particular books; +00:14 remember, we have book, publisher, test +00:18 we can really remove test, it's not actually doing anything, and then user, +00:21 so those three are the ones actually used. +00:23 Let's go and remove test just so that it is gone. +00:29 Now we have the ones we're actually using.
+00:31 Now, when we're getting started, it's probably worthwhile to just say db.Book.find +00:36 as an empty query just like kind of select star if you will, +00:39 you know show all of the things that are in there, +00:42 there, that's totally obvious what that is, right, +00:44 you see the structure, right if you can like kind of exist in the matrix +00:47 you could entirely see the structure there, but let's do that better. +00:50 Notice a certain number of items, I don't know it's 20 or 50 were returned +00:54 there's actually like a quarter million books, +00:57 so we didn't get them all which is good, +00:59 so if we want more, we just type "it" and it will actually get more and so on. +01:03 Okay, so this is not super helpful, let's make this more helpful; +01:06 so here we can go over and say I want this to be like that pretty +01:10 and in fact, if I just want one of them I could just say limit this to the first one, +01:14 or let's just say limited to two so we see a couple of examples. +01:17 There, now we're starting to see the structure. +01:21 +01:23 Let's go here, ok so now we've got a book, +01:26 right here you can see the top level document, +01:29 it doesn't put the results in arrays, +01:32 like it doesn't print out an array it just prints +01:34 a whole bunch of individual results in this case two, +01:36 so here we have our id, there's always an underscore id in the database +01:40 like this is the name of the primary key, +01:42 you can have it look different in Python, +01:44 you can say this thing maps actually to the primary key +01:47 when you are modeling this with classes and so on, +01:50 but down at the Javascript and the MongoDB level, +01:53 this is always the name of the primary key, +01:55 if you don't give it one when you insert the thing, it's auto generated, +01:58 and so if you don't have a great reason to care about what id looks like +02:02 probably using this object id is the best bet. 
+02:05 So our books have isbns, they have titles, they have authors, +02:08 I kind of wish it was little more pythonic with lower case ts and as, +02:12 but this database came from somewhere else and it's like this +02:15 so we're just going to roll with it. +02:17 Ok, so we've got dates notice, json doesn't support dates +02:20 nor does it support object ids, but the results here do +02:23 and so dates and object ids are sort of extensions that bson brings to json. +02:28 Alright, and then we have a list of these image url objects +02:32  which have both the size and url, and so on, +02:35 and then they also have ratings, this one has one rating, so not too many, +02:39 let's look at the next one— it has a lot of ratings, right, +02:42 so it has a user id that is foreign key constraint +02:45 a foreign key link soft not enforced by the database, +02:48 but a link over to the user table and then a value here; +02:51 so this is what this database looks like, +02:55 we have a title, we have an isbn,  and these are like the flat things, +03:00 and then we have most importantly we'll go play with the ratings a little bit, +03:03 so let's start by asking this question about the books. 
+03:06 So the way it works is db.Book.find put some space in here +03:11 so the way MongoDB queries it doesn't have a where clause +03:15 basically what you put in here is the where clause, +03:18 and the way we do is we pass what I think of as a prototypical json object +03:22 so the json object that we're going to put here, +03:25 maybe would have something like this, let's say title, case sensitive remember, +03:30 +03:34 is "From the Corner of His Eye", if I put this in, here we go, +03:39 so "From the Corner of His Eye", now this is a book +03:41 that should be in this database and we'll be able to do some queries for it +03:47 what this says to MongoDB is go to the book collection +03:49 and find every single document that has the title equal to +03:54 "From the Corner of His Eye", and I think that there's more than one, let's see— +03:57 yes, so we can come over here and we can do a .count, +04:00 there's three, alright, so this is nice, +04:03 however, what you saw come back there was even if I did a pretty, +04:07 still because we've got the ratings and the image urls +04:10 and this one has a crazy amount of ratings and so on, we might want to get less, +04:14 so with his find thing, this is like— let's put it here, +04:18 this part where is this title, that is the where clause +04:21 but in SQL, you could say like select title, id, isbn, from this table +04:28 so we can do that in MongoDB as well, we can do this like sub projection +04:31 so I can come down here and say I'm interested in title +04:34 and anything that's truthy in Javascript, so I could put high, +04:38 I could put one, I could put true, I like to put one, I don't know why +04:43 and let's say we want the isbn, this is case sensitive as well +04:47 and watch what comes back now — okay, so there's our three records +04:51 now interestingly, each one has three keys and we specified two. 
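The "three keys came back when we asked for two" observation above can be sketched like this. It is a plain-JavaScript model of an inclusion projection, not MongoDB itself; the field names Title and ISBN, the sample ISBN value and the helper project are assumptions made for illustration.

```javascript
// Shell form (field names Title and ISBN are assumptions about the schema):
//   db.Book.find({ Title: "From the Corner of His Eye" }, { Title: 1, ISBN: 1 })

// Plain-JavaScript model of an inclusion projection.
function project(doc, projection) {
  const result = {};
  // _id comes back unless the projection explicitly turns it off.
  if (!("_id" in projection) || projection._id) {
    result._id = doc._id;
  }
  for (const [field, flag] of Object.entries(projection)) {
    if (field !== "_id" && flag && field in doc) {
      result[field] = doc[field];
    }
  }
  return result;
}

// Made-up sample document; the ISBN value is invented.
const book = { _id: "abc", Title: "From the Corner of His Eye", ISBN: "0000000041", Ratings: [] };
const projected = project(book, { Title: 1, ISBN: 1 });
console.log(Object.keys(projected)); // [ '_id', 'Title', 'ISBN' ]
```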
+04:55 So the way it works is Mongo is like +04:57 you're probably going to need that primary key +04:59 so unless you explicitly say you don't want it, you're getting it right, +05:02 so if we want to do this again, and I could come over here +05:05 and I could explicitly suppress id and put something falsy here like zero +05:08 and then I just get isbn and title, okay. +05:11 +05:15 So let's go back to this. Now suppose I want to find the book with this title +05:19 and this isbn, how do I do an and here? +05:23 Well the way these queries work is everything, +05:26 basically every property of that little subdocument must be +05:29 a subset of the thing it matches for, +05:31 so when I say title is "From the Corner of His Eye", +05:33 that matches the title, but I could equally come up here +05:36 and do this again and say oh also that isbn, +05:39 +05:43 actually I don't know what it's supposed to be let me run this real quick, +05:46 let's say we're looking for this one, the one that ends in 41, +05:50 so now I could come over here and say that isbn, +05:55 so json or Javascript you don't technically need to put a name there +05:58 but this is a string, so it goes like that, right +06:01 see it starts with zero, it wouldn't just be a number. +06:03 So now, if I run this, I just get the one, +06:05 so this is the and clause, select star from book where title is this and isbn is that +06:11 so you can create these documents +06:14 to basically and together all the pieces that you need. 
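The "and" behavior of a query document described above can be sketched as follows. This is a plain-JavaScript model of equality matching, not MongoDB itself; the field names, the ISBN values and the helper matchesAll are assumptions made for illustration.

```javascript
// Shell form (field names and the ISBN value are assumptions):
//   db.Book.find({ Title: "From the Corner of His Eye", ISBN: "0000000041" })

// Plain-JavaScript model: a document matches when every field in the
// prototype document equals the same field in the stored document.
function matchesAll(doc, prototype) {
  return Object.entries(prototype).every(([field, value]) => doc[field] === value);
}

// Made-up sample data.
const books = [
  { Title: "From the Corner of His Eye", ISBN: "0000000041" },
  { Title: "From the Corner of His Eye", ISBN: "0000000042" },
  { Title: "From the Corner of His Eye", ISBN: "0000000043" },
  { Title: "Classical Mythology", ISBN: "0000000044" },
];

const byTitle = books.filter((b) => matchesAll(b, { Title: "From the Corner of His Eye" }));
const byTitleAndIsbn = books.filter((b) =>
  matchesAll(b, { Title: "From the Corner of His Eye", ISBN: "0000000041" })
);
console.log(byTitle.length, byTitleAndIsbn.length); // 3 1
```

Adding a field to the prototype can only narrow the result, which is why the second query returns a single book.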
+06:18 So this is all well and good, this looks a lot like a standard database, +06:22 standard relational database type of thing +06:25 but remember when I talked about documents, +06:27 I said their superpower is they get this nested thing +06:31 so let's go over here and just throw this back, +06:34 we'll just get one of them so we can look at it again, +06:38 their super power is that they can reach, let's get the next one +06:43 so per page you would use skip and limit, +06:46 so we can reach into like say the ratings and say +06:49 I'd like to find all of the books that have a rating of let's say eight +06:54 or all the books that have been rated let's do this, +06:58 I don't know how many books that person has rated +07:00 but we can find out in a second, so I want to find all the books +07:02 that have ratings where the user id was that particular id, right there, +07:06 so how do we do that— let's come up here again, we don't need this anymore, +07:10 so in here we kind of want to say something like this +07:14 like rating, and then if this was an object we would navigate it with .syntax +07:18 but it's not going to work out so well here, +07:20 so this would be user id like this, let me just paste this in +07:26 so I can get my little object id out, when you're quering by object id +07:29 and you just say object id, +07:32 the question is is that valid Javascript, and the answer is no, it is not. +07:37 So any time you have this sort of hierarchy thing traversal +07:41 you have to put quotes right, if it's a single item is optional +07:44 if you're doing something funky like an operator or something like this +07:47 then you're going to have to do like this. 
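Why a dotted key has to be quoted can be checked in plain JavaScript, without a database. The sketch below only asks the JavaScript parser whether each object literal is valid; the helper name parses is hypothetical.

```javascript
// Returns true when the given source text parses as a JavaScript expression.
function parses(source) {
  try {
    new Function("return " + source + ";");
    return true;
  } catch (err) {
    if (err instanceof SyntaxError) {
      return false;
    }
    throw err;
  }
}

console.log(parses("{ Ratings: 1 }")); // true: a single name needs no quotes
console.log(parses("{ Ratings.UserId: 1 }")); // false: the dot is not allowed in a bare key
console.log(parses('{ "Ratings.UserId": 1 }')); // true: quoted, the dotted path is just a string key
```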
+07:50 So let's just show, let's select back here +07:53 we're just going to say give me the title is one +07:55 and I don't even care about the id; +07:58 if I can write a query like this, go down into the ratings, +08:01 and show me all the ones that have this user voted, +08:04 that means even though I've kind of pre-joined and embedded this ratings concept, +08:08 I can still query it as if it was a separate table, separate collection +08:12 and that's the document databases superpower, +08:15 let's see if I can get it to work now; +08:17 apparently I did not get it to work what am I missing here? +08:21 Oh, notice I think I said rating and the actual schema is ratings plural, +08:26 I think that's good, it's representing a pluralized thing down there +08:30 so the problem was I did this, now notice MongoDB didn't crash, +08:35 it didn't go oh there's no such thing as a ratings field on this, +08:39 it just said no nothing matches that, +08:41 so it's really powerful, it means it's super easy +08:43 to sort of evolve and work with the data +08:46 and it doesn't break under the tiniest lightest of schema changes, pretty good, +08:50 but you just got to be careful, so let's try it again. +08:53 There we go, so apparently we could even ask +08:55 because that was not all of them, there's a lot of books this person has rated +08:58 so I think this data might be partly just generated +09:01 okay, so here these are the books that that person rated, +09:05 let's find another, let's try to do this again, +09:08 come down here I will get this object id, +09:11 +09:14 we can say I want to find the books rated by that person +09:18 how many are there— 107. 
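The rating versus ratings mix-up above can be sketched like this. It is a plain-JavaScript model of matching a dotted path against an array of embedded documents, not MongoDB itself; the sample data and the helper matchesEmbedded are assumptions made for illustration.

```javascript
// Shell form described in the transcript:
//   db.Book.find({ "Ratings.UserId": someObjectId }, { Title: 1, _id: 0 })

// Plain-JavaScript model of matching "Array.Field" against embedded documents.
function matchesEmbedded(doc, dottedPath, value) {
  const [arrayField, innerField] = dottedPath.split(".");
  const items = doc[arrayField];
  // A field that is absent simply never matches; nothing is thrown.
  if (!Array.isArray(items)) {
    return false;
  }
  return items.some((item) => item[innerField] === value);
}

// Made-up sample data.
const books = [
  { Title: "A", Ratings: [{ UserId: "u1", Value: 8 }, { UserId: "u2", Value: 3 }] },
  { Title: "B", Ratings: [{ UserId: "u2", Value: 10 }] },
  { Title: "C", Ratings: [] },
];

const typo = books.filter((b) => matchesEmbedded(b, "Rating.UserId", "u2"));
const correct = books.filter((b) => matchesEmbedded(b, "Ratings.UserId", "u2"));
console.log(typo.length, correct.map((b) => b.Title)); // 0 [ 'A', 'B' ]
```

The misspelled path returns an empty result instead of an error, so a typo in a field name is easy to miss.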
+09:21 And if I actually wanted to see what they are, there's the titles of the first set of them, +09:24 notice that's really, really fast, I think I have indexes set up right now +09:28 we'll talk about indexes when we get to the performance part of this course, +09:31 but we can do these queries down into the ratings embedded part +09:35 the embedded documents into the books +09:38 just as if they were their own table, +09:41 I told you there's about a quarter million books, there's 1.25 million ratings +09:45 so notice the response time here almost instant, in fact it's like milliseconds. +09:51 So not only can we do this query, we can do this query extraordinarily fast. +09:56 All right, so this is one of the things that makes document databases interesting +10:00 and also challenging, how do you define the documents, +10:03 should you embed them, should you not, +10:05 we'll get to that in a whole different chapter, +10:07 but for now, just know it does have this super power +10:09 to reach down in here and do these queries. \ No newline at end of file diff --git a/transcripts/ch4-mongo-shell/5.txt b/transcripts/ch4-mongo-shell/5.txt new file mode 100644 index 0000000..bb9fb27 --- /dev/null +++ b/transcripts/ch4-mongo-shell/5.txt @@ -0,0 +1,44 @@ +00:01 We've explored the shell a little bit, we've done some querying, +00:03 let's look at the concepts behind it, so you have them nice and concise, +00:06 in case you want to come back for a reference. 
+00:09 So if we want to query say the Book collection in the bookstore database +00:12 where the title is 'From the Corner of His Eye', +00:17 we can type find and give it this little prototypical json object, +00:20 hit enter, and boom everything comes back that has the same title, +00:25 different isbns, different primary keys and so on, +00:27 different releases, different versions, +00:29 maybe one is paperback, one is kindle, who knows; +00:31 so the idea is we're going to come up with these prototypical json objects, +00:35 here title: whatever the title is. +00:39 Now, if we want to do more than just what is the title here +00:42 we want to say give me the book with the title this and the isbn that, +00:47 given that the isbn is probably unique, +00:50 we could maybe just search for it instead, +00:52 but we want to demonstrate the and clause, right. +00:54 So here we'll give it this prototypical sub document +00:56 with the title being the title we're looking for, and the isbn being this one. +01:00 And notice, now we only get one record back, +01:03 so our prototypical document is basically an and clause, every field must match. +01:09 We also saw that one of the excellent ways to group related data, +01:14 this would be what you might call an aggregate in domain driven design, +01:18 is to embed items into the document, +01:22 so here we have ratings, and the ratings have little sub objects, +01:25 sub documents that have things like user ids and values +01:28 and as I said at the very beginning, and in the example you saw, +01:30 the superpower of these document databases is that they can query them, +01:34 so I want to find all the books that have been rated by this highlighted user id, +01:38 how do I do that?
 So we just pretend we're traversing the objects +01:41 Ratings.UserId, so down here we'll say find Ratings.UserId +01:46 and we give it the object id that we're looking for +01:49 because 'Ratings.UserId' is not a valid key or a field name in a Javascript object +01:54 we have to put it in quotes, but other than that, it's basically the same idea +01:58 and here we get back all the books that have been rated by this particular user. +02:02 So we just use this dotted notation to traverse the hierarchy +02:07 one other interesting point is maybe ratings just contained the number +02:11 like it was 7,5,... then if I want to say +02:17 find all the books that have a rating of seven +02:20 I could just say find ratings:7, +02:23 I don't have to do this dot notation or anything like that, +02:25 but because I'm looking within that document inside ratings, +02:27 regardless of whether it's an array or it's a single rating thing, +02:31 you do it like this, with that dot notation. \ No newline at end of file diff --git a/transcripts/ch4-mongo-shell/6.txt b/transcripts/ch4-mongo-shell/6.txt new file mode 100644 index 0000000..3dbc65e --- /dev/null +++ b/transcripts/ch4-mongo-shell/6.txt @@ -0,0 +1,89 @@ +00:01 The shell is pretty nice and it's ubiquitous +00:03 in that you can run it anywhere you ssh to and things like that, +00:07 so that's really good, and this is more or less the tools +00:09 that MongoDB ships, you could work on something else that's coming along +00:12 but there's a really great, better shell in my opinion +00:16 much, much better, I really love it, it's called Robomongo, +00:19 so we talked about Robomongo in the setup how we installed it and so on, +00:24 so let's see how it works and how it compares to the shell here.
+00:28 So here it is, you can see it hanging out down there +00:31 and we click start, maybe it's empty let's go ahead and start from scratch, +00:35 so now if we open it up it's empty, let's create a connection, +00:38 I'll just call this local or whatever, +00:42 and it's going to default the local host 27017, +00:45 all this stuff turned off, things like that, and we'll just say save and connect +00:50 and now you can see, let's put these little more side by side, +00:53 you can see over here we have our bookstore or charge watcher and so on. +00:58 And now we have the benefit that we can open this up +01:02 we can look at the book, we could say explore the indexes +01:04 we could even go over and say edit this index and make changes, +01:08 make it unique, do some other things about sparseness and so on. +01:12 We'll talk more about that later. +01:15 Over here, we could say something like use bookstore and it switches there, +01:19 the equivalent over here would be something like right click and say open shell, +01:23 how interesting, so I know a lot of people prefer the command line interface +01:27 but what's really awesome about Robomongo is +01:31 you have the entire cli right here, so I could say something like +01:35 db.Book, notice the auto completion, book, publisher, user, auth, etc, +01:42 .what do you want to find, find and modify, find one, +01:44 let's find one where, what did we have before, we had something with the title +01:48 and let me go back and find the title we were using— +01:55 so here we can say title like this, and now if I run it, I get a result down here +02:03 and I can explore it, I can see the ratings and so on, +02:06 and this, you know if we run this over here, +02:08 I get I did the little projection, I could do that as well. 
+02:12 So I get this text version and I actually don't really love this too much, +02:15 so you can actually just switch it to the text version here as well, +02:18 and you get color coding, highlighting, all sorts of stuff. +02:22 You also get this version which is kind of a flat version, I never use this +02:27 but you can use it if you want. +02:29 What is really cool is I can come over here and say +02:31 I want to maybe edit this document, +02:36 if I come over and do a find, I think— +02:41 here I get three, now if I do a straight find, not a find one, +02:44 I can actually go and edit this, so if I wanted to change +02:46 the date that this was done on, so let's say 2011, +02:52 save, rerun this, this is one with so many ratings, +02:59 here, this is the one I changed, number 2, now it's 2011. +03:05 So of course I could run an update command, +03:08 but you can do all sorts of interesting sort of UI things +03:10 so I really really like using Robomongo, +03:12 because it's one hundred percent as capable as the shell +03:15 so for example, I could come over here, this is like just typing Mongo +03:19 you could create variables, I could say var page, +03:25 let's do something with a paging here, so I come and say this +03:27 now notice, this uses get collection and it doesn't use the .Book like this, +03:32 I think it does that because it gets better intellisense or auto completion, +03:36 not really sure, anyway, you can do it either way, they are equivalent. +03:41 Now, let's go over here and imagine we're going to do some paging, +03:45 so first of all, let's just select the titles +03:47 remember the thing I did with the projection, exactly the same thing here, +03:51 there we go, I forgot to rerun it, okay. 
+03:53 So rerun it, now we get just the titles, +03:56 there's "Classical Mythology", "Clara Callan" and "Decision in Normandy" and so on, +04:00 so suppose we want to do paging, +04:02 I basically want to show you that this is like a full Javascript shell +04:05 plus kind of an editor, so watch this, +04:07 so if I put some semi colons in here, I can type let's say var page size is three +04:12 var page num, like what number are we on, let's say we're on page two, +04:17 then down here I could say ok, this is what I want to do, +04:20 and I could do skip and limit to actually do the paging, +04:24 so I could say skip and we'll do page num, minus one, times page size +04:30 that's how many we want to skip, +04:34 and then we want to limit it to page size like this +04:38 so now I should get, let's see, go back to the beginning, +04:41 three things per page, we're going to be on page two, +04:45 so it should be the Flu, the Mummies and the Kitchen God's Wife, and that's it. +04:52 Oh, by the way if you highlight something, it just runs that expression +04:56 which apparently evaluates to two, so run the whole thing, +05:00 notice Flu, Mummies, Kitchen, so we can do this basically +05:03 as much as we want to type up here, +05:06 but it's also a little editor, I mean just in almost every way +05:08 this is better than the shell and I could even use this +05:13 to connect to my remote MongoDB server, using ssh tunneling, +05:16 again, we'll talk about those kinds of things +05:18 when we get to the deployment section +05:20 but for pretty much the rest of the course, we're going to be using Robomongo +05:24 because it's just better in every way in my opinion. +05:28 All right, and as you saw Robomongo installs +05:30 on Windows, Linux and macOS, so it's all good.
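The paging arithmetic above, skip (page num minus one) times page size and then limit to page size, can be sketched in plain JavaScript. This models skip and limit over an in-memory list rather than a real cursor; the placeholder titles and the helper name page are assumptions made for illustration.

```javascript
// Shell form (the Title field name is an assumption):
//   db.Book.find({}, { Title: 1, _id: 0 }).skip((pageNum - 1) * pageSize).limit(pageSize)

// Plain-JavaScript model of skip + limit over an ordered list.
function page(items, pageNum, pageSize) {
  const skip = (pageNum - 1) * pageSize; // pages are 1-based
  return items.slice(skip, skip + pageSize);
}

// Placeholder titles standing in for query results.
const titles = ["T1", "T2", "T3", "T4", "T5", "T6", "T7"];
console.log(page(titles, 1, 3)); // [ 'T1', 'T2', 'T3' ]
console.log(page(titles, 2, 3)); // [ 'T4', 'T5', 'T6' ]
console.log(page(titles, 3, 3)); // [ 'T7' ]
```

Page two with a page size of three skips the first three items, which matches the walkthrough where the fourth, fifth and sixth titles come back.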
\ No newline at end of file diff --git a/transcripts/ch4-mongo-shell/7.txt b/transcripts/ch4-mongo-shell/7.txt new file mode 100644 index 0000000..c5f769b --- /dev/null +++ b/transcripts/ch4-mongo-shell/7.txt @@ -0,0 +1,69 @@ +00:01 Now, let's use Robomongo, our shiny new shell +00:04 that I contend is better than just the cli one, +00:07 let's use it to explore some more advanced query filtering and sorting options. +00:12 So here's just a blank find showing me all the records, +00:16 how many are there in the database, there's 271 thousand books, +00:20 so this is the same database we've been playing with for a while now. +00:23 So let's ask some questions about the ratings. +00:27 +00:31 So we're going to go into the ratings array +00:32 which contains a bunch of objects, which have values, +00:35 so I want to say how many of them have the value nine, +00:40 so what's that actually answering— what question is that answering +00:44 that is answering how many books have been rated +00:47 at some point by somebody with a nine, +00:50 how about with ten— a little bit more, +00:54 so there are some books that were really, really popular +00:56 people loved them, this is a 1 to 10 type of scale, +00:59 I think it might also include zero. +01:01 So that's great, this is our prototypical json object here. +01:04 However, what if I want to say show me all the books +01:08 that have a moderately high rating, what does that mean, +01:12 let's say it has an eight, a nine or a ten as a rating, +01:15 how do I express that as a prototype? 
 You can't do it, +01:19 and so that's why MongoDB has something slightly more complex +01:22 and nuanced than just straight comparison, right, +01:27 so this is like an equality query, so instead of putting a value here +01:29 we can put a little sub search document here +01:33 and into this, we can say I'd like to apply an operator instead of an exact match, +01:39 so the operator might be the greater than operator, +01:42 so the way you know it's an operator is the dollar sign, +01:46 and $gte, greater than or equal to, is going to be the thing +01:50 and then we're going to put the value of eight, +01:52 so show me the books that have a rating of eight or above, +01:55 tell me how many there are because we're doing a count, +01:57 so let's run that, look at that, 98 thousand books +02:00 have a rating of eight, nine or ten. +02:03 That doesn't mean their average rating is eight, nine or ten, +02:06 it means somebody somewhere has rated it eight, nine or ten. +02:09 So we also have things like greater than, +02:12 without the equal, just flat out greater than, so that's nine or ten right there, +02:16 so we have a number of these operators, +02:19 greater than, greater than or equal to, and so on.
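The "somebody somewhere rated it eight or above" semantics can be sketched as follows. This is a plain-JavaScript model of a single comparison operator applied to an array of embedded ratings, not MongoDB itself; the Ratings.Value path, the sample data and the helper anyRatingMatches are assumptions made for illustration.

```javascript
// Shell form (Ratings.Value is an assumed field path):
//   db.Book.find({ "Ratings.Value": { $gte: 8 } }).count()

// Plain-JavaScript model of a few comparison operators.
const comparisons = {
  $gt: (actual, bound) => actual > bound,
  $gte: (actual, bound) => actual >= bound,
  $lt: (actual, bound) => actual < bound,
  $lte: (actual, bound) => actual <= bound,
};

// True when at least one rating satisfies the single operator given.
function anyRatingMatches(book, operator, bound) {
  return book.Ratings.some((rating) => comparisons[operator](rating.Value, bound));
}

// Made-up sample data.
const books = [
  { Title: "A", Ratings: [{ Value: 2 }, { Value: 9 }] }, // low average, but one 9
  { Title: "B", Ratings: [{ Value: 8 }] },
  { Title: "C", Ratings: [{ Value: 7 }, { Value: 5 }] },
];

console.log(books.filter((b) => anyRatingMatches(b, "$gte", 8)).length); // 2
console.log(books.filter((b) => anyRatingMatches(b, "$gt", 8)).length); // 1
```

Book A matches even though its average is low, because one of its ratings is a nine.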
+02:22 Another one that's really interesting is $in, +02:24 this is super important for really powerful queries, +02:27 so when we have documents that contain sub arrays of other documents +02:33 you can think of those as basically being pre joined +02:36 but when you normalize those, so they are not contained within each other, +02:39 then you need a way to still go back and say +02:42 basically do the join, and this $in operator is the key to making that happen, +02:47 this is not really what's happening here, because this is a sub document, +02:50 but it's the operator that's involved, so what we can do is +02:53 say I would like to find the ratings +02:55 that have let's say prime numbers as ratings, +02:58 it's kind of silly, but whatever, +03:02 here we go, so those are the prime numbers between one and ten, +03:05 and we could say I would like to find all the ratings where the value, +03:08 one of the values right, remember they have multiple ratings +03:11 but one of the values is actually in this set, +03:14 so the way this usually manifests is like go to the database +03:17 and maybe I pull back some items, and it's got like a sub array of let's say ids +03:23 and then I can go back to the database +03:26 and say give me all the items in this other collection +03:28 where the id is in one of these sub ids, +03:31 so an example might be in the Talk Python Training stuff +03:34 that remember the course contains all the chapter ids +03:36 and I can go back with one single query +03:39 that will give me all the chapters for a course, it's this $in operator, so let's try that. +03:43 So there we go, apparently 69 thousand have a prime rating at some point +03:49 not that that means anything, but it shows you how these operators work.
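The two uses of the in operator described above can be sketched as filter documents in Python. This is an illustration under assumptions: the field path `Ratings.Value`, the `chapter_ids` field, and the `chapters` collection are hypothetical names, not taken from the course database.

```python
# Sketch only: field and collection names here are assumptions for illustration.

PRIMES_UP_TO_TEN = [2, 3, 5, 7]


def prime_rating_filter():
    """Match books where at least one rating value is in the given set."""
    return {"Ratings.Value": {"$in": PRIMES_UP_TO_TEN}}


def chapters_filter(course):
    """Second step of the manual join: a single filter that selects
    every chapter whose _id appears in the course's list of ids."""
    return {"_id": {"$in": list(course["chapter_ids"])}}


course = {"_id": "course-1", "chapter_ids": [101, 102, 103]}
chapter_query = chapters_filter(course)  # e.g. passed to db.chapters.find(...)
```

The point of the second helper is that one round trip fetches all related documents, instead of one query per id.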
\ No newline at end of file diff --git a/transcripts/ch4-mongo-shell/8.txt b/transcripts/ch4-mongo-shell/8.txt new file mode 100644 index 0000000..1793d45 --- /dev/null +++ b/transcripts/ch4-mongo-shell/8.txt @@ -0,0 +1,38 @@ +00:01 So here's the list of the querying operators, all the complex ones. +00:04 So we saw that normally we pass these prototypical json objects, +00:07 ratings.values is five, and that just does an exact match, +00:11 but we saw that that doesn't really solve all our problems, +00:14 often we want ranges or properties like +00:17 I want all of the ratings that are greater than eight, things like that; +00:20 so instead of putting a number into that prototypical json element, +00:23 we're going to put an operator, so we might say $eq for equality +00:27 that's kind of the same thing, but the others are not, +00:30 so $gt for greater than, greater than or equal to, +00:32 $lt for less than, less than or equal to, +00:34 not equal to, so you could say I want to see all the ones where +00:37 there's no vote, or no rating of value ten, right, +00:42 there's no rating that has a value of ten. +00:44 And we talked about the $in operator, this is kind of your two step join process +00:48 that we'll talk much more about when we get to the Python side of things, +00:51 there's also the inverse of that, the negation, not in the set. +00:54 So here's an example of how we might use the greater than or equal to operator +00:58 to find all the books that have a rating of nine or ten +01:01 that are super highly rated by at least one person, +01:04 remember this is not like every single one in there has to be this, +01:07 but there exists a rating which is a nine or a ten.
+01:10 We also have some joining operators, or combining operators, +01:14 so we can go in and say and pretty easily by just having +01:18 all the properties we're looking for in a single document, +01:21 but if for some reason these are coming from multiple places +01:23 you can actually combine them with the $and operator +01:26 so that's nice, but what you really sometimes need is the or clause, +01:30 I want this or that, and there's no way to do that with those prototypical json objects +01:35 but the $or operator will let you do this. +01:38 You also have $not and $nor, so neither of these in an or, +01:41 sort of the negation of an or; +01:43 now I recommend you check out this link at the bottom for each one of them, +01:46 like where the operator appears, does it appear to the right hand side +01:50 of the property or field name or the left hand side, +01:53 it kind of depends on the type of operator you're using, +01:56 so you can just click on $or, $and, and so on +01:58 in the docs and it'll give you a little example. \ No newline at end of file diff --git a/transcripts/ch4-mongo-shell/9.txt b/transcripts/ch4-mongo-shell/9.txt new file mode 100644 index 0000000..d55c964 --- /dev/null +++ b/transcripts/ch4-mongo-shell/9.txt @@ -0,0 +1,25 @@ +00:01 Now sometimes you don't want all the data back, +00:03 usually it doesn't really matter to you if it comes back or it doesn't come back, +00:05 in the shell you're printing it out, it probably matters, +00:08 but in practice, in your app, you rarely care +00:10 from a display perspective or an interaction perspective, +00:14 whether some field or list that you are not using has data or not +00:19 but from a performance perspective, you very much may care.
+00:22 Suppose that you have a document that's 50k in size +00:25 and all you want back is the isbn and the title and those are 1k, +00:30 and you're getting a bunch of them back, +00:32 it turns out that that can make a really big difference in terms of performance. +00:35 So whether it's for display purposes or it's for network performance purposes +00:39 using this second argument here we can say +00:44 only return the isbn and the title, and don't give me all of the ratings, +00:48 don't give me the images, everything else that might be in this book. +00:51 So we run this, and we get back these objects here, these documents, +00:56 and notice, we have the isbn and the title, like we asked for +00:59 but we also have the _id, +01:01 so unless you explicitly forbid the id from coming back +01:04 the id always comes, and everything else defaults to not appearing +01:07 unless you indicate it, once you pass some document here +01:10 for the projection or the restriction of things that come back. +01:14 If for some reason you don't want the id to come back, +01:16 just say _id: 0 or false or something like this, +01:19 and then it will just have isbn and title exactly. \ No newline at end of file diff --git a/transcripts/ch5-connecting-with-python/1.txt b/transcripts/ch5-connecting-with-python/1.txt new file mode 100644 index 0000000..5f50ccc --- /dev/null +++ b/transcripts/ch5-connecting-with-python/1.txt @@ -0,0 +1,57 @@ +00:01 All right, the moment you've probably been waiting for is finally here, +00:04 we're going to start moving away from Javascript +00:06 and doing Python for the rest of this course to talk to MongoDB. +00:09 That doesn't mean we might not use the Javascript API in the shell, +00:12 just a little bit more, but for the most part +00:14 we're going to focus now on writing applications +00:17 that talk to and work with MongoDB.
+00:19 So we're going to look at in MongoDB's nomenclature +00:23 something called a driver, so a driver is the underlying library or framework +00:27 that you use to talk between your application and MongoDB. +00:30 So here we've got our web app +00:33 and it's going to be using the database MongoDB here. +00:35 A request is going to come in, into our web app +00:38 and it's going to use a particular package, right, +00:40 this is not built into Python, this is something we have to go out and get. +00:43 So the package that we're going to work with +00:45 is built and maintained by MongoDB themselves, and is called PyMongo. +00:50 So this is the core, lowest level access to the database server +00:54 and it does a ton of things for us, +00:56 in fact if you look at many of the odms, the object document mappers, +01:00 the NoSQL equivalent of the orm, they build upon PyMongo, right +01:04 so PyMongo is almost always involved +01:06 when you're talking to MongoDB from Python. +01:09 And it does many things for us, it connects to the database +01:12 whether it's local, remote, over ssl, with authentication, with certificates, +01:16 all that kind of stuff, it actually manages replica sets +01:20 so it knows how to find all the different servers participating in a replica set +01:24 and do the fail over if one fails, +01:27 it knows how to go over to the other one, things like that; +01:30 it also knows how to deal with sharding, +01:32 so maybe you have a cluster of ten MongoDB servers +01:36 that are all managing part of the data +01:38 and then participate as a group in the queries, PyMongo does that for us, +01:41 this is generally where you do the crud operations, +01:44 the find, insert, update, delete, and those kinds of things; +01:47 you do the other admin stuff as well, +01:49 like drop collections or create indexes and so on, +01:53 and it even does connection pooling, +01:55 so really this does all the stuff that you need to talk to MongoDB +01:58 and the api is
very, very, very similar to what we saw with the Javascript API +02:03 which is why I didn't skim over it, I wanted to say, okay, +02:06 you really learned the Javascript api, +02:08 now you basically also know the PyMongo api, +02:11 findOne with a capital O, no spaces, +02:13 is now find_one, with a lower case o, for example, +02:17 there's a few variations for like say pythonic naming +02:20 but other than that, PyMongo is going to sound +02:22 and feel very, very familiar to you at this point. +02:26 Like many things from MongoDB, PyMongo is open source +02:29 so you can come over here to github.com/mongodb/mongo-python-driver, +02:35 and that is PyMongo. +02:37 So you'll see that you can go look around, +02:40 you can see it's under active development and things like that, +02:43 a lot of stars, so this is like I said, the official driver +02:45 but you also have access to the source, right here. +02:48 So now that we know about PyMongo, +02:50 I hope you're ready to go write some code. \ No newline at end of file diff --git a/transcripts/ch5-connecting-with-python/2.txt b/transcripts/ch5-connecting-with-python/2.txt new file mode 100644 index 0000000..f024665 --- /dev/null +++ b/transcripts/ch5-connecting-with-python/2.txt @@ -0,0 +1,128 @@ +00:01 So finally we're here in our github repository for our demos, +00:04 we have something to share, so I have the source folder here +00:07 and let's start with this play around PyMongo. +00:09 Now, throughout this course, we are going to build what I think +00:12 the pretty comprehensive demo that we're going to work on it for a few hours, +00:15 it's going to have tons of data, and we're going to consider +00:18 both the design and the performance of the database. +00:20 But for PyMongo, let's just sort of fool around a little bit here +00:23 and then when we get to MongoEngine, we will take on our proper demo there. 
+00:27 So we'll begin by opening this in PyCharm, +00:30 do that little drag and drop trick in MacOS, +00:34 but on Windows and Linux you've got to say open folder. +00:39 All right, everything is loaded up, +00:42 and I have created a virtual environment in here +00:45 a Python 3.6 virtual environment, you can run wherever, +00:48 but that's the one I'm using; +00:50 now, let's start by adding a file here, so we'll just call this program, +00:53 we won't do too much structuring and refactoring +00:56 and organizing for this particular demo, we will of course for our proper demo. +01:02 So, before we can do anything, we just want to type import pymongo, +01:06 this is not going to turn out well for us, we'll go over here and try to run this, +01:10 nope, there's no module named pymongo, so let's go fix that. +01:14 If we open up the terminal in PyCharm, +01:17 it's going to automatically find that virtual environment and activate it for us, +01:20 okay, you can see the prompt says .env, +01:23 that means that we have our virtual environment active, +01:27 so let's see what is here, not so much, just to be safe +01:32 let's go ahead and upgrade setuptools +01:39 why are we doing that, because PyMongo actually uses C extensions +01:43 and depending on your system, sometimes setuptools +01:46 has a little better chance of compiling those, if you have the latest version. +01:49 It doesn't always work that way, and it has a way to fall back to just pure Python +01:54 but the C extensions do make it faster, so that's worth checking out. +01:58 Alright, so we can pip install pymongo, now things are looking good, +02:05 let's try the program again, exit code zero, that means happy, zero is happy.
+02:10 Alright, so we are able to create, or basically import the library, +02:14 now the thing we've got to do is we could just go and create what's called a client +02:17 and use all the default settings, but in a real app +02:20 you're probably not going to talk to an unauthenticated local database server, +02:25 you're probably talking to one on another machine, +02:27 maybe there's security, maybe there's ssl, whatever. +02:30 So let's go ahead and set up the connection string +02:32 even if you have like sharding or replication, +02:34 all these things require a connection string. +02:35 So let's go over here and create a connection string +02:37 and we'll just put the default values, +02:39 so they always start with the scheme mongodb:// like so, +02:43 and then localhost, and then 27017, +02:47 so this is sort of the default, localhost and the default port, +02:52 it's running locally and the scheme is always here. +02:55 We'll talk about how you can add things like authentication and ssl and what not there. +03:00 So the next thing we need to do is create what's called a mongo client. +03:03 You can work with connections directly from PyMongo, but you shouldn't, +03:08 why, because PyMongo manages connection pooling for you and reconnects +03:13 and all these different things, so if you work with a client +03:16 it goes through the connection pooling and that kind of stuff, +03:19 if you work with the connection directly, you're kind of locking yourself +03:21 into that single connection which is not the best. +03:24 So we're going to create a pymongo.MongoClient, like this +03:28 I want to give it the connection string like so; +03:32 now, the way this works, this is basically the equivalent of opening up the shell +03:36 the way it worked in Javascript was, we said use a database, +03:40 in Python it's a little bit different, in Python we say +03:44 the database is client.
make up a database name, +03:49 literally I could put TheFunBookStore here +03:53 and now this would actually start working with the database called exactly that, +03:57 we do case sensitivity in MongoDB. +04:00 so let's just call this the_small_bookstore, +04:04 okay because we're just going to poke around at it +04:06 we're not going to work with that big set of data that we had before yet +04:08 and we're also not going to work with our main demo. +04:10 So let's call it the_small_bookstore. +04:13 Now let's go over here and say insert some data +04:17 it's not fun to have a database with no data, right, +04:22 in fact, let's just really quickly have a glance over here +04:27 if I connect, notice there is no the_small_bookstore, +04:30 refresh, no, no small bookstore, okay, so this act here almost creates it, +04:36 when you do a modifying statement against this thing you'll see that it does. +04:40 So let's go over here to books, let's make it a little more explicit, +04:43 I'll say db. so it looks like the Javascript api. +04:46 So db.books is what we are going to call it, +04:50 we'll say insert and what you want to insert, let's say title, +04:54 now this is not Javascript, this is not json, +04:56 this is Python dictionaries so you've got to make sure you have the quotes +04:59 but otherwise it's really really simple. 
+05:02 The first book, and let's say it has an isbn, +05:06 let's just put some numbers in there like that +05:10 and let's do another one, we'll say the second book +05:14 it's going to have an entirely different isbn +05:18 and while we're at it, let's say go over here and print out the results +05:22 and let's do it again, we'll grab the value and let's print out +05:32 r.inserted_id, so here let's take a look at the whole thing +05:36 and we'll even print out the type of r, +05:38 and then the thing that we are usually interested in here is +05:42 when you're doing an insert, remember the _id thing was generated +05:47 well what was it, what if you want to actually say I inserted it +05:50 and here's the id of the thing I created for you, somewhere in your app +05:54 alright, so if we capture the response we can check out the inserted_id +05:59 ok so let's go and run this real quick. +06:02 Oh whoops, no this is actually just the id, sorry, +06:06 if you do a bulk insert, I believe you get this +06:09 or you could do, we can come over here and say insert one +06:14 be a little more focused, now if we insert one we'll have our inserted id, +06:19 let's make this third and the fourth book and make a little change here, +06:25 there we go, one more time, perfect okay, +06:29 so if you do an insert one we get an InsertOneResult +06:32 which is pymongo.results.InsertOneResult, and here you can see the inserted id +06:37 so we've inserted some stuff, let's go look back at our database here +06:40 we should have now, if we refresh it we now have the_small_bookstore, +06:45 if we go to the collections we have our books +06:47 and we look in the books, that should not be super surprising right, +06:50 those are the things we just inserted, +06:53 okay so now, let's go over here and do a little test +06:57 we'll say if db.books.count is zero, we'll print inserting data +07:06 and like this, we'll say else print books already inserted skipping +07:15 and maybe even
spell that right huh? +07:19 Now we run it, nope, there's already books in here +07:23 we're not going to insert duplicate books, so that's all well and good, +07:27 so we've gone over here and we've connected to the database, +07:31 we've created a client using the connection string +07:34 and trust me this can get way more complicated +07:37 to handle all the various complications and features of MongoDB, +07:42 and once we have a client we say the database name +07:43 here I've aliased it to db so it looks like the Javascript api +07:47 or the shell api you're used to working with, and then we work with the collection +07:51 and we issue commands like find and count and insert, insert one and so on. +07:56 So now we have some data, let's go maybe do a query against it, +07:59 maybe make some in place updates things like that. \ No newline at end of file diff --git a/transcripts/ch5-connecting-with-python/3.txt b/transcripts/ch5-connecting-with-python/3.txt new file mode 100644 index 0000000..acd9bd2 --- /dev/null +++ b/transcripts/ch5-connecting-with-python/3.txt @@ -0,0 +1,66 @@ +00:01 Let's look at how we can do some basic crud operations +00:03 and connect to MongoDb with Python via PyMongo. +00:06 So if we're going to use PyMongo, let's start by importing PyMongo, +00:10 and I'm going to not import the items or the classes out of this +00:14 but actually just the module and use the name space style +00:17 to make it really clear where this stuff comes from. +00:19 Actually I like to do this in a lot of my programs, even in production. 
+00:23 So we import PyMongo, and then we have to create a connection string +00:26 and feed it off to the pymongo.MongoClient, right +00:30 so this is a concrete class in PyMongo, +00:33 and we can give it any sort of connection string, +00:36 in fact if you give it no connection string, I think it'll use +00:38 what I have written here basically, no auth, no ssl, +00:41 local host 27017 which is the default standalone MongoDB port. +00:45 Alright, so this is cool, we've got our client here, +00:47 and now then it gets a little bit trippy, +00:49 a little bit dynamic here, which is kind of fun. +00:51 So the next thing we're going to do, +00:53 is we are going to go to the client, we're going to say . some database name, +00:56 not table name, database name. +00:58 Now, this thing doesn't even have to exist at this point +01:01 this, as you saw on the demo, is actually how we created +01:04 this database called the_small_bookstore, +01:06 we just said db = client.the_small_bookstore +01:09 and by basically saying that it exists, or implying that it exists +01:12 it's going to since we do some kind of write, or modifying operation to it. +01:16 Ok, so just be aware that this is case sensitive, right, +01:19 so capital T capital S capital B, would not be +01:22 the same database as lower case t s b. +01:25 Right, so let's go, and now we're going to actually do +01:27 a lot of things that look extremely similar to what we saw in the Javascript shell, +01:31 that's why I spent so much time in that section +01:33 it's because the apis are so, so similar at this level. +01:36 So now we can just operate on the database via collection +01:40 so just like we said client.database name, +01:42 we're going to say db . collection name +01:44 and those collections also don't necessarily have to exist, +01:47 even for queries, if they don't exist, you just get nothing back that's not an error. 
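The connect-then-name-things flow above can be sketched as two small functions. This is a minimal illustration, not the course's program: the `connect` and `get_books_collection` helper names are made up here, and `connect` assumes the pymongo package is installed.

```python
# Sketch only: helper names are hypothetical; the database and collection
# names follow the demo (the_small_bookstore, books).

def connect(conn_str="mongodb://localhost:27017"):
    """Create a client from a connection string (requires `pip install pymongo`)."""
    import pymongo  # imported here so the sketch loads even without the package
    return pymongo.MongoClient(conn_str)


def get_books_collection(client, db_name="the_small_bookstore"):
    """Return the books collection. Neither the database nor the collection
    has to exist yet, and both names are case sensitive."""
    db = client[db_name]   # same idea as client.the_small_bookstore
    return db["books"]     # same idea as db.books
```

Typical use would be `books = get_books_collection(connect())`. The bracket form is handy when the database name comes from configuration rather than being typed as an attribute.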
+01:51 So for example, we can do a query against the books collection +01:55 and ask how many there are, so db.books.count +01:58 and that'll tell us how many books there are +02:00 and like I said, even if the database doesn't exist, +02:03 if the collection doesn't exist or both, it's still going to work, +02:05 it will just return zero, because guess what, +02:07 there are no books in the nonexistent database. +02:10 We could do a find_one and this will pull back just one item +02:14 by whatever the default sort the MongoDB happens to be using +02:17 and we can say find_one and give it +02:21 one of these prototypical not json but Python dictionary type of objects. +02:25 Now this find one is the first place where we're seeing the Python api +02:28 ever so slightly vary from the Javascript api; +02:31 in Javascript it's findOne, and in Python it's find_one +02:38 and they've adapted the api to be pythonic, right, +02:41 it would look weird to say findOne, +02:44 but just be aware that they're not identical, you kind of have to keep in mind +02:47 which language you're working in, but other than that, +02:49 what you feed to it and how they work it's more or less the same. +02:52 If we want to insert something we say db.books.insert_one +02:57 and then we give it the document to insert +02:59 and we get a result and we saw that the result actually comes back +03:03 and has an inserted _id and the inserted _id is the generated id of the thing +03:09 that was autogenerated in the database, notice we didn't pass _id, +03:14 but if we care we can get it back for whatever purpose. +03:17 When working at higher levels with like MongoEngine, +03:19 this will automatically just happen on the class +03:21 and get set we won't have to worry about it. 
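Putting the count, insert, and inserted id pieces together, a guarded seeding step might look like the sketch below. Two things are assumptions rather than the recording's code: the ISBN strings are placeholders, and `count_documents({})` is used because the plain `count()` shown in the video was removed in PyMongo 4.

```python
# Sketch only: placeholder ISBNs; `books` is any PyMongo collection object.

def seed_books(books):
    """Insert two sample books unless the collection already has data.
    Returns the generated _id values (an empty list if seeding was skipped)."""
    if books.count_documents({}) > 0:
        print("Books already inserted, skipping")
        return []
    print("Inserting data")
    ids = []
    for doc in (
        {"title": "The first book", "isbn": "1111111"},
        {"title": "The second book", "isbn": "2222222"},
    ):
        result = books.insert_one(doc)   # returns an InsertOneResult
        ids.append(result.inserted_id)   # the _id generated for this document
    return ids
```

Running it twice against the same collection inserts on the first call and skips on the second, which is the duplicate guard from the demo.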
\ No newline at end of file diff --git a/transcripts/ch5-connecting-with-python/4.txt b/transcripts/ch5-connecting-with-python/4.txt new file mode 100644 index 0000000..66bdbca --- /dev/null +++ b/transcripts/ch5-connecting-with-python/4.txt @@ -0,0 +1,60 @@ +00:01 So in our example we saw that we pass a connection string +00:03 to the Mongo client and it was super simple, +00:05 it was just the MongoDB scheme and local host and the default port, +00:09 like I said, we could even omit the connection string, +00:12 I believe it would still be totally picking all the defaults. +00:15 So let's look at some non default options. +00:18 So here, if I want to connect to a remote server +00:21 and I've either put some kind of dns records somewhere +00:25 or I've just hacked my local hosts file to say +00:27 there's a thing called mongo_server which is maybe within +00:31 a virtual private network or at least in the same data center zone, +00:35 if I'm doing cloud hosting like a Digital Ocean or something like this, +00:39 and if I want to connect to it on the default port, which is still 27017, +00:44 I could just say mongodb://mongo_server, and then we could connect that way. +00:48 Well, maybe you want to connect on an alternate port, +00:52 so port 2000, instead of 27017, this is probably a good idea, +00:56 there's a lot of people scanning the internet for open MongoDB ports, +01:01 27017, 27018, up to 27020 I believe, +01:06 it's probably the range that they're looking at, +01:08 because different services run on different ports, +01:11 like replication versus sharding versus whatever.
+01:13 So you probably don't want to run on that port, +01:16 and when we get the deployment section, +01:18 we'll look at all the steps we need to take in order to make our server safe, +01:21 so be sure you do not put MongoDB in production +01:25 until you watch that chapter at the end of the course, +01:27 but let's just assume that one of the things we might want to do is +01:30 run on a non default port, we just obviously like any web address type thing, +01:34 we just say mongodb://mongo_server:2000 +01:38 okay great, so now we have a separate server on a non default port +01:42 we probably want to have authentication +01:44 so if we had a user name and password +01:47 again we'll talk about this in the deployment section at the end +01:49 we would have jeff:supersecure, so user name jeff +01:53 ultra secure password is supersecure, and then we can have everything else. +01:57 And if we wanted to talk to a replica set, so this is a set of cooperating +02:02 duplicated fail over MongoDB servers that can be working together +02:07 so in case one of them goes down, +02:10 or you have to take one offline for some reason, +02:12 it will just switch over and a different server will become the primary +02:16 and start to store the data. +02:18 This doesn't lead to eventual consistency and things like that, +02:20 there still is one primary place things go to, +02:22 but depending on how the state of the cluster is, +02:25 it could be any one of these replicas, and the replica sets. 
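The host, port, and credential variations described above all follow one pattern, which can be sketched with a small string builder. `build_mongo_uri` is a hypothetical helper, not part of PyMongo; the percent-escaping via `urllib.parse.quote_plus` is added on the assumption that real passwords may contain characters such as `:` or `@`.

```python
from urllib.parse import quote_plus


def build_mongo_uri(host, port=None, username=None, password=None):
    """Assemble a mongodb:// connection string (hypothetical helper).

    Leaving out the port means the server default, 27017, is used.
    """
    credentials = ""
    if username is not None and password is not None:
        # Escape reserved characters so they cannot break the URI structure.
        credentials = f"{quote_plus(username)}:{quote_plus(password)}@"
    address = host if port is None else f"{host}:{port}"
    return f"mongodb://{credentials}{address}"


default_port = build_mongo_uri("mongo_server")
alternate_port = build_mongo_uri("mongo_server", 2000)
with_auth = build_mongo_uri("mongo_server", 2000, "jeff", "supersecure")
```

Whatever string comes out is used the same way in every case: it is handed to the client constructor.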
+02:28 So here we would say server one port one, server two port two, +02:31 server three port three— well, the first two are actually +02:34 both running on the same machine, so in case the process dies +02:37 but we also have a separate server, Mongo server two +02:39 that is running on a different port as well, +02:41 in fact, this might not be all of the replica sets, +02:44 all the servers in the replica set, this might just be sufficiently many, +02:48 so that once it connects it finds all the others, +02:50 and then it will start participating in all of them. +02:52 And we also need to say replicaSet=prod +02:55 or whatever we're calling a replica set. +02:57 So we have all these options in terms of connection strings +02:59 and then once you have this, well you pretty much use it the same way, +03:02 you create a client by passing the connection string off to it +03:05 and it figures out all the details for you. \ No newline at end of file diff --git a/transcripts/ch5-connecting-with-python/5.txt b/transcripts/ch5-connecting-with-python/5.txt new file mode 100644 index 0000000..62945d7 --- /dev/null +++ b/transcripts/ch5-connecting-with-python/5.txt @@ -0,0 +1,88 @@ +00:00 So back to our example, we've inserted some data +00:03 and we have this little guard here to say +00:05 don't insert duplicate data, things like that, +00:07 so let's make some changes to our book here. +00:10 Let's first of all change the title of the third and fourth book, +00:14 let's just change this mess with this book for example, +00:16 let's change this to like this, third book like so, all right; +00:20 so we have two ways to do this, one way would be to +00:23 pull back the entire document, work on it and push it back, +00:27 and this is what I think of as the orm style of working. +00:30 So we'll say book = db.books.find_one, let's do find_one here +00:36 and we're just going to give it the isbn that we have there. 
+00:39 +00:44 Let's just do a quick little print out of the book +00:46 and just so you understand what we're working with +00:48 we'll also print out the type, so if we run this, +00:50 we obviously get the book back, super, and you can see it is a dictionary, cool; +00:55 so, I said I want to change the name here, +00:57 let's actually change something slightly different, +01:00 so we can work with some more advanced features. +01:02 What I want to do is I want to add the ability to have a user like favorite this book +01:06 and this might not be a good way to do it, +01:08 I haven't really thought it through because it's just a toy example, +01:11 but let's suppose we want to have the book store the ids +01:15 of the people who have favorited it, in practice maybe it's better +01:17 to have the user accounts store the ids of the books +01:20 that they individually favorited, but the mechanics would be identical. +01:23 So how are we going to do that? Well, to this book, I'm going to add +01:27 something called favorited_by, and this is just going to be an empty list here. +01:36 Then any time we want to work with it, we can come over here and say +01:41 .append the user 42 did this, and then we can say +01:47 db.books.update and give it a little query here so we would say _id, +01:52 and once we're in Python that's got to be in quotes, +01:56 then book.get('_id') is going to be the value there +02:02 and then what we're going to put back is just this book, +02:05 and let's just one more time after this get it back and print out book, +02:08 this should make sure that everything went sort of round trip just fine. +02:12 Ready? All right, look, oh yeah look at that, we got a favorited_by right there, 42. +02:17 If we run it again, now we won't need to do this, +02:23 we can run it again with 100, now we have two people, +02:28 two user ids who have favorited this and so on.
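The fetch, modify, write-back round trip above can be sketched as a single function. One substitution is made on purpose: the recording calls `db.books.update`, which was removed in PyMongo 4, so this sketch assumes `replace_one` as the stand-in for writing the whole document back. The function name is hypothetical.

```python
# Sketch only: `books` is any PyMongo collection; the function name is made up.

def favorite_whole_document(books, isbn, user_id):
    """ORM-style update: pull the document, change it in Python, push it back."""
    book = books.find_one({"isbn": isbn})
    if book is None:
        return None  # nothing with that isbn
    # Create the list on first use, then record the user.
    book.setdefault("favorited_by", []).append(user_id)
    # Swap the stored document for our modified copy, matched by primary key.
    books.replace_one({"_id": book["_id"]}, book)
    # Read it back to confirm the round trip.
    return books.find_one({"_id": book["_id"]})
```

Note that nothing here stops the same user id from being appended twice; the whole-document style leaves that check to the application.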
+02:31 Okay, so this is all pretty well and good, but let's do something better, +02:35 sometimes it makes sense to go and pull a whole document back, +02:38 look at it, make changes to it and save it. +02:40 In fact, that's something you'll do quite often, +02:42 but in this case, we just kind of want to say add this little id here +02:47 to this list called favorited_by and maybe it doesn't even exist. +02:53 So let's do this, let's a copy this again and change this, +02:57 so now we're not going to use that, we'll use our isbn +03:00 and let's modify book four here, so this does not even have a favorited_by yet. +03:06 Let's put this in here, so we're going to modify that +03:09 and then let's actually also get it back and print it out at the end; +03:13 +03:25 there we go, so we're going to get the book back +03:27 but we're not going to pass the whole book +03:30 we're going to use one of these in place operators; +03:32 remember add to set, so what we're going to do is we're going to use add to set. +03:36 So in Javascript we could type this really in the shell we can type this $addToSet +03:41 but obviously, PyCharm is telling us not super good Python +03:44 so what we got to do is put that in quotes, +03:48 and then the value, we can have actually multiple stuff here, +03:51 so we're going to say favorited_by, and then the thing let's add user id 101, +03:58 now, this seems to be telling me I've got something a little bit off here, +04:02 yes, so we need that to be the entire update document; +04:06 ok, what we're going to do is we're going to say go find this document, +04:10 this book with this id, which is notice, it ends in 73, +04:14 this is going to be book four, actually let me comment this out really quick +04:18 and we'll just print out, 73 rather, print out notice there's not even a favorited by yet. 
+04:23 So what we're going to do is we want to go add this id here +04:28 so it should actually create this list +04:31 and then put 101 in it let's see if that's going to work. +04:34 Boom, favorited_by 101, and this time we did not pull it back +04:38 we used one of our cool operators. +04:41 Now, if this was just push, dollar push is another sort of equivalent, +04:45 this would have more and more and more 101s, +04:49 but add to set, I should be able to run this code over and over and over +04:55 and 101 is already there so it's not going in, +04:57 it's better if I say 120, now I run it, now we have those two right, +05:02 so this add to set is super nice, I don't even need to go to the database and go +05:05 well are they there, no they're not there, ok then I'm going to add them. +05:08 All right, so I don't even need to do that check, +05:10 I can just use this cool little add to set operator, very very nice. +05:14 So here's how we use the in place operators, +05:16 there's really not much difference other than we have to put more stuff in strings +05:21 because it's not the shell, it doesn't have like the special understanding +05:26 of what those mean and even over here, +05:28 it's not Javascript, it's Python dictionaries, +05:30 which those keys there need to be strings in this case. \ No newline at end of file diff --git a/transcripts/ch5-connecting-with-python/6.txt b/transcripts/ch5-connecting-with-python/6.txt new file mode 100644 index 0000000..fd08bd6 --- /dev/null +++ b/transcripts/ch5-connecting-with-python/6.txt @@ -0,0 +1,44 @@ +00:01 Let's review the ideas behind these in place updates. 
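As a rough sketch of the difference between the two operators, here is a pure-Python model of what $push and $addToSet do to a list field. It only imitates the behavior on a plain dictionary and does not talk to a server; the helper names and the sample book are hypothetical.

```python
# Pure-Python model of two MongoDB array update operators.
# It imitates their effect on a plain dict; nothing here connects to MongoDB.

def apply_push(doc, field, value):
    """Model of $push: always appends, creating the list if it is missing."""
    doc.setdefault(field, []).append(value)
    return doc


def apply_add_to_set(doc, field, value):
    """Model of $addToSet: appends only when the value is not already present."""
    values = doc.setdefault(field, [])
    if value not in values:
        values.append(value)
    return doc


# A book without a favorited_by list yet (sample data is made up).
book = {"title": "Book four", "isbn": "hypothetical-isbn-73"}
for _ in range(3):
    apply_add_to_set(book, "favorited_by", 101)  # repeated calls change nothing
apply_add_to_set(book, "favorited_by", 120)

pushed = {"title": "Book four"}
for _ in range(3):
    apply_push(pushed, "favorited_by", 101)  # every call adds another 101

print(book["favorited_by"])    # [101, 120]
print(pushed["favorited_by"])  # [101, 101, 101]
```

The point of the model is only that the "is it already there?" check lives inside the operator, so the application does not need a separate read first.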
+00:03 So here we have more or less a complete MongoDB Python program +00:06 using PyMongo here, so we're going to import PyMongo, +00:09 connect to the local database, all the default options, +00:11 and we're going to either create or get access to the bookstore +00:16 by saying client.bookstore, now we're going to insert an object +00:19 that has no favorited by element, right, no list, it just has a title and isbn, +00:24 so after the insert, we're going to end up with an _id and a title and an isbn. +00:28 And then maybe we want to add this idea of favorited by, +00:34 maybe you want to design this already that way +00:36 and have an empty list there, but whichever, it more or less would work the same, +00:41 so we can say I would like to go find the book, +00:43 the first part of our update statement is the where clause, +00:46 so find by primary key and remember, when we call insert_one +00:50 that's results.inserted_id, so that's going to find the one and only item +00:55 and then we're going to use the add to set operator +00:58 and we just pass that as a string in PyMongo, +01:00 and then we'll push on favorited by such and such. +01:03 We could also use $set to set, say, the title to the new book with updated title, +01:10 or something like this, right, so you can use this all over the place +01:14 and what's really cool, now you may be thinking oh this api is kind of crazy, +01:18 we've got these dollar operators +01:21 and it's a lot to learn if you're totally new to it, I realize +01:24 but when we get to MongoEngine, you'll see that +01:26 MongoEngine does this transparently under the covers for us, +01:29 so you can actually not have to do this, +01:31 you won't have to necessarily remember all of these +01:34 but you'll get all the benefits that we're describing here.
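The update statement described above can be sketched as two small dictionary builders. This is a minimal sketch, not the code from the slide: the function names are hypothetical, and the PyMongo calls appear only in comments because they need a running server.

```python
# Building the two halves of an in-place update as plain Python dictionaries.
# In Python, operator names like "$addToSet" and field names must be strings.

def by_id(inserted_id):
    """The 'where clause': match exactly one document by its primary key."""
    return {"_id": inserted_id}


def favorite_update(user_id, new_title=None):
    """The update document: add a user id to favorited_by, optionally set a title."""
    update = {"$addToSet": {"favorited_by": user_id}}
    if new_title is not None:
        update["$set"] = {"title": new_title}
    return update


print(by_id(1))
print(favorite_update(1001))
print(favorite_update(1002, new_title="New book title"))

# With PyMongo and a running server, these would be used roughly like this:
#   result = db.books.insert_one({"title": "...", "isbn": "..."})
#   db.books.update_one(by_id(result.inserted_id), favorite_update(1001))
```

Keeping the filter and the update as separate dictionaries mirrors the two arguments that `update_one` takes.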
+01:36 If you're using PyMongo, you have to know the api really intimately +01:38 so we're going to push this 1001 user id on to favorited by +01:42 and maybe we'll push 1002 as well +01:46 if people signed up at the same time, they saw the same book, they loved it +01:50 and let's go ahead and push this 1002 again, +01:52 well not the push operator, but the add to set operator, +01:54 do this again, because it's add to set +01:56 we're going to get a new document that has the new book title, +02:00 the same isbn and two items in its favorited by +02:03 and it's going to be 1001 and 1002, because add to set is idempotent, +02:07 calling it once or calling it a hundred thousand times, +02:10 it has the same result, other than it might take longer +02:13 to call it a hundred thousand times, right. +02:15 So if it's already there it makes no difference +02:16 but if it's not there, it pushes it in, super cool operator, +02:19 really taking advantage of the hierarchical nature of these documents. \ No newline at end of file diff --git a/transcripts/ch5-connecting-with-python/7.txt b/transcripts/ch5-connecting-with-python/7.txt new file mode 100644 index 0000000..aa52d89 --- /dev/null +++ b/transcripts/ch5-connecting-with-python/7.txt @@ -0,0 +1,40 @@ +00:01 Now, when you go to mongodb.com and you look through the documentation +00:04 so docs.mongodb.com, you will find stuff about updates and inserts, +00:08 and queries and aggregation and so on, and so on; +00:11 all of these are going to be in the JavaScript api, +00:14 notice at the bottom of this web page here, db.collection.insertOne +00:18 is new in version 3.2, so if you're trying to look up these operations +00:22 you will most likely find them in the JavaScript style, and the JavaScript api, +00:28 that's how MongoDB talks about it, you'll probably find them on Stack Overflow.
+00:31 So, because that's the way the shell works, MongoDB has kind of standardized +00:36 on here is how we're going to do our documentation, in JavaScript, +00:39 once again, yet another reason we spent so much time on the JavaScript api, +00:42 even though none of us are necessarily JavaScript developers. +00:46 So, here we have the CRUD operations, now we have the query +00:48 and projection operators and things like that, +00:52 so if you want to know how to map these over to PyMongo, +00:56 then there's one page really that you need for most things, +01:00 and that's the collection documentation. +01:03 So over here at api.mongodb.com/python/current/api/pymongo/collection.html +01:10 you can see right at the top, we've got all of the stuff you can do +01:13 on the collection itself, so for example, we were passing one and minus one +01:18 as the sorting operators in the shell, +01:20 here you could say pymongo.ASCENDING, pymongo.DESCENDING, +01:22 a little bit more explicit, but this is a really good place to go +01:25 because you'll find like the insert_one and the find_one and all the various ways +01:30 in which you need to adapt the documentation you find in JavaScript +01:34 over to the PyMongo api, this is probably the biggest bang for the buck right here. +01:38 Okay, so if you want to write an app, +01:41 PyMongo could totally be your data access layer, +01:44 it would completely solve the problem, it's really great, +01:47 it's what a lot of applications use to talk to MongoDB from Python.
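As one small example of adapting the shell style to PyMongo, here is a sketch that turns a shell-style sort document into the list of (field, direction) pairs that a PyMongo cursor's sort accepts. The helper name and the sample fields are hypothetical, and the fallback constants are only there so the snippet runs without PyMongo installed.

```python
# Translating a shell-style sort document into a PyMongo-style sort list.
try:
    from pymongo import ASCENDING, DESCENDING
except ImportError:
    # Same values PyMongo defines; fallback so the sketch runs without the package.
    ASCENDING, DESCENDING = 1, -1


def shell_sort_to_pymongo(shell_sort):
    """Convert {"field": 1 or -1} into [(field, ASCENDING or DESCENDING), ...]."""
    return [
        (field, DESCENDING if direction < 0 else ASCENDING)
        for field, direction in shell_sort.items()
    ]


# Shell: db.books.find().sort({published: -1, title: 1})
sort_spec = shell_sort_to_pymongo({"published": -1, "title": 1})
print(sort_spec)  # [('published', -1), ('title', 1)]

# With PyMongo and a running server this would be used roughly as:
#   db.books.find({}).sort(sort_spec)
```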
+01:50 We're going to talk about some additional things going forward +01:53 but one of the bigger decisions you need to make is +01:56 are you going to use an ODM that maps classes to MongoDB, +02:00 with additional features as we'll see in a lot of interesting ways, +02:03 or are you going to work down at the dictionary level, +02:06 it's very similar to say I'm going to work with say the DB-API and SQL strings, +02:10 versus SQLAlchemy or Django ORM or something like that, right. +02:15 So, you've kind of got the low level way to talk to MongoDB, +02:18 now, we're going to move on to talk about document design +02:21 and mapping higher level objects like classes with MongoEngine later in the course. \ No newline at end of file diff --git a/transcripts/ch6-modeling-data/1.txt b/transcripts/ch6-modeling-data/1.txt new file mode 100644 index 0000000..5f58e10 --- /dev/null +++ b/transcripts/ch6-modeling-data/1.txt @@ -0,0 +1,72 @@ +00:01 We've come to a pretty exciting part in the course, +00:03 we're going to talk about document design +00:05 and modeling with document databases. +00:08 So let's take a step back and think about relational databases.
+00:11 There are in fact a couple of really systematic, well known, +00:15 widely taught ways of modeling with relational databases; +00:20 there's still a bit of an art to it, but basically it comes down to +00:24 third normal form, first normal form, some of these well known ways +00:29 to take your data, break them apart, generate the relationships between them, +00:33 so if we're going to model like a bookstore with publishers +00:36 and users who buy books at the bookstore, +00:38 and they rate books at the bookstore, it might look like this: +00:41 we have a book, the book would have a publisher, +00:44 so there is a one to many relationship from publisher to books, +00:47 you can see the one and the star and the little relationship there, +00:50 and we have some flat properties like title and published +00:52 and publisher id for that relationship, and similarly, +00:55 we have a navigational relationship over to the ratings, +00:58 so a book is rated, so the ratings would be almost a normalization table +01:03 or many to many table there that has the book id and the user id +01:06 and then the value and we just happen to have an auto increment id there, +01:10 it's not necessarily the way we have to do it, +01:13 we could have a composite key, we've got our user +01:15 and the user can go navigate to the ratings, and things like that. +01:17 Now, of course, this is a very simplified model +01:20 in a real bookstore with real ecommerce happening and all that +01:23 and categories and pictures and all those things, +01:26 this would be way more complicated, +01:28 but the whole idea going forward is going to be pretty similar +01:30 and I think keeping it simple enough that you quickly understand the model +01:34 and don't get lost in the details, is the most important thing here. +01:37 So this more or less follows third normal form here, +01:40 in terms of how we're modeling this in the relational database.
+01:44 Could we move this to MongoDB, could we move this to a document database— +01:47 sure, we could have exactly the structure. +01:50 Now those relationships, those are not full on foreign key constraints, +01:52 those would be loosely enforced, not enforced in the database +01:56 but enforced in the app relationships between the what would be collections; +02:00 but certainly, we could do this, is it the best way though? +02:03 The answer is usually not, maybe, but probably not. +02:07 So what we're going to focus on now is how do we take our traditional knowledge +02:12 of modeling databases and relational databases +02:14 and how does that change, what are the trade-offs we have to deal with +02:18 when we get to a document database. +02:20 So the good news is, usually things get simpler in document databases +02:24 in terms of the relationships, you might have +02:27 what would have been four or five separate tables with relationships, +02:31 it might get consumed into a single item, +02:34 a single collection or single document really, +02:36 so here this is how we're going to model our bookstore +02:40 that we just looked at in third normal form, but now in a document database. +02:43 And really, the right choice here comes down to +02:46 how is your app using this data, what type of questions do you usually ask, +02:50 what's the performance implications, things like this. +02:53 So now we have a books, we have a publisher and a user +02:56 and these have similar top level items, +02:58 and we do have some traditional relationships. +03:01 So there's a one to many relationship between publisher and books +03:05 theoretically we can embed the book into the publisher +03:08 but there's many, many books for some publishers +03:10 and that would be really a bad idea; +03:12 so we have this traditional relationship, like you might have in a relational database. +03:15 Now again, not enforced by Mongo, but enforced by your app, so same basic idea. 
+03:19 Next up, we have the ratings, remember we have that +03:22 like many to many table from users to book ratings, +03:26 now that has actually moved and now we're storing these items +03:30 in an embedded array of objects inside the book table, or the book collection. +03:35 So now each book has a ratings array, it has the number of ratings, +03:39 those are just put right in there, so is this the right design? Maybe, +03:43 it's certainly a possible design, and it's the design that we're going to go with +03:47 for our examples, but we'll talk about when it's actually the right design. +03:51 And I'll help you make those trade-offs next. \ No newline at end of file diff --git a/transcripts/ch6-modeling-data/2.txt b/transcripts/ch6-modeling-data/2.txt new file mode 100644 index 0000000..6a3fde7 --- /dev/null +++ b/transcripts/ch6-modeling-data/2.txt @@ -0,0 +1,121 @@ +00:01 When it comes down to modeling with document databases +00:04 you apply a lot of the same thinking as you do with relational databases +00:07 about what the entity should be, and so on. +00:10 However, there's one fundamental question that you often ask +00:13 that really does take some thinking about, maybe working through +00:18 some of the guidelines, and that is to embed or not to embed related items. +00:23 So in our previous example, you saw that we had a book +00:26 and the book had ratings embedded within it, +00:28 but we could just as well have the ratings be a separate table +00:30 or the ratings could have even gone into the user object +00:33 with a reference back to the book, instead of the reverse. +00:36 So should we embed the ratings, and if we do, +00:40 does it go in books, does it go in users, or does it not go there at all.
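To make the embed or not to embed question concrete, here is a sketch of the same ratings data in both shapes, written as plain Python dictionaries. All field names and values are assumptions for illustration, not the course's actual schema, and the list filtering stands in for what would be a second query against a ratings collection.

```python
# Embedded style: the ratings live inside the book document itself.
embedded_book = {
    "_id": 1,
    "title": "Hypothetical Book",
    "publisher_id": 10,  # soft reference, enforced by the app rather than the database
    "ratings": [
        {"user_id": 42, "value": 5},
        {"user_id": 100, "value": 3},
    ],
}

# Referenced style: ratings are separate documents pointing back to the book.
referenced_book = {"_id": 1, "title": "Hypothetical Book", "publisher_id": 10}
ratings_collection = [
    {"book_id": 1, "user_id": 42, "value": 5},
    {"book_id": 1, "user_id": 100, "value": 3},
    {"book_id": 2, "user_id": 42, "value": 4},
]


def average_embedded(book):
    """One document is enough: the ratings came along with the book."""
    values = [r["value"] for r in book.get("ratings", [])]
    return sum(values) / len(values) if values else None


def average_referenced(book, all_ratings):
    """Stands in for a second query that filters ratings by book id."""
    values = [r["value"] for r in all_ratings if r["book_id"] == book["_id"]]
    return sum(values) / len(values) if values else None


print(average_embedded(embedded_book))                        # 4.0
print(average_referenced(referenced_book, ratings_collection))  # 4.0
```

Both shapes give the same answer; the difference is how many round trips it takes and how much data travels with each book.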
+00:43 So what I'm going to do, is I'm going to give you some guidelines, +00:46 these are soft rules, we don't have like a really prescriptive way of doing things +00:51 like third normal form here, but some of the thinking there does help; +00:54 so let's get into the rules. +00:56 First of all, the question you want to ask is is that embedded data +00:59 wanted eighty percent of the time that you get the original object; +01:02 do I usually want the rating information when I have the book? +01:08 If it would have resulted in me doing a join in a traditional database +01:11 or going back and doing a second query to Mongo to pull that data out, +01:14 it's very beneficial to have that rating data embedded in the book. +01:19 We designed it that way, so let's suppose like most of our query patterns +01:22 and most the way our application works is +01:25 we want to list the number of ratings, the average number of ratings, +01:29 things like this we want to surface that in almost all the time, +01:32 we want that embedded data when we get a book. +01:35 So that would guide us to embed the data, if this is not true, +01:40 if you only very rarely want that data, +01:42 then you most likely will not want to embed it, +01:45 there's a serious performance cost for what you might think of as dead weight, +01:48 other embedded stuff that comes along with the object +01:51 that you generally don't care about most of the time, +01:54 you can do things like suppress those items coming back, +01:57 so you can basically suppress the ratings object, +02:00 but if you are doing that, it's probably a sign like +02:02 hey maybe I shouldn't really be designing it this way. +02:04 A lot of considerations, but here's the first rule— +02:07 do you want the embedded data most of the time? +02:11 Next, how often do you want the embedded data without the containing document? 
+02:15 The way our things are structured now is I cannot get the ratings +02:19 without getting the books, I cannot get individual +02:22 ratings without getting all of the ratings. +02:24 So if what I wanted to do was on the user profile page +02:27 show here are all of my individual ratings as a user +02:31 listed on my like favorites page, or things I've rated or something like this, +02:36 that's actually a little bit challenging the way things are written. +02:39 We can definitely do it, and if there's just one +02:41 query where we do it that way, it's totally fine, +02:43 but this is one of the tensions, you can't get the ratings without getting the books +02:47 you can't get individual ratings, without getting all the other ratings +02:50 from that particular book, there's no way in MongoDB +02:53 to actually suppress that, I don't think, like you can suppress the other fields +02:56 using a projection, right, but you get all the ratings, or none of the ratings. +03:00 So how often is it necessary to get a rating without getting a book itself? +03:04 Right, if that's something you want to do often +03:07 or it's a very very hot spot in your application +03:09 maybe again you do not want to embed it, +03:11 if you want the object without the containing document. +03:14 Another really important question to answer is +03:17 is the embedded data a bounded set? +03:19 If it is just a single nested item, fine, that's no problem, +03:22 if it's a list or an array, like we have in the context of ratings, +03:25 how big could the ratings get, +03:28 how many ratings might a book have reasonably speaking; +03:31 if there's ten ratings, it's probably totally fine +03:34 to have the rating data embedded in the book, +03:36 it's nice, self contained, you get a little atomicity +03:39 and some nice features of having it embedded there.
+03:41 If there's a hundred ratings, maybe it's good, +03:45 if there's a thousand ratings, if there's an unbounded number of ratings +03:48 you do not want to embed it, right so is it a bounded set, first of all +03:53 and related to that, is the bounded set small, +03:55 because every time you get the book back +03:58 you're pulling all of that stuff off disk, possibly out of memory, +04:01 over network for deserialization or serialization +04:04 depending on the side that you're working with. +04:06 So that comes with a cost, and in fact, +04:08 MongoDB puts a limit on the size of these documents, +04:12 you're not allowed to have a document larger than 16 MB, +04:17 in fact, if you try to take a document that's larger than 16 MB +04:20 and save it into MongoDB, even if you pull it back, +04:23 add something it makes it a little bit bigger and you call save +04:26 it's going to totally fail and say no, no, no this is over the limit. +04:29 So this should not be thought of as like a safe upper bound +04:33 this should be thought of as like the absolute limit +04:36 if you've got a document that's ten megabytes, +04:38 it doesn't mean like wow, we're only halfway there, this is amazing or great, +04:41 no, that's a huge performance cost to pull 10 MB over +04:46 every time you need a little bit of something out of there. +04:48 So really, you should aim for a much, much, much smaller thing +04:51 than the upper limit of 16 MB, but the point here is +04:53 there is actually a limit where if this embedded data outgrows that 16 MB +04:59 you just cannot save it back to the database, +05:02 that's a will no longer operate problem, +05:04 is the bound small is more of a performance trade-off type of problem, right, +05:08 but you want to think about these very, very carefully, +05:10 average size of a document is definitely something worth keeping in mind. +05:14 How varied are your queries? 
+05:17 Do you have like a web app and it asks like maybe ten really common questions +05:21 and you very much know the structure, +05:24 like these are the types of queries my app asks, +05:26 these are the really hot pages and here's what I want to optimize for, +05:29 or is this more of like a BI type thing where people and analysts come along +05:34 and they can ask like almost any sort of reporting question whatsoever; +05:38 it turns out the more focused your queries are, +05:41 the more likely you are to embed data in other things, right, +05:44 if you know that you typically use these things together, +05:47 then embedding them often makes a lot of sense. +05:49 If you're not really sure about the use case, +05:51 it's hard to answer the above questions, +05:53 do you want the data eighty percent of the time, I have no idea, +05:55 there's all sorts of queries, some of the time, right, +05:58 and so the more varied your queries, the more likely you are going to +06:00 tend towards the normalized data, not the embedded modeling data. +06:06 And finally, related to this how varied are your queries question +06:09 is, are you working with an integration database that lives at the center +06:14 and almost is used for inter-process, inter-application communication +06:17 or is it a very focused application database? +06:19 We're going to dig into that idea next.
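The guidelines above can be sketched as a small checklist function. This is only an illustration of the soft rules, not a prescriptive algorithm: the function name, the parameters and the 100-item cutoff are assumptions, while the eighty percent figure comes from the discussion itself.

```python
# A checklist-style sketch of the embedding guidelines. An empty result means
# none of the soft rules argue against embedding; it does not prove the design.

def embedding_concerns(wanted_with_parent, needed_without_parent,
                       bounded, typical_items, focused_queries):
    """Return the list of guidelines that argue against embedding."""
    concerns = []
    if wanted_with_parent < 0.8:
        concerns.append("embedded data is wanted less than ~80% of the time")
    if needed_without_parent:
        concerns.append("embedded data is often needed without its container")
    if not bounded:
        concerns.append("embedded set is unbounded")
    elif typical_items > 100:  # assumed cutoff for a 'small' bounded set
        concerns.append("embedded set is bounded but large")
    if not focused_queries:
        concerns.append("query patterns are varied")
    return concerns


# Ratings shown on nearly every book page, a handful per book, focused web app.
print(embedding_concerns(0.9, False, True, 10, True))   # []

# Unbounded ratings queried by analysts in many different ways.
print(embedding_concerns(0.3, True, False, 0, False))
```

Because these are soft rules, the output is a list of things to think about rather than a yes or no answer.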
\ No newline at end of file diff --git a/transcripts/ch6-modeling-data/3.txt b/transcripts/ch6-modeling-data/3.txt new file mode 100644 index 0000000..cee5f70 --- /dev/null +++ b/transcripts/ch6-modeling-data/3.txt @@ -0,0 +1,65 @@ +00:00 In order to answer this question about whether you have +00:02 an integration database or an application database, +00:04 let's do a quick compare and contrast, +00:07 especially in large enterprises, you'll see that they use databases +00:11 almost as a means of inter-application communication, +00:15 so maybe you have this huge relational database that lives in the center +00:18 with many, many constraints, many, many stored procedures, +00:21 lots and lots of structures and rules, and so on, +00:25 why? Well, because we have a bunch of different applications +00:28 and they all need to access this data, +00:30 maybe the one in the top left here, it needs users +00:33 but so does the one on the right, and their idea of users is slightly different +00:36 so this user is not like a real simple thing, it's really quite complex +00:40 it's kind of the thing that will solve the user problem for all of these apps +00:43 and so on and so on, through the constraints and the way you use it. +00:47 This is a decent, well, it's typically a good role for relational databases, +00:51 you're better off with other architectural patterns anyway, +00:55 but relational databases are good at guarding the data in this kind of use case, +00:58 they have a fixed schema, they have lots of constraints and relationships +01:03 and they are very good at enforcing and kicking it back to the app +01:06 and going no, you got it wrong, you messed up the data. +01:08 So they can be like this strong rock in the middle. +01:11 The problem with rocks is they're not very adaptable, +01:14 they can't be massaged into new and interesting things; +01:18 a rock is a rock, and it's extremely hard to change.
+01:21 So that's partly why some of these major enterprises +01:25 will have like weekends where they deploy a new version of an app, +01:28 like we're going to take it down and everybody's going to come in +01:30 and we're going to release it; +01:32 that is not a super place to be, it's also not a great use case +01:36 for document databases with their flexibility in schema design, +01:40 their lesser enforcement at the database level +01:43 and more enforcement inside the app, +01:45 because how is the app on the left going to help +01:47 enforce things for the app on the right, that's not great. +01:49 So, this is an integration database, and it's generally not a good use case +01:53 for document databases, if you're still using +01:56 this sort of style with document databases, it means your queries will be more varied +01:59 and you probably need to model in a more relational style, +02:03 less embedded style, just as a rule of thumb. +02:06 So what's the opposite? Well, it might look like this, +02:09 we have all of our little apps again, +02:11 and instead of them all sharing a single massive database +02:13 you can maybe think of this as more like a micro service type of architecture; +02:17 each one of them is going to have their own database +02:20 and they're going to talk to it, and then when they need to exchange information +02:22 we'll do that through some sort of web api, +02:25 so they will exchange it through some kind of service broker way +02:29 they like negotiate and locate the other services, right, +02:32 maybe the one on the left is about orders, +02:34 the one on the right is about users and accounts.
+02:37 So what that means though is each one of these little apps is much simpler, +02:41 it can have its own database with its own focused query patterns, +02:45 which is more focused, easier to understand, +02:48 and the application can enforce the structure and the integrity at its api level, +02:53 so this is a much better use case when you're sharing data with a document database. +02:58 And in fact, this sort of whole pattern here means +03:01 we don't have to make it a NoSQL versus SQL choice, +03:03 maybe three out of these six are using MongoDB, +03:07 one is using a graph database and two are using MySQL, +03:10 it's up to the individual application to decide what the best way +03:13 to model is, basically with the best database and its underlying model. +03:18 So when we have an application database like this +03:20 you are more likely to have slightly more embedded objects +03:24 because the query patterns are going to be simpler and more focused and more constrained. \ No newline at end of file diff --git a/transcripts/ch6-modeling-data/4.txt b/transcripts/ch6-modeling-data/4.txt new file mode 100644 index 0000000..3d652ce --- /dev/null +++ b/transcripts/ch6-modeling-data/4.txt @@ -0,0 +1,152 @@ +00:01 So let's look inside the application that you're using right now +00:03 to take this course as an example. +00:06 So at the time of this recording, here's what the Talk Python training +00:10 website database looks like for courses and users.
+00:14 So, first let's focus on the course side of things, +00:17 there's a couple of interesting ideas here, +00:19 one, we have an id which is not an object id, why is it not an object id, +00:24 well, it was actually migrated from a relational database initially, +00:27 this was using SQLAlchemy, and it was easier to keep this id here as a number +00:33 rather than switch to MongoDB's object id, +00:36 it's also easier to refer to it in other areas, +00:39 like say in the commerce system I can put the id in without using, +00:42 I don't have very much space in terms of the message, +00:46 that can go into the e-commerce system based on their api, +00:49 so one is much easier than like 32 characters, +00:51 so we're using the non standard id which is generated in the app +00:55 but for these types of things, that is really no big deal, +00:58 for the users, I think we might be using object ids. +01:01 We have somewhat sort of flat things here, we have the url and the title +01:04 and when it was published, things like that, +01:07 so this is the Learn Python by Building Ten Apps Jumpstart Course +01:10 and you can see a lot of the initial ideas here, +01:13 and the initial pieces of data are totally straightforward +01:16 and they would look exactly the same in a relational database. +01:19 However, there's two things that are very different +01:21 that I want to pull your attention to; +01:24 first is not actually the embedded stuff, but is this duration in seconds, +01:27 when I created the MongoDB version of this web app, +01:31 I realized one of the things I do all the time on the home page, +01:36 on the course listing page, and many many places, +01:39 is I say how long is the course, this course is 6.5 hours, +01:42 I think this one is 7.1 hours or something to that effect. +01:46 Using quick math you can figure out duration in seconds.
+01:48 So there was actually a pretty serious bottleneck +01:51 where I'd have to go and in this case pull back 12 chapters +01:55 and then from the chapters I could get the lectures +01:58 and from the lectures I could get how long each individual one was, +02:01 I'd add that all up and then I could print out that number. +02:05 And then I would do that for say like on the course catalog page, +02:08 there was like ten courses, I would have to go through so many of these chapters +02:13 and then their subsequent lectures, and that was a huge huge bottleneck. +02:15 So what I decided to do was in the application, +02:18 any time I save or update the course, I'm going to compute this on save +02:22 which is extremely rare, and then I'm going to stash this here, +02:26 so this is actually computed from the chapters +02:29 which are computed from the lectures themselves, +02:31 and this is data duplication, but you'll find that a little bit of data duplication, +02:36 I find usually in most apps it's like one or two little pieces like this that +02:40 just unlock a lot of performance +02:42 because actually computing this turns out to be really really computationally expensive, +02:47 but storing it here on this object made it super fast. +02:50 So this is one thing, this data duplication +02:53 which I try to stay away from as much as I can +02:55 but the trade-off here was so worth it. +02:57 Now, the other part we want to focus on is down here, +02:59 we said I'd like to associate these chapter ids with a particular course, +03:03 now if this was a relational database, +03:05 I might have a course to chapter normalization table, right, +03:09 it'd have the course id and the chapter id +03:11 and I'd do some query, some kind of join on that; +03:14 you almost never ever, ever see that in MongoDB and document databases.
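The compute-on-save idea for the duration can be sketched like this. It is a minimal illustration, not the site's actual code: the field names (`duration_in_sec`, `chapter_ids`, `lectures`) and the sample numbers are assumptions.

```python
# Roll lecture durations up into chapters and the course at save time,
# so listing pages can read one stored number instead of recomputing it.

def chapter_duration(chapter):
    """Sum the raw per-lecture durations embedded in one chapter."""
    return sum(lecture["duration_in_sec"] for lecture in chapter["lectures"])


def prepare_course_for_save(course, chapters):
    """Stash the duplicated totals on the chapters and the course before saving."""
    for chapter in chapters:
        chapter["duration_in_sec"] = chapter_duration(chapter)
    course["duration_in_sec"] = sum(c["duration_in_sec"] for c in chapters)
    course["chapter_ids"] = [c["_id"] for c in chapters]  # ids stored on the course side
    return course


chapters = [
    {"_id": 1001, "course_id": 1, "lectures": [
        {"title": "Welcome", "duration_in_sec": 98},
        {"title": "Doing the exercises", "duration_in_sec": 202},
    ]},
    {"_id": 1002, "course_id": 1, "lectures": [
        {"title": "First app", "duration_in_sec": 600},
    ]},
]
course = prepare_course_for_save({"_id": 1, "title": "Hypothetical course"}, chapters)

print(chapters[0]["duration_in_sec"])  # 300
print(course["duration_in_sec"])       # 900
print(course["chapter_ids"])           # [1001, 1002]
```

The cost is that every save has to keep the stored totals in sync, which is acceptable here because saves are rare and reads are constant.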
+03:19 Usually, at least the ids are embedded on one side of that one to many relationship +03:23 so here we have the course, the course has some chapters, +03:27 so we're just storing the ids here. +03:29 Now, we also have the chapters, you can see chapter 1001 goes right here +03:35 and this one is a little bit more interesting, +03:37 we've got again our duration in seconds +03:40 which is another thing computed from, if you look at the individual lectures +03:44 they've got duration in seconds, and that's the real raw number. +03:48 So this is another duplication, because at many, many levels +03:51 I need to show the time of a chapter, +03:53 and that was turning out to be computationally expensive at many levels, +03:56 so again, these two places, this is the one bit of duplicated data +04:00 and you will see that this is more common +04:03 in a document database than in a relational one. +04:05 So here we've got our chapter which has this soft relationship +04:08 from the course over to the id, +04:10 we also have the course id down there and below it, +04:12 so it's kind of this bidirectional relationship; +04:15 then we have lectures, and lectures is interesting in that +04:18 almost every time that we get a hold of a chapter +04:22 we care about its lectures, we usually want to display them in a list +04:27 any time that I get a lecture, this is the thing like you're watching right now, +04:30 this is the lecture, right, an individual video let's say, +04:32 any time you have one of those, you almost always need the other ones, +04:36 at least the ones before and after it, so like if you look in this particular player +04:40 you'll see there is a forward and a backward within the course button +04:45 that you can skip ahead or skip back, that is the other lectures +04:48 so what I find is grouping the chapter along with the lectures into one blob +04:51 that makes it super fast and I almost always want the other lectures +04:57 when I have one lecture,
and if I have the lecture, +04:59 I usually need to display the chapter title, and things like that. +05:02 Anyway, so these are really well suited to be put together in this embedded style, +05:06 so I don't have a lectures table, I have course, courses +05:09 and I have chapters, and then in the chapters those are embedding the lectures, +05:12 and we also saw that little bit of data duplication. +05:15 So you can see down here is an individual embedded lecture, +05:18 here's one that talks about doing the exercises +05:20 in this course and it's apparently 202 seconds, +05:24 so I hope this look behind the scenes has helped you understand +05:28 how you might model this stuff, you can look at the course page +05:30 and the player and think about some of the trade-offs, +05:33 I don't know that this is perfect, but it is absolutely working well for the web app. +05:37 Let's look at one more thing. +05:39 Down here we have the users, and we have a couple of items +05:41 that we're going to focus on when we get to the users, +05:44 I have blurred some out, we're using object id now for the user id +05:46 I covered the password and things like that, +05:49 but we've got some flat stuff like whether or not you're opting out of email, +05:52 what your user name is, what your email address is, things like that. 
+05:55 And then, I have this concept of an origin, +05:58 so if you come from like some particular marketing source +06:01 it might record like hey this person created their account +06:04 and they originally came from Facebook, +06:06 this person originally came from the podcast or something like that, +06:08 so that's pretty interesting, we also have the courses that you are taking, +06:11 so right here, this particular person, this is me, +06:14 so I gave myself basically all the courses, +06:17 these are the ids of the courses that I am a student in, +06:20 so again, there's not a users, a courses, and a user courses +06:23 sort of normalization thing; it's very common that when I as a user +06:29 am loaded from the database, I very often need to know about the courses. +06:32 Now I can't easily embed the course into the user, right, +06:36 that'd be like insane levels of duplication, +06:38 but the closest thing I can do is I can get this list +06:40 and then I can go back and do another query, +06:42 say give me all the courses where the course id is in this list of owned courses, +06:46 so basically with two queries I have everything I need. +06:49 We also have the bundle id and some other things going on here. +06:52 So that embedded course id, that's actually a list; +06:55 one more thing to look at down here is this preferences, +06:58 so this is short name, somewhat short name, +07:02 this is the preferences for your player +07:05 so when you're in the video player, you can choose different qualities, +07:08 you can turn on captions or you can turn off captions, +07:12 subtitles, transcripts basically and you can choose a playback speed, +07:15 it could be like .75 up to two or three or something crazy like this. 
+07:19 One of the primary actions a user does on this site is to go through the course, +07:25 each course might have 150 lectures +07:28 so as a user, you come in you look round a little bit +07:31 and then you go through 150 lectures, +07:33 so this preferences thing needs to be pulled back frequently. +07:36 And so we got to get the user anyway and embedding them together means +07:39 it's basically instant access any time I'm in the player +07:42 to figure out how to preconfigure the player +07:46 to render your video the way that you like it. +07:48 So this is an embedded item, but not an embedded list +07:51 just an embedded preference object. +07:53 So there you have it, a look inside Talk Python Training +07:57 at least as it was when we recorded this, +07:59 so hopefully this helps you think through some of the challenges +08:03 of building a more realistic app. \ No newline at end of file diff --git a/transcripts/ch6-modeling-data/5.txt b/transcripts/ch6-modeling-data/5.txt new file mode 100644 index 0000000..8589f87 --- /dev/null +++ b/transcripts/ch6-modeling-data/5.txt @@ -0,0 +1,24 @@ +00:01 Let's close out this chapter with a few more sources +00:03 you can get some patterns here; +00:05 so recently I had Rick Copeland who is in the MongoDB masters program +00:10 along with myself, and I had him on the podcast on episode 109 +00:14 to talk about applied MongoDB design patterns. +00:18 So this concept of embedding and modeling +00:20 and data duplication and all these things, +00:23 certainly we talked about on the podcast, and he talks about in his book, +00:25 but he has a lot of really interesting use cases +00:28 and actually some performance trade-offs, +00:31 using some of the atomic update operators, one versus the other or not at all, +00:37 just to see how that might work out. 
+00:40 So he's got a bunch of use cases and you might flip through his book +00:43 once you really get into things and say does one of the patterns he talks about +00:47 really closely match what I'm doing— you might get a huge jumpstart +00:50 on modeling your data with actual performance numbers behind it. +00:54 So check out the podcast, it's free +00:56 and check out his book if you find it to be helpful. +00:59 And final thought on modeling with these document databases is +01:02 there is no perfect answer, it's always this tension of +01:06 I could model it this way and this part of my app gets better, +01:09 I could model it another way, and that part is not quite as good, +01:12 but another part becomes more flexible or becomes better, +01:14 so it's really about balancing the trade-offs, not right versus wrong. \ No newline at end of file
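The modeling choices described in the Talk Python Training walkthrough (lectures embedded in their chapter with a duplicated total duration, a user holding a list of course ids plus an embedded preferences object, and the two-query lookup for owned courses) can be sketched roughly as follows. This is a minimal in-memory illustration using plain Python dicts, not the site's actual schema: every field name, id, title, and helper function here is a hypothetical stand-in, and the lists only mimic what MongoDB collections would hold.

```python
# In-memory stand-ins for three collections. All field names and values
# are hypothetical; they only mirror the shapes described in the transcript.
COURSES = [
    {"_id": 1, "title": "Course A", "chapter_ids": [1001]},
    {"_id": 2, "title": "Course B", "chapter_ids": []},
    {"_id": 3, "title": "Course C", "chapter_ids": []},
]

CHAPTERS = [
    {
        "_id": 1001,
        "course_id": 1,            # back-reference to the owning course
        "duration_in_sec": 322,    # duplicated total, kept to avoid recomputing
        "lectures": [              # embedded list: no separate lectures collection
            {"title": "Welcome", "duration_in_sec": 120},
            {"title": "Doing the exercises", "duration_in_sec": 202},
        ],
    },
]

USERS = [
    {
        "_id": "u1",
        "course_ids": [1, 3],      # references only, courses are not embedded
        "preferences": {           # embedded object, not an embedded list
            "quality": "720p",
            "show_captions": False,
            "playback_speed": 1.0,
        },
    },
]


def find_one(collection, doc_id):
    """Return the first document whose _id matches, or None."""
    return next((doc for doc in collection if doc["_id"] == doc_id), None)


def find_in(collection, ids):
    """Return every document whose _id is in ids (mimics an $in filter)."""
    wanted = set(ids)
    return [doc for doc in collection if doc["_id"] in wanted]


def load_user_with_courses(user_id):
    """Two lookups: the user first, then all courses the user owns."""
    user = find_one(USERS, user_id)                # query 1
    owned = find_in(COURSES, user["course_ids"])   # query 2
    return user, owned


def recomputed_duration(chapter):
    """Sum the raw per-lecture durations that the stored total duplicates."""
    return sum(lec["duration_in_sec"] for lec in chapter["lectures"])
```

Against a real database, the second lookup would presumably be a PyMongo filter along the lines of `db.courses.find({"_id": {"$in": user["course_ids"]}})`, where `db.courses` is an assumed collection name. The duplicated `duration_in_sec` on the chapter trades a little write-time bookkeeping for not having to sum lecture durations on every page view, which is the trade-off the transcript describes.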