00:01 So let's look inside the application that you're using right now 00:03 to take this course as an example. 00:06 So at the time of this recording, here's what the Talk Python training 00:10 website database looks like for courses and users. 00:14 So, first let's focus on the course side of things, 00:17 there's a couple of interesting ideas here, 00:19 one, we have an id which is not an object id, why is it not an object id, 00:24 well, it was actually migrated from a relational database initially, 00:27 this was using SQLAlchemy, and it was easier to keep this id here as a number 00:33 rather than switch to MongoDB's object id, 00:36 it's also easier to refer to it in other areas, 00:39 like say in the commerce system I can put the id in without using, 00:42 I don't have very much space in terms of the message, 00:46 that can go into the e commerce system based on their api, 00:49 so one is much easier than like 32 characters, 00:51 so we're using the non standard id which is generated in the app 00:55 but for these types of things, that is really no big deal, 00:58 for the users, I think we might be using object ids. 01:01 We have somewhat sort of flat things here, we have the url and the title 01:04 and when it was published, things like that, 01:07 so this is the Learn Python by Building Ten Apps Jumpstart Course 01:10 and you can see a lot of the initial ideas here, 01:13 and the initial pieces of data are totally straightforward 01:16 and they would look exactly the same in a relational database. 01:19 However, there's two things that are very different 01:21 than I want to pull your attention to; 01:24 first is not actually the embedded stuff, but is this duration in seconds, 01:27 when I created the MongoDB version of this web app, 01:31 I realized one of the things I do all the time on the home page, 01:36 on the course listing page, and many many places, 01:39 is I say how long is the course, this course is 6.5 hours, 01:42 I think this one is 7.1 hours or something to that effect. 01:46 Using quick math you can figure out duration in second. 01:48 So there was actually a pretty serious bottleneck 01:51 where I'd have to go and in this case pull back 12 chapters 01:55 and then from the chapters I could get the lectures 01:58 and from the lectures I could get how long each individual one was, 02:01 I had that all up and then I could print out that number. 02:05 And then I would do that for say like on the course catalog page, 02:08 there was like ten courses, I would have to go through so many of these chapters 02:13 and then their subsequent lectures, and that was a huge huge bottleneck. 02:15 So what I decided to do was in the application, 02:18 any time I save or update the course, I'm going to compute this on save 02:22 which is extremely rare, and then I'm going to stash this here, 02:26 so this is actually computed from the chapters 02:29 which are computed from the lectures themselves, 02:31 and this is data duplication, but you'll find that a little bit of data duplication, 02:36 I find usually most apps is like one or two little pieces like this that 02:40 just unlock a lot of performance 02:42 because actually computing this turns out to be really really computationally expensive, 02:47 but storing it here on this object made it super fast. 02:50 So this is one thing, this data duplication 02:53 which I try to stay away from as much as I can 02:55 but the trade-off here was so worth it. 02:57 Now, the other part we want to focus on is down here, 02:59 we said I'd like to associate these chapter ids with a particular course, 03:03 now if this was a relational database, 03:05 I might have a course to chapter normalization table, right, 03:09 it'd have the course id and the chapter id 03:11 and I do some query some kind of join on that; 03:14 you almost never ever, ever see that in MongoDB and document databases. 03:19 Usually, at least the ids are embedded on one side of that, one to many relationships 03:23 so here we have the course, the course has some chapters, 03:27 so we're just storing the ids here. 03:29 Now, we also have the chapters, you can see chapter 1001 goes right here 03:35 and this one is a little bit more interesting, 03:37 we've got again our duration in seconds 03:40 which is another thing computed from if you look at the individual lectures 03:44 they've got duration in seconds, and that's the real raw number. 03:48 So this is another duplication, because at many, many levels 03:51 I need to show the time of a chapter, 03:53 and that was turning out to be computationally expensive at many levels, 03:56 so again, these two places, this is the one bit of duplicated data 04:00 and you will see that this is more common 04:03 in a document database than in a relational one. 04:05 So here we've got our chapter which has this soft relationship 04:08 from the course over to the id, 04:10 we also have the course id down there and below it, 04:12 so it's kind of this bidirectional relationship; 04:15 then we have lectures, and lectures is interested in that 04:18 almost every time that we get a hold of a chapter 04:22 we care about its lectures, we usually want to display them in a list 04:27 any time that I get a lecture, this is the thing like you're watching right now, 04:30 this is the lecture, right, an individual video let's say, 04:32 any time you have one of those, you almost always need the other ones, 04:36 at least the ones before and after it, so like if you look in this particular player 04:40 you'll see there is a forward and a backward within the course button 04:45 that you can skip ahead or skip back, that is the other lectures 04:48 so what I find is grouping the chapter along with the lectures into one blob 04:51 that makes it super fast and I almost always want the other lectures 04:57 when I have one lecture, and if I have the lecture, 04:59 I usually need to display the chapter title, and things like that. 05:02 Anyway, so these are really well suited to be put together in this embedded style, 05:06 so I don't have a lectures table, I have course, courses 05:09 and I have chapters, and then in the chapters those are embedding the lectures, 05:12 and we also saw that little bit of data duplication. 05:15 So you can see down here is an individual embedded lecture, 05:18 here's one that talks about doing the exercises 05:20 in this course and it's apparently 202 seconds, 05:24 so I hope this look behind the scenes has helped you understand 05:28 how you might model this stuff, you can look at the course page 05:30 and the player and think about some of the trade-offs, 05:33 I don't know that this is perfect, but it is absolutely working well for the web app. 05:37 Let's look at one more thing. 05:39 Down here we have the users, and we have a couple of items 05:41 that we're going to focus on when we get to the users, 05:44 I have blurred some out, we're using object id now for the user id 05:46 I covered the password and things like that, 05:49 but we've got some flat stuff like whether or not you're opting out of email, 05:52 what your user name is, what your email address is, things like that. 05:55 And then, I have this concept of an origin, 05:58 so if you come from like some particular marketing source 06:01 it might record like hey this person created their account 06:04 and they originally came from Facebook, 06:06 this person originally came from the podcast or something like that, 06:08 so that's pretty interesting, we also have the courses that you are taking, 06:11 so right here, this particular person, this is me, 06:14 so I gave myself basically all the courses, 06:17 these are the ids of the courses that I am a student in, 06:20 so again, there's not a users, there's not a courses in a user courses 06:23 sort of normalization thing is very common that when I as a user 06:29 am loaded into the database, I very often need to know about the courses. 06:32 Now I can't easily embed the course into the user, right, 06:36 that'd be like insane levels of duplication, 06:38 but closest thing I can do is I can get this list 06:40 and then I can go back and do another queer 06:42 say give me all the courses where the course id is in this list of owned courses, 06:46 so basically two queries I have everything I need. 06:49 We also have the bundle id and some other things going on here. 06:52 So that embedded course id, that's actually a list 06:55 one more thing to look at down here is this preferences, 06:58 so this is short name, somewhat short name, 07:02 this is the preferences for your player 07:05 so when you're in the video player, you can choose different qualities, 07:08 you can turn on captions or you can turn off captions, 07:12 subtitles, transcripts basically and you can choose a playback speed, 07:15 it could be like .75 up to two or three or something crazy like this. 07:19 One of the primary actions a user does on this site is to go through the course, 07:25 each course might have 150 lectures 07:28 so as a user, you come in you look round a little bit 07:31 and then you go through 150 lectures, 07:33 so this preferences thing needs to be pulled back frequently. 07:36 And so we got to get the user anyway and embedding them together means 07:39 it's basically instant access any time I'm in the player 07:42 to figure out how to preconfigure the player 07:46 to render your video the way that you like it. 07:48 So this is an embedded item, but not an embedded list 07:51 just an embedded preference object. 07:53 So there you have it, a look inside Talk Python Training 07:57 at least as it was when we recorded this, 07:59 so hopefully this helps you think through some of the challenges 08:03 of building a more realistic app.