00:01 So let's talk about how document databases work.
00:04 Here's a record from my actual online training platform
00:09 chapter 1001 from 'The Python Jumpstart by Building Ten Apps' course
00:15 and this is more or less exactly what came out of the database,
00:19 with a few things taken away so it actually fits on the slide here.
00:23 Now, let's break this into two pieces here,
00:26 this green piece and the blue piece.
00:29 First of all, you can see we have JSON;
00:32 when you work with document databases, you'll frequently run into JSON
00:35 as at least the visual representation of the record.
00:40 In fact, MongoDB doesn't really work in terms of JSON;
00:46 it works in something called BSON, or binary JSON,
00:50 so this is a binary, tokenised, richly typed version
00:57 of sort of extended JSON, already tokenised and stored in binary form;
01:02 this is what's transferred on the wire, and to some degree
01:07 this is what's actually stored in the database.
01:10 How it actually gets stored varies,
01:13 because the database storage engines are changing
01:16 and are sort of pluggable now in MongoDB,
01:19 but more or less you can think of this becoming a binary thing
01:22 and then stored in the database.
01:24 When it comes over into, say, Python, we'll of course map this into
01:28 something like, let's say, a Python dictionary
01:31 or a type that has these fields and so on.
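As a rough illustration of that mapping, here is a minimal sketch of what a chapter record might look like once it lands in Python. It uses the standard library json module as a stand-in for the BSON decoding a MongoDB driver would do. The field names, titles, and durations are assumptions made up for the example, not the actual record from the slide.

```python
import json

# Hypothetical JSON text for a chapter record. Every field name and value
# here is illustrative; the real record has more fields than this.
raw = """
{
  "_id": 1001,
  "title": "Hypothetical chapter title",
  "course_id": "hypothetical-course-id",
  "lectures": [
    {"id": 10105, "title": "Hypothetical lecture A", "duration_in_sec": 120},
    {"id": 10106, "title": "Hypothetical lecture B", "duration_in_sec": 300}
  ]
}
"""

# Parsing gives a plain Python dict; the nested list of lectures
# becomes a list of dicts inside it.
chapter = json.loads(raw)

lecture_count = len(chapter["lectures"])
total_seconds = sum(lec["duration_in_sec"] for lec in chapter["lectures"])

print(type(chapter).__name__, lecture_count, total_seconds)
```

The top-level keys play the role of columns, while the lectures key holds a whole list of nested records rather than a single value.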
01:35 So if you look at the green area, this is just the JSON-ified version
01:39 of any other database record;
01:41 let's think of it as having columns, if you will, for a minute.
01:46 It would have columns here: one would be the id,
01:49 one would be the title, one might be course id, and each has a value;
01:51 and that's all well and good, until we get to lectures,
01:54 and here's where the power of document databases comes in,
01:57 lectures is not just a number like seven, as in there are seven lectures or whatever;
02:01 no, lectures is a list, so multiple things, and each one of those things is
02:09 a lecture, an individual one, with its individual fields
02:13 so id, title, video url, duration in seconds,
02:16 again, there's actually more to it, but this way it fits on screen, right.
02:18 With this document database, you can think of these things
02:22 as kind of pre-computed joins, and this solves a ton of problems
02:27 and makes the NoSQL aspect of document databases super powerful.
02:31 So it makes this chapter more self-contained. If I want to get this chapter back,
02:35 instead of going to the chapter and then doing a join
02:39 against the lectures and maybe some other type of join,
02:42 getting a bunch of different pieces and pulling them back together,
02:45 I might just do one query: find me the chapter with id 1001,
02:50 bam, it's back, I've got the whole thing,
02:53 and so you can think of this as like pre-joined data
02:57 if 80, 90 percent of the time I'm working with a chapter,
02:59 I care about the lecture data being there,
03:01 why not store it in a way that it's already bound together,
03:05 so I don't have to do that join,
03:07 I don't have to do multiple queries or things like this.
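To make the pre-joined idea concrete, here is a small sketch in plain Python, with no database involved. The two lists stand in for relational tables and the dictionary stands in for a document collection; all names and data are hypothetical.

```python
# Relational-style layout: chapters and lectures live in separate "tables"
# and are tied together by a chapter_id foreign key (hypothetical data).
chapters_table = [{"id": 1001, "title": "Hypothetical chapter"}]
lectures_table = [
    {"id": 10105, "chapter_id": 1001, "title": "Hypothetical lecture A"},
    {"id": 10106, "chapter_id": 1001, "title": "Hypothetical lecture B"},
    {"id": 20001, "chapter_id": 1002, "title": "Belongs to another chapter"},
]


def load_chapter_with_join(chapter_id):
    """Two lookups plus a manual join to reassemble one chapter."""
    chapter = next(c for c in chapters_table if c["id"] == chapter_id)
    lectures = [lec for lec in lectures_table if lec["chapter_id"] == chapter_id]
    return {**chapter, "lectures": lectures}


# Document-style layout: the lectures are stored inside the chapter,
# so the "join" has already been done at write time.
chapters_collection = {
    1001: {
        "id": 1001,
        "title": "Hypothetical chapter",
        "lectures": [
            {"id": 10105, "title": "Hypothetical lecture A"},
            {"id": 10106, "title": "Hypothetical lecture B"},
        ],
    }
}


def load_chapter_document(chapter_id):
    """One lookup by id returns the chapter with its lectures attached."""
    return chapters_collection[chapter_id]


joined = load_chapter_with_join(1001)
document = load_chapter_document(1001)
print([lec["id"] for lec in joined["lectures"]])
print([lec["id"] for lec in document["lectures"]])
```

Both functions end up with the same lectures, but the document version gets there with a single lookup.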
03:10 Okay, so this is really powerful and we'll talk a lot
03:14 about when this makes sense, when it does not make sense and so on,
03:18 but this means that if I take the single record
03:21 and I put it on some server, even if I've got like ten servers
03:25 and some sort of horizontal scale situation
03:27 and I do a query by chapter id, I don't then have to go back to the cluster
03:31 find where all the lecture data lives or anything like that.
03:34 No, just bringing back that one record
03:36 brings most of the data that I need to work with
03:39 when I'm working with a chapter, right along with it, which is excellent.
03:43 That's the benefit. The critical question to ask,
03:47 to decide whether this is going to work for our database system as a general thing,
03:51 is: can I still ask the questions that I would have asked
03:54 if lectures were its own separate table, a flat table just like in relational databases?
03:59 So, what if I want to find this last lecture here, 10106;
04:06 will I be able to go to the database and say
04:08 hey document database, I would like to get lecture 10106
04:15 and I want to do that with an index
04:17 and I want to have it basically instantaneously,
04:19 even if there's a million records, I want to instantaneously get the record
04:23 that has this embedded deep down within it; it could be many, many levels,
04:27 not just one, right, but in this case it's just one.
04:30 I want to get the record that deep down within it
04:32 somewhere matches the fact that the lecture id is 10106.
04:37 And the answer, for document databases, is yes,
04:41 so this makes them very, very different
04:44 from key-value stores that just store a JSON blob,
04:46 because we can ask these very interesting questions,
04:49 we can do map-reduce or aggregation type things for big data
04:52 analysis and analytics; all sorts of stuff is possible,
04:56 because you can actually query deeply down into these nested objects.
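Here is a hedged sketch of what querying down into nested objects means, written in plain Python so it runs without a server. With the PyMongo driver, the equivalent calls would look roughly like the two shown in the comments, where `collection` and the field name `lectures.id` are assumptions based on the example record. The sample data is hypothetical.

```python
# With PyMongo and a running MongoDB server, the same idea would be roughly:
#   collection.create_index("lectures.id")         # index on the nested field
#   collection.find_one({"lectures.id": 10106})    # dot notation reaches into the list
# Here `collection` is a hypothetical pymongo Collection holding chapter documents.

# Hypothetical chapter documents, each with embedded lectures.
chapters = [
    {"_id": 1001, "lectures": [{"id": 10105}, {"id": 10106}]},
    {"_id": 1002, "lectures": [{"id": 10201}]},
]


def find_chapter_by_lecture_id(documents, lecture_id):
    """Linear scan: return the first chapter containing a lecture with this id."""
    for doc in documents:
        if any(lec["id"] == lecture_id for lec in doc.get("lectures", [])):
            return doc
    return None


def build_lecture_index(documents):
    """Toy stand-in for a database index: map each nested lecture id to its chapter."""
    index = {}
    for doc in documents:
        for lec in doc.get("lectures", []):
            index[lec["id"]] = doc
    return index


found = find_chapter_by_lecture_id(chapters, 10106)
lecture_index = build_lecture_index(chapters)
print(found["_id"], lecture_index[10106]["_id"])
```

The scan touches every document, while the index lookup is a single dictionary access. That is the rough intuition for why an index on the nested field keeps the query fast even with a million records.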
05:01 So that's how document databases work,
05:04 and we'll explore the whens, whys and hows of designing these documents
05:08 when we get to the document design chapter.