You cannot select more than 25 topics
Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
153 lines
7.1 KiB
Plaintext
153 lines
7.1 KiB
Plaintext
00:00 As we discussed, modeling with documents and
|
|
00:03 document databases is little bit more art.
|
|
00:05 There's a little bit more flexibility and
|
|
00:07 kind of just gut-feel of how you should do it,
|
|
00:09 but let me give you some guidelines that will
|
|
00:12 give you a clear set of considerations
|
|
00:15 as you work on your data models.
|
|
00:17 You'll see that the primary question for working
|
|
00:20 with document databases is to embed or not to embed.
|
|
00:24 That is, when there's a relationship between
|
|
00:26 two things in your application.
|
|
00:28 Should one of those be a sub-document?
|
|
00:31 Should it be contained within the same record
|
|
00:33 as the thing it is related to?
|
|
00:34 Or should they be two separate collections,
|
|
00:36 (what MongoDB calls tables: collections),
|
|
00:39 because they're not tabular, right?
|
|
00:40 Should those be two separate collections,
|
|
00:42 but just relate to each other.
|
|
00:44 We're going to try to answer this question.
|
|
00:46 I'll try to provide you some guidelines
|
|
00:48 for answering this question.
|
|
00:49 The first one is: is the embedded data
|
|
00:52 wanted 80% of the time when you have the outer,
|
|
00:56 or other related object?
|
|
00:58 Let's go back to our example we just worked with.
|
|
01:00 We had a book and the book had ratings.
|
|
01:02 To make this concrete, the question here is
|
|
01:05 do we care about having information about the ratings
|
|
01:08 most of the time when we're working with books?
|
|
01:10 So if our website lists books,
|
|
01:13 and that listing has a number of ratings
|
|
01:16 and the average rating, things like that,
|
|
01:17 listed as part of the listing,
|
|
01:19 you pull up a book, maybe the ratings,
|
|
01:21 and the reviews are shown right there--
|
|
01:22 most of the time, when we have the book,
|
|
01:24 we have the ratings involved somehow,
|
|
01:26 then we would want to embed the ratings within the book.
|
|
01:29 That's what we did in our data model.
|
|
01:30 We said, yes,
|
|
01:31 we do want the ratings most of the time.
|
|
01:34 Now, let's look at this in reverse.
|
|
01:36 How often do you want the embedded data
|
|
01:39 without the containing document?
|
|
01:41 So in the same example, how often is it the case
|
|
01:43 that you would like the ratings without the book?
|
|
01:49 Where would that show up in our apps?
|
|
01:50 So maybe you as a user,
|
|
01:52 go to your profile page in the bookstore,
|
|
01:55 and there you can see all the books you've rated, right?
|
|
01:57 And the details about the ratings.
|
|
01:58 You don't actually care about necessarily the books,
|
|
02:00 you just want, these are the ratings
|
|
02:02 that I've given to things, and here's my comment and so on.
|
|
02:05 You don't actually want most of the details
|
|
02:07 or maybe any of the details about the book itself.
|
|
02:09 So if you're in that sort of situation,
|
|
02:12 a lot of the time, you might want to put that
|
|
02:14 into a separate collection and not embed it.
|
|
02:16 You can still do it.
|
|
02:18 You can still do this query and it'll come back very quickly,
|
|
02:20 there's ways to work with it.
|
|
02:21 But you'll see that you have to do a query in the database
|
|
02:24 and then a little bit of filtering on the application side.
|
|
02:26 So it's not prohibitive,
|
|
02:31 it's not that you can't get the contained document
|
|
02:33 without its container,
|
|
02:34 but it's a little bit more clumsy.
|
|
02:36 So if this is something you frequently want to do,
|
|
02:39 then maybe consider not embedding it.
|
|
02:41 Now, is the embedded document a bounded set?
|
|
02:43 Let's look at ratings.
|
|
02:45 How many ratings might a book have?
|
|
02:47 10 ratings, 100 ratings, 1000 ratings.
|
|
02:49 Is that number going to just grow
|
|
02:51 The number of ratings that we have?
|
|
02:53 If this was page views and details about the browser,
|
|
02:57 IP address, and date and time of a page we viewed,
|
|
03:00 that would not make a good candidate for embedding that,
|
|
03:02 like say, for views of a book
|
|
03:04 because that could just grow and grow
|
|
03:05 as the popularity of your site grows.
|
|
03:07 And it could make the document so large
|
|
03:09 that when you retrieve it from the database,
|
|
03:10 actually the network traffic and the disk traffic
|
|
03:12 would be a problem.
|
|
03:13 I don't really see that happening with ratings.
|
|
03:15 I mean, even on Amazon, super popular books have
|
|
03:18 hundreds not millions of ratings.
|
|
03:20 So this is probably okay, but if it's an unbounded set,
|
|
03:23 you do not want to embed it.
|
|
03:25 And these have bounds small, right?
|
|
03:27 Maybe millions of views still are being recorded
|
|
03:31 within sign of a book would be a really bad idea.
|
|
03:33 And the reason is these documents are limited
|
|
03:35 to 16 megabytes.
|
|
03:37 So no single record of MongoDB can be larger
|
|
03:40 than 16 megabytes.
|
|
03:41 And this is not a limitation of MongoDB.
|
|
03:42 This is them trying to protect you from yourself.
|
|
03:45 You do not want to go and just say, query buy, say, ISBN,
|
|
03:49 and try to pull back a book
|
|
03:50 and actually read 100 megabytes off a disk in
|
|
03:53 over the network.
|
|
03:53 That would destroy the performance of your database.
|
|
03:56 So having these very, very large records is a problem.
|
|
03:59 So they actually set an upper bound on how large
|
|
04:02 that can be.
|
|
04:03 And that limit is right now, currently,
|
|
04:05 at the time of recording, 16 megabytes.
|
|
04:07 But you shouldn't think of 16 megabytes as like,
|
|
04:09 well, if it's 10 megabytes, everything's fine.
|
|
04:11 We still got a long ways to go.
|
|
04:13 No, you should try to keep these,
|
|
04:15 in the kilobytes: tens, twenties, hundreds of kilobytes,
|
|
04:18 not megabytes because that's going to really hurt your
|
|
04:21 database performance unless in some situation,
|
|
04:23 it just makes avton of sense to have these
|
|
04:25 very large documents.
|
|
04:25 So having small bounded sets means that your documents
|
|
04:29 won't grow into huge, huge monolithic things
|
|
04:32 that are hard to work with.
|
|
04:33 Also, how varied are your queries?
|
|
04:35 So one of the things that you do with document database is
|
|
04:38 is you try to structure the document to answer
|
|
04:41 the most common questions in the most well-structured way.
|
|
04:46 So if you're always going to say, I would like to, on my pages,
|
|
04:49 show a book in this related ratings,
|
|
04:51 you would absolutely put the ratings inside the book
|
|
04:53 because that means you just do a query against a book
|
|
04:55 and you already have that pre-joined data.
|
|
04:58 But if it's sort of a data warehouse
|
|
05:00 and you're asking all kinds of questions
|
|
05:02 from all different sorts of applications,
|
|
05:04 then trying to understand, well, what is the right way
|
|
05:06 to build my document so it matches the queries
|
|
05:08 that I typically do or the ones that I need to be
|
|
05:10 really quick and fast?
|
|
05:12 That becomes hard because there's all
|
|
05:13 these different queries and
|
|
05:15 the way you model for one is actually the opposite
|
|
05:18 of the way you model for the other.
|
|
05:20 So depending on how focused your application is,
|
|
05:22 or how many applications are using the database,
|
|
05:24 you'll have a different answer to this question.
|
|
05:26 So the more specific your queries are,
|
|
05:28 the more likely you are to embed things and structure them
|
|
05:31 exactly to match those queries.
|
|
05:33 Related to that, is are you working with an
|
|
05:36 integration database or an application database?
|
|
05:39 We'll get to that next.
|