|
|
00:01 When it comes down to modeling with document databases
|
|
|
00:04 you apply a lot of the same thinking as you do with relational databases
|
|
|
00:07 about what the entity should be, and so on.
|
|
|
00:10 However, there's one fundamental question that you often ask
|
|
|
00:13 that really does take some thinking about, maybe working through
|
|
|
00:18 some of the guidelines, and that is to embed or not to embed related items.
|
|
|
00:23 So in our previous example, you saw that we had a book
|
|
|
00:26 and the book had ratings embedded within it,
|
|
|
00:28 but we could just as well have the ratings be a separate collection
|
|
|
00:30 or the ratings could have even gone into the user object
|
|
|
00:33 with a reference back to the book, instead of the reverse.
|
|
|
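To make the three options concrete, here is a rough sketch of the document shapes as plain Python dictionaries. All field names and values (`ratings`, `user_id`, `book_id`, `value`, the ids) are made up for illustration and are not taken from the course code.

```python
# Option 1: ratings embedded in the book document.
book_embedded = {
    "_id": "book-1",
    "title": "Example Book",
    "ratings": [
        {"user_id": "user-1", "value": 5},
        {"user_id": "user-2", "value": 3},
    ],
}

# Option 2: ratings in their own collection, each pointing at a book and a user.
ratings_collection = [
    {"_id": "rating-1", "book_id": "book-1", "user_id": "user-1", "value": 5},
    {"_id": "rating-2", "book_id": "book-1", "user_id": "user-2", "value": 3},
]

# Option 3: ratings embedded in the user document, with a reference back to the book.
user_embedded = {
    "_id": "user-1",
    "name": "Example User",
    "ratings": [{"book_id": "book-1", "value": 5}],
}


def average_rating(book):
    """Average of the embedded rating values, or None when there are no ratings."""
    values = [rating["value"] for rating in book["ratings"]]
    return sum(values) / len(values) if values else None


# With option 1, the rating count and average come straight from the one document.
print(len(book_embedded["ratings"]), average_rating(book_embedded))
```

With the embedded shape, the count and the average need no second query or join, which is the benefit the first guideline is about.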
00:36 So should we embed the ratings, and if we do,
|
|
|
00:40 does it go in books, does it go in users, or does it not go there at all?
|
|
|
00:43 So what I'm going to do, is I'm going to give you some guidelines,
|
|
|
00:46 these are soft rules, we don't have like a really prescriptive way of doing things
|
|
|
00:51 like third normal form here, but some of the thinking there does help;
|
|
|
00:54 so let's get into the rules.
|
|
|
00:56 First of all, the question you want to ask is: is that embedded data
|
|
|
00:59 wanted eighty percent of the time that you get the original object;
|
|
|
01:02 do I usually want the rating information when I have the book?
|
|
|
01:08 If it would have resulted in me doing a join in a traditional database
|
|
|
01:11 or going back and doing a second query to Mongo to pull that data out,
|
|
|
01:14 it's very beneficial to have that rating data embedded in the book.
|
|
|
01:19 We designed it that way, so let's suppose like most of our query patterns
|
|
|
01:22 and most of the way our application works is
|
|
|
01:25 we want to list the number of ratings, the average rating,
|
|
|
01:29 things like this; we want to surface that almost all the time,
|
|
|
01:32 we want that embedded data when we get a book.
|
|
|
01:35 So that would guide us to embed the data, if this is not true,
|
|
|
01:40 if you only very rarely want that data,
|
|
|
01:42 then you most likely will not want to embed it,
|
|
|
01:45 there's a serious performance cost for what you might think of as dead weight,
|
|
|
01:48 other embedded stuff that comes along with the object
|
|
|
01:51 that you generally don't care about most of the time,
|
|
|
01:54 you can do things like suppress those items coming back,
|
|
|
01:57 so you can basically suppress the ratings object,
|
|
|
02:00 but if you are doing that, it's probably a sign like
|
|
|
02:02 hey maybe I shouldn't really be designing it this way.
|
|
|
02:04 A lot of considerations, but here's the first rule—
|
|
|
02:07 do you want the embedded data most of the time?
|
|
|
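Suppressing the embedded items is done with a projection. A minimal sketch, assuming `books` is a pymongo `Collection` and that the documents have `isbn` and `ratings` fields (both field names are assumptions for illustration):

```python
def get_book_without_ratings(books, isbn):
    """Fetch one book but leave the embedded ratings array out of the result.

    `books` is assumed to be a pymongo Collection (or any object with a
    compatible find_one method); `isbn` and `ratings` are assumed field names.
    """
    # The second argument is the projection: {"ratings": 0} excludes that field.
    return books.find_one({"isbn": isbn}, {"ratings": 0})
```

If most call sites end up looking like this, that is the sign mentioned above that the ratings may not belong inside the book.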
02:11 Next, how often do you want the embedded data without the containing document?
|
|
|
02:15 The way our things are structured now is I cannot get the ratings
|
|
|
02:19 without getting the books, I cannot get individual
|
|
|
02:22 ratings without getting all of the ratings.
|
|
|
02:24 So if what I wanted to do was on the user profile page
|
|
|
02:27 show here are all of my individual ratings as a user
|
|
|
02:31 listed on my like favorites page, or things I've rated or something like this,
|
|
|
02:36 that's actually a little bit challenging the way things are written.
|
|
|
02:39 We can definitely do it, and if there's just one
|
|
|
02:41 query where we do it that way, it's totally fine,
|
|
|
02:43 but this is one of the tensions, you can't get the ratings without getting the books
|
|
|
02:47 you can't get individual ratings, without getting all the other ratings
|
|
|
02:50 from that particular book, there's no way in MongoDB
|
|
|
02:53 to actually suppress that, I don't think, like you can suppress the other fields
|
|
|
02:56 using a projection, right, but you get all the ratings, or none of the ratings.
|
|
|
03:00 So how often is it necessary to get a rating without getting a book itself?
|
|
|
03:04 Right, if that's something you want to do often
|
|
|
03:07 or it's a very very hot spot in your application
|
|
|
03:09 maybe again you do not want to embed it,
|
|
|
03:11 if you want the object without the containing document.
|
|
|
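Here is a small sketch of why the profile-page case is awkward with embedded ratings. It uses plain Python lists in place of a real collection, and the field names are hypothetical. Note that MongoDB does have projection operators such as `$elemMatch` and `$slice` that can trim an array in the result, so "all or none" is best read as describing plain include/exclude projections.

```python
def ratings_by_user(books, user_id):
    """Collect one user's ratings from books that embed their ratings."""
    found = []
    for book in books:                  # each book comes back as a whole document
        for rating in book["ratings"]:  # along with everyone else's ratings
            if rating["user_id"] == user_id:
                found.append({"book_id": book["_id"], "value": rating["value"]})
    return found


books = [
    {"_id": "book-1", "ratings": [{"user_id": "user-1", "value": 5},
                                  {"user_id": "user-2", "value": 3}]},
    {"_id": "book-2", "ratings": [{"user_id": "user-2", "value": 4}]},
    {"_id": "book-3", "ratings": [{"user_id": "user-1", "value": 2}]},
]

print(ratings_by_user(books, "user-1"))
```

With ratings in their own collection, the same page would be a single query on the user id, with no book data pulled along.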
03:14 Another really important question to answer is
|
|
|
03:17 is the embedded data a bounded set?
|
|
|
03:19 If it is just a single nested item, fine, that's no problem,
|
|
|
03:22 if it's a list or an array, like we have in the context of ratings,
|
|
|
03:25 how big could the ratings get,
|
|
|
03:28 how many ratings might a book have reasonably speaking;
|
|
|
03:31 if there's ten ratings, it's probably totally fine
|
|
|
03:34 to have the rating data embedded in the book,
|
|
|
03:36 it's nice and self-contained, you get a little atomicity
|
|
|
03:39 and some nice features from having it embedded there.
|
|
|
03:41 If there's a hundred ratings, maybe it's good,
|
|
|
03:45 if there's a thousand ratings, if there's an unbounded number of ratings
|
|
|
03:48 you do not want to embed it, right so is it a bounded set, first of all
|
|
|
03:53 and related to that, is the bounded set small,
|
|
|
03:55 because every time you get the book back
|
|
|
03:58 you're pulling all of that stuff off disk, possibly out of memory,
|
|
|
04:01 over the network, for deserialization or serialization
|
|
|
04:04 depending on the side that you're working with.
|
|
|
04:06 So that comes with a cost, and in fact,
|
|
|
04:08 MongoDB puts a limit on the size of these documents,
|
|
|
04:12 you're not allowed to have a document larger than 16 MB,
|
|
|
04:17 in fact, if you try to take a document that's larger than 16 MB
|
|
|
04:20 and save it into MongoDB, even if you pull it back,
|
|
|
04:23 add something that makes it a little bit bigger, and you call save,
|
|
|
04:26 it's going to totally fail and say no, no, no this is over the limit.
|
|
|
04:29 So this should not be thought of as like a safe upper bound
|
|
|
04:33 this should be thought of as like the absolute limit
|
|
|
04:36 if you've got a document that's ten megabytes,
|
|
|
04:38 it doesn't mean like wow, we're only halfway there, this is amazing or great,
|
|
|
04:41 no, that's a huge performance cost to pull 10 MB over
|
|
|
04:46 every time you need a little bit of something out of there.
|
|
|
04:48 So really, you should aim for a much, much, much smaller thing
|
|
|
04:51 than the upper limit of 16 MB, but the point here is
|
|
|
04:53 there is actually a limit where if this embedded data outgrows that 16 MB
|
|
|
04:59 you just cannot save it back to the database,
|
|
|
05:02 that's a "will no longer operate" problem,
|
|
|
05:04 "is the bound small" is more of a performance trade-off type of problem, right,
|
|
|
05:08 but you want to think about these very, very carefully,
|
|
|
05:10 average size of a document is definitely something worth keeping in mind.
|
|
|
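One way to keep an eye on document size is to estimate it as the embedded list grows. The sketch below uses the length of the JSON encoding as a rough stand-in for the real BSON size (an assumption; the two encodings differ), and the document shape is hypothetical.

```python
import json

MAX_DOCUMENT_BYTES = 16 * 1024 * 1024  # MongoDB's 16 MB per-document limit


def approx_size_bytes(document):
    """Rough size estimate: bytes of JSON text, used as a stand-in for BSON size."""
    return len(json.dumps(document).encode("utf-8"))


def book_with_ratings(count):
    """Build a hypothetical book document with `count` embedded ratings."""
    return {
        "_id": "book-1",
        "title": "Example Book",
        "ratings": [{"user_id": f"user-{i}", "value": 5} for i in range(count)],
    }


for count in (10, 1_000, 100_000):
    size = approx_size_bytes(book_with_ratings(count))
    share = size / MAX_DOCUMENT_BYTES
    print(f"{count:>7} ratings: ~{size:,} bytes ({share:.2%} of the limit)")
```

The share of the limit is only the hard-failure side of the story; the byte count itself is what gets paid on every read of the book.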
05:14 How varied are your queries?
|
|
|
05:17 Do you have like a web app and it asks like maybe ten really common questions
|
|
|
05:21 and you very much know the structure,
|
|
|
05:24 like these are the types of queries my app asks,
|
|
|
05:26 these are the really hot pages and here's what I want to optimize for,
|
|
|
05:29 or is this more of like a BI-type thing where people and analysts come along
|
|
|
05:34 and they can ask like almost any sort of reporting question whatsoever;
|
|
|
05:38 it turns out the more focused your queries are,
|
|
|
05:41 the more likely you are to embed data in other things, right,
|
|
|
05:44 if you know that you typically use these things together,
|
|
|
05:47 then embedding them often makes a lot of sense.
|
|
|
05:49 If you're not really sure about the use case,
|
|
|
05:51 it's hard to answer the above questions,
|
|
|
05:53 do you want the data eighty percent of the time, I have no idea,
|
|
|
05:55 there's all sorts of queries, some of the time, right,
|
|
|
05:58 and so the more varied your queries, the more likely you are going to
|
|
|
06:00 tend towards normalized data, not the embedded data model.
|
|
|
06:06 And finally, related to this how varied are your queries
|
|
|
06:09 is: are you working with an integration database that lives at the center
|
|
|
06:14 and is used almost for inter-process, inter-application communication,
|
|
|
06:17 or is it a very focused application database?
|
|
|
06:19 We're going to dig into that idea next.