00:01 When it comes down to modeling with document databases
00:04 you apply a lot of the same thinking as you do with relational databases
00:07 about what the entity should be, and so on.
00:10 However, there's one fundamental question that you often ask
00:13 that really does take some thinking about maybe working through
00:18 some of the guidelines, and that is to embed or not to embed related items.
00:23 So in our previous example, you saw that we had a book
00:26 and the book had ratings embedded within it,
00:28 but we could just as well have the ratings be a separate collection
00:30 or the ratings could have even gone into the user object
00:33 with a reference back to the book, instead of the reverse.
00:36 So should we embed those ratings, and if we do,
00:40 does it go in books, does it go in users, or does it not go there at all.
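As a rough illustration of the three options just described, here is a sketch that uses plain Python dictionaries in place of MongoDB documents. The field names (isbn, title, ratings, user_id, value, book_isbn) and the sample values are assumptions made for illustration, not the schema from the earlier example.

```python
# Option 1: ratings embedded in the book document.
book_embedded = {
    "isbn": "1234",
    "title": "Example Book",
    "ratings": [
        {"user_id": 1, "value": 5},
        {"user_id": 2, "value": 3},
    ],
}

# Option 2: ratings in their own collection, each one pointing at a book and a user.
book_plain = {"isbn": "1234", "title": "Example Book"}
ratings_collection = [
    {"book_isbn": "1234", "user_id": 1, "value": 5},
    {"book_isbn": "1234", "user_id": 2, "value": 3},
]

# Option 3: ratings embedded in the user, with a reference back to the book.
user_embedded = {
    "user_id": 1,
    "name": "Example User",
    "ratings": [{"book_isbn": "1234", "value": 5}],
}
```

All three shapes hold the same facts; what changes is which document you have to load to see a rating.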
00:43 So what I'm going to do, is I'm going to give you some guidelines,
00:46 these are soft rules, we don't have like a really prescriptive way of doing things
00:51 like third normal form here, but some of the thinking there does help;
00:54 so let's get into the rules.
00:56 First of all, the question you want to ask is: is that embedded data
00:59 wanted eighty percent of the time that you get the original object;
01:02 do I usually want the rating information when I have the book?
01:08 If it would have resulted in me doing a join in a traditional database
01:11 or going back and doing a second query to Mongo to pull that data out,
01:14 it's very beneficial to have that rating data embedded in the book.
01:19 We designed it that way, so let's suppose that most of our query patterns
01:22 and most of the way our application works is
01:25 we want to list the number of ratings, the average rating,
01:29 things like this; we want to surface that almost all the time,
01:32 we want that embedded data when we get a book.
01:35 So that would guide us to embed the data, if this is not true,
01:40 if you only very rarely want that data,
01:42 then you most likely will not want to embed it,
01:45 there's a serious performance cost for what you might think of as dead weight,
01:48 other embedded stuff that comes along with the object
01:51 that you generally don't care about most of the time,
01:54 you can do things like suppress those items coming back,
01:57 so you can basically suppress the ratings object,
02:00 but if you are doing that, it's probably a sign like
02:02 hey maybe I shouldn't really be designing it this way.
02:04 A lot of considerations, but here's the first rule—
02:07 do you want the embedded data most of the time?
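To make the first rule concrete, here is a minimal sketch that compares the two access patterns using in-memory stand-ins for collections. The data, field names, and function names are hypothetical; a real application would issue queries against MongoDB instead of reading Python dictionaries.

```python
from statistics import mean

# Embedded model: each book carries its ratings (hypothetical data).
books_embedded = {
    "1234": {
        "title": "Example Book",
        "ratings": [{"user_id": 1, "value": 5}, {"user_id": 2, "value": 3}],
    }
}

# Normalized model: books and ratings are stored separately.
books_plain = {"1234": {"title": "Example Book"}}
ratings = [
    {"book_isbn": "1234", "user_id": 1, "value": 5},
    {"book_isbn": "1234", "user_id": 2, "value": 3},
]


def summary_embedded(isbn):
    """One lookup: the ratings arrive together with the book."""
    book = books_embedded[isbn]
    values = [r["value"] for r in book["ratings"]]
    return book["title"], len(values), mean(values)


def summary_normalized(isbn):
    """Two lookups: the book first, then a second pass to find its ratings."""
    book = books_plain[isbn]
    values = [r["value"] for r in ratings if r["book_isbn"] == isbn]
    return book["title"], len(values), mean(values)
```

Both functions return the same summary; the embedded version gets there with a single read. If most pages do not need the ratings, PyMongo lets you pass a projection such as {"ratings": 0} to find_one to leave the field out, but needing that often is the warning sign mentioned above.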
02:11 Next, how often do you want the embedded data without the containing document?
02:15 The way our things are structured now is I cannot get the ratings
02:19 without getting the books, I cannot get individual
02:22 ratings without getting all of the ratings.
02:24 So if what I wanted to do was on the user profile page
02:27 show here are all of my individual ratings as a user
02:31 listed on my like favorites page, or things I've rated or something like this,
02:36 that's actually a little bit challenging the way things are written.
02:39 We can definitely do it, and if there's just one
02:41 query where we do it that way, it's totally fine,
02:43 but this is one of the tensions: you can't get the ratings without getting the books,
02:47 you can't get individual ratings without getting all the other ratings
02:50 from that particular book. With a plain field projection in MongoDB
02:53 there's no way to suppress that, I don't think; you can suppress the other fields
02:56 using a projection, right, but you get all the ratings or none of the ratings, unless you reach for array projection operators such as $slice or $elemMatch.
03:00 So how often is it necessary to get a rating without getting a book itself?
03:04 Right, if that's something you want to do often
03:07 or it's a very very hot spot in your application
03:09 maybe again you do not want to embed it,
03:11 if you want the object without the containing document.
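The tension in this second rule can be sketched with plain Python data. Collecting one user's ratings means walking every book and every embedded rating; the documents, field names, and the ratings_for_user helper below are all hypothetical.

```python
# Books with embedded ratings (hypothetical sample data).
books = [
    {
        "isbn": "1234",
        "title": "Book A",
        "ratings": [{"user_id": 1, "value": 5}, {"user_id": 2, "value": 3}],
    },
    {
        "isbn": "5678",
        "title": "Book B",
        "ratings": [{"user_id": 1, "value": 4}],
    },
]


def ratings_for_user(user_id):
    """Collect one user's ratings by scanning every book's embedded ratings."""
    found = []
    for book in books:
        for rating in book["ratings"]:
            if rating["user_id"] == user_id:
                found.append({"isbn": book["isbn"], "value": rating["value"]})
    return found
```

Calling ratings_for_user(1) touches both books even though the user page only cares about two small rating entries. In MongoDB itself, something along these lines could likely be expressed as an aggregation with $unwind and $match stages, but the work still starts from the book documents.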
03:14 Another really important question to answer is
03:17 is the embedded data a bounded set?
03:19 If it is just a single nested item, fine, that's no problem,
03:22 if it's a list or an array, like we have in the context of ratings,
03:25 how big could the ratings get,
03:28 how many ratings might a book have reasonably speaking;
03:31 if there's ten ratings, it's probably totally fine
03:34 to have the rating data embedded in the book,
03:36 it's nice and self-contained, you get a little atomicity
03:39 and some nice features from having it embedded there.
03:41 If there's a hundred ratings, maybe it's good,
03:45 if there's a thousand ratings, if there's an unbounded number of ratings
03:48 you do not want to embed it, right so is it a bounded set, first of all
03:53 and related to that, is the bounded set small,
03:55 because every time you get the book back
03:58 you're pulling all of that stuff off disk, possibly out of memory,
04:01 over the network, with deserialization or serialization
04:04 depending on the side that you're working with.
04:06 So that comes with a cost, and in fact,
04:08 MongoDB puts a limit on the size of these documents,
04:12 you're not allowed to have a document larger than 16 MB,
04:17 in fact, if you try to take a document that's larger than 16 MB
04:20 and save it into MongoDB, even if you pull one back,
04:23 add something that makes it a little bit bigger, and call save,
04:26 it's going to totally fail and say no, no, no, this is over the limit.
04:29 So this should not be thought of as like a safe upper bound
04:33 this should be thought of as like the absolute limit
04:36 if you've got a document that's ten megabytes,
04:38 it doesn't mean like wow, we're only halfway there, this is amazing or great,
04:41 no, that's a huge performance cost to pull 10 MB over
04:46 every time you need a little bit of something out of there.
04:48 So really, you should aim for a much, much, much smaller thing
04:51 than the upper limit of 16 MB, but the point here is
04:53 there is actually a limit where if this embedded data outgrows that 16 MB
04:59 you just cannot save it back to the database,
05:02 that's a "will no longer operate" problem;
05:04 "is the bound small" is more of a performance trade-off type of problem, right,
05:08 but you want to think about these very, very carefully,
05:10 average size of a document is definitely something worth keeping in mind.
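One way to keep an eye on document size is to estimate it and compare it against both the hard limit and a much smaller budget of your own. The sketch below uses JSON encoding as a rough proxy, which is an assumption: MongoDB stores BSON, so real sizes will differ somewhat. The 256 KB soft budget and the helper names are also hypothetical choices, not recommendations from the talk.

```python
import json

# MongoDB's hard limit on a single document: 16 MB.
MAX_DOCUMENT_BYTES = 16 * 1024 * 1024


def approx_size_bytes(doc):
    """Rough size estimate via JSON encoding; the BSON size will differ somewhat."""
    return len(json.dumps(doc).encode("utf-8"))


def size_report(doc, soft_budget_bytes=256 * 1024):
    """Compare the estimate with the hard limit and a hypothetical soft budget."""
    size = approx_size_bytes(doc)
    return {
        "bytes": size,
        "over_hard_limit": size > MAX_DOCUMENT_BYTES,
        "over_soft_budget": size > soft_budget_bytes,
    }


def book_with_ratings(count):
    """Build a hypothetical book document with `count` embedded ratings."""
    return {
        "isbn": "1234",
        "ratings": [{"user_id": i, "value": 5} for i in range(count)],
    }
```

A book with ten ratings stays tiny, while size_report(book_with_ratings(100_000)) blows past the soft budget long before it gets anywhere near 16 MB, which is the point: the performance problem shows up well before the hard failure.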
05:14 How varied are your queries?
05:17 Do you have like a web app and it asks like maybe ten really common questions
05:21 and you very much know the structure,
05:24 like these are the types of queries my app asks,
05:26 these are the really hot pages and here's what I want to optimize for,
05:29 or is this more of like a BI type thing where people and analysts come along
05:34 and they can ask like almost any sort of reporting question whatsoever;
05:38 it turns out the more focused your queries are,
05:41 the more likely you are to embed data in other things, right,
05:44 if you know that you typically use these things together,
05:47 then embedding them often makes a lot of sense.
05:49 If you're not really sure about the use case,
05:51 it's hard to answer the above questions,
05:53 do you want the data eighty percent of the time? I have no idea,
05:55 there's all sorts of queries, some of the time, right,
05:58 and so the more varied your queries, the more likely you are going to
06:00 tend towards the normalized data, not the embedded data model.
06:06 And finally, related to this question of how varied your queries are:
06:09 are you working with an integration database that lives at the center
06:14 and is used almost for inter-process, inter-application communication,
06:17 or is it a very focused application database?
06:19 We're going to dig into that idea next.
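As a recap, the first four questions from this section can be collected into a rough checklist. This is only a sketch: the guidelines are soft rules, so the function lists concerns instead of producing a verdict. The EmbedQuestions class, its field names, and the wording of the concerns are hypothetical; only the eighty percent figure comes from the talk.

```python
from dataclasses import dataclass


@dataclass
class EmbedQuestions:
    wanted_with_parent_fraction: float  # how often the embedded data is wanted with its parent
    often_needed_alone: bool            # is it often needed without the containing document?
    bounded: bool                       # is the embedded data a bounded set?
    small: bool                         # is that bound small?
    queries_focused: bool               # are the queries few and well known?


def concerns_about_embedding(q):
    """Return the reasons, if any, to think twice before embedding."""
    concerns = []
    if q.wanted_with_parent_fraction < 0.8:
        concerns.append("embedded data is not wanted most of the time")
    if q.often_needed_alone:
        concerns.append("embedded data is often needed without its container")
    if not q.bounded:
        concerns.append("embedded set is unbounded")
    elif not q.small:
        concerns.append("embedded set is bounded but large")
    if not q.queries_focused:
        concerns.append("queries are too varied to predict usage")
    return concerns
```

For example, concerns_about_embedding(EmbedQuestions(0.9, False, True, True, True)) returns an empty list, which matches the ratings-in-book design when a book only ever has a handful of ratings.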