MongoDB: Many to Many - When to implement with denormalized vs normalized? -


it seems me there 2 ways implement many many relationships in mongodb, normalized or denormalized, both of have trade offs. how decide model implement? remainder of op explains trade offs examples.

take product sku relationship. 1 product has many skus, , each sku may associated many products. (i not intend answer of question specific example, rather arbitrary many many relationship).

normalized

in normalized model, there 2 collections, product , sku. each document in both collections contains array of (object)ids pointing other collection.

pros:

  • data not duplicated
  • prevents update/insert/deletion anomalies (except foreign key array)
  • finding distinct list of skus not expensive

cons:

  • modifications 2 separate documents not atomic
  • finding skus given product, or reverse, requires 2 queries db.
  • foreign keys not enforced, lending self anomalies (but in foreign key array)

e.g.,

db.store.sku.insert({     'sku_id': 1,     'name': 'cheese',     'price': 0.50,     'products': [10] }) db.store.sku.insert({     'sku_id': 2,     'name': 'beef paddy',     'price': 0.25,     'products': [10] }) db.store.product.insert({     'product_id': 10,     'name': 'cheese burger',     'skus': [1, 2] }); db.store.product.insert({     'product_id': 11,     'name': 'hamburger',     'skus': [2] }); 

denormalized

in denormalized model, there 1 collection. each document in collection contains array of documents.

pros:

  • modifications single document atomic
  • finding skus given product, or reverse, requires 1 query db.

cons:

  • data duplicated
  • lends update/insert/deletion anomalies
  • finding distinct list of skus expensive (i presuming db.store.product.distinct(skus) relatively expensive)

e.g.,

db.store.product.insert({     'product_id': 10,     'name': 'cheese burger',     'skus': [          {              'sku_id': 1,              'name': 'cheese',              'price': 0.50,          },          {              'sku_id': 2,              'name': 'beef paddy',              'price': 0.25,          }      ] }); db.store.product.insert({     'product_id': 11,     'name': 'hamburger',     'skus': [          {              'sku_id': 2,              'name': 'beef paddy',              'price': 0.25,          }      ] }); 

it important note there hybrid models, queried attributes and/or least changed attributes stored in denormalized collection, , other attributes stored in reference collection.

how 1 decide method implement many many relationship in mongodb?

i think call "denormalized" plain wrong problem scope, since you're storing different, equivalent "sku" subdocuments not related in data structure. having 2 different "sku_id":2 fields fine, 2 blog posts can both have "first" comment different users, in scenario isn't. if allow broken states in data model you're in serious trouble before start.

your "normalized" approach better, still not correct scenario. data modeling implies relationship between products , skus (in terms of graphs) directed edge, meaning product can refer sku without sku referring product.

i can think of 2 correct modeling approaches that: either reference sku in product , don't have reference back, since it's implicitly there. if need way sku product list, suggest building query , index that, pretty please don't double link this. second approach sql n:m approach: have set of documents contain information relationship. honestly, think unless you're running serious operations on link between products , skus, bad correct.

oh , foreign key constraints: there no such thing in mongodb. there isn't place define such thing, since there no schema definition.


Comments