MapReduce - Memory-efficient way to transform a collection in MongoDB


I have a collection in MongoDB:

{
    "_id" : ObjectId("56a5f47ed420cf0db5b70242"),
    "tag" : "swift",
    "values" : [
        { "word" : "osx", "value" : 0.02 },
        { "word" : "compiler", "value" : 0.01 }
    ]
},
{
    "_id" : ObjectId("56a5f47ed420cf0db5b70243"),
    "tag" : "c++",
    "values" : [
        { "word" : "namespace", "value" : 0.04 },
        { "word" : "compiler", "value" : 0.05 }
    ]
}

I need to transform it into a collection like this:

{
    "_id" : ObjectId("56a5f4e5d420cf0db5b70247"),
    "word" : "namespace",
    "values" : [
        { "tag" : "c++", "value" : 0.04 }
    ]
},
{
    "_id" : ObjectId("56a5f4e5d420cf0db5b70248"),
    "word" : "compiler",
    "values" : [
        { "tag" : "swift", "value" : 0.01 },
        { "tag" : "c++", "value" : 0.05 }
    ]
},
{
    "_id" : ObjectId("56a5f4e5d420cf0db5b70249"),
    "word" : "osx",
    "values" : [
        { "tag" : "swift", "value" : 0.02 }
    ]
}

I'm new to working with MongoDB and MapReduce, and I have a few questions:

  1. Should I use MapReduce, or the aggregation framework with a $out stage into a new collection, in this case?
  2. Which approach is more memory-efficient? The collection is large (3 GB) and I have 8 GB of RAM, so I'm afraid of getting an out-of-memory error.
  3. If I use MapReduce, what should happen in the map and reduce stages? Should I emit items like {"word": word, "values": {"tag": tag, "value": value}} in the map stage and merge them in the reduce stage?

Use the .aggregate() method.

You need to denormalize the "values" array using the $unwind operator. The last stage in the pipeline is a $group stage that groups the documents by "values.word" and uses the $push accumulator operator to return an array of sub-documents for each group.
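To see what $unwind followed by $group with $push produces, here is a plain-JavaScript sketch of the same transform on the two sample documents from the question (the `docs` variable is illustrative, not part of any driver API):

```javascript
// Sample documents mirroring the question's collection
var docs = [
    { tag: "swift", values: [{ word: "osx", value: 0.02 }, { word: "compiler", value: 0.01 }] },
    { tag: "c++", values: [{ word: "namespace", value: 0.04 }, { word: "compiler", value: 0.05 }] }
];

// $unwind: one output document per element of the "values" array
var unwound = [];
docs.forEach(function(doc) {
    doc.values.forEach(function(v) {
        unwound.push({ tag: doc.tag, word: v.word, value: v.value });
    });
});

// $group by word, with $push collecting a { tag, value } pair per source document
var groups = {};
unwound.forEach(function(d) {
    if (!groups[d.word]) groups[d.word] = [];
    groups[d.word].push({ tag: d.tag, value: d.value });
});
// groups.compiler → [{ tag: "swift", value: 0.01 }, { tag: "c++", value: 0.05 }]
```

Regarding the memory question: in the real pipeline you can pass { allowDiskUse: true } as an option to .aggregate(), which lets the $group stage spill to disk instead of being bound by the in-memory stage limit.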

From there, you can insert the documents into a new collection using "bulk" operations.

var bulk = db.myCollection.initializeOrderedBulkOp();
var count = 0;
db.collection.aggregate([
    { "$unwind": "$values" },
    { "$group": {
        "_id": "$values.word",
        "values": {
            "$push": { "tag": "$tag", "value": "$values.value" }
        }
    }}
]).forEach(function(doc) {
    bulk.insert({ "word": doc._id, "values": doc.values });
    count++;
    if (count % 1000 === 0) {
        // Execute per 1000 operations and re-initialize
        bulk.execute();
        bulk = db.myCollection.initializeOrderedBulkOp();
    }
});
// Flush the remaining queued operations
if (count > 0) {
    bulk.execute();
}
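As for question 3, if you did use MapReduce instead, the question's instinct is right: emit one entry per word in map, and concatenate the arrays for the same word in reduce. The sketch below drives the map and reduce functions in plain JavaScript so the shapes are visible; the key detail is that map emits a { values: [...] } wrapper so that map output and reduce output have the same shape, which mapReduce requires:

```javascript
// Hypothetical sample documents from the question
var docs = [
    { tag: "swift", values: [{ word: "osx", value: 0.02 }, { word: "compiler", value: 0.01 }] },
    { tag: "c++", values: [{ word: "namespace", value: 0.04 }, { word: "compiler", value: 0.05 }] }
];

// map: emit (word, { values: [{ tag, value }] }) once per array element;
// wrapping in an array keeps map and reduce output shapes identical
function map(doc, emit) {
    doc.values.forEach(function(v) {
        emit(v.word, { values: [{ tag: doc.tag, value: v.value }] });
    });
}

// reduce: concatenate the values arrays emitted for the same word
function reduce(key, vals) {
    var merged = [];
    vals.forEach(function(v) { merged = merged.concat(v.values); });
    return { values: merged };
}

// Drive map/reduce in plain JS to show the resulting documents
var emitted = {};
docs.forEach(function(doc) {
    map(doc, function(key, val) {
        (emitted[key] = emitted[key] || []).push(val);
    });
});
var result = {};
Object.keys(emitted).forEach(function(word) {
    result[word] = reduce(word, emitted[word]);
});
// result.compiler.values → [{ tag: "swift", value: 0.01 }, { tag: "c++", value: 0.05 }]
```

In the shell you would pass these two functions to db.collection.mapReduce() with an { out: "..." } option; for this reshaping job, though, the aggregation pipeline above is the simpler and generally preferred tool.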
