I have a collection in MongoDB:
{ "_id" : objectid("56a5f47ed420cf0db5b70242"), "tag" : "swift", "values" : [ { "word" : "osx", "value" : 0.02 }, { "word" : "compiler", "value" : 0.01 } ] }, { "_id" : objectid("56a5f47ed420cf0db5b70243"), "tag" : "c++", "values" : [ { "word" : "namespace", "value" : 0.04 }, { "word" : "compiler", "value" : 0.05 } ] }
I need to transform it into a collection like this:
{ "_id" : objectid("56a5f4e5d420cf0db5b70247"), "word" : "namespace", "values" : [ { "tag" : "c++", "value" : 0.04 } ] }, { "_id" : objectid("56a5f4e5d420cf0db5b70248"), "word" : "compiler", "values" : [ { "tag" : "swift", "value" : 0.01 }, { "tag" : "c++", "value" : 0.05 } ] }, { "_id" : objectid("56a5f4e5d420cf0db5b70249"), "word" : "osx", "values" : [ { "tag" : "swift", "value" : 0.02 } ] }
I'm new to working with MongoDB and map-reduce, and I have some questions:
- Should I use mapReduce or the aggregation framework with $out to a collection in this case?
- Which approach is more memory-efficient? The collection is large (3 GB) and I have 8 GB of RAM, so I'm afraid of getting an out-of-memory error.
- If I use mapReduce, what should I do in the map and reduce stages? Should I emit items like
{ "word": word, "values": { "tag": tag, "value": value } }
in the map stage and merge them in the reduce stage?
You can do this using the .aggregate() method.
You need to denormalize the "values" array using the $unwind operator. The last stage in the pipeline is a $group stage where you group the documents by "values.word" and use the $push accumulator operator to return an array of sub-documents for each group.
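For intuition, here is roughly what the $unwind stage yields for your first sample document (a sketch: $unwind emits one document per array element, duplicating the other fields):

    { "_id" : ObjectId("56a5f47ed420cf0db5b70242"), "tag" : "swift", "values" : { "word" : "osx", "value" : 0.02 } }
    { "_id" : ObjectId("56a5f47ed420cf0db5b70242"), "tag" : "swift", "values" : { "word" : "compiler", "value" : 0.01 } }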
From there, you can insert those documents into a new collection using "bulk" operations.
var bulk = db.myCollection.initializeOrderedBulkOp(); // target collection
var count = 0;
db.collection.aggregate([ // source collection
    { "$unwind": "$values" },
    { "$group": {
        "_id": "$values.word",
        "values": { "$push": { "tag": "$tag", "value": "$values.value" } }
    }}
]).forEach(function(doc) {
    bulk.insert({ "word": doc._id, "values": doc.values });
    count++;
    if (count % 1000 === 0) {
        // Execute per 1000 operations and re-initialize
        bulk.execute();
        bulk = db.myCollection.initializeOrderedBulkOp();
    }
});
// Clean up the remaining operations in the queue
if (count > 0) {
    bulk.execute();
}
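This also bears on your memory question. As a sketch of an alternative, assuming MongoDB 2.6 or newer, you can let the pipeline write the new collection itself with $out and allow the $group stage to spill to temporary files with allowDiskUse, rather than holding the whole 3 GB working set in RAM (the target collection name "words" is my own placeholder):

    db.collection.aggregate(
        [
            { "$unwind": "$values" },
            { "$group": {
                "_id": "$values.word",
                "values": { "$push": { "tag": "$tag", "value": "$values.value" } }
            }},
            // Rename _id to "word"; documents written without an _id
            // get fresh ObjectIds on insert
            { "$project": { "_id": 0, "word": "$_id", "values": 1 } },
            // $out writes the result to the "words" collection
            { "$out": "words" }
        ],
        { "allowDiskUse": true }
    )

Note that $out replaces the target collection wholesale, so stick with the bulk approach above if you need to merge the results into an existing collection.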