I have a text file in which each line holds a label and a tweet:
positive,i love car
negative,i hate book
positive,good product.
I need to convert each line into a vector. If I use the seq2sparse command, the whole document gets converted into a single vector, but I need one vector per line, not one per document. For example: key: positive, value: vector value (tweet). How can I achieve this in Mahout?
Here is what I have done so far:
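    // Imports used by this fragment (it runs inside a Hadoop map() method)
    import java.util.StringTokenizer;
    import org.apache.hadoop.io.Text;
    import org.apache.mahout.math.RandomAccessSparseVector;
    import org.apache.mahout.math.Vector;
    import org.apache.mahout.math.VectorWritable;
    import org.apache.mahout.vectorizer.encoders.AdaptiveWordValueEncoder;
    import org.apache.mahout.vectorizer.encoders.FeatureVectorEncoder;

    // Split the line into the label and the tweet text
    StringTokenizer str = new StringTokenizer(line, ",");
    String label = str.nextToken();
    while (str.hasMoreTokens()) {
        tweetline = str.nextToken();
        System.out.println("tweetline" + tweetline);
        // Collect the words of the tweet as features
        StringTokenizer words = new StringTokenizer(tweetline, " ");
        while (words.hasMoreTokens()) {
            featurelist.add(words.nextToken());
        }
    }
    // Encode the words of this one line into a sparse vector
    Vector unclassifiedinstancevector = new RandomAccessSparseVector(tweetline.split(" ").length);
    FeatureVectorEncoder vectorencoder = new AdaptiveWordValueEncoder(label);
    vectorencoder.setProbes(1);
    System.out.println("feature list: " + featurelist);
    for (Object feature : featurelist) {
        vectorencoder.addToVector((String) feature, unclassifiedinstancevector);
    }
    // Emit the label as the key and the vector as the value
    context.write(new Text("/" + label), new VectorWritable(unclassifiedinstancevector));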
Thanks in advance.
You can write an application that writes the key/vector pairs to an HDFS path with SequenceFile.Writer:
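    // conf was not declared in the original snippet; it is assumed to be the same configuration used for the file system
    Configuration conf = HBaseConfiguration.create();
    FileSystem fs = FileSystem.get(conf);
    String newpath = "/foo/mahouttest/part-r-00000";
    Path newpathfile = new Path(newpath);
    Text key = new Text();
    VectorWritable value = new VectorWritable();
    SequenceFile.Writer writer = SequenceFile.createWriter(fs, conf, newpathfile, key.getClass(), value.getClass());
    .....
    // For each input line: set the label as the key and the line's vector as the value
    key.set("c/" + label);
    value.set(unclassifiedinstancevector);
    writer.append(key, value);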
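To make the "one vector per line" idea concrete, here is a rough sketch that runs without Mahout or Hadoop on the classpath. It is not Mahout's encoder: it swaps in a plain hashing trick (word hash modulo a fixed cardinality) where the code above uses AdaptiveWordValueEncoder, and it keeps the sparse vector in an ordinary map. The names LineVectorizer, LabeledVector, encodeLine and encodeAll, and the cardinality of 1000, are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Stdlib-only sketch: builds one (label, sparse vector) pair per input line. */
class LineVectorizer {

    /** Hypothetical holder: a label plus a sparse vector stored as index -> weight. */
    static final class LabeledVector {
        final String label;
        final Map<Integer, Double> weights;

        LabeledVector(String label, Map<Integer, Double> weights) {
            this.label = label;
            this.weights = weights;
        }
    }

    /** Encodes a single "label,tweet text" line into its own vector. */
    static LabeledVector encodeLine(String line, int cardinality) {
        int comma = line.indexOf(',');
        if (comma < 0) {
            throw new IllegalArgumentException("expected 'label,text' but got: " + line);
        }
        String label = line.substring(0, comma).trim();
        String tweet = line.substring(comma + 1).trim();

        // A fresh map per line, so features never leak from one line into the next.
        Map<Integer, Double> weights = new TreeMap<>();
        for (String word : tweet.split("\\s+")) {
            if (word.isEmpty()) {
                continue;
            }
            // Hashing trick: map the word to a slot in [0, cardinality).
            int index = Math.floorMod(word.hashCode(), cardinality);
            weights.merge(index, 1.0, Double::sum);
        }
        return new LabeledVector(label, weights);
    }

    /** Encodes every line independently, giving one vector per line. */
    static List<LabeledVector> encodeAll(List<String> lines, int cardinality) {
        List<LabeledVector> result = new ArrayList<>();
        for (String line : lines) {
            result.add(encodeLine(line, cardinality));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "positive,i love car",
                "negative,i hate book",
                "positive,good product.");
        for (LabeledVector v : encodeAll(lines, 1000)) {
            System.out.println(v.label + " -> " + v.weights);
        }
    }
}
```

In the real job, each LabeledVector would correspond to one writer.append(key, value) call, with the label as the Text key and the vector wrapped in a VectorWritable. Using a fixed cardinality for every line (instead of the word count of the current line) keeps all vectors the same size, which is probably what a downstream classifier expects.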