java - How to vectorize a text file in Mahout?


I have a text file in which each line contains a label and a tweet:

    positive,i love car
    negative,i hate book
    positive,good product.

I need to convert each line into a vector. If I use the seq2sparse command, the whole document gets converted into a vector, but I need one vector per line, not one for the whole document. For example: key: positive, value: vectorvalue(tweet). How can I achieve this in Mahout?


Here is what I have done so far:

    // Split the line into the label and the tweet text.
    StringTokenizer str = new StringTokenizer(line, ",");
    String label = str.nextToken();
    while (str.hasMoreTokens()) {
        tweetline = str.nextToken();
        System.out.println("tweetline" + tweetline);
        // Collect the individual words of the tweet as features.
        StringTokenizer words = new StringTokenizer(tweetline, " ");
        while (words.hasMoreTokens()) {
            featurelist.add(words.nextToken());
        }
    }
    // The vector is sized by the number of words in the tweet.
    Vector unclassifiedinstancevector = new RandomAccessSparseVector(tweetline.split(" ").length);
    FeatureVectorEncoder vectorencoder = new AdaptiveWordValueEncoder(label);
    vectorencoder.setProbes(1);
    System.out.println("feature list: " + featurelist);
    for (Object feature : featurelist) {
        vectorencoder.addToVector((String) feature, unclassifiedinstancevector);
    }
    // Emit the label as the key and the encoded tweet as the value.
    context.write(new Text("/" + label), new VectorWritable(unclassifiedinstancevector));

Thanks in advance.

You can write the vectors from your app to an HDFS path with SequenceFile.Writer:

    fs = FileSystem.get(HBaseConfiguration.create());
    String newpath = "/foo/mahouttest/part-r-00000";
    Path newpathfile = new Path(newpath);
    Text key = new Text();
    VectorWritable value = new VectorWritable();
    // Open a sequence file whose records are (Text, VectorWritable) pairs.
    SequenceFile.Writer writer = SequenceFile.createWriter(fs, conf, newpathfile,
            key.getClass(), value.getClass());
    .....
    // Append one record per input line: the label as key, the vector as value.
    key.set("c/" + label);
    value.set(unclassifiedinstancevector);
    writer.append(key, value);
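For the per-line part of the question, it may help to see the idea in isolation: split each line on its first comma into a label and a tweet, then hash every word into a vector whose size is the same for every line. The following is a minimal plain-Java sketch of that idea only. It does not use Mahout and does not reproduce how Mahout's encoders compute their indices or weights; the class name `LineVectorizer`, the constant `CARDINALITY` and its value of 1000 are assumptions made for illustration.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;

/** Plain-Java illustration only; it does not call or reproduce Mahout's encoders. */
final class LineVectorizer {
    // Hypothetical fixed vector size, shared by every line so vectors are comparable.
    static final int CARDINALITY = 1000;

    /** Turns one "label,tweet text" line into a (label, vector) pair. */
    static Map.Entry<String, double[]> vectorize(String line) {
        // Split on the first comma only, so commas inside the tweet are kept.
        int comma = line.indexOf(',');
        if (comma < 0) {
            throw new IllegalArgumentException("expected 'label,text' but got: " + line);
        }
        String label = line.substring(0, comma).trim();
        String tweet = line.substring(comma + 1).trim();

        double[] vector = new double[CARDINALITY];
        if (!tweet.isEmpty()) {
            for (String word : tweet.split("\\s+")) {
                // Hash each word to a slot; floorMod keeps the index non-negative.
                int index = Math.floorMod(word.hashCode(), CARDINALITY);
                vector[index] += 1.0;
            }
        }
        return new SimpleEntry<>(label, vector);
    }

    public static void main(String[] args) {
        String[] lines = {"positive,i love car", "negative,i hate book", "positive,good product."};
        for (String line : lines) {
            Map.Entry<String, double[]> entry = vectorize(line);
            int nonZero = 0;
            for (double v : entry.getValue()) {
                if (v != 0.0) {
                    nonZero++;
                }
            }
            System.out.println(entry.getKey() + " -> " + nonZero + " non-zero slots");
        }
    }
}
```

In a Mahout job the `double[]` would presumably be replaced by a `RandomAccessSparseVector` created with one fixed cardinality for all lines, rather than with the word count of each tweet, and each (label, vector) pair would be appended to the sequence file as one record.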
