i want update values of table based on value found in specific column of table. i.e.
tb1<-data.table(w=c('xray','yankee','zulu','alpha','bravo','charlie', 'xray','yankee','zulu','alpha','bravo','charlie'), x=c('alpha','bravo','charlie','xray','yankee','zulu', 'xray','yankee','zulu','alpha','bravo','charlie'), y=c('zulu','yankee','xray','charlie','bravo','alpha')) tb2<-data.table(z= c('alpha','bravo','charlie'), z2=c('zalpha','zbravo','zcharlie'))
in column "w"of tb1, values not appear in col "z" of tb2 should replaced "none". code work short table.
tb1[,"w":=lapply(tb1$w,function(u){ifelse((u %in% tb2$z),u,"none")})]
however, actual table has more 2 million records , above instruction slow (>20 mins before stopping it). eventually, need substitute values in columns "w',"x" , "y" of tb1 not in tb2$z. correct "data.table" approach accomplish this? have tried subsetting assuming more efficient group values before doing replacement. none of combinations have used .sd give me correct answer. help.
we can use %chin%
create logical index of elements in 'w' not found in 'z', assign (:=
) corresponding elements in 'w' "none".
tb1[!w %chin% tb2$z, w:= "none"] tb1 # w x y # 1: none alpha zulu # 2: none bravo yankee # 3: none charlie xray # 4: alpha xray charlie # 5: bravo yankee bravo # 6: charlie zulu alpha # 7: none xray zulu # 8: none yankee yankee # 9: none zulu xray #10: alpha alpha charlie #11: bravo bravo bravo #12: charlie charlie alpha
if need change columns in 'tb1' based on 'z' column of 'tb2', can loop through columns of 'tb1' , assign showed earlier.
nm1 <- names(tb1) for(j in seq_along(tb1)){ tb1[!tb1[[j]] %chin% tb2$z, nm1[j] := "none"] } tb1 # w x y # 1: none alpha none # 2: none bravo none # 3: none charlie none # 4: alpha none charlie # 5: bravo none bravo # 6: charlie none alpha # 7: none none none # 8: none none none # 9: none none none #10: alpha alpha charlie #11: bravo bravo bravo #12: charlie charlie alpha
Comments
Post a Comment