R - Problems with secondary key of data.table

Here is a reproducible example:

mydt <- data.table(id=c('a','b','b'), val=c('check','check','a'))
mydt[val == "check"]  # <= secondary index is created on calling this
mydt[, val:=ifelse(.N>1, '2', '1'), by=id]
mydt
#    id val
# 1:  a   1
# 2:  b   2
# 3:  b   2
key(mydt)   # NULL
key2(mydt)  # [1] "val"
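A minimal sketch for watching the secondary index appear and (in a fixed release) disappear, assuming a recent data.table version where `indices()` replaces the deprecated `key2()`. In a fixed release, reassigning `val` with `:=` should drop the index on `val`; the buggy release kept the stale index, which is what caused the missing rows.

```r
library(data.table)

mydt <- data.table(id = c("a", "b", "b"), val = c("check", "check", "a"))

# A `==` filter in `i` makes data.table build a secondary index on `val`
invisible(mydt[val == "check"])
idx_before <- indices(mydt)   # expected to contain "val"
print(idx_before)

# Reassigning the indexed column by reference should invalidate that index
mydt[, val := ifelse(.N > 1, "2", "1"), by = id]
idx_after <- indices(mydt)    # NULL in a fixed release
print(idx_after)
```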

Now, calling a simple command gives a rather strange (for me) result:

mydt[val=='2', res:='yes'][]
#    id val res
# 1:  a   1  NA
# 2:  b   2 yes
# 3:  b   2  NA

With the filter val=='2', I expected records 2 and 3 to be updated, but in fact only one of them was. This is due to the secondary key, because removing it brings the expected behavior:

set2key(mydt, NULL)
mydt[val=='2', res:='yes'][]
#    id val res
# 1:  a   1  NA
# 2:  b   2 yes
# 3:  b   2 yes

I'm wondering whether this is a bug or expected behavior. In either case, it is not desired: I did not know about such a thing as a secondary key (before asking this question), and spent a lot of time trying to figure out why I was missing records. I solved the problem by adding a set2key(mydt, NULL) instruction, but I'm worried that a similar thing could happen in other parts of my code, and I don't know how to detect or prevent it. I wouldn't want to add set2key(., NULL) calls after every other line...
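For the detect/prevent concern, a hedged sketch of two switches: the global option `datatable.auto.index` stops data.table from creating indices behind the scenes, and `setindex(x, NULL)` (the newer name for `set2key(x, NULL)`) drops all indices from a single table. Both give up the speed-up indices bring to repeated `==` / `%in%` filters, so this is a workaround for an affected version rather than a general recommendation.

```r
library(data.table)

# Global switch: no automatic secondary indices on `==` / `%in%` filters
old <- options(datatable.auto.index = FALSE)

mydt <- data.table(id = c("a", "b", "b"), val = c("check", "check", "a"))
invisible(mydt[val == "check"])
no_auto_index <- is.null(indices(mydt))   # nothing was created

# Per-table alternative: build an index on purpose, then drop every index
setindex(mydt, val)
had_manual_index <- identical(indices(mydt), "val")
setindex(mydt, NULL)

options(old)   # restore the previous setting
```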

This is indeed a bug (I reported it, and it turned out it had been reported already), and it is fixed in package version 1.9.7 - now it works!
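A small regression check, assuming the fix shipped in 1.9.7 as stated above: it stops early on an older release, then replays the question's steps and expects both rows 2 and 3 to be updated.

```r
library(data.table)

# Refuse to run on releases that predate the fix
stopifnot(packageVersion("data.table") >= "1.9.7")

mydt <- data.table(id = c("a", "b", "b"), val = c("check", "check", "a"))
invisible(mydt[val == "check"])                    # auto-index on val
mydt[, val := ifelse(.N > 1, "2", "1"), by = id]   # update val by group
mydt[val == "2", res := "yes"]                     # should hit rows 2 and 3

print(mydt)
```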

Shah
