R - Secondary key ("index" attribute) in data.table is lost when the table is "copied" by selecting columns

I have a data.table mydt, and I'm making "copies" of the table in 3 different ways:
    library(data.table)

    mydt <- data.table(cola = 1:3)
    mydt[cola == 3]
    copy1 <- copy(mydt)
    copy2 <- mydt              # yes, I know it's a reference, not a real copy
    copy3 <- mydt[, .(cola)]   # I list all columns of the original table
Then I'm comparing the copies with the original table:

    identical(mydt, copy1)  # TRUE
    identical(mydt, copy2)  # TRUE
    identical(mydt, copy3)  # FALSE

I was trying to figure out what the difference between mydt and copy3 is:
    identical(names(mydt), names(copy3))
    # TRUE
    all.equal(mydt, copy3, check.attributes = FALSE)
    # TRUE
    all.equal(mydt, copy3, check.attributes = FALSE, trim.levels = FALSE, check.names = TRUE)
    # TRUE
    attr.all.equal(mydt, copy3, check.attributes = FALSE, trim.levels = FALSE, check.names = TRUE)
    # NULL

    all.equal(mydt, copy3)
    # [1] "Attributes: < Length mismatch: comparison on first 1 components >"

    attr.all.equal(mydt, copy3)
    # [1] "Attributes: < Names: 1 string mismatch >"
    # [2] "Attributes: < Length mismatch: comparison on first 3 components >"
    # [3] "Attributes: < Component 3: Attributes: < Modes: list, NULL > >"
    # [4] "Attributes: < Component 3: Attributes: < names for target but not for current > >"
    # [5] "Attributes: < Component 3: Attributes: < current is not list-like > >"
    # [6] "Attributes: < Component 3: Numeric: lengths (0, 3) differ >"
My original question was how to understand the last output. I ended up using the attributes() function:

    attr0 <- attributes(mydt)
    attr3 <- attributes(copy3)
    str(attr0)
    str(attr3)

It showed that the original data.table had an index attribute that was not copied when I created copy3.
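To narrow the comparison down to attribute names only, something like the following sketch may help. It assumes a data.table version that provides setindex() (the later name for set2key()), and it sets the index explicitly so the example does not depend on automatic indexing being enabled. The exact set of attributes can differ between data.table versions, so no output is shown.

```r
library(data.table)

mydt <- data.table(cola = 1:3)
setindex(mydt, cola)        # explicit secondary key (assumes setindex() is available)
copy3 <- mydt[, .(cola)]    # "copy" made by selecting columns

# Attribute names present on the original but absent from the column-selected table
missing_attrs <- setdiff(names(attributes(mydt)), names(attributes(copy3)))
print(missing_attrs)

# Direct check of the index attribute on each table
print(is.null(attr(mydt, "index")))
print(is.null(attr(copy3, "index")))
```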
In order to make this question a bit clearer (and maybe useful for future readers): what happened here is that you either (probably not) set a secondary key by explicitly calling set2key, or data.table set a secondary key for you during an ordinary operation such as filtering. This is a (not so) new feature added in v 1.9.4:
DT[column==value] and DT[column %in% values] are now optimized to use DT's key when key(DT)[1]=="column", otherwise a secondary key (a.k.a. index) is automatically added so the next DT[column==value] is much faster. No code changes are needed; existing code should automatically benefit. Secondary keys can be added manually using set2key() and existence checked using key2(). These optimizations and function names/arguments are experimental and may be turned off using options(datatable.auto.index=FALSE).
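As a rough illustration of that switch, the sketch below turns automatic indexing off, filters, and then adds an index by hand. It assumes a later data.table release in which set2key() and key2() are available under the names setindex() and indices(); on v 1.9.4 itself the older names from the quote would be used instead. The table DT and its column name are made up for the example.

```r
library(data.table)

DT <- data.table(column = 1:3)    # hypothetical example table

old <- options(datatable.auto.index = FALSE)  # switch automatic indexing off
DT[column == 3L]
print(indices(DT))                # no index is expected while the option is off

setindex(DT, column)              # add the secondary key manually
print(indices(DT))

options(old)                      # restore the previous setting
```

Saving the return value of options() and restoring it afterwards keeps the change from leaking into the rest of the session.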
Let's reproduce this:

    mydt <- data.table(a = 1:3)
    options(datatable.verbose = TRUE)
    mydt[a == 3]
    # Creating new index 'a'   <~~~~ Here
    # forder took 0 sec
    # Coercing double column i.'V1' to integer to match type of x.'a'. Please avoid coercion for efficiency.
    # Starting bmerge ...done in 0 secs
    #    a
    # 1: 3

    attr(mydt, "index")  # or using `key2(mydt)`
    # integer(0)
    # attr(,"__a")
    # integer(0)
So, unlike what you were assuming, you actually did create a copy, and the secondary key wasn't transferred with it. Compare:

    copy1 <- mydt
    attr(copy1, "index")
    # integer(0)
    # attr(,"__a")
    # integer(0)

    copy2 <- mydt[, .(a)]
    # Detected that j uses these columns: a   <~~~ This is where the copy occurs
    attr(copy2, "index")
    # NULL

    identical(mydt, copy1)
    # [1] TRUE
    identical(mydt, copy2)
    # [1] FALSE
And some further validation:

    tracemem(mydt)
    # [1] "<00000000159cbbb0>"
    tracemem(copy1)
    # [1] "<00000000159cbbb0>"
    tracemem(copy2)
    # [1] "<000000001a5a46d8>"
The interesting conclusion here, one could claim, is that [.data.table does create a copy, even if the object remains unchanged.
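The same point can be checked without tracemem() by comparing memory addresses with data.table's address() function. This is only a sketch; the variable names by_reference and by_selection are invented for the example.

```r
library(data.table)

mydt <- data.table(a = 1:3)
by_reference <- mydt            # plain assignment: another name for the same object
by_selection <- mydt[, .(a)]    # column selection through [.data.table

# Plain assignment shares the object, so the addresses match
same_object <- address(mydt) == address(by_reference)

# Column selection builds a new object, so the addresses differ
new_object <- address(mydt) != address(by_selection)

print(c(same_object = same_object, new_object = new_object))
```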