I have a data frame of x, y coordinates and a classification (a or b). What is the best way to apply an operation on consecutive rows where the classification type has been repeated?
Here is an example:
```r
set.seed(1)
n = 9
x = 1:n
y = runif(n)
df = data.frame(x, y, type = sample(c("a", "b"), n, replace = TRUE))
```
which produces the following:
```
+---+---+-----------+------+
|   | x | y         | type |
+---+---+-----------+------+
| 1 | 1 | 0.2655087 | a    |
| 2 | 2 | 0.3721239 | a    |
| 3 | 3 | 0.5728534 | a    |
| 4 | 4 | 0.9082078 | b    |
| 5 | 5 | 0.2016819 | a    |
| 6 | 6 | 0.8983897 | b    |
| 7 | 7 | 0.9446753 | a    |
| 8 | 8 | 0.6607978 | b    |
| 9 | 9 | 0.6291140 | b    |
+---+---+-----------+------+
```
So I want to ddply(...) some type of operation, e.g. average the x and y coordinates when the 'type' classification is repeated across consecutive rows. In the example above, rows 1:3 should collapse into 1 row, rows 4:7 are unaffected, and rows 8:9 collapse into 1 row, so the result should have 6 rows.
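R 3.6.0 changed the default algorithm behind `sample()`, so running the snippet above in a current R session may give a different `type` column than the table shows. A minimal sketch that rebuilds the same data with the `type` values copied from the table (an assumption made only so the expected 6-row result can be checked), and confirms there are six runs:

```r
# Rebuild the example data with 'type' written out explicitly,
# copied from the table above, instead of relying on sample().
set.seed(1)
n <- 9
df <- data.frame(
  x = 1:n,
  y = runif(n),  # runif() is unaffected by the sample() change
  type = c("a", "a", "a", "b", "a", "b", "a", "b", "b"),
  stringsAsFactors = FALSE  # keep 'type' as character in any R version
)

# Lengths of the runs of identical consecutive 'type' values
rle(df$type)$lengths
#> [1] 3 1 1 1 1 2
```

Six runs means six rows after collapsing, matching the expected result.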
Here are a few methods I can think of, using base R, dplyr, and data.table. All of them first need a group ID for each run of consecutive types:
```r
## put numerical groups (rle() does not accept factors, so map type to integers)
df$grp <- match(df$type, letters)
## use rle to find consecutive groups
ngroups <- length(rle(df$grp)$lengths)  ## returns the number of groups
grp <- rep(seq(1, ngroups, 1), rle(df$grp)$lengths)
## put the rle groups onto the data
df$rle_grp <- grp
## perform the calculation
```
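The `rle()`/`rep()` step only has to produce a run number for every row. A base R sketch of the same idea, using a hypothetical helper `run_id()` that flags each value that differs from the one before it and takes a cumulative sum of those flags. It assumes the column contains no `NA` values (an `NA` would propagate through `cumsum()`). If data.table is already loaded, its `rleid()` function produces the same kind of run ID.

```r
# Hypothetical helper: one integer ID per run of identical consecutive values
run_id <- function(v) {
  v <- as.character(v)  # treat factors and character vectors the same way
  n <- length(v)
  if (n == 0L) return(integer(0))
  # TRUE for the first element and wherever a value differs from its predecessor
  changed <- c(TRUE, v[-1L] != v[-n])
  cumsum(changed)  # each TRUE starts a new run
}

run_id(c("a", "a", "a", "b", "a", "b", "a", "b", "b"))
#> [1] 1 1 1 2 3 4 5 6 6
```

With the question's data, `df$rle_grp <- run_id(df$type)` would give the same grouping column as the `rle()` version.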
Base R
```r
aggregate(x = df[, c("x", "y")], by = list(df$rle_grp), FUN = mean)
#   Group.1   x         y
# 1       1 2.0 0.4034953
# 2       2 4.0 0.9082078
# 3       3 5.0 0.2016819
# 4       4 6.0 0.8983897
# 5       5 7.0 0.9446753
# 6       6 8.5 0.6449559
```
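The `aggregate()` result above drops the classification itself. A base R sketch of a hypothetical helper, `collapse_runs()`, that keeps the `type` label and the number of rows in each run. It assumes columns named `x`, `y` and `type` as in the question, and builds the example data with the values copied from the table.

```r
# Hypothetical helper: collapse each run of identical consecutive 'type'
# values into one row, keeping the type label and the run length.
collapse_runs <- function(df, FUN = mean) {
  r <- rle(as.character(df$type))
  id <- rep(seq_along(r$lengths), r$lengths)  # run number for every row
  pieces <- split(df, id)                      # one data frame per run, in order
  out <- do.call(rbind, lapply(pieces, function(d) {
    data.frame(type = as.character(d$type[1]), n = nrow(d),
               x = FUN(d$x), y = FUN(d$y),
               stringsAsFactors = FALSE)
  }))
  rownames(out) <- NULL
  out
}

# Example data copied from the table in the question
df <- data.frame(
  x = 1:9,
  y = c(0.2655087, 0.3721239, 0.5728534, 0.9082078, 0.2016819,
        0.8983897, 0.9446753, 0.6607978, 0.6291140),
  type = c("a", "a", "a", "b", "a", "b", "a", "b", "b"),
  stringsAsFactors = FALSE
)

collapse_runs(df)
#   type n   x         y
# 1    a 3 2.0 0.4034953
# 2    b 1 4.0 0.9082078
# 3    a 1 5.0 0.2016819
# 4    b 1 6.0 0.8983897
# 5    a 1 7.0 0.9446753
# 6    b 2 8.5 0.6449559
```

`FUN` can be any function that reduces a numeric vector to a single value; passing `FUN = median`, for example, collapses each run to its median coordinates instead.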
dplyr
```r
## using dplyr (you asked about ddply, but I don't use plyr anymore)
library(dplyr)
df %>%
  group_by(rle_grp) %>%
  summarise(avgx = mean(x), avgy = mean(y)) %>%
  ungroup()
#   rle_grp  avgx      avgy
#     (dbl) (dbl)     (dbl)
# 1       1   2.0 0.4034953
# 2       2   4.0 0.9082078
# 3       3   5.0 0.2016819
# 4       4   6.0 0.8983897
# 5       5   7.0 0.9446753
# 6       6   8.5 0.6449559
```
data.table
```r
## or using data.table, my package of choice
library(data.table)
setDT(df)
df[, .(avgx = mean(x), avgy = mean(y)), by = .(rle_grp)]
#    rle_grp avgx      avgy
# 1:       1  2.0 0.4034953
# 2:       2  4.0 0.9082078
# 3:       3  5.0 0.2016819
# 4:       4  6.0 0.8983897
# 5:       5  7.0 0.9446753
# 6:       6  8.5 0.6449559
```