R: Operation on a data frame across consecutive rows with repeated entries


I have a data frame of x and y coordinates and a classification (a or b). What is the best way to apply an operation on consecutive rows where the classification type is repeated?

Here is an example:

    set.seed(1)
    n  = 9
    x  = 1:n
    y  = runif(n)
    df = data.frame(x, y, type = sample(c("a", "b"), n, replace = TRUE))

which produces the following:

    +---+---+-----------+------+
    |   | x |     y     | type |
    +---+---+-----------+------+
    | 1 | 1 | 0.2655087 | a    |
    | 2 | 2 | 0.3721239 | a    |
    | 3 | 3 | 0.5728534 | a    |
    | 4 | 4 | 0.9082078 | b    |
    | 5 | 5 | 0.2016819 | a    |
    | 6 | 6 | 0.8983897 | b    |
    | 7 | 7 | 0.9446753 | a    |
    | 8 | 8 | 0.6607978 | b    |
    | 9 | 9 | 0.6291140 | b    |
    +---+---+-----------+------+

So I want to do a ddply(...)-type operation: average the x and y coordinates whenever the 'type' classification is repeated across consecutive rows. In the example above, rows 1:3 should collapse to 1 row, rows 4:7 are unaffected, and rows 8:9 should collapse to 1 row, so the result should have 6 rows.

Here are a few methods I can think of, using base R, dplyr, and data.table. All of them start by labelling each run of consecutive identical types:

    ## put numerical groups on the data
    df$grp <- match(df$type, letters)

    ## use rle to find consecutive groups
    ngroups <- length(rle(df$grp)[[1]])  ## returns the number of groups
    grp <- rep(seq(1, ngroups, 1), rle(df$grp)$lengths)

    ## put the rle groups onto the data
    df$rle_grp <- grp

    ## perform the calculation
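To see what rle() contributes here, the following is a minimal sketch in base R. The `type` vector is typed out by hand from the table in the question (an assumption made so the sketch does not depend on the random seed), and the names `runs` and `run_id` are illustrative only.

```r
# Type column copied by hand from the question's table.
type <- c("a", "a", "a", "b", "a", "b", "a", "b", "b")

# rle() compresses the vector into runs of identical consecutive values:
# $values holds the value of each run, $lengths how many rows it spans.
runs <- rle(type)
runs$values    # "a" "b" "a" "b" "a" "b"
runs$lengths   # 3 1 1 1 1 2

# One id per run, repeated once for every row in that run.
run_id <- rep(seq_along(runs$lengths), runs$lengths)
run_id         # 1 1 1 2 3 4 5 6 6
```

Because rle() works on character vectors directly, the intermediate match() step is only needed if numeric codes are wanted for some other reason.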

Base R

    aggregate(x = df[, c("x", "y")], by = list(df$rle_grp), FUN = mean)

    #  Group.1   x         y
    #1       1 2.0 0.4034953
    #2       2 4.0 0.9082078
    #3       3 5.0 0.2016819
    #4       4 6.0 0.8983897
    #5       5 7.0 0.9446753
    #6       6 8.5 0.6449559
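The aggregate() result above drops the `type` label of each run. If that label is needed, one possible variation is sketched below; it is self-contained, with the data typed out from the question's table rather than regenerated, and the column name `run` and the cumsum()-based run id are choices made for this sketch, not part of the original answer.

```r
# Data copied by hand from the question's table.
df <- data.frame(
  x = 1:9,
  y = c(0.2655087, 0.3721239, 0.5728534, 0.9082078, 0.2016819,
        0.8983897, 0.9446753, 0.6607978, 0.6291140),
  type = c("a", "a", "a", "b", "a", "b", "a", "b", "b")
)

# Run id: goes up by one each time `type` differs from the previous row.
df$run <- cumsum(c(TRUE, df$type[-1] != df$type[-nrow(df)]))

# Average x and y per run; grouping by type as well keeps the label
# (each run has exactly one type, so no extra groups are created).
res <- aggregate(cbind(x, y) ~ run + type, data = df, FUN = mean)

# aggregate() sorts by the grouping columns, so restore row order by run.
res <- res[order(res$run), ]
nrow(res)  # 6
```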

dplyr

    ## using dplyr (you asked for ddply, but I don't use plyr anymore)
    library(dplyr)
    df %>%
      group_by(rle_grp) %>%
      summarise(avgx = mean(x),
                avgy = mean(y)) %>%
      ungroup

    #  rle_grp  avgx      avgy
    #    (dbl) (dbl)     (dbl)
    #1       1   2.0 0.4034953
    #2       2   4.0 0.9082078
    #3       3   5.0 0.2016819
    #4       4   6.0 0.8983897
    #5       5   7.0 0.9446753
    #6       6   8.5 0.6449559

data.table

    ## or using data.table, my package of choice
    library(data.table)
    setDT(df)

    df[, .(avgx = mean(x), avgy = mean(y)), by = .(rle_grp)]

    #   rle_grp avgx      avgy
    #1:       1  2.0 0.4034953
    #2:       2  4.0 0.9082078
    #3:       3  5.0 0.2016819
    #4:       4  6.0 0.8983897
    #5:       5  7.0 0.9446753
    #6:       6  8.5 0.6449559
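As a shorter variation, data.table also provides rleid(), which numbers consecutive runs directly and so can stand in for the manual rle() step. The sketch below assumes a data.table version that includes rleid() and again types the data out from the question's table; the names `dt`, `run`, and `res` are illustrative.

```r
library(data.table)

# Data copied by hand from the question's table.
dt <- data.table(
  x = 1:9,
  y = c(0.2655087, 0.3721239, 0.5728534, 0.9082078, 0.2016819,
        0.8983897, 0.9446753, 0.6607978, 0.6291140),
  type = c("a", "a", "a", "b", "a", "b", "a", "b", "b")
)

# rleid(type) assigns 1, 2, 3, ... to successive runs of identical values.
# Adding `type` to `by` keeps the label in the output without changing the groups.
res <- dt[, .(avgx = mean(x), avgy = mean(y)),
          by = .(run = rleid(type), type)]
nrow(res)  # 6
```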
