Say I have data as follows:
patient_id  lab_type  value
1           food      10
1           food      8
2           food      3
2           food      5
1           shot      4
1           shot      10
2           shot      2
2           shot      4
Then I'd group it with groupby(['patient_id', 'lab_type']). After that, I'd aggregate value differently for each lab_type: for food I'd aggregate using the mean, and for shot I'd aggregate using the sum.
The final data should look like this:
patient_id  lab_type  value
1           food      9   ((10 + 8) / 2)
2           food      4   ((3 + 5) / 2)
1           shot      14  (10 + 4)
2           shot      6   (2 + 4)
Just use .apply and pass a custom function:
    import numpy as np

    def calc(g):
        # every row in a group shares one lab_type, so the first row is enough
        if g.iloc[0].lab_type == 'shot':
            return sum(g.value)
        else:
            return np.mean(g.value)

    result = df.groupby(['patient_id', 'lab_type']).apply(calc)
Here calc receives the per-group DataFrame, as described in pandas' split-apply-combine documentation. The result is what you want:
patient_id  lab_type
1           food         9
            shot        14
2           food         4
            shot         6
dtype: float64
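For anyone who wants to run this end to end, here is a minimal self-contained sketch. It builds the sample data from the question and, instead of .apply, swaps in a different technique: filter the rows for each lab_type, aggregate that subset with its own function, and concatenate the pieces. The agg_by_lab mapping is a name made up for this illustration, not something from the question. It may be handy if you'd rather not rely on the grouping columns being visible inside each group, which newer pandas versions appear to be moving away from.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "patient_id": [1, 1, 2, 2, 1, 1, 2, 2],
    "lab_type": ["food", "food", "food", "food", "shot", "shot", "shot", "shot"],
    "value": [10, 8, 3, 5, 4, 10, 2, 4],
})

# Hypothetical mapping (illustration only): which aggregation each lab_type gets
agg_by_lab = {"food": "mean", "shot": "sum"}

parts = []
for lab_type, how in agg_by_lab.items():
    # Keep only the rows for this lab_type, then aggregate value per patient
    subset = df[df["lab_type"] == lab_type]
    parts.append(subset.groupby(["patient_id", "lab_type"])["value"].agg(how))

# Stitch the per-lab_type results together, ordered by patient_id then lab_type
result = pd.concat(parts).sort_index()
print(result)
```

The result is a Series indexed by (patient_id, lab_type), the same shape the .apply version returns, so result.loc[(1, "food")] looks up a single value.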