I have a DataFrame like this:
      unit  exitsn_hourly           interval
1867  r081            104  00:00:00-04:00:00
1868  r081              0  04:00:00-04:00:00
1869  r081            129  04:00:00-08:00:00
1870  r081            521  08:00:00-12:00:00
1871  r081           1048  12:00:00-16:00:00
2838  r032             38  00:00:00-04:00:00
2839  r032              0  04:00:00-04:00:00
2840  r032             89  04:00:00-08:00:00
2841  r032            470  08:00:00-12:00:00
I need to delete the entire row when interval has a particular format, for example:
1868 r081 0 04:00:00-04:00:00
I don't want to remove only 04:00:00-04:00:00, but all similar values where the start and end are the same, such as 01:00:00-01:00:00.
Actually, this is my original DataFrame, from which I created the interval column:
    c/a  unit       scp     daten     timen    descn  entriesn   exitsn
0  a002  r051  02-00-00  06-29-13  00:00:00  regular   4174592  1433672
1  a002  r051  02-00-00  06-29-13  04:00:00  regular   4174628  1433675
2  a002  r051  02-00-00  06-29-13  08:00:00  regular   4174641  1433706
3  a002  r051  02-00-00  06-29-13  12:00:00  regular   4174741  1433775
4  a002  r051  02-00-00  06-29-13  16:00:00  regular   4174936  1433826
5  a002  r051  02-00-00  06-29-13  20:00:00  regular   4175270  1433877
6  a002  r051  02-00-00  06-30-13  00:00:00  regular   4175403  1433908
7  a002  r051  02-00-00  06-30-13  04:00:00  regular   4175441  1433914
8  a002  r051  02-00-00  06-30-13  08:00:00  regular   4175457  1433928
9  a002  r051  02-00-00  06-30-13  12:00:00  regular   4175520  1433981
I created the interval column using this code:
import copy

df = copy.deepcopy(turnstile_data)
pdf = df.shift(periods=1)  # previous row, aligned with the current row

# Difference between each reading and the previous one
df['entriesn_hourly'] = df['entriesn'] - pdf['entriesn'].fillna(0)
df['exitsn_hourly'] = df['exitsn'] - pdf['exitsn'].fillna(0)
df['interval'] = pdf['timen'] + '-' + df['timen'].fillna(0)

df.loc[(df['entriesn'] == 0), 'entriesn_hourly'] = 0
df.loc[(df['exitsn'] == 0), 'exitsn_hourly'] = 0

# Zero out rows where the turnstile changed, then drop them
df.loc[(df['c/a'] != pdf['c/a']) | (df['unit'] != pdf['unit']) | (df['scp'] != pdf['scp']),
       ['entriesn_hourly', 'exitsn_hourly', 'interval']] = 0
df = df[df.interval != 0]
print(df.head(20))

head7 = copy.deepcopy(df)
required_df = head7[['unit', 'exitsn_hourly', 'interval']].groupby(head7.unit)
print(required_df.head(5))
You probably want to split interval into interval_start and interval_end, then check whether they're equal:
df['interval_start'] = df['interval'].map(lambda s: s.split('-')[0])
df['interval_end'] = df['interval'].map(lambda s: s.split('-')[1])
df.query("interval_start != interval_end")

      unit  exitsn_hourly           interval  interval_start  interval_end
1867  r081            104  00:00:00-04:00:00        00:00:00      04:00:00
1869  r081            129  04:00:00-08:00:00        04:00:00      08:00:00
1870  r081            521  08:00:00-12:00:00        08:00:00      12:00:00
1871  r081           1048  12:00:00-16:00:00        12:00:00      16:00:00
2838  r032             38  00:00:00-04:00:00        00:00:00      04:00:00
2840  r032             89  04:00:00-08:00:00        04:00:00      08:00:00
2841  r032            470  08:00:00-12:00:00        08:00:00      12:00:00
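If you'd rather not add helper columns, the same check can probably be done with a boolean mask. This is only a sketch: the sample frame below is rebuilt by hand from a few rows in the question, the names bounds and filtered are made up for illustration, and it assumes every interval value is a "start-end" string.

```python
import pandas as pd

# A few sample rows copied by hand from the question's output.
df = pd.DataFrame(
    {
        "unit": ["r081", "r081", "r081", "r032", "r032", "r032"],
        "exitsn_hourly": [104, 0, 129, 38, 0, 89],
        "interval": [
            "00:00:00-04:00:00",
            "04:00:00-04:00:00",
            "04:00:00-08:00:00",
            "00:00:00-04:00:00",
            "04:00:00-04:00:00",
            "04:00:00-08:00:00",
        ],
    },
    index=[1867, 1868, 1869, 2838, 2839, 2840],
)

# Split "start-end" on the first hyphen into two columns (0 and 1).
# `bounds` is a hypothetical helper name, not part of the original code.
bounds = df["interval"].str.split("-", n=1, expand=True)

# Keep only the rows whose start and end times differ.
filtered = df[bounds[0] != bounds[1]]
print(filtered)
```

This leaves df itself unchanged, so assign the result back (df = filtered) if you want the rows gone for good.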