I have a bunch of text in a data frame (df) that contains 3 lines of address in 1 column, and my goal is to extract the district (the central part of the text), e.g.:
73 greenhill gardens, wandsworth, london
22 acacia heights, lambeth, london
Fortunately for me, in 95% of cases the person inputting the data has used commas to separate the text I want, and 100% of the time it ends with ", london" (i.e. comma, space, london). To state things clearly, my goal is therefore to extract the text before ", london" and after the preceding comma.
My desired output is:
wandsworth
lambeth
I can manage to extract the part before:
df$extraction <- sub('.*,\\s*','',address)
and after
df$extraction <- sub('.*,\\s*','',address)
But not the middle part that I need. Can you please help?
Many thanks!
Here are a couple of approaches:
# Target ", london" and the start of the string
# up to the first comma followed by a space,
# and replace both with ""
gsub("^.+?, |, london", "", address)
#[1] "wandsworth" "lambeth"
or
# Target the whole string, and use a capture group for the
# text before ", london" and after the first comma.
# Replace the string with the captured group.
sub(".+, (.*), london", "\\1", address)
#[1] "wandsworth" "lambeth"
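Since only about 95% of the rows use commas, it may be worth guarding against rows where the pattern does not match: sub() returns its input unchanged in that case, so the whole address would end up in the result. Below is a minimal sketch of the capture-group approach applied to a data frame column. The column names `address` and `district`, the third example row, and the trailing `$` anchor are my assumptions, not something taken from the question.

```r
# Hypothetical data frame; the column name `address` is an assumption.
df <- data.frame(
  address = c(
    "73 greenhill gardens, wandsworth, london",
    "22 acacia heights, lambeth, london",
    "5 no commas here london"  # made-up row standing in for the other 5%
  ),
  stringsAsFactors = FALSE
)

# Same capture-group pattern as above, with "$" added so that
# ", london" has to sit at the very end of the string.
pattern <- ".+, (.*), london$"

# sub() leaves non-matching strings untouched, so test for a match
# first and fall back to NA where there is none.
df$district <- ifelse(
  grepl(pattern, df$address),
  sub(pattern, "\\1", df$address),
  NA_character_
)

df$district
```

With this data the first two rows give "wandsworth" and "lambeth", and the row without commas gives NA, which makes the problem cases easy to find with is.na(df$district).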