Regex extraction of text data between 2 commas in R


I have a bunch of text in a data frame (df) that contains 3 lines of address in 1 column, and my goal is to extract the district (the central part of the text), e.g.:

73 greenhill gardens, wandsworth, london
22 acacia heights, lambeth, london

Fortunately for me, in 95% of cases the person inputting the data has used commas to separate the text I want, and 100% of the time the address ends with ", london" (i.e. comma, space, london). To restate things, my goal is therefore to extract the text before ", london" and after the preceding comma.

My desired output is:

wandsworth
lambeth
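
For anyone wanting to reproduce this, a minimal sketch of the data could look like the following. The column name `address` is an assumption (the post only names the data frame `df`), and `expected` is just a hypothetical helper holding the desired output for comparison.

```r
# Two example rows taken from the question; the column name is assumed.
df <- data.frame(
  address = c("73 greenhill gardens, wandsworth, london",
              "22 acacia heights, lambeth, london"),
  stringsAsFactors = FALSE
)

# Plain character vector, matching the bare `address` used in the code in this post.
address <- df$address

# Desired result, kept only for checking candidate solutions.
expected <- c("wandsworth", "lambeth")
```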

I can manage to extract the part before:

df$extraction <- sub(',.*$','',address)

and the part after:

df$extraction <- sub('.*,\\s*','',address) 

but not the middle part that I need. Can you please help?

Many thanks!

Here are a couple of approaches:

# target ", london" and the start of the string
# until the first comma followed by a space,
# and replace with ""
gsub("^.+?, |, london", "", address)
#[1] "wandsworth" "lambeth"
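
To see what each side of the `|` contributes, the same idea can be sketched as two separate steps. This is only an illustration: `step1` and `step2` are hypothetical names, `perl = TRUE` is added here so the lazy `.+?` follows PCRE semantics, and the trailing `$` anchor is an extra assumption that ", london" sits at the very end of the string.

```r
address <- c("73 greenhill gardens, wandsworth, london",
             "22 acacia heights, lambeth, london")

# Left branch: lazily match from the start up to the first ", " and drop it.
step1 <- sub("^.+?, ", "", address, perl = TRUE)
step1
#[1] "wandsworth, london" "lambeth, london"

# Right branch: drop the trailing ", london".
step2 <- sub(", london$", "", step1, perl = TRUE)
step2
#[1] "wandsworth" "lambeth"
```

Because the left branch stops at the first comma, an address with more than three comma-separated parts would keep everything between the first comma and ", london".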

or

# target the whole string, and use a capture group for the
# text before ", london" and after the first comma.
# replace the string with the captured group.
sub(".+, (.*), london", "\\1", address)
#[1] "wandsworth" "lambeth"
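
One thing to watch with the capture-group version: when the pattern does not match, `sub()` returns the input unchanged, so the roughly 5% of rows without the separating comma would come back as full addresses. A possible sketch for flagging those rows as `NA` instead is below. The third address is made up for illustration, `pattern` and `district` are hypothetical names, and the `^`/`$` anchors and `perl = TRUE` are my additions rather than part of the code above.

```r
address <- c("73 greenhill gardens, wandsworth, london",
             "22 acacia heights, lambeth, london",
             "5 example road brixton, london")  # made-up row with a missing comma

# Whole-string pattern with one capture group for the district.
pattern <- "^.+, (.*), london$"

# Keep the captured group where the pattern matches, otherwise return NA.
district <- ifelse(grepl(pattern, address, perl = TRUE),
                   sub(pattern, "\\1", address, perl = TRUE),
                   NA_character_)
district
#[1] "wandsworth" "lambeth"    NA
```

The `NA` rows can then be inspected by hand, for example with `address[is.na(district)]`.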
