I am trying to use a regular expression with findall(). The issue I'm having is that there is an unknown number of whitespace characters (spaces, tabs, linefeeds, carriage returns) in the patterns.
In the example below I want to use findall() to get the text inside <d> </d> whenever </a> is found after </d>. The problem is that there are whitespace characters after </d>.
In the example below I need to retrieve "second text". The regular expression I have works only if there is no whitespace between </d> and </a>. I tried:
regex = '<d>(.+?)</d></a>'

Example input:
<a> <b> text </b> <d> second text</d> </a>
If you need to match whitespace between </d> and </a>:
regex = r'<d>(.+?)</d>\s*</a>'
Notice that I'm using the r'' raw string literal for regular expressions in Python, to avoid the double-escaping that would be needed in a normal string:
regex = '<d>(.+?)</d>\\s*</a>'
And to make . match newlines, you can use the re.DOTALL flag when matching.
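Putting the pieces together, here is a minimal sketch of the suggestion above. The sample string is an assumption: it is modeled on the input in the question, with tabs, a carriage return, and newlines added between the tags to exercise \s*.

```python
import re

# Assumed sample input, modeled on the question, with mixed whitespace
# (newline, tab, carriage return) between </d> and </a>.
text = "<a>\n  <b> text </b>\n  <d> second text</d>\n\t\r\n</a>"

# \s* accepts any run of spaces, tabs, linefeeds, or carriage returns
# between </d> and </a>; re.DOTALL lets . match newlines inside <d>...</d>.
pattern = re.compile(r'<d>(.+?)</d>\s*</a>', re.DOTALL)

matches = pattern.findall(text)
print(matches)  # [' second text']

# Strip surrounding whitespace from each captured group if it is not wanted.
cleaned = [m.strip() for m in matches]
print(cleaned)  # ['second text']
```

One thing to watch for: with re.DOTALL, if several <d> elements precede the </a>, the match starts at the first <d> and the captured group can run across the intermediate tags up to the last </d>. For anything beyond simple, predictable input, an HTML/XML parser is likely a safer choice.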