I am trying to use a regular expression with findall(). The issue I'm having is that there is an unknown number of whitespace characters (spaces, tabs, linefeeds, carriage returns) in the patterns.
In the example below I want to use findall() to get the text inside <d> </d> whenever </a> is found after </d>. The problem is that there are whitespace characters after </d>.
In the example below I need to retrieve "second text". The regular expression I have works if there is no whitespace between </d> and </a>. This is what I tried:
regex = '<d>(.+?)</d></a>'

<a> <b> text </b> <d> second text</d> </a>
If you need to match whitespace between </d> and </a>, allow for it with \s*:
regex = r'<d>(.+?)</d>\s*</a>'
Notice that I'm using the r'' raw string literal for regular expressions in Python, to avoid the double-escaping needed in normal strings:
regex = '<d>(.+?)</d>\\s*</a>'
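As a quick check of the two points above, here is a minimal sketch. The input is the sample line from the question with a tab and a newline added between </d> and </a>; that extra whitespace and the variable names are assumptions made for illustration.

```python
import re

# Sample line from the question, with extra whitespace (space, tab,
# newline) added between </d> and </a> for illustration.
text = "<a> <b> text </b> <d> second text</d> \t\n </a>"

raw_regex = r'<d>(.+?)</d>\s*</a>'       # raw string literal
escaped_regex = '<d>(.+?)</d>\\s*</a>'   # normal string, backslash doubled

# Both spellings produce the same pattern string.
assert raw_regex == escaped_regex

# \s* absorbs any run of whitespace, including none at all.
matches = re.findall(raw_regex, text)
print(matches)  # [' second text']
```

The captured text keeps its leading space because the sample has one right after <d>; call .strip() on each result if that is not wanted.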
And to make . match newlines, you can use the re.DOTALL flag when matching.
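A small sketch of what the flag changes, assuming the text inside <d> </d> can itself span more than one line (the input below is made up for illustration and is not from the question):

```python
import re

# Hypothetical input: the text inside <d>...</d> contains a newline.
text = "<a> <b> text </b> <d>second\ntext</d>\n</a>"
regex = r'<d>(.+?)</d>\s*</a>'

# By default . does not match "\n", so the group cannot cross the line break.
without_flag = re.findall(regex, text)

# With re.DOTALL, . matches any character, newlines included.
with_flag = re.findall(regex, text, re.DOTALL)

print(without_flag)  # []
print(with_flag)     # ['second\ntext']
```

The flag only affects the . inside the group; \s already matches newlines with or without it.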