Saturday, April 16, 2005
Using a Regular Expression to Match HTML
The naive approach is something like:
"[^"]*?"(\s*(and|or)\s*"[^"]*?")*
This matches expressions such as:
"some text"
or
"some text" and "some more text"
or
"some text" and "some more" or "even more"
and so on.
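To check the pattern against the examples above, here is a minimal Python sketch. It assumes Python's `re` module. `fullmatch` is used so that a string only counts if the pattern consumes all of it; the names `NAIVE` and `examples` are just for illustration.

```python
import re

# The naive pattern from above: a quoted string, optionally followed by
# more quoted strings joined by "and" / "or".
NAIVE = re.compile(r'"[^"]*?"(\s*(and|or)\s*"[^"]*?")*')

examples = [
    '"some text"',
    '"some text" and "some more text"',
    '"some text" and "some more" or "even more"',
]

for text in examples:
    # fullmatch succeeds only if the entire string fits the pattern
    print(bool(NAIVE.fullmatch(text)), text)

# A connector other than "and"/"or" is rejected
print(bool(NAIVE.fullmatch('"a" xor "b"')))
```

All three examples print `True`, and the last line prints `False`.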
This expression doesn't handle escaped double quotes within the text.
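To illustrate this limitation, here is a minimal Python sketch. The escape-aware variant is an assumption of mine and not from the original post. It treats a backslash followed by any character as a single unit inside the quotes, so `\"` no longer ends the string.

```python
import re

# The naive pattern: stops at the first double quote, escaped or not.
NAIVE = re.compile(r'"[^"]*?"(\s*(and|or)\s*"[^"]*?")*')

# Hypothetical escape-aware variant: inside the quotes, match either a
# character that is neither a quote nor a backslash, or a backslash
# followed by any character (e.g. \").
QUOTED = r'"(?:[^"\\]|\\.)*"'
ESCAPE_AWARE = re.compile(QUOTED + r'(?:\s*(?:and|or)\s*' + QUOTED + r')*')

text = r'"say \"hi\"" and "bye"'

print(NAIVE.fullmatch(text))               # None
print(NAIVE.match(text).group(0))          # "say \"
print(bool(ESCAPE_AWARE.fullmatch(text)))  # True
```

The naive pattern stops at the escaped quote after `say \`. The variant consumes the whole expression.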
Update: I found a mistake. For some reason, my blogging engine capitalized some characters in the expression. Also, the expression above breaks if a tag spans multiple lines. Here is my updated version:
\s]+))?)+\s*|\s*)/?>