Saturday, April 16, 2005

Using a Regular Expression to Match HTML

The naive approach is something like:

"[^"]*?"(\s*(and|or)\s*"[^"]*?")*

This matches an expression such as:

"some text"

or

"some text" and "some more text"

or

"some text" and "some more" or "even more"

and so on.
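A quick way to see what the naive pattern accepts is to run it against the examples above. This sketch uses Python's `re.fullmatch`, which is my assumption (the post doesn't name a regex engine). `fullmatch` requires the whole string to fit the pattern, so it also shows what gets rejected. The `xor` line is a made-up counterexample.

```python
import re

# The naive pattern: a double-quoted string, optionally followed by
# more quoted strings joined with "and" / "or".
NAIVE = r'"[^"]*?"(\s*(and|or)\s*"[^"]*?")*'

examples = [
    '"some text"',
    '"some text" and "some more text"',
    '"some text" and "some more" or "even more"',
    '"some text" xor "other"',  # "xor" is not a recognized joiner
]

for text in examples:
    # fullmatch only succeeds if the entire string fits the pattern,
    # so the unsupported "xor" joiner makes the last example fail.
    print(bool(re.fullmatch(NAIVE, text)), text)
```

Since `[^"]` can never consume a double quote, the lazy `*?` behaves the same as a plain `*` here.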

This expression doesn't handle escaped double quotes within the text: given "say \"hi\"", the match stops at the escaped quote.
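
A common fix, sketched here under the assumption that quotes are escaped with a backslash (`\"`, plus `\\` for a literal backslash), is to let each quoted string contain either ordinary characters or backslash-escaped pairs. The post doesn't say how escaping works, and the names `QUOTED` and `ESCAPE_AWARE` are just illustrative.

```python
import re

# Assumption: C-style backslash escapes (\" and \\).
# A quoted string is: an opening quote, then any run of characters that
# are neither a quote nor a backslash, or a backslash followed by any
# character, then a closing quote.
QUOTED = r'"(?:[^"\\]|\\.)*"'
ESCAPE_AWARE = rf'{QUOTED}(?:\s*(?:and|or)\s*{QUOTED})*'
NAIVE = r'"[^"]*?"(\s*(and|or)\s*"[^"]*?")*'

text = r'"say \"hi\"" and "bye"'

print(bool(re.fullmatch(NAIVE, text)))         # False
print(bool(re.fullmatch(ESCAPE_AWARE, text)))  # True
```

Because a backslash can only be consumed together with the character after it, an escaped quote never ends the string early.
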
I found a mistake: for some reason, my blogging engine capitalized some characters in the expression. Also, the expression above breaks when a tag spans multiple lines. Here's my updated one:

\s]+))?)+\s*|\s*)/?>
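Here is one way a tag pattern can break across lines. The illustration uses Python's `re` module and a simplified, hypothetical pattern that is not the expression above. In `re`, `\s` already matches a newline, but `.` does not unless `re.DOTALL` is set. As a result, a quoted attribute value matched with `.*?` fails when the value itself spans lines.

```python
import re

# Hypothetical, simplified tag pattern (illustration only): a tag name,
# then zero or more name="value" attributes, an optional "/", and ">".
TAG = r'</?\w+(\s+\w+\s*=\s*".*?")*\s*/?>'

single_line = '<a href="x.html" title="home">'
# Line break between attributes: \s matches "\n", so this still works.
wrapped_attrs = '<a href="x.html"\n   title="home">'
# Line break inside a quoted value: "." does not match "\n" by default.
multi_line = '<a href="x.html"\n   title="home\npage">'

print(bool(re.fullmatch(TAG, single_line)))            # True
print(bool(re.fullmatch(TAG, wrapped_attrs)))          # True
print(bool(re.fullmatch(TAG, multi_line)))             # False
# re.DOTALL makes "." match newlines as well.
print(bool(re.fullmatch(TAG, multi_line, re.DOTALL)))  # True
```

An alternative to `re.DOTALL` is writing the value as `"[^"]*"`, since a negated character class matches newlines without any flag.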
