That is one seriously manly regex, but I'd recommend using the TagSoup parser instead:
http://ccil.org/~cowan/XML/tagsoup/

wunder

On 10/4/07 10:11 PM, "J.J. Larrea" <[EMAIL PROTECTED]> wrote:

> It uses a PatternTokenizerFactory with a RegEx that swallows runs of HTML- or
> XML-like tags:
>
> (?:\s*</?\w+((\s+\w+(\s*=\s*(?:"?&"'.?'|[^'">\s]+))?)\s*|\s*)/?>\s*)|\s
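To show the "use a real parser, not a regex" idea without pulling in a jar, here is a minimal sketch. Note that it uses the JDK's own lenient HTML parser (`javax.swing.text.html.parser.ParserDelegator`) as a stand-in, not TagSoup itself; TagSoup plugs in as a SAX `XMLReader`, so wiring it up looks different. The class name `HtmlText` and method `extract` are made up for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper: strips markup by parsing, not by regex.
// Uses the JDK's built-in HTML parser as a stand-in for TagSoup.
public class HtmlText {

    public static String extract(String html) throws IOException {
        StringBuilder out = new StringBuilder();

        // The parser calls handleText only for character data, never for tags,
        // so markup is dropped no matter how messy it is.
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleText(char[] data, int pos) {
                out.append(data).append(' ');
            }
        };

        // 'true' tells the parser to ignore any <meta> charset declaration.
        new ParserDelegator().parse(new StringReader(html), callback, true);

        // Collapse whitespace runs left between text chunks.
        return out.toString().replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(extract("<p>Hello <b>world</b></p>"));
        // Unclosed tags are tolerated rather than left behind in the output.
        System.out.println(extract("<p>Hello <b>world"));
    }
}
```

Since the parser keeps track of quoting and element structure, things like unclosed tags or a `>` inside a quoted attribute value should not trip it up the way a single hand-written pattern can.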