RE: Defining tokenizer pattern with < character

2013-03-01 Thread Van Tassell, Kristian

It was a subset of HTML, yes, and it appears to work for my needs. Thank you!

-----Original Message-----
From: Walter Underwood [mailto:wun...@wunderwood.org]
Sent: Friday, March 01, 2013 11:31 AM
To: solr-user@lucene.apache.org
Subject: Re: Defining tokenizer pattern with < character

Are

Re: Defining tokenizer pattern with < character

2013-03-01 Thread Walter Underwood

Are you trying to strip out HTML tags? There are built-in classes that do that. Or you might want to parse the XML or HTML before you pass it to Solr. An XML parser will interpret CDATA so that you never have to think about it. The parsed data is just text.

wunder

On Mar 1, 2013, at 9:21 AM, S
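To illustrate the point about CDATA, here is a minimal sketch of parsing a document before indexing it. It uses Python's standard `xml.etree.ElementTree` rather than anything Solr-specific, and the sample document and the `body` element name are made up for the example. The parser hands back the CDATA content as ordinary text, angle brackets included, so no tokenizer pattern has to account for the CDATA wrapper.

```python
import xml.etree.ElementTree as ET

# Hypothetical input: the CDATA section contains markup-like characters.
raw = "<doc><body><![CDATA[Price < 10 & <b>bold</b>]]></body></doc>"

# The parser resolves the CDATA section; .text is a plain string.
root = ET.fromstring(raw)
body_text = root.find("body").text

print(body_text)
```

The resulting string is what would be sent on to Solr as the field value. Any tags that were inside the CDATA section remain in it as literal text, so stripping them would still be a separate step.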

Re: Defining tokenizer pattern with < character

2013-03-01 Thread Steve Rowe

Kristian, I think what you want is pattern="&lt;[^&gt;]&gt;" (untested) - that is, you probably don't want to regex-escape the character class brackets "[" and "]", and you should html-escape the angle brackets.

Steve

On Mar 1, 2013, at 11:42 AM, "Van Tassell, Kristian" wrote: > I'm trying to defin
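The suggestion above is marked untested, so here is a small sketch for checking what the regex actually matches. It uses Python's `re` module as a stand-in for Java's regex engine (the two agree on patterns this simple), and the sample string is invented. The regex as quoted, `<[^>]>` once XML-unescaped, has no quantifier and so only matches tags with a single character between the brackets. The `+` variant is an assumption about what may be intended, not something stated in the thread.

```python
import re
from xml.sax.saxutils import escape

# Regex as quoted in the message (after XML unescaping): exactly one
# non-'>' character between the angle brackets.
single = re.compile(r"<[^>]>")

# Assumed variant with a '+' quantifier: one or more characters.
multi = re.compile(r"<[^>]+>")

sample = "<b>bold</b> and <em>emphasis</em>"  # hypothetical test input

single_hits = single.findall(sample)
multi_hits = multi.findall(sample)

# Form suitable for an XML attribute value: angle brackets escaped,
# character-class brackets left alone.
attr_value = escape(multi.pattern)

print(single_hits)
print(multi_hits)
print(attr_value)
```

On this input the quoted form finds only `<b>`, while the `+` variant finds all four tags. Strictly, only `<` must be escaped inside an XML attribute value; escaping `>` as well is harmless.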