On 8/29/2010 2:17 PM, Erick Erickson wrote:
<<<charFilters are applied even before the tokenizer>>>
Try putting this after any instances of, say, WhiteSpaceTokenizerFactory
in your analyzer definition, and I believe you'll see that this is not
true.
At least looking at this in the analysis page from SOLR admin sure doesn't
seem to support that assertion.

It was the analysis page (branch_3x revision 990461) that told me that my charFilter was applied first. I had not actually tried it for real. I was in the process of trying it for real today with a new regex, but I am running into trouble with it. A regex with a custom range in brackets (even run through an XML encoder) won't allow Solr to initialize. I also tried [[:punc:]] and \p.

If anyone has a regex that matches all punctuation and works with Solr, please share it.
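
For anyone testing candidates: Solr's pattern-based filters are built on java.util.regex, so a pattern can be checked in plain Java before it goes into the schema. The sketch below is only an illustration under that assumption. The class and method names are made up for the example, and the two patterns are standard Java character classes, not something confirmed to work in this particular Solr build.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, not part of Solr: it only exercises java.util.regex.
public class PunctuationRegexCheck {
    // POSIX-style class: ASCII punctuation only (includes symbols such as $ and +).
    static final Pattern ASCII_PUNCT = Pattern.compile("\\p{Punct}");
    // Unicode general category P: punctuation in any script, but not symbols like $ or +.
    static final Pattern UNICODE_PUNCT = Pattern.compile("\\p{P}");

    static String stripAscii(String s) {
        return ASCII_PUNCT.matcher(s).replaceAll("");
    }

    static String stripUnicode(String s) {
        return UNICODE_PUNCT.matcher(s).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(stripAscii("Hello, world! (test)"));
        // \u00AB and \u00BB are guillemets, \u2026 is the horizontal ellipsis.
        System.out.println(stripUnicode("\u00ABquoted\u00BB text\u2026"));
    }
}
```

Note that the doubled backslashes are only Java string escaping. In an XML attribute (for example a pattern="..." attribute on a charFilter, assuming that is where the regex goes) the backslash is written once, and only XML-special characters such as &, < and " need entity escaping.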

Thanks,
Shawn
