Re: Multiple passes with WordDelimiterFilterFactory

Shawn Heisey Mon, 30 Aug 2010 08:08:11 -0700

 On 8/29/2010 2:17 PM, Erick Erickson wrote:

<<<charFilters are applied even before the tokenizer>>>
Try putting this after any instances of, say, WhiteSpaceTokenizerFactory
in your analyzer definition, and I believe you'll see that this is not
true.
At least looking at this in the analysis page from SOLR admin sure doesn't
seem to support that assertion.

It was the analysis page (branch_3x revision 990461) that told me that my charFilter was applied first. I had not actually tried it for real. I was in the process of trying it for real today with a new regex, but I am running into trouble with it. The regex with a custom range in brackets (even run through an XML encoder) won't allow Solr to initialize. I also tried [[:punc:]] and \p.

If anyone has a regex that matches all punctuation and works with Solr, please share it.
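
For anyone who wants to test candidate patterns outside Solr first, here is a minimal sketch in plain Java. It assumes the charFilter compiles its pattern with java.util.regex, which does not support the POSIX bracket form [[:punct:]] and uses \p{Punct} (ASCII punctuation) or \p{P} (Unicode punctuation category) instead. The class and method names are hypothetical and are not part of Solr. I have not verified either pattern inside a Solr schema, so treat this as a starting point only.

```java
import java.util.regex.Pattern;

// Hypothetical helper for trying punctuation patterns outside Solr.
public class PunctuationRegexSketch {
    // POSIX-style class in Java regex: ASCII punctuation only.
    static final Pattern ASCII_PUNCT = Pattern.compile("\\p{Punct}");

    // Unicode general category P: punctuation in any script.
    static final Pattern UNICODE_PUNCT = Pattern.compile("\\p{P}");

    // Replace every character matched by the pattern with a single space.
    static String replacePunctuation(Pattern pattern, String input) {
        return pattern.matcher(input).replaceAll(" ");
    }

    public static void main(String[] args) {
        System.out.println(replacePunctuation(ASCII_PUNCT, "foo-bar, baz!"));
        System.out.println(replacePunctuation(UNICODE_PUNCT, "foo-bar, baz!"));
    }
}
```

Note that the doubled backslash is Java string escaping only. In an XML attribute the pattern would be written with a single backslash, for example \p{Punct}.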


Thanks,
Shawn
