On 11/22/2010 7:40 AM, Erick Erickson wrote:
> As I remember, PatternReplace... isn't in 1.4, so you'd have to move to 3.x
> or trunk.
> You could always write a custom class that does what you want; it's actually
> pretty easy.
PatternReplaceCharFilterFactory isn't in 1.4, but PatternReplaceFilterFactory
is; I'm using it in my 1.4.1 installation. The CharFilter version is applied
to the raw text before tokenization, which caused problems for me when I was
testing branch_3x. In situations where the order of operations doesn't matter,
the CharFilter option would be great.
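The order-of-operations difference can be illustrated with a small plain-Java sketch. This is not Lucene code: tokenize() is a hypothetical stand-in for a whitespace tokenizer, and the dot-to-space pattern is only an example. Replacing before tokenization means the rewritten text gets re-split into tokens; replacing afterward rewrites each existing token in place.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class ReplaceOrderDemo {

    // Hypothetical stand-in for a whitespace tokenizer: split on runs of whitespace.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // CharFilter-style: rewrite the raw text first, then tokenize the result.
    static List<String> charFilterThenTokenize(String text, Pattern p, String replacement) {
        return tokenize(p.matcher(text).replaceAll(replacement));
    }

    // TokenFilter-style: tokenize first, then rewrite each token independently.
    static List<String> tokenizeThenFilter(String text, Pattern p, String replacement) {
        List<String> out = new ArrayList<>();
        for (String token : tokenize(text)) {
            out.add(p.matcher(token).replaceAll(replacement));
        }
        return out;
    }

    public static void main(String[] args) {
        Pattern dots = Pattern.compile("\\.");
        String input = "J.R.R. Tolkien";
        System.out.println(charFilterThenTokenize(input, dots, " ")); // prints [J, R, R, Tolkien]
        System.out.println(tokenizeThenFilter(input, dots, " "));     // prints [J R R , Tolkien]
    }
}
```

With the filter-style order, the spaces inserted by the pattern end up inside a single token ("J R R "), because tokenization has already happened.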
Based on their description, I'd think what they actually want is
WordDelimiterFilterFactory with at least preserveOriginal and catenateWords
turned on. That should match any likely representation of J.R.R. Tolkien.
The other options can also be useful.
In my schema, the index analyzer has WordDelimiterFilterFactory with
everything turned on except catenateAll, and the query analyzer is the
same except that all three catenate options (catenateWords,
catenateNumbers, and catenateAll) are turned off.
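To see roughly what preserveOriginal and catenateWords contribute, and why catenating only on the index side can still work, here is a hypothetical, simplified model in plain Java. It is not Lucene's WordDelimiterFilter: it splits only on non-alphanumeric characters (the real filter also splits on case changes and letter/digit transitions), ignores token positions, and skips the lowercasing a real schema would usually add. The method names are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;

public class WordDelimiterSketch {

    // Simplified model: split on non-alphanumerics, optionally keep the
    // original token and optionally add the catenated word parts.
    static List<String> analyze(String token, boolean preserveOriginal, boolean catenateWords) {
        List<String> parts = new ArrayList<>();
        for (String p : token.split("[^\\p{Alnum}]+")) {
            if (!p.isEmpty()) {
                parts.add(p);
            }
        }
        List<String> out = new ArrayList<>();
        // A token with no delimiters passes through unchanged as one term.
        if (parts.size() == 1 && parts.get(0).equals(token)) {
            out.add(token);
            return out;
        }
        if (preserveOriginal) {
            out.add(token);                     // e.g. "J.R.R."
        }
        out.addAll(parts);                      // word parts: "J", "R", "R"
        if (catenateWords && parts.size() > 1) {
            out.add(String.join("", parts));    // e.g. "JRR"
        }
        return out;
    }

    // Index side catenates; query side does not.
    static List<String> indexTerms(String token) { return analyze(token, true, true); }
    static List<String> queryTerms(String token) { return analyze(token, true, false); }

    public static void main(String[] args) {
        System.out.println(indexTerms("J.R.R.")); // prints [J.R.R., J, R, R, JRR]
        System.out.println(queryTerms("J.R.R.")); // prints [J.R.R., J, R, R]
        System.out.println(queryTerms("JRR"));    // prints [JRR]
    }
}
```

In this model the catenated "JRR" exists only in the index, yet a user who types "JRR" still produces a term that is present there, so the query analyzer does not need to catenate for that spelling to be found.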
Shawn