Hmmm, good point on WordDelimiterFilterFactory. You're right, that should
work.

There'd still be a problem, though: "J. R. R." (with spaces) would never
match "jrr". But that wouldn't be solved by Pattern.... either. I'd try to
define the problem away <G>...
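
For reference, here's a rough sketch of the kind of field type Shawn
describes below. The field type name text_wdf, the whitespace tokenizer,
and the lowercase filter are my assumptions, not something taken from his
schema. Only the WordDelimiterFilterFactory settings follow his
description, and I'm reading "everything turned on" as the options listed
here.

```xml
<!-- Sketch only. "text_wdf" is a hypothetical name; the tokenizer and
     lowercase filter are assumed, not taken from Shawn's schema. -->
<fieldType name="text_wdf" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- Index side: everything on except catenateAll -->
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1"
            catenateWords="1" catenateNumbers="1" catenateAll="0"
            splitOnCaseChange="1" preserveOriginal="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- Query side: same, but all three catenate options off -->
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1"
            catenateWords="0" catenateNumbers="0" catenateAll="0"
            splitOnCaseChange="1" preserveOriginal="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

With a whitespace tokenizer, J.R.R. reaches the filter as a single token,
so catenateWords can emit JRR alongside the parts and the original. The
spaced form arrives as three separate tokens, so there's nothing to
catenate, which is the case I mentioned above.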

Good catch.
Erick

On Mon, Nov 22, 2010 at 12:15 PM, Shawn Heisey <s...@elyograg.org> wrote:

> On 11/22/2010 7:40 AM, Erick Erickson wrote:
>
>> As I remember, PatternReplace... isn't in 1.4, so you'd have to move to
>> 3.x or trunk.
>>
>> You could always write a custom class that did what you wanted, it's
>> actually pretty easy.
>>
>
> PatternReplaceCharFilterFactory isn't in 1.4, but
> PatternReplaceFilterFactory is.  I'm using it in my 1.4.1 installation.  The
> CharFilter version gets applied before tokenization, which caused problems
> for me in my testing of branch_3x.  In situations where the order of
> operations isn't important, the CharFilter option would be great.
>
> Based on their description, I'd think what they actually want is
> WordDelimiterFilterFactory with preserveOriginal and catenateWords turned on
> at a minimum.  That should match on any likely representation of J.R.R.
> Tolkien.  The other options can also be useful.
>
> In my schema, the index analyzer has WordDelimiterFilterFactory with
> everything turned on except catenateAll, and the query analyzer is the same
> except all three catenate options are turned off.
>
> Shawn
>
>
