Re: WordDelimiter filter, expanding to multiple words, unexpected results

Jack Krupansky Tue, 30 Dec 2014 08:32:50 -0800

Right, that's what I meant by WDF not being "magic" - you can configure it
to match any three out of four use cases as you choose, but there is no
choice that matches all of the use cases.


To be clear, this is not a "bug" in WDF, but simply a limitation.
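
As a rough illustration of that limitation, here is a toy Python model of the
behavior discussed in this thread. It is not Lucene's actual code: the
function names are made up, and it assumes that the catenated term shares the
position of the first sub-word and that the query needs at least one matching
term per position (term order and case folding filters are ignored).

```python
import re

def split_case(token):
    # Split at each lower-to-upper transition: "mixedCase" -> ["mixed", "Case"].
    return re.sub(r"(?<=[a-z])(?=[A-Z])", " ", token).split()

def analyze(text, split_on_case_change, catenate_words):
    """Return one set of lowercased terms per token position (toy model)."""
    positions = []
    for token in text.split():
        parts = split_case(token) if split_on_case_change else [token]
        first = len(positions)
        positions.extend({p.lower()} for p in parts)
        if catenate_words and len(parts) > 1:
            # Assumption: the catenated term shares the first part's position.
            positions[first].add("".join(parts).lower())
    return positions

def matches(query_positions, doc_positions):
    # Assumption: every query position needs at least one term in the document.
    doc_terms = set().union(*doc_positions)
    return all(terms & doc_terms for terms in query_positions)

docs = ["mixedCase", "mixed Case", "mixedcase"]
index = [analyze(d, True, True) for d in docs]

def run(split_at_query_time):
    query = analyze("mixedCase", split_at_query_time, True)
    return [matches(query, doc) for doc in index]

print(run(True))   # [True, True, False]
print(run(False))  # [True, False, True]
```

Under these assumptions, splitting at query time misses the single-word
"mixedcase" document, and not splitting misses "mixed Case"; neither setting
covers all three documents.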


-- Jack Krupansky

On Tue, Dec 30, 2014 at 11:12 AM, Jonathan Rochkind <rochk...@jhu.edu>
wrote:

> Thanks Erick!
>
> Yes, if I set splitOnCaseChange=0, then of course it'll work -- but then
> a query for "mixedCase" will no longer also match "mixed Case".
>
> I think I want WDF to... kind of do all of the above.
>
> Specifically, I had thought that it would allow a query for "mixedCase" to
> match both/either "mixed Case" or "mixedCase" in the index. (with case
> insensitivity on top of that via another filter).
>
> That would support things like names like "duBois" which are sometimes
> spelled "du bois" and sometimes "dubois", and allow the query "duBois" to
> match both in the index.
>
> I had somehow thought that was what WDF was intended for. But it's
> actually not the usual functioning, and may not be realistic?
>
> I'm a bit confused about what splitOnCaseChange combined with
> catenateWords is meant to do at all.  It _is_ generating both the split and
> single-word tokens at query time -- but not in a way that actually allows
> it to match both the split and single-word tokens?  What is supposed to be
> the purpose/use case for splitOnCaseChange with catenateWords? If any?
>
> Jonathan
>
>
> On 12/29/14 7:20 PM, Erick Erickson wrote:
>
>> Jonathan:
>>
>> Well, it works if you set splitOnCaseChange="0" in just the query part
>> of the analysis chain. I probably misled you a bit months ago; WDFF
>> is intended for this case iff you expect the case change to generate
>> _tokens_ that are individually meaningful. And unfortunately, what is
>> "significant" in one case will be not-significant in others.
>>
>> So what kinds of things do you want WDFF to handle? Case changes?
>> Letter/non-letter transitions? All of the above?
>>
>> Best,
>> Erick
>>
>>
>>
>> On Mon, Dec 29, 2014 at 3:07 PM, Jonathan Rochkind <rochk...@jhu.edu>
>> wrote:
>>
>>> On 12/29/14 5:24 PM, Jack Krupansky wrote:
>>>
>>>>
>>>> WDF is powerful, but it is not magic. In general, the indexed data is
>>>> expected to be clean while the query might be sloppy. You need to
>>>> separate the index and query analyzers, and they need to respect that
>>>> distinction.
>>>>
>>>
>>>
>>> I do not understand what separate query/index analysis you are
>>> suggesting to accomplish what I wanted.
>>>
>>> I understand that the WDF, like all software, is not magic, of course.
>>> But I thought this was an intended use case of the WDF, with those
>>> settings:
>>>
>>> A "mixedCase" query would match "mixedCase" in the index; and the same
>>> query "mixedCase" would also match two separate words "mixed Case" in
>>> the index. (Case insensitively, since I apply an ICUFoldingFilter on top
>>> of that.)
>>>
>>> Was I wrong, is this not an intended thing for the WDF to do? Or do I
>>> just have the wrong configuration options for it to do it? Or is it a
>>> bug?
>>>
>>> When I started this thread a few months ago, I think Erick Erickson
>>> agreed this was an intended use case for the WDF, but maybe I explained
>>> it poorly. Erick, if you're around and want to at least confirm whether
>>> WDF is supposed to do this in your understanding, that would be great!
>>>
>>> Jonathan
>>>
>>
