Re: Position increment in WordDelimiterFilter.

Modassar Ather Fri, 15 Jan 2016 04:15:39 -0800

Are you saying that WiFi Wi-Fi and Wi Fi should not match each other?
I am using WhitespaceTokenizer in my analysis chain, so "Wi Fi" becomes two
separate tokens. Please refer to the examples in my previous mail for the
issues I am facing.
"Wi Fi" is two terms, which will match. But what happens when content
containing *WiFi device* is searched with the phrase query *"WiFi device"*?
It does not match, because WordDelimiterFilter adds a position increment
within WiFi. "WiFi device"~1 does match, which is confusing: there is no gap
in the content, so why is a slop required?
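
To make the phrase-query behaviour above concrete, here is a toy model in
Python. It is only a sketch, not Lucene's actual phrase scorer. The token
positions follow what is described in this thread (Fi at position 2, which
pushes device to position 3), and it assumes the query side expects device
directly after WiFi. The names indexed, positions and phrase_matches are
made up for illustration.

```python
# Toy model of positional phrase matching; NOT Lucene's real implementation.
# Index-side tokens for the text "WiFi device", using the positions
# described in this thread (assumption: device lands at position 3).
indexed = [
    ("WiFi", 1),    # preserved original
    ("Wi", 1),      # word part
    ("WiFi", 1),    # catenated form
    ("Fi", 2),      # word part, position incremented
    ("device", 3),  # next whitespace token
]


def positions(tokens, term):
    """Return the sorted, de-duplicated positions at which `term` occurs."""
    return sorted({pos for tok, pos in tokens if tok == term})


def phrase_matches(tokens, first, second, slop=0):
    """True if `second` follows `first` with at most `slop` extra positions
    between them (a simplified, in-order, two-term notion of slop)."""
    return any(
        0 <= p2 - p1 - 1 <= slop
        for p1 in positions(tokens, first)
        for p2 in positions(tokens, second)
    )


print(phrase_matches(indexed, "WiFi", "device"))          # exact phrase
print(phrase_matches(indexed, "WiFi", "device", slop=1))  # "WiFi device"~1
```

In this model the exact phrase fails because device sits two positions after
WiFi instead of one, and a slop of 1 is enough to bridge that gap.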


Why do you use WordDelimiterFilter? Can you give us few examples where it
is useful?
It is useful when text such as *lucene-search documentation* is indexed.
WordDelimiterFilter splits lucene-search into the two terms lucene and
search, so the document can be found with queries like lucene documentation
or search documentation.
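
As a rough illustration of that use case, the sketch below splits tokens with
a regular expression and keeps the original token as well. This is a
deliberately crude approximation and will not agree with
WordDelimiterFilter on every input. The helper names split_parts and
index_terms are hypothetical.

```python
import re

# Crude approximation only: split on non-alphanumeric characters, on
# lower-to-upper case changes, and on letter/digit boundaries.
_PARTS = re.compile(r"[A-Z]?[a-z]+|[A-Z]+|[0-9]+")


def split_parts(token):
    """Return the sub-word parts of a single whitespace-separated token."""
    return _PARTS.findall(token)


def index_terms(text):
    """Collect each original token plus its parts, ignoring positions."""
    terms = set()
    for token in text.split():
        terms.add(token)                  # similar in spirit to preserveOriginal="1"
        terms.update(split_parts(token))  # similar in spirit to generateWordParts="1"
    return terms


terms = index_terms("lucene-search documentation")
print(sorted(terms))
print({"lucene", "documentation"} <= terms)  # query: lucene documentation
print({"search", "documentation"} <= terms)  # query: search documentation
```

Both queries find all of their terms in the indexed set, which would not be
the case if lucene-search were kept only as a single token.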

Best,
Modassar

On Fri, Jan 15, 2016 at 2:14 PM, Emir Arnautovic <
emir.arnauto...@sematext.com> wrote:

> Modassar,
> Are you saying that WiFi Wi-Fi and Wi Fi should not match each other? Why
> do you use WordDelimiterFilter? Can you give us few examples where it is
> useful?
>
> Thanks,
> Emir
>
>
> On 15.01.2016 05:13, Modassar Ather wrote:
>
>> Thanks for your responses.
>>
>> It seems to me that you don't want to split on numbers.
>> It is not only with numbers. Even if you analyze WiFi, it creates 4
>> tokens, one of which is at position 2. So the issue is basically the
>> position increment, which causes a few queries to behave unexpectedly.
>>
>> Which release of Solr are you using?
>> I am using Lucene/Solr-5.4.0.
>>
>> Best,
>> Modassar
>>
>> On Thu, Jan 14, 2016 at 9:44 PM, Jack Krupansky <jack.krupan...@gmail.com>
>> wrote:
>>
>>> Which release of Solr are you using? Last year (or so) there was a Lucene
>>> change that had the effect of keeping all terms for WDF at the same
>>> position. There was also some discussion about whether this was either a
>>> bug or a bug fix, but I don't recall any resolution.
>>>
>>> -- Jack Krupansky
>>>
>>> On Thu, Jan 14, 2016 at 4:15 AM, Modassar Ather <modather1...@gmail.com>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> I have the following definition for WordDelimiterFilter.
>>>>
>>>> <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
>>>> generateNumberParts="1" catenateWords="1" catenateNumbers="1"
>>>> catenateAll="1" splitOnCaseChange="1" preserveOriginal="1"/>
>>>>
>>>> The analysis of 3d shows the following four tokens and their positions.
>>>>
>>>> token    position
>>>> 3d       1
>>>> 3        1
>>>> 3d       1
>>>> d        2
>>>>
>>>> Please help me understand why d is at position 2. Should it not also be
>>>> at position 1?
>>>> Is it a bug? If not, is there any attribute which I can use to restrict
>>>> the position increment?
>>>>
>>>> Thanks,
>>>> Modassar
>>>>
>>>>
> --
> Monitoring * Alerting * Anomaly Detection * Centralized Log Management
> Solr & Elasticsearch Support * http://sematext.com/
>
>
