Can you please send us the tokens (and their positions) you get when you analyze *WiFi device*?

On 15.01.2016 13:15, Modassar Ather wrote:
Are you saying that WiFi, Wi-Fi and Wi Fi should not match each other?
I am using WhitespaceTokenizer in my analysis chain, so wi fi becomes two
different tokens. Please refer to the examples in my previous mail for
the issues I am facing.
Wi Fi is two terms, which will match. But what happens when content
containing *WiFi device* is searched with *"WiFi device"*? It will not match,
because WordDelimiterFilter adds a position increment for WiFi.
"WiFi device"~1 will match, which is confusing: there is no gap in the
content, so why is a slop required?
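
To make the slop question concrete, here is a minimal sketch of positional phrase matching. It is a toy model, not Lucene's implementation. The token stream for WiFi is an assumption based on the positions reported elsewhere in this thread (four tokens, one of them at position 2), and the query side is assumed to place WiFi and device at adjacent positions. The helper names `index_positions` and `phrase_matches` are hypothetical.

```python
# Toy model of positional phrase matching; not Lucene's actual implementation.

def index_positions(tokens):
    """Map each term to the list of positions at which it occurs."""
    positions = {}
    for term, pos in tokens:
        positions.setdefault(term, []).append(pos)
    return positions


def phrase_matches(index, phrase_terms, slop=0):
    """Return True if the two-term phrase occurs in order within `slop` extra positions."""
    first, second = phrase_terms
    for p1 in index.get(first, []):
        for p2 in index.get(second, []):
            # Adjacent terms have p2 - p1 == 1; each extra position costs one unit of slop.
            if 0 <= (p2 - p1 - 1) <= slop:
                return True
    return False


# Assumed token stream for "WiFi device": the split part "Fi" takes position 2,
# which pushes "device" to position 3.
doc_tokens = [("WiFi", 1), ("Wi", 1), ("WiFi", 1), ("Fi", 2), ("device", 3)]
index = index_positions(doc_tokens)

exact = phrase_matches(index, ("WiFi", "device"))           # no slop allowed
sloppy = phrase_matches(index, ("WiFi", "device"), slop=1)  # one position of slack
print(exact, sloppy)
```

Under these assumptions the exact phrase fails and the slop-1 phrase succeeds, because device sits two positions after WiFi even though the original text has no word between them.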

Why do you use WordDelimiterFilter? Can you give us a few examples where it
is useful?
It is useful when text like *lucene-search documentation* is indexed with
WordDelimiterFilter and lucene-search is broken into two terms, lucene and
search. The document can then be found with queries like
lucene documentation or search documentation.
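
The benefit described here can be sketched with a small stand-in for the filter. This is only an illustration under simple assumptions: the hypothetical `split_word_parts` helper splits on non-alphanumeric delimiters and on lower-to-upper case changes, keeps the original token as well, and ignores positions entirely. It does not reproduce the full behaviour of WordDelimiterFilter.

```python
import re


def split_word_parts(token):
    """Split on non-alphanumeric delimiters and on lower-to-upper case changes."""
    parts = []
    for chunk in re.split(r"[^A-Za-z0-9]+", token):
        # Insert a break wherever a lowercase letter is followed by an uppercase one.
        parts.extend(re.sub(r"(?<=[a-z])(?=[A-Z])", " ", chunk).split())
    return parts


def index_terms(text, split_words=True):
    """Collect the searchable terms of a whitespace-tokenized text."""
    terms = set()
    for token in text.split():
        terms.add(token)  # keep the original token
        if split_words:
            terms.update(split_word_parts(token))
    return terms


def matches_all(terms, query):
    """True if every whitespace-separated query word is among the indexed terms."""
    return all(word in terms for word in query.split())


with_split = index_terms("lucene-search documentation")
without_split = index_terms("lucene-search documentation", split_words=False)

print(matches_all(with_split, "lucene documentation"))
print(matches_all(with_split, "search documentation"))
print(matches_all(without_split, "lucene documentation"))
```

With splitting enabled, both example queries find the document. With whitespace tokenization alone, lucene is never indexed as a separate term, so the first query finds nothing.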

Best,
Modassar

On Fri, Jan 15, 2016 at 2:14 PM, Emir Arnautovic <
emir.arnauto...@sematext.com> wrote:

Modassar,
Are you saying that WiFi, Wi-Fi and Wi Fi should not match each other? Why
do you use WordDelimiterFilter? Can you give us a few examples where it is
useful?

Thanks,
Emir


On 15.01.2016 05:13, Modassar Ather wrote:

Thanks for your responses.

It seems to me that you don't want to split on numbers.
It is not only with numbers. Even if you analyze WiFi, it will create
4 tokens, one of which will be at position 2. So basically the issue is the
position increment, which causes some queries to behave unexpectedly.

Which release of Solr are you using?
I am using Lucene/Solr-5.4.0.

Best,
Modassar

On Thu, Jan 14, 2016 at 9:44 PM, Jack Krupansky <jack.krupan...@gmail.com>
wrote:

Which release of Solr are you using? Last year (or so) there was a Lucene
change that had the effect of keeping all terms for WDF at the same
position. There was also some discussion about whether this was either a
bug or a bug fix, but I don't recall any resolution.

-- Jack Krupansky

On Thu, Jan 14, 2016 at 4:15 AM, Modassar Ather <modather1...@gmail.com>
wrote:

Hi,
I have the following definition for WordDelimiterFilter.

<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
generateNumberParts="1" catenateWords="1" catenateNumbers="1"
catenateAll="1" splitOnCaseChange="1" preserveOriginal="1"/>

The analysis of 3d shows the following four tokens and their positions.

token    position
3d       1
3        1
3d       1
d        2

Please help me understand why d is at position 2. Should it not also be at
position 1?
Is it a bug, and if not, is there any attribute I can use to restrict
the position increment?

Thanks,
Modassar


--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/



