Can you please send us the tokens you get (and their positions) when you analyze *WiFi device*?
Tokens generated and their respective positions:

token   position
WiFi    1
Wi      1
WiFi    1
Fi      2
device  3

Best,
Modassar

On Fri, Jan 15, 2016 at 6:25 PM, Emir Arnautovic <emir.arnauto...@sematext.com> wrote:

> Can you please send us the tokens you get (and their positions) when you
> analyze *WiFi device*?
>
> On 15.01.2016 13:15, Modassar Ather wrote:
>
>> Are you saying that WiFi, Wi-Fi and Wi Fi should not match each other?
>> I am using WhitespaceTokenizer in my analysis chain, so "wi fi" becomes two
>> different tokens. Please refer to the examples in my previous mail for the
>> issues I am facing.
>> Wi Fi (two terms) will match, but what happens when content containing
>> *WiFi device* is searched with *"WiFi device"*? It will not match, because
>> WordDelimiterFilter adds a position increment inside WiFi.
>> "WiFi device"~1 will match, which is confusing: there is no gap in the
>> content, so why is a slop required?
>>
>> Why do you use WordDelimiterFilter? Can you give us a few examples where it
>> is useful?
>> It is useful when text like *lucene-search documentation* is indexed with
>> WordDelimiterFilter and lucene-search is broken into two terms, lucene and
>> search. Documents containing it can then be found by queries like
>> lucene documentation or search documentation.
>>
>> Best,
>> Modassar
>>
>> On Fri, Jan 15, 2016 at 2:14 PM, Emir Arnautovic <emir.arnauto...@sematext.com> wrote:
>>
>>> Modassar,
>>> Are you saying that WiFi, Wi-Fi and Wi Fi should not match each other? Why
>>> do you use WordDelimiterFilter? Can you give us a few examples where it is
>>> useful?
>>>
>>> Thanks,
>>> Emir
>>>
>>> On 15.01.2016 05:13, Modassar Ather wrote:
>>>
>>>> Thanks for your responses.
>>>>
>>>> It seems to me that you don't want to split on numbers.
>>>> It is not only with numbers. Even if you analyze WiFi, it will create
>>>> 4 tokens, one of which will be at position 2.
>>>> So basically the issue is with the position increment, which causes a few
>>>> of the queries to behave unexpectedly.
>>>>
>>>> Which release of Solr are you using?
>>>> I am using Lucene/Solr-5.4.0.
>>>>
>>>> Best,
>>>> Modassar
>>>>
>>>> On Thu, Jan 14, 2016 at 9:44 PM, Jack Krupansky <jack.krupan...@gmail.com> wrote:
>>>>
>>>>> Which release of Solr are you using? Last year (or so) there was a Lucene
>>>>> change that had the effect of keeping all terms for WDF at the same
>>>>> position. There was also some discussion about whether this was a bug or
>>>>> a bug fix, but I don't recall any resolution.
>>>>>
>>>>> -- Jack Krupansky
>>>>>
>>>>> On Thu, Jan 14, 2016 at 4:15 AM, Modassar Ather <modather1...@gmail.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I have the following definition for WordDelimiterFilter:
>>>>>>
>>>>>> <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
>>>>>>         generateNumberParts="1" catenateWords="1" catenateNumbers="1"
>>>>>>         catenateAll="1" splitOnCaseChange="1" preserveOriginal="1"/>
>>>>>>
>>>>>> The analysis of 3d shows the following four tokens and their positions:
>>>>>>
>>>>>> token  position
>>>>>> 3d     1
>>>>>> 3      1
>>>>>> 3d     1
>>>>>> d      2
>>>>>>
>>>>>> Please help me understand why d is at position 2. Shouldn't it also be
>>>>>> at position 1? Is it a bug, and if not, is there any attribute I can use
>>>>>> to restrict the position increment?
>>>>>>
>>>>>> Thanks,
>>>>>> Modassar
>>>
>>> --
>>> Monitoring * Alerting * Anomaly Detection * Centralized Log Management
>>> Solr & Elasticsearch Support * http://sematext.com/
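The slop behavior described in this thread can be reproduced with a small model of phrase matching over the index positions Modassar reported for *WiFi device*. This is a hypothetical, simplified sketch, not Lucene's PhraseQuery implementation: it only considers in-order matches, and it assumes the query side emits WiFi and device at adjacent positions. The helper names `build_postings` and `phrase_matches` are illustrative only. Under those assumptions, a phrase matches when its terms appear in order and the number of extra positions between the first and last term is at most the slop.

```python
# Hypothetical, simplified model of phrase/slop matching over token positions.
# It is NOT Lucene's PhraseQuery implementation: only in-order matches are
# considered, and the slop is treated as the number of extra positions
# between the first and last phrase term.
from itertools import product

# Index-side tokens and positions for "WiFi device", as reported in the thread.
WIFI_DEVICE_TOKENS = [("WiFi", 1), ("Wi", 1), ("WiFi", 1), ("Fi", 2), ("device", 3)]


def build_postings(tokens):
    """Map each term to the list of positions at which it was emitted."""
    postings = {}
    for term, position in tokens:
        postings.setdefault(term, []).append(position)
    return postings


def phrase_matches(postings, phrase, slop=0):
    """Return True if the phrase terms occur in order within `slop` extra positions.

    Assumption: the query emits the phrase terms at consecutive positions,
    so a tight match needs last - first == len(phrase) - 1.
    """
    if any(term not in postings for term in phrase):
        return False
    # Try every combination of candidate positions (fine for tiny examples).
    for positions in product(*(postings[term] for term in phrase)):
        if any(a >= b for a, b in zip(positions, positions[1:])):
            continue  # out-of-order combinations are ignored in this model
        extra = (positions[-1] - positions[0]) - (len(phrase) - 1)
        if extra <= slop:
            return True
    return False


if __name__ == "__main__":
    postings = build_postings(WIFI_DEVICE_TOKENS)
    print(phrase_matches(postings, ["WiFi", "device"]))          # False
    print(phrase_matches(postings, ["WiFi", "device"], slop=1))  # True
    print(phrase_matches(postings, ["Fi", "device"]))            # True
```

With these positions the exact phrase fails because device sits two positions after WiFi rather than one, while Fi and device are adjacent. A slop of 1 absorbs the gap left by Fi, which mirrors the "WiFi device"~1 behavior described in the thread.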