Re: fuzzy search issue with PatternTokenizer Factory

meghana Tue, 23 Apr 2013 06:46:14 -0700

Fuzzy search is supposed to be independent of the analyzer, but it does not
seem to be independent of the tokenizer. If I just change my tokenizer to
*solr.StandardTokenizerFactory*, fuzzy search starts working fine. If it
were independent of the tokenizer, this should not happen.


I have also analyzed my terms on the Admin UI Analysis page, and the terms
come out exactly as expected; fuzzy search is the only issue I am facing.
But I can't analyze the fuzzy term on the Admin UI Analysis page, so I am
not able to pin down the issue.



Jack Krupansky-2 wrote
> Once again, fuzzy search is completely independent of your analyzer or 
> pattern tokenizer. Please use the Solr Admin UI Analysis page to debug 
> whether the terms are what you expect. And realize that fuzzy search has a 
> maximum editing distance of 2 and that includes case changes.
> 
> -- Jack Krupansky
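
To illustrate the point about edit distance and case, here is a minimal Python sketch. It assumes plain, case-sensitive Levenshtein distance; Lucene's own fuzzy matching may differ in details (for example in how it counts transpositions). The function name `edit_distance` is made up for this example, and the lowercase sample terms are assumptions; only worde, WORDS and WORDED come from the thread.

```python
def edit_distance(a: str, b: str) -> int:
    """Case-sensitive Levenshtein distance (insert, delete, substitute)."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute ca -> cb
        prev = curr
    return prev[-1]


query = "worde"
for term in ["words", "worded", "WORDS", "WORDED"]:
    print(term, edit_distance(query, term))
# words 1
# worded 1
# WORDS 5
# WORDED 6
```

Under these assumptions, a query like worde~1 could only reach the uppercase forms if the comparison were not case-sensitive or the indexed terms were not actually stored in that case, which is why checking the indexed terms on the Analysis page is a sensible first step.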
> 
> -----Original Message----- 
> From: meghana
> Sent: Monday, April 22, 2013 3:25 AM
> To: solr-user@.apache
> Subject: Re: fuzzy search issue with PatternTokenizer Factory
> 
> Jack,
> 
> The regex splits tokens on anything except letters, digits, '&', '-'
> and ns: (where n is a number from 0 to 9999, e.g. 4323s:).
> 
> Let's say, for example, my text is as below:
> 
> *this is nice day & sun 53s: is risen.*
> 
> Then the pattern tokenizer should create these tokens:
> 
> *this is nice day & sun is risen*
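
The expected split above can be checked outside Solr with a short Python sketch. This is only an approximation: Solr uses Java's regex engine, the XML entity `&amp;` is decoded here to a literal `&`, and the helper name `split_tokens` is hypothetical. Empty pieces between adjacent delimiters are dropped, on the assumption that the tokenizer does not emit empty tokens.

```python
import re

# Pattern from the schema, with &amp; decoded to '&'.
PATTERN = re.compile(r"[^a-zA-Z0-9&\-']|\d{0,4}s:")


def split_tokens(text: str) -> list:
    """Treat every match of PATTERN as a delimiter and drop empty pieces."""
    return [piece for piece in PATTERN.split(text) if piece]


print(split_tokens("this is nice day & sun 53s: is risen."))
# ['this', 'is', 'nice', 'day', '&', 'sun', 'is', 'risen']
```

One thing this sketch makes visible: `\d{0,4}` also allows zero digits, so a bare `s:` is treated as a delimiter too. With this pattern, `split_tokens("words: here")` gives `['word', 'here']`, which may or may not be intended.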
> 
> The pattern seems to work fine with different text.
> 
> Also, for the fuzzy search *worde~1*, I checked the results returned with
> PatternTokenizerFactory; they include terms with punctuation marks, like
> *WORDS,*, *WORDED....*, etc.
> 
> One more weird thing: all the results are in uppercase letters, and no
> lowercase results come back, although it does not return all of the
> uppercase results either.
> 
> I am not sure why fuzzy search is not working properly after this change.
> 
> 
> Jack Krupansky-2 wrote
>> Give us some examples of tokens that you are expecting that pattern to
>> tokenize. And express the pattern in simple English as well. Show some
>> actual input data.
>>
>> I suspect that Solr is working fine - but you may not have precisely
>> specified your pattern. But we don't know what your pattern is supposed
>> to recognize.
>>
>> Maybe some of your previous hits had punctuation adjacent to the terms
>> that your pattern doesn't recognize.
>>
>> And use the Solr Admin UI Analysis page to see how your sample input data
>> is analyzed.
>>
>> One other thing... without a "group", the pattern specifies what
>> delimiter sequence will "split" the rest of the input into tokens. I
>> suspect you didn't mean this.
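
The difference between "split" mode and "group" mode can be sketched in Python as below. In PatternTokenizerFactory, `group="-1"` (the default) treats matches as delimiters, while a non-negative `group` turns the matched group into the token. The `token` pattern here is a hypothetical counterpart written for this example and is not from the thread; Python's `re` module stands in for Java's regex engine.

```python
import re

text = "this is nice day & sun 53s: is risen."

# Delimiter mode (analogous to group="-1"): matches are thrown away and
# whatever lies between them becomes the tokens.
delimiter = re.compile(r"[^a-zA-Z0-9&\-']|\d{0,4}s:")
split_mode = [piece for piece in delimiter.split(text) if piece]

# Match mode (analogous to group="0"): each whole match is a token.
# Hypothetical pattern: runs of the characters the delimiter class excludes.
token = re.compile(r"[a-zA-Z0-9&\-']+")
match_mode = token.findall(text)

print(split_mode)
# ['this', 'is', 'nice', 'day', '&', 'sun', 'is', 'risen']
print(match_mode)
# ['this', 'is', 'nice', 'day', '&', 'sun', '53s', 'is', 'risen']
```

The two modes are not interchangeable: in match mode the `53s` prefix survives as a token, whereas the delimiter pattern removes `53s:` entirely.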
>>
>> -- Jack Krupansky
>>
>> -----Original Message----- 
>> From: meghana
>> Sent: Friday, April 19, 2013 9:01 AM
>> To: solr-user@.apache
>> Subject: fuzzy search issue with PatternTokenizer Factory
>>
>> I'm using Solr 4.2. I have changed my text field definition to use
>> solr.PatternTokenizerFactory instead of solr.StandardTokenizerFactory,
>> and changed my schema definition as below:
>>
>> <fieldType name="text_token" class="solr.TextField"
>>            positionIncrementGap="100">
>>   <analyzer type="index">
>>     <tokenizer class="solr.PatternTokenizerFactory"
>>                pattern="[^a-zA-Z0-9&amp;\-']|\d{0,4}s:" />
>>     <filter class="solr.StopFilterFactory" ignoreCase="true"
>>             words="stopwords.txt" enablePositionIncrements="false" />
>>     <filter class="solr.LowerCaseFilterFactory"/>
>>   </analyzer>
>>   <analyzer type="query">
>>     <tokenizer class="solr.PatternTokenizerFactory"
>>                pattern="[^a-zA-Z0-9&amp;\-']|\d{0,4}s:" />
>>     <filter class="solr.StopFilterFactory" ignoreCase="true"
>>             words="stopwords_extra_query.txt" enablePositionIncrements="false" />
>>     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
>>             ignoreCase="true" expand="true"/>
>>     <filter class="solr.LowerCaseFilterFactory"/>
>>   </analyzer>
>> </fieldType>
>>
>> After doing so, fuzzy search does not seem to work properly as it did
>> before.
>>
>> I'm searching with the search term: worde~1
>>
>> Before, this search returned around 300 records, but now it returns
>> only 5 records. I'm not sure what the issue can be.
>>
>> Can anybody help me make it work?
>>
>> --
>> View this message in context:
>> http://lucene.472066.n3.nabble.com/fuzzy-search-issue-with-PatternTokenizer-Factory-tp4057275.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
> 





--
View this message in context: 
http://lucene.472066.n3.nabble.com/fuzzy-search-issue-with-PatternTokenizer-Factory-tp4057275p4058267.html
Sent from the Solr - User mailing list archive at Nabble.com.
