Re: fuzzy search issue with PatternTokenizer Factory

Jack Krupansky Mon, 22 Apr 2013 06:33:20 -0700

Once again, fuzzy search is completely independent of your analyzer or pattern tokenizer. Please use the Solr Admin UI Analysis page to debug whether the terms are what you expect. And realize that fuzzy search has a maximum editing distance of 2 and that includes case changes.
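
To make the point about case changes concrete, here is a minimal sketch of a plain Levenshtein edit distance in Python. It is an illustration only: the function name `edit_distance` is made up for this example, and Lucene's own automaton-based fuzzy matching may differ in details such as how transpositions are counted. The sketch only shows that an uppercase/lowercase difference costs one edit per character, so a query such as worde~1 cannot reach an indexed term that is still in uppercase.

```python
def edit_distance(a: str, b: str) -> int:
    """Plain Levenshtein distance: inserts, deletes and substitutions cost 1 each.

    The comparison is case-sensitive, so 'w' vs 'W' counts as a substitution.
    """
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


print(edit_distance("worde", "words"))   # 1: one substitution
print(edit_distance("worde", "worded"))  # 1: one insertion
print(edit_distance("worde", "Worde"))   # 1: the case change alone uses up the budget
print(edit_distance("worde", "WORDS"))   # 5: every character differs
```

Under these assumptions, which terms fall within the allowed distance depends on the exact form, including case, in which they were indexed, which is what the Analysis page shows.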


-- Jack Krupansky

-----Original Message-----
From: meghana

Sent: Monday, April 22, 2013 3:25 AM
To: solr-user@lucene.apache.org
Subject: Re: fuzzy search issue with PatternTokenizer Factory

Jack,

The regex splits tokens on anything except letters, digits, '&' and '-',
and also on ns: (where n is a number from 0 to 9999, e.g. 4323s: ).

Let's say, for example, my text is like below.

*this is nice day & sun 53s: is risen.*

Then pattern tokenizer should create tokens as

*this is nice day & sun is risen*
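
The expected split can be checked outside Solr with a small Python sketch. It assumes that Python's `re` module treats this particular pattern the same way Java's regex engine does, and it copies the pattern from the fieldType in this thread with the XML entity `&amp;` decoded to `&`. The helper name `split_tokens` is hypothetical, and dropping empty strings between adjacent delimiters is an assumption of the sketch. No stop word or lowercase filter is applied here.

```python
import re
from typing import List

# Delimiter pattern from the schema in this thread (&amp; decoded to '&').
DELIMITER = re.compile(r"[^a-zA-Z0-9&\-']|\d{0,4}s:")


def split_tokens(text: str) -> List[str]:
    """Split on every delimiter match and drop the empty strings
    that appear between two adjacent delimiters."""
    return [tok for tok in DELIMITER.split(text) if tok]


print(split_tokens("this is nice day & sun 53s: is risen."))
# ['this', 'is', 'nice', 'day', '&', 'sun', 'is', 'risen']

print(split_tokens("WORDS, WORDED...."))
# ['WORDS', 'WORDED']
```

In this sketch the trailing punctuation is removed by the split, but the tokens keep their original case until a lowercase filter runs.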

The pattern seems to be working fine with different text.

Also, for the fuzzy search *worde~1*, I have checked the results returned
with PatternTokenizerFactory; they contain punctuation marks, like
'*WORDS,*', *WORDED....*, etc.

One more weird thing: all the results are in uppercase letters; no
lowercase results come back, although not all of the uppercase results are
returned either.

But I am not sure why fuzzy search is not working properly after changing
to this.

Jack Krupansky-2 wrote

Give us some examples of tokens that you are expecting that pattern to
tokenize. And express the pattern in simple English as well. Show some
actual input data.

I suspect that Solr is working fine - but you may not have precisely
specified your pattern. But we don't know what your pattern is supposed to
recognize.

Maybe some of your previous hits had punctuation adjacent to the terms
that your pattern doesn't recognize.

And use the Solr Admin UI Analysis page to see how your sample input data
is analyzed.

One other thing... without a "group", the pattern specifies what delimiter
sequence will "split" the rest of the input into tokens. I suspect you
didn't mean this.
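
The difference between the two modes can be sketched in Python. This is an illustration under stated assumptions, not Solr's implementation: the helper `tokenize` and its `group` parameter are hypothetical names that only mimic the idea of a group setting, where a negative value means "the pattern is a delimiter" and a value of 0 or more means "that group of each match is the token". The second pattern below is a made-up inverse of the delimiter pattern, used purely for comparison.

```python
import re
from typing import List


def tokenize(pattern: str, text: str, group: int = -1) -> List[str]:
    """group < 0: the pattern describes delimiters and the text is split on it.
    group >= 0: the pattern describes tokens and that group of each match is kept.
    Note: the delimiter pattern must not contain capture groups, because
    re.split would then include the captured text in its result."""
    regex = re.compile(pattern)
    if group < 0:
        return [tok for tok in regex.split(text) if tok]
    return [m.group(group) for m in regex.finditer(text) if m.group(group)]


text = "sun 53s: is risen."

# Split mode: everything the pattern matches is thrown away.
print(tokenize(r"[^a-zA-Z0-9&\-']|\d{0,4}s:", text))
# ['sun', 'is', 'risen']

# Match mode with a hypothetical inverse pattern: matches become tokens.
print(tokenize(r"[a-zA-Z0-9&\-']+", text, group=0))
# ['sun', '53s', 'is', 'risen']
```

The two calls give different tokens for the same input, so it matters which mode the pattern was written for.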

-- Jack Krupansky

-----Original Message-----
From: meghana

Sent: Friday, April 19, 2013 9:01 AM
To: solr-user@.apache

Subject: fuzzy search issue with PatternTokenizer Factory

I am using Solr 4.2. I have changed my text field definition to use
solr.PatternTokenizerFactory instead of solr.StandardTokenizerFactory,
and changed my schema definition as below:

<fieldType name="text_token" class="solr.TextField"
           positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.PatternTokenizerFactory"
               pattern="[^a-zA-Z0-9&amp;\-']|\d{0,4}s:" />
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="false" />
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.PatternTokenizerFactory"
               pattern="[^a-zA-Z0-9&amp;\-']|\d{0,4}s:" />
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords_extra_query.txt" enablePositionIncrements="false" />
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

After doing so, fuzzy search does not seem to be working properly as it was
working before.

I am searching with the search term: worde~1

Before, this search returned around 300 records, but now it returns only
5 records. I am not sure what the issue can be.

Can anybody help me make it work?







--
View this message in context:
http://lucene.472066.n3.nabble.com/fuzzy-search-issue-with-PatternTokenizer-Factory-tp4057275.html
Sent from the Solr - User mailing list archive at Nabble.com.

--

View this message in context:
http://lucene.472066.n3.nabble.com/fuzzy-search-issue-with-PatternTokenizer-Factory-tp4057275p4057831.html
Sent from the Solr - User mailing list archive at Nabble.com.
