Re: Searching for the '+' character

Paul Forsyth Mon, 14 Sep 2009 09:44:05 -0700

Thanks Ahmet,

That's excellent, thanks :) I may have to increase the gram size to take into account other possible uses, but I can now read around these filters to make the adjustments.

With regard to WordDelimiterFilterFactory: is there a way to place a delimiter on this filter to still get most of its functionality without it absorbing the + signs? Will I lose a lot of 'good' functionality by removing it? 'preserveOriginal' sounds promising and seems to work, but is it a good idea to use this?


On 14 Sep 2009, at 16:16, AHMET ARSLAN wrote:

--- On Mon, 9/14/09, Paul Forsyth <p...@ez.no> wrote:
From: Paul Forsyth <p...@ez.no>
Subject: Re: Searching for the '+' character
To: solr-user@lucene.apache.org
Date: Monday, September 14, 2009, 5:55 PM
With words like 'product+' i'd expect
a search for '+' to return results like any other character
or word, so '+' would be found within 'product+' or similar
text.

I've tried removing the worddelimiter from the query
analyzer, restarting and reindexing but i get the same
result. Nothing is found. I assume one of the filters could
be adjusted to keep the '+'.

Weird thing is that i tried to remove all filters from the
analyzer and i get the same result.

Paul
When you remove all filters, '+' is kept, but '+' still won't match 'product+', because you want to search inside a token.
If the + sign is always at the end of your text and you want to search only the last character of your text, EdgeNGramFilterFactory can do that with the settings side="back" maxGramSize="1" minGramSize="1".

The fieldType below will match '+' to 'product+'
<fieldType name="textx" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.HTMLStripWhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ISOLatin1AccentFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"/>
    <!-- Index only the last character of each token -->
    <filter class="solr.EdgeNGramFilterFactory" side="back" maxGramSize="1" minGramSize="1"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ISOLatin1AccentFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"/>
  </analyzer>
</fieldType>
But this time 'product+' will be reduced to only '+'. You won't be able to search for it in other ways, for example with product*. If you want to keep the original word itself along with the last character, you can set maxGramSize to 512. With that setting the token 'product+' will produce 8 tokens (and the queries product* or product+ will return it):
+ word
t+ word
ct+ word
uct+ word
duct+ word
oduct+ word
roduct+ word
product+ word
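To see where these eight tokens come from, here is a minimal Python sketch of what a back-side edge n-gram filter does to a single token. It is only an illustration under that assumption, not Solr's implementation, and the helper name back_edge_ngrams is made up for this example.

```python
def back_edge_ngrams(token, min_gram=1, max_gram=1):
    """Return the suffixes of `token` with lengths min_gram..max_gram.

    Hypothetical helper that mimics the effect of an edge n-gram filter
    configured with side="back"; it is not Solr code.
    """
    # A gram can never be longer than the token itself.
    longest = min(max_gram, len(token))
    return [token[-n:] for n in range(min_gram, longest + 1)]


# maxGramSize=1: only the last character of the token is kept.
print(back_edge_ngrams("product+", 1, 1))

# maxGramSize=512: every suffix, up to the whole token, is kept.
for gram in back_edge_ngrams("product+", 1, 512):
    print(gram)
```

Because the full token is one of the suffixes, a query for the whole word can still match, which is why raising maxGramSize keeps 'product+' searchable.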
If the + sign can be anywhere inside the text, you can use NGramTokenFilter.
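As a rough illustration of the difference, the Python sketch below emits every substring of a token within a length range, which is the general idea behind an n-gram filter. The function name ngrams and the sample token 'a+b' are assumptions made for this example, and the real filter's output order and defaults may differ.

```python
def ngrams(token, min_gram=1, max_gram=2):
    """Return every substring of `token` with length min_gram..max_gram.

    Hypothetical helper for illustration only; it is not Solr code.
    """
    grams = []
    for n in range(min_gram, max_gram + 1):
        # Slide a window of width n across the token.
        for start in range(len(token) - n + 1):
            grams.append(token[start:start + n])
    return grams


# With grams of size 1 and 2, a '+' in the middle of a token becomes
# a token of its own, so a query for '+' can match it.
print(ngrams("a+b", 1, 2))
```

The number of grams grows quickly with token length and gram range, so this approach is likely to enlarge the index more than the edge variant does.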
Hope this helps.


Best regards,

Paul Forsyth

mail: p...@ez.no
skype: paulforsyth
