Thanks Ahmet,

That's excellent, thanks :) I may have to increase the gram size to take other possible uses into account, but I can now read up on these filters to make the adjustments.

Regarding WordDelimiterFilterFactory: is there a way to configure this filter so that it keeps most of its functionality without absorbing the + signs? Will I lose a lot of 'good' functionality by removing it? 'preserveOriginal' sounds promising and seems to work, but is it a good idea to use it?
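For illustration, a WordDelimiterFilterFactory line with preserveOriginal switched on might look like the sketch below. Only preserveOriginal comes from the question above; the other attribute values are assumptions chosen for the example, not taken from the schema discussed in this thread.

```xml
<!-- Sketch only: split tokens on delimiters as usual, but also keep the
     unsplit token (preserveOriginal="1"), so 'product+' is indexed as both
     'product+' and 'product'. All other attribute values are assumed. -->
<filter class="solr.WordDelimiterFilterFactory"
        generateWordParts="1" generateNumberParts="1"
        catenateWords="0" catenateNumbers="0" catenateAll="0"
        splitOnCaseChange="1" preserveOriginal="1"/>
```

With a setting like this, an exact query for product+ should still match. A bare '+' query would still not match 'product+', though, because the '+' only survives as part of the preserved token, not as a token of its own.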

On 14 Sep 2009, at 16:16, AHMET ARSLAN wrote:



--- On Mon, 9/14/09, Paul Forsyth <p...@ez.no> wrote:

From: Paul Forsyth <p...@ez.no>
Subject: Re: Searching for the '+' character
To: solr-user@lucene.apache.org
Date: Monday, September 14, 2009, 5:55 PM
With words like 'product+' I'd expect
a search for '+' to return results like any other character
or word, so '+' would be found within 'product+' or similar
text.

I've tried removing the WordDelimiterFilter from the query
analyzer, restarting and reindexing, but I get the same
result: nothing is found. I assume one of the filters could
be adjusted to keep the '+'.

The weird thing is that I tried removing all filters from the
analyzer and I got the same result.

Paul

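When you remove all filters, '+' is kept, but '+' still won't match 'product+', because you are trying to search inside a token: 'product+' is indexed as a single token, and a query for '+' only matches a token that is exactly '+'.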

If the + sign is always at the end of a word and you only want to search the last character, EdgeNGramFilterFactory can do that with the settings side="back" maxGramSize="1" minGramSize="1".

The fieldType below will match '+' to 'product+'

<fieldType name="textx" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.HTMLStripWhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ISOLatin1AccentFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"/>
    <!-- Keep only the last character of each token, e.g. 'product+' becomes '+' -->
    <filter class="solr.EdgeNGramFilterFactory" side="back" maxGramSize="1" minGramSize="1"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ISOLatin1AccentFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"/>
  </analyzer>
</fieldType>


But this time 'product+' will be reduced to just '+', so you won't be able to find it any other way, for example with product*. If you want to keep the original word itself along with the last character, you can set maxGramSize to 512. With that setting, the token 'product+' produces 8 tokens (and the queries product* and product+ will both return it):

+
t+
ct+
uct+
duct+
oduct+
roduct+
product+
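As a sketch, the index analyzer from the fieldType above with only maxGramSize raised to 512 (the value suggested above) might look like this; the query analyzer can stay unchanged.

```xml
<!-- Same index-time chain as above; only maxGramSize changes, so every
     trailing gram of length 1..512 is emitted, including the whole token -->
<analyzer type="index">
  <tokenizer class="solr.HTMLStripWhitespaceTokenizerFactory"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="solr.ISOLatin1AccentFilterFactory"/>
  <filter class="solr.SnowballPorterFilterFactory" language="English"/>
  <filter class="solr.EdgeNGramFilterFactory" side="back" minGramSize="1" maxGramSize="512"/>
</analyzer>
```

A token of length n yields n trailing grams (8 for 'product+'), so the index grows linearly with word length.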

If the + sign can appear anywhere inside the text, you can use NGramTokenFilter (solr.NGramFilterFactory in schema.xml) instead.
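A minimal sketch of that variant, assuming the same index chain as the fieldType above with the edge n-gram filter swapped for solr.NGramFilterFactory. The maxGramSize="20" value is a hypothetical choice for illustration; words longer than it would not be kept whole.

```xml
<!-- Hypothetical variant: emit every substring of 1..20 characters so that
     '+' is indexed as a token of its own wherever it occurs in a word -->
<analyzer type="index">
  <tokenizer class="solr.HTMLStripWhitespaceTokenizerFactory"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="solr.ISOLatin1AccentFilterFactory"/>
  <filter class="solr.SnowballPorterFilterFactory" language="English"/>
  <filter class="solr.NGramFilterFactory" minGramSize="1" maxGramSize="20"/>
</analyzer>
```

Keep in mind that a word of length n (up to maxGramSize) produces n(n+1)/2 grams, 36 for 'product+', so this grows the index much faster than the edge n-gram approach.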
Hope this helps.




Best regards,

Paul Forsyth

mail: p...@ez.no
skype: paulforsyth
