Thanks Ahmet,
That's excellent, thanks :) I may have to increase the gram size to take
into account other possible uses, but I can now read around these
filters to make the adjustments.
With regard to WordDelimiterFilterFactory: is there a way to configure
this filter so that I still get most of its functionality
without it absorbing the + signs? Will I lose a lot of 'good'
functionality by removing it? 'preserveOriginal' sounds promising and
seems to work, but is it a good idea to use this?
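For reference, a sketch of the setting in question. The attribute values other than preserveOriginal are illustrative assumptions, not a tested or recommended configuration:

```xml
<!-- preserveOriginal="1" keeps the unsplit token (e.g. 'product+') in the
     token stream alongside the parts the filter generates (e.g. 'product'). -->
<filter class="solr.WordDelimiterFilterFactory"
        generateWordParts="1" generateNumberParts="1"
        catenateWords="1" catenateNumbers="1" catenateAll="0"
        preserveOriginal="1"/>
```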
On 14 Sep 2009, at 16:16, AHMET ARSLAN wrote:
--- On Mon, 9/14/09, Paul Forsyth <p...@ez.no> wrote:
From: Paul Forsyth <p...@ez.no>
Subject: Re: Searching for the '+' character
To: solr-user@lucene.apache.org
Date: Monday, September 14, 2009, 5:55 PM
With words like 'product+' I'd expect
a search for '+' to return results like any other character
or word, so '+' would be found within 'product+' or similar
text.
I've tried removing the word delimiter filter from the query
analyzer, restarting and reindexing, but I get the same
result. Nothing is found. I assume one of the filters could
be adjusted to keep the '+'.
The weird thing is that I tried removing all filters from the
analyzer and I get the same result.
Paul
When you remove all filters, '+' is kept, but '+' still won't match
'product+', because you are trying to search inside a token.
If the + sign is always at the end of your text, and you want to
search only the last character of your text, EdgeNGramFilterFactory can
do that with the settings side="back" maxGramSize="1" minGramSize="1".
The fieldType below will match '+' to 'product+':
<fieldType name="textx" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.HTMLStripWhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ISOLatin1AccentFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"/>
    <!-- Index only the last character of each token -->
    <filter class="solr.EdgeNGramFilterFactory" side="back"
            maxGramSize="1" minGramSize="1"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory"
            synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ISOLatin1AccentFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"/>
  </analyzer>
</fieldType>
But this time 'product+' will be reduced to only '+'. You won't be
able to search for it in other ways, for example with product*. If you
want to keep the original word itself along with the last character,
you can set maxGramSize to 512. With that setting the token 'product+'
will produce 8 tokens (and the query product* or product+ will return it):
+ word
t+ word
ct+ word
uct+ word
duct+ word
oduct+ word
roduct+ word
product+ word
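As a sketch of the change described above, only the index-time EdgeNGramFilterFactory line needs to differ from the fieldType shown earlier. The value 512 is simply larger than any expected token, so every suffix up to the whole word is kept:

```xml
<!-- Index analyzer only: emit every suffix of each token, from the last
     character up to the full token (capped at 512 characters). -->
<filter class="solr.EdgeNGramFilterFactory" side="back"
        minGramSize="1" maxGramSize="512"/>
```

Note that this grows the index, since a token of length n now produces n terms instead of one.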
If the + sign can be anywhere inside the text, you can use
NGramTokenFilter.
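A minimal sketch of that variant, assuming the factory name solr.NGramFilterFactory and gram sizes chosen purely for illustration (they are not taken from the schema above). It would replace the EdgeNGramFilterFactory line in the index analyzer:

```xml
<!-- Index analyzer only: emit every substring of length 1 to 15 of each
     token, so a single '+' is indexed wherever it occurs in the token.
     The sizes here are assumptions; tune them to your data. -->
<filter class="solr.NGramFilterFactory" minGramSize="1" maxGramSize="15"/>
```

Expect a considerably larger index than with edge n-grams, because every substring within the size limits becomes a term.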
Hope this helps.
Best regards,
Paul Forsyth
mail: p...@ez.no
skype: paulforsyth