--- On Mon, 9/14/09, Paul Forsyth <p...@ez.no> wrote:
> From: Paul Forsyth <p...@ez.no>
> Subject: Re: Searching for the '+' character
> To: solr-user@lucene.apache.org
> Date: Monday, September 14, 2009, 5:55 PM
>
> With words like 'product+' i'd expect a search for '+' to return
> results like any other character or word, so '+' would be found
> within 'product+' or similar text.
>
> I've tried removing the worddelimiter from the query analyzer,
> restarting and reindexing but i get the same result. Nothing is
> found. I assume one of the filters could be adjusted to keep the '+'.
>
> Weird thing is that i tried to remove all filters from the analyzer
> and i get the same result.
>
> Paul

When you remove all filters, the '+' is kept, but '+' still won't match 'product+'. That is because you want to search inside a token, and a plain term query only matches whole tokens.

If the + sign is always at the end of your text and you only want to search on the last character of your text, EdgeNGramFilterFactory can do that with the settings side="back" maxGramSize="1" minGramSize="1".

The fieldType below will match '+' to 'product+':

  <fieldType name="textx" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.HTMLStripWhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.ISOLatin1AccentFilterFactory"/>
      <filter class="solr.SnowballPorterFilterFactory" language="English"/>
      <filter class="solr.EdgeNGramFilterFactory" side="back" maxGramSize="1" minGramSize="1"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.ISOLatin1AccentFilterFactory"/>
      <filter class="solr.SnowballPorterFilterFactory" language="English"/>
    </analyzer>
  </fieldType>

But this time 'product+' will be reduced to only '+'. You won't be able to find it any other way, for example with product*.

Along with the last character, if you want to keep the original word itself, you can set maxGramSize to 512. With that setting the token 'product+' will produce 8 tokens (and the queries product* or product+ will return it):

  +          word
  t+         word
  ct+        word
  uct+       word
  duct+      word
  oduct+     word
  roduct+    word
  product+   word

If the + sign can be anywhere inside the text, you can use NGramTokenFilter (solr.NGramFilterFactory in the schema).

Hope this helps.
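
P.S. For reference, here is a rough sketch of the maxGramSize variant described above. It is the same fieldType with only maxGramSize changed from 1 to 512. I have not tested it as written, and the name "textx_full" is just made up for illustration.

```xml
<!-- Sketch only: "textx_full" is a hypothetical name. Everything else is
     copied from the fieldType earlier in this mail, except maxGramSize. -->
<fieldType name="textx_full" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.HTMLStripWhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ISOLatin1AccentFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"/>
    <!-- Back-edge grams of length 1 up to 512, so every suffix of the token
         is indexed, including the full token itself -->
    <filter class="solr.EdgeNGramFilterFactory" side="back" minGramSize="1" maxGramSize="512"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ISOLatin1AccentFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"/>
  </analyzer>
</fieldType>
```

You will need to reindex after changing the index analyzer. Keep in mind that every suffix of every token ends up in the index, so it will grow accordingly.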