> I have a quick question for anyone with an idea how to solve this. We have
> times when our users don't put spaces between words. So for instance
> "airmax" returns 0 results but "air max" has at least 100 results. Other
> than adding to the synonyms file every time, is there a more programmatic
> way we could possibly understand this scenario and return correct results?
Without a manual synonym table lookup, it would be very hard to recognize "airmax" at query time and split it into "air max". But at index time you can handle it with a modified version of ShingleFilterFactory, which simply concatenates all token n-grams. Change

    public static final String TOKEN_SEPARATOR = " ";

to

    public static final String TOKEN_SEPARATOR = "";

in org.apache.lucene.analysis.shingle.ShingleFilter. You also need its Factory class to integrate it into Solr.

At index time, the input document ("but air max has") will be tokenized into:

    but    => word
    butair => shingle
    air    => word
    airmax => shingle
    max    => word
    maxhas => shingle
    has    => word

And the query "airmax" will match that document. But this solution increases your index size, so it is better to write all possible words into the synonyms.txt file manually.

There is a similar discussion suggesting this in the lucene-java-users group:
http://old.nabble.com/splitting-words-to26573829.html#a26573829

Hope this helps.
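To make the shingle behaviour concrete, here is a minimal self-contained sketch (no Lucene dependency; the class and method names are my own, not Lucene's) of what bigram shingling with an empty token separator produces:

```java
import java.util.ArrayList;
import java.util.List;

public class ConcatShingles {
    // Emit each token plus, for every adjacent pair, the two tokens joined
    // with the given separator -- the effect of changing TOKEN_SEPARATOR
    // from " " to "" in ShingleFilter (bigram shingles only).
    static List<String> shingles(List<String> tokens, String separator) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            out.add(tokens.get(i));  // the original word
            if (i + 1 < tokens.size()) {
                // the concatenated shingle, e.g. "air" + "" + "max" = "airmax"
                out.add(tokens.get(i) + separator + tokens.get(i + 1));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // "but air max has" -> but, butair, air, airmax, max, maxhas, has
        System.out.println(shingles(List.of("but", "air", "max", "has"), ""));
    }
}
```

Because "airmax" is now an indexed term, the unsegmented query matches without any synonym entry.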