> I have a quick question for anyone with an idea how to solve this. We have
> times when our users don't put spaces between words. So for instance
> "airmax" returns 0 results but "air max" has at least 100 results. Other
> than adding to the synonyms file every time, is there a more programmatic
> way we could possibly understand this scenario and return correct results?
Without a manual synonym table lookup, it would be very hard to recognize "airmax" at query time and split it into "air max". But at index time you can handle it with a modified version of ShingleFilterFactory, which simply concatenates all token n-grams. Change

    public static final String TOKEN_SEPARATOR = " ";

to

    public static final String TOKEN_SEPARATOR = "";

in org.apache.lucene.analysis.shingle.ShingleFilter. You also need its Factory class to integrate it into Solr.

At index time, the input document ("but air max has") will be tokenized into:

    but    => word
    butair => shingle
    air    => word
    airmax => shingle
    max    => word
    maxhas => shingle
    has    => word

And the query "airmax" will match that document. But this solution increases your index size, so it is better to write all possible words into the synonyms.txt file manually.

There is a similar discussion suggesting this in the lucene-java-users group:
http://old.nabble.com/splitting-words-to26573829.html#a26573829

Hope this helps.
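To make the shingle behaviour concrete, here is a minimal self-contained sketch (no Lucene dependency; the class and method names are my own, not Lucene's) of what bigram shingling with an empty token separator produces:

```java
import java.util.ArrayList;
import java.util.List;

public class ConcatShingles {
    // Emit each token plus, for every adjacent pair, the two tokens joined
    // with the given separator -- the effect of changing TOKEN_SEPARATOR
    // from " " to "" in ShingleFilter (bigram shingles only).
    static List<String> shingles(List<String> tokens, String separator) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            out.add(tokens.get(i));  // the original word
            if (i + 1 < tokens.size()) {
                // the concatenated shingle, e.g. "air" + "" + "max" = "airmax"
                out.add(tokens.get(i) + separator + tokens.get(i + 1));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // "but air max has" -> but, butair, air, airmax, max, maxhas, has
        System.out.println(shingles(List.of("but", "air", "max", "has"), ""));
    }
}
```

Because "airmax" is now an indexed term, the unsegmented query matches without any synonym entry.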