Because we often use multi-term search together with synonyms as a thesaurus, we had to develop a solution for this. There is a whole chain of pitfalls throughout the system, and you have to be careful.
The thesaurus (synonym.txt) resolves not only single terms to multi-terms but also multi-terms to single terms, multi-terms to multi-terms and, naturally, single terms to single terms. All of this is combined with some boosting where needed.

Maybe we can/should provide a general solution for thesaurus support to the Solr community...? But currently we have other, more important issues on our list.

If you want to start with multi-term synonyms, turn the weakness of the QueryParser into its strength. (Wow, sounds like Zen wisdom.)

Regards
Bernd

On 11.06.2012 05:02, John Berryman wrote:
> According to https://issues.apache.org/jira/browse/LUCENE-2605, the Lucene
> QueryParser tokenizes on white space before giving any text to the
> Analyzer. This makes it impossible to use multi-term synonyms because the
> SynonymFilter only receives one word at a time.
>
> Resolution to this would really help with my current project. My project
> client sells clothing and accessories online. They have plenty of examples
> of compound words, e.g. "rain coat". But some of these compound words are
> really tripping them up. A prime example is that a search for "dress shoes"
> returns a list of dresses and random shoes (not necessarily dress shoes). I
> wish that I was able to synonym compound words to single tokens (e.g.
> "dress shoes => dress_shoes"), but with this whitespace tokenization issue,
> it's impossible.
>
> Has anything happened with this bug recently? For a short time I've got a
> client that would be willing to pay for this issue to be fixed if it's not
> too much of a rabbit hole. Anyone care to catch me up with what this might
> entail?
>
> LinkedIn <http://www.linkedin.com/pub/john-berryman/13/b17/864>
> Twitter <http://twitter.com/#!/jnbrymn>
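
For readers new to the issue, the whitespace problem described in the quoted message can be illustrated with a toy model. This is only a sketch in plain Python, not Lucene or Solr code: the names SYNONYMS, apply_synonyms, parse_split_first and parse_whole are hypothetical, and the synonym table is an assumption built from the "dress shoes" example. It shows why a synonym filter that sees one word at a time can never match a multi-term entry.

```python
# Toy model only; not Lucene/Solr code. All names here are hypothetical.
# Multi-term entries are keyed by the tuple of tokens they should match.
SYNONYMS = {
    ("dress", "shoes"): ["dress_shoes"],
    ("rain", "coat"): ["raincoat"],
}


def apply_synonyms(tokens):
    """Replace the longest token sequence found in SYNONYMS, left to right."""
    max_len = max(len(key) for key in SYNONYMS)
    out, i = [], 0
    while i < len(tokens):
        # Try the longest candidate window first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + n])
            if key in SYNONYMS:
                out.extend(SYNONYMS[key])
                i += n
                break
        else:
            # No entry starts at this position; keep the token unchanged.
            out.append(tokens[i])
            i += 1
    return out


def parse_split_first(query):
    """Mimic the reported behaviour: split on whitespace, then analyze
    each chunk separately, so the synonym step sees one word at a time."""
    out = []
    for chunk in query.split():
        out.extend(apply_synonyms([chunk]))
    return out


def parse_whole(query):
    """Analyze the whole query at once, so multi-term entries can match."""
    return apply_synonyms(query.split())


print(parse_split_first("black dress shoes"))
print(parse_whole("black dress shoes"))
```

In this model the first call leaves "dress" and "shoes" as separate tokens, while the second collapses them into the single token dress_shoes. Any real workaround has to make sure the synonym step receives the multi-term phrase as one unit.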