Because we often use multi-term search together with synonyms as a
thesaurus, we had to develop a solution for this. There is a whole
chain of pitfalls throughout the system, and you have to be careful.

The thesaurus (synonym.txt) maps not only single terms to multi-terms
but also multi-terms to single terms, multi-terms to multi-terms and,
naturally, single terms to single terms.
All of this is combined with some boosting where needed.
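
To illustrate why the multi-term directions are the tricky ones, here is a
minimal Python sketch. It is not Solr code and not the solution mentioned
above; the rule table and all function names are hypothetical, made up for
this example. It applies the same rules once to the whole token stream and
once word by word, which is roughly what happens when the query is split on
whitespace before analysis.

```python
# Hypothetical illustration only: plain Python, no Solr/Lucene API involved.
# Rule keys and values are token tuples; the entries are invented examples.
RULES = {
    ("raincoat",): ("rain", "coat"),         # single-term -> multi-term
    ("dress", "shoes"): ("dress_shoes",),    # multi-term  -> single-term
    ("rain", "jacket"): ("rain", "coat"),    # multi-term  -> multi-term
    ("tv",): ("television",),                # single-term -> single-term
}


def apply_synonyms(tokens, rules=RULES):
    """Rewrite a token list using greedy longest-match against the rules."""
    max_len = max(len(key) for key in rules)
    out = []
    i = 0
    while i < len(tokens):
        # Try the longest possible match first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + n])
            if key in rules:
                out.extend(rules[key])
                i += n
                break
        else:
            # No rule matched at this position: keep the token unchanged.
            out.append(tokens[i])
            i += 1
    return out


def analyze_whole(query):
    """The synonym step sees all tokens of the query at once."""
    return apply_synonyms(query.lower().split())


def analyze_per_word(query):
    """The synonym step sees one whitespace-separated word at a time."""
    out = []
    for word in query.lower().split():
        out.extend(apply_synonyms([word]))
    return out
```

In this sketch, rules with a single term on the left-hand side behave the
same in both functions, while rules with several terms on the left-hand side
can only match in analyze_whole.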

Maybe we can/should provide a general solution for thesaurus support to the Solr
community...?
But at the moment we have some other, more important issues on our list.

If you want to start with multi-term synonyms, turn the weakness of the
QueryParser into its strength. (Wow, sounds like Zen wisdom.)

Regards
Bernd

On 11.06.2012 05:02, John Berryman wrote:
> According to https://issues.apache.org/jira/browse/LUCENE-2605, the Lucene
> QueryParser tokenizes on white space before giving any text to the
> Analyzer. This makes it impossible to use multi-term synonyms because the
> SynonymFilter only receives one word at a time.
> 
> Resolution to this would really help with my current project. My project
> client sells clothing and accessories online. They have plenty of examples
> of compound words, e.g. "rain coat". But some of these compound words are
> really tripping them up. A prime example is that a search for "dress shoes"
> returns a list of dresses and random shoes (not necessarily dress shoes). I
> wish that I was able to synonym compound words to single tokens (e.g.
> "dress shoes => dress_shoes"), but with this whitespace tokenization issue,
> it's impossible.
> 
> Has anything happened with this bug recently? For a short time I've got a
> client that would be willing to pay for this issue to be fixed if it's not
> too much of a rabbit hole. Anyone care to catch me up with what this might
> entail?
> 
> LinkedIn <http://www.linkedin.com/pub/john-berryman/13/b17/864>
> Twitter <http://twitter.com/#!/jnbrymn>
> 
