Have a look at the report on integrating EuroVoc into Solr. It gives you an idea of the problems with multiword synonyms and query expansion, and of possible solutions.
http://www.ub.uni-bielefeld.de/~befehl/base/solr/eurovoc.html

Regards
Bernd Fehling


On 18.10.2012 02:36, Nicholas Ding wrote:
> Hi guys,
>
> I'm trying to get query expansion and multiword synonyms working at query
> time. I spent the whole day digging into the source code of Lucene and
> Solr and writing a custom tokenizer, a filter and even a query parser to
> make it work. Now I'm a bit confused.
>
> Requirement
> When searching for "chinese cuisine", I want to expand it to "chinese",
> "cuisine", "cuisine chinese" and "chinese cuisine". I also have synonyms
> like "chinese cuisines, chinese food, chinese dish".
>
> My Plan
> <fieldType name="text_en" class="solr.TextField" ...>
>   <analyzer type="index">
>     <tokenizer class="solr.KeywordTokenizerFactory"/>
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="analysis.ExpandableKeywordTokenzierFactory"/>
>   </analyzer>
> </fieldType>
>
> ExpandableKeywordTokenzierFactory uses a customized Tokenizer that
> permutes the words inside a token.
> For example:
> Input Token "A B" => Output Tokens "A", "B", "A B", "B A"
>
> It works fine in the Solr admin (see attachment). But when I perform the
> search, like q=Keyword:"chinese cuisine", the debug output shows an
> unexpected result:
> parsedquery_toString: Keyword:\"chinese cuisine chinese cuisine cuisine chinese\""
> Somehow, the tokens from the tokenizer are concatenated.
>
> Ideally, if the Tokenizer works and does produce tokens, I can pass them
> to SynonymFilterFactory to apply synonyms.
>
> I think I can write a QParserPlugin to solve this problem by expanding the
> query before it goes into the fieldType, but if it can be solved in the
> Tokenizer, that would be great.
>
> Thanks
> Nicholas
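
For reference, the expansion described in the quoted message can be sketched in plain Java, independent of Lucene. This is only an illustration, not the code behind ExpandableKeywordTokenzierFactory, which was not posted. It assumes the intended output for "A B" is "A", "B", "A B", "B A", matching the "chinese cuisine" requirement. The class name TokenExpansion and the method expand are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: not part of Solr or Lucene.
public class TokenExpansion {

    // Returns every ordered arrangement of 1..n words taken from the input,
    // shortest first. Duplicates are dropped, insertion order is kept.
    public static List<String> expand(String input) {
        String trimmed = input.trim();
        if (trimmed.isEmpty()) {
            return new ArrayList<>();
        }
        String[] words = trimmed.split("\\s+");
        Set<String> out = new LinkedHashSet<>();
        for (int len = 1; len <= words.length; len++) {
            permute(words, new boolean[words.length], new ArrayList<>(), len, out);
        }
        return new ArrayList<>(out);
    }

    // Backtracking: extend the current arrangement with each unused word
    // until it reaches the requested length.
    private static void permute(String[] words, boolean[] used,
                                List<String> current, int len, Set<String> out) {
        if (current.size() == len) {
            out.add(String.join(" ", current));
            return;
        }
        for (int i = 0; i < words.length; i++) {
            if (used[i]) {
                continue;
            }
            used[i] = true;
            current.add(words[i]);
            permute(words, used, current, len, out);
            current.remove(current.size() - 1);
            used[i] = false;
        }
    }

    public static void main(String[] args) {
        // Prints [chinese, cuisine, chinese cuisine, cuisine chinese]
        System.out.println(expand("chinese cuisine"));
    }
}
```

The number of arrangements grows factorially with the number of words, so this is only practical for short queries. The sketch also says nothing about token positions. As far as I know, tokens that a query-time analyzer emits at consecutive positions inside a quoted query are combined into a single phrase, which would be consistent with the concatenated parsedquery shown above.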