Have a look at the report on EuroVoc integration into Solr.
It gives you an idea of the problems and solutions around
multiword synonyms and query expansion.

http://www.ub.uni-bielefeld.de/~befehl/base/solr/eurovoc.html
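
To make the expansion from your requirement concrete, here is a minimal
sketch in plain Java. It is only an illustration and uses no Lucene or Solr
classes; the class name QueryPermutations and the method expand are
hypothetical, and it assumes you want the single words plus every ordered
pair of distinct words.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper for illustration; not part of Lucene or Solr. */
final class QueryPermutations {

    /**
     * Expands a whitespace-separated keyword into its single words plus
     * every ordered pair of words at different positions,
     * e.g. "A B" -> "A", "B", "A B", "B A".
     * Duplicates are dropped and first-seen order is kept.
     */
    static List<String> expand(String keyword) {
        Set<String> out = new LinkedHashSet<>();
        String trimmed = keyword.trim();
        if (trimmed.isEmpty()) {
            return new ArrayList<>(out);
        }
        String[] words = trimmed.split("\\s+");
        // Single words first.
        for (String w : words) {
            out.add(w);
        }
        // Then every ordered pair of words at different positions.
        for (int i = 0; i < words.length; i++) {
            for (int j = 0; j < words.length; j++) {
                if (i != j) {
                    out.add(words[i] + " " + words[j]);
                }
            }
        }
        return new ArrayList<>(out);
    }
}
```

Calling QueryPermutations.expand("chinese cuisine") yields "chinese",
"cuisine", "chinese cuisine" and "cuisine chinese". How these strings are
then turned into tokens or query clauses is a separate question that this
sketch does not cover.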

Regards
Bernd Fehling


On 18.10.2012 02:36, Nicholas Ding wrote:
> Hi guys,
> 
> I'm trying to get query expansion and multiword synonyms working at query
> time. I spent the whole day digging into the source code of Lucene and
> Solr and writing a custom tokenizer, filter and even query parser in order
> to make it work. Now I'm a bit confused.
> 
> Requirement
> Searching for "chinese cuisine", I want to expand it to "chinese", "cuisine",
> "cuisine chinese" and "chinese cuisine". And I have synonyms like "chinese
> cuisines, chinese food, chinese dish".
> 
> My Plan
> <fieldType name="text_en"  class="solr.TextField" ...>
> <analyzer type="index">
>   <tokenizer class="solr.KeywordTokenizerFactory"/>
> </analyzer>
> <analyzer type="query">
>   <tokenizer class="analysis.ExpandableKeywordTokenzierFactory"/>
> </analyzer>
> </fieldType>
> 
> ExpandableKeywordTokenzierFactory uses a customized Tokenizer that
> can permute the words inside a token.
> For example:
> Input Token "A B" => Output Tokens "A", "B", "A B", "B A"
> 
> It works fine even in the Solr admin, see attachment. But when I perform the
> search, like q=Keyword:"chinese cuisine", I see an unexpected result in the
> debug output.
> parsedquery_toString: Keyword:\"chinese cuisine chinese cuisine cuisine
> chinese\""
> Somehow, the tokens from the tokenizer are concatenated.
> 
> Ideally, if the Tokenizer works and does produce tokens, I can pass them to
> SynonymFilterFactory to apply synonyms.
> 
> I think I can write a QParserPlugin to solve this problem by expanding the
> query before it goes into the fieldType, but if that can be solved in the
> Tokenizer, that would be great.
> 
> Thanks
> Nicholas
> 
