Hi,

I have been using this plugin with success: 
https://github.com/healthonnet/hon-lucene-synonyms
While it gives you multi-word synonyms, you lose the ability to have different 
synonym dictionaries per field.
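
To make the mismatch described in the quoted message concrete, here is a small toy model in Python. It is not Lucene or Solr code, and every function name in it is made up for illustration. It only assumes that the tokenizerFactory setting on the synonym filter decides how entries in synonyms.txt are split into tokens, and that a rule fires only when its tokens appear as a contiguous run in the field's token stream.

```python
# Toy model only: NOT Lucene/Solr code. All names here are hypothetical.

def whitespace_tokenize(text):
    """Split on whitespace, roughly like a word-level tokenizer."""
    return text.split()

def keyword_tokenize(text):
    """Keep the whole input as one token, like a keyword tokenizer."""
    return [text]

def build_rules(line, rule_tokenizer):
    """Parse one comma-separated synonyms line into tuples of tokens."""
    return [tuple(rule_tokenizer(entry.strip())) for entry in line.split(",")]

def matches(stream, rules):
    """Return the rule entries that occur as a contiguous run in the stream."""
    found = []
    for rule in rules:
        n = len(rule)
        if any(tuple(stream[i:i + n]) == rule
               for i in range(len(stream) - n + 1)):
            found.append(rule)
    return found

# Same synonyms line, parsed two ways.
rules_keyword = build_rules("wordOne, phrase one", keyword_tokenize)
rules_whitespace = build_rules("wordOne, phrase one", whitespace_tokenize)

# A query or document that has been split into words before the filter runs.
stream = whitespace_tokenize("phrase one")

print(matches(stream, rules_keyword))     # [] : single-token rule never fires
print(matches(stream, rules_whitespace))  # [('phrase', 'one')]
```

In this sketch the keyword-parsed rule "phrase one" is one token, so it can only fire when the field tokenizer also emits "phrase one" as one token, which is the trade-off described in the quoted message.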

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

On 4 March 2013 at 19:40, David Sharpe <david.sha...@seekersolutions.com> wrote:

> Hello Solr mailing list,
> 
> I have read many posts and run many tests, but still I cannot get
> multi-word synonyms behaving the way I think they should. I would
> appreciate your advice.
> 
> Here is an example of the behaviour I am trying to achieve:
> 
> # Given synonyms.txt
> wordOne, phrase one
> 
> 
>   1. At index time, a document containing "wordOne" should expand to
>   "wordOne | phrase one". A query for "wordOne" or "phrase one" should find
>   the document, but a query for just "phrase" or "one" should not find the
>   document.
> 
>   2. Conversely, a document containing "phrase one" should expand to
>   "phrase one | wordOne". A query for "wordOne" or "phrase one" should find
>   the document. (Depending on field tokenization, I would also expect
>   "phrase" and "one" to find the document.)
> 
> To attempt to achieve this behaviour, I have downloaded Solr 4.1.0 and made
> the following changes to
> "solr-4.1.0\example\solr\collection1\conf\schema.xml":
> 
> https://gist.github.com/sharpedavid/5072150
> 
> 
> (Note that I set SynonymFilterFactory
> tokenizerFactory="solr.KeywordTokenizerFactory". This is to prevent
> "wordOne" from being expanded to "wordOne | phrase | one".)
> 
> Achieving the first behaviour (i.e. number one in the above list) seems
> difficult. A query for "wordOne" returns the document, but a query for
> "phrase one" returns nothing. I realized that the query tokenizer tokenized
> my query for "phrase one", so I changed the query tokenizer to
> KeywordTokenizer, which achieves the desired behaviour, but now queries are
> not tokenized at all, which breaks other desirable behaviour.
> 
> The second behaviour (i.e. number two in the above list) has similar
> problems, but no solution that I can see. If the index tokenizer is
> StandardTokenizer, "phrase one" is tokenized to "phrase | one", so the
> equivalent synonym is not matched. If I change the index tokenizer to
> KeywordTokenizer, it does match; however, KeywordTokenizer will treat the
> entire field as a single token, so a document containing "something
> phrase one something" will not match the equivalent synonym, and also a
> query for "phrase" or "one" will not find the document.
> 
> Thank you for your time.
> 
> Sincerely,
> David Sharpe
