Hi,

I have been using this plugin with success:
https://github.com/healthonnet/hon-lucene-synonyms

While it gives you multi-word synonyms, you lose the ability to have different synonym dictionaries per field.
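
To illustrate the general idea of handling multi-word synonyms at query time, here is a minimal Python sketch that expands the raw query string before any field tokenizer gets a chance to split "phrase one" into "phrase | one". It is not the plugin's code and it does not use any Solr API; the function names parse_synonyms and expand_query are hypothetical, and lowercasing plus whitespace splitting are simplifying assumptions.

```python
from itertools import product


def parse_synonyms(lines):
    """Parse 'wordOne, phrase one' style lines into groups of token tuples.

    Assumption: entries are comma-separated and lowercased for matching.
    """
    groups = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        group = [tuple(term.lower().split())
                 for term in line.split(",") if term.strip()]
        groups.append(group)
    return groups


def expand_query(query, groups):
    """Return every variant of the query with synonym phrases substituted.

    Matching is done on whole phrases (longest match first), so a lone
    'phrase' or 'one' never triggers the 'phrase one' synonym.
    """
    tokens = query.lower().split()
    lookup = {phrase: group for group in groups for phrase in group}
    max_len = max((len(phrase) for phrase in lookup), default=1)

    slots = []  # one list of alternatives per matched span
    i = 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = tuple(tokens[i:i + n])
            if candidate in lookup:
                slots.append([" ".join(p) for p in lookup[candidate]])
                i += n
                break
        else:
            # No synonym phrase starts here; keep the token unchanged.
            slots.append([tokens[i]])
            i += 1
    return [" ".join(choice) for choice in product(*slots)]


if __name__ == "__main__":
    groups = parse_synonyms(["wordOne, phrase one"])
    for q in ["wordOne", "phrase one", "phrase", "something phrase one something"]:
        print(q, "->", expand_query(q, groups))
```

Each returned variant could then be sent as a phrase query, which is why the ordinary tokenizer can stay on the field. The sketch also shows the trade-off mentioned above: a single synonym list is applied to the whole query rather than one list per field.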

-- 
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

On 4 March 2013 at 19:40, David Sharpe <david.sha...@seekersolutions.com> wrote:

> Hello Solr mailing list,
>
> I have read many posts and run many tests, but still I cannot get
> multi-word synonyms behaving the way I think they should. I would
> appreciate your advice.
>
> Here is an example of the behaviour I am trying to achieve:
>
> # Given synonyms.txt
> wordOne, phrase one
>
> 1. At index time, a document containing "wordOne" should expand to
>    "wordOne | phrase one". A query for "wordOne" or "phrase one" should
>    find the document, but a query for just "phrase" or "one" should not
>    find the document.
>
> 2. Conversely, a document containing "phrase one" should expand to
>    "phrase one | wordOne". A query for "wordOne" or "phrase one" should
>    find the document. (Depending on field tokenization, I would also
>    expect "phrase" and "one" to find the document.)
>
> To attempt to achieve this behaviour, I have downloaded Solr 4.1.0 and
> made the following changes to
> "solr-4.1.0\example\solr\collection1\conf\schema.xml":
>
> https://gist.github.com/sharpedavid/5072150
>
> (Note that I set SynonymFilterFactory
> tokenizerFactory="solr.KeywordTokenizerFactory". This is to prevent
> "wordOne" from being expanded to "wordOne | phrase | one".)
>
> Achieving the first behaviour (i.e. number one in the above list) seems
> difficult. A query for "wordOne" returns the document, but a query for
> "phrase one" returns nothing. I realized that the query tokenizer
> tokenized my query for "phrase one", so I changed the query tokenizer to
> KeywordTokenizer, which achieves the desired behaviour, but now queries
> are not tokenized at all, which breaks other desirable behaviour.
>
> The second behaviour (i.e. number two in the above list) has similar
> problems, but no solution that I can see. If the index tokenizer is
> StandardTokenizer, "phrase one" is tokenized to "phrase | one", so the
> equivalent synonym is not matched. If I change the index tokenizer to
> KeywordTokenizer, it does match; however, KeywordTokenizer will treat
> the entire field as a single token, so a document containing "something
> phrase one something" will not match the equivalent synonym, and also a
> query for "phrase" or "one" will not find the document.
>
> Thank you for your time.
>
> Sincerely,
> David Sharpe