Hello Solr mailing list,

I have read many posts and run many tests, but I still cannot get multi-word synonyms to behave the way I think they should. I would appreciate your advice.

Here is an example of the behaviour I am trying to achieve:

    # Given synonyms.txt
    wordOne, phrase one

1. At index time, a document containing "wordOne" should expand to "wordOne | phrase one". A query for "wordOne" or "phrase one" should find the document, but a query for just "phrase" or "one" should not find the document.
2. Conversely, a document containing "phrase one" should expand to "phrase one | wordOne". A query for "wordOne" or "phrase one" should find the document. (Depending on field tokenization, I would also expect "phrase" and "one" to find the document.)

To attempt to achieve this behaviour, I have downloaded Solr 4.1.0 and made the following changes to "solr-4.1.0\example\solr\collection1\conf\schema.xml":

https://gist.github.com/sharpedavid/5072150

(Note that I set tokenizerFactory="solr.KeywordTokenizerFactory" on the SynonymFilterFactory. This is to prevent "wordOne" from being expanded to "wordOne | phrase | one".)

Achieving the first behaviour (number one in the list above) seems difficult. A query for "wordOne" returns the document, but a query for "phrase one" returns nothing. I realized that the query tokenizer was splitting my query for "phrase one" into two tokens, so I changed the query tokenizer to KeywordTokenizer. That achieves the desired behaviour, but now queries are not tokenized at all, which breaks other desirable behaviour.

The second behaviour (number two in the list above) has similar problems, but no solution that I can see. If the index tokenizer is StandardTokenizer, "phrase one" is tokenized to "phrase | one", so the equivalent synonym is not matched. If I change the index tokenizer to KeywordTokenizer, it does match; however, KeywordTokenizer treats the entire field as a single token, so a document containing "something phrase one something" will not match the equivalent synonym, and a query for "phrase" or "one" will not find the document.

Thank you for your time.

Sincerely,
David Sharpe
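
P.S. To make the expected behaviour precise, here is a minimal sketch in plain Python of the matching rules described above. It is not Solr code and says nothing about how Solr's analyzers work internally. The function names (load_synonyms, index_terms, matches) are hypothetical and exist only for this illustration, and the sketch assumes simple whitespace tokenization with lowercasing.

```python
def load_synonyms(lines):
    """Parse synonyms.txt-style lines into groups of equivalent entries."""
    groups = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        groups.append([" ".join(e.lower().split()) for e in line.split(",")])
    return groups


def index_terms(text, groups):
    """Return the set of terms a document should be findable by.

    Assumption: every single token is a term, and when any entry of a
    synonym group occurs in the text (as a run of consecutive tokens),
    every entry of that group is added as ONE whole term.
    """
    tokens = text.lower().split()
    terms = set(tokens)
    for group in groups:
        for entry in group:
            words = entry.split()
            n = len(words)
            if any(tokens[i:i + n] == words
                   for i in range(len(tokens) - n + 1)):
                terms.update(group)  # multi-word entries stay unsplit
                break
    return terms


def matches(query, terms):
    """A query matches as a whole phrase, or if each of its tokens is a term."""
    q = " ".join(query.lower().split())
    if not q:
        return False
    if q in terms:
        return True
    return all(tok in terms for tok in q.split())


groups = load_synonyms(["wordOne, phrase one"])
doc_one = index_terms("wordOne", groups)      # behaviour 1
doc_two = index_terms("phrase one", groups)   # behaviour 2
```

With these definitions, doc_one should be found by "wordOne" and "phrase one" but not by "phrase" or "one" alone, while doc_two should be found by all four queries. A document such as "something phrase one something" should also be found by "wordOne".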