I've been eagerly awaiting the release of the new SynonymGraphFilter in Solr 6.4. We need to support multi-word synonyms, which have always been problematic with the old SynonymFilterFactory. I've upgraded to Solr 6.4 and replaced the old filter with the new one, but I'm not yet seeing the results I had hoped for. I suspect my configuration is missing something important.
I'm starting with the simple example in the SynonymGraphFilterFactory API documentation:

    <fieldType name="text_synonym" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt"
                format="solr" ignoreCase="false" expand="true"
                tokenizerFactory="solr.WhitespaceTokenizerFactory"/>
      </analyzer>
    </fieldType>

An example entry in the synonyms.txt file is:

    booster, representative of athletics interest

My problem with the old filter has always been that if I run a query for "booster", I get results containing any of the following words: booster, representative, athletics, interest. That is far more results than I want: a document that contains only "athletics", and none of the other words in the synonym, is returned. What I really want are documents that contain either "booster" or the full synonym phrase "representative of athletics interest". How can I accomplish this?

The SynonymGraphFilter API documentation ends with the following statement:

"To get fully correct positional queries when your synonym replacements are multiple tokens, you should instead apply synonyms using this TokenFilter at query time and translate the resulting graph to a TermAutomatonQuery e.g. using TokenStreamToTermAutomatonQuery."

How do I use TokenStreamToTermAutomatonQuery? Can it be configured in Solr, or only used by writing code against Lucene? And would it even address my issue?

I've found synonyms very frustrating in Solr and am hoping this new filter will be a big improvement. Thanks in advance for the help!
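For reference, here is a minimal sketch of what I understand the Javadoc to mean at the plain Lucene level, assuming lucene-core, lucene-analyzers-common and lucene-sandbox 6.4 are on the classpath (TermAutomatonQuery and TokenStreamToTermAutomatonQuery live in the sandbox module). The class name `SynonymAutomatonQueryDemo` and its helper methods are my own, and the SynonymMap is built by hand to approximate what the `format="solr"`, `expand="true"` parser produces for my synonyms.txt line. I haven't confirmed that this is how it is meant to be used:

```java
import java.io.IOException;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.core.WhitespaceTokenizer;
import org.apache.lucene.analysis.synonym.SynonymGraphFilter;
import org.apache.lucene.analysis.synonym.SynonymMap;
import org.apache.lucene.search.TermAutomatonQuery;
import org.apache.lucene.search.TokenStreamToTermAutomatonQuery;
import org.apache.lucene.util.CharsRef;
import org.apache.lucene.util.CharsRefBuilder;

// Hypothetical demo class, not part of Solr or Lucene.
public class SynonymAutomatonQueryDemo {

  // Approximates the synonyms.txt line
  // "booster, representative of athletics interest" with expand="true":
  // each side maps to the other and the original tokens are kept.
  static SynonymMap buildSynonyms() throws IOException {
    SynonymMap.Builder builder = new SynonymMap.Builder(true); // dedup = true
    CharsRef single = new CharsRef("booster");
    // Multi-word entries are joined with SynonymMap.WORD_SEPARATOR.
    CharsRef multi = SynonymMap.Builder.join(
        new String[] {"representative", "of", "athletics", "interest"},
        new CharsRefBuilder());
    builder.add(single, multi, true);
    builder.add(multi, single, true);
    return builder.build();
  }

  // Query-time analyzer mirroring the fieldType above:
  // whitespace tokenizer followed by SynonymGraphFilter (ignoreCase=false).
  static Analyzer queryAnalyzer(final SynonymMap synonyms) {
    return new Analyzer() {
      @Override
      protected TokenStreamComponents createComponents(String fieldName) {
        Tokenizer tokenizer = new WhitespaceTokenizer();
        TokenStream graph = new SynonymGraphFilter(tokenizer, synonyms, false);
        return new TokenStreamComponents(tokenizer, graph);
      }
    };
  }

  // Analyzes the query text and translates the resulting token graph
  // into a TermAutomatonQuery on the given field.
  static TermAutomatonQuery toQuery(Analyzer analyzer, String field, String text)
      throws IOException {
    try (TokenStream ts = analyzer.tokenStream(field, text)) {
      // toQuery() resets and ends the stream; try-with-resources closes it.
      return new TokenStreamToTermAutomatonQuery().toQuery(field, ts);
    }
  }

  public static void main(String[] args) throws IOException {
    SynonymMap synonyms = buildSynonyms();
    try (Analyzer analyzer = queryAnalyzer(synonyms)) {
      TermAutomatonQuery q = toQuery(analyzer, "text_synonym", "booster");
      // The resulting query could then be run with an IndexSearcher.
      System.out.println(q);
    }
  }
}
```

If this is right, the query for "booster" should match either the single term or the full four-word sequence at consecutive positions, rather than any one of the individual words. But that would mean writing a custom query parser plugin rather than just changing schema configuration, which is why I'm asking whether there's a configuration-only route.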