You may have already come across these articles, but you can read our org's progression with this problem as a play in three acts:

Act I: Introducing the characters
http://opensourceconnections.com/blog/2013/10/27/why-is-multi-term-synonyms-so-hard-in-solr/

Act II: Heroes Meet Despair
http://opensourceconnections.com/blog/2016/06/23/solr-multi-word-synonym-solutions-2016/

Act III: Triumph
We use a combination of these techniques:
http://opensourceconnections.com/blog/2016/12/02/solr-elasticsearch-synonyms-better-patterns-keyphrases/
http://opensourceconnections.com/blog/2016/12/23/elasticsearch-synonyms-patterns-taxonomies/

Made possible in Solr with our Match Query Parser, which IMO is the most satisfactory solution. I'm of course biased given we created it:
http://opensourceconnections.com/blog/2017/01/23/our-solution-to-solr-multiterm-synonyms/

All of these articles also point towards other solutions, like the auto phrasing query parser/token filter and hon-lucene-synonyms.

On Sun, Mar 26, 2017 at 7:05 PM John Blythe <j...@curvolabs.com> wrote:

> Sure thing. Post back w what you find!
>
> Good luck-
>
> On Sun, Mar 26, 2017 at 3:36 PM Sanjana Sridhar <sanjana.srid...@flipp.com> wrote:
>
> > Hi John,
> >
> > Thanks for letting me know what works for you. I'm going to try that out.
> > Sounds like a suitable solution to my problem.
> >
> > Best,
> > Sanjana
> >
> > On Sun, Mar 26, 2017 at 12:30 PM, John Blythe <j...@curvolabs.com> wrote:
> >
> > > I use the keyword tokenizer and then pattern replace to transform multi
> > > words into underscore connected tokens. For instance, "Burger Joint"
> > > transforms to "burger_joint" which then looks in my synonym filter for
> > > underscored synonyms. When it matches I then replace underscores with
> > > spaces or just toss over to the word delimiter filter factory before
> > > further processing
> > >
> > > On Sun, Mar 26, 2017 at 11:53 AM Sanjana Sridhar <sanjana.srid...@wishabi.com> wrote:
> > >
> > > > Hello,
> > > >
> > > > Does anyone have a good solution for working with multi word synonyms?
> > > > I've been reading a lot about this online and haven't really found a great
> > > > solution to it. I use the SynonymFilterFactory at index time, but words
> > > > don't really get matched to the appropriate multi word synonyms, even
> > > > though using the Analysis tool shows that it should be matched.
> > > >
> > > > Examples:
> > > >
> > > > coke, coca cola
> > > >
> > > > This is the configuration I have on text fields:
> > > >
> > > > <fieldType name="text_icu_english" class="solr.TextField"
> > > >            positionIncrementGap="100" multiValued="true">
> > > >   <analyzer type="index">
> > > >     <!-- The white space tokenizer splits on white space but preserves
> > > >          the tokens so that it can be used by the next filter -->
> > > >     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> > > >     <filter class="solr.SynonymFilterFactory" ignoreCase="true"
> > > >             expand="true" synonyms="synonyms.txt"/>
> > > >     <!-- This filter splits a word on punctuation, preserves the
> > > >          original, concatenates the split words and also stems english
> > > >          possessive nouns -->
> > > >     <filter class="solr.WordDelimiterFilterFactory"
> > > >             generateWordParts="0" generateNumberParts="0"
> > > >             splitOnCaseChange="0" preserveOriginal="1"
> > > >             catenateWords="1"/>
> > > >     <filter class="solr.LowerCaseFilterFactory"/>
> > > >     <filter class="solr.EnglishMinimalStemFilterFactory"/>
> > > >     <filter class="solr.ICUFoldingFilterFactory"/>
> > > >     <filter class="solr.PatternReplaceFilterFactory"
> > > >             pattern="(.*[\*].*)" replacement=""/>
> > > >     <filter class="solr.TrimFilterFactory"/>
> > > >     <filter class="solr.LengthFilterFactory" min="1" max="100"/>
> > > >     <filter class="solr.ClassicFilterFactory"/>
> > > >   </analyzer>
> > > >   <analyzer type="query">
> > > >     <!-- The white space tokenizer splits on white space but preserves
> > > >          the tokens so that it can be used by the next filter -->
> > > >     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> > > >     <!-- This filter splits a word on punctuation, preserves the
> > > >          original, concatenates the split words and also stems english
> > > >          possessive nouns -->
> > > >     <filter class="solr.WordDelimiterFilterFactory"
> > > >             generateWordParts="0" generateNumberParts="0"
> > > >             splitOnCaseChange="0" preserveOriginal="1"
> > > >             catenateWords="1"/>
> > > >     <filter class="solr.LowerCaseFilterFactory"/>
> > > >     <filter class="solr.EnglishMinimalStemFilterFactory"/>
> > > >     <filter class="solr.ICUFoldingFilterFactory"/>
> > > >     <filter class="solr.ClassicFilterFactory"/>
> > > >   </analyzer>
> > > >   <similarity class="solr.BM25SimilarityFactory">
> > > >     <float name="b">0.0</float>
> > > >   </similarity>
> > > > </fieldType>
> > > >
> > > > Greatly appreciate any help ya'll can offer.
> > > >
> > > > Thanks,
> > > > Sanjana
> > > >
> > > > --
> > > > IMPORTANT NOTICE: This message, including any attachments (hereinafter
> > > > collectively referred to as "Communication"), is intended only for the
> > > > addressee(s) named above. This Communication may include information that is
> > > > privileged, confidential and exempt from disclosure under applicable law.
> > > > If the recipient of this Communication is not the intended recipient, or
> > > > the employee or agent responsible for delivering this Communication to the
> > > > intended recipient, you are notified that any dissemination, distribution
> > > > or copying of this Communication is strictly prohibited. If you have
> > > > received this Communication in error, please notify the sender immediately
> > > > by phone or email and permanently delete this Communication from your
> > > > computer without making a copy. Thank you.
> > >
> > > --
> > > *John Blythe*
> > > Product Manager & Lead Developer
> > >
> > > 251.605.3071 | j...@curvolabs.com
> > > www.curvolabs.com
> > >
> > > 58 Adams Ave
> > > Evansville, IN 47713
> >
> > --
> > <http://corp.flipp.com/>
> >
> > Sanjana Sridhar
> > Flipp Corporation
> >
> > p: 647-217-3599
> > e: sanjana.srid...@flipp.com
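
For anyone who wants to try the underscore approach John describes in the quoted thread, it might look roughly like the field type below. This is only a sketch under my own assumptions, not his actual config: the field type name and the synonyms_underscored.txt file are hypothetical, and the filter order is my guess at what he means. Note that because the keyword tokenizer emits the entire value as one token, this as written only matches when the whole field value (or the whole analyzed query string) is the phrase, so it suits short fields such as names or categories better than long free text.

```xml
<!-- Hypothetical field type; name and synonyms file are assumptions, not from the thread -->
<fieldType name="text_underscore_synonyms" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Emit the whole value as a single token, e.g. "Burger Joint" -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <!-- "Burger Joint" becomes "burger joint" -->
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Join the words with underscores: "burger joint" becomes "burger_joint" -->
    <filter class="solr.PatternReplaceFilterFactory"
            pattern="\s+" replacement="_" replace="all"/>
    <!-- The assumed synonyms file holds underscored entries, one rule per line,
         for example: coke, coca_cola -->
    <filter class="solr.SynonymFilterFactory" ignoreCase="true"
            expand="true" synonyms="synonyms_underscored.txt"/>
    <!-- Turn the underscores back into spaces before any further processing -->
    <filter class="solr.PatternReplaceFilterFactory"
            pattern="_" replacement=" " replace="all"/>
  </analyzer>
</fieldType>
```

Running a value like "Coca Cola" through the Analysis screen for a field of this type should show whether the underscored synonym is actually being picked up before you reindex anything.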