Re: Multi word synonyms

Doug Turnbull Sun, 26 Mar 2017 16:37:06 -0700

You might have stumbled on all these articles already, but you can probably
read our org's progression with this problem as a play in 3 acts:


Act I: Introducing the characters
http://opensourceconnections.com/blog/2013/10/27/why-is-multi-term-synonyms-so-hard-in-solr/

Act II: Heroes Meet Despair
http://opensourceconnections.com/blog/2016/06/23/solr-multi-word-synonym-solutions-2016/

Act III: Triumph
We use a combination of these techniques:
http://opensourceconnections.com/blog/2016/12/02/solr-elasticsearch-synonyms-better-patterns-keyphrases/
http://opensourceconnections.com/blog/2016/12/23/elasticsearch-synonyms-patterns-taxonomies/

These are made possible in Solr by our Match Query Parser, which IMO is the
most satisfactory solution. I'm of course biased, given that we created it.

http://opensourceconnections.com/blog/2017/01/23/our-solution-to-solr-multiterm-synonyms/


All of these articles also point towards other solutions, such as the
auto-phrasing query parser/token filter and hon-lucene-synonyms.
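
For comparison, the underscore approach John describes in the quoted message
below might look roughly like the following field type. This is only a sketch
under stated assumptions, not his actual configuration: the field type name
text_underscore_syn and the file synonyms_underscored.txt are hypothetical, and
the filter order is an assumption. Because the keyword tokenizer keeps the
whole value as a single token, a synonym only matches when the entire field
value or query equals an entry, so this probably suits short fields best.

```xml
<!-- Hypothetical field type illustrating the underscore technique -->
<fieldType name="text_underscore_syn" class="solr.TextField"
           positionIncrementGap="100">
  <analyzer>
    <!-- Keep the whole input as one token, e.g. "Burger Joint" -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Join words with underscores: "burger joint" becomes "burger_joint" -->
    <filter class="solr.PatternReplaceFilterFactory"
            pattern="\s+" replacement="_" replace="all"/>
    <!-- Hypothetical synonyms file with underscored entries, for example
         a line such as: coke, coca_cola -->
    <filter class="solr.SynonymFilterFactory" ignoreCase="true"
            expand="true" synonyms="synonyms_underscored.txt"/>
    <!-- Turn the underscores back into spaces after synonym expansion -->
    <filter class="solr.PatternReplaceFilterFactory"
            pattern="_" replacement=" " replace="all"/>
  </analyzer>
</fieldType>
```

The last filter could instead be a WordDelimiterFilterFactory, as John
mentions, if the expanded terms should be split back into separate tokens.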

On Sun, Mar 26, 2017 at 7:05 PM John Blythe <j...@curvolabs.com> wrote:

> Sure thing. Post back with what you find!
>
> Good luck-
>
> On Sun, Mar 26, 2017 at 3:36 PM Sanjana Sridhar <sanjana.srid...@flipp.com> wrote:
>
> > Hi John,
> >
> > Thanks for letting me know what works for you. I'm going to try that out.
> > Sounds like a suitable solution to my problem.
> >
> > Best,
> > Sanjana
> >
> >
> >
> > On Sun, Mar 26, 2017 at 12:30 PM, John Blythe <j...@curvolabs.com> wrote:
> >
> > > I use the keyword tokenizer and then a pattern replace to transform
> > > multi-word phrases into underscore-connected tokens. For instance,
> > > "Burger Joint" transforms to "burger_joint", which is then looked up in
> > > my synonym filter for underscored synonyms. When it matches, I then
> > > replace the underscores with spaces or just pass it over to the word
> > > delimiter filter factory before further processing.
> > >
> > >
> > > On Sun, Mar 26, 2017 at 11:53 AM Sanjana Sridhar <sanjana.srid...@wishabi.com> wrote:
> > >
> > > > Hello,
> > > >
> > > > Does anyone have a good solution for working with multi word
> > > > synonyms? I've been reading a lot about this online and haven't
> > > > really found a great solution to it. I use the SynonymFilterFactory
> > > > at index time, but words don't really get matched to the appropriate
> > > > multi word synonyms, even though using the Analysis tool shows that
> > > > it should be matched.
> > > >
> > > > Examples:
> > > >
> > > > coke, coca cola
> > > >
> > > >
> > > >
> > > > This is the configuration I have on text fields:
> > > >
> > > > <fieldType name="text_icu_english" class="solr.TextField"
> > > >            positionIncrementGap="100" multiValued="true">
> > > >   <analyzer type="index">
> > > >     <!-- The white space tokenizer splits on white space but preserves
> > > >          the tokens so that it can be used by the next filter -->
> > > >     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> > > >     <filter class="solr.SynonymFilterFactory" ignoreCase="true"
> > > >             expand="true" synonyms="synonyms.txt"/>
> > > >     <!-- This filter splits a word on punctuation, preserves the
> > > >          original, concatenates the split words and also stems English
> > > >          possessive nouns -->
> > > >     <filter class="solr.WordDelimiterFilterFactory"
> > > >             generateWordParts="0" generateNumberParts="0"
> > > >             splitOnCaseChange="0" preserveOriginal="1"
> > > >             catenateWords="1"/>
> > > >     <filter class="solr.LowerCaseFilterFactory"/>
> > > >     <filter class="solr.EnglishMinimalStemFilterFactory"/>
> > > >     <filter class="solr.ICUFoldingFilterFactory"/>
> > > >     <filter class="solr.PatternReplaceFilterFactory"
> > > >             pattern="(.*[\*].*)" replacement=""/>
> > > >     <filter class="solr.TrimFilterFactory"/>
> > > >     <filter class="solr.LengthFilterFactory" min="1" max="100"/>
> > > >     <filter class="solr.ClassicFilterFactory"/>
> > > >   </analyzer>
> > > >   <analyzer type="query">
> > > >     <!-- The white space tokenizer splits on white space but preserves
> > > >          the tokens so that it can be used by the next filter -->
> > > >     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> > > >     <!-- This filter splits a word on punctuation, preserves the
> > > >          original, concatenates the split words and also stems English
> > > >          possessive nouns -->
> > > >     <filter class="solr.WordDelimiterFilterFactory"
> > > >             generateWordParts="0" generateNumberParts="0"
> > > >             splitOnCaseChange="0" preserveOriginal="1"
> > > >             catenateWords="1"/>
> > > >     <filter class="solr.LowerCaseFilterFactory"/>
> > > >     <filter class="solr.EnglishMinimalStemFilterFactory"/>
> > > >     <filter class="solr.ICUFoldingFilterFactory"/>
> > > >     <filter class="solr.ClassicFilterFactory"/>
> > > >   </analyzer>
> > > >   <similarity class="solr.BM25SimilarityFactory">
> > > >     <float name="b">0.0</float>
> > > >   </similarity>
> > > > </fieldType>
> > > >
> > > >
> > > > Greatly appreciate any help y'all can offer.
> > > >
> > > > Thanks,
> > > > Sanjana
> > > >
> > > > --
> > > > IMPORTANT NOTICE:  This message, including any attachments
> > > > (hereinafter collectively referred to as "Communication"), is
> > > > intended only for the addressee(s) named above.  This Communication
> > > > may include information that is privileged, confidential and exempt
> > > > from disclosure under applicable law.  If the recipient of this
> > > > Communication is not the intended recipient, or the employee or agent
> > > > responsible for delivering this Communication to the intended
> > > > recipient, you are notified that any dissemination, distribution or
> > > > copying of this Communication is strictly prohibited.  If you have
> > > > received this Communication in error, please notify the sender
> > > > immediately by phone or email and permanently delete this
> > > > Communication from your computer without making a copy. Thank you.
> > > >
> > > --
> > > --
> > > *John Blythe*
> > > Product Manager & Lead Developer
> > >
> > > 251.605.3071 | j...@curvolabs.com
> > > www.curvolabs.com
> > >
> > > 58 Adams Ave
> > > Evansville, IN 47713
> > >
> >
> >
> >
> > --
> >
> > <http://corp.flipp.com/>
> >
> > Sanjana Sridhar
> > Flipp Corporation
> >
> > p: 647-217-3599
> > e: sanjana.srid...@flipp.com
> >
> >
>
