Hi John,

Thanks for letting me know what works for you. I'm going to try that out.
Sounds like a suitable solution to my problem.

Best,
Sanjana



On Sun, Mar 26, 2017 at 12:30 PM, John Blythe <j...@curvolabs.com> wrote:

> I use the keyword tokenizer and then a pattern replace filter to turn
> multi-word terms into underscore-connected tokens. For instance, "Burger
> Joint" becomes "burger_joint", which is then looked up in my synonym filter
> against underscored synonyms. When it matches, I either replace the
> underscores with spaces or hand the token to the word delimiter filter
> factory before further processing.
>
>
> On Sun, Mar 26, 2017 at 11:53 AM Sanjana Sridhar <
> sanjana.srid...@wishabi.com> wrote:
>
> > Hello,
> >
> > Does anyone have a good solution for working with multi-word synonyms?
> > I've been reading a lot about this online and haven't really found a great
> > solution. I use the SynonymFilterFactory at index time, but words
> > don't really get matched to the appropriate multi-word synonyms, even
> > though the Analysis tool shows that they should be matched.
> >
> > Examples:
> >
> > coke, coca cola
> >
> >
> >
> > This is the configuration I have on text fields:
> >
> > <fieldType name="text_icu_english" class="solr.TextField"
> >            positionIncrementGap="100" multiValued="true">
> >       <analyzer type="index">
> >         <!-- The white space tokenizer splits on white space but preserves
> >              the tokens so that it can be used by the next filter -->
> >         <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> >         <filter class="solr.SynonymFilterFactory" ignoreCase="true"
> >                 expand="true" synonyms="synonyms.txt"/>
> >         <!-- This filter splits a word on punctuation, preserves the
> >              original, concatenates the split words and also stems english
> >              possessive nouns -->
> >         <filter class="solr.WordDelimiterFilterFactory"
> >                 generateWordParts="0" generateNumberParts="0"
> >                 splitOnCaseChange="0" preserveOriginal="1"
> >                 catenateWords="1"/>
> >         <filter class="solr.LowerCaseFilterFactory"/>
> >         <filter class="solr.EnglishMinimalStemFilterFactory"/>
> >         <filter class="solr.ICUFoldingFilterFactory"/>
> >         <filter class="solr.PatternReplaceFilterFactory"
> > pattern="(.*[\*].*)"  replacement=""/>
> >         <filter class="solr.TrimFilterFactory"/>
> >         <filter class="solr.LengthFilterFactory" min="1" max="100"/>
> >         <filter class="solr.ClassicFilterFactory"/>
> >
> >       </analyzer>
> >       <analyzer type="query">
> >         <!-- The white space tokenizer splits on white space but preserves
> >              the tokens so that it can be used by the next filter -->
> >         <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> >         <!-- This filter splits a word on punctuation, preserves the
> >              original, concatenates the split words and also stems english
> >              possessive nouns -->
> >         <filter class="solr.WordDelimiterFilterFactory"
> >                 generateWordParts="0" generateNumberParts="0"
> >                 splitOnCaseChange="0" preserveOriginal="1"
> >                 catenateWords="1"/>
> >         <filter class="solr.LowerCaseFilterFactory"/>
> >         <filter class="solr.EnglishMinimalStemFilterFactory"/>
> >         <filter class="solr.ICUFoldingFilterFactory"/>
> >         <filter class="solr.ClassicFilterFactory"/>
> >       </analyzer>
> >       <similarity class="solr.BM25SimilarityFactory">
> >         <float name="b">0.0</float>
> >       </similarity>
> >     </fieldType>
> >
> >
> > Greatly appreciate any help y'all can offer.
> >
> > Thanks,
> > Sanjana
> >
> > --
> > IMPORTANT NOTICE:  This message, including any attachments (hereinafter
> > collectively referred to as "Communication"), is intended only for the
> > addressee(s) named above.  This Communication may include information that
> > is privileged, confidential and exempt from disclosure under applicable
> > law.  If the recipient of this Communication is not the intended recipient,
> > or the employee or agent responsible for delivering this Communication to
> > the intended recipient, you are notified that any dissemination,
> > distribution or copying of this Communication is strictly prohibited.  If
> > you have received this Communication in error, please notify the sender
> > immediately by phone or email and permanently delete this Communication
> > from your computer without making a copy. Thank you.
> >
> --
> *John Blythe*
> Product Manager & Lead Developer
>
> 251.605.3071 | j...@curvolabs.com
> www.curvolabs.com
>
> 58 Adams Ave
> Evansville, IN 47713
>
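
The approach John describes above could be sketched as a Solr field type
along these lines. This is only an illustration under stated assumptions:
the field type name text_multiword_syn and the file synonyms_underscored.txt
are made up for the example, and the thread does not show his actual schema
or the exact order of his filters.

```xml
<!-- Hypothetical field type; its name and the synonyms file name are
     illustrative and do not come from the thread. -->
<fieldType name="text_multiword_syn" class="solr.TextField"
           positionIncrementGap="100">
  <analyzer>
    <!-- Emit the whole field value as a single token, e.g. "Burger Joint" -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.TrimFilterFactory"/>
    <!-- Join words with underscores: "burger joint" becomes "burger_joint" -->
    <filter class="solr.PatternReplaceFilterFactory"
            pattern="\s+" replacement="_" replace="all"/>
    <!-- The assumed synonyms file would hold underscored entries such as:
         coke, coca_cola -->
    <filter class="solr.SynonymFilterFactory" ignoreCase="true"
            expand="true" synonyms="synonyms_underscored.txt"/>
    <!-- Turn underscores back into spaces after the synonym lookup -->
    <filter class="solr.PatternReplaceFilterFactory"
            pattern="_" replacement=" " replace="all"/>
  </analyzer>
</fieldType>
```

Because the keyword tokenizer emits the entire field value as one token, a
synonym only matches when the whole value equals an entry in the file. A
setup like this therefore suits short fields such as names or titles rather
than long free text.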



-- 

<http://corp.flipp.com/>

Sanjana Sridhar
Flipp Corporation

p: 647-217-3599
e: sanjana.srid...@flipp.com

