Re: Multi word synonyms

John Blythe Sun, 26 Mar 2017 09:31:08 -0700

I use the keyword tokenizer and then a pattern replace filter to turn
multi-word phrases into underscore-connected tokens. For instance,
"Burger Joint" becomes "burger_joint", which the synonym filter then
matches against underscored synonyms. Once it matches, I either replace
the underscores with spaces or pass the token to the word delimiter
filter factory before further processing.
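
A minimal sketch of what such a chain might look like, not a tested
config. The field type name and the synonyms.txt entry are made up for
illustration, and the lowercase step is an assumption based on the
"burger_joint" example. Note that the keyword tokenizer emits the whole
field value as one token, so this probably suits short fields (names,
categories) better than free text.

```xml
<!-- Sketch only: the field type name and the synonyms.txt entry are
     illustrative. synonyms.txt would hold underscored entries such as:
       coke, coca_cola -->
<fieldType name="text_phrase_syn" class="solr.TextField">
  <analyzer>
    <!-- Emit the whole field value as a single token -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- "burger joint" becomes "burger_joint" -->
    <filter class="solr.PatternReplaceFilterFactory"
            pattern="\s+" replacement="_" replace="all"/>
    <!-- Match the underscored token against underscored synonyms -->
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <!-- "burger_joint" becomes "burger joint" again -->
    <filter class="solr.PatternReplaceFilterFactory"
            pattern="_" replacement=" " replace="all"/>
  </analyzer>
</fieldType>
```

The last filter could be swapped for the word delimiter filter if the
underscored token should be split back into separate words instead.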



On Sun, Mar 26, 2017 at 11:53 AM Sanjana Sridhar <
[email protected]> wrote:

> Hello,
>
> Does anyone have a good solution for working with multi word synonyms? I've
> been reading a lot about this online and haven't really found a great
> solution to it. I use the SynonymFilterFactory at index time, but words
> don't really get matched to the appropriate multi word synonyms, even
> though using the Analysis tool shows that it should be matched.
>
> Examples:
>
> coke, coca cola
>
>
>
> This is the configuration I have on text fields:
>
> <fieldType name ="text_icu_english" class="solr.TextField"
> positionIncrementGap="100" multiValued="true">
>         <analyzer type="index">
>         <!-- The white space tokenizer splits on white space but preserves
> the tokens so that it can be used by the next filter -->
>         <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>         <filter class="solr.SynonymFilterFactory" ignoreCase="true" expand=
> "true" synonyms="synonyms.txt" />
>         <!-- This filter splits a word on punctuation, preserves the
> original, concatenates the split words and also stems english possessive
> nouns -->
>         <filter class="solr.WordDelimiterFilterFactory"
> generateWordParts="0" generateNumberParts = "0"
>           splitOnCaseChange = "0" preserveOriginal="1" catenateWords="1"/>
>         <filter class="solr.LowerCaseFilterFactory"/>
>         <filter class="solr.EnglishMinimalStemFilterFactory"/>
>         <filter class="solr.ICUFoldingFilterFactory"/>
>         <filter class="solr.PatternReplaceFilterFactory"
> pattern="(.*[\*].*)"  replacement=""/>
>         <filter class="solr.TrimFilterFactory"/>
>         <filter class="solr.LengthFilterFactory" min="1" max="100"/>
>         <filter class="solr.ClassicFilterFactory"/>
>
>       </analyzer>
>       <analyzer type="query">
>         <!-- The white space tokenizer splits on white space but preserves
> the tokens so that it can be used by the next filter -->
>          <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>          <!-- This filter splits a word on punctuation, preserves the
> original, concatenates the split words and also stems english possessive
> nouns -->
>          <filter class="solr.WordDelimiterFilterFactory"
> generateWordParts="0" generateNumberParts = "0"
>           splitOnCaseChange = "0" preserveOriginal="1" catenateWords="1"/>
>         <filter class="solr.LowerCaseFilterFactory"/>
>         <filter class="solr.EnglishMinimalStemFilterFactory"/>
>         <filter class="solr.ICUFoldingFilterFactory"/>
>         <filter class="solr.ClassicFilterFactory"/>
>       </analyzer>
>       <similarity class="solr.BM25SimilarityFactory">
>         <float name="b">0.0</float>
>       </similarity>
>     </fieldType>
>
>
> Greatly appreciate any help y'all can offer.
>
> Thanks,
> Sanjana
>
-- 
*John Blythe*
Product Manager & Lead Developer

251.605.3071 | [email protected]
www.curvolabs.com

58 Adams Ave
Evansville, IN 47713
