Re: WordDelimiterFilterFactory with Wildcards

Webster Homer Wed, 26 Jul 2017 11:14:12 -0700

1. KeywordTokenizer. We want to treat the entire field as a single term to
parse.
2. preserveOriginal="0". I thought about changing this to 1.
3. Solr 6.2.2


This is the fieldType:
    <fieldType name="cas_num_tokenizer" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <filter class="solr.TrimFilterFactory"/>
        <filter class="solr.WordDelimiterFilterFactory"
                generateWordParts="0"
                splitOnCaseChange="0"
                splitOnNumerics="1"
                generateNumberParts="0"
                catenateWords="0"
                catenateNumbers="1"
                catenateAll="0"
                preserveOriginal="0"
                stemEnglishPossessive="0"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <filter class="solr.TrimFilterFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
                ignoreCase="true" expand="true"
                tokenizerFactory="solr.KeywordTokenizerFactory"/>
        <!-- remove non-cas queries and junk from synonyms -->
        <filter class="solr.PatternReplaceFilterFactory"
                pattern="^.*([^- 0-9*]+).*$" replacement="" replace="all"/>
        <filter class="solr.WordDelimiterFilterFactory"
                generateWordParts="0"
                splitOnCaseChange="0"
                splitOnNumerics="1"
                generateNumberParts="0"
                catenateWords="0"
                catenateNumbers="1"
                catenateAll="0"
                preserveOriginal="0"
                stemEnglishPossessive="0"/>
      </analyzer>
    </fieldType>
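A rough way to see why the wildcard disappears is to emulate the query-side chain above. The Python sketch below is an approximation written for illustration only, not Lucene's implementation: the function names are hypothetical, the SynonymFilterFactory step is skipped, it handles digit-only input only, and it assumes that Java and Python treat this simple regex the same way and that WordDelimiterFilter with generateNumberParts="0" and catenateNumbers="1" emits just the catenated digits, treating every non-alphanumeric character (including '*') as a delimiter.

```python
import re
from typing import List

# Same pattern as the PatternReplaceFilterFactory in the query analyzer:
# a token containing any character other than '-', ' ', a digit or '*'
# is replaced by the empty string.
NON_CAS = re.compile(r"^.*([^- 0-9*]+).*$")


def strip_non_cas(token: str) -> str:
    """Blank out tokens that cannot be (partial) CAS numbers."""
    return NON_CAS.sub("", token)


def wdf_catenate_numbers(token: str) -> List[str]:
    """Approximate WordDelimiterFilter with generateNumberParts="0",
    catenateNumbers="1", preserveOriginal="0", for digit-only input.

    Every non-digit character (including '*') is treated as a delimiter,
    and only the catenation of the numeric parts is emitted.
    """
    parts = [p for p in re.split(r"[^0-9]+", token) if p]
    return ["".join(parts)] if parts else []


def analyze_query(query: str) -> List[str]:
    """KeywordTokenizer + Trim + PatternReplace + WDF (synonyms skipped)."""
    token = query.strip()  # whole input is one token; Trim removes outer whitespace
    token = strip_non_cas(token)
    return wdf_catenate_numbers(token)


if __name__ == "__main__":
    for q in ["1234-12-1", "1234 12 1", "1234121", "1234-12-*", "acetone"]:
        print(repr(q), "->", analyze_query(q))
```

Under these assumptions, "1234-12-1", "1234 12 1" and "1234121" all collapse to the single term 1234121, which is what makes the hyphen-free and space-separated variants match. The same rule turns "1234-12-*" into 123412: the trailing * is consumed as a delimiter, consistent with what the analysis screen shows.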


On Wed, Jul 26, 2017 at 12:56 PM, Saurabh Sethi <saurabh.se...@sendgrid.com>
wrote:

> 1. What tokenizer are you using?
> 2. Do you have preserveOriginal="1" flag set in your filter?
> 3. Which version of Solr are you using?
>
> On Wed, Jul 26, 2017 at 10:48 AM, Webster Homer <webster.ho...@sial.com>
> wrote:
>
> > I have several fieldtypes that use the WordDelimiterFilterFactory.
> >
> > We have a fieldtype for CAS numbers, which look like 1234-12-1: numbers
> > separated by hyphens. Users often leave out the hyphens and either use
> > spaces or just string the numbers together.
> >
> > The WDF seemed like a great solution, especially as it gave partial
> > matches. However, a query like 1234-12-* fails. The analyzer tool shows
> > the wildcard getting stripped off.
> > Is there any way to preserve the wildcard in the query analyzer when
> > using the WordDelimiterFilterFactory?
> >
> > --
> >
> >
> > This message and any attachment are confidential and may be privileged or
> > otherwise protected from disclosure. If you are not the intended recipient,
> > you must not copy this message or attachment or disclose the contents to
> > any other person. If you have received this transmission in error, please
> > notify the sender immediately and delete the message and any attachment
> > from your system. Merck KGaA, Darmstadt, Germany and any of its
> > subsidiaries do not accept liability for any omissions or errors in this
> > message which may arise as a result of E-Mail-transmission or for damages
> > resulting from any unauthorized changes of the content of this message and
> > any attachment thereto. Merck KGaA, Darmstadt, Germany and any of its
> > subsidiaries do not guarantee that this message is free of viruses and does
> > not accept liability for any damages caused by any virus transmitted
> > therewith.
> >
> > Click http://www.emdgroup.com/disclaimer to access the German, French,
> > Spanish and Portuguese versions of this disclaimer.
> >
>
>
>
> --
> Saurabh Sethi
> Principal Engineer I | Engineering
>

