Re: WordDelimiterFilterFactory with Wildcards

Saurabh Sethi Wed, 26 Jul 2017 11:31:57 -0700

My guess is that PatternReplaceFilterFactory is the most likely cause.
Also, based on your query, you might want to set preserveOriginal="1".

You can take one filter out at a time and see which one is altering the
query.
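
A minimal sketch of both suggestions, assuming the query analyzer quoted
below stays otherwise unchanged. The only differences are that
PatternReplaceFilterFactory is commented out for testing and
preserveOriginal is set to "1". Whether this is enough to keep the wildcard
is untested here; it is only a starting point for the analysis tool.

```xml
<analyzer type="query">
  <tokenizer class="solr.KeywordTokenizerFactory"/>
  <filter class="solr.TrimFilterFactory"/>
  <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
          ignoreCase="true" expand="true"
          tokenizerFactory="solr.KeywordTokenizerFactory"/>
  <!-- Temporarily disabled to check whether this filter alters the query:
  <filter class="solr.PatternReplaceFilterFactory"
          pattern="^.*([^- 0-9*]+).*$" replacement="" replace="all"/>
  -->
  <!-- preserveOriginal changed from "0" to "1" so that the unsplit token
       is emitted alongside the catenated one -->
  <filter class="solr.WordDelimiterFilterFactory"
          generateWordParts="0"
          splitOnCaseChange="0"
          splitOnNumerics="1"
          generateNumberParts="0"
          catenateWords="0"
          catenateNumbers="1"
          catenateAll="0"
          preserveOriginal="1"
          stemEnglishPossessive="0"/>
</analyzer>
```

After reloading the core, run 1234-12-* through the analysis tool again and
re-enable the filters one at a time to see which step drops the wildcard.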

On Wed, Jul 26, 2017 at 11:13 AM, Webster Homer <webster.ho...@sial.com>
wrote:

> 1. KeywordTokenizer - we want to treat the entire field as a single term to
> parse
> 2. preserveOriginal="0". I thought about changing this to 1.
> 3. Solr 6.2.2
>
> This is the fieldtype
>     <fieldType name="cas_num_tokenizer" class="solr.TextField"
>                positionIncrementGap="100">
>       <analyzer type="index">
>         <tokenizer class="solr.KeywordTokenizerFactory"/>
>         <filter class="solr.TrimFilterFactory"/>
>         <filter class="solr.WordDelimiterFilterFactory"
>                 generateWordParts="0"
>                 splitOnCaseChange="0"
>                 splitOnNumerics="1"
>                 generateNumberParts="0"
>                 catenateWords="0"
>                 catenateNumbers="1"
>                 catenateAll="0"
>                 preserveOriginal="0"
>                 stemEnglishPossessive="0"/>
>       </analyzer>
>       <analyzer type="query">
>         <tokenizer class="solr.KeywordTokenizerFactory"/>
>         <filter class="solr.TrimFilterFactory"/>
>         <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
>                 ignoreCase="true" expand="true"
>                 tokenizerFactory="solr.KeywordTokenizerFactory"/>
>         <!-- remove non-cas queries and junk from synonyms -->
>         <filter class="solr.PatternReplaceFilterFactory"
>                 pattern="^.*([^- 0-9*]+).*$" replacement="" replace="all"/>
>         <filter class="solr.WordDelimiterFilterFactory"
>                 generateWordParts="0"
>                 splitOnCaseChange="0"
>                 splitOnNumerics="1"
>                 generateNumberParts="0"
>                 catenateWords="0"
>                 catenateNumbers="1"
>                 catenateAll="0"
>                 preserveOriginal="0"
>                 stemEnglishPossessive="0"/>
>       </analyzer>
>     </fieldType>
>
>
> On Wed, Jul 26, 2017 at 12:56 PM, Saurabh Sethi <
> saurabh.se...@sendgrid.com>
> wrote:
>
> > 1. What tokenizer are you using?
> > 2. Do you have preserveOriginal="1" flag set in your filter?
> > 3. Which version of solr are you using?
> >
> > On Wed, Jul 26, 2017 at 10:48 AM, Webster Homer <webster.ho...@sial.com>
> > wrote:
> >
> > > I have several fieldtypes that use the WordDelimiterFilterFactory.
> > >
> > > We have a fieldtype for CAS numbers, which look like 1234-12-1: numbers
> > > separated by hyphens. Users often leave out the hyphens and either use
> > > spaces or just string the numbers together.
> > >
> > > The WDF seemed like a great solution, especially as it gave partial
> > > matches. However, a query like 1234-12-* fails. The analyzer tool shows
> > > the wildcard getting stripped off.
> > > Is there any way to preserve the wildcard in the query analyzer when
> > > using the WordDelimiterFilterFactory?
> > >
> > > --
> > >
> > >
> > > This message and any attachment are confidential and may be privileged or
> > > otherwise protected from disclosure. If you are not the intended recipient,
> > > you must not copy this message or attachment or disclose the contents to
> > > any other person. If you have received this transmission in error, please
> > > notify the sender immediately and delete the message and any attachment
> > > from your system. Merck KGaA, Darmstadt, Germany and any of its
> > > subsidiaries do not accept liability for any omissions or errors in this
> > > message which may arise as a result of E-Mail-transmission or for damages
> > > resulting from any unauthorized changes of the content of this message and
> > > any attachment thereto. Merck KGaA, Darmstadt, Germany and any of its
> > > subsidiaries do not guarantee that this message is free of viruses and does
> > > not accept liability for any damages caused by any virus transmitted
> > > therewith.
> > >
> > > Click http://www.emdgroup.com/disclaimer to access the German, French,
> > > Spanish and Portuguese versions of this disclaimer.
> > >
> >
> >
> >
> > --
> > Saurabh Sethi
> > Principal Engineer I | Engineering
> >
>



-- 
Saurabh Sethi
Principal Engineer I | Engineering
