Re: Why I get a hit on %, &, but not on !, @, #, $, ^, *

Steven White Wed, 15 Jul 2015 05:14:53 -0700

Thank you all for helping on this topic.  I'm going to play with this and
might come back with more questions.


Steve

On Tue, Jul 14, 2015 at 1:57 PM, Erick Erickson <erickerick...@gmail.com> wrote:

> Steve:
>
> Simplest solution:
> remove WordDelimiterFilterFactory.
> Use something like PatternReplaceCharFilterFactory or
> PatternReplaceFilterFactory to selectively remove the characters you
> don't care about and leave in the ones you do care about.
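A minimal sketch of Erick's suggestion. It assumes the goal is to drop %, &, ^ and * while keeping $, !, @ and # searchable; that split of characters and the type name my_text_nowdff are illustrative only. With WordDelimiterFilterFactory gone, the whitespace tokenizer leaves a token such as "$10" intact, and the char filter deletes the unwanted characters from the raw text before tokenization, at both index and query time.

```xml
<!-- Hypothetical field type: no WordDelimiterFilterFactory.
     A single <analyzer> (no type attribute) is used for indexing and querying. -->
<fieldType name="my_text_nowdff" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Delete %, &, ^ and * from the raw text ("10%" becomes "10").
         & must be written as &amp; inside an XML attribute. -->
    <charFilter class="solr.PatternReplaceCharFilterFactory"
                pattern="[%&amp;^*]" replacement=""/>
    <!-- Splits on whitespace only, so "$10" and "@steve" stay whole -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

The token-filter variant Erick mentions, solr.PatternReplaceFilterFactory, takes the same pattern and replacement attributes but applies them to each token after tokenizing rather than to the raw text.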
>
> You might also want to do this kind of thing in a copyField and search
> one or the other selectively as desired, or perhaps boost or...
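Erick's copyField idea could look roughly like this. The field names body and body_symbols, the type name my_text_symbols (standing for any type that preserves the characters you care about) and the boost of 2 are all hypothetical; my_text is the type from Steve's original message.

```xml
<!-- schema.xml: the same input indexed two ways (field names are hypothetical) -->
<field name="body" type="my_text" indexed="true" stored="true"/>
<field name="body_symbols" type="my_text_symbols" indexed="true" stored="false"/>
<!-- Copy whatever is sent to "body" into "body_symbols" at index time -->
<copyField source="body" dest="body_symbols"/>

<!-- solrconfig.xml: query both fields with edismax and weight matches
     in the symbol-preserving field higher (the ^2 is an arbitrary example) -->
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <str name="qf">body body_symbols^2</str>
  </lst>
</requestHandler>
```

To search only one of the two fields for a particular request, pass qf explicitly on that request, e.g. qf=body_symbols.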
>
> NOTE: one side effect of WDFF is that punctuation is removed, so you
> have to consider what you want to do with periods at the end of a
> sentence, apostrophes and the like.
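Following Erick's note: once WordDelimiterFilterFactory is removed, something else has to handle trailing punctuation and possessives (the original chain relied on WDFF, including stemEnglishPossessive="1", for these). One possible analyzer built from stock Solr factories is sketched below; the punctuation set [.,;:] and the length limits are assumptions to adjust.

```xml
<!-- Hypothetical analyzer for a field type without WordDelimiterFilterFactory -->
<analyzer>
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <!-- Remove a run of . , ; : at the end of each token:
       "sentence." becomes "sentence", "$10," becomes "$10" -->
  <filter class="solr.PatternReplaceFilterFactory"
          pattern="[.,;:]+$" replacement="" replace="all"/>
  <!-- A token made only of those characters is now empty; drop it -->
  <filter class="solr.LengthFilterFactory" min="1" max="255"/>
  <!-- Strip a trailing 's, replacing WDFF's stemEnglishPossessive:
       "Steve's" becomes "Steve" -->
  <filter class="solr.EnglishPossessiveFilterFactory"/>
  <filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
```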
>
> Best,
> Erick
>
> On Tue, Jul 14, 2015 at 10:08 AM, Steven White <swhite4...@gmail.com> wrote:
> > Thanks Jack.
> >
> > Can you provide me with a concrete example of how to:
> >
> > 1) Be able to search and find "$10" (without quotes).  This will get me
> > started on how to add all other variations for !, @, etc. and be able to
> > search on them.  In this case, a search for "$10" should give me a hit on
> > text of "$10", but not "10", and a search on "10" should give me a hit on
> > "10" but not "$10".
> >
> > 2) Prevent a hit on "10%" (without quotes).  This will get me started on
> > how to prevent a hit on %, &, etc.  In this case, a search for "%" or
> > "10%" should give me 0 hits, but a search on "10" should give me a hit on
> > "10" or "10%".
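One way both behaviours in 1) and 2) might be combined, sketched under assumptions: WordDelimiterFilterFactory is dropped (as Erick suggests), the type name my_text_asym is hypothetical, and % is stripped only at index time. Because the query analyzer keeps %, a query for "10%" or "%" looks up a term that was never indexed and finds nothing, while "10" still matches text containing "10%". $ is untouched on both sides, so "$10" and "10" remain distinct terms.

```xml
<!-- Hypothetical type whose index and query analyzers deliberately differ -->
<fieldType name="my_text_asym" class="solr.TextField" positionIncrementGap="100">
  <!-- Index time: delete %, so "10%" is indexed as the term "10" -->
  <analyzer type="index">
    <charFilter class="solr.PatternReplaceCharFilterFactory"
                pattern="%" replacement=""/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <!-- Query time: keep %, so "10%" and "%" never match an indexed term;
       "$10" is left whole on both sides and only matches "$10" -->
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Independently of the analyzer, the standard query parser treats !, ^ and * as syntax (NOT, boost and wildcard), so searching for them as literal characters also requires escaping them with a backslash, e.g. \^. $, %, @ and # have no special meaning to that parser.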
> >
> > Do you see where I'm going with this?  Are both of those configurations
> > possible?  This will let me customize Solr to meet customer needs.
> >
> > Thanks.
> >
> > Steve
> >
> > On Mon, Jul 13, 2015 at 11:12 PM, Jack Krupansky <jack.krupan...@gmail.com> wrote:
> >
> >> Oops... that's the "types" attribute.
> >>
> >> -- Jack Krupansky
> >>
> >> On Mon, Jul 13, 2015 at 11:11 PM, Jack Krupansky <jack.krupan...@gmail.com> wrote:
> >>
> >> > The word delimiter filter is removing special characters. You can add a
> >> > file containing a list of the special characters that you wish to treat
> >> > as alpha, using the "type" parameter.
> >> >
> >> > -- Jack Krupansky
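A sketch of Jack's suggestion (the attribute is types, per his follow-up). The file name wdfftypes.txt and the particular mappings are assumptions for illustration. Mapping $ to DIGIT should make "$10" a single numeric run, so even with splitOnNumerics="1" it is not split into "$" and "10"; a search for "$10" would then match "$10" but not "10", which is behaviour 1) that Steve asks about. % is deliberately left unmapped here, since mapping it the same way would keep "10%" whole and stop "10" from matching it.

```xml
<!-- conf/wdfftypes.txt (hypothetical name), one "char => TYPE" mapping per line:
       $ => DIGIT
       @ => ALPHA
       \u0023 => ALPHA
     Allowed types: LOWER, UPPER, ALPHA, DIGIT, ALPHANUM, SUBWORD_DELIM.
     The hash sign is written as \u0023 so the line is not read as a comment.
     Characters not listed keep their default, Unicode-derived type. -->

<!-- Steve's original WDFF filter with only the types attribute added -->
<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
        generateNumberParts="1" catenateWords="1" catenateNumbers="1"
        catenateAll="1" splitOnCaseChange="0" splitOnNumerics="1"
        stemEnglishPossessive="1" preserveOriginal="1"
        types="wdfftypes.txt"/>
```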
> >> >
> >> > On Mon, Jul 13, 2015 at 6:43 PM, Steven White <swhite4...@gmail.com> wrote:
> >> >
> >> >> Hi Everyone,
> >> >>
> >> >> I think the subject line said it all.  Here is the schema I'm using:
> >> >>
> >> >> <fieldType name="my_text" class="solr.TextField"
> >> >>            positionIncrementGap="100"
> >> >>            autoGeneratePhraseQueries="true">
> >> >>   <analyzer>
> >> >>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> >> >>     <filter class="solr.StopFilterFactory" ignoreCase="true"
> >> >>             words="lang/stopwords_en.txt"/>
> >> >>     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
> >> >>             generateNumberParts="1" catenateWords="1" catenateNumbers="1"
> >> >>             catenateAll="1" splitOnCaseChange="0" splitOnNumerics="1"
> >> >>             stemEnglishPossessive="1" preserveOriginal="1"/>
> >> >>     <filter class="solr.LowerCaseFilterFactory"/>
> >> >>     <filter class="solr.KeywordMarkerFilterFactory"
> >> >>             protected="protwords.txt"/>
> >> >>     <filter class="solr.PorterStemFilterFactory"/>
> >> >>     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
> >> >>   </analyzer>
> >> >> </fieldType>
> >> >>
> >> >> I'm guessing this is due to how solr.WhitespaceTokenizerFactory works,
> >> >> and that the characters it is not indexing are removed because they are
> >> >> considered whitespace?  If so, how can I add %, &, etc. to this
> >> >> non-indexed list?  I would rather have none of these indexed than have
> >> >> some indexed and some not, which confuses my users.
> >> >>
> >> >> Thanks
> >> >>
> >> >> Steve
> >> >>
> >> >
> >> >
> >>
>
