You can also check out the Regular Expression Pattern Tokenizer:
https://cwiki.apache.org/confluence/display/solr/Tokenizers#Tokenizers-RegularExpressionPatternTokenizer
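The PatternTokenizer described on that page takes a `pattern` and a `group` attribute. With the default `group="-1"` the regex acts as a delimiter between tokens. With `group="0"` or higher, each match (or the chosen capture group) becomes a token. The Python sketch below only emulates those documented semantics to make them concrete. It uses Python's `re` rather than Java regex, and the function name `pattern_tokenize` and the vitamin regex are hypothetical.

```python
import re


def pattern_tokenize(text: str, pattern: str, group: int = -1) -> list[str]:
    """Emulate PatternTokenizer's documented 'group' semantics with Python's re."""
    regex = re.compile(pattern)
    if group < 0:
        # group=-1: the regex is a delimiter; emit the non-empty text between matches.
        pieces, start = [], 0
        for m in regex.finditer(text):
            pieces.append(text[start:m.start()])
            start = m.end()
        pieces.append(text[start:])
        return [p for p in pieces if p]
    # group>=0: each whole match (group 0) or capture group (group N) becomes a token.
    return [m.group(group) for m in regex.finditer(text) if m.group(group)]


if __name__ == "__main__":
    print(pattern_tokenize("a, b,,c", r"\s*,\s*"))
    # Hypothetical vitamin pattern, used in "extract matches" mode.
    print(pattern_tokenize("Vitamin B-12 and Vitamin-A",
                           r"(?i)vitamin[\s-]+[a-z](?:-?\d+)?\b", group=0))
```

Note that in extraction mode (`group` >= 0) only the matched text becomes tokens and everything else in the field is dropped, so on its own this mode would not index the rest of a product title.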

Joel Bernstein
http://joelsolr.blogspot.com/

On Wed, Mar 22, 2017 at 7:52 PM, Erick Erickson <erickerick...@gmail.com>
wrote:

> Susheel:
>
> That'll work, but the options you've specified for
> WordDelimiterFilterFactory pretty much make it so it's doing nothing.
> I realize it's commented out...
>
> That said, it's true that if you have a very specific pattern you want
> to recognize, a regex can do the trick. WDFF is a bit more generic,
> though, which helps when your requirements are less specific.
>
> Best,
> Erick
>
> On Wed, Mar 22, 2017 at 12:56 PM, Susheel Kumar <susheel2...@gmail.com>
> wrote:
> > I have used PatternReplaceFilterFactory in some of these situations,
> > e.g.:
> >
> > <tokenizer class="solr.ClassicTokenizerFactory"/>
> > <!-- <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0"
> >      generateNumberParts="0" catenateWords="0" catenateNumbers="1"
> >      catenateAll="0" splitOnCaseChange="0" splitOnNumerics="0" /> -->
> > <filter class="solr.PatternReplaceFilterFactory"
> >         pattern="(\d+)-(\d+)-?(\d+)$" replacement="$1$2$3"/>
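To see what that filter does to individual tokens, here is a rough Python emulation. It assumes ClassicTokenizer has already kept a hyphenated number such as `555-123-4567` together as one token (ClassicTokenizer does not split hyphenated words that contain a number). The sample tokens and the `pattern_replace_filter` helper are hypothetical, and Python writes the `$1$2$3` back-references as `\1\2\3`.

```python
import re

# The regex from the PatternReplaceFilterFactory config; Java's "$1$2$3"
# replacement is written r"\1\2\3" in Python.
PATTERN = re.compile(r"(\d+)-(\d+)-?(\d+)$")
REPLACEMENT = r"\1\2\3"


def pattern_replace_filter(tokens: list[str]) -> list[str]:
    """Apply the regex to each token on its own, as a token filter does."""
    return [PATTERN.sub(REPLACEMENT, tok) for tok in tokens]


if __name__ == "__main__":
    # Hypothetical tokens, assuming ClassicTokenizer kept each hyphenated number whole.
    print(pattern_replace_filter(["call", "555-123-4567", "or", "12-34", "B-12"]))
```

Because the second hyphen is optional (`-?`), a two-part number like `12-34` is collapsed too, while `B-12` passes through unchanged: the pattern needs digits before the first hyphen.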
> >
> > On Wed, Mar 22, 2017 at 2:54 PM, Mark Johnson <mjohn...@emersonecologics.com>
> > wrote:
> >
> >> Awesome, thank you much!
> >>
> >> On Wed, Mar 22, 2017 at 2:38 PM, Erick Erickson <erickerick...@gmail.com>
> >> wrote:
> >>
> >> > Take a close look at WordDelimiterFilterFactory; it's designed to deal
> >> > with things like part numbers, phone numbers and the like, and the
> >> > example you gave is in the same class of problem, I think. It'll take
> >> > a bit to get your head around what it does, but it'll perform better
> >> > than regexes, assuming you can get what you need out of it.
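As a rough illustration of why WordDelimiterFilterFactory fits the "B12" vs. "B-12" problem, the following Python sketch imitates a small subset of it: split on punctuation and on letter/digit transitions (roughly `generateWordParts="1"`, `generateNumberParts="1"`, `splitOnNumerics="1"`), and optionally add the glued-together form (roughly `catenateAll="1"`). It is a hypothetical simplification, not Lucene's implementation; the real filter also handles case changes, possessives, `preserveOriginal`, and token positions.

```python
import re


def word_delimiter(token: str, catenate_all: bool = False) -> list[str]:
    """Hypothetical, much-simplified stand-in for WordDelimiterFilter."""
    # Every run of letters or run of digits becomes a sub-word part, so both
    # punctuation ("B-12") and letter/digit transitions ("B12") cause a split.
    parts = re.findall(r"[A-Za-z]+|[0-9]+", token)
    out = list(parts)
    if catenate_all and len(parts) > 1:
        # Roughly what catenateAll="1" adds: all parts glued back together.
        out.append("".join(parts))
    return out


if __name__ == "__main__":
    for tok in ["B-12", "B12", "A"]:
        print(tok, word_delimiter(tok, catenate_all=True))
```

Because "B-12" and "B12" yield the same parts, a query for one form can match a document containing the other, provided comparable splitting happens at both index and query time.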
> >> >
> >> > And the admin/analysis page will help you _greatly_ in understanding
> >> > what the effects of the various parameters are.
> >> >
> >> > Best,
> >> > Erick
> >> >
> >> > On Wed, Mar 22, 2017 at 11:06 AM, Mark Johnson
> >> > <mjohn...@emersonecologics.com> wrote:
> >> > > Is it possible to configure Solr to treat text that matches a regex as
> >> > > a phrase?
> >> > >
> >> > > I have a database full of products, and the Title and Description
> >> > > fields are text_en, tokenized via the StandardTokenizerFactory. This
> >> > > works in most cases, but a number of products have names like:
> >> > >
> >> > >  - Vitamin A
> >> > >  - Vitamin-A
> >> > >  - Vitamin B12
> >> > >  - Vitamin B-12
> >> > > ...and so on
> >> > >
> >> > > I have a regex that will match all of the permutations and would like
> >> > > to configure the field type so that anything that matches the regex
> >> > > pattern is treated as a single token, instead of being broken up by
> >> > > spaces, etc. Is that possible?
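The regex itself isn't shown in the thread, so the pattern below is a hypothetical stand-in that accepts the four listed forms. The Python sketch shows both the mismatch and one way around it: a crude approximation of StandardTokenizer keeps `B12` as one token but splits `B-12` into `B` and `12`, whereas rewriting every match to a single canonical word before tokenizing makes all variants produce the same token. Solr's `solr.PatternReplaceCharFilterFactory` applies a regex rewrite ahead of the tokenizer, so it could plausibly host this step (an option not raised in the thread). Whatever mechanism is used, the same rewrite has to run in both the index and query analyzers.

```python
import re

# Hypothetical stand-in for the poster's regex: "vitamin", a space or hyphen,
# one letter, then an optional "-" plus digits (A, B12, B-12, ...).
VITAMIN = re.compile(r"\bvitamin[\s-]+([a-z])(?:-?(\d+))?\b", re.IGNORECASE)


def normalize_vitamins(text: str) -> str:
    """Rewrite each vitamin mention to one canonical word, e.g. 'Vitamin B-12' -> 'vitaminb12'."""
    return VITAMIN.sub(
        lambda m: "vitamin" + m.group(1).lower() + (m.group(2) or ""), text
    )


def rough_tokens(text: str) -> list[str]:
    """Crude StandardTokenizer approximation for these inputs: runs of letters/digits."""
    return re.findall(r"[A-Za-z0-9]+", text)


if __name__ == "__main__":
    for title in ["Vitamin A", "Vitamin-A", "Vitamin B12", "Vitamin B-12"]:
        print(f"{title!r:16} raw={rough_tokens(title)} "
              f"normalized={rough_tokens(normalize_vitamins(title))}")
```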
> >> > >
> >> > > --
> >> > > *This message is intended only for the use of the individual or entity
> >> > > to which it is addressed and may contain information that is
> >> > > privileged, confidential and exempt from disclosure under applicable
> >> > > law. If you have received this message in error, you are hereby
> >> > > notified that any use, dissemination, distribution or copying of this
> >> > > message is prohibited. If you have received this communication in
> >> > > error, please notify the sender immediately and destroy the
> >> > > transmitted information.*
> >> >
> >>
> >>
> >>
> >> --
> >>
> >> Best Regards,
> >>
> >> *Mark Johnson* | .NET Software Engineer
> >>
> >> Office: 603-392-7017
> >>
> >> Emerson Ecologics, LLC | 1230 Elm Street | Suite 301 | Manchester NH | 03101
> >>
> >>
>
