I have used PatternReplaceFilterFactory in some of these situations, for
example:

<tokenizer class="solr.ClassicTokenizerFactory"/>
<!--
<filter class="solr.WordDelimiterFilterFactory" generateWordParts="0"
        generateNumberParts="0" catenateWords="0" catenateNumbers="1"
        catenateAll="0" splitOnCaseChange="0" splitOnNumerics="0"/>
-->
<filter class="solr.PatternReplaceFilterFactory"
        pattern="(\d+)-(\d+)-?(\d+)$" replacement="$1$2$3"/>
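
To get a feel for what that pattern and replacement do, the same regex can be tried in plain Java, since the Solr filter is built on java.util.regex. This is only a rough sketch: the class name PatternReplaceSketch and the normalize helper are made up for the demo, and it assumes the tokenizer has already kept the hyphenated number together as a single token and that every match in the token is replaced.

```java
import java.util.regex.Pattern;

public class PatternReplaceSketch {
    // Same pattern as in the filter config: two or three digit groups
    // separated by hyphens, anchored at the end of the token.
    private static final Pattern HYPHENATED_DIGITS =
            Pattern.compile("(\\d+)-(\\d+)-?(\\d+)$");

    // Hypothetical helper: applies the "$1$2$3" replacement to one token,
    // i.e. glues the digit groups together and drops the hyphens.
    public static String normalize(String token) {
        return HYPHENATED_DIGITS.matcher(token).replaceAll("$1$2$3");
    }

    public static void main(String[] args) {
        // Tokens that do not match the pattern are returned unchanged.
        for (String token : new String[] {"603-392-7017", "12-345", "B-12"}) {
            System.out.println(token + " -> " + normalize(token));
        }
    }
}
```

Note that the pattern only fires when digits sit on both sides of the hyphen, so a token like B-12 is left alone; names such as "Vitamin B-12" would need a different pattern.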

On Wed, Mar 22, 2017 at 2:54 PM, Mark Johnson
<mjohn...@emersonecologics.com> wrote:

> Awesome, thank you much!
>
> On Wed, Mar 22, 2017 at 2:38 PM, Erick Erickson <erickerick...@gmail.com>
> wrote:
>
> > Take a close look at WordDelimiterFilterFactory, it's designed to deal
> > with things like part numbers, phone numbers and the like, and the
> > example you gave is in the same class of problem I think. It'll take
> > a bit to get your head around what it does, but it'll perform better
> > than regexes, assuming you can get what you need out of it.
> >
> > And the admin/analysis page will help you _greatly_ in understanding
> > what the effects of the various parameters are.
> >
> > Best,
> > Erick
> >
> > On Wed, Mar 22, 2017 at 11:06 AM, Mark Johnson
> > <mjohn...@emersonecologics.com> wrote:
> > > Is it possible to configure Solr to treat text that matches a regex as
> > > a phrase?
> > >
> > > I have a database full of products, and the Title and Description
> > > fields are text_en, tokenized via the StandardTokenizerFactory. This
> > > works in most cases, but a number of products have names like:
> > >
> > >  - Vitamin A
> > >  - Vitamin-A
> > >  - Vitamin B12
> > >  - Vitamin B-12
> > > ...and so on
> > >
> > > I have a regex that will match all of the permutations and would like
> > > to configure the field type so that anything that matches the regex
> > > pattern is treated as a single token, instead of being broken up by
> > > spaces, etc. Is that possible?
> > >
> > > --
> > > *This message is intended only for the use of the individual or entity
> > > to which it is addressed and may contain information that is privileged,
> > > confidential and exempt from disclosure under applicable law. If you
> > > have received this message in error, you are hereby notified that any
> > > use, dissemination, distribution or copying of this message is
> > > prohibited. If you have received this communication in error, please
> > > notify the sender immediately and destroy the transmitted information.*
> >
>
>
>
> --
>
> Best Regards,
>
> *Mark Johnson* | .NET Software Engineer
>
> Office: 603-392-7017
>
> Emerson Ecologics, LLC | 1230 Elm Street | Suite 301 | Manchester NH |
> 03101
>
>
