I have used PatternReplaceFilterFactory in some of these situations, for example:
<tokenizer class="solr.ClassicTokenizerFactory"/>
<!--
<filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="0" catenateWords="0" catenateNumbers="1" catenateAll="0" splitOnCaseChange="0" splitOnNumerics="0" />
-->
<filter class="solr.PatternReplaceFilterFactory" pattern="(\d+)-(\d+)-?(\d+)$" replacement="$1$2$3"/>

On Wed, Mar 22, 2017 at 2:54 PM, Mark Johnson <mjohn...@emersonecologics.com> wrote:

> Awesome, thank you much!
>
> On Wed, Mar 22, 2017 at 2:38 PM, Erick Erickson <erickerick...@gmail.com> wrote:
>
> > Take a close look at WordDelimiterFilterFactory, it's designed to deal
> > with things like part numbers, phone numbers and the like, and the
> > example you gave is in the same class of problem I think. It'll take
> > a bit to get your head around what it does, but it'll perform better
> > than regexes, assuming you can get what you need out of it.
> >
> > And the admin/analysis page will help you _greatly_ in understanding
> > what the effects of the various parameters are.
> >
> > Best,
> > Erick
> >
> > On Wed, Mar 22, 2017 at 11:06 AM, Mark Johnson
> > <mjohn...@emersonecologics.com> wrote:
> > > Is it possible to configure Solr to treat text that matches a regex as a
> > > phrase?
> > >
> > > I have a database full of products, and the Title and Description fields
> > > are text_en, tokenized via the StandardTokenizerFactory. This works in most
> > > cases, but a number of products have names like:
> > >
> > > - Vitamin A
> > > - Vitamin-A
> > > - Vitamin B12
> > > - Vitamin B-12
> > > ...and so on
> > >
> > > I have a regex that will match all of the permutations and would like to
> > > configure the field type so that anything that matches the regex pattern is
> > > treated as a single token, instead of being broken up by spaces, etc. Is
> > > that possible?
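
In case it helps to see what that PatternReplaceFilterFactory line does to an individual token, here is a rough standalone sketch using plain java.util.regex with the same pattern and replacement. The class name, the `normalize` helper and the sample tokens are made up for illustration, and it assumes the filter's default of replacing all matches. It only mimics the regex step and does not run the Solr analysis chain, so the admin/analysis page is still the place to confirm real behavior.

```java
import java.util.regex.Pattern;

// Illustration only: mimics the regex step of the filter, not the Solr analysis chain.
public class PatternReplaceSketch {
    // Same pattern and replacement as in the filter config above.
    private static final Pattern HYPHENATED_NUMBER =
            Pattern.compile("(\\d+)-(\\d+)-?(\\d+)$");
    private static final String REPLACEMENT = "$1$2$3";

    // Hypothetical helper: applies the replacement to a single token,
    // assuming every match is replaced (the filter's default, as far as I know).
    public static String normalize(String token) {
        return HYPHENATED_NUMBER.matcher(token).replaceAll(REPLACEMENT);
    }

    public static void main(String[] args) {
        // Made-up sample tokens; the last two contain no digit-hyphen-digit run.
        String[] tokens = {"555-123-4567", "12-34", "Vitamin", "B-12"};
        for (String token : tokens) {
            System.out.println(token + " -> " + normalize(token));
        }
    }
}
```

Note that the pattern only collapses hyphens between runs of digits at the end of a token, so a token like B-12 passes through unchanged. For the Vitamin B-12 style names you would need a different pattern, or the WordDelimiterFilterFactory settings Erick mentioned.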