Susheel: That'll work, but the options you've specified for WordDelimiterFilterFactory pretty much make it so it's doing nothing. I realize it's commented out...
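For comparison, here is a minimal sketch of a WordDelimiterFilterFactory setup that does change the token stream. The field type name text_parts and the flag values are assumptions chosen for illustration; nobody in this thread posted them. It uses WhitespaceTokenizerFactory because StandardTokenizerFactory would most likely split "B-12" at the hyphen before the filter ever sees it.

```xml
<!-- Hypothetical field type; the name and flag values are illustrative only. -->
<fieldType name="text_parts" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Split on whitespace only, so "B-12" reaches the filter intact. -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- Emit the parts, the catenated form, and the original token. -->
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1"
            catenateWords="0" catenateNumbers="0" catenateAll="1"
            splitOnCaseChange="0" splitOnNumerics="1"
            preserveOriginal="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

With flags along these lines, "B-12" and "B12" should end up sharing at least one token, but the exact output depends on your Solr version, so check it on the admin/analysis page before relying on it.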
That said, it's true that if you have a very specific pattern you want to recognize, a regex can do the trick. WDFF is a bit more generic, though, for when you have less specific requirements.

Best,
Erick

On Wed, Mar 22, 2017 at 12:56 PM, Susheel Kumar <susheel2...@gmail.com> wrote:
> I have used PatternReplaceFilterFactory in some of these situations, e.g.
> below:
>
> <tokenizer class="solr.ClassicTokenizerFactory"/>
> <!-- <filter class="solr.WordDelimiterFilterFactory"
>      generateWordParts="0" generateNumberParts="0" catenateWords="0"
>      catenateNumbers="1" catenateAll="0" splitOnCaseChange="0"
>      splitOnNumerics="0" /> -->
> <filter class="solr.PatternReplaceFilterFactory"
>         pattern="(\d+)-(\d+)-?(\d+)$" replacement="$1$2$3"/>
>
> On Wed, Mar 22, 2017 at 2:54 PM, Mark Johnson
> <mjohn...@emersonecologics.com> wrote:
>
>> Awesome, thank you much!
>>
>> On Wed, Mar 22, 2017 at 2:38 PM, Erick Erickson <erickerick...@gmail.com>
>> wrote:
>>
>> > Take a close look at WordDelimiterFilterFactory, it's designed to deal
>> > with things like part numbers, phone numbers and the like, and the
>> > example you gave is in the same class of problem I think. It'll take
>> > a bit to get your head around what it does, but it'll perform better
>> > than regexes, assuming you can get what you need out of it.
>> >
>> > And the admin/analysis page will help you _greatly_ in understanding
>> > what the effects of the various parameters are.
>> >
>> > Best,
>> > Erick
>> >
>> > On Wed, Mar 22, 2017 at 11:06 AM, Mark Johnson
>> > <mjohn...@emersonecologics.com> wrote:
>> > > Is it possible to configure Solr to treat text that matches a regex
>> > > as a phrase?
>> > >
>> > > I have a database full of products, and the Title and Description
>> > > fields are text_en, tokenized via the StandardTokenizerFactory. This
>> > > works in most cases, but a number of products have names like:
>> > >
>> > > - Vitamin A
>> > > - Vitamin-A
>> > > - Vitamin B12
>> > > - Vitamin B-12
>> > > ...and so on
>> > >
>> > > I have a regex that will match all of the permutations and would
>> > > like to configure the field type so that anything that matches the
>> > > regex pattern is treated as a single token, instead of being broken
>> > > up by spaces, etc. Is that possible?
>> > >
>> > > --
>> > > *This message is intended only for the use of the individual or
>> > > entity to which it is addressed and may contain information that is
>> > > privileged, confidential and exempt from disclosure under applicable
>> > > law. If you have received this message in error, you are hereby
>> > > notified that any use, dissemination, distribution or copying of
>> > > this message is prohibited. If you have received this communication
>> > > in error, please notify the sender immediately and destroy the
>> > > transmitted information.*
>>
>> --
>> Best Regards,
>>
>> *Mark Johnson* | .NET Software Engineer