You can also check out the Regular Expression Pattern Tokenizer section of the Solr Reference Guide: https://cwiki.apache.org/confluence/display/solr/Tokenizers#Tokenizers-RegularExpressionPatternTokenizer
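Below is a minimal sketch of what that tokenizer does with `group="0"`. Per the reference guide, the default `group="-1"` instead treats the pattern as a delimiter. With `group="0"`, each match of the whole pattern becomes one token and the text between matches is discarded. The sketch uses plain `java.util.regex` rather than Lucene's classes. The vitamin pattern and the `PatternTokenizerSketch` class are hypothetical illustrations, not the regex Mark mentions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch only: mimics PatternTokenizer with group="0" using java.util.regex.
// The pattern is a hypothetical example for "Vitamin B-12" style names.
public class PatternTokenizerSketch {

    // Try the vitamin form first; anything else falls back to a plain word.
    // The \b stops "Vitamin Absorption" from being read as "Vitamin A".
    static final Pattern TOKEN =
            Pattern.compile("(?i)vitamin[\\s-]?[a-z](?:-?\\d+)?\\b|\\w+");

    // Every match of the whole pattern (group 0) becomes one token;
    // text between matches (spaces, punctuation) is dropped.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            tokens.add(m.group(0));
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("Vitamin B-12 complex"));   // [Vitamin B-12, complex]
        System.out.println(tokenize("Vitamin Absorption aid")); // [Vitamin, Absorption, aid]
    }
}
```

The extracted tokens keep their original spacing and hyphenation. As a result, `Vitamin B-12` and `Vitamin B12` would still be indexed as different terms. Some further normalization would be needed for them to match, such as a PatternReplaceFilterFactory step of the kind Susheel describes plus lowercasing. The `\w+` fallback is only a crude stand-in for ordinary word tokenization.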
Joel Bernstein
http://joelsolr.blogspot.com/

On Wed, Mar 22, 2017 at 7:52 PM, Erick Erickson <erickerick...@gmail.com> wrote:

> Susheel:
>
> That'll work, but the options you've specified for
> WordDelimiterFilterFactory pretty much make it so it's doing nothing.
> I realize it's commented out...
>
> That said, it's true that if you have a very specific pattern you want
> to recognize, a regex can do the trick. WDFF is a bit more generic,
> though, when you have less specific requirements.
>
> Best,
> Erick
>
> On Wed, Mar 22, 2017 at 12:56 PM, Susheel Kumar <susheel2...@gmail.com>
> wrote:
> > I have used PatternReplaceFilterFactory in some of these situations, e.g.
> > the configuration below:
> >
> > <tokenizer class="solr.ClassicTokenizerFactory"/>
> > <!-- <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0"
> >      generateNumberParts="0" catenateWords="0" catenateNumbers="1"
> >      catenateAll="0" splitOnCaseChange="0" splitOnNumerics="0" /> -->
> > <filter class="solr.PatternReplaceFilterFactory"
> >         pattern="(\d+)-(\d+)-?(\d+)$" replacement="$1$2$3"/>
> >
> > On Wed, Mar 22, 2017 at 2:54 PM, Mark Johnson <mjohn...@emersonecologics.com>
> > wrote:
> >> Awesome, thank you much!
> >>
> >> On Wed, Mar 22, 2017 at 2:38 PM, Erick Erickson <erickerick...@gmail.com>
> >> wrote:
> >> > Take a close look at WordDelimiterFilterFactory; it's designed to deal
> >> > with things like part numbers, phone numbers and the like, and the
> >> > example you gave is in the same class of problem, I think. It'll take
> >> > a bit to get your head around what it does, but it'll perform better
> >> > than regexes, assuming you can get what you need out of it.
> >> >
> >> > And the admin/analysis page will help you _greatly_ in understanding
> >> > what the effects of the various parameters are.
> >> >
> >> > Best,
> >> > Erick
> >> >
> >> > On Wed, Mar 22, 2017 at 11:06 AM, Mark Johnson
> >> > <mjohn...@emersonecologics.com> wrote:
> >> > > Is it possible to configure Solr to treat text that matches a regex
> >> > > as a phrase?
> >> > >
> >> > > I have a database full of products, and the Title and Description
> >> > > fields are text_en, tokenized via the StandardTokenizerFactory. This
> >> > > works in most cases, but a number of products have names like:
> >> > >
> >> > > - Vitamin A
> >> > > - Vitamin-A
> >> > > - Vitamin B12
> >> > > - Vitamin B-12
> >> > > ...and so on
> >> > >
> >> > > I have a regex that will match all of the permutations and would like
> >> > > to configure the field type so that anything that matches the regex
> >> > > pattern is treated as a single token, instead of being broken up by
> >> > > spaces, etc. Is that possible?
> >> > >
> >> > > --
> >> > > *This message is intended only for the use of the individual or entity
> >> > > to which it is addressed and may contain information that is privileged,
> >> > > confidential and exempt from disclosure under applicable law. If you
> >> > > have received this message in error, you are hereby notified that any
> >> > > use, dissemination, distribution or copying of this message is prohibited.
> >> > > If you have received this communication in error, please notify the
> >> > > sender immediately and destroy the transmitted information.*
> >> >
> >>
> >> --
> >> Best Regards,
> >> *Mark Johnson* | .NET Software Engineer
> >> Office: 603-392-7017
> >> Emerson Ecologics, LLC | 1230 Elm Street | Suite 301 | Manchester NH | 03101
>
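Susheel's PatternReplaceFilterFactory configuration earlier in the thread rewrites each token independently. Below is a minimal Java sketch of that step. It assumes the filter's default of replacing every match in the token (`replace="all"`), which corresponds to `Matcher.replaceAll`. It also assumes the tokenizer has already emitted each hyphenated number as a single token. The `PatternReplaceSketch` class and the sample tokens are hypothetical. It uses plain `java.util.regex`, not Lucene's filter class.

```java
import java.util.regex.Pattern;

// Sketch only: applies Susheel's PatternReplaceFilterFactory settings to
// individual tokens with java.util.regex (not Lucene's actual filter class).
public class PatternReplaceSketch {

    // pattern="(\d+)-(\d+)-?(\d+)$" and replacement="$1$2$3" from the thread,
    // with backslashes doubled for a Java string literal.
    static final Pattern PATTERN = Pattern.compile("(\\d+)-(\\d+)-?(\\d+)$");
    static final String REPLACEMENT = "$1$2$3";

    // Tokens that do not match are passed through unchanged.
    static String filter(String token) {
        return PATTERN.matcher(token).replaceAll(REPLACEMENT);
    }

    public static void main(String[] args) {
        for (String token : new String[] {"555-123-4567", "12-34", "B-12"}) {
            System.out.println(token + " -> " + filter(token));
        }
        // 555-123-4567 -> 5551234567
        // 12-34 -> 1234
        // B-12 -> B-12
    }
}
```

Two details fall out of the pattern itself. First, `12-34` also collapses, because the optional second hyphen plus backtracking lets the pattern match two-part numbers as well as three-part ones. Second, `B-12` is left alone because the pattern only joins digit runs. A different regex would be needed for names like `Vitamin B-12`.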