Take a close look at WordDelimiterFilterFactory. It's designed to deal with things like part numbers, phone numbers, and the like, and I think the example you gave is in the same class of problem. It'll take a bit to get your head around what it does, but it'll perform better than regexes, assuming you can get what you need out of it.
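
As a rough starting point, a field type along these lines might be worth experimenting with. This is only a sketch: the name text_product is made up for illustration, the parameter values are guesses rather than recommendations, and it assumes swapping in WhitespaceTokenizerFactory, since StandardTokenizerFactory splits on hyphens before the filter ever sees them.

```xml
<!-- Hypothetical field type; the name and all parameter values are assumptions to tune -->
<fieldType name="text_product" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Split on whitespace only, so "Vitamin-A" and "B-12" reach the filter intact -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- Emit the individual parts plus catenated forms, and keep the original token -->
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1"
            generateNumberParts="1"
            catenateWords="1"
            catenateNumbers="1"
            catenateAll="1"
            splitOnCaseChange="0"
            preserveOriginal="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

With settings like these, "B12" and "B-12" should both end up producing a "b12" token, but "Vitamin A" (separated by a space) will still be two tokens, so a phrase query may still be needed for that case.
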
And the admin/analysis page will help you _greatly_ in understanding what the effects of the various parameters are.

Best,
Erick

On Wed, Mar 22, 2017 at 11:06 AM, Mark Johnson <mjohn...@emersonecologics.com> wrote:
> Is it possible to configure Solr to treat text that matches a regex as a
> phrase?
>
> I have a database full of products, and the Title and Description fields
> are text_en, tokenized via the StandardTokenizerFactory. This works in most
> cases, but a number of products have names like:
>
> - Vitamin A
> - Vitamin-A
> - Vitamin B12
> - Vitamin B-12
> ...and so on
>
> I have a regex that will match all of the permutations and would like to
> configure the field type so that anything that matches the regex pattern is
> treated as a single token, instead of being broken up by spaces, etc. Is
> that possible?