Re: Regex Phrases

2017-03-23 Thread Mark Johnson
So I managed to get the tokenizing to work with both PatternTokenizerFactory and WordDelimiterFilterFactory (used in combination with WhitespaceTokenizerFactory). For PT I used a regex that matches the various permutations of the phrases, and for WDF/WT I used protected words with every permutation

Re: Regex Phrases

2017-03-23 Thread Joel Bernstein
You can also check out https://cwiki.apache.org/confluence/display/solr/Tokenizers#Tokenizers-RegularExpressionPatternTokenizer . Joel Bernstein http://joelsolr.blogspot.com/ On Wed, Mar 22, 2017 at 7:52 PM, Erick Erickson wrote: > Susheel: > > That'll work, but the options you've specified for

Re: Regex Phrases

2017-03-22 Thread Erick Erickson
Susheel: That'll work, but the options you've specified for WordDelimiterFilterFactory pretty much make it so it's doing nothing. I realize it's commented out... That said, it's true that if you have a very specific pattern you want to recognize a Regex can do the trick. WDFF is a bit more generi

Re: Regex Phrases

2017-03-22 Thread Susheel Kumar
I have used PatternReplaceFilterFactory in some of these situations. e.g. below On Wed, Mar 22, 2017 at 2:54 PM, Mark Johnson wrote: > Awesome, thank you much! > > On Wed, Mar 22, 2017 at 2:38 PM, Erick Erickson > wrote: > > > Take a close look at WordDelimiterFilterFactory, it's designed t

Re: Regex Phrases

2017-03-22 Thread Mark Johnson
Awesome, thank you much! On Wed, Mar 22, 2017 at 2:38 PM, Erick Erickson wrote: > Take a close look at WordDelimiterFilterFactory, it's designed to deal > with things like part numbers, phone numbers and the like, and the > example you gave is in the same class of problem I think. It'll take > a

Re: Regex Phrases

2017-03-22 Thread Erick Erickson
Take a close look at WordDelimiterFilterFactory, it's designed to deal with things like part numbers, phone numbers and the like, and the example you gave is in the same class of problem I think. It'll take a bit to get your head around what it does, but it'll perform better than regexes, assuming y

Regex Phrases

2017-03-22 Thread Mark Johnson
Is it possible to configure Solr to treat text that matches a regex as a phrase? I have a database full of products, and the Title and Description fields are text_en, tokenized via the StandardTokenizerFactory. This works in most cases, but a number of products have names like: - Vitamin A - Vi
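
For readers following the thread, here is a rough sketch of the idea being asked about: tokenizing with a regex so that a matching phrase stays together as one token. It is plain Python, not Solr configuration, and the pattern is a hypothetical stand-in built only from the "Vitamin A" example in the question; it is not the regex any poster in this thread actually used. The names TOKEN_RE and tokenize are made up for illustration.

```python
import re

# Hypothetical pattern (an assumption, not taken from the thread):
# "Vitamin" followed by a single capital letter is kept as one token;
# everything else falls back to ordinary word tokens.
TOKEN_RE = re.compile(r"Vitamin\s+[A-Z]\b|\w+")


def tokenize(text):
    """Return the regex matches as tokens, keeping matched phrases whole."""
    return [m.group(0) for m in TOKEN_RE.finditer(text)]


print(tokenize("Vitamin A and Vitamin C tablets"))
```

Because the phrase alternative is listed first, it wins over the plain-word fallback whenever both could match at the same position. A title such as "Vitamin Complex" still splits into two tokens, since the single letter must be followed by a word boundary.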