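The reason is almost certainly that the query parser splits on whitespace before the analysis chain sees the query, so each token travels through your chain separately. That means your char filters never see "xx yy" as a single string at query time, and the pattern that would remove the space cannot match. Try the query with quotes around it to see whether this is your issue.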
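To make the difference concrete, here is a rough sketch in Python rather than Solr itself. It only models the behaviour I am describing and is not how Solr is implemented internally. The function names are made up for illustration, and it uses just the mm/cc/ml pattern from your schema followed by a plain whitespace split as a stand-in for your tokenizer.

```python
import re

# The mm/cc/ml char filter pattern from the quoted schema, with the
# replacement "$1$2" written in Python's backreference syntax.
CHAR_FILTER = (re.compile(r"(?i)(\d+)\s?(mm|cc|ml)"), r"\1\2")


def analyze(text):
    """Hypothetical stand-in for the analysis chain:
    apply the char filter, then tokenize on whitespace."""
    pattern, replacement = CHAR_FILTER
    return pattern.sub(replacement, text).split()


def parse_unquoted(query):
    """Model of a query parser that splits on whitespace first and
    sends each piece through the analysis chain on its own."""
    tokens = []
    for chunk in query.split():
        tokens.extend(analyze(chunk))
    return tokens


def parse_quoted(query):
    """Model of a quoted phrase: the whole string reaches the
    analysis chain in one piece, so the char filter can match."""
    return analyze(query)


if __name__ == "__main__":
    print(parse_unquoted("10 mm"))  # char filter sees "10" and "mm" separately
    print(parse_quoted("10 mm"))    # char filter sees "10 mm" as one string
    print(parse_unquoted("10mm"))   # no whitespace, so nothing is split
```

The analysis screen in the admin UI behaves like parse_quoted here, because it hands the whole input to the chain at once. That would explain why both forms look identical there yet return different results in a real query.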
Upayavira

On Thu, Jul 30, 2015, at 04:52 PM, Jack Schlederer wrote:
> Hi,
>
> I'm in the process of revising a schema for the search function of an
> eCommerce platform. One of the sticking points is a particular use
> case of searching for "xx yy" where xx is any number and yy is an
> abbreviation for a unit of measurement (mm, cc, ml, in, etc.). The
> problem is that searching for "xx yy" and "xxyy" return different
> results. One possible solution I tried was applying a few
> PatternReplaceCharFilterFactories to remove the whitespace between xx
> and yy if there was any (at both index- and query-time). These are
> the first few lines in the analyzer:
>
> <charFilter class="solr.PatternReplaceCharFilterFactory"
>     pattern="(?i)(\d+)\s?(pounds?|lbs?)" replacement="$1lb" />
> <charFilter class="solr.PatternReplaceCharFilterFactory"
>     pattern="(?i)(\d+)\s?(inch[es]?|in?)" replacement="$1in" />
> <charFilter class="solr.PatternReplaceCharFilterFactory"
>     pattern="(?i)(\d+)\s?(ounc[es]?|oz)" replacement="$1oz" />
> <charFilter class="solr.PatternReplaceCharFilterFactory"
>     pattern="(?i)(\d+)\s?(quarts?|qts?)" replacement="$1qt" />
> <charFilter class="solr.PatternReplaceCharFilterFactory"
>     pattern="(?i)(\d+)\s?(gallons?|gal?)" replacement="$1gal" />
> <charFilter class="solr.PatternReplaceCharFilterFactory"
>     pattern="(?i)(\d+)\s?(mm|cc|ml)" replacement="$1$2" />
>
> A few more lines down, I use a PatternCaptureGroupFilterFactory to
> emit the tokens "xxyy", "xx", and "yy":
>
> <filter class="solr.PatternCaptureGroupFilterFactory"
>     pattern="(\d+)(lb|oz|in|qt|gal|mm|cc|ml)" preserve_original="true" />
>
> In Solr admin's analysis tool for the field type this applies to, both
> "xx yy" and "xxyy" are tokenized and filtered down identically (at
> both index- and query-time).
>
> The platform I'm working on searches many different fields by default,
> but even when I rig up the query to only search in this one field, I
> still get different results for "xxyy" and "xx yy". I'm wondering why
> this is.
>
> Attached is a screenshot from Solr analysis.
>
> Thanks, John