The reason is almost certainly that the query parser splits on
whitespace before the analysis chain sees the query, so each token
travels through your chain separately. Try the query with quotes around
it ("xx yy") to see if this is your issue.
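
To illustrate the difference, here is a rough Python sketch. It is not
Solr code: the function names are made up for the example, only the
mm/cc/ml pattern from your schema is modelled, and tokenizing is
reduced to a whitespace split. It only shows why a char filter that
joins "xx yy" can never fire when it is handed "xx" and "yy" separately.

```python
import re

# Simplified stand-in for the last charFilter pattern in the schema.
UNIT_PATTERN = re.compile(r"(?i)(\d+)\s?(mm|cc|ml)")


def char_filter(text):
    """Mimic the pattern-replace char filter: join a number to its unit."""
    return UNIT_PATTERN.sub(r"\1\2", text)


def analyze_whole(text):
    """The analysis tool's view: the whole string enters the chain at once."""
    return char_filter(text).split()


def analyze_after_parser_split(query):
    """Assumed query-time view: the parser splits on whitespace first,
    then each piece goes through the chain on its own."""
    tokens = []
    for clause in query.split():
        tokens.extend(char_filter(clause).split())
    return tokens


print(analyze_whole("12 mm"))               # ['12mm']
print(analyze_after_parser_split("12 mm"))  # ['12', 'mm']
print(analyze_after_parser_split("12mm"))   # ['12mm']
```

The analysis tool behaves like analyze_whole, which would explain why
it shows identical output for both inputs while real queries differ.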

Upayavira

On Thu, Jul 30, 2015, at 04:52 PM, Jack Schlederer wrote:
> Hi,
>
> I'm in the process of revising a schema for the search function of an
> eCommerce platform.  One of the sticking points is a particular use
> case of searching for "xx yy" where xx is any number and yy is an
> abbreviation for a unit of measurement (mm, cc, ml, in, etc.).  The
> problem is that searches for "xx yy" and "xxyy" return different
> results. One possible solution I tried was applying a few
> PatternReplaceCharFilterFactories to remove the whitespace between xx
> and yy if there was any (at both index- and query-time).  These are
> the first few lines in the analyzer:
>
> <charFilter class="solr.PatternReplaceCharFilterFactory"
>     pattern="(?i)(\d+)\s?(pounds?|lbs?)" replacement="$1lb" />
> <charFilter class="solr.PatternReplaceCharFilterFactory"
>     pattern="(?i)(\d+)\s?(inch[es]?|in?)" replacement="$1in" />
> <charFilter class="solr.PatternReplaceCharFilterFactory"
>     pattern="(?i)(\d+)\s?(ounc[es]?|oz)" replacement="$1oz" />
> <charFilter class="solr.PatternReplaceCharFilterFactory"
>     pattern="(?i)(\d+)\s?(quarts?|qts?)" replacement="$1qt" />
> <charFilter class="solr.PatternReplaceCharFilterFactory"
>     pattern="(?i)(\d+)\s?(gallons?|gal?)" replacement="$1gal" />
> <charFilter class="solr.PatternReplaceCharFilterFactory"
>     pattern="(?i)(\d+)\s?(mm|cc|ml)" replacement="$1$2" />
>
> A few more lines down, I use a PatternCaptureGroupFilterFactory to
> emit the tokens "xxyy", "xx", and "yy":
>
> <filter class="solr.PatternCaptureGroupFilterFactory"
> pattern="(\d+)(lb|oz|in|qt|gal|mm|cc|ml)" preserve_original="true" />
>
> In Solr admin's analysis tool for the field type this applies to, both
> "xx yy" and "xxyy" are tokenized and filtered down identically (at
> both index- and query-time).
>
> The platform I'm working on searches many different fields by default,
> but even when I rig up the query to only search in this one field, I
> still get different results for "xxyy" and "xx yy".  I'm wondering why
> this is.
>
> Attached is a screenshot from Solr analysis.
>
> Thanks, John
