I'm kinda under the gun on this one. I thought I would be able to
solve it using the different tokenizers and query analyzers that come
with Solr, but I seem to be running into a brick wall.

I'm currently using Acts_as_solr 0.9.

My project's requirement is this: I need to configure my Solr server
so that when I have this field indexed:
sku_name_t: "FT-50-43"
it shows up as a valid result for all of the following queries:
"FT", "50", "43", "FT5043", "FT50-43", and "FT-5043"

The basic goal behind this requirement is that many people see these
part numbers in hobby magazines, and when they search for a part they
will often put the dashes in the wrong place, or leave them out
entirely, but they will usually at least have the letters and numbers
correct.

With the schema as it is, all of the queries work, EXCEPT for
"FT-5043" and "FT5043".

Looking at Solr's documentation here
(http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#head-1c9b83870ca7890cd73b193cefed83c283339089)
I believe that properly changing the parameters on the
solr.WordDelimiterFilterFactory filter entries in schema.xml should
provide the results I need.

As near as I can tell, the relevant entry is on line 55 of schema.xml
(I'm using AAS 0.9, if it matters), which now reads:
<code><filter class="solr.WordDelimiterFilterFactory"
generateWordParts="1" generateNumberParts="1" catenateWords="0"
catenateNumbers="0" catenateAll="1"/></code>

The default is catenateAll="0". My understanding from reading the
Solr docs (please correct me if I'm wrong) is that catenateAll on
"FT-50-43" should result in "FT5043" being indexed in addition to the
other tokens (I believe this is referred to as index expansion). When
I use the Solr admin analyzer tool, it appears to do that, but the
find_by_solr query for "FT5043" still doesn't return any results.
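
To make my understanding concrete, here is a rough Ruby sketch of the
splitting rules as I read them in the docs. It is only an approximation
and not Solr's actual code: the method name word_delimiter_tokens is
made up for illustration, and it ignores token positions, case-change
splitting, and every option not listed.

```ruby
# Simplified approximation of the subword rules documented for
# solr.WordDelimiterFilterFactory. NOT Solr's implementation: it ignores
# token positions, case-change splitting, and all other options.
# The method name and keyword arguments are hypothetical.
def word_delimiter_tokens(text, generate_word_parts: true,
                          generate_number_parts: true,
                          catenate_words: false,
                          catenate_numbers: false,
                          catenate_all: false)
  # Split on non-alphanumeric delimiters and on letter/digit transitions.
  parts = text.scan(/[A-Za-z]+|[0-9]+/)
  tokens = []

  # Emit each individual word part or number part if enabled.
  parts.each do |part|
    numeric = part.match?(/\A[0-9]/)
    tokens << part if numeric ? generate_number_parts : generate_word_parts
  end

  # Join adjacent runs of the same kind (all words or all numbers).
  runs = parts.chunk_while { |a, b| a.match?(/\A[0-9]/) == b.match?(/\A[0-9]/) }
  runs.each do |run|
    next if run.length < 2
    numeric = run.first.match?(/\A[0-9]/)
    tokens << run.join if numeric ? catenate_numbers : catenate_words
  end

  # Join every part into one token.
  tokens << parts.join if catenate_all && parts.length > 1
  tokens.uniq
end

indexed = word_delimiter_tokens("FT-50-43", catenate_all: true)
queried = word_delimiter_tokens("FT5043", catenate_all: true)
puts indexed.inspect  # ["FT", "50", "43", "FT5043"]
puts queried.inspect  # ["FT", "5043", "FT5043"]
puts (queried - indexed).inspect  # ["5043"]
```

If this approximation is right, the query side also emits a "5043"
token that was never indexed. Whether that is what breaks the search
depends on how the query parser combines the tokens, which this sketch
does not model.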

I've also tried playing around with the analyzer on line 64:
<code><filter class="solr.WordDelimiterFilterFactory"
generateWordParts="1" generateNumberParts="1" catenateWords="0"
catenateNumbers="0" catenateAll="1"/></code>

I've been banging my head against the wall trying to come up with the
magic combination that will give me the results I need, and I would
greatly appreciate some feedback. If there's a better Solr filter out
there that someone can point me to, that would be greatly appreciated
too. I'm also going to try posting this to one of the Solr lists, and
if I get a solution there, I'll be sure to share it with you guys.

Brian Artiaco
Blue Hill Solutions
