Re: Solr schema filters

Yonik Seeley Fri, 11 Jan 2008 12:54:44 -0800

On Jan 10, 2008 8:51 PM, Brian Artiaco <[EMAIL PROTECTED]> wrote:
> I'm kinda under the gun for this problem, and I thought that I would
> be able to solve this problem using the different Tokenizers and Query
> Analyzers that come with Solr, but I seem to be running into a brick
> wall.
>
> I'm currently using Acts_as_solr 0.9.
>
> The requirement for my project is this: I need to configure my
> Solr server so that when I have this field indexed: sku_name_t:
> "FT-50-43"
> it will show up as a valid result for the following queries:
> "FT", "50", "43", "FT5043", "FT50-43", and "FT-5043"


For this exact example, use the WordDelimiterFilter exactly as
configured in the "text" fieldType in the example schema that ships
with Solr.  The trick is then to use some slop when querying.

FT-50-43 will be indexed as FT, 50, 43 / 5043  (the last two tokens
are in the same position).
Now when querying, "FT-5043" won't match without slop because there is
a "50" token in the middle of the indexed terms... so try "FT-5043"~1
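
To make the position argument concrete, here is a minimal Python sketch of
a sloppy phrase match. It is a simplified model, not Lucene's actual scorer.
The token positions are copied from the description above; the function name
phrase_matches, the INDEXED table, and the assumption that the query
"FT-5043" is analyzed into the two-term phrase FT, 5043 are illustrative
only. Lowercasing and stemming are ignored.

```python
from itertools import product

# Positions for the indexed value "FT-50-43" as described above:
# "43" and the catenated "5043" share the same position.
INDEXED = {"FT": [0], "50": [1], "43": [2], "5043": [2]}


def phrase_matches(query_terms, indexed, slop=0):
    """Simplified sloppy phrase match (an approximation, not Lucene's code).

    For every way of picking one indexed position per query term, compute
    how far each term is shifted from its offset in the query. The phrase
    matches if the spread of those shifts is no larger than slop.
    """
    if any(term not in indexed for term in query_terms):
        return False
    for positions in product(*(indexed[term] for term in query_terms)):
        shifts = [pos - offset for offset, pos in enumerate(positions)]
        if max(shifts) - min(shifts) <= slop:
            return True
    return False


# Assumed query-side analysis: "FT-5043" becomes the phrase FT, 5043.
print(phrase_matches(["FT", "5043"], INDEXED, slop=0))  # exact phrase
print(phrase_matches(["FT", "5043"], INDEXED, slop=1))  # like "FT-5043"~1
```

In this model the exact phrase fails because 5043 sits two positions after
FT rather than one, and a slop of 1 absorbs that one-position gap.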

-Yonik



> The basic goal behind this requirement is that many people see these
> part numbers in hobby magazines, and that when they search for the
> part, many times they will put in incorrect dashes, or no dash at all,
> etc, but they will usually at least have the letters/numbers correct.
>
> With the schema as it is, all of the queries work, EXCEPT for
> "FT-5043" and "FT5043".
>
> Looking at Solr's documentation here
> (http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#head-1c9b83870ca7890cd73b193cefed83c283339089)
> I believe that properly changing the parameters in the
> solr.WordDelimiterFilterFactory tokenizer/analyzer fields in
> schema.xml should provide the results I need.
>
> As near as I can tell, in the schema.xml line 55 (I'm using AAS 0.9,
> if it matters):
> <code><filter class="solr.WordDelimiterFilterFactory"
> generateWordParts="1" generateNumberParts="1" catenateWords="0"
> catenateNumbers="0" catenateAll="1"/></code>
>
> The default is catenateAll="0".  My understanding from reading the
> Solr docs (please correct me if I'm wrong) is that catenateAll on
> "FT-50-43" should result in an index of "FT5043" in addition to the
> other options (I believe this is referred to as Index Expansion).  And
> when I use the solr admin analyzer tool, it appears to do that, but
> the find_by_solr query for "FT5043" still doesn't return any results.
>
> I've also tried playing around with the analyzer on line 64:
> <code><filter class="solr.WordDelimiterFilterFactory"
> generateWordParts="1" generateNumberParts="1" catenateWords="0"
> catenateNumbers="0" catenateAll="1"/></code>
>
> I've been banging my head against the wall, trying to come up with
> the magic combination that will provide me with the results I need,
> and I would greatly appreciate some feedback.  Or if there's a better
> solr filter out there that someone can point me in the direction of,
> it would be greatly appreciated.  I'm going to try and post this to
> one of the solr lists too, and if I get a solution there, I'll be sure
> to share it with you guys.
>
> Brian Artiaco
> Blue Hill Solutions
>
