Hi Jason, what about multi-word searches like "harry potter"? When I search our index for "harry poter", I get the suggestion "harry spotter" (using spellcheck.collate=true and the Jaro-Winkler distance). Searching for "harry spotter" (we're searching with AND, not OR) then returns no results. I assume this is because suggestions are generated for each word separately, with no requirement that all suggested words occur in the same document.
I wonder what's the standard approach for searches with multiple words.
Are these working ok for you?
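A sketch of the kind of solrconfig.xml setup described above, for reference. It is not the actual configuration from either index: the field and type names (spellField, textSpell) are borrowed from Jason's mail below, while the checker name, index directory and handler name are placeholders.

```xml
<!-- Sketch only: every name except the Solr classes and parameters is an assumption -->
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <!-- analyze the incoming query with the same field type as the spell field -->
  <str name="queryAnalyzerFieldType">textSpell</str>
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">spellField</str>
    <str name="spellcheckIndexDir">./spellchecker</str>
    <!-- use Jaro-Winkler instead of the default distance measure -->
    <str name="distanceMeasure">org.apache.lucene.search.spell.JaroWinklerDistance</str>
  </lst>
</searchComponent>

<requestHandler name="/spellCheckCompRH" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="spellcheck">true</str>
    <!-- build one rewritten query string from the per-term suggestions -->
    <str name="spellcheck.collate">true</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>
```

As far as I understand it, the collation here is assembled from the best suggestion for each term on its own, so "harry" and "spotter" can each exist somewhere in the index without any single document containing both.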
Cheers,
Martin
On Fri, 2008-10-03 at 16:21 -0400, Jason Rennie wrote:
> Hi Martin,
>
> I'm a relative newbie to Solr, but I have been playing with the spellcheck
> component and seem to have it working. I certainly can't explain everything
> that is going on, but with any luck I can help you get the spellchecker
> up and running. Additional replies are inlined below.
>
> On Wed, Oct 1, 2008 at 7:11 AM, Martin Grotzke <[EMAIL PROTECTED]> wrote:
>
> > Now I'm thinking about the source-field in the spellchecker ("spell"):
> > how should fields be analyzed during indexing, and how should the
> > queryAnalyzerFieldType be configured.
>
>
> I followed the conventions in the default solrconfig.xml and schema.xml
> files. So I created a "textSpell" field type (schema.xml):
>
> <!-- field type for the spell checker which doesn't stem -->
> <fieldtype name="textSpell" class="solr.TextField"
> positionIncrementGap="100">
> <analyzer>
> <tokenizer class="solr.StandardTokenizerFactory"/>
> <filter class="solr.LowerCaseFilterFactory"/>
> <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
> </analyzer>
> </fieldtype>
>
> and used this for the queryAnalyzerFieldType. I also created a spellField
> to store the text I want to spell check against and used the same analyzer
> (figuring that the query and indexed data should be analyzed the same way)
> (schema.xml):
>
> <!-- Spell check field -->
> <field name="spellField" type="textSpell" indexed="true" stored="true" />
>
>
>
> > If I have brands like e.g. "Apple" or "Ed Hardy" I would copy them (the
> > field "brand") directly to the "spell" field. The "spell" field is of
> > type "string".
>
>
> We're copying description to spellField. I'd recommend using a type like
> the above textSpell type since "The StringField type is not analyzed, but
> indexed/stored verbatim" (schema.xml):
>
> <copyField source="description" dest="spellField" />
>
> > Other fields like e.g. the product title I would first copy to some
> > whitespaceTokenized field (field type with WhitespaceTokenizerFactory)
> > and afterwards to the "spell" field. The product title might be e.g.
> > "Canon EOS 450D EF-S 18-55 mm".
>
>
> Hmm... I'm not sure if this would work as I don't think the analyzer is
> applied until after the copy is made. FWIW, I've had trouble copying
> multiple fields to spellField (i.e. adding a second copyField w/
> dest="spellField"), so we just index the spellchecker on a single field...
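One possible cause of the trouble with a second copyField (an assumption; the thread doesn't show the actual error): a field that is the destination of more than one copyField receives several values per document, so it generally has to be declared multiValued="true". A minimal sketch, reusing the spellField and textSpell names from above; "title" is a hypothetical second source field:

```xml
<!-- destination of several copyFields, so it must accept multiple values -->
<field name="spellField" type="textSpell" indexed="true" stored="true" multiValued="true" />

<copyField source="description" dest="spellField" />
<!-- "title" is a hypothetical field name used only for illustration -->
<copyField source="title" dest="spellField" />
```

Each copyField passes the raw source value to the destination, and only then does the destination's own analyzer (here textSpell) run. Routing the text through an intermediate whitespace-tokenized field would therefore not change what ends up in spellField.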
>
> > Shouldn't this be a WhitespaceTokenizerFactory, or is it better to use a
> > StandardTokenizerFactory here?
>
>
> I think if you use the same analyzer for indexing and queries, the
> distinction probably isn't tremendously important. When I went searching,
> it looked like the StandardTokenizer split on non-letters. I'd guess the
> rationale for using the StandardTokenizer is that it won't recommend
> non-letter characters. I was seeing some weirdness earlier (no
> inserts/deletes), but that disappeared now that I'm using the
> StandardTokenizer.
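For comparison, the whitespace-based variant asked about above could be declared as follows. This is only a sketch: the type name textSpellWs is made up, and the filters are simply carried over from the textSpell type.

```xml
<!-- hypothetical variant of textSpell that splits on whitespace only -->
<fieldtype name="textSpellWs" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- tokens are separated by whitespace; punctuation inside a token is kept -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldtype>
```

With this type, a title such as "Canon EOS 450D EF-S 18-55 mm" keeps "EF-S" and "18-55" as single tokens, so they could show up as suggestions in that form. Whichever tokenizer is chosen, queryAnalyzerFieldType should point at the same type so that queries and indexed terms are split the same way.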
>
> Cheers,
>
> Jason
--
Martin Grotzke
http://www.javakaffee.de/blog/
