Hi Martin,

I'm a relative newbie to Solr, but I have been playing with the spellcheck component and seem to have it working. I certainly can't explain everything that is going on, but with any luck I can help you get the spellchecker up and running. Additional replies are inline below.

On Wed, Oct 1, 2008 at 7:11 AM, Martin Grotzke <[EMAIL PROTECTED]> wrote:
> Now I'm thinking about the source-field in the spellchecker ("spell"):
> how should fields be analyzed during indexing, and how should the
> queryAnalyzerFieldType be configured.

I followed the conventions in the default solrconfig.xml and schema.xml files. So I created a "textSpell" field type (schema.xml):

    <!-- field type for the spell checker which doesn't stem -->
    <fieldtype name="textSpell" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
    </fieldtype>

and used this for the queryAnalyzerFieldType. I also created a spellField to store the text I want to spell check against and used the same analyzer (figuring that the query and indexed data should be analyzed the same way) (schema.xml):

    <!-- Spell check field -->
    <field name="spellField" type="textSpell" indexed="true" stored="true" />

> If I have brands like e.g. "Apple" or "Ed Hardy" I would copy them (the
> field "brand") directly to the "spell" field. The "spell" field is of
> type "string".

We're copying description to spellField. I'd recommend using a type like the above textSpell type since "The StringField type is not analyzed, but indexed/stored verbatim" (schema.xml):

    <copyField source="description" dest="spellField" />

> Other fields like e.g. the product title I would first copy to some
> whitespaceTokinized field (field type with WhitespaceTokenizerFactory)
> and afterwards to the "spell" field. The product title might be e.g.
> "Canon EOS 450D EF-S 18-55 mm".

Hmm... I'm not sure this would work, as I don't think the analyzer is applied until after the copy is made. FWIW, I've had trouble copying multiple fields to spellField (i.e. adding a second copyField with dest="spellField"), so we just index the spellchecker on a single field...

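On the multiple-copyField trouble: one possible cause (an assumption on my part, not something confirmed in this thread) is that a destination field receiving several copyField sources generally has to be declared multiValued. A minimal sketch for schema.xml, reusing the "description" and "brand" field names mentioned above and assuming both exist in the schema:

```xml
<!-- Sketch only: assumes the copyField trouble came from a single-valued destination field. -->
<!-- multiValued="true" lets spellField accept values from more than one source field. -->
<field name="spellField" type="textSpell" indexed="true" stored="true" multiValued="true" />

<!-- Each copyField passes along the raw source text; spellField's own textSpell analyzer then processes it. -->
<copyField source="description" dest="spellField" />
<copyField source="brand" dest="spellField" />
```

Because copyField hands over the raw text, the type of the source field should not matter here; only the textSpell analyzer shapes the tokens that end up in the spellcheck index.
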
> Shouldn't this be a WhitespaceTokenizerFactory, or is it better to use a
> StandardTokenizerFactory here?

I think if you use the same analyzer for indexing and queries, the distinction probably isn't tremendously important. When I went searching, it looked like the StandardTokenizer split on non-letters. I'd guess the rationale for using the StandardTokenizer is that it won't recommend non-letter characters. I was seeing some weirdness earlier (no inserts/deletes), but that disappeared now that I'm using the StandardTokenizer.

Cheers,
Jason