Hi, I'm just starting with the spellchecker component provided by Solr. It is really cool!
Now I'm thinking about the source field of the spellchecker ("spell"): how should fields be analyzed during indexing, and how should the queryAnalyzerFieldType be configured?

If I have brands like "Apple" or "Ed Hardy", I would copy them (the field "brand") directly to the "spell" field. The "spell" field is of type "string". Other fields, such as the product title, I would first copy to a whitespace-tokenized field (a field type with WhitespaceTokenizerFactory) and afterwards to the "spell" field. The product title might be, for example, "Canon EOS 450D EF-S 18-55 mm".

This is the process I have in mind for indexing. I'm not sure whether some tokens/terms should be removed, but I'd assume that any term might be misspelled by the user.

When it comes to searching, the query should be analyzed using the field type named by queryAnalyzerFieldType, which has a StandardTokenizerFactory in the schema.xml of the Solr example. Shouldn't this be a WhitespaceTokenizerFactory, or is it better to use a StandardTokenizerFactory here? Or should I use a StandardTokenizerFactory for the "spell" field, so that fields copied into it get tokenized/analyzed in the same way as the query?

Do you have any experience with this, or any recommendations? Are there other things to consider?

Thanks for your help,
cheers,
Martin
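
P.S. To make the question more concrete, here is a rough sketch of the setup I have in mind. The field type name "text_ws_spell" and the field name "title_ws" are placeholders I made up, "textSpell" is what I believe the example configuration calls its spellcheck field type, and the attribute values are assumptions rather than a tested configuration. The step from "title_ws" into "spell" is left out on purpose, since how to do that properly is part of what I'm asking.

```xml
<!-- schema.xml (sketch; names marked "placeholder" are hypothetical) -->

<!-- placeholder field type: splits on whitespace only -->
<fieldType name="text_ws_spell" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>
</fieldType>

<!-- source field for the spellchecker, currently an unanalyzed string -->
<field name="spell" type="string" indexed="true" stored="false" multiValued="true"/>

<!-- placeholder field: whitespace-tokenized copy of the product title -->
<field name="title_ws" type="text_ws_spell" indexed="true" stored="false"/>

<!-- brands such as "Apple" or "Ed Hardy" go into "spell" as they are -->
<copyField source="brand" dest="spell"/>
<copyField source="title" dest="title_ws"/>

<!-- solrconfig.xml (sketch) -->
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <!-- field type used to analyze the query; this is the setting I'm unsure about -->
  <str name="queryAnalyzerFieldType">textSpell</str>
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">spell</str>
  </lst>
</searchComponent>
```

The open question is whether "queryAnalyzerFieldType" should name a whitespace-based type like the placeholder above, or whether "spell" itself should use the same StandardTokenizerFactory-based type as the query.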