This is why OR is a better choice. With AND, one miss means no results at all. Spelling suggestions will never be good enough to make AND work.
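
A minimal sketch of the OR setting, assuming the standard query parser and a Solr 1.3-style schema.xml (adjust to your own setup):

```xml
<!-- schema.xml: default boolean operator for the standard query parser.
     With OR, a document matching only "harry" can still be returned
     when the other term is misspelled or mis-corrected. -->
<solrQueryParser defaultOperator="OR"/>
```

The operator can also be overridden per request with the q.op parameter (for example q.op=OR), which makes it easy to compare both behaviours on the same index.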
wunder

On 10/6/08 12:51 AM, "Martin Grotzke" <[EMAIL PROTECTED]> wrote:

> Hi Jason,
>
> what about multi-word searches like "harry potter"? When I do a search
> in our index for "harry poter", I get the suggestion "harry
> spotter" (using spellcheck.collate=true and jarowinkler distance).
> Searching for "harry spotter" (we're searching AND, not OR) then gives
> no results. I assume that this is because suggestions are done for words
> separately, and this does not require that both/all suggestions are
> contained in the same document.
>
> I wonder what's the standard approach for searches with multiple words.
> Are these working ok for you?
>
> Cheers,
> Martin
>
> On Fri, 2008-10-03 at 16:21 -0400, Jason Rennie wrote:
>> Hi Martin,
>>
>> I'm a relative newbie to solr, have been playing with the spellcheck
>> component and seem to have it working. I certainly can't explain what all
>> is going on, but with any luck, I can help you get the spellchecker
>> up-and-running. Additional replies in-lined below.
>>
>> On Wed, Oct 1, 2008 at 7:11 AM, Martin Grotzke <[EMAIL PROTECTED]> wrote:
>>
>>> Now I'm thinking about the source-field in the spellchecker ("spell"):
>>> how should fields be analyzed during indexing, and how should the
>>> queryAnalyzerFieldType be configured.
>>
>> I followed the conventions in the default solrconfig.xml and schema.xml
>> files. So I created a "textSpell" field type (schema.xml):
>>
>> <!-- field type for the spell checker which doesn't stem -->
>> <fieldtype name="textSpell" class="solr.TextField"
>>            positionIncrementGap="100">
>>   <analyzer>
>>     <tokenizer class="solr.StandardTokenizerFactory"/>
>>     <filter class="solr.LowerCaseFilterFactory"/>
>>     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>>   </analyzer>
>> </fieldtype>
>>
>> and used this for the queryAnalyzerFieldType. I also created a spellField
>> to store the text I want to spell check against and used the same analyzer
>> (figuring that the query and indexed data should be analyzed the same way)
>> (schema.xml):
>>
>> <!-- Spell check field -->
>> <field name="spellField" type="textSpell" indexed="true" stored="true" />
>>
>>> If I have brands like e.g. "Apple" or "Ed Hardy" I would copy them (the
>>> field "brand") directly to the "spell" field. The "spell" field is of
>>> type "string".
>>
>> We're copying description to spellField. I'd recommend using a type like
>> the above textSpell type since "The StringField type is not analyzed, but
>> indexed/stored verbatim" (schema.xml):
>>
>> <copyField source="description" dest="spellField" />
>>
>>> Other fields like e.g. the product title I would first copy to some
>>> whitespaceTokenized field (field type with WhitespaceTokenizerFactory)
>>> and afterwards to the "spell" field. The product title might be e.g.
>>> "Canon EOS 450D EF-S 18-55 mm".
>>
>> Hmm... I'm not sure if this would work as I don't think the analyzer is
>> applied until after the copy is made. FWIW, I've had trouble copying
>> multiple fields to spellField (i.e. adding a second copyField w/
>> dest="spellField"), so we just index the spellchecker on a single field...
>>
>>> Shouldn't this be a WhitespaceTokenizerFactory, or is it better to use a
>>> StandardTokenizerFactory here?
>>
>> I think if you use the same analyzer for indexing and queries, the
>> distinction probably isn't tremendously important. When I went searching,
>> it looked like the StandardTokenizer split on non-letters. I'd guess the
>> rationale for using the StandardTokenizer is that it won't recommend
>> non-letter characters. I was seeing some weirdness earlier (no
>> inserts/deletes), but that disappeared now that I'm using the
>> StandardTokenizer.
>>
>> Cheers,
>>
>> Jason
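
For reference, a minimal sketch of how the pieces quoted above might be wired together in solrconfig.xml, assuming Solr 1.3's SpellCheckComponent. The spellchecker name "default" and the index directory are placeholders and are not taken from either poster's configuration; the distanceMeasure line is only needed if you want the Jaro-Winkler distance Martin mentions.

```xml
<!-- solrconfig.xml: spellcheck component built from the spellField above -->
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <!-- analyze the incoming query with the same non-stemming field type -->
  <str name="queryAnalyzerFieldType">textSpell</str>
  <lst name="spellchecker">
    <str name="name">default</str>                          <!-- placeholder name -->
    <str name="field">spellField</str>                      <!-- source field from schema.xml -->
    <str name="spellcheckIndexDir">./spellchecker</str>     <!-- placeholder directory -->
    <!-- optional: Jaro-Winkler instead of the default Levenshtein distance -->
    <str name="distanceMeasure">org.apache.lucene.search.spell.JaroWinklerDistance</str>
  </lst>
</searchComponent>
```

The component still has to be attached to a request handler and the spelling index built (spellcheck.build=true) before a request such as q=harry poter&spellcheck=true&spellcheck.collate=true returns suggestions. Each word is corrected on its own, so the collated query is not guaranteed to match any document.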