What would also help is a query to find records for the spellcheck dictionary builder. We would like to build a separate spelling index for all records in English, another for Spanish, etc. We would also like to slice and dice the records by other dimensions, and have a separate spelling DB for each partition.
That is, we would like to build an English spelling dictionary and a Spanish one, and also subject-specific dictionaries such as News and Sports. These are separate, orthogonal partitions of our index. The usual practice is to create separate fields in the records, where one field is populated only for English records, one only for Spanish records, etc. In our situation this is not practical for space reasons and other proprietary reasons.

Lance

-----Original Message-----
From: Mike Klaas [mailto:[EMAIL PROTECTED]
Sent: Thursday, November 29, 2007 6:01 PM
To: solr-user@lucene.apache.org
Subject: Re: LowerCaseFilterFactory and spellchecker

On 29-Nov-07, at 5:40 PM, Chris Hostetter wrote:
>
> I'm not very familiar with the SpellCheckerRequestHandler, but i don't
> think you are doing anything wrong.
>
> a quick skim of the code indicates that the "q" param isn't being
> analyzed by that handler, so the raw input string is passed to the
> SpellChecker.suggestSimilar method. This may or may not have been
> intentional.
>
> I personally can't think of
> any reason why it wouldn't make sense to get the query analyzer for
> the termSourceField and use it to analyze the q param before getting
> suggestions.

It does make some sense, but I'm not sure that it should be blindly analyzed without adding logic to handle certain cases (like the QueryParser does). What happens if the analyzer produces two tokens? The spellchecker has to deal with this appropriately. Spell checkers should be able to "reverse analyze" the suggestions as well, so "Pyhton" gets corrected to "Python" and not "python". Similarly, "ad-hco" should probably suggest "ad-hoc" and not "adhoc".

-Mike
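
To make the "reverse analyze" idea concrete, here is a minimal sketch in plain Python. It is not Solr or Lucene code. The names `analyze`, `normalized_key`, `build_dictionary`, and `suggest` are hypothetical, the analyzer is a stand-in that only lowercases and splits on non-alphanumeric ASCII characters, and the similarity measure is the standard library's `difflib` rather than whatever the SpellChecker actually uses. The point is only that matching happens on the analyzed form while the suggestion returned is the original surface form.

```python
import difflib
import re


def analyze(text):
    """Hypothetical stand-in for a field's query analyzer:
    lowercase, then split on anything that is not an ASCII letter or digit."""
    return [tok for tok in re.split(r"[^0-9a-z]+", text.lower()) if tok]


def normalized_key(text):
    """Join all tokens, so input that analyzes to two tokens
    (e.g. "ad-hco") still yields a single lookup key."""
    return "".join(analyze(text))


def build_dictionary(surface_forms):
    """Map each analyzed key back to the surface form it came from."""
    return {normalized_key(form): form for form in surface_forms}


def suggest(query, dictionary, cutoff=0.6):
    """Match on the analyzed key, but return the original surface form.
    Returns None when no key is similar enough."""
    matches = difflib.get_close_matches(
        normalized_key(query), list(dictionary), n=1, cutoff=cutoff
    )
    return dictionary[matches[0]] if matches else None


dictionary = build_dictionary(["Python", "ad-hoc", "Solr"])
print(suggest("Pyhton", dictionary))
print(suggest("ad-hco", dictionary))
```

Joining the tokens is just one way to handle the two-token case; it also means that two surface forms with the same key (say "ad-hoc" and "adhoc") collide, and in this sketch the later one silently wins.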