Re: LowerCaseFilterFactory and spellchecker

Mike Klaas Fri, 30 Nov 2007 13:58:39 -0800

That's a pretty difficult proposition. Currently the spellcheck doesn't look at documents at all: only the top-level term & count data is used to create the index. Adding select-by-query would be considerably more complicated and expensive (I think a near-full iteration of TermDocs would be needed).
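
To illustrate the cost difference, here is a minimal Python sketch over a toy in-memory inverted index. It is not Solr or Lucene code: the names POSTINGS, DOC_LANG, global_dictionary and partition_dictionary are made up for illustration, and the postings lists only stand in for what a TermDocs iteration would visit.

```python
from collections import Counter

# Toy inverted index: term -> ids of the documents containing it (postings).
POSTINGS = {
    "football": [1, 2],
    "futbol": [3],
    "election": [2, 4],
}

# Hypothetical per-document metadata that defines the partitions.
DOC_LANG = {1: "en", 2: "en", 3: "es", 4: "en"}


def global_dictionary(postings):
    """Term -> document frequency, read straight off the term list.

    No individual document is inspected, only the per-term count.
    """
    return {term: len(docs) for term, docs in postings.items()}


def partition_dictionary(postings, matching_docs):
    """Term -> document frequency restricted to matching_docs.

    Every posting of every term is visited to test membership, which is
    the near-full iteration described above.
    """
    counts = Counter()
    for term, docs in postings.items():
        for doc_id in docs:
            if doc_id in matching_docs:
                counts[term] += 1
    return dict(counts)


# Documents selected by a hypothetical "lang:en" query.
english_docs = {doc_id for doc_id, lang in DOC_LANG.items() if lang == "en"}
```

The global dictionary needs one step per term, while the partitioned one needs one step per (term, document) pair, so the second grows with the total size of the postings rather than with the size of the vocabulary.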


-Mike


On 30-Nov-07, at 1:45 PM, Norskog, Lance wrote:

What would also help is a query to find records for the spellcheck
dictionary builder. We would like to make separate spelling indexes: one
for all records in English, one in Spanish, etc. We would also like to
slice & dice the records by other dimensions as well, and have separate
spelling DBs for each partition.

That is, we would like to make an English spelling dictionary and a
Spanish dictionary, and also make subject-specific dictionaries like
News and Sports. These are separate orthogonal partitions of our index.

The usual practice for this is to create separate fields in the records
where one field is only populated for English records, one for Spanish
records, etc. In our situation this is not practical for space reasons
and other proprietary reasons.

Lance

-----Original Message-----
From: Mike Klaas [mailto:[EMAIL PROTECTED]
Sent: Thursday, November 29, 2007 6:01 PM
To: solr-user@lucene.apache.org
Subject: Re: LowerCaseFilterFactory and spellchecker

On 29-Nov-07, at 5:40 PM, Chris Hostetter wrote:

I'm not very familiar with the SpellCheckerRequestHandler, but I don't
think you are doing anything wrong.

A quick skim of the code indicates that the "q" param isn't being
analyzed by that handler, so the raw input string is passed to the
SpellChecker.suggestSimilar method. This may or may not have been
intentional.

I personally can't think of
any reason why it wouldn't make sense to get the query analyzer for
the termSourceField and use it to analyze the q param before getting
suggestions.
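
As a rough illustration of that idea, here is a small self-contained Python sketch. It does not use the Solr or Lucene APIs: analyze() is a stand-in for the field's query analyzer (whitespace split plus lowercasing, an assumption), INDEX_TERMS is a made-up term list, and difflib.get_close_matches from the standard library substitutes for SpellChecker.suggestSimilar.

```python
import difflib

# Hypothetical terms as they would sit in a lowercased spelling index.
INDEX_TERMS = ["python", "solr", "lucene"]


def analyze(text):
    """Stand-in for a query analyzer: split on whitespace, lowercase."""
    return [token.lower() for token in text.split()]


def suggest(raw_q, terms=INDEX_TERMS):
    """Return {analyzed token: [suggestions]} for tokens missing from terms.

    The raw query is analyzed first, so "PYTHON" matches the indexed
    "python" instead of being treated as a misspelling.
    """
    suggestions = {}
    for token in analyze(raw_q):
        if token in terms:
            continue  # already spelled like an indexed term
        suggestions[token] = difflib.get_close_matches(token, terms, n=1)
    return suggestions
```

With this sketch, suggest("PYTHON") returns an empty dict, because the analyzed token is already an indexed term, while suggest("Pyhton") offers "python". Handling each token separately is only one possible way to treat a query that analyzes to several tokens.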


It does make some sense, but I'm not sure that it should be blindly
analyzed without adding logic to handle certain cases (like the
QueryParser does).  What happens if the analyzer produces two tokens?
The spellchecker has to deal with this appropriately.  Spell checkers
should be able to "reverse analyze" the suggestions as well, so "Pyhton"
gets corrected to "Python" and not "python". Similarly, "ad-hco" should
probably suggest "ad-hoc" and not "adhoc".
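
One way to picture "reverse analysis" is to keep a map from each analyzed form back to a surface form seen in the source text, match in analyzed space, and answer with the surface form. The Python sketch below is only that: a sketch under assumptions. The analyzer (lowercase and drop hyphens), the helper names and the sample terms are hypothetical, and difflib stands in for the real spell checker.

```python
import difflib


def analyze(term):
    """Hypothetical analyzer: lowercase and drop hyphens."""
    return term.lower().replace("-", "")


def build_surface_map(raw_terms):
    """Map each analyzed form to the first surface form seen for it."""
    surface = {}
    for raw in raw_terms:
        surface.setdefault(analyze(raw), raw)
    return surface


def suggest(word, surface):
    """Match in analyzed space, then return the original surface form."""
    hits = difflib.get_close_matches(analyze(word), list(surface), n=1)
    return surface[hits[0]] if hits else None


# Surface forms as they might appear in the source documents.
SURFACE = build_surface_map(["Python", "ad-hoc", "Solr"])
```

Here suggest("Pyhton", SURFACE) gives "Python" and suggest("ad-hco", SURFACE) gives "ad-hoc", since the lookup happens on "pyhton" and "adhco" but the answer is read back from the map. Keeping only the first surface form is a simplification; a real implementation would need a policy for terms that occur in several forms.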

-Mike
