Character-based n-grams are a good tool for this problem. MLT
(MoreLikeThis), by contrast, is a document-wide numerical analysis.
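
To show why character n-grams tolerate OCR damage, here is a minimal standalone Python sketch, not Solr's or Lucene's implementation. It indexes dictionary words by their character trigrams and looks up candidates for a garbled word by counting shared trigrams. The trigram size, the '_' edge padding, the NgramIndex class and the sample words are all illustrative assumptions.

```python
from collections import defaultdict


def char_ngrams(word: str, n: int = 3) -> set[str]:
    """Character n-grams of `word`, padded with '_' so word edges count too."""
    padded = f"_{word.lower()}_"
    if len(padded) < n:
        return {padded}
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


class NgramIndex:
    """Hypothetical tiny inverted index: character n-gram -> words containing it."""

    def __init__(self, words, n: int = 3):
        self.n = n
        self.postings = defaultdict(set)
        for w in words:
            for g in char_ngrams(w, n):
                self.postings[g].add(w)

    def suggest(self, term: str, k: int = 3) -> list[tuple[str, int]]:
        """Return up to k (word, shared_ngram_count) pairs, most shared first."""
        hits = defaultdict(int)
        for g in char_ngrams(term, self.n):
            for w in self.postings.get(g, ()):
                hits[w] += 1
        # Sort by shared count descending; break ties alphabetically for stable output.
        return sorted(hits.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


if __name__ == "__main__":
    idx = NgramIndex(["modern", "modem", "morning", "model"])
    # 'e' misread as 'c' is a plausible OCR confusion.
    print(idx.suggest("modcrn"))  # [('modern', 3), ('model', 2), ('modem', 2)]
```

A raw shared-gram count favors longer words, so a real spellchecker would normally re-rank these candidates with a string distance; the sketch only shows the candidate lookup step.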

If the common types of OCR mistakes are different from the ones n-grams
handle well, you might tune the n-gram generator. For example, swapping
letters might not happen very often, while single- and multi-word errors
probably happen a lot.
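
One rough way to check whether the n-gram size suits your OCR errors is to measure how much n-gram overlap survives each kind of error. The Python sketch below does that with the Dice coefficient over n-gram sets. The error categories and the word pairs are made-up examples for illustration, not measured OCR data; you would replace them with pairs sampled from your own scans.

```python
def char_ngrams(word: str, n: int) -> set[str]:
    """Character n-grams of `word`, padded with '_' so word edges count too."""
    padded = f"_{word.lower()}_"
    if len(padded) < n:
        return {padded}
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def dice(a: str, b: str, n: int) -> float:
    """Dice coefficient of two words' n-gram sets: 1.0 = same set, 0.0 = disjoint."""
    ga, gb = char_ngrams(a, n), char_ngrams(b, n)
    return 2 * len(ga & gb) / (len(ga) + len(gb))


# Hypothetical (correct word, OCR output) pairs, one per error type.
SAMPLES = {
    "single-letter substitution": ("house", "hovse"),
    "multi-letter (rn -> m)": ("modern", "modem"),
    "letter swap": ("form", "from"),
}


def overlap_table(samples: dict, sizes=(2, 3)) -> dict:
    """For each error type, the similarity that survives at each n-gram size."""
    return {
        kind: {n: round(dice(good, bad, n), 2) for n in sizes}
        for kind, (good, bad) in samples.items()
    }


if __name__ == "__main__":
    for kind, scores in overlap_table(SAMPLES).items():
        print(f"{kind:28} {scores}")
```

With these sample pairs the swap keeps no trigram overlap at all ({2: 0.4, 3: 0.0}), while the substitutions keep some. Smaller grams also raise the overlap between unrelated words, so picking n is a trade-off between tolerance and precision.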

If you do a facet query on your indexed terms, you will get a lot of
terms that appear only once in the index. These are often
misspellings. It is possible to automate pulling these out and creating a
matching set of synonyms that map them to words in the spelling index.
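
A minimal Python sketch of that automation, under several assumptions. It takes an already-fetched facet list in Solr's default flat JSON layout ([term1, count1, term2, count2, ...]) and a set of words from your spelling dictionary. Facet counts are document counts, so "count 1" here means the term occurs in exactly one document. Python's difflib is a stand-in similarity measure, not what Solr's spellchecker uses, and the field name `text`, the cutoff and the sample data are hypothetical.

```python
import difflib


def singleton_terms(flat_counts: list) -> list[str]:
    """Terms with a facet count of 1, from a flat [term, count, term, count, ...] list."""
    pairs = zip(flat_counts[0::2], flat_counts[1::2])
    return [term for term, count in pairs if count == 1]


def synonym_lines(singletons, dictionary, cutoff=0.8) -> list[str]:
    """Map each singleton to its closest dictionary word as a
    synonyms.txt-style 'misspelling => correction' line."""
    words = sorted(dictionary)
    lines = []
    for term in singletons:
        if term in dictionary:
            continue  # rare but correctly spelled: leave it alone
        match = difflib.get_close_matches(term, words, n=1, cutoff=cutoff)
        if match:
            lines.append(f"{term} => {match[0]}")
    return lines


if __name__ == "__main__":
    # Hypothetical facet_counts.facet_fields.text from a query such as
    # q=*:*&rows=0&facet=true&facet.field=text&facet.limit=-1&facet.mincount=1
    facet_text = ["report", 412, "quarterly", 97, "reqort", 1, "quartcrly", 1, "zyxw", 1]
    spelling_words = {"report", "quarterly", "annual"}
    for line in synonym_lines(singleton_terms(facet_text), spelling_words):
        print(line)  # reqort => report, then quartcrly => quarterly
```

Facet terms are the indexed (analyzed) tokens, so faceting on an unstemmed field gives cleaner input. The generated lines are worth reviewing by hand before loading them, since a rare proper name can look like a misspelling.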

On Tue, Dec 15, 2009 at 12:57 PM, Chris Hostetter
<hossman_luc...@fucit.org> wrote:
>
> : My first problem is that I need suggestions even when the
> : query has returned results. It seems that suggestions only appear
> : when there are no results. Is there a way to do this?
>
> can you give us an example of what your queries look like?  with the
> example configs, i can get matches, as well as suggestions...
>
>
> http://localhost:8983/solr/spell?q=ide&spellcheck=true
>
> : The second question is: for the purposes I've mentioned, is it
> : best to use the spellchecker or the MLT component? Or something else
> : (such as a fuzzy query)?
>
> there's no clear cut answer to that -- i don't remember anyone else ever
> asking about anything particularly similar to what you're doing, so i
> don't know that there is any precedent for a "best" way to go about it.
>
> -Hoss
>

-- 
Lance Norskog
goks...@gmail.com