Re: Using multiple DirectSolrSpellcheckers for a query

Nalini Kartha Tue, 06 Mar 2012 16:31:24 -0800

Hi James,

Thanks for the detailed reply and sorry for the delay getting back.

One issue for us with using the collate functionality is that some of our
query types are default OR (implemented using the mm param value). Since
the collate functionality reruns the query using all param values specified
in the original query, it'll effectively be issuing an OR query again,
right? Which means that we could again end up with corrections which aren't
the best for the current query?
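
To make the concern concrete, here is a minimal sketch of the request as I
understand it. It only models the behaviour described above (the collation
check reusing every original param and swapping in the corrected q); it is
not Solr's actual code. The field names, the mm value and the helper
collation_check_params are made up for illustration.

```python
from urllib.parse import urlencode

# Hypothetical request parameters; qf field names and the mm value are assumptions.
params = {
    "q": "run jump",
    "defType": "dismax",
    "qf": "docUnstemmed docStemmed",
    "mm": "1",  # minimum-should-match of 1 makes the query behave like OR
    "spellcheck": "true",
    "spellcheck.collate": "true",
    "spellcheck.maxCollationTries": "5",
}

def collation_check_params(original, collation):
    """Model of the described behaviour: same params, only q is replaced."""
    retry = dict(original)
    retry["q"] = collation
    return retry

# A candidate collation is tested with the same OR semantics (mm=1),
# so it gets hits as long as any one of its terms matches.
retry = collation_check_params(params, "sun jump")
query_string = urlencode(retry)
print(query_string)
```

With mm=1 carried over, "sun jump" would pass the hit check on "jump" alone,
which is why the check doesn't help rule out a poor correction for "run".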

Another issue we're running into is that we're using unstemmed fields as
the source for our spell correction field and so we could end up
unnecessarily correcting queries containing stemmed versions of words.

For example, if I have a document containing "running", my fields look like
this:

docUnstemmed: running
docStemmed: run, ...
spellcheck: running

If a user searches for "run OR jump", there are matching results (since we
search against both the stemmed and unstemmed fields) but the spellcheck
results will contain corrections for "run", let's say "sun". We don't want
to overcorrect queries which are returning valid results like this one. Any
suggestions for how to deal with this?

I was thinking that there might be value in having another dictionary which
is used for vetting words but not for finding corrections - the stemmed
fields could be used as a source for this dictionary. So before finding
corrections for a term that doesn't exist in the primary dictionary, check
the secondary dictionary and make sure the term doesn't exist there
either. But then, this would require an extra copyField (we could have
multiple unstemmed fields as a source for this secondary dictionary) and
bloat the index even more, so I'm not sure if it's feasible.
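
Roughly, the vetting step I have in mind would look like the sketch below.
This is just the logic, using in-memory sets in place of real dictionaries;
the function name and the sample terms are made up, and nothing like this
exists in Solr as far as I know.

```python
def terms_needing_correction(query_terms, primary, secondary):
    """Return only the terms that are missing from BOTH dictionaries.

    primary   - terms from the unstemmed spellcheck field (source of corrections)
    secondary - terms from the stemmed fields (used only for vetting)
    """
    return [t for t in query_terms if t not in primary and t not in secondary]

# Hypothetical dictionary contents for the "running" document above.
primary = {"running", "jump", "sun"}
secondary = {"run", "jump"}

flagged = terms_needing_correction(["run", "jump", "runing"], primary, secondary)
print(flagged)  # ['runing']
```

Here "run" is left alone because the secondary dictionary knows it, while
"runing" is in neither dictionary and would still be sent for correction.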

Thanks,
Nalini

On Thu, Jan 26, 2012 at 10:23 AM, Dyer, James <james.d...@ingrambook.com> wrote:

> Nalini,
>
> Right now the best you can do is to use <copyField> to combine everything
> into a catch-all for spellchecking purposes.  While this seems wasteful,
> this often has to be done anyhow because typically you'll need
> less/different analysis for spellchecking than for searching.  But rather
> than having separate <copyField>s to create multiple dictionaries, put
> everything into one field to create a single "master" dictionary.
>
> From there, you need to set "spellcheck.collate" to true and also
> "spellcheck.maxCollationTries" greater than zero (5-10 usually works).  The
> first parameter tells it to generate re-written queries with spelling
> suggestions (collations).  The second parameter tells it to weed out any
> collations that won't generate hits if you re-query them.  This is
> important because having unrelated keywords in your master dictionary will
> increase the chances the spellchecker will pick the wrong words as
> corrections.
>
> There is a significant caveat to this:  The spellchecker typically only
> suggests for words in the dictionary.  So by creating a huge, master
> dictionary you might find that many misspelled words won't generate
> suggestions.  See this thread for some workarounds:
> http://lucene.472066.n3.nabble.com/Improving-Solr-Spell-Checker-Results-td3658411.html
>
> I think having multiple, per-field dictionaries as you suggest might be a
> good way to go.  While this is not supported, I don't think it's because of
> performance concerns.  (There would be an overhead cost to this but I think
> it would still be practical).  It just hasn't been implemented yet.  But we
> might be getting to a possible start to this type of functionality.  In
> https://issues.apache.org/jira/browse/SOLR-2585 a separate spellchecker
> is added that just corrects wordbreak (or is it "word break"?) problems,
> then a "ConjunctionSolrSpellChecker" combines the results from the main
> spellchecker and the wordbreak spellchecker.  I could see a next step beyond
> this being to support per-field dictionaries, checking them separately,
> then combining the results.
>
> James Dyer
> E-Commerce Systems
> Ingram Content Group
> (615) 213-4311
>
> -----Original Message-----
> From: Nalini Kartha [mailto:nalinikar...@gmail.com]
> Sent: Wednesday, January 25, 2012 11:56 AM
> To: solr-user@lucene.apache.org
> Subject: Using multiple DirectSolrSpellcheckers for a query
>
> Hi,
>
> We are trying to use the DirectSolrSpellChecker to get corrections for
> mis-spelled query terms directly from fields in the Solr index.
>
> However, we need to use multiple fields for spellchecking a query. It
> looks like you can only use one spellchecker for a request, so is the
> workaround for this to create a copy field from the fields required for
> spell correction?
>
> We'd like to avoid this because we allow users to perform different kinds
> of queries on different sets of fields and so to provide meaningful
> corrections we'd have to create multiple copy fields - one for each query
> type.
>
> Is there any reason why Solr doesn't support using multiple spellcheckers
> for a query? Is it because of performance overhead?
>
> Thanks,
> Nalini
>
