Thanks for the advice. Unfortunately, what we really need is for the corrections to satisfy the fq params.
Was wondering how bad the perf would be if we used the same DocSet (or should it be an OpenBitSet? Sorry, I'm still trying to figure all that code out) for each 'correction query'? It seems similar to how facet counts are calculated.

Thanks,
Nalini

On Thu, Dec 20, 2012 at 12:12 PM, Dyer, James <james.d...@ingramcontent.com> wrote:

> The spellchecker doesn't support checking the individual words against the
> index with "fq" applied. This is only done for collations (and only if
> "maxCollationTries" is greater than 0). Checking every suggested word
> individually against the index after applying filter queries is probably
> going to be very expensive no matter how you implement it. However,
> someone with more lucene-core knowledge than I have might be able to give
> you better advice.
>
> If your root problem, though, is getting good "did-you-mean"-style
> suggestions with dismax queries and mm=0, and if you want to consider the
> case where some words in the query are misspelled and others are entirely
> irrelevant (and can't be corrected), then setting "maxResultsForSuggest" to
> a high value might give you the end result you want. Unlike if you use
> "spellcheck.collateParam.mm=100%", it won't insist that the irrelevant
> terms (or a "corrected" irrelevant term) match anything. On the other
> hand, it won't assume the query is "Correctly Spelled" just because you
> got some hits from it (because mm=0 will just cause the misspelled terms
> to be thrown out).
>
> James Dyer
> E-Commerce Systems
> Ingram Content Group
> (615) 213-4311
>
>
> -----Original Message-----
> From: Nalini Kartha [mailto:nalinikar...@gmail.com]
> Sent: Thursday, December 20, 2012 8:53 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Ensuring SpellChecker returns corrections which satisfy fq
> params for default OR query
>
> Hi James,
>
> I don't get how the spellcheck.maxResultsForSuggest param helps with making
> sure that the suggestions returned satisfy the fq params?
>
> That's the main problem we're trying to solve; how often suggestions are
> being returned is not really an issue for us at the moment.
>
> Thanks,
> Nalini
>
>
> On Wed, Dec 19, 2012 at 4:35 PM, Dyer, James
> <james.d...@ingramcontent.com> wrote:
>
> > Instead of using spellcheck.collateParam.mm, try just setting
> > spellcheck.maxResultsForSuggest to a very high value (you can use up to
> > Integer.MAX_VALUE here). So long as the user gets fewer results than
> > whatever this is set to, you will get suggestions (and collations if
> > desired). I was just playing with this and, if I am understanding you
> > correctly, I think this combination of parameters will give you what you
> > want:
> >
> > spellcheck=true
> >
> > spellcheck.dictionary=whatever
> >
> > spellcheck.maxResultsForSuggest=10000000 (or whatever the cut off is
> > before you don't want suggestions)
> >
> > spellcheck.count=20 (+/- depending on performance vs # suggestions
> > required)
> >
> > spellcheck.maxCollationTries=10 (+/- depending on performance vs #
> > suggestions required)
> >
> > spellcheck.maxCollations=10 (+/- depending on performance vs #
> > suggestions required)
> >
> > spellcheck.collate=true
> >
> > spellcheck.collateExtendedResults=true
> >
> > James Dyer
> > E-Commerce Systems
> > Ingram Content Group
> > (615) 213-4311
> >
> >
> > -----Original Message-----
> > From: Nalini Kartha [mailto:nalinikar...@gmail.com]
> > Sent: Wednesday, December 19, 2012 2:06 PM
> > To: solr-user@lucene.apache.org
> > Subject: Re: Ensuring SpellChecker returns corrections which satisfy fq
> > params for default OR query
> >
> > Hi James,
> >
> > Yup the example you gave about sums it up.
> > Reason we use an OR query is
> > that we want the flexibility of every term not having to match but when it
> > comes to corrections we want to be sure that the ones we pick will actually
> > return results (we message the user with the corrected query so it would be
> > bad/confusing if there were no matches for the corrections).
> >
> > *- by default the spellchecker doesn't see this as a problem because it has
> > hits (mm=0 and "wrapping" matches something). So you get neither
> > individual words back nor collations from the spellchecker.*
> >
> > I think we would still get back 'papr -> paper' as a correction and
> > 'christmas wrapping paper' as a collation in this case - I've seen that
> > corrections are returned even for OR queries. Problem is these would be
> > returned even if 'paper' doesn't exist in any docs that have item:in_stock.
> >
> > *- with "spellcheck.collateParam.mm=100"
> > it tries to fix both "papr" and "christmas" but can't fix "christmas"
> > because spelling isn't the problem here (it is an irrelevant term not in
> > the index). So while you get words suggested there are no collations. The
> > individual words would be helpful, but you're not sure because they might
> > all apply to items that do not match "fq=item:in_stock".*
> >
> > Yup, exactly.
> >
> > Do you think the workaround I suggested would work (and not have terrible
> > perf)? Or any other ideas?
> >
> > Thanks,
> > Nalini
> >
> >
> > On Wed, Dec 19, 2012 at 1:09 PM, Dyer, James
> > <james.d...@ingramcontent.com> wrote:
> >
> > > Let me try and get a better idea of what you're after. Is it that your
> > > users might query a combination of irrelevant terms and misspelled terms,
> > > so you want the ability to ignore the irrelevant terms but still get
> > > suggestions for the misspelled terms?
> > >
> > > For instance if someone wanted
> > > "q=christmas wrapping papr&mm=0&fq=item:in_stock",
> > > but "christmas" was not in the index and you
> > > wanted to return results for just "wrapping paper", the problem is...
> > >
> > > - by default the spellchecker doesn't see this as a problem because it has
> > > hits (mm=0 and "wrapping" matches something). So you get neither
> > > individual words back nor collations from the spellchecker.
> > >
> > > - with "spellcheck.collateParam.mm=100" it tries to fix both "papr" and
> > > "christmas" but can't fix "christmas" because spelling isn't the problem
> > > here (it is an irrelevant term not in the index). So while you get words
> > > suggested there are no collations. The individual words would be helpful,
> > > but you're not sure because they might all apply to items that do not match
> > > "fq=item:in_stock".
> > >
> > > Is this the problem?
> > >
> > > James Dyer
> > > E-Commerce Systems
> > > Ingram Content Group
> > > (615) 213-4311
> > >
> > >
> > > -----Original Message-----
> > > From: Nalini Kartha [mailto:nalinikar...@gmail.com]
> > > Sent: Wednesday, December 19, 2012 11:20 AM
> > > To: solr-user@lucene.apache.org
> > > Subject: Ensuring SpellChecker returns corrections which satisfy fq params
> > > for default OR query
> > >
> > > Hi,
> > >
> > > With the DirectSolrSpellChecker, we want to be able to make sure that the
> > > corrections that are being returned satisfy the fq params of the original
> > > query.
> > >
> > > The collate functionality helps with this but seems to only work with
> > > default AND queries - our use case is for default OR queries. I also saw
> > > that there is now a spellcheck.collateParam.XX param which allows you to
> > > override params from the original query - the example mentioned was to
> > > override the mm param to be 100% which would make the collated query
> > > default AND.
> > > This doesn't quite do what we want either though because it
> > > seems like all collations would be thrown out if one of the correctly
> > > spelled terms in the query did not satisfy the fq params. We don't want it
> > > to check that the correctly spelled terms MUST be in results, just that
> > > each correction (individually) would result in some hits taking into
> > > account the fqs.
> > >
> > > I was wondering whether it is possible (and what the perf overhead would
> > > be) to use the SolrIndexSearcher.getDocSet(Query, DocSet) method to check
> > > that each correction being considered (the Query) matches some docs taking
> > > into account the fqs (the DocSet)?
> > >
> > > Would appreciate other suggestions/ideas if this isn't feasible.
> > >
> > > Thanks!
> > >
> > > - Nalini
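
A toy sketch of the per-correction check proposed in the original question, written in Python with plain sets standing in for Solr DocSets. This is not Solr code: the function name, the posting data and the doc ids are all made up for illustration. It only shows the shape of the idea, which is to resolve the fq params into one doc set up front and then keep a candidate correction only if its own docs intersect that set (the role SolrIndexSearcher.getDocSet(Query, DocSet) would play in the proposal).

```python
from typing import Dict, Iterable, List, Set


def corrections_matching_filter(
    candidates: Iterable[str],
    postings: Dict[str, Set[int]],
    filter_docs: Set[int],
) -> List[str]:
    """Keep only the candidates that hit at least one doc inside filter_docs.

    postings is a hypothetical stand-in for running a term query: it maps a
    term to the ids of the docs containing it. filter_docs stands in for the
    doc set produced by applying all fq params once.
    """
    kept = []
    for term in candidates:
        # Intersect the term's docs with the precomputed filter set;
        # an empty intersection means the correction would return no hits.
        if postings.get(term, set()) & filter_docs:
            kept.append(term)
    return kept


# Made-up index: doc ids containing each term.
postings = {
    "paper": {1, 2, 7},
    "pager": {9},
    "wrapping": {2, 3},
}
# Made-up result of fq=item:in_stock, computed once and reused per candidate.
in_stock = {2, 3, 4}

suggestions = corrections_matching_filter(["paper", "pager"], postings, in_stock)
print(suggestions)  # prints ['paper']
```

Here "pager" is dropped because none of its docs are in stock, while "paper" survives through doc 2. Reusing one filter set means each candidate costs a single intersection, but the total still grows with the number of candidates checked, which is the expense James warns about.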