Matt,

Looking at the response, my guess is that you have "point" in your index, and 
that it has a higher frequency than "rockpoint".  By default the spellchecker will 
never try to correct something that exists in your index.  Adding 
"spellcheck.onlyMorePopular=true" might help, but only if the correction has a 
higher frequency than the original.  Try using 
"spellcheck.alternativeTermCount=n" instead of 
"spellcheck.onlyMorePopular=true".  See 
http://wiki.apache.org/solr/SpellCheckComponent#spellcheck.alternativeTermCount 
for more information.
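
As a rough sketch of what I mean (not tested against your setup), the request 
below is your original one with "spellcheck.onlyMorePopular" dropped and 
"spellcheck.alternativeTermCount" added.  The host, port, core and the other 
parameters are copied from your URL; the helper name and the value 5 are just 
placeholders I made up for illustration, so tune "n" to whatever works for you.

```python
from urllib.parse import urlencode

# Base URL copied from the request quoted further down in this thread.
BASE_URL = "http://localhost:8982/solr/development/select"


def build_spellcheck_url(query, alternative_term_count=5):
    """Build a select URL that uses spellcheck.alternativeTermCount.

    alternative_term_count=5 is an assumed placeholder, not a recommended value.
    """
    params = {
        "q": query,
        "fq": "type:Company",
        "wt": "ruby",
        "indent": "true",
        "defType": "edismax",
        "qf": "name_text",
        "stopwords": "true",
        "lowercaseOperators": "true",
        "spellcheck": "true",
        "spellcheck.count": 20,
        # Replaces spellcheck.onlyMorePopular=true from the original request.
        "spellcheck.alternativeTermCount": alternative_term_count,
        "spellcheck.extendedResults": "true",
        "spellcheck.collate": "true",
        "spellcheck.maxCollations": 1,
        "spellcheck.maxCollationTries": 10,
        "spellcheck.accuracy": 0.5,
    }
    # urlencode percent-encodes values and turns spaces into '+'.
    return BASE_URL + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_spellcheck_url("Rock point"))
```

If the suggestions improve with this, the same parameter can be moved into the 
request handler defaults so it does not have to be passed on every query.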

James Dyer
Ingram Content Group
(615) 213-4311

-----Original Message-----
From: Matt Mongeau [mailto:halogenandto...@gmail.com] 
Sent: Monday, December 15, 2014 10:23 AM
To: solr-user@lucene.apache.org
Subject: Re: WordBreakSolrSpellChecker Usage

I think you were right about maxChanges; that does seem to get rid of the
ridiculous values. However, I still don't seem to be getting anything reasonable.
Most variations look something like:

http://localhost:8982/solr/development/select?q=Rock+point&fq=type%3ACompany&wt=ruby&indent=true&defType=edismax&qf=name_text&stopwords=true&lowercaseOperators=true&spellcheck=true&spellcheck.count=20&spellcheck.onlyMorePopular=true&spellcheck.extendedResults=true&spellcheck.collate=true&spellcheck.maxCollations=1&spellcheck.maxCollationTries=10&spellcheck.accuracy=0.5

{
  'responseHeader'=>{
    'status'=>0,
    'QTime'=>20},
  'response'=>{'numFound'=>0,'start'=>0,'docs'=>[]
  },
  'spellcheck'=>{
    'suggestions'=>[
      'rock',{
        'numFound'=>5,
        'startOffset'=>0,
        'endOffset'=>4,
        'origFreq'=>3,
        'suggestion'=>[{
            'word'=>'rocky',
            'freq'=>3},
          {
            'word'=>'brook',
            'freq'=>6},
          {
            'word'=>'york',
            'freq'=>460},
          {
            'word'=>'oak',
            'freq'=>7},
          {
            'word'=>'boca',
            'freq'=>3}]},
      'correctlySpelled',false]}}


I'm going to post both my solrconfig.xml and schema.xml because maybe
I'm just doing something crazy. They can both be found here:
https://gist.github.com/halogenandtoast/76fd5dcfae1c4edeba30


On Thu, Dec 11, 2014 at 1:19 PM, Dyer, James <james.d...@ingramcontent.com>
wrote:
>
> Matt,
>
> There is no exact number here, but I would think most people would want
> "count" to be maybe 10-20.  Increasing this incurs a very small performance
> penalty for each term it generates suggestions for, but you probably won't
> notice a difference.  For "maxCollationTries", 5 is a reasonable number but
> you might see improved collations if this is also perhaps 10.  With this
> one, you get a much larger performance penalty, but only when it needs to
> try more combinations to return the "maxCollations".  In your case you have
> this at 5 also, right?  I would reduce this to the maximum number of
> re-written queries your application or users are actually going to use.  In
> a lot of cases, 1 is the right number here.  This would improve performance
> for you in some cases.
>
> Possibly the reason “Rock point” > “Rockpoint” is failing is because you
> have "maxChanges" set to 10.  This tells it you are willing for it to break
> a word into 10 separate parts, or to combine up to 10 adjacent words into
> 1.  Having taken a quick glance at the code, I think what is happening is
> it is trying things like "r ock p oint" and "r o ck p o int", etc and never
> getting to your intended result.  In a typical scenario I would set
> "maxChanges" to 1-3, and often 1 is probably the most appropriate value
> here.
>
> James Dyer
> Ingram Content Group
> (615) 213-4311
>
>
> -----Original Message-----
> From: Matt Mongeau [mailto:halogenandto...@gmail.com]
> Sent: Thursday, December 11, 2014 11:34 AM
> To: solr-user@lucene.apache.org
> Subject: Re: WordBreakSolrSpellChecker Usage
>
> Is there a suggested value for these? I bumped them up to 20 and still
> nothing seems to have changed.
>
> On Thu, Dec 11, 2014 at 9:42 AM, Dyer, James <james.d...@ingramcontent.com>
> wrote:
>
> > My first guess here, seeing that it works some of the time but not others,
> > is that these values are too low:
> >
> > <str name="spellcheck.maxCollationTries">5</str>
> > <str name="spellcheck.count">5</str>
> >
> > You know spellcheck.count is too low if the suggestion you want is not in
> > the "suggestions" part of the response, but increasing it makes it get
> > included.
> >
> > You know that spellcheck.maxCollationTries is too low if it exists in
> > "suggestions" but it is not getting suggested in the "collation" section.
> >
> > James Dyer
> > Ingram Content Group
> > (615) 213-4311
> >
> >
> > -----Original Message-----
> > From: Matt Mongeau [mailto:halogenandto...@gmail.com]
> > Sent: Wednesday, December 10, 2014 12:43 PM
> > To: solr-user@lucene.apache.org
> > Subject: Fwd: WordBreakSolrSpellChecker Usage
> >
> > If I have my search component set up like this
> > https://gist.github.com/halogenandtoast/cf9f296d01527080f18c and I have an
> > entry for “Rockpoint”, shouldn’t “Rock point” generate suggestions?
> >
> > This doesn't seem to be the case, but it works for "Blackstone" with "Black
> > stone". Any ideas on what I might be doing wrong?
> >
>
