Hi Nitin,

I tried many different options across a couple of different queries. In fact, 
I now have collations working with the Suggester and WFSTLookup. The 
problem may have been due to a different dictionary and/or lookup 
implementation, together with the specific options I was sending.

In general, we're using spellcheck for search suggestions. The Suggester 
component (as opposed to the Suggester spellcheck implementation) doesn't 
handle all of our cases, but we can get things working using the spellcheck 
interface. What gives us particular trouble are the cases where a term may be 
valid by itself but also be the start of longer words.

The actual terms are acronyms specific to our business, but I'll attempt to 
show generic examples.

E.g. a partial term like "fo" can expand to fox, fog, etc. and a full term like 
brown can also expand to something like brownstone.   And, yes, the collation 
"brownstone fox" is nonsense.  But assume, for the sake of argument, it appears 
in our documents somewhere.

For a multiple-term query with a spelling error (or a partially typed term): 
brown fo

We get collations in descending order of hits, like ...
"brown fox",
"brown fog",
"brownstone fox".

So far, so good.  

For a single-term query, brown, we get a single suggestion, brownstone, and no 
collations.

So we can't tell that the term brown should be kept!

At this point, we need to set spellcheck.extendedResults=true and look at the 
origFreq value in the suggested corrections. Unfortunately, the Suggester 
(spellcheck dictionary) does not populate the original frequency information. 
And, without this information, the SpellCheckComponent cannot format the 
extended results.

However, with a simple change to Suggester.java, it was easy to get the needed 
frequency information and use it to make a sound decision to keep or drop the 
input term. But I'd be much obliged to hear of a better way to go about it.
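
For illustration, here is a minimal client-side sketch of that keep-or-drop 
decision in Python. It assumes the response is requested with wt=json and 
spellcheck.extendedResults=true, that the "suggestions" section arrives as a 
flat list alternating term and details, and that origFreq is populated (which, 
as noted above, requires the patched Suggester). The function name and the 
sample data are made up for the example.

```python
def keep_original_terms(suggestions):
    """Map each input term to True if it occurs in the index by itself.

    `suggestions` is assumed to be the flat list found under
    spellcheck -> suggestions in the JSON response: term, details, term, ...
    """
    decisions = {}
    # Pair up the alternating keys and values of the flat list.
    for key, value in zip(suggestions[0::2], suggestions[1::2]):
        # Only per-term entries carry a "suggestion" list; this skips
        # entries such as "correctlySpelled" or "collation".
        if isinstance(value, dict) and "suggestion" in value:
            # origFreq > 0 means the typed term is itself a valid term,
            # so keep it alongside the longer expansions.
            decisions[key] = value.get("origFreq", 0) > 0
    return decisions


# Hypothetical response fragment for the single-term query "brown".
sample_suggestions = [
    "brown", {
        "numFound": 1, "startOffset": 0, "endOffset": 5,
        "origFreq": 12,
        "suggestion": [{"word": "brownstone", "freq": 3}],
    },
    "correctlySpelled", False,
]

print(keep_original_terms(sample_suggestions))
```

With a decision like this, a client could offer "brown" itself first and 
"brownstone" after it, rather than replacing the user's input.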

Configs below.

Thanks,
Charlie

<!-- SpellCheck component -->
  <searchComponent class="solr.SpellCheckComponent" name="suggestSC">
    <lst name="spellchecker">
      <str name="name">suggestDictionary</str>
      <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
      <str name="lookupImpl">org.apache.solr.spelling.suggest.fst.WFSTLookupFactory</str>
      <str name="field">text_all</str>
      <float name="threshold">0.00000001</float>
      <str name="exactMatchFirst">true</str>
      <str name="buildOnCommit">true</str>
    </lst>
  </searchComponent>

<!-- Request Handler -->
<requestHandler name="/tcSuggest" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="title">Search Suggestions (spellcheck)</str>
    <str name="echoParams">explicit</str>
    <str name="wt">json</str>
    <str name="rows">0</str>
    <str name="defType">edismax</str>
    <str name="df">text_all</str>
    <str name="fl">id,name,ticker,entityType,transactionType,accountType</str>
    <str name="spellcheck">true</str>
    <str name="spellcheck.count">5</str>
    <str name="spellcheck.dictionary">suggestDictionary</str>
    <str name="spellcheck.alternativeTermCount">5</str>
    <str name="spellcheck.collate">true</str>
    <str name="spellcheck.extendedResults">true</str>
    <str name="spellcheck.maxCollationTries">10</str>
    <str name="spellcheck.maxCollations">5</str>
  </lst>
  <arr name="last-components">
    <str>suggestSC</str>
  </arr>
</requestHandler>
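
As a rough sketch of how a client might call this handler, the Python snippet 
below builds the request URL for a partially typed query. Only the /tcSuggest 
path comes from the config above; the host, port, and core name (collection1) 
are placeholders, and the helper name is invented for the example. All 
spellcheck parameters are left to the handler defaults.

```python
from urllib.parse import urlencode


def suggest_url(user_input, base="http://localhost:8983/solr/collection1"):
    """Build a /tcSuggest request URL for the text typed so far.

    `base` is a placeholder for the Solr core URL, not a real deployment.
    """
    # Only q is sent; rows, wt, and the spellcheck.* options come from
    # the defaults configured on the request handler.
    params = {"q": user_input}
    return f"{base}/tcSuggest?{urlencode(params)}"


print(suggest_url("brown fo"))
```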

-----Original Message-----
From: Nitin Solanki [mailto:nitinml...@gmail.com] 
Sent: Tuesday, February 17, 2015 3:17 AM
To: solr-user@lucene.apache.org
Subject: Re: Collations are not working fine.

Hi Charles,
Will you please send the configuration you tried? It will help me solve my 
problem. Have you sorted the collations by hits or by frequencies of 
suggestions? If you did, then please assist me.

On Mon, Feb 16, 2015 at 7:59 PM, Reitzel, Charles < 
charles.reit...@tiaa-cref.org> wrote:

> I have been working with collations the last couple days and I kept adding
> the collation-related parameters until it started working for me.   It
> seems I needed <str name="spellcheck.collateMaxCollectDocs">50</str>.
>
> But, I am using the Suggester with the WFSTLookupFactory.
>
> Also, I needed to patch the suggester to get frequency information in 
> the spellcheck response.
>
> -----Original Message-----
> From: Rajesh Hazari [mailto:rajeshhaz...@gmail.com]
> Sent: Friday, February 13, 2015 3:48 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Collations are not working fine.
>
> Hi Nitin,
>
> Can you try the config below? It seems to be working for us.
>
> <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
>
>      <str name="queryAnalyzerFieldType">text_general</str>
>
>
>   <lst name="spellchecker">
> <str name="name">wordbreak</str>
> <str name="classname">solr.WordBreakSolrSpellChecker</str>
> <str name="field">textSpell</str>
> <str name="combineWords">true</str>
> <str name="breakWords">false</str>
> <int name="maxChanges">5</int>
>   </lst>
>
>    <lst name="spellchecker">
> <str name="name">default</str>
> <str name="field">textSpell</str>
> <str name="classname">solr.IndexBasedSpellChecker</str>
> <str name="spellcheckIndexDir">./spellchecker</str>
> <str name="accuracy">0.75</str>
> <float name="thresholdTokenFrequency">0.01</float>
> <str name="buildOnCommit">true</str>
> <str name="spellcheck.maxResultsForSuggest">5</str>
>      </lst>
>
>
>   </searchComponent>
>
>
>
> <str name="spellcheck">true</str>
> <str name="spellcheck.dictionary">default</str>
> <str name="spellcheck.dictionary">wordbreak</str>
> <int name="spellcheck.count">5</int>
> <str name="spellcheck.alternativeTermCount">15</str>
> <str name="spellcheck.collate">true</str>
> <str name="spellcheck.onlyMorePopular">false</str>
> <str name="spellcheck.extendedResults">true</str>
> <str name ="spellcheck.maxCollations">100</str>
> <str name="spellcheck.collateParam.mm">100%</str>
> <str name="spellcheck.collateParam.q.op">AND</str>
> <str name="spellcheck.maxCollationTries">1000</str>
>
>
> *Rajesh.*
>
> On Fri, Feb 13, 2015 at 1:01 PM, Dyer, James 
> <james.d...@ingramcontent.com
> >
> wrote:
>
> > Nitin,
> >
> > Can you post the full spellcheck response when you query:
> >
> > q=gram_ci:"gone wthh thes wint"&wt=json&indent=true&shards.qt=/spell
> >
> > James Dyer
> > Ingram Content Group
> >
> >
> > -----Original Message-----
> > From: Nitin Solanki [mailto:nitinml...@gmail.com]
> > Sent: Friday, February 13, 2015 1:05 AM
> > To: solr-user@lucene.apache.org
> > Subject: Re: Collations are not working fine.
> >
> > Hi James Dyer,
> >                           I did the same as you told me. Used 
> > WordBreakSolrSpellChecker instead of shingles. But still collations 
> > are not coming or working.
> > For instance, I tried to get collation of "gone with the wind" by 
> > searching "gone wthh thes wint" on field=gram_ci but didn't succeed.
> > Even, I am getting the suggestions of wtth as *with*, thes as *the*,
> > wint as *wind*.
> > Also I have documents which contains "gone with the wind" having 167 
> > times in the documents. I don't know that I am missing something or not.
> > Please check my below solr configuration:
> >
> > *URL: *localhost:8983/solr/wikingram/spell?q=gram_ci:"gone wthh thes 
> > wint"&wt=json&indent=true&shards.qt=/spell
> >
> > *solrconfig.xml:*
> >
> > <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
> >     <str name="queryAnalyzerFieldType">textSpellCi</str>
> >     <lst name="spellchecker">
> >       <str name="name">default</str>
> >       <str name="field">gram_ci</str>
> >       <str name="classname">solr.DirectSolrSpellChecker</str>
> >       <str name="distanceMeasure">internal</str>
> >       <float name="accuracy">0.5</float>
> >       <int name="maxEdits">2</int>
> >       <int name="minPrefix">0</int>
> >       <int name="maxInspections">5</int>
> >       <int name="minQueryLength">2</int>
> >       <float name="maxQueryFrequency">0.9</float>
> >       <str name="comparatorClass">freq</str>
> >     </lst>
> > <lst name="spellchecker">
> >       <str name="name">wordbreak</str>
> >       <str name="classname">solr.WordBreakSolrSpellChecker</str>
> >       <str name="field">gram</str>
> >       <str name="combineWords">true</str>
> >       <str name="breakWords">true</str>
> >       <int name="maxChanges">5</int>
> >     </lst>
> > </searchComponent>
> >
> > <requestHandler name="/spell" class="solr.SearchHandler" startup="lazy">
> >     <lst name="defaults">
> >       <str name="df">gram_ci</str>
> >       <str name="spellcheck.dictionary">default</str>
> >       <str name="spellcheck">on</str>
> >       <str name="spellcheck.extendedResults">true</str>
> >       <str name="spellcheck.count">25</str>
> >       <str name="spellcheck.onlyMorePopular">true</str>
> >       <str name="spellcheck.maxResultsForSuggest">100000000</str>
> >       <str name="spellcheck.alternativeTermCount">25</str>
> >       <str name="spellcheck.collate">true</str>
> >       <str name="spellcheck.maxCollations">50</str>
> >       <str name="spellcheck.maxCollationTries">50</str>
> >       <str name="spellcheck.collateExtendedResults">true</str>
> >     </lst>
> >     <arr name="last-components">
> >       <str>spellcheck</str>
> >     </arr>
> >   </requestHandler>
> >
> > *Schema.xml: *
> >
> > <field name="gram_ci" type="textSpellCi" indexed="true" stored="true"
> > multiValued="false"/>
> >
> > </fieldType><fieldType name="textSpellCi" class="solr.TextField"
> > positionIncrementGap="100">
> >        <analyzer type="index">
> >         <tokenizer class="solr.StandardTokenizerFactory"/>
> >         <filter class="solr.LowerCaseFilterFactory"/>
> > </analyzer>
> >     <analyzer type="query">
> >         <tokenizer class="solr.StandardTokenizerFactory"/>
> >         <filter class="solr.LowerCaseFilterFactory"/>
> > </analyzer>
> > </fieldType>
> >
>
> **********************************************************************
> *** This e-mail may contain confidential or privileged information.
> If you are not the intended recipient, please notify the sender 
> immediately and then delete it.
>
> TIAA-CREF
> **********************************************************************
> ***
>

