Perfect, thanks so much. You just cleared up the other little bit, i.e. when the SpellingQueryConverter is or isn't used and why you might implement your own.
Thanks again.

On Tue, Jul 23, 2013 at 6:48 PM, Dyer, James <james.d...@ingramcontent.com> wrote:

> You've got it. The only other thing is that "spellcheck.q" does not analyze anything. The whole purpose of this is to allow you to just send raw keywords to be spellchecked. This is handy if you have a complex "q" parameter (say, you're using local params, etc.) and the SpellingQueryConverter cannot handle it. You could write your own query converter, but it's often just easier to strip out the keywords and send them over with "spellcheck.q".
>
> James Dyer
> Ingram Content Group
> (615) 213-4311
>
>
> -----Original Message-----
> From: Brendan Grainger [mailto:brendan.grain...@gmail.com]
> Sent: Tuesday, July 23, 2013 4:41 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Spellcheck field element and collation issues
>
> Thanks James. That's it! Now:
>
> http://localhost:8981/solr/articles/select?indent=true&q=Perfrm%20HVC&rows=0&maxCollationTries=0
>
> returns:
>
> <lst name="collation">
>   <str name="collationQuery">perform hvac</str>
>   <int name="hits">4</int>
>   <lst name="misspellingsAndCorrections">
>     <str name="perfrm">perform</str>
>     <str name="hvc">hvac</str>
>   </lst>
> </lst>
> <lst name="collation">
>   <str name="collationQuery">performed hvac</str>
>   <int name="hits">4</int>
>   <lst name="misspellingsAndCorrections">
>     <str name="perfrm">performed</str>
>     <str name="hvc">hvac</str>
>   </lst>
> </lst>
>
> If you have time, I'm still slightly unclear on the field element in the spellcheck configuration. Maybe I should explain how I think it works:
>
> 1. You create a relatively unanalyzed field type (e.g. no stemming).
> 2. You copy the text you want used to build the spellcheck index into that field.
> 3. You build the spellcheck sidecar index (or it's a no-op if you use DirectSpellChecker, in which case I assume it still uses the dedicated spellcheck field the text was copied into).
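The three steps above can be modeled with a toy Python sketch. This is not Solr's implementation. `analyze` is a hypothetical stand-in for a lightly analyzed field type such as `text_spell` (lowercasing, no stemming). The copied `title`/`markup` values play the role of the `spellcheck` field named by `field`. Plain Levenshtein distance stands in for whatever distance measure the real spellchecker uses.

```python
import re
from collections import Counter

def analyze(text):
    """Hypothetical stand-in for a lightly analyzed field type:
    lowercase, split on non-alphanumerics, no stemming."""
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]

def levenshtein(a, b):
    """Dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete from a
                           cur[j - 1] + 1,             # insert into a
                           prev[j - 1] + (ca != cb)))  # substitute
        prev = cur
    return prev[-1]

def build_dictionary(docs, source_fields=("title", "markup")):
    """Steps 1-3: 'copy' the source fields into one spellcheck field and
    count the analyzed terms; the Counter plays the sidecar index."""
    terms = Counter()
    for doc in docs:
        for name in source_fields:
            terms.update(analyze(doc.get(name, "")))
    return terms

def suggest(query, dictionary, max_edits=2, count=5):
    """Query time: tokenize with the query-side analyzer, then propose
    nearby dictionary terms for tokens the dictionary does not contain."""
    result = {}
    for token in analyze(query):
        if token in dictionary:
            continue  # treated as correctly spelled
        scored = []
        for term, freq in dictionary.items():
            dist = levenshtein(token, term)
            if dist <= max_edits:
                scored.append((dist, -freq, term))  # closest, then most frequent
        result[token] = [term for _, _, term in sorted(scored)[:count]]
    return result

DOCS = [
    {"title": "Perform HVAC maintenance", "markup": "Technicians perform HVAC checks"},
    {"title": "Performance notes", "markup": "We have performed repairs"},
]
print(suggest("Perfrm HVC", build_dictionary(DOCS)))
# {'perfrm': ['perform'], 'hvc': ['hvac', 'have']}
```

In this sketch, the same `analyze` function serves both sides. In the configuration quoted further down, the index and query analyzers of `text_spell` differ (synonyms are added at query time). That split is what `field` (the source of dictionary terms) and `queryAnalyzerFieldType` (how incoming query text is tokenized) each control.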
>
> When executing a spellcheck request, Solr uses the analyzer specified in queryAnalyzerFieldType to tokenize the query passed in via the q or spellcheck.q parameter, and this tokenized text is the input to the spellchecking instance.
>
> Does that sound right?
>
> Thanks
> Brendan
>
>
> On Tue, Jul 23, 2013 at 5:15 PM, Dyer, James <james.d...@ingramcontent.com> wrote:
>
> > I don't believe you can specify more than one field on "df" (default field). What you want, I think, is "qf" (query fields), which is available only if using dismax/edismax.
> >
> > http://wiki.apache.org/solr/SearchHandler#df
> > http://wiki.apache.org/solr/ExtendedDisMax#qf_.28Query_Fields.29
> >
> > James Dyer
> >
> >
> > -----Original Message-----
> > From: Brendan Grainger [mailto:brendan.grain...@gmail.com]
> > Sent: Tuesday, July 23, 2013 3:22 PM
> > To: solr-user@lucene.apache.org
> > Subject: Re: Spellcheck field element and collation issues
> >
> > Hi James,
> >
> > If I try:
> >
> > http://localhost:8981/solr/articles/select?indent=true&q=Perfrm%20HVC&rows=0&maxCollationTries=0
> >
> > I get the same result:
> >
> > <response>
> >   <lst name="responseHeader">
> >     <int name="status">0</int>
> >     <int name="QTime">7</int>
> >     <lst name="params">
> >       <str name="indent">true</str>
> >       <str name="q">Perfrm HVC</str>
> >       <str name="maxCollationTries">0</str>
> >       <str name="rows">0</str>
> >     </lst>
> >   </lst>
> >   <result name="response" numFound="0" start="0"></result>
> >   <lst name="spellcheck">
> >     <lst name="suggestions">
> >       <lst name="perfrm">
> >         <int name="numFound">3</int>
> >         <int name="startOffset">0</int>
> >         <int name="endOffset">6</int>
> >         <int name="origFreq">0</int>
> >         <arr name="suggestion">
> >           <lst>
> >             <str name="word">perform</str>
> >             <int name="freq">4</int>
> >           </lst>
> >           <lst>
> >             <str name="word">performed</str>
> >             <int name="freq">1</int>
> >           </lst>
> >           <lst>
> >             <str name="word">performance</str>
> >             <int name="freq">3</int>
> >           </lst>
> >         </arr>
> >       </lst>
> >       <lst name="hvc">
> >         <int name="numFound">2</int>
> >         <int name="startOffset">7</int>
> >         <int name="endOffset">10</int>
> >         <int name="origFreq">0</int>
> >         <arr name="suggestion">
> >           <lst>
> >             <str name="word">hvac</str>
> >             <int name="freq">4</int>
> >           </lst>
> >           <lst>
> >             <str name="word">have</str>
> >             <int name="freq">5</int>
> >           </lst>
> >         </arr>
> >       </lst>
> >       <bool name="correctlySpelled">false</bool>
> >     </lst>
> >   </lst>
> > </response>
> >
> > However, you're right that my df field for the /select handler is in fact:
> >
> > <str name="df">markup_texts title_texts</str>
> >
> > I would note that if I specify the query as follows:
> >
> > http://localhost:8981/solr/articles/select?indent=true&q=markup_texts:(Perfrm%20HVC)+OR+title_texts:(Perfrm%20HVC)&rows=0&maxCollationTries=0
> >
> > which is what I thought specifying a df would effectively do, I get collation results:
> >
> > <lst name="collation">
> >   <str name="collationQuery">
> >     markup_texts:(perform hvac) OR title_texts:(perform hvac)
> >   </str>
> >   <int name="hits">4</int>
> >   <lst name="misspellingsAndCorrections">
> >     <str name="perfrm">perform</str>
> >     <str name="hvc">hvac</str>
> >     <str name="perfrm">perform</str>
> >     <str name="hvc">hvac</str>
> >   </lst>
> > </lst>
> > <lst name="collation">
> >   <str name="collationQuery">
> >     markup_texts:(perform hvac) OR title_texts:(performed hvac)
> >   </str>
> >   <int name="hits">4</int>
> >   <lst name="misspellingsAndCorrections">
> >     <str name="perfrm">perform</str>
> >     <str name="hvc">hvac</str>
> >     <str name="perfrm">performed</str>
> >     <str name="hvc">hvac</str>
> >   </lst>
> > </lst>
> >
> > I think I'm confused about the relationship between the q parameter and what the field and queryAnalyzerFieldType are for in the spellcheck component definition, i.e. what is this for:
> >
> > <str name="field">spellcheck</str>
> >
> > Is it even needed if I've specified how the spelling index terms should be analyzed with:
> >
> > <str name="queryAnalyzerFieldType">text_spell</str>
> >
> > Thanks again
> > Brendan
> >
> >
> > On Tue, Jul 23, 2013 at 3:58 PM, Dyer, James <james.d...@ingramcontent.com> wrote:
> >
> > > Try tacking &maxCollationTries=0 onto the URL and see if the collation returns.
> > >
> > > If you get a collation, then try the same URL with the collation as the "q" parameter. Does that get results?
> > >
> > > My suspicion here is that you are assuming that "markup_texts" is the default search field for "/select", but in fact it isn't.
> > >
> > > James Dyer
> > >
> > >
> > > -----Original Message-----
> > > From: Brendan Grainger [mailto:brendan.grain...@gmail.com]
> > > Sent: Tuesday, July 23, 2013 2:43 PM
> > > To: solr-user@lucene.apache.org
> > > Subject: Re: Spellcheck field element and collation issues
> > >
> > > Hi James,
> > >
> > > I get the following response for that query:
> > >
> > > <response>
> > >   <lst name="responseHeader">
> > >     <int name="status">0</int>
> > >     <int name="QTime">8</int>
> > >     <lst name="params">
> > >       <str name="indent">true</str>
> > >       <str name="q">Perfrm HVC</str>
> > >       <str name="rows">0</str>
> > >     </lst>
> > >   </lst>
> > >   <result name="response" numFound="0" start="0"></result>
> > >   <lst name="spellcheck">
> > >     <lst name="suggestions">
> > >       <lst name="perfrm">
> > >         <int name="numFound">3</int>
> > >         <int name="startOffset">0</int>
> > >         <int name="endOffset">6</int>
> > >         <int name="origFreq">0</int>
> > >         <arr name="suggestion">
> > >           <lst>
> > >             <str name="word">perform</str>
> > >             <int name="freq">4</int>
> > >           </lst>
> > >           <lst>
> > >             <str name="word">performed</str>
> > >             <int name="freq">1</int>
> > >           </lst>
> > >           <lst>
> > >             <str name="word">performance</str>
> > >             <int name="freq">3</int>
> > >           </lst>
> > >         </arr>
> > >       </lst>
> > >       <lst name="hvc">
> > >         <int name="numFound">2</int>
> > >         <int name="startOffset">7</int>
> > >         <int name="endOffset">10</int>
> > >         <int name="origFreq">0</int>
> > >         <arr name="suggestion">
> > >           <lst>
> > >             <str name="word">hvac</str>
> > >             <int name="freq">4</int>
> > >           </lst>
> > >           <lst>
> > >             <str name="word">have</str>
> > >             <int name="freq">5</int>
> > >           </lst>
> > >         </arr>
> > >       </lst>
> > >       <bool name="correctlySpelled">false</bool>
> > >     </lst>
> > >   </lst>
> > > </response>
> > >
> > > Thanks
> > > Brendan
> > >
> > >
> > > On Tue, Jul 23, 2013 at 3:19 PM, Dyer, James <james.d...@ingramcontent.com> wrote:
> > >
> > > > For this query:
> > > >
> > > > http://localhost:8981/solr/articles/select?indent=true&q=Perfrm%20HVC&rows=0
> > > >
> > > > ...do you get anything back in the spellcheck response? Is it correcting the individual words and not giving collations? Or are you getting no individual word suggestions either?
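James's question above (word-level suggestions but no collations, or nothing at all?) can be answered mechanically from an XML response like the ones in this thread. The sketch below uses Python's standard `xml.etree.ElementTree`. It assumes the layout shown here, with `spellcheck.collateExtendedResults=true` so that each collation is a `<lst name="collation">`. The function name `summarize_spellcheck` is hypothetical. Because the exact placement of collations in the response is an assumption, the sketch searches the whole document for them.

```python
import xml.etree.ElementTree as ET

def summarize_spellcheck(xml_text):
    """Return (suggestions, collations, correctly_spelled) from a Solr XML response.

    suggestions: {misspelled word: [(suggested word, freq), ...]}
    collations:  [(collationQuery, hits), ...]; empty if none were returned
    """
    root = ET.fromstring(xml_text)
    suggestions, collations, correctly_spelled = {}, [], None

    block = root.find("./lst[@name='spellcheck']/lst[@name='suggestions']")
    if block is not None:
        for child in block:
            name = child.get("name")
            if child.tag == "lst" and name != "collation":
                suggestions[name] = [
                    (s.findtext("str[@name='word']"), int(s.findtext("int[@name='freq']")))
                    for s in child.findall("arr[@name='suggestion']/lst")
                ]
            elif child.tag == "bool" and name == "correctlySpelled":
                correctly_spelled = child.text == "true"

    # Collations are searched for anywhere, since their position is assumed.
    for lst in root.iter("lst"):
        if lst.get("name") == "collation":
            hits = lst.findtext("int[@name='hits']")
            query = (lst.findtext("str[@name='collationQuery']") or "").strip()
            collations.append((query, int(hits) if hits is not None else None))
    return suggestions, collations, correctly_spelled

# Trimmed from the 3:22 PM response above: word suggestions, no collations.
SAMPLE = """<response>
  <result name="response" numFound="0" start="0"></result>
  <lst name="spellcheck">
    <lst name="suggestions">
      <lst name="hvc">
        <int name="numFound">2</int>
        <arr name="suggestion">
          <lst><str name="word">hvac</str><int name="freq">4</int></lst>
          <lst><str name="word">have</str><int name="freq">5</int></lst>
        </arr>
      </lst>
      <bool name="correctlySpelled">false</bool>
    </lst>
  </lst>
</response>"""

print(summarize_spellcheck(SAMPLE))
# ({'hvc': [('hvac', 4), ('have', 5)]}, [], False)
```

An empty collation list alongside non-empty suggestions is the "correcting the individual words and not giving collations" case.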
> > > >
> > > > James Dyer
> > > >
> > > >
> > > > -----Original Message-----
> > > > From: Brendan Grainger [mailto:brendan.grain...@gmail.com]
> > > > Sent: Tuesday, July 23, 2013 1:47 PM
> > > > To: solr-user@lucene.apache.org
> > > > Subject: Spellcheck field element and collation issues
> > > >
> > > > Hi All,
> > > >
> > > > I have an IndexBasedSpellChecker component configured as follows (note the field parameter is set to the spellcheck field):
> > > >
> > > > <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
> > > >
> > > >   <str name="queryAnalyzerFieldType">text_spell</str>
> > > >
> > > >   <lst name="spellchecker">
> > > >     <str name="name">default</str>
> > > >     <str name="classname">solr.IndexBasedSpellChecker</str>
> > > >     <!--
> > > >       Load tokens from the following field for spell checking,
> > > >       analyzer for the field's type as defined in schema.xml are used
> > > >     -->
> > > >     <str name="field">spellcheck</str>
> > > >     <str name="spellcheckIndexDir">./spellchecker</str>
> > > >     <float name="thresholdTokenFrequency">.0001</float>
> > > >   </lst>
> > > > </searchComponent>
> > > >
> > > > with the corresponding field type for spellcheck:
> > > >
> > > > <fieldType name="text_spell" class="solr.TextField" positionIncrementGap="100" omitNorms="true">
> > > >   <analyzer type="index">
> > > >     <tokenizer class="solr.StandardTokenizerFactory"/>
> > > >     <filter class="solr.StopFilterFactory"
> > > >             ignoreCase="true"
> > > >             words="lang/stopwords_en.txt"
> > > >             enablePositionIncrements="true"
> > > >     />
> > > >     <filter class="solr.LowerCaseFilterFactory"/>
> > > >     <filter class="solr.StandardFilterFactory"/>
> > > >   </analyzer>
> > > >   <analyzer type="query">
> > > >     <tokenizer class="solr.StandardTokenizerFactory"/>
> > > >     <filter class="solr.SynonymFilterFactory" synonyms="moto_synonyms.txt" ignoreCase="true" expand="true"/>
> > > >     <filter class="solr.StopFilterFactory"
> > > >             ignoreCase="true"
> > > >             words="lang/stopwords_en.txt"
> > > >             enablePositionIncrements="true"
> > > >     />
> > > >     <filter class="solr.LowerCaseFilterFactory"/>
> > > >     <filter class="solr.StandardFilterFactory"/>
> > > >   </analyzer>
> > > > </fieldType>
> > > >
> > > > and field:
> > > >
> > > > <!-- spellcheck field is multivalued because it has the title and markup fields copied into it -->
> > > > <field name="spellcheck" type="text_spell" stored="false" omitTermFreqAndPositions="true" multiValued="true"/>
> > > >
> > > > Values from the markup and title fields are copied into the spellcheck field.
> > > >
> > > > My /select request handler has the following defaults:
> > > >
> > > > <lst name="defaults">
> > > >   <str name="echoParams">explicit</str>
> > > >   <int name="rows">10</int>
> > > >   <str name="df">markup_texts title_texts</str>
> > > >
> > > >   <!-- Spell checking defaults -->
> > > >   <str name="spellcheck">true</str>
> > > >   <str name="spellcheck.collateExtendedResults">true</str>
> > > >   <str name="spellcheck.extendedResults">true</str>
> > > >   <str name="spellcheck.maxCollations">2</str>
> > > >   <str name="spellcheck.maxCollationTries">5</str>
> > > >   <str name="spellcheck.count">5</str>
> > > >   <str name="spellcheck.collate">true</str>
> > > >
> > > >   <str name="spellcheck.maxResultsForSuggest">5</str>
> > > >   <str name="spellcheck.alternativeTermCount">5</str>
> > > >
> > > > </lst>
> > > >
> > > > When I issue a search like this:
> > > >
> > > > http://localhost:8981/solr/articles/select?indent=true&spellcheck.q=markup_texts:(Perfrm%20HVC)&q=Perfrm%20HVC&rows=0
> > > >
> > > > I get collations:
> > > >
> > > > <lst name="collation">
> > > >   <str name="collationQuery">markup_texts:(perform hvac)</str>
> > > >   <int name="hits">4</int>
> > > >   <lst name="misspellingsAndCorrections">
> > > >     <str name="perfrm">perform</str>
> > > >     <str name="hvc">hvac</str>
> > > >   </lst>
> > > > </lst>
> > > > <lst name="collation">
> > > >   <str name="collationQuery">markup_texts:(performed hvac)</str>
> > > >   <int name="hits">4</int>
> > > >   <lst name="misspellingsAndCorrections">
> > > >     <str name="perfrm">performed</str>
> > > >     <str name="hvc">hvac</str>
> > > >   </lst>
> > > > </lst>
> > > >
> > > > However, if I remove the spellcheck.q parameter, I don't: no collations are returned for the following:
> > > >
> > > > http://localhost:8981/solr/articles/select?indent=true&q=Perfrm%20HVC&rows=0
> > > >
> > > > If I specify the fields being searched over in the q parameter, I get collations:
> > > >
> > > > http://localhost:8981/solr/articles/select?indent=true&q=markup_texts:(Perfrm%20HVC)&rows=0
> > > >
> > > > <lst name="collation">
> > > >   <str name="collationQuery">markup_texts:(perform hvac)</str>
> > > >   <int name="hits">4</int>
> > > >   <lst name="misspellingsAndCorrections">
> > > >     <str name="perfrm">perform</str>
> > > >     <str name="hvc">hvac</str>
> > > >   </lst>
> > > > </lst>
> > > > <lst name="collation">
> > > >   <str name="collationQuery">markup_texts:(performed hvac)</str>
> > > >   <int name="hits">4</int>
> > > >   <lst name="misspellingsAndCorrections">
> > > >     <str name="perfrm">performed</str>
> > > >     <str name="hvc">hvac</str>
> > > >   </lst>
> > > > </lst>
> > > >
> > > > I'm a bit confused as to what the value for field should be in the spellcheck component definition. In fact, what is its purpose here: just the input for building the spellchecking index? If so, why do I even need to specify the queryAnalyzerFieldType?
> > > >
> > > > Also, why do I need to explicitly specify the field in the query or in spellcheck.q to get collations?
> > > >
> > > > Thanks, and sorry for the rather long question.
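James's suggestion earlier in the thread, to use `qf` with dismax/edismax rather than listing two fields in `df`, can be sketched as a small URL builder using Python's `urllib.parse.urlencode`. The host, core, and field names come from this thread. `defType`, `qf`, and `spellcheck.collate` are standard Solr request parameters. The helper name `edismax_select_url` is hypothetical. The assumption is that with edismax spreading the bare keywords across both fields, a plain `q=Perfrm HVC` (and the collations re-run from it) no longer depends on `df`.

```python
from urllib.parse import urlencode

SELECT_URL = "http://localhost:8981/solr/articles/select"  # from the thread

def edismax_select_url(keywords, query_fields, extra=None):
    """Build a /select URL that searches `keywords` across several fields
    via edismax's qf parameter instead of a multi-field df."""
    params = {
        "q": keywords,
        "defType": "edismax",          # qf is honored by dismax/edismax only
        "qf": " ".join(query_fields),  # space-separated list of query fields
        "rows": 0,
        "indent": "true",
    }
    params.update(extra or {})         # e.g. spellcheck.* overrides
    return SELECT_URL + "?" + urlencode(params)

print(edismax_select_url("Perfrm HVC", ["markup_texts", "title_texts"],
                         {"spellcheck.collate": "true"}))
# http://localhost:8981/solr/articles/select?q=Perfrm+HVC&defType=edismax&qf=markup_texts+title_texts&rows=0&indent=true&spellcheck.collate=true
```

The same two settings could instead go into the handler's `defaults` list (`<str name="defType">edismax</str>` and `<str name="qf">markup_texts title_texts</str>`), so clients keep sending plain keywords.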
> > > >
> > > > Brendan
> > >
> > > --
> > > Brendan Grainger
> > > www.kuripai.com
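James's advice at the top of the thread (strip the keywords out of a complex `q` and send them in `spellcheck.q`) can be sketched as follows. The regular expressions are a hypothetical, deliberately simple stand-in for a real query parser. They drop local params such as `{!edismax ...}`, `field:` prefixes, `^` boosts, grouping parentheses, quotes, `+` prefixes, and the `AND`/`OR`/`NOT` operators. They will not cope with every Lucene syntax feature (ranges, escapes, `-` exclusions, and so on). `extract_keywords` and `spellcheck_params` are illustrative names, not Solr APIs.

```python
import re
from urllib.parse import urlencode

OPERATORS = {"AND", "OR", "NOT"}

def extract_keywords(q):
    """Hypothetical helper: reduce a structured q string to raw keywords."""
    q = re.sub(r"\{![^}]*\}", " ", q)       # local params, e.g. {!edismax qf=...}
    q = re.sub(r"\b[\w.]+:", " ", q)        # field prefixes, e.g. markup_texts:
    q = re.sub(r"\^\d+(?:\.\d+)?", " ", q)  # boosts, e.g. ^2 or ^0.5
    q = re.sub(r"[()\"+]", " ", q)          # grouping, phrase quotes, + prefixes
    keywords = []
    for token in q.split():
        # Skip boolean operators and repeats (multi-field queries often
        # repeat the same keywords once per field).
        if token not in OPERATORS and token not in keywords:
            keywords.append(token)
    return " ".join(keywords)

def spellcheck_params(q):
    """Send the structured query as q and the raw keywords as spellcheck.q."""
    return urlencode({"q": q, "spellcheck.q": extract_keywords(q), "rows": 0})

q = "markup_texts:(Perfrm HVC) OR title_texts:(Perfrm HVC)"
print(extract_keywords(q))  # Perfrm HVC
```

Removing duplicates keeps the spellchecker from seeing each misspelling twice. That mirrors the doubled `misspellingsAndCorrections` entries in the multi-field collation quoted earlier in the thread.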