Re: Improving Solr Spell Checker Results

David Radunz Sat, 21 Jan 2012 07:04:33 -0800

James,

Thanks again for your lengthy and informative response. I updated from SVN trunk again today and was successfully able to run 'ant test'. So I proceeded with trying your suggestions (for question 1 so far):


On 17/01/2012 5:32 AM, Dyer, James wrote:

David,

The spellchecker normally won't give suggestions for any term in your index.  So even if 
"wever" is misspelled in context, if it exists in the index the spell checker 
will not try correcting it.  There are 3 workarounds:
1. Use the patch included with SOLR-2585 (this is for Trunk/4.x only).  See 
https://issues.apache.org/jira/browse/SOLR-2585

I have tried using this with the original test case of 'Signorney Wever'. I didn't notice any difference, although I am a little unclear as to what exactly this patch does. Nor am I really clear what to set either of the options to, so I set them both to '5'. I tried to find the test case it mentions, but it's not present in SpellCheckCollatorTest.java. Any suggestions?

2. try "onlyMorePopular=true" in your request.  
(http://wiki.apache.org/solr/SpellCheckComponent#spellcheck.onlyMorePopular).  But see 
the September 2, 2011 comment in SOLR-2585 about why this might not do what you'd hope it 
would.

Trying this did produce 'Signourney Weaver' as you would hope, but I am a little afraid of the downside. I would much prefer a context-sensitive spell check that takes into account the terms around the correction.
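For reference, a minimal Python sketch of how such a request could be put together. The endpoint (host, port and handler path) is an assumption and does not come from this thread; the only spellcheck parameters used are spellcheck, spellcheck.q and the spellcheck.onlyMorePopular option from the wiki link above. It only builds the URL and does not contact Solr.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: host, port and handler path are assumptions.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def build_spellcheck_url(query, only_more_popular=True):
    """Build a select URL that asks the spellchecker for suggestions."""
    params = {
        "q": query,
        "spellcheck": "true",
        "spellcheck.q": query,
        # Toggle the behaviour discussed in workaround 2.
        "spellcheck.onlyMorePopular": "true" if only_more_popular else "false",
    }
    # urlencode escapes spaces as '+' and leaves the dotted names intact.
    return SOLR_SELECT_URL + "?" + urlencode(params)


print(build_spellcheck_url("Sigourney Wever"))
```

Switching only_more_popular between True and False makes it easy to compare the two responses for the same query.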


3. If you're building your index on a <copyField />, you can add a stopword 
filter that filters out all of the misspelt or rare words from the field that the 
dictionary is based on.  This could be an arduous task, and it may or may not work well 
for your data.

I am currently using a copyField for all terms that are relevant, which is quite a lot, and the dictionary would encompass a huge amount of data. Adding stopword filters would be out of the question, as we presently have more than 30,000 products and this is for the initial launch; we intend to have many, many more.


As for your second question, I take it you're using (e)dismax with multiple fields in "qf", 
right?  The only way I know to handle this is to create a <copyField> that combines all of the 
fields you search across.  Base your dictionary on this combined field.  Also, specifying 
"spellcheck.maxCollationTries" with a non-zero value will weed out the nonsense word 
combinations that are likely to occur when doing this, ensuring that any collations provided will indeed 
yield hits.  The downside to doing this, of course, is it will make your first problem more acute in that 
there will be even more terms in your index that the spellchecker will ignore entirely, even if they're 
misspelled in context.  Once again, SOLR-2585 is designed to tackle this problem but it is still in its 
early stages, and thus far it is Trunk-only.

I tried setting spellcheck.maxCollationTries to 5 to see if it would help with the above problem, but it did not.

I have now tried using it in the context of question 2. I tried searching for 'Sigorney Wever' in the series name (which it's not present in, as it's an actor):


spellcheck=true&facet=on&fl=id,sku,name,format,thumbnail,release_date,url_path,price,special_price,year_made_attr_opt_combo,series_name_attr_opt_combo&sort=score+desc,release_date+desc&start=0&q=*+series_name:"signourney+wever"^100&spellcheck.q=signourney+wever&fq=store_id:"1"+AND+series_name_attr_opt_search:*signourney*wever*&rows=5&spellcheck.maxCollationTries=5
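Decoded, the request above is easier to read. The following is a small Python sketch (standard library only) that splits the same query string into its parameters; it sends nothing to Solr, and the helper name decode_params is just for illustration.

```python
from urllib.parse import parse_qs

# The request from this message, split over several literals for readability.
RAW_QUERY = (
    'spellcheck=true&facet=on'
    '&fl=id,sku,name,format,thumbnail,release_date,url_path,price,'
    'special_price,year_made_attr_opt_combo,series_name_attr_opt_combo'
    '&sort=score+desc,release_date+desc&start=0'
    '&q=*+series_name:"signourney+wever"^100'
    '&spellcheck.q=signourney+wever'
    '&fq=store_id:"1"+AND+series_name_attr_opt_search:*signourney*wever*'
    '&rows=5&spellcheck.maxCollationTries=5'
)


def decode_params(raw_query):
    """Return {name: value} with '+' decoded to spaces (first value wins)."""
    return {name: values[0] for name, values in parse_qs(raw_query).items()}


params = decode_params(RAW_QUERY)
for name in ("q", "spellcheck.q", "fq", "spellcheck.maxCollationTries"):
    print(f"{name} = {params[name]}")
```

Note that spellcheck.q carries only the bare words, with no field name attached, while the field restriction lives in q and fq.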

Suggestions for 'Sigourney' Wever were returned, but either no spelling suggestions at all, or only ones drawn from series names (which I doubt there would be), should have been returned.


You might also be interested in https://issues.apache.org/jira/browse/SOLR-2993 .  Although 
this is unrelated to your two questions, the patch on this issue introduces a new 
"ConjunctionSolrSpellChecker" which theoretically could be enhanced to do exactly 
what you want.  That is, you could (theoretically) create separate dictionaries for each of 
the fields you're searching and let the CSSC combine the results & generate collations, 
etc.

During the upgrade I switched to solr.DirectSolrSpellChecker, which I presume will help with this? I am a senior developer (in Java/Perl/Python/PHP) but I have not as yet looked at any of the Solr source code, so I am in the dark when you say it could be tailored for my needs. Also, how would it work, query-wise? Would it be something like spellcheck.series_name.q= and spellcheck.actor.q= and so on? If so, that sounds tempting to try and achieve. If you could provide any pointers on what exactly would be required, that would really help.


Thanks again for your time,

David


James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311


-----Original Message-----
From: David Radunz [mailto:[email protected]]
Sent: Friday, January 13, 2012 11:42 PM
To: [email protected]
Subject: Improving Solr Spell Checker Results

Hey,

      Firstly I would like to thank you all for creating such a great
searching platform. What I was wondering is whether it is possible to:

1. Have the spell checker take into account multiple words. For example
if I search for "Sigourney Wever" it doesn't flag as a spelling issue as
'wever' is a correctly spelled word. And if I searched for "Sigourney
Wevr" the suggestion is "Sigourney Wever". Of course the correct
spelling is: Sigourney Weaver
2. Have the spell checker return corrections only for dictionary items
added on the field being searched. i.e. Searching for an actor would
only use the dictionary fields from the actor. This makes sense on many
levels, as when you are field searching it's useless to get a correction
from another field as no values would match in any case.

Hopefully someone can help!

Thanks in advance,

David
