David: There's some good info here: http://wiki.apache.org/solr/HowToContribute#Working_With_Patches
But the short form is to go into solr_home and issue this command:
'svn diff > SOLR-2585.patch'. IDEs may also have a "create patch"
feature, but I find the straight SVN command more reliable.

Note I'm not saying that your patch will necessarily be picked up, but
it's a thoughtful gesture to upload a more current patch. In your
comments please identify what code line you're working on (4.x? 3.x?).
And when you upload, down near the bottom of the dialog box there'll be
a radio button about "grant ASF license" which is fairly important to
click for legal reasons....

Thanks
Erick

On Sun, Jan 22, 2012 at 5:54 PM, David Radunz <da...@boxen.net> wrote:
> Hey Erick,
>
> Sure, can you explain the process to create the patch and upload it and
> I'll do it first thing tomorrow.
>
> Thanks again for your help,
>
> David
>
>
> On 23/01/2012 12:51 PM, Erick Erickson wrote:
>>
>> I can't help with your *real* problem, but when looking at patches,
>> if the "resolution" field isn't set to something like "fixed" it means
>> that the patch has NOT been applied to any code lines. There
>> also should be commit revisions specified in the comments.
>> If "Fix Versions" has values, that doesn't mean the patch has
>> been applied either, that's often just a statement of where
>> the patch *should* go.
>>
>> And, between the time someone uploads a patch and it actually
>> gets *committed*, the underlying code line can, indeed, change
>> and the patch doesn't apply cleanly. Since you've already had
>> to do this, could you upload your version that *does* apply
>> cleanly?
>>
>> Best
>> Erick
>>
>> On Sun, Jan 22, 2012 at 2:56 AM, David Radunz <da...@boxen.net> wrote:
>>>
>>> James,
>>>
>>> I worked out that I actually needed to 'apply' patch SOLR-2585, whoops.
>>> So I have done that now and it seems to return 'correctlySpelled=true'
>>> for 'Sigorney Wever' (when Sigorney isn't even in the dictionary).
>>> Could something have changed in the trunk to make your patch no longer
>>> work? I had to manually merge the setup for the test case due to a new
>>> 'hyphens' test case. The settings I am using are:
>>>
>>> <lst name="defaults">
>>>   <str name="echoParams">explicit</str>
>>>   <int name="rows">10</int>
>>>
>>>   <str name="spellcheck.onlyMorePopular">false</str>
>>>   <int name="spellcheck.count">10</int>
>>>   <str name="spellcheck.extendedResults">true</str>
>>>   <str name="spellcheck.collate">true</str>
>>>   <str name="spellcheck.collateExtendedResults">true</str>
>>>   <int name="spellcheck.maxCollationTries">10</int>
>>>   <int name="spellcheck.maxCollations">1</int>
>>>
>>>   <int name="spellcheck.alternativeTermCount">5</int>
>>>   <int name="spellcheck.maxResultsForSuggest">1</int>
>>> </lst>
>>>
>>>
>>> <lst name="spellchecker">
>>>   <str name="name">default</str>
>>>   <str name="field">spell</str>
>>>   <str name="classname">solr.DirectSolrSpellChecker</str>
>>>
>>>   <!-- the spellcheck distance measure used, the default is the internal levenshtein -->
>>>   <str name="distanceMeasure">internal</str>
>>>   <!-- minimum accuracy needed to be considered a valid spellcheck suggestion -->
>>>   <float name="accuracy">0.5</float>
>>>   <!-- the maximum #edits we consider when enumerating terms: can be 1 or 2 -->
>>>   <int name="maxEdits">2</int>
>>>   <!-- the minimum shared prefix when enumerating terms -->
>>>   <int name="minPrefix">1</int>
>>>   <!-- maximum number of inspections per result.
>>>   -->
>>>   <int name="maxInspections">5</int>
>>>   <!-- minimum length of a query term to be considered for correction -->
>>>   <int name="minQueryLength">4</int>
>>>   <!-- maximum threshold of documents a query term can appear to be considered for correction -->
>>>   <float name="maxQueryFrequency">0.01</float>
>>>   <!-- require suggestions to occur in 0.1% of the documents -->
>>>   <!--
>>>   <float name="thresholdTokenFrequency">0.001</float>
>>>   -->
>>>
>>>   <str name="spellcheckIndexDir">spellchecker</str>
>>>   <str name="buildOnCommit">true</str>
>>> </lst>
>>>
>>> With the query:
>>>
>>> spellcheck=true&facet=on&fl=id,sku,name,format,thumbnail,release_date,url_path,price,special_price,year_made_attr_opt_combo,primary_cat_id&sort=score+desc,name+asc,year_made+desc&start=0&q=sigorney+wever+title:"sigorney+wever"^100+series_name:"sigorney+wever"^50&spellcheck.q=sigorney+wever&fq=store_id:"1"&rows=5
>>>
>>> Cheers,
>>>
>>> David
>>>
>>>
>>> On 22/01/2012 2:03 AM, David Radunz wrote:
>>>>
>>>> James,
>>>>
>>>> Thanks again for your lengthy and informative response. I updated from
>>>> SVN trunk again today and was successfully able to run 'ant test'. So I
>>>> proceeded with trying your suggestions (for question 1 so far):
>>>>
>>>> On 17/01/2012 5:32 AM, Dyer, James wrote:
>>>>>
>>>>> David,
>>>>>
>>>>> The spellchecker normally won't give suggestions for any term in your
>>>>> index. So even if "wever" is misspelled in context, if it exists in the
>>>>> index the spell checker will not try correcting it. There are 3
>>>>> workarounds:
>>>>> 1. Use the patch included with SOLR-2585 (this is for Trunk/4.x only).
>>>>> See https://issues.apache.org/jira/browse/SOLR-2585
>>>>
>>>> I have tried using this with the original test case of 'Signorney Wever'.
>>>> I didn't notice any difference, although I am a little unclear as to what
>>>> exactly this patch does.
>>>> Nor am I really clear what to set either of the options to, so I set
>>>> them both to '5'. I tried to find the test case it mentions, but it's
>>>> not present in SpellCheckCollatorTest.java .. Any suggestions?
>>>>
>>>>> 2. try "onlyMorePopular=true" in your request.
>>>>> (http://wiki.apache.org/solr/SpellCheckComponent#spellcheck.onlyMorePopular).
>>>>> But see the September 2, 2011 comment in SOLR-2585 about why this might
>>>>> not do what you'd hope it would.
>>>>
>>>> Trying this did produce 'Signourney Weaver' as you would hope, but I am a
>>>> little afraid of the downside. I would much prefer a context-sensitive
>>>> spell check that involves the terms around the correction.
>>>>>
>>>>> 3. If you're building your index on a <copyField />, you can add a
>>>>> stopword filter that filters out all of the misspelt or rare words from
>>>>> the field that the dictionary is based on. This could be an arduous
>>>>> task, and it may or may not work well for your data.
>>>>
>>>> I am currently using a copyField for all terms that are relevant, which is
>>>> quite a lot, and the dictionary would encompass a huge amount of data.
>>>> Adding stopword filters would be out of the question as we presently have
>>>> more than 30,000 products and this is for the initial launch; we intend to
>>>> have many, many more.
>>>>>
>>>>> As for your second question, I take it you're using (e)dismax with
>>>>> multiple fields in "qf", right? The only way I know to handle this is to
>>>>> create a <copyField> that combines all of the fields you search across.
>>>>> Use this combined field to base your dictionary. Also, specifying
>>>>> "spellcheck.maxCollationTries" with a non-zero value will weed out the
>>>>> nonsense word combinations that are likely to occur when doing this,
>>>>> ensuring that any collations provided will indeed yield hits.
>>>>> The downside to doing this, of course, is it will make your first
>>>>> problem more acute in that there will be even more terms in your index
>>>>> that the spellchecker will ignore entirely, even if they're misspelled
>>>>> in context. Once again, SOLR-2585 is designed to tackle this problem but
>>>>> it is still in its early stages, and thus far it is Trunk-only.
>>>>
>>>> I tried setting spellcheck.maxCollationTries to 5 to see if it would help
>>>> with the above problem, but it did not.
>>>>
>>>> I have now tried using it in the context of question 2. I tried searching
>>>> for 'Sigorney Wever' in the series name (which it's not present in, as
>>>> it's an actor):
>>>>
>>>> spellcheck=true&facet=on&fl=id,sku,name,format,thumbnail,release_date,url_path,price,special_price,year_made_attr_opt_combo,series_name_attr_opt_combo&sort=score+desc,release_date+desc&start=0&q=*+series_name:"signourney+wever"^100&spellcheck.q=signourney+wever&fq=store_id:"1"+AND+series_name_attr_opt_search:*signourney*wever*&rows=5&spellcheck.maxCollationTries=5
>>>>
>>>> Suggestions for 'Sigourney' Wever were returned, but either no spelling
>>>> suggestions, or only ones for series names (which I doubt there would
>>>> be), should have been returned.
>>>>
>>>>> You might also be interested in
>>>>> https://issues.apache.org/jira/browse/SOLR-2993 . Although this is
>>>>> unrelated to your two questions, the patch on this issue introduces a
>>>>> new "ConjunctionSolrSpellChecker" which theoretically could be enhanced
>>>>> to do exactly what you want. That is, you could (theoretically) create
>>>>> separate dictionaries for each of the fields you're searching and let
>>>>> the CSSC combine the results & generate collations, etc.
>>>>
>>>> During the upgrade I switched to solr.DirectSolrSpellChecker, which I
>>>> presume will help with this?
>>>> I am a senior developer (in Java/Perl/Python/PHP) but I have not as yet
>>>> looked at any of the Solr source code. So I am in the dark when you say
>>>> it could be tailored for my needs. Also, how would it work? Query wise..
>>>> Would it be like.. spellcheck.series_name.q= and spellcheck.actor.q= and
>>>> so on? If so that sounds tempting to try and achieve. But if you could
>>>> provide any pointers on what exactly would be required, that would really
>>>> help.
>>>>
>>>> Thanks again for your time,
>>>>
>>>> David
>>>>>
>>>>> James Dyer
>>>>> E-Commerce Systems
>>>>> Ingram Content Group
>>>>> (615) 213-4311
>>>>>
>>>>>
>>>>> -----Original Message-----
>>>>> From: David Radunz [mailto:da...@boxen.net]
>>>>> Sent: Friday, January 13, 2012 11:42 PM
>>>>> To: solr-user@lucene.apache.org
>>>>> Subject: Improving Solr Spell Checker Results
>>>>>
>>>>> Hey,
>>>>>
>>>>> Firstly I would like to thank you all for creating such a great
>>>>> searching platform. What I was wondering is whether it is possible to:
>>>>>
>>>>> 1. Have the spell checker take into account multiple words. For example
>>>>> if I search for "Sigourney Wever" it doesn't flag as a spelling issue as
>>>>> 'wever' is a correctly spelled word. And if I searched for "Sigourney
>>>>> Wevr" the suggestion is "Sigourney Wever". Of course the correct
>>>>> spelling is: Sigourney Weaver
>>>>> 2. Have the spell checker return corrections only for dictionary items
>>>>> added on the field being searched. i.e. Searching for an actor would
>>>>> only use the dictionary fields from the actor. This makes sense on many
>>>>> levels, as when you are field searching it's useless to get a correction
>>>>> from another field as no values would match in any case.
>>>>>
>>>>> Hopefully someone can help!
>>>>>
>>>>> Thanks in advance,
>>>>>
>>>>> David
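
For anyone reproducing the requests quoted in this thread, here is a minimal
sketch (not taken from any of the messages) of how the spellcheck parameters
discussed above could be assembled into a request URL. The parameter names
come from the settings David posted; spellcheck.alternativeTermCount and
spellcheck.maxResultsForSuggest are the two options he set while testing the
SOLR-2585 patch, so they may only mean something on a patched trunk build.
The function name and the base URL are assumptions made up for illustration.

```python
from urllib.parse import urlencode


def build_spellcheck_url(user_query, base_url="http://localhost:8983/solr/select"):
    """Build a Solr select URL carrying the spellcheck settings from the thread.

    base_url is a hypothetical endpoint; point it at your own Solr instance.
    """
    params = {
        "q": user_query,
        "rows": 5,
        "spellcheck": "true",
        # Check the raw user input rather than the full boosted query.
        "spellcheck.q": user_query,
        "spellcheck.count": 10,
        "spellcheck.extendedResults": "true",
        "spellcheck.collate": "true",
        "spellcheck.collateExtendedResults": "true",
        # Non-zero, so that collations are tested against the index first.
        "spellcheck.maxCollationTries": 10,
        "spellcheck.maxCollations": 1,
        # The two options David set for the SOLR-2585 patch (values from his config).
        "spellcheck.alternativeTermCount": 5,
        "spellcheck.maxResultsForSuggest": 1,
    }
    # urlencode escapes spaces as '+', matching the queries quoted above.
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_spellcheck_url("sigorney wever"))
```

Keeping spellcheck.q separate from q mirrors the quoted queries, where q
carries field boosts that the spellchecker should not try to correct.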