Improving Solr Spell Checker Results
Hey,

Firstly I would like to thank you all for creating such a great searching platform. What I was wondering is whether it is possible to:

1. Have the spell checker take into account multiple words. For example, if I search for "Sigourney Wever" it doesn't flag a spelling issue, as 'wever' is a correctly spelled word. And if I search for "Sigourney Wevr" the suggestion is "Sigourney Wever". Of course the correct spelling is: Sigourney Weaver

2. Have the spell checker return corrections only for dictionary items added on the field being searched, i.e. searching for an actor would only use the dictionary fields from the actor. This makes sense on many levels, as when you are field searching it's useless to get a correction from another field, since no values would match in any case.

Hopefully someone can help!

Thanks in advance,
David
Re: Improving Solr Spell Checker Results
Hey,

Thanks so much for your outstanding response. I have been busy for a few days so have not had a chance to try it out. I have now tried to install the trunk of Solr, and when I run 'ant test' I encounter the following:

    [junit] Testsuite: org.apache.lucene.facet.taxonomy.directory.TestDirectoryTaxonomyReader
    [junit] Testcase: testRefreshReadRecreatedTaxonomy(org.apache.lucene.facet.taxonomy.directory.TestDirectoryTaxonomyReader): FAILED
    [junit] Expected InconsistentTaxonomyException
    [junit] junit.framework.AssertionFailedError: Expected InconsistentTaxonomyException
    [junit] at org.apache.lucene.facet.taxonomy.directory.TestDirectoryTaxonomyReader.doTestReadRecreatedTaxono(TestDirectoryTaxonomyReader.java:168)
    [junit] at org.apache.lucene.facet.taxonomy.directory.TestDirectoryTaxonomyReader.testRefreshReadRecreatedTaxonomy(TestDirectoryTaxonomyReader.java:130)
    [junit] at org.apache.lucene.util.LuceneTestCase$3$1.evaluate(LuceneTestCase.java:529)
    [junit] at org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:165)
    [junit] at org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:57)

Should I ignore this (and other failed tests) and continue anyway?

Cheers,
David

On 17/01/2012 5:32 AM, Dyer, James wrote:
> David,
>
> The spellchecker normally won't give suggestions for any term in your index. So even if "wever" is misspelled in context, if it exists in the index the spell checker will not try correcting it. There are 3 workarounds:
>
> 1. Use the patch included with SOLR-2585 (this is for Trunk/4.x only). See https://issues.apache.org/jira/browse/SOLR-2585
>
> 2. Try "onlyMorePopular=true" in your request (http://wiki.apache.org/solr/SpellCheckComponent#spellcheck.onlyMorePopular). But see the September 2, 2011 comment in SOLR-2585 about why this might not do what you'd hope it would.
>
> 3. If you're building your index on a copyField, you can add a stopword filter that filters out all of the misspelt or rare words from the field that the dictionary is based on. This could be an arduous task, and it may or may not work well for your data.
>
> As for your second question, I take it you're using (e)dismax with multiple fields in "qf", right? The only way I know to handle this is to create a copyField that combines all of the fields you search across. Use this combined field to base your dictionary on. Also, specifying "spellcheck.maxCollationTries" with a non-zero value will weed out the nonsense word combinations that are likely to occur when doing this, ensuring that any collations provided will indeed yield hits.
>
> The downside to doing this, of course, is that it will make your first problem more acute, in that there will be even more terms in your index that the spellchecker will ignore entirely, even if they're misspelled in context. Once again, SOLR-2585 is designed to tackle this problem, but it is still in its early stages, and thus far it is Trunk-only.
>
> You might also be interested in https://issues.apache.org/jira/browse/SOLR-2993 . Although this is unrelated to your two questions, the patch on this issue introduces a new "ConjunctionSolrSpellChecker" which theoretically could be enhanced to do exactly what you want. That is, you could (theoretically) create separate dictionaries for each of the fields you're searching and let the CSSC combine the results & generate collations, etc.
>
> James Dyer
> E-Commerce Systems
> Ingram Content Group
> (615) 213-4311
>
> -----Original Message-----
> From: David Radunz [mailto:da...@boxen.net]
> Sent: Friday, January 13, 2012 11:42 PM
> To: solr-user@lucene.apache.org
> Subject: Improving Solr Spell Checker Results
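James's point that the spellchecker skips any term already present in the index can be illustrated with a toy model. The sketch below is not Solr's algorithm: it uses Python's difflib string similarity in place of Solr's distance measures, and the three-word dictionary is made up to mirror the example in this thread.

```python
from difflib import get_close_matches

# Hypothetical dictionary built from indexed terms. "wever" is in it because
# some document contains that (legitimately spelled) word.
DICTIONARY = ["sigourney", "weaver", "wever"]


def correct_query(query, dictionary=DICTIONARY):
    """Correct each term independently, skipping terms found in the dictionary."""
    corrected = []
    for term in query.lower().split():
        if term in dictionary:
            # The term exists in the index, so it is treated as correctly
            # spelled no matter what the neighbouring terms are.
            corrected.append(term)
            continue
        matches = get_close_matches(term, dictionary, n=1, cutoff=0.6)
        corrected.append(matches[0] if matches else term)
    return " ".join(corrected)


print(correct_query("Sigourney Wever"))  # unchanged: both terms are in the dictionary
print(correct_query("Sigourney Wevr"))   # "wevr" is replaced by its closest dictionary term
```

Because each term is judged on its own, nothing in this model can prefer "weaver" over "wever" next to "sigourney", which is the gap the first question describes.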
Re: Improving Solr Spell Checker Results
James,

Thanks again for your lengthy and informative response. I updated from SVN trunk again today and was successfully able to run 'ant test'. So I proceeded with trying your suggestions (for question 1 so far):

On 17/01/2012 5:32 AM, Dyer, James wrote:
> David,
>
> The spellchecker normally won't give suggestions for any term in your index. So even if "wever" is misspelled in context, if it exists in the index the spell checker will not try correcting it. There are 3 workarounds:
>
> 1. Use the patch included with SOLR-2585 (this is for Trunk/4.x only). See https://issues.apache.org/jira/browse/SOLR-2585

I have tried using this with the original test case of 'Signorney Wever'. I didn't notice any difference, although I am a little unclear as to what exactly this patch does. Nor am I really clear what to set either of the options to, so I set them both to '5'. I tried to find the test case it mentions, but it's not present in SpellCheckCollatorTest.java. Any suggestions?

> 2. Try "onlyMorePopular=true" in your request (http://wiki.apache.org/solr/SpellCheckComponent#spellcheck.onlyMorePopular). But see the September 2, 2011 comment in SOLR-2585 about why this might not do what you'd hope it would.

Trying this did produce 'Signourney Weaver' as you would hope, but I am a little afraid of the downside. I would much prefer a context-sensitive spell check that involves the terms around the correction.

> 3. If you're building your index on a copyField, you can add a stopword filter that filters out all of the misspelt or rare words from the field that the dictionary is based on. This could be an arduous task, and it may or may not work well for your data.

I am currently using a copyField for all terms that are relevant, which is quite a lot, and the dictionary would encompass a huge amount of data. Adding stopword filters would be out of the question, as we presently have more than 30,000 products and this is for the initial launch; we intend to have many, many more.

> As for your second question, I take it you're using (e)dismax with multiple fields in "qf", right? The only way I know to handle this is to create a copyField that combines all of the fields you search across. Use this combined field to base your dictionary on. Also, specifying "spellcheck.maxCollationTries" with a non-zero value will weed out the nonsense word combinations that are likely to occur when doing this, ensuring that any collations provided will indeed yield hits. The downside to doing this, of course, is that it will make your first problem more acute, in that there will be even more terms in your index that the spellchecker will ignore entirely, even if they're misspelled in context. Once again, SOLR-2585 is designed to tackle this problem, but it is still in its early stages, and thus far it is Trunk-only.

I tried setting spellcheck.maxCollationTries to 5 to see if it would help with the above problem, but it did not. I have now tried using it in the context of question 2. I tried searching for 'Sigorney Wever' in the series name (which it's not present in, as it's an actor):

    spellcheck=true&facet=on&fl=id,sku,name,format,thumbnail,release_date,url_path,price,special_price,year_made_attr_opt_combo,series_name_attr_opt_combo&sort=score+desc,release_date+desc&start=0&q=*+series_name:"signourney+wever"^100&spellcheck.q=signourney+wever&fq=store_id:"1"+AND+series_name_attr_opt_search:*signourney*wever*&rows=5&spellcheck.maxCollationTries=5

Suggestions for 'Sigourney Wever' were returned, but either no spelling suggestions at all, or only ones for series names (which I doubt there would be), should have been returned.

> You might also be interested in https://issues.apache.org/jira/browse/SOLR-2993 . Although this is unrelated to your two questions, the patch on this issue introduces a new "ConjunctionSolrSpellChecker" which theoretically could be enhanced to do exactly what you want. That is, you could (theoretically) create separate dictionaries for each of the fields you're searching and let the CSSC combine the results & generate collations, etc.

During the upgrade I switched to solr.DirectSolrSpellChecker, which I presume will help with this? I am a senior developer (in Java/Perl/Python/PHP) but I have not as yet looked at any of the Solr source code, so I am in the dark when you say it could be tailored for my needs. Also, how would it work, query-wise? Would it be like spellcheck.series_name.q= and spellcheck.actor.q= and so on? If so, that sounds tempting to try and achieve. But if you could provide any pointers on what exactly would be required, that would really help.

Thanks again for your time,
David
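On the question of how per-field spellchecking might look query-wise: the SpellCheckComponent accepts a spellcheck.dictionary request parameter that selects one of the spellcheckers configured under that name. A minimal sketch, assuming (hypothetically) that solrconfig.xml defines one spellchecker per searchable field, named "actor" and "series_name" here, with "default" as the combined fallback. It only builds the query string, and it is not what the SOLR-2993 patch does.

```python
from urllib.parse import urlencode

# Hypothetical mapping from the field being searched to the name of a
# spellchecker configured in solrconfig.xml.
FIELD_DICTIONARIES = {"actor": "actor", "series_name": "series_name"}


def spellcheck_params(field, user_query, max_collation_tries=5):
    """Build request parameters that spellcheck against one field's dictionary."""
    params = {
        "q": '%s:"%s"' % (field, user_query),
        "spellcheck": "true",
        "spellcheck.q": user_query,
        # Unmapped fields fall back to the combined dictionary.
        "spellcheck.dictionary": FIELD_DICTIONARIES.get(field, "default"),
        "spellcheck.collate": "true",
        # Non-zero so that collations are tested against the index first.
        "spellcheck.maxCollationTries": str(max_collation_tries),
    }
    return urlencode(params)


print(spellcheck_params("actor", "sigorney wever"))
```

The trade-off is that every searchable field needs its own dictionary to build and maintain, and a query spanning several fields still has to pick one of them.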
Re: Improving Solr Spell Checker Results
On 19/01/2012 12:21 AM, O. Klein wrote:
> Dyer, James wrote
>> David,
>>
>> The spellchecker normally won't give suggestions for any term in your index. So even if "wever" is misspelled in context, if it exists in the index the spell checker will not try correcting it. There are 3 workarounds:
>>
>> 1. Use the patch included with SOLR-2585 (this is for Trunk/4.x only). See https://issues.apache.org/jira/browse/SOLR-2585
>
> When using trunk and DirectSolrSpellChecker I do get suggestions for terms that are in the index. Lowering the thresholdTokenFrequency to 0.001 in my case is giving me very good suggestions, even if documents with the misspelled word in them were found. This, combined with maxCollationTries (with all terms required), is giving some sort of context-sensitive suggestions. Is this correct, or is there something I'm missing?

Hey,

Thanks for the input, but setting the thresholdTokenFrequency to 0.001 has now excluded spell check suggestions that were working correctly. For example, 'Matrx' now does not work, but when I remove the threshold again it suggests 'Matrix'. So I guess to use this I would have to constantly reconfigure this property as the product database grows, which isn't really what I wanted.

Thanks for your input though,
David
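The behaviour David describes follows from thresholdTokenFrequency being a relative cutoff. A rough sketch, assuming the threshold is compared against the fraction of documents that contain a term; the document counts are invented for illustration and the function is not part of Solr.

```python
def passes_threshold(doc_freq, num_docs, threshold):
    """Return True if a term is frequent enough to be offered as a suggestion."""
    return doc_freq / num_docs >= threshold


# Invented numbers: a title that appears in 5 of 30,000 product documents.
print(passes_threshold(5, 30000, 0.001))    # the rare title is dropped from suggestions
print(passes_threshold(5, 30000, 0.0))      # with no threshold it is kept
print(passes_threshold(40, 30000, 0.001))   # a more common term survives
print(passes_threshold(40, 300000, 0.001))  # until the catalogue grows tenfold
```

Under this assumption, a fixed threshold means different absolute counts at different index sizes, which is why the setting would need revisiting as the product database grows.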
Failure noticed from new...@zju.edu.cn
Hey, Every time I send a reply to the list I get a failure for new...@zju.edu.cn. Should I just ignore this? I am unsure if the message has been delivered... Cheers, David
Re: Improving Solr Spell Checker Results
James,

I worked out that I actually needed to 'apply' patch SOLR-2585, whoops. So I have done that now and it seems to return 'correctlySpelled=true' for 'Sigorney Wever' (when Sigorney isn't even in the dictionary). Could something have changed in the trunk to make your patch no longer work? I had to manually merge the setup for the test case due to a new 'hyphens' test case.

The settings I am using are:

    explicit 10 false 10 true true true 10 1 5 1 default spell solr.DirectSolrSpellChecker internal 0.5 2 1 5 4 0.01 spellchecker true

With the query:

    spellcheck=true&facet=on&fl=id,sku,name,format,thumbnail,release_date,url_path,price,special_price,year_made_attr_opt_combo,primary_cat_id&sort=score+desc,name+asc,year_made+desc&start=0&q=sigorney+wever+title:"sigorney+wever"^100+series_name:"sigorney+wever"^50&spellcheck.q=sigorney+wever&fq=store_id:"1"&rows=5

Cheers,
David
Re: Improving Solr Spell Checker Results
Hey James,

I have played around a bit more with the settings and tried setting spellcheck.maxResultsForSuggest=100 and spellcheck.maxCollations=3. This yields 'Sigourney Weaver' as ONE of the corrections, but it's the second one and not the first. That seems wrong if this is a patch for 'context sensitive' spell checking, because it doesn't really seem to honor any context at all. Unless I am misunderstanding this?

Also, I don't really like maxResultsForSuggest, as it means 'all or nothing'. If you set it to 10 and there are 100 results, then you offer no corrections at all, even if the term is missing from the dictionary entirely.

If I set spellcheck.maxResultsForSuggest=100 and spellcheck.maxCollations=3 and choose the collation with the largest 'hits', I get Sigourney Weaver and other 'popular' terms. But say I searched for 'pork and chups': the 'popular' correction is 'park and chips', whereas the first correction was correct: 'pork and chips'.

So really, none of the solutions, either in this patch or in Solr, offer what I would truly call context-sensitive spell checking. That is, in a full-text search engine you find documents based on terms and how close together they are in the document. It makes perfect sense to treat the dictionary like this, so that when there are multiple terms it offers suggestions that closely match what was entered around the term.

Example: "Sigourney Wever" would never appear in a document, ever. "Sigourney Weaver", however, has many 'hits' in exactly that order of words. So there needs to be a way to boost suggestions based on adjacency, much like the full-text search operates.

Thoughts?

David
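David's adjacency idea can be sketched with a toy reranker. This is an illustration of the proposal only, not something the SOLR-2585 patch or Solr's collator is known to do: candidate collations are scored by how often their adjacent word pairs occur in the indexed text. The corpus and the candidates are made up.

```python
from collections import Counter

# Made-up "indexed" text standing in for product documents.
DOCUMENTS = [
    "alien starring sigourney weaver",
    "aliens starring sigourney weaver",
    "pork and chips",
    "a walk in the park",
    "park life",
    "central park",
]


def bigram_counts(documents):
    """Count adjacent word pairs across all documents."""
    counts = Counter()
    for doc in documents:
        words = doc.split()
        counts.update(zip(words, words[1:]))
    return counts


def rank_by_adjacency(candidates, counts):
    """Order candidate collations by the total frequency of their word pairs."""
    def score(candidate):
        words = candidate.split()
        return sum(counts[pair] for pair in zip(words, words[1:]))
    return sorted(candidates, key=score, reverse=True)


counts = bigram_counts(DOCUMENTS)
print(rank_by_adjacency(["sigourney wever", "sigourney weaver"], counts))
print(rank_by_adjacency(["park and chips", "pork and chips"], counts))
```

In this toy corpus "park" is the more frequent single term, yet "pork and chips" ranks first because its word pairs actually occur together, which is the distinction between popularity and context drawn above.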
Re: Improving Solr Spell Checker Results
Hey,

I am trying to send this again as 'plain-text' to see if it delivers ok this time. All of the previous messages I sent should be below..

Cheers,
David

On 22/01/2012 11:42 PM, David Radunz wrote:
> [...]
Re: Failure noticed from new...@zju.edu.cn
Hey,

That seems to have helped: I didn't get a failure notice re-sending the message. I'll have to keep that in mind.

Thanks very much,
David

On 23/01/2012 12:41 PM, Erick Erickson wrote:
> I've seen the spam filter be pretty aggressive with HTML formatting etc. What happens when you just send them as "plain text"?
>
> Best
> Erick
>
> On Sat, Jan 21, 2012 at 7:24 AM, David Radunz wrote:
>> Hey,
>>
>> Every time I send a reply to the list I get a failure for new...@zju.edu.cn. Should I just ignore this? I am unsure if the message has been delivered...
>>
>> Cheers,
>> David
Re: Improving Solr Spell Checker Results
Hey Erick,

Sure, can you explain the process to create the patch and upload it, and I'll do it first thing tomorrow.

Thanks again for your help,
David

On 23/01/2012 12:51 PM, Erick Erickson wrote:
> I can't help with your *real* problem, but when looking at patches, if the "resolution" field isn't set to something like "fixed", it means that the patch has NOT been applied to any code lines. There also should be commit revisions specified in the comments. If "Fix Versions" has values, that doesn't mean the patch has been applied either; that's often just a statement of where the patch *should* go.
>
> And, between the time someone uploads a patch and it actually gets *committed*, the underlying code line can, indeed, change, and then the patch doesn't apply cleanly. Since you've already had to do this, could you upload your version that *does* apply cleanly?
>
> Best
> Erick
Re: Improving Solr Spell Checker Results
Hey, Thanks for that, I have uploaded a new patch as advised. Cheers, David On 23/01/2012 1:01 PM, Erick Erickson wrote: David: There's some good info here: http://wiki.apache.org/solr/HowToContribute#Working_With_Patches But the short form is to go into solr_home and issue this command: 'svn diff> SOLR-2585.patch'. IDE's may also have a "create patch" feature, but I find the straight SVN command more reliable. Note I'm not saying that your patch will necessarily be picked up, but it's a thoughtful gesture to upload a more current patch. In your comments please identify what code line you're working on (4.x? 3.x?). And when you upload, down near the bottom of the dialog box there'll be a radio button about "grant ASF license" which is fairly important to click for legal reasons.... Thanks Erick On Sun, Jan 22, 2012 at 5:54 PM, David Radunz wrote: Hey Erick, Sure, can you explain the process to create the patch and upload it and i'll do it first thing tomorrow. Thanks again for your help, David On 23/01/2012 12:51 PM, Erick Erickson wrote: I can't help with your *real* problem, but when looking at patches, if the "resolution" field isn't set to something like "fixed" it means that the patch has NOT been applied to any code lines. There also should be commit revisions specified in the comments. If "Fix Versions" has values, that doesn't mean the patch has been applied either, that's often just a statement of where the patch *should* go. And, between the time someone uploads a patch and it actually gets *committed*, the underlying code line can, indeed, change and the patch doesn't apply cleanly. Since you've already had to do this, could you upload your version that *does* apply cleanly? Best Erick On Sun, Jan 22, 2012 at 2:56 AM, David Radunzwrote: James, I worked out that I actually needed to 'apply' patch SOLR-2585, whoops. So I have done that now and it seems to return 'correctlySpelled=true' for 'Sigorney Wever' (when Sigorney isn't even in the dictionary). 
Could something have changed in the trunk to make your patch no longer work? I had to manually merge the setup for the test case due to a new 'hyphens' test case. The settings I am using are: explicit 10 false 10 true true true 10 1 5 1 default spell solr.DirectSolrSpellChecker internal 0.5 2 1 5 4 0.01 spellchecker true With the query: spellcheck=true&facet=on&fl=id,sku,name,format,thumbnail,release_date,url_path,price,special_price,year_made_attr_opt_combo,primary_cat_id&sort=score+desc,name+asc,year_made+desc&start=0&q=sigorney+wever+title:"sigorney+wever"^100+series_name:"sigorney+wever"^50&spellcheck.q=sigorney+wever&fq=store_id:"1"&rows=5 Cheers, David On 22/01/2012 2:03 AM, David Radunz wrote: James, Thanks again for your lengthy and informative response. I updated from SVN trunk again today and was successfully able to run 'ant test'. So I proceeded with trying your suggestions (for question 1 so far): On 17/01/2012 5:32 AM, Dyer, James wrote: David, The spellchecker normally won't give suggestions for any term in your index. So even if "wever" is misspelled in context, if it exists in the index the spell checker will not try correcting it. There are 3 workarounds: 1. Use the patch included with SOLR-2585 (this is for Trunk/4.x only). See https://issues.apache.org/jira/browse/SOLR-2585 I have tried using this with the original test case of 'Signorney Wever'. I didn't notice any difference, although I am a little unclear as to what exactly this patch does. Nor am I really clear what to set either of the options to, so I set them both to '5'. I tried to find the test case it mentions, but it's not present in SpellCheckCollatorTest.java. Any suggestions? 2. Try "onlyMorePopular=true" in your request. (http://wiki.apache.org/solr/SpellCheckComponent#spellcheck.onlyMorePopular). But see the September 2, 2011 comment in SOLR-2585 about why this might not do what you'd hope it would. 
Trying this did produce 'Signourney Weaver' as you would hope, but I am a little afraid of the downside. I would much prefer a context-sensitive spell check that involves the terms around the correction. 3. If you're building your index on a copyField, you can add a stopword filter that filters out all of the misspelt or rare words from the field that the dictionary is based on. This could be an arduous task, and it may or may not work well for your data. I am currently using a copyField for all terms that are relevant, which is quite a lot and the dictionary would encompass a huge amount of data. Adding stopword filters would be out of the question as we presentl
Re: solr not working with magento enterprise 1.11
Hey, Shouldn't you be asking this question to the Magento people? You have an Enterprise edition, so you have paid for their support. Cheers, David On 25/01/2012 2:57 PM, vishal_asc wrote: I am integrating solr 3.5 with jetty in magento EE 1.11. I have followed all the necessary steps, configure and tested solr connection in magento catalog system config. I have copied magento/lib/Solr/conf/ content to solr installation. I have run the index management, restarted jetty but when I search any word or misspell its not showing me "Did you mean ?" string means not correcting misspelling. seems solr is not throwing results. please let me know how can i know solr is working with magento and where solr save XML documents when magento pushes attributes and product information in solr ? which directory it stores them. -- View this message in context: http://lucene.472066.n3.nabble.com/solr-not-working-with-magento-enterprise-1-11-tp3686773p3686773.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: solr not working with magento enterprise 1.11
Hey, I am using Magento Community Edition, I wrote my own Magento extension to integrate Solr and it works fine. So I really don't know what the Enterprise edition does. On a personal and unrelated note, I would never use Windows for a server; it's unreliable and most of the system resources go towards the OS. Cheers, David On 25/01/2012 3:30 PM, vishal_asc wrote: Thanks David. As of now we are configuring it on local WAMP server and we have only development version provided by sales team. Do you know where solr saves information or pushes the xml docs when we run index management in magento ? I followed this site: http://www.summasolutions.net/blogposts/magento-apache-solr-set Please let me know if you have some other info also. Best Regards, Vishal Porwal From: David Radunz [via Lucene] [mailto:ml-node+s472066n3686805...@n3.nabble.com] Sent: Wednesday, January 25, 2012 9:47 AM To: Vishal Porwal Subject: Re: solr not working with magento enterprise 1.11 Hey, Shouldn't you be asking this question to the Magento people? You have an Enterprise edition, so you have paid for their support. Cheers, David On 25/01/2012 2:57 PM, vishal_asc wrote: I am integrating solr 3.5 with jetty in magento EE 1.11. I have followed all the necessary steps, configure and tested solr connection in magento catalog system config. I have copied magento/lib/Solr/conf/ content to solr installation. I have run the index management, restarted jetty but when I search any word or misspell its not showing me "Did you mean ?" string means not correcting misspelling. seems solr is not throwing results. please let me know how can i know solr is working with magento and where solr save XML documents when magento pushes attributes and product information in solr ? which directory it stores them. -- View this message in context: http://lucene.472066.n3.nabble.com/solr-not-working-with-magento-enterprise-1-11-tp3686773p3686773.html Sent from the Solr - User mailing list archive at Nabble.com. 
Re: Multiple Data Directories and 1 SOLR instance
Hey, Sounds like what you need to set up is a "Multiple Cores" configuration. At first I confused this with "Multi Core CPU", but that's not what it's about. Basically it's a way to run multiple Solr cores/indexes/configurations from a single Solr instance (which will scale better as the resources will be shared). Have a read anyway: http://wiki.apache.org/solr/CoreAdmin Cheers, David On 27/01/2012 8:18 AM, Nitin Arora wrote: Hi, We are using SOLR/Lucene to index/search the data about the users of an organization. The nature of the data is brief information about the user's work. Our data index requirement is to have segregated stores for each organization and currently we have 10 organizations and we have to run 10 different instances of SOLR to serve search results for an organization. As new organizations are joining it is getting difficult to manage this many instances. I think now there is a need to use 1 SOLR instance and then have 10/multiple different data directories, one for each organization. When an index/search request is received in SOLR we decide the data directory based on the organization. 1. Is it possible to do the same in SOLR and how can we achieve the same? 2. Will it be a good design to use SOLR like this? 3. Is there any impact on the scalability if we are able to manage the separate data directories inside SOLR? Thanks in advance Nitin -- View this message in context: http://lucene.472066.n3.nabble.com/Multiple-Data-Directories-and-1-SOLR-instance-tp3691644p3691644.html Sent from the Solr - User mailing list archive at Nabble.com.
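To make the multi-core suggestion a little more concrete, here is a minimal Python sketch of how one core per organization might be addressed. It only builds URLs and sends nothing. The base URL, the core names and the choice of using the organization name as the instanceDir are assumptions for illustration; the action, name and instanceDir parameters are the ones described on the CoreAdmin wiki page linked above.

```python
from urllib.parse import urlencode


def core_create_url(base_url, org):
    """Build a CoreAdmin CREATE URL for one organization's core."""
    params = {
        "action": "CREATE",
        "name": org,          # core name, one per organization (assumed naming)
        "instanceDir": org,   # hypothetical layout: one directory per organization
    }
    return f"{base_url}/admin/cores?{urlencode(params)}"


def core_select_url(base_url, org, query):
    """Build a search URL that targets a single organization's core."""
    return f"{base_url}/{org}/select?{urlencode({'q': query})}"


if __name__ == "__main__":
    base = "http://localhost:8983/solr"  # assumed local instance
    print(core_create_url(base, "org1"))
    print(core_select_url(base, "org1", "java AND sql"))
```

With this layout the application picks the core from the organization on every index or search request, which is roughly the "decide the data directory based on the organization" step Nitin describes.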
Re: SpellCheck Help
Hey, I really recommend you contact Magento pre-sales to find out why THEIR stuff doesn't work. The information you have provided is specific to Magento... You can't expect people on a Solr mailing list to help you with a Magento problem. I guarantee you the issue is probably something Magento is doing, so try seeking support there first (try their mailing lists if they have any, or IRC: irc.freenode.org #magento). I am not trying to be rude, rather to save you time and others effort. Cheers, David On 27/01/2012 5:37 PM, vishal_asc wrote: Downloaded Apache Solr from the URL: http://apache.dattatec.com//lucene/solr/ , extracted it at my windows machine. Then started solr: [solr-path]/example, and typed the following in a terminal: java -jar start.jar. it started and i can see the solr page at http://localhost:8983/solr/admin/ Now copied Magento [magento-instance-root]/lib/Apache/Solr/conf to [Solr-instance-root]/example/solr/conf. then again restarted solr lots of activity was going on there. then I run System->index management and at front end search box i tried to search a product with incorrect spelling, in solr console i can see some activity but at magento front end I couldn't get any result, why ? I followed the steps given at this URL: http://www.summasolutions.net/blogposts/magento-apache-solr-set#comment-615 Please look into it and let me know any other information you require. I also want to know how i can implement facet and highlight search with resulted output. -- View this message in context: http://lucene.472066.n3.nabble.com/SpellCheck-Help-tp3648589p3692518.html Sent from the Solr - User mailing list archive at Nabble.com.
Performance improvement in large OR query using boosting (also, cache doesn't work?)
Hey Guys, I have really been enjoying Solr and I can't really blame the slowness on solr as this is a pretty insane query. However, I am a little curious why a repeated query moments later also suffers from the same load time? Anyway, the queries are: // 1st Query INFO: [] webapp=/solr path=/select/ params={facet=on&fl=id,name,url_path,url_key,price,special_price,small_image,thumbnail,sku,stock_qty,release_date,tax_class_id&sort=score+desc,retail_rating+desc,release_date+desc,year_made+desc&start=&q=**+-sku:"1029996"+-movie_id:"2665"+(series_names_attr_opt_id:"426317"^9000+OR+cat_id:"307"^1000+OR+cat_id:"308"^1000+OR+matching_genres:"Science+Fiction"^2000+OR+matching_genres:"Action"^1000+OR+matching_genres:"Thriller"^800+OR+matching_keywords:"superhero+team"^400+OR+matching_keywords:"superhero"^400+OR+matching_keywords:"superheroine"^300+OR+matching_keywords:"marvel+comic"^200+OR+matching_keywords:"costumed+hero"^100+OR+matching_keywords:"alien+life-form"^100+OR+matching_keywords:"thor"^100+OR+matching_keywords:"captain+america"^100+OR+matching_keywords:"the+incredible+hulk"^100+OR+matching_keywords:"iron+man"^100+OR+matching_keywords:"shape+shifting+alien"^50+OR+matching_keywords:"world+domination"^50+OR+matching_keywords:"human+alien"^50+OR+matching_keywords:"alien+invasion"^25+OR+matching_keywords:"super+strength"^25+OR+matching_keywords:"invisibility+cloak"^10+OR+matching_keywords:"warrior+race"^10+OR+matching_keywords:"alien+race"^5+OR+matching_keywords:"super+speed"^5+OR+matching_keywords:"flying+fortress"^5+OR+matching_keywords:"teleportation"^5+OR+matching_keywords:"creature"^5+OR+matching_keywords:"electromagnetic+pulse"^5+OR+matching_keywords:"immortality"^5+OR+matching_keywords:"mothership"^5+OR+matching_keywords:"mind+control"^5+OR+matching_keywords:"god"^5+OR+matching_keywords:"inventor"^5+OR+matching_keywords:"space+travel"^5+OR+matching_keywords:"fictional+government+agency"^5+OR+matching_keywords:"beautiful+woman"^5+OR+matching_keywords:"based+on+com
ic+book"^5+OR+matching_keywords:"army"^5+OR+matching_keywords:"blockbuster"^5+OR+matching_keywords:"mercenary"^5+OR+matching_keywords:"martial+arts"^5+OR+matching_keywords:"shield"^5+OR+matching_keywords:"captain"^5+OR+matching_keywords:"shot+in+the+head"^5+OR+matching_keywords:"shootout"^5+OR+matching_keywords:"flashback"^5+OR+matching_keywords:"pistol"^5+OR+matching_keywords:"airplane"^5+OR+matching_keywords:"helmet"^5+OR+matching_keywords:"car+accident"^5+OR+matching_keywords:"body+landing+on+a+car"^5+OR+matching_keywords:"spear"^5+OR+matching_keywords:"laboratory"^5+OR+matching_keywords:"warrior+woman"^5+OR+matching_keywords:"punching+bag"^5+OR+matching_keywords:"banquet"^5+OR+matching_keywords:"macguffin"^5+OR+matching_keywords:"mission"^5+OR+matching_keywords:"attack"^5+OR+matching_keywords:"hand+to+hand+combat"^5+OR+matching_keywords:"police+officer"^5+OR+matching_keywords:"robot"^5+OR+matching_keywords:"disguise"^5+OR+matching_keywords:"beating"^5+OR+matching_keywords:"falling+from+height"^5+OR+matching_keywords:"government+agent"^5+OR+matching_keywords:"battleship"^5+OR+matching_keywords:"parking+garage"^5+OR+matching_keywords:"head+butt"^5+OR+matching_keywords:"forest"^5+OR+matching_keywords:"crushed+to+death"^5+OR+matching_keywords:"deception"^5+OR+matching_keywords:"philanthropist"^5+OR+matching_keywords:"knife+fight"^5+OR+matching_keywords:"portal"^5+OR+matching_keywords:"knife"^5+OR+matching_keywords:"underwater+scene"^5+OR+matching_keywords:"exploding+plane"^5+OR+matching_keywords:"robot+suit"^5+OR+matching_keywords:"outer+space"^5+OR+matching_keywords:"stabbed+in+the+stomach"^5+OR+matching_keywords:"bodyguard"^5+OR+matching_keywords:"disaster+in+new+york"^5+OR+matching_keywords:"shot+in+the+chest"^5+OR+matching_keywords:"security+camera"^5+OR+matching_keywords:"rocket+launcher"^5+OR+matching_keywords:"tough+guy"^5+OR+matching_keywords:"secretary"^5+OR+matching_keywords:"monster"^5+OR+matching_keywords:"elevator"^5+OR+matching_keywords:"severed+arm"^5
+OR+matching_keywords:"revenge"^5+OR+matching_keywords:"missile"^5+OR+matching_keywords:"kneeling"^5+OR+matching_keywords:"brawl"^5+OR+matching_keywords:"russian"^5+OR+matching_keywords:"scientist"^5+OR+matching_keywords:"super+computer"^5+OR+matching_keywords:"assault+rifle"^5+OR+matching_keywords:"adopted+brother"^5+OR+matching_keywords:"villain+arrested"^5+OR+matching_keywords:"man+punching+a+woman"^5+OR+matching_keywords:"soldier"^5+OR+matching_keywords:"national+guard"^5+OR+matching_keywords:"hammer"^5+OR+matching_keywords:"chase"^5+OR+matchin
Re: Performance improvement in large OR query using boosting (also, cache doesn't work?)
Hey, Sorry for the delay, I had to enable larger header buffers in jetty to do this as a GET query (LOL). Anyway, I have put the results on pastebin to try and make it more presentable, though it's mostly failed. 1st Query: http://pastebin.com/uSGtQjA3 (query with a freshly started solr) 2nd Query: http://pastebin.com/4NbSdEHC (as before, just the same query again) Seemingly, the slowness happens in 'processing'. But yeah, I'm sure you guys would better understand all of that :) Cheers, David On 14/12/2012 11:13 PM, Markus Jelsma wrote: Hi, This is insane indeed! Please enable debugging and report the prepare and process times for the query component. I think the prepare time is very high in both queries and the process time is slightly less for the second query due to caching. Cheers, -Original message- From: David Radunz Sent: Fri 14-Dec-2012 13:04 To: solr-user@lucene.apache.org Subject: Performance improvement in large OR query using boosting (also, cache doesn't work?) Hey Guys, I have really been enjoying Solr and I can't really blame the slowness on solr as this is a pretty insane query. However, I am a little curious why a repeated query moments later also suffers from the same load time? 
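As a rough illustration of the debugging step Markus asks for, the Python sketch below prepares a request with debugQuery=true and pulls the prepare and process times out of a JSON response. Two things here are assumptions rather than anything stated in the thread: that the select handler accepts the parameters as a form-encoded POST body (which would sidestep the header size limit that a very long GET query runs into), and that the timings appear under debug.timing with "prepare" and "process" entries when wt=json is used. Nothing is sent over the network in this sketch.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_debug_request(select_url, query):
    """Prepare (but do not send) a POST request that asks Solr for debug output."""
    body = urlencode({"q": query, "debugQuery": "true", "wt": "json"})
    return Request(
        select_url,
        data=body.encode("utf-8"),  # supplying data makes urllib use POST
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


def timing_summary(response):
    """Extract total prepare/process times from a parsed JSON response.

    Assumes the layout debug -> timing -> {prepare, process} -> time;
    returns None for a phase that is missing.
    """
    timing = response.get("debug", {}).get("timing", {})
    return {
        phase: timing.get(phase, {}).get("time")
        for phase in ("prepare", "process")
    }


if __name__ == "__main__":
    # Hypothetical response fragment, for illustration only.
    sample = {"debug": {"timing": {"time": 12.0,
                                   "prepare": {"time": 2.0},
                                   "process": {"time": 10.0}}}}
    print(timing_summary(sample))
```

Comparing the two numbers for a first and a repeated run is one way to see whether the time is going into query preparation or into processing.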
Re: Advanced search with results matrix
Hey Gnanam, 1. If I understand correctly you just need to perform one query. Like so (translated to proper syntax of course): ("SQL Server" OR SQL) OR ("Visual Basic" OR VB.NET) OR (Java AND JavaScript) 2. Every query you perform with Solr returns the 'results' count, if you ONLY want the results count simply set rows to 0 (but I'm guessing you will want both the results and the count so as to avoid 2 trips). - The 'results count' is here: <result name="response" numFound="..." start="0"/> (being numFound) David

On 4/05/2012 4:46 PM, Gnanakumar wrote: Hi, First off, we're a happy user of Apache Solr v3.1 Enterprise search server, integrated and successfully running in our LIVE Production server. Now, we're enhancing our existing search feature in our web application as explained below, that truly helps application users in making an informed decision before getting their search results: There will be 3 textboxes provided and users can enter keyword phrases with OR, AND combination within each textbox as shown below, for example: Textbox 1: "SQL Server" OR SQL Textbox 2: "Visual Basic" OR VB.NET Textbox 3: Java AND JavaScript If the user clicks the "Search" button, we want to present an intermediate or "results matrix" page that would generate all possible combinations for the 3 textboxes with how many records were found for each combination as given below (between combinations it is an AND operation). This, as I said before, truly helps application users in making an informed decision/choice before getting their search results:

+---------+---------------------+--------------------------+---------------------+
| Matches | Textbox 1           | Textbox 2                | Textbox 3           |
+---------+---------------------+--------------------------+---------------------+
| 200     | "SQL Server" OR SQL |                          |                     |
| 300     |                     | "Visual Basic" OR VB.NET |                     |
| 400     |                     |                          | Java AND JavaScript |
| 250     | "SQL Server" OR SQL | "Visual Basic" OR VB.NET |                     |
| 350     |                     | "Visual Basic" OR VB.NET | Java AND JavaScript |
| 300     | "SQL Server" OR SQL |                          | Java AND JavaScript |
| 100     | "SQL Server" OR SQL | "Visual Basic" OR VB.NET | Java AND JavaScript |
+---------+---------------------+--------------------------+---------------------+

Only clicking one of these "Matches" counts will display the actual results of that particular search. 
My questions are, 1) Do I need to run the search separately for each combination or is it possible to combine and obtain the "results matrix" page by making "only" one single call to Apache Solr? Or are there any plug-ins available that provide functionality close to my use case? 2) How do I instruct Solr to return only the count (not the results) for the search performed? 3) Any ideas/suggestions/approaches/resources are really appreciated and welcomed. Regards, Gnanam
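One way the "results matrix" could be produced in a single round trip is sketched below in Python. This is an alternative to the single OR query suggested in the reply, not something proposed in the thread: it generates every non-empty combination of the textboxes, joins each combination with AND, and passes each one as a facet.query parameter alongside rows=0 so that only counts come back. The function names are made up for illustration, and q=*:* as the base query is an assumption.

```python
from itertools import combinations
from urllib.parse import urlencode


def matrix_queries(textboxes):
    """Return one AND-combined query per non-empty subset of the textboxes."""
    queries = []
    for size in range(1, len(textboxes) + 1):
        for subset in combinations(textboxes, size):
            # Parenthesise each textbox so its own AND/OR logic stays intact.
            queries.append(" AND ".join(f"({clause})" for clause in subset))
    return queries


def count_only_params(textboxes):
    """Build a query string that asks only for counts, one per combination."""
    params = [("q", "*:*"), ("rows", "0"), ("facet", "true")]
    params += [("facet.query", q) for q in matrix_queries(textboxes)]
    return urlencode(params)


if __name__ == "__main__":
    boxes = ['"SQL Server" OR SQL', '"Visual Basic" OR VB.NET', "Java AND JavaScript"]
    for q in matrix_queries(boxes):
        print(q)
    print(count_only_params(boxes))
```

For three textboxes this yields seven combinations, matching the seven rows of the table in the question; the count for each would then be read from the facet query counts in the response rather than from numFound.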
Re: indexing unstructured text (tweets)
Hey, I think you might be over-thinking this. Tweets are structured. You have the content (tweet), the user who tweeted it and various other metadata. So your 'document' might look like this (an ID, the tweet text and the user): ABCD1234, I bought some apples, JohnnyBoy. To get this structure, you can use any programming language you're comfortable with and load it into Solr via various means. Obviously you can add more 'meta' fields that you get from Twitter if you want as well. David On 28/05/2012 9:37 PM, Giovanni Gherdovich wrote: Hi all. I am in the process of setting up Solr for my application, which is full text search on a bunch of tweets from Twitter. I am afraid I am missing something. From the book I am reading, "Apache Solr 3 Enterprise Search Server", it looks like Solr works with structured input, like XML or CSV, while I have the most wild and unstructured input ever (tweets). A section named "Indexing documents with Solr Cell" seems to address my problem, but also shows that before getting to Solr, I might need to use another Apache tool called Tika. Can anybody provide a brief explanation of the general picture? Can I index my tweets with Solr? Or do I also need to put Tika in my pipeline? Best regards, Giovanni Gherdovich
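A small Python sketch of what "load it into Solr" could look like for one tweet, using Solr's XML update format (an add element wrapping a doc with named fields). The field names id, content and user are hypothetical, since the original markup was lost from the message above, and they would have to match whatever the schema actually defines. The sketch only builds the XML; posting it to the update handler is left out.

```python
import xml.etree.ElementTree as ET


def tweet_to_add_xml(tweet_id, text, user):
    """Serialise one tweet as a Solr XML <add> message.

    Field names are illustrative assumptions, not taken from the thread.
    """
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    for name, value in (("id", tweet_id), ("content", text), ("user", user)):
        field = ET.SubElement(doc, "field", name=name)
        field.text = value  # ElementTree escapes &, < and > on output
    return ET.tostring(add, encoding="unicode")


if __name__ == "__main__":
    print(tweet_to_add_xml("ABCD1234", "I bought some apples", "JohnnyBoy"))
```

Since tweets already arrive as plain text plus metadata, nothing in this path appears to need Tika; that tool is aimed at extracting text from binary formats such as PDF or Word files.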
Weighted Search Results / Multi-Value Value's Not Aggregating Weight
Hey, I have been having some problems getting good search results when using weighting against many fields with multi-values. After quite a bit of testing it seems to me that the problem (at least as far as my query is concerned) is that only one weighting is taken into account per field. For example, in a multi-value field if we have "Comedy" and "Romance" and have separate weightings for those, the one with the highest weighting is used (and not a combined weighting). Which means that searching for romantic comedy returns "Alvin and the Chipmunks" (Family, Children Comedy). Query: facet=on&fl=id,name,matching_genres,score,url_path,url_key,price,special_price,small_image,thumbnail,sku,stock_qty,release_date&sort=score+desc,retail_rating+desc,release_date+desc&start=&q=**+-sku:"1019660"+-movie_id:"1805"+-movie_id:"1806"+(series_names_attr_opt_id:"454282"^9000+OR+cat_id:"22"^9+OR+cat_id:"248"^9+OR+cat_id:"249"^9+OR+matching_genres:"Comedy"^9+OR+matching_genres:"Romance"^7+OR+matching_genres:"Drama"^5)&fq=store_id:"1"+AND+avail_status_attr_opt_id:"available"+AND+(format_attr_opt_id:"372619")&rows=4 Now if I change matching_genres:"Romance"^7 to matching_genres:"Romance"^70 (adding a 0) suddenly the first result is "Sex and the City: The Movie / Sex and the City 2" (which ironically is "Drama", "Comedy", "Romance": the very combination we are looking for). So is there a way to structure my query so that all of the multi-value values are treated individually, aggregating the weighting/score? Thanks in advance! David
Re: Weighted Search Results / Multi-Value Value's Not Aggregating Weight
Hey, Please disregard this, I worked out what the actual problem was. I am going to post another query with something else I discovered. Thanks :) David On 22/08/2012 7:24 PM, David Radunz wrote: Hey, I have been having some problems getting good search results when using weighting against many fields with multi-values. After quite a bit of testing it seems to me that the problem (at least as far as my query is concerned) is that only one weighting is taken into account per field. For example, in a multi-value field if we have "Comedy" and "Romance" and have separate weightings for those, the one with the highest weighting is used (and not a combined weighting). Which means that searching for romantic comedy returns "Alvin and the Chipmunks" (Family, Children Comedy). Query: facet=on&fl=id,name,matching_genres,score,url_path,url_key,price,special_price,small_image,thumbnail,sku,stock_qty,release_date&sort=score+desc,retail_rating+desc,release_date+desc&start=&q=**+-sku:"1019660"+-movie_id:"1805"+-movie_id:"1806"+(series_names_attr_opt_id:"454282"^9000+OR+cat_id:"22"^9+OR+cat_id:"248"^9+OR+cat_id:"249"^9+OR+matching_genres:"Comedy"^9+OR+matching_genres:"Romance"^7+OR+matching_genres:"Drama"^5)&fq=store_id:"1"+AND+avail_status_attr_opt_id:"available"+AND+(format_attr_opt_id:"372619")&rows=4 Now if I change matching_genres:"Romance"^7 to matching_genres:"Romance"^70 (adding a 0) suddenly the first result is "Sex and the City: The Movie / Sex and the City 2" (which ironically is "Drama", "Comedy", "Romance": the very combination we are looking for). So is there a way to structure my query so that all of the multi-value values are treated individually, aggregating the weighting/score? Thanks in advance! David