SolrCloud: facet range option f..facet.mincount=1 omits buckets on response
Good afternoon,

Is the f..facet.mincount option supported on a distributed search?

Under SolrCloud I am finding that some buckets are omitted when using the option "f..facet.mincount=1". The Solr logs do not show any error or warning during execution. Neither the debug=true option nor raising the log level for the FacetComponent gives any hint about the behaviour.

I replicated the issue on both Solr 4.5.1 and 4.8.1. I have attached a PDF that provides additional details and steps to replicate the behaviour using the out-of-the-box Solr distribution.

Any insight or recommendation to tackle this situation is much appreciated.

Example:

Removing the f..facet.mincount=1 option gives the expected list of buckets for the 6 documents matched:

0 1 0 3 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 50.0 0.0 1000.0 0 0 2

Using the f..facet.mincount=1 option removes the 0-count buckets but also omits a bucket that has a non-zero count:

1 3 1 50.0 0.0 1000.0 0 0 4

Refreshing the query using the browser's F5 option renders a different bucket list (you may need to refresh multiple times):

3 1 50.0 0.0 1000.0 0 0 2

Regards
Ronald Matamoros
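The response values above appear without their field names. A minimal sketch of how the first run could be mapped back to range buckets, assuming the last six values are gap, start, end, before, after and between, and that every value before them is one per-bucket count in ascending order. That ordering is an assumption about the response layout, not something stated in the message, and label_buckets is a hypothetical helper.

```python
def label_buckets(flat_values):
    """Split a flattened facet.range value run into labelled buckets.

    Assumption: the last six values are gap, start, end, before, after,
    between, and everything before them is one count per bucket.
    """
    *counts, gap, start, end, before, after, between = flat_values
    expected = round((end - start) / gap)
    if len(counts) != expected:
        raise ValueError(f"expected {expected} counts, got {len(counts)}")
    # Each bucket is keyed by its lower bound.
    buckets = {start + i * gap: int(c) for i, c in enumerate(counts)}
    return buckets, {"before": before, "after": after, "between": between}


# Values of the query without mincount, copied from the message.
run = [0, 1, 0, 3, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0,
       50.0, 0.0, 1000.0, 0, 0, 2]
buckets, other = label_buckets(run)
non_empty = {lower: n for lower, n in buckets.items() if n > 0}
print(non_empty)  # {50.0: 1, 150.0: 3, 250.0: 1, 750.0: 1}
```

Under these assumptions the non-empty buckets add up to the 6 matched documents. The helper only applies to the run without mincount, because the filtered runs no longer carry one count per bucket.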
RE: SolrCloud: facet range option f..facet.mincount=1 omits buckets on response
Hi all,

At the moment I am reviewing the code to determine whether this is a legitimate bug that should be filed as a JIRA ticket. Any insight or recommendation is appreciated.

Including the replication steps as text:

Solr versions where the issue was replicated:
  * 4.5.1 (Linux)
  * 4.8.1 (Windows + Cygwin)

Replicating

1. Created a two-shard environment, no replication
   https://cwiki.apache.org/confluence/display/solr/Getting+Started+with+SolrCloud

   a. Downloaded the Solr distribution from http://lucene.apache.org/solr/downloads.html
   b. Unzipped solr-4.8.1.zip to a temporary location:
   c. Ran once so the SolrCloud jars get unpacked: java -jar start.jar
   d. Created nodes
      i. cd
      ii. Via Windows Explorer copied example to node1
      iii. Via Windows Explorer copied example to node2
   e. Started nodes
      i. Start node 1

         cd node1
         java -DzkRun -DnumShards=2 -Dbootstrap_confdir=./solr/collection1/conf -Dcollection.configName=myconf -jar start.jar

      ii. Start node 2

         cd node2
         java -Djetty.port=7574 -DzkHost=localhost:9983 -jar start.jar

   f. Fed sample documents
      i. Out of the box

         curl http://localhost:8983/solr/update?commit=true -H "Content-Type: text/xml" -d "@mem.xml"
         curl http://localhost:7574/solr/update?commit=true -H "Content-Type: text/xml" -d "@monitor2.xml"

      ii. Created a copy of mem.xml as mem2.xml, modified identifiers, names and prices, and fed it

         curl http://localhost:8983/solr/update?commit=true -H "Content-Type: text/xml" -d "@mem2.xml"

         COMPANY1
         COMPANY1 Device
         COMPANY1 Device Mfg
         .
         190
         .

         COMPANY2
         COMPANY2 flatscreen
         COMPANY2 Device Mfg.
         .
         200.00
         .

         COMPANY3
         COMPANY3 Laptop
         COMPANY3 Device Mfg.
         .
         800.00
         .

2. Query **without** f.price.facet.mincount=1: counts and buckets are OK

   http://localhost:8983/solr/collection1/select?q=*:*&fl=id,price&sort=id+asc&facet=true&facet.range=price&f.price.facet.range.start=0&f.price.facet.range.end=1000&f.price.facet.range.gap=50&f.price.facet.range.other=all&f.price.facet.range.include=upper&spellcheck=false&hl=false

   Only six documents have prices

   0 1 0 3 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 50.0 0.0 1000.0 0 0 2

   Note: the value in changes with every other refresh of the query.

3. Use of &f.price.facet.mincount=1: missing bucket 1

   http://localhost:8983/solr/collection1/select?q=*:*&fl=id,price&sort=id+asc&facet=true&facet.range=price&f.price.facet.range.start=0&f.price.facet.range.end=1000&f.price.facet.range.gap=50&f.price.facet.range.other=all&f.price.facet.range.include=upper&spellcheck=false&hl=false&f.price.facet.mincount=1

   1 3 1 50.0 0.0 1000.0 0 0 4

   Refresh of the query (may need to do this multiple times with the F5 key in the browser)

   3 1 50.0 0.0 1000.0 0 0 2

Thank you,
Ronald Matamoros

-----Original Message-----
From: Ronald Matamoros [mailto:rmatamo...@searchtechnologies.com]
Sent: 27 May 2014 16:25
To: solr-user@lucene.apache.org
Subject: COMMERCIAL: SolrCloud: facet range option f..facet.mincount=1 omits buckets on response

Good afternoon,

Is the f..facet.mincount option supported on a distributed search? Under SolrCloud experiencing that some buckets are ignored when using the option "f..facet.mincount=1".
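The comparison in steps 2 and 3 can also be scripted instead of refreshing the browser. A rough sketch, assuming the responses are requested as JSON (wt=json, whereas the message used the XML output) and that the range counts come back as a flat [label, count, label, count, ...] list. The helper names and the sample bucket values are hypothetical and are not taken from the actual responses.

```python
from urllib.parse import urlencode

BASE = "http://localhost:8983/solr/collection1/select"  # URL from the message


def range_facet_url(mincount=None):
    """Build the range-facet query from the message, optionally with mincount."""
    params = [
        ("q", "*:*"), ("fl", "id,price"), ("sort", "id asc"),
        ("facet", "true"), ("facet.range", "price"),
        ("f.price.facet.range.start", "0"),
        ("f.price.facet.range.end", "1000"),
        ("f.price.facet.range.gap", "50"),
        ("f.price.facet.range.other", "all"),
        ("f.price.facet.range.include", "upper"),
        ("spellcheck", "false"), ("hl", "false"),
        ("wt", "json"),  # assumption: JSON output instead of XML
    ]
    if mincount is not None:
        params.append(("f.price.facet.mincount", str(mincount)))
    return BASE + "?" + urlencode(params)


def flat_counts_to_dict(flat):
    """Turn [label, count, label, count, ...] into {label: count} (assumed layout)."""
    return dict(zip(flat[::2], flat[1::2]))


def omitted_buckets(full, filtered, mincount=1):
    """Buckets that satisfy mincount in the full list but are absent from the filtered one."""
    return {b: n for b, n in full.items() if n >= mincount and b not in filtered}


# Hypothetical bucket lists, for illustration only.
full = flat_counts_to_dict(["0.0", 0, "50.0", 2, "100.0", 1])
filtered = flat_counts_to_dict(["50.0", 2])
print(omitted_buckets(full, filtered))  # {'100.0': 1}
```

Fetching both URLs several times and printing omitted_buckets for each pair would show whether the missing bucket changes between refreshes, as described in step 3.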
Re: SolrCloud: facet range option f..facet.mincount=1 omits buckets on response
Hi Shawn,

Thanks very much for the feedback. I have tested using the compositeId routing mechanism on a larger scale. Unfortunately, the behaviour is the same.

Regards
Ronald

-----Original Message-----
From: Shawn Heisey [mailto:s...@elyograg.org]
Sent: 29 May 2014 20:16
To: solr-user@lucene.apache.org
Subject: COMMERCIAL: Re: SolrCloud: facet range option f..facet.mincount=1 omits buckets on response

On 5/29/2014 12:06 PM, Ronald Matamoros wrote:
> Hi all,
>
> At the moment I am reviewing the code to determine if this is a legitimate
> bug that needs to be set as a JIRA ticket.
> Any insight or recommendation is appreciated.

> Note: the value in changes with every other
> refresh of the query.

Whenever distributed search results change from one query to the next, it's almost always caused by having documents with the same uniqueKey in more than one shard. Solr is able to remove these duplicates from the results, but there are other aspects of distributed searching that cannot be dealt with when there are duplicate documents. This leads to problems like numFound changing from one request to the next.

To avoid these problems with SolrCloud, you'll likely want to create a new collection and set its router to compositeId. This ensures that indexed documents are distributed to shards according to the hash of their uniqueKey, not imported directly into the node where you made the update request.

It's possible that my guess here is completely wrong, but this is usually the problem.

Thanks,
Shawn
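Shawn's hypothesis of the same uniqueKey living in more than one shard can be checked with a small script. A minimal sketch, assuming the id lists were collected per shard beforehand, for example by querying each core directly with distrib=false and fl=id. The shard names and ids below are hypothetical.

```python
from collections import defaultdict


def duplicate_ids(ids_by_shard):
    """Return {id: sorted shard names} for ids present in more than one shard."""
    seen = defaultdict(set)
    for shard, ids in ids_by_shard.items():
        for doc_id in ids:
            seen[doc_id].add(shard)
    return {doc_id: sorted(shards)
            for doc_id, shards in seen.items() if len(shards) > 1}


# Hypothetical per-shard id lists.
ids_by_shard = {
    "shard1": ["A", "B"],
    "shard2": ["B", "C"],
}
print(duplicate_ids(ids_by_shard))  # {'B': ['shard1', 'shard2']}
```

An empty result would suggest that duplicate documents are not the cause, which is consistent with the behaviour persisting after the switch to compositeId routing.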
RE: COMMERCIAL: RE: SolrCloud: facet range option f..facet.mincount=1 omits buckets on response
Hi Chris,

I created ticket https://issues.apache.org/jira/browse/SOLR-6154

I attached the data.xml and a PDF with instructions on how to replicate to the ticket.

Sending different updates to different ports was just how the Confluence tutorial laid out the steps; it does not affect the result of the test.

As soon as I have more information I will post it to the ticket. I appreciate the interest; let me know of any suggestions or feedback.

Thank you
Ronald Matamoros

-----Original Message-----
From: Chris Hostetter [mailto:hossman_luc...@fucit.org]
Sent: 06 June 2014 22:00
To: solr-user@lucene.apache.org
Subject: COMMERCIAL: RE: SolrCloud: facet range option f..facet.mincount=1 omits buckets on response

Ronald: I'm having a little trouble understanding the steps to reproduce that you are describing -- in particular Step "1 f ii", because i'm not really sure i understand what exactly you are putting in "mem2.xml"

Also: Since you don't appear to be using implicit routing, i'm not clear on why you are explicitly sending different updates to different ports in Step "1 f i" -- does that affect the results of your test?

If you can reliably reproduce using modified data from the example, could you please open a Jira outlining these steps and attach the modified data to index directly to that issue?

(FWIW: If it doesn't matter what port you use to send which documents, then you should be able to create a single unified "data.xml" file containing all the docs to index in a single command)

: Date: Thu, 29 May 2014 18:06:38 +
: From: Ronald Matamoros
: Reply-To: solr-user@lucene.apache.org
: To: "solr-user@lucene.apache.org"
: Subject: RE: SolrCloud: facet range option f..facet.mincount=1 omits
: buckets on response
:
: Hi all,
:
: At the moment I am reviewing the code to determine if this is a legitimate bug that needs to be set as a JIRA ticket.
: Any insight or recommendation is appreciated.