Hi all,
I am currently reviewing the code to determine whether this is a legitimate bug
that should be filed as a JIRA ticket.
Any insight or recommendation is appreciated.
The steps to reproduce are included below as text:
-----------------------------------------------------------------
Solr versions where the issue was reproduced:
* 4.5.1 (Linux)
* 4.8.1 (Windows + Cygwin)
Steps to reproduce:
1. Created a two-shard environment (no replication), following
https://cwiki.apache.org/confluence/display/solr/Getting+Started+with+SolrCloud
a. Downloaded the Solr distribution from
http://lucene.apache.org/solr/downloads.html
b. Unzipped solr-4.8.1.zip to a temporary location: <SOLR_DIST_HOME>
c. Ran it once from <SOLR_DIST_HOME>/example so the SolrCloud jars get unpacked: java -jar start.jar
d. Created the nodes
i. cd <SOLR_DIST_HOME>
ii. Copied example to node1 (via Windows Explorer)
iii. Copied example to node2 (via Windows Explorer)
e. Started the nodes
i. Start node 1
cd node1
java -DzkRun -DnumShards=2 -Dbootstrap_confdir=./solr/collection1/conf -Dcollection.configName=myconf -jar start.jar
ii. Start node 2
cd node2
java -Djetty.port=7574 -DzkHost=localhost:9983 -jar start.jar
f. Fed the sample documents
i. Out of the box
curl http://localhost:8983/solr/update?commit=true -H "Content-Type: text/xml" -d "@mem.xml"
curl http://localhost:7574/solr/update?commit=true -H "Content-Type: text/xml" -d "@monitor2.xml"
ii. Created a copy of mem.xml named mem2.xml, modified the identifiers, names and prices, and fed it:
curl http://localhost:8983/solr/update?commit=true -H "Content-Type: text/xml" -d "@mem2.xml"
<add>
<doc>
<field name="id">COMPANY1</field>
<field name="name">COMPANY1 Device</field>
<field name="manu">COMPANY1 Device Mfg</field>
.
<field name="price">190</field>
.
</doc>
<doc>
<field name="id">COMPANY2</field>
<field name="name">COMPANY2 flatscreen</field>
<field name="manu">COMPANY2 Device Mfg.</field>
.
<field name="price">200.00</field>
.
</doc>
<doc>
<field name="id">COMPANY3</field>
<field name="name">COMPANY3 Laptop</field>
<field name="manu">COMPANY3 Device Mfg.</field>
.
<field name="price">800.00</field>
.
</doc>
</add>
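As a hand check of which gap bucket each mem2.xml price should land in, here is a small Python sketch of the (lower, upper] assignment that facet.range.include=upper describes, using the start=0, end=1000, gap=50 values from the queries below. upper_inclusive_bucket is a hypothetical helper, not Solr code; it only models the documented include=upper semantics.

```python
import math


def upper_inclusive_bucket(price, start=0.0, end=1000.0, gap=50.0):
    """Return the key (lower bound) of the (lo, lo + gap] range holding price.

    Hypothetical helper modelling facet.range.include=upper: a value equal to a
    bucket's upper bound belongs to that bucket, not the next one. Values
    outside (start, end] get None here rather than a gap bucket.
    """
    if price <= start or price > end:
        return None
    # ceil((price - start) / gap) is the 1-based position of the (lo, hi] range.
    position = math.ceil((price - start) / gap)
    return start + (position - 1) * gap


if __name__ == "__main__":
    # Prices taken from the mem2.xml documents above.
    for doc_id, price in [("COMPANY1", 190), ("COMPANY2", 200.00), ("COMPANY3", 800.00)]:
        print(doc_id, upper_inclusive_bucket(price))
    # COMPANY1 150.0
    # COMPANY2 150.0
    # COMPANY3 750.0
```

With include=upper, COMPANY2 at exactly 200.00 falls in the 150.0 bucket rather than the 200.0 one.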
2. Query **without** f.price.facet.mincount=1: counts and buckets are OK
http://localhost:8983/solr/collection1/select?q=*:*&fl=id,price&sort=id+asc&facet=true&facet.range=price&f.price.facet.range.start=0&f.price.facet.range.end=1000&f.price.facet.range.gap=50&f.price.facet.range.other=all&f.price.facet.range.include=upper&spellcheck=false&hl=false
Only six documents have a price:
<lst name="facet_ranges">
<lst name="price">
<lst name="counts">
<int name="0.0">0</int>
<int name="50.0">1</int>
<int name="100.0">0</int>
<int name="150.0">3</int>
<int name="200.0">0</int>
<int name="250.0">1</int>
<int name="300.0">0</int>
<int name="350.0">0</int>
<int name="400.0">0</int>
<int name="450.0">0</int>
<int name="500.0">0</int>
<int name="550.0">0</int>
<int name="600.0">0</int>
<int name="650.0">0</int>
<int name="700.0">0</int>
<int name="750.0">1</int>
<int name="800.0">0</int>
<int name="850.0">0</int>
<int name="900.0">0</int>
<int name="950.0">0</int>
</lst>
<float name="gap">50.0</float>
<float name="start">0.0</float>
<float name="end">1000.0</float>
<int name="before">0</int>
<int name="after">0</int>
<int name="between">2</int>
</lst>
</lst>
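As I understand facet.range.include, with include=upper the gap ranges (0,50], (50,100], ..., (950,1000] exactly cover (0,1000], the same span that "between" reports on, so "between" should equal the sum of the bucket counts. A minimal Python check over this response using xml.etree.ElementTree; bucket_total_and_between is a hypothetical helper, and the fragment leaves out the zero-count buckets, which do not affect the sum.

```python
import xml.etree.ElementTree as ET

# facet_ranges fragment from the step 2 response above; the zero-count
# buckets are left out because they do not change the sum.
STEP2_FACET_RANGES = """
<lst name="facet_ranges">
  <lst name="price">
    <lst name="counts">
      <int name="50.0">1</int>
      <int name="150.0">3</int>
      <int name="250.0">1</int>
      <int name="750.0">1</int>
    </lst>
    <float name="gap">50.0</float>
    <float name="start">0.0</float>
    <float name="end">1000.0</float>
    <int name="before">0</int>
    <int name="after">0</int>
    <int name="between">2</int>
  </lst>
</lst>
"""


def bucket_total_and_between(xml_text, field="price"):
    """Return (sum of the gap bucket counts, reported 'between') for one field."""
    root = ET.fromstring(xml_text.strip())
    field_lst = root.find(f"lst[@name='{field}']")
    counts = field_lst.find("lst[@name='counts']")
    total = sum(int(bucket.text) for bucket in counts.findall("int"))
    between = int(field_lst.find("int[@name='between']").text)
    return total, between


if __name__ == "__main__":
    total, between = bucket_total_and_between(STEP2_FACET_RANGES)
    print(f"bucket total={total}, between={between}")  # bucket total=6, between=2
```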
Note: the value of <int name="between"> changes on every other refresh of the query,
and 2 does not match the 6 documents counted in the buckets above.
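To tell whether each shard returns the right buckets on its own, or whether they are lost when the coordinating node merges the shard responses, the requests in steps 2 and 3 can be sent to each node with distrib=false, which makes the receiving core answer from its local index only. A small Python sketch that rebuilds those URLs with urllib.parse.urlencode; select_url and NODES are hypothetical names, and the core URLs assume each node serves its shard under the collection1 core name, as in the Getting Started layout used in step 1.

```python
from urllib.parse import urlencode

# Parameters copied from the step 2 / step 3 queries.
BASE_PARAMS = [
    ("q", "*:*"),
    ("fl", "id,price"),
    ("sort", "id asc"),
    ("facet", "true"),
    ("facet.range", "price"),
    ("f.price.facet.range.start", "0"),
    ("f.price.facet.range.end", "1000"),
    ("f.price.facet.range.gap", "50"),
    ("f.price.facet.range.other", "all"),
    ("f.price.facet.range.include", "upper"),
    ("spellcheck", "false"),
    ("hl", "false"),
]

# Assumed core URLs for the two nodes started in step 1.e.
NODES = [
    "http://localhost:8983/solr/collection1",
    "http://localhost:7574/solr/collection1",
]


def select_url(core_url, mincount=None, distrib=None):
    """Build a /select URL; optionally add the mincount and distrib parameters."""
    params = list(BASE_PARAMS)
    if mincount is not None:
        params.append(("f.price.facet.mincount", str(mincount)))
    if distrib is not None:
        params.append(("distrib", "true" if distrib else "false"))
    # urlencode percent-encodes values and turns the space in "id asc" into '+'.
    return core_url + "/select?" + urlencode(params)


if __name__ == "__main__":
    # One non-distributed request per shard, with mincount applied locally.
    for node in NODES:
        print(select_url(node, mincount=1, distrib=False))
```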
3. With &f.price.facet.mincount=1 added, the bucket <int name="250.0">1</int> is missing:
http://localhost:8983/solr/collection1/select?q=*:*&fl=id,price&sort=id+asc&facet=true&facet.range=price&f.price.facet.range.start=0&f.price.facet.range.end=1000&f.price.facet.range.gap=50&f.price.facet.range.other=all&f.price.facet.range.include=upper&spellcheck=false&hl=false&f.price.facet.mincount=1
<lst name="facet_ranges">
<lst name="price">
<lst name="counts">
<int name="50.0">1</int>
<int name="150.0">3</int>
<int name="750.0">1</int>
</lst>
<float name="gap">50.0</float>
<float name="start">0.0</float>
<float name="end">1000.0</float>
<int name="before">0</int>
<int name="after">0</int>
<int name="between">4</int>
</lst>
</lst>
After refreshing the query (you may need to press F5 in the browser several times),
a different bucket list is returned:
<lst name="facet_ranges">
<lst name="price">
<lst name="counts">
<int name="150.0">3</int>
<int name="250.0">1</int>
</lst>
<float name="gap">50.0</float>
<float name="start">0.0</float>
<float name="end">1000.0</float>
<int name="before">0</int>
<int name="after">0</int>
<int name="between">2</int>
</lst>
</lst>
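The two step 3 responses, and the changing "between" value in step 2, look consistent with a hypothesis I still need to confirm in FacetComponent: the coordinating node takes the first shard response it receives as the template for the merged facet_ranges, adds later shards' counts only for bucket keys already in that template, and does not sum "between". Under that hypothesis, once each shard has dropped its own zero-count buckets, whichever shard answers first decides which buckets survive. The Python sketch below simulates that suspected merge next to a union-then-mincount merge. SHARD_A and SHARD_B are a hypothetical split reverse-engineered from the outputs above, and none of these functions are Solr code.

```python
# Hypothetical per-shard facet_ranges for the price field, each already
# filtered with mincount=1 on the shard. The split between shards is inferred
# from the merged outputs above, not taken from a real shard response.
SHARD_A = {"counts": {"50.0": 1, "150.0": 2, "750.0": 1}, "between": 4}
SHARD_B = {"counts": {"150.0": 1, "250.0": 1}, "between": 2}


def merge_first_shard_as_template(responses):
    """Suspected merge: the first response to arrive fixes the bucket keys,
    later shards only add to keys already present, and 'between' is not summed."""
    first = responses[0]
    merged = {"counts": dict(first["counts"]), "between": first["between"]}
    for resp in responses[1:]:
        for key in merged["counts"]:
            merged["counts"][key] += resp["counts"].get(key, 0)
    return merged


def merge_union_then_mincount(responses, mincount=1):
    """Expected merge: union the bucket keys, sum counts and 'between',
    then apply mincount to the merged totals.

    Summing the pre-filtered shard lists is only safe for mincount=1; for a
    higher mincount the shards would need to return every bucket (mincount=0).
    """
    totals = {}
    between = 0
    for resp in responses:
        for key, count in resp["counts"].items():
            totals[key] = totals.get(key, 0) + count
        between += resp["between"]
    counts = {k: totals[k] for k in sorted(totals, key=float) if totals[k] >= mincount}
    return {"counts": counts, "between": between}


if __name__ == "__main__":
    # Shard A answers first: same buckets and between as the first step 3 response.
    print(merge_first_shard_as_template([SHARD_A, SHARD_B]))
    # Shard B answers first: same buckets and between as the response after a refresh.
    print(merge_first_shard_as_template([SHARD_B, SHARD_A]))
    # Union of both shards: all four non-empty buckets and between=6.
    print(merge_union_then_mincount([SHARD_A, SHARD_B]))
```

Without mincount both shards would return all 20 buckets, so the same template merge would produce correct counts but still only the first shard's "between", which would also fit the note in step 2.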
Thank you,
Ronald Matamoros
-----Original Message-----
From: Ronald Matamoros [mailto:[email protected]]
Sent: 27 May 2014 16:25
To: [email protected]
Subject: COMMERCIAL: SolrCloud: facet range option f.<field>.facet.mincount=1 omits buckets on response
Good afternoon,
Is the f.<field>.facet.mincount option supported on a distributed search?
Under SolrCloud, I am seeing some buckets omitted from the response when using the
option "f.<field>.facet.mincount=1".
The Solr logs do not indicate any error or warning during execution.
Neither the debug=true option nor raising the log level of the FacetComponent
provides any hint about this behaviour.
I have reproduced the issue on both Solr 4.5.1 and 4.8.1.
I have attached a PDF with additional details and steps to reproduce the
behaviour using the out-of-the-box Solr distribution.
Any insight or recommendation to tackle this situation is much appreciated.
Example:
Removing the f.<field>.facet.mincount=1 option gives the expected list of
buckets for the 6 documents matched.
<lst name="facet_ranges">
<lst name="price">
<lst name="counts">
<int name="0.0">0</int>
<int name="50.0">1</int>
<int name="100.0">0</int>
<int name="150.0">3</int>
<int name="200.0">0</int>
<int name="250.0">1</int>
<int name="300.0">0</int>
<int name="350.0">0</int>
<int name="400.0">0</int>
<int name="450.0">0</int>
<int name="500.0">0</int>
<int name="550.0">0</int>
<int name="600.0">0</int>
<int name="650.0">0</int>
<int name="700.0">0</int>
<int name="750.0">1</int>
<int name="800.0">0</int>
<int name="850.0">0</int>
<int name="900.0">0</int>
<int name="950.0">0</int>
</lst>
<float name="gap">50.0</float>
<float name="start">0.0</float>
<float name="end">1000.0</float>
<int name="before">0</int>
<int name="after">0</int>
<int name="between">2</int>
</lst>
</lst>
Using the f.<field>.facet.mincount=1 option removes the zero-count buckets,
but also omits the bucket <int name="250.0">:
<lst name="facet_ranges">
<lst name="price">
<lst name="counts">
<int name="50.0">1</int>
<int name="150.0">3</int>
<int name="750.0">1</int>
</lst>
<float name="gap">50.0</float>
<float name="start">0.0</float>
<float name="end">1000.0</float>
<int name="before">0</int>
<int name="after">0</int>
<int name="between">4</int>
</lst>
</lst>
Refreshing the query in the browser (F5) returns a different bucket list
(you may need to refresh several times):
<lst name="facet_ranges">
<lst name="price">
<lst name="counts">
<int name="150.0">3</int>
<int name="250.0">1</int>
</lst>
<float name="gap">50.0</float>
<float name="start">0.0</float>
<float name="end">1000.0</float>
<int name="before">0</int>
<int name="after">0</int>
<int name="between">2</int>
</lst>
</lst>
Regards
Ronald Matamoros