SolrCloud: facet range option f..facet.mincount=1 omits buckets on response

2014-05-27 Thread Ronald Matamoros
Good afternoon,

Is the f..facet.mincount option supported on a distributed search?
Under SolrCloud, some buckets are omitted from the response when using the 
option "f..facet.mincount=1".

The Solr logs do not indicate any error or warning during execution.
Neither the debug=true option nor raising the log level of the FacetComponent 
gives any hint about the cause of this behaviour.

I replicated the issue on both Solr 4.5.1 and 4.8.1.
Attached is a PDF with additional details and steps to reproduce the 
behaviour using the out-of-the-box Solr distribution.

Any insight or recommendation to tackle this situation is much appreciated.

Example:

  Removing the f..facet.mincount=1 option gives the expected list of 
buckets for the 6 documents matched.


 
   
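  Range facet counts for price (20 buckets of width 50 from 0.0 to 1000.0):
    0 1 0 3 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0
  gap 50.0, start 0.0, end 1000.0
  facet.range.other=all values: 0 0 2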
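
The flat list of counts above can be matched to bucket labels by stepping from facet.range.start to facet.range.end in increments of facet.range.gap. A minimal Python sketch, assuming each bucket is labelled by its lower bound (with include=upper each bucket covers (lower, lower + 50]); the helper name is illustrative only:

```python
# Pair the flat list of range-facet counts from the response above with
# bucket labels. Assumption: buckets are labelled by their lower bound.

def bucket_labels(start: float, end: float, gap: float) -> list[float]:
    """Lower bound of every bucket from start up to (but excluding) end."""
    labels = []
    lower = start
    while lower < end:
        labels.append(lower)
        lower += gap
    return labels


counts = [0, 1, 0, 3, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0]
labels = bucket_labels(0.0, 1000.0, 50.0)
by_bucket = dict(zip(labels, counts))
non_empty = {label: n for label, n in by_bucket.items() if n > 0}

print(len(labels))   # 20
print(non_empty)     # {50.0: 1, 150.0: 3, 250.0: 1, 750.0: 1}
print(sum(counts))   # 6, one per priced document
```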
 
   

  Using the f..facet.mincount=1 option removes the zero-count buckets 
but also omits bucket 

   
  

  Range facet counts for price:
    1 3 1
  gap 50.0, start 0.0, end 1000.0
  facet.range.other=all values: 0 0 4
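
For comparison, applying a minimum count of 1 to the full response (the one without mincount) should keep every non-empty bucket. A minimal Python sketch of that expectation, reusing the counts reported above and assuming the same lower-bound labels:

```python
# Expected effect of f.price.facet.mincount=1 on the full response:
# drop buckets whose count is below 1 and keep all the others.
counts = [0, 1, 0, 3, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0]
labels = [i * 50.0 for i in range(len(counts))]  # assumed lower-bound labels


def apply_mincount(buckets: dict[float, int], mincount: int) -> dict[float, int]:
    """Keep only buckets whose count is at least mincount."""
    return {label: n for label, n in buckets.items() if n >= mincount}


expected = apply_mincount(dict(zip(labels, counts)), 1)
observed = [1, 3, 1]  # counts returned with mincount=1 (labels not preserved)

print(list(expected.values()))                 # [1, 3, 1, 1]
print(sum(expected.values()) - sum(observed))  # 1: one document's bucket is missing
```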
  


 Refreshing the query in the browser (F5) returns a different bucket list 
 (you may need to refresh several times):

   
  

  Range facet counts for price:
    3 1
  gap 50.0, start 0.0, end 1000.0
  facet.range.other=all values: 0 0 2
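
One way to reason about the order-dependent results is a toy model in which each shard applies the minimum count before responding and the coordinating node keeps only the buckets present in the first shard response it processes. This is a hypothesis for illustration, not Solr's FacetComponent code, and the split of the six documents across the two shards below is hypothetical, chosen so that the model reproduces the counts reported above:

```python
# HYPOTHETICAL model, not Solr's implementation: per-shard mincount filtering
# plus a merge that uses the first shard response as the bucket template.

def shard_response(counts: dict[float, int], mincount: int) -> dict[float, int]:
    """Each shard drops its own buckets below mincount before responding."""
    return {label: n for label, n in counts.items() if n >= mincount}


def merge_using_first_as_template(responses: list[dict[float, int]]) -> dict[float, int]:
    """Keep the first response's buckets and add later counts only to those."""
    merged = dict(responses[0])
    for resp in responses[1:]:
        for label, n in resp.items():
            if label in merged:  # buckets missing from the first response are lost
                merged[label] += n
    return merged


# Hypothetical split of the priced documents over two shards.
shard_a = {50.0: 1, 150.0: 2, 250.0: 1, 750.0: 0}
shard_b = {50.0: 0, 150.0: 1, 250.0: 0, 750.0: 1}

# Without mincount every shard returns every bucket, so nothing is lost.
print(merge_using_first_as_template([shard_a, shard_b]))

a = shard_response(shard_a, 1)
b = shard_response(shard_b, 1)
print(list(merge_using_first_as_template([a, b]).values()))  # [1, 3, 1]
print(list(merge_using_first_as_template([b, a]).values()))  # [3, 1]
```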
  
    

Regards 
Ronald Matamoros


RE: SolrCloud: facet range option f..facet.mincount=1 omits buckets on response

2014-05-29 Thread Ronald Matamoros
Hi all,

At the moment I am reviewing the code to determine whether this is a legitimate bug 
that should be filed as a JIRA ticket.
Any insight or recommendation is appreciated.

Including the replication steps as text:

-
Solr versions where the issue was replicated:
  * 4.5.1 (Linux)
  * 4.8.1 (Windows + Cygwin)

Replicating

  1. Created a two-shard environment (no replication) as described in
 
https://cwiki.apache.org/confluence/display/solr/Getting+Started+with+SolrCloud

 a. Downloaded the Solr distribution from 
http://lucene.apache.org/solr/downloads.html 
 b. Unzipped solr-4.8.1.zip to a temporary location:  
 c. Ran once so the SolrCloud jars get unpacked: java -jar start.jar
 d. Create nodes
  i. cd 
  ii. Via Windows Explorer copied example to node1
  iii. Via Windows Explorer copied example to node2

 e. Start Nodes 
  i. Start node 1

   cd node1
   java -DzkRun -DnumShards=2 -Dbootstrap_confdir=./solr/collection1/conf -Dcollection.configName=myconf -jar start.jar

  ii. Start node 2

   cd node2
   java -Djetty.port=7574 -DzkHost=localhost:9983 -jar start.jar

 f. Fed sample documents
  i. Out of the box

   curl http://localhost:8983/solr/update?commit=true -H "Content-Type: text/xml" -d "@mem.xml"
   curl http://localhost:7574/solr/update?commit=true -H "Content-Type: text/xml" -d "@monitor2.xml"

  ii. Copied mem.xml to mem2.xml, modified the identifiers, names and prices, 
and fed it:

   curl http://localhost:8983/solr/update?commit=true -H "Content-Type: text/xml" -d "@mem2.xml"

   
 
   COMPANY1 | COMPANY1 Device | COMPANY1 Device Mfg | ... | price 190 | ...
   COMPANY2 | COMPANY2 flatscreen | COMPANY2 Device Mfg. | ... | price 200.00 | ...
   COMPANY3 | COMPANY3 Laptop | COMPANY3 Device Mfg. | ... | price 800.00 | ...

  2. Query **without** f.price.facet.mincount=1, counts and buckets are OK

 
http://localhost:8983/solr/collection1/select?q=*:*&fl=id,price&sort=id+asc&facet=true&facet.range=price&f.price.facet.range.start=0&f.price.facet.range.end=1000&f.price.facet.range.gap=50&f.price.facet.range.other=all&f.price.facet.range.include=upper&spellcheck=false&hl=false
 
 Only six documents have prices
 
  

  
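  Range facet counts for price (20 buckets of width 50 from 0.0 to 1000.0):
    0 1 0 3 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0
  gap 50.0, start 0.0, end 1000.0
  facet.range.other=all values: 0 0 2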

  Note: the value in  changes with every other 
refresh of the query. 

  3. Using &f.price.facet.mincount=1: missing bucket  1

 
http://localhost:8983/solr/collection1/select?q=*:*&fl=id,price&sort=id+asc&facet=true&facet.range=price&f.price.facet.range.start=0&f.price.facet.range.end=1000&f.price.facet.range.gap=50&f.price.facet.range.other=all&f.price.facet.range.include=upper&spellcheck=false&hl=false&f.price.facet.mincount=1

  

  
  Range facet counts for price:
    1 3 1
  gap 50.0, start 0.0, end 1000.0
  facet.range.other=all values: 0 0 4

 Refreshing the query (you may need to press F5 in the browser several times):

  

  
  Range facet counts for price:
    3 1
  gap 50.0, start 0.0, end 1000.0
  facet.range.other=all values: 0 0 2
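
The queries in steps 2 and 3 differ only in the f.price.facet.mincount parameter. A small Python sketch that builds both URLs with the standard library, so the two requests can be compared or scripted; the helper name range_facet_url is illustrative:

```python
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlsplit


def range_facet_url(base: str, mincount: Optional[int] = None) -> str:
    """Build the price range-facet query used in steps 2 and 3."""
    params = [
        ("q", "*:*"),
        ("fl", "id,price"),
        ("sort", "id asc"),
        ("facet", "true"),
        ("facet.range", "price"),
        ("f.price.facet.range.start", "0"),
        ("f.price.facet.range.end", "1000"),
        ("f.price.facet.range.gap", "50"),
        ("f.price.facet.range.other", "all"),
        ("f.price.facet.range.include", "upper"),
        ("spellcheck", "false"),
        ("hl", "false"),
    ]
    if mincount is not None:
        # Step 3 adds the per-field minimum count.
        params.append(("f.price.facet.mincount", str(mincount)))
    return f"{base}/select?{urlencode(params)}"


base = "http://localhost:8983/solr/collection1"
step2 = range_facet_url(base)
step3 = range_facet_url(base, mincount=1)
step3_params = parse_qs(urlsplit(step3).query)
print(step3_params["f.price.facet.mincount"])  # ['1']
```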

  

Thank you,
Ronald Matamoros

-Original Message-
From: Ronald Matamoros [mailto:rmatamo...@searchtechnologies.com] 
Sent: 27 May 2014 16:25
To: solr-user@lucene.apache.org
Subject: COMMERCIAL: SolrCloud: facet range option f..facet.mincount=1 
omits buckets on response

Re: SolrCloud: facet range option f..facet.mincount=1 omits buckets on response

2014-05-30 Thread Ronald Matamoros
Hi Shawn,

Thanks very much for the feedback.

I have tested with compositeId routing on a larger scale.
Unfortunately, the behaviour is the same.

Regards
Ronald


-Original Message-
From: Shawn Heisey [mailto:s...@elyograg.org] 
Sent: 29 May 2014 20:16
To: solr-user@lucene.apache.org
Subject: COMMERCIAL: Re: SolrCloud: facet range option 
f..facet.mincount=1 omits buckets on response

On 5/29/2014 12:06 PM, Ronald Matamoros wrote:
> Hi all,
>
> At the moment I am reviewing the code to determine if this is a legitimate 
> bug that needs to be set as a JIRA ticket.
> Any insight or recommendation is appreciated.



>   Note: the value in  changes with every other 
> refresh of the query. 

Whenever distributed search results change from one query to the next, it's 
almost always caused by having documents with the same uniqueKey in more than 
one shard.  Solr is able to remove these duplicates from the results, but there 
are other aspects of distributed searching that cannot be dealt with when there 
are duplicate documents.  This leads to problems like numFound changing from 
one request to the next.
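
A toy Python model of the effect described above. It assumes, for illustration only, that the coordinating node adds up the per-shard hit counts for numFound while removing duplicate uniqueKey values from the documents it returns; Solr's real merge logic is more involved, and the shard contents are made up:

```python
# Toy model only: numFound = sum of per-shard hit counts, while returned
# documents are de-duplicated by uniqueKey. Shard contents are made up.

def distributed_search(shards: list[list[str]]) -> tuple[int, list[str]]:
    num_found = sum(len(hits) for hits in shards)
    seen: set[str] = set()
    docs: list[str] = []
    for hits in shards:
        for key in hits:
            if key not in seen:  # drop the second copy of a duplicated uniqueKey
                seen.add(key)
                docs.append(key)
    return num_found, docs


shard1 = ["A", "B", "C"]
shard2 = ["C", "D"]  # "C" was indexed into both shards
num_found, docs = distributed_search([shard1, shard2])
print(num_found, docs)  # 5 ['A', 'B', 'C', 'D']
```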

To avoid these problems with SolrCloud, you'll likely want to create a new 
collection and set its router to compositeId.  This ensures that indexed 
documents are distributed to shards according to the hash of their uniqueKey, 
not imported directly into the node where you made the update request.
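
The routing idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Solr's router: zlib.crc32 stands in for the hash function Solr actually uses, and the 32-bit hash space is assumed to be split into equal ranges, one per shard. What it shows is that the target shard depends only on the uniqueKey, not on the node that received the update:

```python
import zlib

NUM_SHARDS = 2


def shard_for(unique_key: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a uniqueKey to a shard index (stand-in hash: CRC-32, not Solr's)."""
    h = zlib.crc32(unique_key.encode("utf-8"))  # unsigned 32-bit integer
    range_size = 2**32 // num_shards            # equal slice of the hash space
    return min(h // range_size, num_shards - 1)


def route(unique_key: str, receiving_node: str) -> int:
    """receiving_node is deliberately ignored: every node forwards the
    document to the same shard."""
    return shard_for(unique_key)


for key in ["COMPANY1", "COMPANY2", "COMPANY3"]:
    print(key, route(key, "localhost:8983"), route(key, "localhost:7574"))
```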

It's possible that my guess here is completely wrong, but this is usually the 
problem.

Thanks,
Shawn



RE: COMMERCIAL: RE: SolrCloud: facet range option f..facet.mincount=1 omits buckets on response

2014-06-09 Thread Ronald Matamoros
Hi Chris,

Created ticket https://issues.apache.org/jira/browse/SOLR-6154
I have attached data.xml and a PDF with instructions on how to replicate the 
issue to the ticket.

Sending different updates to different ports simply followed the steps of the 
Confluence tutorial; it does not affect the result of the test.

I will post to the ticket as soon as I have more information.
I appreciate the interest; please let me know of any suggestions or feedback.

Thank you
Ronald Matamoros


-Original Message-
From: Chris Hostetter [mailto:hossman_luc...@fucit.org] 
Sent: 06 June 2014 22:00
To: solr-user@lucene.apache.org
Subject: COMMERCIAL: RE: SolrCloud: facet range option 
f..facet.mincount=1 omits buckets on response



Ronald: I'm having a little trouble understanding the steps to reproduce that 
you are describing -- in particular Step "1 f ii", because I'm not really sure I 
understand what exactly you are putting in "mem2.xml".

Also: since you don't appear to be using implicit routing, I'm not clear on why 
you are explicitly sending different updates to different ports in Step "1 f i" 
-- does that affect the results of your test?


If you can reliably reproduce using modified data from the example, could you 
please open a Jira issue outlining these steps and attach the modified data to 
index directly to that issue?  (FWIW: if it doesn't matter which port you use to 
send which documents, then you should be able to create a single unified 
"data.xml" file containing all the docs to index in a single command.)
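
The single data.xml suggested above can be produced with a short script. A minimal Python sketch, assuming each input file is a Solr XML update message with an <add> root (as the example mem.xml and monitor2.xml are): it merges the <doc> elements into one <add> message and prepares one POST equivalent to the curl commands in step 1 f. The function names are illustrative, the inline messages stand in for the real files, and the request is only built, not sent:

```python
import urllib.request
import xml.etree.ElementTree as ET
from typing import Union


def merge_add_messages(messages: list[Union[str, bytes]]) -> bytes:
    """Combine several <add><doc>...</doc></add> messages into one <add>."""
    merged = ET.Element("add")
    for text in messages:
        root = ET.fromstring(text)
        if root.tag != "add":
            raise ValueError(f"expected an <add> root element, got <{root.tag}>")
        merged.extend(root.findall("doc"))
    return ET.tostring(merged, encoding="utf-8")


def build_update_request(base_url: str, body: bytes) -> urllib.request.Request:
    """Same request as: curl <base_url>/update?commit=true -H "Content-Type: text/xml" -d @data.xml"""
    return urllib.request.Request(
        f"{base_url}/update?commit=true",
        data=body,
        headers={"Content-Type": "text/xml"},
        method="POST",
    )


# In practice: messages = [Path(name).read_bytes() for name in ("mem.xml", "monitor2.xml", "mem2.xml")]
messages = [
    "<add><doc><field name='id'>A</field></doc></add>",
    "<add><doc><field name='id'>B</field></doc><doc><field name='id'>C</field></doc></add>",
]
data_xml = merge_add_messages(messages)
request = build_update_request("http://localhost:8983/solr", data_xml)
# urllib.request.urlopen(request) would send it to Solr.
```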



: Date: Thu, 29 May 2014 18:06:38 +
: From: Ronald Matamoros 
: Reply-To: solr-user@lucene.apache.org
: To: "solr-user@lucene.apache.org" 
: Subject: RE: SolrCloud: facet range option f..facet.mincount=1 omits
: buckets on response
: 