Hi Erick,

Thanks for your reply. My test environment only has one shard and one replica per collection. So, I think there is no possibility of replicas getting out of sync. Here is how I create each (month-based) collection:

http://192.168.59.103:8983/solr/admin/collections?action=CREATE&name=2014_01&numShards=1&replicationFactor=1&maxShardsPerNode=1&collection.configName=main_conf
http://192.168.59.103:8983/solr/admin/collections?action=CREATE&name=2014_02&numShards=1&replicationFactor=1&maxShardsPerNode=1&collection.configName=main_conf
http://192.168.59.103:8983/solr/admin/collections?action=CREATE&name=2014_03&numShards=1&replicationFactor=1&maxShardsPerNode=1&collection.configName=main_conf
...etc, etc...
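For anyone wanting to reproduce the setup, the CREATE calls above could be generated rather than typed by hand. This is only a minimal sketch: the helper names `create_collection_url` and `monthly_collection_names` are made up for illustration, and the host, port and `main_conf` config name are simply the values from the URLs above.

```python
from urllib.parse import urlencode


def create_collection_url(base_url, name, config_name="main_conf"):
    """Build a Collections API CREATE URL with the same parameters as above."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": 1,
        "replicationFactor": 1,
        "maxShardsPerNode": 1,
        "collection.configName": config_name,
    }
    # urlencode keeps the dict's insertion order, so the URL matches the thread.
    return f"{base_url}/admin/collections?{urlencode(params)}"


def monthly_collection_names(year, months=range(1, 13)):
    """Return month-based collection names such as 2014_01, 2014_02, ..."""
    return [f"{year}_{month:02d}" for month in months]


if __name__ == "__main__":
    base = "http://192.168.59.103:8983/solr"  # host and port taken from the thread
    for name in monthly_collection_names(2014, range(1, 4)):
        print(create_collection_url(base, name))
```

Run as a script, this only prints the first three URLs; actually issuing the requests is left out on purpose.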
Still, I think you are on to something. I had already noticed that querying one collection at a time works. For example, if I change my query oh-so-slightly from this:

"....collection=2014_04,2014_03...."

to this:

"...collection=2014_04...."

then the results are correct 100% of the time. I think substantively this is the same as specifying the name of the shard since, again, in my test environment I only have one shard per collection anyway.

I should mention that the "2014_03" collection is empty. 0 documents. All 3 documents which satisfy the facet range are in the "2014_04" collection. So, it's a real head-scratcher that introducing that collection name into the query makes the results misbehave.

Kind regards,
David

On Tuesday, December 16, 2014 2:25 PM, Erick Erickson <erickerick...@gmail.com> wrote:

bq: Facet counts include deleted documents until the segments merge

Whoa! Facet counts do _not_ require segment merging to be accurate. What merging does is remove the _term_ information associated with deleted documents, and remove their contribution to the TF/IDF scores.

David:
Hmmm, what happens if you direct the query not only to a single collection, but to a single shard? Add &distrib=false to the query and point it to each of your replicas (one collection at a time). The expectation is that each replica for a slice within a collection has identical documents.

One possibility is that somehow your shards are out of sync on a collection. So the internal load balancing that happens sometimes sends the query to one replica and sometimes to another. 2 replicas (leader and follower) and 50% failure, coincidence?

That just bumps the question up another level of course, the next question is _why_ is the shard out of sync. So in that case I'd issue a commit to all the collections on the off chance that somehow that didn't happen and try again (very low probability that this is the root cause, but you never know).
But it sure sounds like one replica doesn't agree with another, so the above will give us a place to look.

Best,
Erick

On Tue, Dec 16, 2014 at 12:12 PM, David Smith <dsmiths...@yahoo.com.invalid> wrote:
> Alex,
> Good suggestion, but in this case, no. This example is from a cleanroom-type
> test environment where the collections have very recently been created, there
> are only 4 documents total across all collections, and no deletes have been
> issued.
> Kind regards,
> David
>
>
> On Tuesday, December 16, 2014 12:01 PM, Alexandre Rafalovitch
> <arafa...@gmail.com> wrote:
>
>
> Facet counts include deleted documents until the segments merge. Could that
> be an issue?
>
> Regards,
> Alex
> On 16/12/2014 12:18 pm, "David Smith" <dsmiths...@yahoo.com.invalid> wrote:
>
>> I have a prototype SolrCloud 4.10.2 setup with 13 collections (of 1
>> replica, 1 shard each) and a separate 1-node Zookeeper 3.4.6.
>> The very first app test case I wrote is failing intermittently in this
>> environment, when I only have 4 documents ingested into the cloud.
>> I dug in and found when I query against multiple collections, using the
>> "collection=" parameter, the aggregates I request are correct about 50% of
>> the time. The other 50% of the time, the aggregate returned by Solr is not
>> correct. Note this is for the identical query. In other words, I can run
>> the same query multiple times in a row, and get different answers.
>>
>> The simplest version of the query that still exhibits the odd behavior is
>> as follows:
>>
>> http://192.168.59.103:8985/solr/query_handler/query?facet.range=eventDate&f.eventDate.facet.range.end=2014-12-31T23:59:59.999Z&f.eventDate.facet.range.gap=%2B1DAY&fl=eventDate,id&start=0&collection=2014_04,2014_03&rows=10&f.eventDate.facet.range.start=2014-01-01T00:00:00.000Z&q=*:*&f.eventDate.facet.mincount=1&facet=true
>>
>> When it SUCCEEDS, the aggregate correctly appears like this:
>>
>> "facet_counts":{
>>   "facet_queries":{},
>>   "facet_fields":{},
>>   "facet_dates":{},
>>   "facet_ranges":{
>>     "eventDate":{
>>       "counts":[
>>         "2014-04-01T00:00:00Z",3],
>>       "gap":"+1DAY",
>>       "start":"2014-01-01T00:00:00Z",
>>       "end":"2015-01-01T00:00:00Z"}},
>>   "facet_intervals":{}}}
>>
>> When it FAILS, note that the counts[] array is empty:
>>
>> "facet_counts":{
>>   "facet_queries":{},
>>   "facet_fields":{},
>>   "facet_dates":{},
>>   "facet_ranges":{
>>     "eventDate":{
>>       "counts":[],
>>       "gap":"+1DAY",
>>       "start":"2014-01-01T00:00:00Z",
>>       "end":"2015-01-01T00:00:00Z"}},
>>   "facet_intervals":{}}}
>>
>> If I further simplify the query, by removing range options or reducing to
>> one (1) collection name, then the problem goes away.
>>
>> The solr logs are clean at INFO level, and there is no substantive
>> difference in log output when the query succeeds vs fails, leaving me
>> stumped where to look next. Suggestions welcome.
>> Regards,
>> David
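
To put a number on the "about 50%" failure rate, and to try the &distrib=false check suggested earlier in the thread, something along these lines might help. It is a rough sketch, not a tested tool: the function names are invented for illustration, the query parameters are copied from the failing URL above, and it assumes the handler returns JSON with the "facet_counts" structure shown in the quoted responses.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_query_url(handler_url, collections, distrib=True):
    """Rebuild the range-facet query from the thread for the given collections."""
    params = [
        ("q", "*:*"),
        ("start", 0),
        ("rows", 10),
        ("fl", "eventDate,id"),
        ("facet", "true"),
        ("facet.range", "eventDate"),
        ("f.eventDate.facet.range.start", "2014-01-01T00:00:00.000Z"),
        ("f.eventDate.facet.range.end", "2014-12-31T23:59:59.999Z"),
        ("f.eventDate.facet.range.gap", "+1DAY"),
        ("f.eventDate.facet.mincount", 1),
        ("collection", ",".join(collections)),
    ]
    if not distrib:
        # Restrict the query to the core that receives it, as Erick suggested.
        params.append(("distrib", "false"))
    return f"{handler_url}?{urlencode(params)}"


def range_counts(response, field="eventDate"):
    """Pull the counts list for one range facet out of a parsed response."""
    return response["facet_counts"]["facet_ranges"][field]["counts"]


def tally(responses, field="eventDate"):
    """Return (non_empty, empty): how many responses had any range counts."""
    non_empty = sum(1 for r in responses if range_counts(r, field))
    return non_empty, len(responses) - non_empty


def run_repeatedly(url, attempts=20):
    """Issue the same query several times and tally the outcomes."""
    responses = []
    for _ in range(attempts):
        with urlopen(url) as resp:
            responses.append(json.load(resp))
    return tally(responses)
```

Calling `run_repeatedly(build_query_url("http://192.168.59.103:8985/solr/query_handler/query", ["2014_04", "2014_03"]))` would return the pair (non-empty, empty). Comparing that with the same call for `["2014_04"]` alone, or with `distrib=False` against each core directly, should show whether the empty results only appear when the empty 2014_03 collection is part of the request.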