Re: Facet in SOLR Cloud vs Core

Pablo Anzorena Thu, 07 Jul 2016 12:01:38 -0700

As long as you don't shard your index, you will have no problem migrating
to SolrCloud.


The problem with shards appears in the following scenario (note that
the problem below also applies in a standalone Solr environment with
distributed search):

Shard1: DATA_SOURCE1 (3 docs), DATA_SOURCE2 (2 docs), DATA_SOURCE3 (2 docs).
Shard2: DATA_SOURCE3 (2 docs), DATA_SOURCE2 (1 doc).

If you run a distributed search across these two shards, faceting on
dataSourceName with a limit of 1, it will ask for the top 1 in the first
shard (DATA_SOURCE1 (3 docs)) and for the top 1 in the second shard
(DATA_SOURCE3 (2 docs)). After that it will merge the results and return
DATA_SOURCE1 (3 docs), when it should have returned DATA_SOURCE3 (4 docs).
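
To make the arithmetic concrete, here is a rough Python sketch of the merge
described above. It is only a simplified model of this scenario, not Solr's
actual code, and it ignores any over-requesting or refinement a real server
may perform. The names SHARDS, naive_top_n and exact_top_n are made up for
illustration.

```python
from collections import Counter

# Per-shard facet counts copied from the example above.
SHARDS = {
    "shard1": {"DATA_SOURCE1": 3, "DATA_SOURCE2": 2, "DATA_SOURCE3": 2},
    "shard2": {"DATA_SOURCE3": 2, "DATA_SOURCE2": 1},
}


def naive_top_n(shards, limit):
    """Ask every shard for its own top `limit` values and merge only those."""
    merged = Counter()
    for counts in shards.values():
        for value, count in Counter(counts).most_common(limit):
            merged[value] += count
    return merged.most_common(limit)


def exact_top_n(shards, limit):
    """Sum the complete counts from every shard before applying the limit."""
    merged = Counter()
    for counts in shards.values():
        merged.update(counts)
    return merged.most_common(limit)


print(naive_top_n(SHARDS, 1))  # [('DATA_SOURCE1', 3)]
print(exact_top_n(SHARDS, 1))  # [('DATA_SOURCE3', 4)]
```

The naive merge never sees the 2 docs for DATA_SOURCE3 in Shard1, because
that shard only reported its own top value.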

Summarizing: if you run a distributed search with a facet.limit, there is
a chance that the counts are not correct (this also applies to stats).

2016-07-07 15:28 GMT-03:00 Whelan, Andy <awhe...@srcinc.com>:

> Hello,
>
> I am somewhat of a novice when it comes to using SOLR in a
> distributed SolrCloud environment. My team and I are doing development work
> with a SOLR core. We will shortly be transitioning over to a SolrCloud
> environment.
>
> My question specifically has to do with Facets in a SOLR cloud/collection
> (distributed environment). The core I am working with has a field
> "dataSourceName" defined as follows in its schema.xml file.
>
> <field name="dataSourceName" type="string" indexed="true" stored="true"
> required="true"/>
>
> I am using the following facet query, which works fine in my core-based
> index:
>
>
> http://localhost:8983/solr/gamra/select?q=*:*&rows=0&facet=true&facet.field=dataSourceName
>
> It returns counts for each distinct dataSourceName as follows (which is
> the desired behavior).
>
> <lst name="facet_fields">
>        <lst name="dataSourceName">
>           <int name="DATA_SOURCE1">169</int>
>           <int name="DATA_SOURCE2">121</int>
>           <int name="DATA_SOURCE3">68</int>
>        </lst>
> </lst>
>
> I am wondering if this should work fine in the SOLR Cloud as well?  Will
> this method give me accurate counts out of the box in a SOLR Cloud
> configuration?
>
> Thanks
> -Andrew
>
> PS: The reason I ask is because I know there is some estimating performed
> in certain cases for the Facet "unique" function (as is outlined here:
> http://yonik.com/solr-count-distinct/ ). So I guess I am wondering why
> folks wouldn't just do what I have done vs going through the trouble of
> using the unique(dataSourceName) function?
>
>
>
