The collection has 12 shards distributed across 12 physical nodes (24G heap each, 
32G RAM), with no replication. All caches are disabled in solrconfig.xml: the 
indexing rate is about 2000 docs/s, which makes the caches useless.




At the time of the perf test the number of docs was 34M (it is now 54M, and the 
set will grow to roughly 600 million) with 7M (and growing) unique keys. I'm 
indexing docs with a url and a user_id.
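
For a rough idea of the per-shard size behind these totals, here is a minimal sketch. It assumes the documents are spread evenly across the 12 shards, which is only an assumption; the real distribution depends on the routing and may be skewed. The function name is made up for illustration.

```python
# Back-of-the-envelope per-shard sizes.
# Assumption: documents are evenly distributed over the shards.
SHARDS = 12


def docs_per_shard(total_docs: int, shards: int = SHARDS) -> float:
    """Average number of documents held by one shard."""
    return total_docs / shards


if __name__ == "__main__":
    totals = [
        ("perf test", 34_000_000),
        ("now", 54_000_000),
        ("target", 600_000_000),
    ]
    for label, total in totals:
        print(f"{label}: ~{docs_per_shard(total) / 1e6:.1f}M docs per shard")
```

Under that assumption each shard held a little under 3M docs during the test and would hold about 50M at the target size.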





{
  name: "url_encoded",
  type: "string",
  docValues: true,
  indexed: true,
  stored: true
},
{
  name: "user_id",
  type: "tlong",
  docValues: true,
  multiValued: false,
  indexed: true,
  stored: true
},





The query is simple: aggregate by url, with a subfacet on each url to estimate 
the number of unique users.




I’m using Solr 5.3.1.




- Normal query (I guess it uses the DVs under the hood):
json.facet={url:{type:terms,field:url,limit:-1,sort:{index:asc},facet:{users:'hll(user_id)'}}}

- Streaming query:
json.facet={url:{type:terms,field:url,limit:-1,sort:{index:asc},facet:{users:'hll(user_id)'},method:stream}}
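
To make the two variants easy to reproduce, here is a minimal sketch that builds both requests as strict JSON. The host, port, collection name, and the `q=*:*` / `rows=0` parameters are assumptions for illustration and are not taken from the actual test setup; the helper names are made up.

```python
import json
from urllib.parse import urlencode

# Hypothetical endpoint: host, port and collection name are placeholders.
SOLR_SELECT = "http://localhost:8983/solr/collection1/select"


def build_facet(stream: bool = False) -> dict:
    """Build the json.facet body: terms facet on url with an hll(user_id) subfacet."""
    url_facet = {
        "type": "terms",
        "field": "url",
        "limit": -1,
        "sort": {"index": "asc"},
        "facet": {"users": "hll(user_id)"},
    }
    if stream:
        # The only difference between the two queries.
        url_facet["method"] = "stream"
    return {"url": url_facet}


def build_url(stream: bool = False) -> str:
    """Return the full request URL (q and rows are assumed, not from the test)."""
    params = {
        "q": "*:*",
        "rows": 0,
        "wt": "json",
        "json.facet": json.dumps(build_facet(stream)),
    }
    return SOLR_SELECT + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_url(stream=False))
    print(build_url(stream=True))
```

Fetching each URL a few times and timing it separately would also show whether the first request behaves differently from later ones.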




This is a perf test to see whether Solr has the capacity to aggregate the 600M 
urls with their unique users, and what the average response time would be 
(minutes are acceptable, but the lower the better).


—/Yago Riveiro

On Tue, Dec 22, 2015 at 3:27 PM, Yonik Seeley <ysee...@gmail.com> wrote:

> On Tue, Dec 22, 2015 at 6:06 AM, Yago Riveiro <yago.rive...@gmail.com> wrote:
>> I’m surprised with the difference of speed between DV and stream, the same 
>> query (aggregate 7M unique keys) with stream method takes 21s and with DV is 
>> about 3 minutes ...
> Wow - is this a "real" DV field, or one that was built on-demand in
> the FieldCache?  Were those times for the first request, or subsequent
> requests?
> What are the characteristics of that field... i.e. how many unique
> values in the shard (local index being queried) and how many typical
> values per field?
> And how many docs total on the shard?
> -Yonik
