The collection has 12 shards distributed across 12 physical nodes (24G heap each, 32G RAM), with no replication. All caches are disabled in solrconfig.xml: the indexing rate is about 2000 docs/s, which makes the caches useless.
At the time of the perf test the collection held 34M docs (it is now 54M, and the set will grow to roughly 600 million) with 7M (and growing) unique keys.

I’m indexing docs with a url and a user_id:

  { name: "url_encoded", type: "string", docValues: true, indexed: true, stored: true },
  { name: "user_id", type: "tlong", docValues: true, multiValued: false, indexed: true, stored: true },

The query is simple: aggregate by url, with a subfacet on each url to calculate the estimated number of unique users.

I’m using Solr 5.3.1.

- Normal query (I guess it uses the DVs under the hood):

  json.facet={url:{type:terms,field:url,limit:-1,sort:{index:asc},facet:{users:'hll(user_id)'}}}

- Streaming query:

  json.facet={url:{type:terms,field:url,limit:-1,sort:{index:asc},facet:{users:'hll(user_id)'}, method:stream}}

This is a perf test to see whether Solr has the capacity to aggregate the 600M urls with their unique users, and what the average response time would be (minutes are acceptable, but the lower the better).

--
/Yago Riveiro

On Tue, Dec 22, 2015 at 3:27 PM, Yonik Seeley <ysee...@gmail.com> wrote:
> On Tue, Dec 22, 2015 at 6:06 AM, Yago Riveiro <yago.rive...@gmail.com> wrote:
>> I’m surprised with the difference of speed between DV and stream, the same
>> query (aggregate 7M unique keys) with stream method takes 21s and with DV is
>> about 3 minutes ...
> Wow - is this a "real" DV field, or one that was built on-demand in
> the FieldCache? Were those times for the first request, or subsequent
> requests?
> What are the characteristics of that field... i.e. how many unique
> values in the shard (local index being queried) and how many typical
> values per field?
> And how many docs total on the shard?
> -Yonik
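
For anyone who wants to reproduce the comparison, here is a minimal sketch of how the two json.facet requests above could be built and timed from a script. It is only an illustration, not the harness used for the numbers in this thread: the q=*:* and rows=0 parameters, the helper names (build_facet, build_query_string, time_request) and the endpoint URL passed to time_request are all assumptions.

```python
import json
import time
import urllib.parse
import urllib.request


def build_facet(method=None):
    """Build the json.facet body: terms facet on url with an hll(user_id) subfacet."""
    url_facet = {
        "type": "terms",
        "field": "url",
        "limit": -1,
        "sort": {"index": "asc"},
        "facet": {"users": "hll(user_id)"},
    }
    if method is not None:
        # e.g. "stream", placed inside the url facet as in the query above
        url_facet["method"] = method
    return {"url": url_facet}


def build_query_string(facet):
    """Encode the request parameters. q=*:* and rows=0 are assumptions."""
    params = {
        "q": "*:*",
        "rows": 0,
        "wt": "json",
        "json.facet": json.dumps(facet),
    }
    return urllib.parse.urlencode(params)


def time_request(select_url, facet):
    """Send one request and return the elapsed wall-clock seconds.

    select_url is a hypothetical /select endpoint of the collection.
    """
    start = time.perf_counter()
    with urllib.request.urlopen(select_url + "?" + build_query_string(facet)) as resp:
        resp.read()
    return time.perf_counter() - start
```

Calling time_request twice in a row for build_facet() and then twice for build_facet("stream") would separate first-request times from subsequent-request times, which is one of the questions in the quoted reply.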