Here is a live example:



[yago@dev-1 ~]$ time curl -g \
  "http://dev-1:8983/solr/collection-perf/query?rows=0&q=date:[20150101%20TO%2020150115]&json.facet={label:{type:terms,field:url_encoded,limit:-1,sort:{index:asc},facet:{user:'hll(user_id)'}}}" \
  > dump

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 90.7M    0 90.7M    0     0  1039k      0 --:--:--  0:01:29 --:--:-- 21.2M

real    1m29.387s
user    0m0.065s
sys     0m0.338s




[yago@dev-1 ~]$ time curl -g \
  "http://dev-1/solr/collection-perf/query?rows=0&q=date:[20150101%20TO%2020150115]&json.facet={label:{type:terms,field:url_encoded,limit:-1,sort:{index:asc},method:stream,facet:{user:'hll(user_id)'}}}" \
  > dump-stream

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 90.7M    0 90.7M    0     0  9276k      0 --:--:--  0:00:10 --:--:-- 22.6M

real    0m10.026s
user    0m0.038s
sys     0m0.245s
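
For reference, the speed-up can be worked out from the two `real` timings above. This is only a minimal sketch in Python: the helper name `parse_real` is hypothetical, and it handles only the `XmY.YYYs` format that bash's `time` prints here.

```python
import re


def parse_real(stamp: str) -> float:
    """Convert a bash `time` stamp such as '1m29.387s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", stamp.strip())
    if match is None:
        raise ValueError(f"unrecognised time stamp: {stamp!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


# `real` values copied from the two runs above.
default_s = parse_real("1m29.387s")  # default facet method
stream_s = parse_real("0m10.026s")   # method:stream

speedup = default_s / stream_s
print(f"default: {default_s:.3f}s, stream: {stream_s:.3f}s, speed-up: {speedup:.1f}x")
```

For this single pair of runs the ratio comes out at roughly 8.9x. It is one measurement, not a benchmark.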





[yago@dev-1 ~]$ diff dump dump-stream

[yago@dev-1 ~]$
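
The two requests above differ only in the `method:stream` option. A rough Python sketch of how both URLs could be built from one description of the facet follows. The function names `facet_params` and `facet_url` are hypothetical. The sketch sends the facet as strict JSON rather than the relaxed syntax used on the command line, on the assumption that Solr accepts both forms. It only prints the URLs and does not contact the server.

```python
import json
from urllib.parse import urlencode


def facet_params(field: str, stream: bool) -> dict:
    """Build query parameters for a terms facet with an hll() sub-facet."""
    terms = {
        "type": "terms",
        "field": field,
        "limit": -1,
        "sort": {"index": "asc"},
        "facet": {"user": "hll(user_id)"},
    }
    if stream:
        # The only difference between the two requests in the transcript.
        terms["method"] = "stream"
    return {
        "rows": 0,
        "q": "date:[20150101 TO 20150115]",
        "json.facet": json.dumps({"label": terms}),
    }


def facet_url(base: str, field: str, stream: bool) -> str:
    """Return the full, URL-encoded query URL for the given collection base."""
    return f"{base}/query?{urlencode(facet_params(field, stream))}"


# Host and collection taken from the first request above.
BASE = "http://dev-1:8983/solr/collection-perf"
print(facet_url(BASE, "url_encoded", stream=False))
print(facet_url(BASE, "url_encoded", stream=True))
```

Because the URLs are fully percent-encoded, they can be passed to curl without `-g`.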




—/Yago Riveiro

On Tue, Dec 22, 2015 at 3:57 PM, Yago Riveiro <yago.rive...@gmail.com>
wrote:

> The collection has 12 shards distributed across 12 physical nodes (24G heap
> each, 32G RAM), with no replication. All caches are disabled in
> solrconfig.xml: the indexing rate is about 2000 docs/s, which makes caching
> useless.
> At the time of the perf test the collection held 34M docs (now 54M, and the
> set will grow to roughly 600 million) with 7M (and growing) unique keys.
> I’m indexing docs with a url and a user_id.
> {
>   name: "url_encoded",
>   type: "string",
>   docValues: true,
>   indexed: true,
>   stored: true
> },
> {
>   name: "user_id",
>   type: "tlong",
>   docValues: true,
>   multiValued: false,
>   indexed: true,
>   stored: true
> },
> The query is simple: aggregate by url, with a subfacet on each url to
> calculate the estimated number of unique users.
> I’m using Solr 5.3.1.
> - Normal query (I guess it uses the DVs under the hood):
> json.facet={url:{type:terms,field:url,limit:-1,sort:{index:asc},facet:{users:'hll(user_id)'}}}
> - Streaming query:
> json.facet={url:{type:terms,field:url,limit:-1,sort:{index:asc},facet:{users:'hll(user_id)'},method:stream}}
> This is a perf test to see whether Solr can aggregate the 600M urls with
> their unique users, and what the average response time would be (minutes
> are acceptable, but the lower the better).
> —/Yago Riveiro
> On Tue, Dec 22, 2015 at 3:27 PM, Yonik Seeley <ysee...@gmail.com> wrote:
>> On Tue, Dec 22, 2015 at 6:06 AM, Yago Riveiro <yago.rive...@gmail.com> wrote:
>>> I’m surprised with the difference of speed between DV and stream, the same 
>>> query (aggregate 7M unique keys) with stream method takes 21s and with DV 
>>> is about 3 minutes ...
>> Wow - is this a "real" DV field, or one that was built on-demand in
>> the FieldCache?  Were those times for the first request, or subsequent
>> requests?
>> What are the characteristics of that field... i.e. how many unique
>> values in the shard (local index being queried) and how many typical
>> values per field?
>> And how many docs total on the shard?
>> -Yonik
