Justin Sweeney <justin.sweene...@gmail.com> wrote:

[Index: 10 shards, 450M docs]

> We are creating a CloudSolrStream and when we call CloudSolrStream.open()
> we see that call being slower than we had hoped. For some queries, that
> call can take 800 ms. [...]

As far as I can see in the code, CloudSolrStream.open() opens streams against 
the relevant shards and checks if there is a result. The last step is important 
as that means the first batch of tuples must be calculated in the shards. 
Streaming works internally by having a sliding window of 30K tuples through the 
result set in each shard, so open() results in (up to) 30K tuples being 
calculated per shard before it returns. On the other hand, reading those first 
30K tuples should be very fast once open() has completed.
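
To illustrate where the cost lands, here is a toy model of a single shard with a
windowed result set. It is only a sketch and not Solr code: the class
WindowedShardStream, its methods and the integer "tuples" are made up for
illustration, and the only number taken from the above is the 30K window size.
It counts how many tuples have been calculated, instead of measuring time.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Toy model of one shard's windowed export; hypothetical, not Solr's implementation. */
class WindowedShardStream {
    private final int totalHits;   // matching documents in this imaginary shard
    private final int windowSize;  // tuples calculated per batch (30K in the mail)
    private final Deque<Integer> buffer = new ArrayDeque<>();
    private int nextDoc = 0;       // next document that has not been calculated yet
    private int computed = 0;      // tuples calculated so far

    WindowedShardStream(int totalHits, int windowSize) {
        this.totalHits = totalHits;
        this.windowSize = windowSize;
    }

    /** Pays for the first window up front, like the "is there a result" check. */
    void open() {
        fillWindow();
    }

    /** Returns the next tuple, or null when the result set is exhausted. */
    Integer read() {
        if (buffer.isEmpty()) {
            fillWindow(); // only now is the next window calculated
        }
        return buffer.poll();
    }

    int computedTuples() {
        return computed;
    }

    private void fillWindow() {
        int end = Math.min(nextDoc + windowSize, totalHits);
        while (nextDoc < end) {
            buffer.add(nextDoc++); // stand-in for calculating one tuple
            computed++;
        }
    }

    public static void main(String[] args) {
        WindowedShardStream shard = new WindowedShardStream(100_000, 30_000);

        shard.open();
        System.out.println(shard.computedTuples()); // 30000: all paid inside open()

        for (int i = 0; i < 30_000; i++) {
            shard.read(); // served from the buffer, nothing new is calculated
        }
        System.out.println(shard.computedTuples()); // still 30000

        shard.read(); // tuple 30001 triggers the next window
        System.out.println(shard.computedTuples()); // 60000
    }
}
```

In this model the work for the first window is done entirely inside open(), the
following 30K reads cost nothing extra, and the next window is only calculated
when the buffer runs dry.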

> We are currently using Solr 5, but we’ve also tried with Solr 7 and seen
> similar results.

Solr 7 has a performance regression for export, or rather a regression for 
DocValues that is very visible when using export (see 
https://issues.apache.org/jira/browse/SOLR-13013), so I would expect it to be 
slower than Solr 5. You could try Solr 8, where this regression should be 
mitigated somewhat.

- Toke Eskildsen
