Thanks for the quick response. We are generally seeing roughly the same export performance from Solr 5 and Solr 7, but I’ll check out Solr 8.
Joel - We are generally sorting on a tlong field, and the query criteria vary from matching everything (*:*) to filtering on a combination of a few tint and string fields. All 16 of our fields have docValues enabled. Does performance degrade as the number of docValues fields increases, or should that have no impact? Also, is the 30K sliding window configurable? In many cases we stream back a few thousand tuples, at most around 10K, and then cut off the stream. If we could configure the size of that window, would that speed things up?

Thanks again for the info.

On Sat, May 11, 2019 at 2:38 PM Joel Bernstein <joels...@gmail.com> wrote:

> Can you share the sort criteria and search query? The main strategy for
> improving performance of the export handler is adding more shards. This is
> different than with typical distributed search, where deep paging issues
> get worse as you add more shards. With the export handler, if you double the
> shards you double the pushing power. There are no deep paging drawbacks to
> adding more shards.
>
> On Sat, May 11, 2019 at 2:17 PM Toke Eskildsen <t...@kb.dk> wrote:
>
> > Justin Sweeney <justin.sweene...@gmail.com> wrote:
> >
> > [Index: 10 shards, 450M docs]
> >
> > > We are creating a CloudSolrStream and when we call CloudSolrStream.open()
> > > we see that call being slower than we had hoped. For some queries, that
> > > call can take 800 ms. [...]
> >
> > As far as I can see in the code, CloudSolrStream.open() opens streams
> > against the relevant shards and checks if there is a result. The last step
> > is important, as that means the first batch of tuples must be calculated in
> > the shards. Streaming works internally by having a sliding window of 30K
> > tuples through the result set in each shard, so open() results in (up to)
> > 30K tuples being calculated. On the other hand, getting the first 30K
> > tuples should be very fast after open().
> >
> > > We are currently using Solr 5, but we’ve also tried with Solr 7 and seen
> > > similar results.
> >
> > Solr 7 has a performance regression for export (or rather a regression for
> > DocValues that is very visible when using export; see
> > https://issues.apache.org/jira/browse/SOLR-13013), so I would expect it
> > to be slower than Solr 5. You could try with Solr 8, where this regression
> > should be mitigated somewhat.
> >
> > - Toke Eskildsen
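To make Toke's point about open() and the 30K window concrete, here is a minimal Java sketch of one shard's export stream. It is a simplified, hypothetical model, not Solr's ExportWriter or the SolrJ API: the class SlidingWindowExport and its methods are made up for illustration. It assumes each window is built by one pass over the matching documents that have not been exported yet, keeping the windowSize best by sort value (e.g. a tlong field) in a bounded heap, and that open() computes the first window eagerly, so that cost is paid before the first read().

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical model of ONE shard's export stream, NOT Solr's ExportWriter.
// Assumption: every window is built by one pass over the matching docs that
// have not been exported yet, keeping the `windowSize` best by sort value.
class SlidingWindowExport {
    private final long[] sortValues;         // sort field value per matching doc
    private final boolean[] exported;        // docs already handed out in a window
    private final int windowSize;            // 30K in the thread; a parameter here
    private final Comparator<Integer> order; // ascending value, ties by doc index
    private List<Integer> window = new ArrayList<>();
    private int pos = 0;
    private long docsVisited = 0;            // work counter: docs examined in all passes

    SlidingWindowExport(long[] sortValues, int windowSize) {
        if (windowSize < 1) {
            throw new IllegalArgumentException("windowSize must be >= 1");
        }
        this.sortValues = sortValues;
        this.exported = new boolean[sortValues.length];
        this.windowSize = windowSize;
        this.order = Comparator.<Integer>comparingLong(d -> sortValues[d])
                .thenComparingInt(d -> d);
    }

    // Models the behaviour described for open(): the first window is computed eagerly.
    void open() {
        fillWindow();
    }

    // Returns the next doc's sort value in ascending order, or null at the end.
    Long read() {
        if (pos == window.size()) {
            fillWindow();
            if (window.isEmpty()) {
                return null;
            }
        }
        return sortValues[window.get(pos++)];
    }

    long docsVisited() {
        return docsVisited;
    }

    private void fillWindow() {
        // Max-heap by `order`, so the root is the worst doc kept so far.
        PriorityQueue<Integer> heap = new PriorityQueue<>(order.reversed());
        for (int d = 0; d < sortValues.length; d++) {
            if (exported[d]) {
                continue; // already sent in an earlier window
            }
            docsVisited++;
            if (heap.size() < windowSize) {
                heap.add(d);
            } else if (order.compare(d, heap.peek()) < 0) {
                heap.poll(); // evict the current worst doc
                heap.add(d);
            }
        }
        window = new ArrayList<>(heap);
        window.sort(order); // emit the window in sort order
        for (int d : window) {
            exported[d] = true;
        }
        pos = 0;
    }
}
```

Under these assumptions, open() visits every matching document whatever the window size: with 100 matches and a window of 30, open() visits 100 documents, and reading 40 tuples brings the total to 170 because the second window rescans the 70 remaining documents. A smaller window would shrink the heap and the number of tuples materialized per pass, but for a 10K cut-off a window smaller than 10K would mean extra passes, so in this model it would not obviously make things faster.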