We've recently upgraded our SolrCloud (16 shards, 2 replicas) to 6.6.1 on our way to 7 and I'm getting surprising /stream results.
In one example I ran /select (wt=csv) and /stream [using search(...,wt=javabin)] with a query that returns 541 tuples. The select comes back in under a second; the stream takes 70 seconds. Should I expect this much difference?

I then ran /select and /stream with a query that returns 3.5M documents. The select takes 14 minutes. The stream takes just under 7 minutes using `curl`. When I use solrj I get:

    Truncated chunk ( expected size: 32768; actual size: 13830)
    org.apache.http.TruncatedChunkException: Truncated chunk ( expected size: 32768; actual size: 13830)
        at org.apache.http.impl.io.ChunkedInputStream.read(ChunkedInputStream.java:200)
        at org.apache.http.impl.io.ChunkedInputStream.read(ChunkedInputStream.java:215)
        at org.apache.http.impl.io.ChunkedInputStream.close(ChunkedInputStream.java:316)
        at org.apache.http.conn.BasicManagedEntity.streamClosed(BasicManagedEntity.java:164)
        at org.apache.http.conn.EofSensorInputStream.checkClose(EofSensorInputStream.java:228)
        ...

I found a reference to this being caused by a timeout of the HTTP session in CloudSolrStream, but couldn't find a bug in Jira on the topic. Digging around in the source (yay OSS), I found that I could get hold of the CloudSolrClient and raise its SOTimeout, so that's working now.

The documentation describes /stream as "returning data as soon as available", but there seems to be a HUGE startup latency. Any thoughts on how to reduce that?
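
To pin down how much of the /stream time is startup latency rather than transfer, one option is to time the first bytes of the response separately from the whole response. The following is only a rough sketch using the Python standard library; the helper names `time_stream` and `read_chunks` and the URL in the usage comment are made up for illustration and are not part of Solr or solrj.

```python
import time


def read_chunks(fileobj, size=32768):
    """Yield byte chunks from a binary file-like object until EOF.

    read1() returns whatever is already available (up to `size`),
    so the first chunk is not delayed until a full buffer arrives.
    """
    while True:
        chunk = fileobj.read1(size)
        if not chunk:
            break
        yield chunk


def time_stream(chunks, clock=time.monotonic):
    """Return (seconds_to_first_chunk, total_seconds, total_bytes).

    seconds_to_first_chunk is None if no data arrived at all.
    """
    start = clock()
    first = None
    total_bytes = 0
    for chunk in chunks:
        if not chunk:
            continue  # ignore empty chunks
        if first is None:
            first = clock() - start  # startup latency
        total_bytes += len(chunk)
    return first, clock() - start, total_bytes


# Hypothetical usage against a /stream endpoint (URL and timeout are
# placeholders, not values from my cluster):
#
#   import urllib.request
#   with urllib.request.urlopen(stream_url, timeout=600) as resp:
#       first, total, size = time_stream(read_chunks(resp))
#       print(first, total, size)
```

If the time to the first chunk is close to the total time, the delay is before any tuples are emitted; if it is small, the time is going into the transfer itself.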