In this scenario the /export handler continues to export results until it encounters a "Broken Pipe" exception. This exception is trapped and ignored rather than logged, because a client disconnecting early is not considered an error.
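To illustrate the pattern: a minimal sketch of that server-side behavior, not Solr's actual ExportWriter code. The class, method, and the use of a raw OutputStream over a list of serialized docs are illustrative assumptions.

import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;

class BrokenPipeSketch {
    // Streams documents until the result set is exhausted or the client
    // disconnects, which surfaces as a broken-pipe style IOException.
    void export(OutputStream out, List<String> docs) {
        try {
            for (String doc : docs) {
                out.write(doc.getBytes(StandardCharsets.UTF_8));
                out.flush(); // pushes bytes to the socket; fails once the peer is gone
            }
        } catch (IOException brokenPipe) {
            // Trapped and ignored rather than logged: an early client
            // disconnect is expected behavior, not an error condition.
        }
    }
}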
Joel Bernstein
http://joelsolr.blogspot.com/

On Fri, May 12, 2017 at 2:10 PM, Susmit Shukla <shukla.sus...@gmail.com> wrote:
> Hi,
>
> I have a question regarding the Solr /export handler. Here is the scenario:
> I want to use the /export handler because I only need sorted data and this
> is the fastest way to get it. I am doing multi-level joins over streams
> backed by the /export handler. I know the number of top-level records to
> be retrieved, but not the number for each individual stream rolling up to
> the final result.
> I observed that calling close() on an /export stream is too expensive: it
> reads the stream to the very end of the hits. Assuming there are 100
> million hits per stream, if the first 1k records are found after the joins
> and we call close() at that point, draining the rest would take many
> minutes or even hours. Currently I have put the close() call in a
> different thread - basically fire and forget. But the cluster is badly
> strained by the unnecessary reads.
>
> Internally, streaming uses HttpClient's ChunkedInputStream, which has to
> be drained in the close() call. From the server's point of view, though,
> it should stop sending data once close() has been issued.
> The read() call inside ChunkedInputStream's close() method is
> indistinguishable from a real read(). It would be very useful if the
> /export handler stopped sending data after close().
>
> Another option would be to use the /select handler and get into the
> business of managing a custom cursor mark that is based on the stream sort
> and is reset until it fetches the required records at the topmost level.
>
> Any thoughts?
>
> Thanks,
> Susmit
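For the /select + cursorMark alternative Susmit mentions, a minimal SolrJ sketch follows. The collection URL, sort field, page size, and stopping count are illustrative assumptions, not values from the thread; cursorMark does require the sort to include a uniqueKey tiebreaker.

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.params.CursorMarkParams;

public class CursorMarkSketch {
    public static void main(String[] args) throws Exception {
        try (HttpSolrClient client =
                 new HttpSolrClient.Builder("http://localhost:8983/solr/collection1").build()) {
            SolrQuery q = new SolrQuery("*:*");
            q.setRows(1000);                                // page size per request
            q.setSort(SolrQuery.SortClause.asc("id"));      // uniqueKey tiebreaker required
            String cursor = CursorMarkParams.CURSOR_MARK_START; // "*"
            long fetched = 0, wanted = 1000;                // stop once enough records arrive
            while (fetched < wanted) {
                q.set(CursorMarkParams.CURSOR_MARK_PARAM, cursor);
                QueryResponse rsp = client.query(q);
                fetched += rsp.getResults().size();
                String next = rsp.getNextCursorMark();
                if (cursor.equals(next)) break;             // cursor unchanged: no more results
                cursor = next;
            }
        } // closing here is cheap: no unread tail of hits to drain
    }
}

Unlike the /export stream, stopping simply means not issuing the next paged request, so there is nothing expensive to close.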