Hi Joel, I was using CloudSolrStream for the above test. Below is the call stack.

at org.apache.http.impl.io.ChunkedInputStream.read(ChunkedInputStream.java:215)
at org.apache.http.impl.io.ChunkedInputStream.close(ChunkedInputStream.java:316)
at org.apache.http.impl.execchain.ResponseEntityProxy.streamClosed(ResponseEntityProxy.java:128)
at org.apache.http.conn.EofSensorInputStream.checkClose(EofSensorInputStream.java:228)
at org.apache.http.conn.EofSensorInputStream.close(EofSensorInputStream.java:174)
at sun.nio.cs.StreamDecoder.implClose(StreamDecoder.java:378)
at sun.nio.cs.StreamDecoder.close(StreamDecoder.java:193)
at java.io.InputStreamReader.close(InputStreamReader.java:199)
at org.apache.solr.client.solrj.io.stream.JSONTupleStream.close(JSONTupleStream.java:91)
at org.apache.solr.client.solrj.io.stream.SolrStream.close(SolrStream.java:186)
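For reference, the client code is roughly the following (a minimal sketch, assuming a SolrJ 6.x-style CloudSolrStream constructor; the zkHost, collection, and field names are placeholders):

    import org.apache.solr.client.solrj.io.Tuple;
    import org.apache.solr.client.solrj.io.stream.CloudSolrStream;
    import org.apache.solr.common.params.ModifiableSolrParams;

    public class ExportReadTest {
        public static void main(String[] args) throws Exception {
            ModifiableSolrParams params = new ModifiableSolrParams();
            params.set("q", "*:*");
            params.set("fl", "id");
            params.set("sort", "id asc");
            params.set("qt", "/export"); // route the stream to the /export handler

            CloudSolrStream stream =
                new CloudSolrStream("zkhost:2181", "collection1", params);
            try {
                stream.open();
                Tuple first = stream.read(); // read only the first tuple
                System.out.println(first.getString("id"));
            } finally {
                stream.close(); // this is the call that drains the whole chunked response
            }
        }
    }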
Thanks,
Susmit

On Sat, May 13, 2017 at 10:48 AM, Joel Bernstein <joels...@gmail.com> wrote:

> I was just reading the Java docs on the ChunkedInputStream.
>
> "Note that this class NEVER closes the underlying stream"
>
> In that scenario the /export would indeed continue to send data. I think
> we can consider this an anti-pattern for the /export handler currently.
>
> I would suggest using one of the Streaming Clients to connect to the
> export handler. Both CloudSolrStream and SolrStream interact with the
> /export handler in the way that it expects.
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Sat, May 13, 2017 at 12:28 PM, Susmit Shukla <shukla.sus...@gmail.com>
> wrote:
>
> > Hi Joel,
> >
> > I did not observe that. On calling close() on the stream, it cycled
> > through all the hits that the /export handler calculated.
> > E.g. with a *:* query and the export handler on a 100k-document index,
> > I could see the 100kth record printed in the http wire debug log,
> > although close() was called after reading the 1st tuple. The time taken
> > for the operation with the close() call was the same as if I had read
> > all 100k tuples.
> > As I have pointed out, close() on the underlying ChunkedInputStream
> > calls read(), and the Solr server probably has no way to distinguish it
> > from a read() happening during regular tuple reads.
> > I think there should be an abort() API for Solr streams that hooks into
> > httpmethod.abort() (sketched below). That would enable the client to
> > disconnect early, and that would probably disconnect the underlying
> > socket, so there would be no leaks.
> >
> > Thanks,
> > Susmit
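For illustration, the abort() hook suggested above might look something like this (a hypothetical sketch - SolrStream exposes no such method today, and the request field is illustrative; HttpRequestBase.abort() is a real Apache HttpClient 4.x API):

    import org.apache.http.client.methods.HttpGet;

    // Hypothetical wrapper around the request backing an open stream.
    public class AbortableTupleStream {
        private final HttpGet request; // illustrative: the HTTP request behind the stream

        public AbortableTupleStream(HttpGet request) {
            this.request = request;
        }

        // Abort instead of close: abort() tears down the connection without
        // draining the chunked body, so the server hits a broken pipe on its
        // next write and stops exporting.
        public void abort() {
            request.abort();
        }
    }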
> > On Sat, May 13, 2017 at 7:42 AM, Joel Bernstein <joels...@gmail.com>
> > wrote:
> >
> > > If the client closes the connection to the export handler then this
> > > exception will occur automatically on the server.
> > >
> > > Joel Bernstein
> > > http://joelsolr.blogspot.com/
> > >
> > > On Sat, May 13, 2017 at 1:46 AM, Susmit Shukla <shukla.sus...@gmail.com>
> > > wrote:
> > >
> > > > Hi Joel,
> > > >
> > > > Thanks for the insight. How can this exception be thrown/forced
> > > > from the client side? The client can't do a System.exit() as it is
> > > > running as a webapp.
> > > >
> > > > Thanks,
> > > > Susmit
> > > >
> > > > On Fri, May 12, 2017 at 4:44 PM, Joel Bernstein <joels...@gmail.com>
> > > > wrote:
> > > >
> > > > > In this scenario the /export handler continues to export results
> > > > > until it encounters a "Broken Pipe" exception. This exception is
> > > > > trapped and ignored rather than logged, as it's not considered an
> > > > > error if the client disconnects early.
> > > > >
> > > > > Joel Bernstein
> > > > > http://joelsolr.blogspot.com/
> > > > >
> > > > > On Fri, May 12, 2017 at 2:10 PM, Susmit Shukla
> > > > > <shukla.sus...@gmail.com> wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > I have a question regarding the Solr /export handler. Here is
> > > > > > the scenario -
> > > > > > I want to use the /export handler - I only need sorted data,
> > > > > > and this is the fastest way to get it. I am doing multiple-level
> > > > > > joins with streams over the /export handler. I know the number
> > > > > > of top-level records to be retrieved, but not the number for
> > > > > > each individual stream rolling up to the final result.
> > > > > > I observed that calling close() on an /export stream is too
> > > > > > expensive: it reads the stream to the very end of the hits.
> > > > > > Assuming there are 100 million hits for each stream, the first
> > > > > > 1k records are found after the joins, and we call close() after
> > > > > > that, it would take many minutes/hours to finish.
> > > > > > Currently I have put the close() call in a different thread -
> > > > > > basically fire and forget. But the cluster is very strained
> > > > > > because of the unnecessary reads.
> > > > > >
> > > > > > Internally, streaming uses HttpClient's ChunkedInputStream, and
> > > > > > it has to be drained in the close() call. But from the server's
> > > > > > point of view, it should stop sending more data once close()
> > > > > > has been issued.
> > > > > > There is a read() call in the close() method of
> > > > > > ChunkedInputStream that is indistinguishable from a real
> > > > > > read(). If the /export handler stopped sending more data after
> > > > > > close(), that would be very useful.
> > > > > >
> > > > > > Another option would be to use the /select handler and get into
> > > > > > the business of managing a custom cursor mark that is based on
> > > > > > the stream sort and is re-issued until it fetches the required
> > > > > > records at the topmost level (see the sketch below).
> > > > > >
> > > > > > Any thoughts?
> > > > > >
> > > > > > Thanks,
> > > > > > Susmit
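For completeness, a minimal sketch of the /select + cursorMark alternative mentioned above (assuming SolrJ's CloudSolrClient builder and CursorMarkParams; the zkHost, collection, sort field, and page size are placeholders, and the sort must be total, i.e. end on the uniqueKey field):

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.CloudSolrClient;
    import org.apache.solr.client.solrj.response.QueryResponse;
    import org.apache.solr.common.params.CursorMarkParams;

    public class CursorPager {
        public static void main(String[] args) throws Exception {
            try (CloudSolrClient client =
                     new CloudSolrClient.Builder().withZkHost("zkhost:2181").build()) {
                client.setDefaultCollection("collection1");

                SolrQuery query = new SolrQuery("*:*");
                query.setRows(1000);
                query.setSort("id", SolrQuery.ORDER.asc); // uniqueKey tiebreaker required

                String cursorMark = CursorMarkParams.CURSOR_MARK_START; // "*"
                boolean done = false;
                while (!done) {
                    query.set(CursorMarkParams.CURSOR_MARK_PARAM, cursorMark);
                    QueryResponse rsp = client.query(query);
                    String nextCursorMark = rsp.getNextCursorMark();
                    // process rsp.getResults() here; the client can simply stop
                    // issuing requests once enough top-level records are joined,
                    // with no expensive drain on exit
                    done = cursorMark.equals(nextCursorMark);
                    cursorMark = nextCursorMark;
                }
            }
        }
    }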