Hi Joel, I was using CloudSolrStream for the above test. Below is the call stack.

at org.apache.http.impl.io.ChunkedInputStream.read(ChunkedInputStream.java:215)
at org.apache.http.impl.io.ChunkedInputStream.close(ChunkedInputStream.java:316)
at org.apache.http.impl.execchain.ResponseEntityProxy.streamClosed(ResponseEntityProxy.java:128)
at org.apache.http.conn.EofSensorInputStream.checkClose(EofSensorInputStream.java:228)
at org.apache.http.conn.EofSensorInputStream.close(EofSensorInputStream.java:174)
at sun.nio.cs.StreamDecoder.implClose(StreamDecoder.java:378)
at sun.nio.cs.StreamDecoder.close(StreamDecoder.java:193)
at java.io.InputStreamReader.close(InputStreamReader.java:199)
at org.apache.solr.client.solrj.io.stream.JSONTupleStream.close(JSONTupleStream.java:91)
at org.apache.solr.client.solrj.io.stream.SolrStream.close(SolrStream.java:186)
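For reference, the client code is roughly the following (a minimal sketch, assuming a SolrJ 6.x-style CloudSolrStream constructor; the zkHost, collection, and field names are placeholders):

    import org.apache.solr.client.solrj.io.Tuple;
    import org.apache.solr.client.solrj.io.stream.CloudSolrStream;
    import org.apache.solr.common.params.ModifiableSolrParams;

    public class ExportReadTest {
        public static void main(String[] args) throws Exception {
            ModifiableSolrParams params = new ModifiableSolrParams();
            params.set("q", "*:*");
            params.set("fl", "id");
            params.set("sort", "id asc");
            params.set("qt", "/export"); // route the stream to the /export handler

            CloudSolrStream stream =
                new CloudSolrStream("zkhost:2181", "collection1", params);
            try {
                stream.open();
                Tuple first = stream.read(); // read only the first tuple
                System.out.println(first.getString("id"));
            } finally {
                stream.close(); // this is the call that drains the whole chunked response
            }
        }
    }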
Thanks,
Susmit

On Sat, May 13, 2017 at 10:48 AM, Joel Bernstein <joels...@gmail.com> wrote:

> I was just reading the Java docs on the ChunkedInputStream.
>
> "Note that this class NEVER closes the underlying stream"
>
> In that scenario the /export would indeed continue to send data. I think
> we can consider this an anti-pattern for the /export handler currently.
>
> I would suggest using one of the Streaming Clients to connect to the
> export handler. Both CloudSolrStream and SolrStream interact with the
> /export handler in the way that it expects.
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Sat, May 13, 2017 at 12:28 PM, Susmit Shukla <shukla.sus...@gmail.com>
> wrote:
>
> > Hi Joel,
> >
> > I did not observe that. On calling close() on the stream, it cycled
> > through all the hits that the /export handler calculated.
> > E.g. with a *:* query and the export handler on a 100k-document index,
> > I could see the 100kth record printed in the http wire debug log,
> > although close() was called after reading the 1st tuple. The time taken
> > for the operation with the close() call was the same as if I had read
> > all 100k tuples.
> > As I have pointed out, close() on the underlying ChunkedInputStream
> > calls read(), and the Solr server probably has no way to distinguish it
> > from a read() happening during regular tuple reads.
> > I think there should be an abort() API for Solr streams that hooks into
> > httpmethod.abort() (sketched below). That would enable the client to
> > disconnect early, and that would probably disconnect the underlying
> > socket, so there would be no leaks.
> >
> > Thanks,
> > Susmit
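For illustration, the abort() hook suggested above might look something like this (a hypothetical sketch - SolrStream exposes no such method today, and the request field is illustrative; HttpRequestBase.abort() is a real Apache HttpClient 4.x API):

    import org.apache.http.client.methods.HttpGet;

    // Hypothetical wrapper around the request backing an open stream.
    public class AbortableTupleStream {
        private final HttpGet request; // illustrative: the HTTP request behind the stream

        public AbortableTupleStream(HttpGet request) {
            this.request = request;
        }

        // Abort instead of close: abort() tears down the connection without
        // draining the chunked body, so the server hits a broken pipe on its
        // next write and stops exporting.
        public void abort() {
            request.abort();
        }
    }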
> > On Sat, May 13, 2017 at 7:42 AM, Joel Bernstein <joels...@gmail.com>
> > wrote:
> >
> > > If the client closes the connection to the export handler then this
> > > exception will occur automatically on the server.
> > >
> > > Joel Bernstein
> > > http://joelsolr.blogspot.com/
> > >
> > > On Sat, May 13, 2017 at 1:46 AM, Susmit Shukla <shukla.sus...@gmail.com>
> > > wrote:
> > >
> > > > Hi Joel,
> > > >
> > > > Thanks for the insight. How can this exception be thrown/forced
> > > > from the client side? The client can't do a System.exit() as it is
> > > > running as a webapp.
> > > >
> > > > Thanks,
> > > > Susmit
> > > >
> > > > On Fri, May 12, 2017 at 4:44 PM, Joel Bernstein <joels...@gmail.com>
> > > > wrote:
> > > >
> > > > > In this scenario the /export handler continues to export results
> > > > > until it encounters a "Broken Pipe" exception. This exception is
> > > > > trapped and ignored rather than logged, as it's not considered an
> > > > > error if the client disconnects early.
> > > > >
> > > > > Joel Bernstein
> > > > > http://joelsolr.blogspot.com/
> > > > >
> > > > > On Fri, May 12, 2017 at 2:10 PM, Susmit Shukla
> > > > > <shukla.sus...@gmail.com> wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > I have a question regarding the Solr /export handler. Here is
> > > > > > the scenario -
> > > > > > I want to use the /export handler - I only need sorted data,
> > > > > > and this is the fastest way to get it. I am doing multiple-level
> > > > > > joins with streams over the /export handler. I know the number
> > > > > > of top-level records to be retrieved, but not the number for
> > > > > > each individual stream rolling up to the final result.
> > > > > > I observed that calling close() on an /export stream is too
> > > > > > expensive: it reads the stream to the very end of the hits.
> > > > > > Assuming there are 100 million hits for each stream, the first
> > > > > > 1k records are found after the joins, and we call close() after
> > > > > > that, it would take many minutes/hours to finish.
> > > > > > Currently I have put the close() call in a different thread -
> > > > > > basically fire and forget. But the cluster is very strained
> > > > > > because of the unnecessary reads.
> > > > > >
> > > > > > Internally, streaming uses HttpClient's ChunkedInputStream, and
> > > > > > it has to be drained in the close() call. But from the server's
> > > > > > point of view, it should stop sending more data once close()
> > > > > > has been issued.
> > > > > > There is a read() call in the close() method of
> > > > > > ChunkedInputStream that is indistinguishable from a real
> > > > > > read(). If the /export handler stopped sending more data after
> > > > > > close(), that would be very useful.
> > > > > >
> > > > > > Another option would be to use the /select handler and get into
> > > > > > the business of managing a custom cursor mark that is based on
> > > > > > the stream sort and is re-issued until it fetches the required
> > > > > > records at the topmost level (see the sketch below).
> > > > > >
> > > > > > Any thoughts?
> > > > > >
> > > > > > Thanks,
> > > > > > Susmit
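For completeness, a minimal sketch of the /select + cursorMark alternative mentioned above (assuming SolrJ's CloudSolrClient builder and CursorMarkParams; the zkHost, collection, sort field, and page size are placeholders, and the sort must be total, i.e. end on the uniqueKey field):

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.CloudSolrClient;
    import org.apache.solr.client.solrj.response.QueryResponse;
    import org.apache.solr.common.params.CursorMarkParams;

    public class CursorPager {
        public static void main(String[] args) throws Exception {
            try (CloudSolrClient client =
                     new CloudSolrClient.Builder().withZkHost("zkhost:2181").build()) {
                client.setDefaultCollection("collection1");

                SolrQuery query = new SolrQuery("*:*");
                query.setRows(1000);
                query.setSort("id", SolrQuery.ORDER.asc); // uniqueKey tiebreaker required

                String cursorMark = CursorMarkParams.CURSOR_MARK_START; // "*"
                boolean done = false;
                while (!done) {
                    query.set(CursorMarkParams.CURSOR_MARK_PARAM, cursorMark);
                    QueryResponse rsp = client.query(query);
                    String nextCursorMark = rsp.getNextCursorMark();
                    // process rsp.getResults() here; the client can simply stop
                    // issuing requests once enough top-level records are joined,
                    // with no expensive drain on exit
                    done = cursorMark.equals(nextCursorMark);
                    cursorMark = nextCursorMark;
                }
            }
        }
    }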