Hi Joel,

I did not observe that. On calling close() on the stream, it cycled through all
the hits that the /export handler had calculated.
For example, with a *:* query and the export handler on a 100k-document index, I
could see the 100,000th record printed in the HTTP wire debug log even though
close() was called after reading the 1st tuple. The time taken for the operation
with the close() call was the same as if I had read all 100k tuples.
As I pointed out, close() on the underlying ChunkedInputStream calls read(), and
the Solr server probably has no way to distinguish that from the read() calls
made by regular tuple reads.
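For reference, this is roughly the read-one-tuple-then-close pattern I am testing
with (a minimal sketch; the base URL, collection name and fields are placeholders,
not our actual setup):

    import org.apache.solr.client.solrj.io.Tuple;
    import org.apache.solr.client.solrj.io.stream.SolrStream;
    import org.apache.solr.common.params.MapSolrParams;

    import java.util.HashMap;
    import java.util.Map;

    public class ExportCloseRepro {
        public static void main(String[] args) throws Exception {
            Map<String, String> props = new HashMap<>();
            props.put("q", "*:*");
            props.put("qt", "/export");        // route the request to the export handler
            props.put("sort", "id asc");
            props.put("fl", "id");

            // Placeholder base URL / collection -- substitute the real one.
            SolrStream stream = new SolrStream("http://localhost:8983/solr/collection1",
                    new MapSolrParams(props));
            try {
                stream.open();
                Tuple first = stream.read();   // read only the first tuple
                System.out.println(first.getString("id"));
            } finally {
                stream.close();                // observed: this ends up draining all
                                               // remaining tuples over the wire
            }
        }
    }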
I think there should be an abort() API for Solr streams that hooks into
HttpMethod.abort(). That would enable the client to disconnect early, and
aborting would presumably tear down the underlying socket so there would be no
leaks.
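
To make the abort() idea concrete, here is a rough sketch of the kind of early
disconnect I have in mind, using Apache HttpClient directly against the /export
handler (the URL and params are placeholders, and this deliberately bypasses the
TupleStream API just to show the idea):

    import org.apache.http.client.methods.CloseableHttpResponse;
    import org.apache.http.client.methods.HttpGet;
    import org.apache.http.impl.client.CloseableHttpClient;
    import org.apache.http.impl.client.HttpClients;

    import java.io.InputStream;

    public class ExportAbortSketch {
        public static void main(String[] args) throws Exception {
            // Placeholder URL -- point at a real collection's /export handler.
            String url = "http://localhost:8983/solr/collection1/export"
                    + "?q=*:*&sort=id+asc&fl=id&wt=json";

            CloseableHttpClient httpClient = HttpClients.createDefault();
            HttpGet get = new HttpGet(url);
            CloseableHttpResponse response = httpClient.execute(get);
            try {
                InputStream in = response.getEntity().getContent();
                byte[] buf = new byte[8192];
                int n = in.read(buf);          // read just the first chunk of tuples
                System.out.println(new String(buf, 0, Math.max(n, 0), "UTF-8"));
            } finally {
                // Abort instead of close(): the connection is discarded without
                // draining the rest of the chunked response body.
                get.abort();
                httpClient.close();
            }
        }
    }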

Thanks,
Susmit


On Sat, May 13, 2017 at 7:42 AM, Joel Bernstein <joels...@gmail.com> wrote:

> If the client closes the connection to the export handler then this
> exception will occur automatically on the server.
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Sat, May 13, 2017 at 1:46 AM, Susmit Shukla <shukla.sus...@gmail.com>
> wrote:
>
> > Hi Joel,
> >
> > Thanks for the insight. How can this exception be thrown/forced from the
> > client side? The client can't do a System.exit() as it is running as a webapp.
> >
> > Thanks,
> > Susmit
> >
> > On Fri, May 12, 2017 at 4:44 PM, Joel Bernstein <joels...@gmail.com>
> > wrote:
> >
> > > In this scenario the /export handler continues to export results until it
> > > encounters a "Broken Pipe" exception. This exception is trapped and ignored
> > > rather than logged, as it's not considered an error condition if the client
> > > disconnects early.
> > >
> > > Joel Bernstein
> > > http://joelsolr.blogspot.com/
> > >
> > > On Fri, May 12, 2017 at 2:10 PM, Susmit Shukla <shukla.sus...@gmail.com>
> > > wrote:
> > >
> > > > Hi,
> > > >
> > > > I have a question regarding the Solr /export handler. Here is the scenario -
> > > > I want to use the /export handler - I only need sorted data and this is the
> > > > fastest way to get it. I am doing multiple-level joins over streams backed
> > > > by the /export handler. I know the number of top-level records to be
> > > > retrieved, but not the number for each individual stream rolling up to the
> > > > final result.
> > > > I observed that calling close() on an /export stream is too expensive. It
> > > > reads the stream to the very end of the hits. Assuming there are 100 million
> > > > hits for each stream and the first 1k records are found after the joins,
> > > > calling close() at that point would take many minutes or hours to finish.
> > > > Currently I have put the close() call in a different thread - basically fire
> > > > and forget. But the cluster is very strained because of the unnecessary
> > > > reads.
> > > >
> > > > Internally, streaming uses HttpClient's ChunkedInputStream, and it has to be
> > > > drained in the close() call. But from the server's point of view, it should
> > > > stop sending more data once close() has been issued.
> > > > There is a read() call in the close() method of ChunkedInputStream that is
> > > > indistinguishable from a real read(). If the /export handler stopped sending
> > > > more data after close it would be very useful.
> > > >
> > > > Another option would be to use the /select handler and get into the business
> > > > of managing a custom cursorMark that is based on the stream sort and is reset
> > > > until the required records are fetched at the topmost level.
> > > >
> > > > Any thoughts?
> > > >
> > > > Thanks,
> > > > Susmit
> > > >
> > >
> >
>
