Hello,

I have some updates on this, but it's still not entirely clear to me
how to move forward.

The good news is that, between sources and decorators, data really
does seem to be streamed. I hope I tested this the right way: I simply
added a log message to ReducerStream saying "hey, I got this tuple".
I now have two nodes: nodeA with data and nodeB with a dummy
collection. If I hit nodeB's /stream endpoint and ask it for, say, a
unique() wrapping the previously mentioned expression (with 1s
sleeps), I see a log line from ReducerStream every second. Good.
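
For reference, the "instrumentation" was just a log line in
ReducerStream.read(), roughly like this (a minimal sketch; doRead()
stands in for the original method body, and I'm assuming an SLF4J
logger on the class):

    public Tuple read() throws IOException {
      Tuple tuple = doRead(); // the original read() logic, factored out
      if (!tuple.EOF) { // don't log the end-of-stream marker
        log.info("hey, I got this tuple: {}", tuple.fields);
      }
      return tuple;
    }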

Now, the final result (to the client, via curl) only reaches me after
N seconds (where N is the number of results I get). I did some more
digging on this front, too. Let's assume chunked encoding is
re-enabled (that's a must) and nothing else is changed (if I flush()
the FastWriter after, say, every tuple, then I get every tuple as it's
computed, but here I'm trying to explore the buffers). I've noticed
the following:
- the first response arrives after ~64KB, then I get chunks of 32KB
each
- at this point, if I set response.setBufferSize() in
HttpSolrCall.writeResponse() to a small value (say, 128), I get the
first reply after 32KB and then 8KB chunks (see the sketch after this
list)
- I thought that maybe in this context I could also lower BUFSIZE in
FastWriter, but that didn't seem to make any difference :(
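
For the record, the setBufferSize() experiment was just this, early in
HttpSolrCall.writeResponse() (illustrative only; 128 is simply the
value I tried, and per the Servlet API it has to be called before any
response content is committed):

    // shrink Jetty's response buffer for this request; Jetty's
    // default output buffer (32KB) matches the 32KB chunks above
    response.setBufferSize(128);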

That said, I'm not sure it's worth digging into these buffers any
deeper, because shrinking them might hurt other workloads (e.g.
regular searches or facets). It sounds like the way forward would be
manual flushing, with chunked encoding enabled. I could imagine adding
parameters along the lines of "flush every N tuples or M
milliseconds", evaluated per request, or at least set globally for the
/stream handler - something in the direction of the sketch below.
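
Purely as a sketch (the parameter names flushEveryTuples/flushEveryMs
and the writeTuple() helper are made up for illustration; the real
change would live wherever the response writer loops over tuples):

    int flushEveryTuples = req.getParams().getInt("flushEveryTuples", 0);
    long flushEveryMs = req.getParams().getLong("flushEveryMs", 0L);
    long lastFlush = System.currentTimeMillis();
    int sinceFlush = 0;
    while (true) {
      Tuple tuple = tupleStream.read();
      writeTuple(writer, tuple); // however the writer serializes one tuple
      if (tuple.EOF) {
        break;
      }
      sinceFlush++;
      boolean countHit = flushEveryTuples > 0 && sinceFlush >= flushEveryTuples;
      boolean timeHit = flushEveryMs > 0
          && System.currentTimeMillis() - lastFlush >= flushEveryMs;
      if (countHit || timeHit) {
        writer.flush(); // push buffered output to the client as a chunk
        sinceFlush = 0;
        lastFlush = System.currentTimeMillis();
      }
    }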

What do you think? Would such a patch, adding these parameters, be
welcome? It would still require chunked encoding, though - would
reverting SOLR-8669 be a problem? Or maybe there's a more elegant way
to enable chunked encoding, perhaps only for streams?

Best regards,
Radu
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/


On Mon, Jan 15, 2018 at 10:58 AM, Radu Gheorghe
<radu.gheor...@sematext.com> wrote:
> Hello fellow solr-users!
>
> Currently, if I do an HTTP request to receive some data via streaming
> expressions, like:
>
> curl --data-urlencode 'expr=search(test,
>                                    q="foo_s:*",
>                                    fl="foo_s",
>                                    sort="foo_s asc",
>                                    qt="/export")'
> http://localhost:8983/solr/test/stream
>
> I get all results at once. This is more obvious if I simply introduce
> a one-second sleep in CloudSolrStream: with three documents, the
> request takes about three seconds, and I get all three docs after
> three seconds.
>
> Instead, I would like to get documents in a more "streaming" way.
> For example, after X seconds, give me what you already have. Or if a
> Y-sized buffer fills up, give me all the tuples you have, then
> resume.
>
> Any ideas/opinions in terms of how I could achieve this? With or
> without changing Solr's code?
>
> Here's what I have so far:
> - this is normal with non-chunked HTTP/1.1: you get all results at
> once. If I revert this patch[1] and get Solr to use chunked
> encoding, I get partial results at what seems to be a fixed interval
> somewhere between 16KB and 32KB
> - I tried to manually change what I assume is a buffer size, but
> failed so far. I've tried changing Jetty's response.setBufferSize()
> in HttpSolrCall (maybe the wrong place to do it?) and also tried
> changing the default 8KB buffer in FastWriter
> - manually flushing the writer (in JSONResponseWriter) gives the
> expected results (in combination with chunking)
>
> The thing is, even if I manage to change the buffer size, I assume
> it will apply to all requests (not just streaming expressions);
> ideally it would be configurable per request. As for manual
> flushing, that would require changes to the streaming expressions
> themselves. Would that be the way to go? What do you think?
>
> [1] https://issues.apache.org/jira/secure/attachment/12787283/SOLR-8669.patch
>
> Best regards,
> Radu
> --
> Performance Monitoring * Log Analytics * Search Analytics
> Solr & Elasticsearch Support * http://sematext.com/
