Hello, I have some updates on this, but it's still not very clear to me how to move forward.

The good news is that, between sources and decorators, data really does seem to be streamed. I hope I tested this the right way: I simply added a log message to ReducerStream saying "hey, I got this tuple". Now I have two nodes, nodeA with data and nodeB with a dummy collection. If I hit nodeB's /stream endpoint and ask it for, say, a unique() wrapping the previously mentioned expression (with 1s sleeps), I see a log from ReducerStream every second. Good.
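For reference, the request against nodeB was roughly along these lines (the "dummy" collection name and the host are placeholders; the inner search() is the expression from my previous mail):

curl --data-urlencode 'expr=unique(search(test,
                                          q="foo_s:*",
                                          fl="foo_s",
                                          sort="foo_s asc",
                                          qt="/export"),
                                   over="foo_s")' http://localhost:8983/solr/dummy/stream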
Now, the final result (to the client, via curl) only gets to me after N seconds (where N is the number of results I get).

I did some more digging on this front, too. Let's assume we have chunked encoding re-enabled (that's a must) and no other change (if I flush() the FastWriter after, say, every tuple, then I do get every tuple as it's computed, but here I'm trying to explore the buffers). I've noticed the following:
- the first response comes after ~64KB, then I get chunks of 32KB each
- at this point, if I set response.setBufferSize() in HttpSolrCall.writeResponse() to a small size (say, 128), I get the first reply after 32KB and then 8KB chunks
- I thought that maybe in this context I could also lower BUFSIZE in FastWriter, but that didn't seem to make any change :(

That said, I'm not sure it's worth digging into these buffers any deeper, because shrinking them might negatively affect other requests (e.g. regular searches or facets).

It sounds like the way forward would be the manual flushing approach, with chunked encoding enabled. I could imagine adding parameters along the lines of "flush every N tuples or M milliseconds", which could be set per request, or at least globally for the /stream handler.
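To make the idea concrete, here is a rough sketch of the flushing logic I have in mind, somewhere in the code path that writes tuples to the response. None of this exists in Solr today: writeTuples()/writeTuple() and the flushTuples/flushMillis parameters are made-up names, and the real tuple serialization would stay whatever the response writer does now.

import java.io.IOException;
import java.io.Writer;
import java.util.concurrent.TimeUnit;

import org.apache.solr.client.solrj.io.Tuple;
import org.apache.solr.client.solrj.io.stream.TupleStream;

// Sketch only: flush the response writer every N tuples or M milliseconds,
// whichever comes first. Names are made up; this is not existing Solr code.
class TupleFlushSketch {

  void writeTuples(TupleStream stream, Writer writer,
                   int flushTuples, long flushMillis) throws IOException {
    long lastFlush = System.nanoTime();
    int sinceLastFlush = 0;

    for (Tuple tuple = stream.read(); !tuple.EOF; tuple = stream.read()) {
      writeTuple(writer, tuple); // however the response writer serializes tuples today
      sinceLastFlush++;

      long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - lastFlush);
      if (sinceLastFlush >= flushTuples || elapsedMs >= flushMillis) {
        writer.flush(); // pushes the buffered data out as a chunk (needs chunked encoding)
        sinceLastFlush = 0;
        lastFlush = System.nanoTime();
      }
    }
  }

  // Placeholder for the existing JSON serialization of a single tuple.
  private void writeTuple(Writer writer, Tuple tuple) throws IOException {
    writer.write(tuple.fields.toString());
    writer.write('\n');
  }
}

The only new bit compared to today would be the counter/timer around writer.flush(); everything else stays as it is.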
What do you think? Would such a patch, adding these parameters, be welcome? It would still require chunked encoding, though - would reverting SOLR-8669 be a problem? Or maybe there's a more elegant way to enable chunked encoding, perhaps only for streams?

Best regards,
Radu
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/

On Mon, Jan 15, 2018 at 10:58 AM, Radu Gheorghe <radu.gheor...@sematext.com> wrote:
> Hello fellow solr-users!
>
> Currently, if I do an HTTP request to receive some data via streaming
> expressions, like:
>
> curl --data-urlencode 'expr=search(test,
>                                    q="foo_s:*",
>                                    fl="foo_s",
>                                    sort="foo_s asc",
>                                    qt="/export")'
>      http://localhost:8983/solr/test/stream
>
> I get all results at once. This is more obvious if I simply introduce
> a one-second sleep in CloudSolrStream: with three documents, the
> request takes about three seconds, and I get all three docs after
> three seconds.
>
> Instead, I would like to get documents in a more "streaming" way. For
> example, after X seconds give me what you already have. Or if a
> Y-sized buffer fills up, give me all the tuples you have, then resume.
>
> Any ideas/opinions in terms of how I could achieve this? With or
> without changing Solr's code?
>
> Here's what I have so far:
> - this is normal with non-chunked HTTP/1.1. You get all results at
>   once. If I revert this patch [1] and get Solr to use chunked encoding,
>   I get partial results every... what seems to be a certain size between
>   16KB and 32KB
> - I couldn't find a way to manually change this... what I assume is a
>   buffer size, but failed so far. I've tried changing Jetty's
>   response.setBufferSize() in HttpSolrCall (maybe the wrong place to do
>   it?) and also tried changing the default 8KB buffer in FastWriter
> - manually flushing the writer (in JSONResponseWriter) gives the
>   expected results (in combination with chunking)
>
> The thing is, even if I manage to change the buffer size, I assume
> that will apply to all requests (not just streaming expressions). I
> assume that ideally it would be configurable per request. As for
> manual flushing, that would require changes to the streaming
> expressions themselves. Would that be the way to go? What do you
> think?
>
> [1] https://issues.apache.org/jira/secure/attachment/12787283/SOLR-8669.patch
>
> Best regards,
> Radu
> --
> Performance Monitoring * Log Analytics * Search Analytics
> Solr & Elasticsearch Support * http://sematext.com/