Hello Nicholas,

It looks like we are at around the same point. Here is my branch:
https://github.com/m-khl/solr-patches/tree/streaming
There are only two commits on top of it. And here is the test:
https://github.com/m-khl/solr-patches/blob/streaming/solr/core/src/test/org/apache/solr/response/ResponseStreamingTest.java
It streams increasing int ids without keeping them in heap. There is also
some unnecessary stuff with Digits(), but it's a work in progress, you know.

My use case is really specific:
* Suppose documents are inserted with increasing PKs, and I want to search
them and retrieve the results ordered by PK.
* In that case I get the sorting for free: the docs are already in PK order.
* I can also return pretty huge result sets, because I don't need to buffer
them in the response; I just stream them to the output as they turn up
during the search.

Core points:
*
https://github.com/m-khl/solr-patches/blob/streaming/solr/core/src/java/org/apache/solr/servlet/ResponseStreamingRequestParsers.java
steals the servlet output stream and puts it into the request
* a Solr component adds a PostFilter to the chain, which routes the
collected docs into DocSetStreamer
https://github.com/m-khl/solr-patches/blob/streaming/solr/core/src/java/org/apache/solr/handler/component/ResponseStreamerComponent.java
* and DocSetStreamer writes the collected docs' PKs into the output stream
https://github.com/m-khl/solr-patches/blob/streaming/solr/core/src/java/org/apache/solr/response/DocSetStreamer.java
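The chain above, boiled down to a Solr-free sketch (the interface and class names below are mine; the real code sits behind Solr's PostFilter/DelegatingCollector machinery): a filter collector forwards each collected doc id to a streamer that writes it out immediately instead of accumulating a DocSet.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.io.PrintWriter;

public class CollectorChainSketch {
    // Stand-in for the collector contract.
    interface SimpleCollector { void collect(int doc); }

    // Stand-in for the PostFilter-installed collector: it sits in the
    // chain and hands every collected doc to its delegate.
    static class DelegatingFilter implements SimpleCollector {
        private final SimpleCollector delegate;
        DelegatingFilter(SimpleCollector delegate) { this.delegate = delegate; }
        public void collect(int doc) { delegate.collect(doc); }
    }

    // Stand-in for DocSetStreamer: instead of buffering doc ids into a
    // DocSet, it writes each one straight to the (stolen) output stream.
    static class DocStreamer implements SimpleCollector {
        private final PrintWriter out;
        DocStreamer(OutputStream os) { out = new PrintWriter(os); }
        public void collect(int doc) { out.print(doc); out.print(' '); }
        void finish() { out.flush(); }
    }

    public static void main(String[] args) {
        ByteArrayOutputStream response = new ByteArrayOutputStream();
        DocStreamer streamer = new DocStreamer(response);
        SimpleCollector chain = new DelegatingFilter(streamer);
        for (int doc : new int[]{10, 11, 12}) { // the "search" feeding the chain
            chain.collect(doc);
        }
        streamer.finish();
        System.out.println(response); // doc ids appear in collection order
    }
}
```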

It seems it can deliver huge result sets with very little memory. It should
be damn scalable.

So, my plan is to output a response that is readable by the built-in
distributed search. But I'm stuck for the moment.

About chunked encoding, I have a kind of common-sense consideration: I
guess there is a buffer behind the servlet output stream; when the output
fits, the container forms an HTTP response with a Content-Length header,
and on overflow it switches to chunked encoding. So I believe, but could be
wrong (and even lying), that this machinery should stay behind the curtain.
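A toy model of that guess (this is my assumption about container behavior, not anything taken from the servlet spec or a real container): if the whole body fits in the response buffer before the response ends, the container can emit a Content-Length header; once the buffer overflows, it has to commit the headers early and fall back to chunked transfer encoding, since the length is no longer known.

```java
public class FramingGuess {
    // Pick the HTTP framing the way I suspect a servlet container does:
    // a body that fits the buffer gets a fixed Content-Length, while an
    // overflowing body gets chunked Transfer-Encoding, because its total
    // length is unknown when the headers are committed.
    static String framingFor(int bodyBytes, int bufferBytes) {
        return bodyBytes <= bufferBytes
                ? "Content-Length: " + bodyBytes
                : "Transfer-Encoding: chunked";
    }

    public static void main(String[] args) {
        System.out.println(framingFor(512, 8192));       // small body: fixed length
        System.out.println(framingFor(1_000_000, 8192)); // overflow: chunked
    }
}
```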

WDYT?


On Thu, Mar 15, 2012 at 4:17 AM, Nicholas Ball <nicholas.b...@nodelay.com> wrote:

> Hello all,
>
> I've been working on a plugin with a custom component and a few handlers
> for a research project. Its aim is to do some interesting distributed
> work; however, I seem to have hit a roadblock when trying to respond to a
> client's request in multiple steps. I'm not even sure this is possible
> with Solr, but after no luck on the IRC channel, I thought I'd ask here.
>
> What I'd like to achieve is to be able to have the requestHandler return
> results to a user as soon as it has data available, then continue
> processing or performing other distributed calls, and then return some more
> data, all on the same single client request.
>
> Now, my understanding is that Solr does some kind of streaming. I'm not
> sure how it's technically done over HTTP in Solr, so any information
> would be useful. I believe something like this would work well, but again
> I'm not sure:
>
> http://en.m.wikipedia.org/wiki/Chunked_transfer_encoding
>
> I also came across this issue/feature request in JIRA, but I'm not
> completely sure what the conclusion was or how someone might use it. Is
> it even relevant to what I'm looking for?
>
> https://issues.apache.org/jira/browse/SOLR-578
>
> Thank you very much for any help and time you can spare!
>
> Nicholas (incunix)
>
>
>


-- 
Sincerely yours
Mikhail Khludnev
Lucid Certified
Apache Lucene/Solr Developer
Grid Dynamics

<http://www.griddynamics.com>
 <mkhlud...@griddynamics.com>
