Thanks for the reply Erik,

Yep, the project is working with distributed Solr applications (i.e.
shards), but not with the Solr-supplied shard implementation; rather, a
custom version (not very different from it, to be honest).
I understand that Solr has scoring at its heart, which is something we are
not dealing with at the moment but will no doubt have to solve at some
later stage! One reasonable "solution" to this would be to return data to a
user unsorted, but as quickly as possible. The client can then decide what
latency/score trade-off it wants to work with, as speed may be more
critical than accuracy.

Clearly this is not going to be an easy change (even though I was sure it
would be! hah), but I don't mind exposing some methods/fields to the
handler/components in the 4.x code base if that could solve my issue.
However, I really don't know where to start with this! I'm hoping that all
I will have to do is call a flush() method on an output stream SOMEWHERE,
so that the data already written is sent before my next chunk of data.
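
To make that concrete, here's a rough servlet-level sketch of the behaviour
I'm after (plain Servlet API only, NOT Solr's actual response-writer
pipeline, and the class/method names are just placeholders I made up): each
flush() should push the bytes written so far out to the client as an HTTP
chunk, since the container falls back to chunked transfer encoding when no
Content-Length has been set.

import java.io.IOException;
import java.io.PrintWriter;

import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class IncrementalResponseServlet extends HttpServlet {
    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws ServletException, IOException {
        resp.setContentType("application/json");
        PrintWriter out = resp.getWriter();

        out.write("{\"firstBatch\":[1,2,3]}\n");
        out.flush();                 // chunk 1 leaves for the client now

        doMoreDistributedWork();     // placeholder for the slow part

        out.write("{\"secondBatch\":[4,5,6]}\n");
        out.flush();                 // chunk 2 follows on the same request
    }

    private void doMoreDistributedWork() {
        try {
            Thread.sleep(2000);      // stand-in for further shard calls
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}

If Solr's dispatch filter/response writers could be coaxed into doing the
equivalent, that would pretty much be my problem solved.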

Any ideas on where I should start looking for this
(classes/packages/methods) and how you might go about exposing this
functionality to a handler/component/plugin?
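
For reference, my rough reading of the custom collector + queue + consumer
thread idea you sketch below is something like this (written against the
Lucene 4.x Collector API; QueueingCollector and the wiring around it are
purely my guess, not anything Solr ships with):

import java.io.IOException;
import java.util.concurrent.BlockingQueue;

import org.apache.lucene.index.AtomicReaderContext;
import org.apache.lucene.search.Collector;
import org.apache.lucene.search.Scorer;

public class QueueingCollector extends Collector {
    private final BlockingQueue<Integer> queue;
    private int docBase;

    public QueueingCollector(BlockingQueue<Integer> queue) {
        this.queue = queue;
    }

    @Override
    public void setScorer(Scorer scorer) {
        // scores are ignored: we just want matches as fast as possible
    }

    @Override
    public void setNextReader(AtomicReaderContext context) {
        this.docBase = context.docBase;
    }

    @Override
    public void collect(int doc) throws IOException {
        try {
            queue.put(docBase + doc);  // hand the global doc id to the consumer
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IOException(e);
        }
    }

    @Override
    public boolean acceptsDocsOutOfOrder() {
        return true;                   // order is irrelevant if we don't score
    }
}

The idea being that a separate thread drains the queue and writes/flushes
batches to the client while the search is still collecting, which is
exactly where the flush() question above comes in.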

Thanks again,
Nick


On Thu, 15 Mar 2012 12:39:45 -0500, Erick Erickson
<erickerick...@gmail.com> wrote:
> Somehow you'd have to create a custom collector, probably queue off
> the docs that made it to the collector and have some asynchronous
> thread consuming those docs and sending them in bits...
> 
> But this is so antithetical to how Solr operates that I suspect my
> hand-waving wouldn't really work out. The problem is, at heart, that
> Solr assumes that scoring matters. The scheme you've outlined has no
> way of knowing which documents are most important until after you've
> already queued *all* documents for return. Even the queueing will take
> forever, consider the query q=*:*. You're essentially queueing all the
> docs in the index to return.
> 
> If you're talking about the other sort of distributed search (e.g.
> Shards), that's already built in to Solr, though admittedly the
> aggregator (whichever server distributes the requests in the first
> place) waits for a response from all the shards before assembling the
> response. It seems like you might leverage some of that.
> 
> But I really don't understand the end goal very well here....
> 
> Best
> Erick
> 
> On Wed, Mar 14, 2012 at 7:17 PM, Nicholas Ball
> <nicholas.b...@nodelay.com> wrote:
>> Hello all,
>>
>> I've been working on a plugin with a custom component and a few
>> handlers for a research project. Its aim is to do some interesting
>> distributed work; however, I seem to have come to a roadblock when
>> trying to respond to a client's request in multiple steps. Not even
>> sure if this is possible with Solr, but after no luck on the IRC
>> channel, I thought I'd ask here.
>>
>> What I'd like to achieve is to be able to have the requestHandler
>> return results to a user as soon as it has data available, then
>> continue processing or performing other distributed calls, and then
>> return some more data, all on the same single client request.
>>
>> Now my understanding is that Solr does some kind of streaming. Not
>> sure how it's technically done over HTTP in Solr, so any information
>> would be useful. I believe something like this would work well, but
>> again not sure:
>>
>> http://en.m.wikipedia.org/wiki/Chunked_transfer_encoding
>>
>> I also came across this issue/feature request in JIRA but not
>> completely sure what the conclusion was or how someone might do/use
>> this. Is it even relevant to what I'm looking for?
>>
>> https://issues.apache.org/jira/browse/SOLR-578
>>
>> Thank you very much for any help and time you can spare!
>>
>> Nicholas (incunix)
>>
>>
