Thanks!

Running the same code in cloud mode worked nicely almost right away. Getting it 
to work in non-cloud mode is still non-trivial. I can get the DocList in 
process(), but AFAIK it just provides Lucene docIds, not a nice SolrDocumentList 
we could work with.

The use case is straightforward: the result set contains IDs. I collect them 
and do a bulk getById against another Solr index. The fields specified via fl 
are retrieved from the remote index and added to the result set, enriching each 
document on the server without intervening middleware.
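A minimal sketch of that enrichment step, written against plain Java collections rather than the Solr API: documents are modelled as maps, and the bulk getById against the other index is abstracted as a hypothetical `Function` from a list of IDs to the remote documents keyed by ID. The names `EnrichmentSketch`, `enrich` and `doc` are illustrative only, not Solr classes.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: plain maps stand in for Solr documents, and the
// Function stands in for a bulk getById against the remote index.
final class EnrichmentSketch {

    /** Builds a mutable document from alternating key/value pairs. */
    static Map<String, Object> doc(Object... keyValues) {
        Map<String, Object> d = new LinkedHashMap<>();
        for (int i = 0; i + 1 < keyValues.length; i += 2) {
            d.put((String) keyValues[i], keyValues[i + 1]);
        }
        return d;
    }

    /**
     * Copies the requested remote fields onto each local document.
     *
     * @param docs         local result set, modified in place
     * @param remoteFields fields to copy (what the fl parameter would list)
     * @param bulkGetById  hypothetical bulk lookup: ids in, remote docs keyed by id out
     */
    static void enrich(List<Map<String, Object>> docs,
                       List<String> remoteFields,
                       Function<List<String>, Map<String, Map<String, Object>>> bulkGetById) {
        // 1. Collect the unique keys of the local result set, in ranking order.
        List<String> ids = new ArrayList<>();
        for (Map<String, Object> d : docs) {
            ids.add((String) d.get("id"));
        }
        // 2. One bulk request for the whole page instead of one per document.
        Map<String, Map<String, Object>> remote = bulkGetById.apply(ids);
        // 3. Copy only the requested fields; docs unknown to the remote index stay unchanged.
        for (Map<String, Object> d : docs) {
            Map<String, Object> match = remote.get(d.get("id"));
            if (match == null) {
                continue;
            }
            for (String field : remoteFields) {
                if (match.containsKey(field)) {
                    d.put(field, match.get(field));
                }
            }
        }
    }
}
```

The design point worth keeping is that the lookup is issued once per result page, so the cost is one round trip to the remote index per request rather than one per document.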

All our servers run in cloud mode, so getting it to work in local mode is just a 
convenience when developing. We have quite a few components that run in both 
cloud and non-cloud mode. For some reason, non-cloud mode is almost always harder 
to implement, sometimes even requiring work at the Lucene level with 
IndexSearcher, hand-crafted queries and all.

Thanks again, it works like a charm.
Markus

 
-----Original message-----
> From:Chris Hostetter <hossman_luc...@fucit.org>
> Sent: Tuesday 13th December 2016 23:27
> To: solr-user <solr-user@lucene.apache.org>
> Subject: Re: Traverse over response docs in SearchComponent impl.
> 
> 
> FWIW: Perhaps an XY problem? Can you explain in more depth what it is you 
> plan on doing in this search component?
> 
> : I can see that Solr calls the component's process() method, but from 
> : within that method, rb.getResponseDocs() is always null. No matter what 
> : I try, I do not seem to be able to get hold of that list of response 
> : docs.
> 
> IIRC getResponseDocs() is only non-null when aggregating distributed/cloud 
> results from multiple shards (where we already have a fully 
> populated SolrDocumentList due to aggregating the remote responses), but in 
> a single-node Solr request only a "DocList" is used, and the stored field 
> values are read lazily from the IndexReader by the ResponseWriter.
> 
> So if you're not writing a distributed component, check 
> ResponseBuilder.getResults()?
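To illustrate the single-node situation described above: the component only sees internal Lucene docIds, so each one has to be resolved to its stored unique key before a bulk lookup can use it. In Solr the ids would come from `rb.getResults().docList` (iterated with `DocIterator.nextDoc()`) and the stored value would be read through the request's `SolrIndexSearcher`; because the stored-field API differs between Solr versions, this sketch abstracts both behind plain Java types. `DocListKeys` and `storedIdReader` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntFunction;

// Hypothetical model of the single-node case: an int[] stands in for the
// DocList, and storedIdReader stands in for reading the stored unique key
// through the request's SolrIndexSearcher.
final class DocListKeys {

    static List<String> uniqueKeys(int[] docIds, IntFunction<String> storedIdReader) {
        List<String> keys = new ArrayList<>(docIds.length);
        for (int docId : docIds) {
            // Internal docIds are only meaningful for the searcher that produced
            // them, so the reader must use that same searcher. Ranking order is kept.
            keys.add(storedIdReader.apply(docId));
        }
        return keys;
    }
}
```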
> 
> Even if you are writing a component for a distributed Solr setup, which 
> method you call (and where you call it) depends a lot on when/where you 
> expect your code to run...
> 
> IIRC: 
> * prepare() runs on every node for every request (original aggregation 
> request and every sub-request to each shard).  
> * distributedProcess() runs on the aggregation node and is called 
> repeatedly for each "stage" requested by any component (so at a minimum 
> once, usually twice to fetch stored fields, maybe more if there are 
> multiple facet refinement phases, etc.).  
> * modifyRequest() & handleResponses() are called on the aggregation node 
> before/after every sub-request to every shard.
> * process() is called on each shard for each sub-request. 
> * finishStage() is called on the aggregation node at the end of each stage 
> (after all the responses from all shards for that sub-request have arrived).
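The call order in the list above can be visualised with a toy, sequential trace. This is a model of the description, not Solr code: it treats each stage as a single sub-request fanned out to every shard, whereas real Solr sends shard requests concurrently and may add stages (for example facet refinement). `LifecycleTrace` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

// Toy, sequential model of the call order described above; not Solr code.
final class LifecycleTrace {

    static List<String> trace(List<String> shards, List<String> stages) {
        List<String> events = new ArrayList<>();
        // prepare() on the aggregation node for the original request.
        events.add("aggregator: prepare()");
        for (String stage : stages) {
            events.add("aggregator: distributedProcess() [" + stage + "]");
            // Simplification: one sub-request per stage, fanned out to every shard.
            events.add("aggregator: modifyRequest() [" + stage + "]");
            for (String shard : shards) {
                // Each shard sub-request gets its own prepare() and process().
                events.add(shard + ": prepare()");
                events.add(shard + ": process()");
            }
            events.add("aggregator: handleResponses() [" + stage + "]");
            events.add("aggregator: finishStage() [" + stage + "]");
        }
        return events;
    }
}
```

For example, `LifecycleTrace.trace(List.of("shard1", "shard2"), List.of("EXECUTE_QUERY", "GET_FIELDS")).forEach(System.out::println);` prints the sequence for a typical two-stage request, with process() appearing once per shard per stage.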
> 
> 
> ...so something like HighlightComponent does its main work in the 
> process() method, because it only needs the data for each doc (the other, 
> aggregated docs don't affect its results); later, 
> finishStage combines the results.
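A sketch of the combine step for a HighlightComponent-style feature, assuming each shard returns a per-document result (here just a string) keyed by unique key, and that unique keys do not repeat across shards. The aggregator keeps only the documents that made the final ranked page and preserves that order. `PerDocMerge` is a hypothetical name, not a Solr class.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical combine step for a per-document feature (HighlightComponent-style).
final class PerDocMerge {

    /**
     * @param finalIds     unique keys of the final, ranked result page
     * @param shardResults one map per shard: unique key -> per-document result
     * @return per-document results in final ranking order
     */
    static Map<String, String> combine(List<String> finalIds, List<Map<String, String>> shardResults) {
        // Assumes each unique key comes from exactly one shard, so keys do not collide.
        Map<String, String> byId = new HashMap<>();
        for (Map<String, String> shard : shardResults) {
            byId.putAll(shard);
        }
        // Keep only documents that made the final page, in ranking order.
        Map<String, String> ordered = new LinkedHashMap<>();
        for (String id : finalIds) {
            String value = byId.get(id);
            if (value != null) {
                ordered.put(id, value);
            }
        }
        return ordered;
    }
}
```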
> 
> If, on the other hand, you want to look at all of the *final* documents being 
> returned to the user, not on a per-shard basis but on an aggregate basis, 
> you'd want to put that logic in something like finishStage and check for 
> the stage that does a GET_FIELDS. But if you want your component to 
> *also* work in non-cloud mode, you'd need the same logic in your process() 
> method (looking at the DocList instead of the SolrDocumentList, with a 
> conditional to check for distrib=false so you don't waste a bunch of work 
> on per-shard queries when it is in fact being used in cloud mode).
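The placement advice above can be condensed into one hypothetical decision function. The flags are stand-ins: `isDistrib` means this node is aggregating a distributed request, and `isShardRequest` means it is handling a per-shard sub-request. As far as I know, Solr sends those sub-requests with `distrib=false` and `isShard=true`, so checking `distrib=false` alone may not distinguish a standalone request from a shard sub-request; treat that as an assumption to verify against your Solr version. The `Stage` enum loosely mirrors `ResponseBuilder.STAGE_GET_FIELDS` and friends; all names here are illustrative.

```java
// Hypothetical decision helper; the flags stand in for request state that a
// real component would read from the ResponseBuilder and request params.
final class EnrichmentPlacement {

    enum Hook { PROCESS, FINISH_STAGE }

    enum Stage { EXECUTE_QUERY, GET_FIELDS, OTHER }

    static boolean shouldEnrich(Hook hook, boolean isDistrib, boolean isShardRequest, Stage stage) {
        switch (hook) {
            case PROCESS:
                // Standalone request: work on the DocList here.
                // Per-shard sub-request in cloud mode: skip, the aggregator handles it.
                return !isDistrib && !isShardRequest;
            case FINISH_STAGE:
                // Aggregation node: act once, after the stored fields have been fetched.
                return isDistrib && stage == Stage.GET_FIELDS;
            default:
                return false;
        }
    }
}
```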
> 
> 
> None of this is very straightforward, but you are admittedly getting into 
> very advanced expert territory here.
> 
> 
> 
> -Hoss
> http://www.lucidworks.com/
> 
