Thanks for the reply, Ken – it was your training session that brought the 
dispatcher core approach to my attention in the first place.  

Regarding your deep query point, if you're in a situation where numFound=5000 
and you're trying to output all 5000 records at once – your point seems to 
suggest that you're better off setting rows=5000 instead of chunking by 100.  
Is that correct?   
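
For what it's worth, here is my back-of-the-envelope arithmetic as a minimal sketch. It assumes that for every page each shard has to hand back its top (start + rows) id/score pairs, which is how I read your (5000 * # of shards) figure. The function name `hits_collected` is just something I made up for illustration, and it ignores the follow-up request that fetches the actual documents.

```python
def hits_collected(num_found, rows, num_shards):
    """Total id/score entries sent to the dispatcher while paging
    through num_found results, rows at a time."""
    total = 0
    for start in range(0, num_found, rows):
        # Assumption: each shard returns its top (start + rows) hits so the
        # dispatcher can merge them and pick out the requested page.
        per_shard = min(start + rows, num_found)
        total += per_shard * num_shards
    return total


# 4 shards, as in my original example with cores 1-4.
chunked = hits_collected(5000, 100, 4)   # 50 requests of rows=100
single = hits_collected(5000, 5000, 4)   # one request with rows=5000
print(chunked, single)
```

Under that assumption, chunking by 100 moves 510000 id/score entries in total versus 20000 for the single rows=5000 request, which is why I suspect the single request wins.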

--  
Hector


On Wednesday, January 11, 2012 at 7:10 PM, Ken Krugler wrote:

> Hi Hector,
>  
> On Jan 9, 2012, at 4:15pm, Hector Castro wrote:
>  
> > Hi,
> >  
> > Has anyone had success with multicore single node Solr configurations that 
> > have one core acting solely as a dispatcher for the other cores? For 
> > example, say you had 4 populated Solr cores – configure a 5th to be the 
> > definitive endpoint with `shards` containing cores 1-4.  
> >  
> > Is there any advantage to this setup over simply having requests 
> > distributed randomly across the 4 populated cores (all with `shards` equal 
> > to cores 1-4)? Is it even worth distributing requests across the cores over 
> > always hitting the same one?
>  
> If you have low query rates, then using a shards approach can improve 
> performance on a multi-core (CPUs here, not Solr cores) setup.
>  
> By distributing the requests, you effectively use all CPU cores in parallel 
> on one request.
>  
> And if you spread your shards across spindles, then you're also maximizing 
> I/O throughput.
>  
> But there are a few issues with this approach:
>  
> - binary fields don't work. The results come back as "@B[<hex address>]", 
> versus the actual data.
> - short fields get "java.lang.Short" text prefixed on every value.
> - deep queries result in lots of extra load. E.g. if you want the 5000th hit 
> then (5000 * # of shards) hits get collected and returned to the 
> dispatcher. Only the unique id & score are returned in this case, though, 
> followed by a second request to get the actual top N hits from the shards.
>  
> And there's something wonky with the way that distributed HTTP requests are 
> queued up & processed - under load, I see IOExceptions where it's always N-1 
> shards that succeed, and one shard request fails. But I don't have a good 
> reproducible case yet to debug.
>  
> -- Ken
>  
> --------------------------
> Ken Krugler
> http://www.scaleunlimited.com
> custom big data solutions & training
> Hadoop, Cascading, Mahout & Solr
>  
>  

