Yonik, I don't mean to be argumentative, I'm just trying to understand: what is the difference between distributed search across processors and distributed search across boxes (again, assuming that my searches are truly CPU bound)?

My only basis for comparison is Sphinx, which I was able to run in parallel across multiple processors just as I would across boxes. With Sphinx there was also overhead in farming out the searches and then combining the results, but since the bulk of the time (for the kind of searches and the kind of index I'm running) was spent processing, it was a net win: I saw roughly a factor of n speedup, where n was the number of processors/shards.
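
To make the trade-off concrete, here is a rough back-of-the-envelope sketch in Python. It assumes the CPU work of a query divides evenly across shards and that fanning out and merging costs a fixed amount per query; both are simplifications, and the function names and numbers are made up for illustration, not measurements from Sphinx or Solr.

```python
def sharded_query_time(cpu_seconds, shards, overhead_seconds):
    """Estimated wall-clock time for one query split across `shards` workers.

    Assumption: CPU work divides evenly, and fan-out plus merge adds a
    fixed overhead whenever more than one shard is involved.
    """
    if shards < 1:
        raise ValueError("shards must be >= 1")
    if shards == 1:
        return cpu_seconds  # no fan-out, no merge step
    return cpu_seconds / shards + overhead_seconds


def speedup(cpu_seconds, shards, overhead_seconds):
    """Ratio of single-index time to sharded time (above 1.0 is a net win)."""
    return cpu_seconds / sharded_query_time(cpu_seconds, shards, overhead_seconds)


if __name__ == "__main__":
    # Hypothetical numbers: one CPU-heavy query and one very cheap query.
    for cpu, overhead in [(2.0, 0.02), (0.02, 0.02)]:
        for n in (1, 2, 8):
            print(f"cpu={cpu}s overhead={overhead}s shards={n} "
                  f"speedup={speedup(cpu, n, overhead):.2f}x")
```

Under this model, a query dominated by CPU time gets close to a factor of n, while a query whose CPU time is comparable to the overhead ends up slower once it is split.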
Thanks again for all your help, this has been really useful so far.

-Harish


yonik wrote:
>
> On Thu, Jan 8, 2009 at 9:25 PM, smock <harish.agar...@gmail.com> wrote:
>> I should have more than enough RAM to fit the index in, I don't think my
>> searches will be IO bound.
>
> There is still overhead to distributed search - if the actual CPU
> bound search/faceting stuff isn't your bottleneck, or if the index is
> too small, the overhead won't be a net win. Distributed search was
> not really designed to utilize multiple processors (we should probably
> do that in a single Solr server if needed), it was designed to go
> across multiple boxes.
>
> -Yonik
>
>
>> One question - just to make sure I understand - did you use one Jetty
>> instance per shard? In my case, what I'm doing is using one Tomcat
>> instance to run multiple Solr webapps. I'm not sure if this makes a
>> difference in terms of processor usage, as I don't understand the
>> internal workings of Tomcat serving up Solr (in other words, if Tomcat
>> will be able to run the different Solr instances on different
>> processors, or if it's all bound to the processor Tomcat is using).
>>
>> Thanks for your help!
>> -Harish
>>
>>
>> Mike Klaas wrote:
>>>
>>> On 8-Jan-09, at 3:37 PM, smock wrote:
>>>
>>>>
>>>> Assuming I have enough RAM then, should I be able to get a
>>>> performance boost with my current setup? Basically, the question I
>>>> am trying to answer is - will the Tomcat+Solr setup I have above
>>>> utilize multiple processors or do I need to do something else (like
>>>> having a different Tomcat instance for each Solr shard)?
>>>>
>>>> Also - and this question comes purely out of my own ignorance of how
>>>> the Tomcat/Solr relationship works - right now I'm starting Tomcat
>>>> specifying the maximum memory size. I'm also setting cache
>>>> parameters in solrconfig.xml for each Solr instance to half of what
>>>> I would for a full size index. Shouldn't the JVMs for both instances
>>>> use roughly the same total amount of memory as 1 JVM for the full
>>>> size index?
>>>>
>>>> While I'm testing things out on a 2 processor machine, I'll
>>>> eventually be using an 8 proc. machine with plenty of RAM to cache
>>>> the index in RAM. I'm not super worried about requests/sec. right
>>>> now - I'd rather each individual search be faster, which is why I'm
>>>> interested in distributing the index across my 8 procs.
>>>
>>> As Yonik mentioned, it depends greatly on the size of the index/RAM
>>> ratio. I don't see any reason why, in theory, two Solrs in a single
>>> Tomcat could not both work on a single query in parallel, but I've
>>> never tried it. I _have_ had success sharding Solr on a single machine
>>> using a webapp container per Solr instance (in my case, Jetty).
>>>
>>> Note that if these instances are sharing a single disk, and your RAM
>>> is low, then they will be competing over the slowest resource on your
>>> machine and the query could be IO bound, in which case sharding is
>>> useless.
>>>
>>> -Mike
>>>
>>>
>>
>> --
>> View this message in context:
>> http://www.nabble.com/Solr-on-a-multiprocessor-machine-tp21360747p21365126.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>>
>
>

--
View this message in context: http://www.nabble.com/Solr-on-a-multiprocessor-machine-tp21360747p21365429.html
Sent from the Solr - User mailing list archive at Nabble.com.