Re: Solr + Tomcat Undeploy Leaks

Mike Klaas Fri, 19 Oct 2007 11:18:09 -0700

On 19-Oct-07, at 7:19 AM, Ed Summers wrote:

On 10/18/07, Mike Klaas <[EMAIL PROTECTED]> wrote:

I realize this is a bit off-topic -- but I'm curious what the
rationale was behind having that many solr instances on that many
machines and how they are coordinated. Is it a master/slave setup or
are they distinct indexes? Any further details about your architecture
would be interesting to read about :-)

Rationale? Performance! I can't divulge the exact size of our corpus, but it is between zero and 1 billion web documents. To search that many documents efficiently requires distributing over many machines.

Most of the architecture is not Solr-related, but it is pretty standard large-scale search engine stuff (namely, distributing documents using some sort of unique hash across multiple machines). I'm sure Nutch's design is similar, and there are several academic papers on the subject.
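
For readers unfamiliar with the idea, hash-based document distribution can be sketched roughly as below. This is only an illustration, not the setup described in this thread: the function names, the choice of MD5, and the shard count are all assumptions made for the example.

```python
import hashlib


def shard_for(doc_id: str, num_shards: int) -> int:
    """Map a document's unique id to a shard number in [0, num_shards).

    MD5 is used here only as a stable, well-distributed hash (an assumption
    for this sketch); Python's built-in hash() is avoided because it is
    randomized per process for strings.
    """
    digest = hashlib.md5(doc_id.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an unsigned integer, then reduce it.
    return int.from_bytes(digest[:8], "big") % num_shards


def partition(doc_ids, num_shards):
    """Group document ids into one list per shard."""
    shards = [[] for _ in range(num_shards)]
    for doc_id in doc_ids:
        shards[shard_for(doc_id, num_shards)].append(doc_id)
    return shards


if __name__ == "__main__":
    # Hypothetical document ids (URLs), spread over 4 hypothetical nodes.
    urls = ["http://example.com/page/%d" % i for i in range(10)]
    for shard_no, docs in enumerate(partition(urls, 4)):
        print(shard_no, len(docs))
```

Because the same id always hashes to the same shard, an update or delete can be routed to the node that holds the document, while a query is sent to every node and the results are merged. One known limitation of plain modulo hashing is that changing the number of shards remaps most documents.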

Solr plays the role of index at the nodes--it isn't the primary document storage. Each individual index doesn't look so different from a typical-size Solr index: the main differences are 1) splitting the stored fields among two Solr apps running in a single JVM for I/O performance (for highlighting) 2) scoring/query tweaks.


cheers,
-Mike
