Re: Solr in a distributed multi-machine high-performance environment

Shalin Shekhar Mangar Wed, 16 Jan 2008 05:41:09 -0800

Look at http://issues.apache.org/jira/browse/SOLR-303


Please note that it is still a work in progress, so you may not be able to
use it immediately.
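
For readers new to the idea: distributed search generally means sending the
same query to every index shard and merging the per-shard hits into a single
ranked list. The Python sketch below illustrates only that merge step. It is
not code from the SOLR-303 patch; the function name, the (score, doc_id)
tuple layout and the sample data are all hypothetical, and it assumes scores
from different shards are directly comparable.

```python
import heapq
from itertools import islice


def merge_shard_results(shard_results, rows):
    """Merge per-shard hit lists into one global top-`rows` list.

    Assumption: each shard returns (score, doc_id) tuples that are
    already sorted by descending score.
    """
    # heapq.merge lazily interleaves the pre-sorted inputs; reverse=True
    # tells it the inputs are ordered largest-first.
    merged = heapq.merge(*shard_results, key=lambda hit: hit[0], reverse=True)
    return list(islice(merged, rows))


# Hypothetical responses from two shards.
shard_a = [(0.92, "a1"), (0.40, "a7")]
shard_b = [(0.88, "b3"), (0.75, "b9"), (0.10, "b2")]

top = merge_shard_results([shard_a, shard_b], rows=3)
print(top)
```

Each shard only needs to return its own top `rows` hits, so the merging node
handles at most rows × (number of shards) candidates per query.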

On Jan 16, 2008 10:53 AM, Srikant Jakilinki <[EMAIL PROTECTED]> wrote:

> Hi All,
>
> Our group needs to index and search several million documents (TREC)
> in real time, with millisecond response times. For the moment we
> prefer scale-out (throwing more commodity machines at the problem)
> over scale-up (faster disks, more RAM). This is in turn inspired by
> the "Scale-out vs. Scale-up" paper (mail me if you want a copy), in
> which it was proven that this kind of distribution scales better and
> is more resilient.
>
> So, are there any resources available (Wiki, tutorials, slides, README
> etc.) that shed light on how to run Solr in a multi-machine scenario
> and can guide newbies? I have gone through the mailing lists and site
> but could not really find any answers or hands-on material. An
> ad hoc guideline to get things working with 2 machines might just be
> enough, but for the sake of thinking out loud and soliciting responses
> from the list, here are my questions:
>
> 1) Solr that has to handle a fairly large index which has to be split
> up on multiple disks (using Multicore?)
> - Space is not a problem since we can use NFS but that is not
> recommended as we would only exploit 1 processor
> 2) Solr that has to handle a large collective index which has to be
> split up on multi-machines
> - The index is ever increasing (TB scale) and dynamic and all of it
> has to be searched at any point
> 3) Solr that has to exploit multi-machines because we have plenty of
> them in a tightly coupled P2P scenario
> - Machines are not a problem but will they be if they are of varied
> configurations (PIII to Core2; Linux to Vista; 32-bit to 64-bit; J2SE
> 1.1 to 1.6)
> 4) Solr that has to distribute load on several machines
> - The index(s) could be common though like say using a distributed
> filesystem (Hadoop?)
>
> In each of the above cases (we might use all of these strategies in
> various use cases) the application should use Solr as a strict backend
> and named service (IP or host:port) so that we can expose this
> application (and the service) to the web or intranet. Machine failures
> should be tolerated too. Also, does Solr manage load balancing out of
> the box if it is configured to work with multiple machines?
>
> Maybe it is superfluous but is Solr and/or Nutch the only way to use
> Lucene in a multi-machine environment? Or is there some hidden
> document/project somewhere that makes it possible by exposing a
> regular Lucene process over the network using RMI or something? It is
> my understanding (could be wrong) that Nutch and, to some extent, Solr
> do not perform well when there is a lot of indexing activity in
> parallel to search. Batch processing is also there and perhaps we can
> use Nutch/Solr there. Even so, we need multi-machine directions.
>
> I am sure that multiple machines open up a lot of other approaches
> that might meet the goal better and that others have practical
> experience with. So, any advice and tips are also very welcome. We
> intend to document things and do some benchmarking along the way in
> the open spirit.
>
> Really sorry for the length but I hope some answers are forthcoming.
>
> Cheers,
> Srikant
>



-- 
Regards,
Shalin Shekhar Mangar.
