Hi Bing Li,

On Thursday 17 February 2011 10:32:11 Bing Li wrote:
> Dear all,
> 
> I started to learn how to use Solr three months ago. My experience is
> still limited.
> 
> Now I crawl Web pages with my crawler and send the data to a single Solr
> server. It runs fine.
> 
> Since the number of potential users is large, I have decided to scale
> Solr. After configuring replication, a single index can be replicated to
> multiple servers.
> 
> I think sharding is also required. I plan to split the index
> according to data categories and priorities. After that, I will use the
> replication technique above to get high performance. The remaining work
> is not so difficult.

It's better to use a consistent hash of the unique key to decide which server
takes which documents if you want good relevancy, since an even spread keeps
term statistics similar across shards. The hash modulo the number of servers
gives the shard for each document. If your unique keys are integers, a plain
modulo will suffice.
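
As a rough sketch of the idea (Python, for illustration only; the function
name shard_for and the shard URLs are made up and are not part of Solr), the
shard for a document could be chosen like this. Non-integer keys are hashed
with MD5 first, which is just one possible choice of a stable hash:

```python
import hashlib


def shard_for(doc_id, num_shards):
    """Return the shard index (0 .. num_shards - 1) for a unique key.

    Hypothetical helper, not a Solr API. Integer keys use a plain modulo;
    any other key is hashed with MD5 first so the result is stable across
    processes and machines.
    """
    if num_shards < 1:
        raise ValueError("num_shards must be at least 1")
    if isinstance(doc_id, int):
        return doc_id % num_shards
    digest = hashlib.md5(str(doc_id).encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an unsigned integer.
    return int.from_bytes(digest[:8], "big") % num_shards


# Assumed shard URLs, purely for illustration.
SHARDS = [
    "http://solr1.example.com:8983/solr",
    "http://solr2.example.com:8983/solr",
    "http://solr3.example.com:8983/solr",
]

if __name__ == "__main__":
    for key in (42, "http://example.com/page.html"):
        print(key, "->", SHARDS[shard_for(key, len(SHARDS))])
```

Note that with a plain modulo, changing the number of servers moves most
documents to a different shard, so it's worth settling on the shard count
before indexing.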

> 
> I noticed some new terms, such as SolrCloud, Katta and ZooKeeper.
> According to my current understanding, it seems that I can ignore them.
> Am I right? What benefits can I get from using them?

SolrCloud comes with ZooKeeper. It's designed to provide a fail-over cluster
and other useful features. I haven't tried Katta.

> 
> Thanks so much!
> LB

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350
