Hi Bing Li,

On Thursday 17 February 2011 10:32:11 Bing Li wrote:
> Dear all,
>
> I started to learn how to use Solr three months ago. My experiences are
> still limited.
>
> Now I crawl Web pages with my crawler and send the data to a single Solr
> server. It runs fine.
>
> Since the potential users are large, I decide to scale Solr. After
> configuring replication, a single index can be replicated to multiple
> servers.
>
> For shards, I think it is also required. I attempt to split the index
> according to the data categories and priorities. After that, I will use the
> above replication techniques and get high performance. The following work
> is not so difficult.

If you want good relevancy, it's better to use a consistent hashing scheme
to decide which server takes which documents, rather than splitting by
category. Each shard scores against its own term statistics, so an even
spread of documents keeps scores comparable across shards. Taking a hash of
the unique key modulo the number of servers gives you the shard for each
document. If your unique keys are integers, a plain modulo will suffice.

>
> I noticed some new terms, such as SolrCloud, Katta and ZooKeeper.
> According to my current understandings, it seems that I can ignore them.
> Am I right? What benefits can I get if using them?

SolrCloud comes with ZooKeeper. It's designed to provide a fail-over cluster,
along with other useful features. I haven't tried Katta.

>
> Thanks so much!
> LB

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350
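
P.S. To make the modulo suggestion concrete, here is a minimal sketch in
Python of picking a shard from a document's unique key. The function name
shard_for and the parameter num_shards are made up for illustration, and
using MD5 for non-integer keys is just one possible choice of stable hash;
none of this is part of Solr itself.

```python
import hashlib


def shard_for(doc_id, num_shards):
    """Return the index (0 .. num_shards - 1) of the shard for doc_id.

    Hypothetical helper: integer keys use a plain modulo, any other key
    is hashed first so that the result is stable across runs and machines.
    """
    if num_shards < 1:
        raise ValueError("num_shards must be at least 1")

    # Integer unique keys: a plain modulo is enough.
    if isinstance(doc_id, int):
        return doc_id % num_shards

    # Other keys (e.g. URLs): hash to an integer first. Python's built-in
    # hash() is randomised per process for strings, so use hashlib instead.
    digest = hashlib.md5(str(doc_id).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


if __name__ == "__main__":
    for key in (10, 11, "http://example.com/page-a"):
        print(key, "->", shard_for(key, 3))
```

The indexing client would call this once per document and send the document
to the server at that index. Note that with a plain modulo, changing the
number of servers moves most documents to a different shard, so it is
easiest to fix the shard count up front.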