Hi Jonothan and Markus,

>>Why 3 shards on one machine instead of one larger shard per machine?

Good question!

We made this architectural decision several years ago and I don't remember
the rationale at the moment. I believe we originally made the decision because
some tests showed a sweet spot for I/O performance for shards with
500,000-600,000 documents, but those tests were run before we implemented
CommonGrams and when we were still using attached storage. I think we also
might have had concerns about Java OOM errors with a really large shard/index,
but we now know that we can keep memory usage under control by tweaking the
amount of the terms index that gets read into memory.

We should probably do some tests and revisit the question.

The reason we don't have 12 shards on 12 machines is that current performance
is good enough that we can't justify buying 8 more machines :)

Tom

-----Original Message-----
From: Markus Jelsma [mailto:markus.jel...@openindex.io] 
Sent: Tuesday, August 02, 2011 2:12 PM
To: solr-user@lucene.apache.org
Subject: Re: performance crossover between single index and sharding

Hi Tom,

Very interesting indeed! But I keep wondering why some engineers choose to
store multiple shards of the same index on the same machine; there must be
significant overhead. The only reason I can think of is ease of maintenance in
moving shards to a separate physical machine.
I know that rearranging the shard topology can be a real pain in a large
existing cluster (e.g. consistent hashing is not consistent anymore and docs
have to be shuffled to their new shard). Is this the reason you chose this
approach?
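
To make the reshuffling concern concrete, here is a minimal sketch. It assumes
documents are assigned to shards by a plain hash of the document id modulo the
shard count, which is an assumption for illustration and not necessarily how
any particular cluster routes documents. The names shard_for and
fraction_moved and the document ids are hypothetical.

```python
import hashlib


def shard_for(doc_id: str, num_shards: int) -> int:
    """Assign a document to a shard by hashing its id modulo the shard count."""
    digest = hashlib.md5(doc_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


def fraction_moved(doc_ids, old_shards: int, new_shards: int) -> float:
    """Fraction of documents whose shard changes when the shard count changes."""
    moved = sum(
        1
        for doc_id in doc_ids
        if shard_for(doc_id, old_shards) != shard_for(doc_id, new_shards)
    )
    return moved / len(doc_ids)


# Hypothetical document ids, used only to exercise the functions.
doc_ids = [f"doc-{i}" for i in range(10_000)]
moved = fraction_moved(doc_ids, 12, 13)
print(f"{moved:.1%} of docs change shard going from 12 to 13 shards")
```

With plain modulo assignment, most ids land on a different shard after adding
a single shard, which is why changing the shard count usually means moving or
reindexing documents.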

Cheers,
