Hi Tom,

Very interesting indeed! But I keep wondering why some engineers choose to 
store multiple shards of the same index on the same machine; there must be 
significant overhead. The only reason I can think of is ease of maintenance 
when moving shards to a separate physical machine.
I know that rearranging the shard topology of a large existing cluster can be 
a real pain (e.g. the hashing is no longer consistent, so docs have to be 
shuffled to their new shards). Is this the reason you chose this approach?

Cheers,

> Hi Markus,
> 
> Just as a data point for a very large sharded index, we have the full text
> of 9.3 million books with an index size of about 6+ TB spread over 12
> shards on 4 machines. Each machine has 3 shards. The size of each shard
> ranges between 475GB and 550GB.  We are definitely I/O bound. Our machines
> have 144GB of memory with about 16GB dedicated to the tomcat instance
> running the 3 Solr instances, which leaves about 120 GB (or 40GB per
> shard) for the OS disk cache.  We release a new index every morning and
> then warm the caches with several thousand queries.  I probably should add
> that our disk storage is a very high performance Isilon appliance that has
> over 500 drives and every block of every file is striped over no less than
> 14 different drives. (See blog for details *)
> 
> We have a very low number of queries per second (0.3-2 qps) and our modest
> response time goal is to keep 99th percentile response time for our
> application (i.e. Solr + application) under 10 seconds.
> 
> Our current performance statistics are:
> 
> average response time  300 ms
> median response time   113 ms
> 90th percentile        663 ms
> 95th percentile        1,691 ms
> 
> We had plans to do some performance testing to determine the optimum shard
> size and optimum number of shards per machine, but that has remained on
> the back burner for a long time as other higher priority items keep
> pushing it down on the todo list.
> 
> We would be really interested to hear about the experiences of people who
> have so many shards that the overhead of distributing the queries, and
> consolidating/merging the responses becomes a serious issue.
> 
> 
> Tom Burton-West
> 
> http://www.hathitrust.org/blogs/large-scale-search
> 
> *
> http://www.hathitrust.org/blogs/large-scale-search/scaling-large-scale-sea
> rch-500000-volumes-5-million-volumes-and-beyond
> 
> -----Original Message-----
> From: Markus Jelsma [mailto:markus.jel...@openindex.io]
> Sent: Tuesday, August 02, 2011 12:33 PM
> To: solr-user@lucene.apache.org
> Subject: Re: performance crossover between single index and sharding
> 
> Actually, i do worry about it. Would be marvelous if someone could provide
> some metrics for an index of many terabytes.
> 
> > [..] At some extreme point there will be diminishing
> > returns and a performance decrease, but I wouldn't worry about that at
> > all until you've got many terabytes -- I don't know how many but don't
> > worry about it.
> > 
> > ~ David
> > 
> > -----
> > 
> >  Author: https://www.packtpub.com/solr-1-4-enterprise-search-server/book
> > 
> > --
> > View this message in context:
> > http://lucene.472066.n3.nabble.com/performance-crossover-between-single-index-and-sharding-tp3218561p3219397.html
> > Sent from the Solr - User mailing list archive at Nabble.com.
