Hi Tom,

Very interesting indeed! But I keep wondering why some engineers choose to store multiple shards of the same index on the same machine; there must be significant overhead. The only reason I can think of is that it makes it easier to move shards to a separate physical machine later. I know that rearranging the shard topology can be a real pain in a large existing cluster (e.g. consistent hashing is not consistent anymore and docs have to be shuffled to their new shards). Is this the reason you chose this approach?
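To illustrate the reshuffling concern, here is a minimal sketch. It assumes documents are routed with a plain hash(id) mod N scheme, which is only an assumption for illustration; the thread does not say how documents are assigned to shards. The helper names `shard_for` and `fraction_moved` and the synthetic document ids are hypothetical.

```python
import hashlib

def shard_for(doc_id: str, num_shards: int) -> int:
    """Assumed routing rule: stable hash of the document id, modulo shard count."""
    digest = hashlib.md5(doc_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards

def fraction_moved(doc_ids, old_shards: int, new_shards: int) -> float:
    """Fraction of documents whose shard changes when the shard count changes."""
    moved = sum(
        1 for d in doc_ids
        if shard_for(d, old_shards) != shard_for(d, new_shards)
    )
    return moved / len(doc_ids)

# Synthetic ids, purely for illustration.
doc_ids = [f"doc-{i}" for i in range(10000)]

# Going from 12 shards to 13 under modulo routing.
print(f"{fraction_moved(doc_ids, 12, 13):.1%} of documents change shard")
```

Under this assumed scheme, adding a single shard moves the large majority of documents, which is why starting with several shards per machine and later relocating whole shards to new hardware can look attractive.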
Cheers,

> Hi Markus,
>
> Just as a data point for a very large sharded index, we have the full text
> of 9.3 million books with an index size of about 6+ TB spread over 12
> shards on 4 machines. Each machine has 3 shards. The size of each shard
> ranges between 475GB and 550GB. We are definitely I/O bound. Our machines
> have 144GB of memory with about 16GB dedicated to the tomcat instance
> running the 3 Solr instances, which leaves about 120 GB (or 40GB per
> shard) for the OS disk cache. We release a new index every morning and
> then warm the caches with several thousand queries. I probably should add
> that our disk storage is a very high performance Isilon appliance that has
> over 500 drives and every block of every file is striped over no less than
> 14 different drives. (See blog for details *)
>
> We have a very low number of queries per second (0.3-2 qps) and our modest
> response time goal is to keep 99th percentile response time for our
> application (i.e. Solr + application) under 10 seconds.
>
> Our current performance statistics are:
>
> average response time     300 ms
> median response time      113 ms
> 90th percentile           663 ms
> 95th percentile         1,691 ms
>
> We had plans to do some performance testing to determine the optimum shard
> size and optimum number of shards per machine, but that has remained on
> the back burner for a long time as other higher priority items keep
> pushing it down on the todo list.
>
> We would be really interested to hear about the experiences of people who
> have so many shards that the overhead of distributing the queries, and
> consolidating/merging the responses becomes a serious issue.
>
> Tom Burton-West
>
> http://www.hathitrust.org/blogs/large-scale-search
>
> * http://www.hathitrust.org/blogs/large-scale-search/scaling-large-scale-search-500000-volumes-5-million-volumes-and-beyond
>
> -----Original Message-----
> From: Markus Jelsma [mailto:markus.jel...@openindex.io]
> Sent: Tuesday, August 02, 2011 12:33 PM
> To: solr-user@lucene.apache.org
> Subject: Re: performance crossover between single index and sharding
>
> Actually, i do worry about it. Would be marvelous if someone could provide
> some metrics for an index of many terabytes.
>
> > [..] At some extreme point there will be diminishing
> > returns and a performance decrease, but I wouldn't worry about that at
> > all until you've got many terabytes -- I don't know how many but don't
> > worry about it.
> >
> > ~ David
> >
> > -----
> >
> > Author: https://www.packtpub.com/solr-1-4-enterprise-search-server/book
> >
> > --
> > View this message in context:
> > http://lucene.472066.n3.nabble.com/performance-crossover-between-single-index-and-sharding-tp3218561p3219397.html
> > Sent from the Solr - User mailing list archive at Nabble.com.
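
As a rough check on the hardware figures quoted in Tom's message, the arithmetic can be sketched as below. All numbers come from the message itself; the variable and function names are illustrative. Note that 144 GB minus 16 GB is 128 GB, and the sketch uses the rounded "about 120 GB" figure as quoted rather than guessing at where the difference goes.

```python
# Figures taken from the quoted message; names are illustrative only.
MACHINES = 4
SHARDS = 12
OS_CACHE_GB = 120            # "about 120 GB" per machine, as quoted
SHARD_SIZE_GB = (475, 550)   # smallest and largest shard, as quoted

def shards_per_machine(shards: int, machines: int) -> int:
    """Number of shards hosted on each machine, assuming an even spread."""
    return shards // machines

def cache_per_shard_gb(os_cache_gb: float, shards_on_machine: int) -> float:
    """OS disk cache available per shard if it is shared evenly."""
    return os_cache_gb / shards_on_machine

def cached_fraction(cache_gb: float, shard_gb: float) -> float:
    """Share of a shard's on-disk size that could sit in the OS disk cache."""
    return cache_gb / shard_gb

per_machine = shards_per_machine(SHARDS, MACHINES)
cache_gb = cache_per_shard_gb(OS_CACHE_GB, per_machine)

print(f"shards per machine: {per_machine}")
print(f"OS cache per shard: {cache_gb:.0f} GB")
for size in SHARD_SIZE_GB:
    print(f"{size} GB shard: {cached_fraction(cache_gb, size):.1%} cacheable")
```

With these numbers, only about 7 to 8 percent of each shard fits in the OS disk cache at once, which is consistent with the message's remark that the system is I/O bound.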