What's the reasoning behind having three shards on one machine, instead
of just combining those into one shard? Just curious. I had been
thinking the point of shards was to get them on different machines, and
there'd be no reason to have multiple shards on one machine.
On 8/2/2011 1:59 PM, Burton-West, Tom wrote:
Hi Markus,
Just as a data point for a very large sharded index, we have the full text of
9.3 million books with an index size of about 6+ TB spread over 12 shards on 4
machines. Each machine has 3 shards. The size of each shard ranges between
475GB and 550GB. We are definitely I/O bound. Our machines have 144GB of
memory with about 16GB dedicated to the Tomcat instance running the 3 Solr
instances, which leaves about 120GB (or 40GB per shard) for the OS disk cache.
We release a new index every morning and then warm the caches with several
thousand queries. I probably should add that our disk storage is a very high
performance Isilon appliance that has over 500 drives and every block of every
file is striped over no less than 14 different drives. (See blog for details *)
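
As a rough sanity check on the memory figures above, the arithmetic can be sketched in Python. The constants come from the message; the function names are made up for illustration. The sketch assumes the Tomcat allocation is the only significant non-cache memory use, so it gives an upper bound (about 42.7GB per shard) slightly above the roughly 40GB quoted.

```python
# Figures quoted in the message; the helper names are hypothetical.
TOTAL_RAM_GB = 144
TOMCAT_GB = 16
SHARDS_PER_MACHINE = 3
SHARD_SIZE_RANGE_GB = (475, 550)


def cache_per_shard_gb(total_ram_gb, reserved_gb, shards):
    """Upper bound on OS disk cache per shard, ignoring other memory use."""
    return (total_ram_gb - reserved_gb) / shards


def cached_fraction(cache_gb, shard_gb):
    """Fraction of a shard's index that could sit in the disk cache at once."""
    return min(1.0, cache_gb / shard_gb)


if __name__ == "__main__":
    upper = cache_per_shard_gb(TOTAL_RAM_GB, TOMCAT_GB, SHARDS_PER_MACHINE)
    print(f"upper bound on cache per shard: {upper:.1f} GB")
    for size in SHARD_SIZE_RANGE_GB:
        # 40 GB is the per-shard figure quoted in the message.
        print(f"{size} GB shard: {cached_fraction(40, size):.1%} cacheable")
```

With the quoted 40GB per shard, only about 7 to 8 percent of each shard can be held in the disk cache at once, which is consistent with the setup being I/O bound.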
We have a very low number of queries per second (0.3-2 qps) and our modest
response time goal is to keep 99th percentile response time for our application
(i.e. Solr + application) under 10 seconds.
Our current performance statistics are:
  average response time    300 ms
  median response time     113 ms
  90th percentile          663 ms
  95th percentile        1,691 ms
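
Percentile figures like these can be computed from a list of response times in several ways, and the message does not say which method was used. A minimal sketch using the nearest-rank definition follows. The function names are hypothetical, and the 10 second goal is expressed in milliseconds.

```python
import math


def percentile(values, p):
    """Nearest-rank percentile: the smallest value with at least p% of the data at or below it."""
    if not values:
        raise ValueError("no values")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank into the sorted list
    return ordered[rank - 1]


def meets_goal(latencies_ms, p=99, limit_ms=10_000):
    """True if the p-th percentile latency is under the limit."""
    return percentile(latencies_ms, p) < limit_ms
```

Other definitions (for example, ones that interpolate between neighbouring values) give slightly different numbers on small samples, so the method matters when comparing figures across systems.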
We had plans to do some performance testing to determine the optimum shard size
and optimum number of shards per machine, but that has remained on the back
burner for a long time as other higher priority items keep pushing it down on
the todo list.
We would be really interested to hear about the experiences of people who have
so many shards that the overhead of distributing the queries, and
consolidating/merging the responses becomes a serious issue.
Tom Burton-West
http://www.hathitrust.org/blogs/large-scale-search
* http://www.hathitrust.org/blogs/large-scale-search/scaling-large-scale-search-500000-volumes-5-million-volumes-and-beyond
-----Original Message-----
From: Markus Jelsma [mailto:markus.jel...@openindex.io]
Sent: Tuesday, August 02, 2011 12:33 PM
To: solr-user@lucene.apache.org
Subject: Re: performance crossover between single index and sharding
Actually, I do worry about it. It would be marvelous if someone could provide
some metrics for an index of many terabytes.
[..] At some extreme point there will be diminishing
returns and a performance decrease, but I wouldn't worry about that at all
until you've got many terabytes -- I don't know how many but don't worry
about it.
~ David
-----
Author: https://www.packtpub.com/solr-1-4-enterprise-search-server/book
--
View this message in context:
http://lucene.472066.n3.nabble.com/performance-crossover-between-single-index-and-sharding-tp3218561p3219397.html
Sent from the Solr - User mailing list archive at Nabble.com.