On 6/24/2011 2:19 AM, Dmitry Kan wrote:
If possible, can you please share some details of your setup, such as the number of shards, how big they are (size and doc count), and what the user load per second is?
Each full chain (there are two) consists of two servers, each with two quad-core processors and 32GB of RAM. Those two servers host 9 VMs. Six of them house large shards: each of those VMs has 9GB of RAM, and each shard holds about 9.5 million rows and takes up about 17.5GB of disk space. One VM (3GB of RAM) houses a small shard that contains the newest data, usually about 1GB and 400,000 rows. There is also a VM (512MB) running haproxy and a VM (3GB) with a Solr instance that serves as a broker: it has no index, and its one core has the shards parameter set in solrconfig.xml.
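To make the per-chain budget explicit, here is the arithmetic on the figures above as a small Python sketch. It assumes 512MB counts as 0.5GB and that the listed VM sizes are the entire RAM allocation on the two hosts.

```python
# Per-chain resource budget, using the figures quoted above.
SERVERS_PER_CHAIN = 2
RAM_PER_SERVER_GB = 32

# (role, vm_count, ram_gb_per_vm); 512MB is treated as 0.5GB (assumption)
vms = [
    ("large shard", 6, 9.0),
    ("small shard", 1, 3.0),
    ("haproxy", 1, 0.5),
    ("broker", 1, 3.0),
]

vm_count = sum(n for _, n, _ in vms)
vm_ram_gb = sum(n * ram for _, n, ram in vms)
host_ram_gb = SERVERS_PER_CHAIN * RAM_PER_SERVER_GB

# Approximate totals for the six large shards
large_rows = 6 * 9_500_000
large_disk_gb = 6 * 17.5

print(f"{vm_count} VMs, {vm_ram_gb} GB allocated of {host_ram_gb} GB")
print(f"large shards: ~{large_rows:,} rows, ~{large_disk_gb} GB on disk")
```

This prints "9 VMs, 60.5 GB allocated of 64 GB" and "large shards: ~57,000,000 rows, ~105.0 GB on disk", leaving roughly 3.5GB per chain for the hypervisor.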
The small shard is updated every two minutes. Every ten minutes, deletes are run against all shards. Once an hour, the small shard is optimized. Once a night, data older than 7 days is distributed among the large shards, deleted from the small shard, and one large shard is optimized. Normally data is replicated between the two chains, but right now the primary chain is running 1.4.1 and the backup chain is running 3.2.0.
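The maintenance cadence above can be sketched as a function that lists which jobs fall due at a given minute of the day. This is a hypothetical illustration: it assumes every job is aligned to clock boundaries (minute 0, 10, 20, ...) and that the nightly job runs at 02:00, which is not stated above. The job names are descriptive labels, not real commands.

```python
# Hypothetical: which maintenance jobs are due at a given minute of the day.
# Intervals come from the description above; NIGHTLY_MINUTE is an assumption.
NIGHTLY_MINUTE = 2 * 60  # assumed 02:00; the actual time is not stated


def jobs_due(minute_of_day: int) -> list:
    """Return the jobs scheduled at minute_of_day (0..1439), assuming clock alignment."""
    jobs = []
    if minute_of_day % 2 == 0:
        jobs.append("update small shard")
    if minute_of_day % 10 == 0:
        jobs.append("run deletes on all shards")
    if minute_of_day % 60 == 0:
        jobs.append("optimize small shard")
    if minute_of_day == NIGHTLY_MINUTE:
        jobs.append("move data older than 7 days to large shards")
        jobs.append("optimize one large shard")
    return jobs


# How many times each job runs per day under these assumptions
per_day = {}
for m in range(24 * 60):
    for job in jobs_due(m):
        per_day[job] = per_day.get(job, 0) + 1
print(per_day)
```

Under these assumptions the small shard is updated 720 times a day, deletes run 144 times, and the hourly optimize always lands on the same minute as an update and a delete pass.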
According to Solr stats, the average number of queries per second in production is well below 1. I don't know what it is during the day when it peaks, but it's certainly not very large. We do keep statistics on every search in a database; I just haven't worked out yet how to turn them into usable numbers. The usual statistical functions don't seem to be enough, so I'll probably have to write something myself. If anyone knows an easy way to turn a series of timestamps and QTimes into per-second statistics over arbitrary timeframes (hourly, daily, a 10-second span, etc.), I'm all ears.
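One way to do this is to bucket each search by window and then compute statistics per bucket. Here is a minimal Python sketch, assuming the database rows can be fetched as (Unix timestamp in seconds, QTime in milliseconds) pairs. The function name and the chosen statistics are illustrative, not an existing tool. Windows are aligned to the Unix epoch, so 3600-second and 86400-second windows line up with UTC hours and days, and windows with no searches are simply absent rather than reported as zero.

```python
from collections import defaultdict
from statistics import mean


def window_stats(records, window_seconds):
    """Group (unix_ts_seconds, qtime_ms) pairs into fixed windows.

    Returns {window_start: stats}, where avg_qps is queries / window length
    and peak_qps is the busiest single second inside the window.
    """
    buckets = defaultdict(list)
    for ts, qtime in records:
        start = int(ts // window_seconds) * window_seconds
        buckets[start].append((ts, qtime))

    result = {}
    for start in sorted(buckets):
        items = buckets[start]
        per_second = defaultdict(int)
        for ts, _ in items:
            per_second[int(ts)] += 1
        qtimes = sorted(q for _, q in items)
        n = len(qtimes)
        p95_index = (95 * n + 99) // 100 - 1  # nearest-rank 95th percentile
        result[start] = {
            "count": n,
            "avg_qps": n / window_seconds,
            "peak_qps": max(per_second.values()),
            "avg_qtime_ms": mean(qtimes),
            "max_qtime_ms": qtimes[-1],
            "p95_qtime_ms": qtimes[p95_index],
        }
    return result


records = [(0.1, 10), (0.5, 30), (1.2, 20), (12.0, 100)]
stats = window_stats(records, 10)
```

Changing the second argument to 3600 or 86400 gives hourly or daily figures from the same rows; peak_qps is probably the number that matters most for sizing.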
On my newly tuned 3.2.0 index, I can get nearly 100 queries per second if I run the benchmarking script a few times in a row. It uses 8 threads, each sending 1024 queries as fast as it can. Running it against the old index with the old GC settings, I can only get about 25 queries per second. Both of these numbers are well above what I actually need.
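The actual benchmarking script isn't shown here, but the measurement it describes can be sketched in Python. In this sketch run_query is a hypothetical stand-in for whatever sends one search to the broker (for example an HTTP request); threads are adequate in Python for this because the work is I/O-bound.

```python
import time
from concurrent.futures import ThreadPoolExecutor

THREADS = 8
QUERIES_PER_THREAD = 1024


def worker(run_query, n):
    # Fire n queries back to back, as fast as the server answers.
    for _ in range(n):
        run_query()
    return n


def benchmark(run_query, threads=THREADS, per_thread=QUERIES_PER_THREAD):
    """Return (total_queries, elapsed_seconds, queries_per_second)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=threads) as pool:
        futures = [pool.submit(worker, run_query, per_thread) for _ in range(threads)]
        total = sum(f.result() for f in futures)
    elapsed = time.perf_counter() - start
    qps = total / elapsed if elapsed > 0 else float("inf")
    return total, elapsed, qps
```

One run is 8 × 1024 = 8,192 queries, so at about 100 queries per second it takes roughly 82 seconds, and at about 25 queries per second roughly 328 seconds.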
If I ever need more performance, I can increase the system memory so more of the index fits into RAM, which would also let me increase the Java heap size. I actually hope one day to add servers, decrease the number of large shards, and run without virtualization, but the funding just isn't there.
Shawn