Hi Toke - I tried pausing the indexing entirely, but got only a slight improvement, so the impact of indexing is not that significant.
Shawn - to answer your question: I am sending one document per update request. My test SolrCloud is configured with 2 shards on one machine, and each shard has one replica on another machine.
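For reference, batching would mean putting several documents into a single update request. A minimal sketch of such a request in Solr's XML update format (the field names here are hypothetical), POSTed to the collection's /update handler:

    <add>
      <!-- several documents carried in one update request -->
      <doc>
        <field name="id">doc-1</field>
        <field name="title">First document</field>
      </doc>
      <doc>
        <field name="id">doc-2</field>
        <field name="title">Second document</field>
      </doc>
      <!-- ...more documents per request as needed... -->
    </add>

Grouping tens or hundreds of documents into each request cuts per-request overhead on both the client and the server.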
To check whether network latency was the bottleneck, I disabled the replicas and re-ran the test, but got no improvement. I also tried balancing the load and providing more CPU and memory resources by configuring only 2 shards, each on a separate machine with no replicas, but in that case performance went down.

Regarding production: we want 2 shards in order to make the platform scalable and future-proof. Just to inform you, we have 22 collections in production; 4 of them are major in terms of volume and complexity and are frequently hit by queries and indexing, while the rest are comparatively minor with fewer query and index hits. Below are the production index statistics.

Number of collections: 22 collections holding 139 million documents, with an index size of 85 GB.
Major collections: 4 collections holding 134 million documents, with an index size of 77 GB.
Minor collections: 18 collections holding 5 million documents, with an index size of 8 GB.

Given these statistics, any ideas on how to improve query performance in our index-heavy (100 index updates per sec) and query-heavy (100 queries per sec) scenario?

Thanks & Regards,
Bhaumik Joshi

________________________________________
From: Shawn Heisey <apa...@elyograg.org>
Sent: Tuesday, April 12, 2016 7:37 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr Sharding Strategy

On 4/11/2016 6:31 AM, Bhaumik Joshi wrote:
> We are using solr 5.2.0 and we have an index-heavy (100 index updates
> per sec) and query-heavy (100 queries per sec) scenario.
>
> *Index stats: *10 million documents and 16 GB index size
>
> Which sharding strategy is best suited in the above scenario?
>
> Please share reference resources that give a detailed comparison of a
> single shard versus multiple shards, if any.
>
> Meanwhile we did some tests with SolrMeter (a standalone Java tool for
> stress tests with Solr) for a single shard and two shards.
>
> *Index stats of test solr cloud: *0.7 million documents and 1 GB index
> size.
>
> As observed in the test, the average query time with 2 shards is much
> higher than with a single shard.

On the same hardware, multiple shards will usually be slower than one shard, especially under a high load. Sharding can give good results with *more* hardware, providing more CPU and memory resources. When the query load is high, there should be only one core (shard replica) per server, and Solr works best when it is running on bare metal, not virtualized.

Handling 100 queries per second will require multiple copies of your index on separate hardware. This is a fairly high query load. There are installations handling much higher loads, of course; those installations have a LOT of replicas and some way to balance load across them.

For 10 million documents and 16GB of index, I'm not sure that I would shard at all; just make sure that each machine has plenty of memory -- probably somewhere in the neighborhood of 24GB to 32GB. That assumes that Solr is the only thing running on that server, and that if it's virtualized, the physical server's memory is not oversubscribed.

Regarding your specific numbers: the low queries per second may be caused by one or more of these problems, or perhaps something I haven't thought of:

1) Your queries are particularly heavy.
2) Updates are interfering by tying up scarce resources.
3) You don't have enough memory in the machine.

How many documents are in each update request that you are sending?

In another thread on the list, you have stated that you have a 1 second maxTime on autoSoftCommit. This is *way* too low, and a *major* source of performance issues. Very few people actually need that level of latency -- a maxTime measured in minutes may be fast enough, and is much friendlier for performance.

Thanks,
Shawn
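To make Shawn's autoSoftCommit advice concrete, here is a sketch of the relevant solrconfig.xml section with maxTime raised into the minutes range (the values below are illustrative starting points, not exact recommendations):

    <updateHandler class="solr.DirectUpdateHandler2">
      <autoCommit>
        <!-- hard commit every 5 minutes: flushes to disk, no new searcher -->
        <maxTime>300000</maxTime>
        <openSearcher>false</openSearcher>
      </autoCommit>
      <autoSoftCommit>
        <!-- soft commit every 2 minutes instead of every second -->
        <maxTime>120000</maxTime>
      </autoSoftCommit>
    </updateHandler>

Every soft commit opens a new searcher and discards the caches built for the old one, so at 100 updates per second a 1-second maxTime keeps Solr almost continuously rebuilding searchers while it is also trying to serve 100 queries per second.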