Hi Vijay,
We're working on SOLR-6816 ... would love for you to be a test site for
any improvements we make ;-)

Curious if you've experimented with changing the mergeFactor to a higher
value, such as 25, and what happens if you set soft auto-commits to
something lower, like 15 seconds? Also, make sure your indexing clients
are not sending hard commits as well, i.e. just rely on auto-commits.

re: "When the number of replicas increases the bulk indexing time
increase almost exponentially" ... ugh ... I'm wondering what your CPU
utilization / thread counts are? The leader sends updates to all replicas
in parallel, so it shouldn't be a huge impact whether you're doing 1
replica or 15 (probably a little more overhead with 15, but not
exponential for sure) ... what are threads waiting on when this huge
slowdown occurs? jstack -l <PID> should give you some idea.

Lastly, do you have GC logging enabled, and have you ruled out GC pauses
causing the big slowdown?

On Thu, Feb 12, 2015 at 4:07 PM, Vijay Sekhri <sekhrivi...@gmail.com> wrote:

> Hi Erick,
> We have the following configuration of our Solr cloud:
>
> 1. 10 shards
> 2. 15 replicas per shard
> 3. 9 GB of index size per shard
> 4. A total of around 90 million documents
> 5. 2 collections, viz. search1 serving live traffic and search2 for
>    indexing. We swap collections when indexing finishes.
> 6. On 150 hosts we have 2 JVMs running, one for the search1 collection
>    and the other for the search2 collection.
> 7. Each JVM has 12 GB of heap assigned to it, while the host has 50 GB
>    in total.
> 8. Each host has 16 processors.
> 9. Linux XXXXXXX 2.6.32-431.5.1.el6.x86_64 #1 SMP Wed Feb 12 00:41:43
>    UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
> 10. We have two ways to index data.
>    1. Bulk indexing. All 90 million docs pumped in from 14 parallel
>       processes (on 14 different client hosts). This is done on the
>       collection that is not serving live traffic.
>    2. Incremental indexing. Only delta changes (ranging from 100K to
>       5 million) every two hours.
>       This is done on the collection that is also serving live
>       traffic.
> 11. The requests-per-second count on the live collection is around
>    300 TPS.
> 12. The hard commit setting is every 30 seconds with openSearcher false,
>    and the soft commit setting is every 15 minutes. We have tried a lot
>    of different settings here, BTW.
>
> Now we have two issues with indexing.
>
> 1) Solr just could not keep up with the bulk indexing when replicas are
> also active. We concluded this by changing the number of replicas to
> just 2, then to 4, and then to 15. When the number of replicas increases
> the bulk indexing time increase almost exponentially.
> We seem to have encountered the same issue reported here:
> https://issues.apache.org/jira/browse/SOLR-6816
> It gets to the point that even indexing 100 docs takes the Solr cluster
> 300 seconds. It would start off indexing 100 docs in 55 milliseconds,
> slowly increase over time, and within an hour and a half just could not
> keep up. We have a workaround for this, i.e. we stop all the replicas,
> do the bulk indexing, and bring all the replicas up one by one. This
> sort of defeats the purpose of Solr cloud, but we can still work with
> this workaround. We can do this because bulk indexing happens on the
> collection that is not serving live traffic. However, we would love to
> have a solution from Solr cloud itself, like asking it via an API to
> stop replication and start it again at the end of indexing.
>
> 2) This issue is related to soft commits with incremental indexing.
> When we do incremental indexing, it is done on the same collection
> serving live traffic with a throughput of 300 requests per second.
> Everything is fine except whenever the soft commit happens. Each time
> the soft commit (autoSoftCommit in solrconfig.xml) happens, which BTW
> happens at almost the same time throughout the cluster, there is a
> spike in the response times and throughput decreases to almost 150 TPS.
> The spike continues for 2 minutes and then it happens again at the
> exact interval when the soft commit happens. We have monitored the logs
> and found a direct correlation between when the soft commit happens and
> when the response time tanks.
>
> Now the latter issue is quite disturbing, because the collection is
> serving live traffic and we cannot sustain these periodic degradations.
> We have played around with different soft commit settings: intervals
> ranging from 2 minutes to 30 minutes; auto-warming half the cache,
> auto-warming the full cache, auto-warming only 10%; doing warm-up
> queries on every new searcher, doing no warm-up queries on every new
> searcher. All the different settings yield the same results. Whenever
> the soft commit happens, the response time tanks and throughput
> decreases. The difference is almost 50% in response times and 50% in
> throughput.
>
> Our workaround for this problem is to also do incremental delta
> indexing on the collection not serving live traffic and swap when it is
> done. As you can see, this also defeats the purpose of Solr cloud. We
> cannot do bulk indexing because replicas cannot keep up, and we cannot
> do incremental indexing because of soft commit performance.
>
> Is there a way to make the cluster not do soft commits all at the same
> time, or is there a way to make soft commits not cause this degradation?
> We are open to any ideas at this time.
>
> --
> *********************************************
> Vijay Sekhri
> *********************************************
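
For the jstack suggestion in the reply above, here is a minimal Python sketch of one way to tally thread states and top stack frames from a saved thread dump. The function name summarize_thread_dump and the file name dump.txt are made up for illustration, and the parsing assumes the usual HotSpot text layout (a "java.lang.Thread.State:" line followed by "at ..." frames), so treat it as a starting point, not a definitive tool.

```python
import re
from collections import Counter

# Assumed HotSpot layout: each thread has a "java.lang.Thread.State: X"
# line, followed by stack frames that start with "at ".
STATE_RE = re.compile(r"^\s*java\.lang\.Thread\.State:\s+(\w+)")
FRAME_RE = re.compile(r"^\s*at\s+([^\s(]+)")


def summarize_thread_dump(dump_text):
    """Count threads per state and per (state, top stack frame).

    Hypothetical helper, not part of Solr or the JDK.
    """
    states = Counter()
    top_frames = Counter()
    current_state = None  # state of the thread whose top frame is still pending
    for line in dump_text.splitlines():
        state_match = STATE_RE.match(line)
        if state_match:
            current_state = state_match.group(1)
            states[current_state] += 1
            continue
        frame_match = FRAME_RE.match(line)
        if frame_match and current_state is not None:
            # Only the first frame after the state line is recorded.
            top_frames[(current_state, frame_match.group(1))] += 1
            current_state = None
        elif not line.strip():
            # A blank line ends the current thread's block.
            current_state = None
    return states, top_frames
```

Assuming the dump was saved with jstack -l <PID> > dump.txt, calling summarize_thread_dump(open("dump.txt").read()) and printing states.most_common() and top_frames.most_common(10) should show whether most threads are parked, blocked, or runnable, and in which method. Taking a few dumps some seconds apart during the slowdown makes the picture more reliable than a single snapshot.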