Thanks Erick. I don't see anywhere that CDCR is not recommended for production use. I took the thread dump and am seeing about 140 CDCR threads, for example:

"cdcr-replicator-219-thread-8" #787 prio=5 os_prio=0 tid=0x00007f7c34009000 nid=0x50a waiting on condition [0x00007f7ec871b000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for <0x00000001da724ca0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
    at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
    at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

"cdcr-update-log-synchronizer-157-thread-1" #240 prio=5 os_prio=0 tid=0x00007f8782543800 nid=0x2e5 waiting on condition [0x00007f82ad99c000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for <0x00000001d7f9e8e8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
    at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1081)
    at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809)
    at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

Thanks,
Raji

On Sun, Mar 29, 2020 at 5:18 AM Erick Erickson
<erickerick...@gmail.com> wrote:

> What that error usually means is that there are a zillion threads running.
>
> Try taking a thread dump. It’s _probable_ that it’s CDCR, but
> take a look at the thread dump to see if you have lots of
> threads that are running. And by “lots” here, I mean 100s of threads
> that reference the same component, in this case that have cdcr in
> the stack trace.
>
> CDCR is not getting any active development at this point; you might want to
> consider another replication strategy if you’re not willing to fix
> the code.
>
> Best,
> Erick
>
> > On Mar 29, 2020, at 4:17 AM, Raji N <rajis...@gmail.com> wrote:
> >
> > Hi All,
> >
> > We are running SolrCloud 7.6 (with the patch
> > https://issues.apache.org/jira/secure/attachment/12969150/SOLR-11724.patch)
> > in production on 7 hosts in containers. The container memory is 48GB and
> > the heap is 24GB.
> >
> > ulimit -v
> > unlimited
> >
> > ulimit -m
> > unlimited
> >
> > We don't have any custom code in Solr. We have set up bidirectional CDCR
> > between the primary and secondary data centers. Our secondary DC is very
> > unstable, and many instances are frequently down.
> >
> > We get the exception below quite often. Is this because the CDCR connection
> > is broken?
> >
> > WARN (cdcr-update-log-synchronizer-80-thread-1) [ ] o.a.s.h.CdcrUpdateLogSynchronizer Caught unexpected exception
> >
> > java.lang.OutOfMemoryError: unable to create new native thread
> >     at java.lang.Thread.start0(Native Method) ~[?:1.8.0_211]
> >     at java.lang.Thread.start(Thread.java:717) ~[?:1.8.0_211]
> >     at org.apache.http.impl.client.IdleConnectionEvictor.start(IdleConnectionEvictor.java:96) ~[httpclient-4.5.3.jar:4.5.3]
> >     at org.apache.http.impl.client.HttpClientBuilder.build(HttpClientBuilder.java:1219) ~[httpclient-4.5.3.jar:4.5.3]
> >     at org.apache.solr.client.solrj.impl.HttpClientUtil.createClient(HttpClientUtil.java:319) ~[solr-solrj-7.6.0.jar:7.6.0 719cde97f84640faa1e3525690d262946571245f - nknize - 2018-12-07 14:47:53]
> >     at org.apache.solr.client.solrj.impl.HttpClientUtil.createClient(HttpClientUtil.java:330) ~[solr-solrj-7.6.0.jar:7.6.0 719cde97f84640faa1e3525690d262946571245f - nknize - 2018-12-07 14:47:53]
> >     at org.apache.solr.client.solrj.impl.HttpClientUtil.createClient(HttpClientUtil.java:268) ~[solr-solrj-7.6.0.jar:7.6.0 719cde97f84640faa1e3525690d262946571245f - nknize - 2018-12-07 14:47:53]
> >     at org.apache.solr.client.solrj.impl.HttpClientUtil.createClient(HttpClientUtil.java:255) ~[solr-solrj-7.6.0.jar:7.6.0 719cde97f84640faa1e3525690d262946571245f - nknize - 2018-12-07 14:47:53]
> >     at org.apache.solr.client.solrj.impl.HttpSolrClient.<init>(HttpSolrClient.java:200) ~[solr-solrj-7.6.0.jar:7.6.0 719cde97f84640faa1e3525690d262946571245f - nknize - 2018-12-07 14:47:53]
> >     at org.apache.solr.client.solrj.impl.HttpSolrClient$Builder.build(HttpSolrClient.java:957) ~[solr-solrj-7.6.0.jar:7.6.0 719cde97f84640faa1e3525690d262946571245f - nknize - 2018-12-07 14:47:53]
> >     at org.apache.solr.handler.CdcrUpdateLogSynchronizer$UpdateLogSynchronisation.run(CdcrUpdateLogSynchronizer.java:139) [solr-core-7.6.0.jar:7.6.0-SNAPSHOT 34d82ed033cccd8120431b73e93554b85b24a278 - i843100 - 2019-09-30 14:02:46]
> >     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_211]
> >     at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) [?:1.8.0_211]
> >     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) [?:1.8.0_211]
> >     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) [?:1.8.0_211]
> >     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_211]
> >     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_211]
> >     at java.lang.Thread.run(Thread.java:748) [?:1.8.0_211]
> >
> > Thanks,
> > Raji
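
To count threads per component as Erick suggests, a dump taken with `jstack <pid>` (or `jcmd <pid> Thread.print`) can be tallied by thread name. This is a minimal sketch, assuming a HotSpot-format dump in which each thread's header line begins with its quoted name; `ThreadDumpCounter`, `poolOf`, and `countByPool` are hypothetical names, not part of Solr. It groups by name with digits replaced by `N` rather than searching stack frames, because idle pool workers such as `cdcr-replicator-219-thread-8` in the dump above show only `java.util.concurrent` frames, so the name is what marks them as CDCR threads.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of Solr): tallies threads in a jstack-style dump
// by thread name, with every run of digits replaced by "N" so that pools such as
// cdcr-replicator-219 and cdcr-replicator-220 collapse into one bucket.
public class ThreadDumpCounter {

    // Assumption: in a HotSpot thread dump, each thread's header line starts with its quoted name.
    private static final Pattern HEADER = Pattern.compile("^\"([^\"]+)\"");

    // Returns the normalized pool name for a header line, or null for any other line
    // (stack frames, "java.lang.Thread.State: ..." lines, lock info, blank lines).
    static String poolOf(String line) {
        Matcher m = HEADER.matcher(line.trim());
        if (!m.find()) {
            return null;
        }
        return m.group(1).replaceAll("\\d+", "N");
    }

    // Counts thread headers per normalized pool name, sorted alphabetically by name.
    static Map<String, Integer> countByPool(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            String pool = poolOf(line);
            if (pool != null) {
                counts.merge(pool, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java ThreadDumpCounter <thread-dump-file>");
            System.exit(1);
        }
        // ISO-8859-1 decodes any byte sequence, so an odd byte in a thread name cannot abort the read.
        List<String> lines = Files.readAllLines(Paths.get(args[0]), StandardCharsets.ISO_8859_1);
        int cdcrTotal = 0;
        for (Map.Entry<String, Integer> e : countByPool(lines).entrySet()) {
            System.out.println(e.getValue() + "\t" + e.getKey());
            if (e.getKey().contains("cdcr")) {
                cdcrTotal += e.getValue();
            }
        }
        System.out.println("threads whose name contains \"cdcr\": " + cdcrTotal);
    }
}
```

Run it as `jstack <pid> > dump.txt` followed by `java ThreadDumpCounter dump.txt`. A bucket with hundreds of entries, like the roughly 140 CDCR threads reported above, is the kind of pile-up Erick describes.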