Solrcloud 7.6 OOM due to unable to create native threads
Hi All,

We are running SolrCloud 7.6 (with the patch https://issues.apache.org/jira/secure/attachment/12969150/SOLR-11724.patch) on production on 7 hosts in containers. The container memory is 48GB and the heap is 24GB.

ulimit -v
unlimited
ulimit -m
unlimited

We don't have any custom code in Solr. We have set up bidirectional CDCR between the primary and secondary data centers. Our secondary DC is very unstable, and often many of its instances are down.

We get the exception below quite often. Is this because the CDCR connection is broken?

WARN (cdcr-update-log-synchronizer-80-thread-1) [ ] o.a.s.h.CdcrUpdateLogSynchronizer Caught unexpected exception
java.lang.OutOfMemoryError: unable to create new native thread
    at java.lang.Thread.start0(Native Method) ~[?:1.8.0_211]
    at java.lang.Thread.start(Thread.java:717) ~[?:1.8.0_211]
    at org.apache.http.impl.client.IdleConnectionEvictor.start(IdleConnectionEvictor.java:96) ~[httpclient-4.5.3.jar:4.5.3]
    at org.apache.http.impl.client.HttpClientBuilder.build(HttpClientBuilder.java:1219) ~[httpclient-4.5.3.jar:4.5.3]
    at org.apache.solr.client.solrj.impl.HttpClientUtil.createClient(HttpClientUtil.java:319) ~[solr-solrj-7.6.0.jar:7.6.0 719cde97f84640faa1e3525690d262946571245f - nknize - 2018-12-07 14:47:53]
    at org.apache.solr.client.solrj.impl.HttpClientUtil.createClient(HttpClientUtil.java:330) ~[solr-solrj-7.6.0.jar:7.6.0 719cde97f84640faa1e3525690d262946571245f - nknize - 2018-12-07 14:47:53]
    at org.apache.solr.client.solrj.impl.HttpClientUtil.createClient(HttpClientUtil.java:268) ~[solr-solrj-7.6.0.jar:7.6.0 719cde97f84640faa1e3525690d262946571245f - nknize - 2018-12-07 14:47:53]
    at org.apache.solr.client.solrj.impl.HttpClientUtil.createClient(HttpClientUtil.java:255) ~[solr-solrj-7.6.0.jar:7.6.0 719cde97f84640faa1e3525690d262946571245f - nknize - 2018-12-07 14:47:53]
    at org.apache.solr.client.solrj.impl.HttpSolrClient.<init>(HttpSolrClient.java:200) ~[solr-solrj-7.6.0.jar:7.6.0 719cde97f84640faa1e3525690d262946571245f - nknize - 2018-12-07 14:47:53]
    at org.apache.solr.client.solrj.impl.HttpSolrClient$Builder.build(HttpSolrClient.java:957) ~[solr-solrj-7.6.0.jar:7.6.0 719cde97f84640faa1e3525690d262946571245f - nknize - 2018-12-07 14:47:53]
    at org.apache.solr.handler.CdcrUpdateLogSynchronizer$UpdateLogSynchronisation.run(CdcrUpdateLogSynchronizer.java:139) [solr-core-7.6.0.jar:7.6.0-SNAPSHOT 34d82ed033cccd8120431b73e93554b85b24a278 - i843100 - 2019-09-30 14:02:46]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_211]
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) [?:1.8.0_211]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) [?:1.8.0_211]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) [?:1.8.0_211]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_211]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_211]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_211]

Thanks,
Raji
Re: Solrcloud 7.6 OOM due to unable to create native threads
What that error usually means is that there are a zillion threads running.

Try taking a thread dump. It’s _probable_ that it’s CDCR, but take a look at the thread dump to see if you have lots of threads that are running. And by “lots” here, I mean 100s of threads that reference the same component, in this case ones that have cdcr in the stack trace.

CDCR is not getting active work at this point, so you might want to consider another replication strategy if you’re not willing to fix the code.

Best,
Erick
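The check suggested in the reply (look for hundreds of threads that reference the same component) can be scripted against a saved thread dump. What follows is only a minimal sketch, assuming the dump is jstack-style text in which thread blocks are separated by blank lines and each block starts with the quoted thread name. The function names and the SAMPLE text are made up for illustration; SAMPLE is hand-written and is not output from the cluster discussed here.

```python
import re
from collections import Counter


def split_threads(dump_text):
    """Split a jstack-style dump into per-thread blocks (blank-line separated)."""
    blocks = re.split(r"\n\s*\n", dump_text.strip())
    # Keep only blocks that start with a quoted thread name; this drops headers.
    return [b for b in blocks if b.lstrip().startswith('"')]


def count_threads_mentioning(dump_text, keywords):
    """Count threads whose name or stack mentions each keyword (case-insensitive)."""
    counts = Counter()
    for block in split_threads(dump_text):
        lowered = block.lower()
        for keyword in keywords:
            if keyword.lower() in lowered:
                counts[keyword] += 1
    return counts


def thread_name_prefixes(dump_text):
    """Group threads by name with digit runs collapsed to 'N'."""
    names = Counter()
    for block in split_threads(dump_text):
        match = re.match(r'"([^"]+)"', block.lstrip())
        if match:
            names[re.sub(r"\d+", "N", match.group(1))] += 1
    return names


# Hand-written, shortened sample for illustration only.
SAMPLE = '''"cdcr-update-log-synchronizer-80-thread-1" #101 prio=5 waiting on condition
   java.lang.Thread.State: TIMED_WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)

"Connection evictor" #102 daemon prio=5 waiting on condition
   java.lang.Thread.State: TIMED_WAITING (sleeping)
    at org.apache.http.impl.client.IdleConnectionEvictor$1.run(IdleConnectionEvictor.java)

"qtp123-45" #45 prio=5 runnable
   java.lang.Thread.State: RUNNABLE
'''

if __name__ == "__main__":
    print(count_threads_mentioning(SAMPLE, ["cdcr", "IdleConnectionEvictor"]))
    print(thread_name_prefixes(SAMPLE).most_common(5))
```

On a real dump, replace SAMPLE with the contents of the saved file. A single name pattern with a count in the hundreds would point at the component to investigate.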
Performance of range queries in Point vs. Trie fields
I think my original post didn't go through because I wasn't subscribed, so apologies if this is a duplicate.

For both Solr 7 and Solr 8, we have found that range queries on a DatePointField perform poorly when there are a large number of points (queries were taking over 30 seconds on a 50G core). We also tried switching to IntPointField to see if it made a difference, and it didn't. Just for comparison, we switched to the deprecated TrieDateField and found the performance was significantly better, almost 5x better on average. We even tried different precision steps, and although there was slight variation between the various values, all were significantly faster than the DatePointField. So we are now running in production with the deprecated fields instead.

I wanted to know if this is a common observation, because blogs I've read led me to believe that the Point fields are supposed to be fast. I'm not sure what the testing environment was for those, but that has not been our experience. I hope that these Trie fields are going to stay in the product for Solr 9. I know they were supposed to be removed in Solr 8, but there must have been a reason they were not.

Thanks for your help!
Michael Cooper
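For anyone who wants to reproduce a comparison like this, a small timing harness may help. This is a rough sketch, not the method used for the numbers above: the core name `mycore` and the field names `created_pdt` and `created_tdt` (a Point copy and a Trie copy of the same date) are hypothetical, and it assumes Solr's standard `/select` handler with the `q`, `fq`, and `rows` parameters.

```python
import statistics
import time
import urllib.parse
import urllib.request


def range_query_url(base_url, collection, field, start, end):
    """Build a /select URL that filters on an inclusive range and returns no rows."""
    params = {
        "q": "*:*",
        "fq": f"{field}:[{start} TO {end}]",
        "rows": "0",
    }
    query = urllib.parse.urlencode(params)
    return f"{base_url.rstrip('/')}/{collection}/select?{query}"


def median_seconds(url, runs=5):
    """Fetch the URL `runs` times and return the median wall-clock time in seconds."""
    timings = []
    for _ in range(runs):
        started = time.perf_counter()
        with urllib.request.urlopen(url) as response:
            response.read()
        timings.append(time.perf_counter() - started)
    return statistics.median(timings)


if __name__ == "__main__":
    # Hypothetical field names: one Point field and one Trie field with the same data.
    for field in ("created_pdt", "created_tdt"):
        url = range_query_url("http://localhost:8983/solr", "mycore", field,
                              "2019-01-01T00:00:00Z", "2020-01-01T00:00:00Z")
        print(field, url)
        # print(median_seconds(url))  # uncomment when a live Solr instance is available
```

One caveat on the design: an identical `fq` issued repeatedly can be answered from the filter cache, so either vary the range between runs or prefix the filter with `{!cache=false}` to measure the range query itself.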
Re: a new CLI tool bin/postlogs
As long as the data is loading, you are fine, I believe. We can create a ticket to figure out that error, but it's not affecting the logic of the load in any way.

Joel Bernstein
http://joelsolr.blogspot.com/

On Sun, Mar 29, 2020 at 2:29 AM Kayak28 wrote:

> Hello, Community:
>
> Thank you for replying.
>
> I ran it on a single log file, and then I could upload the solr.log file to the core.
> But it still fails to load class "org.slf4j.impl.StaticLoggerBinder".
> Should I download some jar file and configure it in some configuration file?
>
> Sincerely,
> Kaya Ota
>
> On Sat, Mar 28, 2020 at 2:13, Joel Bernstein wrote:
>
> > It looks like it's not finding any files. Here is the code that's failing:
> >
> > https://github.com/apache/lucene-solr/blob/35d8e3de6d5931bfd6cba3221cfd0dca7f97c1a1/solr/core/src/java/org/apache/solr/util/SolrLogPostTool.java#L126
> >
> > A couple of things to note:
> >
> > postlogs should only be run on log files. So if there are different types of files in the directory it's pointed to, it will have unexpected behavior. So you can run it on a single log file, or on a directory containing only log files.
> >
> > Joel Bernstein
> > http://joelsolr.blogspot.com/
> >
> > On Fri, Mar 27, 2020 at 5:18 AM Kayak28 wrote:
> >
> > > Hello, Community:
> > >
> > > Thank you for releasing Solr 8.5.0, which contains several interesting tools.
> > > bin/postlogs is an especially interesting one.
> > > So, I have tried to run it on my computer (non-production use) as follows.
> > >
> > > bin/postlogs http://localhost:8983/solr/logs ./server/logs/solr
> > >
> > > The result ended in:
> > >
> > > SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
> > > SLF4J: Defaulting to no-operation (NOP) logger implementation
> > > SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
> > > Exception in thread "main" java.lang.NullPointerException
> > >     at org.apache.solr.util.SolrLogPostTool.gatherFiles(SolrLogPostTool.java:127)
> > >     at org.apache.solr.util.SolrLogPostTool.main(SolrLogPostTool.java:65)
> > >
> > > Is there anything I have to do before running the postlogs command?
> > >
> > > Sincerely,
> > > Kaya Ota
> > >
> > > --
> > > Sincerely,
> > > Kaya
> > > github: https://github.com/28kayak
>
> --
> Sincerely,
> Kaya
> github: https://github.com/28kayak
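The two points from the thread (the tool has to find files at the given path, and that path should hold only log files) can be checked before running bin/postlogs. Below is a minimal sketch of such a pre-flight check. The `preflight` helper is hypothetical and not part of Solr, and the naming pattern `solr.log`, `solr.log.1`, ... is an assumption about the local logging setup rather than something postlogs enforces.

```python
import re
import tempfile
from pathlib import Path

# Assumed naming convention for Solr logs: solr.log, solr.log.1, solr.log.2, ...
LOG_NAME = re.compile(r"^solr\.log(\.\d+)?$")


def preflight(path):
    """Return (log_files, problems) for a path about to be handed to bin/postlogs."""
    target = Path(path)
    if not target.exists():
        return [], [f"{target} does not exist"]
    if target.is_file():
        return [target], []
    logs, problems = [], []
    for entry in sorted(target.rglob("*")):
        if entry.is_dir():
            continue
        if LOG_NAME.match(entry.name):
            logs.append(entry)
        else:
            problems.append(f"{entry} does not look like a Solr log file")
    if not logs:
        problems.append(f"{target} contains no log files")
    return logs, problems


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "solr.log").write_text("sample\n")
        (Path(tmp) / "solr_gc.log.0.current").write_text("gc\n")
        logs, problems = preflight(tmp)
        print([p.name for p in logs], problems)
        # A path that does not exist is reported instead of failing later.
        print(preflight(Path(tmp) / "solr"))
```

If `problems` is non-empty, it is probably safer to point postlogs at a single log file or at a directory that has been cleaned up first.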
Re: Solrcloud 7.6 OOM due to unable to create native threads
Is CDCR even recommended for use in production? Or was it abandoned before it could become production ready?

Thanks
SG
Re: Cross DC CloudSolr Client
Is there any good way of having a load balancer across two SolrCloud clusters (version 8.x or 7.x) that are in different regions (like Azure East and Azure West)?

Thanks
SG

On Thu, Mar 26, 2020 at 4:53 AM Erick Erickson wrote:

> I’ve never even heard of someone trying to put different ensembles in the same connection string for a single client.
>
> Create N CloudSolrClients, one for each DC.
>
> And why do you want to try to contact individual nodes? CloudSolrClient will do that for you.
>
> Best,
> Erick
>
> > On Mar 26, 2020, at 2:38 AM, Lucky Sharma wrote:
> >
> > Hi all,
> > Just wish to confirm on the cross DC connection situation from the CloudSolrClient.
> > Scenario:
> > We have multiple DCs with the same collection data. Can we add the ZooKeeper connect strings of the DCs to the CloudSolrClient?
> >
> > Will it work like this:
> > The client will use this connection string to fetch the Solr config from ZK.
> > The connection string will be read in sequence, i.e. if the first node is available, it will be used to fetch the ClusterState; if not, the next node will be used.
> >
> > If we put two ZK clusters in one connection string, how will it behave with two/multiple leaders, since the ZK clients are embedded inside the SolrClient?
> > --
> > Warm Regards,
> >
> > Lucky Sharma
> > Contact No :+91 9821559918
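The advice in the quoted reply (one client per data center rather than one client spanning several ZooKeeper ensembles) could be sketched on the application side roughly as follows. This is only an illustration under stated assumptions: each data center's client is represented by a plain callable, and the class name `PerDcClients`, the DC labels, and the try-in-order fallback policy are all hypothetical. In a real application each callable would wrap that DC's own CloudSolrClient.

```python
class AllDataCentersFailed(Exception):
    """Raised when the client of every data center raised an error."""


class PerDcClients:
    """Holds one independent query function per data center and tries them in order."""

    def __init__(self, clients):
        # clients: ordered mapping of DC name -> callable(query) -> result
        self.clients = dict(clients)

    def query(self, query):
        errors = {}
        for dc_name, run_query in self.clients.items():
            try:
                return dc_name, run_query(query)
            except Exception as exc:  # a real client would catch narrower error types
                errors[dc_name] = exc
        raise AllDataCentersFailed(errors)


if __name__ == "__main__":
    def east(query):
        raise ConnectionError("east is unreachable")  # simulated outage

    def west(query):
        return {"numFound": 3, "q": query}  # canned response for the demo

    clients = PerDcClients({"east": east, "west": west})
    print(clients.query("*:*"))
```

Keeping the clients separate means each one only ever talks to a single ZooKeeper ensemble, and the decision about which DC to prefer stays in application code (or in an external load balancer) where it can be changed without touching the clusters.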
Re: Solrcloud 7.6 OOM due to unable to create native threads
I don’t recommend CDCR at this point; I think there are better approaches. The root problem is that CDCR uses tlog files as a queueing mechanism. If the connection between the DCs is broken for any reason, the tlogs grow without limit. This could probably be fixed, but a better alternative is to use something designed to ensure that messages (updates) are delivered to separate DCs rather than try to have CDCR re-invent that wheel.

Best,
Erick

> On Mar 29, 2020, at 6:47 PM, S G wrote: > > Is CDCR even recommended to be used in production? > Or it was abandoned before it could become production ready ? > > Thanks > SG > > > On Sun, Mar 29, 2020 at 5:18 AM Erick Erickson > wrote: > >> What that error usually means is that there are a zillion threads running. >> >> Try taking a thread dump. It’s _probable_ that it’s CDCR, but >> take a look at the thread dump to see if you have lots of >> threads that are running. Any by “lots” here, I mean 100s of threads >> that reference the same component, in this case that have cdcr in >> the stack trace. >> >> CDCR is not getting active work at this point, you might want to >> consider another replication strategy if you’re not willing to fix >> the code. >> >> Best, >> Erick >> >>> On Mar 29, 2020, at 4:17 AM, Raji N wrote: >>> >>> Hi All, >>> >>> We running solrcloud 7.6 (with the patch # >>> >> https://issues.apache.org/jira/secure/attachment/12969150)/SOLR-11724.patchon >>> production on 7 hosts in containers. The container memory is 48GB , heap >>> is 24GB. >>> ulimit -v >>> >>> unlimited >>> >>> ulimit -m >>> >>> unlimited >>> We don't have any custom code in solr. We have set up bidirectional CDCR >>> between primary and secondary Datacenter. Our secondary DC is very >> unstable >>> and many times many instances are down. >>> >>> We get below exception quite often. Is this because the CDCR connection >> is >>> broken. 
>>> >>> WARN (cdcr-update-log-synchronizer-80-thread-1) [ ] >>> o.a.s.h.CdcrUpdateLogSynchronizer Caught unexpected exception >>> >>> java.lang.OutOfMemoryError: unable to create new native thread >>> >>> at java.lang.Thread.start0(Native Method) ~[?:1.8.0_211] >>> >>> at java.lang.Thread.start(Thread.java:717) ~[?:1.8.0_211] >>> >>> at >>> >> org.apache.http.impl.client.IdleConnectionEvictor.start(IdleConnectionEvictor.java:96) >>> ~[httpclient-4.5.3.jar:4.5.3] >>> >>> at >>> >> org.apache.http.impl.client.HttpClientBuilder.build(HttpClientBuilder.java:1219) >>> ~[httpclient-4.5.3.jar:4.5.3] >>> >>> at >>> >> org.apache.solr.client.solrj.impl.HttpClientUtil.createClient(HttpClientUtil.java:319) >>> ~[solr-solrj-7.6.0.jar:7.6.0 719cde97f84640faa1e3525690d262946571245f >>> - nknize - 2018-12-07 14:47:53] >>> >>> at >>> >> org.apache.solr.client.solrj.impl.HttpClientUtil.createClient(HttpClientUtil.java:330) >>> ~[solr-solrj-7.6.0.jar:7.6.0 719cde97f84640faa1e3525690d262946571245f >>> - nknize - 2018-12-07 14:47:53] >>> >>> at >>> >> org.apache.solr.client.solrj.impl.HttpClientUtil.createClient(HttpClientUtil.java:268) >>> ~[solr-solrj-7.6.0.jar:7.6.0 719cde97f84640faa1e3525690d262946571245f >>> - nknize - 2018-12-07 14:47:53] >>> >>> at >>> >> org.apache.solr.client.solrj.impl.HttpClientUtil.createClient(HttpClientUtil.java:255) >>> ~[solr-solrj-7.6.0.jar:7.6.0 719cde97f84640faa1e3525690d262946571245f >>> - nknize - 2018-12-07 14:47:53] >>> >>> at >>> >> org.apache.solr.client.solrj.impl.HttpSolrClient.(HttpSolrClient.java:200) >>> ~[solr-solrj-7.6.0.jar:7.6.0 719cde97f84640faa1e3525690d262946571245f >>> - nknize - 2018-12-07 14:47:53] >>> >>> at >>> >> org.apache.solr.client.solrj.impl.HttpSolrClient$Builder.build(HttpSolrClient.java:957) >>> ~[solr-solrj-7.6.0.jar:7.6.0 719cde97f84640faa1e3525690d262946571245f >>> - nknize - 2018-12-07 14:47:53] >>> >>> at >>> >> 
org.apache.solr.handler.CdcrUpdateLogSynchronizer$UpdateLogSynchronisation.run(CdcrUpdateLogSynchronizer.java:139) >>> [solr-core-7.6.0.jar:7.6.0-SNAPSHOT >>> 34d82ed033cccd8120431b73e93554b85b24a278 - i843100 - 2019-09-30 >>> 14:02:46] >>> >>> at >>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) >>> [?:1.8.0_211] >>> >>> at >>> java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) >>> [?:1.8.0_211] >>> >>> at >>> >> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) >>> [?:1.8.0_211] >>> >>> at >>> >> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) >>> [?:1.8.0_211] >>> >>> at >>> >> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) >>> [?:1.8.0_211] >>> >>> at >>> >> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolEx