How to print number of lost bucket in locator
Hi Team, I need to know how to print the number of lost buckets from the locator when 2 JVMs go down while all JVMs hold the CUSTOMER partitioned region. Suppose I have 5 JVMs in a cluster where CUSTOMER is a partitioned region configured with one redundant copy of the data (redundant-copies="1") across all JVMs. Thanks, Dinesh Akhand
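One way to get at this kind of bucket-level information is Geode's public PartitionRegionHelper API, run from a member that hosts the region (a locator by itself does not host buckets). The sketch below is illustrative only and is not taken from the thread; the region name and the interpretation of "lost" buckets are assumptions.

import org.apache.geode.cache.Cache;
import org.apache.geode.cache.CacheFactory;
import org.apache.geode.cache.Region;
import org.apache.geode.cache.partition.PartitionRegionHelper;
import org.apache.geode.cache.partition.PartitionRegionInfo;

public class BucketRedundancyReport {
  public static void main(String[] args) {
    // Assumes this process is already a member of the cluster and can see the region.
    Cache cache = CacheFactory.getAnyInstance();
    Region<?, ?> customer = cache.getRegion("CUSTOMER");

    // PartitionRegionInfo exposes bucket counts for the whole partitioned region.
    PartitionRegionInfo info = PartitionRegionHelper.getPartitionRegionInfo(customer);

    int configured = info.getConfiguredBucketCount();       // total-num-buckets setting
    int created = info.getCreatedBucketCount();             // buckets that currently exist somewhere
    int lowRedundancy = info.getLowRedundancyBucketCount(); // buckets below redundant-copies

    // Note: buckets are created lazily, so (configured - created) also counts buckets
    // that were never created, not only buckets lost after member failures. Buckets that
    // still exist but have fewer copies than configured show up in the low-redundancy count.
    System.out.println("configured buckets:     " + configured);
    System.out.println("created buckets:        " + created);
    System.out.println("missing buckets:        " + (configured - created));
    System.out.println("low-redundancy buckets: " + lowRedundancy);
  }
}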
Thread block on org.apache.geode.cache.CacheFactory.getAnyInstance(CacheFactory.java:282)
Hi team, Recently we have seen a JVM get stuck; in the stack traces I can see the method below causing the problem. The CacheListener javadoc (https://geode.apache.org/releases/latest/javadoc/org/apache/geode/cache/CacheListener.html) carries this warning: "WARNING: To avoid risk of deadlock, do not invoke CacheFactory.getAnyInstance() from within any callback methods. Instead use EntryEvent.getRegion().getCache() or RegionEvent.getRegion().getCache()". What is the best way to avoid this?

"Function Execution Processor1" #247 daemon prio=10 os_prio=0 tid=0x7f5798268000 nid=0x3ff5 waiting for monitor entry [0x7f576adf]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at org.apache.geode.cache.CacheFactory.getAnyInstance(CacheFactory.java:282)
        - waiting to lock <0x000699feafa0> (a java.lang.Class for org.apache.geode.cache.CacheFactory)
        at org.apache.geode.management.internal.cli.functions.GetRegionsFunction.execute(GetRegionsFunction.java:44)
        at org.apache.geode.internal.cache.MemberFunctionStreamingMessage.process(MemberFunctionStreamingMessage.java:185)
        at org.apache.geode.distributed.internal.DistributionMessage.scheduleAction(DistributionMessage.java:374)
        at org.apache.geode.distributed.internal.DistributionMessage$1.run(DistributionMessage.java:440)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at org.apache.geode.distributed.internal.DistributionManager.runUntilShutdown(DistributionManager.java:662)
        at org.apache.geode.distributed.internal.DistributionManager$9$1.run(DistributionManager.java:1108)
        at java.lang.Thread.run(Thread.java:745)

"P2P message reader for 10.218.110.61(sbimgapp16-server1:65602):1026 shared ordered uid=139 port=62033" #403 daemon prio=10 os_prio=0 tid=0x7f1ad4114800 nid=0xda7c waiting for monitor entry [0x7f1a28fcc000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at org.apache.geode.cache.CacheFactory.getAnyInstance(CacheFactory.java:282)
        - waiting to lock <0x00021df685a8> (a java.lang.Class for org.apache.geode.cache.CacheFactory)
        at amdocs.imdg.statistics.GemFireStatisticsFactory.getStatisticsFactory(GemFireStatisticsFactory.java:43)
        at amdocs.imdg.statistics.VSDCountersManager.<init>(VSDCountersManager.java:35)
        at amdocs.imdg.statistics.VSDCountersManager.<init>(VSDCountersManager.java:19)
        at amdocs.imdg.statistics.CountersManagerFactory.getCountersManager(CountersManagerFactory.java:27)
        at amdocs.imdg.utils.pooling.DataPoolFactory.makeObject(DataPoolFactory.java:42)
        at org.apache.commons.pool.impl.GenericKeyedObjectPool.borrowObject(GenericKeyedObjectPool.java:797)
        - locked <0x00021ebbe630> (a org.apache.commons.pool.impl.GenericKeyedObjectPool)
        at amdocs.imdg.utils.pooling.DataPool$DataPoolManager.getByteArray(DataPool.java:236)
        at amdocs.imdg.utils.pooling.DataPool.getByteArray(DataPool.java:98)
        at amdocs.imdg.model.BusinessData.populateData(BusinessData.java:110)
        at amdocs.imdg.utils.FlatBuffersUtils.updateBusinessData(FlatBuffersUtils.java:2255)
        at amdocs.imdg.utils.FlatBuffersUtils.updateCustomerData(FlatBuffersUtils.java:3083)
        at amdocs.imdg.utils.FlatBuffersUtils.updateNewCustomer(FlatBuffersUtils.java:3103)
        at amdocs.imdg.utils.FlatBuffersUtils.updateFromCustomerData(FlatBuffersUtils.java:2798)
        at amdocs.imdg.model.Customer.fromData(Customer.java:696)
        - locked <0x00021f874d08> (a amdocs.imdg.model.Customer)
        at org.apache.geode.internal.InternalDataSerializer.invokeFromData(InternalDataSerializer.java:2372)
        at org.apache.geode.internal.InternalDataSerializer.readDataSerializable(InternalDataSerializer.java:2395)
        at org.apache.geode.internal.InternalDataSerializer.basicRead

Thanks,
Dinesh Akhand
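Following the javadoc warning quoted above, the usual fix inside callbacks is to take the cache from the event rather than calling CacheFactory.getAnyInstance(). A minimal sketch of that pattern follows; the listener and its logic are made up for illustration and are not the code in this thread.

import org.apache.geode.cache.Cache;
import org.apache.geode.cache.EntryEvent;
import org.apache.geode.cache.util.CacheListenerAdapter;

// Hypothetical listener, only to illustrate the pattern recommended by the javadoc warning.
public class CustomerCacheListener extends CacheListenerAdapter<String, Object> {

  @Override
  public void afterCreate(EntryEvent<String, Object> event) {
    // Do NOT call CacheFactory.getAnyInstance() here: it synchronizes on the
    // CacheFactory class and can block, as in the BLOCKED threads above.
    // Instead, get the cache from the event that triggered the callback:
    Cache cache = event.getRegion().getCache();

    // ... use the cache (look up another region, record statistics, etc.)
    cache.getLogger().fine("afterCreate for key " + event.getKey());
  }
}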
Re: Thread block on org.apache.geode.cache.CacheFactory.getAnyInstance(CacheFactory.java:282)
Dinesh, have you analyzed the full thread dump to see if there is a deadlock? I can't tell just from these two threads whether there is a deadlock.

Anthony

> On Oct 23, 2018, at 6:32 AM, Dinesh Akhand wrote:
> (full original message and stack traces quoted above)
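As a side note on analyzing a full dump for deadlocks: jstack reports Java-level deadlocks at the bottom of its output, and the JVM can also report them programmatically. A small, self-contained sketch using the standard ThreadMXBean API (not Geode-specific) follows.

import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Prints any threads the JVM itself considers deadlocked (monitors or java.util.concurrent locks).
public class DeadlockCheck {
  public static void main(String[] args) {
    ThreadMXBean threads = ManagementFactory.getThreadMXBean();
    long[] deadlocked = threads.findDeadlockedThreads();
    if (deadlocked == null) {
      System.out.println("No deadlocked threads detected.");
      return;
    }
    for (ThreadInfo info : threads.getThreadInfo(deadlocked, Integer.MAX_VALUE)) {
      System.out.println(info); // includes the lock being waited on and a partial stack
    }
  }
}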
RE: About JIRA GEODE-5896
Hi, Udo

I have already forked Geode and committed my code to https://github.com/twosand/geode.git on branch feature/GEODE-5896. I think I need to finish the test code before creating a pull request, but I would still like some suggestions, or for someone to review the code changes.

I did some investigation of the code invocation chain; the attached chart shows the whole idea. The problem shows up on the on-server node: a FunctionStreamingReplyMessage arrives from the onRegion node, but the processor it expects no longer exists. From that point a PartitionedRegionFunctionStreamingAbortMessage can be sent, since we have the sender member and the processorId, which is enough. The abort message is then received at the on-region node, where the user-defined function is still running and continuously invoking PartitionedRegionFunctionResultSender.sendResult to stream results; it runs in another thread. We need a shared variable that can notify the sender that the remote processor has already been dropped. The PartitionedRegionFunctionStreamingContext class traces the processorId: normally it is placed into a map before the send and removed after the last send. Once the abort message arrives, the processorId is removed, so the next sendResult call can throw an exception to end the now-useless function.

I am trying to follow the GitHub PR workflow and am now writing the test code. But as mentioned above, I need some suggestions from the development team: is my idea suitable, or have I missed something?

Thanks
Dong

-----Original Message-----
From: Udo Kohlmeyer
Sent: Monday, October 22, 2018 4:53 PM
To: Yang, Dong [GTSUS Non-J&J]
Cc: dev@geode.apache.org
Subject: [EXTERNAL] Re: About JIRA GEODE-5896

Hi there Dong Yang,

If you have completed a fix, please submit it via the PR mechanism within GitHub. We will most gladly review and incorporate it.

--Udo

On 10/18/18 06:00, Yang, Dong [GTSUS Non-J&J] wrote:
> Hi,
>
> I am Dong Yang, and my Apache account is twosand. The way we use GemFire is not a common usage scenario; it is more like a mixed OLTP and OLAP scenario. The concept is very similar to using the Spark-GemFire connector: we have server-side functions that shuffle data from servers to clients in a streaming style, and we have hit this thread-lock issue in different environments. We previously used GemFire 8 and are now upgrading to GemFire 9.
> GEODE-5896 is a very important use case for us, and I think the same holds for anyone who wants to connect Spark to GemFire. For now we patch the client side to force the metadata to be ready before the function executes, but the proper solution should fix some server-side code.
> I can share what I found and where I want to fix it, and you can review whether it is reasonable or not. It can be fixed by the current Geode team, or I can do it as a contributor.
>
> Thanks,
> Dong Yang
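To make the proposal above concrete, here is a rough sketch of the tracking idea Dong describes: a registry keyed by processorId that the result sender consults before each send, and that an abort message can clear. Class and method names are illustrative only; this is not actual Geode internal code and not the code in the linked branch.

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Illustrative only: a registry of still-alive streaming reply processors, keyed by processorId.
// Loosely corresponds to the proposed PartitionedRegionFunctionStreamingContext idea.
public class FunctionStreamingContextRegistry {

  private static final ConcurrentMap<Integer, Boolean> ACTIVE = new ConcurrentHashMap<>();

  // Called before the first sendResult for a given reply processor.
  public static void register(int processorId) {
    ACTIVE.put(processorId, Boolean.TRUE);
  }

  // Called after the last result has been sent.
  public static void unregister(int processorId) {
    ACTIVE.remove(processorId);
  }

  // Called when an abort message arrives because the remote processor no longer exists.
  public static void abort(int processorId) {
    ACTIVE.remove(processorId);
  }

  // The result sender checks this before each send and aborts the function instead of
  // streaming results that nobody is waiting for.
  public static void checkAlive(int processorId) {
    if (!ACTIVE.containsKey(processorId)) {
      throw new IllegalStateException(
          "Reply processor " + processorId + " was dropped; aborting function result streaming");
    }
  }
}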
Re: About JIRA GEODE-5896
> I think I need to finish the test code before creating a pull request.

We have integrations into GitHub that launch precheckin testing in our continuous-integration Concourse pipelines; the PR status hooks are updated when tests pass or fail. Of course, from a philosophical point of view, every bug is the result of insufficient test coverage, but as long as your PR includes or updates tests that would identify this bug, opening the PR will cover the rest.

> But as mentioned above, I need some suggestions from the development team: is my idea suitable, or have I missed something?

In my mind, this is what the PR is meant to do -- facilitate discussion around immediate proposed changes. When the PR is opened, the community can review the change set, and if anything jumps out at us, we have the opportunity to shore up any deficiencies then. If you were looking for a collaborator to help with a problem you didn't know how to start, we could figure something out. But if you believe you have a fix, we'll all look forward to the pull request!

On Tue, Oct 23, 2018 at 2:41 AM, Yang, Dong [GTSUS Non-J&J] <dyan...@its.jnj.com> wrote:
> (full original message quoted above)
ManagementListener$handleEvent warning being logged
I noticed the following warning log statement while digging through the output of a dunit run. It's caused by a bug in org.apache.geode.management.internal.beans.ManagementListener.handleEvent: the readLock is acquired under a condition (event != ResourceEvent.SYSTEM_ALERT) but released unconditionally. It's easy to fix, so I'll file a bug and submit a PR. I'm not sure why this wasn't noticed before, unless the recent LoggingThread changes allowed it to be revealed. I'm also not sure why the dunit grep for suspect strings isn't finding this and causing dunit failures. Does anyone know why this doesn't get picked up by DUnitLauncher.closeAndCheckForSuspects()? I'm not sure that we have any tests to prevent us from breaking the check for suspect strings.

[vm0] [warn 2018/10/23 12:51:35.339 PDT tid=0x58] attempt to unlock read lock, not locked by current thread
[vm0] java.lang.IllegalMonitorStateException: attempt to unlock read lock, not locked by current thread
[vm0]   at java.util.concurrent.locks.ReentrantReadWriteLock$Sync.unmatchedUnlockException(ReentrantReadWriteLock.java:444)
[vm0]   at java.util.concurrent.locks.ReentrantReadWriteLock$Sync.tryReleaseShared(ReentrantReadWriteLock.java:428)
[vm0]   at java.util.concurrent.locks.AbstractQueuedSynchronizer.releaseShared(AbstractQueuedSynchronizer.java:1341)
[vm0]   at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.unlock(ReentrantReadWriteLock.java:881)
[vm0]   at org.apache.geode.management.internal.beans.ManagementListener.handleEvent(ManagementListener.java:232)
[vm0]   at org.apache.geode.distributed.internal.InternalDistributedSystem.notifyResourceEventListeners(InternalDistributedSystem.java:2219)
[vm0]   at org.apache.geode.distributed.internal.InternalDistributedSystem.handleResourceEvent(InternalDistributedSystem.java:595)
[vm0]   at org.apache.geode.internal.admin.remote.AlertListenerMessage.process(AlertListenerMessage.java:106)
[vm0]   at org.apache.geode.distributed.internal.DistributionMessage.scheduleAction(DistributionMessage.java:367)
[vm0]   at org.apache.geode.distributed.internal.DistributionMessage$1.run(DistributionMessage.java:432)
[vm0]   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
[vm0]   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
[vm0]   at org.apache.geode.distributed.internal.ClusterDistributionManager.runUntilShutdown(ClusterDistributionManager.java:954)
[vm0]   at org.apache.geode.distributed.internal.ClusterDistributionManager.doProcessingThread(ClusterDistributionManager.java:820)
[vm0]   at org.apache.geode.internal.logging.LoggingThreadFactory.lambda$newThread$0(LoggingThreadFactory.java:121)
[vm0]   at java.lang.Thread.run(Thread.java:748)
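The bug pattern described above is easy to reproduce in isolation. The sketch below is a simplified illustration, not the actual ManagementListener source: a read lock acquired under a condition but released unconditionally, followed by the matched-condition fix.

import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Simplified illustration of the described bug; names are made up, not Geode internals.
public class ConditionalLockExample {

  private final ReadWriteLock readWriteLock = new ReentrantReadWriteLock();

  // Buggy shape: the lock is taken only for some events, but unlock always runs,
  // so the skipped case throws IllegalMonitorStateException on unlock.
  void handleEventBuggy(boolean isSystemAlert) {
    if (!isSystemAlert) {
      readWriteLock.readLock().lock();
    }
    try {
      // ... handle the event ...
    } finally {
      readWriteLock.readLock().unlock(); // throws when isSystemAlert == true
    }
  }

  // Fixed shape: the unlock is guarded by the same condition as the lock.
  void handleEventFixed(boolean isSystemAlert) {
    if (!isSystemAlert) {
      readWriteLock.readLock().lock();
    }
    try {
      // ... handle the event ...
    } finally {
      if (!isSystemAlert) {
        readWriteLock.readLock().unlock();
      }
    }
  }
}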
[Spring CI] Spring Data GemFire > Nightly-ApacheGeode > #1079 was SUCCESSFUL (with 2456 tests)
--- Spring Data GemFire > Nightly-ApacheGeode > #1079 was successful. --- Scheduled 2458 tests in total. https://build.spring.io/browse/SGF-NAG-1079/ -- This message is automatically generated by Atlassian Bamboo
I want to be a geode committer
My ASF account is: NAME: ivorzhou EMAIL: ivorz...@gmail.com Thanks, Ivor Zhou