How to print the number of lost buckets in the locator

2018-10-23 Thread Dinesh Akhand
Hi Team,

I need to know how to print the number of lost buckets when 2 JVMs go down and
all JVMs host the CUSTOMER partitioned region.
Suppose I have 5 JVMs in the cluster, where the CUSTOMER region is a partitioned
region with 1 redundant copy of the data [redundant-copies="1"] on ALL JVMs.

Thanks,
Dinesh Akhand

“Amdocs’ email platform is based on a third-party, worldwide, cloud-based 
system. Any emails sent to Amdocs will be processed and stored using such 
system and are accessible by third party providers of such system on a limited 
basis. Your sending of emails to Amdocs evidences your consent to the use of 
such system and such processing, storing and access”.
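With redundant-copies="1", each bucket of CUSTOMER has two copies, kept on two different members. When 2 JVMs leave, a bucket is lost only if both of its copies were on those 2 JVMs; a bucket with one surviving copy is merely below its configured redundancy. Geode's public `PartitionRegionHelper.getPartitionRegionInfo(region)` reports aggregate figures such as `getLowRedundancyBucketCount()`, but that count may include buckets that still have a copy, so it is worth checking whether it matches "lost" in your sense. The sketch below does not call Geode at all. It only illustrates the counting rule on a hypothetical placement map; the member names `m0`..`m4` and the "adjacent members" placement are assumptions for illustration, not how Geode actually assigns buckets.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class LostBucketCount {
    /**
     * Counts buckets that have no surviving copy once the departed members are gone.
     * bucketHosts maps a bucket id to every member holding a copy (primary plus redundant).
     */
    static int countLostBuckets(Map<Integer, Set<String>> bucketHosts, Set<String> departed) {
        int lost = 0;
        for (Set<String> hosts : bucketHosts.values()) {
            // A bucket is lost only when every member hosting it has departed.
            if (departed.containsAll(hosts)) {
                lost++;
            }
        }
        return lost;
    }

    /** Hypothetical placement: bucket b lives on members b % n and (b + 1) % n (redundant-copies=1). */
    static Map<Integer, Set<String>> adjacentPlacement(int buckets, int members) {
        Map<Integer, Set<String>> hosts = new HashMap<>();
        for (int b = 0; b < buckets; b++) {
            hosts.put(b, new HashSet<>(Arrays.asList("m" + (b % members), "m" + ((b + 1) % members))));
        }
        return hosts;
    }

    public static void main(String[] args) {
        Map<Integer, Set<String>> hosts = adjacentPlacement(10, 5);
        Set<String> departed = new HashSet<>(Arrays.asList("m1", "m2"));
        // Buckets 1 and 6 had both of their copies on m1 and m2.
        System.out.println("lost buckets = " + countLostBuckets(hosts, departed)); // prints "lost buckets = 2"
    }
}
```

With the same placement, losing `m1` and `m3` loses no bucket at all, which is why the answer for "2 JVMs down" depends on which two JVMs go down.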


Thread block on org.apache.geode.cache.CacheFactory.getAnyInstance(CacheFactory.java:282)

2018-10-23 Thread Dinesh Akhand
Hi team,

Recently we saw a JVM get stuck. In the stack trace I can see that the method
below is having a problem.

The documentation at this link:

https://geode.apache.org/releases/latest/javadoc/org/apache/geode/cache/CacheListener.html

contains this warning:

WARNING: To avoid risk of deadlock, do not invoke CacheFactory.getAnyInstance()
from within any callback methods. Instead use EntryEvent.getRegion().getCache()
or RegionEvent.getRegion().getCache()

What is the best solution to avoid it?

"Function Execution Processor1" #247 daemon prio=10 os_prio=0 tid=0x7f5798268000 nid=0x3ff5 waiting for monitor entry [0x7f576adf]
   java.lang.Thread.State: BLOCKED (on object monitor)
	at org.apache.geode.cache.CacheFactory.getAnyInstance(CacheFactory.java:282)
	- waiting to lock <0x000699feafa0> (a java.lang.Class for org.apache.geode.cache.CacheFactory)
	at org.apache.geode.management.internal.cli.functions.GetRegionsFunction.execute(GetRegionsFunction.java:44)
	at org.apache.geode.internal.cache.MemberFunctionStreamingMessage.process(MemberFunctionStreamingMessage.java:185)
	at org.apache.geode.distributed.internal.DistributionMessage.scheduleAction(DistributionMessage.java:374)
	at org.apache.geode.distributed.internal.DistributionMessage$1.run(DistributionMessage.java:440)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at org.apache.geode.distributed.internal.DistributionManager.runUntilShutdown(DistributionManager.java:662)
	at org.apache.geode.distributed.internal.DistributionManager$9$1.run(DistributionManager.java:1108)
	at java.lang.Thread.run(Thread.java:745)

"P2P message reader for 10.218.110.61(sbimgapp16-server1:65602):1026 shared ordered uid=139 port=62033" #403 daemon prio=10 os_prio=0 tid=0x7f1ad4114800 nid=0xda7c waiting for monitor entry [0x7f1a28fcc000]
   java.lang.Thread.State: BLOCKED (on object monitor)
	at org.apache.geode.cache.CacheFactory.getAnyInstance(CacheFactory.java:282)
	- waiting to lock <0x00021df685a8> (a java.lang.Class for org.apache.geode.cache.CacheFactory)
	at amdocs.imdg.statistics.GemFireStatisticsFactory.getStatisticsFactory(GemFireStatisticsFactory.java:43)
	at amdocs.imdg.statistics.VSDCountersManager.(VSDCountersManager.java:35)
	at amdocs.imdg.statistics.VSDCountersManager.(VSDCountersManager.java:19)
	at amdocs.imdg.statistics.CountersManagerFactory.getCountersManager(CountersManagerFactory.java:27)
	at amdocs.imdg.utils.pooling.DataPoolFactory.makeObject(DataPoolFactory.java:42)
	at org.apache.commons.pool.impl.GenericKeyedObjectPool.borrowObject(GenericKeyedObjectPool.java:797)
	- locked <0x00021ebbe630> (a org.apache.commons.pool.impl.GenericKeyedObjectPool)
	at amdocs.imdg.utils.pooling.DataPool$DataPoolManager.getByteArray(DataPool.java:236)
	at amdocs.imdg.utils.pooling.DataPool.getByteArray(DataPool.java:98)
	at amdocs.imdg.model.BusinessData.populateData(BusinessData.java:110)
	at amdocs.imdg.utils.FlatBuffersUtils.updateBusinessData(FlatBuffersUtils.java:2255)
	at amdocs.imdg.utils.FlatBuffersUtils.updateCustomerData(FlatBuffersUtils.java:3083)
	at amdocs.imdg.utils.FlatBuffersUtils.updateNewCustomer(FlatBuffersUtils.java:3103)
	at amdocs.imdg.utils.FlatBuffersUtils.updateFromCustomerData(FlatBuffersUtils.java:2798)
	at amdocs.imdg.model.Customer.fromData(Customer.java:696)
	- locked <0x00021f874d08> (a amdocs.imdg.model.Customer)
	at org.apache.geode.internal.InternalDataSerializer.invokeFromData(InternalDataSerializer.java:2372)
	at org.apache.geode.internal.InternalDataSerializer.readDataSerializable(InternalDataSerializer.java:2395)
	at org.apache.geode.internal.InternalDataSerializer.basicRead



Thanks,
Dinesh Akhand
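The Javadoc warning quoted above recommends using the cache supplied by the event (`EntryEvent.getRegion().getCache()`) inside callbacks. The second thread dump shows a related shape outside listener callbacks: during `Customer.fromData`, application code reaches `CacheFactory.getAnyInstance()` while already holding the pool and `Customer` monitors, and `getAnyInstance()` waits on the `CacheFactory` class monitor. One general way to reduce that exposure is to look the cache up once at startup and hand the reference to the code that needs it. The plain-Java sketch below illustrates only that idea; `FakeCache`, `GlobalRegistry`, `Counters` and `CountersFactory` are hypothetical stand-ins, not Geode or Amdocs classes, and whether this removes the actual hang depends on what the rest of the thread dump shows.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class ResolveCacheOnce {
    /** Hypothetical stand-in for a cache object; not a Geode type. */
    static final class FakeCache {
        final String name;
        FakeCache(String name) { this.name = name; }
    }

    /**
     * Hypothetical global lookup. Its static synchronized methods lock the
     * class object, so, like the CacheFactory monitor in the thread dump,
     * every caller contends for one JVM-wide lock.
     */
    static final class GlobalRegistry {
        static final AtomicInteger LOOKUPS = new AtomicInteger();
        private static FakeCache instance;

        static synchronized void register(FakeCache cache) { instance = cache; }

        static synchronized FakeCache getAnyInstance() {
            LOOKUPS.incrementAndGet();
            return instance;
        }
    }

    /** Receives the cache from its creator instead of calling the global lookup itself. */
    static final class Counters {
        private final FakeCache cache;
        Counters(FakeCache cache) { this.cache = cache; }
        String cacheName() { return cache.name; }
    }

    /** Holds the cache resolved once at startup and reuses that reference afterwards. */
    static final class CountersFactory {
        private final FakeCache cache;
        CountersFactory(FakeCache cache) { this.cache = cache; }
        Counters newCounters() { return new Counters(cache); }
    }

    public static void main(String[] args) {
        GlobalRegistry.register(new FakeCache("server1"));
        // Single lookup during initialization, outside callbacks and fromData().
        CountersFactory factory = new CountersFactory(GlobalRegistry.getAnyInstance());
        for (int i = 0; i < 1000; i++) {
            factory.newCounters(); // e.g. from a pool's makeObject(): no class-monitor lock taken
        }
        System.out.println("global lookups = " + GlobalRegistry.LOOKUPS.get()); // prints "global lookups = 1"
    }
}
```

The design choice is simply to move the global lookup out of hot, lock-holding paths such as deserialization and object-pool factories, so those paths never need the class-level monitor.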



Re: Thread block on org.apache.geode.cache.CacheFactory.getAnyInstance(CacheFactory.java:282)

2018-10-23 Thread Anthony Baker
Dinesh, have you analyzed the full thread dump to see if there is a deadlock?
I can't tell from just these 2 threads whether there is a deadlock.

Anthony
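Anthony's check can also be done programmatically: the JDK's `ThreadMXBean.findDeadlockedThreads()` reports cycles of threads waiting on object monitors or ownable synchronizers, which is the kind of cycle a full thread dump would reveal. The sketch below creates a deliberate two-lock deadlock (the class name and the `demo-` thread names are made up for illustration) and then detects it. Note that it only finds true lock cycles; a thread that is merely slow to obtain the `CacheFactory` monitor will not be reported.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

public class DeadlockCheck {
    /** Returns info for threads deadlocked on monitors or ownable synchronizers, or an empty array. */
    static ThreadInfo[] findDeadlocks() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long[] ids = mx.findDeadlockedThreads(); // null when no deadlock exists
        if (ids == null) {
            return new ThreadInfo[0];
        }
        // true, true: include locked monitors and locked synchronizers in each ThreadInfo
        return mx.getThreadInfo(ids, true, true);
    }

    /** Starts two daemon threads that take the same two monitors in opposite order. */
    static void startDemoDeadlock() throws InterruptedException {
        Object lockA = new Object();
        Object lockB = new Object();
        CountDownLatch bothHoldFirstLock = new CountDownLatch(2);
        Thread t1 = new Thread(() -> lockInOrder(lockA, lockB, bothHoldFirstLock), "demo-1");
        Thread t2 = new Thread(() -> lockInOrder(lockB, lockA, bothHoldFirstLock), "demo-2");
        t1.setDaemon(true); // daemon threads, so the stuck pair does not keep the JVM alive
        t2.setDaemon(true);
        t1.start();
        t2.start();
        bothHoldFirstLock.await();
    }

    private static void lockInOrder(Object first, Object second, CountDownLatch latch) {
        synchronized (first) {
            latch.countDown();
            try {
                latch.await(); // wait until the other thread holds its first monitor
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            synchronized (second) {
                // never reached: each thread waits for the monitor the other one holds
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        startDemoDeadlock();
        ThreadInfo[] deadlocked = new ThreadInfo[0];
        for (int i = 0; i < 50 && deadlocked.length == 0; i++) {
            Thread.sleep(100); // give both threads time to block on their second monitor
            deadlocked = findDeadlocks();
        }
        for (ThreadInfo info : deadlocked) {
            System.out.println(info.getThreadName() + " waiting on " + info.getLockName()
                + " held by " + info.getLockOwnerName());
        }
    }
}
```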


> On Oct 23, 2018, at 6:32 AM, Dinesh Akhand  wrote:



RE: About JIRA GEODE-5896

2018-10-23 Thread Yang, Dong [GTSUS Non-J&J]
Hi, Udo



I have already forked Geode and committed my code to
https://github.com/twosand/geode.git on the branch feature/GEODE-5896.

I think I need to finish the test code before creating a pull request, but I
hope I can get some suggestions first, or that someone can review the code
changes.

I investigated the code invocation chain; the attached chart shows the whole
idea. The problem shows up on the on-server node: a
FunctionStreamingReplyMessage arrives from the on-region node, but the
processor that should receive it no longer exists. At that point a
PartitionedRegionFunctionStreamingAbortMessage can be sent back, because we
have the sender member and the processorId, and that is enough.

The abort message is then received on the on-region node. On this node the
user-defined function is still running in another thread and keeps invoking
PartitionedRegionFunctionResultSender.sendResult to stream its results. We
need a shared variable that can notify that sender that the remote processor
has already been dropped. So the PartitionedRegionFunctionStreamingContext
class tracks the processorId: normally the processorId is put into a map
before the first send and removed after the last send. Once the abort message
arrives, the processorId is removed, and the next sendResult call can throw an
exception to end the now-useless function.

I am trying to follow the GitHub PR workflow and am now writing the test code.
But as I mentioned above, I need some suggestions from the development team:
is my idea suitable, or is there something I missed?



Thanks

Dong
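The abort mechanism Dong describes can be sketched in plain Java as a concurrent set of processor ids shared between the thread that handles the abort message and the thread running the user function. `StreamingContextSketch`, its methods, and `StreamingAbortedException` are hypothetical names chosen for illustration; this is neither the code on the feature/GEODE-5896 branch nor a Geode API, and the `wire` list stands in for the real network send.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class StreamingContextSketch {
    /** Thrown by sendResult once the receiving processor is gone, ending the function early. */
    static final class StreamingAbortedException extends RuntimeException {
        StreamingAbortedException(int processorId) {
            super("processor " + processorId + " no longer exists; stop streaming");
        }
    }

    // Processor ids whose result streams are still wanted; shared across threads.
    private final Set<Integer> activeProcessors = ConcurrentHashMap.newKeySet();

    /** Called before the first result is sent for this processor. */
    void begin(int processorId) {
        activeProcessors.add(processorId);
    }

    /** Called after the last result is sent. */
    void end(int processorId) {
        activeProcessors.remove(processorId);
    }

    /** Called by the thread that handles an incoming abort message. */
    void abort(int processorId) {
        activeProcessors.remove(processorId);
    }

    /** Called by the function thread for every streamed chunk. */
    void sendResult(int processorId, Object chunk, List<Object> wire) {
        if (!activeProcessors.contains(processorId)) {
            throw new StreamingAbortedException(processorId);
        }
        wire.add(chunk); // stand-in for the real network send
    }

    public static void main(String[] args) {
        StreamingContextSketch ctx = new StreamingContextSketch();
        List<Object> wire = new ArrayList<>();
        ctx.begin(42);
        ctx.sendResult(42, "chunk-1", wire);
        ctx.abort(42); // the remote processor was dropped
        try {
            ctx.sendResult(42, "chunk-2", wire);
        } catch (StreamingAbortedException e) {
            System.out.println("stopped after " + wire.size() + " chunk(s): " + e.getMessage());
        }
    }
}
```

One detail a real change may need to handle: if an abort can arrive before `begin`, the plain remove is a no-op and `begin` then re-registers the processor; tracking aborted ids separately would close that gap.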



-----Original Message-----
From: Udo Kohlmeyer
Sent: Monday, October 22, 2018 4:53 PM
To: Yang, Dong [GTSUS Non-J&J]
Cc: dev@geode.apache.org
Subject: [EXTERNAL] Re: About JIRA GEODE-5896

Hi there Dong Yang,

If you have completed a fix, please submit it via the PR mechanism within
GitHub. We will most gladly review and incorporate it.

--Udo

On 10/18/18 06:00, Yang, Dong [GTSUS Non-J&J] wrote:

> Hi,
>
> I am Dong Yang, and my Apache account is twosand. The way we use GemFire
> is not a common usage scenario in other companies; it's more like a mixed
> OLTP and OLAP scenario. The concept is very similar to using the
> Spark-GemFire connector: we have some server-side functions that shuffle
> data from server to client in a streaming style. We have encountered the
> thread lock issue in different environments. We previously used GemFire 8,
> and now we are upgrading to GemFire 9.
>
> GEODE-5896 is a very important use case for us, and I think the same is
> true for others who want to use Spark to connect to GemFire. For now we just
> apply a client-side patch that forces the metadata to be ready before the
> function executes, but the proper solution should fix some server-side code.
>
> I can share what I found and where I want to fix it, and you can review
> whether it is reasonable or not. It can be fixed by the current Geode team,
> or I can do it as a contributor.
>
> Dong Yang, Dong [GTSUS Non-J&J]
>
> Thanks
>




Re: About JIRA GEODE-5896

2018-10-23 Thread Patrick Rhomberg
> Think I need finish the test code before create pull request.

We have integrations into GitHub that launch precheckin testing in our
continuous integration Concourse pipelines. PR status hooks are updated when
tests pass or fail.

Of course, from a philosophical point of view, every bug is the result of
insufficient testing coverage, but as long as your PR includes / updates
tests that would identify this bug, then opening the PR will cover the rest.

> But like I mentioned above, I need some suggestion from develop team, is
> my idea suitable or something I missed.


In my mind, this is what the PR is meant to do -- facilitate discussion
around immediate proposed changes.  When the PR is opened, the community
can review the change set, and if anything jumps out at us, we have the
opportunity to shore up any deficiencies then.

If you were looking for a collaborator to help you with a problem that you
didn't know how to start, we could figure something out.  But if you
believe you have a fix, we'll all look forward to the pull request!

On Tue, Oct 23, 2018 at 2:41 AM, Yang, Dong [GTSUS Non-J&J] <dyan...@its.jnj.com> wrote:



ManagementListener$handleEvent warning being logged

2018-10-23 Thread Kirk Lund
I noticed the following warning log statement while digging through the
output of a dunit test.

It's caused by a bug in
org.apache.geode.management.internal.beans.ManagementListener$handleEvent
-- the readLock is acquired only when a condition holds (event !=
ResourceEvent.SYSTEM_ALERT) but is released unconditionally. It's easy to
fix, so I'll file a bug and submit a PR.

I'm not sure why this wasn't noticed before, unless recent LoggingThread
changes allowed it to be revealed. I'm also not sure why the dunit grep for
suspect strings isn't finding this and causing dunit failures.

Does anyone know why this doesn't get picked up by
DUnitLauncher.closeAndCheckForSuspects()?
I'm not sure that we have any tests to prevent us from breaking the check
for suspect strings.

[vm0] [warn 2018/10/23 12:51:35.339 PDT tid=0x58] attempt to unlock read lock, not locked by current thread
[vm0] java.lang.IllegalMonitorStateException: attempt to unlock read lock, not locked by current thread
[vm0] 	at java.util.concurrent.locks.ReentrantReadWriteLock$Sync.unmatchedUnlockException(ReentrantReadWriteLock.java:444)
[vm0] 	at java.util.concurrent.locks.ReentrantReadWriteLock$Sync.tryReleaseShared(ReentrantReadWriteLock.java:428)
[vm0] 	at java.util.concurrent.locks.AbstractQueuedSynchronizer.releaseShared(AbstractQueuedSynchronizer.java:1341)
[vm0] 	at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.unlock(ReentrantReadWriteLock.java:881)
[vm0] 	at org.apache.geode.management.internal.beans.ManagementListener.handleEvent(ManagementListener.java:232)
[vm0] 	at org.apache.geode.distributed.internal.InternalDistributedSystem.notifyResourceEventListeners(InternalDistributedSystem.java:2219)
[vm0] 	at org.apache.geode.distributed.internal.InternalDistributedSystem.handleResourceEvent(InternalDistributedSystem.java:595)
[vm0] 	at org.apache.geode.internal.admin.remote.AlertListenerMessage.process(AlertListenerMessage.java:106)
[vm0] 	at org.apache.geode.distributed.internal.DistributionMessage.scheduleAction(DistributionMessage.java:367)
[vm0] 	at org.apache.geode.distributed.internal.DistributionMessage$1.run(DistributionMessage.java:432)
[vm0] 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
[vm0] 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
[vm0] 	at org.apache.geode.distributed.internal.ClusterDistributionManager.runUntilShutdown(ClusterDistributionManager.java:954)
[vm0] 	at org.apache.geode.distributed.internal.ClusterDistributionManager.doProcessingThread(ClusterDistributionManager.java:820)
[vm0] 	at org.apache.geode.internal.logging.LoggingThreadFactory.lambda$newThread$0(LoggingThreadFactory.java:121)
[vm0] 	at java.lang.Thread.run(Thread.java:748)
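The bug Kirk describes is a lock acquired under a condition but released unconditionally. A minimal plain-Java reproduction follows; `EventKind` and the two handler methods are hypothetical stand-ins for `ResourceEvent` and `ManagementListener.handleEvent`, not the actual Geode code. Unlocking a `ReentrantReadWriteLock` read lock that the current thread does not hold throws the same `IllegalMonitorStateException` seen in the log, and the fix is to guard the release with the same condition as the acquire.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class ConditionalUnlockSketch {
    enum EventKind { SYSTEM_ALERT, CACHE_CREATE } // hypothetical subset of event kinds

    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    int handled;

    /** Buggy shape: the read lock is taken conditionally but released unconditionally. */
    void handleEventBuggy(EventKind event) {
        if (event != EventKind.SYSTEM_ALERT) {
            lock.readLock().lock();
        }
        try {
            handled++;
        } finally {
            lock.readLock().unlock(); // throws IllegalMonitorStateException for SYSTEM_ALERT
        }
    }

    /** Fixed shape: release only what was acquired, using the same condition. */
    void handleEventFixed(EventKind event) {
        boolean needsLock = event != EventKind.SYSTEM_ALERT;
        if (needsLock) {
            lock.readLock().lock();
        }
        try {
            handled++;
        } finally {
            if (needsLock) {
                lock.readLock().unlock();
            }
        }
    }

    /** Number of read locks currently held on the lock, for checking that nothing leaked. */
    int heldReadLocks() {
        return lock.getReadLockCount();
    }

    public static void main(String[] args) {
        ConditionalUnlockSketch s = new ConditionalUnlockSketch();
        try {
            s.handleEventBuggy(EventKind.SYSTEM_ALERT);
        } catch (IllegalMonitorStateException e) {
            System.out.println("buggy: " + e.getMessage());
        }
        s.handleEventFixed(EventKind.SYSTEM_ALERT);
        s.handleEventFixed(EventKind.CACHE_CREATE);
        System.out.println("read locks still held = " + s.heldReadLocks()); // prints "read locks still held = 0"
    }
}
```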


[Spring CI] Spring Data GemFire > Nightly-ApacheGeode > #1079 was SUCCESSFUL (with 2456 tests)

2018-10-23 Thread Spring CI

---
Spring Data GemFire > Nightly-ApacheGeode > #1079 was successful.
---
Scheduled
2458 tests in total.

https://build.spring.io/browse/SGF-NAG-1079/





--
This message is automatically generated by Atlassian Bamboo

I want to be a geode committer

2018-10-23 Thread ivor
My ASF account is:
NAME:    ivorzhou
EMAIL:   ivorz...@gmail.com

Thanks

Ivor Zhou