[ 
https://issues.apache.org/jira/browse/GEODE-8436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17181476#comment-17181476
 ] 

Blake Bender edited comment on GEODE-8436 at 8/21/20, 3:17 PM:
---------------------------------------------------------------

[~alberto.bustamante.reyes] this is causing a failure in the test 
{{testThinClientPoolExecuteHAFunction}} on Red Hat (both RHEL7 and RHEL8 fail).  
Per our policy, I've reverted the change while we investigate.  If you have 
access to a RHEL machine, you're welcome to try to track things down.  I will 
investigate here as time permits.  This is what I see consistently in our 
output logs:
{quote}[error 2020/08/20 17:09:00.313552 UTC heavy-lifter-ae0a174c-1be5-522e-8b3f-b521b672e4d1:104377 139991808305216] Execute: An exception (org.apache.geode.cache.execute.FunctionException: org.apache.geode.internal.cache.execute.InternalFunctionInvocationTargetException: memberDeparted event for < heavy-lifter-ae0a174c-1be5-522e-8b3f-b521b672e4d1(GFECS24955:104622)<ec><v1>:41001 > crashed, false
 at org.apache.geode.internal.cache.partitioned.PRFunctionStreamingResultCollector.getResultInternal(PRFunctionStreamingResultCollector.java:115)
 at org.apache.geode.internal.cache.execute.ResultCollectorHolder.getResult(ResultCollectorHolder.java:53)
 at org.apache.geode.internal.cache.partitioned.PRFunctionStreamingResultCollector.getResult(PRFunctionStreamingResultCollector.java:88)
 at org.apache.geode.internal.cache.tier.sockets.command.ExecuteRegionFunction66.executeFunctionWithResult(ExecuteRegionFunction66.java:406)
 at org.apache.geode.internal.cache.tier.sockets.command.ExecuteRegionFunction66.cmdExecute(ExecuteRegionFunction66.java:201)
 at org.apache.geode.internal.cache.tier.sockets.BaseCommand.execute(BaseCommand.java:183)
 at org.apache.geode.internal.cache.tier.sockets.ServerConnection.doNormalMessage(ServerConnection.java:848)
 at org.apache.geode.internal.cache.tier.sockets.OriginalServerConnection.doOneMessage(OriginalServerConnection.java:72)
 at org.apache.geode.internal.cache.tier.sockets.ServerConnection.run(ServerConnection.java:1212)
 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
 at org.apache.geode.internal.cache.tier.sockets.AcceptorImpl.lambda$initializeServerConnectionThreadPool$3(AcceptorImpl.java:676)
 at org.apache.geode.logging.internal.executors.LoggingThreadFactory.lambda$newThread$0(LoggingThreadFactory.java:119)
 at java.lang.Thread.run(Thread.java:748)
 Caused by: org.apache.geode.internal.cache.execute.InternalFunctionInvocationTargetException: memberDeparted event for < heavy-lifter-ae0a174c-1be5-522e-8b3f-b521b672e4d1(GFECS24955:104622)<ec><v1>:41001 > crashed, false
 at org.apache.geode.internal.cache.partitioned.PRFunctionStreamingResultCollector.memberDeparted(PRFunctionStreamingResultCollector.java:375)
 at org.apache.geode.distributed.internal.ClusterDistributionManager$MemberDepartedEvent.handleEvent(ClusterDistributionManager.java:2502)
 at org.apache.geode.distributed.internal.ClusterDistributionManager$MemberEvent.handleEvent(ClusterDistributionManager.java:2432)
 at org.apache.geode.distributed.internal.ClusterDistributionManager$MemberEvent.handleEvent(ClusterDistributionManager.java:2421)
 at org.apache.geode.distributed.internal.ClusterDistributionManager.handleMemberEvent(ClusterDistributionManager.java:1401)
 at org.apache.geode.distributed.internal.ClusterDistributionManager.access$200(ClusterDistributionManager.java:108)
 at org.apache.geode.distributed.internal.ClusterDistributionManager$MemberEventInvoker.run(ClusterDistributionManager.java:1433)
 ... 1 more
 ) happened at remote server.
 [info 2020/08/20 17:09:00.314091 UTC heavy-lifter-ae0a174c-1be5-522e-8b3f-b521b672e4d1:104377 139991808305216] Close connection message failed with msg: TcrConnection::send: connection failure
 [info 2020/08/20 17:09:00.314303 UTC heavy-lifter-ae0a174c-1be5-522e-8b3f-b521b672e4d1:104377 139991808305216] Removing bucketServerLocation [heavy-lifter-ae0a174c-1be5-522e-8b3f-b521b672e4d1.c.gemfire-dev.internal:24955]--1-0-0 due to GF_IOERR
 0% tests passed, 1 tests failed out of 1{quote}


> Several threads calling PdxInstanceFactory::create() causes seg fault
> ---------------------------------------------------------------------
>
>                 Key: GEODE-8436
>                 URL: https://issues.apache.org/jira/browse/GEODE-8436
>             Project: Geode
>          Issue Type: Bug
>          Components: native client
>            Reporter: Alberto Bustamante Reyes
>            Assignee: Alberto Bustamante Reyes
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.14.0
>
>         Attachments: main.cpp
>
>
> I have seen a problem when "PdxInstanceFactory::create()" is called by 
> several threads that are registering the same new PDX type.
> The core dump is produced here:
> {code}
> void PdxInstanceImpl::toDataMutable(PdxWriter& writer) {
>    auto pt = getPdxType();
>    std::vector<std::shared_ptr<PdxFieldType>>* pdxFieldList =
>        pt->getPdxFieldTypes();
> {code}
> The problem is that "getPdxType()" returns nullptr, so the next line 
> causes a segmentation fault when calling "pt->getPdxFieldTypes()".
> The issue can be reproduced by running the attached client with 
> 8 threads. This is the stack trace obtained in gdb:
> {code}
> #0  apache::geode::client::PdxType::getPdxFieldTypes (this=0x0) at 
> /home/alb3rtobr/CLionProjects/Nordix/geode-native/cppcache/src/PdxType.hpp:178
> #1  0x00007f43dc4651b7 in 
> apache::geode::client::PdxInstanceImpl::toDataMutable (this=0x7f43c0001600, 
> writer=...) at 
> /home/alb3rtobr/CLionProjects/Nordix/geode-native/cppcache/src/PdxInstanceImpl.cpp:1336
> #2  0x00007f43dc4650fd in apache::geode::client::PdxInstanceImpl::toData 
> (this=0x7f43c0001600, writer=...) at 
> /home/alb3rtobr/CLionProjects/Nordix/geode-native/cppcache/src/PdxInstanceImpl.cpp:1327
> #3  0x00007f43dc444971 in apache::geode::client::PdxHelper::serializePdx 
> (output=..., pdxObject=warning: RTTI symbol not found for class 
> 'std::_Sp_counted_ptr_inplace<apache::geode::client::PdxInstanceImpl, 
> std::allocator<apache::geode::client::PdxInstanceImpl>, 
> (__gnu_cxx::_Lock_policy)2>'
> warning: RTTI symbol not found for class 
> 'std::_Sp_counted_ptr_inplace<apache::geode::client::PdxInstanceImpl, 
> std::allocator<apache::geode::client::PdxInstanceImpl>, 
> (__gnu_cxx::_Lock_policy)2>'
> std::shared_ptr<apache::geode::client::PdxSerializable> (use count 3, weak 
> count 0) = {...})
>     at 
> /home/alb3rtobr/CLionProjects/Nordix/geode-native/cppcache/src/PdxHelper.cpp:77
> #4  0x00007f43dc44b4bc in apache::geode::client::PdxInstanceFactory::create 
> (this=0x7f43c7ffecc8) at 
> /home/alb3rtobr/CLionProjects/Nordix/geode-native/cppcache/src/PdxInstanceFactory.cpp:53
> #5  0x000000000040de2f in doPut () at 
> /home/alb3rtobr/CLionProjects/dummy-client/main.cpp:60
> #6  0x0000000000427767 in std::__invoke_impl<void, void (*)()> 
> (__f=@0x2561aa8: 0x40d860 <doPut()>) at 
> /usr/bin/../lib/gcc/x86_64-linux-gnu/7.5.0/../../../../include/c++/7.5.0/bits/invoke.h:60
> #7  0x00000000004276fd in std::__invoke<void (*)()> (__fn=@0x2561aa8: 
> 0x40d860 <doPut()>) at 
> /usr/bin/../lib/gcc/x86_64-linux-gnu/7.5.0/../../../../include/c++/7.5.0/bits/invoke.h:95
> #8  0x00000000004276d5 in std::thread::_Invoker<std::tuple<void (*)()> 
> >::_M_invoke<0ul> (this=0x2561aa8) at 
> /usr/bin/../lib/gcc/x86_64-linux-gnu/7.5.0/../../../../include/c++/7.5.0/thread:234
> #9  0x00000000004276a5 in std::thread::_Invoker<std::tuple<void (*)()> 
> >::operator() (this=0x2561aa8) at 
> /usr/bin/../lib/gcc/x86_64-linux-gnu/7.5.0/../../../../include/c++/7.5.0/thread:243
> #10 0x0000000000427589 in 
> std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (*)()> > 
> >::_M_run (this=0x2561aa0)
> {code}
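
For readers unfamiliar with this class of bug, here is a minimal sketch of one way to keep concurrent registrations of the same type from ever handing a null descriptor to a caller. It is not the geode-native implementation and makes no claim about the actual root cause of GEODE-8436: TypeInfo, TypeRegistry, getOrRegister and countNullLookups are hypothetical names invented for illustration. The assumption is simply that lookup and registration share one lock, and that callers check the returned pointer before dereferencing it.

```cpp
#include <atomic>
#include <cstddef>
#include <memory>
#include <mutex>
#include <string>
#include <thread>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a PDX type descriptor; not the geode-native class.
struct TypeInfo {
  std::string name;
  std::vector<std::string> fields;
};

// Hypothetical registry: lookup and registration happen under a single lock,
// so no caller can observe a type that is half-registered or missing.
class TypeRegistry {
 public:
  std::shared_ptr<TypeInfo> getOrRegister(const std::string& name) {
    std::lock_guard<std::mutex> guard(mutex_);
    auto it = types_.find(name);
    if (it != types_.end()) {
      return it->second;  // already registered by another thread
    }
    auto created = std::make_shared<TypeInfo>(TypeInfo{name, {}});
    types_.emplace(name, created);
    return created;
  }

  std::size_t size() const {
    std::lock_guard<std::mutex> guard(mutex_);
    return types_.size();
  }

 private:
  mutable std::mutex mutex_;
  std::unordered_map<std::string, std::shared_ptr<TypeInfo>> types_;
};

// Spawns threadCount threads that all register the same type name and
// returns how many of them received a null pointer.
int countNullLookups(TypeRegistry& registry, int threadCount) {
  std::atomic<int> nulls{0};
  std::vector<std::thread> workers;
  for (int i = 0; i < threadCount; ++i) {
    workers.emplace_back([&registry, &nulls] {
      auto type = registry.getOrRegister("Person");
      // Null check before use; the snippet quoted in the issue dereferences
      // the result of getPdxType() without one.
      if (!type) {
        ++nulls;
      }
    });
  }
  for (auto& worker : workers) {
    worker.join();
  }
  return nulls.load();
}
```

Calling countNullLookups with 8 threads mirrors the thread count used in the reproduction above. Holding one lock across both the lookup and the insert is the simplest choice for a sketch; a real client would likely want something finer-grained.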


