[ https://issues.apache.org/jira/browse/GEODE-8436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17181476#comment-17181476 ]
Blake Bender edited comment on GEODE-8436 at 8/21/20, 3:17 PM:
---------------------------------------------------------------

[~alberto.bustamante.reyes] this is causing a failure in the test
{code:java}
testThinClientPoolExecuteHAFunction
{code}
on RedHat (RHEL7 & RHEL8 both fail). Per our policy I've reverted the change while we investigate. If you have access to a RHEL machine, you're welcome to try and track things down. I will investigate here as time permits. What I see consistently in our output logs is this:
{quote}
[error 2020/08/20 17:09:00.313552 UTC heavy-lifter-ae0a174c-1be5-522e-8b3f-b521b672e4d1:104377 139991808305216] Execute: An exception (org.apache.geode.cache.execute.FunctionException: org.apache.geode.internal.cache.execute.InternalFunctionInvocationTargetException: memberDeparted event for < heavy-lifter-ae0a174c-1be5-522e-8b3f-b521b672e4d1(GFECS24955:104622)<ec><v1>:41001 > crashed, false
 at org.apache.geode.internal.cache.partitioned.PRFunctionStreamingResultCollector.getResultInternal(PRFunctionStreamingResultCollector.java:115)
 at org.apache.geode.internal.cache.execute.ResultCollectorHolder.getResult(ResultCollectorHolder.java:53)
 at org.apache.geode.internal.cache.partitioned.PRFunctionStreamingResultCollector.getResult(PRFunctionStreamingResultCollector.java:88)
 at org.apache.geode.internal.cache.tier.sockets.command.ExecuteRegionFunction66.executeFunctionWithResult(ExecuteRegionFunction66.java:406)
 at org.apache.geode.internal.cache.tier.sockets.command.ExecuteRegionFunction66.cmdExecute(ExecuteRegionFunction66.java:201)
 at org.apache.geode.internal.cache.tier.sockets.BaseCommand.execute(BaseCommand.java:183)
 at org.apache.geode.internal.cache.tier.sockets.ServerConnection.doNormalMessage(ServerConnection.java:848)
 at org.apache.geode.internal.cache.tier.sockets.OriginalServerConnection.doOneMessage(OriginalServerConnection.java:72)
 at org.apache.geode.internal.cache.tier.sockets.ServerConnection.run(ServerConnection.java:1212)
 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
 at org.apache.geode.internal.cache.tier.sockets.AcceptorImpl.lambda$initializeServerConnectionThreadPool$3(AcceptorImpl.java:676)
 at org.apache.geode.logging.internal.executors.LoggingThreadFactory.lambda$newThread$0(LoggingThreadFactory.java:119)
 at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.geode.internal.cache.execute.InternalFunctionInvocationTargetException: memberDeparted event for < heavy-lifter-ae0a174c-1be5-522e-8b3f-b521b672e4d1(GFECS24955:104622)<ec><v1>:41001 > crashed, false
 at org.apache.geode.internal.cache.partitioned.PRFunctionStreamingResultCollector.memberDeparted(PRFunctionStreamingResultCollector.java:375)
 at org.apache.geode.distributed.internal.ClusterDistributionManager$MemberDepartedEvent.handleEvent(ClusterDistributionManager.java:2502)
 at org.apache.geode.distributed.internal.ClusterDistributionManager$MemberEvent.handleEvent(ClusterDistributionManager.java:2432)
 at org.apache.geode.distributed.internal.ClusterDistributionManager$MemberEvent.handleEvent(ClusterDistributionManager.java:2421)
 at org.apache.geode.distributed.internal.ClusterDistributionManager.handleMemberEvent(ClusterDistributionManager.java:1401)
 at org.apache.geode.distributed.internal.ClusterDistributionManager.access$200(ClusterDistributionManager.java:108)
 at org.apache.geode.distributed.internal.ClusterDistributionManager$MemberEventInvoker.run(ClusterDistributionManager.java:1433)
 ... 1 more
) happened at remote server.
[info 2020/08/20 17:09:00.314091 UTC heavy-lifter-ae0a174c-1be5-522e-8b3f-b521b672e4d1:104377 139991808305216] Close connection message failed with msg: TcrConnection::send: connection failure
[info 2020/08/20 17:09:00.314303 UTC heavy-lifter-ae0a174c-1be5-522e-8b3f-b521b672e4d1:104377 139991808305216] Removing bucketServerLocation [heavy-lifter-ae0a174c-1be5-522e-8b3f-b521b672e4d1.c.gemfire-dev.internal:24955]--1-0-0 due to GF_IOERR
0% tests passed, 1 tests failed out of 1
{quote}

> Several threads calling PdxInstanceFactory::create() causes seg fault
> ---------------------------------------------------------------------
>
> Key: GEODE-8436
> URL: https://issues.apache.org/jira/browse/GEODE-8436
> Project: Geode
> Issue Type: Bug
> Components: native client
> Reporter: Alberto Bustamante Reyes
> Assignee: Alberto Bustamante Reyes
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.14.0
>
> Attachments: main.cpp
>
>
> I have seen a problem when "PdxInstanceFactory::create()" is called by
> several threads that are registering the same new PDX type.
> The core dump is produced here:
> {code}
> void PdxInstanceImpl::toDataMutable(PdxWriter& writer) {
>   auto pt = getPdxType();
>   std::vector<std::shared_ptr<PdxFieldType>>* pdxFieldList =
>       pt->getPdxFieldTypes();
> {code}
> The problem is that "getPdxType()" returns nullptr, so the next line
> causes a segmentation fault when calling "pt->getPdxFieldTypes()".
> The issue can be reproduced by running the attached client with
> 8 threads. This is the stack trace obtained in gdb:
> {code}
> #0 apache::geode::client::PdxType::getPdxFieldTypes (this=0x0)
>     at /home/alb3rtobr/CLionProjects/Nordix/geode-native/cppcache/src/PdxType.hpp:178
> #1 0x00007f43dc4651b7 in apache::geode::client::PdxInstanceImpl::toDataMutable (this=0x7f43c0001600, writer=...)
>     at /home/alb3rtobr/CLionProjects/Nordix/geode-native/cppcache/src/PdxInstanceImpl.cpp:1336
> #2 0x00007f43dc4650fd in apache::geode::client::PdxInstanceImpl::toData (this=0x7f43c0001600, writer=...)
>     at /home/alb3rtobr/CLionProjects/Nordix/geode-native/cppcache/src/PdxInstanceImpl.cpp:1327
> #3 0x00007f43dc444971 in apache::geode::client::PdxHelper::serializePdx (output=..., pdxObject=
> warning: RTTI symbol not found for class 'std::_Sp_counted_ptr_inplace<apache::geode::client::PdxInstanceImpl, std::allocator<apache::geode::client::PdxInstanceImpl>, (__gnu_cxx::_Lock_policy)2>'
> warning: RTTI symbol not found for class 'std::_Sp_counted_ptr_inplace<apache::geode::client::PdxInstanceImpl, std::allocator<apache::geode::client::PdxInstanceImpl>, (__gnu_cxx::_Lock_policy)2>'
> std::shared_ptr<apache::geode::client::PdxSerializable> (use count 3, weak count 0) = {...})
>     at /home/alb3rtobr/CLionProjects/Nordix/geode-native/cppcache/src/PdxHelper.cpp:77
> #4 0x00007f43dc44b4bc in apache::geode::client::PdxInstanceFactory::create (this=0x7f43c7ffecc8)
>     at /home/alb3rtobr/CLionProjects/Nordix/geode-native/cppcache/src/PdxInstanceFactory.cpp:53
> #5 0x000000000040de2f in doPut ()
>     at /home/alb3rtobr/CLionProjects/dummy-client/main.cpp:60
> #6 0x0000000000427767 in std::__invoke_impl<void, void (*)()> (__f=@0x2561aa8: 0x40d860 <doPut()>)
>     at /usr/bin/../lib/gcc/x86_64-linux-gnu/7.5.0/../../../../include/c++/7.5.0/bits/invoke.h:60
> #7 0x00000000004276fd in std::__invoke<void (*)()> (__fn=@0x2561aa8: 0x40d860 <doPut()>)
>     at /usr/bin/../lib/gcc/x86_64-linux-gnu/7.5.0/../../../../include/c++/7.5.0/bits/invoke.h:95
> #8 0x00000000004276d5 in std::thread::_Invoker<std::tuple<void (*)()> >::_M_invoke<0ul> (this=0x2561aa8)
>     at /usr/bin/../lib/gcc/x86_64-linux-gnu/7.5.0/../../../../include/c++/7.5.0/thread:234
> #9 0x00000000004276a5 in std::thread::_Invoker<std::tuple<void (*)()> >::operator() (this=0x2561aa8)
>     at /usr/bin/../lib/gcc/x86_64-linux-gnu/7.5.0/../../../../include/c++/7.5.0/thread:243
> #10 0x0000000000427589 in std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (*)()> > >::_M_run (this=0x2561aa0)
> {code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)