[jira] [Commented] (GEODE-8656) Ping not sent to the right gateway receiver endpoint when several receivers have the same hostname-for-senders value

2020-11-29 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-8656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17240337#comment-17240337
 ] 

ASF GitHub Bot commented on GEODE-8656:
---

gesterzhou commented on pull request #5775:
URL: https://github.com/apache/geode/pull/5775#issuecomment-735441681


   I reviewed the code. This is phase 2 of a good feature; phase 1 was 
GEODE-7565. Even at that time, there was already a potential 
backward-compatibility issue. Based on my limited knowledge, I feel that 
introducing ServerLocationAndMemberId to replace ServerLocation might not be 
the correct approach. A better way would be to add a new field, 
"String memberId", to the ServerLocation class, and then use 
toDataPre_GEODE_1_14_0_0 and fromDataPre_GEODE_1_14_0_0 in ServerLocation to 
handle the new field during rolling upgrades. 
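
To make the suggestion above concrete, here is a minimal, self-contained sketch of version-gated serialization. It is not Geode's actual code: `LocationSketch`, `VERSION_WITH_MEMBER_ID`, and the integer peer-version argument are hypothetical stand-ins. Only the idea comes from the comment: write the new memberId field to peers that understand it, and fall back to the old wire format otherwise.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-in for a location class; this is NOT Geode's ServerLocation.
class LocationSketch {
  // Assumed ordinal of the first version whose wire format carries memberId.
  static final int VERSION_WITH_MEMBER_ID = 2;

  final String host;
  final int port;
  final String memberId; // the new field; empty when read from an old peer

  LocationSketch(String host, int port, String memberId) {
    this.host = host;
    this.port = port;
    this.memberId = memberId;
  }

  // Old wire format: host and port only.
  void toDataPre(DataOutputStream out) throws IOException {
    out.writeUTF(host);
    out.writeInt(port);
  }

  // New wire format: the old fields first, then the new field.
  void toData(DataOutputStream out) throws IOException {
    toDataPre(out);
    out.writeUTF(memberId);
  }

  static LocationSketch fromDataPre(DataInputStream in) throws IOException {
    String host = in.readUTF();
    int port = in.readInt();
    return new LocationSketch(host, port, "");
  }

  static LocationSketch fromData(DataInputStream in) throws IOException {
    String host = in.readUTF();
    int port = in.readInt();
    String memberId = in.readUTF();
    return new LocationSketch(host, port, memberId);
  }

  // Picks the wire format based on the version of the peer being written to.
  byte[] serializeFor(int peerVersion) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(bytes)) {
      if (peerVersion < VERSION_WITH_MEMBER_ID) {
        toDataPre(out);
      } else {
        toData(out);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bytes.toByteArray();
  }

  // Picks the wire format based on the version of the peer that wrote the bytes.
  static LocationSketch deserializeFrom(byte[] data, int peerVersion) {
    try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
      return peerVersion < VERSION_WITH_MEMBER_ID ? fromDataPre(in) : fromData(in);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

In this sketch an old peer never sees the extra field, so its reader stays in sync with the stream, while two new peers exchange the memberId as usual.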
   
   As for how to find such issues earlier in Geode, we do have rolling-upgrade 
dunit tests. The pull request does not include this type of test, so it had to 
rely on hydra to find the issue. It's not hydra's fault, though. We should be 
happy that hydra protected us and found the issue today. 
   
   As for how to cooperate with a remote committer, it's a long story. I'd like 
to work with the remote committer to improve GEODE-8656. But even though I do 
not know enough to identify the root cause or how to fix it, the rule of 
regression analysis is that we should still revert first and then re-fix. 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Ping not sent to the right gateway receiver endpoint when several receivers 
> have the same hostname-for-senders value
> 
>
> Key: GEODE-8656
> URL: https://issues.apache.org/jira/browse/GEODE-8656
> Project: Geode
>  Issue Type: Bug
>  Components: wan
>Reporter: Alberto Gomez
>Assignee: Alberto Gomez
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.14.0
>
>
> When several gateway receivers have the same value for hostname-for-senders 
> (for example, when running Geode under Kubernetes with a load balancer that 
> distributes the load among the remote servers), the ping messages sent from 
> the gateway sender client are only sent to the receiver (endpoint) to which 
> the sender connected first.
> The other receivers do not get the ping message, so the connections towards 
> them are closed after the configured timeout expires.
> This ticket is a continuation of the work done in ticket GEODE-7565.
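
The failure mode described above can be illustrated with a small sketch. It assumes, purely for illustration, that the sender tracks the endpoints it should ping in a map keyed by address; `EndpointKeySketch` and its method names are hypothetical and are not Geode's implementation.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not how Geode stores its endpoints.
class EndpointKeySketch {
  // Each receiver is given as {address, memberId}.
  // Keying by address alone: receivers that share an address collapse into one entry.
  static Map<String, String> byLocation(List<String[]> receivers) {
    Map<String, String> endpoints = new LinkedHashMap<>();
    for (String[] receiver : receivers) {
      endpoints.putIfAbsent(receiver[0], receiver[1]); // the first connection wins
    }
    return endpoints;
  }

  // Keying by address plus member id: every receiver keeps its own entry.
  static Map<String, String> byLocationAndMember(List<String[]> receivers) {
    Map<String, String> endpoints = new LinkedHashMap<>();
    for (String[] receiver : receivers) {
      endpoints.putIfAbsent(receiver[0] + "/" + receiver[1], receiver[1]);
    }
    return endpoints;
  }

  public static void main(String[] args) {
    List<String[]> receivers = Arrays.asList(
        new String[] {"lb.example:5000", "member-1"},
        new String[] {"lb.example:5000", "member-2"});
    System.out.println("pinged when keyed by address: " + byLocation(receivers).values());
    System.out.println("pinged when keyed by address and member: "
        + byLocationAndMember(receivers).values());
  }
}
```

With two receivers behind one address, the first map ends up with a single endpoint to ping and the second with two, which mirrors the difference between the reported behavior and the intended one.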



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (GEODE-8656) Ping not sent to the right gateway receiver endpoint when several receivers have the same hostname-for-senders value

2020-11-29 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-8656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17240346#comment-17240346
 ] 

ASF GitHub Bot commented on GEODE-8656:
---

gesterzhou edited a comment on pull request #5775:
URL: https://github.com/apache/geode/pull/5775#issuecomment-735441681


   I reviewed the code. This is phase 2 of a good feature; phase 1 was 
GEODE-7565. Even at that time, there was already a potential 
backward-compatibility issue. Based on my limited knowledge, I feel that 
introducing ServerLocationAndMemberId to replace ServerLocation might not be 
the correct approach. A better way would be to add a new field, 
"String memberId", to the ServerLocation class, and then use 
toDataPre_GEODE_1_14_0_0 and fromDataPre_GEODE_1_14_0_0 in ServerLocation to 
handle the new field during rolling upgrades. 
   
   As for how to find such issues earlier in Geode, we do have rolling-upgrade 
dunit tests. The pull request does not include this type of test, so it had to 
rely on a "private test" to find the issue. It's not the private test's fault, 
though. We should be happy that we found the issue today. 
   
   As for how to cooperate with a remote committer, it's a long story. I'd like 
to work with the remote committer to improve GEODE-8656. But even though I do 
not know enough to identify the root cause or how to fix it, the rule is that 
we should still revert first and then re-fix. 





[jira] [Commented] (GEODE-8656) Ping not sent to the right gateway receiver endpoint when several receivers have the same hostname-for-senders value

2020-11-29 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-8656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17240423#comment-17240423
 ] 

ASF GitHub Bot commented on GEODE-8656:
---

kohlmu-pivotal commented on pull request #5775:
URL: https://github.com/apache/geode/pull/5775#issuecomment-735503985


   I, like @onichols-pivotal, do not see any problem with the original 
[PR](https://github.com/apache/geode/pull/5670). It was even signed off by 
arguably one of the SMEs in this code area.
   
   If there is some private, non-Geode-related failure, is this really a 
concern for Geode? ServerLocation is in the "internal" package space, which 
means that any Geode committer or developer has every right to alter and 
modify its behavior without concern about breaking backward compatibility.
   
   @gesterzhou It seems there is a growing concern that this change has broken 
a rolling upgrade and backward compatibility. I think there is a much larger 
concern now. How did an internal class leak out of its "internal" containment? 
How did we not catch THAT? WHY would an internal class now break rolling 
upgrades? 
   
   I fully support the reverting of this PR if we can prove that rolling 
upgrades will suffer from this change AND then raise a corresponding ticket to 
fix the underlying root cause, which it seems is the leaking of internal 
classes.
   
   I disagree with the proposed fix of "adding the memberId String to the 
ServerLocation". Whilst it may "fix" the problem, it merely glosses over the 
fact that we seem to have a potentially larger problem here.
   
   @bschuchardt would you mind looking at this problem? It seems that there 
is a larger problem here that may not have been considered when we 
initially approved the [PR](https://github.com/apache/geode/pull/5670). 





[jira] [Commented] (GEODE-8532) Parse chunked replies in gnmsg tool

2020-11-29 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-8532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17240438#comment-17240438
 ] 

ASF GitHub Bot commented on GEODE-8532:
---

pdxcodemonkey opened a new pull request #702:
URL: https://github.com/apache/geode-native/pull/702


   - This is still not a complete implementation, but it's close enough to go 
in, I think.  It should handle all except cases where an app has chunked 
responses coming in on multiple simultaneous threads, which is (I think) kind 
of an obscure scenario at the moment.
   - The tool doesn't yet attempt to parse any of the content of a chunked 
response, but rather just prints a summary of the chunk info, e.g.:
   
   ```
   {
     "message": {
       "Type": "RESPONSE",
       "Connection": "0x7fe7e9704320",
       "Direction": "<---",
       "Parts": 1,
       "TransactionId": -1,
       "ChunkInfo": [
         {
           "Chunk0": {
             "ChunkLength": 2410,
             "Flags": 0
           }
         },
         {
           "Chunk1": {
             "ChunkLength": 682,
             "Flags": 1
           }
         }
       ]
     }
   }
   ```
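
As a rough illustration of how a summary like this might be consumed, the sketch below totals the chunk lengths and looks for the chunk that has a flag bit set. The `ChunkSummarySketch` class is hypothetical, and treating bit `0x01` of `Flags` as a last-chunk marker is an assumption suggested by the sample output, not something stated in this pull request.

```java
// Hypothetical helper; not part of gnmsg or the native client.
class ChunkSummarySketch {
  // Each chunk is given as {chunkLength, flags}, mirroring the ChunkInfo entries.
  static int totalLength(int[][] chunks) {
    int total = 0;
    for (int[] chunk : chunks) {
      total += chunk[0];
    }
    return total;
  }

  // Returns the index of the first chunk with bit 0x01 set, or -1 if none has it.
  // ASSUMPTION: bit 0x01 marks the final chunk of a response.
  static int lastChunkIndex(int[][] chunks) {
    for (int i = 0; i < chunks.length; i++) {
      if ((chunks[i][1] & 0x01) != 0) {
        return i;
      }
    }
    return -1;
  }

  public static void main(String[] args) {
    int[][] chunks = {{2410, 0}, {682, 1}}; // values copied from the sample above
    System.out.println("total bytes: " + totalLength(chunks));
    System.out.println("flagged chunk index: " + lastChunkIndex(chunks));
  }
}
```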
   





> Parse chunked replies in gnmsg tool
> ---
>
> Key: GEODE-8532
> URL: https://issues.apache.org/jira/browse/GEODE-8532
> Project: Geode
>  Issue Type: Improvement
>  Components: native client
>Reporter: Blake Bender
>Assignee: Blake Bender
>Priority: Major
>
> As a native client developer, I would like to be able to see all reply 
> messages from server to client when debugging with gnmsg.  I can, in fact, 
> see replies/responses when they come back in a "complete" message, but at 
> present when a response is "chunked" gnmsg ignores it, so things like, for 
> example, `getAll()` responses don't show up in the message dump.  This is 
> probably a complex task, and may require logging more data for chunk 
> responses in the C++ native client code, but it's important.





[jira] [Updated] (GEODE-8532) Parse chunked replies in gnmsg tool

2020-11-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-8532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated GEODE-8532:
--
Labels: pull-request-available  (was: )

> Parse chunked replies in gnmsg tool
> ---
>
> Key: GEODE-8532
> URL: https://issues.apache.org/jira/browse/GEODE-8532
> Project: Geode
>  Issue Type: Improvement
>  Components: native client
>Reporter: Blake Bender
>Assignee: Blake Bender
>Priority: Major
>  Labels: pull-request-available


[jira] [Comment Edited] (GEODE-2896) CI Failure: ClassCastException in GMSMembershipManagerJUnitTest

2020-11-29 Thread Owen Nichols (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-2896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17235698#comment-17235698
 ] 

Owen Nichols edited comment on GEODE-2896 at 11/30/20, 6:00 AM:


I've pinned the 1.13 pipeline back to the known-working Oct 14 image for now


was (Author: onichols-pivotal):
I've pinned the pipeline back to the known-working Oct 14 image for now

> CI Failure:  ClassCastException in GMSMembershipManagerJUnitTest
> 
>
> Key: GEODE-2896
> URL: https://issues.apache.org/jira/browse/GEODE-2896
> Project: Geode
>  Issue Type: Bug
>  Components: tests
>Reporter: Nabarun Nag
>Priority: Major
>  Labels: ci
>
> {noformat}
> org.apache.geode.distributed.internal.membership.gms.mgr.GMSMembershipManagerJUnitTest
>  > testDirectChannelSendFailureDueToForcedDisconnect FAILED
> java.lang.ClassCastException: 
> org.apache.geode.distributed.internal.LonerDistributionManager cannot be cast 
> to org.apache.geode.distributed.internal.DistributionManager
> at 
> org.apache.geode.distributed.internal.HighPriorityAckedMessage.<init>(HighPriorityAckedMessage.java:68)
> at 
> org.apache.geode.distributed.internal.membership.gms.mgr.GMSMembershipManagerJUnitTest.testDirectChannelSendFailureDueToForcedDisconnect(GMSMembershipManagerJUnitTest.java:343)
> org.apache.geode.distributed.internal.membership.gms.mgr.GMSMembershipManagerJUnitTest
>  > testStartupEvents FAILED
> java.lang.ClassCastException: 
> org.apache.geode.distributed.internal.LonerDistributionManager cannot be cast 
> to org.apache.geode.distributed.internal.DistributionManager
> at 
> org.apache.geode.distributed.internal.HighPriorityAckedMessage.<init>(HighPriorityAckedMessage.java:68)
> at 
> org.apache.geode.distributed.internal.membership.gms.mgr.GMSMembershipManagerJUnitTest.testStartupEvents(GMSMembershipManagerJUnitTest.java:219)
> org.apache.geode.distributed.internal.membership.gms.mgr.GMSMembershipManagerJUnitTest
>  > testSendToEmptyListIsRejected FAILED
> java.lang.ClassCastException: 
> org.apache.geode.distributed.internal.LonerDistributionManager cannot be cast 
> to org.apache.geode.distributed.internal.DistributionManager
> at 
> org.apache.geode.distributed.internal.HighPriorityAckedMessage.<init>(HighPriorityAckedMessage.java:68)
> at 
> org.apache.geode.distributed.internal.membership.gms.mgr.GMSMembershipManagerJUnitTest.testSendToEmptyListIsRejected(GMSMembershipManagerJUnitTest.java:177)
> org.apache.geode.distributed.internal.membership.gms.mgr.GMSMembershipManagerJUnitTest
>  > testDirectChannelSendAllRecipients FAILED
> java.lang.ClassCastException: 
> org.apache.geode.distributed.internal.LonerDistributionManager cannot be cast 
> to org.apache.geode.distributed.internal.DistributionManager
> at 
> org.apache.geode.distributed.internal.HighPriorityAckedMessage.<init>(HighPriorityAckedMessage.java:68)
> at 
> org.apache.geode.distributed.internal.membership.gms.mgr.GMSMembershipManagerJUnitTest.testDirectChannelSendAllRecipients(GMSMembershipManagerJUnitTest.java:331)
> org.apache.geode.distributed.internal.membership.gms.mgr.GMSMembershipManagerJUnitTest
>  > testDirectChannelSend FAILED
> java.lang.ClassCastException: 
> org.apache.geode.distributed.internal.LonerDistributionManager cannot be cast 
> to org.apache.geode.distributed.internal.DistributionManager
> at 
> org.apache.geode.distributed.internal.HighPriorityAckedMessage.<init>(HighPriorityAckedMessage.java:68)
> at 
> org.apache.geode.distributed.internal.membership.gms.mgr.GMSMembershipManagerJUnitTest.testDirectChannelSend(GMSMembershipManagerJUnitTest.java:281)
> org.apache.geode.distributed.internal.membership.gms.mgr.GMSMembershipManagerJUnitTest
>  > testDirectChannelSendFailureToOneRecipient FAILED
> java.lang.ClassCastException: 
> org.apache.geode.distributed.internal.LonerDistributionManager cannot be cast 
> to org.apache.geode.distributed.internal.DistributionManager
> at 
> org.apache.geode.distributed.internal.HighPriorityAckedMessage.<init>(HighPriorityAckedMessage.java:68)
> at 
> org.apache.geode.distributed.internal.membership.gms.mgr.GMSMembershipManagerJUnitTest.testDirectChannelSendFailureToOneRecipient(GMSMembershipManagerJUnitTest.java:294)
> org.apache.geode.distributed.internal.membership.gms.mgr.GMSMembershipManagerJUnitTest
>  > testDirectChannelSendFailureToAll FAILED
> java.lang.ClassCastException: 
> org.apache.geode.distributed.internal.LonerDistributionManager cannot be cast 
> to org.apache.geode.distributed.internal.DistributionManager
> at 
> org.apache.geode.distributed.internal.HighPriorityAckedMessage.<init>(HighPriorityAckedMessage.java:68)
> at 
> o

[jira] [Commented] (GEODE-8687) Durable client is continuously re-registering CQs on all servers when event de-serialization fails causing resource exhaustion on servers

2020-11-29 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-8687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17240509#comment-17240509
 ] 

ASF GitHub Bot commented on GEODE-8687:
---

mkevo commented on a change in pull request #5730:
URL: https://github.com/apache/geode/pull/5730#discussion_r532377042



##
File path: geode-cq/src/distributedTest/java/org/apache/geode/internal/cache/tier/sockets/DurableClientCQAutoSerializer.java
##
@@ -0,0 +1,289 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more contributor license
+ * agreements. See the NOTICE file distributed with this work for additional information regarding
+ * copyright ownership. The ASF licenses this file to You under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance with the License. You may obtain a
+ * copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software distributed under the License
+ * is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express
+ * or implied. See the License for the specific language governing permissions and limitations under
+ * the License.
+ */
+package org.apache.geode.internal.cache.tier.sockets;
+
+import static org.apache.geode.cache.Region.SEPARATOR;
+import static org.apache.geode.test.awaitility.GeodeAwaitility.await;
+import static org.assertj.core.api.Assertions.assertThat;
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertNotNull;
+
+import java.io.Serializable;
+import java.util.Map;
+import java.util.Objects;
+
+import com.google.common.collect.ImmutableMap;
+import org.junit.Before;
+import org.junit.Rule;
+import org.junit.Test;
+import org.junit.experimental.categories.Category;
+
+import org.apache.geode.cache.Region;
+import org.apache.geode.cache.client.ClientCache;
+import org.apache.geode.cache.client.ClientRegionFactory;
+import org.apache.geode.cache.client.ClientRegionShortcut;
+import org.apache.geode.cache.client.internal.PoolImpl;
+import org.apache.geode.cache.query.CqAttributesFactory;
+import org.apache.geode.cache.query.CqQuery;
+import org.apache.geode.cache.query.QueryService;
+import org.apache.geode.internal.cache.CacheServerImpl;
+import org.apache.geode.pdx.ReflectionBasedAutoSerializer;
+import org.apache.geode.pdx.internal.AutoSerializableManager;
+import org.apache.geode.test.dunit.Invoke;
+import org.apache.geode.test.dunit.rules.ClientVM;
+import org.apache.geode.test.dunit.rules.ClusterStartupRule;
+import org.apache.geode.test.dunit.rules.MemberVM;
+import org.apache.geode.test.junit.categories.ClientSubscriptionTest;
+import org.apache.geode.test.junit.categories.SerializationTest;
+import org.apache.geode.test.junit.rules.GfshCommandRule;
+
+@Category({ClientSubscriptionTest.class, SerializationTest.class})
+public class DurableClientCQAutoSerializer implements Serializable {

Review comment:
   Thanks Jakov!







> Durable client is continuously re-registering CQs on all servers when event 
> de-serialization fails causing resource exhaustion on servers 
> --
>
> Key: GEODE-8687
> URL: https://issues.apache.org/jira/browse/GEODE-8687
> Project: Geode
>  Issue Type: Bug
>  Components: client/server
>Affects Versions: 1.13.0
>Reporter: Jakov Varenina
>Assignee: Jakov Varenina
>Priority: Major
>  Labels: pull-request-available
> Attachments: deserialzationFault.log
>
>
> When ReflectionBasedAutoSerializer is set incorrectly or not set at all, the 
> client hits a serialization exception when it receives CQ events. 
> The serialization exception isn't logged, which is misleading and makes it 
> hard to discover that ReflectionBasedAutoSerializer isn't set correctly. The 
> only log that can be seen is that the client/server subscription connections 
> are closed due to EOF. This is because the client destroys subscription 
> connections intentionally, but doesn't log the reason 
> (PdxSerializationException) that led to this. It would be good for 
> serialization exceptions to be logged as error or warn.
> Client destroys subscription connection and perform server fail-over whenever 
> serialization issue occurs. Additionally when subscription connection for 
> particular server fails multiple time