[jira] [Commented] (GEODE-8656) Ping not sent to the right gateway receiver endpoint when several receivers have the same hostname-for-senders value
[ https://issues.apache.org/jira/browse/GEODE-8656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17240337#comment-17240337 ]

ASF GitHub Bot commented on GEODE-8656:
---
gesterzhou commented on pull request #5775:
URL: https://github.com/apache/geode/pull/5775#issuecomment-735441681

I reviewed the code. It is phase 2 of a good feature; phase 1 was GEODE-7565. Even then, there was already a potential backward-compatibility issue.

Based on my limited knowledge, I feel that introducing ServerLocationAndMemberId to replace ServerLocation might not be the correct way. A better way would be to add a new field "String memberId" to the ServerLocation class, then use toDataPre_GEODE_1_14_0_0 and fromDataPre_GEODE_1_14_0_0 in ServerLocation to handle the new field during a rolling upgrade.

As for how to find such issues earlier in Geode, we do have rolling-upgrade dunit tests. The pull request has no test of this type, so it had to rely on hydra to find the issue. It's not hydra's fault though. We should be happy that hydra protected us and found the issue today.

As for how to cooperate with a remote committer, it's a long story. I'd like to cooperate with the remote committer to improve GEODE-8656. But even though I do not know the root cause or how to fix it, the rule of regression analysis says we should still revert first and then re-fix.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

> Ping not sent to the right gateway receiver endpoint when several receivers have the same hostname-for-senders value
>
> Key: GEODE-8656
> URL: https://issues.apache.org/jira/browse/GEODE-8656
> Project: Geode
> Issue Type: Bug
> Components: wan
> Reporter: Alberto Gomez
> Assignee: Alberto Gomez
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.14.0
>
> When several gateway receivers have the same value for hostname-for-senders (for example, when running Geode under Kubernetes with a load balancer distributing the load among the remote servers), the ping messages sent from the gateway sender client are only sent to the receiver (endpoint) to which the sender connected first.
> The other receivers do not get the ping message, so the connections towards them are closed once the configured timeout expires.
> This ticket is a continuation of the work done in ticket GEODE-7565.

--
This message was sent by Atlassian Jira (v8.3.4#803005)
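The review comment above proposes adding a memberId field to ServerLocation and keeping a pre-1.14 wire format next to the new one. A minimal, self-contained sketch of that idea follows. It is not the Geode class: ServerLocationSketch, toDataPreMemberId, fromDataPreMemberId and the boolean peer flag are all hypothetical names chosen for illustration. Geode's own serialization framework decides which variant to call from the peer's version, which this sketch reduces to a flag.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-in for ServerLocation; NOT the real Geode class.
final class ServerLocationSketch {
  String hostName;
  int port;
  String memberId; // the new field; empty when the data came from an old peer

  ServerLocationSketch() {}

  ServerLocationSketch(String hostName, int port, String memberId) {
    this.hostName = hostName;
    this.port = port;
    this.memberId = memberId;
  }

  // Current wire format: the old fields followed by the new field.
  void toData(DataOutput out) throws IOException {
    toDataPreMemberId(out);
    out.writeUTF(memberId);
  }

  // Wire format understood by members that predate the new field.
  void toDataPreMemberId(DataOutput out) throws IOException {
    out.writeUTF(hostName);
    out.writeInt(port);
  }

  void fromData(DataInput in) throws IOException {
    fromDataPreMemberId(in);
    memberId = in.readUTF();
  }

  void fromDataPreMemberId(DataInput in) throws IOException {
    hostName = in.readUTF();
    port = in.readInt();
    memberId = ""; // old peers never send a member id
  }

  // Serializes for a peer; the flag stands in for a version check (assumption).
  static byte[] write(ServerLocationSketch loc, boolean peerKnowsMemberId) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      DataOutputStream out = new DataOutputStream(bytes);
      if (peerKnowsMemberId) {
        loc.toData(out);
      } else {
        loc.toDataPreMemberId(out);
      }
      out.flush();
      return bytes.toByteArray();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // Deserializes data written by a peer that does or does not know the new field.
  static ServerLocationSketch read(byte[] data, boolean senderKnowsMemberId) {
    try {
      DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
      ServerLocationSketch loc = new ServerLocationSketch();
      if (senderKnowsMemberId) {
        loc.fromData(in);
      } else {
        loc.fromDataPreMemberId(in);
      }
      return loc;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

The point of the design is that the old format stays byte-for-byte unchanged, so a member that has not yet been upgraded can still read what a newer member sends, and the new field simply defaults to empty when it is absent.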
[jira] [Commented] (GEODE-8656) Ping not sent to the right gateway receiver endpoint when several receivers have the same hostname-for-senders value
[ https://issues.apache.org/jira/browse/GEODE-8656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17240346#comment-17240346 ]

ASF GitHub Bot commented on GEODE-8656:
---
gesterzhou edited a comment on pull request #5775:
URL: https://github.com/apache/geode/pull/5775#issuecomment-735441681

I reviewed the code. It is phase 2 of a good feature; phase 1 was GEODE-7565. Even then, there was already a potential backward-compatibility issue.

Based on my limited knowledge, I feel that introducing ServerLocationAndMemberId to replace ServerLocation might not be the correct way. A better way would be to add a new field "String memberId" to the ServerLocation class, then use toDataPre_GEODE_1_14_0_0 and fromDataPre_GEODE_1_14_0_0 in ServerLocation to handle the new field during a rolling upgrade.

As for how to find such issues earlier in Geode, we do have rolling-upgrade dunit tests. The pull request has no test of this type, so it had to rely on a "private test" to find the issue. It's not the private test's fault though. We should be happy that we found the issue today.

As for how to cooperate with a remote committer, it's a long story. I'd like to cooperate with the remote committer to improve GEODE-8656. But even though I do not know the root cause or how to fix it, the rule says we should still revert first and then re-fix.
[jira] [Commented] (GEODE-8656) Ping not sent to the right gateway receiver endpoint when several receivers have the same hostname-for-senders value
[ https://issues.apache.org/jira/browse/GEODE-8656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17240423#comment-17240423 ]

ASF GitHub Bot commented on GEODE-8656:
---
kohlmu-pivotal commented on pull request #5775:
URL: https://github.com/apache/geode/pull/5775#issuecomment-735503985

I, like @onichols-pivotal, do not see any problem with the original [PR](https://github.com/apache/geode/pull/5670). It was even signed off by arguably one of the SMEs in this code area. If there is some private, non-Geode-related failure, is this really a concern to Geode?

ServerLocation is in the "internal" package space, which means that Geode committers/developers have every right and privilege to alter and modify its behavior without concern for breaking backward compatibility.

@gesterzhou It seems there is a growing concern that this change has broken rolling upgrades and backward compatibility. I think there is a much larger concern now. How did an internal class leak out of its "internal" containment? How did we not catch THAT? WHY would an internal class now break rolling upgrades?

I fully support reverting this PR if we can prove that rolling upgrades will suffer from this change AND we then raise a corresponding ticket to fix the underlying root cause, which seems to be the leaking of internal classes.

I disagree with the proposed fix of "adding of the memberId String to the ServerLocation". Whilst it may "fix" the problem, it merely glosses over the fact that we seem to have a potentially larger problem here.

@bschuchardt would you mind looking at this problem? It seems that there is a larger problem here that may not have been considered when we initially approved the [PR](https://github.com/apache/geode/pull/5670).
[jira] [Commented] (GEODE-8532) Parse chunked replies in gnmsg tool
[ https://issues.apache.org/jira/browse/GEODE-8532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17240438#comment-17240438 ]

ASF GitHub Bot commented on GEODE-8532:
---
pdxcodemonkey opened a new pull request #702:
URL: https://github.com/apache/geode-native/pull/702

- This is still not a complete implementation, but it's close enough to go in, I think. It should handle all except cases where an app has chunked responses coming in on multiple simultaneous threads, which is (I think) kind of an obscure scenario at the moment.
- The tool doesn't yet attempt to parse any of the content of a chunked response, but rather just prints a summary of the chunk info, e.g.:

```
{
  "message": {
    "Type": "RESPONSE",
    "Connection": "0x7fe7e9704320",
    "Direction": "<---",
    "Parts": 1,
    "TransactionId": -1,
    "ChunkInfo": [
      { "Chunk0": { "ChunkLength": 2410, "Flags": 0 } },
      { "Chunk1": { "ChunkLength": 682, "Flags": 1 } }
    ]
  }
}
```

> Parse chunked replies in gnmsg tool
>
> Key: GEODE-8532
> URL: https://issues.apache.org/jira/browse/GEODE-8532
> Project: Geode
> Issue Type: Improvement
> Components: native client
> Reporter: Blake Bender
> Assignee: Blake Bender
> Priority: Major
>
> As a native client developer, I would like to be able to see all reply messages from server to client when debugging with gnmsg. I can, in fact, see replies/responses when they come back in a "complete" message, but at present when a response is "chunked" gnmsg ignores it, so things like, for example, `getAll()` responses don't show up in the message dump. This is probably a complex task, and may require logging more data for chunk responses in the C++ native client code, but it's important.
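The chunk summary in the pull request description lists a length and a flags value per chunk. As a rough illustration of how such a summary might be checked, here is a small self-contained sketch. It is not gnmsg code, the class and method names are hypothetical, and it assumes that bit 0x01 of Flags marks the final chunk of a response. The sample output is consistent with that assumption, since only the second chunk has Flags set to 1, but the description itself does not state it.

```java
import java.util.List;

// Hypothetical helper for reading a chunk summary; NOT part of gnmsg.
final class ChunkSummarySketch {
  // Assumption: bit 0x01 in "Flags" marks the last chunk of a response.
  static final int LAST_CHUNK_FLAG = 0x01;

  // One entry of the "ChunkInfo" array: a length and a flags value.
  static final class Chunk {
    final int length;
    final int flags;

    Chunk(int length, int flags) {
      this.length = length;
      this.flags = flags;
    }
  }

  // Sums the ChunkLength values of all chunks in a response.
  static int totalLength(List<Chunk> chunks) {
    int total = 0;
    for (Chunk chunk : chunks) {
      total += chunk.length;
    }
    return total;
  }

  // True when exactly the final chunk carries the last-chunk flag.
  static boolean endsWithLastChunk(List<Chunk> chunks) {
    if (chunks.isEmpty()) {
      return false;
    }
    for (int i = 0; i < chunks.size(); i++) {
      boolean flagged = (chunks.get(i).flags & LAST_CHUNK_FLAG) != 0;
      boolean isFinal = i == chunks.size() - 1;
      if (flagged != isFinal) {
        return false;
      }
    }
    return true;
  }
}
```

For the two chunks in the sample output (2410 and 682 bytes), totalLength gives 3092, and endsWithLastChunk is true only because the second chunk carries the flag. A dump that stops after the first chunk would fail the check, which is one way an incomplete chunked response could be spotted.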
[jira] [Updated] (GEODE-8532) Parse chunked replies in gnmsg tool
[ https://issues.apache.org/jira/browse/GEODE-8532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated GEODE-8532:
--
Labels: pull-request-available (was: )
[jira] [Comment Edited] (GEODE-2896) CI Failure: ClassCastException in GMSMembershipManagerJUnitTest
[ https://issues.apache.org/jira/browse/GEODE-2896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17235698#comment-17235698 ]

Owen Nichols edited comment on GEODE-2896 at 11/30/20, 6:00 AM:

I've pinned the 1.13 pipeline back to the known-working Oct 14 image for now

was (Author: onichols-pivotal):
I've pinned the pipeline back to the known-working Oct 14 image for now

> CI Failure: ClassCastException in GMSMembershipManagerJUnitTest
>
> Key: GEODE-2896
> URL: https://issues.apache.org/jira/browse/GEODE-2896
> Project: Geode
> Issue Type: Bug
> Components: tests
> Reporter: Nabarun Nag
> Priority: Major
> Labels: ci
>
> {noformat}
> org.apache.geode.distributed.internal.membership.gms.mgr.GMSMembershipManagerJUnitTest > testDirectChannelSendFailureDueToForcedDisconnect FAILED
>     java.lang.ClassCastException: org.apache.geode.distributed.internal.LonerDistributionManager cannot be cast to org.apache.geode.distributed.internal.DistributionManager
>         at org.apache.geode.distributed.internal.HighPriorityAckedMessage.<init>(HighPriorityAckedMessage.java:68)
>         at org.apache.geode.distributed.internal.membership.gms.mgr.GMSMembershipManagerJUnitTest.testDirectChannelSendFailureDueToForcedDisconnect(GMSMembershipManagerJUnitTest.java:343)
> org.apache.geode.distributed.internal.membership.gms.mgr.GMSMembershipManagerJUnitTest > testStartupEvents FAILED
>     java.lang.ClassCastException: org.apache.geode.distributed.internal.LonerDistributionManager cannot be cast to org.apache.geode.distributed.internal.DistributionManager
>         at org.apache.geode.distributed.internal.HighPriorityAckedMessage.<init>(HighPriorityAckedMessage.java:68)
>         at org.apache.geode.distributed.internal.membership.gms.mgr.GMSMembershipManagerJUnitTest.testStartupEvents(GMSMembershipManagerJUnitTest.java:219)
> org.apache.geode.distributed.internal.membership.gms.mgr.GMSMembershipManagerJUnitTest > testSendToEmptyListIsRejected FAILED
>     java.lang.ClassCastException: org.apache.geode.distributed.internal.LonerDistributionManager cannot be cast to org.apache.geode.distributed.internal.DistributionManager
>         at org.apache.geode.distributed.internal.HighPriorityAckedMessage.<init>(HighPriorityAckedMessage.java:68)
>         at org.apache.geode.distributed.internal.membership.gms.mgr.GMSMembershipManagerJUnitTest.testSendToEmptyListIsRejected(GMSMembershipManagerJUnitTest.java:177)
> org.apache.geode.distributed.internal.membership.gms.mgr.GMSMembershipManagerJUnitTest > testDirectChannelSendAllRecipients FAILED
>     java.lang.ClassCastException: org.apache.geode.distributed.internal.LonerDistributionManager cannot be cast to org.apache.geode.distributed.internal.DistributionManager
>         at org.apache.geode.distributed.internal.HighPriorityAckedMessage.<init>(HighPriorityAckedMessage.java:68)
>         at org.apache.geode.distributed.internal.membership.gms.mgr.GMSMembershipManagerJUnitTest.testDirectChannelSendAllRecipients(GMSMembershipManagerJUnitTest.java:331)
> org.apache.geode.distributed.internal.membership.gms.mgr.GMSMembershipManagerJUnitTest > testDirectChannelSend FAILED
>     java.lang.ClassCastException: org.apache.geode.distributed.internal.LonerDistributionManager cannot be cast to org.apache.geode.distributed.internal.DistributionManager
>         at org.apache.geode.distributed.internal.HighPriorityAckedMessage.<init>(HighPriorityAckedMessage.java:68)
>         at org.apache.geode.distributed.internal.membership.gms.mgr.GMSMembershipManagerJUnitTest.testDirectChannelSend(GMSMembershipManagerJUnitTest.java:281)
> org.apache.geode.distributed.internal.membership.gms.mgr.GMSMembershipManagerJUnitTest > testDirectChannelSendFailureToOneRecipient FAILED
>     java.lang.ClassCastException: org.apache.geode.distributed.internal.LonerDistributionManager cannot be cast to org.apache.geode.distributed.internal.DistributionManager
>         at org.apache.geode.distributed.internal.HighPriorityAckedMessage.<init>(HighPriorityAckedMessage.java:68)
>         at org.apache.geode.distributed.internal.membership.gms.mgr.GMSMembershipManagerJUnitTest.testDirectChannelSendFailureToOneRecipient(GMSMembershipManagerJUnitTest.java:294)
> org.apache.geode.distributed.internal.membership.gms.mgr.GMSMembershipManagerJUnitTest > testDirectChannelSendFailureToAll FAILED
>     java.lang.ClassCastException: org.apache.geode.distributed.internal.LonerDistributionManager cannot be cast to org.apache.geode.distributed.internal.DistributionManager
>         at org.apache.geode.distributed.internal.HighPriorityAckedMessage.<init>(HighPriorityAckedMessage.java:68)
>         at o
[jira] [Commented] (GEODE-8687) Durable client is continuously re-registering CQs on all servers when event de-serialization fails causing resource exhaustion on servers
[ https://issues.apache.org/jira/browse/GEODE-8687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17240509#comment-17240509 ]

ASF GitHub Bot commented on GEODE-8687:
---
mkevo commented on a change in pull request #5730:
URL: https://github.com/apache/geode/pull/5730#discussion_r532377042

File path: geode-cq/src/distributedTest/java/org/apache/geode/internal/cache/tier/sockets/DurableClientCQAutoSerializer.java

@@ -0,0 +1,289 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more contributor license
+ * agreements. See the NOTICE file distributed with this work for additional information regarding
+ * copyright ownership. The ASF licenses this file to You under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance with the License. You may obtain a
+ * copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software distributed under the License
+ * is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express
+ * or implied. See the License for the specific language governing permissions and limitations under
+ * the License.
+ */
+package org.apache.geode.internal.cache.tier.sockets;
+
+import static org.apache.geode.cache.Region.SEPARATOR;
+import static org.apache.geode.test.awaitility.GeodeAwaitility.await;
+import static org.assertj.core.api.Assertions.assertThat;
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertNotNull;
+
+import java.io.Serializable;
+import java.util.Map;
+import java.util.Objects;
+
+import com.google.common.collect.ImmutableMap;
+import org.junit.Before;
+import org.junit.Rule;
+import org.junit.Test;
+import org.junit.experimental.categories.Category;
+
+import org.apache.geode.cache.Region;
+import org.apache.geode.cache.client.ClientCache;
+import org.apache.geode.cache.client.ClientRegionFactory;
+import org.apache.geode.cache.client.ClientRegionShortcut;
+import org.apache.geode.cache.client.internal.PoolImpl;
+import org.apache.geode.cache.query.CqAttributesFactory;
+import org.apache.geode.cache.query.CqQuery;
+import org.apache.geode.cache.query.QueryService;
+import org.apache.geode.internal.cache.CacheServerImpl;
+import org.apache.geode.pdx.ReflectionBasedAutoSerializer;
+import org.apache.geode.pdx.internal.AutoSerializableManager;
+import org.apache.geode.test.dunit.Invoke;
+import org.apache.geode.test.dunit.rules.ClientVM;
+import org.apache.geode.test.dunit.rules.ClusterStartupRule;
+import org.apache.geode.test.dunit.rules.MemberVM;
+import org.apache.geode.test.junit.categories.ClientSubscriptionTest;
+import org.apache.geode.test.junit.categories.SerializationTest;
+import org.apache.geode.test.junit.rules.GfshCommandRule;
+
+@Category({ClientSubscriptionTest.class, SerializationTest.class})
+public class DurableClientCQAutoSerializer implements Serializable {

Review comment:
Thanks Jakov!

> Durable client is continuously re-registering CQs on all servers when event de-serialization fails causing resource exhaustion on servers
>
> Key: GEODE-8687
> URL: https://issues.apache.org/jira/browse/GEODE-8687
> Project: Geode
> Issue Type: Bug
> Components: client/server
> Affects Versions: 1.13.0
> Reporter: Jakov Varenina
> Assignee: Jakov Varenina
> Priority: Major
> Labels: pull-request-available
> Attachments: deserialzationFault.log
>
> When ReflectionBasedAutoSerializer is set incorrectly or not set at all, the client hits a serialization exception when it receives CQ events. The exception is not logged, which is misleading and makes it hard to discover that ReflectionBasedAutoSerializer is misconfigured. The only visible log entry says that the client/server subscription connections were closed due to EOF. This is because the client destroys the subscription connections intentionally but does not log the reason (PdxSerializationException) that led to this. Serialization exceptions should be logged at error or warn level.
> The client destroys the subscription connection and performs server fail-over whenever a serialization issue occurs. Additionally when subscription connection for particular server fails multiple time