[jira] [Commented] (GEODE-8664) JGroups exception is not propagated

2020-11-14 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-8664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17231974#comment-17231974
 ] 

ASF GitHub Bot commented on GEODE-8664:
---

gaussianrecurrence opened a new pull request #5751:
URL: https://github.com/apache/geode/pull/5751


- If DistributionImpl.start encountered either a MembershipConfigurationException
  or a MemberStartupException, the original cause was not propagated, which made
  startup issues hard to diagnose.
- This commit propagates both exceptions.
- Two JUnit tests were added to verify this behavior.
- Two DUnit tests were modified to accommodate this change.
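The change described above amounts to passing the caught exception as the cause when it is wrapped. The following is a minimal, self-contained sketch of that pattern, not the actual DistributionImpl code: the nested exception classes are hypothetical stand-ins that only borrow the Geode type names, and startMembership() is a made-up method that always fails.

```java
// Self-contained sketch: these nested classes are hypothetical stand-ins that only
// borrow the names of the Geode exception types; they are not the real classes.
public class CausePropagationSketch {

  static class MemberStartupException extends Exception {
    MemberStartupException(String message) {
      super(message);
    }
  }

  static class GemFireConfigException extends RuntimeException {
    GemFireConfigException(String message) {
      super(message);
    }

    GemFireConfigException(String message, Throwable cause) {
      super(message, cause);
    }
  }

  // Hypothetical stand-in for Membership.start(); always fails so the wrapping is visible.
  static void startMembership() throws MemberStartupException {
    throw new MemberStartupException("simulated membership startup failure");
  }

  // Before the fix: the caught exception is dropped, so only the generic message survives.
  static void startWithoutCause() {
    try {
      startMembership();
    } catch (MemberStartupException e) {
      throw new GemFireConfigException("unable to create jgroups channel");
    }
  }

  // After the fix: the caught exception becomes the cause and is kept in the stack trace.
  static void startWithCause() {
    try {
      startMembership();
    } catch (MemberStartupException e) {
      throw new GemFireConfigException("unable to create jgroups channel", e);
    }
  }

  public static void main(String[] args) {
    try {
      startWithCause();
    } catch (GemFireConfigException e) {
      // The printed trace now ends with a "Caused by:" section for MemberStartupException.
      e.printStackTrace();
    }
  }
}
```

With the two-argument constructor, the underlying reason appears in the "Caused by:" section instead of being lost behind the generic "unable to create jgroups channel" message.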
   
   Note: After having to revert the original change in PR #5725 due to some
   issues with the CI system, the tests have been sorted out and everything
   is working.
   
   ---
   
   Thank you for submitting a contribution to Apache Geode.
   
   In order to streamline the review of the contribution we ask you
   to ensure the following steps have been taken:
   
   ### For all changes:
   - [ ] Is there a JIRA ticket associated with this PR? Is it referenced in the commit message?
   
   - [ ] Has your PR been rebased against the latest commit within the target branch (typically `develop`)?
   
   - [ ] Is your initial contribution a single, squashed commit?
   
   - [ ] Does `gradlew build` run cleanly?
   
   - [ ] Have you written or updated unit tests to verify your changes?
   
   - [ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)?
   
   ### Note:
   Please ensure that once the PR is submitted, check Concourse for build issues and
   submit an update to your PR as soon as possible. If you need help, please send an
   email to d...@geode.apache.org.
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> JGroups exception is not propagated
> ---
>
> Key: GEODE-8664
> URL: https://issues.apache.org/jira/browse/GEODE-8664
> Project: Geode
>  Issue Type: Bug
>  Components: membership, messaging
>Affects Versions: 1.12.0, 1.13.0
>Reporter: Mario Salazar de Torres
>Assignee: Mario Salazar de Torres
>Priority: Minor
>  Labels: pull-request-available
> Attachments: GEODE-8664.patch
>
>
> *AS A* geode contributor
> *I WANT* exceptions thrown in DistributionImpl.start to be propagated
> *SO THAT* more information is provided while tackling issues.
>  
> 
> *Additional information:*
> While looking into an issue related to JGroupMessenger while starting a
> locator, I noticed that exceptions thrown by Membership.start are not
> correctly propagated by the DistributionImpl.start method.
> To illustrate the problem, below is the exception reported while starting
> the locator:
>  
> {noformat}
> locator is exiting due to an exception
> org.apache.geode.GemFireConfigException: unable to create jgroups channel
> at org.apache.geode.distributed.internal.DistributionImpl.start(DistributionImpl.java:184)
> at org.apache.geode.distributed.internal.DistributionImpl.createDistribution(DistributionImpl.java:222)
> at org.apache.geode.distributed.internal.ClusterDistributionManager.(ClusterDistributionManager.java:464)
> at org.apache.geode.distributed.internal.ClusterDistributionManager.(ClusterDistributionManager.java:497)
> at org.apache.geode.distributed.internal.ClusterDistributionManager.create(ClusterDistributionManager.java:326)
> at org.apache.geode.distributed.internal.InternalDistributedSystem.initialize(InternalDistributedSystem.java:779)
> at org.apache.geode.distributed.internal.InternalDistributedSystem.access$200(InternalDistributedSystem.java:135)
> at org.apache.geode.distributed.internal.InternalDistributedSystem$Builder.build(InternalDistributedSystem.java:3033)
> at org.apache.geode.distributed.internal.InternalDistributedSystem.connectInternal(InternalDistributedSystem.java:290)
> at org.apache.geode.distributed.internal.InternalLocator.startDistributedSystem(InternalLocator.java:743)
> at org.apache.geode.distributed.internal.InternalLocator.startLocator(InternalLocator.java:388)
> at org.apache.geode.distributed.LocatorLauncher.start(LocatorLauncher.java:716)
> at org.apache.geode.distributed.LocatorLauncher.run(LocatorLauncher.java:623)
> at org.apache.geode.distributed.LocatorLauncher.main

[jira] [Commented] (GEODE-8623) Timing between DNS and Geode startup can result in permanent unknown host exceptions.

2020-11-14 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-8623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17232037#comment-17232037
 ] 

ASF GitHub Bot commented on GEODE-8623:
---

jinmeiliao commented on a change in pull request #5743:
URL: https://github.com/apache/geode/pull/5743#discussion_r523309085



##
File path: geode-common/src/main/java/org/apache/geode/internal/Retry.java
##
@@ -47,45 +47,44 @@ public void sleep(long sleepTime, TimeUnit sleepTimeUnit) throws InterruptedExce
 }
   }
 
-  private static final SteadyClock steadyClock = new SteadyClock();
+  private static final SteadyTimer steadyClock = new SteadyTimer();
 
   /**
* Try the supplier function until the predicate is true or timeout occurs.
*
* @param timeout to retry for
+   * @param timeoutUnit the unit for timeout
* @param interval time between each try
-   * @param timeUnit to retry for
+   * @param intervalUnit the unit for interval
* @param supplier to execute until predicate is true or times out
* @param predicate to test for retry
* @param  type of return value
* @return value from supplier after it passes predicate or times out.
*/
-  public static  T tryFor(long timeout,
-  long interval,
-  TimeUnit timeUnit,
+  public static  T tryFor(long timeout, TimeUnit timeoutUnit,
+  long interval, TimeUnit intervalUnit,
   Supplier supplier,
   Predicate predicate) throws TimeoutException, InterruptedException {
-return tryFor(timeout, interval, timeUnit, supplier, predicate, steadyClock);
+return tryFor(timeout, timeoutUnit, interval, intervalUnit, supplier, predicate, steadyClock);
   }
 
   @VisibleForTesting
-  static  T tryFor(long timeout,
-  long interval,
-  TimeUnit timeUnit,
+  static  T tryFor(long timeout, TimeUnit timeoutUnit,
+  long interval, TimeUnit intervalUnit,
   Supplier supplier,
   Predicate predicate,
-  Clock clock) throws TimeoutException, InterruptedException {
-long until = clock.nanoTime() + NANOSECONDS.convert(timeout, timeUnit);
+  Timer timer) throws TimeoutException, InterruptedException {
+long until = timer.nanoTime() + NANOSECONDS.convert(timeout, timeoutUnit);
 
 T value;
 do {
   value = supplier.get();
   if (predicate.test(value)) {
 return value;
   } else {
-clock.sleep(interval, timeUnit);
+timer.sleep(interval, intervalUnit);

Review comment:
   I am not sure I follow here. If we sleep past the timeout, then we won't
execute the predicate anymore; the loop just ends with a TimeoutException.
   
   Assuming the supplier and predicate execute in negligible time, with a
timeout of 10 sec and a sleep interval of 3 sec, sleeping for the minimum of the
interval and the remaining time would sleep for 3, 3, 3, 1 sec, while always
sleeping for the full interval would sleep for 3, 3, 3, 3 sec. What I am saying
is that the optimization only affects the last iteration, and its only effect is
that we throw the TimeoutException exactly at the timeout instead of after it.
The benefit is minimal AFAIK, unless I am missing something extraordinarily
important.
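The arithmetic in the review comment can be checked with a small simulation. This is a minimal sketch, not Geode's Retry class: the Timer interface, FakeTimer, the clampToDeadline flag and the sleeps helper are hypothetical, and the loop assumes the do/while ends once the deadline has passed, which the truncated diff suggests but does not show.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Predicate;
import java.util.function.Supplier;

import static java.util.concurrent.TimeUnit.NANOSECONDS;

public class RetrySleepSketch {

  // Hypothetical timer abstraction, loosely modelled on the Timer parameter in the diff.
  interface Timer {
    long nanoTime();

    void sleep(long duration, TimeUnit unit) throws InterruptedException;
  }

  // Fake timer that advances virtual time instead of blocking and records each sleep.
  static class FakeTimer implements Timer {
    long now = 0;
    final List<Long> sleepsInSeconds = new ArrayList<>();

    public long nanoTime() {
      return now;
    }

    public void sleep(long duration, TimeUnit unit) {
      long nanos = unit.toNanos(duration);
      sleepsInSeconds.add(NANOSECONDS.toSeconds(nanos));
      now += nanos;
    }
  }

  // Retry loop; when clampToDeadline is true, the last sleep is shortened so the
  // TimeoutException is thrown at the deadline rather than after it.
  static <T> T tryFor(long timeout, TimeUnit timeoutUnit, long interval, TimeUnit intervalUnit,
      Supplier<T> supplier, Predicate<T> predicate, Timer timer, boolean clampToDeadline)
      throws TimeoutException, InterruptedException {
    long until = timer.nanoTime() + NANOSECONDS.convert(timeout, timeoutUnit);
    long intervalNanos = NANOSECONDS.convert(interval, intervalUnit);
    do {
      T value = supplier.get();
      if (predicate.test(value)) {
        return value;
      }
      long remaining = until - timer.nanoTime();
      long sleepNanos =
          clampToDeadline ? Math.max(0, Math.min(intervalNanos, remaining)) : intervalNanos;
      timer.sleep(sleepNanos, NANOSECONDS);
    } while (timer.nanoTime() < until);
    throw new TimeoutException();
  }

  // Runs a retry whose predicate never passes and returns the recorded sleeps.
  static List<Long> sleeps(boolean clampToDeadline) {
    FakeTimer timer = new FakeTimer();
    try {
      tryFor(10, TimeUnit.SECONDS, 3, TimeUnit.SECONDS, () -> false, v -> v, timer,
          clampToDeadline);
    } catch (TimeoutException expected) {
      // Every run ends in a timeout because the predicate never passes.
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    return timer.sleepsInSeconds;
  }

  public static void main(String[] args) {
    System.out.println("clamped:   " + sleeps(true)); // [3, 3, 3, 1]
    System.out.println("unclamped: " + sleeps(false)); // [3, 3, 3, 3]
  }
}
```

With a 10-second timeout and a 3-second interval, the clamped loop gives up at exactly 10 seconds and the unclamped one at 12. In both cases the supplier is called at 0, 3, 6 and 9 seconds and never after the last sleep, which is the reviewer's point.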







> Timing between DNS and Geode startup can result in permanent unknown host exceptions.
> -
>
> Key: GEODE-8623
> URL: https://issues.apache.org/jira/browse/GEODE-8623
> Project: Geode
>  Issue Type: Bug
>Affects Versions: 1.9.0, 1.9.1, 1.10.0, 1.9.2, 1.11.0, 1.12.0, 1.13.0, 
> 1.14.0, 1.13.1
>Reporter: Jacob Barrett
>Priority: Minor
>  Labels: pull-request-available
>
> In a managed environment where the local host name DNS entries and the startup
> of Geode happen concurrently, Geode's local host name caching may fail to
> resolve the name. If the local host name cannot be resolved when the caching
> utility class is loaded, any service that depends on this name will fail with
> no chance of recovery.
> {code}
> [error 2020/09/30 19:50:21.644 UTC  tid=0x1] Jmx manager could not be started because java.net.UnknownHostException
> org.apache.geode.management.ManagementException: java.net.UnknownHostException
>   at org.apache.geode.management.internal.ManagementAgent.startAgent(ManagementAgent.java:133)
>   at org.apache.geode.management.internal.SystemManagementService.startManager(SystemManagementService.java:432)
>   at org.apache.geode.management.internal.beans.ManagementAdapter.handleCacheCreation(ManagementAdapter.java:181)
> 
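One way to avoid the permanent failure described in the issue is to resolve the local host lazily and cache only a successful lookup, instead of resolving once while the caching utility class loads. The following is a minimal sketch under that assumption; LocalHostCacheSketch and HostResolver are hypothetical names, not Geode classes, and the fix in PR #5743 may take a different approach.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical sketch: caches the local host address only after a successful lookup,
// so a failure caused by DNS not being ready yet is retried on the next call.
public class LocalHostCacheSketch {

  @FunctionalInterface
  interface HostResolver {
    InetAddress resolve() throws UnknownHostException;
  }

  private final HostResolver resolver;
  private volatile InetAddress cached;

  // Production-style default: delegate to the JDK's local host lookup.
  LocalHostCacheSketch() {
    this(InetAddress::getLocalHost);
  }

  // Injectable resolver, so a "DNS not ready yet" failure can be simulated.
  LocalHostCacheSketch(HostResolver resolver) {
    this.resolver = resolver;
  }

  // Double-checked locking on a volatile field: resolve at most once per success.
  InetAddress getLocalHost() throws UnknownHostException {
    InetAddress result = cached;
    if (result == null) {
      synchronized (this) {
        result = cached;
        if (result == null) {
          // If resolve() throws, nothing is cached and the exception reaches the caller.
          result = resolver.resolve();
          cached = result;
        }
      }
    }
    return result;
  }
}
```

The design point is that a failed lookup leaves `cached` null, so the next caller retries; a value computed once during class initialization gets no such second chance.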

[jira] [Commented] (GEODE-8623) Timing between DNS and Geode startup can result in permanent unknown host exceptions.

2020-11-14 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-8623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17232038#comment-17232038
 ] 

ASF GitHub Bot commented on GEODE-8623:
---

jinmeiliao commented on a change in pull request #5743:
URL: https://github.com/apache/geode/pull/5743#discussion_r523309085



Review comment:
   I am not sure I follow here. If we sleep past the timeout, then we won't
execute the predicate anymore; the loop just ends with a TimeoutException.
   
   Assuming the supplier and predicate execute in negligible time, with a
timeout of 10 sec and a sleep interval of 3 sec, sleeping for the minimum of the
interval and the remaining time would sleep for 3, 3, 3, 1 sec, while always
sleeping for the full interval would sleep for 3, 3, 3, 3 sec. What I am saying
is that the optimization only affects the last iteration, and its only effect is
that we throw the TimeoutException exactly at the timeout instead of after it.
The benefit is minimal AFAIK, unless I am missing something extraordinarily
important.








[jira] [Assigned] (GEODE-8714) Events stuck in parallel GW senders queue

2020-11-14 Thread Mario Ivanac (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-8714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mario Ivanac reassigned GEODE-8714:
---

Assignee: Mario Ivanac

> Events stuck in parallel GW senders queue
> -
>
> Key: GEODE-8714
> URL: https://issues.apache.org/jira/browse/GEODE-8714
> Project: Geode
>  Issue Type: Bug
>  Components: wan
>Reporter: Mario Ivanac
>Assignee: Mario Ivanac
>Priority: Major
>
> While restarting servers and gateway senders, events are in some cases stuck
> in the queue and never dequeued.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (GEODE-8714) Events stuck in parallel GW senders queue

2020-11-14 Thread Mario Ivanac (Jira)
Mario Ivanac created GEODE-8714:
---

 Summary: Events stuck in parallel GW senders queue
 Key: GEODE-8714
 URL: https://issues.apache.org/jira/browse/GEODE-8714
 Project: Geode
  Issue Type: Bug
  Components: wan
Reporter: Mario Ivanac


While restarting servers and gateway senders, events are in some cases stuck
in the queue and never dequeued.





[jira] [Updated] (GEODE-8714) Events stuck in parallel GW senders queue

2020-11-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/GEODE-8714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated GEODE-8714:
--
Labels: pull-request-available  (was: )

> Events stuck in parallel GW senders queue
> -
>
> Key: GEODE-8714
> URL: https://issues.apache.org/jira/browse/GEODE-8714
> Project: Geode
>  Issue Type: Bug
>  Components: wan
>Reporter: Mario Ivanac
>Assignee: Mario Ivanac
>Priority: Major
>  Labels: pull-request-available
>
> While restarting servers and gateway senders, events are in some cases stuck
> in the queue and never dequeued.





[jira] [Commented] (GEODE-8714) Events stuck in parallel GW senders queue

2020-11-14 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-8714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17232099#comment-17232099
 ] 

ASF GitHub Bot commented on GEODE-8714:
---

mivanac opened a new pull request #5752:
URL: https://github.com/apache/geode/pull/5752


   Do not review
   
   
   
   Thank you for submitting a contribution to Apache Geode.
   
   In order to streamline the review of the contribution we ask you
   to ensure the following steps have been taken:
   
   ### For all changes:
   - [*] Is there a JIRA ticket associated with this PR? Is it referenced in the commit message?
   
   - [*] Has your PR been rebased against the latest commit within the target branch (typically `develop`)?
   
   - [*] Is your initial contribution a single, squashed commit?
   
   - [*] Does `gradlew build` run cleanly?
   
   - [ ] Have you written or updated unit tests to verify your changes?
   
   - [ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)?
   
   ### Note:
   Please ensure that once the PR is submitted, check Concourse for build issues and
   submit an update to your PR as soon as possible. If you need help, please send an
   email to d...@geode.apache.org.
   





> Events stuck in parallel GW senders queue
> -
>
> Key: GEODE-8714
> URL: https://issues.apache.org/jira/browse/GEODE-8714
> Project: Geode
>  Issue Type: Bug
>  Components: wan
>Reporter: Mario Ivanac
>Assignee: Mario Ivanac
>Priority: Major
>
> While restarting servers and gateway senders, events are in some cases stuck
> in the queue and never dequeued.


