[ https://issues.apache.org/jira/browse/GEODE-8623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17238530#comment-17238530 ]
ASF GitHub Bot commented on GEODE-8623:
---------------------------------------

jinmeiliao commented on a change in pull request #5743:
URL: https://github.com/apache/geode/pull/5743#discussion_r530120553

##########
File path: geode-common/src/main/java/org/apache/geode/internal/Retry.java
##########
@@ -0,0 +1,101 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more contributor license
+ * agreements. See the NOTICE file distributed with this work for additional information regarding
+ * copyright ownership. The ASF licenses this file to You under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance with the License. You may obtain a
+ * copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software distributed under the License
+ * is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express
+ * or implied. See the License for the specific language governing permissions and limitations under
+ * the License.
+ */
+package org.apache.geode.internal;
+
+import static java.util.concurrent.TimeUnit.NANOSECONDS;
+
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.TimeoutException;
+import java.util.function.Predicate;
+import java.util.function.Supplier;
+
+import org.apache.geode.annotations.VisibleForTesting;
+
+/**
+ * Utility class for retrying operations.
+ */
+public class Retry {
+
+  interface Timer {
+    long nanoTime();
+
+    void sleep(long sleepTimeInNano) throws InterruptedException;
+  }
+
+  static class SteadyTimer implements Timer {
+    @Override
+    public long nanoTime() {
+      return System.nanoTime();
+    }
+
+    @Override
+    public void sleep(long sleepTimeInNano) throws InterruptedException {
+      long millis = NANOSECONDS.toMillis(sleepTimeInNano);
+      // avoid throwing IllegalArgumentException
+      if (millis > 0) {
+        Thread.sleep(millis);
+      }
+    }
+  }
+
+  private static final SteadyTimer steadyClock = new SteadyTimer();
+
+  /**
+   * Try the supplier function until the predicate is true or timeout occurs.
+   *
+   * @param timeout to retry for
+   * @param timeoutUnit the unit for timeout
+   * @param interval time between each try
+   * @param intervalUnit the unit for interval
+   * @param supplier to execute until predicate is true or times out
+   * @param predicate to test for retry
+   * @param <T> type of return value
+   * @return value from supplier after it passes predicate or times out.
+   */
+  public static <T> T tryFor(long timeout, TimeUnit timeoutUnit,
+      long interval, TimeUnit intervalUnit,
+      Supplier<T> supplier,
+      Predicate<T> predicate) throws TimeoutException, InterruptedException {
+    return tryFor(timeout, timeoutUnit, interval, intervalUnit, supplier, predicate, steadyClock);
+  }
+
+  @VisibleForTesting
+  static <T> T tryFor(long timeout, TimeUnit timeoutUnit,
+      long interval, TimeUnit intervalUnit,
+      Supplier<T> supplier,
+      Predicate<T> predicate,
+      Timer timer) throws TimeoutException, InterruptedException {
+    long until = timer.nanoTime() + NANOSECONDS.convert(timeout, timeoutUnit);
+    long intervalNano = NANOSECONDS.convert(interval, intervalUnit);
+
+    T value;
+    for (;;) {
+      value = supplier.get();
+      if (predicate.test(value)) {
+        return value;
+      } else {
+        // if there is still more time left after we sleep for interval period, then sleep and retry
+        // otherwise break out and throw TimeoutException
+        if ((timer.nanoTime() + intervalNano) < until) {

Review comment:

> A user of `tryFor()` does not know (usually cannot know) how much clock time their supplier and predicate will take. So they won't be able to do the reasoning you just did there. I think what they'll usually do is try to pick `timeout` much larger than the (supplier+predicate) time.
>
> We can presume that they will usually try to set `timeout` long enough so that at least one retry (two tries) can happen. If we let `timeout` mean: keep trying until `timeout` has elapsed, then the "error" term I referred to in my previous comment is the difference between `timeout` and the actual time of the final attempt. As a user I'd expect `tryFor()` to make a reasonable effort to minimize that error.

No matter how long the `timeout` or how short the `interval`, a user can NOT "reasonably" assume that two tries will happen at all. Even if I set `timeout` to 100 seconds and `interval` to 1 second, it is entirely possible that only one try completes before the timeout period has elapsed.

> Maybe it would help if, in my proposal, `interval` was interpreted as "typical sleep time". For all but the last iteration `interval` is the actual sleep time. Usually, in the last interval (under my proposal) `sleepNanos` will be less than `maximumSleepTime`.
>
> Is there some reason why you object to (usually) sleeping a shorter time before the final try? Do you think that calling `min()` and capturing the result in a variable (`sleepNanos`) is overly burdensome? Do you see any value in making some effort to make the timing of the actual final attempt correspond to `timeout`?

My objections are:
1. There is no point in sleeping if we know we are going to time out at the end of the sleep.
2. It feels even more counter-intuitive to go ahead and try again when we know we have already reached the end of the timeout period. It's actually wrong.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

> Timing between DNS and Geode startup can result in permanent unknown host exceptions.
> -------------------------------------------------------------------------------------
>
>                 Key: GEODE-8623
>                 URL: https://issues.apache.org/jira/browse/GEODE-8623
>             Project: Geode
>          Issue Type: Bug
>    Affects Versions: 1.9.0, 1.9.1, 1.10.0, 1.9.2, 1.11.0, 1.12.0, 1.13.0, 1.14.0, 1.13.1
>            Reporter: Jacob Barrett
>            Priority: Minor
>              Labels: pull-request-available
>
> In a managed environment where local host name DNS entries and the startup of
> Geode happen concurrently, it is possible for Geode to fail name resolution in
> the local hostname caching.
> If it fails to resolve the local hostname when
> loading the caching utility class then any service dependent on this name
> will fail without chance for recovery.
> {code}
> [error 2020/09/30 19:50:21.644 UTC <main> tid=0x1] Jmx manager could not be started because java.net.UnknownHostException
> org.apache.geode.management.ManagementException: java.net.UnknownHostException
>       at org.apache.geode.management.internal.ManagementAgent.startAgent(ManagementAgent.java:133)
>       at org.apache.geode.management.internal.SystemManagementService.startManager(SystemManagementService.java:432)
>       at org.apache.geode.management.internal.beans.ManagementAdapter.handleCacheCreation(ManagementAdapter.java:181)
>       at org.apache.geode.management.internal.beans.ManagementListener.handleEvent(ManagementListener.java:127)
>       at org.apache.geode.distributed.internal.InternalDistributedSystem.notifyResourceEventListeners(InternalDistributedSystem.java:2063)
>       at org.apache.geode.distributed.internal.InternalDistributedSystem.handleResourceEvent(InternalDistributedSystem.java:606)
>       at org.apache.geode.internal.cache.GemFireCacheImpl.initialize(GemFireCacheImpl.java:1239)
>       at org.apache.geode.internal.cache.InternalCacheBuilder.create(InternalCacheBuilder.java:219)
>       at org.apache.geode.internal.cache.InternalCacheBuilder.create(InternalCacheBuilder.java:171)
>       at org.apache.geode.cache.CacheFactory.create(CacheFactory.java:142)
>       at org.apache.geode.distributed.internal.DefaultServerLauncherCacheProvider.createCache(DefaultServerLauncherCacheProvider.java:52)
>       at org.apache.geode.distributed.ServerLauncher.createCache(ServerLauncher.java:887)
>       at org.apache.geode.distributed.ServerLauncher.start(ServerLauncher.java:803)
>       at org.apache.geode.distributed.ServerLauncher.run(ServerLauncher.java:732)
>       at org.apache.geode.distributed.ServerLauncher.main(ServerLauncher.java:251)
> Caused by: java.net.UnknownHostException
>       at org.apache.geode.internal.net.SocketCreator.getLocalHost(SocketCreator.java:285)
>       at org.apache.geode.management.internal.ManagementAgent.configureAndStart(ManagementAgent.java:310)
>       at org.apache.geode.management.internal.ManagementAgent.startAgent(ManagementAgent.java:131)
>       ... 14 more
> [error 2020/09/30 19:50:21.724 UTC <main> tid=0x1] org.apache.geode.management.ManagementException: java.net.UnknownHostException
> Exception in thread "main" org.apache.geode.management.ManagementException: java.net.UnknownHostException
>       at org.apache.geode.management.internal.ManagementAgent.startAgent(ManagementAgent.java:133)
>       at org.apache.geode.management.internal.SystemManagementService.startManager(SystemManagementService.java:432)
>       at org.apache.geode.management.internal.beans.ManagementAdapter.handleCacheCreation(ManagementAdapter.java:181)
>       at org.apache.geode.management.internal.beans.ManagementListener.handleEvent(ManagementListener.java:127)
>       at org.apache.geode.distributed.internal.InternalDistributedSystem.notifyResourceEventListeners(InternalDistributedSystem.java:2063)
>       at org.apache.geode.distributed.internal.InternalDistributedSystem.handleResourceEvent(InternalDistributedSystem.java:606)
>       at org.apache.geode.internal.cache.GemFireCacheImpl.initialize(GemFireCacheImpl.java:1239)
>       at org.apache.geode.internal.cache.InternalCacheBuilder.create(InternalCacheBuilder.java:219)
>       at org.apache.geode.internal.cache.InternalCacheBuilder.create(InternalCacheBuilder.java:171)
>       at org.apache.geode.cache.CacheFactory.create(CacheFactory.java:142)
>       at org.apache.geode.distributed.internal.DefaultServerLauncherCacheProvider.createCache(DefaultServerLauncherCacheProvider.java:52)
>       at org.apache.geode.distributed.ServerLauncher.createCache(ServerLauncher.java:887)
>       at org.apache.geode.distributed.ServerLauncher.start(ServerLauncher.java:803)
>       at org.apache.geode.distributed.ServerLauncher.run(ServerLauncher.java:732)
>       at org.apache.geode.distributed.ServerLauncher.main(ServerLauncher.java:251)
> Caused by: java.net.UnknownHostException
>       at org.apache.geode.internal.net.SocketCreator.getLocalHost(SocketCreator.java:285)
>       at org.apache.geode.management.internal.ManagementAgent.configureAndStart(ManagementAgent.java:310)
>       at org.apache.geode.management.internal.ManagementAgent.startAgent(ManagementAgent.java:131)
>       ... 14 more
> {code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
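
As a reading aid for the `tryFor()` review discussion above, here is a minimal, hypothetical sketch of the two end-of-timeout behaviours being debated. It is not the code from the pull request: `RetrySketch`, `FakeTimer`, `tryForSkipFinalSleep`, `tryForShortenFinalSleep` and `countAttempts` are names invented for illustration, the shortened-sleep variant is only one reading of the quoted `min()` proposal, and a simulated clock stands in for `System.nanoTime()` and `Thread.sleep()` so the behaviour is deterministic.

```java
import java.util.concurrent.TimeoutException;
import java.util.function.Predicate;
import java.util.function.Supplier;

/** Hypothetical illustration only; not the Retry class under review. */
final class RetrySketch {

  /** Simulated clock (assumption): sleeping just advances a counter. */
  static final class FakeTimer {
    private long now = 0;

    long nanoTime() {
      return now;
    }

    void sleep(long nanos) {
      now += nanos;
    }
  }

  /** Variant 1: do not sleep (and do not retry) if the sleep would reach the deadline. */
  static <T> T tryForSkipFinalSleep(long timeoutNanos, long intervalNanos,
      Supplier<T> supplier, Predicate<T> predicate, FakeTimer timer)
      throws TimeoutException {
    long until = timer.nanoTime() + timeoutNanos;
    for (;;) {
      T value = supplier.get();
      if (predicate.test(value)) {
        return value;
      }
      if (timer.nanoTime() + intervalNanos < until) {
        timer.sleep(intervalNanos);
      } else {
        throw new TimeoutException();
      }
    }
  }

  /** Variant 2 (assumed reading of the min() proposal): shorten the last sleep. */
  static <T> T tryForShortenFinalSleep(long timeoutNanos, long intervalNanos,
      Supplier<T> supplier, Predicate<T> predicate, FakeTimer timer)
      throws TimeoutException {
    long until = timer.nanoTime() + timeoutNanos;
    for (;;) {
      T value = supplier.get();
      if (predicate.test(value)) {
        return value;
      }
      long remaining = until - timer.nanoTime();
      if (remaining <= 0) {
        throw new TimeoutException();
      }
      // The final sleep is cut short so the last attempt lands on the deadline.
      timer.sleep(Math.min(intervalNanos, remaining));
    }
  }

  /** Counts attempts for timeout=10, interval=4 with a supplier that never succeeds. */
  static int countAttempts(boolean shortenFinalSleep) {
    FakeTimer timer = new FakeTimer();
    int[] attempts = {0};
    Supplier<Boolean> alwaysFails = () -> {
      attempts[0]++;
      return Boolean.FALSE;
    };
    Predicate<Boolean> isDone = done -> done;
    try {
      if (shortenFinalSleep) {
        tryForShortenFinalSleep(10, 4, alwaysFails, isDone, timer);
      } else {
        tryForSkipFinalSleep(10, 4, alwaysFails, isDone, timer);
      }
    } catch (TimeoutException expected) {
      // The supplier never succeeds, so both variants end in a timeout.
    }
    return attempts[0];
  }
}
```

With a timeout of 10 simulated nanoseconds, an interval of 4, and attempts that take no simulated time, `countAttempts(false)` returns 3 (the last attempt is at t=8), while `countAttempts(true)` returns 4 (the last attempt is at t=10, exactly on the deadline). That difference is the trade-off under discussion: the first variant gives up early rather than sleep into the deadline, and the second spends the remaining time on one more attempt.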