[ https://issues.apache.org/jira/browse/GEODE-9615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17545688#comment-17545688 ]
ASF subversion and git services commented on GEODE-9615:
--------------------------------------------------------

Commit 495f3b0cffe0d0200521c266ea26c9e53cf6d629 in geode's branch refs/heads/develop from Kirk Lund
[ https://gitbox.apache.org/repos/asf?p=geode.git;h=495f3b0cff ]

GEODE-10327: Overhaul GfshRule to kill processes and save artifacts (#7758)

PROBLEM

Tests that use GfshRule leave behind orphaned processes and do not save
artifacts for debugging failures.

SOLUTION

GfshRule needs to cleanup all processes it forks. It also needs to save off
all runtime artifacts such as logging, stats, pid files, diskstores to enable
debugging of test failures.

DETAILS

Enhance GfshRule and modify all tests using it for proper debugging and to
prevent test pollution.

Overhaul of GfshRule:
* kill ALL geode processes during cleanup
* use FolderRule to ensure all logs and files are properly saved off when a test fails
* extract GfshExecutor from JUnit rule code
* GfshExecutor allows a test to use any number of Geode versions with just one GfshRule
* add Gfsh log level support for easier debugging
* add support for new VmConfiguration to allow control over Geode and Java versions
* overhaul API of GfshRule and companion classes for better consistency and design

New FolderRule:
* replaces TemporaryFolder and saves off all content when a test fails
* creates root directory under the gradle worker instead of under temp

Update HTTP session caching module tests:
* use new FolderRule to save all artifacts when a test fails
* use nio Paths for filesystem variables

Update acceptance and upgrade tests that use GfshRule:
* use new improved GfshRule and GfshExecutor
* use new FolderRule instead of TemporaryFolder to save all artifacts when a test fails
* use --disable-default-server in tests with no clients
* fix flakiness of many tests by using random ports instead of default or hardcoded port values
* reformat GfshRule API usage in tests to improve readability and consistency
* add GfshStopper to provide common place to await process stop (stop locator/server is async so restarting with same ports is very prone to hitting BindExceptions)

Update ProcessUtils:
* extract NativeProcessUtils and make it public for direct use
* rename InternalProcessUtils as ProcessUtilsProvider and move to its own class
* rethrow IOExceptions as UncheckedIOExceptions
* fix flakiness in NativeProcessUtilsTest by moving findAvailablePid into test method

Minor changes:
* improve code formatting and readability
* convert from old io File to nio Path APIs as much as possible
* close output streams to fix filesystem issues on Windows

Fixes flaky test tickets:
* DeployJarAcceptanceTest GEODE-9615
* possibly other tests that uses GfshRule

Changes for resubmit:
* log error message if unable to delete folder
* keep default constructor on GfshRule
* ensure IO streams have proper error handling and don't cause failures on windows

Changes to build pipelines:
* make jdk17 tests gating

NOTES

The labels jdk8, jdk17, windows, windows-jdk8 and windows-jdk17 were used to
run tests on more environments.

This PR contains mostly test and framework changes. The only product code
altered is ServerLauncher and several classes in
org.apache.geode.internal.process, all of which is in geode-core.


> CI Failure: Acceptance Tests fails with exit value 1 from start locator or start server
> ---------------------------------------------------------------------------------------
>
>                 Key: GEODE-9615
>                 URL: https://issues.apache.org/jira/browse/GEODE-9615
>             Project: Geode
>          Issue Type: Bug
>          Components: tests
>            Reporter: Kirk Lund
>            Assignee: Kirk Lund
>            Priority: Major
>
> This failure occurs because the locator or server was stopped and then
> immediately restarted with the same ports. When Gfsh returns from stop
> locator or stop server, the stopped process is asynchronously stopping and
> may continue to hold those ports when the next start command for that process
> is issued.
> It then fails with an exit value of 1 instead of the expected value of 0.
> Any test using GfshRule to stop and then immediately start a new process may
> fail in this way. The underlying exception in the locator or server log is a
> BindException because the port is still in use by the previous instance of
> that process which is still in the process of stopping.
> The only way to close this gap is to have the test get the pid for the
> process being stopped and then await until the process identified by that pid
> no longer exists.
> {code:java}
> org.apache.geode.management.internal.cli.shell.StatusLocatorExitCodeAcceptanceTest > onlineStatusCommandShouldSucceedWhenConnected_locator_host_and_port FAILED
>     org.junit.ComparisonFailure: [Exit value from process started by [test-frame: gfsh -e connect --locator=localhost[20608] -e status locator --host=localhost --port=20608]] expected:<[0]> but was:<[1]>
>         at jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>         at jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>         at jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>         at org.apache.geode.test.junit.rules.gfsh.GfshExecution.awaitTermination(GfshExecution.java:137)
>         at org.apache.geode.test.junit.rules.gfsh.GfshRule.execute(GfshRule.java:128)
>         at org.apache.geode.test.junit.rules.gfsh.GfshScript.execute(GfshScript.java:133)
>         at org.apache.geode.management.internal.cli.shell.StatusLocatorExitCodeAcceptanceTest.executeScriptWithExpectedExitCode(StatusLocatorExitCodeAcceptanceTest.java:255)
>         at org.apache.geode.management.internal.cli.shell.StatusLocatorExitCodeAcceptanceTest.onlineStatusCommandShouldSucceedWhenConnected_locator_host_and_port(StatusLocatorExitCodeAcceptanceTest.java:128)
>
> org.apache.geode.management.internal.cli.shell.StatusLocatorExitCodeAcceptanceTest > offlineStatusCommandShouldSucceedWhenConnected_locator_dir FAILED
>     org.junit.ComparisonFailure: [Exit value from process started by [test-frame: gfsh -e connect --locator=localhost[20608] -e status locator --dir=/tmp/junit11722670533134972918/member-controller/locator-chase-obedient-cake]] expected:<[0]> but was:<[1]>
>         at jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>         at jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>         at jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>         at org.apache.geode.test.junit.rules.gfsh.GfshExecution.awaitTermination(GfshExecution.java:137)
>         at org.apache.geode.test.junit.rules.gfsh.GfshRule.execute(GfshRule.java:128)
>         at org.apache.geode.test.junit.rules.gfsh.GfshScript.execute(GfshScript.java:133)
>         at org.apache.geode.management.internal.cli.shell.StatusLocatorExitCodeAcceptanceTest.executeScriptWithExpectedExitCode(StatusLocatorExitCodeAcceptanceTest.java:255)
>         at org.apache.geode.management.internal.cli.shell.StatusLocatorExitCodeAcceptanceTest.offlineStatusCommandShouldSucceedWhenConnected_locator_dir(StatusLocatorExitCodeAcceptanceTest.java:140)
>
> org.apache.geode.management.internal.cli.shell.StatusLocatorExitCodeAcceptanceTest > onlineStatusCommandShouldSucceedWhenConnected_locator_name FAILED
>     org.junit.ComparisonFailure: [Exit value from process started by [test-frame: gfsh -e connect --locator=localhost[20608] -e status locator --name=locator-chase-obedient-cake]] expected:<[0]> but was:<[1]>
>         at jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>         at jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>         at jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>         at org.apache.geode.test.junit.rules.gfsh.GfshExecution.awaitTermination(GfshExecution.java:137)
>         at org.apache.geode.test.junit.rules.gfsh.GfshRule.execute(GfshRule.java:128)
>         at org.apache.geode.test.junit.rules.gfsh.GfshScript.execute(GfshScript.java:133)
>         at org.apache.geode.management.internal.cli.shell.StatusLocatorExitCodeAcceptanceTest.executeScriptWithExpectedExitCode(StatusLocatorExitCodeAcceptanceTest.java:255)
>         at org.apache.geode.management.internal.cli.shell.StatusLocatorExitCodeAcceptanceTest.onlineStatusCommandShouldSucceedWhenConnected_locator_name(StatusLocatorExitCodeAcceptanceTest.java:116)
>
> org.apache.geode.management.internal.cli.shell.StatusLocatorExitCodeAcceptanceTest > onlineStatusCommandShouldSucceedWhenConnected_locator_port FAILED
>     org.junit.ComparisonFailure: [Exit value from process started by [test-frame: gfsh -e connect --locator=localhost[20608] -e status locator --port=20608]] expected:<[0]> but was:<[1]>
>         at jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>         at jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>         at jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>         at org.apache.geode.test.junit.rules.gfsh.GfshExecution.awaitTermination(GfshExecution.java:137)
>         at org.apache.geode.test.junit.rules.gfsh.GfshRule.execute(GfshRule.java:128)
>         at org.apache.geode.test.junit.rules.gfsh.GfshScript.execute(GfshScript.java:133)
>         at org.apache.geode.management.internal.cli.shell.StatusLocatorExitCodeAcceptanceTest.executeScriptWithExpectedExitCode(StatusLocatorExitCodeAcceptanceTest.java:255)
>         at org.apache.geode.management.internal.cli.shell.StatusLocatorExitCodeAcceptanceTest.onlineStatusCommandShouldSucceedWhenConnected_locator_port(StatusLocatorExitCodeAcceptanceTest.java:122)
> {code}
> {noformat}
> org.apache.geode.modules.DeployJarAcceptanceTest > classMethod FAILED
>     org.junit.ComparisonFailure: [Exit value from process started by [41497e8cf7689a63: gfsh -e start locator --name=locator -e configure pdx --read-serialized=true -e start server --name=server --locators=localhost[10334]]] expected:<[0]> but was:<[1]>
>         at jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>         at jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>         at jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>         at org.apache.geode.test.junit.rules.gfsh.GfshExecution.awaitTermination(GfshExecution.java:103)
>         at org.apache.geode.test.junit.rules.gfsh.GfshRule.execute(GfshRule.java:143)
>         at org.apache.geode.test.junit.rules.gfsh.GfshRule.execute(GfshRule.java:152)
>         at org.apache.geode.test.junit.rules.gfsh.GfshScript.execute(GfshScript.java:153)
>         at org.apache.geode.modules.DeployJarAcceptanceTest.setup(DeployJarAcceptanceTest.java:62)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
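
The remedy named in the issue description (get the pid of the process being stopped, then await until that pid no longer exists) might look roughly like the Java sketch below. It is only an illustration under stated assumptions, not the GfshStopper or NativeProcessUtils code from the commit: the class name PidAwait, the method awaitExit, and the 100 ms poll interval are hypothetical, and it relies on java.lang.ProcessHandle, which requires Java 9 or later.

```java
import java.time.Duration;
import java.util.Optional;

// Hypothetical helper (not part of Geode): polls the OS until a pid disappears.
public class PidAwait {

  /**
   * Waits until the process with the given pid no longer exists.
   *
   * @return true if the process was gone before the timeout elapsed,
   *         false if it was still alive at the timeout or the wait was interrupted
   */
  public static boolean awaitExit(long pid, Duration timeout) {
    long deadline = System.nanoTime() + timeout.toNanos();
    while (System.nanoTime() - deadline < 0) {
      if (!isAlive(pid)) {
        return true;
      }
      try {
        Thread.sleep(100); // assumed poll interval
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve the interrupt for callers
        return false;
      }
    }
    // One last check at the deadline.
    return !isAlive(pid);
  }

  private static boolean isAlive(long pid) {
    // ProcessHandle.of returns an empty Optional when no such process exists.
    Optional<ProcessHandle> handle = ProcessHandle.of(pid);
    return handle.isPresent() && handle.get().isAlive();
  }

  public static void main(String[] args) {
    long self = ProcessHandle.current().pid();
    // The current JVM is still running, so this wait times out and reports false.
    System.out.println("gone=" + awaitExit(self, Duration.ofMillis(300)));
  }
}
```

A test would call such a helper between the stop and the next start on the same ports, so the restart is only issued once the previous instance has actually exited and released them.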