[ 
https://issues.apache.org/jira/browse/GEODE-9615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17542171#comment-17542171
 ] 

ASF subversion and git services commented on GEODE-9615:
--------------------------------------------------------

Commit 774505e7c74cff8c572be1ec4f4bb2b0f3e1a091 in geode's branch 
refs/heads/develop from Kirk Lund
[ https://gitbox.apache.org/repos/asf?p=geode.git;h=774505e7c7 ]

GEODE-10327: Overhaul GfshRule to kill processes and save artifacts for 
failures (#7571)

PROBLEM

Tests that use GfshRule leave behind orphaned processes and do not save 
artifacts for debugging failures.

SOLUTION

GfshRule needs to clean up all processes it forks. It also needs to save all 
runtime artifacts, such as logs, stats, pid files, and disk stores, to enable 
debugging of test failures.

DETAILS

Enhance GfshRule and modify all tests using it for proper debugging and to 
prevent test pollution.

Overhaul of GfshRule:

* kill ALL geode processes during cleanup
* use FolderRule to ensure all logs and files are properly saved off when a 
test fails
* extract GfshExecutor from JUnit rule code
* GfshExecutor allows a test to use any number of Geode versions with just one 
GfshRule
* add Gfsh log level support for easier debugging
* add support for new VmConfiguration to allow control over Geode and Java 
versions
* overhaul API of GfshRule and companion classes for better consistency and 
design

New FolderRule:

* replaces TemporaryFolder and saves off all content when a test fails
* creates root directory under the gradle worker instead of under temp

Update HTTP session caching module tests:

* use new FolderRule to save all artifacts when a test fails
* use nio Paths for filesystem variables

Update acceptance and upgrade tests that use GfshRule:

* use new improved GfshRule and GfshExecutor
* use new FolderRule instead of TemporaryFolder to save all artifacts when a 
test fails
* use --disable-default-server in tests with no clients
* fix flakiness of many tests by using random ports instead of default or 
hardcoded port values
* reformat GfshRule API usage in tests to improve readability and consistency
* add GfshStopper to provide a common place to await process stop (stop 
locator/server is async, so restarting with the same ports is very prone to 
hitting BindExceptions)
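
As an illustration of the random-port bullet above, a test can ask the OS for a
free ephemeral port instead of relying on a default or hardcoded value. This is
a minimal sketch using only the JDK; the class and method names
(RandomPortSketch, findFreePort) are hypothetical and are not taken from the
commit, which may well use Geode's own test helpers for this.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.ServerSocket;

// Hypothetical helper, not part of Geode: picks a port that was free when checked.
final class RandomPortSketch {

  /** Binds to port 0 so the OS assigns a free ephemeral port, then releases it. */
  static int findFreePort() {
    try (ServerSocket socket = new ServerSocket(0)) {
      return socket.getLocalPort();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    int locatorPort = findFreePort();
    // The port is passed to gfsh explicitly instead of using the default.
    String command = "start locator --name=locator --port=" + locatorPort;
    System.out.println(command);
  }
}
```

The port is only known to be free at the moment of the check, so another
process could still grab it before the locator binds. The approach reduces
collisions between tests; it does not eliminate them.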

Update ProcessUtils:

* extract NativeProcessUtils and make it public for direct use
* rename InternalProcessUtils as ProcessUtilsProvider and move to its own class
* rethrow IOExceptions as UncheckedIOExceptions
* fix flakiness in NativeProcessUtilsTest by moving findAvailablePid into test 
method
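
The "rethrow IOExceptions as UncheckedIOExceptions" bullet describes a common
Java pattern, sketched below under stated assumptions. PidFileSketch, readPid
and writePid are hypothetical names chosen for illustration; they are not the
actual ProcessUtils code.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical example, not Geode code: callers do not have to declare IOException.
final class PidFileSketch {

  /** Writes the pid as text, wrapping any IOException in an unchecked exception. */
  static void writePid(Path pidFile, int pid) {
    try {
      Files.write(pidFile, Integer.toString(pid).getBytes(StandardCharsets.UTF_8));
    } catch (IOException e) {
      throw new UncheckedIOException("Unable to write pid file " + pidFile, e);
    }
  }

  /** Reads the pid back, wrapping any IOException in an unchecked exception. */
  static int readPid(Path pidFile) {
    try {
      String text = new String(Files.readAllBytes(pidFile), StandardCharsets.UTF_8);
      return Integer.parseInt(text.trim());
    } catch (IOException e) {
      // The original IOException is kept as the cause for debugging.
      throw new UncheckedIOException("Unable to read pid file " + pidFile, e);
    }
  }
}
```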

Minor changes:

* improve code formatting and readability
* convert from old io File to nio Path APIs as much as possible
* close output streams to fix filesystem issues on Windows
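
The last two bullets above (nio Path APIs, closing output streams) can be
sketched together. On Windows an open handle typically prevents a file from
being deleted or moved, so closing the stream deterministically matters when a
test saves or removes its artifacts. StreamCloseSketch and its methods are
hypothetical names, not code from the commit.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical example, not Geode code.
final class StreamCloseSketch {

  /** Writes text to dir/name; try-with-resources closes the stream on every path. */
  static Path writeLog(Path dir, String name, String text) {
    Path file = dir.resolve(name);
    try (OutputStream out = Files.newOutputStream(file)) {
      out.write(text.getBytes(StandardCharsets.UTF_8));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return file;
  }

  /** Deletes the file if it exists; returns true if a file was removed. */
  static boolean deleteIfPresent(Path file) {
    try {
      return Files.deleteIfExists(file);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```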

Fixes flaky test tickets:

* DeployJarAcceptanceTest GEODE-9615
* possibly other tests that use GfshRule

NOTES

The jdk8, jdk17 and windows labels were used to run tests on more environments.

This PR contains mostly test and framework changes. The only product code 
altered is ServerLauncher and several classes in 
org.apache.geode.internal.process, all of which are in geode-core.

> CI Failure: Acceptance Tests fails with exit value 1 from start locator or 
> start server
> ---------------------------------------------------------------------------------------
>
>                 Key: GEODE-9615
>                 URL: https://issues.apache.org/jira/browse/GEODE-9615
>             Project: Geode
>          Issue Type: Bug
>          Components: tests
>            Reporter: Kirk Lund
>            Assignee: Kirk Lund
>            Priority: Major
>
> This failure occurs because the locator or server was stopped and then 
> immediately restarted with the same ports. When Gfsh returns from stop 
> locator or stop server, the stopped process is asynchronously stopping and 
> may continue to hold those ports when the next start command for that process 
> is issued. It then fails with an exit value of 1 instead of the expected 
> value of 0.
> Any test using GfshRule to stop and then immediately start a new process may 
> fail in this way. The underlying exception in the locator or server log is a 
> BindException because the port is still in use by the previous instance of 
> that process which is still in the process of stopping.
> The only way to close this gap is to have the test get the pid for the 
> process being stopped and then wait until the process identified by that pid 
> no longer exists.
> {code:java}
> org.apache.geode.management.internal.cli.shell.StatusLocatorExitCodeAcceptanceTest > onlineStatusCommandShouldSucceedWhenConnected_locator_host_and_port FAILED
>     org.junit.ComparisonFailure: [Exit value from process started by [test-frame: gfsh -e connect --locator=localhost[20608] -e status locator --host=localhost --port=20608]] expected:<[0]> but was:<[1]>
>         at jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>         at jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>         at jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>         at org.apache.geode.test.junit.rules.gfsh.GfshExecution.awaitTermination(GfshExecution.java:137)
>         at org.apache.geode.test.junit.rules.gfsh.GfshRule.execute(GfshRule.java:128)
>         at org.apache.geode.test.junit.rules.gfsh.GfshScript.execute(GfshScript.java:133)
>         at org.apache.geode.management.internal.cli.shell.StatusLocatorExitCodeAcceptanceTest.executeScriptWithExpectedExitCode(StatusLocatorExitCodeAcceptanceTest.java:255)
>         at org.apache.geode.management.internal.cli.shell.StatusLocatorExitCodeAcceptanceTest.onlineStatusCommandShouldSucceedWhenConnected_locator_host_and_port(StatusLocatorExitCodeAcceptanceTest.java:128)
> org.apache.geode.management.internal.cli.shell.StatusLocatorExitCodeAcceptanceTest > offlineStatusCommandShouldSucceedWhenConnected_locator_dir FAILED
>     org.junit.ComparisonFailure: [Exit value from process started by [test-frame: gfsh -e connect --locator=localhost[20608] -e status locator --dir=/tmp/junit11722670533134972918/member-controller/locator-chase-obedient-cake]] expected:<[0]> but was:<[1]>
>         at jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>         at jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>         at jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>         at org.apache.geode.test.junit.rules.gfsh.GfshExecution.awaitTermination(GfshExecution.java:137)
>         at org.apache.geode.test.junit.rules.gfsh.GfshRule.execute(GfshRule.java:128)
>         at org.apache.geode.test.junit.rules.gfsh.GfshScript.execute(GfshScript.java:133)
>         at org.apache.geode.management.internal.cli.shell.StatusLocatorExitCodeAcceptanceTest.executeScriptWithExpectedExitCode(StatusLocatorExitCodeAcceptanceTest.java:255)
>         at org.apache.geode.management.internal.cli.shell.StatusLocatorExitCodeAcceptanceTest.offlineStatusCommandShouldSucceedWhenConnected_locator_dir(StatusLocatorExitCodeAcceptanceTest.java:140)
> org.apache.geode.management.internal.cli.shell.StatusLocatorExitCodeAcceptanceTest > onlineStatusCommandShouldSucceedWhenConnected_locator_name FAILED
>     org.junit.ComparisonFailure: [Exit value from process started by [test-frame: gfsh -e connect --locator=localhost[20608] -e status locator --name=locator-chase-obedient-cake]] expected:<[0]> but was:<[1]>
>         at jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>         at jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>         at jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>         at org.apache.geode.test.junit.rules.gfsh.GfshExecution.awaitTermination(GfshExecution.java:137)
>         at org.apache.geode.test.junit.rules.gfsh.GfshRule.execute(GfshRule.java:128)
>         at org.apache.geode.test.junit.rules.gfsh.GfshScript.execute(GfshScript.java:133)
>         at org.apache.geode.management.internal.cli.shell.StatusLocatorExitCodeAcceptanceTest.executeScriptWithExpectedExitCode(StatusLocatorExitCodeAcceptanceTest.java:255)
>         at org.apache.geode.management.internal.cli.shell.StatusLocatorExitCodeAcceptanceTest.onlineStatusCommandShouldSucceedWhenConnected_locator_name(StatusLocatorExitCodeAcceptanceTest.java:116)
> org.apache.geode.management.internal.cli.shell.StatusLocatorExitCodeAcceptanceTest > onlineStatusCommandShouldSucceedWhenConnected_locator_port FAILED
>     org.junit.ComparisonFailure: [Exit value from process started by [test-frame: gfsh -e connect --locator=localhost[20608] -e status locator --port=20608]] expected:<[0]> but was:<[1]>
>         at jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>         at jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>         at jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>         at org.apache.geode.test.junit.rules.gfsh.GfshExecution.awaitTermination(GfshExecution.java:137)
>         at org.apache.geode.test.junit.rules.gfsh.GfshRule.execute(GfshRule.java:128)
>         at org.apache.geode.test.junit.rules.gfsh.GfshScript.execute(GfshScript.java:133)
>         at org.apache.geode.management.internal.cli.shell.StatusLocatorExitCodeAcceptanceTest.executeScriptWithExpectedExitCode(StatusLocatorExitCodeAcceptanceTest.java:255)
>         at org.apache.geode.management.internal.cli.shell.StatusLocatorExitCodeAcceptanceTest.onlineStatusCommandShouldSucceedWhenConnected_locator_port(StatusLocatorExitCodeAcceptanceTest.java:122)
>  {code}
> {noformat}
> org.apache.geode.modules.DeployJarAcceptanceTest > classMethod FAILED
>     org.junit.ComparisonFailure: [Exit value from process started by [41497e8cf7689a63: gfsh -e start locator --name=locator -e configure pdx --read-serialized=true -e start server --name=server --locators=localhost[10334]]] expected:<[0]> but was:<[1]>
>         at jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>         at jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>         at jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>         at org.apache.geode.test.junit.rules.gfsh.GfshExecution.awaitTermination(GfshExecution.java:103)
>         at org.apache.geode.test.junit.rules.gfsh.GfshRule.execute(GfshRule.java:143)
>         at org.apache.geode.test.junit.rules.gfsh.GfshRule.execute(GfshRule.java:152)
>         at org.apache.geode.test.junit.rules.gfsh.GfshScript.execute(GfshScript.java:153)
>         at org.apache.geode.modules.DeployJarAcceptanceTest.setup(DeployJarAcceptanceTest.java:62)
> {noformat}
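
The remedy proposed in the issue description (get the pid of the process being
stopped, then wait until that pid no longer exists) might look roughly like the
sketch below. It is not the GfshStopper implementation from the commit:
AwaitStopSketch, isAlive and awaitStopped are hypothetical names, and the code
assumes Java 9 or later for ProcessHandle, so a Java 8 build would need a
different liveness check.

```java
import java.time.Duration;

// Hypothetical helper, not Geode code; requires Java 9+ for ProcessHandle.
final class AwaitStopSketch {

  /** Returns true if the OS still reports a live process with this pid. */
  static boolean isAlive(long pid) {
    return ProcessHandle.of(pid).map(ProcessHandle::isAlive).orElse(false);
  }

  /**
   * Polls until the process is gone or the timeout elapses.
   * Returns true if the process stopped, false if it was still alive at timeout.
   */
  static boolean awaitStopped(long pid, Duration timeout) {
    long deadline = System.nanoTime() + timeout.toNanos();
    while (isAlive(pid)) {
      if (System.nanoTime() - deadline >= 0) {
        return false;
      }
      try {
        Thread.sleep(100);
      } catch (InterruptedException e) {
        // Restore the interrupt flag and report the current state.
        Thread.currentThread().interrupt();
        return !isAlive(pid);
      }
    }
    return true;
  }
}
```

A test would call awaitStopped with the pid read from the member's pid file
after issuing stop locator or stop server, and only then start a new member on
the same ports.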



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
