[ https://issues.apache.org/jira/browse/GEODE-8730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17243444#comment-17243444 ]
Bill Burcham edited comment on GEODE-8730 at 12/3/20, 6:59 PM:
---------------------------------------------------------------

From the IDE I ran the docker Gradle task in geode-assembly to create a fresh Geode Docker image. Then from /Users/bburcham/Projects/geode/geode-assembly/src/acceptanceTest/resources/org/apache/geode/client/sni I ran "docker-compose up" and from the Docker app dashboard I opened a shell into the running ("geode") container. Once in, I ran "apt-get update" and "apt-get install net-tools". Before starting any Geode processes:

{noformat}
# netstat -lp
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 127.0.0.11:37221        0.0.0.0:*               LISTEN      -
udp        0      0 127.0.0.11:42103        0.0.0.0:*                           -
Active UNIX domain sockets (only servers)
Proto RefCnt Flags       Type       State         I-Node   PID/Program name     Path
{noformat}

I then ran the gfsh startup script: "gfsh run --file=/geode/scripts/geode-starter-2.gfsh". For reference that file contains:

{noformat}
start locator --name=locator-maeve --connect=false --redirect-output --hostname-for-clients=locator-maeve --properties-file=/geode/config/gemfire.properties --security-properties-file=/geode/config/gfsecurity.properties --J=-Dgemfire.ssl-keystore=/geode/config/locator-maeve-keystore.jks
start server --name=server-dolores --group=group-dolores --hostname-for-clients=server-dolores --locators=geode[10334] --properties-file=/geode/config/gemfire.properties --security-properties-file=/geode/config/gfsecurity.properties --J=-Dgemfire.ssl-keystore=/geode/config/server-dolores-keystore.jks
start server --name=server-clementine --group=group-clementine --hostname-for-clients=server-clementine --server-port=40405 --locators=geode[10334] --properties-file=/geode/config/gemfire.properties --security-properties-file=/geode/config/gfsecurity.properties --J=-Dgemfire.ssl-keystore=/geode/config/server-clementine-keystore.jks
connect --locator=geode[10334] --use-ssl=true --security-properties-file=/geode/config/gfsecurity.properties
create region --name=region-dolores --group=group-dolores --type=REPLICATE
create region --name=region-clementine --group=group-clementine --type=REPLICATE
{noformat}

With the locator and both servers running:

{noformat}
# netstat -lp
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 geode:46867             0.0.0.0:*               LISTEN      256/java
tcp        0      0 geode:43540             0.0.0.0:*               LISTEN      515/java
tcp        0      0 0.0.0.0:40404           0.0.0.0:*               LISTEN      419/java
tcp        0      0 0.0.0.0:40405           0.0.0.0:*               LISTEN      515/java
tcp        0      0 0.0.0.0:40053           0.0.0.0:*               LISTEN      515/java
tcp        0      0 0.0.0.0:46649           0.0.0.0:*               LISTEN      256/java
tcp        0      0 geode:57053             0.0.0.0:*               LISTEN      256/java
tcp        0      0 geode:55518             0.0.0.0:*               LISTEN      515/java
tcp        0      0 geode:55486             0.0.0.0:*               LISTEN      419/java
tcp        0      0 0.0.0.0:7070            0.0.0.0:*               LISTEN      256/java
tcp        0      0 0.0.0.0:10334           0.0.0.0:*               LISTEN      256/java
tcp        0      0 0.0.0.0:33953           0.0.0.0:*               LISTEN      419/java
tcp        0      0 127.0.0.11:37221        0.0.0.0:*               LISTEN      -
tcp        0      0 geode:48715             0.0.0.0:*               LISTEN      419/java
tcp        0      0 0.0.0.0:1099            0.0.0.0:*               LISTEN      256/java
udp        0      0 geode:41000             0.0.0.0:*                           256/java
udp        0      0 geode:41001             0.0.0.0:*                           419/java
udp        0      0 geode:41002             0.0.0.0:*                           515/java
udp        0      0 127.0.0.11:42103        0.0.0.0:*                           -
Active UNIX domain sockets (only servers)
Proto RefCnt Flags       Type       State         I-Node   PID/Program name     Path
unix  2      [ ACC ]     STREAM     LISTENING     155159   256/java             /tmp/.java_pid256.tmp
unix  2      [ ACC ]     STREAM     LISTENING     158787   419/java             /tmp/.java_pid419.tmp
unix  2      [ ACC ]     STREAM     LISTENING     159071   515/java             /tmp/.java_pid515.tmp
{noformat}

Grouping these by PID: locator first, then cache servers:

{noformat}
tcp        0      0 0.0.0.0:10334           0.0.0.0:*               LISTEN      256/java  for locator clients
tcp        0      0 0.0.0.0:1099            0.0.0.0:*               LISTEN      256/java  for gfsh
tcp        0      0 0.0.0.0:7070            0.0.0.0:*               LISTEN      256/java  for browser (pulse)
tcp        0      0 0.0.0.0:46649           0.0.0.0:*               LISTEN      256/java
tcp        0      0 geode:46867             0.0.0.0:*               LISTEN      256/java
tcp        0      0 geode:57053             0.0.0.0:*               LISTEN      256/java
udp        0      0 geode:41000             0.0.0.0:*                           256/java  for membership

tcp        0      0 0.0.0.0:40404           0.0.0.0:*               LISTEN      419/java  for client's cache
tcp        0      0 0.0.0.0:33953 ***       0.0.0.0:*               LISTEN      419/java
tcp        0      0 geode:55486             0.0.0.0:*               LISTEN      419/java  for health
tcp        0      0 geode:48715             0.0.0.0:*               LISTEN      419/java  for peer's cache
udp        0      0 geode:41001             0.0.0.0:*                           419/java  for membership

tcp        0      0 0.0.0.0:40405           0.0.0.0:*               LISTEN      515/java  for client's cache
tcp        0      0 0.0.0.0:40053 ***       0.0.0.0:*               LISTEN      515/java
tcp        0      0 geode:43540             0.0.0.0:*               LISTEN      515/java  for peer's cache
tcp        0      0 geode:55518             0.0.0.0:*               LISTEN      515/java  for health
udp        0      0 geode:41002             0.0.0.0:*                           515/java  for membership
{noformat}

I've highlighted with "***" the two bindings that are odd. These are ephemeral ports but are not within the default (configured) port range 41000-61000. I expect these are different each time we run and are the cause of this bug. I searched the logs for those ports and didn't find them. I wonder what those bindings are?

A cache server binds these TCP ports:
* client's cache (40404, 40405 above)
* peer's cache, ostensibly in the port range (41000-61000)
* health monitoring, also ostensibly in the port range (41000-61000)

Of the three unknown TCP port bindings per cache server in the netstat output above, we only have categories for two (peer's cache, health monitoring). What's that third category?

> CI failure: DualServerSNIAcceptanceTest fails to start server because port is in use
> ------------------------------------------------------------------------------------
>
>                 Key: GEODE-8730
>                 URL: https://issues.apache.org/jira/browse/GEODE-8730
>             Project: Geode
>          Issue Type: Bug
>          Components: membership
>            Reporter: Darrel Schneider
>            Assignee: Bill Burcham
>            Priority: Major
>
> The run is here:
> [https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/AcceptanceTestOpenJDK8/builds/587]
> {noformat}
> org.apache.geode.client.sni.DualServerSNIAcceptanceTest > classMethod FAILED
>     com.palantir.docker.compose.execution.DockerExecutionException: 'docker-compose exec -T geode gfsh run --file=/geode/scripts/geode-starter-2.gfsh' returned exit code 1
>     The output was:
> 1. Executing - start locator --name=locator-maeve --connect=false --redirect-output --hostname-for-clients=locator-maeve --properties-file=/geode/config/gemfire.properties --security-properties-file=******** --J=-Dgemfire.ssl-keystore=/geode/config/locator-maeve-keystore.jks
> ...........................
> Locator in /locator-maeve on geode[10334] as locator-maeve is currently online.
> Process ID: 47
> Uptime: 16 seconds
> Geode Version: 1.14.0-build.0
> Java Version: 11.0.9.1
> Log File: /locator-maeve/locator-maeve.log
> JVM Arguments: -DgemfirePropertyFile=/geode/config/gemfire.properties -DgemfireSecurityPropertyFile=/geode/config/gfsecurity.properties -Dgemfire.enable-cluster-configuration=true -Dgemfire.load-cluster-configuration-from-dir=false -Dgemfire.ssl-keystore=/geode/config/locator-maeve-keystore.jks -Dgemfire.launcher.registerSignalHandlers=true -Djava.awt.headless=true -Dsun.rmi.dgc.server.gcInterval=9223372036854775806 -Dgemfire.OSProcess.DISABLE_REDIRECTION_CONFIGURATION=true
> Class-Path: /geode/lib/geode-core-1.14.0-build.0.jar:/geode/lib/geode-dependencies.jar
> 2. Executing - start server --name=server-dolores --group=group-dolores --hostname-for-clients=server-dolores --locators=geode[10334] --properties-file=/geode/config/gemfire.properties --security-properties-file=******** --J=-Dgemfire.ssl-keystore=/geode/config/server-dolores-keystore.jks
> .......
> Server in /server-dolores on geode[40404] as server-dolores is currently online.
> Process ID: 199
> Uptime: 5 seconds
> Geode Version: 1.14.0-build.0
> Java Version: 11.0.9.1
> Log File: /server-dolores/server-dolores.log
> JVM Arguments: -DgemfirePropertyFile=/geode/config/gemfire.properties -DgemfireSecurityPropertyFile=/geode/config/gfsecurity.properties -Dgemfire.start-dev-rest-api=false -Dgemfire.locators=geode[10334] -Dgemfire.use-cluster-configuration=true -Dgemfire.groups=group-dolores -Dgemfire.ssl-keystore=/geode/config/server-dolores-keystore.jks -Dgemfire.launcher.registerSignalHandlers=true -Djava.awt.headless=true -Dsun.rmi.dgc.server.gcInterval=9223372036854775806
> Class-Path: /geode/lib/geode-core-1.14.0-build.0.jar:/geode/lib/geode-dependencies.jar
> 3. Executing - start server --name=server-clementine --group=group-clementine --hostname-for-clients=server-clementine --server-port=40405 --locators=geode[10334] --properties-file=/geode/config/gemfire.properties --security-properties-file=******** --J=-Dgemfire.ssl-keystore=/geode/config/server-clementine-keystore.jks
> ......The Cache Server process terminated unexpectedly with exit status 1. Please refer to the log file in /server-clementine for full details.
> Exception in thread "main" java.lang.RuntimeException: An IO error occurred while starting a Server in /server-clementine on geode[40405]: Network is unreachable; port (40405) is not available on localhost.
>     at org.apache.geode.distributed.ServerLauncher.start(ServerLauncher.java:852)
>     at org.apache.geode.distributed.ServerLauncher.run(ServerLauncher.java:737)
>     at org.apache.geode.distributed.ServerLauncher.main(ServerLauncher.java:256)
> Caused by: java.net.BindException: Network is unreachable; port (40405) is not available on localhost.
>     at org.apache.geode.distributed.AbstractLauncher.assertPortAvailable(AbstractLauncher.java:142)
>     at org.apache.geode.distributed.ServerLauncher.start(ServerLauncher.java:794)
>     ... 2 more
> ************************* Execution Summary ***********************
> Script file: /geode/scripts/geode-starter-2.gfsh
> Command-1 : start locator --name=locator-maeve --connect=false --redirect-output --hostname-for-clients=locator-maeve --properties-file=/geode/config/gemfire.properties --security-properties-file=/geode/config/gfsecurity.properties --J=-Dgemfire.ssl-keystore=/geode/config/locator-maeve-keystore.jks
> Status : PASSED
> Command-2 : start server --name=server-dolores --group=group-dolores --hostname-for-clients=server-dolores --locators=geode[10334] --properties-file=/geode/config/gemfire.properties --security-properties-file=/geode/config/gfsecurity.properties --J=-Dgemfire.ssl-keystore=/geode/config/server-dolores-keystore.jks
> Status : PASSED
> Command-3 : start server --name=server-clementine --group=group-clementine --hostname-for-clients=server-clementine --server-port=40405 --locators=geode[10334] --properties-file=/geode/config/gemfire.properties --security-properties-file=/geode/config/gfsecurity.properties --J=-Dgemfire.ssl-keystore=/geode/config/server-clementine-keystore.jks
> Status : FAILED
>     at com.palantir.docker.compose.execution.Command.lambda$throwingOnError$12(Command.java:60)
>     at com.palantir.docker.compose.execution.Command.execute(Command.java:50)
>     at com.palantir.docker.compose.execution.DefaultDockerCompose.exec(DefaultDockerCompose.java:122)
>     at com.palantir.docker.compose.execution.DelegatingDockerCompose.exec(DelegatingDockerCompose.java:86)
>     at com.palantir.docker.compose.execution.RetryingDockerCompose.exec(RetryingDockerCompose.java:22)
>     at com.palantir.docker.compose.DockerComposeRule.exec(DockerComposeRule.java:171)
>     at org.apache.geode.client.sni.DualServerSNIAcceptanceTest.beforeClass(DualServerSNIAcceptanceTest.java:77)
> {noformat}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
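
The manual analysis in the comment (group the netstat rows by PID, then check each TCP port against the 41000-61000 range) could be scripted so it is easy to repeat on each run. The following is only a minimal sketch: the function names parse_listeners and unexplained_ports, the KNOWN_PORTS set, and the SAMPLE text (copied from the netstat listing in the comment with whitespace collapsed) are illustrative assumptions and are not part of Geode or its test suite.

```python
from collections import defaultdict

# Port range cited in the comment as the default (configured) range.
PORT_RANGE = range(41000, 61001)
# Ports the comment already accounts for (assumption: this list is hand-maintained).
KNOWN_PORTS = {10334, 1099, 7070, 40404, 40405}

# tcp/udp rows from the "netstat -lp" listing in the comment.
SAMPLE = """\
tcp 0 0 geode:46867 0.0.0.0:* LISTEN 256/java
tcp 0 0 geode:43540 0.0.0.0:* LISTEN 515/java
tcp 0 0 0.0.0.0:40404 0.0.0.0:* LISTEN 419/java
tcp 0 0 0.0.0.0:40405 0.0.0.0:* LISTEN 515/java
tcp 0 0 0.0.0.0:40053 0.0.0.0:* LISTEN 515/java
tcp 0 0 0.0.0.0:46649 0.0.0.0:* LISTEN 256/java
tcp 0 0 geode:57053 0.0.0.0:* LISTEN 256/java
tcp 0 0 geode:55518 0.0.0.0:* LISTEN 515/java
tcp 0 0 geode:55486 0.0.0.0:* LISTEN 419/java
tcp 0 0 0.0.0.0:7070 0.0.0.0:* LISTEN 256/java
tcp 0 0 0.0.0.0:10334 0.0.0.0:* LISTEN 256/java
tcp 0 0 0.0.0.0:33953 0.0.0.0:* LISTEN 419/java
tcp 0 0 127.0.0.11:37221 0.0.0.0:* LISTEN -
tcp 0 0 geode:48715 0.0.0.0:* LISTEN 419/java
tcp 0 0 0.0.0.0:1099 0.0.0.0:* LISTEN 256/java
udp 0 0 geode:41000 0.0.0.0:* 256/java
udp 0 0 geode:41001 0.0.0.0:* 419/java
udp 0 0 geode:41002 0.0.0.0:* 515/java
udp 0 0 127.0.0.11:42103 0.0.0.0:* -
"""


def parse_listeners(netstat_text):
    """Return {pid/program: [(proto, port), ...]} for the tcp/udp rows."""
    by_program = defaultdict(list)
    for line in netstat_text.splitlines():
        fields = line.split()
        # Skip headers, unix-socket rows, and blank lines.
        if len(fields) < 6 or fields[0] not in ("tcp", "udp"):
            continue
        local_address, program = fields[3], fields[-1]
        port = int(local_address.rsplit(":", 1)[1])
        by_program[program].append((fields[0], port))
    return dict(by_program)


def unexplained_ports(by_program):
    """TCP ports per process that are neither known nor inside PORT_RANGE."""
    result = {}
    for program, ports in by_program.items():
        if program == "-":  # sockets netstat could not attribute to a process
            continue
        odd = sorted(
            port
            for proto, port in ports
            if proto == "tcp" and port not in KNOWN_PORTS and port not in PORT_RANGE
        )
        if odd:
            result[program] = odd
    return result


if __name__ == "__main__":
    for program, ports in sorted(unexplained_ports(parse_listeners(SAMPLE)).items()):
        print(program, ports)
```

Run against the listing from the comment, this reports 33953 for 419/java and 40053 for 515/java, which are the two bindings marked with "***". It only identifies the out-of-range ports; it does not say which Geode component opened them.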