[ https://issues.apache.org/jira/browse/GEODE-8730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17243444#comment-17243444 ]
Bill Burcham edited comment on GEODE-8730 at 12/3/20, 7:53 PM:
---------------------------------------------------------------

From the IDE I ran the docker Gradle task in geode-assembly to create a fresh Geode Docker image. Then from /Users/bburcham/Projects/geode/geode-assembly/src/acceptanceTest/resources/org/apache/geode/client/sni I ran "docker-compose up" and from the Docker app dashboard I opened a shell into the running ("geode") container. Once in, I "apt-get update" and "apt-get install net-tools".

{noformat}
# netstat -lp
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 127.0.0.11:37221        0.0.0.0:*               LISTEN      -
udp        0      0 127.0.0.11:42103        0.0.0.0:*                           -
Active UNIX domain sockets (only servers)
Proto RefCnt Flags       Type       State         I-Node   PID/Program name     Path
{noformat}

and then ran the gfsh startup script: "gfsh run --file=/geode/scripts/geode-starter-2.gfsh". For reference that file contains:

{noformat}
start locator --name=locator-maeve --connect=false --redirect-output --hostname-for-clients=locator-maeve --properties-file=/geode/config/gemfire.properties --security-properties-file=/geode/config/gfsecurity.properties --J=-Dgemfire.ssl-keystore=/geode/config/locator-maeve-keystore.jks
start server --name=server-dolores --group=group-dolores --hostname-for-clients=server-dolores --locators=geode[10334] --properties-file=/geode/config/gemfire.properties --security-properties-file=/geode/config/gfsecurity.properties --J=-Dgemfire.ssl-keystore=/geode/config/server-dolores-keystore.jks
start server --name=server-clementine --group=group-clementine --hostname-for-clients=server-clementine --server-port=40405 --locators=geode[10334] --properties-file=/geode/config/gemfire.properties --security-properties-file=/geode/config/gfsecurity.properties --J=-Dgemfire.ssl-keystore=/geode/config/server-clementine-keystore.jks
connect --locator=geode[10334] --use-ssl=true --security-properties-file=/geode/config/gfsecurity.properties
create region --name=region-dolores --group=group-dolores --type=REPLICATE
create region --name=region-clementine --group=group-clementine --type=REPLICATE
{noformat}

{noformat}
# netstat -lp
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 geode:46867             0.0.0.0:*               LISTEN      256/java
tcp        0      0 geode:43540             0.0.0.0:*               LISTEN      515/java
tcp        0      0 0.0.0.0:40404           0.0.0.0:*               LISTEN      419/java
tcp        0      0 0.0.0.0:40405           0.0.0.0:*               LISTEN      515/java
tcp        0      0 0.0.0.0:40053           0.0.0.0:*               LISTEN      515/java
tcp        0      0 0.0.0.0:46649           0.0.0.0:*               LISTEN      256/java
tcp        0      0 geode:57053             0.0.0.0:*               LISTEN      256/java
tcp        0      0 geode:55518             0.0.0.0:*               LISTEN      515/java
tcp        0      0 geode:55486             0.0.0.0:*               LISTEN      419/java
tcp        0      0 0.0.0.0:7070            0.0.0.0:*               LISTEN      256/java
tcp        0      0 0.0.0.0:10334           0.0.0.0:*               LISTEN      256/java
tcp        0      0 0.0.0.0:33953           0.0.0.0:*               LISTEN      419/java
tcp        0      0 127.0.0.11:37221        0.0.0.0:*               LISTEN      -
tcp        0      0 geode:48715             0.0.0.0:*               LISTEN      419/java
tcp        0      0 0.0.0.0:1099            0.0.0.0:*               LISTEN      256/java
udp        0      0 geode:41000             0.0.0.0:*                           256/java
udp        0      0 geode:41001             0.0.0.0:*                           419/java
udp        0      0 geode:41002             0.0.0.0:*                           515/java
udp        0      0 127.0.0.11:42103        0.0.0.0:*                           -
Active UNIX domain sockets (only servers)
Proto RefCnt Flags       Type       State         I-Node   PID/Program name     Path
unix  2      [ ACC ]     STREAM     LISTENING     155159   256/java             /tmp/.java_pid256.tmp
unix  2      [ ACC ]     STREAM     LISTENING     158787   419/java             /tmp/.java_pid419.tmp
unix  2      [ ACC ]     STREAM     LISTENING     159071   515/java             /tmp/.java_pid515.tmp
{noformat}

Grouping these by PID: locator first, then cache servers:

{noformat}
tcp        0      0 0.0.0.0:10334           0.0.0.0:*               LISTEN      256/java  for locator clients
tcp        0      0 0.0.0.0:1099            0.0.0.0:*               LISTEN      256/java  for gfsh
tcp        0      0 0.0.0.0:7070            0.0.0.0:*               LISTEN      256/java  for browser (pulse)
tcp        0      0 0.0.0.0:46649           0.0.0.0:*               LISTEN      256/java
tcp        0      0 geode:46867             0.0.0.0:*               LISTEN      256/java
tcp        0      0 geode:57053             0.0.0.0:*               LISTEN      256/java
udp        0      0 geode:41000             0.0.0.0:*                           256/java  for membership
tcp        0      0 0.0.0.0:40404           0.0.0.0:*               LISTEN      419/java  for client's cache
tcp        0      0 0.0.0.0:33953 ***       0.0.0.0:*               LISTEN      419/java
tcp        0      0 geode:55486             0.0.0.0:*               LISTEN      419/java  for health
tcp        0      0 geode:48715             0.0.0.0:*               LISTEN      419/java  for peer's cache
udp        0      0 geode:41001             0.0.0.0:*                           419/java  for membership
tcp        0      0 0.0.0.0:40405           0.0.0.0:*               LISTEN      515/java  for client's cache
tcp        0      0 0.0.0.0:40053 ***       0.0.0.0:*               LISTEN      515/java
tcp        0      0 geode:43540             0.0.0.0:*               LISTEN      515/java  for peer's cache
tcp        0      0 geode:55518             0.0.0.0:*               LISTEN      515/java  for health
udp        0      0 geode:41002             0.0.0.0:*                           515/java  for membership
{noformat}

I've highlighted with "***" the two bindings that are odd. These are ephemeral ports but are not within the default (configured) port range 41000-61000. I expect these are different each time we run and are the cause of this bug. I searched the logs for those ports and didn't find them. I wonder what those bindings are?

A cache server binds these TCP ports:
* client's cache (40404, 40405 above)
* peer's cache ostensibly in port range (41000-61000)
* health monitoring also ostensibly in port range (41000-61000)

Of the three unknown TCP port bindings per cache server in the netstat output above we only have categories for two (peer's cache, health monitoring.) What's that third category?

In summary we have these two unexplained bindings (one per cache server) and we have the one unexplained TCP binding before Geode even starts (see first netstat above.)

A jstack (stack dump) showed that RMI is the culprit for those unexplained cache server ports.
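
The grouping by PID and the port-range check described above can also be scripted, which may help when repeating the experiment on a fresh container. The following is only a sketch, not part of the original investigation: it assumes netstat rows in the format shown above, hard-codes the 41000-61000 range and the two client ports quoted above, and the names `NETSTAT`, `PORT_RANGE`, `KNOWN_PORTS`, `group_by_pid`, and `unexplained` are made up for illustration.

```python
from collections import defaultdict

# Sample rows copied from the netstat output above (cache server PIDs 419 and 515).
NETSTAT = """\
tcp 0 0 0.0.0.0:40404 0.0.0.0:* LISTEN 419/java
tcp 0 0 0.0.0.0:33953 0.0.0.0:* LISTEN 419/java
tcp 0 0 geode:55486 0.0.0.0:* LISTEN 419/java
tcp 0 0 geode:48715 0.0.0.0:* LISTEN 419/java
udp 0 0 geode:41001 0.0.0.0:* 419/java
tcp 0 0 0.0.0.0:40405 0.0.0.0:* LISTEN 515/java
tcp 0 0 0.0.0.0:40053 0.0.0.0:* LISTEN 515/java
tcp 0 0 geode:43540 0.0.0.0:* LISTEN 515/java
tcp 0 0 geode:55518 0.0.0.0:* LISTEN 515/java
udp 0 0 geode:41002 0.0.0.0:* 515/java
"""

PORT_RANGE = range(41000, 61001)  # assumption: the 41000-61000 range quoted above, inclusive
KNOWN_PORTS = {40404, 40405}      # assumption: the explicitly configured client ports


def group_by_pid(text):
    """Return {pid: [(proto, port), ...]} for rows that name an owning process."""
    groups = defaultdict(list)
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 5 or fields[0] not in ("tcp", "udp"):
            continue  # skip headers and UNIX domain socket rows
        owner = fields[-1]
        if "/" not in owner:
            continue  # "-" means netstat could not identify the owner
        pid = int(owner.split("/", 1)[0])
        port = int(fields[3].rsplit(":", 1)[1])
        groups[pid].append((fields[0], port))
    return dict(groups)


def unexplained(groups):
    """Per PID, TCP ports that are neither configured nor inside PORT_RANGE."""
    return {
        pid: sorted(
            port
            for proto, port in ports
            if proto == "tcp" and port not in KNOWN_PORTS and port not in PORT_RANGE
        )
        for pid, ports in groups.items()
    }


if __name__ == "__main__":
    for pid, ports in sorted(unexplained(group_by_pid(NETSTAT)).items()):
        print(pid, ports)
```

Run against the sample rows, this flags 33953 for PID 419 and 40053 for PID 515, the same two bindings marked with "***" above.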
"jstack 419 | less" showed: {noformat} "RMI TCP Accept-0" #26 daemon prio=9 os_prio=0 cpu=3.69ms elapsed=5311.36s tid=0x00007fbe28005800 nid=0x1c5 runnable [0x00007fbe38862000] java.lang.Thread.State: RUNNABLE at java.net.PlainSocketImpl.socketAccept(java.base@11.0.9.1/Native Method) at java.net.AbstractPlainSocketImpl.accept(java.base@11.0.9.1/AbstractPlainSocketImpl.java:458) at java.net.ServerSocket.implAccept(java.base@11.0.9.1/ServerSocket.java:565) at java.net.ServerSocket.accept(java.base@11.0.9.1/ServerSocket.java:533) at sun.management.jmxremote.LocalRMIServerSocketFactory$1.accept(jdk.management.agent@11.0.9.1/LocalRMIServerSocketFactory.java:52) at sun.rmi.transport.tcp.TCPTransport$AcceptLoop.executeAcceptLoop(java.rmi@11.0.9.1/TCPTransport.java:394) at sun.rmi.transport.tcp.TCPTransport$AcceptLoop.run(java.rmi@11.0.9.1/TCPTransport.java:366) at java.lang.Thread.run(java.base@11.0.9.1/Thread.java:834) {noformat} I'll see if there is a way to lock that down. And I'll see if/how that port 37221 that is bound before Geode starts, changes next time I spin up a container. was (Author: bburcham): >From the IDE I ran the docker Gradle task in geode-assembly to create a fresh >Geode Docker image. Then from /Users/bburcham/Projects/geode/geode-assembly/src/acceptanceTest/resources/org/apache/geode/client/sni I ran "docker-compose up" and from the Docker app dashboard I opened a shell into the running ("geode") container. Once in, I "apt-get update" and "apt-get install net-tools". {noformat} # netstat -lp Active Internet connections (only servers) Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name tcp 0 0 127.0.0.11:37221 0.0.0.0:* LISTEN - udp 0 0 127.0.0.11:42103 0.0.0.0:* - Active UNIX domain sockets (only servers) Proto RefCnt Flags Type State I-Node PID/Program name Path {noformat} and then ran the gfsh startup script: "gfsh run --file=/geode/scripts/geode-starter-2.gfsh". 
For reference that file contains: {noformat} start locator --name=locator-maeve --connect=false --redirect-output --hostname-for-clients=locator-maeve --properties-file=/geode/config/gemfire.properties --security-properties-file=/geode/config/gfsecurity.properties --J=-Dgemfire.ssl-keystore=/geode/config/locator-maeve-keystore.jks start server --name=server-dolores --group=group-dolores --hostname-for-clients=server-dolores --locators=geode[10334] --properties-file=/geode/config/gemfire.properties --security-properties-file=/geode/config/gfsecurity.properties --J=-Dgemfire.ssl-keystore=/geode/config/server-dolores-keystore.jks start server --name=server-clementine --group=group-clementine --hostname-for-clients=server-clementine --server-port=40405 --locators=geode[10334] --properties-file=/geode/config/gemfire.properties --security-properties-file=/geode/config/gfsecurity.properties --J=-Dgemfire.ssl-keystore=/geode/config/server-clementine-keystore.jks connect --locator=geode[10334] --use-ssl=true --security-properties-file=/geode/config/gfsecurity.properties create region --name=region-dolores --group=group-dolores --type=REPLICATE create region --name=region-clementine --group=group-clementine --type=REPLICATE {noformat} {noformat} # netstat -lp Active Internet connections (only servers) Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name tcp 0 0 geode:46867 0.0.0.0:* LISTEN 256/java tcp 0 0 geode:43540 0.0.0.0:* LISTEN 515/java tcp 0 0 0.0.0.0:40404 0.0.0.0:* LISTEN 419/java tcp 0 0 0.0.0.0:40405 0.0.0.0:* LISTEN 515/java tcp 0 0 0.0.0.0:40053 0.0.0.0:* LISTEN 515/java tcp 0 0 0.0.0.0:46649 0.0.0.0:* LISTEN 256/java tcp 0 0 geode:57053 0.0.0.0:* LISTEN 256/java tcp 0 0 geode:55518 0.0.0.0:* LISTEN 515/java tcp 0 0 geode:55486 0.0.0.0:* LISTEN 419/java tcp 0 0 0.0.0.0:7070 0.0.0.0:* LISTEN 256/java tcp 0 0 0.0.0.0:10334 0.0.0.0:* LISTEN 256/java tcp 0 0 0.0.0.0:33953 0.0.0.0:* LISTEN 419/java tcp 0 0 127.0.0.11:37221 0.0.0.0:* LISTEN - tcp 
0 0 geode:48715 0.0.0.0:* LISTEN 419/java tcp 0 0 0.0.0.0:1099 0.0.0.0:* LISTEN 256/java udp 0 0 geode:41000 0.0.0.0:* 256/java udp 0 0 geode:41001 0.0.0.0:* 419/java udp 0 0 geode:41002 0.0.0.0:* 515/java udp 0 0 127.0.0.11:42103 0.0.0.0:* - Active UNIX domain sockets (only servers) Proto RefCnt Flags Type State I-Node PID/Program name Path unix 2 [ ACC ] STREAM LISTENING 155159 256/java /tmp/.java_pid256.tmp unix 2 [ ACC ] STREAM LISTENING 158787 419/java /tmp/.java_pid419.tmp unix 2 [ ACC ] STREAM LISTENING 159071 515/java /tmp/.java_pid515.tmp {noformat} Grouping these by PID: locator first, then cache servers: {noformat} tcp 0 0 0.0.0.0:10334 0.0.0.0:* LISTEN 256/java for locator clients tcp 0 0 0.0.0.0:1099 0.0.0.0:* LISTEN 256/java for gfsh tcp 0 0 0.0.0.0:7070 0.0.0.0:* LISTEN 256/java for browser (pulse) tcp 0 0 0.0.0.0:46649 0.0.0.0:* LISTEN 256/java tcp 0 0 geode:46867 0.0.0.0:* LISTEN 256/java tcp 0 0 geode:57053 0.0.0.0:* LISTEN 256/java udp 0 0 geode:41000 0.0.0.0:* 256/java for membership tcp 0 0 0.0.0.0:40404 0.0.0.0:* LISTEN 419/java for client's cache tcp 0 0 0.0.0.0:33953 *** 0.0.0.0:* LISTEN 419/java tcp 0 0 geode:55486 0.0.0.0:* LISTEN 419/java for health tcp 0 0 geode:48715 0.0.0.0:* LISTEN 419/java for peer's cache udp 0 0 geode:41001 0.0.0.0:* 419/java for membership tcp 0 0 0.0.0.0:40405 0.0.0.0:* LISTEN 515/java for client's cache tcp 0 0 0.0.0.0:40053 *** 0.0.0.0:* LISTEN 515/java tcp 0 0 geode:43540 0.0.0.0:* LISTEN 515/java for peer's cache tcp 0 0 geode:55518 0.0.0.0:* LISTEN 515/java for health udp 0 0 geode:41002 0.0.0.0:* 515/java for membership {noformat} I've highlighted with "***" the two bindings that are odd. These are ephemeral ports but are not within the default (configured) port range 41000-61000. I expect these are different each time we run and are the cause of this bug. I searched the logs for those ports and didn't find them. I wonder what those bindings are? 
A cache server binds these TCP ports: * client's cache (40404, 40405 above) * peer's cache ostensibly in port range (41000-61000) * health monitoring also ostensibly in port range (41000-61000) Of the three unknown TCP port bindings per cache server in the netstat output above we only have categories for two (peer's cache, health monitoring.) What's that third category? In summary we have these two unexplained bindings (one per cache server) and we have the one unexplained TCP binding before Geode even starts (see first netstat above.) A jstack (stack dump) showed that RMI is the culprit for those unexplained cache server ports. I'll see if there is a way to lock that down. And I'll see if/how that port 37221 that is bound before Geode starts, changes next time I spin up a container. > CI failure: DualServerSNIAcceptanceTest fails to start server because port is > in use > ------------------------------------------------------------------------------------ > > Key: GEODE-8730 > URL: https://issues.apache.org/jira/browse/GEODE-8730 > Project: Geode > Issue Type: Bug > Components: membership > Reporter: Darrel Schneider > Assignee: Bill Burcham > Priority: Major > > The run is here: > [https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/AcceptanceTestOpenJDK8/builds/587] > {noformat} > org.apache.geode.client.sni.DualServerSNIAcceptanceTest > classMethod FAILED > com.palantir.docker.compose.execution.DockerExecutionException: > 'docker-compose exec -T geode gfsh run > --file=/geode/scripts/geode-starter-2.gfsh' returned exit code 1 > The output was: > 1. Executing - start locator --name=locator-maeve --connect=false > --redirect-output --hostname-for-clients=locator-maeve > --properties-file=/geode/config/gemfire.properties > --security-properties-file=******** > --J=-Dgemfire.ssl-keystore=/geode/config/locator-maeve-keystore.jks > ........................... 
> Locator in /locator-maeve on geode[10334] as locator-maeve is currently
> online.
> Process ID: 47
> Uptime: 16 seconds
> Geode Version: 1.14.0-build.0
> Java Version: 11.0.9.1
> Log File: /locator-maeve/locator-maeve.log
> JVM Arguments: -DgemfirePropertyFile=/geode/config/gemfire.properties
> -DgemfireSecurityPropertyFile=/geode/config/gfsecurity.properties
> -Dgemfire.enable-cluster-configuration=true
> -Dgemfire.load-cluster-configuration-from-dir=false
> -Dgemfire.ssl-keystore=/geode/config/locator-maeve-keystore.jks
> -Dgemfire.launcher.registerSignalHandlers=true -Djava.awt.headless=true
> -Dsun.rmi.dgc.server.gcInterval=9223372036854775806
> -Dgemfire.OSProcess.DISABLE_REDIRECTION_CONFIGURATION=true
> Class-Path:
> /geode/lib/geode-core-1.14.0-build.0.jar:/geode/lib/geode-dependencies.jar
> 2. Executing - start server --name=server-dolores --group=group-dolores
> --hostname-for-clients=server-dolores --locators=geode[10334]
> --properties-file=/geode/config/gemfire.properties
> --security-properties-file=********
> --J=-Dgemfire.ssl-keystore=/geode/config/server-dolores-keystore.jks
> .......
> Server in /server-dolores on geode[40404] as server-dolores is currently
> online.
> Process ID: 199
> Uptime: 5 seconds
> Geode Version: 1.14.0-build.0
> Java Version: 11.0.9.1
> Log File: /server-dolores/server-dolores.log
> JVM Arguments: -DgemfirePropertyFile=/geode/config/gemfire.properties
> -DgemfireSecurityPropertyFile=/geode/config/gfsecurity.properties
> -Dgemfire.start-dev-rest-api=false -Dgemfire.locators=geode[10334]
> -Dgemfire.use-cluster-configuration=true -Dgemfire.groups=group-dolores
> -Dgemfire.ssl-keystore=/geode/config/server-dolores-keystore.jks
> -Dgemfire.launcher.registerSignalHandlers=true -Djava.awt.headless=true
> -Dsun.rmi.dgc.server.gcInterval=9223372036854775806
> Class-Path:
> /geode/lib/geode-core-1.14.0-build.0.jar:/geode/lib/geode-dependencies.jar
> 3. Executing - start server --name=server-clementine
> --group=group-clementine --hostname-for-clients=server-clementine
> --server-port=40405 --locators=geode[10334]
> --properties-file=/geode/config/gemfire.properties
> --security-properties-file=********
> --J=-Dgemfire.ssl-keystore=/geode/config/server-clementine-keystore.jks
> ......The Cache Server process terminated unexpectedly with exit status
> 1. Please refer to the log file in /server-clementine for full details.
> Exception in thread "main" java.lang.RuntimeException: An IO error
> occurred while starting a Server in /server-clementine on geode[40405]:
> Network is unreachable; port (40405) is not available on localhost.
> at
> org.apache.geode.distributed.ServerLauncher.start(ServerLauncher.java:852)
> at
> org.apache.geode.distributed.ServerLauncher.run(ServerLauncher.java:737)
> at
> org.apache.geode.distributed.ServerLauncher.main(ServerLauncher.java:256)
> Caused by: java.net.BindException: Network is unreachable; port (40405)
> is not available on localhost.
> at
> org.apache.geode.distributed.AbstractLauncher.assertPortAvailable(AbstractLauncher.java:142)
> at
> org.apache.geode.distributed.ServerLauncher.start(ServerLauncher.java:794)
> ... 2 more
> ************************* Execution Summary ***********************
> Script file: /geode/scripts/geode-starter-2.gfsh
> Command-1 : start locator --name=locator-maeve --connect=false
> --redirect-output --hostname-for-clients=locator-maeve
> --properties-file=/geode/config/gemfire.properties
> --security-properties-file=/geode/config/gfsecurity.properties
> --J=-Dgemfire.ssl-keystore=/geode/config/locator-maeve-keystore.jks
> Status : PASSED
> Command-2 : start server --name=server-dolores --group=group-dolores
> --hostname-for-clients=server-dolores --locators=geode[10334]
> --properties-file=/geode/config/gemfire.properties
> --security-properties-file=/geode/config/gfsecurity.properties
> --J=-Dgemfire.ssl-keystore=/geode/config/server-dolores-keystore.jks
> Status : PASSED
> Command-3 : start server --name=server-clementine
> --group=group-clementine --hostname-for-clients=server-clementine
> --server-port=40405 --locators=geode[10334]
> --properties-file=/geode/config/gemfire.properties
> --security-properties-file=/geode/config/gfsecurity.properties
> --J=-Dgemfire.ssl-keystore=/geode/config/server-clementine-keystore.jks
> Status : FAILED
> at
> com.palantir.docker.compose.execution.Command.lambda$throwingOnError$12(Command.java:60)
> at
> com.palantir.docker.compose.execution.Command.execute(Command.java:50)
> at
> com.palantir.docker.compose.execution.DefaultDockerCompose.exec(DefaultDockerCompose.java:122)
> at
> com.palantir.docker.compose.execution.DelegatingDockerCompose.exec(DelegatingDockerCompose.java:86)
> at
> com.palantir.docker.compose.execution.RetryingDockerCompose.exec(RetryingDockerCompose.java:22)
> at
> com.palantir.docker.compose.DockerComposeRule.exec(DockerComposeRule.java:171)
> at
> org.apache.geode.client.sni.DualServerSNIAcceptanceTest.beforeClass(DualServerSNIAcceptanceTest.java:77)
> {noformat}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)