gyorfimi opened a new issue, #17152:
URL: https://github.com/apache/pinot/issues/17152
### Current Behavior
When starting the Pinot Server (_version 1.4.0_) in an environment with no
outbound internet access (e.g., a firewalled Docker Swarm overlay network),
the server fails to initialize.
Startup throws `java.io.UncheckedIOException: java.net.SocketException:
Network is unreachable`. This occurs because `NetUtils.getHostAddress()` is
called; it connects a datagram socket to an external address (currently
8.8.8.8) to determine the host's IP, and that connect fails in an offline
environment.
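
For readers unfamiliar with the technique, the sketch below illustrates the general "connect a UDP socket and read back the local address" idea that the stack trace points to. It is not Pinot's actual `NetUtils` code: the class `UdpProbeSketch`, the method `localAddressFor`, and the port `10002` are hypothetical names and values chosen for illustration.

```java
import java.io.UncheckedIOException;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.net.SocketException;
import java.net.UnknownHostException;

// Illustrative only: a hypothetical stand-in, not the Pinot implementation.
final class UdpProbeSketch {

  // Connects a UDP socket to the probe address and returns the local address
  // the OS selected for that route. connect() on a datagram socket sends no
  // packet, but it still fails if the OS has no route to the destination.
  static String localAddressFor(String probeHost, int probePort)
      throws SocketException, UnknownHostException {
    try (DatagramSocket socket = new DatagramSocket()) {
      socket.connect(InetAddress.getByName(probeHost), probePort);
      return socket.getLocalAddress().getHostAddress();
    }
  }

  public static void main(String[] args) throws Exception {
    // Loopback is always routable, so this call works offline.
    System.out.println(localAddressFor("127.0.0.1", 10002));
    try {
      // Without a route to the public internet this call is expected to fail.
      System.out.println(localAddressFor("8.8.8.8", 10002));
    } catch (UncheckedIOException | SocketException e) {
      System.out.println("probe failed: " + e.getMessage());
    }
  }
}
```

The second call in `main` is the one that matters here: on a host with no default route, the connect step itself raises the exception, before any address can be returned.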
### Steps to Reproduce
1. Configure a Pinot Server (v1.4.0) in an environment that has no access to
the public internet (e.g., a Docker container with networking restricted, or a
fully air-gapped VM).
2. Attempt to start the server using the standard StartServer command. **Note
that the failure occurs even when explicitly providing the -serverHost
parameter** (which is expected to set `pinot.server.netty.host` and bypass the
network lookup).
3. The server process will fail during initialization.
4. Observe the stack trace (see below).
### Error Log / Stack Trace
The following exception is thrown during startup:
```
2025/11/05 10:35:20.073 INFO [StartServerCommand] [main] Executing command: StartServer -clusterName [my-cluster] -serverHost pinot-server-node1 -serverPort 8098 -serverAdminPort 8097 -serverGrpcPort 8090 -serverMultistageServerPort 0 -serverMultistageRunnerPort 0 -dataDir /var/lib/pinot/server -segmentDir /var/lib/pinot/segments -zkAddress zookeeper1:2181
2025/11/05 10:35:20.078 INFO [StartServiceManagerCommand] [main] Executing command: StartServiceManager -clusterName [my-cluster] -zkAddress zookeeper1:2181 -port -1 -bootstrapServices []
2025/11/05 10:35:20.079 INFO [StartServiceManagerCommand] [main] Starting a Pinot [SERVICE_MANAGER] at 0.217s since launch
2025/11/05 10:35:20.081 INFO [StartServiceManagerCommand] [main] Started Pinot [SERVICE_MANAGER] instance [ServiceManager_pinot-server-node1_-1] at 0.219s since launch
2025/11/05 10:35:20.082 INFO [StartServiceManagerCommand] [Start a Pinot [SERVER]] Starting a Pinot [SERVER] at 0.22s since launch
2025/11/05 10:35:20.354 ERROR [StartServiceManagerCommand] [Start a Pinot [SERVER]] Failed to start a Pinot [SERVER] at 0.493 since launch
java.io.UncheckedIOException: java.net.SocketException: Network is unreachable
    at java.base/sun.nio.ch.DatagramSocketAdaptor.connect(DatagramSocketAdaptor.java:120)
    at java.base/java.net.DatagramSocket.connect(DatagramSocket.java:474)
    at org.apache.pinot.spi.utils.NetUtils.getHostAddress(NetUtils.java:62)
    at org.apache.pinot.server.starter.helix.BaseServerStarter.init(BaseServerStarter.java:198)
    at org.apache.pinot.tools.service.PinotServiceManager.startServer(PinotServiceManager.java:166)
    at org.apache.pinot.tools.service.PinotServiceManager.startRole(PinotServiceManager.java:97)
    at org.apache.pinot.tools.admin.command.StartServiceManagerCommand$1.lambda$run$0(StartServiceManagerCommand.java:267)
    at org.apache.pinot.tools.admin.command.StartServiceManagerCommand.startPinotService(StartServiceManagerCommand.java:293)
    at org.apache.pinot.tools.admin.command.StartServiceManagerCommand$1.run(StartServiceManagerCommand.java:267)
Caused by: java.net.SocketException: Network is unreachable
    at java.base/sun.nio.ch.Net.connect0(Native Method)
    at java.base/sun.nio.ch.Net.connect(Net.java:579)
    at java.base/sun.nio.ch.DatagramChannelImpl.connect(DatagramChannelImpl.java:1249)
    at java.base/sun.nio.ch.DatagramSocketAdaptor.connectInternal(DatagramSocketAdaptor.java:91)
    at java.base/sun.nio.ch.DatagramSocketAdaptor.connect(DatagramSocketAdaptor.java:118)
    ... 8 more
```
### Analysis & Root Cause
The root cause is in
org.apache.pinot.server.starter.helix.BaseServerStarter.init (line 198 in
version 1.4.0, or [line 205 on
master](https://github.com/apache/pinot/blob/master/pinot-server/src/main/java/org/apache/pinot/server/starter/helix/BaseServerStarter.java#L205)):
```java
_hostname = _serverConf.getProperty(Helix.KEY_OF_SERVER_NETTY_HOST,
    _serverConf.getProperty(Helix.SET_INSTANCE_ID_TO_HOSTNAME_KEY, false)
        ? NetUtils.getHostnameOrAddress()
        : NetUtils.getHostAddress());
```
The issue is that the second argument (the default value) to
`_serverConf.getProperty()` is evaluated eagerly: Java evaluates every
argument before invoking the method.
As a result, `NetUtils.getHostAddress()` (or
`NetUtils.getHostnameOrAddress()`) is always called, even if the user has
explicitly provided a value for `pinot.server.netty.host`
(`Helix.KEY_OF_SERVER_NETTY_HOST`) in their configuration to avoid this
network lookup.
In an offline environment, this eager call to `NetUtils.getHostAddress()`
fails with `Network is unreachable`, preventing the server from starting.
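
The eager-evaluation behavior described above can be reproduced without Pinot or a network. The following is a minimal sketch in which every name (`EagerDefaultSketch`, `expensiveLookup`, `getPropertyEager`, `getPropertyLazy`) is hypothetical; a plain `Map` stands in for the server configuration and a call counter stands in for the network lookup.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Illustrative only: hypothetical stand-ins, not Pinot classes.
final class EagerDefaultSketch {
  static int lookups = 0;

  // Stand-in for the network lookup: counts calls instead of opening a socket.
  static String expensiveLookup() {
    lookups++;
    return "10.0.0.1";
  }

  // Same shape as a getProperty(key, defaultValue) call: the caller has
  // already computed defaultValue by the time this method body runs.
  static String getPropertyEager(Map<String, String> conf, String key, String defaultValue) {
    String value = conf.get(key);
    return value != null ? value : defaultValue;
  }

  // Lazy variant: the default is only computed when the key is missing.
  static String getPropertyLazy(Map<String, String> conf, String key, Supplier<String> defaultValue) {
    String value = conf.get(key);
    return value != null ? value : defaultValue.get();
  }

  public static void main(String[] args) {
    Map<String, String> conf = new HashMap<>();
    conf.put("pinot.server.netty.host", "pinot-server-node1");

    String eager = getPropertyEager(conf, "pinot.server.netty.host", expensiveLookup());
    System.out.println(eager + ", lookups after eager call: " + lookups);

    String lazy = getPropertyLazy(conf, "pinot.server.netty.host", EagerDefaultSketch::expensiveLookup);
    System.out.println(lazy + ", lookups after lazy call: " + lookups);
  }
}
```

In both calls the configured host is returned, but only the eager form invokes the lookup. If that lookup throws, as it does offline, the configured value is never consulted.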
### Proposed Solution
I suggest a two-part solution, plus one addition:
1. Primary Fix (Lazy Evaluation): Refactor the logic in
BaseServerStarter.init to only call NetUtils if Helix.KEY_OF_SERVER_NETTY_HOST
is not already defined in the configuration.
2. Robustness Fix (NetUtils): It would also be beneficial to make
NetUtils.getHostAddress() more robust. Instead of throwing an exception if the
default probe address (e.g., 8.8.8.8) is unreachable, it could log a warning
and fall back to the first available non-loopback IP address.
3. (Addition): The probe address(es) should be customizable.
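
One possible shape for the three points above is sketched here using only JDK classes. It is not a patch against Pinot: `HostAddressFallbackSketch` and its methods are hypothetical, the probe host and port are plain parameters rather than real Pinot configuration keys, and skipping link-local addresses in the fallback is an assumption of this sketch.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.util.Collections;
import java.util.Enumeration;
import java.util.Optional;

// Illustrative only: hypothetical helper, not a Pinot class.
final class HostAddressFallbackSketch {

  // Point 3: the probe target is a parameter instead of a hard-coded address.
  // Point 2: a failed probe is logged and reported as empty, not thrown.
  static Optional<String> probe(String probeHost, int probePort) {
    try (DatagramSocket socket = new DatagramSocket()) {
      socket.connect(InetAddress.getByName(probeHost), probePort);
      return Optional.of(socket.getLocalAddress().getHostAddress());
    } catch (UncheckedIOException | IOException e) {
      System.err.println("WARN: probe via " + probeHost + " failed: " + e.getMessage());
      return Optional.empty();
    }
  }

  // Point 2: fall back to the first address on an interface that is up and
  // not loopback. Link-local addresses are skipped (assumption of this sketch).
  static Optional<String> firstNonLoopbackAddress() {
    try {
      Enumeration<NetworkInterface> nics = NetworkInterface.getNetworkInterfaces();
      if (nics == null) {
        return Optional.empty();
      }
      for (NetworkInterface nic : Collections.list(nics)) {
        if (!nic.isUp() || nic.isLoopback()) {
          continue;
        }
        for (InetAddress address : Collections.list(nic.getInetAddresses())) {
          if (!address.isLoopbackAddress() && !address.isLinkLocalAddress()) {
            return Optional.of(address.getHostAddress());
          }
        }
      }
    } catch (SocketException e) {
      System.err.println("WARN: interface scan failed: " + e.getMessage());
    }
    return Optional.empty();
  }

  // Point 1: an explicitly configured host wins and no network call is made.
  static String resolveHostAddress(String configuredHost, String probeHost, int probePort) {
    if (configuredHost != null && !configuredHost.isEmpty()) {
      return configuredHost;
    }
    return probe(probeHost, probePort)
        .or(HostAddressFallbackSketch::firstNonLoopbackAddress)
        .orElseThrow(() -> new IllegalStateException("No usable host address found"));
  }

  public static void main(String[] args) {
    System.out.println(resolveHostAddress("pinot-server-node1", "8.8.8.8", 10002));
  }
}
```

With a configured host, `resolveHostAddress` returns immediately and never opens a socket, which is the behavior an offline deployment needs. The probe and the interface scan only run when no host is configured.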
### Environment
- *Pinot Version*: 1.4.0
- *Java Version*: Amazon Corretto 17
- *Deployment*: Docker Swarm (on an internal overlay network)
- *OS*: Ubuntu 24.04