martin-schulze-e2m opened a new issue, #2734:
URL: https://github.com/apache/apisix-ingress-controller/issues/2734
### Current Behavior
We ran a K8s failover test in which some nodes of the cluster were shut down to
simulate a partial outage; afterwards, some (but not all) APISIX routes stopped
functioning.
Further investigation turned up error messages in the ingress-controller
logs, and after restarting the pod the routes started working again.
### Expected Behavior
After the system has settled, the ingress-controller should be able to
resume normal operation.
### Error Logs
These are the logs filtered for "error", newest lines on top (the failover
test started around 15:05 UTC):
```
# the next 3 lines repeated each minute until restart
2026-03-19T15:15:57.300Z DEBUG provider apisix/provider.go:306
handled ADC execution errors {"status_record": null, "status_update": {}}
2026-03-19T15:15:57.300Z INFO provider.client client/client.go:182
no GatewayProxy configs provided
2026-03-19T15:15:57.300Z INFO provider.client client/client.go:177
syncing all resources
# repeated in bursts until restart
2026-03-19T15:12:57.575Z ERROR controllers.GatewayProxy
controller/utils.go:1278 failed to list resource {"error": "Index with
name field:serviceRefs does not exist"}
# next two lines repeated until restart
2026-03-19T15:12:57.517Z ERROR controller-runtime
controller/controller.go:347 Reconciler error {"controller":
"gatewayproxy", "controllerGroup": "apisix.apache.org", "controllerKind":
"GatewayProxy", "GatewayProxy": {"name":"apisix-config","namespace":"apisix"},
"namespace": "apisix", "name": "apisix-config", "reconcileID":
"d3b4ea4f-c502-42b2-8b75-da07b9a7ab62", "error": "Index with name
field:ingressClassParametersRef does not exist"}
2026-03-19T15:12:57.517Z ERROR controllers.GatewayProxy
controller/gatewayproxy_controller.go:172 failed to list IngressClassList
{"error": "Index with name field:ingressClassParametersRef does not exist"}
# repeated 10s of times per millisecond until 2026-03-19T15:12:57.518Z
2026-03-19T15:12:57.509Z ERROR controllers.GatewayProxy
controller/utils.go:1278 failed to list resource {"error": "Index with
name field:secretRefs does not exist"}
# repeated 10s of times per millisecond; note the different field name
(serviceRefs vs secretRefs)
2026-03-19T15:12:57.410Z ERROR controllers.GatewayProxy
controller/utils.go:1278 failed to list resource {"error": "Index with
name field:serviceRefs does not exist"}
# this line is from the apisix pod instead of the ingress-controller, repeated
until 15:12:53
2026/03/19 15:12:33 [error] 51#51: *257113185 recv() failed (111: Connection
refused), context: ngx.timer, client: 10.62.14.1, server: 0.0.0.0:9080
# repeated 7x until 15:12:20.830650
E0319 15:11:52.704232 1 leaderelection.go:436] error retrieving
resource lock apisix/apisix-ingress-controller-leader: Get
"https://10.63.0.1:443/apis/coordination.k8s.io/v1/namespaces/apisix/leases/apisix-ingress-controller-leader?timeout=10s":
dial tcp 10.63.0.1:443: connect: connection refused
2026-03-19T15:10:41.225Z INFO setup manager/run.go:283 failed
to get Kubernetes server version {"error": "Get
\"https://10.63.0.1:443/version?timeout=32s\": dial tcp 10.63.0.1:443: connect:
no route to host"}
# message repeated many times for different kinds
2026-03-19T15:09:42.798Z INFO controller-runtime.api-detection
utils/k8s.go:65 group/version not available in cluster {"kind": "Ingress",
"group": "networking.k8s.io", "version": "v1", "groupVersion":
"networking.k8s.io/v1", "error": "Get
\"https://10.63.0.1:443/apis/networking.k8s.io/v1?timeout=32s\": dial tcp
10.63.0.1:443: connect: no route to host"}
2026-03-19T15:09:39.725Z INFO setup manager/run.go:283 failed
to get Kubernetes server version {"error": "Get
\"https://10.63.0.1:443/version?timeout=32s\": dial tcp 10.63.0.1:443: connect:
no route to host"}
2026-03-19T15:09:28.772Z INFO controller-runtime.api-detection
utils/k8s.go:65 group/version not available in cluster {"kind": "ApisixRoute",
"group": "apisix.apache.org", "version": "v2", "groupVersion":
"apisix.apache.org/v2", "error": "Get
\"https://10.63.0.1:443/apis/apisix.apache.org/v2?timeout=32s\": context
deadline exceeded - error from a previous attempt: http2: client connection
lost"}
2026-03-19T15:08:56.770Z INFO controller-runtime.api-detection
utils/k8s.go:65 group/version not available in cluster {"kind": "Ingress",
"group": "networking.k8s.io", "version": "v1", "groupVersion":
"networking.k8s.io/v1", "error": "Get
\"https://10.63.0.1:443/apis/networking.k8s.io/v1?timeout=32s\": context
deadline exceeded"}
Error: leader election lost
Error: leader election lost
E0319 15:08:24.254648 1 leaderelection.go:436] error retrieving
resource lock apisix/apisix-ingress-controller-leader: Get
"https://10.63.0.1:443/apis/coordination.k8s.io/v1/namespaces/apisix/leases/apisix-ingress-controller-leader?timeout=10s":
context deadline exceeded
2026-03-19T15:07:32.132Z DEBUG provider apisix/provider.go:306
handled ADC execution errors
2026-03-19T15:06:33.149Z DEBUG provider apisix/provider.go:306
handled ADC execution errors {"status_record": {}, "status_update": {}}
2026-03-19T15:06:32.131Z ERROR provider apisix/provider.go:282
failed to sync {"error": "failed to sync 1 configs:
GatewayProxy/apisix/apisix-config"}
2026-03-19T15:06:32.131Z DEBUG provider apisix/provider.go:306
handled ADC execution errors {"status_record":
{"GatewayProxy/apisix/apisix-config":{"Errors":[{"Name":"GatewayProxy/apisix/apisix-config","FailedErrors":[{"Err":"socket
hang
up","ServerAddr":"http://10.62.17.169:9180","FailedStatuses":[{"event":{"resourceType":"","type":"","resourceId":"","resourceName":""},"failed_at":"2026-03-19T15:06:32.13Z","synced_at":"0001-01-01T00:00:00Z","reason":"socket
hang up","response":{"status":0,"headers":null}}]}]}]}}, "status_update":
{"ApisixGlobalRule/apisix/opentelemetry":["ServerAddr:
http://10.62.17.169:9180, Err: socket hang up"], ... <redacted many more>}}
2026-03-19T15:06:32.131Z ERROR provider.client client/client.go:210
failed to sync resources {"name": "GatewayProxy/apisix/apisix-config",
"error": "ADC execution errors: [ADC execution error for
GatewayProxy/apisix/apisix-config: [ServerAddr: http://10.62.17.169:9180, Err:
socket hang up]]"}
2026-03-19T15:06:32.131Z ERROR provider.client client/client.go:269
failed to execute adc command {"config":
{"name":"GatewayProxy/apisix/apisix-config","serverAddrs":["http://10.62.17.169:9180"],"tlsVerify":false},
"error": "ADC execution error for GatewayProxy/apisix/apisix-config:
[ServerAddr: http://10.62.17.169:9180, Err: socket hang up]"}
2026-03-19T15:06:32.131Z ERROR provider.executor
client/executor.go:142 failed to run http sync for server {"server":
"http://10.62.17.169:9180", "error": "ServerAddr: http://10.62.17.169:9180,
Err: socket hang up"}
2026-03-19T15:06:32.131Z ERROR provider.executor
client/executor.go:328 ADC Server sync failed {"result":
{"status":"all_failed","total_resources":1,"success_count":0,"failed_count":1,"success":[],"failed":[{"event":{"resourceType":"","type":"","resourceId":"","resourceName":""},"failed_at":"2026-03-19T15:06:32.13Z","synced_at":"0001-01-01T00:00:00Z","reason":"socket
hang up","response":{"status":0,"headers":null}}]}, "error": "ADC Server sync
failed: socket hang up"}
2026-03-19T15:05:32.132Z DEBUG provider apisix/provider.go:306
handled ADC execution errors {"status_record": {}, "status_update": {}}
```
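For context on the `Index with name field:... does not exist` errors: this is the message controller-runtime's cache returns when a `List` call uses a `client.MatchingFields` selector for a field index that was never registered (registration normally happens at startup via `mgr.GetFieldIndexer().IndexField(...)`), which would suggest the reconcilers resumed after the leader-election loss before the indexes were (re)registered. A stdlib-only toy model of that lookup, purely illustrative (the `cache` type and function names below are not controller-runtime internals):

```go
package main

import "fmt"

// cache is a toy stand-in for controller-runtime's informer cache:
// a list keyed by a field index succeeds only if that index was
// registered beforehand.
type cache struct {
	indexes map[string][]string // index key -> matching object names
}

// registerIndex mimics mgr.GetFieldIndexer().IndexField(...) at startup.
func (c *cache) registerIndex(field string) {
	c.indexes["field:"+field] = []string{}
}

// listByField mimics a List with client.MatchingFields{field: ...}.
func (c *cache) listByField(field string) ([]string, error) {
	objs, ok := c.indexes["field:"+field]
	if !ok {
		// Same wording as the errors in the logs above.
		return nil, fmt.Errorf("Index with name field:%s does not exist", field)
	}
	return objs, nil
}

func main() {
	c := &cache{indexes: map[string][]string{}}
	c.registerIndex("serviceRefs")

	// Registered index: list succeeds.
	if _, err := c.listByField("serviceRefs"); err != nil {
		fmt.Println("unexpected:", err)
	}
	// Unregistered index: reproduces the log message.
	if _, err := c.listByField("secretRefs"); err != nil {
		fmt.Println(err)
	}
}
```

If this reading is right, the bug would be a startup-ordering issue (reconcile loops running before index registration) rather than anything specific to the failover itself.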
### Steps to Reproduce
We don't know yet what exactly triggered this. This is our best guess so far:
1. install apisix helm chart on k8s
2. shut down some cluster nodes (including some control plane nodes?)
3. ???
### Environment
- APISIX Ingress controller version (run `apisix-ingress-controller version
--long`): apache/apisix-ingress-controller:2.0.1 (docker image)
- Kubernetes cluster version (run `kubectl version`): v1.29.15