martin-schulze-e2m opened a new issue, #2734:
URL: https://github.com/apache/apisix-ingress-controller/issues/2734

   ### Current Behavior
   
   We ran a K8s failover test in which some nodes of the cluster were shut down to 
simulate a partial outage. Afterwards, some (but not all) apisix routes no longer 
functioned. 
   Further investigation turned up error messages in the ingress-controller 
logs; after restarting the pod, the routes started working again.
   
   ### Expected Behavior
   
   After the system has settled, the ingress-controller should be able to 
resume normal operation.
   
   ### Error Logs
   
   These are the logs filtered for "error", newest lines on top (the failover 
test started around 15:05 UTC):
   ```
   # the next 3 lines repeated each minute until restart
   2026-03-19T15:15:57.300Z     DEBUG   provider        apisix/provider.go:306  
handled ADC execution errors    {"status_record": null, "status_update": {}}
   2026-03-19T15:15:57.300Z     INFO    provider.client client/client.go:182    
no GatewayProxy configs provided
   2026-03-19T15:15:57.300Z     INFO    provider.client client/client.go:177    
syncing all resources
   # repeated in bursts until restart
   2026-03-19T15:12:57.575Z     ERROR   controllers.GatewayProxy        
controller/utils.go:1278        failed to list resource {"error": "Index with 
name field:serviceRefs does not exist"}
   # next two lines repeated until restart
   2026-03-19T15:12:57.517Z     ERROR   controller-runtime      
controller/controller.go:347    Reconciler error        {"controller": 
"gatewayproxy", "controllerGroup": "apisix.apache.org", "controllerKind": 
"GatewayProxy", "GatewayProxy": {"name":"apisix-config","namespace":"apisix"}, 
"namespace": "apisix", "name": "apisix-config", "reconcileID": 
"d3b4ea4f-c502-42b2-8b75-da07b9a7ab62", "error": "Index with name 
field:ingressClassParametersRef does not exist"}
   2026-03-19T15:12:57.517Z     ERROR   controllers.GatewayProxy        
controller/gatewayproxy_controller.go:172       failed to list IngressClassList 
{"error": "Index with name field:ingressClassParametersRef does not exist"}
   # repeated 10s of times per millisecond until 2026-03-19T15:12:57.518Z
   2026-03-19T15:12:57.509Z     ERROR   controllers.GatewayProxy        
controller/utils.go:1278        failed to list resource {"error": "Index with 
name field:secretRefs does not exist"}
   # repeated 10s of times per millisecond; note the different field name 
(service vs secret)
   2026-03-19T15:12:57.410Z     ERROR   controllers.GatewayProxy        
controller/utils.go:1278        failed to list resource {"error": "Index with 
name field:serviceRefs does not exist"}
   # this line is from the apisix pod instead of the ingress-controller, repeated 
until 15:12:53
   2026/03/19 15:12:33 [error] 51#51: *257113185 recv() failed (111: Connection 
refused), context: ngx.timer, client: 10.62.14.1, server: 0.0.0.0:9080
   # repeated 7x until 15:12:20.830650
   E0319 15:11:52.704232       1 leaderelection.go:436] error retrieving 
resource lock apisix/apisix-ingress-controller-leader: Get 
"https://10.63.0.1:443/apis/coordination.k8s.io/v1/namespaces/apisix/leases/apisix-ingress-controller-leader?timeout=10s":
 dial tcp 10.63.0.1:443: connect: connection refused
   2026-03-19T15:10:41.225Z     INFO    setup   manager/run.go:283      failed 
to get Kubernetes server version {"error": "Get 
\"https://10.63.0.1:443/version?timeout=32s\": dial tcp 10.63.0.1:443: connect: 
no route to host"}
   # message repeated many times for different kinds
   2026-03-19T15:09:42.798Z     INFO    controller-runtime.api-detection        
utils/k8s.go:65 group/version not available in cluster  {"kind": "Ingress", 
"group": "networking.k8s.io", "version": "v1", "groupVersion": 
"networking.k8s.io/v1", "error": "Get 
\"https://10.63.0.1:443/apis/networking.k8s.io/v1?timeout=32s\": dial tcp 
10.63.0.1:443: connect: no route to host"}
   2026-03-19T15:09:39.725Z     INFO    setup   manager/run.go:283      failed 
to get Kubernetes server version {"error": "Get 
\"https://10.63.0.1:443/version?timeout=32s\": dial tcp 10.63.0.1:443: connect: 
no route to host"}
   2026-03-19T15:09:28.772Z     INFO    controller-runtime.api-detection        
utils/k8s.go:65 group/version not available in cluster  {"kind": "ApisixRoute", 
"group": "apisix.apache.org", "version": "v2", "groupVersion": 
"apisix.apache.org/v2", "error": "Get 
\"https://10.63.0.1:443/apis/apisix.apache.org/v2?timeout=32s\": context 
deadline exceeded - error from a previous attempt: http2: client connection 
lost"}
   2026-03-19T15:08:56.770Z     INFO    controller-runtime.api-detection        
utils/k8s.go:65 group/version not available in cluster  {"kind": "Ingress", 
"group": "networking.k8s.io", "version": "v1", "groupVersion": 
"networking.k8s.io/v1", "error": "Get 
\"https://10.63.0.1:443/apis/networking.k8s.io/v1?timeout=32s\": context 
deadline exceeded"}
   Error: leader election lost
   Error: leader election lost
   E0319 15:08:24.254648       1 leaderelection.go:436] error retrieving 
resource lock apisix/apisix-ingress-controller-leader: Get 
"https://10.63.0.1:443/apis/coordination.k8s.io/v1/namespaces/apisix/leases/apisix-ingress-controller-leader?timeout=10s":
 context deadline exceeded
   2026-03-19T15:07:32.132Z     DEBUG   provider        apisix/provider.go:306  
handled ADC execution errors
   2026-03-19T15:06:33.149Z     DEBUG   provider        apisix/provider.go:306  
handled ADC execution errors    {"status_record": {}, "status_update": {}}
   2026-03-19T15:06:32.131Z     ERROR   provider        apisix/provider.go:282  
failed to sync  {"error": "failed to sync 1 configs: 
GatewayProxy/apisix/apisix-config"}
   2026-03-19T15:06:32.131Z     DEBUG   provider        apisix/provider.go:306  
handled ADC execution errors    {"status_record": 
{"GatewayProxy/apisix/apisix-config":{"Errors":[{"Name":"GatewayProxy/apisix/apisix-config","FailedErrors":[{"Err":"socket
 hang 
up","ServerAddr":"http://10.62.17.169:9180","FailedStatuses":[{"event":{"resourceType":"","type":"","resourceId":"","resourceName":""},"failed_at":"2026-03-19T15:06:32.13Z","synced_at":"0001-01-01T00:00:00Z","reason":"socket
 hang up","response":{"status":0,"headers":null}}]}]}]}}, "status_update": 
{"ApisixGlobalRule/apisix/opentelemetry":["ServerAddr: 
http://10.62.17.169:9180, Err: socket hang up"], ... <redacted many more>}}
   2026-03-19T15:06:32.131Z     ERROR   provider.client client/client.go:210    
failed to sync resources        {"name": "GatewayProxy/apisix/apisix-config", 
"error": "ADC execution errors: [ADC execution error for 
GatewayProxy/apisix/apisix-config: [ServerAddr: http://10.62.17.169:9180, Err: 
socket hang up]]"}
   2026-03-19T15:06:32.131Z     ERROR   provider.client client/client.go:269    
failed to execute adc command   {"config": 
{"name":"GatewayProxy/apisix/apisix-config","serverAddrs":["http://10.62.17.169:9180"],"tlsVerify":false},
 "error": "ADC execution error for GatewayProxy/apisix/apisix-config: 
[ServerAddr: http://10.62.17.169:9180, Err: socket hang up]"}
   2026-03-19T15:06:32.131Z     ERROR   provider.executor       
client/executor.go:142  failed to run http sync for server      {"server": 
"http://10.62.17.169:9180";, "error": "ServerAddr: http://10.62.17.169:9180, 
Err: socket hang up"}
   2026-03-19T15:06:32.131Z     ERROR   provider.executor       
client/executor.go:328  ADC Server sync failed  {"result": 
{"status":"all_failed","total_resources":1,"success_count":0,"failed_count":1,"success":[],"failed":[{"event":{"resourceType":"","type":"","resourceId":"","resourceName":""},"failed_at":"2026-03-19T15:06:32.13Z","synced_at":"0001-01-01T00:00:00Z","reason":"socket
 hang up","response":{"status":0,"headers":null}}]}, "error": "ADC Server sync 
failed: socket hang up"}
   2026-03-19T15:05:32.132Z     DEBUG   provider        apisix/provider.go:306  
handled ADC execution errors    {"status_record": {}, "status_update": {}}
   ```
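   For context on the `Index with name field:serviceRefs does not exist` errors: this is the kind of error controller-runtime's cache returns when a `List` uses a field selector whose index was never registered via the field indexer. The sketch below is a stdlib-only simplification of that contract (the type and method names are illustrative stand-ins, not the actual apisix-ingress-controller or controller-runtime code); it only demonstrates why a list against an unregistered index fails with this exact message, e.g. if indexer setup was interrupted by the outage:

   ```go
   package main

   import "fmt"

   // indexedCache is a stdlib-only stand-in for a controller-runtime-style
   // informer cache: listing by a field selector only works for field names
   // that were registered up front.
   type indexedCache struct {
   	// index name -> field value -> object names
   	indexes map[string]map[string][]string
   }

   func newIndexedCache() *indexedCache {
   	return &indexedCache{indexes: map[string]map[string][]string{}}
   }

   // IndexField registers a named index, loosely mirroring
   // mgr.GetFieldIndexer().IndexField(...) during controller setup.
   func (c *indexedCache) IndexField(name string) {
   	if _, ok := c.indexes[name]; !ok {
   		c.indexes[name] = map[string][]string{}
   	}
   }

   // Add stores an object name under an index value; it fails if the
   // index was never registered.
   func (c *indexedCache) Add(indexName, value, objName string) error {
   	idx, ok := c.indexes[indexName]
   	if !ok {
   		return fmt.Errorf("Index with name %s does not exist", indexName)
   	}
   	idx[value] = append(idx[value], objName)
   	return nil
   }

   // ListMatching mimics a List with a MatchingFields selector: it fails
   // with the same error seen in the logs when the index is missing.
   func (c *indexedCache) ListMatching(indexName, value string) ([]string, error) {
   	idx, ok := c.indexes[indexName]
   	if !ok {
   		return nil, fmt.Errorf("Index with name %s does not exist", indexName)
   	}
   	return idx[value], nil
   }

   func main() {
   	// Cache where the index was never registered (e.g. setup was
   	// interrupted before IndexField ran): the list fails like the logs.
   	fresh := newIndexedCache()
   	_, err := fresh.ListMatching("field:serviceRefs", "apisix/my-service")
   	fmt.Println("unregistered:", err)

   	// Cache where IndexField ran during setup: the same list succeeds.
   	ready := newIndexedCache()
   	ready.IndexField("field:serviceRefs")
   	ready.Add("field:serviceRefs", "apisix/my-service", "route-a")
   	names, _ := ready.ListMatching("field:serviceRefs", "apisix/my-service")
   	fmt.Println("registered:", names)
   }
   ```

   If that mapping holds, the errors would mean the restarted controller was serving list requests before (or without) its field indexes being registered, which fits "restarting the pod fixed it".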
   
   ### Steps to Reproduce
   
   We don't know yet what exactly triggered this. This is our best guess so far:
   
   1. install apisix helm chart on k8s
   2. shut down some cluster nodes (including some control plane nodes?)
   3. ???
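
   In case it helps others reproduce, step 2 and the workaround can be approximated with kubectl. These commands are illustrative only: the node name and deployment name are assumptions (we shut down actual VMs rather than draining, and the deployment name depends on the helm release):

   ```shell
   # Approximate the partial outage by taking a subset of nodes out of service
   # (node name is hypothetical; we used real VM shutdowns, not drain)
   kubectl cordon worker-node-1
   kubectl drain worker-node-1 --ignore-daemonsets --delete-emptydir-data

   # Workaround that restored the broken routes for us: restart the
   # ingress-controller pod (deployment name may differ per helm release)
   kubectl -n apisix rollout restart deployment apisix-ingress-controller
   ```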
   
   ### Environment
   
   - APISIX Ingress controller version (run `apisix-ingress-controller version 
--long`): apache/apisix-ingress-controller:2.0.1 (docker image)
   - Kubernetes cluster version (run `kubectl version`): v1.29.15
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
