TimServers opened a new issue, #12921:
URL: https://github.com/apache/cloudstack/issues/12921
### problem
##### ISSUE TYPE
* Bug Report
##### COMPONENT NAME
KVM, HA, Out-of-band Management, Redfish
##### CLOUDSTACK VERSION
4.22.x
##### CONFIGURATION
- KVM hypervisor
- Shared primary storage on NFS
- Host HA enabled
- Host under test has Out-of-band management enabled
- OBM driver: `redfish`
- OBM address: `10.9.3.166`
- Host under test:
  - Name: `kvm-18-1.servercontrol.com.au`
  - UUID: `0fb632c6-c3d4-418a-9fa8-ee21afdc9f0a`
- HA provider: `kvmhaprovider`
- sync.interval = 60
- no ha.tag configured
##### OS / ENVIRONMENT
- CloudStack management servers on Ubuntu
- MySQL 8
- KVM hosts on Ubuntu 24.04 / libvirt
- BMC/iLO exposed via Redfish
- Primary storage: NFS
##### SUMMARY
When a KVM host is powered off unexpectedly, CloudStack detects the host
failure and enters HA fencing for the host. However, fencing fails with a
misleading exception:
`HAFenceException: OBM service is not configured or enabled for this host`
In reality, OBM is configured and enabled, and the host object in CloudStack
confirms this. The actual underlying failure is that the Redfish reset request
returns HTTP 400.
This leaves the host stuck in `HA state = Fencing` and appears to
block/delay HA recovery actions for VMs on that host.
Additionally, manual Redfish testing from the management server confirms
that the Redfish endpoint exists and accepts a valid `POST` reset request with
`{"ResetType":"On"}`, which strongly suggests the issue is not OBM
configuration but the specific fencing action CloudStack is attempting.
##### EXPECTED RESULTS
If the host is powered off and OBM/Redfish is correctly configured,
CloudStack should be able to fence the host successfully or recognize that the
host is already safely powered off.
Expected behavior:
1. CloudStack detects the host failure.
2. CloudStack investigates the host and confirms it is down.
3. CloudStack performs a successful fence action using the configured OBM
provider, or treats an already-powered-off host as successfully fenced.
4. The host exits the fencing workflow cleanly.
5. HA recovery for affected VMs proceeds normally.
The system should not report that OBM is “not configured or enabled” when:
- OBM is enabled on the host object, and
- the real failure is an HTTP 400 from a Redfish action.
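The expectations above can be sketched as a small decision function. This is a hypothetical illustration in Python, not CloudStack's fencing code: `OobmActionError`, `FenceOutcome`, `fence_host`, and the `power_off` callable are names invented here, and the only point is that the "not configured or enabled" message is reserved for the case where OBM really is disabled.

```python
from dataclasses import dataclass
from typing import Callable, Optional


class OobmActionError(Exception):
    """Hypothetical error: the BMC was reached but rejected the action."""

    def __init__(self, http_status: int, detail: str):
        super().__init__(f"OOBM action failed with HTTP {http_status}: {detail}")
        self.http_status = http_status


@dataclass
class FenceOutcome:
    fenced: bool
    reason: str


def fence_host(oobm_enabled: bool, power_state: Optional[str],
               power_off: Callable[[], None]) -> FenceOutcome:
    # Only this branch should produce the "not configured or enabled" message.
    if not oobm_enabled:
        return FenceOutcome(False, "OOBM is not configured or enabled for this host")
    # A host the BMC already reports as Off is treated as fenced.
    if power_state == "Off":
        return FenceOutcome(True, "host already powered off")
    try:
        power_off()
    except OobmActionError as exc:
        # Surface the real failure (for example HTTP 400) to the caller.
        return FenceOutcome(False, str(exc))
    return FenceOutcome(True, "power-off command accepted")
```

With the host state from this report (OBM enabled, power state `Off`), such a function would report the host as fenced without sending any Redfish request.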
##### ACTUAL RESULTS
CloudStack detects the host failure and enters fencing, but fencing fails
repeatedly.
Host details in CloudStack show OBM is enabled and configured:
- `Out-of-band management = true`
- `Out-of-band management driver = redfish`
- `Out-of-band management address = 10.9.3.166`
- `Out-of-band management power state = Off`
- `HA state = Fencing`
Management log shows repeated fencing failures with a misleading error
message:
```text
2026-03-31 03:19:33,574 WARN [o.a.c.k.h.KVMHAProvider] ... OOBM service is not configured or enabled for this host Host {"id":6,"name":"kvm-18-1.servercontrol.com.au"...} error is Failed to execute System power command for host by performing 'POST' request on URL 'https://10.9.3.166/redfish/v1/Systems/1/Actions/ComputerSystem.Reset' and host address '10.9.3.166'. The expected HTTP status code is '2XX' but it got '400'
2026-03-31 03:19:33,575 WARN [o.a.c.h.t.FenceTask] ...
org.apache.cloudstack.ha.provider.HAFenceException: OBM service is not configured or enabled for this host kvm-18-1.servercontrol.com.au
```
Running curl manually from the management server against the same endpoint
works, and the host does reboot correctly:
```
curl -k -u ADMIN:xxxxxx -H 'Content-Type: application/json' -X POST \
  https://10.9.3.166/redfish/v1/Systems/1/Actions/ComputerSystem.Reset \
  -d '{"ResetType":"On"}'
{"error":{"code":"iLO.0.10.ExtendedInfo","message":"See @Message.ExtendedInfo for more information.","@Message.ExtendedInfo":[{"MessageId":"Base.1.18.Success"}]}}
```
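Note that the response above is wrapped in an `error` envelope even though its only `MessageId` is `Base.1.18.Success`. A minimal Python sketch of how such a body could be interpreted follows. The helper names are invented for illustration, and treating a `MessageId` whose last segment is `Success` as a successful result is an assumption based on the Redfish message registry naming, not something taken from CloudStack.

```python
import json


def extended_info_message_ids(body: str) -> list[str]:
    """Return the MessageId values found in a Redfish error envelope."""
    payload = json.loads(body)
    info = payload.get("error", {}).get("@Message.ExtendedInfo", [])
    return [entry["MessageId"] for entry in info if "MessageId" in entry]


def looks_successful(body: str) -> bool:
    """True when at least one MessageId exists and all end in 'Success'."""
    ids = extended_info_message_ids(body)
    return bool(ids) and all(mid.rsplit(".", 1)[-1] == "Success" for mid in ids)


# Response body copied from the curl output in this report.
response = ('{"error":{"code":"iLO.0.10.ExtendedInfo","message":"See '
            '@Message.ExtendedInfo for more information.",'
            '"@Message.ExtendedInfo":[{"MessageId":"Base.1.18.Success"}]}}')

print(extended_info_message_ids(response))
print(looks_successful(response))
```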
### versions
ACS 4.22
Ubuntu 24.04
HPE iLO 1.72 (Nov 09 2025)
### The steps to reproduce the bug
##### STEPS TO REPRODUCE
1. Configure a KVM host with:
   - Host HA enabled
   - OBM enabled
   - OBM driver `redfish`
   - valid Redfish endpoint / credentials
2. Confirm the host is healthy and part of a KVM cluster.
3. Power off the host unexpectedly.
4. Observe the management server log and host HA state in the CloudStack UI.
5. Observe that the host enters `Fencing` state and fencing repeatedly fails.
6. Test the Redfish endpoint manually from the management server:
   - `GET /redfish/v1/Systems/1`
   - `POST /redfish/v1/Systems/1/Actions/ComputerSystem.Reset` with `{"ResetType":"On"}`
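The `GET` in step 6 is useful for triage because a Redfish `ComputerSystem` resource normally advertises the reset types its BMC accepts under `Actions`, in the `ResetType@Redfish.AllowableValues` annotation of `#ComputerSystem.Reset`. The Python sketch below checks a candidate `ResetType` against that list. The sample payload is invented for illustration and was not captured from the iLO in this report, and the report does not show which `ResetType` CloudStack sends while fencing. Some BMCs publish the allowed values through a separate `@Redfish.ActionInfo` resource instead, which this sketch does not handle.

```python
def allowed_reset_types(system: dict) -> list[str]:
    """Reset types advertised inline by a ComputerSystem resource."""
    reset = system.get("Actions", {}).get("#ComputerSystem.Reset", {})
    return list(reset.get("ResetType@Redfish.AllowableValues", []))


def is_reset_type_allowed(system: dict, reset_type: str) -> bool:
    """True when the BMC lists reset_type as an accepted value."""
    return reset_type in allowed_reset_types(system)


# Hypothetical payload shaped like a GET /redfish/v1/Systems/1 response.
sample_system = {
    "PowerState": "Off",
    "Actions": {
        "#ComputerSystem.Reset": {
            "target": "/redfish/v1/Systems/1/Actions/ComputerSystem.Reset",
            "ResetType@Redfish.AllowableValues": ["On", "ForceOff", "ForceRestart"],
        }
    },
}

print(allowed_reset_types(sample_system))
print(is_reset_type_allowed(sample_system, "GracefulShutdown"))
```

Comparing the advertised values with the `ResetType` in the failing request would show whether the HTTP 400 comes from an unsupported value or from the host's current power state.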
### What to do about it?
_No response_