janhoy commented on PR #712: URL: https://github.com/apache/solr-operator/pull/712#issuecomment-4178687696
Insight from Claude on the deadlock, not sure if that helps:

The operator waits for `status.capacity` to reach the new size before issuing the rolling restart. For drivers that don't support online (in-use) expansion, however, the kubelet only updates `status.capacity` *after* it expands the filesystem when the pod mounts the volume, and that requires a pod restart first. Deadlock.

## Fix

Pivot the wait condition from `status.capacity` to `status.conditions`. After patching the PVCs, wait until each PVC is in one of these states:

- `status.capacity >= requested` → online expansion complete, no restart needed
- `status.conditions` contains `FileSystemResizePending=True` → block device resized; a pod restart is needed to expand the filesystem

Then issue the rolling restart, and check `status.capacity` as a post-condition rather than as a gate.

## Which providers require this

- **AWS EBS CSI, GCE PD CSI, Azure Disk CSI**: support online expansion of in-use volumes (the Kubernetes `ExpandInUsePersistentVolumes` feature), so no pod restart is needed and they won't hit this path
- **OpenEBS LocalPV, local-path-provisioner, Longhorn (some modes), NFS-backed**: require a pod restart and set `FileSystemResizePending`
- **Rook/Ceph**: RBD expands online; CephFS is offline

Rough rule: cloud-managed block storage → online; local, file, or distributed storage → offline. The operator needs to handle both.
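The proposed gate could look roughly like the following minimal Go sketch. It assumes simplified stand-in types in place of the real `k8s.io/api/core/v1` PVC status structs, with capacity expressed as plain bytes. The names `classify`, `readyForRestart`, and `capacityVerified` are hypothetical and are not existing solr-operator functions.

```go
package main

import "fmt"

// Hypothetical, simplified stand-ins for the parts of
// corev1.PersistentVolumeClaimStatus that the gate reads. A real implementation
// would use k8s.io/api/core/v1, where status.capacity is a ResourceList.
type pvcCondition struct {
	Type   string // e.g. "FileSystemResizePending"
	Status string // "True", "False" or "Unknown"
}

type pvcStatus struct {
	CapacityBytes int64
	Conditions    []pvcCondition
}

// resizeState is a hypothetical classification of a PVC after its spec was patched.
type resizeState int

const (
	resizeInProgress   resizeState = iota // keep waiting
	resizeComplete                        // online expansion done, no restart needed
	resizeNeedsRestart                    // block device resized, filesystem expands on next mount
)

// classify passes the gate when capacity >= requested OR FileSystemResizePending=True.
func classify(s pvcStatus, requestedBytes int64) resizeState {
	if s.CapacityBytes >= requestedBytes {
		return resizeComplete
	}
	for _, c := range s.Conditions {
		if c.Type == "FileSystemResizePending" && c.Status == "True" {
			return resizeNeedsRestart
		}
	}
	return resizeInProgress
}

// readyForRestart reports whether every PVC has passed the gate, and whether at
// least one of them still needs a pod restart to finish expanding its filesystem.
func readyForRestart(statuses []pvcStatus, requestedBytes int64) (ready bool, restartNeeded bool) {
	for _, s := range statuses {
		switch classify(s, requestedBytes) {
		case resizeInProgress:
			return false, false
		case resizeNeedsRestart:
			restartNeeded = true
		}
	}
	return true, restartNeeded
}

// capacityVerified is the post-condition checked after the rolling restart.
func capacityVerified(statuses []pvcStatus, requestedBytes int64) bool {
	for _, s := range statuses {
		if s.CapacityBytes < requestedBytes {
			return false
		}
	}
	return true
}

func main() {
	const gi = int64(1) << 30
	requested := 20 * gi
	statuses := []pvcStatus{
		{CapacityBytes: 20 * gi}, // online expansion already finished
		{CapacityBytes: 10 * gi, Conditions: []pvcCondition{{Type: "FileSystemResizePending", Status: "True"}}},
	}
	ready, restart := readyForRestart(statuses, requested)
	fmt.Println(ready, restart)                        // true true
	fmt.Println(capacityVerified(statuses, requested)) // false until the pods restart
}
```

The gate no longer needs `status.capacity` to catch up before the restart. That is what breaks the deadlock for offline-expanding drivers, while online-expanding drivers still pass through the `capacity >= requested` branch unchanged.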
