janhoy commented on PR #712:
URL: https://github.com/apache/solr-operator/pull/712#issuecomment-4178687696

   Insight from Claude on the deadlock, not sure if that helps:
   
   The operator waits for `status.capacity` to reach the new size before 
issuing the rolling restart. But for drivers that don't support online 
expansion (what `ExpandInUsePersistentVolumes` refers to in Kubernetes), the 
kubelet only updates `status.capacity` *after* expanding the filesystem on pod 
mount, which requires a pod restart first. Deadlock.
   
   ## Fix
   
   Pivot the wait condition from `status.capacity` to `status.conditions`. 
After patching PVCs, wait until each PVC is in one of:
   - `status.capacity >= requested` → online expansion complete, no restart 
needed
   - `status.conditions` contains `FileSystemResizePending=True` → block device 
resized, pod restart needed to expand filesystem
   
   Then issue the rolling restart where it is needed, and verify 
`status.capacity` as a post-condition rather than a gate.
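   
   A minimal Go sketch of that wait condition, to make the two exit states 
concrete. It is not the operator's code: `PVCStatus`, `PVCCondition`, 
`classify`, and `readyForNextStep` are hypothetical names, and sizes are plain 
byte counts rather than Kubernetes `resource.Quantity` values. A real 
implementation would read the same fields from the PVC objects.

```go
package main

import "fmt"

// PVCCondition is a hypothetical stand-in for the Type/Status pair
// found in a PVC's status.conditions entries.
type PVCCondition struct {
	Type   string
	Status string
}

// PVCStatus is a simplified, hypothetical view of one PVC.
// Sizes are in bytes to keep the example free of external packages.
type PVCStatus struct {
	RequestedBytes int64
	CapacityBytes  int64
	Conditions     []PVCCondition
}

// ResizeState is the outcome of inspecting a single PVC.
type ResizeState int

const (
	ResizeInProgress   ResizeState = iota // keep waiting
	ResizeNeedsRestart                    // FileSystemResizePending=True
	ResizeComplete                        // status.capacity >= requested
)

// classify applies the two exit conditions from the comment above.
func classify(pvc PVCStatus) ResizeState {
	if pvc.CapacityBytes >= pvc.RequestedBytes {
		return ResizeComplete
	}
	for _, c := range pvc.Conditions {
		if c.Type == "FileSystemResizePending" && c.Status == "True" {
			return ResizeNeedsRestart
		}
	}
	return ResizeInProgress
}

// readyForNextStep reports whether every PVC has left the in-progress
// state, and whether at least one of them needs a pod restart.
func readyForNextStep(pvcs []PVCStatus) (ready bool, restart bool) {
	ready = true
	for _, p := range pvcs {
		switch classify(p) {
		case ResizeInProgress:
			ready = false
		case ResizeNeedsRestart:
			restart = true
		}
	}
	return ready, restart
}

func main() {
	const gi = int64(1) << 30
	pvcs := []PVCStatus{
		// Online expansion already finished.
		{RequestedBytes: 20 * gi, CapacityBytes: 20 * gi},
		// Block device resized, filesystem resize waits for a pod restart.
		{RequestedBytes: 20 * gi, CapacityBytes: 10 * gi,
			Conditions: []PVCCondition{{Type: "FileSystemResizePending", Status: "True"}}},
	}
	ready, restart := readyForNextStep(pvcs)
	fmt.Println("ready:", ready, "restart needed:", restart)
}
```

   The point of the sketch is that the gate no longer depends on 
`status.capacity` alone, so a PVC stuck at the old capacity with 
`FileSystemResizePending=True` unblocks the restart instead of stalling it.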
   
   ## Which providers require this
   
   - **AWS EBS CSI, GCE PD CSI, Azure Disk CSI**: support online expansion 
(`ExpandInUsePersistentVolumes`), no pod restart needed, won't hit this path
   - **OpenEBS LocalPV, local-path-provisioner, Longhorn (some modes), 
NFS-backed**: require pod restart, set `FileSystemResizePending`
   - **Rook/Ceph RBD**: online. CephFS: offline.
   
   Rough rule: cloud-managed block storage → online. Local/file/distributed → 
offline. The operator needs to handle both.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

