On Tue, Mar 03, 2026 at 03:35:31PM -0800, Tanmay Shah wrote:
> A remote processor can crash or hang during normal execution. The Linux
> remoteproc framework supports different mechanisms to recover the
> remote processor and re-establish RPMsg communication in such cases.
>
> Crash reporting on AMD-Xilinx platforms:
>
> 1) Using the debugfs node
>
> The user can report the crash to the core framework via the debugfs node
> using the following command:
>
> echo 1 > /sys/kernel/debug/remoteproc/remoteproc0/crash
>
> 2) Remoteproc notifies the host about the crash state and crash reason
> via the resource table
>
> This is a platform-specific method where the remote firmware contains a
> vendor-specific resource to update the crash state and the crash
> reason. The remote then notifies the host of the crash via a mailbox
> notification. The host checks this resource on every mailbox
> notification and reports the crash to the core framework if needed.
>
> Crash recovery mechanisms on AMD-Xilinx platforms:
>
> There are two mechanisms available to recover the remote processor from
> a crash: 1) boot recovery and 2) attach on recovery.
>
> The remoteproc core framework chooses the proper mechanism based on the
> rproc features set by the platform driver.
>
> 1) Boot recovery
>
> This is the default mechanism to recover the remote processor.
> In this method the core framework first stops the remote processor,
> loads the firmware again and then starts the remote processor. This
> method is supported on AMD-Xilinx platforms, along with the default
> coredump method.
>
> 2) Attach on recovery
>
> If the RPROC_ATTACH_ON_RECOVERY feature is enabled by the platform
> driver, the core framework chooses this method for recovery.
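To make the resource-table reporting path concrete, here is a minimal userspace sketch of the host-side check done on each mailbox notification. The cover letter does not give the resource layout, so `struct fw_crash_rsc`, its fields, the `FW_CRASH_*` values and the `host_*` helpers are all hypothetical; `host_report_crash()` only stands in for handing the crash to the core (cf. rproc_report_crash()). It also reflects the v3 change of resetting the crash state after reporting.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical crash states; the real firmware encoding is not specified. */
enum fw_crash_state {
	FW_CRASH_NONE = 0,
	FW_CRASH_REPORTED = 1,
};

/*
 * Hypothetical vendor-specific resource shared via the resource table.
 * The remote firmware writes it; the host reads it and resets it.
 */
struct fw_crash_rsc {
	volatile uint32_t crash_state;  /* enum fw_crash_state */
	volatile uint32_t crash_reason; /* firmware-defined reason code */
};

/* Minimal stand-in for host-side driver state (not struct rproc). */
struct host_rproc {
	struct fw_crash_rsc *crash_rsc; /* NULL if firmware lacks the resource */
	unsigned int crashes_reported;
	uint32_t last_reason;
};

/* Stand-in for reporting the crash to the core framework. */
static void host_report_crash(struct host_rproc *rp, uint32_t reason)
{
	rp->crashes_reported++;
	rp->last_reason = reason;
	printf("remote crashed, reason %u\n", (unsigned int)reason);
}

/*
 * Called on every mailbox notification: if the firmware flagged a crash,
 * report it once and clear the flag so it is not reported again.
 * Returns true if a crash was reported.
 */
static bool host_on_mbox_notification(struct host_rproc *rp)
{
	struct fw_crash_rsc *rsc = rp->crash_rsc;

	if (!rsc || rsc->crash_state != FW_CRASH_REPORTED)
		return false; /* ordinary notification, nothing to report */

	host_report_crash(rp, rsc->crash_reason);
	rsc->crash_state = FW_CRASH_NONE;
	return true;
}
```

Clearing the flag right after reporting is what keeps later, ordinary notifications from re-triggering recovery for the same crash.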
>
> On Versal and later platforms, the following is the sequence of events
> expected during a remoteproc crash and attach on recovery:
>
> a) The remoteproc attach/detach flow is working, and RPMsg communication
>    is established.
> b) The remote processor (RPU) crashes (crash not reported yet).
> c) The platform management controller is instructed to stop and reload
>    the ELF on the inactive remote processor before reboot (platform-
>    specific method).
> d) The platform management controller reboots the remote processor.
> e) The remote processor boots again and detects the previous crash
>    (platform-specific mechanism to detect the crash).
> f) The remote processor reports the crash to Linux (host) and waits for
>    recovery.
> g) Linux performs a full detach and reattach to the remote processor.
> h) Normal RPMsg communication is established.
>
> All RPMsg-related resources must be destroyed and recreated during
> recovery to establish successful RPMsg communication. To achieve this,
> a complete rproc_detach() followed by rproc_boot() is needed. That is
> what this patch series fixes, along with adding rproc recovery methods
> for AMD-Xilinx platforms.
>
> Change log:
>
> Changes in v3:
> - both rproc_attach_recovery() and
>   rproc_boot_recovery() are called the same way.
> - remove unrelated changes
> - %s/kick/mailbox notification/
> - %s/core framework/rproc core framework/
> - fold simple function within zynqmp_r5_handle_rsc().
> - remove spurious change
> - reset crash state after reporting the crash
> - document set and reset of ATTACH_ON_RECOVERY flag
> - set recovery_disabled flag to false
> - check condition rproc->crash_reason != NULL
>
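A minimal userspace model of the two recovery paths may help frame the core change. It is a sketch under stated assumptions, not the patch: `struct model_rproc`, the `model_*` and `rpmsg_*` helpers and the state names are hypothetical. It only captures the points made in the cover letter: the platform feature selects the path, boot recovery reloads firmware, and attach recovery does a full detach (destroying every RPMsg resource) followed by a boot in attach mode that recreates them.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, simplified remote processor states. */
enum rp_state { RP_OFFLINE, RP_RUNNING, RP_ATTACHED, RP_CRASHED };

struct model_rproc {
	enum rp_state state;
	bool attach_on_recovery;       /* models the ATTACH_ON_RECOVERY feature */
	bool rpmsg_up;                 /* host-side RPMsg devices exist */
	unsigned int rpmsg_generation; /* bumped each time RPMsg is recreated */
	unsigned int fw_loads;         /* times the host loaded firmware */
};

static void rpmsg_destroy(struct model_rproc *rp)
{
	rp->rpmsg_up = false;
}

static void rpmsg_create(struct model_rproc *rp)
{
	rp->rpmsg_up = true;
	rp->rpmsg_generation++;
}

/* Boot recovery: stop (tears down RPMsg), reload firmware, start. */
static int model_boot_recovery(struct model_rproc *rp)
{
	rpmsg_destroy(rp);
	rp->fw_loads++;
	rp->state = RP_RUNNING;
	rpmsg_create(rp);
	return 0;
}

/*
 * Attach recovery: the platform already rebooted the remote, so the host
 * does a full detach (all RPMsg state dropped) and then boots in attach
 * mode, which recreates RPMsg. No firmware is loaded by the host.
 */
static int model_attach_recovery(struct model_rproc *rp)
{
	rpmsg_destroy(rp);
	rp->state = RP_ATTACHED;
	rpmsg_create(rp);
	return 0;
}

/* Pick the recovery path from the platform-provided feature bit. */
static int model_trigger_recovery(struct model_rproc *rp)
{
	if (rp->state != RP_CRASHED)
		return -EINVAL;
	return rp->attach_on_recovery ? model_attach_recovery(rp)
				      : model_boot_recovery(rp);
}
```

In both paths the RPMsg generation changes, which is the property the cover letter says a partial reattach failed to provide.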

For V3, Bjorn made several comments related to QCOM use cases. As such, I
will let him continue with this patchset.

Thanks,
Mathieu

> Changes in v2:
> - use rproc_boot instead of rproc_attach
> - move debug message early in the function
> - clear attach recovery boot flag during detach and stop ops
>
> Tanmay Shah (2):
>   remoteproc: core: full attach detach during recovery
>   remoteproc: xlnx: add crash detection mechanism
>
>  drivers/remoteproc/remoteproc_core.c    | 15 +++++-
>  drivers/remoteproc/xlnx_r5_remoteproc.c | 71 ++++++++++++++++++++++++-
>  2 files changed, 84 insertions(+), 2 deletions(-)
>
>
> base-commit: 098493c6dced7b02545e8bd0053ef4099a2b769e
> --
> 2.34.1
>

