A remote processor can crash or hang during normal execution. The Linux remoteproc framework supports different mechanisms to recover the remote processor and re-establish RPMsg communication in such a case.

Crash reporting:

1) Using the debugfs node

The user can report a crash to the core framework via the debugfs node with the following command:

  echo 1 > /sys/kernel/debug/remoteproc/remoteproc0/crash

2) The remote processor notifies the host of the crash state and crash reason via the resource table

This is a platform-specific method in which the remote firmware contains a vendor-specific resource that it updates with the crash state and the crash reason. The remote then notifies the host of the crash via a mailbox notification. The host checks this resource on every mailbox notification and reports the crash to the core framework if needed.

Crash recovery mechanism:

There are two mechanisms available to recover the remote processor from a crash:

1) Boot recovery
2) Attach on recovery

The remoteproc core framework chooses the proper mechanism based on the rproc features set by the platform driver.

1) Boot recovery

This is the default mechanism to recover the remote processor. In this method the core framework first stops the remote processor, loads the firmware again, and then starts the remote processor. This method is supported on AMD-Xilinx platforms. The coredump callback in the platform driver isn't implemented so far, but that shouldn't cause the recovery to fail.

2) Attach on recovery

If the RPROC_ATTACH_ON_RECOVERY feature is enabled by the platform driver, the core framework chooses this method for recovery.

On the zynqmp platform, the following sequence of events is expected during a remoteproc crash and attach on recovery:

a) The rproc attach/detach flow is working, and RPMsg communication is established.
b) The remote processor (RPU) crashes (the crash is not reported yet).
c) The platform management controller stops the remote processor and reloads the ELF on the inactive remote processor before reboot.
d) The platform management controller reboots the remote processor.
e) The remote processor boots again and detects the previous crash (using a platform-specific mechanism to detect the crash).
f) The remote processor reports the crash to Linux (the host) and waits for the recovery.
g) Linux performs a full detach and reattach to the remote processor.
h) Normal RPMsg communication is established.

All RPMsg-related resources must be destroyed and re-created during recovery to establish successful RPMsg communication. To achieve this, a complete rproc_detach call followed by an rproc_boot call is needed. That is what this patch series fixes, along with adding rproc recovery methods for the xlnx platform.

Change log:

Changes in v2:
  - use rproc_boot instead of rproc_attach
  - move debug message early in the function
  - clear attach recovery boot flag during detach and stop ops

Tanmay Shah (3):
  remoteproc: xlnx: enable boot recovery
  remoteproc: core: full attach detach during recovery
  remoteproc: xlnx: add crash detection mechanism

 drivers/remoteproc/remoteproc_core.c    | 26 +++++-----
 drivers/remoteproc/xlnx_r5_remoteproc.c | 64 ++++++++++++++++++++++++-
 2 files changed, 78 insertions(+), 12 deletions(-)

base-commit: f982fbb1a6ca3553c15763ad9eb2beeae78d3684
-- 
2.34.1
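
For illustration only, the choice between the two recovery mechanisms described in this cover letter can be modelled in a few lines of standalone C. This is a rough sketch, not code from the series and not kernel code: struct model_rproc, enum model_step and model_recovery_steps() are hypothetical names invented here, and the boolean flag merely stands in for the attach-on-recovery feature that the platform driver sets.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the recovery actions named in the text. */
enum model_step {
	STEP_STOP,	/* stop the remote processor */
	STEP_LOAD_FW,	/* load the firmware again */
	STEP_START,	/* start the remote processor */
	STEP_DETACH,	/* full detach (models rproc_detach) */
	STEP_BOOT,	/* boot/reattach (models rproc_boot) */
};

/* Hypothetical model of a remote processor; not the kernel's struct rproc. */
struct model_rproc {
	bool attach_on_recovery; /* models the feature set by the platform driver */
};

/*
 * Copy the recovery sequence for rp into steps[] (at most max entries)
 * and return the number of entries written.
 */
size_t model_recovery_steps(const struct model_rproc *rp,
			    enum model_step *steps, size_t max)
{
	/* Boot recovery (default): stop, reload firmware, start. */
	static const enum model_step boot_seq[] = {
		STEP_STOP, STEP_LOAD_FW, STEP_START
	};
	/* Attach on recovery: full detach followed by boot. */
	static const enum model_step attach_seq[] = {
		STEP_DETACH, STEP_BOOT
	};
	const enum model_step *seq;
	size_t n, i;

	if (rp->attach_on_recovery) {
		seq = attach_seq;
		n = sizeof(attach_seq) / sizeof(attach_seq[0]);
	} else {
		seq = boot_seq;
		n = sizeof(boot_seq) / sizeof(boot_seq[0]);
	}

	if (n > max)
		n = max;
	for (i = 0; i < n; i++)
		steps[i] = seq[i];

	return n;
}
```

With attach_on_recovery cleared, the model yields stop, load firmware, start; with it set, the model yields detach followed by boot, mirroring the detach-then-rproc_boot ordering described above.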

