Hi Tanmay,
On Mon, Oct 27, 2025 at 09:57:28PM -0700, Tanmay Shah wrote:
A remote processor can crash or hang during normal execution. The Linux
remoteproc framework supports different mechanisms to recover the
remote processor and re-establish RPMsg communication in such a case.
Crash reporting:
1) Using debugfs node
A user can report the crash to the core framework via the debugfs node
using the following command:
echo 1 > /sys/kernel/debug/remoteproc/remoteproc0/crash
2) The remote processor notifies the host of the crash state and crash
reason via the resource table
This is a platform-specific method where the remote firmware contains a
vendor-specific resource to record the crash state and the crash
reason. The remote then notifies the host of the crash via a mailbox
notification. The host checks this resource on every mbox notification
and reports the crash to the core framework if needed.
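A minimal sketch of the host-side check described above, written as a
Python model rather than driver code. The CrashResource fields, the
state encoding, and on_mbox_notification are hypothetical names used
only for illustration; the real vendor resource layout is defined by
the patch and the remote firmware.

```python
from dataclasses import dataclass

CRASH_STATE_NONE = 0     # hypothetical encoding, not taken from the patch
CRASH_STATE_CRASHED = 1  # hypothetical encoding, not taken from the patch


@dataclass
class CrashResource:
    """Stand-in for the vendor-specific resource table entry."""
    crash_state: int = CRASH_STATE_NONE
    crash_reason: int = 0


def on_mbox_notification(resource, report_crash):
    """Check the resource on a mailbox notification.

    Calls report_crash(reason) and returns True when the remote has
    flagged a crash; returns False for an ordinary notification.
    """
    if resource.crash_state == CRASH_STATE_NONE:
        return False  # nothing to report, e.g. a normal virtqueue kick
    report_crash(resource.crash_reason)
    return True
```

The point of the model is that the check runs on every notification, so
an ordinary kick costs only one comparison against the crash state.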
Crash recovery mechanism:
There are two mechanisms available to recover the remote processor from
a crash: 1) boot recovery and 2) attach on recovery.
The remoteproc core framework chooses the appropriate mechanism based
on the rproc features set by the platform driver.
1) Boot recovery
This is the default mechanism to recover the remote processor.
In this method the core framework first stops the remote processor,
loads the firmware again and then starts the remote processor. This
method is supported on AMD-Xilinx platforms. The coredump callback in
the platform driver isn't implemented so far, but that shouldn't cause
recovery to fail.
2) Attach on recovery
If the RPROC_FEAT_ATTACH_ON_RECOVERY feature is enabled by the platform
driver, then the core framework will choose this method for recovery.
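The selection between the two mechanisms can be sketched as a small
Python model. This is only an illustration of the description in this
cover letter, not the core framework code: ATTACH_ON_RECOVERY and
recovery_steps are hypothetical names, and the coredump step is left
out of the model.

```python
ATTACH_ON_RECOVERY = "attach_on_recovery"  # stand-in for the rproc feature flag


def recovery_steps(features):
    """Return the ordered recovery steps for a given set of rproc features."""
    if ATTACH_ON_RECOVERY in features:
        # The host does not load firmware here; on zynqmp the platform
        # management controller reloads the ELF, per the cover letter.
        return ["detach", "attach"]
    # Default: boot recovery, where Linux reloads the firmware itself.
    return ["stop", "load_firmware", "start"]
```

With no features set, recovery_steps(set()) yields the boot recovery
order; with the flag set, it yields detach followed by attach.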
On the zynqmp platform, the following sequence of events is expected
during a remoteproc crash and attach on recovery:
a) rproc attach/detach flow is working, and RPMsg comm is established
b) Remote processor (RPU) crashed (crash not reported yet)
c) Platform management controller stops and reloads the ELF on the
inactive remote processor before reboot
d) Platform management controller reboots the remote processor
e) Remote processor boots again and detects the previous crash
(platform specific mechanism to detect the crash)
f) Remote processor reports the crash to Linux (host) and waits for
recovery.
g) Linux performs a full detach and reattach to the remote processor.
h) Normal RPMsg communication is established.
All RPMsg-related resources must be destroyed and re-created during
recovery to establish successful RPMsg communication. To achieve this,
a complete rproc_detach followed by an rproc_attach call is needed.
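A toy Python model of the requirement above: resources live only
between attach and detach, so recovery has to go through both to get a
fresh set. RpmsgLinkModel and its fields are hypothetical and do not
mirror the kernel data structures.

```python
class RpmsgLinkModel:
    """Toy model: RPMsg resources exist only between attach() and detach()."""

    def __init__(self):
        self.generation = 0    # how many times resources have been created
        self.resources = None  # None while detached

    def attach(self):
        if self.resources is not None:
            raise RuntimeError("already attached")
        self.generation += 1
        # Fresh vrings, buffers and channels, modeled as a plain dict.
        self.resources = {"generation": self.generation, "channels": []}

    def detach(self):
        if self.resources is None:
            raise RuntimeError("not attached")
        self.resources = None  # drop everything tied to the old session

    def recover(self):
        """Attach on recovery as described: full detach, then attach."""
        self.detach()
        self.attach()
```

After recover(), the model holds a new resource set with no channels
carried over from before the crash, which is the property the full
detach and attach sequence is meant to guarantee.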
Tanmay Shah (3):
remoteproc: xlnx: enable boot recovery
remoteproc: core: full attach detach during recovery
remoteproc: xlnx: add crash detection mechanism
I tested this on i.MX8QM-MEK and see failures: the first test passes,
the second test fails.
Without this patch, I do not see failures.
root@imx8qmmek:~#
remoteproc remoteproc0: crash detected in imx-rproc: type watchdog
Partition3 reset!
remoteproc remoteproc0: handling crash #1 in imx-rproc
remoteproc remoteproc0: detached remote processor imx-rproc
rproc-virtio rproc-virtio.1.auto: assigned reserved memory node
vdevbuffer@90400000
virtio_rpmsg_bus virtio0: rpmsg host is online
rproc-virtio rproc-virtio.1.auto: registered virtio0 (type 7)
rproc-virtio rproc-virtio.2.auto: assigned reserved memory node
vdevbuffer@90400000
virtio_rpmsg_bus virtio1: rpmsg host is online
rproc-virtio rproc-virtio.2.auto: registered virtio1 (type 7)
remoteproc remoteproc0: remote processor imx-rproc is now attached
virtio_rpmsg_bus virtio1: creating channel rpmsg-openamp-demo-channel
addr 0x1e
remoteproc remoteproc0: crash detected in imx-rproc: type watchdog
Partition3 reset!
remoteproc remoteproc0: handling crash #2 in imx-rproc
rproc-virtio rproc-virtio.1.auto: assigned reserved memory node
vdevbuffer@90400000
virtio_rpmsg_bus virtio4: probe with driver virtio_rpmsg_bus failed
with error -12
rproc-virtio rproc-virtio.1.auto: registered virtio4 (type 7)
rproc-virtio rproc-virtio.2.auto: assigned reserved memory node
vdevbuffer@90400000
virtio_rpmsg_bus virtio5: probe with driver virtio_rpmsg_bus failed
with error -12
rproc-virtio rproc-virtio.2.auto: registered virtio5 (type 7)
rproc-virtio rproc-virtio.5.auto: assigned reserved memory node
vdevbuffer@90400000
virtio_rpmsg_bus virtio6: probe with driver virtio_rpmsg_bus failed
with error -12
rproc-virtio rproc-virtio.5.auto: registered virtio6 (type 7)
rproc-virtio rproc-virtio.6.auto: assigned reserved memory node
vdevbuffer@90400000
virtio_rpmsg_bus virtio7: probe with driver virtio_rpmsg_bus failed
with error -12
rproc-virtio rproc-virtio.6.auto: registered virtio7 (type 7)
remoteproc remoteproc0: remote processor imx-rproc is now attached
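For reference, error -12 in the log above is -ENOMEM. A small Python
sketch for summarizing such a log per crash follows;
summarize_recovery_log is a hypothetical helper, and it assumes the
message wording shown in this log ("handling crash #N" and "failed with
error -E").

```python
import errno
import re


def summarize_recovery_log(log_text):
    """Return {crash_number: [errno names of failed probes]} for a dmesg excerpt."""
    # Mail clients wrap long lines, so collapse all whitespace first.
    flat = " ".join(log_text.split())
    summary = {}
    current = None
    # Walk crash markers and probe failures in the order they appear.
    pattern = re.compile(r"handling crash #(\d+)|failed with error -(\d+)")
    for match in pattern.finditer(flat):
        crash, err = match.groups()
        if crash is not None:
            current = int(crash)
            summary[current] = []
        elif current is not None:
            code = int(err)
            summary[current].append(errno.errorcode.get(code, str(code)))
    return summary
```

Fed the log above, this groups the four probe failures under crash #2
and none under crash #1, matching the pass/fail pattern reported.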