On Thu, Mar 19, 2026 at 12:36:15PM +0800, Jingyi Wang wrote: > > > On 3/13/2026 10:37 AM, Dmitry Baryshkov wrote: > > On Wed, Mar 11, 2026 at 01:39:50AM -0700, Bartosz Golaszewski wrote: > > > On Wed, 11 Mar 2026 03:11:42 +0100, Dmitry Baryshkov > > > <[email protected]> said: > > > > On Tue, Mar 10, 2026 at 06:50:30AM -0700, Bartosz Golaszewski wrote: > > > > > > > > > > Ideally things like this would be passed to the rproc core in some > > > > > kind of a > > > > > config structure and only set when registration succeeds. This looks > > > > > to me > > > > > like papering over the real issue and I think it's still racy as > > > > > there's no > > > > > true synchronization. > > > > > > > > > > Wouldn't it be better to take rproc->lock for the entire duration of > > > > > rproc_add()? It's already initialized in rproc_alloc(). > > > > > > > > It would still be racy as rproc_trigger_recovery() is called outside of > > > > the lock. Instead the error cleanup path (and BTW, rproc_del() path too) > > > > must explicitly call cancel_work_sync() on the crash_handler work (and > > > > any other work items that can be scheduled). > > > > > > > > > > This looks weird TBH. For example: rproc_crash_handler_work() takes the > > > lock, > > > but releases it right before calling inspecting rproc->recovery_disabled > > > and > > > calling rproc_trigger_recovery(). It looks wrong, I think it should keep > > > the > > > lock and rptoc_trigger_recovery() should enforce it being taken before the > > > call. > > > > Yes. Nevertheless the driver should cancel the work too. > > > > Hi Dmitry & Bartosz, > > rproc_crash_handler_work() may call rproc_trigger_recovery() and > rproc_add() may call rproc_boot(), both the function have already > hold the lock. And the lock cannot protect resources like glink_subdev > in the patch. > > And there is a possible case for cancel_work, rproc_add tear down call > cancel work and wait for the work finished, the reboot run successfully, > and the tear down continued and the resources all released, including sysfs > and glink_subdev. > > Indeed recovery_disabled is kind of hacky. > The root cause for this issue is that for remoteproc with RPROC_OFFLINE > state, the rproc_start will be called asynchronously, but for the remoteproc > with RPROC_DETACHED, the attach function is called directly, the failure > in this path will cause the rproc_add() fail and the resource release. > I think the current patch can be dropped, we are thinking about make > rproc_attach > called asynchronously to avoid this race.
Isn't this patch necessary for SoCCP bringup? If not, why did you include it into the series? -- With best wishes Dmitry

