On Thu, 2023-03-09 at 15:29 -0800, Teres Alexis, Alan Previn wrote:
> > > 
alan:snip

> > > +static int guc_log_relay_subbuf_size_get(void *data, u64 *val)
> > > +{
> > > + struct intel_guc_log *log = data;
> > > +
> > > + if (!log->vma)
> > > +         return -ENODEV;
> > 
> > For the record, from the other email thread, the issue here is whether this
> > check is needed.
> > 
> > Also, there is the question of what happens if the relay is open and
> > we unload the module.
> > 
> I'll retest this - but I clearly remember that if the user space app was
> still holding onto the debugfs handle, the i915 unload would go through
> most of the driver unload / unregister steps, while the app doesn't get
> any signals. But if the app were to close that handle after that
> (guc_log_relay_ctl_release gets called), we do get an invalid ptr access
> in the kernel. Take note the logger tool runs with sudo. That said,
> something "like" the above check is required, but perhaps hanging off a
> still-valid ptr (like i915->foo - maybe gt-struct validity) - it needs
> something that is explicitly cleared on unload, not left around with
> stale ptrs.
> 

An update on the above after some digging / testing: I believe we don't
need to check for "log->vma" validity, as you had suspected. However, I did
find other legacy debugfs functions for relay logging that DID check for it,
so I must have been trying to maintain consistency. That said, I will
probably remove the check from the other legacy functions as well, so that
they are all consistently not checking for it, since it's not required.

However, in the process of testing, I found an issue when connecting the
relay logger tool and unloading the driver. On one hand, this is a debugfs
interface and we may be able to fix that later, as the use-case doesn't
really expect the user to run this tool while unloading the driver. On the
other hand, some of my colleagues did stress that crashing in the kernel is
something we cannot ignore and knowingly allow. Considering the fact that
the relay logging tool is not working at all upstream today, this patch
could "unmask" that error. Finally, I too find myself, as part of testing /
debugging, occasionally forgetting to stop the relay logger tool when
unloading, and I can't even do a simple soft-reboot because of how bad
things get in i915. Given all considerations, I'm compelled to fix that
properly now. Previously, the majority of the time taken for this series
was tied to the intel_guc_logger side of the effort, not the kernel
changes. But for this fix, I think more time + changes will be required on
the kernel side.
