On 5/20/2020 5:16 PM, Jakub Kicinski wrote:
> On Wed, 20 May 2020 17:03:02 -0700 Jacob Keller wrote:
>> Hi Jiri, Jakub,
>>
>> I've been asked to investigate using devlink as a mechanism for
>> reporting asynchronous events/messages from firmware including
>> diagnostic messages, etc.
>>
>> Essentially, the ice firmware can report various status or diagnostic
>> messages which are useful for debugging internal behavior. We want to be
>> able to get these messages (and relevant data associated with them) in a
>> format beyond just "dump it to the dmesg buffer and recover it later".
>>
>> It seems like this would be an appropriate use of devlink. I thought
>> maybe this would work with devlink health:
>>
>> i.e. we create a devlink health reporter, and then when firmware sends a
>> message, we use devlink_health_report.
>>
>> But when I dug into this, it doesn't seem like a natural fit. The health
>> reporters expect to see an "error" state, and don't seem to really fit
>> the "log a message from firmware" notion.
>>
>> One of the issues is that the health reporter only keeps one dump, when
>> what we really want is a way to have a monitoring application get the
>> dump and then store its contents.
>>
>> Thoughts on what might make sense for this? It feels like a stretch of
>> the health interface...
>>
>> I mean basically what I am thinking of having is using the devlink_fmsg
>> interface to just send a netlink message that then gets sent over the
>> devlink monitor socket and gets dumped immediately.
>
> Why does user space need a raw firmware interface in the first place?
>
> Examples?
>
So the ice firmware can optionally send diagnostic debug messages via
its control queue. The current solutions we've used internally
essentially hex-dump the binary contents to the kernel log, and then
these get scraped and converted into a useful format for human consumption.
I'm not 100% sure of the format, but I know it's based on a decoding file
that is specific to a given firmware image, so attempting to tie
this into the driver is problematic.
There is also a plan to provide a simpler interface for a handful of
diagnostic events, using a simple one-to-one mapping from each code to a
single message, for example when the link engine detects a known reason
why it was unable to get link. I suppose these could be translated and
printed immediately by the driver without a special interface.
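For the simpler one-to-one case, driver-side translation could be as small as a static lookup table. This is a minimal sketch only: the codes, messages, and the fw_diag_* names are hypothetical placeholders (the real ice firmware codes are not described here), and printf stands in for the driver's logging call.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical diagnostic codes; NOT the actual ice firmware values. */
enum fw_diag_code {
	FW_DIAG_LINK_NO_MODULE = 0x01,
	FW_DIAG_LINK_UNSUPPORTED_MODULE = 0x02,
	FW_DIAG_LINK_PARTNER_NO_AN = 0x03,
};

struct fw_diag_entry {
	unsigned int code;
	const char *msg;
};

/* Each code maps to exactly one fixed message. */
static const struct fw_diag_entry fw_diag_table[] = {
	{ FW_DIAG_LINK_NO_MODULE, "no module detected" },
	{ FW_DIAG_LINK_UNSUPPORTED_MODULE, "unsupported module type" },
	{ FW_DIAG_LINK_PARTNER_NO_AN, "link partner not autonegotiating" },
};

/* Return the message for a code, or NULL if the code is not in the table. */
static const char *fw_diag_lookup(unsigned int code)
{
	size_t i;

	for (i = 0; i < sizeof(fw_diag_table) / sizeof(fw_diag_table[0]); i++)
		if (fw_diag_table[i].code == code)
			return fw_diag_table[i].msg;
	return NULL;
}

/* Translate and print immediately; a driver would use dev_info() or similar. */
static void fw_diag_report(unsigned int code)
{
	const char *msg = fw_diag_lookup(code);

	if (msg)
		printf("firmware diagnostic 0x%02x: %s\n", code, msg);
	else
		printf("firmware diagnostic 0x%02x: unknown code\n", code);
}
```

Unknown codes still get logged with their raw value, so nothing is silently dropped when a newer firmware adds events the table does not cover yet.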
-Jake