On Thu, Oct 30, 2025 at 06:32:54PM -0700, Jakub Kicinski wrote:
> On Mon, 29 Sep 2025 17:44:13 -0400 Rodrigo Vivi wrote:
> > Introduces the DRM RAS infrastructure over generic netlink.
>
> Can't comment on the merits but in terms of netlink..
>
> > + ./tools/net/ynl/pyynl/cli.py --spec
> > Documentation/netlink/specs/drm_ras.yaml --dump list-nodes
>
> We recommend using the "installed" syntax in examples, so:
>
> ynl --family drm_ras
>
> instead of:
>
> ./tools/net/ynl/pyynl/cli.py --spec
> Documentation/netlink/specs/drm_ras.yaml
That's really neat. Thank you
$ sudo ynl --family drm_ras --dump list-nodes
[{'device-name': '00:02.0',
'node-id': 0,
'node-name': 'non-fatal',
'node-type': 'error-counter'},
{'device-name': '00:02.0',
'node-id': 1,
'node-name': 'correctable',
'node-type': 'error-counter'}]
>
> If you're using Fedora or another good distro ynl CLI is packaged (for
> Fedora in kernel-tools). The in-tree syntax is a bit verbose.
I didn't know this tool was getting package with the kernel-tools
I thought it was only helping for debug during the development.
Now I'm even wondering if we really need to code a user-space tool
for this drm-ras, or simply recommending the kernel-tools/ynl as
the official consumer of this API.
>
> > + xa_for_each(&drm_ras_xa, id, node) {
> > + if (id < ctx->restart)
> > + continue;
>
> IIRC xa_for_each_start can make this simpler?
indeed. I will attempt that on the next revision.
>
> > + hdr = genlmsg_put(skb, NETLINK_CB(cb->skb).portid,
> > + cb->nlh->nlmsg_seq,
> > + &drm_ras_nl_family, NLM_F_MULTI,
> > + DRM_RAS_CMD_LIST_NODES);
>
> genlmsg_iput()
> genl_info_dump(cb) to get info
I was taking a look to other examples that was using the genl
but I will try this on the next round. Thanks.
>
> > + if (!hdr) {
> > + ret = -EMSGSIZE;
> > + break;
> > + }
> > +
> > + ret = nla_put_u32(skb, DRM_RAS_A_NODE_ATTRS_NODE_ID, node->id);
> > + if (ret) {
> > + genlmsg_cancel(skb, hdr);
> > + break;
> > + }
> > +
> > + ret = nla_put_string(skb, DRM_RAS_A_NODE_ATTRS_DEVICE_NAME,
> > + node->device_name);
> > + if (ret) {
> > + genlmsg_cancel(skb, hdr);
> > + break;
> > + }
> > +
> > + ret = nla_put_string(skb, DRM_RAS_A_NODE_ATTRS_NODE_NAME,
> > + node->node_name);
> > + if (ret) {
> > + genlmsg_cancel(skb, hdr);
> > + break;
> > + }
> > +
> > + ret = nla_put_u32(skb, DRM_RAS_A_NODE_ATTRS_NODE_TYPE,
> > + node->type);
> > + if (ret) {
> > + genlmsg_cancel(skb, hdr);
> > + break;
> > + }
> > +
> > + genlmsg_end(skb, hdr);
> > + }
> > +
> > + if (ret == -EMSGSIZE) {
> > + ctx->restart = id;
> > + return skb->len;
> > + }
> > +
> > + return ret;
>
> Separate handling of -EMSGSIZE and returning skb->len is not necessary
> as of a few releases ago. Just return ret; core will do the right thing
> if ret == -EMSGSIZE and skb->len != 0
Any good modern example that I could get the right inspiration from?
>
>
> > +static int doit_reply_value(struct genl_info *info, u32 node_id,
> > + u32 error_id)
> > +{
> > + struct sk_buff *msg;
> > + struct nlattr *hdr;
> > + const char *error_name;
> > + u32 value;
> > + int ret;
> > +
> > + msg = genlmsg_new(NLMSG_GOODSIZE, GFP_KERNEL);
> > + if (!msg)
> > + return -ENOMEM;
> > + hdr = genlmsg_put_reply(msg, info, &drm_ras_nl_family, 0,
> > + DRM_RAS_CMD_QUERY_ERROR_COUNTER);
> > + if (!hdr) {
> > + nlmsg_free(msg);
> > + return -EMSGSIZE;
> > + }
> > +
> > + ret = get_node_error_counter(node_id, error_id,
> > + &error_name, &value);
> > + if (ret)
> > + return ret;
> > +
> > + ret = msg_reply_value(msg, error_id, error_name, value);
> > + if (ret)
> > + return ret;
>
> Leaking message on errors?
good catch! will fix
Thanks a lot,
Rodrigo.
>
> > + genlmsg_end(msg, hdr);
> > +
> > + return genlmsg_reply(msg, info);
> > +}
>