Re: [PATCH v4 2/6] dwarf: create annotation DIEs for btf tags

Richard Biener Wed, 02 Jul 2025 23:32:44 -0700

On Wed, Jul 2, 2025 at 7:17 PM David Faust <david.fa...@oracle.com> wrote:
>
>
>
> On 7/2/25 00:35, Richard Biener wrote:
> > On Tue, Jul 1, 2025 at 11:20 PM David Faust <david.fa...@oracle.com> wrote:
> >>
> >>
> >>
> >> On 7/1/25 01:02, Richard Biener wrote:
> >>> On Mon, Jun 30, 2025 at 9:12 PM David Faust <david.fa...@oracle.com> 
> >>> wrote:
> >>>>
> >>>>
> >>>>
> >>>> On 6/30/25 06:11, Richard Biener wrote:
> >>>>>> +static void
> >>>>>> +gen_btf_decl_tag_dies (tree t, dw_die_ref target, dw_die_ref 
> >>>>>> context_die)
> >>>>>> +{
> >>>>>> +  if (t == NULL_TREE || !DECL_P (t) || !target)
> >>>>>> +    return;
> >>>>>> +
> >>>>>> +  tree attr = lookup_attribute ("btf_decl_tag", DECL_ATTRIBUTES (t));
> >>>>>> +  if (attr == NULL_TREE)
> >>>>>> +    return;
> >>>>>> +
> >>>>>> +  gen_btf_tag_dies (attr, target, context_die);
> >>>>>> +
> >>>>>> +  /* Strip the decl tag attribute once we have created the annotation 
> >>>>>> DIEs
> >>>>>> +     to avoid attempting process it multiple times.  Global variable
> >>>>>> +     declarations may reach this function more than once.  */
> >>>>>> +  DECL_ATTRIBUTES (t)
> >>>>>> +    = remove_attribute ("btf_decl_tag", DECL_ATTRIBUTES (t));
> >>>>> I do not like modifying trees as part of dwarf2out.  You should be able 
> >>>>> to
> >>>>> see whether a DIE already has the respective attribute applied?
> >>>>
> >>>> Yes, you're right. For decl_tag the case is simple and better handled by
> >>>> consulting the hash table. Simple fix and this remove_attribute can be
> >>>> deleted.
> >>>>
> >>>> Understood re: modifying trees in dwarf2out. I agree it's not ideal.
> >>>>
> >>>> For this case the remove_attribute can be deleted. For the two below,
> >>>> one is already immediately restored and the other could be as well so
> >>>> that there are no lasting changes in the tree at all.
> >>>>
> >>>> I will explain the reasoning some more below.
> >>>>
> >>>>>
> >>>>>> +}
> >>>>>> +
> >>>>>>  /* Given a pointer to an arbitrary ..._TYPE tree node, return a 
> >>>>>> debugging
> >>>>>>     entry that chains the modifiers specified by CV_QUALS in front of 
> >>>>>> the
> >>>>>>     given type.  REVERSE is true if the type is to be interpreted in 
> >>>>>> the
> >>>>>> @@ -13674,6 +13894,7 @@ modified_type_die (tree type, int cv_quals, 
> >>>>>> bool reverse,
> >>>>>>    tree item_type = NULL;
> >>>>>>    tree qualified_type;
> >>>>>>    tree name, low, high;
> >>>>>> +  tree tags;
> >>>>>>    dw_die_ref mod_scope;
> >>>>>>    struct array_descr_info info;
> >>>>>>    /* Only these cv-qualifiers are currently handled.  */
> >>>>>> @@ -13783,10 +14004,62 @@ modified_type_die (tree type, int cv_quals, 
> >>>>>> bool reverse,
> >>>>>>           dquals &= cv_qual_mask;
> >>>>>>           if ((dquals & ~cv_quals) != TYPE_UNQUALIFIED
> >>>>>>               || (cv_quals == dquals && DECL_ORIGINAL_TYPE (name) != 
> >>>>>> type))
> >>>>>> -           /* cv-unqualified version of named type.  Just use
> >>>>>> -              the unnamed type to which it refers.  */
> >>>>>> -           return modified_type_die (DECL_ORIGINAL_TYPE (name), 
> >>>>>> cv_quals,
> >>>>>> -                                     reverse, context_die);
> >>>>>> +           {
> >>>>>> +             tree dtags = lookup_attribute ("btf_type_tag",
> >>>>>> +                                            TYPE_ATTRIBUTES (dtype));
> >>>>>> +             if ((tags = lookup_attribute ("btf_type_tag",
> >>>>>> +                                           TYPE_ATTRIBUTES (type)))
> >>>>>> +                 && !attribute_list_equal (tags, dtags))
> >>>>>> +               {
> >>>>>> +                 /* Use of a typedef with additional btf_type_tags.
> >>>>>> +                    Create a new typedef DIE to which we can attach 
> >>>>>> the
> >>>>>> +                    additional type_tag DIEs without disturbing other 
> >>>>>> users of
> >>>>>> +                    the underlying typedef.  */
> >>>>>> +                 dw_die_ref mod_die = modified_type_die (dtype, 
> >>>>>> cv_quals,
> >>>>>> +                                                         reverse, 
> >>>>>> context_die);
> >>>>>> +                 mod_die = clone_die (mod_die);
> >>>>>> +                 add_child_die (comp_unit_die (), mod_die);
> >>>>>> +                 if (!lookup_type_die (type))
> >>>>>> +                   equate_type_number_to_die (type, mod_die);
> >>>>>> +
> >>>>>> +                 /* 'tags' is an accumulated list of type_tag 
> >>>>>> attributes
> >>>>>> +                    for the typedef'd type on both sides of the 
> >>>>>> typedef.
> >>>>>> +                    'dtags' is the set of type_tag attributes only 
> >>>>>> appearing
> >>>>>> +                    in the typedef itself.
> >>>>>> +                    Find the set of type_tags only on the _use_ of the
> >>>>>> +                    typedef, i.e. (tags - dtags).  By construction 
> >>>>>> these
> >>>>>> +                    additional type_tags have been chained onto the 
> >>>>>> head of
> >>>>>> +                    the attribute list of the original typedef.  */
> >>>>>> +                 tree t = tags;
> >>>>>> +                 bool altered_chain = false;
> >>>>>> +                 while (t)
> >>>>>> +                   {
> >>>>>> +                     if (TREE_CHAIN (t) == dtags)
> >>>>>> +                       {
> >>>>>> +                         TREE_CHAIN (t) = NULL_TREE;
> >>>>>> +                         altered_chain = true;
> >>>>>> +                         break;
> >>>>>> +                       }
> >>>>>> +                     t = TREE_CHAIN (t);
> >>>>>> +                   }
> >>>>> And this might be part of that and should thus split out, or be part
> >>>>> of gen_btf_type_tag_dies itself?  I expected the hashtable to
> >>>>> cover the DIEs we have for existing chains of tags?
> >>>>
> >>>> OK, this can move into gen_btf_type_tag_dies.
> >>>>
> >>>> The reason the hash table doesn't help us here is that the uncertainty
> >>>> is in the attribute list itself.
> >>>>
> >>>> Specifically, if the type of a variable is a typedef with an additional
> >>>> type attribute applied, then it is unclear from just the tree node of
> >>>> the type of the variable whether that attribute is "new" on the use of
> >>>> the typedef, or is part of the typedef itself.
> >>>>
> >>>> For example:
> >>>>
> >>>>   typedef int __attribute__((btf_type_tag ("tag1"))) foo;
> >>>>   foo __attribute__((btf_type_tag ("tag2"))) x;
> >>>>
> >>>> For the typedef, we create a DW_TAG_typedef DIE with name "foo", with
> >>>> AT_type referring to a base type DIE for an integer type with "tag1"
> >>>> attached via AT_GNU_annotation.  So far so good.
> >>>>
> >>>> When generating dwarf for variable 'x', the tree node we have for
> >>>> the type of 'x' is an integer type node with name "foo" but the attribute
> >>>> list contains _both_ "tag2" and "tag1".  This is what I mean with the
> >>>> comment in the code above about the "accumulated list" from both sides
> >>>> of the typedef.  It happens because we have a variant type node of
> >>>> the named type foo created, in order to be a distinct node for
> >>>> (foo + "tag2") for the type of 'x'.
> >>>
> >>> So I view DW_TAG_GNU_annotation as a way to represent attributes
> >>> on types and decls going forward (eventually DWARF will gain sth
> >>> standard here for [[..]] - who knows).  As you say we treat attributed
> >>> types as variants, I do not see why the attribute list of the
> >>> variant type for 'x' has both tags?
> >> AFAIU, the attribute list has both tags just because there is not a
> >> distinct type node for the typedef, rather it adds a name to the
> >> integer type node.  So there isn't a place in the tree currently to
> >> put the attribute to distinguish it being on one side of the typedef
> >> or the other, they all get lumped into one attribute list.
> >
> > I see.  So
> >
> > typedef int __attribute__((btf_type_tag ("tag1"))) foo;
> > typedef int __attribute__((btf_type_tag ("tag2"))) bar;
> >
> > gets us two identical tagged types?  Or is it that
>
> This gives us two separate integer_type nodes, one with name "foo"
> and "tag1" in the attribute list, the second named "bar" with "tag2".
>
> >
> > foo __attribute__((btf_type_tag ("tag2"))) x;
> >
> > fails to generate a type variant for applying the attribute to?  And
> > thus a followup
>
> I think my wording above was imprecise.  We do have a variant created,
> the difficulty is that the type variant inherits the attribute list
> from the original, and adds to it.
>
> So, for x a variant of the above integer_type named "foo" is created,
> and we have a second integer_type also named "foo" only used by x.
> This variant type inherits the attribute list from the original,
> so at its creation it has "tag1" in its attribute list. Then, the new
> "tag2" attribute is pushed to its attribute list.  Now the attribute
> list of x's "foo" integer_type has both "tag1" and "tag2":
>
> <type_decl 0x7ffff7831820 foo
>     type <integer_type 0x7ffff79cedc8 foo SI
>         size <integer_cst 0x7ffff78241b0 constant 32>
>         unit-size <integer_cst 0x7ffff78241c8 constant 4>
>         align:32 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 
> 0x7ffff78225e8
>         attributes <tree_list 0x7ffff79d03e8
>             purpose <identifier_node 0x7ffff79e4c30 btf_type_tag>
>             value <tree_list 0x7ffff79d0370
>                 value <string_cst 0x7ffff79c2a60 type <array_type 
> 0x7ffff79cec78>
>                     readonly constant static "tag1\000">>>
>
> <var_decl 0x7ffff7600000 x
>     type <integer_type 0x7ffff79ce690 foo SI
>         size <integer_cst 0x7ffff78241b0 constant 32>
>         unit-size <integer_cst 0x7ffff78241c8 constant 4>
>         align:32 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 
> 0x7ffff78225e8
>         attributes <tree_list 0x7ffff79d0488
>             purpose <identifier_node 0x7ffff79e4c30 btf_type_tag>
>             value <tree_list 0x7ffff79d0438
>                 value <string_cst 0x7ffff79c2b20 type <array_type 
> 0x7ffff79cec78>
>                     readonly constant static "tag2\000">>
>             chain <tree_list 0x7ffff79d03e8 purpose <identifier_node 
> 0x7ffff79e4c30 btf_type_tag>
>                 value <tree_list 0x7ffff79d0370
>                     value <string_cst 0x7ffff79c2a60 type <array_type 
> 0x7ffff79cec78>
>                         readonly constant static "tag1\000">>>>
>
>
> "tag1" came from the typedef, and "tag2" came from the use of the
> typedef, but we don't know that from looking only at x's "foo".
>
> In the code above we have both nodes, "dtype" is the original foo
> from the typedef, and "type" is the foo used by x.  Exploiting the
> fact that the new attribute ("tag2") on the use of the typedef is
> pushed at the head of the variant type's attribute list, we can
> find the point in the attribute list where where following tags
> appeared in the typedef (since they are in "dtype"s attribute list)
> and preceding tags appeared only in the use of the typedef.
>
> That is what the while loop is doing - finding this point so that
> we only attach to the use of the typedef the new "tag2":
>
> variable "x"
>   type -------> typedef "foo"
>                   annotation -> "tag2" -> end
>                   AT_type    ------->   integer_type
>                                           annotation -> "tag1" -> end
>
> instead of (note the annotation chain on "tag2")
>
> variable "x"
>   type -------> typedef "foo"
>                   annotation -> "tag2" -> "tag1"
>                   AT_type    ------->   integer_type
>                                           annotation -> "tag1" -> end


Ah, yes.  Thinking about this this is even required for correctness on GENERIC,
a type tree needs to have all relevant information in itself, not in
possibly types
is related to.

Of course when you'd use the aligned attribute that changes, so how this is
handled can (and is) attribute dependent (see my other mail).

> The problem as you noted with doing this reverse engineering what was
> probably in the source is that we cannot distinguish:
>
>   foo __attribute__((btf_type_tag ("tag2"))) x;
>
> from
>
>   foo __attribute__((btf_type_tag ("tag2"), btf_type_tag ("tag1"))) x;
>
> Because in the lower case, the "tag1" on the use of the typedef does
> not get pushed into the attribute list, since there is already
> a "tag1" in the attribute list inherited from the original dtype foo.
>
> >
> > foo y;
> >
> > will also be tag2-tagged?
>
> Here "y" uses the exact integer_type node produced for "foo" originally
> (not the variant used by x, and not a new variant. In the above it has
> exactly type <integer_type 0x7ffff79cedc8 foo SI).
>
> >
> > It feels like we must get one of those cases "wrong" if it is like
> > you say?
> >
> >> It's also possible that I am doing something wrong in the FE
> >> attribute handler, but I'm not sure how else the attributes could
> >> be represented in the tree.
> >
> > I'll see in the other patch if I spot something.
> >
> >>> And even if it has, I'd simply let
> >>> it go and have DW_TAG_GNU_annotation with both tags on the
> >>> type DIE and one with just 'tag1' on the typedef DIE for 'foo'.
> >>
> >> OK, I can do that.  For the current uses of type_tag in the kernel
> >> BPF subsystem, I don't think this case is very relevant.
> >>
> >>>
> >>> What about
> >>>
> >>> typedef int __attribute__((btf_type_tag ("tag1"))) foo;
> >>> foo __attribute__((btf_type_tag ("tag2"), btftype_tag ("tag1"))) x;
> >>>
> >>> in your scheme?
> >>
> >> The resulting tree node for the type of x here is actually identical
> >> to what we get in the above example when x only has "tag2". We can't
> >> distinguish the two cases from the tree.
> >>
> >> If we have both, e.g.
> >>
> >> typedef int __attribute__((btf_type_tag ("tag1"))) foo;
> >> foo __attribute__((btf_type_tag ("tag2"), btf_type_tag ("tag1"))) x;
> >> foo __attribute__((btf_type_tag ("tag2"))) y;
> >>
> >> Then both var_decl nodes for 'x' and 'y' share exactly the same
> >> node as their type, an integer_type node named "foo" with exactly
> >> one each of "tag1" and "tag2" in the attribute list.
> >>
> >> I think that the current attribute handling will not add an attribute
> >> with the same name and value more than once to a given type's
> >> attribute list.  That affects us here because there isn't a distinct
> >> attribute list for the typedef. So any variant of "foo" just has
> >> one attribute list that lumps together all the relevant attributes,
> >> which also ends up in effect deduplicating them.
> >
> > OK, I see.  I think given it's imperfect on the tree side we should
> > (at least initially) not worry about the DWARF side getting things
> > "wrong".
>
> OK, agreed.

I see it mostly as optimization on the DWARF side.

Richard.

> >
> >>>
> >>> I think dwarf2out needs to emit what is on the actual tree types and
> >>> decls, not what we reverse engineer would be likely present on the
> >>> source.  If what the trees represent does not agree with what we like
> >>> to be present in the DWARF we should fix the tree side.
> >>
> >> That makes sense to me.  I do think that this means putting information
> >> in the tree that isn't necessarily relevant from program-semantics or
> >> C type system view, but allows us to more accurately reflect source
> >> constructs in DWARF.
> >>
> >> Modifying the tree nodes like this will probably ultimately be the right
> >> approach. No objection there, nor to working on the implementation.
> >>
> >> I am worried though that it could take quite some time to design and fully
> >> implement. The kernel BPF subsystem has real need for these tags in some
> >> form, it's been a couple of years now and they are piling up hacks to work
> >> around GCC not supporting type_tag and decl_tag yet.
> >>
> >> So, if the overall design is OK (with a 2nd ack) it would be great if
> >> we can give them something that works, and then work on improving and
> >> extending it.
> >
> > I think the overall design is OK.
> >
> >>>
> >>>> IOW, 'dtags' holds "tag1" since it comes from the typedef itself,
> >>>> and 'tags' holds both "tag1" and "tag2".  We need to do some extra work
> >>>> looking at the tree to figure out which type_tags are actually on the
> >>>> _use_ of the typedef vs. part of the typdef itself, in order to correctly
> >>>> decide which chains of tags should be associated to which DIE.
> >>>>
> >>>> Note that the attribute chain in the tree is immediately restored
> >>>> after generating the tag dies...
> >>>>>
> >>>>>> +
> >>>>>> +                 /* Now generate the type_tag DIEs only for the new
> >>>>>> +                    type_tags appearing in the use of the typedef, and
> >>>>>> +                    attach them to the cloned typedef DIE.  */
> >>>>>> +                 gen_btf_type_tag_dies (type, tags, mod_die, 
> >>>>>> context_die);
> >>>>>> +
> >>>>>> +                 /* Restore the list of type_tags.  In principle, the 
> >>>>>> same
> >>>>>> +                    attribute list could be shared by unrelated 
> >>>>>> types.  */
> >>>>>> +                 if (altered_chain)
> >>>>>> +                   TREE_CHAIN (t) = dtags;
> >>>>
> >>>> ...here.  So there are no lasting changes in the tree.
> >>>>
> >>>> It could be done w/o altering the tree at all by duplicating the relevant
> >>>> attribute list and operating on that, but it seems a little wasteful.
> >>>>
> >>>>>> +
> >>>>>> +                 return mod_die;
> >>>>>> +               }
> >>>>>> +             /* cv-unqualified version of named type.  Just use
> >>>>>> +                the unnamed type to which it refers.  */
> >>>>>> +             return modified_type_die (DECL_ORIGINAL_TYPE (name), 
> >>>>>> cv_quals,
> >>>>>> +                                       reverse, context_die);
> >>>>>> +           }
> >>>>>>           /* Else cv-qualified version of named type; fall through.  */
> >>>>>>         }
> >>>>>>      }
> >>>>>> @@ -13887,6 +14160,17 @@ modified_type_die (tree type, int cv_quals, 
> >>>>>> bool reverse,
> >>>>>>             first_quals |= dwarf_qual_info[i].q;
> >>>>>>           }
> >>>>>>      }
> >>>>>> +  else if ((tags = lookup_attribute ("btf_type_tag", TYPE_ATTRIBUTES 
> >>>>>> (type))))
> >>>>>> +    {
> >>>>>> +      /* Remove type tags before the recursive call to avoid 
> >>>>>> processing the
> >>>>>> +        same attribute multiple times.  */
> >>>>>> +      TYPE_ATTRIBUTES (type) = remove_attribute ("btf_type_tag",
> >>>>>> +                                                TYPE_ATTRIBUTES 
> >>>>>> (type));
> >>>>> See above.
> >>>>
> >>>> The reason for this one is simply the issue of recursion.
> >>>>
> >>>> Doing remove_attribute here is analogous to the cv-qual handling using
> >>>> get_nearest_type_subqualifiers and recursing with the lesser qualified
> >>>> type. In this case the lesser qualified type is the same type without
> >>>> the type_tag attribute(s)
> >>>
> >>> But don't we have a variant type without the attribute?  That's the basic
> >>> premise of how modified_type_die works.  If we cannot fit attribute
> >>> handling into this scheme then it's not the appropriate "target"?  Maybe
> >>> we need to force the attribute to be always applied to a decl then?
> >>> Or make sure there's a type variant with all CV qualifiers but no 
> >>> attributes
> >>> present.  Like for
> >>>
> >>> const int __attribute__((btf_type_tag("tag1"))) a;
> >>>
> >>> make sure we have type trees for 'int' (main variant - always present),
> >>> 'const int' (possibly not present variant tree right now?) and
> >>> 'const int __attribute__((btf_type_tag("tag1")))' (the TREE_TYPE of 'a')?
> >>
> >> Hmm... right. There should always be a non-attributed variant of the type
> >> somewhere. Rather than remove the attribute and recurse to generate
> >> DWARF for it, it would be better to directly locate that variant and
> >> ensure DWARF is generated for it first.  If the non-attributed type is
> >> used, then we duplicate its DIE to use here and attach the annotations.
> >>
> >> But, if the non-attributed variant is not used elsewhere in the TU, then
> >> maybe by this point it doesn't exist to find. It sounds a little familiar,
> >> maybe I ran into this originally which led to the remove_attribute...
> >
> > Quite possibly.
> >
> >>>
> >>> Otherwise the attribute handling would need to be integrated into the
> >>> "CV" qualification handing of dwarf2out.
> >>>
> >>>> I think the alternative would be to split the type_tags into a separate
> >>>> argument to modified_type_die, similar to cv_quals, which seemed like
> >>>> a much more invasive set of changes to make.
> >>>
> >>> It might be the proper thing if the attribute-less type tree isn't in
> >>> the variant
> >>> set.  That is, I'd pass the whole TYPE_ATTRIBUTES as separate 
> >>> qualification
> >>> from cv_quals and to be handled/stripped first, forcing a separate type 
> >>> DIE
> >>> to not interfere with cv_quals handling.
> >>
> >> Maybe this will be the best option for now then.
> >>
> >> Either way, I agree that the remove_attribute is not good and
> >> should not be necessary.  I will look into this more and find a way
> >> to get rid of it.
> >
> > Thanks!
> >
> >>>
> >>> This is after all laying ground for TYPE_ATTRIBUTES encoding (currently
> >>> nothing in dwarf2out looks at those, almost nothing at DECL_ATTRIBUTES).
> >>>
> >>> Richard.
> >>
> >> Thank you for the reviews and feedback so far, it's very helpful :)
> >>
> >>>
> >>>>>
> >>>>>> +      dw_die_ref mod_die = modified_type_die (type, cv_quals, reverse,
> >>>>>> +                                             context_die);
> >>>>>> +      gen_btf_type_tag_dies (type, tags, mod_die, context_die);
> >>>>
> >>>> It should work fine to save TYPE_ATTRIBUTES above and restore it to the
> >>>> original here after the recursive call so that ultimately the tree
> >>>> is unchanged.
> >>>>
> >>>> Would the same approach with that modification be OK?
> >>>>
> >>>>>> +      return mod_die;
> >>>>>> +    }
> >>>>>>    else if (code == POINTER_TYPE || code == REFERENCE_TYPE)
> >>>>>>      {
> >>>>>>        dwarf_tag tag = DW_TAG_pointer_type;
> >>>>>> @@ -22770,6 +23054,7 @@ gen_array_type_die (tree type, dw_die_ref 
> >>>>>> context_die)
> >>>>
> >>
>

Re: [PATCH v4 2/6] dwarf: create annotation DIEs for btf tags

Reply via email to