Hi Yonghong.

> On 6/21/22 9:12 AM, Jose E. Marchesi wrote:
>> 
>>> On 6/17/22 10:18 AM, Jose E. Marchesi wrote:
>>>> Hi Yonghong.
>>>>
>>>>> On 6/15/22 1:57 PM, David Faust wrote:
>>>>>>
>>>>>> On 6/14/22 22:53, Yonghong Song wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 6/7/22 2:43 PM, David Faust wrote:
>>>>>>>> Hello,
>>>>>>>>
>>>>>>>> This patch series adds support for:
>>>>>>>>
>>>>>>>> - Two new C-language-level attributes that allow to associate (to 
>>>>>>>> "annotate" or
>>>>>>>>       to "tag") particular declarations and types with arbitrary 
>>>>>>>> strings. As
>>>>>>>>       explained below, this is intended to be used to, for example, 
>>>>>>>> characterize
>>>>>>>>       certain pointer types.
>>>>>>>>
>>>>>>>> - The conveyance of that information in the DWARF output in the form 
>>>>>>>> of a new
>>>>>>>>       DIE: DW_TAG_GNU_annotation.
>>>>>>>>
>>>>>>>> - The conveyance of that information in the BTF output in the form of 
>>>>>>>> two new
>>>>>>>>       kinds of BTF objects: BTF_KIND_DECL_TAG and BTF_KIND_TYPE_TAG.
>>>>>>>>
>>>>>>>> All of these facilities are being added to the eBPF ecosystem, and 
>>>>>>>> support for
>>>>>>>> them exists in some form in LLVM.
>>>>>>>>
>>>>>>>> Purpose
>>>>>>>> =======
>>>>>>>>
>>>>>>>> 1)  Addition of C-family language constructs (attributes) to specify 
>>>>>>>> free-text
>>>>>>>>         tags on certain language elements, such as struct fields.
>>>>>>>>
>>>>>>>>         The purpose of these annotations is to provide additional 
>>>>>>>> information about
>>>>>>>>         types, variables, and function parameters of interest to the 
>>>>>>>> kernel. A
>>>>>>>>         driving use case is to tag pointer types within the linux 
>>>>>>>> kernel and eBPF
>>>>>>>>         programs with additional semantic information, such as 
>>>>>>>> '__user' or '__rcu'.
>>>>>>>>
>>>>>>>>         For example, consider the linux kernel function do_execve with 
>>>>>>>> the
>>>>>>>>         following declaration:
>>>>>>>>
>>>>>>>>           static int do_execve(struct filename *filename,
>>>>>>>>              const char __user *const __user *__argv,
>>>>>>>>              const char __user *const __user *__envp);
>>>>>>>>
>>>>>>>>         Here, __user could be defined with these annotations to record 
>>>>>>>> semantic
>>>>>>>>         information about the pointer parameters (e.g., they are 
>>>>>>>> user-provided) in
>>>>>>>>         DWARF and BTF information. Other kernel facilites such as the 
>>>>>>>> eBPF verifier
>>>>>>>>         can read the tags and make use of the information.
>>>>>>>>
>>>>>>>> 2)  Conveying the tags in the generated DWARF debug info.
>>>>>>>>
>>>>>>>>         The main motivation for emitting the tags in DWARF is that the 
>>>>>>>> Linux kernel
>>>>>>>>         generates its BTF information via pahole, using DWARF as a 
>>>>>>>> source:
>>>>>>>>
>>>>>>>>             +--------+  BTF                  BTF   +----------+
>>>>>>>>             | pahole |-------> vmlinux.btf ------->| verifier |
>>>>>>>>             +--------+                             +----------+
>>>>>>>>                 ^                                        ^
>>>>>>>>                 |                                        |
>>>>>>>>           DWARF |                                    BTF |
>>>>>>>>                 |                                        |
>>>>>>>>              vmlinux                              +-------------+
>>>>>>>>              module1.ko                           | BPF program |
>>>>>>>>              module2.ko                           +-------------+
>>>>>>>>                ...
>>>>>>>>
>>>>>>>>         This is because:
>>>>>>>>
>>>>>>>>         a)  Unlike GCC, LLVM will only generate BTF for BPF programs.
>>>>>>>>
>>>>>>>>         b)  GCC can generate BTF for whatever target with -gbtf, but 
>>>>>>>> there is no
>>>>>>>>             support for linking/deduplicating BTF in the linker.
>>>>>>>>
>>>>>>>>         In the scenario above, the verifier needs access to the 
>>>>>>>> pointer tags of
>>>>>>>>         both the kernel types/declarations (conveyed in the DWARF and 
>>>>>>>> translated
>>>>>>>>         to BTF by pahole) and those of the BPF program (available 
>>>>>>>> directly in BTF).
>>>>>>>>
>>>>>>>>         Another motivation for having the tag information in DWARF, 
>>>>>>>> unrelated to
>>>>>>>>         BPF and BTF, is that the drgn project (another DWARF consumer) 
>>>>>>>> also wants
>>>>>>>>         to benefit from these tags in order to differentiate between 
>>>>>>>> different
>>>>>>>>         kinds of pointers in the kernel.
>>>>>>>>
>>>>>>>> 3)  Conveying the tags in the generated BTF debug info.
>>>>>>>>
>>>>>>>>         This is easy: the main purpose of having this info in BTF is 
>>>>>>>> for the
>>>>>>>>         compiled eBPF programs. The kernel verifier can then access 
>>>>>>>> the tags
>>>>>>>>         of pointers used by the eBPF programs.
>>>>>>>>
>>>>>>>>
>>>>>>>> For more information about these tags and the motivation behind them, 
>>>>>>>> please
>>>>>>>> refer to the following linux kernel discussions:
>>>>>>>>
>>>>>>>>       https://lore.kernel.org/bpf/20210914223004.244411-1-...@fb.com/
>>>>>>>>       https://lore.kernel.org/bpf/20211012164838.3345699-1-...@fb.com/
>>>>>>>>       https://lore.kernel.org/bpf/20211112012604.1504583-1-...@fb.com/
>>>>>>>>
>>>>>>>>
>>>>>>>> Implementation Overview
>>>>>>>> =======================
>>>>>>>>
>>>>>>>> To enable these annotations, two new C language attributes are added:
>>>>>>>> __attribute__((debug_annotate_decl("foo"))) and
>>>>>>>> __attribute__((debug_annotate_type("bar"))). Both attributes accept a 
>>>>>>>> single
>>>>>>>> arbitrary string constant argument, which will be recorded in the 
>>>>>>>> generated
>>>>>>>> DWARF and/or BTF debug information. They have no effect on code 
>>>>>>>> generation.
>>>>>>>>
>>>>>>>> Note that we are not using the same attribute names as LLVM 
>>>>>>>> (btf_decl_tag and
>>>>>>>> btf_type_tag, respectively). While these attributes are functionally 
>>>>>>>> very
>>>>>>>> similar, they have grown beyond purely BTF-specific uses, so inclusion 
>>>>>>>> of "btf"
>>>>>>>> in the attribute name seems misleading.
>>>>>>>>
>>>>>>>> DWARF support is enabled via a new DW_TAG_GNU_annotation. When 
>>>>>>>> generating DWARF,
>>>>>>>> declarations and types will be checked for the corresponding 
>>>>>>>> attributes. If
>>>>>>>> present, a DW_TAG_GNU_annotation DIE will be created as a child of the 
>>>>>>>> DIE for
>>>>>>>> the annotated type or declaration, one for each tag. These DIEs link 
>>>>>>>> the
>>>>>>>> arbitrary tag value to the item they annotate.
>>>>>>>>
>>>>>>>> For example, the following variable declaration:
>>>>>>>>
>>>>>>>>       #define __typetag1 __attribute__((debug_annotate_type 
>>>>>>>> ("typetag1")))
>>>>>>>>
>>>>>>>>       #define __decltag1 __attribute__((debug_annotate_decl 
>>>>>>>> ("decltag1")))
>>>>>>>>       #define __decltag2 __attribute__((debug_annotate_decl 
>>>>>>>> ("decltag2")))
>>>>>>>>
>>>>>>>>       int * __typetag1 x __decltag1 __decltag2;
>>>>>>>
>>>>>>> Based on the above example
>>>>>>>             static int do_execve(struct filename *filename,
>>>>>>>               const char __user *const __user *__argv,
>>>>>>>               const char __user *const __user *__envp);
>>>>>>>
>>>>>>> Should the above example should be the below?
>>>>>>>         int __typetag1 * x __decltag1 __decltag2
>>>>>>>
>>>>>> This example is not related to the one above. It is just meant to
>>>>>> show the behavior of both attributes. My apologies for not making
>>>>>> that clear.
>>>>>
>>>>> Okay, it should be fine if the dwarf debug_info is shown.
>>>>>
>>>>>>
>>>>>>>>
>>>>>>>> Produces the following DWARF information:
>>>>>>>>
>>>>>>>>      <1><1e>: Abbrev Number: 3 (DW_TAG_variable)
>>>>>>>>         <1f>   DW_AT_name        : x
>>>>>>>>         <21>   DW_AT_decl_file   : 1
>>>>>>>>         <22>   DW_AT_decl_line   : 7
>>>>>>>>         <23>   DW_AT_decl_column : 18
>>>>>>>>         <24>   DW_AT_type        : <0x49>
>>>>>>>>         <28>   DW_AT_external    : 1
>>>>>>>>         <28>   DW_AT_location    : 9 byte block: 3 0 0 0 0 0 0 0 0     
>>>>>>>> (DW_OP_addr: 0)
>>>>>>>>         <32>   DW_AT_sibling     : <0x49>
>>>>>>>>      <2><36>: Abbrev Number: 1 (User TAG value: 0x6000)
>>>>>>>>         <37>   DW_AT_name        : (indirect string, offset: 0xd6): 
>>>>>>>> debug_annotate_decl
>>>>>>>>         <3b>   DW_AT_const_value : (indirect string, offset: 0xcd): 
>>>>>>>> decltag2
>>>>>>>>      <2><3f>: Abbrev Number: 1 (User TAG value: 0x6000)
>>>>>>>>         <40>   DW_AT_name        : (indirect string, offset: 0xd6): 
>>>>>>>> debug_annotate_decl
>>>>>>>>         <44>   DW_AT_const_value : (indirect string, offset: 0x0): 
>>>>>>>> decltag1
>>>>>>>>      <2><48>: Abbrev Number: 0
>>>>>>>>      <1><49>: Abbrev Number: 4 (DW_TAG_pointer_type)
>>>>>>>>         <4a>   DW_AT_byte_size   : 8
>>>>>>>>         <4b>   DW_AT_type        : <0x5d>
>>>>>>>>         <4f>   DW_AT_sibling     : <0x5d>
>>>>>>>>      <2><53>: Abbrev Number: 1 (User TAG value: 0x6000)
>>>>>>>>         <54>   DW_AT_name        : (indirect string, offset: 0x9): 
>>>>>>>> debug_annotate_type
>>>>>>>>         <58>   DW_AT_const_value : (indirect string, offset: 0x1d): 
>>>>>>>> typetag1
>>>>>>>>      <2><5c>: Abbrev Number: 0
>>>>>>>>      <1><5d>: Abbrev Number: 5 (DW_TAG_base_type)
>>>>>>>>         <5e>   DW_AT_byte_size   : 4
>>>>>>>>         <5f>   DW_AT_encoding    : 5   (signed)
>>>>>>>>         <60>   DW_AT_name        : int
>>>>>>>>      <1><64>: Abbrev Number: 0
>>>>>
>>>>> This shows the info in .debug_abbrev. What I mean is to
>>>>> show the related info in .debug_info section which seems more useful to
>>>>> understand the relationships between different tags. Maybe this is due
>>>>> to that I am not fully understanding what <1>/<2> means in <1><49> and
>>>>> <2><53> etc.
>>>> I think that dump actually shows .debug_info, with the abbrevs
>>>> expanded...
>>>> Anyway, it seems to us that the root of this problem is the fact the
>>>> kernel sparse annotations, such as address_space(__user), are:
>>>> 1) To be processed by an external kernel-specific tool (
>>>>      https://sparse.docs.kernel.org/en/latest/annotations.html) and not a
>>>>      C compiler, and therefore,
>>>> 2) Not quite the same than compiler attributes (despite the way they
>>>>      look.)  In particular, they seem to assume an ordering different than
>>>>      of GNU attributes: in some cases given the same written order, they
>>>>      refer to different things!.  Which is quite unfortunate :(
>>>
>>> Yes, currently __user/__kernel macros (implemented with address_space
>>> attribute) are processed by macros.
>>>
>>>> Now, if I understood properly, you plan to change the definition of
>>>> __user and __kernel in the kernel sources in order to generate the tag
>>>> compiler attributes, correct?
>>>
>>> Right. The original __user definition likes:
>>>    # define __user         __attribute__((noderef, address_space(__user)))
>>>
>>> The new attribute looks like
>>>    # define BTF_TYPE_TAG(value) __attribute__((btf_type_tag(#value)))
>>>    #  define __user        BTF_TYPE_TAG(user)
>> Ok I see.  So the kernel will stop using sparse attributes to
>> implement
>> __user and __kernel and start using compiler attributes for tags
>> instead.
>> 
>>>> Is that the reason why LLVM implements what we assume to be the
>>>> sparse
>>>> ordering, and not the correct GNU attributes ordering, for the tag
>>>> attributes?
>>>
>>> Note that __user attributes apply to pointee's and not pointers.
>>> Just like
>>>     const int *p;
>>> the 'const' is not applied to pointer 'p', but the pointee of 'p'.
>>>
>>> What current llvm dwarf generation with
>>>     pointer
>>>       <--- btf_type_tag
>>> is just ONE implementation. As I said earlier, I am okay to
>>> have dwarf implementation like
>>>     p->btf_type_tag->const->int.
>>> If you can propose an implementation like this in dwarf. I can propose
>>> to change implementation in llvm.
>> I think we are miscommunicating.
>> Looks like there is a divergence on what attributes apply to what
>> language entities between the sparse compiler and GCC/LLVM.  How to
>> represent that in DWARF is a different matter.
>> For this example:
>>    int __typetag1 * __typetag2 __typetag3 * g;
>> a) GCC associates __typetag1 with the pointer-to-pointer-to-int.
>> b) LLVM associates __typetag1 to pointer-to-int.
>> Where:
>> a) Is the expected behavior of a compiler attributes, as documented
>> in
>>     the GCC manual.
>> b) Is presumably what the sparse compiler expects, but _not_ the
>>     ordering expected for a compiler GNU attribute.
>> So, if the kernel source __user and __kernel annotations (which
>> currently expand to sparse attributes) follow the sparse ordering, and
>> you want to implement __user and __kernel in terms of compiler
>> attributes instead (the annotation attributes) then you will have to:
>> 1) Fix LLVM to implement the usual ordering for these attributes and
>> 2) fix the kernel sources to use that ordering
>> [Incidentally, the same applies to another "ex-sparse" attribute you
>>   have in the kernel and also implemented in LLVM with a weird ordering:
>>   the address_space attribute.]
>> For 2), it may be possible to write a coccinnelle script to generate
>> the
>> patch...
>
> I don't think (2) (to change kernel source for different attr ordering)
> will work. So the only thing we can do is in compiler/pahole except
> macro replacement in kernel.

I looked at sparse and its parser.  Wanted to be sure the ordering it
uses to interpret sparse annotations (such as address_space, alignment,
etc) is definitely _not_ the same ordering used by __attribute__ in C
compilers.

It is very different indeed and the same can be said about how sparse
interprets other modifiers like `const': in sparse both `int const *foo'
and `int *const foo' parse to a constant pointer to int, for example.

I am not to judge how sparse handles its annotations.  It may be very
well and pertinent for its particular purpose.

But I am not sure if it is reasonable to expect C compilers to implement
certain type __attributes__ to parse differently, just because it
happens these attributes are reused from sparse annotations in a
particular program (in this case the kernel.)  The debug_annotate_decl
and debug_annotate_type attributes are not even intended to be
kernel-specific.

So, if changing the kernel sources is not an option (why btw, other than
being a PITA?) at this point I really don't know what else to suggest :/

Any suggestion from the front-end people?

>> Does this make sense?
>> 
>>>> If that is so, we have quite a problem here: I don't think we can
>>>> change
>>>> the way GCC handles GNU-like attributes just because the kernel sources
>>>> want to hook on these __user/__kernel sparse annotations to generate the
>>>> compiler tags, even if we could mayhaps get GCC to handle
>>>> debug_annotate_type and debug_annotate_decl differently.  Some would say
>>>> doing so would perpetuate the mistake instead of fixing it...
>>>> Is my understanding correct?
>>>
>>> Let us just say that the btf_type_tag attribute applies to pointees.
>>> Does this help?
>>>
>>>>
>>>>>>>
>>>>>>> Maybe you can also show what dwarf debug_info looks like
>>>>>> I am not sure what you mean. This is the .debug_info section as output
>>>>>> by readelf -w. I did trim some information not relevant to the discussion
>>>>>> such as the DW_TAG_compile_unit DIE, for brevity.
>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> In the case of BTF, the annotations are recorded in two type kinds 
>>>>>>>> recently
>>>>>>>> added to the BTF specification: BTF_KIND_DECL_TAG and 
>>>>>>>> BTF_KIND_TYPE_TAG.
>>>>>>>> The above example declaration prodcues the following BTF information:
>>>>>>>>
>>>>>>>> [1] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED
>>>>>>>> [2] PTR '(anon)' type_id=3
>>>>>>>> [3] TYPE_TAG 'typetag1' type_id=1
>>>>>>>> [4] DECL_TAG 'decltag1' type_id=6 component_idx=-1
>>>>>>>> [5] DECL_TAG 'decltag2' type_id=6 component_idx=-1
>>>>>>>> [6] VAR 'x' type_id=2, linkage=global
>>>>>>>> [7] DATASEC '.bss' size=0 vlen=1
>>>>>>>>        type_id=6 offset=0 size=8 (VAR 'x')
>>>>>>>>
>>>>>>>>
>>>>>>> [...]

Reply via email to