Hi Yonghong.
> On 6/21/22 9:12 AM, Jose E. Marchesi wrote: >> >>> On 6/17/22 10:18 AM, Jose E. Marchesi wrote: >>>> Hi Yonghong. >>>> >>>>> On 6/15/22 1:57 PM, David Faust wrote: >>>>>> >>>>>> On 6/14/22 22:53, Yonghong Song wrote: >>>>>>> >>>>>>> >>>>>>> On 6/7/22 2:43 PM, David Faust wrote: >>>>>>>> Hello, >>>>>>>> >>>>>>>> This patch series adds support for: >>>>>>>> >>>>>>>> - Two new C-language-level attributes that allow to associate (to >>>>>>>> "annotate" or >>>>>>>> to "tag") particular declarations and types with arbitrary >>>>>>>> strings. As >>>>>>>> explained below, this is intended to be used to, for example, >>>>>>>> characterize >>>>>>>> certain pointer types. >>>>>>>> >>>>>>>> - The conveyance of that information in the DWARF output in the form >>>>>>>> of a new >>>>>>>> DIE: DW_TAG_GNU_annotation. >>>>>>>> >>>>>>>> - The conveyance of that information in the BTF output in the form of >>>>>>>> two new >>>>>>>> kinds of BTF objects: BTF_KIND_DECL_TAG and BTF_KIND_TYPE_TAG. >>>>>>>> >>>>>>>> All of these facilities are being added to the eBPF ecosystem, and >>>>>>>> support for >>>>>>>> them exists in some form in LLVM. >>>>>>>> >>>>>>>> Purpose >>>>>>>> ======= >>>>>>>> >>>>>>>> 1) Addition of C-family language constructs (attributes) to specify >>>>>>>> free-text >>>>>>>> tags on certain language elements, such as struct fields. >>>>>>>> >>>>>>>> The purpose of these annotations is to provide additional >>>>>>>> information about >>>>>>>> types, variables, and function parameters of interest to the >>>>>>>> kernel. A >>>>>>>> driving use case is to tag pointer types within the linux >>>>>>>> kernel and eBPF >>>>>>>> programs with additional semantic information, such as >>>>>>>> '__user' or '__rcu'. >>>>>>>> >>>>>>>> For example, consider the linux kernel function do_execve with >>>>>>>> the >>>>>>>> following declaration: >>>>>>>> >>>>>>>> static int do_execve(struct filename *filename, >>>>>>>> const char __user *const __user *__argv, >>>>>>>> const char __user *const __user *__envp); >>>>>>>> >>>>>>>> Here, __user could be defined with these annotations to record >>>>>>>> semantic >>>>>>>> information about the pointer parameters (e.g., they are >>>>>>>> user-provided) in >>>>>>>> DWARF and BTF information. Other kernel facilites such as the >>>>>>>> eBPF verifier >>>>>>>> can read the tags and make use of the information. >>>>>>>> >>>>>>>> 2) Conveying the tags in the generated DWARF debug info. >>>>>>>> >>>>>>>> The main motivation for emitting the tags in DWARF is that the >>>>>>>> Linux kernel >>>>>>>> generates its BTF information via pahole, using DWARF as a >>>>>>>> source: >>>>>>>> >>>>>>>> +--------+ BTF BTF +----------+ >>>>>>>> | pahole |-------> vmlinux.btf ------->| verifier | >>>>>>>> +--------+ +----------+ >>>>>>>> ^ ^ >>>>>>>> | | >>>>>>>> DWARF | BTF | >>>>>>>> | | >>>>>>>> vmlinux +-------------+ >>>>>>>> module1.ko | BPF program | >>>>>>>> module2.ko +-------------+ >>>>>>>> ... >>>>>>>> >>>>>>>> This is because: >>>>>>>> >>>>>>>> a) Unlike GCC, LLVM will only generate BTF for BPF programs. >>>>>>>> >>>>>>>> b) GCC can generate BTF for whatever target with -gbtf, but >>>>>>>> there is no >>>>>>>> support for linking/deduplicating BTF in the linker. >>>>>>>> >>>>>>>> In the scenario above, the verifier needs access to the >>>>>>>> pointer tags of >>>>>>>> both the kernel types/declarations (conveyed in the DWARF and >>>>>>>> translated >>>>>>>> to BTF by pahole) and those of the BPF program (available >>>>>>>> directly in BTF). >>>>>>>> >>>>>>>> Another motivation for having the tag information in DWARF, >>>>>>>> unrelated to >>>>>>>> BPF and BTF, is that the drgn project (another DWARF consumer) >>>>>>>> also wants >>>>>>>> to benefit from these tags in order to differentiate between >>>>>>>> different >>>>>>>> kinds of pointers in the kernel. >>>>>>>> >>>>>>>> 3) Conveying the tags in the generated BTF debug info. >>>>>>>> >>>>>>>> This is easy: the main purpose of having this info in BTF is >>>>>>>> for the >>>>>>>> compiled eBPF programs. The kernel verifier can then access >>>>>>>> the tags >>>>>>>> of pointers used by the eBPF programs. >>>>>>>> >>>>>>>> >>>>>>>> For more information about these tags and the motivation behind them, >>>>>>>> please >>>>>>>> refer to the following linux kernel discussions: >>>>>>>> >>>>>>>> https://lore.kernel.org/bpf/20210914223004.244411-1-...@fb.com/ >>>>>>>> https://lore.kernel.org/bpf/20211012164838.3345699-1-...@fb.com/ >>>>>>>> https://lore.kernel.org/bpf/20211112012604.1504583-1-...@fb.com/ >>>>>>>> >>>>>>>> >>>>>>>> Implementation Overview >>>>>>>> ======================= >>>>>>>> >>>>>>>> To enable these annotations, two new C language attributes are added: >>>>>>>> __attribute__((debug_annotate_decl("foo"))) and >>>>>>>> __attribute__((debug_annotate_type("bar"))). Both attributes accept a >>>>>>>> single >>>>>>>> arbitrary string constant argument, which will be recorded in the >>>>>>>> generated >>>>>>>> DWARF and/or BTF debug information. They have no effect on code >>>>>>>> generation. >>>>>>>> >>>>>>>> Note that we are not using the same attribute names as LLVM >>>>>>>> (btf_decl_tag and >>>>>>>> btf_type_tag, respectively). While these attributes are functionally >>>>>>>> very >>>>>>>> similar, they have grown beyond purely BTF-specific uses, so inclusion >>>>>>>> of "btf" >>>>>>>> in the attribute name seems misleading. >>>>>>>> >>>>>>>> DWARF support is enabled via a new DW_TAG_GNU_annotation. When >>>>>>>> generating DWARF, >>>>>>>> declarations and types will be checked for the corresponding >>>>>>>> attributes. If >>>>>>>> present, a DW_TAG_GNU_annotation DIE will be created as a child of the >>>>>>>> DIE for >>>>>>>> the annotated type or declaration, one for each tag. These DIEs link >>>>>>>> the >>>>>>>> arbitrary tag value to the item they annotate. >>>>>>>> >>>>>>>> For example, the following variable declaration: >>>>>>>> >>>>>>>> #define __typetag1 __attribute__((debug_annotate_type >>>>>>>> ("typetag1"))) >>>>>>>> >>>>>>>> #define __decltag1 __attribute__((debug_annotate_decl >>>>>>>> ("decltag1"))) >>>>>>>> #define __decltag2 __attribute__((debug_annotate_decl >>>>>>>> ("decltag2"))) >>>>>>>> >>>>>>>> int * __typetag1 x __decltag1 __decltag2; >>>>>>> >>>>>>> Based on the above example >>>>>>> static int do_execve(struct filename *filename, >>>>>>> const char __user *const __user *__argv, >>>>>>> const char __user *const __user *__envp); >>>>>>> >>>>>>> Should the above example should be the below? >>>>>>> int __typetag1 * x __decltag1 __decltag2 >>>>>>> >>>>>> This example is not related to the one above. It is just meant to >>>>>> show the behavior of both attributes. My apologies for not making >>>>>> that clear. >>>>> >>>>> Okay, it should be fine if the dwarf debug_info is shown. >>>>> >>>>>> >>>>>>>> >>>>>>>> Produces the following DWARF information: >>>>>>>> >>>>>>>> <1><1e>: Abbrev Number: 3 (DW_TAG_variable) >>>>>>>> <1f> DW_AT_name : x >>>>>>>> <21> DW_AT_decl_file : 1 >>>>>>>> <22> DW_AT_decl_line : 7 >>>>>>>> <23> DW_AT_decl_column : 18 >>>>>>>> <24> DW_AT_type : <0x49> >>>>>>>> <28> DW_AT_external : 1 >>>>>>>> <28> DW_AT_location : 9 byte block: 3 0 0 0 0 0 0 0 0 >>>>>>>> (DW_OP_addr: 0) >>>>>>>> <32> DW_AT_sibling : <0x49> >>>>>>>> <2><36>: Abbrev Number: 1 (User TAG value: 0x6000) >>>>>>>> <37> DW_AT_name : (indirect string, offset: 0xd6): >>>>>>>> debug_annotate_decl >>>>>>>> <3b> DW_AT_const_value : (indirect string, offset: 0xcd): >>>>>>>> decltag2 >>>>>>>> <2><3f>: Abbrev Number: 1 (User TAG value: 0x6000) >>>>>>>> <40> DW_AT_name : (indirect string, offset: 0xd6): >>>>>>>> debug_annotate_decl >>>>>>>> <44> DW_AT_const_value : (indirect string, offset: 0x0): >>>>>>>> decltag1 >>>>>>>> <2><48>: Abbrev Number: 0 >>>>>>>> <1><49>: Abbrev Number: 4 (DW_TAG_pointer_type) >>>>>>>> <4a> DW_AT_byte_size : 8 >>>>>>>> <4b> DW_AT_type : <0x5d> >>>>>>>> <4f> DW_AT_sibling : <0x5d> >>>>>>>> <2><53>: Abbrev Number: 1 (User TAG value: 0x6000) >>>>>>>> <54> DW_AT_name : (indirect string, offset: 0x9): >>>>>>>> debug_annotate_type >>>>>>>> <58> DW_AT_const_value : (indirect string, offset: 0x1d): >>>>>>>> typetag1 >>>>>>>> <2><5c>: Abbrev Number: 0 >>>>>>>> <1><5d>: Abbrev Number: 5 (DW_TAG_base_type) >>>>>>>> <5e> DW_AT_byte_size : 4 >>>>>>>> <5f> DW_AT_encoding : 5 (signed) >>>>>>>> <60> DW_AT_name : int >>>>>>>> <1><64>: Abbrev Number: 0 >>>>> >>>>> This shows the info in .debug_abbrev. What I mean is to >>>>> show the related info in .debug_info section which seems more useful to >>>>> understand the relationships between different tags. Maybe this is due >>>>> to that I am not fully understanding what <1>/<2> means in <1><49> and >>>>> <2><53> etc. >>>> I think that dump actually shows .debug_info, with the abbrevs >>>> expanded... >>>> Anyway, it seems to us that the root of this problem is the fact the >>>> kernel sparse annotations, such as address_space(__user), are: >>>> 1) To be processed by an external kernel-specific tool ( >>>> https://sparse.docs.kernel.org/en/latest/annotations.html) and not a >>>> C compiler, and therefore, >>>> 2) Not quite the same than compiler attributes (despite the way they >>>> look.) In particular, they seem to assume an ordering different than >>>> of GNU attributes: in some cases given the same written order, they >>>> refer to different things!. Which is quite unfortunate :( >>> >>> Yes, currently __user/__kernel macros (implemented with address_space >>> attribute) are processed by macros. >>> >>>> Now, if I understood properly, you plan to change the definition of >>>> __user and __kernel in the kernel sources in order to generate the tag >>>> compiler attributes, correct? >>> >>> Right. The original __user definition likes: >>> # define __user __attribute__((noderef, address_space(__user))) >>> >>> The new attribute looks like >>> # define BTF_TYPE_TAG(value) __attribute__((btf_type_tag(#value))) >>> # define __user BTF_TYPE_TAG(user) >> Ok I see. So the kernel will stop using sparse attributes to >> implement >> __user and __kernel and start using compiler attributes for tags >> instead. >> >>>> Is that the reason why LLVM implements what we assume to be the >>>> sparse >>>> ordering, and not the correct GNU attributes ordering, for the tag >>>> attributes? >>> >>> Note that __user attributes apply to pointee's and not pointers. >>> Just like >>> const int *p; >>> the 'const' is not applied to pointer 'p', but the pointee of 'p'. >>> >>> What current llvm dwarf generation with >>> pointer >>> <--- btf_type_tag >>> is just ONE implementation. As I said earlier, I am okay to >>> have dwarf implementation like >>> p->btf_type_tag->const->int. >>> If you can propose an implementation like this in dwarf. I can propose >>> to change implementation in llvm. >> I think we are miscommunicating. >> Looks like there is a divergence on what attributes apply to what >> language entities between the sparse compiler and GCC/LLVM. How to >> represent that in DWARF is a different matter. >> For this example: >> int __typetag1 * __typetag2 __typetag3 * g; >> a) GCC associates __typetag1 with the pointer-to-pointer-to-int. >> b) LLVM associates __typetag1 to pointer-to-int. >> Where: >> a) Is the expected behavior of a compiler attributes, as documented >> in >> the GCC manual. >> b) Is presumably what the sparse compiler expects, but _not_ the >> ordering expected for a compiler GNU attribute. >> So, if the kernel source __user and __kernel annotations (which >> currently expand to sparse attributes) follow the sparse ordering, and >> you want to implement __user and __kernel in terms of compiler >> attributes instead (the annotation attributes) then you will have to: >> 1) Fix LLVM to implement the usual ordering for these attributes and >> 2) fix the kernel sources to use that ordering >> [Incidentally, the same applies to another "ex-sparse" attribute you >> have in the kernel and also implemented in LLVM with a weird ordering: >> the address_space attribute.] >> For 2), it may be possible to write a coccinnelle script to generate >> the >> patch... > > I don't think (2) (to change kernel source for different attr ordering) > will work. So the only thing we can do is in compiler/pahole except > macro replacement in kernel. I looked at sparse and its parser. Wanted to be sure the ordering it uses to interpret sparse annotations (such as address_space, alignment, etc) is definitely _not_ the same ordering used by __attribute__ in C compilers. It is very different indeed and the same can be said about how sparse interprets other modifiers like `const': in sparse both `int const *foo' and `int *const foo' parse to a constant pointer to int, for example. I am not to judge how sparse handles its annotations. It may be very well and pertinent for its particular purpose. But I am not sure if it is reasonable to expect C compilers to implement certain type __attributes__ to parse differently, just because it happens these attributes are reused from sparse annotations in a particular program (in this case the kernel.) The debug_annotate_decl and debug_annotate_type attributes are not even intended to be kernel-specific. So, if changing the kernel sources is not an option (why btw, other than being a PITA?) at this point I really don't know what else to suggest :/ Any suggestion from the front-end people? >> Does this make sense? >> >>>> If that is so, we have quite a problem here: I don't think we can >>>> change >>>> the way GCC handles GNU-like attributes just because the kernel sources >>>> want to hook on these __user/__kernel sparse annotations to generate the >>>> compiler tags, even if we could mayhaps get GCC to handle >>>> debug_annotate_type and debug_annotate_decl differently. Some would say >>>> doing so would perpetuate the mistake instead of fixing it... >>>> Is my understanding correct? >>> >>> Let us just say that the btf_type_tag attribute applies to pointees. >>> Does this help? >>> >>>> >>>>>>> >>>>>>> Maybe you can also show what dwarf debug_info looks like >>>>>> I am not sure what you mean. This is the .debug_info section as output >>>>>> by readelf -w. I did trim some information not relevant to the discussion >>>>>> such as the DW_TAG_compile_unit DIE, for brevity. >>>>>> >>>>>>> >>>>>>>> >>>>>>>> In the case of BTF, the annotations are recorded in two type kinds >>>>>>>> recently >>>>>>>> added to the BTF specification: BTF_KIND_DECL_TAG and >>>>>>>> BTF_KIND_TYPE_TAG. >>>>>>>> The above example declaration prodcues the following BTF information: >>>>>>>> >>>>>>>> [1] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED >>>>>>>> [2] PTR '(anon)' type_id=3 >>>>>>>> [3] TYPE_TAG 'typetag1' type_id=1 >>>>>>>> [4] DECL_TAG 'decltag1' type_id=6 component_idx=-1 >>>>>>>> [5] DECL_TAG 'decltag2' type_id=6 component_idx=-1 >>>>>>>> [6] VAR 'x' type_id=2, linkage=global >>>>>>>> [7] DATASEC '.bss' size=0 vlen=1 >>>>>>>> type_id=6 offset=0 size=8 (VAR 'x') >>>>>>>> >>>>>>>> >>>>>>> [...]