Re: kernel sparse annotations vs. compiler attributes and debug_annotate_{type, decl} WAS: Re: [PATCH 0/9] Add debug_annotate attributes
On 7/7/22 1:24 PM, Jose E. Marchesi wrote:
Hi Yonghong.
On 6/21/22 9:12 AM, Jose E. Marchesi wrote:
On 6/17/22 10:18 AM, Jose E. Marchesi wrote:
Hi Yonghong.
On 6/15/22 1:57 PM, David Faust wrote:
On 6/14/22 22:53, Yonghong Song wrote:
On 6/7/22 2:43 PM, David Faust wrote:

Hello,

This patch series adds support for:

- Two new C-language-level attributes that make it possible to associate (to "annotate" or to "tag") particular declarations and types with arbitrary strings. As explained below, this is intended to be used to, for example, characterize certain pointer types.

- The conveyance of that information in the DWARF output in the form of a new DIE: DW_TAG_GNU_annotation.

- The conveyance of that information in the BTF output in the form of two new kinds of BTF objects: BTF_KIND_DECL_TAG and BTF_KIND_TYPE_TAG.

All of these facilities are being added to the eBPF ecosystem, and support for them exists in some form in LLVM.

Purpose
=======

1) Addition of C-family language constructs (attributes) to specify free-text tags on certain language elements, such as struct fields.

The purpose of these annotations is to provide additional information about types, variables, and function parameters of interest to the kernel. A driving use case is to tag pointer types within the Linux kernel and eBPF programs with additional semantic information, such as '__user' or '__rcu'.

For example, consider the Linux kernel function do_execve with the following declaration:

  static int do_execve(struct filename *filename,
                       const char __user *const __user *__argv,
                       const char __user *const __user *__envp);

Here, __user could be defined with these annotations to record semantic information about the pointer parameters (e.g., they are user-provided) in DWARF and BTF information. Other kernel facilities such as the eBPF verifier can read the tags and make use of the information.

2) Conveying the tags in the generated DWARF debug info.

The main motivation for emitting the tags in DWARF is that the Linux kernel generates its BTF information via pahole, using DWARF as a source:

    +--------+  BTF                  BTF   +----------+
    | pahole |-------> vmlinux.btf ------->| verifier |
    +--------+                             +----------+
        ^                                        ^
        |                                        |
  DWARF |                                    BTF |
        |                                        |
     vmlinux                              +-------------+
     module1.ko                           | BPF program |
     module2.ko                           +-------------+
       ...

This is because:

a) Unlike GCC, LLVM will only generate BTF for BPF programs.

b) GCC can generate BTF for whatever target with -gbtf, but there is no support for linking/deduplicating BTF in the linker.

In the scenario above, the verifier needs access to the pointer tags of both the kernel types/declarations (conveyed in the DWARF and translated to BTF by pahole) and those of the BPF program (available directly in BTF).

Another motivation for having the tag information in DWARF, unrelated to BPF and BTF, is that the drgn project (another DWARF consumer) also wants to benefit from these tags in order to differentiate between different kinds of pointers in the kernel.

3) Conveying the tags in the generated BTF debug info.

This is easy: the main purpose of having this info in BTF is for the compiled eBPF programs. The kernel verifier can then access the tags of pointers used by the eBPF programs.

For more information about these tags and the motivation behind them, please refer to the following Linux kernel discussions:

  https://lore.kernel.org/bpf/20210914223004.244411-1-...@fb.com/
  https://lore.kernel.org/bpf/20211012164838.3345699-1-...@fb.com/
  https://lore.kernel.org/bpf/2022012604.1504583-1-...@fb.com/

Implementation Overview
=======================

To enable these annotations, two new C language attributes are added: __attribute__((debug_annotate_decl("foo"))) and __attribute__((debug_annotate_type("bar"))). Both attributes accept a single arbitrary string constant argument, which will be recorded in the generated DWARF and/or BTF debug information. They have no effect on code generation.

Note that we are not using the same attribute names as LLVM (btf_decl_tag and btf_type_tag, respectively). While these attributes are functionally very similar, they have grown beyond purely BTF-specific uses, so inclusion of "btf" in the attri
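As a rough illustration of the do_execve use case described above, a '__user' macro could be sketched along the following lines. This is a minimal sketch, not the kernel's actual definition: the tag string "user", the fallback order, and the helper count_args are assumptions made for the example. It tries the attribute name proposed in this series first, then LLVM's btf_type_tag, and otherwise expands to nothing so that the code still builds on compilers that know neither spelling.

```c
#include <assert.h>
#include <stddef.h>

/* Older compilers may lack __has_attribute; treat every attribute as absent. */
#ifndef __has_attribute
#define __has_attribute(x) 0
#endif

/* Assumption for this sketch: prefer the spelling proposed in this series,
   then LLVM's spelling, else expand to nothing. The tag string is made up. */
#if __has_attribute(debug_annotate_type)
#define __user __attribute__((debug_annotate_type("user")))
#elif __has_attribute(btf_type_tag)
#define __user __attribute__((btf_type_tag("user")))
#else
#define __user
#endif

/* Hypothetical helper: counts the entries of a NULL-terminated, argv-style
   array. The tags are meant to be recorded in debug info only; they do not
   change the generated code. */
size_t count_args(const char __user *const __user *argv)
{
    size_t n = 0;

    assert(argv != NULL);
    while (argv[n] != NULL)
        n++;
    return n;
}
```

With a compiler that supports neither attribute the macro is empty and the function behaves like any other; with one that does, the tag would be expected to show up in the DWARF and/or BTF for the parameter's type.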
Re: kernel sparse annotations vs. compiler attributes and debug_annotate_{type, decl} WAS: Re: [PATCH 0/9] Add debug_annotate attributes
On 7/14/22 8:09 AM, Jose E. Marchesi wrote:
Hi Yonghong.
On 7/7/22 1:24 PM, Jose E. Marchesi wrote:
[...]
Re: kernel sparse annotations vs. compiler attributes and debug_annotate_{type, decl} WAS: Re: [PATCH 0/9] Add debug_annotate attributes
On 7/15/22 7:17 AM, Jose E. Marchesi wrote:
On 7/14/22 8:09 AM, Jose E. Marchesi wrote:
Hi Yonghong.
On 7/7/22 1:24 PM, Jose E. Marchesi wrote:
[...]
Re: [PATCH 0/8][RFC] Support BTF decl_tag and type_tag annotations
On 4/1/22 12:42 PM, David Faust wrote:

Hello,

This patch series is a first attempt at adding support for:

- Two new C-language-level attributes that make it possible to associate (to "tag") particular declarations and types with arbitrary strings. As explained below, this is intended to be used to, for example, characterize certain pointer types.

- The conveyance of that information in the DWARF output in the form of a new DIE: DW_TAG_GNU_annotation.

- The conveyance of that information in the BTF output in the form of two new kinds of BTF objects: BTF_KIND_DECL_TAG and BTF_KIND_TYPE_TAG.

All of these facilities are being added to the eBPF ecosystem, and support for them exists in some form in LLVM. However, as we shall see, we have found some problems implementing them so some discussion is in order.

Purpose
=======

1) Addition of C-family language constructs (attributes) to specify free-text tags on certain language elements, such as struct fields.

The purpose of these annotations is to provide additional information about types, variables, and function parameters of interest to the kernel. A driving use case is to tag pointer types within the Linux kernel and eBPF programs with additional semantic information, such as '__user' or '__rcu'.

For example, consider the Linux kernel function do_execve with the following declaration:

  static int do_execve(struct filename *filename,
                       const char __user *const __user *__argv,
                       const char __user *const __user *__envp);

Here, __user could be defined with these annotations to record semantic information about the pointer parameters (e.g., they are user-provided) in DWARF and BTF information. Other kernel facilities such as the eBPF verifier can read the tags and make use of the information.

2) Conveying the tags in the generated DWARF debug info.

The main motivation for emitting the tags in DWARF is that the Linux kernel generates its BTF information via pahole, using DWARF as a source:

    +--------+  BTF                  BTF   +----------+
    | pahole |-------> vmlinux.btf ------->| verifier |
    +--------+                             +----------+
        ^                                        ^
        |                                        |
  DWARF |                                    BTF |
        |                                        |
     vmlinux                              +-------------+
     module1.ko                           | BPF program |
     module2.ko                           +-------------+
       ...

This is because:

a) Unlike GCC, LLVM will only generate BTF for BPF programs.

b) GCC can generate BTF for whatever target with -gbtf, but there is no support for linking/deduplicating BTF in the linker.

In the scenario above, the verifier needs access to the pointer tags of both the kernel types/declarations (conveyed in the DWARF and translated to BTF by pahole) and those of the BPF program (available directly in BTF).

Another motivation for having the tag information in DWARF, unrelated to BPF and BTF, is that the drgn project (another DWARF consumer) also wants to benefit from these tags in order to differentiate between different kinds of pointers in the kernel.

3) Conveying the tags in the generated BTF debug info.

This is easy: the main purpose of having this info in BTF is for the compiled eBPF programs. The kernel verifier can then access the tags of pointers used by the eBPF programs.

For more information about these tags and the motivation behind them, please refer to the following Linux kernel discussions:

  https://lore.kernel.org/bpf/20210914223004.244411-1-...@fb.com/
  https://lore.kernel.org/bpf/20211012164838.3345699-1-...@fb.com/
  https://lore.kernel.org/bpf/2022012604.1504583-1-...@fb.com/

What is in this patch series

This patch series adds support for these annotations in GCC. The implementation is largely complete. However, in some cases the produced debug info (both DWARF and BTF) differs significantly from that produced by LLVM. This issue is discussed in detail below, along with a few specific questions for both GCC and LLVM. Any input would be much appreciated.

Hi, David,

Thanks for the RFC implementation! I will answer your questions related to llvm and kernel.

Implementation Overview
=======================

To enable these annotations, two new C language attributes are added: __attribute__((btf_decl_tag("foo"))) and __attribute__((btf_type_tag("bar"))). Both attributes accept a single arbitrary string constant argument, which will be recorded in the generated DWARF and/or BTF debugging information. They have no effect on code generation.

Note that we are using the same attribute names as LLVM, which include "bt
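To make the declaration-level attribute more concrete, the following is a small sketch of how btf_decl_tag might be attached to struct fields and to a function. The struct, the function, the tag strings, and the __decl_tag wrapper macro are all hypothetical names chosen for illustration; the wrapper expands to nothing when the compiler does not report support for the attribute.

```c
#include <assert.h>
#include <stddef.h>

/* Older compilers may lack __has_attribute; treat every attribute as absent. */
#ifndef __has_attribute
#define __has_attribute(x) 0
#endif

/* Hypothetical wrapper macro: uses the LLVM spelling when available. */
#if __has_attribute(btf_decl_tag)
#define __decl_tag(s) __attribute__((btf_decl_tag(s)))
#else
#define __decl_tag(s)
#endif

/* Tags on struct fields: each tag is tied to the declaration of the member,
   not to its type. */
struct packet_stats {
    unsigned long rx_bytes __decl_tag("counter");
    unsigned long tx_bytes __decl_tag("counter");
};

/* Tag on a function declaration. The attribute is not expected to affect
   code generation. */
__decl_tag("hot_path")
unsigned long total_bytes(const struct packet_stats *s)
{
    assert(s != NULL);
    return s->rx_bytes + s->tx_bytes;
}
```

A type-level tag (btf_type_tag) would instead be written inside the type, next to the pointee it describes, as in the do_execve declaration above.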
Re: [PATCH 0/9] Add debug_annotate attributes
Hi, Jose and David,

Any progress on implementing the debug_annotate attributes in gcc?

Thanks,
Yonghong

On 6/15/22 3:56 PM, Yonghong Song wrote:
On 6/15/22 1:57 PM, David Faust wrote:
On 6/14/22 22:53, Yonghong Song wrote:
On 6/7/22 2:43 PM, David Faust wrote:
[...]

Implementation Overview
=======================

To enable these annotations, two new C language attributes are added: __attribute__((debug_annotate_decl("foo"))) and __attribute__((debug_annotate_type("bar"))). Both attributes accept a single arbitrary string constant argument, which will be recorded in the generated DWARF and/or BTF debug information. They have no effect on code generation.

Note that we are not using the same attribute names as LLVM (btf_decl_tag and btf_type_tag, respectively).
While these attributes are functionally very similar, they have grown beyond purely BTF-specific uses, so inclusion of "btf" in the attribute name seems misleading. DWARF support is enabled via a new DW_TAG_GNU_annotation. When generating DWARF, declarations and types w
Re: [ping2][PATCH 0/8][RFC] Support BTF decl_tag and type_tag annotations
On 5/4/22 10:03 AM, David Faust wrote:
On 5/3/22 15:32, Joseph Myers wrote:
On Mon, 2 May 2022, David Faust via Gcc-patches wrote:

Consider the following example:

  #define __typetag1 __attribute__((btf_type_tag("tag1")))
  #define __typetag2 __attribute__((btf_type_tag("tag2")))
  #define __typetag3 __attribute__((btf_type_tag("tag3")))

  int __typetag1 * __typetag2 __typetag3 * g;

The expected behavior is that 'g' is "a pointer with tags 'tag2' and 'tag3', to a pointer with tag 'tag1' to an int". i.e.:

That's not a correct expectation for either GNU __attribute__ or C2x [[]] attribute syntax. In either syntax, __typetag2 __typetag3 should apply to the type to which g points, not to g or its type, just as if you had a type qualifier there. You'd need to put the attributes (or qualifier) after the *, not before, to make them apply to the pointer type. See "Attribute Syntax" in the GCC manual for how the syntax is defined for GNU attributes and deduce in turn, for each subsequence of the tokens matching the syntax for some kind of declarator, what the type for "T D1" would be as defined there and in the C standard, as deduced from the type for "T D" for a sub-declarator D.

But GCC's attribute parsing produces a variable 'g' which is "a pointer with tag 'tag1' to a pointer with tags 'tag2' and 'tag3' to an int", i.e.

In GNU syntax, __typetag1 applies to the declaration, whereas in C2x syntax it applies to int. Again, if you wanted it to apply to the pointer type it would need to go after the * not before.

If you are concerned with the fine details of what construct an attribute appertains to, I recommend using C2x syntax not GNU syntax.

Joseph, thank you! This is very helpful. My understanding of the syntax was not correct.

(Actually, I made a bad mistake in paraphrasing this example from the discussion of it in the series cover letter. But, the reason why it is incorrect is the same.)

Yonghong, is the specific ordering an expectation in BPF programs or other users of the tags?

This is probably a wording issue. We have been saying that tags only apply to the pointer. We probably should say that they only apply to the pointee.

  $ cat t.c
  int const *ptr;

The llvm IR debuginfo:

  !5 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !6, size: 64)
  !6 = !DIDerivedType(tag: DW_TAG_const_type, baseType: !7)
  !7 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)

We could replace 'const' with a tag like below:

  int __attribute__((btf_type_tag("tag"))) *ptr;

  !5 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !6, size: 64, annotations: !7)
  !6 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)
  !7 = !{!8}
  !8 = !{!"btf_type_tag", !"tag"}

In the above IR, we attach the annotations to the pointer_type because we didn't invent a new DI type to encode btf_type_tag. But it is totally okay to have IR that looks like

  !5 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !11, size: 64)
  !11 = !DIBtfTypeTagType(..., baseType: !6, name: !"Tag")
  !6 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)

This example comes from my testing against clang to check that the BTF generated by both toolchains is compatible. In this case we get different results when using the GNU attribute syntax.

To avoid confusion, here is the full example (from the cover letter). The difference in the results is clear in the DWARF.

Consider the following example:

  #define __typetag1 __attribute__((btf_type_tag("type-tag-1")))
  #define __typetag2 __attribute__((btf_type_tag("type-tag-2")))
  #define __typetag3 __attribute__((btf_type_tag("type-tag-3")))

  int __typetag1 * __typetag2 __typetag3 * g;

type 0x774495e8 int> asm_written unsigned DI size unit-size align:64 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 0x77450888 attributes value value 0x77509738> readonly constant static "type-tag-3\000">> chain value value readonly constant static "type-tag-2\000" pointer_to_this > asm_written unsigned DI size unit-size align:64 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 0x77509930 attributes 0x7753a1e0 btf_type_tag> value value 0x77509738> readonly constant static "type-tag-1\000" public static unsigned DI defer-output /home/dfaust/playpen/btf/annotate.c:29:42 size 0x7743c450 64> unit-size align:64 warn_if_not_align:0>

The current implementation produces the following DWARF:

  <1><1e>: Abbrev Number: 4 (DW_TAG_variable)
     <1f>   DW_AT_name        : g
     <21>   DW_AT_decl_file   : 1
     <22>   DW_AT_decl_line   : 6
     <23>   DW_AT_decl_column : 42
     <24>   DW_AT_type
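The placement rule Joseph describes can be sketched with ordinary qualifiers, where the behavior is well known, next to the tagged declarations from the thread. This is only an illustration under stated assumptions: the __tag wrapper macro, the variable h, and the helper deref_both are hypothetical, the macro expands to nothing on compilers that do not report btf_type_tag support, and the comments on the tagged declarations paraphrase the explanation quoted above rather than verified compiler output.

```c
#include <assert.h>
#include <stddef.h>

/* Older compilers may lack __has_attribute; treat every attribute as absent. */
#ifndef __has_attribute
#define __has_attribute(x) 0
#endif

/* Hypothetical wrapper macro around the LLVM spelling. */
#if __has_attribute(btf_type_tag)
#define __tag(s) __attribute__((btf_type_tag(s)))
#else
#define __tag(s)
#endif

/* Qualifier analogy: 'const' before the '*' qualifies the pointee ... */
int const *ptr_to_const_int;
/* ... while 'const' after the '*' qualifies the pointer itself. */
int *const const_ptr_to_int = NULL;

/* The declaration from the thread. Per the explanation above, tag2 and tag3
   follow the first '*', so they belong to the inner pointer type (the type
   g points to), not to g's own type. */
int __tag("tag1") * __tag("tag2") __tag("tag3") * g;

/* Moving the tags after the '*' they should describe: here tag2 and tag3
   follow the second '*' and so belong to h's own pointer type, while tag1
   belongs to the inner pointer type. */
int * __tag("tag1") * __tag("tag2") __tag("tag3") h;

/* Hypothetical helper showing that the tags do not change how the
   variables are used. */
int deref_both(void)
{
    assert(g && *g && h && *h);
    return **g + **h;
}
```

Comparing the debug info emitted for g and h by each compiler would be one way to check where a given toolchain actually attaches the tags.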
Re: [ping2][PATCH 0/8][RFC] Support BTF decl_tag and type_tag annotations
On 5/6/22 2:18 PM, David Faust wrote:
On 5/5/22 16:00, Yonghong Song wrote:
On 5/4/22 10:03 AM, David Faust wrote:
On 5/3/22 15:32, Joseph Myers wrote:
On Mon, 2 May 2022, David Faust via Gcc-patches wrote:
[...]

OK, thanks.

There is still the question of why the DWARF generated for the case that I have been concerned about:

  int __typetag1 * __typetag2 __typetag3 * g;

differs between GCC (with this series) and clang. After studying it, GCC handles the attributes exactly as described in the Attribute Syntax portion of the GCC manual, where the GNU syntax is described. I do not think there is any problem here.

So the difference in DWARF suggests to me that clang is not handling the GNU attribute syntax correctly in this particular case, since it seems to be associating __typetag2 and __typetag3 with g's type rather than with the type to which it points.

I am not sure whether this difference is very important for the intended uses of the tags, but it is worth noting.

As Joseph suggested, it may be better to encourage users of these tags to use the C2x attribute syntax if they are concerned with precisely which construct the tag applies to. This would also be a way around any issues in handling the attributes due to the GNU syntax.

I tried a few test cases using C2x syntax BTF type tags with a clang-15 build, but ran into some issues (in particular, some of the tag attributes being ignored altogether). I couldn't find confirmation whether C2x attribute syntax is fully supported in clang yet, so maybe this isn't expected to work. Do you know whether the C2x syntax is fully supported in clang yet?

Actually, I don't know either. But the btf decl_tag and type_tag are also used when compiling the Linux kernel, and the minimum compiler versions for building the kernel are gcc 5.1 and clang 11. I am not sure whether gcc 5.1 supports c2x or not; I guess probably not. So I think we most likely cannot use c2x syntax.

This example comes from my testing against clang to check that
Re: [ping2][PATCH 0/8][RFC] Support BTF decl_tag and type_tag annotations
On 5/10/22 8:43 PM, Yonghong Song wrote: On 5/6/22 2:18 PM, David Faust wrote: On 5/5/22 16:00, Yonghong Song wrote: On 5/4/22 10:03 AM, David Faust wrote: On 5/3/22 15:32, Joseph Myers wrote: On Mon, 2 May 2022, David Faust via Gcc-patches wrote:

Consider the following example:

    #define __typetag1 __attribute__((btf_type_tag("tag1")))
    #define __typetag2 __attribute__((btf_type_tag("tag2")))
    #define __typetag3 __attribute__((btf_type_tag("tag3")))

    int __typetag1 * __typetag2 __typetag3 * g;

The expected behavior is that 'g' is "a pointer with tags 'tag2' and 'tag3', to a pointer with tag 'tag1' to an int". i.e.:

That's not a correct expectation for either GNU __attribute__ or C2x [[]] attribute syntax. In either syntax, __typetag2 __typetag3 should apply to the type to which g points, not to g or its type, just as if you had a type qualifier there. You'd need to put the attributes (or qualifier) after the *, not before, to make them apply to the pointer type. See "Attribute Syntax" in the GCC manual for how the syntax is defined for GNU attributes and deduce in turn, for each subsequence of the tokens matching the syntax for some kind of declarator, what the type for "T D1" would be as defined there and in the C standard, as deduced from the type for "T D" for a sub-declarator D.

But GCC's attribute parsing produces a variable 'g' which is "a pointer with tag 'tag1' to a pointer with tags 'tag2' and 'tag3' to an int", i.e.

In GNU syntax, __typetag1 applies to the declaration, whereas in C2x syntax it applies to int. Again, if you wanted it to apply to the pointer type it would need to go after the * not before. If you are concerned with the fine details of what construct an attribute appertains to, I recommend using C2x syntax not GNU syntax.

Joseph, thank you! This is very helpful. My understanding of the syntax was not correct. (Actually, I made a bad mistake in paraphrasing this example from the discussion of it in the series cover letter. But the reason why it is incorrect is the same.)

Yonghong, is the specific ordering an expectation in BPF programs or other users of the tags?

This is probably a wording issue. We are saying tags only apply to the pointer. We should probably say they only apply to the pointee.

    $ cat t.c
    int const *ptr;

The LLVM IR debuginfo:

    !5 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !6, size: 64)
    !6 = !DIDerivedType(tag: DW_TAG_const_type, baseType: !7)
    !7 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)

We could replace 'const' with a tag like below:

    int __attribute__((btf_type_tag("tag"))) *ptr;

    !5 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !6, size: 64, annotations: !7)
    !6 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)
    !7 = !{!8}
    !8 = !{!"btf_type_tag", !"tag"}

In the above IR, we attach the annotations to the pointer_type because we didn't invent a new DI type to encode btf_type_tag. But it is totally okay to have the IR look like

    !5 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !11, size: 64)
    !11 = !DIBtfTypeTagType(..., baseType: !6, name: !"Tag")
    !6 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)

OK, thanks. There is still the question of why the DWARF generated for the case that I have been concerned about:

    int __typetag1 * __typetag2 __typetag3 * g;

differs between GCC (with this series) and clang. After studying it, GCC is handling the attributes exactly as described in the Attribute Syntax portion of the GCC manual, where the GNU syntax is defined. I do not think there is any problem here.

So the difference in DWARF suggests to me that clang is not handling the GNU attribute syntax correctly in this particular case, since it seems to be associating __typetag2 and __typetag3 with g's type rather than with the type to which it points.

I am not sure whether this difference is very important for the intended uses of the tags, but it is worth noting.
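To make the placement rule discussed above concrete, here is a minimal sketch that uses a type qualifier as the analogy Joseph mentions. The TYPE_TAG macro, the variable names, and the helper functions are hypothetical and only for illustration; the macro expands to nothing on compilers that do not know btf_type_tag, and the comments on tag placement restate the GCC manual's description of GNU attribute syntax rather than the behavior of any particular compiler.

```c
#include <assert.h>

/* Assumption: fall back to an empty macro when btf_type_tag is unknown. */
#ifdef __has_attribute
# if __has_attribute(btf_type_tag)
#  define TYPE_TAG(s) __attribute__((btf_type_tag(s)))
# endif
#endif
#ifndef TYPE_TAG
# define TYPE_TAG(s)
#endif

/* Qualifier analogy: const follows the first '*', so it qualifies the
   inner pointer (the type g_q points to), not g_q itself. */
int * const * g_q;

/* Same position for the tags: under the GNU syntax as documented in the
   GCC manual they belong to the pointer type formed by the first '*',
   that is, the type to which g_t points. */
int * TYPE_TAG("tag2") TYPE_TAG("tag3") * g_t;

/* To annotate the outer pointer type (the type of the variable itself),
   the attribute goes after the second '*'. */
int * * TYPE_TAG("tag2") g_outer;

/* Returns 1 when g_q has type "pointer to const pointer to int". */
int inner_pointer_is_const(void)
{
    return _Generic(g_q, int * const *: 1, default: 0);
}

/* Small self-check of the qualifier analogy. */
void check_placement(void)
{
    assert(inner_pointer_is_const() == 1);
}
```

The _Generic check only covers the qualifier case; which DWARF or BTF a compiler emits for the tagged declarations still has to be inspected in the debug info.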
As Joseph suggested, it may be better to encourage users of these tags to use the C2x attribute syntax if they are concerned with precisely which construct the tag applies to. This would also be a way around any issues in handling the attributes due to the GNU syntax.

I tried a few test cases using C2x syntax BTF type tags with a clang-15 build, but ran into some issues (in particular, some of the tag attributes being ignored altogether). I couldn't find confirmation whether C2x attribute syntax is fully supported in clang yet, so maybe this isn't expected to work. Do you know whether the C2x syntax is fully supported in clang yet?

Actually, I don't know either. But the btf decl_tag and type_tag are also used when compiling the Linux kernel, and the minimum compiler versions for building the kernel are gcc 5.1 and clang 11. I am not sure whether gcc 5.1 supports C2x; I guess probably not. So I think we most likely cannot use the C2x syntax.

Okay, I think we can gu
Re: [ping2][PATCH 0/8][RFC] Support BTF decl_tag and type_tag annotations
On 5/11/22 11:44 AM, David Faust wrote: On 5/10/22 22:05, Yonghong Song wrote: On 5/10/22 8:43 PM, Yonghong Song wrote: On 5/6/22 2:18 PM, David Faust wrote: On 5/5/22 16:00, Yonghong Song wrote: On 5/4/22 10:03 AM, David Faust wrote: On 5/3/22 15:32, Joseph Myers wrote: On Mon, 2 May 2022, David Faust via Gcc-patches wrote:
Re: [ping2][PATCH 0/8][RFC] Support BTF decl_tag and type_tag annotations
On 5/24/22 4:07 AM, Jose E. Marchesi wrote: On 5/11/22 11:44 AM, David Faust wrote: On 5/10/22 22:05, Yonghong Song wrote: On 5/10/22 8:43 PM, Yonghong Song wrote: On 5/6/22 2:18 PM, David Faust wrote: On 5/5/22 16:00, Yonghong Song wrote: On 5/4/22 10:03 AM, David Faust wrote: On 5/3/22 15:32, Joseph Myers wrote: On Mon, 2 May 2022, David Faust via Gcc-patches wrote:
Re: [ping2][PATCH 0/8][RFC] Support BTF decl_tag and type_tag annotations
On 5/24/22 8:53 AM, David Faust wrote: On 5/24/22 04:07, Jose E. Marchesi wrote: On 5/11/22 11:44 AM, David Faust wrote: On 5/10/22 22:05, Yonghong Song wrote: On 5/10/22 8:43 PM, Yonghong Song wrote: On 5/6/22 2:18 PM, David Faust wrote: On 5/5/22 16:00, Yonghong Song wrote: On 5/4/22 10:03 AM, David Faust wrote: On 5/3/22 15:32, Joseph Myers wrote: On Mon, 2 May 2022, David Faust via Gcc-patches wrote:
Re: [ping2][PATCH 0/8][RFC] Support BTF decl_tag and type_tag annotations
On 5/24/22 10:04 AM, David Faust wrote: On 5/24/22 09:03, Yonghong Song wrote: On 5/24/22 8:53 AM, David Faust wrote: On 5/24/22 04:07, Jose E. Marchesi wrote: On 5/11/22 11:44 AM, David Faust wrote: On 5/10/22 22:05, Yonghong Song wrote: On 5/10/22 8:43 PM, Yonghong Song wrote: On 5/6/22 2:18 PM, David Faust wrote: On 5/5/22 16:00, Yonghong Song wrote: On 5/4/22 10:03 AM, David Faust wrote: On 5/3/22 15:32, Joseph Myers wrote: On Mon, 2 May 2022, David Faust via Gcc-patches wrote:
Re: [ping2][PATCH 0/8][RFC] Support BTF decl_tag and type_tag annotations
On 5/27/22 12:56 PM, David Faust wrote: On 5/26/22 00:29, Yonghong Song wrote: On 5/24/22 10:04 AM, David Faust wrote: On 5/24/22 09:03, Yonghong Song wrote: On 5/24/22 8:53 AM, David Faust wrote: On 5/24/22 04:07, Jose E. Marchesi wrote: On 5/11/22 11:44 AM, David Faust wrote: On 5/10/22 22:05, Yonghong Song wrote: On 5/10/22 8:43 PM, Yonghong Song wrote: On 5/6/22 2:18 PM, David Faust wrote: On 5/5/22 16:00, Yonghong Song wrote: On 5/4/22 10:03 AM, David Faust wrote: On 5/3/22 15:32, Joseph Myers wrote: On Mon, 2 May 2022, David Faust via Gcc-patches wrote:
Re: [PATCH 0/9] Add debug_annotate attributes
On 6/7/22 2:43 PM, David Faust wrote:

Hello,

This patch series adds support for:

- Two new C-language-level attributes that allow associating (that is, "annotating" or "tagging") particular declarations and types with arbitrary strings. As explained below, this is intended to be used to, for example, characterize certain pointer types.

- The conveyance of that information in the DWARF output in the form of a new DIE: DW_TAG_GNU_annotation.

- The conveyance of that information in the BTF output in the form of two new kinds of BTF objects: BTF_KIND_DECL_TAG and BTF_KIND_TYPE_TAG.

All of these facilities are being added to the eBPF ecosystem, and support for them exists in some form in LLVM.

Purpose
=======

1) Addition of C-family language constructs (attributes) to specify free-text tags on certain language elements, such as struct fields.

The purpose of these annotations is to provide additional information about types, variables, and function parameters of interest to the kernel. A driving use case is to tag pointer types within the Linux kernel and eBPF programs with additional semantic information, such as '__user' or '__rcu'.

For example, consider the Linux kernel function do_execve with the following declaration:

    static int do_execve(struct filename *filename,
                         const char __user *const __user *__argv,
                         const char __user *const __user *__envp);

Here, __user could be defined with these annotations to record semantic information about the pointer parameters (e.g., they are user-provided) in DWARF and BTF information. Other kernel facilities such as the eBPF verifier can read the tags and make use of the information.

2) Conveying the tags in the generated DWARF debug info.

The main motivation for emitting the tags in DWARF is that the Linux kernel generates its BTF information via pahole, using DWARF as a source:

    +--------+  BTF           BTF  +----------+
    | pahole |---> vmlinux.btf --->| verifier |
    +--------+                     +----------+
        ^                               ^
        |                               |
  DWARF |                           BTF |
        |                               |
     vmlinux                     +-------------+
     module1.ko                  | BPF program |
     module2.ko                  +-------------+
       ...

This is because:

a) Unlike GCC, LLVM will only generate BTF for BPF programs.

b) GCC can generate BTF for whatever target with -gbtf, but there is no support for linking/deduplicating BTF in the linker.

In the scenario above, the verifier needs access to the pointer tags of both the kernel types/declarations (conveyed in the DWARF and translated to BTF by pahole) and those of the BPF program (available directly in BTF).

Another motivation for having the tag information in DWARF, unrelated to BPF and BTF, is that the drgn project (another DWARF consumer) also wants to benefit from these tags in order to differentiate between different kinds of pointers in the kernel.

3) Conveying the tags in the generated BTF debug info.

This is easy: the main purpose of having this info in BTF is for the compiled eBPF programs. The kernel verifier can then access the tags of pointers used by the eBPF programs.

For more information about these tags and the motivation behind them, please refer to the following Linux kernel discussions:

  https://lore.kernel.org/bpf/20210914223004.244411-1-...@fb.com/
  https://lore.kernel.org/bpf/20211012164838.3345699-1-...@fb.com/
  https://lore.kernel.org/bpf/2022012604.1504583-1-...@fb.com/

Implementation Overview
=======================

To enable these annotations, two new C language attributes are added: __attribute__((debug_annotate_decl("foo"))) and __attribute__((debug_annotate_type("bar"))). Both attributes accept a single arbitrary string constant argument, which will be recorded in the generated DWARF and/or BTF debug information. They have no effect on code generation.

Note that we are not using the same attribute names as LLVM (btf_decl_tag and btf_type_tag, respectively). While these attributes are functionally very similar, they have grown beyond purely BTF-specific uses, so inclusion of "btf" in the attribute name seems misleading.

DWARF support is enabled via a new DW_TAG_GNU_annotation.
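
As a rough illustration of the two attributes named in this overview, the sketch below tags a pointee type and a struct field. It is only a sketch under stated assumptions: the ANNOTATE_TYPE and ANNOTATE_DECL macros, the my_user marker, the tag strings, and struct request are all hypothetical names, and the macros expand to nothing on any compiler that does not provide debug_annotate_type or debug_annotate_decl, so the code compiles either way.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: use the attributes only when the compiler reports them. */
#ifdef __has_attribute
# if __has_attribute(debug_annotate_type)
#  define ANNOTATE_TYPE(s) __attribute__((debug_annotate_type(s)))
# endif
# if __has_attribute(debug_annotate_decl)
#  define ANNOTATE_DECL(s) __attribute__((debug_annotate_decl(s)))
# endif
#endif
#ifndef ANNOTATE_TYPE
# define ANNOTATE_TYPE(s)
#endif
#ifndef ANNOTATE_DECL
# define ANNOTATE_DECL(s)
#endif

/* Hypothetical stand-in for a kernel-style marker such as __user. */
#define my_user ANNOTATE_TYPE("user")

struct request {
    const char my_user *buf;                  /* type tag on the pointee */
    size_t len ANNOTATE_DECL("length_of_buf"); /* declaration tag on a field */
};

/* The attributes have no effect on code generation, so this is plain C. */
size_t request_len(const struct request *r)
{
    assert(r != NULL);
    return r->len;
}
```

Whether and how the tags show up in the output would be checked by inspecting the generated DWARF or BTF, not by running the program.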

When generating DWARF, declarations and types will be checked for the corresponding attributes. If present, a DW_TAG_GNU_annotation DIE will be created as a child of the DIE for the annotated type or declaration, one for each tag. These DIEs link the arbitrary tag value to the item they annotate.

For example, the following variable declaration:

    #define __typetag1 _