Hello,
This patch series adds support for:
- Two new C-language-level attributes that associate ("annotate" or
  "tag") particular declarations and types with arbitrary strings. As
  explained below, this is intended to be used to, for example,
  characterize certain pointer types.
- The conveyance of that information in the DWARF output in the form
  of a new DIE: DW_TAG_GNU_annotation.
- The conveyance of that information in the BTF output in the form of
  two new kinds of BTF objects: BTF_KIND_DECL_TAG and
  BTF_KIND_TYPE_TAG.

All of these facilities are being added to the eBPF ecosystem, and
support for them exists in some form in LLVM.

Purpose
=======
1) Addition of C-family language constructs (attributes) to specify
   free-text tags on certain language elements, such as struct fields.

   The purpose of these annotations is to provide additional
   information about types, variables, and function parameters of
   interest to the kernel. A driving use case is to tag pointer types
   within the Linux kernel and eBPF programs with additional semantic
   information, such as '__user' or '__rcu'.

   For example, consider the Linux kernel function do_execve with the
   following declaration:

       static int do_execve(struct filename *filename,
                            const char __user *const __user *__argv,
                            const char __user *const __user *__envp);

   Here, __user could be defined with these annotations to record
   semantic information about the pointer parameters (e.g., they are
   user-provided) in DWARF and BTF information. Other kernel facilities
   such as the eBPF verifier can read the tags and make use of the
   information.
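
As a rough sketch of how such a definition might look, the block below wraps the proposed type attribute in a macro. USER_TAG and count_args are hypothetical names chosen for illustration (the kernel's own spelling is __user), and the __has_attribute guard is an assumption added so that the code still builds with compilers that do not know debug_annotate_type, in which case the macro expands to nothing.

```c
#include <assert.h>
#include <stddef.h>

/* Older compilers may lack __has_attribute entirely. */
#ifndef __has_attribute
#define __has_attribute(x) 0
#endif

/*
 * USER_TAG is a hypothetical stand-in for the kernel's __user. It uses
 * the type attribute proposed in this series when the compiler knows
 * it, and expands to nothing otherwise.
 */
#if __has_attribute(debug_annotate_type)
#define USER_TAG __attribute__((debug_annotate_type("user")))
#else
#define USER_TAG
#endif

/*
 * Hypothetical helper: counts the entries of a NULL-terminated argument
 * vector. USER_TAG sits in the same positions as __user does in the
 * do_execve declaration.
 */
size_t count_args(const char USER_TAG *const USER_TAG *argv)
{
    size_t n = 0;

    assert(argv != NULL);
    while (argv[n] != NULL)
        n++;
    return n;
}
```

The tag is meant to be visible only in the debug information, so callers use count_args exactly as they would use an untagged function.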
2) Conveying the tags in the generated DWARF debug info.

   The main motivation for emitting the tags in DWARF is that the Linux
   kernel generates its BTF information via pahole, using DWARF as a
   source:

       +--------+  BTF                  BTF   +----------+
       | pahole |-------> vmlinux.btf ------->| verifier |
       +--------+                             +----------+
           ^                                        ^
           |                                        |
     DWARF |                                    BTF |
           |                                        |
        vmlinux                              +-------------+
        module1.ko                           | BPF program |
        module2.ko                           +-------------+
          ...

   This is because:

   a) Unlike GCC, LLVM will only generate BTF for BPF programs.

   b) GCC can generate BTF for any target with -gbtf, but there is no
      support for linking/deduplicating BTF in the linker.

   In the scenario above, the verifier needs access to the pointer tags
   of both the kernel types/declarations (conveyed in the DWARF and
   translated to BTF by pahole) and those of the BPF program (available
   directly in BTF).

   Another motivation for having the tag information in DWARF,
   unrelated to BPF and BTF, is that the drgn project (another DWARF
   consumer) also wants to use these tags to differentiate between
   different kinds of pointers in the kernel.
3) Conveying the tags in the generated BTF debug info.

   This is easy: the main purpose of having this info in BTF is for the
   compiled eBPF programs. The kernel verifier can then access the tags
   of pointers used by the eBPF programs.

For more information about these tags and the motivation behind them,
please refer to the following Linux kernel discussions:
https://lore.kernel.org/bpf/20210914223004.244411-1-...@fb.com/
https://lore.kernel.org/bpf/20211012164838.3345699-1-...@fb.com/
https://lore.kernel.org/bpf/20211112012604.1504583-1-...@fb.com/
Implementation Overview
=======================
To enable these annotations, two new C language attributes are added:
__attribute__((debug_annotate_decl("foo"))) and
__attribute__((debug_annotate_type("bar"))). Both attributes accept a
single arbitrary string constant argument, which will be recorded in
the generated DWARF and/or BTF debug information. They have no effect
on code generation.

Note that we are not using the same attribute names as LLVM
(btf_decl_tag and btf_type_tag, respectively). While these attributes
are functionally very similar, they have grown beyond purely
BTF-specific uses, so inclusion of "btf" in the attribute name seems
misleading.
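
To illustrate the naming difference and the claim that the attributes do not affect code generation, here is a minimal sketch. DECL_TAG, struct plain, struct tagged, and layout_is_unchanged are hypothetical names introduced only for this example. The macro prefers the name proposed in this series, falls back to LLVM's btf_decl_tag when only that is available, and otherwise expands to nothing. The check covers data layout only, which is a narrow proxy for "no effect on code generation".

```c
#include <assert.h>
#include <stddef.h>

/* Older compilers may lack __has_attribute entirely. */
#ifndef __has_attribute
#define __has_attribute(x) 0
#endif

/*
 * DECL_TAG is a hypothetical wrapper. It prefers the attribute name
 * proposed in this series, falls back to LLVM's btf_decl_tag, and
 * expands to nothing when neither attribute is known.
 */
#if __has_attribute(debug_annotate_decl)
#define DECL_TAG(s) __attribute__((debug_annotate_decl(s)))
#elif __has_attribute(btf_decl_tag)
#define DECL_TAG(s) __attribute__((btf_decl_tag(s)))
#else
#define DECL_TAG(s)
#endif

/* Reference layout without any tags. */
struct plain {
    int id;
    char *name;
};

/* Same fields, each carrying one or more declaration tags. */
struct tagged {
    int id DECL_TAG("decltag1");
    char *name DECL_TAG("decltag1") DECL_TAG("decltag2");
};

/* Returns 1 when tagging left the size and field offsets unchanged. */
int layout_is_unchanged(void)
{
    return sizeof(struct tagged) == sizeof(struct plain)
        && offsetof(struct tagged, id) == offsetof(struct plain, id)
        && offsetof(struct tagged, name) == offsetof(struct plain, name);
}
```

A wrapper of this kind lets the same source build with either compiler while the attribute names differ.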

DWARF support is enabled via a new DW_TAG_GNU_annotation. When
generating DWARF, declarations and types will be checked for the
corresponding attributes. If present, a DW_TAG_GNU_annotation DIE will
be created as a child of the DIE for the annotated type or declaration,
one for each tag. These DIEs link the arbitrary tag value to the item
they annotate.

For example, the following variable declaration:

    #define __typetag1 __attribute__((debug_annotate_type ("typetag1")))
    #define __decltag1 __attribute__((debug_annotate_decl ("decltag1")))
    #define __decltag2 __attribute__((debug_annotate_decl ("decltag2")))

    int * __typetag1 x __decltag1 __decltag2;