stephan.yichao.zhao added inline comments.
================ Comment at: clang/docs/DataFlowSanitizer.rst:172-178 assert(ij_label == 3); // Verifies all of the above + // Or, equivalently: + assert(dfsan_has_label(ij_label, i_label)); + assert(dfsan_has_label(ij_label, j_label)); + assert(!dfsan_has_label(ij_label, k_label)); + ---------------- If we swap assert(ij_label == 3) with the 3 dfsan_has_label, the two equivalent blocks are close to each other. ================ Comment at: clang/docs/DataFlowSanitizerDesign.rst:60 -As stated above, the tool must track a large number of taint -labels. This poses an implementation challenge, as most multiple-label -tainting systems assign one label per bit to shadow storage, and -union taint labels using a bitwise or operation. This will not scale -to clients which use hundreds or thousands of taint labels, as the -label union operation becomes O(n) in the number of supported labels, -and data associated with it will quickly dominate the live variable -set, causing register spills and hampering performance. - -Instead, a low overhead approach is proposed which is best-case O(log\ -:sub:`2` n) during execution. The underlying assumption is that -the required space of label unions is sparse, which is a reasonable -assumption to make given that we are optimizing for the case where -applications mostly copy data from one place to another, without often -invoking the need for an actual union operation. The representation -of a taint label is a 16-bit integer, and new labels are allocated -sequentially from a pool. The label identifier 0 is special, and means -that the data item is unlabelled. - -When a label union operation is requested at a join point (any -arithmetic or logical operation with two or more operands, such as -addition), the code checks whether a union is required, whether the -same union has been requested before, and whether one union label -subsumes the other. If so, it returns the previously allocated union -label. If not, it allocates a new union label from the same pool used -for new labels. - -Specifically, the instrumentation pass will insert code like this -to decide the union label ``lu`` for a pair of labels ``l1`` -and ``l2``: - -.. code-block:: c - - if (l1 == l2) - lu = l1; - else - lu = __dfsan_union(l1, l2); - -The equality comparison is outlined, to provide an early exit in -the common cases where the program is processing unlabelled data, or -where the two data items have the same label. ``__dfsan_union`` is -a runtime library function which performs all other union computation. +We use an 8-bit unsigned integers for the representation of a +label. The label identifier 0 is special, and means that the data item ---------------- integer ================ Comment at: clang/docs/DataFlowSanitizerDesign.rst:65 +join point (any arithmetic or logical operation with two or more +operands, such as addition), we can simply OR the two labels in O(1). ---------------- the labels, and each OR is in O(1). ================ Comment at: clang/docs/DataFlowSanitizerDesign.rst:68 +Users are responsible for managing the 8 integer labels (i.e., keeping +track of what labels they have used so far, pick one that is yet +unused, etc). ---------------- picking ================ Comment at: clang/docs/DataFlowSanitizerDesign.rst:74 The following is the current memory layout for Linux/x86\_64: ---------------- memory layout ================ Comment at: clang/docs/DataFlowSanitizerDesign.rst:99 associated directly with registers. Loads will result in a union of -all shadow labels corresponding to bytes loaded (which most of the -time will be short circuited by the initial comparison) and stores will -result in a copy of the label to the shadow of all bytes stored to. +all shadow labels corresponding to bytes loaded and stores will result +in a copy of the label to the shadow of all bytes stored to. ---------------- , and ================ Comment at: clang/docs/DataFlowSanitizerDesign.rst:100 +all shadow labels corresponding to bytes loaded and stores will result +in a copy of the label to the shadow of all bytes stored to. ---------------- the label of a stored value ================ Comment at: compiler-rt/lib/dfsan/dfsan.cpp:209 -// Like __dfsan_union, but for use from the client or custom functions. Hence -// the equality comparison is done here before calling __dfsan_union. +// Resolves the union of two unequal labels. SANITIZER_INTERFACE_ATTRIBUTE dfsan_label ---------------- After removing legacy mode. if our code does not check l1 != l2 in IR, the comments can be updated. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D103745/new/ https://reviews.llvm.org/D103745 _______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits