[clang] c7b7638 - [dfsan][NFC] Add compile flags and environment variables to doc
Author: Jianzhou Zhao Date: 2021-07-27T00:20:22Z New Revision: c7b7638dfee54053553d9b22eeb8912ca42a06ec URL: https://github.com/llvm/llvm-project/commit/c7b7638dfee54053553d9b22eeb8912ca42a06ec DIFF: https://github.com/llvm/llvm-project/commit/c7b7638dfee54053553d9b22eeb8912ca42a06ec.diff LOG: [dfsan][NFC] Add compile flags and environment variables to doc Reviewed By: gbalats Differential Revision: https://reviews.llvm.org/D106833 Added: Modified: clang/docs/DataFlowSanitizer.rst Removed: diff --git a/clang/docs/DataFlowSanitizer.rst b/clang/docs/DataFlowSanitizer.rst index 143b6e3d3242..dbe62e3b6aa0 100644 --- a/clang/docs/DataFlowSanitizer.rst +++ b/clang/docs/DataFlowSanitizer.rst @@ -137,6 +137,88 @@ For example: fun:memcpy=uninstrumented fun:memcpy=custom +Compilation Flags +- + +* ``-dfsan-abilist`` -- The additional ABI list files that control how shadow + parameters are passed. File names are separated by comma. +* ``-dfsan-combine-pointer-labels-on-load`` -- Controls whether to include or + ignore the labels of pointers in load instructions. Its default value is true. + For example: + +.. code-block:: c++ + v = *p; + +If the flag is true, the label of ``v`` is the union of the label of ``p`` and +the label of ``*p``. If the flag is false, the label of ``v`` is the label of +just ``*p``. +* ``-dfsan-combine-pointer-labels-on-store`` -- Controls whether to include or + ignore the labels of pointers in store instructions. Its default value is + false. For example: + +.. code-block:: c++ + *p = v; + +If the flag is true, the label of ``*p`` is the union of the label of ``p`` and +the label of ``v``. If the flag is false, the label of ``*p`` is the label of +just ``v``. +* ``-dfsan-combine-offset-labels-on-gep`` -- Controls whether to propagate + labels of offsets in GEP instructions. Its default value is true. For example: + +.. code-block:: c++ + p += i; + +If the flag is true, the label of ``p`` is the union of the label of ``p`` and +the label of ``i``. 
If the flag is false, the label of ``p`` is unchanged. +* ``-dfsan-track-select-control-flow`` -- Controls whether to track the control + flow of select instructions. Its default value is true. For example: + +.. code-block:: c++ + v = b? v1: v2; + +If the flag is true, the label of ``v`` is the union of the labels of ``b``, +``v1`` and ``v2``. If the flag is false, the label of ``v`` is the union of the +labels of just ``v1`` and ``v2``. +* ``-dfsan-event-callbacks`` -- An experimental feature that inserts callbacks for +certain data events. Currently callbacks are only inserted for loads, stores, +memory transfers (i.e. memcpy and memmove), and comparisons. Its default value +is false. If this flag is set to true, a user must provide definitions for the +following callback functions: + +.. code-block:: c++ + void __dfsan_load_callback(dfsan_label Label, void* Addr); + void __dfsan_store_callback(dfsan_label Label, void* Addr); + void __dfsan_mem_transfer_callback(dfsan_label *Start, size_t Len); + void __dfsan_cmp_callback(dfsan_label CombinedLabel); +* ``-dfsan-track-origins`` -- Controls how to track origins. When its value is + 0, the runtime does not track origins. When its value is 1, the runtime tracks + origins at memory store operations. When its value is 2, the runtime tracks + origins at memory load and store operations. Its default value is 0. +* ``-dfsan-instrument-with-call-threshold`` -- If a function being instrumented + requires more than this number of origin stores, use callbacks instead of + inline checks (-1 means never use callbacks). Its default value is 3500. + +Environment Variables +- + +* ``warn_unimplemented`` -- Whether to warn on unimplemented functions. Its + default value is false. 
+* ``strict_data_dependencies`` -- Whether to propagate labels only when there is + explicit obvious data dependency (e.g., when comparing strings, ignore the fact + that the output of the comparison might be implicit data-dependent on the + content of the strings). This applies only to functions with ``custom`` category + in ABI list. Its default value is true. +* ``origin_history_size`` -- The limit of origin chain length. Non-positive values + mean unlimited. Its default value is 16. +* ``origin_history_per_stack_limit`` -- The limit of origin node's references count. + Non-positive values mean unlimited. Its default value is 2. +* ``store_context_size`` -- The depth limit of origin tracking stack traces. Its + default value is 20. +* ``zero_in_malloc`` -- Whether to zero shadow space of new allocated memory. Its + default value is true. +* ``zero_in_free`` --- Whether to zero shadow space of deallocated memory. Its + default value is true. + Example === ___ cfe-commits mai
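The four label-combining flags in this commit can be summarised as a small pure function per instruction kind. What follows is a toy model, not DFSan code: the `Flags` struct, the `label_after_*` helpers, and the choice to represent a label as a bitmask whose union is bitwise OR are all assumptions made for illustration. Only the flag names and their default values come from the documentation text above.

```cpp
#include <cassert>
#include <cstdint>

// Toy model (assumption): a label is a small bitmask, union is bitwise OR.
using Label = std::uint8_t;

// Hypothetical mirror of the documented flags, with their documented defaults.
struct Flags {
  bool combine_pointer_labels_on_load = true;
  bool combine_pointer_labels_on_store = false;
  bool combine_offset_labels_on_gep = true;
  bool track_select_control_flow = true;
};

// v = *p;   result is the label of v
Label label_after_load(const Flags &f, Label p, Label pointee) {
  return f.combine_pointer_labels_on_load ? static_cast<Label>(p | pointee)
                                          : pointee;
}

// *p = v;   result is the label of *p
Label label_after_store(const Flags &f, Label p, Label v) {
  return f.combine_pointer_labels_on_store ? static_cast<Label>(p | v) : v;
}

// p += i;   result is the new label of p
Label label_after_gep(const Flags &f, Label p, Label i) {
  return f.combine_offset_labels_on_gep ? static_cast<Label>(p | i) : p;
}

// v = b ? v1 : v2;   result is the label of v
Label label_after_select(const Flags &f, Label b, Label v1, Label v2) {
  Label data = static_cast<Label>(v1 | v2);
  return f.track_select_control_flow ? static_cast<Label>(b | data) : data;
}
```

Under the default flags, a load through a labeled pointer picks up the pointer's label, while a store through a labeled pointer does not; flipping a single field in `Flags` shows the effect of the corresponding `-dfsan-*` flag in this model.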
[clang] e69a8c4 - [dfsan] Fix doc build errors
Author: Jianzhou Zhao Date: 2021-07-27T00:29:55Z New Revision: e69a8c42135606e60446d5e78144357a9e429c77 URL: https://github.com/llvm/llvm-project/commit/e69a8c42135606e60446d5e78144357a9e429c77 DIFF: https://github.com/llvm/llvm-project/commit/e69a8c42135606e60446d5e78144357a9e429c77.diff LOG: [dfsan] Fix doc build errors Added: Modified: clang/docs/DataFlowSanitizer.rst Removed: diff --git a/clang/docs/DataFlowSanitizer.rst b/clang/docs/DataFlowSanitizer.rst index dbe62e3b6aa0..c21f9a922603 100644 --- a/clang/docs/DataFlowSanitizer.rst +++ b/clang/docs/DataFlowSanitizer.rst @@ -147,6 +147,7 @@ Compilation Flags For example: .. code-block:: c++ + v = *p; If the flag is true, the label of ``v`` is the union of the label of ``p`` and @@ -157,6 +158,7 @@ just ``*p``. false. For example: .. code-block:: c++ + *p = v; If the flag is true, the label of ``*p`` is the union of the label of ``p`` and @@ -166,6 +168,7 @@ just ``v``. labels of offsets in GEP instructions. Its default value is true. For example: .. code-block:: c++ + p += i; If the flag is true, the label of ``p`` is the union of the label of ``p`` and @@ -174,6 +177,7 @@ the label of ``i``. If the flag is false, the label of ``p`` is unchanged. flow of select instructions. Its default value is true. For example: .. code-block:: c++ + v = b? v1: v2; If the flag is true, the label of ``v`` is the union of the labels of ``b``, @@ -186,6 +190,7 @@ is false. If this flag is set to true, a user must provide definitions for the following callback functions: .. code-block:: c++ + void __dfsan_load_callback(dfsan_label Label, void* Addr); void __dfsan_store_callback(dfsan_label Label, void* Addr); void __dfsan_mem_transfer_callback(dfsan_label *Start, size_t Len); ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] 494f1e6 - [dfsan][NFC] Fix doc format
Author: Jianzhou Zhao Date: 2021-07-27T02:07:53Z New Revision: 494f1e6706481ec49942c07ebf48697872919612 URL: https://github.com/llvm/llvm-project/commit/494f1e6706481ec49942c07ebf48697872919612 DIFF: https://github.com/llvm/llvm-project/commit/494f1e6706481ec49942c07ebf48697872919612.diff LOG: [dfsan][NFC] Fix doc format Added: Modified: clang/docs/DataFlowSanitizer.rst Removed: diff --git a/clang/docs/DataFlowSanitizer.rst b/clang/docs/DataFlowSanitizer.rst index c21f9a922603..cb4837bdc788 100644 --- a/clang/docs/DataFlowSanitizer.rst +++ b/clang/docs/DataFlowSanitizer.rst @@ -153,6 +153,7 @@ Compilation Flags If the flag is true, the label of ``v`` is the union of the label of ``p`` and the label of ``*p``. If the flag is false, the label of ``v`` is the label of just ``*p``. + * ``-dfsan-combine-pointer-labels-on-store`` -- Controls whether to include or ignore the labels of pointers in store instructions. Its default value is false. For example: @@ -164,6 +165,7 @@ just ``*p``. If the flag is true, the label of ``*p`` is the union of the label of ``p`` and the label of ``v``. If the flag is false, the label of ``*p`` is the label of just ``v``. + * ``-dfsan-combine-offset-labels-on-gep`` -- Controls whether to propagate labels of offsets in GEP instructions. Its default value is true. For example: @@ -173,6 +175,7 @@ just ``v``. If the flag is true, the label of ``p`` is the union of the label of ``p`` and the label of ``i``. If the flag is false, the label of ``p`` is unchanged. + * ``-dfsan-track-select-control-flow`` -- Controls whether to track the control flow of select instructions. Its default value is true. For example: @@ -183,6 +186,7 @@ the label of ``i``. If the flag is false, the label of ``p`` is unchanged. If the flag is true, the label of ``v`` is the union of the labels of ``b``, ``v1`` and ``v2``. If the flag is false, the label of ``v`` is the union of the labels of just ``v1`` and ``v2``. 
+ * ``-dfsan-event-callbacks`` -- An experimental feature that inserts callbacks for certain data events. Currently callbacks are only inserted for loads, stores, memory transfers (i.e. memcpy and memmove), and comparisons. Its default value @@ -195,10 +199,12 @@ following callback functions: void __dfsan_store_callback(dfsan_label Label, void* Addr); void __dfsan_mem_transfer_callback(dfsan_label *Start, size_t Len); void __dfsan_cmp_callback(dfsan_label CombinedLabel); + * ``-dfsan-track-origins`` -- Controls how to track origins. When its value is 0, the runtime does not track origins. When its value is 1, the runtime tracks origins at memory store operations. When its value is 2, the runtime tracks origins at memory load and store operations. Its default value is 0. + * ``-dfsan-instrument-with-call-threshold`` -- If a function being instrumented requires more than this number of origin stores, use callbacks instead of inline checks (-1 means never use callbacks). Its default value is 3500.
[clang] 531b19a - [dfsan][NFC] Fix doc format
Author: Jianzhou Zhao Date: 2021-07-27T04:22:20Z New Revision: 531b19a49e66de5c4b35fc89eebc078c13eb9a85 URL: https://github.com/llvm/llvm-project/commit/531b19a49e66de5c4b35fc89eebc078c13eb9a85 DIFF: https://github.com/llvm/llvm-project/commit/531b19a49e66de5c4b35fc89eebc078c13eb9a85.diff LOG: [dfsan][NFC] Fix doc format Added: Modified: clang/docs/DataFlowSanitizer.rst Removed: diff --git a/clang/docs/DataFlowSanitizer.rst b/clang/docs/DataFlowSanitizer.rst index cb4837bdc788..1253cb98e634 100644 --- a/clang/docs/DataFlowSanitizer.rst +++ b/clang/docs/DataFlowSanitizer.rst @@ -188,10 +188,10 @@ If the flag is true, the label of ``v`` is the union of the labels of ``b``, labels of just ``v1`` and ``v2``. * ``-dfsan-event-callbacks`` -- An experimental feature that inserts callbacks for -certain data events. Currently callbacks are only inserted for loads, stores, -memory transfers (i.e. memcpy and memmove), and comparisons. Its default value -is false. If this flag is set to true, a user must provide definitions for the -following callback functions: + certain data events. Currently callbacks are only inserted for loads, stores, + memory transfers (i.e. memcpy and memmove), and comparisons. Its default value + is false. If this flag is set to true, a user must provide definitions for the + following callback functions: .. code-block:: c++ @@ -206,7 +206,7 @@ following callback functions: origins at memory load and store operations. Its default value is 0. * ``-dfsan-instrument-with-call-threshold`` -- If a function being instrumented - requires more than this number of origin stores, use callbacks instead of + requires more than this number of origin stores, use callbacks instead of inline checks (-1 means never use callbacks). Its default value is 3500. Environment Variables
[clang] 00411eb - [dfsan][NFC] Update API interfaces
Author: Jianzhou Zhao Date: 2021-07-27T18:53:36Z New Revision: 00411ebeeb718da63d1ec0e0ffc8e5012e474fe9 URL: https://github.com/llvm/llvm-project/commit/00411ebeeb718da63d1ec0e0ffc8e5012e474fe9 DIFF: https://github.com/llvm/llvm-project/commit/00411ebeeb718da63d1ec0e0ffc8e5012e474fe9.diff LOG: [dfsan][NFC] Update API interfaces Reviewed By: gbalats Differential Revision: https://reviews.llvm.org/D106895 Added: Modified: clang/docs/DataFlowSanitizerDesign.rst Removed: diff --git a/clang/docs/DataFlowSanitizerDesign.rst b/clang/docs/DataFlowSanitizerDesign.rst index 7615a2acc58b..ea40fe332010 100644 --- a/clang/docs/DataFlowSanitizerDesign.rst +++ b/clang/docs/DataFlowSanitizerDesign.rst @@ -48,12 +48,79 @@ file ``sanitizer/dfsan_interface.h``. /// value. dfsan_label dfsan_get_label(long data); + /// Retrieves the label associated with the data at the given address. + dfsan_label dfsan_read_label(const void *addr, size_t size); + /// Returns whether the given label label contains the label elem. int dfsan_has_label(dfsan_label label, dfsan_label elem); /// Computes the union of \c l1 and \c l2, resulting in a union label. dfsan_label dfsan_union(dfsan_label l1, dfsan_label l2); + /// Flushes the DFSan shadow, i.e. forgets about all labels currently associated + /// with the application memory. Use this call to start over the taint tracking + /// within the same process. + /// + /// Note: If another thread is working with tainted data during the flush, that + /// taint could still be written to shadow after the flush. + void dfsan_flush(void); + +The following functions are provided to check origin tracking status and results. + +.. code-block:: c + + /// Retrieves the immediate origin associated with the given data. The returned + /// origin may point to another origin. + /// + /// The type of 'data' is arbitrary. The function accepts a value of any type, + /// which can be truncated or extended (implicitly or explicitly) as necessary. 
+ /// The truncation/extension operations will preserve the label of the original + /// value. + dfsan_origin dfsan_get_origin(long data); + + /// Retrieves the very first origin associated with the data at the given + /// address. + dfsan_origin dfsan_get_init_origin(const void *addr); + + /// Prints the origin trace of the label at the address `addr` to stderr. It also + /// prints description at the beginning of the trace. If origin tracking is not + /// on, or the address is not labeled, it prints nothing. + void dfsan_print_origin_trace(const void *addr, const char *description); + + /// Prints the origin trace of the label at the address `addr` to a pre-allocated + /// output buffer. If origin tracking is not on, or the address is + /// not labeled, it prints nothing. + /// + /// `addr` is the tainted memory address whose origin we are printing. + /// `description` is a description printed at the beginning of the trace. + /// `out_buf` is the output buffer to write the results to. `out_buf_size` is + /// the size of `out_buf`. The function returns the number of symbols that + /// should have been written to `out_buf` (not including trailing null byte '\0'). + /// Thus, the string is truncated iff return value is not less than `out_buf_size`. + size_t dfsan_sprint_origin_trace(const void *addr, const char *description, + char *out_buf, size_t out_buf_size); + + /// Returns the value of `-dfsan-track-origins`. + int dfsan_get_track_origins(void); + +The following functions are provided to register hooks called by custom wrappers. + +.. code-block:: c + + /// Sets a callback to be invoked on calls to `write`. The callback is invoked + /// before the write is done. The write is not guaranteed to succeed when the + /// callback executes. Pass in NULL to remove any callback. 
+ typedef void (*dfsan_write_callback_t)(int fd, const void *buf, size_t count); + void dfsan_set_write_callback(dfsan_write_callback_t labeled_write_callback); + + /// Callbacks to be invoked on calls to `memcmp` or `strncmp`. + void dfsan_weak_hook_memcmp(void *caller_pc, const void *s1, const void *s2, + size_t n, dfsan_label s1_label, + dfsan_label s2_label, dfsan_label n_label); + void dfsan_weak_hook_strncmp(void *caller_pc, const char *s1, const char *s2, + size_t n, dfsan_label s1_label, + dfsan_label s2_label, dfsan_label n_label); + Taint label representation --
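The return-value contract documented for `dfsan_sprint_origin_trace` (return the length that should have been written, excluding the trailing null; truncation occurred iff that value is not less than the buffer size) is the same convention `snprintf` uses. The sketch below illustrates the resulting "try a small buffer, then resize" pattern. It deliberately does not call DFSan: `fake_sprint_trace`, `was_truncated`, and `render_trace` are hypothetical names, and the stand-in formatter is built on `std::snprintf` so the example compiles without `-fsanitize=dataflow`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Truncation test implied by the documented contract:
// the output was cut short iff returned length >= buffer size.
bool was_truncated(std::size_t returned_len, std::size_t buf_size) {
  return returned_len >= buf_size;
}

// Hypothetical stand-in for dfsan_sprint_origin_trace. It follows the same
// convention: returns the length that should have been written, not
// counting the trailing '\0'.
std::size_t fake_sprint_trace(const char *description, char *out_buf,
                              std::size_t out_buf_size) {
  int n = std::snprintf(out_buf, out_buf_size, "trace: %s", description);
  return n < 0 ? 0 : static_cast<std::size_t>(n);
}

// Two-pass pattern: try a small stack buffer first, and only allocate a
// buffer of the reported size when the first attempt was truncated.
std::string render_trace(const char *description) {
  char small[8];
  std::size_t needed = fake_sprint_trace(description, small, sizeof(small));
  if (!was_truncated(needed, sizeof(small)))
    return std::string(small);
  std::string full(needed + 1, '\0');  // +1 leaves room for the '\0'
  fake_sprint_trace(description, &full[0], full.size());
  full.resize(needed);                 // drop the trailing '\0'
  return full;
}
```

With the real API, the same check would presumably be applied to the value returned by `dfsan_sprint_origin_trace` against the `out_buf_size` that was passed in.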
[clang] c49df15 - [dfsan][NFC] Describe how origin trace tracking works
Author: Jianzhou Zhao Date: 2021-07-27T21:10:39Z New Revision: c49df15c278857adecd12db6bb1cdc96885f7079 URL: https://github.com/llvm/llvm-project/commit/c49df15c278857adecd12db6bb1cdc96885f7079 DIFF: https://github.com/llvm/llvm-project/commit/c49df15c278857adecd12db6bb1cdc96885f7079.diff LOG: [dfsan][NFC] Describe how origin trace tracking works Reviewed By: gbalats Differential Revision: https://reviews.llvm.org/D106903 Added: Modified: clang/docs/DataFlowSanitizerDesign.rst Removed: diff --git a/clang/docs/DataFlowSanitizerDesign.rst b/clang/docs/DataFlowSanitizerDesign.rst index ea40fe332010..bed4d2f38cba 100644 --- a/clang/docs/DataFlowSanitizerDesign.rst +++ b/clang/docs/DataFlowSanitizerDesign.rst @@ -135,6 +135,35 @@ Users are responsible for managing the 8 integer labels (i.e., keeping track of what labels they have used so far, picking one that is yet unused, etc). +Origin tracking trace representation + + +An origin tracking trace is a list of chains. Each chain has a stack trace +where the DFSan runtime records a label propagation, and a pointer to its +previous chain. The very first chain does not point to any chain. + +Every four 4-byte-aligned application bytes share a 4-byte origin trace ID. A +4-byte origin trace ID contains a 4-bit depth and a 28-bit hash ID of a chain. + +A chain ID is calculated as a hash from a chain structure. A chain structure +contains a stack ID and the previous chain ID. The chain head has a zero +previous chain ID. A stack ID is a hash from a stack trace. The 4-bit depth +limits the maximal length of a path. The environment variable ``origin_history_size`` +can set the depth limit. Non-positive values mean unlimited. Its default value +is 16. When reaching the limit, origin tracking ignores following propagation +chains. + +The first chain of a trace starts by `dfsan_set_label` with non-zero labels. A +new chain is appended at the end of a trace at stores or memory transfers when +``-dfsan-track-origins`` is 1. 
Memory transfers include LLVM memory transfer +instructions, glibc memcpy and memmove. When ``-dfsan-track-origins`` is 2, a +new chain is also appended at loads. + +Other instructions do not create new chains, but simply propagate origin trace +IDs. If an instruction has more than one operand with non-zero labels, the origin +trace ID of the last operand with non-zero label is propagated to the result of +the instruction. + Memory layout and label management --
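The origin trace ID layout described in this commit (a 4-bit depth plus a 28-bit chain hash, shared by each group of four 4-byte-aligned application bytes) can be sketched as plain bit manipulation. Several details below are assumptions rather than statements about the runtime: that the depth sits in the top four bits, that an ID of zero means "no origin", the mixing function in `toy_chain_hash`, and every function name. The sketch also models only a positive depth limit.

```cpp
#include <cassert>
#include <cstdint>

// Assumed layout: [ depth: 4 bits | chain hash: 28 bits ].
constexpr std::uint32_t kHashBits = 28;
constexpr std::uint32_t kHashMask = (1u << kHashBits) - 1;
constexpr std::uint32_t kMaxDepth = 15;  // largest value that fits in 4 bits

constexpr std::uint32_t make_origin_id(std::uint32_t depth,
                                       std::uint32_t chain_hash) {
  return ((depth & 0xFu) << kHashBits) | (chain_hash & kHashMask);
}
constexpr std::uint32_t origin_depth(std::uint32_t id) { return id >> kHashBits; }
constexpr std::uint32_t origin_hash(std::uint32_t id) { return id & kHashMask; }

// Four aligned application bytes share one origin ID, so the group an
// address belongs to is the address rounded down to a multiple of 4.
constexpr std::uintptr_t origin_group_base(std::uintptr_t app_addr) {
  return app_addr & ~std::uintptr_t{3};
}

// Toy hash over (stack ID, previous chain ID); the text above does not
// specify the real hash function. Never returns 0 so that 0 can keep
// meaning "no origin" in this sketch.
std::uint32_t toy_chain_hash(std::uint32_t stack_id, std::uint32_t prev_id) {
  std::uint32_t h = (stack_id * 2654435761u ^ (prev_id + 0x9e3779b9u)) & kHashMask;
  return h == 0 ? 1 : h;
}

// Appends a chain to the trace identified by prev_id. Once the depth limit
// is reached the previous ID is returned unchanged, i.e. further
// propagation is ignored. Models positive limits only.
std::uint32_t append_chain(std::uint32_t prev_id, std::uint32_t stack_id,
                           std::uint32_t depth_limit) {
  std::uint32_t depth = prev_id == 0 ? 0 : origin_depth(prev_id) + 1;
  if (depth >= depth_limit || depth > kMaxDepth)
    return prev_id;
  return make_origin_id(depth, toy_chain_hash(stack_id, prev_id));
}
```

In this model a trace started by `append_chain(0, stack_id, limit)` has depth 0, each later store or memory transfer adds one, and a call that would exceed `limit` leaves the ID untouched, which mirrors the documented behaviour of `origin_history_size`.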
[clang] 71dc0f1 - [dfsan][NFC] Add Origin Tracking into doc
Author: Jianzhou Zhao Date: 2021-07-07T18:13:26Z New Revision: 71dc0f1c02cd00a431fc327b0ea86524fad28afe URL: https://github.com/llvm/llvm-project/commit/71dc0f1c02cd00a431fc327b0ea86524fad28afe DIFF: https://github.com/llvm/llvm-project/commit/71dc0f1c02cd00a431fc327b0ea86524fad28afe.diff LOG: [dfsan][NFC] Add Origin Tracking into doc Reviewed By: morehouse Differential Revision: https://reviews.llvm.org/D105378 Added: Modified: clang/docs/DataFlowSanitizer.rst Removed: diff --git a/clang/docs/DataFlowSanitizer.rst b/clang/docs/DataFlowSanitizer.rst index 8bbc2534ad4db..143b6e3d3242e 100644 --- a/clang/docs/DataFlowSanitizer.rst +++ b/clang/docs/DataFlowSanitizer.rst @@ -191,6 +191,44 @@ the correct labels are propagated. return 0; } +Origin Tracking +=== + +DataFlowSanitizer can track origins of labeled values. This feature is enabled by +``-mllvm -dfsan-track-origins=1``. For example, + +.. code-block:: console + +% cat test.cc +#include +#include + +int main(int argc, char** argv) { + int i = 0; + dfsan_set_label(i_label, &i, sizeof(i)); + int j = i + 1; + dfsan_print_origin_trace(&j, "A flow from i to j"); + return 0; +} + +% clang++ -fsanitize=dataflow -mllvm -dfsan-track-origins=1 -fno-omit-frame-pointer -g -O2 test.cc +% ./a.out +Taint value 0x1 (at 0x7ffd42bf415c) origin tracking (A flow from i to j) +Origin value: 0x1391, Taint value was stored to memory at + #0 0x55676db85a62 in main test.cc:7:7 + #1 0x7f0083611bbc in __libc_start_main libc-start.c:285 + +Origin value: 0x9e1, Taint value was created at + #0 0x55676db85a08 in main test.cc:6:3 + #1 0x7f0083611bbc in __libc_start_main libc-start.c:285 + +By ``-mllvm -dfsan-track-origins=1`` DataFlowSanitizer collects only +intermediate stores a labeled value went through. Origin tracking slows down +program execution by a factor of 2x on top of the usual DataFlowSanitizer +slowdown and increases memory overhead by 1x. 
By ``-mllvm -dfsan-track-origins=2`` +DataFlowSanitizer also collects intermediate loads a labeled value went through. +This mode slows down program execution by a factor of 4x. + Current status ==