[clang] c7b7638 - [dfsan][NFC] Add compile flags and environment variables to doc

2021-07-26 Thread Jianzhou Zhao via cfe-commits

Author: Jianzhou Zhao
Date: 2021-07-27T00:20:22Z
New Revision: c7b7638dfee54053553d9b22eeb8912ca42a06ec

URL: https://github.com/llvm/llvm-project/commit/c7b7638dfee54053553d9b22eeb8912ca42a06ec
DIFF: https://github.com/llvm/llvm-project/commit/c7b7638dfee54053553d9b22eeb8912ca42a06ec.diff

LOG: [dfsan][NFC] Add compile flags and environment variables to doc

Reviewed By: gbalats

Differential Revision: https://reviews.llvm.org/D106833

Added: 


Modified: 
clang/docs/DataFlowSanitizer.rst

Removed: 




diff --git a/clang/docs/DataFlowSanitizer.rst b/clang/docs/DataFlowSanitizer.rst
index 143b6e3d3242..dbe62e3b6aa0 100644
--- a/clang/docs/DataFlowSanitizer.rst
+++ b/clang/docs/DataFlowSanitizer.rst
@@ -137,6 +137,88 @@ For example:
   fun:memcpy=uninstrumented
   fun:memcpy=custom
 
+Compilation Flags
+-----------------
+
+* ``-dfsan-abilist`` -- The additional ABI list files that control how shadow
+  parameters are passed. File names are separated by comma.
+* ``-dfsan-combine-pointer-labels-on-load`` -- Controls whether to include or
+  ignore the labels of pointers in load instructions. Its default value is true.
+  For example:
+
+.. code-block:: c++
+  v = *p;
+
+If the flag is true, the label of ``v`` is the union of the label of ``p`` and
+the label of ``*p``. If the flag is false, the label of ``v`` is the label of
+just ``*p``.
+* ``-dfsan-combine-pointer-labels-on-store`` -- Controls whether to include or
+  ignore the labels of pointers in store instructions. Its default value is
+  false. For example:
+
+.. code-block:: c++
+  *p = v;
+
+If the flag is true, the label of ``*p`` is the union of the label of ``p`` and
+the label of ``v``. If the flag is false, the label of ``*p`` is the label of
+just ``v``.
+* ``-dfsan-combine-offset-labels-on-gep`` -- Controls whether to propagate
+  labels of offsets in GEP instructions. Its default value is true. For example:
+
+.. code-block:: c++
+  p += i;
+
+If the flag is true, the label of ``p`` is the union of the label of ``p`` and
+the label of ``i``. If the flag is false, the label of ``p`` is unchanged.
+* ``-dfsan-track-select-control-flow`` -- Controls whether to track the control
+  flow of select instructions. Its default value is true. For example:
+
+.. code-block:: c++
+  v = b? v1: v2;
+
+If the flag is true, the label of ``v`` is the union of the labels of ``b``,
+``v1`` and ``v2``.  If the flag is false, the label of ``v`` is the union of the
+labels of just ``v1`` and ``v2``.
+* ``-dfsan-event-callbacks`` -- An experimental feature that inserts callbacks for
+certain data events. Currently callbacks are only inserted for loads, stores,
+memory transfers (i.e. memcpy and memmove), and comparisons. Its default value
+is false. If this flag is set to true, a user must provide definitions for the
+following callback functions:
+
+.. code-block:: c++
+  void __dfsan_load_callback(dfsan_label Label, void* Addr);
+  void __dfsan_store_callback(dfsan_label Label, void* Addr);
+  void __dfsan_mem_transfer_callback(dfsan_label *Start, size_t Len);
+  void __dfsan_cmp_callback(dfsan_label CombinedLabel);
+* ``-dfsan-track-origins`` -- Controls how to track origins. When its value is
+  0, the runtime does not track origins. When its value is 1, the runtime tracks
+  origins at memory store operations. When its value is 2, the runtime tracks
+  origins at memory load and store operations. Its default value is 0.
+* ``-dfsan-instrument-with-call-threshold`` -- If a function being instrumented
+   requires more than this number of origin stores, use callbacks instead of
+  inline checks (-1 means never use callbacks). Its default value is 3500.
+
+Environment Variables
+---------------------
+
+* ``warn_unimplemented`` -- Whether to warn on unimplemented functions. Its
+  default value is false.
+* ``strict_data_dependencies`` -- Whether to propagate labels only when there is
+  explicit obvious data dependency (e.g., when comparing strings, ignore the fact
+  that the output of the comparison might be implicit data-dependent on the
+  content of the strings). This applies only to functions with ``custom`` category
+  in ABI list. Its default value is true.
+* ``origin_history_size`` -- The limit of origin chain length. Non-positive values
+  mean unlimited. Its default value is 16.
+* ``origin_history_per_stack_limit`` -- The limit of origin node's references count.
+  Non-positive values mean unlimited. Its default value is 2.
+* ``store_context_size`` -- The depth limit of origin tracking stack traces. Its
+  default value is 20.
+* ``zero_in_malloc`` -- Whether to zero shadow space of new allocated memory. Its
+  default value is true.
+* ``zero_in_free`` --- Whether to zero shadow space of deallocated memory. Its
+  default value is true.
+
 Example
 =======
 




[clang] e69a8c4 - [dfsan] Fix doc build errors

2021-07-26 Thread Jianzhou Zhao via cfe-commits

Author: Jianzhou Zhao
Date: 2021-07-27T00:29:55Z
New Revision: e69a8c42135606e60446d5e78144357a9e429c77

URL: https://github.com/llvm/llvm-project/commit/e69a8c42135606e60446d5e78144357a9e429c77
DIFF: https://github.com/llvm/llvm-project/commit/e69a8c42135606e60446d5e78144357a9e429c77.diff

LOG: [dfsan] Fix doc build errors

Added: 


Modified: 
clang/docs/DataFlowSanitizer.rst

Removed: 




diff --git a/clang/docs/DataFlowSanitizer.rst b/clang/docs/DataFlowSanitizer.rst
index dbe62e3b6aa0..c21f9a922603 100644
--- a/clang/docs/DataFlowSanitizer.rst
+++ b/clang/docs/DataFlowSanitizer.rst
@@ -147,6 +147,7 @@ Compilation Flags
   For example:
 
 .. code-block:: c++
+
   v = *p;
 
 If the flag is true, the label of ``v`` is the union of the label of ``p`` and
@@ -157,6 +158,7 @@ just ``*p``.
   false. For example:
 
 .. code-block:: c++
+
   *p = v;
 
 If the flag is true, the label of ``*p`` is the union of the label of ``p`` and
@@ -166,6 +168,7 @@ just ``v``.
   labels of offsets in GEP instructions. Its default value is true. For example:
 
 .. code-block:: c++
+
   p += i;
 
 If the flag is true, the label of ``p`` is the union of the label of ``p`` and
@@ -174,6 +177,7 @@ the label of ``i``. If the flag is false, the label of ``p`` is unchanged.
   flow of select instructions. Its default value is true. For example:
 
 .. code-block:: c++
+
   v = b? v1: v2;
 
 If the flag is true, the label of ``v`` is the union of the labels of ``b``,
@@ -186,6 +190,7 @@ is false. If this flag is set to true, a user must provide definitions for the
 following callback functions:
 
 .. code-block:: c++
+
   void __dfsan_load_callback(dfsan_label Label, void* Addr);
   void __dfsan_store_callback(dfsan_label Label, void* Addr);
   void __dfsan_mem_transfer_callback(dfsan_label *Start, size_t Len);



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] 494f1e6 - [dfsan][NFC] Fix doc format

2021-07-26 Thread Jianzhou Zhao via cfe-commits

Author: Jianzhou Zhao
Date: 2021-07-27T02:07:53Z
New Revision: 494f1e6706481ec49942c07ebf48697872919612

URL: https://github.com/llvm/llvm-project/commit/494f1e6706481ec49942c07ebf48697872919612
DIFF: https://github.com/llvm/llvm-project/commit/494f1e6706481ec49942c07ebf48697872919612.diff

LOG: [dfsan][NFC] Fix doc format

Added: 


Modified: 
clang/docs/DataFlowSanitizer.rst

Removed: 




diff --git a/clang/docs/DataFlowSanitizer.rst b/clang/docs/DataFlowSanitizer.rst
index c21f9a922603..cb4837bdc788 100644
--- a/clang/docs/DataFlowSanitizer.rst
+++ b/clang/docs/DataFlowSanitizer.rst
@@ -153,6 +153,7 @@ Compilation Flags
 If the flag is true, the label of ``v`` is the union of the label of ``p`` and
 the label of ``*p``. If the flag is false, the label of ``v`` is the label of
 just ``*p``.
+
 * ``-dfsan-combine-pointer-labels-on-store`` -- Controls whether to include or
   ignore the labels of pointers in store instructions. Its default value is
   false. For example:
@@ -164,6 +165,7 @@ just ``*p``.
 If the flag is true, the label of ``*p`` is the union of the label of ``p`` and
 the label of ``v``. If the flag is false, the label of ``*p`` is the label of
 just ``v``.
+
 * ``-dfsan-combine-offset-labels-on-gep`` -- Controls whether to propagate
   labels of offsets in GEP instructions. Its default value is true. For example:
 
@@ -173,6 +175,7 @@ just ``v``.
 
 If the flag is true, the label of ``p`` is the union of the label of ``p`` and
 the label of ``i``. If the flag is false, the label of ``p`` is unchanged.
+
 * ``-dfsan-track-select-control-flow`` -- Controls whether to track the control
   flow of select instructions. Its default value is true. For example:
 
@@ -183,6 +186,7 @@ the label of ``i``. If the flag is false, the label of ``p`` is unchanged.
 If the flag is true, the label of ``v`` is the union of the labels of ``b``,
 ``v1`` and ``v2``.  If the flag is false, the label of ``v`` is the union of the
 labels of just ``v1`` and ``v2``.
+
 * ``-dfsan-event-callbacks`` -- An experimental feature that inserts callbacks for
 certain data events. Currently callbacks are only inserted for loads, stores,
 memory transfers (i.e. memcpy and memmove), and comparisons. Its default value
@@ -195,10 +199,12 @@ following callback functions:
   void __dfsan_store_callback(dfsan_label Label, void* Addr);
   void __dfsan_mem_transfer_callback(dfsan_label *Start, size_t Len);
   void __dfsan_cmp_callback(dfsan_label CombinedLabel);
+
 * ``-dfsan-track-origins`` -- Controls how to track origins. When its value is
   0, the runtime does not track origins. When its value is 1, the runtime tracks
   origins at memory store operations. When its value is 2, the runtime tracks
   origins at memory load and store operations. Its default value is 0.
+
 * ``-dfsan-instrument-with-call-threshold`` -- If a function being instrumented
    requires more than this number of origin stores, use callbacks instead of
   inline checks (-1 means never use callbacks). Its default value is 3500.





[clang] 531b19a - [dfsan][NFC] Fix doc format

2021-07-26 Thread Jianzhou Zhao via cfe-commits

Author: Jianzhou Zhao
Date: 2021-07-27T04:22:20Z
New Revision: 531b19a49e66de5c4b35fc89eebc078c13eb9a85

URL: https://github.com/llvm/llvm-project/commit/531b19a49e66de5c4b35fc89eebc078c13eb9a85
DIFF: https://github.com/llvm/llvm-project/commit/531b19a49e66de5c4b35fc89eebc078c13eb9a85.diff

LOG: [dfsan][NFC] Fix doc format

Added: 


Modified: 
clang/docs/DataFlowSanitizer.rst

Removed: 




diff --git a/clang/docs/DataFlowSanitizer.rst b/clang/docs/DataFlowSanitizer.rst
index cb4837bdc788..1253cb98e634 100644
--- a/clang/docs/DataFlowSanitizer.rst
+++ b/clang/docs/DataFlowSanitizer.rst
@@ -188,10 +188,10 @@ If the flag is true, the label of ``v`` is the union of the labels of ``b``,
 labels of just ``v1`` and ``v2``.
 
 * ``-dfsan-event-callbacks`` -- An experimental feature that inserts callbacks for
-certain data events. Currently callbacks are only inserted for loads, stores,
-memory transfers (i.e. memcpy and memmove), and comparisons. Its default value
-is false. If this flag is set to true, a user must provide definitions for the
-following callback functions:
+  certain data events. Currently callbacks are only inserted for loads, stores,
+  memory transfers (i.e. memcpy and memmove), and comparisons. Its default value
+  is false. If this flag is set to true, a user must provide definitions for the
+  following callback functions:
 
 .. code-block:: c++
 
@@ -206,7 +206,7 @@ following callback functions:
   origins at memory load and store operations. Its default value is 0.
 
 * ``-dfsan-instrument-with-call-threshold`` -- If a function being instrumented
-   requires more than this number of origin stores, use callbacks instead of
+  requires more than this number of origin stores, use callbacks instead of
   inline checks (-1 means never use callbacks). Its default value is 3500.
 
 Environment Variables





[clang] 00411eb - [dfsan][NFC] Update API interfaces

2021-07-27 Thread Jianzhou Zhao via cfe-commits

Author: Jianzhou Zhao
Date: 2021-07-27T18:53:36Z
New Revision: 00411ebeeb718da63d1ec0e0ffc8e5012e474fe9

URL: https://github.com/llvm/llvm-project/commit/00411ebeeb718da63d1ec0e0ffc8e5012e474fe9
DIFF: https://github.com/llvm/llvm-project/commit/00411ebeeb718da63d1ec0e0ffc8e5012e474fe9.diff

LOG: [dfsan][NFC] Update API interfaces

Reviewed By: gbalats

Differential Revision: https://reviews.llvm.org/D106895

Added: 


Modified: 
clang/docs/DataFlowSanitizerDesign.rst

Removed: 




diff --git a/clang/docs/DataFlowSanitizerDesign.rst b/clang/docs/DataFlowSanitizerDesign.rst
index 7615a2acc58b..ea40fe332010 100644
--- a/clang/docs/DataFlowSanitizerDesign.rst
+++ b/clang/docs/DataFlowSanitizerDesign.rst
@@ -48,12 +48,79 @@ file ``sanitizer/dfsan_interface.h``.
   /// value.
   dfsan_label dfsan_get_label(long data);
 
+  /// Retrieves the label associated with the data at the given address.
+  dfsan_label dfsan_read_label(const void *addr, size_t size);
+
   /// Returns whether the given label label contains the label elem.
   int dfsan_has_label(dfsan_label label, dfsan_label elem);
 
   /// Computes the union of \c l1 and \c l2, resulting in a union label.
   dfsan_label dfsan_union(dfsan_label l1, dfsan_label l2);
 
+  /// Flushes the DFSan shadow, i.e. forgets about all labels currently associated
+  /// with the application memory.  Use this call to start over the taint tracking
+  /// within the same process.
+  ///
+  /// Note: If another thread is working with tainted data during the flush, that
+  /// taint could still be written to shadow after the flush.
+  void dfsan_flush(void);
+
+The following functions are provided to check origin tracking status and results.
+
+.. code-block:: c
+
+  /// Retrieves the immediate origin associated with the given data. The returned
+  /// origin may point to another origin.
+  ///
+  /// The type of 'data' is arbitrary. The function accepts a value of any type,
+  /// which can be truncated or extended (implicitly or explicitly) as necessary.
+  /// The truncation/extension operations will preserve the label of the original
+  /// value.
+  dfsan_origin dfsan_get_origin(long data);
+
+  /// Retrieves the very first origin associated with the data at the given
+  /// address.
+  dfsan_origin dfsan_get_init_origin(const void *addr);
+
+  /// Prints the origin trace of the label at the address `addr` to stderr. It also
+  /// prints description at the beginning of the trace. If origin tracking is not
+  /// on, or the address is not labeled, it prints nothing.
+  void dfsan_print_origin_trace(const void *addr, const char *description);
+
+  /// Prints the origin trace of the label at the address `addr` to a pre-allocated
+  /// output buffer. If origin tracking is not on, or the address is
+  /// not labeled, it prints nothing.
+  ///
+  /// `addr` is the tainted memory address whose origin we are printing.
+  /// `description` is a description printed at the beginning of the trace.
+  /// `out_buf` is the output buffer to write the results to. `out_buf_size` is
+  /// the size of `out_buf`. The function returns the number of symbols that
+  /// should have been written to `out_buf` (not including trailing null byte '\0').
+  /// Thus, the string is truncated iff return value is not less than `out_buf_size`.
+  size_t dfsan_sprint_origin_trace(const void *addr, const char *description,
+                                   char *out_buf, size_t out_buf_size);
+
+  /// Returns the value of `-dfsan-track-origins`.
+  int dfsan_get_track_origins(void);
+
+The following functions are provided to register hooks called by custom wrappers.
+
+.. code-block:: c
+
+  /// Sets a callback to be invoked on calls to `write`.  The callback is invoked
+  /// before the write is done. The write is not guaranteed to succeed when the
+  /// callback executes. Pass in NULL to remove any callback.
+  typedef void (*dfsan_write_callback_t)(int fd, const void *buf, size_t count);
+  void dfsan_set_write_callback(dfsan_write_callback_t labeled_write_callback);
+
+  /// Callbacks to be invoked on calls to `memcmp` or `strncmp`.
+  void dfsan_weak_hook_memcmp(void *caller_pc, const void *s1, const void *s2,
+                              size_t n, dfsan_label s1_label,
+                              dfsan_label s2_label, dfsan_label n_label);
+  void dfsan_weak_hook_strncmp(void *caller_pc, const char *s1, const char *s2,
+                               size_t n, dfsan_label s1_label,
+                               dfsan_label s2_label, dfsan_label n_label);
+
 Taint label representation
 --------------------------
 





[clang] c49df15 - [dfsan][NFC] Describe how origin trace tracking works

2021-07-27 Thread Jianzhou Zhao via cfe-commits

Author: Jianzhou Zhao
Date: 2021-07-27T21:10:39Z
New Revision: c49df15c278857adecd12db6bb1cdc96885f7079

URL: https://github.com/llvm/llvm-project/commit/c49df15c278857adecd12db6bb1cdc96885f7079
DIFF: https://github.com/llvm/llvm-project/commit/c49df15c278857adecd12db6bb1cdc96885f7079.diff

LOG: [dfsan][NFC] Describe how origin trace tracking works

Reviewed By: gbalats

Differential Revision: https://reviews.llvm.org/D106903

Added: 


Modified: 
clang/docs/DataFlowSanitizerDesign.rst

Removed: 




diff --git a/clang/docs/DataFlowSanitizerDesign.rst b/clang/docs/DataFlowSanitizerDesign.rst
index ea40fe332010..bed4d2f38cba 100644
--- a/clang/docs/DataFlowSanitizerDesign.rst
+++ b/clang/docs/DataFlowSanitizerDesign.rst
@@ -135,6 +135,35 @@ Users are responsible for managing the 8 integer labels (i.e., keeping
 track of what labels they have used so far, picking one that is yet
 unused, etc).
 
+Origin tracking trace representation
+------------------------------------
+
+An origin tracking trace is a list of chains. Each chain has a stack trace
+where the DFSan runtime records a label propagation, and a pointer to its
+previous chain. The very first chain does not point to any chain.
+
+Every four 4-bytes aligned application bytes share a 4-byte origin trace ID. A
+4-byte origin trace ID contains a 4-bit depth and a 28-bit hash ID of a chain.
+
+A chain ID is calculated as a hash from a chain structure. A chain structure
+contains a stack ID and the previous chain ID. The chain head has a zero
+previous chain ID. A stack ID is a hash from a stack trace. The 4-bit depth
+limits the maximal length of a path. The environment variable ``origin_history_size``
+can set the depth limit. Non-positive values mean unlimited. Its default value
+is 16. When reaching the limit, origin tracking ignores following propagation
+chains.
+
+The first chain of a trace starts by `dfsan_set_label` with non-zero labels. A
+new chain is appended at the end of a trace at stores or memory transfers when
+``-dfsan-track-origins`` is 1. Memory transfers include LLVM memory transfer
+instructions, glibc memcpy and memmove. When ``-dfsan-track-origins`` is 2, a
+new chain is also appended at loads.
+
+Other instructions do not create new chains, but simply propagate origin trace
+IDs. If an instruction has more than one operand with non-zero labels, the origin
+trace ID of the last operand with non-zero label is propagated to the result of
+the instruction.
+
 Memory layout and label management
 ----------------------------------
 





[clang] 71dc0f1 - [dfsan][NFC] Add Origin Tracking into doc

2021-07-07 Thread Jianzhou Zhao via cfe-commits

Author: Jianzhou Zhao
Date: 2021-07-07T18:13:26Z
New Revision: 71dc0f1c02cd00a431fc327b0ea86524fad28afe

URL: https://github.com/llvm/llvm-project/commit/71dc0f1c02cd00a431fc327b0ea86524fad28afe
DIFF: https://github.com/llvm/llvm-project/commit/71dc0f1c02cd00a431fc327b0ea86524fad28afe.diff

LOG: [dfsan][NFC] Add Origin Tracking into doc

Reviewed By: morehouse

Differential Revision: https://reviews.llvm.org/D105378

Added: 


Modified: 
clang/docs/DataFlowSanitizer.rst

Removed: 




diff --git a/clang/docs/DataFlowSanitizer.rst b/clang/docs/DataFlowSanitizer.rst
index 8bbc2534ad4db..143b6e3d3242e 100644
--- a/clang/docs/DataFlowSanitizer.rst
+++ b/clang/docs/DataFlowSanitizer.rst
@@ -191,6 +191,44 @@ the correct labels are propagated.
 return 0;
   }
 
+Origin Tracking
+===============
+
+DataFlowSanitizer can track origins of labeled values. This feature is enabled by
+``-mllvm -dfsan-track-origins=1``. For example,
+
+.. code-block:: console
+
+    % cat test.cc
+    #include <sanitizer/dfsan_interface.h>
+    #include <stdio.h>
+
+    int main(int argc, char** argv) {
+      int i = 0;
+      dfsan_set_label(i_label, &i, sizeof(i));
+      int j = i + 1;
+      dfsan_print_origin_trace(&j, "A flow from i to j");
+      return 0;
+    }
+
+    % clang++ -fsanitize=dataflow -mllvm -dfsan-track-origins=1 -fno-omit-frame-pointer -g -O2 test.cc
+    % ./a.out
+    Taint value 0x1 (at 0x7ffd42bf415c) origin tracking (A flow from i to j)
+    Origin value: 0x1391, Taint value was stored to memory at
+      #0 0x55676db85a62 in main test.cc:7:7
+      #1 0x7f0083611bbc in __libc_start_main libc-start.c:285
+
+    Origin value: 0x9e1, Taint value was created at
+      #0 0x55676db85a08 in main test.cc:6:3
+      #1 0x7f0083611bbc in __libc_start_main libc-start.c:285
+
+By ``-mllvm -dfsan-track-origins=1`` DataFlowSanitizer collects only
+intermediate stores a labeled value went through. Origin tracking slows down
+program execution by a factor of 2x on top of the usual DataFlowSanitizer
+slowdown and increases memory overhead by 1x. By ``-mllvm -dfsan-track-origins=2``
+DataFlowSanitizer also collects intermediate loads a labeled value went through.
+This mode slows down program execution by a factor of 4x.
+
 Current status
 ==============
 


