[PATCH] D106833: [dfsan][NFC] Add compile flags and environment variables to doc

2021-07-26 Thread stephan.yichao.zhao via Phabricator via cfe-commits
stephan.yichao.zhao created this revision.
stephan.yichao.zhao added a reviewer: gbalats.
stephan.yichao.zhao requested review of this revision.
Herald added a project: clang.
Herald added a subscriber: cfe-commits.

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D106833

Files:
  clang/docs/DataFlowSanitizer.rst


Index: clang/docs/DataFlowSanitizer.rst
===
--- clang/docs/DataFlowSanitizer.rst
+++ clang/docs/DataFlowSanitizer.rst
@@ -137,6 +137,88 @@
   fun:memcpy=uninstrumented
   fun:memcpy=custom
 
+Compilation Flags
+-----------------
+
+* ``-dfsan-abilist`` -- The additional ABI list files that control how shadow
+  parameters are passed. File names are separated by comma.
+* ``-dfsan-combine-pointer-labels-on-load`` -- Controls whether to include or
+  ignore the labels of pointers in load instructions. Its default value is true.
+  For example:
+
+.. code-block:: c++
+  v = *p;
+
+If the flag is true, the label of ``v`` is the union of the label of ``p`` and
+the label of ``*p``. If the flag is false, the label of ``v`` is the label of
+``*p``.
+* ``-dfsan-combine-pointer-labels-on-store`` -- Controls whether to include or
+  ignore the labels of pointers in store instructions. Its default value is
+  false. For example:
+
+.. code-block:: c++
+  *p = v;
+
+If the flag is true, the label of ``*p`` is the union of the label of ``p`` and
+the label of ``v``. If the flag is false, the label of ``*p`` is the label of
+``v``.
+* ``-dfsan-combine-offset-labels-on-gep`` -- Controls whether to propagate
+  labels of offsets in GEP instructions. Its default value is true. For example:
+
+.. code-block:: c++
+  p += i;
+
+If the flag is true, the label of ``p`` is the union of the label of ``p`` and
+the label of ``i``. If the flag is false, the label of ``p`` is unchanged.
+* ``-dfsan-track-select-control-flow`` -- Controls whether to track the control
+  flow of select instructions. Its default value is true. For example:
+
+.. code-block:: c++
+  v = b? v1: v2;
+
+If the flag is true, the label of ``v`` is the union of the labels of ``b``,
+``v1`` and ``v2``.  If the flag is false, the label of ``v`` is the union of the
+labels of ``v1`` and ``v2``.
+* ``-dfsan-event-callbacks`` -- An experimental feature that inserts callbacks for
+  certain data events. Currently callbacks are only inserted for loads, stores,
+  memory transfers (i.e. memcpy and memmove), and comparisons. Its default value
+  is false. If this flag is set to true, a user must provide definitions for the
+  following callback functions:
+
+.. code-block:: c++
+  void __dfsan_load_callback(dfsan_label Label, void* Addr);
+  void __dfsan_store_callback(dfsan_label Label, void* Addr);
+  void __dfsan_mem_transfer_callback(dfsan_label *Start, size_t Len);
+  void __dfsan_cmp_callback(dfsan_label CombinedLabel);
+* ``-dfsan-track-origins`` -- Controls how to track origins. When its value is
+  0, the runtime does not track origins. When its value is 1, the runtime tracks
+  origins at memory store operations. When its value is 2, the runtime tracks
+  origins at memory load and store operations. Its default value is 0.
+* ``-dfsan-instrument-with-call-threshold`` -- If a function being instrumented
+  requires more than this number of origin stores, use callbacks instead of
+  inline checks (-1 means never use callbacks). Its default value is 3500.
+
+Environment Variables
+---------------------
+
+* ``warn_unimplemented`` -- Whether to warn on unimplemented functions. Its
+  default value is false.
+* ``strict_data_dependencies`` -- Whether to propagate labels only when there is
+  an explicit, obvious data dependency (e.g., when comparing strings, ignore the fact
+  that the output of the comparison might be implicitly data-dependent on the
+  content of the strings). This applies only to functions with the ``custom`` category
+  in the ABI list. Its default value is true.
+* ``origin_history_size`` -- The limit of origin chain length. Non-positive values
+  mean unlimited. Its default value is 16.
+* ``origin_history_per_stack_limit`` -- The limit of an origin node's reference count.
+  Non-positive values mean unlimited. Its default value is 2.
+* ``store_context_size`` -- The depth limit of origin tracking stack traces. Its
+  default value is 20.
+* ``zero_in_malloc`` -- Whether to zero the shadow space of newly allocated memory. Its
+  default value is true.
+* ``zero_in_free`` -- Whether to zero the shadow space of deallocated memory. Its
+  default value is true.
+
 Example
 ===
 



[PATCH] D106833: [dfsan][NFC] Add compile flags and environment variables to doc

2021-07-26 Thread stephan.yichao.zhao via Phabricator via cfe-commits
stephan.yichao.zhao updated this revision to Diff 361859.
stephan.yichao.zhao marked 2 inline comments as done.
stephan.yichao.zhao retitled this revision from " [dfsan][NFC] Add compile flags and environment variables to doc" to "[dfsan][NFC] Add compile flags and environment variables to doc".
stephan.yichao.zhao added a comment.

applied comments


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D106833/new/

https://reviews.llvm.org/D106833

Files:
  clang/docs/DataFlowSanitizer.rst


Index: clang/docs/DataFlowSanitizer.rst
===
--- clang/docs/DataFlowSanitizer.rst
+++ clang/docs/DataFlowSanitizer.rst
@@ -137,6 +137,88 @@
   fun:memcpy=uninstrumented
   fun:memcpy=custom
 
+Compilation Flags
+-----------------
+
+* ``-dfsan-abilist`` -- The additional ABI list files that control how shadow
+  parameters are passed. File names are separated by comma.
+* ``-dfsan-combine-pointer-labels-on-load`` -- Controls whether to include or
+  ignore the labels of pointers in load instructions. Its default value is true.
+  For example:
+
+.. code-block:: c++
+  v = *p;
+
+If the flag is true, the label of ``v`` is the union of the label of ``p`` and
+the label of ``*p``. If the flag is false, the label of ``v`` is the label of
+just ``*p``.
+* ``-dfsan-combine-pointer-labels-on-store`` -- Controls whether to include or
+  ignore the labels of pointers in store instructions. Its default value is
+  false. For example:
+
+.. code-block:: c++
+  *p = v;
+
+If the flag is true, the label of ``*p`` is the union of the label of ``p`` and
+the label of ``v``. If the flag is false, the label of ``*p`` is the label of
+just ``v``.
+* ``-dfsan-combine-offset-labels-on-gep`` -- Controls whether to propagate
+  labels of offsets in GEP instructions. Its default value is true. For example:
+
+.. code-block:: c++
+  p += i;
+
+If the flag is true, the label of ``p`` is the union of the label of ``p`` and
+the label of ``i``. If the flag is false, the label of ``p`` is unchanged.
+* ``-dfsan-track-select-control-flow`` -- Controls whether to track the control
+  flow of select instructions. Its default value is true. For example:
+
+.. code-block:: c++
+  v = b? v1: v2;
+
+If the flag is true, the label of ``v`` is the union of the labels of ``b``,
+``v1`` and ``v2``.  If the flag is false, the label of ``v`` is the union of the
+labels of just ``v1`` and ``v2``.
+* ``-dfsan-event-callbacks`` -- An experimental feature that inserts callbacks for
+  certain data events. Currently callbacks are only inserted for loads, stores,
+  memory transfers (i.e. memcpy and memmove), and comparisons. Its default value
+  is false. If this flag is set to true, a user must provide definitions for the
+  following callback functions:
+
+.. code-block:: c++
+  void __dfsan_load_callback(dfsan_label Label, void* Addr);
+  void __dfsan_store_callback(dfsan_label Label, void* Addr);
+  void __dfsan_mem_transfer_callback(dfsan_label *Start, size_t Len);
+  void __dfsan_cmp_callback(dfsan_label CombinedLabel);
+* ``-dfsan-track-origins`` -- Controls how to track origins. When its value is
+  0, the runtime does not track origins. When its value is 1, the runtime tracks
+  origins at memory store operations. When its value is 2, the runtime tracks
+  origins at memory load and store operations. Its default value is 0.
+* ``-dfsan-instrument-with-call-threshold`` -- If a function being instrumented
+  requires more than this number of origin stores, use callbacks instead of
+  inline checks (-1 means never use callbacks). Its default value is 3500.
+
+Environment Variables
+---------------------
+
+* ``warn_unimplemented`` -- Whether to warn on unimplemented functions. Its
+  default value is false.
+* ``strict_data_dependencies`` -- Whether to propagate labels only when there is
+  an explicit, obvious data dependency (e.g., when comparing strings, ignore the fact
+  that the output of the comparison might be implicitly data-dependent on the
+  content of the strings). This applies only to functions with the ``custom`` category
+  in the ABI list. Its default value is true.
+* ``origin_history_size`` -- The limit of origin chain length. Non-positive values
+  mean unlimited. Its default value is 16.
+* ``origin_history_per_stack_limit`` -- The limit of an origin node's reference count.
+  Non-positive values mean unlimited. Its default value is 2.
+* ``store_context_size`` -- The depth limit of origin tracking stack traces. Its
+  default value is 20.
+* ``zero_in_malloc`` -- Whether to zero the shadow space of newly allocated memory. Its
+  default value is true.
+* ``zero_in_free`` -- Whether to zero the shadow space of deallocated memory. Its
+  default value is true.
+
 Example
 ===
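
Editorial illustration of the label-combining flags documented in this patch. The sketch models labels as 8-bit bitmasks whose union is a bitwise OR. That representation, and the names `label_t`, `label_union`, `load_label`, `store_label`, `gep_label`, and `select_label`, are assumptions made for this sketch and are not part of the DFSan API. Each boolean parameter stands in for the corresponding compilation flag.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical label type for this sketch: one bit per base label. */
typedef uint8_t label_t;

/* Union of two labels, modeled as a bitwise OR (an assumption). */
label_t label_union(label_t a, label_t b) { return (label_t)(a | b); }

/* v = *p;  combine_on_load models -dfsan-combine-pointer-labels-on-load. */
label_t load_label(label_t ptr, label_t pointee, bool combine_on_load) {
  return combine_on_load ? label_union(ptr, pointee) : pointee;
}

/* *p = v;  combine_on_store models -dfsan-combine-pointer-labels-on-store. */
label_t store_label(label_t ptr, label_t value, bool combine_on_store) {
  return combine_on_store ? label_union(ptr, value) : value;
}

/* p += i;  combine_on_gep models -dfsan-combine-offset-labels-on-gep. */
label_t gep_label(label_t ptr, label_t offset, bool combine_on_gep) {
  return combine_on_gep ? label_union(ptr, offset) : ptr;
}

/* v = b ? v1 : v2;  track_select models -dfsan-track-select-control-flow. */
label_t select_label(label_t cond, label_t v1, label_t v2, bool track_select) {
  label_t values = label_union(v1, v2);
  return track_select ? label_union(cond, values) : values;
}
```

With a pointer labeled 1 and a pointee labeled 2, `load_label` yields 3 when the flag is on and 2 when it is off, which mirrors the "union of the label of p and the label of *p" wording in the patch.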
 



[PATCH] D106895: [dfsan][NFC] Update API interfaces

2021-07-27 Thread stephan.yichao.zhao via Phabricator via cfe-commits
stephan.yichao.zhao created this revision.
stephan.yichao.zhao added a reviewer: gbalats.
stephan.yichao.zhao requested review of this revision.
Herald added a project: clang.
Herald added a subscriber: cfe-commits.

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D106895

Files:
  clang/docs/DataFlowSanitizerDesign.rst


Index: clang/docs/DataFlowSanitizerDesign.rst
===
--- clang/docs/DataFlowSanitizerDesign.rst
+++ clang/docs/DataFlowSanitizerDesign.rst
@@ -48,12 +48,79 @@
   /// value.
   dfsan_label dfsan_get_label(long data);
 
+  /// Retrieves the label associated with the data at the given address.
+  dfsan_label dfsan_read_label(const void *addr, size_t size);
+
   /// Returns whether the given label label contains the label elem.
   int dfsan_has_label(dfsan_label label, dfsan_label elem);
 
   /// Computes the union of \c l1 and \c l2, resulting in a union label.
   dfsan_label dfsan_union(dfsan_label l1, dfsan_label l2);
 
+  /// Flushes the DFSan shadow, i.e. forgets about all labels currently associated
+  /// with the application memory.  Use this call to start over the taint tracking
+  /// within the same process.
+  ///
+  /// Note: If another thread is working with tainted data during the flush, that
+  /// taint could still be written to shadow after the flush.
+  void dfsan_flush(void);
+
+The following functions are provided to check origin tracking status and results.
+
+.. code-block:: c
+
+  /// Retrieves the immediate origin associated with the given data. The returned
+  /// origin may point to another origin.
+  ///
+  /// The type of 'data' is arbitrary. The function accepts a value of any type,
+  /// which can be truncated or extended (implicitly or explicitly) as necessary.
+  /// The truncation/extension operations will preserve the label of the original
+  /// value.
+  dfsan_origin dfsan_get_origin(long data);
+
+  /// Retrieves the very first origin associated with the data at the given
+  /// address.
+  dfsan_origin dfsan_get_init_origin(const void *addr);
+
+  /// Prints the origin trace of the label at the address `addr` to stderr. It also
+  /// prints description at the beginning of the trace. If origin tracking is not
+  /// on, or the address is not labeled, it prints nothing.
+  void dfsan_print_origin_trace(const void *addr, const char *description);
+
+  /// Prints the origin trace of the label at the address `addr` to a pre-allocated
+  /// output buffer. If origin tracking is not on, or the address is
+  /// not labeled, it prints nothing.
+  ///
+  /// `addr` is the tainted memory address whose origin we are printing.
+  /// `description` is a description printed at the beginning of the trace.
+  /// `out_buf` is the output buffer to write the results to. `out_buf_size` is
+  /// the size of `out_buf`. The function returns the number of symbols that
+  /// should have been written to `out_buf` (not including trailing null byte '\0').
+  /// Thus, the string is truncated iff return value is not less than `out_buf_size`.
+  size_t dfsan_sprint_origin_trace(const void *addr, const char *description,
+                                   char *out_buf, size_t out_buf_size);
+
+  /// Returns the value of `-dfsan-track-origins`.
+  int dfsan_get_track_origins(void);
+
+The following functions are provided to register hooks called by custom wrappers.
+
+.. code-block:: c
+
+  /// Sets a callback to be invoked on calls to `write`.  The callback is invoked
+  /// before the write is done. The write is not guaranteed to succeed when the
+  /// callback executes. Pass in NULL to remove any callback.
+  typedef void (*dfsan_write_callback_t)(int fd, const void *buf, size_t count);
+  void dfsan_set_write_callback(dfsan_write_callback_t labeled_write_callback);
+
+  /// Callbacks to be invoked on calls to `memcmp` or `strncmp`.
+  void dfsan_weak_hook_memcmp(void *caller_pc, const void *s1, const void *s2,
+                              size_t n, dfsan_label s1_label,
+                              dfsan_label s2_label, dfsan_label n_label);
+  void dfsan_weak_hook_strncmp(void *caller_pc, const char *s1, const char *s2,
+                               size_t n, dfsan_label s1_label,
+                               dfsan_label s2_label, dfsan_label n_label);
+
 Taint label representation
 --
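
Editorial note on the return-value convention documented for `dfsan_sprint_origin_trace` in this patch: it reads like that of C's `snprintf`, where the result is the length a large enough buffer would have received, so truncation is detected by comparing the result with the buffer size. The sketch below illustrates the check with `snprintf` as a stand-in so that it compiles without the DFSan runtime. `format_trace` and `was_truncated` are hypothetical helpers, not DFSan functions, and the placeholder text is invented for the example.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for dfsan_sprint_origin_trace: writes `description`
 * followed by a fixed placeholder line into `out_buf` and returns the number
 * of characters a large enough buffer would have received, excluding the
 * trailing '\0'. */
size_t format_trace(const char *description, char *out_buf,
                    size_t out_buf_size) {
  int n = snprintf(out_buf, out_buf_size, "%s\n<origin trace placeholder>\n",
                   description);
  return n < 0 ? 0 : (size_t)n;
}

/* The output was truncated iff the return value is not less than the
 * buffer size, as the patch's comment states. */
int was_truncated(size_t returned, size_t out_buf_size) {
  return returned >= out_buf_size;
}
```

A caller would typically retry with a buffer of `returned + 1` bytes when `was_truncated` reports truncation.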
 



[PATCH] D106903: [dfsan][NFC] Describe how origin trace tracking works

2021-07-27 Thread stephan.yichao.zhao via Phabricator via cfe-commits
stephan.yichao.zhao created this revision.
stephan.yichao.zhao added a reviewer: gbalats.
stephan.yichao.zhao requested review of this revision.
Herald added a project: clang.
Herald added a subscriber: cfe-commits.

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D106903

Files:
  clang/docs/DataFlowSanitizerDesign.rst


Index: clang/docs/DataFlowSanitizerDesign.rst
===
--- clang/docs/DataFlowSanitizerDesign.rst
+++ clang/docs/DataFlowSanitizerDesign.rst
@@ -135,6 +135,30 @@
 track of what labels they have used so far, picking one that is yet
 unused, etc).
 
+Origin tracking trace representation
+------------------------------------
+
+Every four 4-byte-aligned application bytes share a 4-byte origin value. A
+4-byte origin contains a 4-bit depth and a 28-bit hash ID of a chain.
+
+A chain ID is calculated as a hash from a chain structure. A chain structure
+contains a stack ID and the previous chain ID. The chain head has a zero
+previous chain ID. A stack ID is a hash from a stack trace. The 4-bit depth
+limits the maximal length of a path. The environment variable ``origin_history_size``
+can set the depth limit. Non-positive values mean unlimited. Its default value
+is 16. When reaching the limit, origin tracking ignores following propagation
+chains.
+
+A chain starts by `dfsan_set_label` with non-zero labels. A new chain is added
+at stores or memory transfers when ``-dfsan-track-origins`` is 1. Memory transfers
+include LLVM memory transfer instructions and wrapped glibc memcpy and memmove.
+When ``-dfsan-track-origins`` is 2, a new chain is also added at loads.
+
+Other instructions do not create new chains, but simply propagate origin values.
+If an instruction has more than one operand with non-zero labels, the origin
+value of the last operand with non-zero label is propagated to the result of
+this instruction.
+
 Memory layout and label management
 --
 


___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
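
Editorial illustration of the 4-byte origin layout described in the patch above, written as pack/unpack helpers. The patch states only the field widths (a 4-bit depth and a 28-bit chain hash ID). Placing the depth in the top four bits is an assumption of this sketch, as are the names `origin_pack`, `origin_depth`, `origin_chain_id`, and `origin_slot_index`.

```c
#include <stdint.h>

enum { DEPTH_BITS = 4, ID_BITS = 32 - DEPTH_BITS };

#define CHAIN_ID_MASK ((UINT32_C(1) << ID_BITS) - 1)  /* low 28 bits */
#define DEPTH_MASK ((UINT32_C(1) << DEPTH_BITS) - 1)  /* 4 bits */

/* Combine a depth and a chain hash ID into one 32-bit origin value.
 * Assumed layout: depth in bits 31..28, chain ID in bits 27..0. */
uint32_t origin_pack(uint32_t depth, uint32_t chain_id) {
  return ((depth & DEPTH_MASK) << ID_BITS) | (chain_id & CHAIN_ID_MASK);
}

uint32_t origin_depth(uint32_t origin) { return origin >> ID_BITS; }

uint32_t origin_chain_id(uint32_t origin) { return origin & CHAIN_ID_MASK; }

/* Four 4-byte-aligned application bytes share one origin, so addresses that
 * differ only in their low two bits map to the same (hypothetical) slot. */
uintptr_t origin_slot_index(uintptr_t app_addr) { return app_addr >> 2; }
```

Under this assumed layout, addresses 0x1000 through 0x1003 share one origin slot, and 0x1004 starts the next one.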


[PATCH] D106903: [dfsan][NFC] Describe how origin trace tracking works

2021-07-27 Thread stephan.yichao.zhao via Phabricator via cfe-commits
stephan.yichao.zhao updated this revision to Diff 362158.
stephan.yichao.zhao added a comment.

explained what a trace and a chain are.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D106903/new/

https://reviews.llvm.org/D106903

Files:
  clang/docs/DataFlowSanitizerDesign.rst


Index: clang/docs/DataFlowSanitizerDesign.rst
===
--- clang/docs/DataFlowSanitizerDesign.rst
+++ clang/docs/DataFlowSanitizerDesign.rst
@@ -135,6 +135,35 @@
 track of what labels they have used so far, picking one that is yet
 unused, etc).
 
+Origin tracking trace representation
+------------------------------------
+
+An origin tracking trace is a list of chains. Each chain has a stack trace
+where the DFSan runtime records a label propagation, and a pointer to its
+previous chain. The very first chain does not point to any chain.
+
+Every four 4-byte-aligned application bytes share a 4-byte origin trace ID. A
+4-byte origin trace ID contains a 4-bit depth and a 28-bit hash ID of a chain.
+
+A chain ID is calculated as a hash from a chain structure. A chain structure
+contains a stack ID and the previous chain ID. The chain head has a zero
+previous chain ID. A stack ID is a hash from a stack trace. The 4-bit depth
+limits the maximal length of a path. The environment variable ``origin_history_size``
+can set the depth limit. Non-positive values mean unlimited. Its default value
+is 16. When reaching the limit, origin tracking ignores following propagation
+chains.
+
+The first chain of a trace starts by `dfsan_set_label` with non-zero labels. A
+new chain is appended at the end of a trace at stores or memory transfers when
+``-dfsan-track-origins`` is 1. Memory transfers include LLVM memory transfer
+instructions and wrapped glibc memcpy and memmove. When ``-dfsan-track-origins``
+is 2, a new chain is also appended at loads.
+
+Other instructions do not create new chains, but simply propagate origin trace
+IDs. If an instruction has more than one operand with non-zero labels, the origin
+trace ID of the last operand with non-zero label is propagated to the result of
+this instruction.
+
 Memory layout and label management
 --
 




[PATCH] D106903: [dfsan][NFC] Describe how origin trace tracking works

2021-07-27 Thread stephan.yichao.zhao via Phabricator via cfe-commits
stephan.yichao.zhao updated this revision to Diff 362159.
stephan.yichao.zhao added a comment.

tweak


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D106903/new/

https://reviews.llvm.org/D106903

Files:
  clang/docs/DataFlowSanitizerDesign.rst


Index: clang/docs/DataFlowSanitizerDesign.rst
===
--- clang/docs/DataFlowSanitizerDesign.rst
+++ clang/docs/DataFlowSanitizerDesign.rst
@@ -135,6 +135,35 @@
 track of what labels they have used so far, picking one that is yet
 unused, etc).
 
+Origin tracking trace representation
+------------------------------------
+
+An origin tracking trace is a list of chains. Each chain has a stack trace
+where the DFSan runtime records a label propagation, and a pointer to its
+previous chain. The very first chain does not point to any chain.
+
+Every four 4-byte-aligned application bytes share a 4-byte origin trace ID. A
+4-byte origin trace ID contains a 4-bit depth and a 28-bit hash ID of a chain.
+
+A chain ID is calculated as a hash from a chain structure. A chain structure
+contains a stack ID and the previous chain ID. The chain head has a zero
+previous chain ID. A stack ID is a hash from a stack trace. The 4-bit depth
+limits the maximal length of a path. The environment variable ``origin_history_size``
+can set the depth limit. Non-positive values mean unlimited. Its default value
+is 16. When reaching the limit, origin tracking ignores following propagation
+chains.
+
+The first chain of a trace starts by `dfsan_set_label` with non-zero labels. A
+new chain is appended at the end of a trace at stores or memory transfers when
+``-dfsan-track-origins`` is 1. Memory transfers include LLVM memory transfer
+instructions and wrapped glibc memcpy and memmove. When ``-dfsan-track-origins``
+is 2, a new chain is also appended at loads.
+
+Other instructions do not create new chains, but simply propagate origin trace
+IDs. If an instruction has more than one operand with non-zero labels, the origin
+trace ID of the last operand with non-zero label is propagated to the result of
+this instruction.
+
 Memory layout and label management
 --
 




[PATCH] D106903: [dfsan][NFC] Describe how origin trace tracking works

2021-07-27 Thread stephan.yichao.zhao via Phabricator via cfe-commits
stephan.yichao.zhao updated this revision to Diff 362161.
stephan.yichao.zhao added a comment.

typos


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D106903/new/

https://reviews.llvm.org/D106903

Files:
  clang/docs/DataFlowSanitizerDesign.rst


Index: clang/docs/DataFlowSanitizerDesign.rst
===
--- clang/docs/DataFlowSanitizerDesign.rst
+++ clang/docs/DataFlowSanitizerDesign.rst
@@ -135,6 +135,35 @@
 track of what labels they have used so far, picking one that is yet
 unused, etc).
 
+Origin tracking trace representation
+------------------------------------
+
+An origin tracking trace is a list of chains. Each chain has a stack trace
+where the DFSan runtime records a label propagation, and a pointer to its
+previous chain. The very first chain does not point to any chain.
+
+Every four 4-byte-aligned application bytes share a 4-byte origin trace ID. A
+4-byte origin trace ID contains a 4-bit depth and a 28-bit hash ID of a chain.
+
+A chain ID is calculated as a hash from a chain structure. A chain structure
+contains a stack ID and the previous chain ID. The chain head has a zero
+previous chain ID. A stack ID is a hash from a stack trace. The 4-bit depth
+limits the maximal length of a path. The environment variable ``origin_history_size``
+can set the depth limit. Non-positive values mean unlimited. Its default value
+is 16. When reaching the limit, origin tracking ignores following propagation
+chains.
+
+The first chain of a trace starts by `dfsan_set_label` with non-zero labels. A
+new chain is appended at the end of a trace at stores or memory transfers when
+``-dfsan-track-origins`` is 1. Memory transfers include LLVM memory transfer
+instructions, glibc memcpy and memmove. When ``-dfsan-track-origins`` is 2, a
+new chain is also appended at loads.
+
+Other instructions do not create new chains, but simply propagate origin trace
+IDs. If an instruction has more than one operand with non-zero labels, the origin
+trace ID of the last operand with non-zero label is propagated to the result of
+the instruction.
+
 Memory layout and label management
 --
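
Editorial sketch of how a trace might grow, following the description in this revision: each chain records a stack ID and the previous chain ID, the head has a zero previous chain ID, and no chain is appended once the depth limit is reached. The hash function (FNV-1a), the `chain_t` and `trace_t` layouts, and the names `chain_hash` and `trace_append` are assumptions for illustration only. The patch does not specify any of them.

```c
#include <stdint.h>

/* Hypothetical chain structure: a stack ID plus the previous chain ID. */
typedef struct {
  uint32_t stack_id;
  uint32_t prev_chain_id; /* 0 for the head of a trace */
} chain_t;

/* Hypothetical trace handle: ID of the newest chain and the current depth. */
typedef struct {
  uint32_t chain_id; /* 0 means "no chain yet" */
  uint32_t depth;
} trace_t;

/* Feed the four bytes of `word` into a running 32-bit FNV-1a hash. */
static uint32_t fnv1a_step(uint32_t h, uint32_t word) {
  for (int i = 0; i < 4; ++i) {
    h ^= (word >> (8 * i)) & 0xFFu;
    h *= 16777619u;
  }
  return h;
}

/* Hash a chain structure down to a non-zero 28-bit chain ID. */
uint32_t chain_hash(chain_t c) {
  uint32_t h = 2166136261u;
  h = fnv1a_step(h, c.stack_id);
  h = fnv1a_step(h, c.prev_chain_id);
  h &= 0x0FFFFFFFu;
  return h ? h : 1u; /* keep 0 reserved for "no chain" */
}

/* Append a chain for `stack_id`. A non-positive `depth_limit` means
 * unlimited. At the limit the trace is returned unchanged. */
trace_t trace_append(trace_t t, uint32_t stack_id, int depth_limit) {
  if (depth_limit > 0 && t.depth >= (uint32_t)depth_limit)
    return t;
  chain_t c = {stack_id, t.chain_id};
  trace_t next = {chain_hash(c), t.depth + 1};
  return next;
}
```

Because the ID is a hash of the chain's contents, appending the same stack ID to the same previous chain always yields the same chain ID in this sketch.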
 




[PATCH] D103745: [dfsan] Add full fast8 support

2021-06-05 Thread stephan.yichao.zhao via Phabricator via cfe-commits
stephan.yichao.zhao added a comment.

The failed test cases from x64 debian > libFuzzer.libFuzzer::* seem related. 
They still use -dfsan-fast-16-labels.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D103745/new/

https://reviews.llvm.org/D103745



[PATCH] D103745: [dfsan] Add full fast8 support

2021-06-05 Thread stephan.yichao.zhao via Phabricator via cfe-commits
stephan.yichao.zhao added inline comments.



Comment at: clang/docs/DataFlowSanitizer.rst:172-178
 assert(ij_label == 3);  // Verifies all of the above
 
+// Or, equivalently:
+assert(dfsan_has_label(ij_label, i_label));
+assert(dfsan_has_label(ij_label, j_label));
+assert(!dfsan_has_label(ij_label, k_label));
+

If we swap assert(ij_label == 3) with the three dfsan_has_label assertions, the two 
equivalent blocks end up next to each other.



Comment at: clang/docs/DataFlowSanitizerDesign.rst:60
 
-As stated above, the tool must track a large number of taint
-labels. This poses an implementation challenge, as most multiple-label
-tainting systems assign one label per bit to shadow storage, and
-union taint labels using a bitwise or operation. This will not scale
-to clients which use hundreds or thousands of taint labels, as the
-label union operation becomes O(n) in the number of supported labels,
-and data associated with it will quickly dominate the live variable
-set, causing register spills and hampering performance.
-
-Instead, a low overhead approach is proposed which is best-case O(log\
-:sub:`2` n) during execution. The underlying assumption is that
-the required space of label unions is sparse, which is a reasonable
-assumption to make given that we are optimizing for the case where
-applications mostly copy data from one place to another, without often
-invoking the need for an actual union operation. The representation
-of a taint label is a 16-bit integer, and new labels are allocated
-sequentially from a pool. The label identifier 0 is special, and means
-that the data item is unlabelled.
-
-When a label union operation is requested at a join point (any
-arithmetic or logical operation with two or more operands, such as
-addition), the code checks whether a union is required, whether the
-same union has been requested before, and whether one union label
-subsumes the other. If so, it returns the previously allocated union
-label. If not, it allocates a new union label from the same pool used
-for new labels.
-
-Specifically, the instrumentation pass will insert code like this
-to decide the union label ``lu`` for a pair of labels ``l1``
-and ``l2``:
-
-.. code-block:: c
-
-  if (l1 == l2)
-lu = l1;
-  else
-lu = __dfsan_union(l1, l2);
-
-The equality comparison is outlined, to provide an early exit in
-the common cases where the program is processing unlabelled data, or
-where the two data items have the same label.  ``__dfsan_union`` is
-a runtime library function which performs all other union computation.
+We use an 8-bit unsigned integers for the representation of a
+label. The label identifier 0 is special, and means that the data item

integer



Comment at: clang/docs/DataFlowSanitizerDesign.rst:65
+join point (any arithmetic or logical operation with two or more
+operands, such as addition), we can simply OR the two labels in O(1).
 

the labels, and each OR is in O(1).
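As a minimal sketch of the 8-bit scheme discussed here, assuming each base label occupies one bit: the names `Label`, `UnionLabels`, and `HasLabel` below are stand-ins for illustration, not the DFSan interface itself.

```cpp
#include <cstdint>

// Stand-in for an 8-bit label type; each base label uses exactly one bit.
using Label = uint8_t;

// The union of two labels is a single bitwise OR, so it costs O(1).
constexpr Label UnionLabels(Label A, Label B) {
  return static_cast<Label>(A | B);
}

// A label "has" an element if every bit of the element is set in it.
constexpr bool HasLabel(Label L, Label Elem) {
  return (L & Elem) == Elem;
}
```

With base labels 1, 2, and 4, the union of the first two is 3, which is the value the documentation example asserts for `ij_label`.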



Comment at: clang/docs/DataFlowSanitizerDesign.rst:68
+Users are responsible for managing the 8 integer labels (i.e., keeping
+track of what labels they have used so far, pick one that is yet
+unused, etc).

picking



Comment at: clang/docs/DataFlowSanitizerDesign.rst:74
 
 The following is the current memory layout for Linux/x86\_64:
 

memory layout



Comment at: clang/docs/DataFlowSanitizerDesign.rst:99
 associated directly with registers.  Loads will result in a union of
-all shadow labels corresponding to bytes loaded (which most of the
-time will be short circuited by the initial comparison) and stores will
-result in a copy of the label to the shadow of all bytes stored to.
+all shadow labels corresponding to bytes loaded and stores will result
+in a copy of the label to the shadow of all bytes stored to.

, and



Comment at: clang/docs/DataFlowSanitizerDesign.rst:100
+all shadow labels corresponding to bytes loaded and stores will result
+in a copy of the label to the shadow of all bytes stored to.
 

the label of a stored value



Comment at: compiler-rt/lib/dfsan/dfsan.cpp:209
 
-// Like __dfsan_union, but for use from the client or custom functions.  Hence
-// the equality comparison is done here before calling __dfsan_union.
+// Resolves the union of two unequal labels.
 SANITIZER_INTERFACE_ATTRIBUTE dfsan_label

After removing legacy mode, if our code no longer checks l1 != l2 in IR, the 
comment can be updated.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D103745/new/

https://reviews.llvm.org/D103745



[PATCH] D103745: [dfsan] Add full fast8 support

2021-06-07 Thread stephan.yichao.zhao via Phabricator via cfe-commits
stephan.yichao.zhao added a comment.

How did we fix that alignment error from compiler-rt/test/dfsan/origin_ldst.c?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D103745/new/

https://reviews.llvm.org/D103745



[PATCH] D104494: [dfsan] Replace dfs$ prefix with .dfsan suffix

2021-06-17 Thread stephan.yichao.zhao via Phabricator via cfe-commits
stephan.yichao.zhao added inline comments.



Comment at: llvm/lib/Transforms/Instrumentation/DataFlowSanitizer.cpp:1134
+  Asm.replace(Pos, 1, Suffix + "@");
+}
 GV->getParent()->setModuleInlineAsm(Asm);

Based on http://web.mit.edu/rhel-doc/3/rhel-as-en-3/symver.html, there must be 
an @ in the .symver line after the first match.
Please change the Pos != std::string::npos check to something like:
```
Pos = Asm.find("@", Pos);
assert(Pos != std::string::npos);
```
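For context, the whole rewrite could look roughly like the sketch below. `AddSuffixToSymver` is a hypothetical standalone helper written for illustration, not the code under review, and it returns `false` where the suggestion above would assert.

```cpp
#include <string>

// Hypothetical helper: appends Suffix to both occurrences of Name in a
// ".symver Name,Name@VERSION" directive inside an inline-asm string.
// Returns false if the directive or its '@' is missing. In the second case
// the first operand has already been renamed.
bool AddSuffixToSymver(std::string &Asm, const std::string &Name,
                       const std::string &Suffix) {
  const std::string Search = ".symver " + Name + ",";
  std::string::size_type Pos = Asm.find(Search);
  if (Pos == std::string::npos) return false;

  // Rename the first operand of the directive.
  Asm.replace(Pos, Search.size(), ".symver " + Name + Suffix + ",");

  // A well-formed directive has an '@' after the first match, so search
  // from the match position rather than from the start of the string.
  Pos = Asm.find('@', Pos);
  if (Pos == std::string::npos) return false;

  // Insert the suffix just before the version separator.
  Asm.replace(Pos, 1, Suffix + "@");
  return true;
}
```

Searching for the `@` from `Pos` avoids matching one that belongs to an earlier, unrelated directive in the same string.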


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D104494/new/

https://reviews.llvm.org/D104494



[PATCH] D104896: [DFSan] Change shadow and origin memory layouts to match MSan.

2021-06-25 Thread stephan.yichao.zhao via Phabricator via cfe-commits
stephan.yichao.zhao added inline comments.



Comment at: compiler-rt/lib/dfsan/dfsan.cpp:169
+// TODO(browneee): Removed this after testing and not hit.
+CHECK(MEM_IS_SHADOW(s));
+// if (!MEM_IS_SHADOW(s)) {

Based on the recent issue about using a replaced memmove, it seems safe to always
keep this CHECK and update the comment to reflect that.




Comment at: compiler-rt/lib/dfsan/dfsan.cpp:170
+CHECK(MEM_IS_SHADOW(s));
+// if (!MEM_IS_SHADOW(s)) {
+//   // The current DFSan memory layout is not always correct. For example,

Please remove the commented code.



Comment at: compiler-rt/lib/dfsan/dfsan.cpp:177
+
+if (*s) {
+  uptr aligned_addr = OriginAlignDown(SHADOW_TO_ORIGIN(s));

This branch seems redundant with the one below.



Comment at: compiler-rt/lib/dfsan/dfsan.cpp:332
StackTrace *stack) {
-  if (!has_valid_shadow_addr(dst) ||
-  !has_valid_shadow_addr((void *)((uptr)dst + size)) ||
-  !has_valid_shadow_addr(src) ||
-  !has_valid_shadow_addr((void *)((uptr)src + size))) {
+  // TODO(browneee): Removed this after testing and not hit.
+  if (!MEM_IS_SHADOW(shadow_for(dst)) ||

Please update the comments to be consistent with other MEM_IS_SHADOW checks.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D104896/new/

https://reviews.llvm.org/D104896



[PATCH] D104896: [DFSan] Change shadow and origin memory layouts to match MSan.

2021-06-25 Thread stephan.yichao.zhao via Phabricator via cfe-commits
stephan.yichao.zhao added a comment.

Thank you for making this work!




Comment at: compiler-rt/lib/dfsan/dfsan.cpp:871
+
+static void CheckMemoryLayoutSanity() {
+  uptr prev_end = 0;

Please add a comment noting that CheckMemoryLayoutSanity, ... and InitShadow 
could be shared with MSan (like the TODO in dfsan_platform.h) by moving them, 
together with the platform mapping definitions/macros in 
compiler-rt/lib/msan/msan.h, to sanitizer_common, because it is highly likely 
that MSan and DFSan will always have the same layouts.
Since this change does not branch from MSan's code, without such a comment it 
is not easy for others to know that the two are similar.



Comment at: compiler-rt/lib/dfsan/dfsan.cpp:902
+
+static bool CheckMemoryRangeAvailability(uptr beg, uptr size) {
+  if (size > 0) {

I suggest moving CheckMemoryRangeAvailability and ProtectMemoryRange to 
sanitizer_common (if this does not break any sanitizer convention).
They are checking some general things and also some corner cases like "protect 
address 0...".
If shared, it is more likely that others can help DFSan to improve them.



Comment at: llvm/lib/Transforms/Instrumentation/DataFlowSanitizer.cpp:257
+// x86_64 Linux
+static const MemoryMapParams Linux_X86_64_MemoryMapParams = {
+0,  // AndMask (not used)

Does this suggest LinuxX8664MemoryMapParams? Not sure if there is a workaround 
to suppress this warning.



Comment at: llvm/lib/Transforms/Instrumentation/DataFlowSanitizer.cpp:460
 
+  /// Memory map parameters used in application-to-shadow calculation.
+  const MemoryMapParams *MapParams;

in mapping application to shadow and origin
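To illustrate what such parameters are used for, here is a hedged sketch of an MSan-style mapping from an application address to its shadow and origin addresses. The field names follow the struct quoted above, but the non-zero constants in `ExampleParams`, the helper function names, and the 4-byte origin granularity are assumptions chosen for illustration, not values taken from this patch.

```cpp
#include <cstdint>

struct MemoryMapParams {
  uint64_t AndMask;
  uint64_t XorMask;
  uint64_t ShadowBase;
  uint64_t OriginBase;
};

// Illustrative values only (assumption); see the patch for the real ones.
constexpr MemoryMapParams ExampleParams = {
    0,                  // AndMask (not used)
    0x500000000000ULL,  // XorMask
    0,                  // ShadowBase (not used)
    0x100000000000ULL,  // OriginBase
};

// Common offset shared by the shadow and origin mappings.
constexpr uint64_t AppToOffset(const MemoryMapParams &P, uint64_t Addr) {
  return (Addr & ~P.AndMask) ^ P.XorMask;
}

constexpr uint64_t AppToShadow(const MemoryMapParams &P, uint64_t Addr) {
  return P.ShadowBase + AppToOffset(P, Addr);
}

// Origins are assumed to be stored at 4-byte granularity, so align down.
constexpr uint64_t AppToOrigin(const MemoryMapParams &P, uint64_t Addr) {
  return (P.OriginBase + AppToOffset(P, Addr)) & ~uint64_t{3};
}
```

Because both mappings share one offset computation, a single parameter table covers the shadow and the origin regions, which is why the comment should mention both.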


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D104896/new/

https://reviews.llvm.org/D104896



[PATCH] D105378: [dfsan][NFC] Add Origin Tracking into doc

2021-07-02 Thread stephan.yichao.zhao via Phabricator via cfe-commits
stephan.yichao.zhao created this revision.
stephan.yichao.zhao added a reviewer: morehouse.
stephan.yichao.zhao requested review of this revision.
Herald added a project: clang.
Herald added a subscriber: cfe-commits.

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D105378

Files:
  clang/docs/DataFlowSanitizer.rst


Index: clang/docs/DataFlowSanitizer.rst
===
--- clang/docs/DataFlowSanitizer.rst
+++ clang/docs/DataFlowSanitizer.rst
@@ -191,6 +191,44 @@
 return 0;
   }
 
+Origin Tracking
+===============
+
+DataFlowSanitizer can track origins of labeled values. This feature is enabled by
+``-mllvm -dfsan-track-origins=1``. For example,
+
+.. code-block:: console
+
+% cat test.cc
+#include 
+#include 
+
+int main(int argc, char** argv) {
+  int i = 0;
+  dfsan_set_label(i_label, &i, sizeof(i));
+  int j = i + 1;
+  dfsan_print_origin_trace(&j, "A flow from i to j");
+  return 0;
+}
+
+% clang -fsanitize=dataflow -mllvm -dfsan-track-origins=1 -fno-omit-frame-pointer -g -O2 test.cc
+% ./a.out
+Taint value 0x1 (at 0x7ffd42bf415c) origin tracking (A flow from i to j)
+Origin value: 0x1391, Taint value was stored to memory at
+  #0 0x55676db85a62 in main test.cc:7:7
+  #1 0x7f0083611bbc in __libc_start_main libc-start.c:285
+
+Origin value: 0x9e1, Taint value was created at
+  #0 0x55676db85a08 in main test.cc:6:3
+  #1 0x7f0083611bbc in __libc_start_main libc-start.c:285
+
+By ``-mllvm -dfsan-track-origins=1`` DataFlowSanitizer collects only
+intermediate stores a labeled value went through. Origin tracking slows down
+program execution by a factor of 2x on top of the usual DataFlowSanitizer
+slowdown and increases memory overhead by 1x. By ``-mllvm -dfsan-track-origins=2``
+DataFlowSanitizer also collects intermediate loads a labeled value went through.
+This mode slows down program execution by a factor of 4x.
+
 Current status
 ==============
 


[PATCH] D105378: [dfsan][NFC] Add Origin Tracking into doc

2021-07-07 Thread stephan.yichao.zhao via Phabricator via cfe-commits
stephan.yichao.zhao updated this revision to Diff 357004.
stephan.yichao.zhao marked an inline comment as done.
stephan.yichao.zhao retitled this revision from " [dfsan][NFC] Add Origin 
Tracking into doc" to "[dfsan][NFC] Add Origin Tracking into doc".
stephan.yichao.zhao added a comment.

clang -> clang++


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D105378/new/

https://reviews.llvm.org/D105378

Files:
  clang/docs/DataFlowSanitizer.rst


Index: clang/docs/DataFlowSanitizer.rst
===
--- clang/docs/DataFlowSanitizer.rst
+++ clang/docs/DataFlowSanitizer.rst
@@ -191,6 +191,44 @@
 return 0;
   }
 
+Origin Tracking
+===============
+
+DataFlowSanitizer can track origins of labeled values. This feature is enabled by
+``-mllvm -dfsan-track-origins=1``. For example,
+
+.. code-block:: console
+
+% cat test.cc
+#include 
+#include 
+
+int main(int argc, char** argv) {
+  int i = 0;
+  dfsan_set_label(i_label, &i, sizeof(i));
+  int j = i + 1;
+  dfsan_print_origin_trace(&j, "A flow from i to j");
+  return 0;
+}
+
+% clang++ -fsanitize=dataflow -mllvm -dfsan-track-origins=1 -fno-omit-frame-pointer -g -O2 test.cc
+% ./a.out
+Taint value 0x1 (at 0x7ffd42bf415c) origin tracking (A flow from i to j)
+Origin value: 0x1391, Taint value was stored to memory at
+  #0 0x55676db85a62 in main test.cc:7:7
+  #1 0x7f0083611bbc in __libc_start_main libc-start.c:285
+
+Origin value: 0x9e1, Taint value was created at
+  #0 0x55676db85a08 in main test.cc:6:3
+  #1 0x7f0083611bbc in __libc_start_main libc-start.c:285
+
+By ``-mllvm -dfsan-track-origins=1`` DataFlowSanitizer collects only
+intermediate stores a labeled value went through. Origin tracking slows down
+program execution by a factor of 2x on top of the usual DataFlowSanitizer
+slowdown and increases memory overhead by 1x. By ``-mllvm -dfsan-track-origins=2``
+DataFlowSanitizer also collects intermediate loads a labeled value went through.
+This mode slows down program execution by a factor of 4x.
+
 Current status
 ==============
 


[PATCH] D105378: [dfsan][NFC] Add Origin Tracking into doc

2021-07-07 Thread stephan.yichao.zhao via Phabricator via cfe-commits
stephan.yichao.zhao added a comment.

In D105378#2861914, @morehouse wrote:

> We may also want to consider creating a frontend flag like MSan's origin 
> tracking (`-fsanitize-memory-track-origins`).

I will follow up on this in a separate change.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D105378/new/

https://reviews.llvm.org/D105378
