On Wed, Nov 12, 2014 at 12:18:31PM +0300, Yury Gribov wrote:
> >--- gcc/sanopt.c.jj  2014-11-11 09:13:36.698280115 +0100
> >+++ gcc/sanopt.c     2014-11-11 18:07:17.913539517 +0100
> >@@ -49,6 +49,7 @@ along with GCC; see the file COPYING3.
> >  #include "langhooks.h"
> >  #include "ubsan.h"
> >  #include "params.h"
> >+#include "tree-ssa-operands.h"
> >
> >
> >  /* This is used to carry information about basic blocks.  It is
> >@@ -56,7 +57,29 @@ along with GCC; see the file COPYING3.
> >
> >  struct sanopt_info
> >  {
> >-  /* True if this BB has been visited.  */
> >+  /* True if this BB might call (directly or indirectly) free/munmap
> >+     or similar operation.  */
> 
> Frankly I think this is more about memory poisoned status than free.

For ASAN it is about anything that might make an ASAN_CHECK non-redundant;
that is certainly free as the most common case (I bet munmap doesn't poison
the memory being unmapped) and also the asan_*poison* calls.

> >@@ -69,11 +92,307 @@ struct sanopt_ctx
> >       a vector of UBSAN_NULL call statements that check this pointer.  */
> >    hash_map<tree, auto_vec<gimple> > null_check_map;
> >
> >+  /* This map maps a pointer (the second argument of ASAN_CHECK) to
> >+     a vector of ASAN_CHECK call statements that check the access.  */
> >+  hash_map<tree, auto_vec<gimple> > asan_check_map;
> 
> How about using traits class like the one below for both maps?

Well, for null_check_map only SSA_NAMEs matter (perhaps the code doesn't
return early if the pointer isn't one; maybe it should): the address of a
decl is always non-NULL and its alignment should be known too.
Or, Marek, can you see if we can get there e.g. decls for alignments, as in
extern char a[];
long long int
foo (void)
{
  return *(long long int *) &a[0] = 5;
}
?

> struct tree_map_traits : default_hashmap_traits
> {
>   static inline hashval_t hash (const_tree ref)
>     {
>       return iterative_hash_expr (ref, 0);
>     }
> 
>   static inline bool equal_keys (const_tree ref1, const_tree ref2)
>     {
>       return operand_equal_p (ref1, ref2, 0);
>     }
> };
> 
> Also the hash_map probably deserves a typedef.

For asan you're right, we can have addresses of decls there etc.
If you have spare cycles, feel free to take over the patch and adjust it.

> >+/* Return true if there might be any call to free/munmap operation
> >+   on any path in between DOM (which should be imm(BB)) and BB.  */
> >+
> >+static bool
> >+imm_dom_path_with_freeing_call (basic_block bb, basic_block dom)
> 
> Perhaps some parameter to bound search in pathalogical cases?

I believe that right now (Honza, correct me if I'm wrong) the algorithm
doesn't have pathological cases.

> >+  /* We already have recorded a ASAN_CHECK check for this pointer.  Perhaps
> >+     we can drop this one.  But only if this check doesn't specify larger
> >+     size.  */
> >+  while (!v.is_empty ())
> >+    {
> >+      gimple g = v.last ();
> >+      /* Remove statements for BBs that have been already processed.  */
> >+      sanopt_info *si = (sanopt_info *) gimple_bb (g)->aux;
> >+      if (si->visited_p)
> >+    v.pop ();
> >+      else
> >+    break;
> >+    }
> 
> This part seems to be duplicated across maybe_optimize_ubsan_null_ifn and
> maybe_optimize_asan_check_ifn.  Perhaps make some maybe_get_dominating_check
> API?

I don't think they are similar enough to make it worth it.
Though, if we optimize tsan, I'd say we should use exactly the same routines
(and, given that tsan is compile-time incompatible with asan, we can even use
the same flags).  Just with a different definition of what a "freeing call"
is - I bet we need a better name for that.  For asan it is asm volatile,
asm (... : "memory"), or a call that might free, asan_*poison*, munmap or
the like.  For tsan, I believe it is asm volatile, asm (... : "memory"),
a __sync_*/__atomic_* call, a pthread_* synchronization primitive, or any
function that contains one of those.  So, for a start,
!nonfreeing_call_p () || a __sync_*/__atomic_* builtin, I think.  But for
tsan we'd still need to figure out the rules for what we can optimize.

> >+      tree glen = gimple_call_arg (g, 2);
> >+      /* If glen is not integer, we'd have added it to the vector only if
> >+     ASAN_CHECK_NON_ZERO_LEN flag is set, so treat it as length 1.  */
> 
> Frankly I don't think we use ASAN_CHECK_NON_ZERO_LEN anymore (it's only set
> for trivial cases now).  Perhaps we should just nuke it from asan.c and
> sanopt.c altogether?

I thought that for the builtins libasan doesn't instrument (which include
very frequently used functions like __memcpy_chk etc.) we still use it.
> 
> >+      if (TREE_CODE (glen) != INTEGER_CST)
> 
> That's a matter of taste but why not a higher-level tree_fits_shwi and
> tree_to_shwi?

As we compare it only using tree_int_cst_lt, there is no point in requiring
that it fit into a shwi.  Even if it doesn't, we can handle it.

> >+  bool asan_check_optimize
> >+    = (flag_sanitize & SANITIZE_ADDRESS)
> >+      && ((flag_sanitize & flag_sanitize_recover
> >+       & SANITIZE_KERNEL_ADDRESS) == 0);
> 
> Why do we disable check optimizations for KASan?

Only for -fno-sanitize-recover=kernel-address too.  The thing is,
if you do recover from failed asan checks, supposedly you want to
see all errors reported, not just the first one.

> >       case IFN_ASAN_CHECK:
> >-        ctx->asan_num_accesses++;
> >+        if (asan_check_optimize)
> >+          remove = maybe_optimize_asan_check_ifn (ctx, stmt);
> 
> It may be useful to also store base address in check-table:
> 
> static tree
> maybe_get_single_definition (tree t)
> {
>   if (TREE_CODE (t) == SSA_NAME)
>     {
>       gimple g = SSA_NAME_DEF_STMT (t);
>       if (gimple_assign_single_p (g))
>         return gimple_assign_rhs1 (g);
>     }
>   return NULL_TREE;
> }

Why?  forwprop etc. should have propagated it into the ASAN_CHECK if
it is is_gimple_val.  Or do you have specific examples in mind?

        Jakub
