On Wed, 16 Jun 2021, Qing Zhao wrote:

> Hi, Richard,
> 
> On Jun 16, 2021, at 1:19 AM, Richard Biener 
> <rguent...@suse.de<mailto:rguent...@suse.de>> wrote:
> 
> +/* Expand the IFN_DEFERRED_INIT function according to its second
> argument.  */
> +static void
> +expand_DEFERRED_INIT (internal_fn, gcall *stmt)
> +{
> +  tree var = gimple_call_lhs (stmt);
> +  tree init = NULL_TREE;
> +  enum auto_init_type init_type
> +    = (enum auto_init_type) TREE_INT_CST_LOW (gimple_call_arg (stmt, 1));
> +
> +  switch (init_type)
> +    {
> +    default:
> +      gcc_unreachable ();
> +    case AUTO_INIT_PATTERN:
> +      init = build_pattern_cst_for_auto_init (TREE_TYPE (var));
> +      expand_assignment (var, init, false);
> +      break;
> +    case AUTO_INIT_ZERO:
> +      init = build_zero_cst (TREE_TYPE (var));
> +      expand_assignment (var, init, false);
> +      break;
> +    }
> 
> I think actually building build_pattern_cst_for_auto_init can generate
> massive garbage and for big auto vars code size is also a concern and
> ideally on x86 you'd produce rep movq.  So I don't think going
> via expand_assignment is good.  Instead you possibly want to lower
> .DEFERRED_INIT to MEMs following expand_builtin_memset and
> eventually enhance that to allow storing pieces larger than a byte.
> 
> Due to “BOOLEAN_TYPE” and “POINTER_TYPE”, we cannot always have a
> repeated byte-pattern for variables that include BOOLEAN_TYPE Or Pointer
> types. Therefore, lowering the .DEFERRED_INIT for “PATTERN”
> initialization through “memset” is not always possible.
> 
> Let me know if I miss anything in the above. Do you have other suggestions?
> 
> The main point is that you need to avoid building the explicit initializer
> only to have it consumed by assignment expansion.  If you want to keep
> all the singing and dancing (as opposed to maybe initializing with a
> 0x1 byte pattern) then I think for efficiency you still want to
> block-initialize the variable and then only fixup the special fields.
> 
> Yes, this is a good idea.
> 
> We can memset the whole structure with repeated pattern “0xAA” first,
> Then mixup BOOLEAN_TYPE and POINTER TYPE for 32-bit platform.
> That might be more efficient.
> 
> However, after more consideration, I feel that this might be a more
> general optimization for “store_constructor” itself:
> 
> I.e,  if the “constructor” includes repeated byte value “0xAA” or any other 
> value over a certain threshold,
> i.e, 70% of the total size, then we might need to use a call to memset first, 
> and then emit some additional single
> field stores  to fix up the fields that have different initialization values?
> 
> Just like the current handling of “zeroes” in the current 
> “store_constructor”, if “zeroes” occupy most of the constructor, then
> “Clear the whole structure” first, then emit additional single field stories 
> to fix up other fields that do not hold zeros.
> 
> So, I think that it might be better to keep the current
> “expand_assignment” for “Pattern initialization” as it is in this patch.
> 
> And then, later we can add a separate patch to add this more general
> optimization in “store_constructor” to improve the run time performance
> and code size in general?
> 
> What’s your opinion on this?
> 
> My point is that _building_ the constructor is what we want to avoid
> since that involves a lot of overhead memory-wise, it also requires
> yet another complex structure field walk with much room for errors.
> 
> Block-initializing the object is so much easier and more efficient.
> Implementing block initialization with a block size different from
> a single byte should be also reasonably possible.  I mean there's
> wmemset (not in GCC), so such block initialization would have other
> uses as well.
> 
> If the pattern of the value that is used to initialize is repeatable, then
> Block-initializing is ideal. However, Since the patterns of the values that
> are used to initialize might not be completely repeatable due to BOOLEAN (0),
> POINTER_TYPE at 32-bit platform (0x000000AA) and FLOATING TYPE (NaN),
> After block initializing of the whole object, we still need to add additional 
> fix up
> stores of these different patterns to the corresponding fields.
> 
> But that's a bug with the pattern used then.  You can never be sure that
> an object is used only as its declared type but you are initializing it
> as if it were.  Also all uninit uses invoke undefined behavior so I don't
> see why you need to pay special attention here.  After all this makes
> pattern init so much more fragile than zero-init which makes me question
> it even more ...
> 
> Yes, you are right.  The major reason for the complexity of the code to 
> handle pattern initialization
> is because multiple different patterns are assigned to different types.
> 
> This is for the compatibility with CLANG. -:). 
> (https://reviews.llvm.org/D54604)

I don't care about functional 1:1 "compatibility" with CLANG.

> For reference, I copied the part for pattern initialization from CLANG’s 
> patch below:
> 
> 
> 1. Pattern initialization
> 
>   This is the recommended initialization approach. Pattern initialization's

But elsewhere you said pattern initialization is only for debugging,
not production ...

>   goal is to initialize automatic variables with values which will likely
>   transform logic bugs into crashes down the line, are easily recognizable in
>   a crash dump, without being values which programmers can rely on for useful
>   program semantics. At the same time, pattern initialization tries to
>   generate code which will optimize well. You'll find the following details in
>   `patternFor`:
> 
>   - Integers are initialized with repeated 0xAA bytes (infinite scream).
>   - Vectors of integers are also initialized with infinite scream.
>   - Pointers are initialized with infinite scream on 64-bit platforms because
>     it's an unmappable pointer value on architectures I'm aware of. Pointers
>     are initialize to 0x000000AA (small scream) on 32-bit platforms because
>     32-bit platforms don't consistently offer unmappable pages. When they do
>     it's usually the zero page. As people try this out, I expect that we'll
>     want to allow different platforms to customize this, let's do so later.
>   - Vectors of pointers are initialized the same way pointers are.
>   - Floating point values and vectors are initialized with a negative quiet 
> NaN
>     with repeated 0xFF payload (e.g. 0xffffffff and 0xffffffffffffffff). NaNs 
> are nice
>     (here, anways) because they propagate on arithmetic, making it more likely
>     that entire computations become NaN when a single uninitialized value
>     sneaks in.
>   - Arrays are initialized to their homogeneous elements' initialization
>     value, repeated. Stack-based Variable-Length Arrays (VLAs) are
>     runtime-initialized to the allocated size (no effort is made for negative
>     size, but zero-sized VLAs are untouched even if technically undefined).
>   - Structs are initialized to their heterogeneous element's initialization
>     values. Zero-size structs are initialized as 0xAA since they're allocated
>     a single byte.
>   - Unions are initialized using the initialization for the largest member of
>     the union.
> 
>   Expect the values used for pattern initialization to change over time, as we
>   refine heuristics (both for performance and security). The goal is truly to
>   avoid injecting semantics into undefined behavior, and we should be
>   comfortable changing these values when there's a worthwhile point in doing
>   so.
> 
>   Why so much infinite scream? Repeated byte patterns tend to be easy to
>   synthesize on most architectures, and otherwise memset is usually very
>   efficient. For values which aren't entirely repeated byte patterns, LLVM
>   will often generate code which does memset + a few stores.
> 
> 
> 
> 
> For some of the objects whose most fields are BOOLEAN, POINTER_TYPE,
> rr FLOATING_TYPE, pattern  initializing likee this might be less efficient. 
> Do you
> agree on this?
> 
> Use a pattern that fits them all.  I mean memory allocation hardening
> fills allocated storage with a repeated (byte) pattern and people are
> happy with that.  It also makes it easy to spot uninitialized storage
> from a debugger.  So please, do not over-design this, it really doesn't
> make any sense and the common case you are inevitably chasing here
> would already be fine with a random repeated pattern.
> 
> So, My question is:
> 
> If we want to pattern initialize with the single repeated pattern for all 
> types, with one is better to use:  “0xAAAAAAAA”
> or “0xFFFFFFFF” , or other pattern that our current glibc used? What’s that 
> pattern?

It's set by the user.

> Will  “0xAAAAAAAA” in a floating type auto variable crash the program?
> Will “0xFFFFFFFF” in a pointer type auto variable crash the program? (Might 
> crash?)
> 
> 
> (thus also my suggestion to split out
> padding handling - now we can also split out pattern init handling,
> maybe somebody else feels like reviewing and approving this, who knows).
> 
> I am okay with further splitting pattern initialization part to a separate 
> patch. Then we will
> have 4 independent patches in total:
> 
> 1. -fauto-var-init=zero and all the handling in other passes to the new added 
> call to .DEFERRED_INIT.
> 2. Add -fauto-var-init=pattern
> 3. Add -fauto-var-init-padding
> 4. Add -ftrivial-auto-var-init for CLANG compatibility.
> 
> Are the above the correct understanding?

I think we can drop -fauto-var-init=pattern and just go with block
initializing which will cover padding as well which means we can
stay with the odd -ftrivial-auto-var-init name used by CLANG and
add no additional options.

> As said, block-initializing with a repeated pattern is OK and I can see
> that being useful.  Trying to produce "nicer" values for floats, bools
> and pointers on 32bit platforms is IMHO not going to fix anything and
> introduce as many problems as it will "fix".
> 
> Yes, I agree, if we can find a good repeated pattern for all types’s 
> pattern initialization, that will be much easier and simpler to 
> implement, I am happy to do that.  (Honestly, the part of implementation 
> that took me most of the time is pattern-initialization.. and I am still 
> not very comfortable with this part Of the code myself.  -:)

There's no "safe" pattern besides all-zero for all "undefined" uses
(note that uses do not necessarily use declared types).  Which is why
recommending pattern init is somewhat misguided.  There's maybe 
some useful pattern that more readily produces crashes, those that
produce a FP sNaN for all of the float types.

> And if you block-initialize stuff you then automagically cover padding.
> I call this a win-win, no?
> 
> Yes, this will also initialize paddings with patterns (Not zeroes as CLANG 
> did).
> Shall we compatible with CLANG on this?

No, why?

> Now, what you _could_ try is do sth like
> 
> tree ar = build_array_type (uint_ptr_type_node, size_type_node, false);
> tree range = build2 (RANGE_EXPR, size_type_node,
>                      size_zero_node, build_int_cst (size_type_node,
> size-in-words));
> tree pattern = build_int_cst (uint_ptr_type_node, 0xdeadbeef);
> ctor = build_constructor_single (ar, range, pattern);
> ctor = build1 (VIEW_CONVERT_EXPR, type, ctor);
> 
> 
> So, the above will be used to replace the following original code in my patch:
> 
> +    case AUTO_INIT_PATTERN:
> +      init = build_pattern_cst_for_auto_init (TREE_TYPE (var));
> +      expand_assignment (var, init, false);
> +      break;
> 
> ?
> 
> in my example code (untested) you then still need
> 
>   expand_assignment (var, ctor, false);
> 
> it would be the easiest way to try pattern init with a pattern that's
> bigger than a byte (otherwise of course the memset path is optimal).
> 
> If the pattern that is used to initialize all types is byte-repeatable, for 
> example, 0xA or 0xF, then
> We can use memset to initialize all types, however, the potential problem is, 
> if later we decide
> To change to another pattern that might not be byte-repeatable, then the 
> memset implementation
> is not proper at that time.
> 
> Is it possible that we might change the pattern later?

The pattern should be documented as an implementation detail unless
we want to expose it to the user via, say, -fpattern-init=0xdeadbeef.

> 
> 
> thus build a range-init CTOR of an array of pointer-sized elements
> but do the actual assignment to the target object by viewing that
> as the target objects type.
> 
> Okay.
> 
> So, for a target object that is smaller than a word, for example, BOOLEAN,
> CHAR or short, is the above still working?
> 
> Well, you obviously have to adjust that - you can't pattern-init
> a BOOL with word-size stores of a word-size pattern.
> 
> As said, for example glibc allocator hardening with MALLOC_PERTURB_
> uses simple byte-init.
> 
> What’s the pattern glibc used?

The value of the MALLOC_PERTURB_ environment truncated to a byte.

Richard.

> Thanks a lot.
> 
> Qing
> 
> That should block-initialize with
> a uint_ptr_type_node sized pattern (but likely less efficient than
> special block init would do).
> 
> Not sure what problems you will run into this, you'll have to try.
> 
> I will try this.
> 
> Thanks a lot for your suggestions.
> 
> Qing
> 
> Richard.
> 
> 
> 
> --
> Richard Biener <rguent...@suse.de<mailto:rguent...@suse.de>>
> SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
> 
> 

-- 
Richard Biener <rguent...@suse.de>
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)

Reply via email to