On 11/29/18 4:17 PM, Jakub Jelinek wrote:
> On Thu, Nov 29, 2018 at 04:03:42PM +0100, Martin Liška wrote:
>>> Two problems, it uses unconditionally unaligned stores, without
>>> checking if the target supports them at all (in this case it does).
>>> And, it doesn't check if it wouldn't be more efficient to use
>>> 32-bit stores. 
>>
>> Ok, so now we reduced the alignment of stack variables to 
>> ASAN_MIN_RED_ZONE_SIZE (16).
>> What options do wehave when we emit the red zones? The only guarantee we 
>> have is
>> that in shadow memory we are aligned to at least 2. Should byte-based 
>> emission used
>> for STRICT_ALIGNMENT targets?
> 
> While individual variables can have smaller alignment, why we can't as before 
> ensure
> the whole block of all the vars is 32-byte aligned and (see below) also
> padded to 32 bytes?  Doesn't __asan_stack_malloc* align those as before and
> if not doing use after return sanitization, has your patch changed anything
> in how the whole block is aligned on the stack?  Of course, on non-strict
> alignment targets we might not care that much.

We can ensure the alignment and padding will be done to 32-bytes. However the 
current
code jumps over the offsets and can eventually do a store to 1B aligned memory.
See small demo:

$ cat asan-smaller.c
void bar (void *a, void *b, void *c, void *d, void *e, void *f, void *g)
{
  int *p = a;
  *p = 123;
}


int
baz (void)
{
  char a;
  char e[17];
  char e1[30];
  char e2[40];
  bar (&a, 0, 0, 0, &e[0], &e1[0], &e2[0]);
  return 0;
}

int main()
{
  baz();
}

$ gcc asan-smaller.c  -fsanitize=address  && ./a.out
flushing into: -256, values: f1f1f1f1
flushing into: -224, values: 01f20000
flushing into: -192, values: 01f2f2f2
flushing into: -160, values: f2f20000
flushing into: -120, values: 06f2f2f2
flushing into: -88, values: f2000000
flushing into: -40, values: f3f3f3f3
flushing into: -8, values: f3000000
=================================================================
==7793==ERROR: AddressSanitizer: stack-buffer-overflow on address 
0x7fffffffdac0 at pc 0x0000004011a1 bp 0x7fffffffda30 sp 0x7fffffffda28
WRITE of size 4 at 0x7fffffffdac0 thread T0
    #0 0x4011a0 in bar (/home/marxin/Programming/testcases/a.out+0x4011a0)
    #1 0x401293 in baz (/home/marxin/Programming/testcases/a.out+0x401293)
    #2 0x40133b in main (/home/marxin/Programming/testcases/a.out+0x40133b)
    #3 0x7ffff7019fea in __libc_start_main ../csu/libc-start.c:308
    #4 0x401099 in _start (/home/marxin/Programming/testcases/a.out+0x401099)

Address 0x7fffffffdac0 is located in stack of thread T0 at offset 32 in frame
    #0 0x4011bd in baz (/home/marxin/Programming/testcases/a.out+0x4011bd)

  This frame has 4 object(s):
    [32, 33) 'a' (line 11) <== Memory access at offset 32 partially overflows 
this variable
    [48, 65) 'e' (line 12)
    [112, 142) 'e1' (line 13)
    [176, 216) 'e2' (line 14)
HINT: this may be a false positive if your program uses some custom stack 
unwind mechanism, swapcontext or vfork
      (longjmp and C++ exceptions *are* supported)
SUMMARY: AddressSanitizer: stack-buffer-overflow 
(/home/marxin/Programming/testcases/a.out+0x4011a0) in bar
Shadow bytes around the buggy address:
  0x10007fff7b00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x10007fff7b10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x10007fff7b20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x10007fff7b30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x10007fff7b40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=>0x10007fff7b50: 00 00 00 00 f1 f1 f1 f1[01]f2 00 00 01 f2 f2 f2
  0x10007fff7b60: f2 f2 00 00 00 06 f2 f2 f2 f2 00 00 00 00 00 f3
  0x10007fff7b70: f3 f3 f3 f3 00 00 00 00 00 00 00 00 00 00 00 00
  0x10007fff7b80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x10007fff7b90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x10007fff7ba0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

as seen 06f2f2f2 is completely unaligned write.

So the current code needs to be learned that once we move to another variable 
we must
start the offset at 32B boundary. Let me take a look..

Martin

> 
>>>  It isn't that we want the gaps to have whatever
>>> value the shadow memory contained before, we want it to be 0 and just rely
>>> on it being zero from earlier.
>>> Another question is if it wouldn't be worth to ensure the variable area is
>>> always a multiple of 32 bytes (thus the corresponding shadow always multiple
>>> of 4 bytes).  Then, for this testcase, you'd store:
>>> $-235802127, 2147450880(%rbp)       // 0xf1f1f1f1
>>> $-234687999, 2147450884(%rbp)   // 0xf202f201
>>> $-218041852, 2147450888(%rbp)   // 0xf300f204
>>> $-202116109, 2147450892(%rbp)   // 0xf3f3f3f3
>>> For the single char/short/int/long var in a function case you'd still emit
>>> just f1 f1 f1 f1 0[1240] f3 f3 f3
>>> and the number of final f3 bytes would be from 3 to 6 (or 2 to 5), 1 is
>>> probably too few.
> 
> Here, the padding to 32 bytes.
> 
>       Jakub
> 

Reply via email to