Linaro GCC vs Valgrind
Hello,

We have been using Linaro GCC 5.x [1] and valgrind.

When the optimizer is turned on, valgrind complains about writes beyond
the current stack pointer. With the optimizer off, the problem report
goes away.

I have my own conclusion about what is going on but I won't bias you
with it. Here are the facts:

All files and logs are attached as a 10K tar.gz, if it survives this
mailing list.

test.c:

    #include <stdio.h>

    int main(int argc,char** argv)
    {
        int i;

        for (i = 1; i < argc; i++) {
            printf("argument is %s\n", argv[i]);
        }

        return 0;
    }

    $ arm-linux-gnueabihf-gcc -march=armv7ve -marm -mfpu=neon \
        -mfloat-abi=hard -mcpu=cortex-a15 -O2 -g \
        -o test-fail test.c

    $ valgrind --leak-resolution=high --track-origins=yes \
        --trace-children=yes --leak-check=full --error-limit=no \
        ./test-fail arg1 arg2 arg3

    ==20011== Memcheck, a memory error detector
    ==20011== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
    ==20011== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
    ==20011== Command: ./test-fail arg1 arg2 arg3
    ==20011==
    ==20011== Invalid write of size 4
    ==20011==    at 0x10300: main (test.c:4)
    ==20011==  Address 0xbdbfcb58 is on thread 1's stack
    ==20011==  24 bytes below stack pointer
    ==20011==

    000102f8 <main>:
       102f8: e351      cmp   r0, #1
       102fc: da14      ble   10354
       10300: e16d41f8  strd  r4, [sp, #-24]!   ; 0xffe8
              Complaint is here
       10304: e1a05001  mov   r5, r1
       10308: e3a04001  mov   r4, #1
       1030c: e1cd60f8  strd  r6, [sp, #8]
       10310: e300748c  movw  r7, #1164   ; 0x48c
       10314: e1a06000  mov   r6, r0
       10318: e3407001  movt  r7, #1
       1031c: e58d8010  str   r8, [sp, #16]
       10320: e58de014  str   lr, [sp, #20]
       10324: e2844001  add   r4, r4, #1
       10328: e5b51004  ldr   r1, [r5, #4]!
       1032c: e1a7      mov   r0, r7
       10330: ebe4      bl    102c8
       10334: e1560004  cmp   r6, r4
       10338: 1af9      bne   10324
       1033c: e1cd40d0  ldrd  r4, [sp]
       10340: e3a0      mov   r0, #0
       10344: e1cd60d8  ldrd  r6, [sp, #8]
       10348: e59d8010  ldr   r8, [sp, #16]
       1034c: e28dd014  add   sp, sp, #20
       10350: e49df004  pop   {pc}   ; (ldr pc, [sp], #4)
       10354: e3a0      mov   r0, #0
       10358: e12fff1e  bx    lr

Without the optimizer, the code looks different and valgrind does not
issue any errors.

    000103d8 <main>:
       103d8: e52db008  str   fp, [sp, #-8]!
              ^^^ Valgrind does not complain about this
       103dc: e58de004  str   lr, [sp, #4]
       103e0: e28db004  add   fp, sp, #4
       103e4: e24dd010  sub   sp, sp, #16
       103e8: e50b0010  str   r0, [fp, #-16]
       103ec: e50b1014  str   r1, [fp, #-20]   ; 0xffec
       103f0: e3a03001  mov   r3, #1
       103f4: e50b3008  str   r3, [fp, #-8]
       103f8: ea0b      b     1042c
       103fc: e51b3008  ldr   r3, [fp, #-8]
       10400: e1a03103  lsl   r3, r3, #2
       10404: e51b2014  ldr   r2, [fp, #-20]   ; 0xffec
       10408: e0823003  add   r3, r2, r3
       1040c: e5933000  ldr   r3, [r3]
       10410: e1a01003  mov   r1, r3
       10414: e30004a4  movw  r0, #1188   ; 0x4a4
       10418: e341      movt  r0, #1
       1041c: eba9      bl    102c8
       10420: e51b3008  ldr   r3, [fp, #-8]
       10424: e2833001  add   r3, r3, #1
       10428: e50b3008  str   r3, [fp, #-8]
       1042c: e51b2008  ldr   r2, [fp, #-8]
       10430: e51b3010  ldr   r3, [fp, #-16]
       10434: e1520003  cmp   r2, r3
       10438: baef      blt   103fc
       1043c: e3a03000  mov   r3, #0
       10440: e1a3      mov   r0, r3
       10444: e24bd004  sub   sp, fp, #4
       10448: e59db000  ldr   fp, [sp]
       1044c: e28dd004  add   sp, sp, #4
       10450: e49df004  pop   {pc}   ; (ldr pc, [sp], #4)

[1] 5.3-2016.02 for Yocto-project and cross-compile 5.2 on the ARM
target "since Linaro hasn’t yet fixed building 5.3 from recipes yet."
Both versions give the same results for this test program.

William A. Mills
Chief Technologist, Open Solutions, SDO
Texas Instruments, Inc.
20450 Century Blvd
Germantown MD 20878
240-643-0836

Attachment: valtest.tar.gz (application/gzip)

___
linaro-toolchain mailing list
linaro-toolchain@lists.linaro.org
https://lists.linaro.org/mailman/listinfo/linaro-toolchain
Re: Linaro GCC vs Valgrind
On Thu, Jun 9, 2016 at 2:22 PM, William Mills wrote:
> When the optimizer is turned on valgrind complains about writes beyond
> the current stack pointer. With the optimizer off, the problem report
> goes away.

> 000102f8 <main>:
>    102f8: e351      cmp   r0, #1
>    102fc: da14      ble   10354
>    10300: e16d41f8  strd  r4, [sp, #-24]!   ; 0xffe8
>           Complaint is here

This optimization is called shrink-wrapping. It involves moving the
function prologue/epilogue inside an outermost if statement, so that
we can avoid allocating a stack frame when we don't need it. It can be
disabled with -fno-shrink-wrap.

Perhaps valgrind has special support to detect stack writes inside a
prologue, and this support is failing when a function is shrink-wrapped
because it can't identify where the prologue is.

Jim
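To make the shape of code that benefits from shrink-wrapping concrete, here is a minimal sketch in portable C. The function name sum_arg_lengths is hypothetical and does not come from the thread. Its early-return guard plays the same role as the cmp/ble pair at the top of the optimized main: on that path no callee-saved registers are needed, so a compiler is free to defer the register saves until after the branch. Whether a particular GCC build actually does this depends on the target and options.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical example, not taken from the thread.
 *
 * Fast path: when there are no arguments the function returns at once
 * and needs no stack frame. Slow path: the loop calls strlen(), so the
 * running total and loop state must survive a call, which is what
 * typically forces callee-saved registers to be pushed.
 *
 * With shrink-wrapping, the saves can be placed after the guard, so
 * the first stack write is no longer the first instruction. */
size_t sum_arg_lengths(int argc, char **argv)
{
    if (argc <= 1)
        return 0;               /* no prologue needed on this path */

    size_t total = 0;
    for (int i = 1; i < argc; i++)
        total += strlen(argv[i]);
    return total;
}
```

Compiling a function like this at -O2 with and without -fno-shrink-wrap and comparing the disassembly is one way to see whether the register saves moved past the guard on a given toolchain.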
Re: Linaro GCC vs Valgrind
This looks like a valgrind bug to me. I can reproduce the problem with
this simple program, which shows the issue at any optimisation level.

    int main ()
    {
      asm volatile ("" : : : "r4", "r5");
      return 0;
    }

[on my raspberry pi, with the system gcc]

    $ gcc test.c -mtune=cortex-a15 -marm
    $ valgrind ./a.out
    ==15850== Memcheck, a memory error detector
    ==15850== Copyright (C) 2002-2011, and GNU GPL'd, by Julian Seward et al.
    ==15850== Using Valgrind-3.7.0 and LibVEX; rerun with -h for copyright info
    ==15850== Command: ./a.out
    ==15850==
    ==15850== Invalid write of size 4
    ==15850==    at 0x103E8: main (in /home/cgb23/a.out)
    ==15850==  Address 0xbdcf34a4 is just below the stack ptr.  To suppress, use: --workaround-gcc296-bugs=yes
    ...

    000103e8 <main>:
       103e8: e16d40fc  strd  r4, [sp, #-12]!
       103ec: e58db008  str   fp, [sp, #8]
       103f0: e28db008  add   fp, sp, #8
       103f4: e3a03000  mov   r3, #0
       103f8: e1a3      mov   r0, r3
       103fc: e24bd008  sub   sp, fp, #8
       10400: e1cd40d0  ldrd  r4, [sp]
       10404: e59db008  ldr   fp, [sp, #8]
       10408: e28dd00c  add   sp, sp, #12
       1040c: e12fff1e  bx    lr

Without looking at the valgrind sources, I'd guess that valgrind isn't
handling the strd instruction correctly. "size 4" obviously isn't
correct for the strd, and it also may not be accounting for the
writeback of the stack pointer correctly.

Looking at google, I found this bug report to the valgrind mailing list:
https://sourceforge.net/p/valgrind/mailman/message/34632852/. It seems
to relate to the same issue, but did not attract any attention. A brief
look at the attached patch suggests that the problem is related to the
way valgrind handles writes to the stack with negative offsets and
writeback.

The suggested --workaround-gcc296-bugs=yes option does seem to suppress
the error.
Alternatively, since the compiler will only use STRD/LDRD in the
prologue and epilogue when compiling for cores with an out-of-order
microarchitecture, you can work around the problem by compiling with
-mcpu=cortex-a7, in which case it will use PUSH and POP instead.
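The writeback guess can be illustrated with a small toy model in C. This is only a sketch: struct toy_cpu, toy_strd_preindex and toy_bytes_below are hypothetical names, and none of it is valgrind code. It models the architectural effect of "strd r4, [sp, #-24]!" (store two words at sp - 24, then set sp to that address) and shows that the same store looks 24 bytes below the stack pointer if it is compared with the pre-writeback value, and exactly at the stack pointer if compared with the post-writeback value. Whether valgrind really checks against the old value is the hypothesis raised in this thread, not something the model proves.

```c
#include <assert.h>
#include <stdint.h>

/* Toy machine state: a stack pointer plus a small word-addressed stack.
 * All names are hypothetical; this is not valgrind code. */
enum { STACK_WORDS = 64 };

struct toy_cpu {
    uint32_t sp;                  /* byte address held in sp */
    uint32_t stack_base;          /* byte address of stack[0] */
    uint32_t stack[STACK_WORDS];
};

/* Models "strd rLo, [sp, #offset]!": store two words at sp + offset,
 * then write that address back to sp. Returns the store address. */
uint32_t toy_strd_preindex(struct toy_cpu *cpu, int32_t offset,
                           uint32_t lo, uint32_t hi)
{
    /* Unsigned wraparound gives sp + offset for negative offsets too. */
    uint32_t addr = cpu->sp + (uint32_t)offset;

    assert(addr >= cpu->stack_base);
    assert(addr % 4 == 0);
    uint32_t idx = (addr - cpu->stack_base) / 4;
    assert(idx + 1 < STACK_WORDS);

    cpu->stack[idx] = lo;         /* first register of the pair */
    cpu->stack[idx + 1] = hi;     /* second register of the pair */
    cpu->sp = addr;               /* writeback: sp now covers the store */
    return addr;
}

/* How far addr lies below a given sp value; 0 if it is not below. */
uint32_t toy_bytes_below(uint32_t sp, uint32_t addr)
{
    return addr < sp ? sp - addr : 0;
}
```

For example, starting with sp = stack_base + 128 and calling toy_strd_preindex(&cpu, -24, 4, 5), toy_bytes_below() reports 24 against the old sp and 0 against the updated cpu.sp, which matches the "24 bytes below stack pointer" figure in the original report.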