Re: LTO vs GCC 8
On 11/05/18 17:49, Freddie Chopin wrote:
> On Fri, 2018-05-11 at 13:06 +0200, David Brown wrote:
>> For the Cortex-M devices (and probably many other RISC targets),
>> -fdata-sections comes at a big cost - it effectively blocks
>> -fsection-anchors and makes access to file-static data a lot bigger.
>> People often use -fdata-sections and -ffunction-sections along with
>> -Wl,--gc-sections with the aim of removing unused code and data (and
>> thus saving space, useful on small devices) - I would expect LTO
>> would manage that anyway.  The other purpose of these is to improve
>> locality of reference - again LTO should do that for you.  But even
>> without LTO, I find the cost of -fdata-sections high compared to
>> -fsection-anchors.
>
> Unfortunately having LTO doesn't make -ffunction-sections +
> -fdata-sections + --gc-sections useless.
>
> My test project compiled:
> - without LTO and without these attributes - 150824 B ROM + 4240 B RAM
> - with LTO and without these attributes - 133812 B ROM + 4208 B RAM
> - without LTO and with these attributes - 124456 B ROM + 3484 B RAM
> - with LTO and with these attributes - 120280 B ROM + 3680 B RAM
>
> As you see these attributes give much more than LTO in terms of size.
>

Interesting.  Making these sections and then using gc-sections should
only remove code that is not used - LTO should do that anyway.  Have you
tried with -ffunction-sections and not -fdata-sections?  It is the
-fdata-sections that ruins -fsection-anchors - the -ffunction-sections
doesn't have the same kind of cost.

>
> As for the -fsection-anchors I guess this has no use for non-PIC code
> for arm-none-eabi. Whether I use it or not, the sizes are identical.
>

No, -fsection-anchors has plenty of use for fixed-position eabi code.
Take this little example code:

    static int x;
    static int y;
    static int z;

    void foo(void) {
        int t = x;
        x = y;
        y = z;
        z = t;
    }

Compiled with gcc (4.8, as that's what I had convenient) with -O2
-mcpu=cortex-m4 -mthumb and -fsection-anchors (enabled automatically
with -O2, I believe), this gives:

                    foo:
                        @ args = 0, pretend = 0, frame = 0
                        @ frame_needed = 0, uses_anonymous_args = 0
                        @ link register save eliminated.
         034B           ldr     r3, .L2
    0002 93E80500       ldmia   r3, {r0, r2}
    0006 9968           ldr     r1, [r3, #8]
    0008 1A60           str     r2, [r3]
    000a 9860           str     r0, [r3, #8]
    000c 5960           str     r1, [r3, #4]
    000e 7047           bx      lr
                    .L3:
                        .align  2
                    .L2:
    0010                .word   .LANCHOR0
                        .bss
                        .align  2
                        .set    .LANCHOR0,. + 0
                    x:
                        .space  4
                    y:
    0004                .space  4
                    z:
    0008                .space  4

With -fdata-sections, I get:

                    foo:
                        @ args = 0, pretend = 0, frame = 0
                        @ frame_needed = 0, uses_anonymous_args = 0
                        @ link register save eliminated.
         30B4           push    {r4, r5}
    0002 0549           ldr     r1, .L2
    0004 054B           ldr     r3, .L2+4
    0006 064A           ldr     r2, .L2+8
    0008 0D68           ldr     r5, [r1]
    000a 1468           ldr     r4, [r2]
    000c 1868           ldr     r0, [r3]
    000e 1560           str     r5, [r2]
    0010 1C60           str     r4, [r3]
    0012 0860           str     r0, [r1]
    0014 30BC           pop     {r4, r5}
    0016 7047           bx      lr
                    .L3:
                        .align  2
                    .L2:
    0018                .word   .LANCHOR0
    001c                .word   .LANCHOR1
    0020                .word   .LANCHOR2
                        .section .bss.x,"aw",%nobits
                        .align  2
                        .set    .LANCHOR0,. + 0
                    x:
                        .space  4
                        .section .bss.y,"aw",%nobits
                        .align  2
                        .set    .LANCHOR1,. + 0
                    y:
                        .space  4
                        .section .bss.z,"aw",%nobits
                        .align  2
                        .set    .LANCHOR2,. + 0
                    z:
                        .
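
One possible way to keep single-base-address access when -fdata-sections
has to stay on is to group the related file-static variables into one
object. This is a sketch under that assumption, not something proposed in
the message above; the names `vars` and `foo_grouped` are hypothetical. A
struct is a single object, so it is emitted into a single section and its
members can be addressed as offsets from one base.

```c
/* Hypothetical grouping of the three file-static variables from the
   example: one object, hence one section even with -fdata-sections. */
static struct {
    int x;
    int y;
    int z;
} vars;

/* Same rotation as foo() in the example, applied to the grouped members. */
void foo_grouped(void)
{
    int t = vars.x;
    vars.x = vars.y;
    vars.y = vars.z;
    vars.z = t;
}
```

The trade-off is that --gc-sections can then only keep or discard the
group as a whole, so this probably suits variables that are always used
together.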
Re: Possible bug in cse.c affecting pre/post-modify mem access
On 05/12/2018 01:35 PM, Bernd Schmidt wrote:
> On 05/12/2018 07:01 PM, Jeff Law wrote:
>
>> No.  We're not supposed to have any auto-inc insns prior to the auto-inc
>> pass.  A stack push/pop early in the compiler would have to be
>> represented by a PARALLEL.
>>
>> It's been this way forever.  It's documented in the internals manual
>> somewhere.
>
> Sorry, but you're misremembering this. Stack pushes/pops were always
> represented with autoinc, these being the only exception to the rule you
> remember. You can easily verify this by looking at a .expand dump from a
> 32-bit i386 compiler - I just did so with 2.95 and 6.4. It's all pre_dec
> for argument passing.

That does sound vaguely familiar.  Did we put autoinc notes on the stack
pushes?

That makes me wonder if there is a latent bug though.  Consider pushing
args to a pure function.  Could we then try to CSE the memory reference
and get it wrong because we haven't accounted for the autoinc?

Jeff
Please support the _Atomic keyword in C++
In addition to the bug:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60932
I wanted to add some comment:

It would be very useful if the _Atomic keyword were supported in C++.
This way the <stdatomic.h> header could be included unconditionally in
C++ code.
Even if it is not compatible with the C++ <atomic> header, it would be
useful.

Supporting the _Atomic keyword in C++ would benefit at least two cases:

- When mixing C and C++ code for interoperability (using, in C++, some
variables declared as _Atomic in a C header).

- When developing operating systems or kernels in C++, in a
freestanding environment (cross compiler), <atomic> is not available,
but <stdatomic.h> is.
So to correctly use things like __atomic_fetch_add in C++ in
freestanding mode, this is the only way. Otherwise one cannot use
atomics at all in these conditions.
Re: Please support the _Atomic keyword in C++
On 14 May 2018 at 22:32, Rodrigo V. G. wrote:
> In addition to the bug:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60932
> I wanted to add some comment:
>
> It would be very useful if the _Atomic keyword would be supported in C++.
> This way the <stdatomic.h> header could be included unconditionally in C++
> code.
> Even if it is not compatible with the C++ <atomic> header, it would be useful.
>
> Supporting the _Atomic keyword in C++ would benefit at least two cases:
>
> - When mixing C and C++ code for interoperability (using, in C++, some
> variables declared as _Atomic in a C header).
>
> - When developing operating systems or kernels in C++, in a
> freestanding environment (cross compiler), <atomic> is not available,

Why not? It's part of a freestanding C++ implementation.

> but <stdatomic.h> is.

How? It's not part of any C++ implementation at all, freestanding or not.

> So to correctly use things like __atomic_fetch_add in C++ in
> freestanding mode, this is the only way. Otherwise one cannot use
> atomics at all in these conditions.

Why can't you use __atomic_fetch_add directly?
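
For reference, a minimal sketch of the direct use asked about at the end
of the message above, assuming GCC or a compatible compiler. The __atomic
builtins operate on plain objects, so neither the _Atomic qualifier nor
any header is involved. The names `counter`, `counter_bump` and
`counter_read` are hypothetical. The sketch is written as C; the calls are
compiler builtins rather than library functions, so the same code should
also be accepted by g++.

```c
/* Hypothetical shared counter: a plain unsigned int, not _Atomic. */
static unsigned int counter;

/* Atomically add n and return the value held before the addition. */
unsigned int counter_bump(unsigned int n)
{
    return __atomic_fetch_add(&counter, n, __ATOMIC_SEQ_CST);
}

/* Atomically read the current value. */
unsigned int counter_read(void)
{
    return __atomic_load_n(&counter, __ATOMIC_SEQ_CST);
}
```

Sequentially consistent ordering is used here only because it is the
simplest to reason about; a weaker __ATOMIC_* order may be enough for a
given use.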
Auto-generated .rodata contents and __attribute__((section))
Hi,

I'm a firmware/embedded engineer and frequently run into cases where
certain parts of the code need to be placed in a special memory area
(for example, because the area that contains the other code is not yet
initialized or currently inaccessible). My go-to method to solve this is
to mark all functions and globals used by this code with
__attribute__((section)), and to use a linker script to map those
special sections to the desired area. This mostly works pretty well.

However, I just found an issue with this when the functions include
local variables like this:

    const int some_array[] = { 1, 2, 3, 4, 5, 6 };

In this case (and with -Os optimization), GCC seems to automatically
reserve some space in the .rodata section to place the array, and the
generated code accesses it there. Of course this breaks my use case if
the generic .rodata section is inaccessible while that function
executes. I have not found any way to work around this without either
rewriting the code to completely avoid those constructs, or manipulating
sections manually at the linker level (in particular, you can't just
mark the array itself with __attribute__((section)), since that
attribute is not legal for locals).

Is this intentional, and if so, does it make sense that it is? I can
understand that it may technically be compliant with the description of
__attribute__((section)) in the GCC manual -- but I think the use case
I'm trying to solve is one of the most common uses of that attribute,
and it seems to become completely impossible due to this. Wouldn't it
make more sense and be more useful if __attribute__((section)) meant
"place *everything* generated as part of this function source code into
that section"? Or at least offer some sort of other extension to be able
to control section placement for those special constants? (Note that GCC
usually seems to place constants for individual variables in the text
section, simply behind the epilogue of the function... so it's also
quite unclear to me why arrays get treated differently at all.)

Apart from this issue, this behavior also seems to "break"
-ffunction-sections/-fdata-sections. Even with both of those set, these
sorts of constants seem to get placed into the same big, common .rodata
section (as opposed to either .text.functionname or
.rodata.functionname as you'd expect). That means that they won't get
collected when linking the binary with --gc-sections and will bloat the
code size for projects that link a lot of code opportunistically and
rely on --gc-sections to drop everything that's not needed for the
current configuration.

Is there some clever trick that I missed to work around this, or is this
really not possible with the current GCC? And if so, would you agree
that this is a valid problem that GCC should provide a solution for (in
some form or another)?

Thanks,
Julius
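
One workaround that stays within the "rewrite the code" category
mentioned above, sketched under the assumption of GCC on an ELF target:
the section attribute is rejected only for automatic locals, so giving
the array static storage duration lets it carry the attribute. The
section names `.special.text` and `.special.rodata` and the function
name `early_sum` are hypothetical, and a linker script would still have
to map both sections to the desired area.

```c
/* Hypothetical section names; not taken from the message above. */
#define SPECIAL_TEXT   __attribute__((section(".special.text")))
#define SPECIAL_RODATA __attribute__((section(".special.rodata")))

SPECIAL_TEXT int early_sum(void)
{
    /* Static storage duration, so the section attribute is accepted;
       the same attribute on a non-static local would be an error. */
    static const int some_array[] SPECIAL_RODATA = { 1, 2, 3, 4, 5, 6 };
    int sum = 0;

    for (unsigned i = 0; i < sizeof some_array / sizeof some_array[0]; i++)
        sum += some_array[i];
    return sum;
}
```

This only covers named objects. Constants the compiler generates on its
own, such as string literals or switch jump tables, are not affected by
it and may still end up in the generic .rodata section.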
Re: Possible bug in cse.c affecting pre/post-modify mem access
On 05/14/2018 10:55 PM, Jeff Law wrote:
> That does sound vaguely familiar.  Did we put autoinc notes on the stack
> pushes?

Not as far as I recall. I only see REG_ARGS_SIZE notes in the dumps.

> That makes me wonder if there is a latent bug though.  Consider pushing
> args to a pure function.  Could we then try to CSE the memory reference
> and get it wrong because we haven't accounted for the autoinc?

Can't know for sure but one would hope something would test for
side_effects_p.

Bernd