Re: LTO vs GCC 8

2018-05-14 Thread David Brown
On 11/05/18 17:49, Freddie Chopin wrote:
> On Fri, 2018-05-11 at 13:06 +0200, David Brown wrote:
>> For the Cortex-M devices (and probably many other RISC targets),
>> -fdata-sections comes at a big cost - it effectively blocks
>> -fsection-anchors and makes access to file-static data a lot bigger.
>> People often use -fdata-sections and -ffunction-sections along with
>> -Wl,--gc-sections with the aim of removing unused code and data (and
>> thus saving space, useful on small devices) - I would expect LTO would
>> manage that anyway.  The other purpose of these is to improve locality
>> of reference - again LTO should do that for you.  But even without LTO,
>> I find the cost of -fdata-sections high compared to -fsection-anchors.
> 
> Unfortunately, having LTO doesn't make -ffunction-sections +
> -fdata-sections + --gc-sections useless.
> 
> My test project compiled:
> - without LTO and without these attributes - 150824 B ROM + 4240 B RAM
> - with LTO and without these attributes - 133812 B ROM + 4208 B RAM
> - without LTO and with these attributes - 124456 B ROM + 3484 B RAM
> - with LTO and with these attributes - 120280 B ROM + 3680 B RAM
> 
> As you see these attributes give much more than LTO in terms of size.
> 

Interesting.  Making these sections and then using gc-sections should
only remove code that is not used - LTO should do that anyway.

Have you tried with -ffunction-sections and not -fdata-sections?  It is
the -fdata-sections that ruins -fsection-anchors - the
-ffunction-sections doesn't have the same kind of cost.

>
> As for the -fsection-anchors I guess this has no use for non-PIC code
> for arm-none-eabi. Whether I use it or not, the sizes are identical.
> 

No, -fsection-anchors has plenty of use for fixed-position eabi code.

Take this little example code:

static int x;
static int y;
static int z;

void foo(void) {
int t = x;
x = y;
y = z;
z = t;
}

Compiled with gcc (4.8, as that's what I had convenient) with -O2
-mcpu=cortex-m4 -mthumb and -fsection-anchors (enabled automatically
with -O2, I believe), this gives:

foo:
                @ args = 0, pretend = 0, frame = 0
                @ frame_needed = 0, uses_anonymous_args = 0
                @ link register save eliminated.
 0000 034B      ldr     r3, .L2
 0002 93E80500  ldmia   r3, {r0, r2}
 0006 9968      ldr     r1, [r3, #8]
 0008 1A60      str     r2, [r3]
 000a 9860      str     r0, [r3, #8]
 000c 5960      str     r1, [r3, #4]
 000e 7047      bx      lr
.L3:
                .align  2
.L2:
 0010           .word   .LANCHOR0
                .bss
                .align  2
                .set    .LANCHOR0,. + 0
x:
 0000           .space  4
y:
 0004           .space  4
z:
 0008           .space  4


With -fdata-sections, I get:

foo:
                @ args = 0, pretend = 0, frame = 0
                @ frame_needed = 0, uses_anonymous_args = 0
                @ link register save eliminated.
 0000 30B4      push    {r4, r5}
 0002 0549      ldr     r1, .L2
 0004 054B      ldr     r3, .L2+4
 0006 064A      ldr     r2, .L2+8
 0008 0D68      ldr     r5, [r1]
 000a 1468      ldr     r4, [r2]
 000c 1868      ldr     r0, [r3]
 000e 1560      str     r5, [r2]
 0010 1C60      str     r4, [r3]
 0012 0860      str     r0, [r1]
 0014 30BC      pop     {r4, r5}
 0016 7047      bx      lr
.L3:
                .align  2
.L2:
 0018           .word   .LANCHOR0
 001c           .word   .LANCHOR1
 0020           .word   .LANCHOR2
                .section        .bss.x,"aw",%nobits
                .align  2
                .set    .LANCHOR0,. + 0
x:
                .space  4
                .section        .bss.y,"aw",%nobits
                .align  2
                .set    .LANCHOR1,. + 0
y:
                .space  4
                .section        .bss.z,"aw",%nobits
                .align  2
                .set    .LANCHOR2,. + 0
z:
                .
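
One way to keep single-base addressing while still using -fdata-sections is to group related file-static variables into one struct: a struct is a single object, so it lands in a single section and its members should be reachable from one base address. A minimal sketch, assuming the three variables from the example can live together (the name `vars` is hypothetical, and the resulting code size has not been measured here):

```c
/* Hypothetical grouping of the file-static x, y and z from the example.
   With -fdata-sections the whole struct should go into one section
   (.bss.vars) rather than three separate ones. */
static struct {
    int x;
    int y;
    int z;
} vars;

/* Same rotation as foo() in the example, written against the struct. */
void foo(void) {
    int t = vars.x;
    vars.x = vars.y;
    vars.y = vars.z;
    vars.z = t;
}
```

The trade-off is that --gc-sections can then only keep or drop the struct as a whole, not its individual members.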

Re: Possible bug in cse.c affecting pre/post-modify mem access

2018-05-14 Thread Jeff Law
On 05/12/2018 01:35 PM, Bernd Schmidt wrote:
> On 05/12/2018 07:01 PM, Jeff Law wrote:
> 
>> No.  We're not supposed to have any auto-inc insns prior to the auto-inc
>> pass.  A stack push/pop early in the compiler would have to be
>> represented by a PARALLEL.
>>
>> It's been this way forever.  It's documented in the internals manual
>> somewhere.
> 
> Sorry, but you're misremembering this. Stack pushes/pops were always
> represented with autoinc, these being the only exception to the rule you
> remember. You can easily verify this by looking at a .expand dump from a
> 32-bit i386 compiler - I just did so with 2.95 and 6.4. It's all pre_dec
> for argument passing.
That does sound vaguely familiar.  Did we put autoinc notes on the stack
pushes?


That makes me wonder if there is a latent bug though.  Consider pushing
args to a pure function.  Could we then try to CSE the memory reference
and get it wrong because we haven't accounted for the autoinc?
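
To make the scenario concrete, here is a sketch of the kind of source that would exercise it, assuming a target that passes arguments by pushing them (such as the 32-bit i386 case mentioned above). The function names are hypothetical, and this does not demonstrate a bug; it only shows two calls to a pure function whose argument set-up would consist of consecutive stack pushes:

```c
/* Hypothetical pure callee; noinline so that real calls (and real
   argument set-up) survive optimisation. */
__attribute__((pure, noinline))
static int weight(int a, int b) {
    return a * 10 + b;
}

/* Two calls with the same values in swapped order: on a push-args
   target, each call is preceded by stores through a decremented
   stack pointer. */
int twice(int a, int b) {
    return weight(a, b) + weight(b, a);
}
```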

Jeff


Please support the _Atomic keyword in C++

2018-05-14 Thread Rodrigo V. G.
In addition to the bug:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60932
I wanted to add some comment:

It would be very useful if the _Atomic keyword were supported in C++.
This way the <stdatomic.h> header could be included unconditionally in C++ code.
Even if it is not compatible with the C++ <atomic> header, it would be useful.

Supporting the _Atomic keyword in C++ would benefit at least two cases:

- When mixing C and C++ code for interoperability (using, in C++, some
variables declared as _Atomic in a C header).

- When developing operating systems or kernels in C++, in a
freestanding environment (cross compiler), <atomic> is not available,
but <stdatomic.h> is.
So to correctly use things like __atomic_fetch_add in C++ in
freestanding mode, this is the only way. Otherwise one cannot use
atomics at all in these conditions.


Re: Please support the _Atomic keyword in C++

2018-05-14 Thread Jonathan Wakely
On 14 May 2018 at 22:32, Rodrigo V. G. wrote:
> In addition to the bug:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60932
> I wanted to add some comment:
>
> It would be very useful if the _Atomic keyword were supported in C++.
> This way the <stdatomic.h> header could be included unconditionally in C++ code.
> Even if it is not compatible with the C++ <atomic> header, it would be useful.
>
> Supporting the _Atomic keyword in C++ would benefit at least two cases:
>
> - When mixing C and C++ code for interoperability (using, in C++, some
> variables declared as _Atomic in a C header).
>
> - When developing operating systems or kernels in C++, in a
> freestanding environment (cross compiler), <atomic> is not available,

Why not? It's part of a freestanding C++ implementation.

> but <stdatomic.h> is.

How? It's not part of any C++ implementation at all, freestanding or not.

> So to correctly use things like __atomic_fetch_add in C++ in
> freestanding mode, this is the only way. Otherwise one cannot use
> atomics at all in these conditions.

Why can't you use __atomic_fetch_add directly?
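
For reference, a minimal sketch of calling the builtins directly, which needs no header at all. The counter and function names are hypothetical; `__atomic_fetch_add` and `__atomic_load_n` are GCC's documented `__atomic` builtins, and the same calls are accepted by both the C and C++ front ends:

```c
/* Hypothetical shared counter; no _Atomic qualifier and no header needed. */
static unsigned long ticket_counter;

/* Atomically increments the counter and returns the value it held
   before the increment. */
unsigned long take_ticket(void) {
    return __atomic_fetch_add(&ticket_counter, 1UL, __ATOMIC_SEQ_CST);
}

/* Atomically reads the current value without modifying it. */
unsigned long peek_ticket(void) {
    return __atomic_load_n(&ticket_counter, __ATOMIC_ACQUIRE);
}
```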


Auto-generated .rodata contents and __attribute__((section))

2018-05-14 Thread Julius Werner
Hi,

I'm a firmware/embedded engineer and frequently run into cases where
certain parts of the code need to be placed in a special memory area (for
example, because the area that contains the other code is not yet
initialized or currently inaccessible). My go-to method to solve this is to
mark all functions and globals used by this code with
__attribute__((section)) and use a linker script to map those special
sections to the desired area. This mostly works pretty well.

However, I just found an issue with this when the functions include local
variables like this:

  const int some_array[] = { 1, 2, 3, 4, 5, 6 };

In this case (and with -Os optimization), GCC seems to automatically
reserve some space in the .rodata section to place the array, and the
generated code accesses it there. Of course this breaks my use case if the
generic .rodata section is inaccessible while that function executes. I
have not found any way to work around this without either rewriting the
code to completely avoid those constructs, or manipulating sections
manually at the linker level (in particular, you can't just mark the array
itself with __attribute__((section)), since that attribute is not legal for
locals).
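
For illustration, the "rewrite the code" workaround could look like the sketch below: the table is hoisted to file scope, where the section attribute is accepted, and the function is placed with the same mechanism. The section names and macros are hypothetical, an ELF target is assumed, and a linker script would still have to map the sections. This only covers data that is named explicitly; anything else the compiler decides to emit on its own may still end up in the generic .rodata.

```c
/* Hypothetical section names; a linker script must map them to the
   special memory area. */
#define EARLY_TEXT   __attribute__((section(".early.text")))
#define EARLY_RODATA __attribute__((section(".early.rodata")))

/* Hoisted out of the function so that the attribute can be applied. */
static const int some_array[] EARLY_RODATA = { 1, 2, 3, 4, 5, 6 };

/* Example user of the table, placed in the special code section. */
EARLY_TEXT int sum_some_array(void) {
    int sum = 0;
    for (unsigned i = 0; i < sizeof some_array / sizeof some_array[0]; i++)
        sum += some_array[i];
    return sum;
}
```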

Is this intentional, and if so, does it make sense that it is? I can
understand that it may technically be compliant with the description of
__attribute__((section)) in the GCC manual -- but I think the use case I'm
trying to solve is one of the most common uses of that attribute, and it
seems to become completely impossible due to this. Wouldn't it make more
sense and be more useful if __attribute__((section)) meant "place
*everything* generated as part of this function source code into that
section"? Or at least offer some sort of other extension to be able to
control section placement for those special constants? (Note that GCC
usually seems to place constants for individual variables in the text
section, simply behind the epilogue of the function... so it's also quite
unclear to me why arrays get treated differently at all.)

Apart from this issue, this behavior also seems to "break"
-ffunction-sections/-fdata-sections. Even with both of those set, these
sorts of constants seem to get placed into the same big, common .rodata
section (as opposed to either .text.functionname or .rodata.functionname as
you'd expect). That means that they won't get collected when linking the
binary with --gc-sections and will bloat the code size for projects that
link a lot of code opportunistically and rely on --gc-sections to drop
everything that's not needed for the current configuration.

Is there some clever trick that I missed to work around this, or is this
really not possible with the current GCC? And if so, would you agree that
this is a valid problem that GCC should provide a solution for (in some
form or another)?

Thanks,
Julius


Re: Possible bug in cse.c affecting pre/post-modify mem access

2018-05-14 Thread Bernd Schmidt
On 05/14/2018 10:55 PM, Jeff Law wrote:
> That does sound vaguely familiar.  Did we put autoinc notes on the stack
> pushes?

Not as far as I recall. I only see REG_ARGS_SIZE notes in the dumps.

> That makes me wonder if there is a latent bug though.  Consider pushing
> args to a pure function.  Could we then try to CSE the memory reference
> and get it wrong because we haven't accounted for the autoinc?

Can't know for sure but one would hope something would test for
side_effects_p.


Bernd