Re: Issues when emitting sjlj dispatch table
On 09/05/2017 01:18 PM, Richard Biener wrote:
> On Tue, Sep 5, 2017 at 12:20 PM, Claudiu Zissulescu wrote:
>> Hi guys,
>>
>> I found an ICE when emitting sjlj dispatch table for ARC. Namely, in
>> sjlj_emit_dispatch_table() function, we create a dispatch table where the
>> case elements have the high value set to NULL (except.c:1326).
>> Later, these case statements are used by expand_sjlj_dispatch_table()
>> (stmt.c:1006) where we create a case list requiring also the high element
>> (stmt.c:1066). This leads to an error when we try to compute the high bounds
>> in emit_case_dispatch_table() (stmt.c:786), due to the fact that high value
>> is null.
>>
>> In gcc7.x, we were initializing the high value of case elements in
>> sjlj_emit_dispatch_table() with the CASE_LOW. Shouldn't we do the same
>> thing, or am I missing something?
>
> A NULL CASE_HIGH means the case covers a single value, CASE_LOW.
>
> Probably broken by Martin's reorg.

Yes, it's mine and probably a dup of PR82154. I've got a tested patch that
will land soon in @gcc-patches.

Martin

>
> Richard.
>
>> A test is attached, the error is visible for ARC backend, the option for the
>> compiler should be -O2.
>>
>> Thanks,
>> Claudiu
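The convention Richard describes (a missing high bound means the case covers only its low value) can be illustrated with a small standalone model. This is only a sketch: `struct case_label`, `effective_high` and `case_span` are invented names for illustration and are not GCC's tree API, where the real accessors are CASE_LOW and CASE_HIGH.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a case label: 'high' may be NULL, in which case the
   label covers the single value '*low'. Not GCC's real data structure. */
struct case_label {
    const long *low;
    const long *high;
};

/* Return the upper bound a consumer should use: fall back to the low
   bound when no high bound was recorded. */
static inline long effective_high(const struct case_label *c)
{
    assert(c != NULL && c->low != NULL);
    return c->high != NULL ? *c->high : *c->low;
}

/* Number of values covered by the label. */
static inline long case_span(const struct case_label *c)
{
    return effective_high(c) - *c->low + 1;
}
```

A consumer that dereferences `high` directly, without the fallback, fails in the way the report describes when it meets a single-value case.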
Invalid free in standard library in trivial example with C++17 on gcc 7.2
Hi, Apologies if I am coming about this in the wrong way, I am new to the mailing list. During our preliminary work to upgrade to gcc 7.2 (from 6.3) at my workplace, we have come across a bug that is blocking our move to C++17. I have raised a bug report here: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82172. There is an invalid free in a string within basic_stringbuf when inserting a character into an empty stringbuf when using the pre CXX11 ABI, LTO, O1 and C++17. Could anyone offer some advice on diagnosing this further, or working around this issue that doesn't involve moving to the CXX11 ABI? Thanks in advance, -- Shane
Byte swapping support
Hi, To support applications that assume big-endian memory layout on little- endian systems, I'm considering adding support for reversing the storage order to GCC. In contrast to the existing scalar storage order support for structs, the goal is to reverse the storage order for all memory operations to achieve maximum compatibility with the behavior on big-endian systems, as far as observable by the application. The plan is to insert byte swapping instructions as part of the RTL expansion of GIMPLE assignments that access memory. This would leverage code that was added for -fsso-struct, keeping the code simple and maintainable. It should not be necessary to insert byte swapping instructions for spilled registers. Is this something that GCC upstream would be interested to accept? To facilitate byte swapping at endian boundaries (kernel or libraries), I'm also considering developing a new GCC builtin that can byte-swap whole structs in memory. There are limitations to this, e.g., unions could not be supported in general. However, I still expect this to be very useful. Any comments or suggestions? Best regards, Jürg
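For comparison, this is roughly what an application has to write by hand today to read and write big-endian data regardless of the host byte order. It is a minimal sketch in portable C; the helper names `load_be32` and `store_be32` are made up for illustration and are not GCC builtins or part of the proposal.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: read a 32-bit big-endian value from memory,
   independent of the host byte order. */
static inline uint32_t load_be32(const unsigned char *p)
{
    assert(p != NULL);
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16)
         | ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Hypothetical helper: write a 32-bit value to memory in big-endian
   order, most significant byte first. */
static inline void store_be32(unsigned char *p, uint32_t v)
{
    assert(p != NULL);
    p[0] = (unsigned char)(v >> 24);
    p[1] = (unsigned char)(v >> 16);
    p[2] = (unsigned char)(v >> 8);
    p[3] = (unsigned char)v;
}
```

The proposal would in effect have the compiler apply this kind of conversion to every scalar memory access, instead of the programmer applying it at each access site.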
Re: Help out/New to the Project
On 09/09/2017 08:00 AM, Ramana Radhakrishnan wrote:

> There are a few getting started guides in the wiki
> (http://gcc.gnu.org/wiki - Look for tutorials) which can help you get
> started in terms of reading up on the internals, there are a few Easy
> hacks listed in the wiki page here https://gcc.gnu.org/wiki/EasyHacks

I've just added a link to EasyHacks from GettingStarted (it wasn't
obvious to me how to find it).

Is there a different page for 'more involved changes'? We could link to
that from EasyHacks.

nathan

--
Nathan Sidwell
RFC: Improving GCC8 default option settings
Hi all,

At the GNU Cauldron I was inspired by several interesting talks about improving
GCC in various ways. While GCC has many great optimizations, a common theme is
that its default settings are rather conservative. As a result users are
required to enable several additional optimizations by hand to get good code.
Other compilers enable more optimizations at -O2 (loop unrolling in LLVM was
mentioned repeatedly) which GCC could/should do as well.

Here are a few concrete proposals to improve GCC's option settings which will
enable better code generation for most targets:

* Make -fno-math-errno the default - this mostly affects the code generated for
  sqrt, which should be treated just like floating point division and not set
  errno by default (unless you explicitly select C89 mode).

* Make -fno-trapping-math the default - another obvious one. From the docs:
  "Compile code assuming that floating-point operations cannot generate
  user-visible traps."
  There isn't a lot of code that actually uses user-visible traps (if any -
  many CPUs don't even support user traps as it's an optional IEEE feature).
  So assuming trapping math by default is way too conservative since there is
  no obvious benefit to users.

* Make -fno-common the default - this was originally needed for pre-ANSI C, but
  is optional in C (not sure whether it is still in C99/C11). This can
  significantly improve code generation on targets that use anchors for globals
  (note the linker could report a more helpful message when ancient code that
  requires -fcommon fails to link).

* Make -fomit-frame-pointer the default - various targets already do this at
  higher optimization levels, but this could easily be done for all targets.
  Frame pointers haven't been needed for debugging for decades, however if there
  are still good reasons to keep it enabled with -O0 or -O1 (I can't think of any
  unless it is for last-resort backtrace when there is no unwind info at a crash),
  we could just disable the frame pointer from -O2 onwards.

These are just a few ideas to start. What do people think? I'd welcome discussion
and other proposals for similar improvements.

Wilco
Re: Byte swapping support
> On Sep 12, 2017, at 5:32 AM, Jürg Billeter > wrote: > > Hi, > > To support applications that assume big-endian memory layout on little- > endian systems, I'm considering adding support for reversing the > storage order to GCC. In contrast to the existing scalar storage order > support for structs, the goal is to reverse the storage order for all > memory operations to achieve maximum compatibility with the behavior on > big-endian systems, as far as observable by the application. I've done this in the past by C++ type magic. As a general setting it doesn't make sense that I can see. As an attribute applied to a particular data item, it does. But I'm not sure why you'd put this in the compiler when programmers can do it easily enough by defining a "big endian int32" class, etc. paul
Re: Byte swapping support
On 12/09/17 16:15, paul.kon...@dell.com wrote:
>
>> On Sep 12, 2017, at 5:32 AM, Jürg Billeter wrote:
>>
>> Hi,
>>
>> To support applications that assume big-endian memory layout on little-
>> endian systems, I'm considering adding support for reversing the
>> storage order to GCC. In contrast to the existing scalar storage order
>> support for structs, the goal is to reverse the storage order for all
>> memory operations to achieve maximum compatibility with the behavior on
>> big-endian systems, as far as observable by the application.
>
> I've done this in the past by C++ type magic. As a general setting
> it doesn't make sense that I can see. As an attribute applied to a
> particular data item, it does. But I'm not sure why you'd put this in
> the compiler when programmers can do it easily enough by defining a "big
> endian int32" class, etc.
>

Some people use the compiler for C rather than C++ ...

If someone wants to improve on the endianness support in gcc, I can
think of a few ideas that /I/ think might be useful. I have no idea how
difficult they might be to put in practice, and can't say if they would
be of interest to others.

First, I would like to see endianness given as a named address space,
rather than as a type attribute. A key point here is that named address
spaces are effectively qualifiers, like "const" and "volatile" - and you
can then use them in pointers:

    big_endian uint32_t be_buffer[20];
    little_endian uint32_t le_buffer[20];

    void copy_buffers(const big_endian uint32_t * src,
                      little_endian uint32_t * dest)
    {
        for (int i = 0; i < 20; i++) {
            dest[i] = src[i];   // Swaps endianness on copy
        }
    }

That would also let you use them for scalar types, not just structs, and
you could use typedefs:

    typedef big_endian uint32_t be_uint32_t;

Secondly, I would add more endian types. As well as big_endian and
little_endian, I would add native_endian and reverse_endian. These
could let you write slightly clearer definitions sometimes. And ideally
I would like mixed endian with big-endian 16-bit ordering and
little-endian ordering for bigger types (i.e., 0x87654321 would be
stored 0x43, 0x21, 0x87, 0x65). That order matches some protocols, such
as Modbus.

Third, I'd like to be able to attach the attribute to particular
variables and to scalars, not just struct and union types.

Fourth, I would like type-punning through unions with different ordering
to be allowed. I'd like to be able to define:

    union U {
        __attribute__((scalar_storage_order("big-endian"))) uint16_t protocol[32];
        __attribute__((scalar_storage_order("little-endian"))) struct {
            uint32_t id;
            uint16_t command;
            uint16_t param1;
            ...
        };
    };

and then I could access data in little ordering in the structures, then
in 16-bit big-endian lumps via the "protocol" array.

David
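The mixed ordering described above (big-endian bytes within each 16-bit word, least significant word first) can be written out by hand as follows. This is a sketch in portable C, not a proposed GCC feature; `store_mixed32` and `load_mixed32` are hypothetical helper names, and the byte layout is taken from the 0x87654321 example in the message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: store a 32-bit value as two big-endian 16-bit
   words, least significant word first. */
static inline void store_mixed32(unsigned char *p, uint32_t v)
{
    assert(p != NULL);
    p[0] = (unsigned char)(v >> 8);   /* high byte of the low word  */
    p[1] = (unsigned char)v;          /* low byte of the low word   */
    p[2] = (unsigned char)(v >> 24);  /* high byte of the high word */
    p[3] = (unsigned char)(v >> 16);  /* low byte of the high word  */
}

/* Hypothetical helper: inverse of store_mixed32. */
static inline uint32_t load_mixed32(const unsigned char *p)
{
    assert(p != NULL);
    return ((uint32_t)p[2] << 24) | ((uint32_t)p[3] << 16)
         | ((uint32_t)p[0] << 8)  |  (uint32_t)p[1];
}
```

Storing 0x87654321 with `store_mixed32` yields the bytes 0x43, 0x21, 0x87, 0x65, matching the layout given in the message.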
Re: RFC: Improving GCC8 default option settings
Another one that might be interesting is -funsafe-loop-optimizations. In most cases people write loops assuming simple finite loops (no overflow). Crippling optimization for the small number of people (system programmers?) that use such strange loops seems counterproductive. It would be best if such loops can be marked with an attribute in some way and that the general case just assumes that all loops are finite...
Re: RFC: Improving GCC8 default option settings
On Tue, 12 Sep 2017, Wilco Dijkstra wrote: > * Make -fno-trapping-math the default - another obvious one. From the docs: > "Compile code assuming that floating-point operations cannot generate >user-visible traps." > There isn't a lot of code that actually uses user-visible traps (if any - > many CPUs don't even support user traps as it's an optional IEEE feature). > So assuming trapping math by default is way too conservative since there is > no obvious benefit to users. "traps" here means "raising IEEE exception flags" not just "invoking trap handlers". That is, -ftrapping-math disables a range of local transformations that would change the set of flags raised by an operation. (Transformations that change the nonzero number of times a flag is raised to a different nonzero number are always OK; that is, the possibility of a trap handler counting how many times it is invoked is never considered. Transformations that might move flag raising across function calls or asms that might inspect or modify the flags should not be OK, at least with a stricter version of -ftrapping-math that might be another option, but we don't have that stricter version at present; -ftrapping-math generally does not disable code movement, or removal of code that is dead apart from its effect on exception flags.) That is, lack of trap support on processors that only support exception flags is not relevant to -ftrapping-math, beyond any question of whether -ftrapping-math should disable transformations that only affect whether an exact underflow exception occurs (the case where default exception handling does not raise the flag), if we have any such transformations (constant folding on exact underflow?). 
It's true that a stricter version of -ftrapping-math that inhibits code movement and removal would probably inhibit *more* optimizations than -frounding-math (which is off by default), as -frounding-math only makes floating-point operations read thread-local state but -ftrapping-math makes them write it as well. -- Joseph S. Myers jos...@codesourcery.com
Re: RFC: Improving GCC8 default option settings
On Tue, Sep 12, 2017 at 8:29 AM, Theodore Papadopoulo wrote:
> Another one that might be interesting is -funsafe-loop-optimizations.
> In most cases people write loops assuming simple finite loops (no
> overflow). Crippling optimization for the small amount of people (system
> programmers ?) that use such strange loops seems counterproductive. It
> would be best if such loops can be marked with an attribute in some way
> and that the general case just assumes that all loops are finite...

-funsafe-loop-optimizations is a nop in GCC 7 and above.
Since https://gcc.gnu.org/ml/gcc-patches/2016-07/msg00956.html .

Thanks,
Andrew
Re: RFC: Improving GCC8 default option settings
On 09/12/2017 05:32 PM, Andrew Pinski wrote:
> On Tue, Sep 12, 2017 at 8:29 AM, Theodore Papadopoulo wrote:
>> Another one that might be interesting is -funsafe-loop-optimizations.
>> In most cases people write loops assuming simple finite loops (no
>> overflow). Crippling optimization for the small amount of people (system
>> programmers ?) that use such strange loops seems counterproductive. It
>> would be best if such loops can be marked with an attribute in some way
>> and that the general case just assumes that all loops are finite...
>
> -funsafe-loop-optimizations is a nop in GCC 7 and above.
> Since https://gcc.gnu.org/ml/gcc-patches/2016-07/msg00956.html .
>
> Thanks,
> Andrew
>

Thanks for the notice. For some reason, I missed that piece of
information... Too bad that making such an assumption generates bogus
code in some common cases.

Theo.
Re: RFC: Improving GCC8 default option settings
On Tue, 12 Sep 2017, Wilco Dijkstra wrote: > * Make -fno-math-errno the default - this mostly affects the code generated > for > sqrt, which should be treated just like floating point division and not set > errno by default (unless you explicitly select C89 mode). > > * Make -fno-trapping-math the default - another obvious one. From the docs: Note these would both have implications for library math_errhandling settings (since the compiler options can affect built-in functions). In the absence of -ffast-math glibc defines it to (MATH_ERRNO | MATH_ERREXCEPT). __NO_MATH_ERRNO__ exists since GCC 5 to allow it to be defined to just MATH_ERREXCEPT in the -fno-math-errno case, but the header needs updating to respect that. And we don't have a macro for -fno-trapping-math to say whether MATH_ERREXCEPT should be part of the value. My assumption is that with changed defaults glibc would need to compile with -ftrapping-math just as it uses -frounding-math; code may expect transformations that add exceptions not to occur. It probably does not require -fmath-errno (glibc functions do not generally rely on other functions setting errno). -- Joseph S. Myers jos...@codesourcery.com
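The library setting discussed above can be inspected from C through the standard `math_errhandling` macro. A minimal sketch follows; `math_error_mode` is a hypothetical helper name, and the `#ifdef` guard is an assumption made here to cover builds where the macro is not defined at all (the message implies glibc only defines it in the absence of -ffast-math).

```c
#include <assert.h>
#include <math.h>

/* Hypothetical helper: return the math error reporting mode as a mask
   of MATH_ERRNO and MATH_ERREXCEPT, or 0 when the implementation does
   not define math_errhandling for this build. */
static inline int math_error_mode(void)
{
#ifdef math_errhandling
    return math_errhandling;
#else
    return 0;
#endif
}
```

Code that checks `errno` after a math function call would test `math_error_mode() & MATH_ERRNO`; code that checks exception flags would test `math_error_mode() & MATH_ERREXCEPT`.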
Re: Byte swapping support
On Tue, Sep 12, 2017 at 2:32 AM, Jürg Billeter wrote: > Hi, > > To support applications that assume big-endian memory layout on little- > endian systems, I'm considering adding support for reversing the > storage order to GCC. In contrast to the existing scalar storage order > support for structs, the goal is to reverse the storage order for all > memory operations to achieve maximum compatibility with the behavior on > big-endian systems, as far as observable by the application. > > The plan is to insert byte swapping instructions as part of the RTL > expansion of GIMPLE assignments that access memory. This would leverage > code that was added for -fsso-struct, keeping the code simple and > maintainable. It should not be necessary to insert byte swapping > instructions for spilled registers. > > Is this something that GCC upstream would be interested to accept? > > To facilitate byte swapping at endian boundaries (kernel or libraries), > I'm also considering developing a new GCC builtin that can byte-swap > whole structs in memory. There are limitations to this, e.g., unions > could not be supported in general. However, I still expect this to be > very useful. > > Any comments or suggestions? > > Best regards, > Jürg Can you use __attribute__ ((scalar_storage_order)) in GCC 7? -- H.J.
Re: RFC: Improving GCC8 default option settings
On Tue, 12 Sep 2017, Wilco Dijkstra wrote: > * Make -fno-math-errno the default - this mostly affects the code generated > for > sqrt, which should be treated just like floating point division and not set > errno by default (unless you explicitly select C89 mode). (note that this can be selectively enabled by targets where libm never sets errno in the first place, docs call out Darwin as one such target, but musl-libc targets have this property too) > * Make -fno-trapping-math the default - another obvious one. From the docs: > "Compile code assuming that floating-point operations cannot generate >user-visible traps." > There isn't a lot of code that actually uses user-visible traps (if any - > many CPUs don't even support user traps as it's an optional IEEE feature). > So assuming trapping math by default is way too conservative since there is > no obvious benefit to users. OTOH -O options are understood to _never_ sacrifice standards compliance, with the exception of -Ofast. I believe that's an important property to keep. Maybe it's possible to treat -fno-trapping-math similar to -ffp-contract=fast, i.e. implicitly enable it in the default C-with-GNU-extensions mode, keeping strict-compliance mode (-std=c11 as opposed to gnu11) untouched? In any case it shouldn't be hard to issue a warning if fenv.h functions are used when -fno-trapping-math/-fno-rounding-math is enabled. If the above doesn't fly, I believe adopting and promoting a single option for non-value-changing math optimizations (-fno-math-errno -fno-trapping-math, plus -fno-rounding-math -fno-signaling-nans when they're no longer default) would be nice. > * Make -fno-common the default - this was originally needed for pre-ANSI C, > but > is optional in C (not sure whether it is still in C99/C11). This can > significantly improve code generation on targets that use anchors for > globals > (note the linker could report a more helpful message when ancient code that > requires -fcommon fails to link). 
I think in ISO C situations where -fcommon allows link to succeed fall under undefined behavior, which in GNU toolchain is defined to match the historical behavior. I assume the main issue with this is the amount of legacy code that would cause a link failure if -fno-common is made default - thus, is there anybody in position to trigger a full-distro rebuild with gcc patched to enable -fno-common, and compare before/after build failure stats? Thanks. Alexander
Re: Byte swapping support
On Tue, Sep 12, 2017 at 05:26:29PM +0200, David Brown wrote:
> On 12/09/17 16:15, paul.kon...@dell.com wrote:
> >
> >> On Sep 12, 2017, at 5:32 AM, Jürg Billeter wrote:
> >>
> >> Hi,
> >>
> >> To support applications that assume big-endian memory layout on little-
> >> endian systems, I'm considering adding support for reversing the
> >> storage order to GCC. In contrast to the existing scalar storage order
> >> support for structs, the goal is to reverse the storage order for all
> >> memory operations to achieve maximum compatibility with the behavior on
> >> big-endian systems, as far as observable by the application.
> >
> > I've done this in the past by C++ type magic. As a general setting
> > it doesn't make sense that I can see. As an attribute applied to a
> > particular data item, it does. But I'm not sure why you'd put this in
> > the compiler when programmers can do it easily enough by defining a "big
> > endian int32" class, etc.
> >
>
> Some people use the compiler for C rather than C++ ...
>
> If someone wants to improve on the endianness support in gcc, I can
> think of a few ideas that /I/ think might be useful. I have no idea how
> difficult they might be to put in practice, and can't say if they would
> be of interest to others.
>
> First, I would like to see endianness given as a named address space,
> rather than as a type attribute. A key point here is that named address
> spaces are effectively qualifiers, like "const" and "volatile" - and you
> can then use them in pointers:

When I gave the talk at the 2009 GCC summit on the named address support, I
thought that it could be used to add endianness support. In fact at one time, I
had a trial PowerPC compiler that added endianness support. Unfortunately, that
was in 2009, and I lost the directory of the work. I tried again a few years
ago, but I didn't get far enough into it to get a working compiler before being
pulled back into work.

Back when I worked at Cygnus Solutions, we used to get requests to add endian
support every so often, but nobody wanted to pay the cost that we were then
quoting to add the support. Now that named address support is in, it could be
done better.

I suspect however, you want to do this at the higher tree level, adding in the
endianness bits in a separate area than the named address support. Or perhaps,
growing the named address support, and adding several standard named addresses.

The paper where I talked about the named address support was from the 2009 GCC
summit. You can download the proceedings of the 2009 summit from here (my paper
is pages 67-74):
https://en.wikipedia.org/wiki/GCC_Summit

> big_endian uint32_t be_buffer[20];
> little_endian uint32_t le_buffer[20];
>
> void copy_buffers(const big_endian uint32_t * src,
>                   little_endian uint32_t * dest)
> {
>     for (int i = 0; i < 20; i++) {
>         dest[i] = src[i];   // Swaps endianness on copy
>     }
> }
>
> That would also let you use them for scaler types, not just structs, and
> you could use typedefs:
>
> typedef big_endian uint32_t be_uint32_t;
>
>
> Secondly, I would add more endian types. As well as big_endian and
> little_endian, I would add native_endian and reverse_endian. These
> could let you write a little clearer definitions sometimes. And ideally
> I would like mixed endian with big-endian 16-bit ordering and
> little-endian ordering for bigger types (i.e., 0x87654321 would be
> stored 0x43, 0x21, 0x87, 0x65). That order matches some protocols, such
> as Modbus.

It depends, you can add so many different combinations, that in the end you
don't add the support you want because of the 53 other variants.

Note, if you use it in named addresses, you are currently limited to 15 new
keywords for adding named address support. This can be grown, but you should
know about the limit ahead of time. Of course if you add it in a parallel,
machine independent version, then you don't have to worry about the existing
limits.

As I write this, another usage for named addresses occurs to me -- and that is
restricting addressing for memory mapped I/O regions that don't allow certain
types of accesses.

> Third, I'd like to be able to attach the attribute to particular
> variables and to scalers, not just struct and union types.
>
> Forth, I would like type-punning through unions with different ordering
> to be allowed. I'd like to be able to define:
>
> union U {
>     __attribute__((scalar_storage_order("big-endian"))) uint16_t
> protocol[32];
>     __attribute__((scalar_storage_order("little-endian"))) struct {
>         uint32_t id;
>         uint16_t command;
>         uint16_t param1;
>         ...
>     }
> }

This definitely requires support at the higher levels of the compiler.

> and then I could access data in little ordering in the structures, then
> in 16-bit big-endian lumps via the "protocol" array.

One of the things you have to do is be prepared to do a
Successful bootstrap and install of gcc (GCC) 7.2.0 on hppa2.0-unknown-linux-gnu
Hi,

Here's a report of a successful build and install of GCC:

$ gcc-7.2.0/config.guess
hppa2.0-unknown-linux-gnu
$ newcompiler/bin/gcc -v
Using built-in specs.
COLLECT_GCC=newcompiler/bin/gcc
COLLECT_LTO_WRAPPER=/home/aaro/gcctest/newcompiler/libexec/gcc/hppa-unknown-linux-gnu/7.2.0/lto-wrapper
Target: hppa-unknown-linux-gnu
Configured with: ../gcc-7.2.0/configure --disable-nls --prefix=/home/aaro/gcctest/newcompiler --enable-languages=c,c++ --host=hppa-unknown-linux-gnu --build=hppa-unknown-linux-gnu --target=hppa-unknown-linux-gnu --with-system-zlib --with-sysroot=/
Thread model: posix
gcc version 7.2.0 (GCC)

-- Build environment --

host:     hp-c3700
distro:   los.git rootfs=dc818 native=dc818
kernel:   Linux 4.13.0-los_dc818
binutils: GNU binutils 2.29
make:     GNU Make 4.2.1
libc:     GNU C Library (GNU libc) stable release version 2.26
zlib:     1.2.11
mpfr:     3.1.3
gmp:      60102

-- Time consumed --

configure:  real 0m 25.55s    user 0m 12.21s    sys 0m 11.55s
bootstrap:  real 22h 30m 18s  user 21h 32m 25s  sys 54m 12.95s
install:    real 1m 19.96s    user 0m 22.57s    sys 0m 54.13s

-- Hardware details ---

MemTotal: 3108288 kB

processor    : 0
cpu family   : PA-RISC 2.0
cpu          : PA8700 (PCX-W2)
cpu MHz      : 750.00
capabilities : os32 os64 iopdir_fdc nva_supported (0x07)
model        : 9000/785/C3700
model name   : Allegro W2
hversion     : 0x5dc0
sversion     : 0x0481
I-cache      : 768 KB
D-cache      : 1536 KB (WB, direct mapped)
ITLB entries : 240
DTLB entries : 240 - shared with ITLB
bogomips     : 1495.85
software id  : 2004755634

A.
Re: Power 8 in-core crypto not working as expected
On Thu, Sep 07, 2017 at 10:35:18AM -0400, Jeffrey Walton wrote: > We are using the key and subkey schedule from FIPS 197, Appendix A. We > are using it because the key schedule is fully specified. > > We lack the known answers for a single round using a subkey like one > specified in FIPS 197. IBM does not appear to provide them. 197 appendices B and C are full of such examples though. First see if you can get a single round (one vcipher call) working; you have to get byte ordering correct, etc. > I don't have access to Power ISA 3.0B. It seems to be hidden behind a > paywall. https://openpowerfoundation.org/?resource_lib=power-isa-version-3-0. It's not a paywall, you just have to register. Yes, not ideal. Segher
Re: Byte swapping support
> To support applications that assume big-endian memory layout on little-
> endian systems, I'm considering adding support for reversing the
> storage order to GCC.

That was also the goal of the scalar_storage_order attribute.

> In contrast to the existing scalar storage order support for structs, the
> goal is to reverse the storage order for all memory operations to achieve
> maximum compatibility with the behavior on big-endian systems, as far as
> observable by the application.

I presume that you're well aware of this, but you cannot just reverse the
storage order for any memory operation; for example, an array of 4 chars in C
is stored the same way in big-endian and little-endian order, so you ought not
to do byte swapping when you access it as a whole. So the above sentence must
be read as "to reverse the storage order for all scalar memory operations".

When the scalar_storage_order attribute was designed, discussions led to the
conclusion that doing the swapping for any scalar memory operation, as opposed
to any access to a scalar within a structure, would not be a significant
enough step forward to warrant the significantly more complex implementation
(or the big performance penalty if you do things very roughly).

> The plan is to insert byte swapping instructions as part of the RTL
> expansion of GIMPLE assignments that access memory. This would leverage
> code that was added for -fsso-struct, keeping the code simple and
> maintainable.

How do you discriminate scalars stored in native order and scalars stored in
reverse order though? That's the main difficulty of the implementation.

--
Eric Botcazou
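The distinction Eric draws between scalars and character arrays can be made concrete with a small sketch. It assumes a made-up record layout (`struct record`) and hypothetical helpers (`swap32`, `record_swap_order`); it is hand-written conversion code for illustration, not how GCC implements scalar_storage_order.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reverse the byte order of one 32-bit scalar. */
static inline uint32_t swap32(uint32_t v)
{
    return ((v & 0x000000ffu) << 24) | ((v & 0x0000ff00u) << 8)
         | ((v >> 8) & 0x0000ff00u)  |  (v >> 24);
}

/* Hypothetical record mixing a character array and a scalar. */
struct record {
    char     tag[4];   /* same memory image in either storage order */
    uint32_t length;   /* memory image depends on the storage order */
};

/* Convert a record between storage orders: only the scalar member is
   swapped, the character array is left untouched. */
static inline struct record record_swap_order(struct record r)
{
    r.length = swap32(r.length);
    return r;
}
```

Swapping the four bytes of `tag` as if it were one 32-bit access would reverse the characters, which is the mistake the message warns against.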
Re: RFC: Improving GCC8 default option settings
On Tue, 12 Sep 2017, Alexander Monakov wrote: > > * Make -fno-trapping-math the default - another obvious one. From the docs: > > "Compile code assuming that floating-point operations cannot generate > >user-visible traps." > > There isn't a lot of code that actually uses user-visible traps (if any - > > many CPUs don't even support user traps as it's an optional IEEE > > feature). > > So assuming trapping math by default is way too conservative since there > > is > > no obvious benefit to users. > > OTOH -O options are understood to _never_ sacrifice standards compliance, with > the exception of -Ofast. I believe that's an important property to keep. ISO C allows the FENV_ACCESS pragma to be OFF by default (we don't support the standard pragmas, but FENV_ACCESS ON is equivalent to a stricter version of -frounding-math -ftrapping-math). Thus, this is not a standards compliance issue (unlike various parts of -ffast-math that break IEEE 754 semantics even with FENV_ACCESS OFF). And since -fno-rounding-math is the default, the default is already a form of FENV_ACCESS OFF. I don't think any -O implication of -fno-trapping-math was proposed; the proposal was about the default (independent of -O options). -- Joseph S. Myers jos...@codesourcery.com
Re: RFC: Improving GCC8 default option settings
> On 13 Sep 2017, at 1:57 AM, Wilco Dijkstra wrote:
>
> Hi all,
>
> At the GNU Cauldron I was inspired by several interesting talks about
> improving
> GCC in various ways. While GCC has many great optimizations, a common theme is
> that its default settings are rather conservative. As a result users are
> required to enable several additional optimizations by hand to get good code.
> Other compilers enable more optimizations at -O2 (loop unrolling in LLVM was
> mentioned repeatedly) which GCC could/should do as well.

There are some nuances to -O2. Please consider -O2 users who wish to use it like
Clang/LLVM’s -Os (-O2 without loop vectorisation IIRC).

Clang/LLVM has an -Os that is like -O2 so adding optimisations that increase
code size can be skipped from -Os without drastically affecting performance.

This is not the case with GCC where -Os is a size at all costs optimisation
mode. GCC users' option for size not at the expense of speed is to use -O2.

    Clang      GCC
    -Oz   ~=   -Os
    -Os   ~=   -O2

So if adding optimisations to -O2 that increase code size, please consider
adding an -O2s that maintains the compact code size of -O2. -O2 generates pretty
compact code as many performance optimisations tend to reduce code size, or
otherwise add optimisations that increase code size to -O3. Turning loop
unrolling on makes sense in the Clang/LLVM context where they have a compact
code model with good performance i.e. -Os. In GCC this is -O2.

So if you want to enable more optimisations at -O2, please copy -O2
optimisations to -O2s or rename -Os to -Oz and copy -O2 optimisation defaults to
a new -Os.

The present reality is that any project that wishes to optimize for size at all
costs will need to run a configure test for -Oz, and then fall back to -Os,
given the current disparity between Clang/LLVM and GCC flags here.
> Here are a few concrete proposals to improve GCC's option settings which will > enable better code generation for most targets: > > * Make -fno-math-errno the default - this mostly affects the code generated > for > sqrt, which should be treated just like floating point division and not set > errno by default (unless you explicitly select C89 mode). > > * Make -fno-trapping-math the default - another obvious one. From the docs: > "Compile code assuming that floating-point operations cannot generate > user-visible traps." > There isn't a lot of code that actually uses user-visible traps (if any - > many CPUs don't even support user traps as it's an optional IEEE feature). > So assuming trapping math by default is way too conservative since there is > no obvious benefit to users. > > * Make -fno-common the default - this was originally needed for pre-ANSI C, > but > is optional in C (not sure whether it is still in C99/C11). This can > significantly improve code generation on targets that use anchors for globals > (note the linker could report a more helpful message when ancient code that > requires -fcommon fails to link). > > * Make -fomit-frame-pointer the default - various targets already do this at > higher optimization levels, but this could easily be done for all targets. > Frame pointers haven't been needed for debugging for decades, however if > there > are still good reasons to keep it enabled with -O0 or -O1 (I can't think of > any > unless it is for last-resort backtrace when there is no unwind info at a > crash), > we could just disable the frame pointer from -O2 onwards. > > These are just a few ideas to start. What do people think? I'd welcome > discussion > and other proposals for similar improvements. > > Wilco
gcc-5-20170912 is now available
Snapshot gcc-5-20170912 is now available on
  ftp://gcc.gnu.org/pub/gcc/snapshots/5-20170912/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 5 SVN branch
with the following options: svn://gcc.gnu.org/svn/gcc/branches/gcc-5-branch revision 252045

You'll find:

 gcc-5-20170912.tar.xz    Complete GCC

  SHA256=4e0afc2d86fa9bb3b25b64fd9a47ae045a51382fe8baffabcf03df6fddf6ab28
  SHA1=e33aa0044e10d45a84a6b88e3b0922df64245858

Diffs from 5-20170905 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-5
link is updated and a message is sent to the gcc list. Please do not use
a snapshot before it has been announced that way.
Re: Invalid free in standard library in trivial example with C++17 on gcc 7.2
I confirmed this issue on x86_64 CentOS, and independently here: https://wandbox.org/permlink/ncWqA9Zu3YEofqri Also fails on gcc trunk. Possibly related to bug 81338 "stringstream remains empty after being moved into multiple times"? Although I see that one is fixed by Mr Wakely. Dave On Tue, Sep 12, 2017 at 5:59 PM, Shane Matley wrote: > Hi, > > Apologies if I am coming about this in the wrong way, I am new to the > mailing list. During our preliminary work to upgrade to gcc 7.2 (from > 6.3) at my workplace, we have come across a bug that is blocking our > move to C++17. I have raised a bug report here: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82172. > > There is an invalid free in a string within basic_stringbuf when > inserting a character into an empty stringbuf when using the pre CXX11 > ABI, LTO, O1 and C++17. > > Could anyone offer some advice on diagnosing this further, or working > around this issue that doesn't involve moving to the CXX11 ABI? > > Thanks in advance, > > -- Shane
Re: Help out/New to the Project
On Tue, Sep 12, 2017 at 08:10:13AM -0400, Nathan Sidwell wrote:
> On 09/09/2017 08:00 AM, Ramana Radhakrishnan wrote:
>
> >There are a few getting started guides in the wiki
> >(http://gcc.gnu.org/wiki - Look for tutorials) which can help you get
> >started in terms of reading up on the internals, there are a few Easy
> >hacks listed in the wiki page here https://gcc.gnu.org/wiki/EasyHacks
>
> I've just added a link to EasyHacks from GettingStarted (it wasn't
> obvious to me how to find it).
>
> Is there a different page for 'more involved changes'? We could link to
> that from EasyHacks.

There is always https://gcc.gnu.org/wiki/CC0Transition ;-)

Also there is https://gcc.gnu.org/wiki/ImprovementProjects , which links
to many more pages.

Segher
Re: RFC: Improving GCC8 default option settings
On Wed, Sep 13, 2017 at 09:27:22AM +1200, Michael Clark wrote:
> > Other compilers enable more optimizations at -O2 (loop unrolling in LLVM was
> > mentioned repeatedly) which GCC could/should do as well.
>
> There are some nuances to -O2. Please consider -O2 users who wish to use it like
> Clang/LLVM’s -Os (-O2 without loop vectorisation IIRC).
>
> Clang/LLVM has an -Os that is like -O2 so adding optimisations that increase
> code size can be skipped from -Os without drastically affecting performance.
>
> This is not the case with GCC where -Os is a size at all costs optimisation
> mode. The GCC user’s option for size not at the expense of speed is to use -O2.

"Size not at the expense of speed" exists in neither compiler. Just the
tradeoffs are different between GCC and LLVM. It would be a silly
optimisation target -- it's exactly the same as just "speed"! Unless
"speed" means "let's make it faster, and bigger just because" ;-)

GCC's -Os is not "size at all costs" either; there are many options (mostly
--params) that can decrease code size significantly. To tune code size
down for your particular program you have to play with options a bit. This
shouldn't be news to anyone.

'-Os'
     Optimize for size. '-Os' enables all '-O2' optimizations that do
     not typically increase code size. It also performs further
     optimizations designed to reduce code size.

> So if adding optimisations to -O2 that increase code size, please consider
> adding an -O2s that maintains the compact code size of -O2. -O2 generates
> pretty compact code as many performance optimisations tend to reduce code
> size, or otherwise add optimisations that increase code size to -O3. Adding
> loop unrolling makes sense in the Clang/LLVM context where they have a
> compact code model with good performance i.e. -Os. In GCC this is -O2.
>
> So if you want to enable more optimisations at -O2, please copy -O2
> optimisations to -O2s or rename -Os to -Oz and copy -O2 optimisation defaults
> to a new -Os.
'-O2' Optimize even more. GCC performs nearly all supported optimizations that do not involve a space-speed tradeoff. As compared to '-O', this option increases both compilation time and the performance of the generated code. > The present reality is that any project that wishes to optimize for size at > all costs will need to run a configure test for -Oz, and then fall back to > -Os, given the current disparity between Clang/LLVM and GCC flags here. The present reality is that any project that wishes to support both GCC and LLVM needs to do configure tests, because LLVM chose to do many things differently (sometimes unavoidably). If GCC would change some options to be more like LLVM, all users only ever using GCC would be affected, while all other incompatibilities would remain. Not a good tradeoff at all. Segher
Re: RFC: Improving GCC8 default option settings
> On 13 Sep 2017, at 12:47 PM, Segher Boessenkool wrote:
>
> On Wed, Sep 13, 2017 at 09:27:22AM +1200, Michael Clark wrote:
>>> Other compilers enable more optimizations at -O2 (loop unrolling in LLVM was
>>> mentioned repeatedly) which GCC could/should do as well.
>>
>> There are some nuances to -O2. Please consider -O2 users who wish to use it
>> like Clang/LLVM’s -Os (-O2 without loop vectorisation IIRC).
>>
>> Clang/LLVM has an -Os that is like -O2 so adding optimisations that increase
>> code size can be skipped from -Os without drastically affecting performance.
>>
>> This is not the case with GCC where -Os is a size at all costs optimisation
>> mode. The GCC user’s option for size not at the expense of speed is to use -O2.
>
> "Size not at the expense of speed" exists in neither compiler. Just the
> tradeoffs are different between GCC and LLVM. It would be a silly
> optimisation target -- it's exactly the same as just "speed"! Unless
> "speed" means "let's make it faster, and bigger just because" ;-)

I would like to be able to quantify stats on a well known benchmark suite, say SPECint 2006 or SPECint 2017, but in my own small benchmark suite I saw only a small difference in size between -O2 and -Os, yet a significant drop in performance with -Os compared to -O2.

- https://rv8.io/bench#optimisation
- https://rv8.io/bench#executable-file-sizes

-O2 is 98% perf of -O3 on x86-64
-Os is 81% perf of -O3 on x86-64

-O2 saves 5% space on -O3 on x86-64
-Os saves 8% space on -Os on x86-64

A 17% drop in performance for a 3% saving in space is not a good trade for a “general” size optimisation. It’s more like executable compression.

-O2 seems to be a sweet spot for size versus speed.

I could only recommend GCC’s -Os if the user is trying to squeeze something down to fit the last few bytes of a ROM, and -Oz seems like a more appropriate name for that.

-O2 is the current sweet spot in GCC and is likely closest in semantics to LLVM/Clang -Os, and I’d like -O2 binaries to stay lean.
I don’t think -O2 should slow down, nor should the binaries get larger.

Turning up knobs that affect code size should be reserved for -O3 until GCC makes a distinction between -O2/-O2s and -Os/-Oz.

On RISC-V I believe we could shrink binaries at -O2 further with no sacrifice in performance, perhaps with a performance improvement by reducing icache bandwidth…

BTW -O2 gets great compression and performance improvements compared to -O0 ;-D it’s the points after -O2 where the trade-offs don’t correlate.

I like -O2.

My 2c.

> GCC's -Os is not "size at all costs" either; there are many options (mostly
> --params) that can decrease code size significantly. To tune code size
> down for your particular program you have to play with options a bit. This
> shouldn't be news to anyone.
>
> '-Os'
>      Optimize for size. '-Os' enables all '-O2' optimizations that do
>      not typically increase code size. It also performs further
>      optimizations designed to reduce code size.
>
>> So if adding optimisations to -O2 that increase code size, please
>> consider adding an -O2s that maintains the compact code size of -O2. -O2
>> generates pretty compact code as many performance optimisations tend to
>> reduce code size, or otherwise add optimisations that increase code size to
>> -O3. Adding loop unrolling makes sense in the Clang/LLVM context where
>> they have a compact code model with good performance i.e. -Os. In GCC this
>> is -O2.
>>
>> So if you want to enable more optimisations at -O2, please copy -O2
>> optimisations to -O2s or rename -Os to -Oz and copy -O2 optimisation
>> defaults to a new -Os.
>
> '-O2'
>      Optimize even more. GCC performs nearly all supported
>      optimizations that do not involve a space-speed tradeoff. As
>      compared to '-O', this option increases both compilation time and
>      the performance of the generated code.
> >> The present reality is that any project that wishes to optimize for size at >> all costs will need to run a configure test for -Oz, and then fall back to >> -Os, given the current disparity between Clang/LLVM and GCC flags here. > > The present reality is that any project that wishes to support both GCC and > LLVM needs to do configure tests, because LLVM chose to do many things > differently (sometimes unavoidably). If GCC would change some options > to be more like LLVM, all users only ever using GCC would be affected, > while all other incompatibilities would remain. Not a good tradeoff at > all. > > > Segher
Re: RFC: Improving GCC8 default option settings
> On 13 Sep 2017, at 1:15 PM, Michael Clark wrote:
>
> - https://rv8.io/bench#optimisation
> - https://rv8.io/bench#executable-file-sizes
>
> -O2 is 98% perf of -O3 on x86-64
> -Os is 81% perf of -O3 on x86-64
>
> -O2 saves 5% space on -O3 on x86-64
> -Os saves 8% space on -Os on x86-64
>
> 17% drop in performance for 3% saving in space is not a good trade for a
> “general” size optimisation. It’s more like executable compression.

Sorry, fixed typo:

-O2 is 98% perf of -O3 on x86-64
-Os is 81% perf of -O3 on x86-64

-O2 saves 5% space on -O3 on x86-64
-Os saves 8% space on -O3 on x86-64

The extra ~3% space saving for a ~17% drop in performance doesn’t seem like a good general option for size, given the cost in performance.

Again, I really like GCC’s -O2 and hope that its binaries neither grow in size nor slow down.
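As a worked check of the trade-off quoted above, the following sketch recomputes the figures relative to -O2 rather than -O3. It assumes the corrected numbers in this message are taken at face value; the struct and function names (`opt_level`, `relative_perf_drop`, `relative_size_saving`) are hypothetical and only serve the illustration.

```c
/* Sketch only: recompute the -O2 versus -Os trade-off from the
 * percentages quoted in the message (x86-64, relative to -O3). */

struct opt_level {
    double perf_pct;        /* performance as a percentage of -O3 */
    double size_saving_pct; /* space saved as a percentage of -O3 size */
};

/* Figures as quoted in the corrected message. */
static const struct opt_level GCC_O2 = { 98.0, 5.0 };
static const struct opt_level GCC_OS = { 81.0, 8.0 };

/* Performance lost when moving from level a to level b,
 * as a percentage of a's performance. */
double relative_perf_drop(struct opt_level a, struct opt_level b)
{
    return 100.0 * (a.perf_pct - b.perf_pct) / a.perf_pct;
}

/* Size saved when moving from level a to level b,
 * as a percentage of a's binary size. */
double relative_size_saving(struct opt_level a, struct opt_level b)
{
    double size_a = 100.0 - a.size_saving_pct;
    double size_b = 100.0 - b.size_saving_pct;
    return 100.0 * (size_a - size_b) / size_a;
}
```

With the quoted figures, `relative_perf_drop(GCC_O2, GCC_OS)` comes to roughly 17.3 and `relative_size_saving(GCC_O2, GCC_OS)` to roughly 3.2, which agrees with the "~17%" and "~3%" in the message whether the difference is read in percentage points of -O3 or relative to -O2.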
Re: RFC: Improving GCC8 default option settings
> * Make -fomit-frame-pointer the default - various targets already do this at
>   higher optimization levels, but this could easily be done for all targets.
>   Frame pointers haven't been needed for debugging for decades, however if there
>   are still good reasons to keep it enabled with -O0 or -O1 (I can't think of any
>   unless it is for last-resort backtrace when there is no unwind info at a crash),
>   we could just disable the frame pointer from -O2 onwards.

Given there's an -Og now, maybe frame pointers could be enabled for -O0 and -Og, and off by default otherwise.

I like to use -O1 to kick in the analysis engine and start catching warnings. It seems like -O1 should be closer to -O2/-O3 with respect to frame pointers, since it could help find issues and tickle problems with hand-crafted ASM.

Jeff
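For readers who have not looked at frame addresses directly, here is a small sketch using the GCC/Clang builtin `__builtin_frame_address`. It only inspects level 0, the current function's own frame; the function names are hypothetical. It is meant as a way to poke at the behaviour under different -O levels and -f[no-]omit-frame-pointer settings, not as a description of how GCC's unwinder or any debugger works.

```c
#include <stdint.h>

/* Sketch only: observe the frame address of two nested calls.
 * noinline keeps each function in its own stack frame. */

__attribute__((noinline))
uintptr_t inner_frame_address(void)
{
    /* Level 0 asks for the frame of the function being executed. */
    return (uintptr_t)__builtin_frame_address(0);
}

__attribute__((noinline))
int frames_are_distinct(void)
{
    uintptr_t outer = (uintptr_t)__builtin_frame_address(0);
    uintptr_t inner = inner_frame_address();

    /* Caller and callee are both live here, so their frames differ. */
    return outer != 0 && inner != 0 && outer != inner;
}
```

A hand-written backtrace that follows saved frame pointers from one frame to its caller only works if every function in the chain keeps a frame pointer, which is the "last-resort backtrace" case mentioned in the quoted proposal. The GCC manual warns that calling the builtin with a non-zero level can have unpredictable effects, so this sketch deliberately avoids that.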