[Bug c++/87436] [7 Regression] G++ produces >300MB .rodata section to initialize struct with big array
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87436

--- Comment #13 from Daniel Gibson ---
Great, thanks a lot for fixing this!
[Bug c++/87436] New: G++ produces >300MB .rodata section to initialize struct with big array
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87436

            Bug ID: 87436
           Summary: G++ produces >300MB .rodata section to initialize struct with big array
           Product: gcc
           Version: 5.4.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: metalcaedes at gmail dot com
  Target Milestone: ---

Created attachment 44749
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=44749&action=edit
test code

I'm on Linux x86_64, more specifically XUbuntu 16.04, and use GCC version "Ubuntu 5.4.0-6ubuntu1~16.04.10", so I selected version 5.4.0. I think this bug still exists in the latest GCC release, but I was only able to test that on gcc.godbolt.org (where it produces either a timeout or an out-of-memory error). It also seems like the bug was introduced between 4.9.x and 5.1.

Anyway: I have a simple struct whose few fields all have default values, and another struct containing a huge (16 million element) array of the aforementioned type (see attachment for test code). I built it with "g++ -std=c++11 -c -o testsize.o testsize.cpp" - it takes over a minute to build, cc1plus uses over 4GB of main memory in the process, and it produces a 385MB .o file.

Inspecting that .o shows that the .rodata section is really big, and its contents are used by the constructor of the struct containing the array - it just memcpy()s that massive blob of data into the new object. Now in general this certainly is a clever optimization, but I don't think it's a good idea when the object is this big. Furthermore, it even happens with -Os builds, and I'd guess that even for saner object sizes this kind of optimization is not what you'd want in size-optimized builds?

This might be related to https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82294 - however, I opened a new bug because that report was about C++17 with constexpr, while my case is C++11.

Godbolt link: https://gcc.godbolt.org/z/Cseqhi
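The actual test code lives in attachment 44749 and is not inlined in the report. As a hypothetical reconstruction of its shape (the type names, fields, and default values here are assumptions, only the ~16 million element count is from the report), the problematic pattern looks roughly like this:

```cpp
#include <cassert>

// An element type whose fields all have default member initializers.
// Constructing an object containing a large array of these makes the
// affected GCC versions emit the entire initialized image into .rodata
// and memcpy() it in the constructor.
struct Elem {
    int   a = 0;
    float b = 1.0f;
    short c = -1;
};

// The report uses a ~16 million element array; parameterized here so
// smaller sizes can be experimented with as well.
template <int N>
struct BigT {
    Elem elems[N];
};

using Big = BigT<16 * 1024 * 1024>; // the size from the report (~192MB)
```

Compiling a translation unit that instantiates `Big` with the reported g++ 5.x command line is what triggers the oversized .rodata section; smaller instantiations behave the same way in kind, just not in degree.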
[Bug c++/87436] G++ produces >300MB .rodata section to initialize struct with big array
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87436

Daniel Gibson changed:

           What    |Removed    |Added
           CC|                 |metalcaedes at gmail dot com

--- Comment #1 from Daniel Gibson ---
Created attachment 44750
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=44750&action=edit
the .ii file
[Bug c++/100839] New: -O2 does dangerous optimizations using FMA (please don't break my cross product)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100839

            Bug ID: 100839
           Summary: -O2 does dangerous optimizations using FMA (please don't break my cross product)
           Product: gcc
           Version: 11.1.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: metalcaedes at gmail dot com
  Target Milestone: ---

Created attachment 50893
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50893&action=edit
simple test case (same as on compiler explorer)

I'm having problems with a cross-product function returning (slightly) wrong results when built with -O2 (or above) and with FMA enabled (like -march=znver1) - more specifically, -fexpensive-optimizations causes the problem. -ffast-math or related flags were *not* used. Values that should be exactly 0 aren't, because the intermediate of the multiplication part of vfmsub132ss has "infinite precision", while the subtracted value (also the result of a multiplication) has single precision.

I have (or rather Doom3 has) a 3D vector class called idVec3 which has three single-precision float members x, y, z. The cross product is calculated like this:

  idVec3 cross( const idVec3& v1, const idVec3 &v2 ) {
      float x = v1.y * v2.z - v1.z * v2.y;
      float y = v1.z * v2.x - v1.x * v2.z;
      float z = v1.x * v2.y - v1.y * v2.x;
      return idVec3(x, y, z);
  }

For brevity (and because it was what caused the bug I investigated[1]) I only looked at the calculation of z, but the same problem should happen with x and y:

  float crossZ(const idVec3& v1, const idVec3& v2) {
      float z = v1.x * v2.y - v1.y * v2.x;
      return z;
  }

So if v1.x * v2.y == v1.y * v2.x (like when v1 == v2), z should be exactly 0.0. However, it's not when FMA and -fexpensive-optimizations are used, because then that function is generated as:

  vmulss      xmm1,xmm1,xmm2
  vfmsub132ss xmm0,xmm1,xmm3
  ret

So the `v1.y * v2.x` part is calculated with a normal multiplication and stored as a single-precision float. Then `v1.x * v2.y - other_result` is calculated with vfmsub132ss, which means that the result of `v1.x * v2.y` is never stored, but only exists as an "infinite precision intermediate result"[2] from which the other (single-precision) result of the `vmulss` is subtracted.

This means that, if both multiplication results should *theoretically* be identical, the function returns the rounding error between the result at "infinite precision" (double at least?) and at single precision. This rounding error fits well into a float, because floats have great precision near zero - and it can be relatively big: with v1.x = -277.129883 and v1.y = -69.282471 (result of multiplications: about 23665.775), crossZ(v1, v1) returns 0.0002170140.

With -O1 (or -O2 -fno-expensive-optimizations) the generated ASM is more straightforward and, as expected, does two multiplications and then a subtraction:

  vmovss xmm0,DWORD PTR [rdi]
  vmulss xmm0,xmm0,DWORD PTR [rsi+0x4]
  vmovss xmm1,DWORD PTR [rdi+0x4]
  vmulss xmm1,xmm1,DWORD PTR [rsi]
  vsubss xmm0,xmm0,xmm1
  ret

IMHO an optimization that basically causes a*b - b*a to not return exactly 0.0 should only be enabled with dangerous flags like -ffast-math, not with plain -O2. Incidentally, this seems to be what clang does: it only uses vfmsub* in crossZ() if -ffast-math is set.

Here's a compiler explorer link with a reduced (to .z) testcase, printing the results of that function when compiled with -O2 vs -O1: https://gcc.godbolt.org/z/8K3vKh7b3

The problem happens with all GCC versions I tested, including the 11.1 and "trunk" versions on compiler explorer. I didn't test this, but I wouldn't be surprised if plain C was also affected (and not just C++).

[1] https://github.com/RobertBeckebans/RBDOOM-3-BFG/issues/436
[2] https://www.felixcloutier.com/x86/vfmsub132ss:vfmsub213ss:vfmsub231ss
[Bug middle-end/100839] -O2 does dangerous optimizations using FMA (please don't break my cross product)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100839

Daniel Gibson changed:

           What       |Removed    |Added
           Resolution |INVALID    |FIXED

--- Comment #2 from Daniel Gibson ---
Do I understand this correctly that optimized builds by default violate the standard? At least that's what the difference between -ffp-contract=fast and the unimplemented -ffp-contract=on suggests.
[Bug middle-end/100839] -O2 does dangerous optimizations using FMA (please don't break my cross product)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100839

Daniel Gibson changed:

           What       |Removed    |Added
           Resolution |INVALID    |FIXED

--- Comment #4 from Daniel Gibson ---
Nope.
1. -std=c++11 doesn't make a difference; I just tested it on compiler explorer.
2. What's the point of -Ofast ("It also enables optimizations that are not valid for all standard-compliant programs.") if the normal optimization levels already violate the standard?
[Bug middle-end/100839] -O2 does dangerous optimizations using FMA (please don't break my cross product)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100839

--- Comment #6 from Daniel Gibson ---
Are you sure about this? I couldn't find much about floats (and nothing about legal transformations) in the C++ standard, and if it's in IEEE-754 it should be the same for C and C++, right?