[Bug target/82261] x86: missing peephole for SHLD / SHRD
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82261 --- Comment #4 from Peter Cordes --- GCC will emit SHLD / SHRD as part of shifting an integer that's two registers wide. Hironori Bono proposed the following functions as a workaround for this missed optimization (https://stackoverflow.com/a/71805063/224132)

#include <stdint.h>

#ifdef __SIZEOF_INT128__
uint64_t shldq_x64(uint64_t low, uint64_t high, uint64_t count)
{
    return (uint64_t)(((((unsigned __int128)high << 64) | (unsigned __int128)low) << (count & 63)) >> 64);
}

uint64_t shrdq_x64(uint64_t low, uint64_t high, uint64_t count)
{
    return (uint64_t)((((unsigned __int128)high << 64) | (unsigned __int128)low) >> (count & 63));
}
#endif

uint32_t shld_x86(uint32_t low, uint32_t high, uint32_t count)
{
    return (uint32_t)(((((uint64_t)high << 32) | (uint64_t)low) << (count & 31)) >> 32);
}

uint32_t shrd_x86(uint32_t low, uint32_t high, uint32_t count)
{
    return (uint32_t)((((uint64_t)high << 32) | (uint64_t)low) >> (count & 31));
}

---

The uint64_t functions (using __int128) compile cleanly in 64-bit mode (https://godbolt.org/z/1j94Gcb4o) using 64-bit operand-size shld/shrd, but the uint32_t functions compile to a total mess in 32-bit mode (GCC 11.2 -O3 -m32 -mregparm=3) before eventually using shld, including a totally insane "or dh, 0". GCC trunk with -O3 -mregparm=3 compiles them cleanly, but without regparm it's also a slightly different mess. Ironically, the uint32_t functions compile to quite a few instructions in 64-bit mode, actually doing the operations as written with shifts and ORs, and having to manually mask the shift count with & 31 because they use a 64-bit operand-size shift, which masks the count with & 63. 32-bit operand-size SHLD would be a win here, at least for -mtune=intel or a specific Intel uarch. I haven't looked at whether they still compile ok after inlining into surrounding code, or whether the operations would tend to combine with other things in preference to becoming an SHLD.
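For reference, the case the first sentence describes, where GCC already picks SHLD on its own, is an ordinary shift of a double-width integer. A minimal sketch (the function name shl128 is made up for illustration and is not from the report):

#include <stdint.h>

#ifdef __SIZEOF_INT128__
/* Shifting a two-register-wide integer: on x86-64 GCC emits SHLD as part
   of the expansion of this shift (plus extra code to handle counts >= 64,
   which the hardware instruction does not cover by itself).  */
unsigned __int128 shl128(unsigned __int128 x, unsigned count)
{
    return x << (count & 127);
}
#endif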
[Bug libquadmath/105101] incorrect rounding for sqrtq
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105101 --- Comment #5 from Thomas Koenig --- There is another, much worse, problem, reported and analyzed by "Michael S" on comp.arch. The code has

#ifdef HAVE_SQRTL
  {
    long double xl = (long double) x;
    if (xl <= LDBL_MAX && xl >= LDBL_MIN)
      {
        /* Use long double result as starting point. */
        y = (__float128) sqrtl (xl);

        /* One Newton iteration. */
        y -= 0.5q * (y - x / y);
        return y;
      }
  }
#endif

which assumes that long double has a higher precision than normal double. On x86_64, this depends on the settings of the FPU flags, so a number like 0x1.06bc82f7b9d71dfcbddf2358a0eap-1024 comes out with 32 ULP of error, because there is only a single round of Newton iteration if the FPU flags are set to normal (double) precision. I believe we can at least fix that before the GCC 12 release, by simply removing the code I quoted.
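For context on why a single step falls short: each Newton iteration for sqrt roughly doubles the number of correct bits. A 64-bit-mantissa sqrtl starting point therefore reaches about 128 correct bits after one step, enough for __float128's 113-bit mantissa, while a 53-bit (double precision) starting point reaches only about 106 bits, consistent with the tens-of-ULP error reported. A minimal sketch of the arithmetic, not a proposed patch (the function name sqrtq_refine is made up for illustration):

/* Sketch only, not the actual libquadmath code.  If sqrtl is no more
   accurate than double (53-bit mantissa), one Newton step reaches
   roughly 2*53 = 106 correct bits, short of __float128's 113-bit
   mantissa; a second step would be needed to get below a few ULP.  */
__float128 sqrtq_refine (__float128 x, long double sqrtl_result)
{
  __float128 y = (__float128) sqrtl_result;  /* starting guess from sqrtl */
  y -= 0.5q * (y - x / y);   /* step 1: ~106 bits from a 53-bit start */
  y -= 0.5q * (y - x / y);   /* step 2: enough for all 113 bits */
  return y;
}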
[Bug libquadmath/105101] incorrect rounding for sqrtq
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105101 --- Comment #6 from Jakub Jelinek --- (In reply to Thomas Koenig from comment #5) > There is another, much worse, problem, reported and analyzed by "Michael S" > on comp.arch. The code has > > #ifdef HAVE_SQRTL > { > long double xl = (long double) x; > if (xl <= LDBL_MAX && xl >= LDBL_MIN) > { > /* Use long double result as starting point. */ > y = (__float128) sqrtl (xl); > > /* One Newton iteration. */ > y -= 0.5q * (y - x / y); > return y; > } > } > #endif > > which assumes that long double has a higher precision than > normal double. On x86_64, this depends o the settings of the > FPU flags, so a number like 0x1.06bc82f7b9d71dfcbddf2358a0eap-1024 > is corrected with 32 ULP of error because there is only a single > round of Newton iterations if the FPU flags are set to normal precision. That is only a problem on OSes that do that, I think mainly BSDs, no? On Linux it should be fine (well, still not 0.5ulp precise, but not as bad as when sqrtl is just double precision precise).
[Bug libquadmath/105101] incorrect rounding for sqrtq
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105101 --- Comment #7 from Thomas Koenig --- (In reply to Jakub Jelinek from comment #6) > (In reply to Thomas Koenig from comment #5) > > There is another, much worse, problem, reported and analyzed by "Michael S" > > on comp.arch. The code has > > > > #ifdef HAVE_SQRTL > > { > > long double xl = (long double) x; > > if (xl <= LDBL_MAX && xl >= LDBL_MIN) > > { > > /* Use long double result as starting point. */ > > y = (__float128) sqrtl (xl); > > > > /* One Newton iteration. */ > > y -= 0.5q * (y - x / y); > > return y; > > } > > } > > #endif > > > > which assumes that long double has a higher precision than > > normal double. On x86_64, this depends o the settings of the > > FPU flags, so a number like 0x1.06bc82f7b9d71dfcbddf2358a0eap-1024 > > is corrected with 32 ULP of error because there is only a single > > round of Newton iterations if the FPU flags are set to normal precision. > > That is only a problem on OSes that do that, I think mainly BSDs, no? Correct. > On Linux it should be fine (well, still not 0.5ulp precise, but not as bad > as when sqrtl is just double precision precise). In this case, it was discovered on some version of WSL.
[Bug libquadmath/105101] incorrect rounding for sqrtq
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105101 --- Comment #8 from Steve Kargl --- On Sat, Apr 09, 2022 at 10:23:39AM +, jakub at gcc dot gnu.org wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105101 > > --- Comment #6 from Jakub Jelinek --- > (In reply to Thomas Koenig from comment #5) > > There is another, much worse, problem, reported and analyzed by "Michael S" > > on comp.arch. The code has > > > > #ifdef HAVE_SQRTL > > { > > long double xl = (long double) x; > > if (xl <= LDBL_MAX && xl >= LDBL_MIN) > > { > > /* Use long double result as starting point. */ > > y = (__float128) sqrtl (xl); > > > > /* One Newton iteration. */ > > y -= 0.5q * (y - x / y); > > return y; > > } > > } > > #endif > > > > which assumes that long double has a higher precision than > > normal double. On x86_64, this depends o the settings of the > > FPU flags, so a number like 0x1.06bc82f7b9d71dfcbddf2358a0eap-1024 > > is corrected with 32 ULP of error because there is only a single > > round of Newton iterations if the FPU flags are set to normal precision. > > That is only a problem on OSes that do that, I think mainly BSDs, no? > On Linux it should be fine (well, still not 0.5ulp precise, but not as bad as > when sqrtl is just double precision precise). > i686-*-freebsd sets the FPU to have 53 bits of precision for long double. It has the usual exponent range of an Intel 80-bit extended double.
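A quick way to see which mode the x87 unit is actually running in, using only standard C (this snippet is written for this discussion, it is not from the bug report): if the FPU precision control is set to 53 bits, adding 2^-60 to 1.0 in long double rounds back to 1.0, whereas with the full 64-bit significand the sum is representable exactly.

#include <stdio.h>

int main (void)
{
  /* volatile keeps the compiler from folding the sum at full precision. */
  volatile long double one = 1.0L, tiny = 0x1p-60L;

  if (one + tiny == one)
    puts ("long double addition is being rounded to 53 bits");
  else
    puts ("long double addition keeps more than 53 bits");
  return 0;
}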
[Bug ipa/105160] [12 regression] ipa modref marks functions with asm volatile as const or pure
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105160 Jan Hubicka changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|--- |FIXED --- Comment #10 from Jan Hubicka --- Fixed by r:aabb9a261ef060cf24fd626713f1d7d9df81aa57
[Bug fortran/105205] New: Incorrect assignment of derived type with allocatable, deferred-length character component
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105205 Bug ID: 105205 Summary: Incorrect assignment of derived type with allocatable, deferred-length character component Product: gcc Version: 11.2.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: fortran Assignee: unassigned at gcc dot gnu.org Reporter: townsend at astro dot wisc.edu Target Milestone: --- I've run into problems with assignment of derived types containing an allocatable array of deferred-length strings. Example program:

---
program alloc_char_type
   implicit none
   type mytype
      character(:), allocatable :: c(:)
   end type mytype
   type(mytype) :: a
   type(mytype) :: b
   integer :: i
   a%c = ['foo','bar','biz','buz']
   b = a
   do i = 1, size(b%c)
      print *, b%c(i)
   end do
end
---

Running with gfortran 10.2.0 or 11.2.0, I get the output:

>> foo <<

If I hard-code the length of the c component (to, say, 3), I get the expected output:

>> foo
   bar
   biz
   buz <<

It seems as if only the first element of c is being copied correctly. cheers, Rich
[Bug tree-optimization/103376] [12 Regression] wrong code at -Os and above on x86_64-linux-gnu since r12-5453-ga944b5dec3adb28e
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103376 --- Comment #12 from CVS Commits --- The master branch has been updated by Jan Hubicka : https://gcc.gnu.org/g:4943b75e9f06f0b64ed541430bb7fbccf55fc552 commit r12-8070-g4943b75e9f06f0b64ed541430bb7fbccf55fc552 Author: Jan Hubicka Date: Sat Apr 9 21:22:58 2022 +0200 Update semantic_interposition flag at analysis time This patch solves problem with FE first finalizing function and then adding -fno-semantic-interposition flag (by parsing optimization attribute). gcc/ChangeLog: 2022-04-09 Jan Hubicka PR ipa/103376 * cgraphunit.cc (cgraph_node::analyze): update semantic_interposition flag. gcc/testsuite/ChangeLog: 2022-04-09 Jan Hubicka PR ipa/103376 * gcc.c-torture/compile/pr103376.c: New test.
[Bug middle-end/105206] New: mis-optimization with -ffast-math and __builtin_powf
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105206 Bug ID: 105206 Summary: mis-optimization with -ffast-math and __builtin_powf Product: gcc Version: 12.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: middle-end Assignee: unassigned at gcc dot gnu.org Reporter: kargl at gcc dot gnu.org Target Milestone: ---
[Bug middle-end/105206] mis-optimization with -ffast-math and __builtin_powf
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105206 kargl at gcc dot gnu.org changed: What|Removed |Added Severity|normal |minor --- Comment #1 from kargl at gcc dot gnu.org --- Not sure if anyone cares. I don't use -ffast-math, but this might be considered a mis-optimization with that option.

#include <math.h>

float foof(float x) { return (powf(10.f, x)); }

double food(double x) { return (pow(10., x)); }

-fdump-tree-original shows

;; Function foof (null)
;; enabled by -tree-original
{
  return powf (1.0e+1, x);
}

;; Function food (null)
;; enabled by -tree-original
{
  return pow (1.0e+1, x);
}

Compiling to assembly shows

foof:
.LFB3:
        .cfi_startproc
        movaps  %xmm0, %xmm1
        movss   .LC0(%rip), %xmm0
        jmp     powf
        .cfi_endproc

food:
.LFB4:
        .cfi_startproc
        mulsd   .LC1(%rip), %xmm0
        jmp     exp
        .cfi_endproc

So, the middle-end is converting pow(10.,x) to exp(x*log(10.0)) where log(10.0) is reduced, but the same transformation of powf(10.f,x) still yields a call to powf.
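For comparison, here is what the analogous float-level rewrite would look like if applied by hand; this is not what GCC currently generates for foof, and the constant is simply ln(10) written out:

#include <math.h>

/* Hand-written equivalent of the transform the middle-end already applies
   to the double case: pow(10, x) == exp(x * ln(10)).  Only acceptable
   under -ffast-math-style assumptions, since it changes accuracy.  */
float foof_transformed (float x)
{
  return expf (x * 2.30258509299404568402f);  /* ln(10) */
}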
[Bug ipa/103818] [12 Regression] ICE: in insert, at ipa-modref-tree.c:591 since r12-3202-gf5ff3a8ed4ca9173
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103818 --- Comment #4 from Jan Hubicka --- We have access list:

Base 0: alias set 2
  Ref 0: alias set 1
    access: Parm 0 param offset:0 offset:-4611686018427387936 size:32 max_size:32
    access: Parm 0 param offset:0 offset:352 size:32 max_size:32
    access: Parm 0 param offset:0 offset:64 size:32 max_size:32
    access: Parm 0 param offset:0 offset:0 size:32 max_size:32
    access: Parm 0 param offset:0 offset:32800 size:32 max_size:32
    access: Parm 0 param offset:0 offset:160 size:32 max_size:32
    access: Parm 0 param offset:0 offset:4629700416936869888 size:32 max_size:32
    access: Parm 0 param offset:0 offset:-96 size:32 max_size:32
    access: Parm 0 param offset:0 offset:1376 size:32 max_size:32
    access: Parm 0 param offset:0 offset:224 size:32 max_size:32
    access: Parm 0 param offset:0 offset:-288 size:32 max_size:32
    access: Parm 0 param offset:0 offset:448 size:32 max_size:32
    access: Parm 0 param offset:0 offset:288 size:32 max_size:32
    access: Parm 0 param offset:0 offset:1568 size:32 max_size:32
    access: Parm 0 param offset:0 offset:640 size:32 max_size:32
    access: Parm 0 param offset:0 offset:2624 size:32 max_size:32

and we want to merge

  Parm 0 param offset:0 offset:-4611686018427387936 size:32 max_size:32

and

  Parm 0 param offset:0 offset:4629700416936869888 size:32 max_size:32

into one entry since we think they have a small difference. So there is an overflow issue:

  new_max_size = max_size2 + offset2 - offset1;
  if (known_le (new_max_size, max_size1))
    new_max_size = max_size1;

So we need 128-bit math here. I need to look into the proper way to get this right (and into the corresponding overflow that makes the logic choose these two entries as closest to each other).
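To see that the 64-bit subtraction really does wrap for these two entries, here is a standalone check of the arithmetic (not GCC code; plain int64_t stands in for poly_int64 here):

#include <stdint.h>
#include <stdio.h>

int main (void)
{
  int64_t offset1 = -4611686018427387936LL;  /* from the access list above */
  int64_t offset2 = 4629700416936869888LL;
  int64_t diff;

  /* offset2 - offset1 = 9241386435364257824, which exceeds
     INT64_MAX (9223372036854775807), so the signed subtraction wraps.  */
  if (__builtin_sub_overflow (offset2, offset1, &diff))
    puts ("offset2 - offset1 overflows int64_t");
  else
    printf ("offset2 - offset1 = %lld\n", (long long) diff);
  return 0;
}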
[Bug ipa/103378] [12 Regression] ICE: verify_cgraph_node failed (error: semantic interposition mismatch) since r12-5412-g458d2c689963d846
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103378 Jan Hubicka changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|--- |FIXED --- Comment #7 from Jan Hubicka --- Fixed by r:4943b75e9f06f0b64ed541430bb7fbccf55fc552 Sorry for wrong PR marker :( I should have cut&pasted.
[Bug ipa/103819] [10/11/12 Regression] ICE in redirect_callee, at cgraph.c:1389 with __attribute__((flatten)) and -O2 since r11-7940-ge7fd3b783238d034
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103819 Jan Hubicka changed: What|Removed |Added Status|NEW |ASSIGNED --- Comment #4 from Jan Hubicka --- mine.
[Bug tree-optimization/103680] Jump threading and switch corrupts profile
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103680 --- Comment #5 from Jan Hubicka --- The cfgcleanup logic is consistent assuming that your profile was consistent on the input (i.e. read from profile feedback). If you

 1) read the profile
 2) do optimization and prove that a given if conditional is always true

then you should also have 100% probability on the "true" edge, so doing nothing in cfgcleanup is correct. Now of course what can happen is that you guess the profile, or you

 1) read the profile
 2) duplicate code
 3) prove the if conditional always true in one of the copies.

In this case fixing up the profile locally is not possible (since it is also wrong in the other copy), so we opt for doing nothing, which keeps errors sort of contained, and we have to live with the profile sometimes being inconsistent. So the cfgcleanup behaviour is by design.

However, if you do threading, there is a way to update the profile, and the logic for that is in update_bb_profile_for_threading. If the guessed profile was consistent with the thread, it will update the profile well, and it will drop a message to the dump file otherwise. Now the problem is that each time the profiling code is updated, the interface to this function is lost. I tried to get it fixed but got lost in the new code.

/* An edge originally destinating BB of COUNT has been proved to leave the
   block by TAKEN_EDGE.  Update profile of BB such that edge E can be
   redirected to destination of TAKEN_EDGE.

   This function may leave the profile inconsistent in the case TAKEN_EDGE
   frequency or count is believed to be lower than COUNT respectively.  */
void
update_bb_profile_for_threading (basic_block bb, profile_count count,
                                 edge taken_edge)

So the interface is quite simple. I have to re-read the new updating code since I no longer recall where I got lost, but perhaps if you are familiar with it, you can write in the update?
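To make the intended bookkeeping concrete, here is a small standalone sketch of the arithmetic only. It is illustrative, not GCC's implementation: plain integers stand in for profile_count, and the assumption (mine, not stated in the comment above) is that the threaded count is simply subtracted from the block and its taken edge.

#include <stdio.h>

/* Toy model: an incoming edge with count THREADED is proved to always
   leave BB through its "taken" edge, so that amount is removed from the
   block count and from the taken edge, and the remaining probabilities
   are rescaled.  If the taken edge was believed to carry less than
   THREADED, the counts would go negative, which is the "may leave the
   profile inconsistent" case mentioned in the quoted comment.  */
int main (void)
{
  long bb_count = 100, taken_count = 70, other_count = 30;
  long threaded = 30;  /* count of the edge being threaded through BB */

  bb_count -= threaded;
  taken_count -= threaded;

  printf ("bb: %ld, taken edge: %ld (%.0f%%), other edge: %ld (%.0f%%)\n",
          bb_count, taken_count, 100.0 * taken_count / bb_count,
          other_count, 100.0 * other_count / bb_count);
  return 0;
}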
[Bug c/105207] New: C preprocessor: splicing physical source lines to form logical source lines may not work
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105207 Bug ID: 105207 Summary: C preprocessor: splicing physical source lines to form logical source lines may not work Product: gcc Version: 11.1.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c Assignee: unassigned at gcc dot gnu.org Reporter: pavel.morozkin at gmail dot com Target Milestone: ---

Sample code:

xxx \
#error

Invocation:

$ gcc t6.c -E

Actual output:

xxx
#error

Expected output:

xxx #error
[Bug c/105207] C preprocessor: splicing physical source lines to form logical source lines may not work
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105207 --- Comment #1 from Pavel M --- The same behavior with:

xxx \
error

Expected:

xxx error

Actual:

xxx
error
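For comparison, phase-2 splicing itself does work when the spliced result forms a directive; the following self-contained example (written for this comment, not taken from the PR) preprocesses and compiles as a normal #define:

/* The backslash-newline is removed in translation phase 2, so these two
   physical lines form the single logical line "#define ANSWER 42".  */
#def\
ine ANSWER 42

int main (void)
{
  return ANSWER - 42;  /* 0: the spliced #define was honoured */
}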
[Bug analyzer/103892] -Wanalyzer-double-free false positive when compiling libpipeline
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103892 --- Comment #3 from CVS Commits --- The master branch has been updated by David Malcolm : https://gcc.gnu.org/g:3d41408c5d28105e7a3ea2eb2529431a70b96369 commit r12-8071-g3d41408c5d28105e7a3ea2eb2529431a70b96369 Author: David Malcolm Date: Sat Apr 9 18:12:57 2022 -0400 analyzer: fix folding of regions involving unknown ptrs [PR103892] PR analyzer/103892 reports a false positive from -Wanalyzer-double-free. The root cause is the analyzer failing to properly handle "unknown" symbolic regions, and thus confusing two different expressions. Specifically, the analyzer eventually hits the complexity limit for symbolic values, and starts using an "unknown" svalue for a pointer. The analyzer uses symbolic_region(unknown_svalue([of ptr type])) i.e. (*UNKNOWN_PTR) in a few places to mean "we have an lvalue, but we're not going to attempt to track what it is anymore". "Unknown" should probably be renamed to "unknowable"; in theory, any operation on such an unknown svalue should be also an unknown svalue. The issue is that in various places where we create child regions, we were failing to check for the parent region being (*UNKNOWN_PTR), and so were erroneously creating regions based on (*UNKNOWN_PTR), such as *(UNKNOWN_PTR + OFFSET). The state-machine handling was erroneously allowing e.g. INITIAL_VALUE (*(UNKNOWN_PTR + OFFSET)) to have state, and thus we could record that such a value had had "free" called on it, and thus eventually false report a double-free when a different expression incorrectly "simplified" to the same expression. This patch fixes things by checking when creating the various kinds of child region for (*UNKNOWN_PTR) as the parent region, and simply returning another (*UNKNOWN_PTR) for such child regions (using the appropriate type). Doing so fixes the false positive, and also fixes a state explosion on this testcase, as the states at the program points more rapidly reach a fixed point where everything is unknown. I checked for other cases that no longer needed -Wno-analyzer-too-complex; the only other one seems to be gcc.dg/analyzer/pr96841.c, but that seems to already have become redundant at some point before this patch. gcc/analyzer/ChangeLog: PR analyzer/103892 * region-model-manager.cc (region_model_manager::get_unknown_symbolic_region): New, extracted from... (region_model_manager::get_field_region): ...here. (region_model_manager::get_element_region): Use it here. (region_model_manager::get_offset_region): Likewise. (region_model_manager::get_sized_region): Likewise. (region_model_manager::get_cast_region): Likewise. (region_model_manager::get_bit_range): Likewise. * region-model.h (region_model_manager::get_unknown_symbolic_region): New decl. * region.cc (symbolic_region::symbolic_region): Handle sval_ptr having NULL type. (symbolic_region::dump_to_pp): Handle having NULL type. gcc/testsuite/ChangeLog: PR analyzer/103892 * gcc.dg/analyzer/pr103892.c: New test. * gcc.dg/analyzer/pr96841.c: Drop redundant -Wno-analyzer-too-complex. Signed-off-by: David Malcolm
[Bug c/105207] Translation phase 2: splicing physical source lines to form logical source lines may not work
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105207 Pavel M changed: What|Removed |Added Summary|C preprocessor: splicing|Translation phase 2: |physical source lines to|splicing physical source |form logical source lines |lines to form logical |may not work|source lines may not work --- Comment #2 from Pavel M --- Actually the issue is not related to the preprocessor. It is related to translation phase 2. Please
[Bug analyzer/103892] -Wanalyzer-double-free false positive when compiling libpipeline
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103892 David Malcolm changed: What|Removed |Added Resolution|--- |FIXED Status|ASSIGNED|RESOLVED --- Comment #4 from David Malcolm --- Should be fixed by the above patch on trunk for GCC 12. Backporting the fix to GCC 11 is probably not feasible. Marking as resolved. Thanks again for filing this bug.
[Bug preprocessor/105207] Translation phase 2: splicing physical source lines to form logical source lines may not work
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105207 --- Comment #3 from Andrew Pinski --- Note this only matters if you preprocess the file yourself; that is, -save-temps works correctly and errors out saying there is a stray '#' in the program.
[Bug preprocessor/59782] libcpp does not avoid bug #48326 when compiled by older GCC
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59782 Andrew Pinski changed: What|Removed |Added Status|NEW |RESOLVED Resolution|--- |WONTFIX --- Comment #8 from Andrew Pinski --- This is a won't-fix, as GCC 12+ requires GCC 4.8.0+ to build, which already has the fix.
[Bug c++/105199] can't compile glslang on windows
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105199 Andrew Pinski changed: What|Removed |Added Resolution|--- |DUPLICATE Status|UNCONFIRMED |RESOLVED --- Comment #2 from Andrew Pinski --- Fixed on the trunk for GCC 12 as PCH are now relocatable. *** This bug has been marked as a duplicate of bug 91440 ***
[Bug pch/91440] Precompiled headers don't work with ASLR on mingw
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91440 Andrew Pinski changed: What|Removed |Added CC||malashkin.andrey at gmail dot com --- Comment #9 from Andrew Pinski --- *** Bug 105199 has been marked as a duplicate of this bug. ***
[Bug libstdc++/93687] Add mcf thread model to GCC on windows for supporting C++11 std::thread?
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93687 Evgeniy changed: What|Removed |Added CC||xtemp09 at gmail dot com --- Comment #3 from Evgeniy --- Another lightweight approach: https://github.com/meganz/mingw-std-threads