[Bug c++/100224] incorrect result when doing double vectorized
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100224 --- Comment #2 from Zhao Chun --- (In reply to Richard Biener from comment #1) > You are accessing 'double' via a pointer to uint64_t * here: > > k = *((uint64_t*)data); > > that violates type based aliasing rules. You can use -fno-strict-aliasing > to work around your bug or use > > typedef uint64_t aliasing_uint64_t __attribute__((may_alias)); > k = *((aliasing_uint64_t*)data); Thanks for your answer, it works for me. However I don't quite understand why it works after such a change. Could you please explain it more clearly?
[Bug libfortran/98301] random_init() is broken
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98301 Andre Vehreschild changed: What|Removed |Added Status|NEW |ASSIGNED CC||vehre at gcc dot gnu.org Assignee|unassigned at gcc dot gnu.org |vehre at gcc dot gnu.org --- Comment #9 from Andre Vehreschild --- Going to implement the coarray part.
[Bug inline-asm/100178] Should the “short” be promoted to “int” when use inline asm?
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100178 --- Comment #3 from GengQi --- Thanks for your replies; they gave me the information I needed. I hope this will be made clear in the documentation soon.
[Bug rtl-optimization/100225] New: [8/9/10/11/12 Regression] ICE in add_cross_iteration_register_deps, at ddg.c:291
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100225 Bug ID: 100225 Summary: [8/9/10/11/12 Regression] ICE in add_cross_iteration_register_deps, at ddg.c:291 Product: gcc Version: 11.0 Status: UNCONFIRMED Keywords: ice-on-valid-code Severity: normal Priority: P3 Component: rtl-optimization Assignee: unassigned at gcc dot gnu.org Reporter: asolokha at gmx dot com Target Milestone: --- Target: aarch64-linux-gnu

gcc-11.0.1-alpha20210418 snapshot (g:b412ce8e961052e6becea3bc783a53e1d5feaa0f) ICEs when compiling the following testcase w/ -O1 -fmodulo-sched:

void
vorbis_synthesis_lapout (void);

void
ov_info (int **lappcm, int ov_info_i)
{
  while (ov_info_i < 1)
    lappcm[ov_info_i++] = __builtin_alloca (1);

  vorbis_synthesis_lapout ();
}

% aarch64-linux-gnu-gcc-11.0.1 -O1 -fmodulo-sched -c oacjgazv.c
during RTL pass: sms
oacjgazv.c: In function 'ov_info':
oacjgazv.c:11:1: internal compiler error: in add_cross_iteration_register_deps, at ddg.c:291
   11 | }
      | ^
0x8913fa add_cross_iteration_register_deps
        /var/tmp/portage/cross-aarch64-linux-gnu/gcc-11.0.1_alpha20210418/work/gcc-11-20210418/gcc/ddg.c:291
0x8913fa build_inter_loop_deps
        /var/tmp/portage/cross-aarch64-linux-gnu/gcc-11.0.1_alpha20210418/work/gcc-11-20210418/gcc/ddg.c:360
0x8913fa create_ddg(basic_block_def*, int)
        /var/tmp/portage/cross-aarch64-linux-gnu/gcc-11.0.1_alpha20210418/work/gcc-11-20210418/gcc/ddg.c:605
0x1a90489 sms_schedule
        /var/tmp/portage/cross-aarch64-linux-gnu/gcc-11.0.1_alpha20210418/work/gcc-11-20210418/gcc/modulo-sched.c:1513
0x1a9066f execute
        /var/tmp/portage/cross-aarch64-linux-gnu/gcc-11.0.1_alpha20210418/work/gcc-11-20210418/gcc/modulo-sched.c:3345
[Bug libstdc++/100226] New: [11/12 Regression] c++/11/bits/stl_tree.h:770:8: error: static assertion failed: comparison object must be invocable as const
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100226 Bug ID: 100226 Summary: [11/12 Regression] c++/11/bits/stl_tree.h:770:8: error: static assertion failed: comparison object must be invocable as const Product: gcc Version: 11.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: libstdc++ Assignee: unassigned at gcc dot gnu.org Reporter: marxin at gcc dot gnu.org Target Milestone: --- It's taken from the ncurses package, which builds fine with GCC 10. It's likely caused by changes in the libstdc++ headers: preprocessing with g++-10 -E and then compiling the result with g++-11 (g++-11 ncurses.ii) works, but compiling with g++-11 alone (g++-11 ...) fails. The error message is also very difficult to decode:
$ g++ ncurses.ii -c In file included from /usr/include/c++/11/set:60, from /usr/include/zypp/Arch.h:17, from /usr/include/zypp/sat/Solvable.h:22, from /usr/include/zypp/sat/SolvIterMixin.h:21, from /usr/include/zypp/sat/LocaleSupport.h:18, from /home/abuild/rpmbuild/BUILD/libyui-4.2.1/libyui-ncurses-pkg/src/NCPkgFilterPattern.cc:44: /usr/include/c++/11/bits/stl_tree.h: In instantiation of ‘static const _Key& std::_Rb_tree<_Key, _Val, _KeyOfValue, _Compare, _Alloc>::_S_key(std::_Rb_tree<_Key, _Val, _KeyOfValue, _Compare, _Alloc>::_Const_Link_type) [with _Key = std::pair, std::__cxx11::basic_string >; _Val = std::pair, std::__cxx11::basic_string >; _KeyOfValue = std::_Identity, std::__cxx11::basic_string > >; _Compare = paircmp; _Alloc = std::allocator, std::__cxx11::basic_string > >; std::_Rb_tree<_Key, _Val, _KeyOfValue, _Compare, _Alloc>::_Const_Link_type = const std::_Rb_tree_node, std::__cxx11::basic_string > >*]’: /usr/include/c++/11/bits/stl_tree.h:2069:47: required from ‘std::pair std::_Rb_tree<_Key, _Val, _KeyOfValue, _Compare, _Alloc>::_M_get_insert_unique_pos(const key_type&) [with _Key = std::pair, std::__cxx11::basic_string >; _Val = std::pair, std::__cxx11::basic_string >; _KeyOfValue = std::_Identity, std::__cxx11::basic_string > >; _Compare = paircmp; _Alloc = std::allocator, std::__cxx11::basic_string > >; 
std::_Rb_tree<_Key, _Val, _KeyOfValue, _Compare, _Alloc>::key_type = std::pair, std::__cxx11::basic_string >]’ /usr/include/c++/11/bits/stl_tree.h:2122:4: required from ‘std::pair, bool> std::_Rb_tree<_Key, _Val, _KeyOfValue, _Compare, _Alloc>::_M_insert_unique(_Arg&&) [with _Arg = std::pair, std::__cxx11::basic_string >; _Key = std::pair, std::__cxx11::basic_string >; _Val = std::pair, std::__cxx11::basic_string >; _KeyOfValue = std::_Identity, std::__cxx11::basic_string > >; _Compare = paircmp; _Alloc = std::allocator, std::__cxx11::basic_string > >]’ /usr/include/c++/11/bits/stl_set.h:521:25: required from ‘std::pair, _Compare, typename __gnu_cxx::__alloc_traits<_Alloc>::rebind<_Key>::other>::const_iterator, bool> std::set<_Key, _Compare, _Alloc>::insert(std::set<_Key, _Compare, _Alloc>::value_type&&) [with _Key = std::pair, std::__cxx11::basic_string >; _Compare = paircmp; _Alloc = std::allocator, std::__cxx11::basic_string > >; typename std::_Rb_tree<_Key, _Key, std::_Identity<_Tp>, _Compare, typename __gnu_cxx::__alloc_traits<_Alloc>::rebind<_Key>::other>::const_iterator = std::_Rb_tree, std::__cxx11::basic_string >, std::pair, std::__cxx11::basic_string >, std::_Identity, std::__cxx11::basic_string > >, paircmp, std::allocator, std::__cxx11::basic_string > > >::const_iterator; typename __gnu_cxx::__alloc_traits<_Alloc>::rebind<_Key>::other = std::allocator, std::__cxx11::basic_string > >; typename __gnu_cxx::__alloc_traits<_Alloc>::rebind<_Key> = __gnu_cxx::__alloc_traits, std::__cxx11::basic_string > >, std::pair, std::__cxx11::basic_string > >::rebind, std::__cxx11::basic_string > >; typename _Alloc::value_type = std::pair, std::__cxx11::basic_string >; std::set<_Key, _Compare, _Alloc>::value_type = std::pair, std::__cxx11::basic_string >]’ /home/abuild/rpmbuild/BUILD/libyui-4.2.1/libyui-ncurses-pkg/src/NCPkgFilterPattern.cc:343:28: required from here /usr/include/c++/11/bits/stl_tree.h:770:8: error: static assertion failed: comparison object must be 
invocable as const 770 | |^ /usr/include/c++/11/bits/stl_tree.h:770:8: note: ‘std::is_invocable_v, std::allocator >, std::__cxx11::basic_string, std::allocator > >&, const std::pair, std::allocator >, std::__cxx11::basic_string, std::allocator > >&>’ evaluates to false
[Bug libstdc++/100226] [11/12 Regression] c++/11/bits/stl_tree.h:770:8: error: static assertion failed: comparison object must be invocable as const
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100226 --- Comment #1 from Martin Liška --- Created attachment 50656 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50656&action=edit test-case
[Bug libstdc++/100226] [11/12 Regression] c++/11/bits/stl_tree.h:770:8: error: static assertion failed: comparison object must be invocable as const
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100226 Martin Liška changed: What|Removed |Added Ever confirmed|0 |1 CC||redi at gcc dot gnu.org Last reconfirmed||2021-04-23 Status|UNCONFIRMED |NEW
[Bug c++/100224] incorrect result when doing double vectorized
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100224 --- Comment #3 from Richard Biener --- (In reply to Zhao Chun from comment #2) > (In reply to Richard Biener from comment #1) > > You are accessing 'double' via a pointer to uint64_t * here: > > > > k = *((uint64_t*)data); > > > > that violates type based aliasing rules. You can use -fno-strict-aliasing > > to work around your bug or use > > > > typedef uint64_t aliasing_uint64_t __attribute__((may_alias)); > > k = *((aliasing_uint64_t*)data); > > Thanks for your answer, it works for me. > > However I don't quite understand why it works after such a change. Could you > please explain it more clearly? Using a may_alias attributed type tells GCC to treat it like a 'character type' in terms of what the C/C++ standards allow. Note that this is not portable. A solution that works with all compilers I know is doing memcpy (&k, data, sizeof (uint64_t));
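A minimal C++ sketch of the two approaches discussed above, assuming `data` points at a suitably aligned `double`. The function names `bits_of` and `bits_of_may_alias` are hypothetical. `std::memcpy` copies the object representation and is well-defined for trivially copyable types, so it does not depend on type-based aliasing rules; the `may_alias` typedef is the GCC extension quoted in the thread and is not portable.

```cpp
#include <cstdint>
#include <cstring>

// Portable: copy the bytes of the double into a 64-bit integer.
// Unlike *(uint64_t*)data, this does not access a double through an
// lvalue of an unrelated type, so strict aliasing is not violated.
std::uint64_t bits_of(const double* data) {
    std::uint64_t k;
    static_assert(sizeof k == sizeof *data, "double must be 64 bits here");
    std::memcpy(&k, data, sizeof k);
    return k;
}

// GCC-specific alternative from the thread: a may_alias typedef makes
// accesses through it behave like character-type accesses for TBAA.
typedef std::uint64_t aliasing_uint64_t __attribute__((may_alias));

std::uint64_t bits_of_may_alias(const double* data) {
    return *reinterpret_cast<const aliasing_uint64_t*>(data);
}
```

With optimization enabled, GCC typically turns the fixed-size `memcpy` into a single register move, so the portable form usually costs nothing extra.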
[Bug libstdc++/100226] [11/12 Regression] c++/11/bits/stl_tree.h:770:8: error: static assertion failed: comparison object must be invocable as const
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100226 Richard Biener changed: What|Removed |Added Known to work||10.3.0 Target Milestone|--- |11.0 Keywords||needs-reduction, ||rejects-valid --- Comment #2 from Richard Biener --- I guess you want to uninclude it and reduce it w/o expanding the std library headers.
[Bug rtl-optimization/100225] [8/9/10/11/12 Regression] ICE in add_cross_iteration_register_deps, at ddg.c:291
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100225 Richard Biener changed: What|Removed |Added Target Milestone|--- |8.5
[Bug tree-optimization/99971] GCC generates partially vectorized and scalar code at once
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99971 --- Comment #8 from CVS Commits --- The master branch has been updated by Richard Biener : https://gcc.gnu.org/g:700e542971251b11623cce877075567815f72965 commit r12-79-g700e542971251b11623cce877075567815f72965 Author: Richard Biener Date: Fri Apr 9 09:35:51 2021 +0200 tree-optimization/99971 - improve BB vect dependence analysis We can use TBAA even when we have a DR, do so. For the testcase that means fully vectorizing it instead of only vectorizing the first store group resulting in suboptimal code. 2021-04-09 Richard Biener PR tree-optimization/99971 * tree-vect-data-refs.c (vect_slp_analyze_node_dependences): Always use TBAA for loads. * g++.dg/vect/slp-pr99971.cc: New testcase.
[Bug tree-optimization/99971] GCC generates partially vectorized and scalar code at once
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99971 Richard Biener changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|--- |FIXED Known to work||12.0 --- Comment #9 from Richard Biener --- Fixed for GCC 12.
[Bug target/100182] [8/9/10/11/12 Regression] Miscompilation of atomic_float/1.cc and atomic_float/wait_notify.cc on i686
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100182 --- Comment #12 from Jakub Jelinek --- They do. Though, in the combined patch I'm still a little bit worried about the first 4 modified peephole2s, the last 4 look good to me. The last 4 are where the original insn did a normal DFmode store and your patch restores those DFmode stores. But the first 4 had an atomic store followed by a DFmode read, shouldn't those preserve an atomic store instead of the DFmode store? A non-atomic DFmode read is one thing, but it could be followed later by atomic loads, both into DFmode and ones into DImode that would check the whole bit pattern.
[Bug target/100182] [8/9/10/11/12 Regression] Miscompilation of atomic_float/1.cc and atomic_float/wait_notify.cc on i686
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100182 --- Comment #13 from Uroš Bizjak --- (In reply to Jakub Jelinek from comment #12) > They do. Though, in the combined patch I'm still a little bit worried about > the first 4 modified peephole2s, the last 4 look good to me. > The last 4 are where the original insn did a normal DFmode store and your > patch restores those DFmode stores. > But the first 4 had an atomic store followed by a DFmode read, shouldn't > those > preserve an atomic store instead of the DFmode store? A non-atomic DFmode > read is one thing, but it could be followed later by atomic loads, both into > DFmode and ones into DImode that would check the whole bit pattern. DFmode loads and stores *are* atomic, this is what the optimization is based on.
[Bug target/100182] [8/9/10/11/12 Regression] Miscompilation of atomic_float/1.cc and atomic_float/wait_notify.cc on i686
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100182 --- Comment #14 from Uroš Bizjak --- (In reply to Uroš Bizjak from comment #13) > DFmode loads and stores *are* atomic, this is what the optimization is based > on. Loads and stores to/from x87 and SSE registers, to be clear.
[Bug target/100182] [8/9/10/11/12 Regression] Miscompilation of atomic_float/1.cc and atomic_float/wait_notify.cc on i686
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100182 --- Comment #15 from Jakub Jelinek --- Yes, but do they preserve all the bits and never modify any bit patterns, including qNaNs and sNaNs? I thought the point of using the fistp was that it preserves everything.
[Bug fortran/100227] New: write with implicit loop
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100227 Bug ID: 100227 Summary: write with implicit loop Product: gcc Version: 8.3.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: fortran Assignee: unassigned at gcc dot gnu.org Reporter: priv123 at hotmail dot fr Target Milestone: --- Created attachment 50657 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50657&action=edit example code

The code in the attachment runs correctly with gfortran 6.3.0 at -O1 and with gfortran 8.3.0 at -O0, but fails with gfortran 8.3.0 at -O1:

+ gfortran --version
GNU Fortran (Debian 8.3.0-6) 8.3.0
Copyright (C) 2018 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
+ gfortran -Wall -Wextra -O1 pb_write.F90
+ ./a.out
KO if -O1/gcc8 4.57776182E-41 -773.585938 3.06225753E-41
OK: add index1 1. 5 5. 9 9.
OK: rearranged 1. 5. 9.
OK: two vars 1. 5. 9.
+ gfortran -Wall -Wextra -O0 pb_write.F90
+ ./a.out
KO if -O1/gcc8 1. 5. 9.
OK: add index1 1. 5 5. 9 9.
OK: rearranged 1. 5. 9.
OK: two vars 1. 5. 9.

As seen in bug 86837, it also works with -O1 -fno-frontend-optimize. I'm filing a new bug because that one should have been fixed in 8.2.1, yet the failure still occurs with 8.3.0. I will check with the latest available Docker images to see whether it is fixed after 8.3.0 and post the results. Thanks for reading!
[Bug target/100182] [8/9/10/11/12 Regression] Miscompilation of atomic_float/1.cc and atomic_float/wait_notify.cc on i686
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100182 --- Comment #16 from Uroš Bizjak --- (In reply to Jakub Jelinek from comment #15) > Yes, but do they preserve all the bits and never modify any bit patterns, > including qNaNs and sNaNs? I thought the point of using the fistp was that > it preserves everything. Hm, they don't...
[Bug other/100174] Binary floating-point conversion under source-gcc/gcc/real.[c\h] test on x86-64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100174 --- Comment #2 from LinoPeng <608410104 at alum dot ccu.edu.tw> --- Hi Andrew Pinski, I am new here. I am sorry if I offended you. For float a = 0.., the 23 fraction bits are "01010101010011001001100". I traced the gcc-9.2.0 source code in real.c and found that another 41 bits of sig[SIGSZ-1] get cleared to zero (sig[SIGSZ-1] has 64 bits, and 64 - 23 = 41). The same happens for double.
Before clearing: sig[SIGSZ-1] = 01010101 01001100 10011000 0101 0110 0110 10010100 0100011
After clearing: sig[SIGSZ-1] = 01010101 01001100 10011000 000
The relevant statement from real.c is below; r is a pointer to const REAL_VALUE_TYPE.
"sig = (r->sig[SIGSZ-1] >> (HOST_BITS_PER_LONG - 24)) & 0x7fffff;"
What I mean is: why not just truncate to 23 bits in sig[SIGSZ-1]? Whether or not the function "clear_significand_below" is called, it does not affect the fraction bits recorded in sig.
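To illustrate the point about the quoted statement, here is a small C++ sketch (not GCC's real.c code) assuming a 64-bit `HOST_BITS_PER_LONG` and a normalized significand word whose top bit is the implicit leading 1. Shifting right by `64 - 24` already discards everything below the top 24 bits, and masking with `0x7fffff` keeps the 23 fraction bits, so clearing the low bits beforehand does not change the extracted fraction. The helper names are hypothetical, and the sketch does not model any rounding GCC may perform before encoding.

```cpp
#include <cstdint>

// Assumption: 64-bit host word, as on x86-64.
constexpr int kHostBitsPerLong = 64;

// Mirror of the quoted extraction: keep the top 24 bits (implicit 1 plus
// 23 fraction bits), then mask off the implicit 1.
std::uint32_t single_fraction(std::uint64_t top_word) {
    return static_cast<std::uint32_t>(
        (top_word >> (kHostBitsPerLong - 24)) & 0x7fffff);
}

// Zero every bit below the top 24, i.e. the bits the shift throws away.
std::uint64_t clear_low_bits(std::uint64_t top_word) {
    const std::uint64_t low_mask =
        (std::uint64_t{1} << (kHostBitsPerLong - 24)) - 1;
    return top_word & ~low_mask;
}
```

For example, `single_fraction(0xD555555555555555)` and `single_fraction(clear_low_bits(0xD555555555555555))` both yield `0x555555`.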
[Bug target/100182] [8/9/10/11/12 Regression] Miscompilation of atomic_float/1.cc and atomic_float/wait_notify.cc on i686
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100182 --- Comment #17 from Uroš Bizjak --- (In reply to Uroš Bizjak from comment #16) > (In reply to Jakub Jelinek from comment #15) > > Yes, but do they preserve all the bits and never modify any bit patterns, > > including qNaNs and sNaNs? I thought the point of using the fistp was that > > it preserves everything. > > Hm, they don't... This probably means we have to remove the x87 peepholes where an atomic store is followed by a DFmode read. x87 can't load and store DFmode values untouched without a fild/fistp pair.
[Bug target/100152] [10/11/12 Regression] used caller-saved register not preserved across a call.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100152 Richard Biener changed: What|Removed |Added Priority|P3 |P2 Summary|[10.3, 11, 12 Regression] used caller-saved register not preserved across a call. |[10/11/12 Regression] used caller-saved register not preserved across a call. Keywords||ra --- Comment #30 from Richard Biener --- (In reply to Iain Sandoe from comment #29) > what is also somewhat peculiar is that replacing the first function in the > reduced test case with "extern void ___UTF_8_put(char *a, int b);" changes > the code-gen for the second function. That might hint at IPA RA which you can try disabling via -fno-ipa-ra which in turn hints at a target issue. I'm seeing whether a cross reproduces the issue on your reduced testcase. Btw, the GIMPLE optimization change just exposes the issue - it can have no influence on the used registers.
[Bug target/100217] [11/12 Regression] ICE when building valgrind testsuite with -march=z14 since r11-7552
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100217 --- Comment #2 from Jakub Jelinek --- Created attachment 50658 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50658&action=edit gcc11-pr100217.patch Untested fix. IMHO when we have a hard reg in the inline asm, we just need to honor it, trying to force it into a pseudo and then subreg would just mean the user chosen reg is not guaranteed anymore.
[Bug libstdc++/100223] Missing early return in std::partial_sort
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100223 --- Comment #1 from Jonathan Wakely --- Arguably, the caller can do this check if they think it can occur in their code. That way all calls to the algorithm don't pay for the check. But it's probably cheap enough to check anyway.
[Bug libstdc++/100226] [11/12 Regression] c++/11/bits/stl_tree.h:770:8: error: static assertion failed: comparison object must be invocable as const
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100226 --- Comment #3 from Jonathan Wakely --- The static assert was added intentionally: the comparison function used with the container must have a const-qualified operator(). I would check that in the ncurses code first.
[Bug target/100182] [8/9/10/11/12 Regression] Miscompilation of atomic_float/1.cc and atomic_float/wait_notify.cc on i686
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100182 --- Comment #18 from Jakub Jelinek --- Indeed.
[Bug target/100228] New: repeated std::atomic::load() misoptimized by x87 peephole
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100228 Bug ID: 100228 Summary: repeated std::atomic::load() misoptimized by x87 peephole Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: aoliva at gcc dot gnu.org Target Milestone: --- Target: i686-pc-linux-gnu

Compile this with -O2 -mfpmath=387 -mno-sse:

#include <atomic>
int main() {
  std::atomic<double> a0;
  std::atomic<double> a1(1.0);
  a0 = a1.load();
  if (a0.load() != a1.load())
    __builtin_abort ();
}

It aborts because the first a1.load() is optimized by sync.md:398:

(define_peephole2
  [(set (match_operand:DF 0 "memory_operand")
        (match_operand:DF 1 "any_fp_register_operand"))
   (set (mem:BLK (scratch:SI))
        (unspec:BLK [(mem:BLK (scratch:SI))] UNSPEC_MEMORY_BLOCKAGE))
   (set (match_operand:DF 2 "fp_register_operand")
        (unspec:DF [(match_operand:DI 3 "memory_operand")]
                   UNSPEC_FILD_ATOMIC))
   (set (match_operand:DI 4 "memory_operand")
        (unspec:DI [(match_dup 2)]
                   UNSPEC_FIST_ATOMIC))]
  "!TARGET_64BIT
   && peep2_reg_dead_p (4, operands[2])
   && rtx_equal_p (XEXP (operands[0], 0), XEXP (operands[3], 0))"
  [(const_int 0)]
{
  emit_insn (gen_memory_blockage ());
  emit_move_insn (gen_lowpart (DFmode, operands[4]), operands[1]);
  DONE;
})

The memory location operands[0] stored into by the first instruction is reused and loaded again in the second a1.load(), but after this peephole there's no store before the load. I don't think we have infrastructure in peephole to test whether there are any other uses of a store, so I think we have to keep it. There are other variations of this peephole around it that appear to have the same problem.
[Bug libstdc++/100226] [11/12 Regression] c++/11/bits/stl_tree.h:770:8: error: static assertion failed: comparison object must be invocable as const
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100226 --- Comment #4 from Jonathan Wakely --- The template argument '_Compare = paircmp' shows the type used as the comparison object. So paircmp::operator() needs to be const.
[Bug libstdc++/100226] [11/12 Regression] c++/11/bits/stl_tree.h:770:8: error: static assertion failed: comparison object must be invocable as const
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100226 Martin Liška changed: What|Removed |Added Resolution|--- |INVALID Status|NEW |RESOLVED --- Comment #5 from Martin Liška --- (In reply to Jonathan Wakely from comment #4) > The template argument '_Compare = paircmp' shows the type used as the > comparison object. > > So paircmp::operator() needs to be const. I can confirm that it works, thank you for the help!
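A minimal sketch of the fix described above. The real `paircmp` from libyui-ncurses-pkg is not shown in the thread, so this is a hypothetical reconstruction that orders `std::pair<std::string, std::string>` keys lexicographically; the relevant part is that `operator()` is const-qualified, which is what the GCC 11 static assertion requires.

```cpp
#include <set>
#include <string>
#include <utility>

using StringPair = std::pair<std::string, std::string>;

// Hypothetical comparator standing in for paircmp. The trailing `const`
// on operator() lets std::set invoke it through a const comparator.
struct paircmp {
    bool operator()(const StringPair& a, const StringPair& b) const {
        if (a.first != b.first)
            return a.first < b.first;
        return a.second < b.second;
    }
};

using PairSet = std::set<StringPair, paircmp>;
```

Dropping the trailing `const` from `operator()` is the pattern the quoted diagnostic rejects when such a set is used with GCC 11.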
[Bug target/100152] [10/11/12 Regression] used caller-saved register not preserved across a call.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100152 --- Comment #31 from Iain Sandoe --- (In reply to Richard Biener from comment #30) > (In reply to Iain Sandoe from comment #29) > > what is also somewhat peculiar is that replacing the first function in the > > reduced test case with "extern void ___UTF_8_put(char *a, int b);" changes > > the code-gen for the second function. > > That might hint at IPA RA which you can try disabling via -fno-ipa-ra which > in turn hints at a target issue. yeah, it does switch back to using rbx, at least on the reduced test case. > Btw, the GIMPLE optimization change just exposes the issue - it can have no > influence on the used registers. indeed, it seemed more likely to be "exposed by".
[Bug target/99932] OpenACC/nvptx offloading execution regressions starting with CUDA 11.2-era Nvidia Driver 460.27.04
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99932 --- Comment #2 from Tom de Vries --- Minimal example:
...
$ cat libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-dims.c
int
main (void)
{
  int vectors_max = -1;
#pragma acc parallel \
  num_gangs (1) \
  num_workers (1) \
  vector_length (32) \
  copy (vectors_max)
  {
#pragma acc loop gang reduction (max: vectors_max)
    for (int i = 0; i < 2; i++)
#pragma acc loop worker reduction (max: vectors_max)
      for (int j = 0; j < 2; j++)
#pragma acc loop vector reduction (max: vectors_max)
        for (int k = 0; k < 32; k++)
          vectors_max = k;
  }

  if (vectors_max != 31)
    __builtin_abort ();

  return 0;
}
...
Passes with GOMP_NVPTX_JIT=-O0, starts failing at GOMP_NVPTX_JIT=-O1.
[Bug libstdc++/100226] [11/12 Regression] c++/11/bits/stl_tree.h:770:8: error: static assertion failed: comparison object must be invocable as const
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100226 --- Comment #6 from Jonathan Wakely --- Bug 89370 would really help simplify this diagnostic. The last three lines would be: .../src/NCPkgFilterPattern.cc:343:28: required from here /usr/include/c++/11/bits/stl_tree.h:770:8: error: static assertion failed: comparison object must be invocable as const 770 | |^ /usr/include/c++/11/bits/stl_tree.h:770:8: note: ‘std::is_invocable_v’ evaluates to false Which is pretty clear, I think.
[Bug libstdc++/100179] [12 regression] xtreme-header-2_a.H fails on arm-eabi
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100179 --- Comment #6 from Christophe Lyon --- Yes, I confirm it's now fixed, thanks!
[Bug target/100182] [8/9/10/11/12 Regression] Miscompilation of atomic_float/1.cc and atomic_float/wait_notify.cc on i686
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100182 --- Comment #19 from Jakub Jelinek --- Perhaps best would be to try to construct a testcase for each of the peephole2s, try some bit pattern that isn't preserved through the FPU except via fild/fistp, and see what enabling/disabling each of the peephole2s does to it.
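A rough C++ sketch of the kind of check suggested above, not an actual GCC testsuite file. It pushes a signaling NaN (assumed bit pattern `0x7ff0000000000001`: exponent all ones, quiet bit clear) through `std::atomic<double>` in the same load-then-store shape as the testcase in bug 100228, and compares the raw bits with `memcpy`. An x87 `fld` of such a value would quiet it, so a peephole that replaces the integer `fild`/`fistp` path with an FPU load/store could change the result. Treat it as a starting point only: on i686, `double` values can also pass through x87 registers at function boundaries, so whether it exercises a particular peephole2 depends on inlining and optimization level. The helper name `roundtrip_bits` is hypothetical.

```cpp
#include <atomic>
#include <cstdint>
#include <cstring>

// Store a bit pattern into std::atomic<double>, copy it to a second
// atomic via load()/store(), and return the bits that come back out.
std::uint64_t roundtrip_bits(std::uint64_t pattern) {
    double in;
    std::memcpy(&in, &pattern, sizeof in);

    std::atomic<double> a0;
    std::atomic<double> a1(in);
    a0.store(a1.load());  // atomic load followed by atomic store

    double out = a0.load();
    std::uint64_t result;
    std::memcpy(&result, &out, sizeof result);
    return result;
}
```

A correct implementation returns its argument unchanged for every pattern, including NaN payloads.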
[Bug fortran/100227] write with implicit loop
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100227 Mathieu changed: What|Removed |Added Known to work||6.3.0 Known to fail||10.3.0, 9.3.0 --- Comment #1 from Mathieu --- same problem reproduced with 9.3.0 and 10.3.0 (from docker)
[Bug target/100228] repeated std::atomic::load() misoptimized by x87 peephole
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100228 --- Comment #1 from Jonathan Wakely --- Is this the same cause as bug 100182?
[Bug libstdc++/100179] [12 regression] xtreme-header-2_a.H fails on arm-eabi
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100179 --- Comment #7 from Jonathan Wakely --- Great, and thanks for the report; it meant this could be fixed for gcc-11.
[Bug target/100228] repeated std::atomic::load() misoptimized by x87 peephole
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100228 Jakub Jelinek changed: What|Removed |Added CC||jakub at gcc dot gnu.org Resolution|--- |DUPLICATE Status|UNCONFIRMED |RESOLVED --- Comment #2 from Jakub Jelinek --- Yes. *** This bug has been marked as a duplicate of bug 100182 ***
[Bug target/100182] [8/9/10/11/12 Regression] Miscompilation of atomic_float/1.cc and atomic_float/wait_notify.cc on i686
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100182 Jakub Jelinek changed: What|Removed |Added CC||aoliva at gcc dot gnu.org --- Comment #20 from Jakub Jelinek --- *** Bug 100228 has been marked as a duplicate of this bug. ***
[Bug tree-optimization/99971] GCC generates partially vectorized and scalar code at once
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99971 --- Comment #10 from andysem at mail dot ru --- Thanks. Will this be backported to 10 and 11 branches?
[Bug libstdc++/100223] Missing early return in std::partial_sort
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100223 --- Comment #2 from 康桓瑋 --- (In reply to Jonathan Wakely from comment #1) > Arguably, the caller can do this check if they think it can occur in their > code. That way all calls to the algorithm don't pay for the check. > > But it's probably cheap enough to check anyway. Exactly, since the is full of such checks, I think there is nothing wrong with adding one for partial_sort.
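A minimal sketch of the caller-side check mentioned in comment #1. The thread does not spell out which early return is missing, so this assumes it is the empty-prefix case `first == middle`, where there is nothing to sort; the helper name `partial_sort_if_needed` is hypothetical.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical wrapper: skip std::partial_sort when the sorted prefix
// [first, middle) is empty, so no heap work is done at all.
template <typename RandomIt>
void partial_sort_if_needed(RandomIt first, RandomIt middle, RandomIt last) {
    if (first == middle)
        return;  // nothing to place in sorted order
    std::partial_sort(first, middle, last);
}
```

The trade-off is the one described in the thread: a check in the library costs every caller a comparison, while a check in the caller only costs the code that can actually hit the empty case.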
[Bug target/100152] [10/11/12 Regression] used caller-saved register not preserved across a call.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100152 --- Comment #32 from Iain Sandoe --- (In reply to Iain Sandoe from comment #31) > (In reply to Richard Biener from comment #30) > > (In reply to Iain Sandoe from comment #29) > > > what is also somewhat peculiar is that replacing the first function in the > > > reduced test case with "extern void ___UTF_8_put(char *a, int b);" changes > > > the code-gen for the second function. > > > > That might hint at IPA RA which you can try disabling via -fno-ipa-ra which > > in turn hints at a target issue. > > yeah, it does switch back to using rbx, at least on the reduced test case. (also on the original). I wonder if the problem is that IPA can't "see" the lazy symbol resolver, so it just sees a call to ___UTF_8_put and doesn't know that this will be resolved indirectly. .. but something similar must apply to PLT and targets with linker veneers ?
[Bug target/100152] [10/11/12 Regression] used caller-saved register not preserved across a call.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100152 Richard Biener changed: What|Removed |Added CC||vmakarov at gcc dot gnu.org --- Comment #33 from Richard Biener --- (In reply to Iain Sandoe from comment #32) > (In reply to Iain Sandoe from comment #31) > > (In reply to Richard Biener from comment #30) > > > (In reply to Iain Sandoe from comment #29) > > > > what is also somewhat peculiar is that replacing the first function in > > > > the > > > > reduced test case with "extern void ___UTF_8_put(char *a, int b);" > > > > changes > > > > the code-gen for the second function. > > > > > > That might hint at IPA RA which you can try disabling via -fno-ipa-ra > > > which > > > in turn hints at a target issue. > > > > yeah, it does switch back to using rbx, at least on the reduced test case. > > (also on the original). > > I wonder if the problem is that IPA can't "see" the lazy symbol resolver, so > it just sees a call to ___UTF_8_put and doesn't know that this will be > resolved indirectly. > > .. but something similar must apply to PLT and targets with linker veneers ? I don't know how IPA RA works in detail but obviously the target has to expose this detail. It looks like IPA RA causes us to add some notes to call insns which are supposed to describe those details and there's collect_fn_hard_reg_usage which looks at the target function (but likely does not include the ABI details of the call itself, in this case the resolver).
[Bug target/100152] [10/11/12 Regression] used caller-saved register not preserved across a call.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100152 --- Comment #34 from Richard Biener --- (In reply to Richard Biener from comment #33) > (In reply to Iain Sandoe from comment #32) > > (In reply to Iain Sandoe from comment #31) > > > (In reply to Richard Biener from comment #30) > > > > (In reply to Iain Sandoe from comment #29) > > > > > what is also somewhat peculiar is that replacing the first function > > > > > in the > > > > > reduced test case with "extern void ___UTF_8_put(char *a, int b);" > > > > > changes > > > > > the code-gen for the second function. > > > > > > > > That might hint at IPA RA which you can try disabling via -fno-ipa-ra > > > > which > > > > in turn hints at a target issue. > > > > > > yeah, it does switch back to using rbx, at least on the reduced test case. > > > > (also on the original). > > > > I wonder if the problem is that IPA can't "see" the lazy symbol resolver, so > > it just sees a call to ___UTF_8_put and doesn't know that this will be > > resolved indirectly. > > > > .. but something similar must apply to PLT and targets with linker veneers ? > > I don't know how IPA RA works in detail but obviously the target has to > expose this detail. It looks like IPA RA causes us to add some notes to > call insns which are supposed to describe those details and there's > collect_fn_hard_reg_usage which looks at the target function (but likely > does not include the ABI details of the call itself, in this case the > resolver). @deftypevr {Target Hook} bool TARGET_CALL_FUSAGE_CONTAINS_NON_CALLEE_CLOBBERS Set to true if each call that binds to a local definition explicitly clobbers or sets all non-fixed registers modified by performing the call. That is, by the call pattern itself, or by code that might be inserted by the linker (e.g.@: stubs, veneers, branch islands), but not including those modifiable by the callee. 
The affected registers may be mentioned explicitly in the call pattern, or included as clobbers in CALL_INSN_FUNCTION_USAGE. The default version of this hook is set to false. The purpose of this hook is to enable the fipa-ra optimization. @end deftypevr might be relevant - though when compiling for a shared library the call to ___UTF_8_put does not bind locally (but then IPA RA shouldn't apply either I guess). So, does ___UTF_8_put bind locally?
[Bug tree-optimization/99971] GCC generates partially vectorized and scalar code at once
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99971 --- Comment #11 from rguenther at suse dot de --- On Fri, 23 Apr 2021, andysem at mail dot ru wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99971 > > --- Comment #10 from andysem at mail dot ru --- > Thanks. Will this be backported to 10 and 11 branches? I don't plan to since it isn't a regression as far as I know, it doesn't apply to GCC 10 so definitely not there. I'll consider for GCC 11.
[Bug target/100152] [10/11/12 Regression] used caller-saved register not preserved across a call.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100152 --- Comment #35 from Richard Biener --- Which means another possible candidate for the "bug" is darwin_binds_local_p
[Bug target/100152] [10/11/12 Regression] used caller-saved register not preserved across a call.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100152 --- Comment #36 from Iain Sandoe --- (In reply to Richard Biener from comment #35) > Which means another possible candidate for the "bug" is darwin_binds_local_p yeah... see below. > > > .. but something similar must apply to PLT and targets with linker > > > veneers ? > > > > I don't know how IPA RA works in detail but obviously the target has to > > expose this detail. It looks like IPA RA causes us to add some notes to > > call insns which are supposed to describe those details and there's > > collect_fn_hard_reg_usage which looks at the target function (but likely > > does not include the ABI details of the call itself, in this case the > > resolver). > @deftypevr {Target Hook} bool TARGET_CALL_FUSAGE_CONTAINS_NON_CALLEE_CLOBBERS > Set to true if each call that binds to a local definition explicitly > clobbers or sets all non-fixed registers modified by performing the call. > That is, by the call pattern itself, or by code that might be inserted by the > linker (e.g.@: stubs, veneers, branch islands), but not including those > modifiable by the callee. The affected registers may be mentioned explicitly > in the call pattern, or included as clobbers in CALL_INSN_FUNCTION_USAGE. > The default version of this hook is set to false. The purpose of this hook > is to enable the fipa-ra optimization. > @end deftypevr thanks for the pointer, I'll take a look at that when I have some cycles. I guess it was never added at the time the IPA stuff was done... and somehow we "got away with it" mostly. > might be relevant - though when compiling for a shared library the call > to ___UTF_8_put does not bind locally (but then IPA RA shouldn't apply > either I guess). So, does ___UTF_8_put bind locally? extern void ___UTF_8_put (char* *ptr, unsigned int c) If it does, then that's also a bug :), will have to check (sometime later). 
(We always build with -fPIC for x86_64, and don't specifically identify that the result will be a shlib [all Darwin exes are DSOs too], although Linux does identify shlibs as something special.)
[Bug tree-optimization/99726] [10 Regression] ICE in create_intersect_range_checks_index, at tree-data-ref.c:1855 since r10-4762-gf9d6338bd15ce1fae36bf25d3a0545e9678ddc58
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99726 --- Comment #8 from CVS Commits --- The releases/gcc-10 branch has been updated by Richard Sandiford : https://gcc.gnu.org/g:7e2db68a77fb211898a024c5a7ad7c4449c7e355 commit r10-9749-g7e2db68a77fb211898a024c5a7ad7c4449c7e355 Author: Richard Sandiford Date: Fri Apr 23 10:09:38 2021 +0100 data-ref: Tighten index-based alias checks [PR99726] create_intersect_range_checks_index tries to create a runtime alias check based on index comparisons. It looks through the access functions for the two DRs to find a SCEV for the loop that is being versioned and converts a DR_STEP-based check into an index-based check. However, there isn't any reliable sign information in the types, so the code expects the value of the IV step (when interpreted as signed) to be negative iff the DR_STEP (when interpreted as signed) is negative. r10-4762 added another assert related to this assumption and the assert fired for the testcase in the PR. The sign of the IV step didn't match the sign of the DR_STEP. I think this is actually showing what was previously a wrong-code bug. The signs didn't match because the DRs contained *two* access function SCEVs for the loop being versioned. It doesn't look like the code is set up to deal with this, since it checks each access function independently and treats it as the sole source of DR_STEP. The patch therefore moves the main condition out of the loop. This also has the advantage of not building a tree for one access function only to throw it away if we find an inner function that makes the comparison invalid. gcc/ PR tree-optimization/99726 * tree-data-ref.c (create_intersect_range_checks_index): Bail out if there is more than one access function SCEV for the loop being versioned. gcc/testsuite/ PR tree-optimization/99726 * gcc.target/i386/pr99726.c: New test. (cherry picked from commit b5c7accfb56a7347008f629be4c7344dd849b1b1)
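As a rough illustration of what a runtime alias check buys the vectorizer, here is a hand-written sketch of loop versioning: a cheap overlap test on the two accessed ranges selects between a copy of the loop that may assume no aliasing and the original scalar loop. This is not GCC's create_intersect_range_checks_index; the helper names are hypothetical, and it uses a plain address-range overlap test rather than the index-based comparison the commit tightens.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: true if [a, a + n) and [b, b + n) overlap.
// Compares addresses as integers, as a compiler-generated check would.
inline bool ranges_overlap(const int *a, const int *b, std::size_t n) {
  std::uintptr_t pa = reinterpret_cast<std::uintptr_t>(a);
  std::uintptr_t pb = reinterpret_cast<std::uintptr_t>(b);
  std::uintptr_t bytes = n * sizeof(int);
  return pa < pb + bytes && pb < pa + bytes;
}

// Hypothetical versioned loop computing dst[i] += src[i].
// Returns true if the no-alias version was selected.
bool add_arrays_versioned(int *dst, const int *src, std::size_t n) {
  if (!ranges_overlap(dst, src, n)) {
    // No overlap: iterations are independent, so a compiler would be
    // free to vectorize this copy of the loop.
    for (std::size_t i = 0; i < n; ++i) dst[i] += src[i];
    return true;
  }
  // Overlap: keep the original scalar loop, which preserves the
  // element-by-element ordering the source semantics require.
  for (std::size_t i = 0; i < n; ++i) dst[i] += src[i];
  return false;
}
```

If the check is computed from the wrong step or sign (the situation the commit describes when two access-function SCEVs disagree), the no-alias path can be chosen for overlapping ranges, which is a wrong-code bug rather than just a missed optimization.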
[Bug tree-optimization/98268] [10 Regression] ICE: verify_gimple failed with LTO and SVE
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98268 --- Comment #11 from CVS Commits --- The releases/gcc-10 branch has been updated by Richard Sandiford : https://gcc.gnu.org/g:18a190c3ee32548de3888b7a64f701999893727b commit r10-9750-g18a190c3ee32548de3888b7a64f701999893727b Author: Richard Sandiford Date: Fri Apr 23 10:09:39 2021 +0100 gimple-fold: Recompute ADDR_EXPR flags after folding a TMR [PR98268] The gimple verifier picked up that an ADDR_EXPR of a MEM_REF was not marked TREE_CONSTANT even though the address was in fact invariant. This came from folding a &TARGET_MEM_REF with constant operands to a &MEM_REF; &TARGET_MEM_REF is never treated as TREE_CONSTANT but &MEM_REF can be. gcc/ PR tree-optimization/98268 * gimple-fold.c (maybe_canonicalize_mem_ref_addr): Call recompute_tree_invariant_for_addr_expr after successfully folding a TARGET_MEM_REF that occurs inside an ADDR_EXPR. gcc/testsuite/ PR tree-optimization/98268 * gcc.target/aarch64/sve/pr98268-1.c: New test. * gcc.target/aarch64/sve/pr98268-2.c: Likewise. (cherry picked from commit c778968339afd140380a46edbade054667c7dce2)
[Bug tree-optimization/98726] [10/11 Regression] SVE: tree check: expected integer_cst, have poly_int_cst in to_wide, at tree.h:5984
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98726 --- Comment #13 from CVS Commits --- The releases/gcc-10 branch has been updated by Richard Sandiford : https://gcc.gnu.org/g:8849e4a94550ffc9a564c105f0cefed5f42b3a7d commit r10-9752-g8849e4a94550ffc9a564c105f0cefed5f42b3a7d Author: Richard Biener Date: Fri Apr 23 10:09:40 2021 +0100 middle-end/98726 - fix VECTOR_CST element access This fixes VECTOR_CST element access with POLY_INT elements and allows to produce dump files of the PR98726 testcase without ICEing. 2021-04-23 Richard Biener PR middle-end/98726 * tree.h (vector_cst_int_elt): Remove. * tree.c (vector_cst_int_elt): Use poly_wide_int for computations, make static. (cherry picked from commit 4b59dbb5d6759e43bfa23161a8d3feb9ae969e1a)
[Bug target/98136] [8/9/10 Regression] [aarch64] Internal compiler error with large classes and virtual methods since r8-5967-gf5470a77425a54efebfe1732488c40f05ef176d0
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98136 --- Comment #6 from CVS Commits --- The releases/gcc-10 branch has been updated by Richard Sandiford : https://gcc.gnu.org/g:63da018de828b4792e95d1431118fd10efef87d1 commit r10-9751-g63da018de828b4792e95d1431118fd10efef87d1 Author: Richard Sandiford Date: Fri Apr 23 10:09:40 2021 +0100 aarch64: Tweak post-RA handling of CONST_INT moves [PR98136] This PR is a regression caused by r8-5967, where we replaced a call to aarch64_internal_mov_immediate in aarch64_add_offset with a call to aarch64_force_temporary, which in turn uses the normal emit_move_insn{,_1} routines. The problem is that aarch64_add_offset can be called while outputting a thunk, where we require all instructions to be valid without splitting. However, the move expanders were not splitting CONST_INT moves themselves. I think the right fix is to make the move expanders work even in this scenario, rather than require callers to handle it as a special case. gcc/ PR target/98136 * config/aarch64/aarch64.md (mov): Pass multi-instruction CONST_INTs to aarch64_expand_mov_immediate when called after RA. gcc/testsuite/ PR target/98136 * g++.dg/pr98136.C: New test. (cherry picked from commit 48c79f054bf435051c95ee093c45a0f8c9de5b4e)
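For readers wondering why a CONST_INT move can need several instructions: on AArch64 a general 64-bit constant is commonly built 16 bits at a time, with one MOVZ followed by a MOVK for each remaining non-zero chunk. The sketch below produces that naive sequence. It is a simplification under stated assumptions (real code such as aarch64_internal_mov_immediate also considers MOVN and bitmask-immediate forms), and the function name is hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical: naive MOVZ/MOVK sequence for a 64-bit immediate in x0.
// Ignores MOVN and logical (bitmask) immediates, which often shorten it.
std::vector<std::string> naive_mov_sequence(std::uint64_t value) {
  std::vector<std::string> insns;
  for (int shift = 0; shift < 64; shift += 16) {
    std::uint64_t chunk = (value >> shift) & 0xffff;
    if (chunk == 0) continue;  // MOVZ already zeroes the other chunks
    const char *op = insns.empty() ? "movz" : "movk";
    insns.push_back(std::string(op) + " x0, #" + std::to_string(chunk) +
                    ", lsl #" + std::to_string(shift));
  }
  if (insns.empty()) insns.push_back("movz x0, #0, lsl #0");  // value == 0
  return insns;
}
```

Any constant with more than one non-zero 16-bit chunk yields a multi-instruction sequence, which is the kind of move the patch now routes through aarch64_expand_mov_immediate even after register allocation.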
[Bug tree-optimization/98726] [10/11 Regression] SVE: tree check: expected integer_cst, have poly_int_cst in to_wide, at tree.h:5984
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98726 --- Comment #14 from CVS Commits --- The releases/gcc-10 branch has been updated by Richard Sandiford : https://gcc.gnu.org/g:dc9233a4f65a67ca280903d60d57c5fd5d95303e commit r10-9753-gdc9233a4f65a67ca280903d60d57c5fd5d95303e Author: Richard Sandiford Date: Fri Apr 23 10:09:41 2021 +0100 Handle CONST_POLY_INTs in CONST_VECTORs [PR97141, PR98726] This PR is caused by POLY_INT_CSTs being (necessarily) valid in tree-level VECTOR_CSTs but CONST_POLY_INTs not being valid in RTL CONST_VECTORs. I can't tell/remember how deliberate that was, but I'm guessing not very. In particular, valid_for_const_vector_p was added to guard against symbolic constants rather than CONST_POLY_INTs. I did briefly consider whether we should maintain the current status anyway. However, that would then require a way of constructing variable-length vectors from individual elements if, say, we have: { [2, 2], [3, 2], [4, 2], … } So I'm chalking this up to an oversight. I think the intention (and certainly the natural thing) is to have the same rules for both trees and RTL. The SVE CONST_VECTOR code should already be set up to handle CONST_POLY_INTs. However, we need to add support for Advanced SIMD CONST_VECTORs that happen to contain SVE-based values. The patch does that by expanding such CONST_VECTORs in the same way as variable vectors. gcc/ PR rtl-optimization/97141 PR rtl-optimization/98726 * emit-rtl.c (valid_for_const_vector_p): Return true for CONST_POLY_INT_P. * rtx-vector-builder.h (rtx_vector_builder::step): Return a poly_wide_int instead of a wide_int. (rtx_vector_builder::apply_set): Take a poly_wide_int instead of a wide_int. * rtx-vector-builder.c (rtx_vector_builder::apply_set): Likewise. * config/aarch64/aarch64.c (aarch64_legitimate_constant_p): Return false for CONST_VECTORs that cannot be forced to memory. 
* config/aarch64/aarch64-simd.md (mov): If a CONST_VECTOR is too complex to force to memory, build it up from individual elements instead. gcc/testsuite/ PR rtl-optimization/97141 PR rtl-optimization/98726 * gcc.c-torture/compile/pr97141.c: New test. * gcc.c-torture/compile/pr98726.c: Likewise. * gcc.target/aarch64/sve/pr97141.c: Likewise. * gcc.target/aarch64/sve/pr98726.c: Likewise. (cherry picked from commit 1b5f74e8be4dd7abe5624ff60adceff19ca71bda)
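The bracketed elements such as [2, 2] are GCC's notation for polynomial (poly_int) values c0 + c1·x, where x is a quantity known only at run time and tied to the SVE vector length. A minimal sketch of that idea, assuming a single indeterminate; the struct and function names are hypothetical and far simpler than GCC's poly_int template:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical two-coefficient polynomial c0 + c1 * x, where x is only
// known at run time (for SVE, a count derived from the vector length).
struct PolyInt {
  std::int64_t c0;
  std::int64_t c1;

  PolyInt operator+(const PolyInt &o) const { return {c0 + o.c0, c1 + o.c1}; }
  bool operator==(const PolyInt &o) const { return c0 == o.c0 && c1 == o.c1; }

  // Evaluate once the runtime value of x is known.
  std::int64_t eval(std::int64_t x) const { return c0 + c1 * x; }
};

// Element i of the series { [2, 2], [3, 2], [4, 2], ... } from the commit:
// the constant part steps by 1 while the variable part stays fixed.
PolyInt series_elt(std::int64_t i) { return PolyInt{2 + i, 2}; }
```

Because such elements have no single compile-time integer value, a vector of them cannot be laid out as ordinary constant data, which is consistent with the patch building such vectors from individual elements when they cannot be forced to memory.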
[Bug target/97141] [10 Regression] aarch64, SVE: ICE in decompose, at rtl.h (during expand) since r10-4676-g9c437a108a
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97141 --- Comment #8 from CVS Commits --- The releases/gcc-10 branch has been updated by Richard Sandiford : https://gcc.gnu.org/g:dc9233a4f65a67ca280903d60d57c5fd5d95303e commit r10-9753-gdc9233a4f65a67ca280903d60d57c5fd5d95303e Author: Richard Sandiford Date: Fri Apr 23 10:09:41 2021 +0100 Handle CONST_POLY_INTs in CONST_VECTORs [PR97141, PR98726] This PR is caused by POLY_INT_CSTs being (necessarily) valid in tree-level VECTOR_CSTs but CONST_POLY_INTs not being valid in RTL CONST_VECTORs. I can't tell/remember how deliberate that was, but I'm guessing not very. In particular, valid_for_const_vector_p was added to guard against symbolic constants rather than CONST_POLY_INTs. I did briefly consider whether we should maintain the current status anyway. However, that would then require a way of constructing variable-length vectors from individual elements if, say, we have: { [2, 2], [3, 2], [4, 2], … } So I'm chalking this up to an oversight. I think the intention (and certainly the natural thing) is to have the same rules for both trees and RTL. The SVE CONST_VECTOR code should already be set up to handle CONST_POLY_INTs. However, we need to add support for Advanced SIMD CONST_VECTORs that happen to contain SVE-based values. The patch does that by expanding such CONST_VECTORs in the same way as variable vectors. gcc/ PR rtl-optimization/97141 PR rtl-optimization/98726 * emit-rtl.c (valid_for_const_vector_p): Return true for CONST_POLY_INT_P. * rtx-vector-builder.h (rtx_vector_builder::step): Return a poly_wide_int instead of a wide_int. (rtx_vector_builder::apply_set): Take a poly_wide_int instead of a wide_int. * rtx-vector-builder.c (rtx_vector_builder::apply_set): Likewise. * config/aarch64/aarch64.c (aarch64_legitimate_constant_p): Return false for CONST_VECTORs that cannot be forced to memory. 
* config/aarch64/aarch64-simd.md (mov): If a CONST_VECTOR is too complex to force to memory, build it up from individual elements instead. gcc/testsuite/ PR rtl-optimization/97141 PR rtl-optimization/98726 * gcc.c-torture/compile/pr97141.c: New test. * gcc.c-torture/compile/pr98726.c: Likewise. * gcc.target/aarch64/sve/pr97141.c: Likewise. * gcc.target/aarch64/sve/pr98726.c: Likewise. (cherry picked from commit 1b5f74e8be4dd7abe5624ff60adceff19ca71bda)
[Bug target/99249] [8/9/10 Backport] SVE: ICE in aarch64_expand_sve_const_vector (during RTL pass: early_remat)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99249 --- Comment #5 from CVS Commits --- The releases/gcc-10 branch has been updated by Richard Sandiford : https://gcc.gnu.org/g:690aa217cf2882e58a0572171a3dd8e346f616cf commit r10-9754-g690aa217cf2882e58a0572171a3dd8e346f616cf Author: Richard Sandiford Date: Fri Apr 23 10:09:42 2021 +0100 aarch64: Handle more SVE vector constants [PR99246] PR99246 is about a case in which we failed to handle a CONST_VECTOR with NELTS_PER_PATTERN==2, i.e. a vector with a "foreground" sequence of N vectors followed by a repeating "background" sequence of N vectors. At the moment, it's difficult to produce these vectors directly, but I'm hoping that for GCC 12 we'll do more folding, which will in turn make this easier to test and easier to optimise. Until then, the patch simply relies on the testcase in the PR. gcc/ PR target/99249 * config/aarch64/aarch64.c (aarch64_expand_sve_const_vector_sel): New function. (aarch64_expand_sve_const_vector): Use it for nelts_per_pattern==2. gcc/testsuite/ PR target/99249 * gcc.target/aarch64/sve/acle/general/pr99246.c: New test. (cherry picked from commit a065e0bb092a010664777394530ab1a52bb5293b)
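A rough sketch of how such an encoded constant expands, assuming the interleaved-pattern encoding used for GCC's VECTOR_CST and CONST_VECTOR: the vector is described by npatterns interleaved patterns, each with nelts_per_pattern encoded elements. With nelts_per_pattern == 2, each pattern contributes one leading ("foreground") element and then repeats its second element for the rest of the vector. The function below is a hypothetical illustration that handles only nelts_per_pattern 1 and 2 (the stepped case, 3, is left out).

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical expansion of an encoded constant vector. `encoded` holds
// npatterns * nelts_per_pattern values stored pattern-interleaved: the
// first element of every pattern, then the second element of every pattern.
std::vector<int> expand_encoded(const std::vector<int> &encoded,
                                std::size_t npatterns,
                                std::size_t nelts_per_pattern,
                                std::size_t full_length) {
  if (nelts_per_pattern < 1 || nelts_per_pattern > 2)
    throw std::invalid_argument("only 1 or 2 elements per pattern handled");
  if (npatterns == 0 || encoded.size() != npatterns * nelts_per_pattern)
    throw std::invalid_argument("encoding size mismatch");
  std::vector<int> out(full_length);
  for (std::size_t i = 0; i < full_length; ++i) {
    std::size_t pattern = i % npatterns;  // which interleaved pattern
    std::size_t pos = i / npatterns;      // position within that pattern
    // Positions past the last encoded element repeat that element.
    if (pos >= nelts_per_pattern) pos = nelts_per_pattern - 1;
    out[i] = encoded[pos * npatterns + pattern];
  }
  return out;
}
```

For example, encoding {1, 2, 9, 8} with two patterns of two elements expands to a foreground {1, 2} followed by the background {9, 8} repeated for the remaining lanes.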
[Bug tree-optimization/99726] [10 Regression] ICE in create_intersect_range_checks_index, at tree-data-ref.c:1855 since r10-4762-gf9d6338bd15ce1fae36bf25d3a0545e9678ddc58
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99726 rsandifo at gcc dot gnu.org changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|--- |FIXED --- Comment #9 from rsandifo at gcc dot gnu.org --- Fixed.
[Bug tree-optimization/98268] [10 Regression] ICE: verify_gimple failed with LTO and SVE
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98268 rsandifo at gcc dot gnu.org changed: What|Removed |Added Resolution|--- |FIXED Status|ASSIGNED|RESOLVED --- Comment #12 from rsandifo at gcc dot gnu.org --- Fixed.
[Bug fortran/100227] [8/9/10/11/12 Regression] write with implicit loop
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100227 Richard Biener changed: What|Removed |Added Summary|write with implicit loop|[8/9/10/11/12 Regression] ||write with implicit loop Keywords||wrong-code Target Milestone|--- |8.5 Priority|P3 |P4
[Bug tree-optimization/100222] Redundant mark_irreducible_loops () in predicate.c
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100222 Richard Biener changed: What|Removed |Added Assignee|unassigned at gcc dot gnu.org |rguenth at gcc dot gnu.org Ever confirmed|0 |1 Last reconfirmed||2021-04-23 Status|UNCONFIRMED |ASSIGNED Version|tree-ssa|12.0 --- Comment #1 from Richard Biener --- Mine.
[Bug target/97141] [10 Regression] aarch64, SVE: ICE in decompose, at rtl.h (during expand) since r10-4676-g9c437a108a
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97141 rsandifo at gcc dot gnu.org changed: What|Removed |Added Resolution|--- |FIXED Status|ASSIGNED|RESOLVED --- Comment #9 from rsandifo at gcc dot gnu.org --- Fixed.
[Bug target/100182] [8/9/10/11/12 Regression] Miscompilation of atomic_float/1.cc and atomic_float/wait_notify.cc on i686
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100182 Uroš Bizjak changed: What|Removed |Added Attachment #50649|0 |1 is obsolete|| --- Comment #21 from Uroš Bizjak --- Created attachment 50659 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50659&action=edit Proposed patch Here is the complete proposed patch. We can retain problematic atomic store followed by a DFmode load peepholes as long as we have a load to the SSE register. Load to the SSE register uses movlps/movq moves that preserve all bits, so we are sure the store to a memory location is unchanged from the original. However, "load to the SSE register" requirement makes the peephole ineffective for -mfpmath=387, so XFAILs are added to affected testcases.
[Bug tree-optimization/98069] [8/9/10 Regression] Miscompilation with -O3 since r8-2380-g2d7744d4ef93bfff
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98069 rsandifo at gcc dot gnu.org changed: What|Removed |Added Target Milestone|8.5 |10.4 --- Comment #7 from rsandifo at gcc dot gnu.org --- As discussed on irc, the fix was quite invasive, so it seems a bit dangerous to backport further than GCC 10. Will backport to GCC 10 in the GCC 11.2 timeframe, once we've had more chance to see if there's any fallout.
[Bug tree-optimization/97960] [8/9/10 Regression] Wrong code at -O3 since r8-6511-g3ae129323d
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97960 rsandifo at gcc dot gnu.org changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|--- |DUPLICATE --- Comment #12 from rsandifo at gcc dot gnu.org --- Tracking backports in PR98069 *** This bug has been marked as a duplicate of bug 98069 ***
[Bug tree-optimization/98069] [8/9/10 Regression] Miscompilation with -O3 since r8-2380-g2d7744d4ef93bfff
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98069 rsandifo at gcc dot gnu.org changed: What|Removed |Added CC||acoplan at gcc dot gnu.org --- Comment #8 from rsandifo at gcc dot gnu.org --- *** Bug 97960 has been marked as a duplicate of this bug. ***
[Bug tree-optimization/95396] [8/9/10 Regression] GCC produces incorrect code with -O3 for loops since r8-6511-g3ae129323d150621
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95396 rsandifo at gcc dot gnu.org changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|--- |DUPLICATE --- Comment #14 from rsandifo at gcc dot gnu.org --- Tracking backports in PR98069 *** This bug has been marked as a duplicate of bug 98069 ***
[Bug tree-optimization/98069] [8/9/10 Regression] Miscompilation with -O3 since r8-2380-g2d7744d4ef93bfff
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98069 --- Comment #9 from rsandifo at gcc dot gnu.org --- *** Bug 95396 has been marked as a duplicate of this bug. ***
[Bug target/100229] New: arm: UB in arm_block_set_aligned_non_vect (shift exponent 32 is too large for 32-bit type)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100229 Bug ID: 100229 Summary: arm: UB in arm_block_set_aligned_non_vect (shift exponent 32 is too large for 32-bit type) Product: gcc Version: 11.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: acoplan at gcc dot gnu.org Target Milestone: --- $ cat test.c int a[1]; __attribute((always_inline)) void g(void *c, int d, int e) { __builtin_memset(c, d, e); } void f() { g(a, 0, 0); } $ gcc/xgcc -B gcc -c test.c -ftree-ter test.c:2:35: warning: ‘always_inline’ function might not be inlinable [-Wattributes] 2 | __attribute((always_inline)) void g(void *c, int d, int e) { | ^ /data_sdb/toolchain/src/gcc/gcc/config/arm/arm.c:32358:22: runtime error: shift exponent 32 is too large for 32-bit type 'unsigned int' #0 0x2427a05 in arm_block_set_aligned_non_vect(rtx_def*, unsigned long, unsigned long, unsigned long) (/data_sdb/toolchain/cc1s/ubsan-arm/gcc/cc1+0x2427a05) #1 0x2428ce6 in arm_gen_setmem(rtx_def**) (/data_sdb/toolchain/cc1s/ubsan-arm/gcc/cc1+0x2428ce6) #2 0x2c32189 in gen_setmemsi(rtx_def*, rtx_def*, rtx_def*, rtx_def*) (/data_sdb/toolchain/cc1s/ubsan-arm/gcc/cc1+0x2c32189) #3 0x16ddcf1 in rtx_insn* insn_gen_fn::operator()(rtx_def*, rtx_def*, rtx_def*, rtx_def*) const (/data_sdb/toolchain/cc1s/ubsan-arm/gcc/cc1+0x16ddcf1) #4 0x16dcd35 in maybe_gen_insn(insn_code, unsigned int, expand_operand*) (/data_sdb/toolchain/cc1s/ubsan-arm/gcc/cc1+0x16dcd35) #5 0x16dd513 in maybe_expand_insn(insn_code, unsigned int, expand_operand*) (/data_sdb/toolchain/cc1s/ubsan-arm/gcc/cc1+0x16dd513) #6 0x10561d0 in set_storage_via_setmem(rtx_def*, rtx_def*, rtx_def*, unsigned int, unsigned int, long, unsigned long, unsigned long, unsigned long) (/data_sdb/toolchain/cc1s/ubsan-arm/gcc/cc1+0x10561d0) #7 0xc7df36 in expand_builtin_memset_args(tree_node*, tree_node*, tree_node*, rtx_def*, machine_mode, tree_node*) (/data_sdb/toolchain/cc1s/ubsan-arm/gcc/cc1+0xc7df36) #8 0xc7db6f 
in expand_builtin_memset(tree_node*, rtx_def*, machine_mode) (/data_sdb/toolchain/cc1s/ubsan-arm/gcc/cc1+0xc7db6f) #9 0xc8ccc8 in expand_builtin(tree_node*, rtx_def*, rtx_def*, machine_mode, int) (/data_sdb/toolchain/cc1s/ubsan-arm/gcc/cc1+0xc8ccc8) #10 0x108ff58 in expand_expr_real_1(tree_node*, rtx_def*, machine_mode, expand_modifier, rtx_def**, bool) (/data_sdb/toolchain/cc1s/ubsan-arm/gcc/cc1+0x108ff58) #11 0x1079c93 in expand_expr_real(tree_node*, rtx_def*, machine_mode, expand_modifier, rtx_def**, bool) (/data_sdb/toolchain/cc1s/ubsan-arm/gcc/cc1+0x1079c93) #12 0xd0c350 in expand_expr(tree_node*, rtx_def*, machine_mode, expand_modifier) (/data_sdb/toolchain/cc1s/ubsan-arm/gcc/cc1+0xd0c350) #13 0xd1b56d in expand_call_stmt(gcall*) (/data_sdb/toolchain/cc1s/ubsan-arm/gcc/cc1+0xd1b56d) #14 0xd211e3 in expand_gimple_stmt_1(gimple*) (/data_sdb/toolchain/cc1s/ubsan-arm/gcc/cc1+0xd211e3) #15 0xd21b56 in expand_gimple_stmt(gimple*) (/data_sdb/toolchain/cc1s/ubsan-arm/gcc/cc1+0xd21b56) #16 0xd32062 in expand_gimple_basic_block(basic_block_def*, bool) (/data_sdb/toolchain/cc1s/ubsan-arm/gcc/cc1+0xd32062) #17 0xd35db8 in (anonymous namespace)::pass_expand::execute(function*) (/data_sdb/toolchain/cc1s/ubsan-arm/gcc/cc1+0xd35db8) #18 0x17d2355 in execute_one_pass(opt_pass*) (/data_sdb/toolchain/cc1s/ubsan-arm/gcc/cc1+0x17d2355) #19 0x17d2b6e in execute_pass_list_1(opt_pass*) (/data_sdb/toolchain/cc1s/ubsan-arm/gcc/cc1+0x17d2b6e) #20 0x17d2c65 in execute_pass_list(function*, opt_pass*) (/data_sdb/toolchain/cc1s/ubsan-arm/gcc/cc1+0x17d2c65) #21 0xdf27ac in cgraph_node::expand() (/data_sdb/toolchain/cc1s/ubsan-arm/gcc/cc1+0xdf27ac) #22 0xdf3a23 in cgraph_order_sort::process() (/data_sdb/toolchain/cc1s/ubsan-arm/gcc/cc1+0xdf3a23) #23 0xdf4135 in output_in_order() (/data_sdb/toolchain/cc1s/ubsan-arm/gcc/cc1+0xdf4135) #24 0xdf4e01 in symbol_table::compile() (/data_sdb/toolchain/cc1s/ubsan-arm/gcc/cc1+0xdf4e01) #25 0xdf55ea in symbol_table::finalize_compilation_unit() 
(/data_sdb/toolchain/cc1s/ubsan-arm/gcc/cc1+0xdf55ea) #26 0x1ac3baa in compile_file() (/data_sdb/toolchain/cc1s/ubsan-arm/gcc/cc1+0x1ac3baa) #27 0x1ac8a15 in do_compile() (/data_sdb/toolchain/cc1s/ubsan-arm/gcc/cc1+0x1ac8a15) #28 0x1ac8f10 in toplev::main(int, char**) (/data_sdb/toolchain/cc1s/ubsan-arm/gcc/cc1+0x1ac8f10) #29 0x36a5ee7 in main (/data_sdb/toolchain/cc1s/ubsan-arm/gcc/cc1+0x36a5ee7) #30 0x75ca1bf6 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x21bf6) #31 0x980249 in _start (/data_sdb/toolchain/cc1s/ubsan-arm/gcc/cc1+0x980249)
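Shifting a 32-bit unsigned value by 32 or more is undefined behaviour in C and C++, which is what UBSan reports here. A common defensive pattern when building a low-bits mask is to special-case the full-width shift. This is a generic sketch with a hypothetical helper name, not the fix applied to arm_block_set_aligned_non_vect:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: mask with the low `bits` bits set (bits >= 32 gives
// all ones). `1u << 32` would be undefined, so that case is handled apart.
constexpr std::uint32_t low_bits_mask(unsigned bits) {
  return bits >= 32 ? UINT32_C(0xffffffff)
                    : (UINT32_C(1) << bits) - 1;
}
```

Note that on many hosts `1u << 32` happens to evaluate to 1 (the shift count is taken modulo 32), so an unguarded version can silently produce a wrong mask rather than crash.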
[Bug target/100214] UB in arm.c:optimal_immediate_sequence_1 (left shift of 255 by 30 places cannot be represented in type 'int')
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100214 Richard Earnshaw changed: What|Removed |Added Last reconfirmed||2021-04-23 Status|UNCONFIRMED |NEW Ever confirmed|0 |1 --- Comment #1 from Richard Earnshaw --- Confirmed by visual inspection of source. There look to be a number of signed/unsigned confusions in this function.
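The report concerns `255 << 30` evaluated on a signed int. An A32 data-processing immediate is an 8-bit value rotated right by an even amount, so code that enumerates such constants naturally moves 0xff into the top bits; doing that arithmetic on an unsigned 32-bit type avoids the undefined behaviour. A hypothetical sketch of the encodability test written that way (not the code in arm.c):

```cpp
#include <cassert>
#include <cstdint>

// Rotate left on uint32_t; `& 31` keeps the right shift below 32 when r == 0.
constexpr std::uint32_t rotl32(std::uint32_t v, unsigned r) {
  return (v << r) | (v >> ((32 - r) & 31));
}

// Hypothetical check: can `v` be written as an 8-bit value rotated right by
// an even amount? If v == ror(imm8, r) then rotl(v, r) == imm8, so try every
// even rotation.
bool is_a32_immediate(std::uint32_t v) {
  for (unsigned r = 0; r < 32; r += 2)
    if (rotl32(v, r) <= 0xffu) return true;
  return false;
}
```

For instance, 0xc000003f is 0xff rotated right by 2 and is accepted, while 0x102 would need an odd rotation and is rejected.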
[Bug c++/98297] [8/9/10/11 Regression] ICE in cp_parser_elaborated_type_specifier, at cp/parser.c:19653
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98297 Jakub Jelinek changed: What|Removed |Added CC||jakub at gcc dot gnu.org --- Comment #5 from Jakub Jelinek --- Note, the testcase FAILs on the 8 branch, the emitted error is different. $ gcc-9/obj28/gcc/cc1plus -quiet -std=c++11 /tmp/pr98297.C -o /tmp/pr98297.s /tmp/pr98297.C:5:1: warning: ‘b’ attribute directive ignored [-Wattributes] 5 | a ; // { dg-error "does not declare anything" } | ^~~ /tmp/pr98297.C:5:1: error: declaration does not declare anything [-fpermissive] $ gcc-8/obj32/gcc/cc1plus -quiet -std=c++11 /tmp/pr98297.C -o /tmp/pr98297.s /tmp/pr98297.C:5:1: warning: ‘b’ attribute directive ignored [-Wattributes] a ; // { dg-error "does not declare anything" } ^~~ /tmp/pr98297.C:5:1: error: name of class shadows template template parameter ‘a’ $ gcc-8/obj30/gcc/cc1plus -quiet -std=c++11 /tmp/pr98297.C -o /tmp/pr98297.s /tmp/pr98297.C:5:1: internal compiler error: Segmentation fault a ; // { dg-error "does not declare anything" } ^~~ gcc-8/obj30 is 5 months old snapshot which expectedly ICEs, but the middle error is different from what the test expects.
[Bug target/100216] arm: UB in arm_canonicalize_comparison (shift exponent 127 is too large for 64-bit type)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100216 Richard Earnshaw changed: What|Removed |Added Ever confirmed|0 |1 Last reconfirmed||2021-04-23 Status|UNCONFIRMED |NEW --- Comment #2 from Richard Earnshaw --- Confirmed by visual inspection. Clearly this code was written at a time when the largest integral mode on Arm was DImode. It won't work for wider modes and it won't do anything for non-integral modes. Needs an overhaul.
[Bug rtl-optimization/100230] New: ASan: alloc-dealloc-mismatch in early-remat.c
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100230 Bug ID: 100230 Summary: ASan: alloc-dealloc-mismatch in early-remat.c Product: gcc Version: 11.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: rtl-optimization Assignee: unassigned at gcc dot gnu.org Reporter: acoplan at gcc dot gnu.org Target Milestone: --- Bootstrapping on aarch64 --with-build-config=bootstrap-asan and running the testsuite shows the following issue: $ cat test.c int a, b; void c() { while (b) a += b++; } $ gcc/xgcc -B gcc -c test.c -march=armv8.2-a+sve -O2 -ftree-vectorize = ==22323==ERROR: AddressSanitizer: alloc-dealloc-mismatch (operator new [] vs operator delete) on 0x92f0d900 #0 0x75ed5c in operator delete(void*, unsigned long) /home/alecop01/toolchain/src/gcc/libsanitizer/asan/asan_new_delete.cpp:172 #1 0x33b033c in sort_candidates /home/alecop01/toolchain/src/gcc/gcc/early-remat.c:1062 #2 0x33b033c in run /home/alecop01/toolchain/src/gcc/gcc/early-remat.c:2567 #3 0x33b033c in execute /home/alecop01/toolchain/src/gcc/gcc/early-remat.c:2629 #4 0x151ebd4 in execute_one_pass(opt_pass*) /home/alecop01/toolchain/src/gcc/gcc/passes.c:2567 #5 0x15201a0 in execute_pass_list_1 /home/alecop01/toolchain/src/gcc/gcc/passes.c:2656 #6 0x15201c4 in execute_pass_list_1 /home/alecop01/toolchain/src/gcc/gcc/passes.c:2657 #7 0x1520270 in execute_pass_list(function*, opt_pass*) /home/alecop01/toolchain/src/gcc/gcc/passes.c:2667 #8 0xbb7c34 in cgraph_node::expand() /home/alecop01/toolchain/src/gcc/gcc/cgraphunit.c:1830 #9 0xbb7c34 in cgraph_node::expand() /home/alecop01/toolchain/src/gcc/gcc/cgraphunit.c:1783 #10 0xbba6d4 in expand_all_functions /home/alecop01/toolchain/src/gcc/gcc/cgraphunit.c:1994 #11 0xbba6d4 in symbol_table::compile() /home/alecop01/toolchain/src/gcc/gcc/cgraphunit.c:2358 #12 0xbc18a8 in symbol_table::compile() /home/alecop01/toolchain/src/gcc/gcc/cgraphunit.c:2271 #13 0xbc18a8 in symbol_table::finalize_compilation_unit() 
/home/alecop01/toolchain/src/gcc/gcc/cgraphunit.c:2539 #14 0x1793f44 in compile_file /home/alecop01/toolchain/src/gcc/gcc/toplev.c:482 #15 0x6d4ffc in do_compile /home/alecop01/toolchain/src/gcc/gcc/toplev.c:2201 #16 0x6d4ffc in toplev::main(int, char**) /home/alecop01/toolchain/src/gcc/gcc/toplev.c:2340 #17 0x6df804 in main /home/alecop01/toolchain/src/gcc/gcc/main.c:39 #18 0x973276dc in __libc_start_main (/lib/aarch64-linux-gnu/libc.so.6+0x206dc) #19 0x6e271c (/data/alecop01/builds/gcc11-bstrap-asan/gcc/cc1+0x6e271c) 0x92f0d900 is located 0 bytes inside of 28-byte region [0x92f0d900,0x92f0d91c) allocated by thread T0 here: #0 0x75e16c in operator new[](unsigned long) /home/alecop01/toolchain/src/gcc/libsanitizer/asan/asan_new_delete.cpp:102 #1 0x33b027c in sort_candidates /home/alecop01/toolchain/src/gcc/gcc/early-remat.c:1056 #2 0x33b027c in run /home/alecop01/toolchain/src/gcc/gcc/early-remat.c:2567 #3 0x33b027c in execute /home/alecop01/toolchain/src/gcc/gcc/early-remat.c:2629 #4 0x151ebd4 in execute_one_pass(opt_pass*) /home/alecop01/toolchain/src/gcc/gcc/passes.c:2567 #5 0x15201a0 in execute_pass_list_1 /home/alecop01/toolchain/src/gcc/gcc/passes.c:2656 #6 0x15201c4 in execute_pass_list_1 /home/alecop01/toolchain/src/gcc/gcc/passes.c:2657 #7 0x1520270 in execute_pass_list(function*, opt_pass*) /home/alecop01/toolchain/src/gcc/gcc/passes.c:2667 #8 0xbb7c34 in cgraph_node::expand() /home/alecop01/toolchain/src/gcc/gcc/cgraphunit.c:1830 #9 0xbb7c34 in cgraph_node::expand() /home/alecop01/toolchain/src/gcc/gcc/cgraphunit.c:1783 #10 0xbba6d4 in expand_all_functions /home/alecop01/toolchain/src/gcc/gcc/cgraphunit.c:1994 #11 0xbba6d4 in symbol_table::compile() /home/alecop01/toolchain/src/gcc/gcc/cgraphunit.c:2358 #12 0xbc18a8 in symbol_table::compile() /home/alecop01/toolchain/src/gcc/gcc/cgraphunit.c:2271 #13 0xbc18a8 in symbol_table::finalize_compilation_unit() /home/alecop01/toolchain/src/gcc/gcc/cgraphunit.c:2539 #14 0x1793f44 in compile_file 
/home/alecop01/toolchain/src/gcc/gcc/toplev.c:482 #15 0x6d4ffc in do_compile /home/alecop01/toolchain/src/gcc/gcc/toplev.c:2201 #16 0x6d4ffc in toplev::main(int, char**) /home/alecop01/toolchain/src/gcc/gcc/toplev.c:2340 #17 0x6df804 in main /home/alecop01/toolchain/src/gcc/gcc/main.c:39 #18 0x973276dc in __libc_start_main (/lib/aarch64-linux-gnu/libc.so.6+0x206dc) #19 0x6e271c (/data/alecop01/builds/gcc11-bstrap-asan/gcc/cc1+0x6e271c) The fix looks obvious.
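The report is the classic mismatch between array new and scalar delete: memory obtained with `new T[n]` must be released with `delete[]`. One way to make the pairing automatic in C++ is `std::unique_ptr<T[]>`, whose default deleter calls `delete[]`. A generic sketch (the function name is hypothetical, and this is not the patch applied to early-remat.c):

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical: sort a copy of some candidate indices in a temporary array
// and return the smallest one, or -1 if there are none.
int smallest_candidate(const int *candidates, std::size_t n) {
  if (n == 0) return -1;
  // unique_ptr<int[]> frees the buffer with delete[], matching new[].
  // Writing `int *p = new int[n]; ... delete p;` instead is exactly the
  // alloc-dealloc mismatch that ASan reports.
  std::unique_ptr<int[]> tmp(new int[n]);
  std::copy(candidates, candidates + n, tmp.get());
  std::sort(tmp.get(), tmp.get() + n);
  return tmp[0];
}
```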
[Bug rtl-optimization/100230] ASan: alloc-dealloc-mismatch in early-remat.c
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100230 Alex Coplan changed: What|Removed |Added Last reconfirmed||2021-04-23 Ever confirmed|0 |1 Assignee|unassigned at gcc dot gnu.org |acoplan at gcc dot gnu.org Status|UNCONFIRMED |ASSIGNED
[Bug rtl-optimization/100230] ASan: alloc-dealloc-mismatch in early-remat.c
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100230 --- Comment #1 from Alex Coplan --- Testing a fix.
[Bug rtl-optimization/100225] [8/9/10/11/12 Regression] ICE in add_cross_iteration_register_deps, at ddg.c:291
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100225 Alexander Monakov changed: What|Removed |Added Blocks|85099 | CC||amonakov at gcc dot gnu.org, ||zhroma at gcc dot gnu.org --- Comment #1 from Alexander Monakov --- Hi Martin, this is a modulo-scheduling bug; I think you added "Blocks: sel-sched" by mistake — removing, and Cc'ing Roman. Referenced Bugs: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85099 [Bug 85099] [meta-bug] selective scheduling issues
[Bug target/99488] dwz: /usr/lib/gcc/mips64el-linux-gnuabi64/11/go1: Found two copies of .debug_line_str section
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99488 --- Comment #12 from YunQiang Su --- This problem disappears if we build gcc 11 with binutils 2.36.
[Bug rtl-optimization/100225] [8/9/10/11/12 Regression] ICE in add_cross_iteration_register_deps, at ddg.c:291
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100225 --- Comment #2 from Martin Liška --- Ah, you are right, sorry.
[Bug fortran/100227] [8/9/10/11/12 Regression] write with implicit loop
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100227 Dominique d'Humieres changed: What|Removed |Added CC||tkoenig at gcc dot gnu.org Known to fail||11.0, 12.0 Status|UNCONFIRMED |NEW Last reconfirmed||2021-04-23 Ever confirmed|0 |1 --- Comment #2 from Dominique d'Humieres --- Workaround: use -fno-frontend-optimize.
[Bug target/100217] [11/12 Regression] ICE when building valgrind testsuite with -march=z14 since r11-7552
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100217 Ilya Leoshkevich changed: What|Removed |Added CC||iii at linux dot ibm.com --- Comment #3 from Ilya Leoshkevich --- The main problem here is that `register long double f0 asm ("f0")` does not make sense on z14 anymore: long doubles are now stored in vector registers, not in floating-point register pairs. If we skip the hard reg, the code will end up having the following semantics:

vr0[0:128] = 1.0L;
asm("/* expect the value in vr0[0:64] . vr2[0:64] */");

and fail at run time. So I think it's better to use the "best effort" approach and force it into a pseudo, even if this means that the user-specified register is not honored:

--- a/gcc/config/s390/s390.c
+++ b/gcc/config/s390/s390.c
@@ -16814,6 +16814,12 @@ s390_md_asm_adjust (vec &outputs, vec &inputs,
          gcc_assert (allows_reg);
          /* Copy input value from a vector register into a FPR pair. */
          rtx fprx2 = gen_reg_rtx (FPRX2mode);
+         if (REG_P (inputs[i]) && HARD_REGISTER_P (inputs[i]))
+           {
+             rtx orig_input = inputs[i];
+             inputs[i] = gen_reg_rtx (TFmode);
+             emit_move_insn (inputs[i], orig_input);
+           }
          emit_insn (gen_tf_to_fprx2 (fprx2, inputs[i]));
          inputs[i] = fprx2;
          input_modes[i] = FPRX2mode;

I need to check whether we can keep the output logic as is. Ideally the code should be adapted to use the __LONG_DOUBLE_VX__ macro like this:

#ifdef __LONG_DOUBLE_VX__
  register long double f0 asm ("v0");
#else
  register long double f0 asm ("f0");
#endif
  f0 = 1.0L;
#ifdef __LONG_DOUBLE_VX__
  asm("" : : "v" (f0));
#else
  asm("" : : "f" (f0));
#endif

Maybe a warning recommending this should be printed.
[Bug target/100217] [11/12 Regression] ICE when building valgrind testsuite with -march=z14 since r11-7552
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100217 --- Comment #4 from Jakub Jelinek --- That seems like quite an undesirable API change. Can't the backend, when it sees long double register vars for the fN registers, change the mode from TFmode to that new FPRX2mode, so that old code keeps working?
[Bug tree-optimization/100222] Redundant mark_irreducible_loops () in predicate.c
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100222 --- Comment #2 from CVS Commits --- The master branch has been updated by Richard Biener : https://gcc.gnu.org/g:500305a92ef85e6b87ad428a35221c62f4037b93 commit r12-82-g500305a92ef85e6b87ad428a35221c62f4037b93 Author: Richard Biener Date: Fri Apr 23 11:16:52 2021 +0200 tree-optimization/100222 - remove redundant mark_irreducible_loops calls loop_optimizer_init (LOOPS_NORMAL) already performs this (quite expensive) marking. 2021-04-23 Richard Biener PR tree-optimization/100222 * predict.c (pass_profile::execute): Remove redundant call to mark_irreducible_loops. (report_predictor_hitrates): Likewise.
[Bug tree-optimization/100222] Redundant mark_irreducible_loops () in predicate.c
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100222 Richard Biener changed: What|Removed |Added Resolution|--- |FIXED Status|ASSIGNED|RESOLVED --- Comment #3 from Richard Biener --- Fixed.
[Bug target/100217] [11/12 Regression] ICE when building valgrind testsuite with -march=z14 since r11-7552
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100217 Jakub Jelinek changed: What|Removed |Added Priority|P3 |P2
[Bug rtl-optimization/100225] [8/9/10/11/12 Regression] ICE in add_cross_iteration_register_deps, at ddg.c:291
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100225 Alex Coplan changed: What|Removed |Added Ever confirmed|0 |1 Status|UNCONFIRMED |NEW CC||acoplan at gcc dot gnu.org Last reconfirmed||2021-04-23 --- Comment #3 from Alex Coplan --- Confirmed.
[Bug target/100152] [10/11/12 Regression] used caller-saved register not preserved across a call.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100152 --- Comment #37 from Richard Biener --- Oh, and FYI a cc1 cross from x86_64 to x86_64-apple-darwin19.6.0 doesn't seem to reproduce the issue with the reduced testcase (I see no call to ___UTF_8_put remaining with -O3 -fPIC -fno-strict-aliasing -fwrapv).
[Bug target/100217] [11/12 Regression] ICE when building valgrind testsuite with -march=z14 since r11-7552
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100217 --- Comment #5 from Ilya Leoshkevich --- That would be an ideal solution, but I wonder how to implement it? Suppose we find a way to convince expand to pick FPRX2mode for such a long double. What if the following comes up?

register long double x asm ("v0"); /* FPRX2mode */
long double y; /* TFmode */
x += y; /* convert? */

Would it be feasible to also teach expand to do the mode conversions? One other alternative might be to detect `register long double asm("fN")` declarations and go back to using floating point register pairs for functions that contain them.
[Bug target/100217] [11/12 Regression] ICE when building valgrind testsuite with -march=z14 since r11-7552
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100217 --- Comment #6 from Jakub Jelinek --- (In reply to Ilya Leoshkevich from comment #5)
> That would be an ideal solution, but I wonder how to implement it? Suppose
> we find a way to convince expand to pick FPRX2mode for such a long double.
> What if the following comes up?
>
> register long double x asm ("v0"); /* FPRX2mode */
> long double y; /* TFmode */
> x += y; /* convert? */
>
> Would it be feasible to also teach expand to do the mode conversions?

It is certainly doable, but perhaps with extra target hooks or something similar. Types have their TYPE_MODE and decls have DECL_MODE, though the question is what breaks if TYPE_MODE != DECL_MODE; at least the comment in tree.h says that they can only differ for FIELD_DECLs. Anyway, in GIMPLE register vars are non-SSA, so apart from inline asm one needs separate loads and stores to them, so if we could expand those as having an FPRX2 hard reg, with loads from it converting to TFmode and stores into it converting from TFmode, ...

> One other alternative might be to detect `register long double asm("fN")`
> declarations and go back to using floating point register pairs for
> functions that contain them.

But this might actually be the best short-term solution (for GCC 11.x).
[Bug target/100217] [11/12 Regression] ICE when building valgrind testsuite with -march=z14 since r11-7552
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100217 --- Comment #7 from Jakub Jelinek --- That said, I'm afraid I don't really understand what goes wrong with the patch I've attached. Trying something like:

long double
foo (void)
{
  register long double f0 asm ("f0");
  f0 = 1.0L;
  f0 += 127.L;
  f0 *= 32.L;
  return f0;
}

with -O0 -march=z14 -mlong-double-128 (-O0 so that it is not all folded immediately) shows that in the end the computations are done in vector registers. Another thing to try is to intermix that with inline asm expecting those in "+f", so that intermediate results are pushed to the floating-point register pair.
[Bug target/99748] MVE: Wrong code at -O0 with float to integer conversion
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99748 --- Comment #6 from CVS Commits --- The releases/gcc-10 branch has been updated by Alex Coplan : https://gcc.gnu.org/g:283367662c25057fd7c9c98257cca858f85b75fc

commit r10-9755-g283367662c25057fd7c9c98257cca858f85b75fc
Author: Alex Coplan
Date: Tue Apr 6 09:06:27 2021 +0100

arm: Fix PCS for SFmode -> SImode libcalls [PR99748]

This patch fixes PR99748 which shows us trying to pass the argument to __aeabi_f2iz in the VFP register s0 when the library function is expecting to use the GPR r0. It also fixes the __aeabi_f2uiz case which was broken in the same way.

For the testcase in the PR, here is the code we generate before the patch (with -mfloat-abi=hard -march=armv8.1-m.main+mve -O0):

main:
        push    {r7, lr}
        sub     sp, sp, #8
        add     r7, sp, #0
        mov     r3, #1065353216
        str     r3, [r7, #4]    @ float
        vldr.32 s0, [r7, #4]
        bl      __aeabi_f2iz
        mov     r3, r0
        cmp     r3, #1
        [...]

This becomes:

main:
        push    {r7, lr}
        sub     sp, sp, #8
        add     r7, sp, #0
        mov     r3, #1065353216
        str     r3, [r7, #4]    @ float
        ldr     r0, [r7, #4]    @ float
        bl      __aeabi_f2iz
        mov     r3, r0
        cmp     r3, #1
        [...]

after the patch. We see a similar change for the same testcase with a cast to unsigned instead of int.

gcc/ChangeLog:

        PR target/99748
        * config/arm/arm.c (arm_libcall_uses_aapcs_base): Also use base PCS for [su]fix_optab.

(cherry picked from commit 16ea7f57891d3fe885ee55b2917208695e184714)
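For illustration, here is a small runtime check in the spirit of the PR. It is a sketch and not the PR's actual testcase. It exercises both conversions the commit mentions: float to int (`__aeabi_f2iz`) and float to unsigned (`__aeabi_f2uiz`). The function names are hypothetical. On a typical host the casts become native instructions. Only a build for the affected configuration (for example `-mfloat-abi=hard -march=armv8.1-m.main+mve -O0`, as in the commit message) routes them through the AEABI helpers, where passing the argument in s0 instead of r0 would likely produce a wrong result.

```c
#include <assert.h>

/* float -> int: per the commit message, on the affected Arm
   configuration this is lowered to a call to __aeabi_f2iz.  */
int float_to_int (float f)
{
  return (int) f;            /* C truncates toward zero */
}

/* float -> unsigned: the __aeabi_f2uiz case fixed by the same patch.  */
unsigned float_to_uint (float f)
{
  return (unsigned) f;
}

/* Regression-test style check: if the argument is passed in s0 while
   the helper reads r0, the helper converts whatever r0 holds, so these
   assertions would most likely fire.  */
void check_float_conversions (void)
{
  assert (float_to_int (1.0f) == 1);
  assert (float_to_uint (1.0f) == 1u);
  assert (float_to_int (-2.75f) == -2);
  assert (float_to_uint (3.9f) == 3u);
}
```

The values are passed through function parameters so that the compiler cannot fold the conversions away, even at higher optimization levels.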
[Bug target/99748] MVE: Wrong code at -O0 with float to integer conversion
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99748 Alex Coplan changed: What|Removed |Added Resolution|--- |FIXED Status|ASSIGNED|RESOLVED --- Comment #7 from Alex Coplan --- Fixed for 10.4, so fixed everywhere.
[Bug c++/98297] [8/9/10/11 Regression] ICE in cp_parser_elaborated_type_specifier, at cp/parser.c:19653
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98297 --- Comment #6 from Jakub Jelinek --- Ah, tracked already in PR98358.
[Bug c/69558] [8 Regression] glib2 warning pragmas stopped working
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69558 Jakub Jelinek changed: What|Removed |Added Priority|P1 |P2 --- Comment #31 from Jakub Jelinek --- A 5-year-old bug can't be P1.
[Bug target/100217] [11/12 Regression] ICE when building valgrind testsuite with -march=z14 since r11-7552
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100217 --- Comment #8 from Ilya Leoshkevich --- Yeah, inline asm seems to be problematic:

/home/iii/gcc/build/gcc/xgcc -B/home/iii/gcc/build/gcc/ /home/iii/gcc/gcc/testsuite/gcc.target/s390/vector/long-double-asm-hardreg.c -fdiagnostics-plain-output -O2 -march=z14 -mzarch -S -o long-double-asm-hardreg.s

with the patch from comment 2 produces:

foo:
.LFB0:
        .cfi_startproc
        larl    %r5,.L4
        vl      %v0,.L5-.L4(%r5),3
#APP
# 10 "/home/iii/gcc/gcc/testsuite/gcc.target/s390/vector/long-double-asm-hardreg.c" 1
        # %v0
# 0 "" 2
#NO_APP
        br      %r14

`vl %v0,.L5-.L4(%r5),3` loads 1.0L into %v0[0:128]. However, it should be loaded into %v0[0:64] . %v2[0:64]. With the patch from comment 3 I get:

foo:
.LFB0:
        .cfi_startproc
        larl    %r5,.L4
        ld      %f0,.L5-.L4(%r5)
        ld      %f2,.L5-.L4+8(%r5)
#APP
# 10 "/home/iii/gcc/gcc/testsuite/gcc.target/s390/vector/long-double-asm-hardreg.c" 1
        # %f0
# 0 "" 2
#NO_APP
        br      %r14

which is correct, but in the general case the exact reg that the user requested is not honored.
[Bug libstdc++/99402] [10 Regression] std::copy creates _GLIBCXX_DEBUG false positive for attempt to subscript a dereferenceable (start-of-sequence) iterator
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99402 --- Comment #13 from François Dumont --- Fixed on gcc-10 branch by this commit https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=ab83ce42ea0b2fbc09d51b7bd5e69905dcaa2041.
[Bug target/100152] [10/11/12 Regression] used caller-saved register not preserved across a call.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100152 --- Comment #38 from Iain Sandoe --- (In reply to Richard Biener from comment #37)
> Oh, and FYI a cc1 cross from x86_64 to x86_64-apple-darwin19.6.0 doesn't seem
> to reproduce the issue with the reduced testcase (I see no call to
> ___UTF_8_put remaining with -O3 -fPIC -fno-strict-aliasing -fwrapv).

I think my interestingness test isn't strict enough - the creduced code resulting doesn't have an extern for ___UTF_8_put and only seems to not inline that fn because the interface has been mangled. [ so that the fn is legitimately binds_localP as the pasted case ].

if you still have the build around, out of curiosity, does it fail on the original .i file attached here?

and with -fno-trapping-math -fno-math-errno -fschedule-insns2 -fomit-frame-pointer

( I only need O2 to get a fail ).
[Bug c++/100210] [[nodiscard]] constructor causes warning on arm-linux-gnueabihf
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100210 Jakub Jelinek changed: What|Removed |Added Resolution|FIXED |DUPLICATE CC||jakub at gcc dot gnu.org --- Comment #3 from Jakub Jelinek --- Closing as dup. *** This bug has been marked as a duplicate of bug 99362 ***
[Bug c++/99362] [10 Regression] invalid unused result
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99362 Jakub Jelinek changed: What|Removed |Added CC||georg.schwab at emocean dot io --- Comment #10 from Jakub Jelinek --- *** Bug 100210 has been marked as a duplicate of this bug. ***
[Bug c++/98767] Function signature lost in concept diagnostic message
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98767 --- Comment #2 from CVS Commits --- The master branch has been updated by Patrick Palka : https://gcc.gnu.org/g:87fc34a461cf362947a430d8a241f653fd83bc7b commit r12-86-g87fc34a461cf362947a430d8a241f653fd83bc7b Author: Patrick Palka Date: Fri Apr 23 08:47:02 2021 -0400 c++: Fix pretty printing pointer to function type [PR98767] When pretty printing a pointer to function type, pp_cxx_parameter_declaration_clause ends up always outputting an empty function parameter list because the loop that outputs the list iterates over 'args' instead of 'types', and 'args' is empty when a FUNCTION_TYPE is passed to this routine (as opposed to a FUNCTION_DECL). This patch fixes this by making the loop iterate over 'types' instead. This patch also moves the retrofitted chain-of-PARM_DECLs printing from here to pp_cxx_requires_expr, the only caller that uses it. Doing so lets us easily output the trailing '...' in the parameter list of a variadic function, which this patch also implements. gcc/cp/ChangeLog: PR c++/98767 * cxx-pretty-print.c (pp_cxx_parameter_declaration_clause): Adjust parameter list loop to iterate over 'types' instead of 'args'. Output the trailing '...' for a variadic function. Remove PARM_DECL support. (pp_cxx_requires_expr): Pretty print the parameter list directly instead of going through pp_cxx_parameter_declaration_clause. gcc/testsuite/ChangeLog: PR c++/98767 * g++.dg/concepts/diagnostic17.C: New test.
[Bug target/99932] OpenACC/nvptx offloading execution regressions starting with CUDA 11.2-era Nvidia Driver 460.27.04
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99932 --- Comment #3 from Tom de Vries --- Created attachment 50660 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50660&action=edit Cuda reproducer
[Bug c++/100231] New: [C++17] Variable template specialization inside a class gives compilation error
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100231 Bug ID: 100231 Summary: [C++17] Variable template specialization inside a class gives compilation error Product: gcc Version: 10.2.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: krzyk240 at gmail dot com Target Milestone: --- On the following code: ``` template struct X {}; class Foo { template static constexpr inline bool bar = false; template static constexpr inline bool bar> = true; }; ``` GCC gives error: :8:34: error: explicit template argument list not allowed 8 | static constexpr inline bool bar> = true; | ^ But Clang, ICC and MSVC compile it correctly. Defining variable template bar outside of Foo class produces no compile errors. Compilation command: g++ example.cpp -std=c++17 Live example: https://godbolt.org/z/54hqYxe4P
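The report notes that defining the variable template outside `Foo` compiles. Based on that, one possible workaround is sketched below. It is not from the report. The primary template and its partial specialization live at namespace scope under a hypothetical name `is_x_v`, and the member template `Foo::bar` only forwards to it. The template parameter lists in the original snippet appear to have been stripped during extraction, so this does not claim to reproduce the exact code. It assumes `-std=c++17`, as in the report's compilation command.

```cpp
// Hypothetical workaround: is_x_v is an illustrative name, not from the report.
template <typename T> struct X {};

// Primary variable template at namespace scope...
template <typename T>
inline constexpr bool is_x_v = false;

// ...and its partial specialization, also at namespace scope, which the
// report says GCC accepts.
template <typename T>
inline constexpr bool is_x_v<X<T>> = true;

class Foo {
public:  // public only so the checks below can name it
  // The member template forwards to the namespace-scope one instead of
  // being partially specialized inside the class.
  template <typename T>
  static constexpr inline bool bar = is_x_v<T>;
};

static_assert(!Foo::bar<int>, "int is not an X<T>");
static_assert(Foo::bar<X<int>>, "X<int> matches the partial specialization");
```

The trade-off is that the specialization point becomes visible outside the class. If that matters, `is_x_v` could be placed in a detail namespace.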
[Bug target/100152] [10/11/12 Regression] used caller-saved register not preserved across a call.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100152 --- Comment #39 from Richard Biener --- (In reply to Iain Sandoe from comment #38)
> (In reply to Richard Biener from comment #37)
> > Oh, and FYI a cc1 cross from x86_64 to x86_64-apple-darwin19.6.0 doesn't seem
> > to reproduce the issue with the reduced testcase (I see no call to
> > ___UTF_8_put remaining with -O3 -fPIC -fno-strict-aliasing -fwrapv).
>
> I think my interestingness test isn't strict enough - the creduced code
> resulting doesn't have an extern for ___UTF_8_put and only seems to not
> inline that fn because the interface has been mangled. [ so that the fn is
> legitimately binds_localP as the pasted case ].
>
> if you still have the build around, out of curiosity, does it fail on the
> original .i file attached here?
>
> and with -fno-trapping-math -fno-math-errno -fschedule-insns2
> -fomit-frame-pointer
>
> ( I only need O2 to get a fail ).

Yes, with -O2 -fno-trapping-math -fno-math-errno -fschedule-insns2 -fomit-frame-pointer it produces the problematical

        .align 4,0x90
L945:
        movl    0(%rbp,%r10,4), %esi
        call    UTF_8_put
        movq    %r10, %rax
        addq    $1, %r10
        cmpq    %rax, %r12
        jne     L945

code. But then ___UTF_8_put isn't interposable so I wonder why the linker even has to resolve anything. Adding -fPIC OTOH should definitely make the symbol interposable but the same code is still generated ...

Note the 'extern' declaration shouldn't change anything, only that we see a definition is relevant.

breaking on darwin_binds_local_p I see ___UTF_8_put is considered binding local even with -fPIC. So GCC thinks there will be no linker stub involved.

Note 'shlib' is passed as false to default_binds_local_p_3 computed as

3140         on earlier system versions, and with a TODO to complete. */
3141      bool force_overridable = TARGET_KEXTABI && DARWIN_VTABLE_P (decl);
3142      return default_binds_local_p_3 (decl, force_overridable /* shlib */,
3143                                      false /* weak dominate */,

and default_binds_local_p_3 would do

  /* If PIC, then assume that any global name can be overridden by
     symbols resolved from other modules. */
  if (shlib)
    return false;

ix86_binds_local_p simply passes flag_shlib != 0 as this argument.
[Bug libstdc++/100180] experimental/net/internet/address/v6/members.cc fails on arm-eabi
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100180 --- Comment #6 from CVS Commits --- The master branch has been updated by Jonathan Wakely : https://gcc.gnu.org/g:0e1e7b77904f1fe2a6dbfe84bb4fc026584ba480 commit r12-89-g0e1e7b77904f1fe2a6dbfe84bb4fc026584ba480 Author: Jonathan Wakely Date: Fri Apr 23 13:38:05 2021 +0100 libstdc++: Allow net::io_context to compile without [PR 100180] This adds dummy placeholders to net::io_context so that it can still be compiled on targets without . libstdc++-v3/ChangeLog: PR libstdc++/100180 * include/experimental/io_context (io_context): Define dummy_pollfd type so that most member functions still compile without and struct pollfd.
[Bug target/100152] [10/11/12 Regression] used caller-saved register not preserved across a call.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100152 --- Comment #40 from Richard Biener --- (In reply to Richard Biener from comment #39)
> (In reply to Iain Sandoe from comment #38)
> > (In reply to Richard Biener from comment #37)
> > > Oh, and FYI a cc1 cross from x86_64 to x86_64-apple-darwin19.6.0 doesn't seem
> > > to reproduce the issue with the reduced testcase (I see no call to
> > > ___UTF_8_put remaining with -O3 -fPIC -fno-strict-aliasing -fwrapv).
> >
> > I think my interestingness test isn't strict enough - the creduced code
> > resulting doesn't have an extern for ___UTF_8_put and only seems to not
> > inline that fn because the interface has been mangled. [ so that the fn is
> > legitimately binds_localP as the pasted case ].
> >
> > if you still have the build around, out of curiosity, does it fail on the
> > original .i file attached here?
> >
> > and with -fno-trapping-math -fno-math-errno -fschedule-insns2
> > -fomit-frame-pointer
> >
> > ( I only need O2 to get a fail ).
>
> Yes, with -O2 -fno-trapping-math -fno-math-errno -fschedule-insns2
> -fomit-frame-pointer it produces the problematical
>
>         .align 4,0x90
> L945:
>         movl    0(%rbp,%r10,4), %esi
>         call    UTF_8_put
>         movq    %r10, %rax
>         addq    $1, %r10
>         cmpq    %rax, %r12
>         jne     L945
>
> code. But then ___UTF_8_put isn't interposable so I wonder why the linker
> even has to resolve anything. Adding -fPIC OTOH should definitely make the
> symbol interposable but the same code is still generated ...
>
> Note the 'extern' declaration shouldn't change anything, only that we
> see a definition is relevant.
>
> breaking on darwin_binds_local_p I see ___UTF_8_put is considered binding
> local even with -fPIC. So GCC thinks there will be no linker stub involved.
>
> Note 'shlib' is passed as false to default_binds_local_p_3 computed as
>
> 3140         on earlier system versions, and with a TODO to complete. */
> 3141      bool force_overridable = TARGET_KEXTABI && DARWIN_VTABLE_P (decl);
> 3142      return default_binds_local_p_3 (decl, force_overridable /* shlib */,
> 3143                                      false /* weak dominate */,
>
> and default_binds_local_p_3 would do
>
>   /* If PIC, then assume that any global name can be overridden by
>      symbols resolved from other modules. */
>   if (shlib)
>     return false;
>
> ix86_binds_local_p simply passes flag_shlib != 0 as this argument.

So it looks like darwin should pass flag_shlib != 0 || force_overridable instead?
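To make the suggested change concrete, here is a heavily simplified, hypothetical model of the decision. It is not GCC code. It keeps only the `if (shlib) return false;` early exit quoted above, and a `defined_here` flag stands in for all of `default_binds_local_p_3`'s other logic. The model shows why Darwin currently treats a locally defined function as binding locally even when `flag_shlib` is set (PIC code for a shared library), and how passing `flag_shlib || force_overridable` would make it interposable, matching what `ix86_binds_local_p` does. All function names below are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the shlib early exit in default_binds_local_p_3;
   "defined_here" stands in for all of the real function's other checks.  */
static bool model_binds_local (bool shlib, bool defined_here)
{
  /* If PIC, assume any global name can be overridden by symbols
     resolved from other modules.  */
  if (shlib)
    return false;
  return defined_here;
}

/* Darwin today, per the listing: only force_overridable reaches the
   shlib argument, so flag_shlib is ignored.  */
bool darwin_current (bool flag_shlib, bool force_overridable,
                     bool defined_here)
{
  (void) flag_shlib;
  return model_binds_local (force_overridable, defined_here);
}

/* The change suggested in comment #40.  */
bool darwin_proposed (bool flag_shlib, bool force_overridable,
                      bool defined_here)
{
  return model_binds_local (flag_shlib || force_overridable, defined_here);
}

/* ix86_binds_local_p: passes flag_shlib != 0.  */
bool ix86_model (bool flag_shlib, bool defined_here)
{
  return model_binds_local (flag_shlib, defined_here);
}

void check_models (void)
{
  /* A local definition compiled with flag_shlib set.  */
  assert (darwin_current (true, false, true));    /* treated as local */
  assert (!darwin_proposed (true, false, true));  /* now interposable */
  assert (!ix86_model (true, true));              /* same as ix86 */
  /* Without flag_shlib the proposal changes nothing.  */
  assert (darwin_proposed (false, false, true)
          == darwin_current (false, false, true));
}
```

In this model the proposal only changes the result when `flag_shlib` is set and `force_overridable` is not. That is exactly the -fPIC case in which the thread observed ___UTF_8_put still being considered local.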