[Bug c++/92031] [9 Regression] Incorrect "taking address of r-value" error
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92031

Chip Kerchner changed:

What | Removed | Added
CC   |         | chip.kerchner at ibm dot com

--- Comment #5 from Chip Kerchner ---
Please backport this fix to GCC 9.4 or 9.5.
[Bug tree-optimization/108757] We do not simplify (a - (N*M)) / N + M -> a / N
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108757

--- Comment #9 from Chip Kerchner ---
Doesn't this work for powers of two (N) and signed values (for a, N and M)?
(59 - (33 * -2)) / -2 + 31 = -62 + 31 = -29 and 59 / -2 = -29
[Bug tree-optimization/108757] We do not simplify (a - (N*M)) / N + M -> a / N
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108757 --- Comment #10 from Chip Kerchner --- Oops, that should be 31 * -2, not 33 * -2.
[Bug tree-optimization/108757] We do not simplify (a - (N*M)) / N + M -> a / N
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108757 --- Comment #11 from Chip Kerchner --- Never mind: using an example similar to the one Segher gave, it would fail too.
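A quick way to test the proposed fold is to evaluate both sides with C++'s truncating signed division. This is a sketch: `lhs` and `rhs` are hypothetical helper names, the first check uses the corrected numbers from comment #10, and the second check is an illustrative counterexample, not necessarily the one Segher gave.

```cpp
// Both sides of the proposed fold (a - N*M) / N + M -> a / N, evaluated
// with C++ signed division, which truncates toward zero.
// lhs/rhs are hypothetical helper names used only for this illustration.
constexpr long lhs(long a, long n, long m) { return (a - n * m) / n + m; }
constexpr long rhs(long a, long n) { return a / n; }

// Corrected example from comment #10: (59 - (31 * -2)) / -2 + 31
//   = 121 / -2 + 31 = -60 + 31 = -29, and 59 / -2 = -29.
static_assert(lhs(59, -2, 31) == -29 && rhs(59, -2) == -29, "fold holds here");

// Illustrative counterexample: a = 1, N = 2, M = 1.
//   (1 - 2) / 2 + 1 = -1 / 2 + 1 = 0 + 1 = 1, but 1 / 2 = 0.
static_assert(lhs(1, 2, 1) == 1 && rhs(1, 2) == 0, "fold does not hold here");
```

Because the checks are `static_assert`s, compiling the snippet is enough to run them; the second one shows why the fold is not valid for signed operands in general once truncation rounds a negative intermediate toward zero.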
[Bug tree-optimization/108757] We do not simplify (a - (N*M)) / N + M -> a / N
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108757

--- Comment #12 from Chip Kerchner ---
Here is an example of the original problem:

#define EIGEN_ALWAYS_INLINE __attribute__((always_inline)) inline
typedef __vector float Packet4f;
typedef size_t Index;

EIGEN_ALWAYS_INLINE Packet4f ploadu(const float* from)
{
  return vec_xl(0, const_cast(from));
}

EIGEN_ALWAYS_INLINE void pstoreu(float* to, const Packet4f &from)
{
  vec_xst(from, 0, to);
}

void convert(Index rows, float*src, float *result)
{
  for(Index i = 0; i + 4 <= rows; i+=4) {
    Packet4f r32_0 = ploadu(src + i + 0);
    pstoreu(result + i + 0, r32_0);
  }
}

And the output (with the lines in question annotated):

        cmpldi 0,3,3
        blelr 0
        addi 3,3,-4      <- i = rows - 4
        li 9,0
        srdi 3,3,2       <- i >>= 2
        addi 8,3,1       <- i = i + 1
        andi. 7,8,0x3
        mr 10,8
        beq 0,.L10
        cmpdi 0,7,1
        beq 0,.L14
        cmpdi 0,7,2
        beq 0,.L15
        lxv 0,0(4)
        mr 8,3
        li 9,16
        stxv 0,0(5)
.L15:
        lxvx 0,4,9
        addi 8,8,-1
        stxvx 0,5,9
        addi 9,9,16
.L14:
        lxvx 0,4,9
        cmpdi 0,8,1
        stxvx 0,5,9
        addi 9,9,16
        beqlr 0
.L10:
        srdi 10,10,2
        mtctr 10
.L3:
        lxvx 0,4,9
        addi 10,9,16
        addi 7,9,32
        addi 8,9,48
        stxvx 0,5,9
        lxvx 0,4,10
        addi 9,9,64
        stxvx 0,5,10
        lxvx 0,4,7
        stxvx 0,5,7
        lxvx 0,4,8
        stxvx 0,5,8
        bdnz .L3
        blr

In this case, the 3 annotated lines can be replaced with a simple `srdi 8,3,2`.
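The loop in convert() runs (rows - 4) / 4 + 1 times when rows >= 4 and zero times otherwise, which is just rows / 4; this is the (a - N*M) / N + M pattern with N = 4 and M = 1. The sketch below checks that equivalence at compile time. The helper names are hypothetical, `generated_count` is only a transcription of the annotated instructions, and the check deliberately stays far from SIZE_MAX, where `i + 4` could wrap.

```cpp
#include <cstddef>

// Trip count of the loop in convert():
//   for (Index i = 0; i + 4 <= rows; i += 4)
// counted by simulating the loop (needs C++14 relaxed constexpr).
constexpr std::size_t loop_trip_count(std::size_t rows) {
    std::size_t count = 0;
    for (std::size_t i = 0; i + 4 <= rows; i += 4)
        ++count;
    return count;
}

// The computation in the annotated assembly: return early when rows <= 3
// (cmpldi/blelr), otherwise ((rows - 4) >> 2) + 1 (addi, srdi, addi).
constexpr std::size_t generated_count(std::size_t rows) {
    return rows <= 3 ? 0 : ((rows - 4) >> 2) + 1;
}

// The suggested replacement: a single shift, rows >> 2 (srdi).
constexpr std::size_t shifted_count(std::size_t rows) { return rows >> 2; }

// Returns true if all three agree for every rows in [0, limit].
constexpr bool counts_agree(std::size_t limit) {
    for (std::size_t rows = 0; rows <= limit; ++rows) {
        std::size_t n = loop_trip_count(rows);
        if (n != generated_count(rows) || n != shifted_count(rows))
            return false;
    }
    return true;
}

static_assert(counts_agree(256), "rows >> 2 matches the loop's trip count");
```

Compiling the snippet as C++14 or later is enough to run the check.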
[Bug rtl-optimization/108757] We do not simplify (a - (N*M)) / N + M -> a / N
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108757

--- Comment #15 from Chip Kerchner ---
How about this (from Peter's testcase)? Does it still have issues? It produces the same assembly.

#define N 32
#define M 2

unsigned long int foo (unsigned long int a)
{
  return (a - (N*M)) / N + M;
}
[Bug rtl-optimization/108757] We do not simplify (a - (N*M)) / N + M -> a / N
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108757

--- Comment #16 from Chip Kerchner ---
Dang, copy-and-paste issue... This is what I meant:

unsigned long int foo (unsigned long int a)
{
  return (a + (N*M)) / N - M;
}
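One thing the unsigned version has to account for is wraparound of a + N*M. The sketch below uses the testcase's constants and a hypothetical `folded` helper for the proposed a / N form; it illustrates where the two expressions differ and is not a statement of why GCC declines the fold.

```cpp
#include <climits>

// Constants from the testcase.
constexpr unsigned long N = 32;
constexpr unsigned long M = 2;

// foo() from comment #16, evaluated in unsigned (modular) arithmetic.
constexpr unsigned long foo(unsigned long a) { return (a + (N * M)) / N - M; }

// The proposed simplified form (hypothetical helper name).
constexpr unsigned long folded(unsigned long a) { return a / N; }

// Whenever a + N*M does not wrap, both forms agree...
static_assert(foo(0) == folded(0), "");
static_assert(foo(100) == folded(100), "");
static_assert(foo(ULONG_MAX - N * M) == folded(ULONG_MAX - N * M), "");

// ...but once it wraps they differ: ULONG_MAX + 64 wraps to 63,
// and 63 / 32 - 2 wraps to ULONG_MAX, while ULONG_MAX / 32 is far smaller.
static_assert(foo(ULONG_MAX) != folded(ULONG_MAX), "");
```

So the fold is only exact when the compiler can prove that a + N*M does not overflow.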
[Bug rtl-optimization/109116] New: vector_pair register allocation bug
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109116

Bug ID: 109116
Summary: vector_pair register allocation bug
Product: gcc
Version: 13.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: rtl-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: chip.kerchner at ibm dot com
Target Milestone: ---

There seems to be a bug in the register allocator when using a __vector_pair. GCC did not choose a register for the load that the later instructions could use directly.

With this testcase:
```
#include
#if !__has_builtin(__builtin_vsx_disassemble_pair)
#define __builtin_vsx_disassemble_pair __builtin_mma_disassemble_pair
#endif

int main()
{
  float A[8] = { float(1), float(2), float(3), float(4), float(5), float(6), float(7), float(8) };
  __vector_pair P;
  __vector_quad Q;
  vector float B, C[2], D[4];

  __builtin_mma_xxsetaccz(&Q);
  P = *reinterpret_cast<__vector_pair *>(A);
  B = *reinterpret_cast(A);
  __builtin_vsx_disassemble_pair((void*)(C), &P);
  __builtin_mma_xvf32gerpp(&Q, reinterpret_cast<__vector unsigned char>(C[0]), reinterpret_cast<__vector unsigned char>(B));
  __builtin_mma_xvf32gerpp(&Q, reinterpret_cast<__vector unsigned char>(C[1]), reinterpret_cast<__vector unsigned char>(B));
  __builtin_mma_disassemble_acc((void *)D, &Q);
  return int(D[0][0]);
}
```

It produces output with extra (unneeded) register moves (the two xxlor instructions):
```
        plxvp 12,.LANCHOR0@pcrel
        xxsetaccz 0
        plxv 33,.LC1@pcrel
        xxlor 45,13,13
        xxlor 32,12,12
        xvf32gerpp 0,45,33
        xvf32gerpp 0,32,33
        xxmfacc 0
```
[Bug rtl-optimization/109116] vector_pair register allocation bug
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109116 --- Comment #1 from Chip Kerchner --- This has been in GCC since the first version that supported __vector_pair (10.x).
[Bug target/109116] vector_pair register allocation bug
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109116 --- Comment #2 from Chip Kerchner --- This could be a broader issue with register allocation after disassembling an opaque object such as a __vector_pair or an MMA accumulator.
[Bug ipa/102059] Incorrect always_inline diagnostic in LTO mode with #pragma GCC target("cpu=power10")
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102059

--- Comment #21 from Chip Kerchner ---
I'm also seeing MMA problems with LTO. With this simple program (main.ii)
--
int main() {
  float *b;
  __vector_quad c;
  __builtin_mma_disassemble_acc(b, &c);
  return 0;
}
--
And this compile command (using GCC 10.3.1)
--
g++ -flto=auto -mcpu=power9 main.ii
--
I'm seeing this error (which does NOT occur without LTO)
--
lto1: error: '__builtin_mma_xxmfacc_internal' requires the '-mmma' option
lto1: fatal error: target specific builtin not available
compilation terminated.
--
[Bug ipa/102059] Incorrect always_inline diagnostic in LTO mode with #pragma GCC target("cpu=power10")
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102059

--- Comment #22 from Chip Kerchner ---
(In reply to Chip Kerchner from comment #21)
- Forgot one line of code
> --
> #pragma GCC target "cpu=power10"
> int main() {
>   float *b;
>   __vector_quad c;
>   __builtin_mma_disassemble_acc(b, &c);
>   return 0;
> }
> --
[Bug lto/102347] New: "fatal error: target specific builtin not available" with MMA and LTO
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102347

Bug ID: 102347
Summary: "fatal error: target specific builtin not available" with MMA and LTO
Product: gcc
Version: 10.3.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: lto
Assignee: unassigned at gcc dot gnu.org
Reporter: chip.kerchner at ibm dot com
CC: marxin at gcc dot gnu.org
Target Milestone: ---

I'm seeing MMA problems with LTO. With this simple program (main.ii)
--
#pragma GCC target "cpu=power10"
int main() {
  float *b;
  __vector_quad c;
  __builtin_mma_disassemble_acc(b, &c);
  return 0;
}
--
And this compile command
--
g++ -flto=auto -mcpu=power9 main.ii
--
I'm seeing this error (which does NOT occur without LTO)
--
lto1: error: '__builtin_mma_xxmfacc_internal' requires the '-mmma' option
lto1: fatal error: target specific builtin not available
compilation terminated.
--
[Bug ipa/102059] Incorrect always_inline diagnostic in LTO mode with #pragma GCC target("cpu=power10")
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102059

--- Comment #24 from Chip Kerchner ---
(In reply to Kewen Lin from comment #23)
> Hi Chip, I can reproduce this error with trunk. With some investigation, I
> think it's not duplicated of this PR, some information restoring seems wrong
> when lto. Could you please file a separated PR? Thanks in advance!

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102347
[Bug ipa/102059] Incorrect always_inline diagnostic in LTO mode with #pragma GCC target("cpu=power10")
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102059

Chip Kerchner changed:

What | Removed | Added
CC   |         | chip.kerchner at ibm dot com

--- Comment #3 from Chip Kerchner ---
It also fails with GCC 10.3.
[Bug target/70243] PowerPC V4SFmode should not use Altivec instructions on VSX systems
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70243

Chip Kerchner changed:

What | Removed | Added
CC   |         | chip.kerchner at ibm dot com

--- Comment #3 from Chip Kerchner ---
This is showing up in some of the binaries generated by Eigen (with GCC 13).
[Bug target/70243] PowerPC V4SFmode should not use Altivec instructions on VSX systems
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70243 --- Comment #4 from Chip Kerchner --- It shows up as a rounding difference on BE machines.
[Bug c++/109501] New: vec_test_data_class defines missing
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109501

Bug ID: 109501
Summary: vec_test_data_class defines missing
Product: gcc
Version: 13.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: chip.kerchner at ibm dot com
Target Milestone: ---

These defines seem to be missing:

#define __VEC_CLASS_FP_NAN (1<<6)
#define __VEC_CLASS_FP_INFINITY_P (1<<5)
#define __VEC_CLASS_FP_INFINITY_N (1<<4)
#define __VEC_CLASS_FP_ZERO_P (1<<3)
#define __VEC_CLASS_FP_ZERO_N (1<<2)
#define __VEC_CLASS_FP_SUBNORMAL_P (1<<1)
#define __VEC_CLASS_FP_SUBNORMAL_N (1<<0)
#define __VEC_CLASS_FP_INFINITY (__VEC_CLASS_FP_INFINITY_P | __VEC_CLASS_FP_INFINITY_N)
#define __VEC_CLASS_FP_ZERO (__VEC_CLASS_FP_ZERO_P | __VEC_CLASS_FP_ZERO_N)
#define __VEC_CLASS_FP_SUBNORMAL (__VEC_CLASS_FP_SUBNORMAL_P | __VEC_CLASS_FP_SUBNORMAL_N)
#define __VEC_CLASS_FP_NOT_NORMAL (__VEC_CLASS_FP_NAN | __VEC_CLASS_FP_SUBNORMAL | __VEC_CLASS_FP_ZERO | __VEC_CLASS_FP_INFINITY)
[Bug c++/109501] vec_test_data_class defines missing
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109501

Chip Kerchner changed:

What | Removed | Added
CC   |         | chip.kerchner at ibm dot com

--- Comment #1 from Chip Kerchner ---
```
__vector float p4f = some data;
__vector __bool int nan_selector = vec_test_data_class(p4f, __VEC_CLASS_FP_NAN);
```
[Bug c++/109501] vec_test_data_class defines missing
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109501 --- Comment #2 from Chip Kerchner --- '__VEC_CLASS_FP_NAN' was not declared in this scope
[Bug target/109501] vec_test_data_class defines missing
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109501 --- Comment #4 from Chip Kerchner --- PowerPC LE, POWER9. Yes, other PVIPR APIs are available and compile in plenty of other source code.
[Bug target/109501] vec_test_data_class defines missing
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109501

--- Comment #5 from Chip Kerchner ---
Here's a testcase:
```
#include
#include

int main()
{
  __vector float p4f = { float(0), float(1), float(2), float(3) };
  __vector __bool int nan_selector = vec_test_data_class(p4f, __VEC_CLASS_FP_NAN);
  return 0;
}
```

```
NAN_defines.cpp: In function ‘int main()’:
NAN_defines.cpp:7:63: error: ‘__VEC_CLASS_FP_NAN’ was not declared in this scope
    7 |   __vector __bool int nan_selector = vec_test_data_class(p4f, __VEC_CLASS_FP_NAN);
      |                                                               ^~
```

```
/opt/gcc-nightly/trunk/bin/g++ -O3 -mcpu=power9 NAN_defines.cpp
```
[Bug target/109501] vec_test_data_class defines missing
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109501 --- Comment #8 from Chip Kerchner --- Well, then I'm asking GCC to add these defines to make `vec_test_data_class` easier to use.
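Until the header provides these macros, a project could carry a guarded fallback along the lines of the sketch below. It assumes the bit values listed in the original report are correct (they are not re-checked against PVIPR here) and guards on `__VEC_CLASS_FP_NAN` alone, on the assumption that a header providing one of the macros provides all of them. Since identifiers starting with a double underscore are reserved for the implementation, a project may prefer to use its own prefix instead.

```cpp
// Hypothetical user-side fallback, meant to follow #include <altivec.h>:
// define the class-mask macros only if the header did not already do so.
#ifndef __VEC_CLASS_FP_NAN
#define __VEC_CLASS_FP_NAN          (1 << 6)
#define __VEC_CLASS_FP_INFINITY_P   (1 << 5)
#define __VEC_CLASS_FP_INFINITY_N   (1 << 4)
#define __VEC_CLASS_FP_ZERO_P       (1 << 3)
#define __VEC_CLASS_FP_ZERO_N       (1 << 2)
#define __VEC_CLASS_FP_SUBNORMAL_P  (1 << 1)
#define __VEC_CLASS_FP_SUBNORMAL_N  (1 << 0)
#define __VEC_CLASS_FP_INFINITY  (__VEC_CLASS_FP_INFINITY_P | __VEC_CLASS_FP_INFINITY_N)
#define __VEC_CLASS_FP_ZERO      (__VEC_CLASS_FP_ZERO_P | __VEC_CLASS_FP_ZERO_N)
#define __VEC_CLASS_FP_SUBNORMAL (__VEC_CLASS_FP_SUBNORMAL_P | __VEC_CLASS_FP_SUBNORMAL_N)
#define __VEC_CLASS_FP_NOT_NORMAL \
    (__VEC_CLASS_FP_NAN | __VEC_CLASS_FP_SUBNORMAL | __VEC_CLASS_FP_ZERO | __VEC_CLASS_FP_INFINITY)
#endif

// With these values the seven single-class bits fill the low 7 bits of the
// mask, so "not normal" selects every class: 0x7F.
static_assert(__VEC_CLASS_FP_NOT_NORMAL == 0x7F, "all seven class bits set");
```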
[Bug tree-optimization/109491] [11/12 Regression] Segfault in tree-ssa-sccvn.cc:expressions_equal_p()
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109491

--- Comment #12 from Chip Kerchner ---
> having always_inline across a deep call stack can exponentially increase
> compile-time

Do you think it would be worth requesting a feature to reduce compilation times in situations like this? Exponential growth is not a good thing.
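The exponential growth quoted above can be illustrated with a toy model: if each always_inline function in a chain calls the next level down k times, forcing every call to be inlined copies the leaf body k^depth times into the outermost caller. The function names below are hypothetical, and this is a simplified model rather than GCC's actual inlining cost model.

```cpp
// Toy model: number of copies of the leaf body after fully inlining a chain
// where each level calls the level below it `calls` times.
constexpr unsigned long long inlined_leaf_copies(unsigned calls, unsigned depth) {
    return depth == 0 ? 1 : calls * inlined_leaf_copies(calls, depth - 1);
}

// Concrete shape of such a chain: each level calls the one below twice,
// so fully inlining f3 produces 2^3 = 8 copies of f0's body.
__attribute__((always_inline)) constexpr int f0(int x) { return x * 3 + 1; }
__attribute__((always_inline)) constexpr int f1(int x) { return f0(x) + f0(x + 1); }
__attribute__((always_inline)) constexpr int f2(int x) { return f1(x) + f1(x + 2); }
__attribute__((always_inline)) constexpr int f3(int x) { return f2(x) + f2(x + 4); }

// Two calls per level over ten levels already means 1024 leaf copies.
static_assert(inlined_leaf_copies(2, 10) == 1024, "");
```

Real code such as Eigen's kernels is far less regular, but the multiplicative effect of forced inlining at every level is the same.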
[Bug tree-optimization/109491] [11/12 Regression] Segfault in tree-ssa-sccvn.cc:expressions_equal_p()
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109491 --- Comment #14 from Chip Kerchner --- Just one more question and then I'll switch to the new bug. Would it help at all if the "always_inline" functions were changed from non-static to static? Eigen's approach (where this code originally came from; yes, it could definitely be better) is to use non-static inline functions.
[Bug ipa/109509] Huge compile time with forced inlining
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109509 --- Comment #1 from Chip Kerchner --- Just for note: the same code, which makes heavy use of always_inline, compiles about 3X faster with LLVM and uses about half as much memory to compile.