[Bug rtl-optimization/94945] New: Missed optimization: Carry chain not recognized in manually unrolled loop
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94945 Bug ID: 94945 Summary: Missed optimization: Carry chain not recognized in manually unrolled loop Product: gcc Version: 10.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: rtl-optimization Assignee: unassigned at gcc dot gnu.org Reporter: madhur4127 at gmail dot com Target Milestone: --- Context: Big integer addition using ADC (_addcarry_u64). See Godbolt link: https://godbolt.org/z/rMxe6W Example: Suppose the case of big integer addition: // pa, pb: pointer to big integer A, B // n: size of big integer A, B // pr: pointer to result void add(const uint64_t * __restrict__ pa, const uint64_t * __restrict__ pb, uint64_t * __restrict__ pr, unsigned n) { unsigned char carry = 0; unsigned i; for(i = 0; i
[Bug tree-optimization/94945] Missed optimization: Carry chain not recognized in manually unrolled loop
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94945 Madhur Chauhan changed: What|Removed |Added Version|10.0|10.1.1 Component|rtl-optimization|tree-optimization --- Comment #1 from Madhur Chauhan --- Is the scope of this optimization too narrow?
[Bug translation/95563] New: High memory usage and possible infinite loop
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95563 Bug ID: 95563 Summary: High memory usage and possible infinite loop Product: gcc Version: 10.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: translation Assignee: unassigned at gcc dot gnu.org Reporter: madhur4127 at gmail dot com Target Milestone: --- Created attachment 48693 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=48693&action=edit Original code Attached simplified.cc, which aims to be a complete example. For the original code please see [1] (at bottom). simplified.cc aims to be minimal in nature. But please verify the issue with the original code. Logic: Computing a binomial coefficient over a few iterations using a Modular<> struct which is a wrapper over integer supporting modular arithmetic. Combinatorics struct contains two arrays of "Modular" integers, one for factorials and other for inverse factorials (modular multiplicative inverse of the corresponding factorial). On Linux, compilation took over 3.3GB of memory and 100% CPU utilization for over 30 minutes, and thereafter I had to kill it. This hints for an infinite loop. command: g++ simplified.cc -o simplified GCC version: Default GCC package (gcc 10.1.0-2) of Arch Linux Target: x86_64-pc-linux-gnu If the issue is simple to fix, then I would like to fix it for GCC (some pointers would be appreciated as I have no experience in developing compilers). References: 1. Original Code: https://gist.github.com/madhur4127/4eb29246463ed5a0538e1d9c8a2a7192
[Bug translation/95563] High memory usage and possible infinite loop
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95563 --- Comment #1 from Madhur Chauhan --- (In reply to Madhur Chauhan from comment #0) > Created attachment 48693 [details] > Original code My bad, this is the simplified version of the code but its name is misleading. The original code is in the URL at the bottom.
[Bug rtl-optimization/93141] New: Missed optimization : Use of adc when checking overflow
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93141 Bug ID: 93141 Summary: Missed optimization : Use of adc when checking overflow Product: gcc Version: 9.2.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: rtl-optimization Assignee: unassigned at gcc dot gnu.org Reporter: madhur4127 at gmail dot com Target Milestone: --- Created attachment 47584 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=47584&action=edit Complete code for computing dot product (same as the godbolt link) Consider emulating 192-bit integer using a 128-bit integer and a 64-bit integer. In the code sample this emulated integer is used to compute dot product of two uint64_t vectors of length N. // function to compute dot product of two vectors using u128 = unsigned __int128; const int N = 2048; uint64_t a[N], b[N]; u128 sum = 0; uint64_t overflow = 0; for(int i=0;ihttps://godbolt.org/z/ktdA4b
[Bug rtl-optimization/93142] New: Missed optimization : Use of adc when checking overflow
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93142 Bug ID: 93142 Summary: Missed optimization : Use of adc when checking overflow Product: gcc Version: 9.2.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: rtl-optimization Assignee: unassigned at gcc dot gnu.org Reporter: madhur4127 at gmail dot com Target Milestone: --- Created attachment 47585 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=47585&action=edit Complete code for computing dot product (same as the godbolt link) Consider emulating 192-bit integer using a 128-bit integer and a 64-bit integer. In the code sample this emulated integer is used to compute dot product of two uint64_t vectors of length N. // function to compute dot product of two vectors using u128 = unsigned __int128; const int N = 2048; uint64_t a[N], b[N]; u128 sum = 0; uint64_t overflow = 0; for(int i=0;ihttps://godbolt.org/z/ktdA4b
[Bug target/93141] Missed optimization : Use of adc when checking overflow
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93141 --- Comment #1 from Madhur Chauhan --- The source of this bug is the stackoverflow Q&A: https://stackoverflow.com/questions/59575408/fastest-way-to-sum-dot-product-of-vector-of-unsigned-64-bit-integers-using-192-2/59579310#59579310
[Bug target/93141] Missed optimization : Use of adc when checking overflow
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93141 --- Comment #6 from Madhur Chauhan --- As far as I can tell optimal asm generated should be like: mov-load from on array mul or preferably mulx with a memory source from the other array add + adc into 128-bit answer register adc reg, 0 to accumulate the carry-out. This optimization could help many big integer libraries as this lies in most critical section.
[Bug target/93141] Missed optimization : Use of adc when checking overflow
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93141 --- Comment #7 from Madhur Chauhan --- As far as I can tell optimal asm generated should be like: mov-load from on array mul or preferably mulx with a memory source from the other array add + adc into 128-bit answer register adc reg, 0 to accumulate the carry-out. This optimization could help many big integer libraries as this lies in most critical section.
[Bug target/116414] New: Missed optimization: Branch elimination and memory writes
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116414 Bug ID: 116414 Summary: Missed optimization: Branch elimination and memory writes Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: madhur4127 at gmail dot com Target Milestone: --- Godbolt: https://godbolt.org/z/oKrGfh34z Code: ``` #include #include std::optional x(uint16_t x) { return x <= 1 ? std::nullopt : std::optional{x}; } ``` GCC trunk generates a branch and uses memory writes whereas it could just use `eax` like clang ``` x(unsigned short): cmp di, 1 jbe .L5 mov WORD PTR [rsp-4], di mov BYTE PTR [rsp-2], 1 mov eax, DWORD PTR [rsp-4] ret .L5: mov BYTE PTR [rsp-2], 0 mov eax, DWORD PTR [rsp-4] ret ``` clang: ``` x(unsigned short): xor eax, eax cmp di, 2 setae al shl eax, 16 or eax, edi ret ```
[Bug c++/109686] New: Errorneous infinite loop detection (-Winfinite-recursion)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109686 Bug ID: 109686 Summary: Errorneous infinite loop detection (-Winfinite-recursion) Product: gcc Version: 14.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: madhur4127 at gmail dot com Target Milestone: --- This code: ``` #define myassert(x) \ do { \ if (! (x) ) \ abort(); \ } while (false) static void recursive(int n) { myassert(n > 0); recursive(n - 1); printf("%d\n", n); } ``` should not result in infinite recursion. Assert fails for all non-positive integers. For positive numbers it counts down to 0 and then fails. This affect GCC12, GCC13 and trunk: https://compiler-explorer.com/z/57qrnzdEK
[Bug middle-end/107334] Incorrect "infinite recursion detected" warning if base case aborts
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107334 Madhur Chauhan changed: What|Removed |Added CC||madhur4127 at gmail dot com --- Comment #2 from Madhur Chauhan --- *** Bug 109686 has been marked as a duplicate of this bug. ***
[Bug middle-end/109686] Errorneous infinite loop detection (-Winfinite-recursion)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109686 Madhur Chauhan changed: What|Removed |Added Resolution|--- |DUPLICATE Status|UNCONFIRMED |RESOLVED --- Comment #2 from Madhur Chauhan --- Yes it's a dupe. Sorry. *** This bug has been marked as a duplicate of bug 107334 ***
[Bug middle-end/107334] Incorrect "infinite recursion detected" warning if base case aborts
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107334 --- Comment #3 from Madhur Chauhan --- I feel changing the warning text will still cause troubles for people using -Werror. This forces to use ignore pragmas or disable this warning completely, none of them are ideal. Maybe disable warning in this case?
[Bug c++/110383] New: Missed optimiztion: Default operator== with trivial struct can be replaced with bcmp
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110383 Bug ID: 110383 Summary: Missed optimiztion: Default operator== with trivial struct can be replaced with bcmp Product: gcc Version: 14.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: madhur4127 at gmail dot com Target Milestone: --- Godbolt: https://compiler-explorer.com/z/Gr8E5zfo5 Default operator== produces inefficient assembly where it checks for each data member therefore having a lot of branches. Clang produces better assembly with memcmp / bcmp. I know its hard to generalize performance claims but in the worst case having 8 branches (one for each member), is not more efficient than a bcmp.