[Bug rtl-optimization/94945] New: Missed optimization: Carry chain not recognized in manually unrolled loop

2020-05-04 Thread madhur4127 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94945

Bug ID: 94945
   Summary: Missed optimization: Carry chain not recognized in
manually unrolled loop
   Product: gcc
   Version: 10.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: rtl-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: madhur4127 at gmail dot com
  Target Milestone: ---

Context: Big integer addition using ADC (_addcarry_u64).

See Godbolt link: https://godbolt.org/z/rMxe6W

Example:
Suppose the case of big integer addition:

// pa, pb: pointer to big integer A, B
// n: size of big integer A, B
// pr: pointer to result

void add(const uint64_t * __restrict__ pa, const uint64_t * __restrict__ pb,
uint64_t * __restrict__ pr, unsigned n) {
unsigned char carry = 0;
unsigned i;
for(i = 0; i

[Bug tree-optimization/94945] Missed optimization: Carry chain not recognized in manually unrolled loop

2020-05-22 Thread madhur4127 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94945

Madhur Chauhan  changed:

   What|Removed |Added

Version|10.0|10.1.1
  Component|rtl-optimization|tree-optimization

--- Comment #1 from Madhur Chauhan  ---
Is the scope of this optimization too narrow?

[Bug translation/95563] New: High memory usage and possible infinite loop

2020-06-06 Thread madhur4127 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95563

Bug ID: 95563
   Summary: High memory usage and possible infinite loop
   Product: gcc
   Version: 10.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: translation
  Assignee: unassigned at gcc dot gnu.org
  Reporter: madhur4127 at gmail dot com
  Target Milestone: ---

Created attachment 48693
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=48693&action=edit
Original code

Attached simplified.cc, which aims to be a complete example. For the original
code please see [1] (at bottom).

simplified.cc aims to be minimal in nature. But please verify the issue with
the original code.

Logic: Computing a binomial coefficient over a few iterations using a Modular<>
struct which is a wrapper over integer supporting modular arithmetic.
Combinatorics struct contains two arrays of "Modular" integers, one for
factorials and other for inverse factorials (modular multiplicative inverse of
the corresponding factorial).

On Linux, compilation took over 3.3GB of memory and 100% CPU utilization for
over 30 minutes, and thereafter I had to kill it. This hints for an infinite
loop.

command: g++ simplified.cc -o simplified
GCC version: Default GCC package (gcc 10.1.0-2) of Arch Linux
Target: x86_64-pc-linux-gnu

If the issue is simple to fix, then I would like to fix it for GCC (some
pointers would be appreciated as I have no experience in developing compilers).

References:
1. Original Code:
https://gist.github.com/madhur4127/4eb29246463ed5a0538e1d9c8a2a7192

[Bug translation/95563] High memory usage and possible infinite loop

2020-06-06 Thread madhur4127 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95563

--- Comment #1 from Madhur Chauhan  ---
(In reply to Madhur Chauhan from comment #0)
> Created attachment 48693 [details]
> Original code

My bad, this is the simplified version of the code but its name is misleading.
The original code is in the URL at the bottom.

[Bug rtl-optimization/93141] New: Missed optimization : Use of adc when checking overflow

2020-01-03 Thread madhur4127 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93141

Bug ID: 93141
   Summary: Missed optimization : Use of adc when checking
overflow
   Product: gcc
   Version: 9.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: rtl-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: madhur4127 at gmail dot com
  Target Milestone: ---

Created attachment 47584
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=47584&action=edit
Complete code for computing dot product (same as the godbolt link)

Consider emulating 192-bit integer using a 128-bit integer and a 64-bit
integer. In the code sample this emulated integer is used to compute dot
product of two uint64_t vectors of length N. 

// function to compute dot product of two vectors
using u128 = unsigned __int128;
const int N = 2048;
uint64_t a[N], b[N];
u128 sum = 0;
uint64_t overflow = 0;
for(int i=0;ihttps://godbolt.org/z/ktdA4b

[Bug rtl-optimization/93142] New: Missed optimization : Use of adc when checking overflow

2020-01-03 Thread madhur4127 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93142

Bug ID: 93142
   Summary: Missed optimization : Use of adc when checking
overflow
   Product: gcc
   Version: 9.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: rtl-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: madhur4127 at gmail dot com
  Target Milestone: ---

Created attachment 47585
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=47585&action=edit
Complete code for computing dot product (same as the godbolt link)

Consider emulating 192-bit integer using a 128-bit integer and a 64-bit
integer. In the code sample this emulated integer is used to compute dot
product of two uint64_t vectors of length N. 

// function to compute dot product of two vectors
using u128 = unsigned __int128;
const int N = 2048;
uint64_t a[N], b[N];
u128 sum = 0;
uint64_t overflow = 0;
for(int i=0;ihttps://godbolt.org/z/ktdA4b

[Bug target/93141] Missed optimization : Use of adc when checking overflow

2020-01-03 Thread madhur4127 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93141

--- Comment #1 from Madhur Chauhan  ---
The source of this bug is the stackoverflow Q&A: 

https://stackoverflow.com/questions/59575408/fastest-way-to-sum-dot-product-of-vector-of-unsigned-64-bit-integers-using-192-2/59579310#59579310

[Bug target/93141] Missed optimization : Use of adc when checking overflow

2020-01-04 Thread madhur4127 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93141

--- Comment #6 from Madhur Chauhan  ---
As far as I can tell optimal asm generated should be like: 

mov-load from on array
mul or preferably mulx with a memory source from the other array
add + adc into 128-bit answer register
adc reg, 0 to accumulate the carry-out.

This optimization could help many big integer libraries as this lies in most
critical section.

[Bug target/93141] Missed optimization : Use of adc when checking overflow

2020-01-04 Thread madhur4127 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93141

--- Comment #7 from Madhur Chauhan  ---
As far as I can tell optimal asm generated should be like: 

mov-load from on array
mul or preferably mulx with a memory source from the other array
add + adc into 128-bit answer register
adc reg, 0 to accumulate the carry-out.

This optimization could help many big integer libraries as this lies in most
critical section.

[Bug target/116414] New: Missed optimization: Branch elimination and memory writes

2024-08-19 Thread madhur4127 at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116414

Bug ID: 116414
   Summary: Missed optimization: Branch elimination and memory
writes
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: madhur4127 at gmail dot com
  Target Milestone: ---

Godbolt: https://godbolt.org/z/oKrGfh34z

Code:

```
#include 
#include 

std::optional x(uint16_t x) 
{ return x <= 1 ? std::nullopt : std::optional{x}; }
```

GCC trunk generates a branch and uses memory writes whereas it could just use
`eax` like clang

```
x(unsigned short):
cmp di, 1
jbe .L5
mov WORD PTR [rsp-4], di
mov BYTE PTR [rsp-2], 1
mov eax, DWORD PTR [rsp-4]
ret
.L5:
mov BYTE PTR [rsp-2], 0
mov eax, DWORD PTR [rsp-4]
ret
```

clang:
```
x(unsigned short):
xor eax, eax
cmp di, 2
setae   al
shl eax, 16
or  eax, edi
ret
```

[Bug c++/109686] New: Errorneous infinite loop detection (-Winfinite-recursion)

2023-05-01 Thread madhur4127 at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109686

Bug ID: 109686
   Summary: Errorneous infinite loop detection
(-Winfinite-recursion)
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: madhur4127 at gmail dot com
  Target Milestone: ---

This code: 

```
#define myassert(x) \
do { \
if (! (x) ) \
abort(); \
}  while (false)

static void recursive(int n) {
myassert(n > 0);
recursive(n - 1);
printf("%d\n", n);
}
```

should not result in infinite recursion. Assert fails for all non-positive
integers. For positive numbers it counts down to 0 and then fails.

This affect GCC12, GCC13 and trunk: https://compiler-explorer.com/z/57qrnzdEK

[Bug middle-end/107334] Incorrect "infinite recursion detected" warning if base case aborts

2023-05-01 Thread madhur4127 at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107334

Madhur Chauhan  changed:

   What|Removed |Added

 CC||madhur4127 at gmail dot com

--- Comment #2 from Madhur Chauhan  ---
*** Bug 109686 has been marked as a duplicate of this bug. ***

[Bug middle-end/109686] Errorneous infinite loop detection (-Winfinite-recursion)

2023-05-01 Thread madhur4127 at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109686

Madhur Chauhan  changed:

   What|Removed |Added

 Resolution|--- |DUPLICATE
 Status|UNCONFIRMED |RESOLVED

--- Comment #2 from Madhur Chauhan  ---
Yes it's a dupe. Sorry.

*** This bug has been marked as a duplicate of bug 107334 ***

[Bug middle-end/107334] Incorrect "infinite recursion detected" warning if base case aborts

2023-05-01 Thread madhur4127 at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107334

--- Comment #3 from Madhur Chauhan  ---
I feel changing the warning text will still cause troubles for people using
-Werror. This forces to use ignore pragmas or disable this warning completely,
none of them are ideal.

Maybe disable warning in this case?

[Bug c++/110383] New: Missed optimiztion: Default operator== with trivial struct can be replaced with bcmp

2023-06-23 Thread madhur4127 at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110383

Bug ID: 110383
   Summary: Missed optimiztion: Default operator== with trivial
struct can be replaced with bcmp
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: madhur4127 at gmail dot com
  Target Milestone: ---

Godbolt: https://compiler-explorer.com/z/Gr8E5zfo5

Default operator== produces inefficient assembly where it checks for each data
member therefore having a lot of branches. Clang produces better assembly with
memcmp / bcmp.

I know its hard to generalize performance claims but in the worst case having 8
branches (one for each member), is not more efficient than a bcmp.