[Bug libstdc++/66416] string::find_last_of 3.5 times slower than memrchr

2020-03-24 Thread hiraditya at msn dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66416

AK  changed:

   What|Removed |Added

 CC||hiraditya at msn dot com

--- Comment #2 from AK  ---
Could it be because string::find is more optimized than string::rfind?
see: https://gcc.gnu.org/legacy-ml/libstdc++/2017-01/msg00034.html

[Bug libstdc++/66414] string::find ten times slower than strstr

2020-03-24 Thread hiraditya at msn dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66414

AK  changed:

   What|Removed |Added

 CC||hiraditya at msn dot com

--- Comment #8 from AK  ---
Should we consider this fixed?

[Bug libstdc++/93584] std::string::find_first_not_of is about 9X slower than strspn

2020-03-24 Thread hiraditya at msn dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93584

AK  changed:

   What|Removed |Added

 CC||hiraditya at msn dot com

--- Comment #3 from AK  ---
Could it be because string::find_first_not_of is not as optimized as
string::find?

https://github.com/gcc-mirror/gcc/commit/cb627cdf5c0761f9e1be587a1416db9446a4801b
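
For anyone comparing the two calls, here is a minimal sketch of how they line up. The helper names are hypothetical, and the equivalence assumes a string with no embedded NUL characters; `find_first_not_of` reports "no mismatch" as `npos`, whereas `strspn` returns the full length.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Length of the leading run of characters drawn from `accept`, via the C library.
std::size_t prefix_len_strspn(const std::string& s, const char* accept) {
  return std::strspn(s.c_str(), accept);
}

// Same quantity via std::string; npos (no mismatch found) maps to s.size().
std::size_t prefix_len_member(const std::string& s, const char* accept) {
  const std::size_t pos = s.find_first_not_of(accept);
  return pos == std::string::npos ? s.size() : pos;
}
```

A benchmark comparing the two would time these helpers on the same input; no timings are claimed here.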

[Bug libstdc++/94747] New: Undefined behavior: integer overflow in libsupc++/dyncast.cc

2020-04-24 Thread hiraditya at msn dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94747

Bug ID: 94747
   Summary: Undefined behavior: integer overflow in
libsupc++/dyncast.cc
   Product: gcc
   Version: 7.5.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: libstdc++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: hiraditya at msn dot com
  Target Milestone: ---

Integer overflow reported by asan with the following stack trace. If this is
not sufficient, I can try to provide a repro.

gcc/7.x/libstdc++-v3/libsupc++/dyncast.cc:53:11: runtime error: negation of 16
cannot be represented in type 'unsigned long'
> #0 in __dynamic_cast gcc/7.x/libstdc++-v3/libsupc++/dyncast.cc:53
> #1 in bool std::has_facet >(std::locale const&) 
> gcc/7.x/.../bits/locale_classes.tcc:110
> #2 in std::basic_ios 
> >::_M_cache_locale(std::locale const&) gcc/7.x/.../bits/basic_ios.tcc:159
> #3 in std::basic_ios 
> >::init(std::basic_streambuf >*) 
> gcc/7.x/.../bits/basic_ios.tcc:132
> #4 in std::basic_ostream 
> >::basic_ostream(std::basic_streambuf >*) 
> gcc/7.x/.../ostream:85
> #5 in std::ios_base::Init::Init() 
> gcc/7.x/libstdc++-v3/src/c++98/ios_init.cc:91
> #6 in __cxx_global_var_init gcc/7.x/.../iostream:74
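
As background for the message above: in standard C++, arithmetic on unsigned types wraps modulo 2^N, so negating an unsigned value is well defined even though a sanitizer can be configured to report it. A minimal sketch follows; the helper names are hypothetical and this is not the code in dyncast.cc.

```cpp
#include <cstddef>
#include <cstdint>

// Negating an unsigned value yields 2^N - value (modular arithmetic).
std::size_t negate_unsigned(std::size_t value) {
  return 0 - value;
}

// Hypothetical helper: step an address back by `offset` bytes by adding the
// negated unsigned offset. The addition wraps around to the expected address.
const void* step_back(const void* p, std::size_t offset) {
  const std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(p);
  return reinterpret_cast<const void*>(addr + negate_unsigned(offset));
}
```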

[Bug libstdc++/94823] modulo arithmetic bug in random.tcc

2020-04-28 Thread hiraditya at msn dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94823

AK  changed:

   What|Removed |Added

 CC||hiraditya at msn dot com

--- Comment #1 from AK  ---
Here's the partial stack trace in case it helps

```
in bits/random.tcc:3274:20: runtime error: unsigned integer overflow: 0 - 1
cannot be represented in type 'unsigned long'

in void std::seed_seq::generate(unsigned int*, unsigned int*) 

in std::enable_if::value, void>::type
__gnu_cxx::simd_fast_mersenne_twister_engine::seed(std::seed_seq&) ext/random.tcc:110

in __gnu_cxx::simd_fast_mersenne_twister_engine::simd_fast_mersenne_twister_engine(std::seed_seq&) ext/random:104
```

[Bug libstdc++/94823] modulo arithmetic bug in random.tcc

2020-04-28 Thread hiraditya at msn dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94823

--- Comment #4 from AK  ---
Makes sense. Thanks for the explanation.

[Bug libstdc++/94823] modulo arithmetic bug in random.tcc

2020-04-28 Thread hiraditya at msn dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94823

--- Comment #5 from AK  ---
> So when __k == 0, then all three of those loads will be _Type(0x8b8b8b8bu) 
> really; no matter what the values of __n, __p will be.


Would it be a good idea to add this explanation as a comment, since the code may
be tricky for someone to understand in the future?
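
To illustrate why the sanitizer report looks alarming, here is a minimal sketch. It assumes the flagged expression has the shape `(k - 1) % n` on unsigned operands; the function names are hypothetical and this is not the libstdc++ code. With `k == 0` the subtraction wraps, and the wrapped index differs from the mathematical `n - 1` unless `n` divides 2^N. That only matters if the elements being indexed can differ, which the quoted explanation says is not the case when `__k == 0`.

```cpp
#include <cstddef>

// Index computed with unsigned wraparound: for k == 0, k - 1 is SIZE_MAX.
std::size_t wrapped_index(std::size_t k, std::size_t n) {
  return (k - 1) % n;
}

// Index computed so that k == 0 maps to n - 1 (mathematical modulo).
std::size_t modular_index(std::size_t k, std::size_t n) {
  return (k % n + n - 1) % n;
}
```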

[Bug tree-optimization/95565] New: [Feature request] add a flag to only instrument function entry.

2020-06-06 Thread hiraditya at msn dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95565

Bug ID: 95565
   Summary: [Feature request] add a flag to only instrument
function entry.
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: hiraditya at msn dot com
  Target Milestone: ---

The flag -finstrument-functions instruments both the entry and the exit of every
function. There are many scenarios (like cheap profiling) where instrumenting
only the function entry is sufficient, but gcc instruments the exit as well,
which contributes to an unwanted code size increase.
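
For context, a minimal sketch of the two hooks that -finstrument-functions calls. The hook names and signatures are the ones GCC documents; the counters are hypothetical. A cheap entry-count profiler only needs the first hook, yet the exit hook must still exist and is still called.

```cpp
#include <cstddef>

// Counters updated by the hooks (hypothetical names for this sketch).
static std::size_t g_entries = 0;
static std::size_t g_exits = 0;

extern "C" {
// Called on entry to each instrumented function.
__attribute__((no_instrument_function))
void __cyg_profile_func_enter(void* this_fn, void* call_site) {
  (void)this_fn;
  (void)call_site;
  ++g_entries;
}

// Called just before each instrumented function returns.
__attribute__((no_instrument_function))
void __cyg_profile_func_exit(void* this_fn, void* call_site) {
  (void)this_fn;
  (void)call_site;
  ++g_exits;
}
}
```

The hooks only fire when the rest of the program is compiled with -finstrument-functions.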

[Bug tree-optimization/95565] [Feature request] add a flag to only instrument function entry.

2020-06-06 Thread hiraditya at msn dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95565

--- Comment #1 from AK  ---
I believe we need to conditionally disable the following code, but I'm not sure
of all the implications. If someone can implement it that'd be great.

```
gcc/gimplify.c   Around Line:14997

  x = builtin_decl_implicit (BUILT_IN_RETURN_ADDRESS);
  call = gimple_build_call (x, 1, integer_zero_node);
  tmp_var = create_tmp_var (ptr_type_node, "return_addr");
  gimple_call_set_lhs (call, tmp_var);
  gimplify_seq_add_stmt (&cleanup, call);
  x = builtin_decl_implicit (BUILT_IN_PROFILE_FUNC_EXIT);
  call = gimple_build_call (x, 2, this_fn_addr, tmp_var);
  gimplify_seq_add_stmt (&cleanup, call);
  tf = gimple_build_try (seq, cleanup, GIMPLE_TRY_FINALLY);


```

[Bug tree-optimization/92638] New: gcc unable to remove empty loop after loop body is removed

2019-11-23 Thread hiraditya at msn dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92638

Bug ID: 92638
   Summary: gcc unable to remove empty loop after loop body is
removed
   Product: gcc
   Version: 10.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: hiraditya at msn dot com
  Target Milestone: ---

$cat test.c

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

char* get(const char* value, const char separator) {
  int separator_index = strchr(value, separator) - value;
  char* result = (char*)malloc(separator_index);
  memcpy(result, value, separator_index);
  result[separator_index] = '\0';

  return result;
}

int main() {
  const char separator = ',';
  clock_t t = clock();

  for (size_t i = 0; i < 1; ++i) {
    free(get("127.0.0.1, 127.0.0.2:", separator));
  }

  float elapsed_seconds = (((double)(clock() - t)) / CLOCKS_PER_SEC);
  printf("%f seconds.\n", elapsed_seconds);

  return 0;
}


$ gcc -O3 -S -o-

get(char const*, char):
        push    rbp
        movsx   esi, sil
        mov     rbp, rdi
        push    rbx
        sub     rsp, 8
        call    strchr
        sub     rax, rbp
        movsx   rbx, eax
        mov     rdi, rbx
        call    malloc
        mov     rdx, rbx
        mov     rsi, rbp
        mov     rdi, rax
        call    memcpy
        mov     BYTE PTR [rax+rbx], 0
        add     rsp, 8
        pop     rbx
        pop     rbp
        ret
.LC1:
        .string "%f seconds.\n"
main:
        push    rbx
        call    clock
        mov     rbx, rax
        mov     eax, 1
.L5:
        sub     rax, 1 // <-- Loop body still there.
        jne     .L5
        call    clock
        pxor    xmm0, xmm0
        mov     edi, OFFSET FLAT:.LC1
        sub     rax, rbx
        cvtsi2sd        xmm0, rax
        mov     eax, 1
        divsd   xmm0, QWORD PTR .LC0[rip]
        cvtsd2ss        xmm0, xmm0
        cvtss2sd        xmm0, xmm0
        call    printf
        xor     eax, eax
        pop     rbx
        ret
.LC0:
        .long   0
        .long   1093567616

[Bug tree-optimization/92638] gcc unable to remove empty loop after loop body is removed

2019-11-23 Thread hiraditya at msn dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92638

--- Comment #1 from AK  ---
FYI: clang -O3 optimizes the empty loop.
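
A side note on the test case: get() allocates separator_index bytes and then writes the terminator at index separator_index, one byte past the allocation, and it does not check for a missing separator. Here is a minimal sketch of a variant without those problems. The name get_prefix is hypothetical, and whether this changes the loop-removal behaviour has not been verified.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Returns a malloc'ed copy of `value` up to (not including) `separator`,
// or nullptr if the separator is absent or the allocation fails.
char* get_prefix(const char* value, char separator) {
  const char* hit = std::strchr(value, separator);
  if (hit == nullptr) return nullptr;
  const std::size_t len = static_cast<std::size_t>(hit - value);
  char* result = static_cast<char*>(std::malloc(len + 1));  // +1 for '\0'
  if (result == nullptr) return nullptr;
  std::memcpy(result, value, len);
  result[len] = '\0';
  return result;
}
```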

[Bug tree-optimization/85610] New: Unable to optimize away mov followed by compare into a cmpb in case of atomic_load

2018-05-02 Thread hiraditya at msn dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85610

Bug ID: 85610
   Summary: Unable to optimize away mov followed by compare into a
cmpb in case of atomic_load
   Product: gcc
   Version: tree-ssa
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: hiraditya at msn dot com
  Target Milestone: ---

$ cat test.cpp

#include <atomic>
std::atomic<bool> flag_atomic{false};

extern void f1();
extern void f2();

void foo() {
    bool b = flag_atomic.load(std::memory_order_relaxed);
    if (b == false) {
        f1();
    } else {
        f2();
    }
}


$ g++-7 -O3 -S -o - test.cpp -std=c++14

__Z3foov:
LFB342:
  movzbl  _flag_atomic(%rip), %eax
  testb %al, %al 
  je  L4  
  jmp __Z2f2v
  .align 4,0x90
L4:
  jmp __Z2f1v


We could just use `cmpb $0, _flag_atomic(%rip)` and avoid a register in this
case. When _flag_atomic is a scalar boolean global variable, that's what
happens.

[Bug tree-optimization/85611] New: Suboptimal code generation for (potentially) redundant atomic loads

2018-05-02 Thread hiraditya at msn dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85611

Bug ID: 85611
   Summary: Suboptimal code generation for (potentially) redundant
atomic loads
   Product: gcc
   Version: tree-ssa
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: hiraditya at msn dot com
  Target Milestone: ---

$ cat test.cpp
#include <atomic>

std::atomic<int> atomic_var{100};
int somevar;
bool cond;

void run1() {
    auto a = atomic_var.load(std::memory_order_relaxed);
    auto b = atomic_var.load(std::memory_order_relaxed);
    // Some code using a and b;
}

void run2() {
    if (atomic_var.load(std::memory_order_relaxed) == 2 && cond) {
        if (atomic_var.load(std::memory_order_relaxed) * somevar > 3) {
            /*...*/
        }
    }
}


$ g++-7 -O3 -std=c++17 -S -o - test.cpp -fno-exceptions

        .text
        .align 4,0x90
        .globl __Z4run1v
__Z4run1v:
LFB339:
        movl    _atomic_var(%rip), %eax
        movl    _atomic_var(%rip), %eax
        ret
LFE339:
        .align 4,0x90
        .globl __Z4run2v
__Z4run2v:
LFB340:
        movl    _atomic_var(%rip), %eax
        cmpl    $2, %eax
        je      L5
L3:
        ret
        .align 4,0x90
L5:
        cmpb    $0, _cond(%rip)
        je      L3
        movl    _atomic_var(%rip), %eax
        ret
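
For comparison, a minimal sketch of doing the merge by hand: read the atomic once into a local and reuse the value. All names are hypothetical. This is not strictly the same program as one with two loads, since another thread may store between them, which is why the loads are only potentially redundant.

```cpp
#include <atomic>

std::atomic<int> counter{100};  // hypothetical stand-in for atomic_var
int scale = 2;                  // hypothetical stand-in for somevar
bool enabled = true;            // hypothetical stand-in for cond

// Issues a single relaxed load and reuses the value for both tests.
bool check_once() {
  const int v = counter.load(std::memory_order_relaxed);
  return v == 2 && enabled && v * scale > 3;
}
```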

[Bug rtl-optimization/87238] New: Redundant Restore of $x0 when memcpy always returns the first argument.

2018-09-05 Thread hiraditya at msn dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87238

Bug ID: 87238
   Summary: Redundant Restore of $x0 when memcpy always returns
the first argument.
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: rtl-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: hiraditya at msn dot com
  Target Milestone: ---

$ cat test.cpp

struct BigStruct {
  int x[64];
};

void structByValue(BigStruct s);

void callStructByValue(int unused, int unused2, BigStruct s) {
  structByValue(s);
}

$ g++ -O3 -arch arm64 test.cpp -S -o -

callStructByValue(int, int, BigStruct):
  stp x29, x30, [sp, -272]!
  mov x1, x2
  mov x2, 256
  add x29, sp, 0
  add x0, x29, 16 << 
  bl memcpy
  add x0, x29, 16 << redundant
  bl structByValue(BigStruct)
  ldp x29, x30, [sp], 272
  ret


We could simply remove the second 'add x0, x29, 16', as memcpy is guaranteed to
return the pointer to the destination.
http://man7.org/linux/man-pages/man3/memcpy.3.html


Possibly a duplicate of PR82991, but I'm not sure.
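
The guarantee this relies on can be shown with a minimal sketch; Block and copy_block are hypothetical names standing in for the struct copy in the report.

```cpp
#include <cstring>

struct Block {
  int x[64];
};

// Copies *src into *dst and returns whatever memcpy returned, which the
// C standard specifies to be the destination pointer.
void* copy_block(Block* dst, const Block* src) {
  return std::memcpy(dst, src, sizeof(Block));
}
```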

[Bug tree-optimization/87505] New: Vectorizer generates a lot of code for a small loop

2018-10-03 Thread hiraditya at msn dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87505

Bug ID: 87505
   Summary: Vectorizer generates a lot of code for a small loop
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: hiraditya at msn dot com
  Target Milestone: ---

test.cpp

#include <cstddef>

int bar(int* v, std::size_t base) {
    int sum = 0;
    for (int i = base; i < base + 4; ++i) {
        sum += v[i];
    }
    return sum;
}


$ gcc-8.2 -std=c++17 -O3 -DNDEBUG test.cpp

bar(int*, unsigned long):
        movslq  %esi, %rcx
        leaq    4(%rsi), %r8
        movl    %esi, %edx
        cmpq    %r8, %rcx
        jnb     .L7
        leaq    3(%rsi), %rax
        movq    %r8, %r9
        subq    %rcx, %rax
        subq    %rcx, %r9
        cmpq    $3, %rax
        jbe     .L8
        movq    %r9, %rdx
        leaq    (%rdi,%rcx,4), %rax
        pxor    %xmm0, %xmm0
        shrq    $2, %rdx
        salq    $4, %rdx
        addq    %rax, %rdx
.L5:
        movdqu  (%rax), %xmm2
        addq    $16, %rax
        paddd   %xmm2, %xmm0
        cmpq    %rdx, %rax
        jne     .L5
        movdqa  %xmm0, %xmm1
        movq    %r9, %r10
        psrldq  $8, %xmm1
        andq    $-4, %r10
        paddd   %xmm1, %xmm0
        addq    %r10, %rcx
        leal    (%rsi,%r10), %edx
        movdqa  %xmm0, %xmm1
        psrldq  $4, %xmm1
        paddd   %xmm1, %xmm0
        movd    %xmm0, %eax
        cmpq    %r10, %r9
        je      .L10
.L3:
        addl    (%rdi,%rcx,4), %eax
        leal    1(%rdx), %ecx
        movslq  %ecx, %rcx
        cmpq    %r8, %rcx
        jnb     .L1
        addl    (%rdi,%rcx,4), %eax
        leal    2(%rdx), %ecx
        movslq  %ecx, %rcx
        cmpq    %rcx, %r8
        jbe     .L1
        addl    $3, %edx
        addl    (%rdi,%rcx,4), %eax
        movslq  %edx, %rdx
        cmpq    %rdx, %r8
        jbe     .L1
        addl    (%rdi,%rdx,4), %eax
        ret
.L7:
        xorl    %eax, %eax
.L1:
        ret
.L10:
        ret
.L8:
        xorl    %eax, %eax
        jmp     .L3
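
For reference, a minimal sketch of what the loop computes when the trip count really is four. The name sum4 is hypothetical. The equivalence assumes that base fits in an int and that base + 4 does not wrap; the loop in the report mixes an int index with a std::size_t bound, so the compiler may not be able to assume either.

```cpp
#include <cstddef>

// Sum of the four elements v[base] .. v[base + 3], written without a loop.
int sum4(const int* v, std::size_t base) {
  return v[base] + v[base + 1] + v[base + 2] + v[base + 3];
}
```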

[Bug c++/17913] ICE jumping into statement expression

2018-02-18 Thread hiraditya at msn dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=17913

AK  changed:

   What|Removed |Added

 CC||hiraditya at msn dot com

--- Comment #25 from AK  ---
I think this bug is fixed:


void f(void) { 1 ? 1 : ({ a : 1; 1; }); goto a; }


g++-7.3 -O3 -std=c++11 test.c -S -o -



f():
.L2:
jmp .L2

[Bug c++/22238] Awful error messages with virtual functions

2018-02-18 Thread hiraditya at msn dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=22238

AK  changed:

   What|Removed |Added

 CC||hiraditya at msn dot com

--- Comment #24 from AK  ---
The recent error messages look much better. Maybe we can close this.

prog.cpp: In member function ‘void A::bar()’:
prog.cpp:6:23: error: could not convert ‘A::foo()’ from ‘void’ to ‘bool’
   void bar() { if (foo()) ; }

[Bug c++/12333] [DR 272] Explicit call to MyClass::~MyClass() not allowed

2018-02-20 Thread hiraditya at msn dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=12333

--- Comment #17 from AK  ---
The following workarounds do not emit compiler errors, although I'm not sure
whether the second option is a correct workaround.

1. this->~X();
2. X::~X(0);

FYI,
ICC 18 also has the same bug. The first workaround works for ICC but not the
second one.
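
A minimal sketch of the first workaround in context, with an object built by placement new and destroyed through `this->~X()`. The type Tracked and the helper are hypothetical; the second workaround is not shown.

```cpp
#include <new>

struct Tracked {
  static int destroyed;  // counts destructor runs
  ~Tracked() { ++destroyed; }
  // Workaround 1: spell the explicit destructor call through `this`.
  void destroy_self() { this->~Tracked(); }
};
int Tracked::destroyed = 0;

// Constructs a Tracked in local raw storage, destroys it explicitly, and
// returns the running destructor count.
int construct_and_destroy() {
  alignas(Tracked) unsigned char storage[sizeof(Tracked)];
  Tracked* p = new (storage) Tracked;
  p->destroy_self();
  return Tracked::destroyed;
}
```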

[Bug c++/87628] New: Redundant check of pointer when delete is called

2018-10-16 Thread hiraditya at msn dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87628

Bug ID: 87628
   Summary: Redundant check of pointer when delete is called
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: hiraditya at msn dot com
  Target Milestone: ---

https://godbolt.org/z/DY9ruv

void if_delete(char *p) {
if (p) {
delete(p);
}
}

$ gcc-8.2 -Os -fno-exceptions

if_delete(char*):
  test rdi, rdi
  je .L1
  mov esi, 1
  jmp operator delete(void*, unsigned long)
.L1:
  ret


While clang removes the check at -Oz:

$ clang -Oz -fno-exceptions
if_delete(char*):
jmp operator delete(void*)
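
The two forms being compared, as a minimal sketch. A delete-expression applied to a null pointer has no effect, so the guard is redundant as far as observable behaviour goes. plain_delete is a hypothetical name.

```cpp
// Guarded form from the report.
void if_delete(char* p) {
  if (p) {
    delete p;
  }
}

// Unguarded form: deleting a null pointer is a no-op.
void plain_delete(char* p) {
  delete p;
}
```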

[Bug libstdc++/79349] unused std::string is not optimized away in presense of a call

2017-05-02 Thread hiraditya at msn dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79349

AK  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |WORKSFORME

--- Comment #2 from AK  ---
The problem is exceptions. When I compile without exceptions (-fno-exceptions)
g++ does optimize this away and gives same output as clang. It seems clang++
compiles without exceptions by default and behaves like g++ when -fexceptions
is passed.

[Bug libstdc++/79349] unused std::string is not optimized away in presense of a call

2017-05-02 Thread hiraditya at msn dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79349

AK  changed:

   What|Removed |Added

 Status|RESOLVED|NEW
 Resolution|WORKSFORME  |---

--- Comment #3 from AK  ---
(In reply to AK from comment #2)
> The problem is exceptions. When I compile without exceptions
> (-fno-exceptions) g++ does optimize this away and gives same output as
> clang. It seems clang++ compiles without exceptions by default and behaves
> like g++ when -fexceptions is passed.

Correction:
For the example:
#include <string>

int main() {
  std::string s("abc");
  return 0;
}

clang (with libc++) optimizes away the std::string even in the presence of
exceptions (-fexceptions). When I compile without exceptions (-fno-exceptions)
g++ does optimize the std::string away.


However, when I introduce a call:

#include <string>

void foo();

int main() {
  std::string s("abc");
  foo();
  return 0;
}

clang++ still optimizes the std::string but g++ does not. I think the problem
is with libstdc++ because when clang is using libstdc++ I can see the
destructor.


Sorry for the confusion.

[Bug libstdc++/78702] New: [libstdc++] class __shim in locale::facet is private

2016-12-06 Thread hiraditya at msn dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78702

Bug ID: 78702
   Summary: [libstdc++] class __shim in locale::facet is private
   Product: gcc
   Version: 7.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: libstdc++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: hiraditya at msn dot com
  Target Milestone: ---

In file: include/bits/locale_classes.h
371   class locale::facet
372   {
...
465 class __shim;
466
467 const facet* _M_sso_shim(const id*) const;
468 const facet* _M_cow_shim(const id*) const;

However in file: src/c++11/cxx11-shim_facets.cc
numpunct_shim derives from facet::__shim which results in compilation error.

227   namespace // unnamed
228   {
229 template<typename _CharT>
230   struct numpunct_shim : std::numpunct<_CharT>, facet::__shim
231   {
232 typedef typename numpunct<_CharT>::__cache_type __cache_type;
233
234 // f must point to a type derived from numpunct[abi:other]
235 numpunct_shim(const facet* f, __cache_type* c = new __cache_type)
236 : std::numpunct<_CharT>(c), __shim(f), _M_cache(c)
237 {
238   __numpunct_fill_cache(other_abi{}, f, c);
239 }


What could be a possible fix here?

[Bug libstdc++/78702] [libstdc++] class __shim in locale::facet is private

2016-12-06 Thread hiraditya at msn dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78702

--- Comment #2 from AK  ---
Sorry for the confusion, I was using clang++ (trunk) to build libstdc++

[Bug libstdc++/78702] [libstdc++] class __shim in locale::facet is private

2016-12-06 Thread hiraditya at msn dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78702

--- Comment #3 from AK  ---
llvm-project/install/bin/clang++ -std=gnu++14 -D_GLIBCXX_SHARED
-fno-implicit-templates -Wall -Wextra -Wwrite-strings -Wcast-qual -Wabi
-fdiagnostics-show-location=once -ffunction-sections -fdata-sections
-frandom-seed=cxx11-shim_facets.lo -c ../../../src/c++11/cxx11-shim_facets.cc 
-fPIC -DPIC -D_GLIBCXX_SHARED -o cxx11-shim_facets.o (+ some include flags from
Makefile)


../../../src/c++11/cxx11-shim_facets.cc:230:60: error: '__shim' is a private
member of 'std::locale::facet'
  struct numpunct_shim : std::numpunct<_CharT>, facet::__shim
   ^
~/s/work/gcc/libstdc++-v3/build/include/bits/locale_classes.h:464:11: note:
declared private here
class __shim;


clang version 4.0.0
Target: x86_64-unknown-linux-gnu

[Bug libstdc++/78717] New: no definition of string::find when lowered to gimple

2016-12-07 Thread hiraditya at msn dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78717

Bug ID: 78717
   Summary: no definition of string::find when lowered to gimple
   Product: gcc
   Version: 7.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: libstdc++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: hiraditya at msn dot com
  Target Milestone: ---

$ cat test.cpp

#include <string>

int foo(const std::string &s1, const std::string &s2, int i) {
 return s1.find(s2) == i;
}


../gcc/install/usr/bin/g++ -S -o a.s ../a.cpp -fdump-tree-all-all

$ cat a.cpp.004t.gimple

int foo(const string&, const string&, int) (const struct string & s1, const
struct string & s2, int i)
{
  intD.9 D.27718;

  # USE = anything
  # CLB = anything
  _1 =
_ZNKSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEE4findERKS4_mD.18492
(s1D.24055, s2D.24056, 0);
  _2 = (long unsigned intD.14) iD.24057;
  _3 = _1 == _2;
  D.27718 = (intD.9) _3;
  return D.27718;
}


The problem is that the inliner now cannot see the definition of
std::string::find and hence cannot inline it. This may be because
std::basic_string is an extern template, but I would hope that at least the
definition would be visible to the optimizer. That would help improve the
performance of programs using string::find.
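
For readers unfamiliar with the mechanism suspected here, a minimal sketch of an extern template on a toy function template. The names are hypothetical and this is not the libstdc++ declaration. The explicit instantiation declaration tells the compiler that the specialization is instantiated elsewhere, so it does not need to instantiate it implicitly at the call site.

```cpp
// A function template whose definition is visible, as it would be in a header.
template <typename T>
T twice(T value) {
  return value + value;
}

// Explicit instantiation declaration: twice<int> is provided elsewhere.
extern template int twice<int>(int);

// Explicit instantiation definition: normally placed in exactly one .cc file.
// It is kept in the same block here only so that the sketch links on its own.
template int twice<int>(int);

int call_twice(int x) {
  return twice(x);
}
```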

Thanks,

[Bug libstdc++/66414] string::find ten times slower than strstr

2016-12-07 Thread hiraditya at msn dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66414

AK  changed:

   What|Removed |Added

 CC||hiraditya at msn dot com

--- Comment #6 from AK  ---
I have posted a patch up for review for string::find which might help as well.

https://gcc.gnu.org/ml/libstdc++/2016-12/msg00051.html

Please give feedback for improvement.
-Aditya

[Bug libstdc++/78717] no definition of string::find when lowered to gimple

2016-12-10 Thread hiraditya at msn dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78717

--- Comment #2 from AK  ---
With -O3 I see only the following definition of find, which merely calls the
real find function that I was expecting to be visible in the gimple.

  D.12805 =
_ZNKSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEE4findERKS4_mD.11635
(s1D.12794, s2D.12795, 0);

std::__cxx11::basic_string<_CharT, _Traits, _Alloc>::size_type
std::__cxx11::basic_string<_CharT, _Traits, _Alloc>::find(const
std::__cxx11::basic_string<_CharT, _Traits, _Alloc>&,
std::__cxx11::basic_string<_CharT, _Traits, _Alloc>::size_type) const

But after I #defined _GLIBCXX_DEBUG in test.cpp (before including string), I
can see the find function.

[Bug middle-end/79349] New: unused std::string is not optimized away in presense of a call

2017-02-02 Thread hiraditya at msn dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79349

Bug ID: 79349
   Summary: unused std::string is not optimized away in presense
of a call
   Product: gcc
   Version: 7.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: hiraditya at msn dot com
  Target Milestone: ---

g++ version (GCC) 7.0.0 20170118 (experimental)

$ cat t.cpp

#include <string>
void foo();

int main() {
  std::string s("abc");
  foo ();
  return 0;
}

$ install/bin/g++ -O3 t.cpp -S -o t.s
$ cat t.s

main:
.LFB995:
        .cfi_startproc
        .cfi_personality 0x3,__gxx_personality_v0
        .cfi_lsda 0x3,.LLSDA995
        pushq   %rbx
        .cfi_def_cfa_offset 16
        .cfi_offset 3, -16
        subq    $32, %rsp
        .cfi_def_cfa_offset 48
        leaq    16(%rsp), %rax
        movb    $99, 18(%rsp)
        movq    $3, 8(%rsp)
        movb    $0, 19(%rsp)
        movq    %rax, (%rsp)
        movl    $25185, %eax
        movw    %ax, 16(%rsp)
.LEHB0:
        call    _Z3foov
.LEHE0:
        movq    (%rsp), %rdi
        leaq    16(%rsp), %rax
        cmpq    %rax, %rdi
        je      .L6
        call    _ZdlPv
.L6:
        addq    $32, %rsp
        .cfi_remember_state
        .cfi_def_cfa_offset 16
        xorl    %eax, %eax
        popq    %rbx
        .cfi_def_cfa_offset 8
        ret
.L5:
        .cfi_restore_state
        movq    (%rsp), %rdi
        leaq    16(%rsp), %rdx
        movq    %rax, %rbx
        cmpq    %rdx, %rdi
        je      .L4
        call    _ZdlPv
.L4:
        movq    %rbx, %rdi
.LEHB1:
        call    _Unwind_Resume
.LEHE1:
        .cfi_endproc
.LFE995:
        .globl  __gxx_personality_v0
        .section        .gcc_except_table,"a",@progbits


While clang++ optimizes it away: clang version 5.0.0 (llvm-project SHA:
28b7c19c2379e17b26571260933467b9f98b449c)


$ ./bin/clang++ -O3 t.cpp -S -o t.s -stdlib=libc++
$ cat t.s

main:                                   # @main
        .cfi_startproc
# BB#0:                                 # %entry
        pushq   %rax
.Lcfi0:
        .cfi_def_cfa_offset 16
        callq   _Z3foov
        xorl    %eax, %eax
        popq    %rcx
        retq
.Lfunc_end0:
        .size   main, .Lfunc_end0-main
        .cfi_endproc


        .ident  "clang version 5.0.0 "
        .section        ".note.GNU-stack","",@progbits

[Bug libstdc++/80331] New: unused const std::string not optimized away

2017-04-05 Thread hiraditya at msn dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80331

Bug ID: 80331
   Summary: unused const std::string not optimized away
   Product: gcc
   Version: 7.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: libstdc++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: hiraditya at msn dot com
  Target Milestone: ---

$ cat t.cpp
#include <string>
int sain() {
  const std::string s("a");
  return 0;
}

# gcc version 7.0.0 20170118 (experimental) (GCC)
$ g++ -S -o t.s t.cpp -O2 -fno-exceptions -std=c++11

$ cat t.s
        .type   _Z4sainv, @function
_Z4sainv:
.LFB940:
        .cfi_startproc
        pushq   %rbx
        .cfi_def_cfa_offset 16
        .cfi_offset 3, -16
        movl    $.LC0+1, %esi
        subq    $32, %rsp
        .cfi_def_cfa_offset 48
        leaq    16(%rsp), %rbx
        movq    %rsp, %rdi
        movq    %rbx, (%rsp)
        call    _ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEE12_M_constructIPKcEEvT_S8_St20forward_iterator_tag.isra.16.constprop.20
        movq    (%rsp), %rdi
        cmpq    %rbx, %rdi
        je      .L13
        call    _ZdlPv
.L13:
        addq    $32, %rsp
        .cfi_def_cfa_offset 16
        xorl    %eax, %eax
        popq    %rbx
        .cfi_def_cfa_offset 8
        ret
        .cfi_endproc
.LFE940:
        .size   _Z4sainv, .-_Z4sainv


clang++, on the other hand, optimizes the const string away completely.

        .type   _Z4sainv,@function
_Z4sainv:                               # @_Z4sainv
        .cfi_startproc
# BB#0:
        xorl    %eax, %eax
        retq
.Lfunc_end0:
        .size   _Z4sainv, .Lfunc_end0-_Z4sainv

[Bug tree-optimization/82776] New: Unable to optimize the loop when iteration count is unavailable.

2017-10-30 Thread hiraditya at msn dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82776

Bug ID: 82776
   Summary: Unable to optimize the loop when iteration count is
unavailable.
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: hiraditya at msn dot com
  Target Milestone: ---

Compiling with 

g++ -O2 --std=c++14 -msse4 -DENABLE_FORLOOP
vs.
g++ -O2 --std=c++14 -msse4

gives dramatically different results in the sense that the loop is completely
optimized when for-loop is present instead of `while(true)` loop. Reproduces
with g++-7.2 and g++-trunk.


$ cat test.cpp

#include 
#include 
#include 
#include 
#include 
#include 
#include 

struct Chunk {
  std::array tags_;
  uint8_t control_;

  bool eof() const {
    return (control_ & 1) != 0;
  }

  static constexpr unsigned kFullMask = (1 << 14) - 1;

  __m128i const* tagVector() const {
    return static_cast<__m128i const*>(static_cast<void const*>(&tags_[0]));
  }

  unsigned emptyMask() const {
    auto tagV = _mm_load_si128(tagVector());
    auto emptyTagV = _mm_cmpeq_epi8(tagV, _mm_setzero_si128());
    return _mm_movemask_epi8(emptyTagV) & kFullMask;
  }

  unsigned occupiedMask() const {
    return emptyMask() ^ kFullMask;
  }
};

#define LIKELY(x) __builtin_expect((x), true)
#define UNLIKELY(x) __builtin_expect((x), false)

struct Iter {
  Chunk* chunk_;
  std::size_t index_;

  void advance() {
    // common case is packed entries
    while (index_ > 0) {
      --index_;
      if (LIKELY(chunk_->tags_[index_] != 0)) {
        return;
      }
    }

    // bar only skips the work of advance() if this loop can
    // be guaranteed to terminate
#ifdef ENABLE_FORLOOP
    for (std::size_t i = 1; i != 0; ++i) {
#else
    while (true) {
#endif
      // exhausted the current chunk
      if (chunk_->eof()) {
        chunk_ = nullptr;
        break;
      }
      ++chunk_;
      auto m = chunk_->occupiedMask();
      if (m != 0) {
        index_ = 31 - __builtin_clz(m);
        break;
      }
    }
  }
};

static Iter foo(Iter iter) {
  puts("hello");
  iter.advance();
  return iter;
}

void bar(Iter iter) {
  foo(iter);
}

[Bug tree-optimization/82776] Unable to optimize the loop when iteration count is unavailable.

2017-11-01 Thread hiraditya at msn dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82776

--- Comment #9 from AK  ---
Are we also taking advantage of this statement in the standard:

> An iteration statement that performs no input/output operations, does not
> access volatile objects, and performs no synchronization or atomic
> operations in its body, controlling expression, or (in the case of a for
> statement) its expression may be assumed by the implementation to terminate.

[Bug rtl-optimization/82889] New: Unnecessary sign extension of int32 to int64

2017-11-07 Thread hiraditya at msn dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82889

Bug ID: 82889
   Summary: Unnecessary sign extension of int32 to int64
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: rtl-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: hiraditya at msn dot com
  Target Milestone: ---

$ cat t.cpp
#include <cstdint>

int lol(int32_t* table, int32_t* ht, uint32_t hash, uint32_t mask) {
    for (uint64_t probe = (uint32_t)hash & mask, i = 1;; ++i) {
        int32_t pos = ht[probe];
        if (pos >= 0) {
            if (table[pos] == 42) {
                return true;
            }
        } else if (pos & 1) {
            return false;
        }
        probe += i;
        probe &= mask;
    }
    // notreached
}

compile with:
gcc -std=c++11 -O3 -s -o -


lol(int*, int*, unsigned int, unsigned int):
        andl    %ecx, %edx
        movl    $1, %r8d
        movl    %ecx, %ecx
        jmp     .L5
.L10:
        cmpl    $42, (%rdi,%rax,4)
        je      .L9
.L4:
        addq    %r8, %rdx
        addq    $1, %r8
        andq    %rcx, %rdx
.L5:
        movslq  (%rsi,%rdx,4), %rax     # <-- sign extended
        testl   %eax, %eax
        jns     .L10
        testb   $1, %al
        je      .L4
        xorl    %eax, %eax
        ret
.L9:
        movl    $1, %eax
        ret


Is it possible to get rid of this?
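
One source-level experiment, as a minimal sketch: once pos is known to be non-negative, convert it to an unsigned 32-bit value before indexing, so that a zero-extending load would be sufficient. The name probe_lookup is hypothetical, and whether this actually removes the movslq has not been verified.

```cpp
#include <cstdint>

// Variant of the probing loop from the report. The index into `table` is
// made unsigned after the sign check, so no sign extension is required.
bool probe_lookup(const std::int32_t* table, const std::int32_t* ht,
                  std::uint32_t hash, std::uint32_t mask) {
  for (std::uint64_t probe = hash & mask, i = 1;; ++i) {
    const std::int32_t pos = ht[probe];
    if (pos >= 0) {
      const std::uint32_t index = static_cast<std::uint32_t>(pos);
      if (table[index] == 42) {
        return true;
      }
    } else if (pos & 1) {
      return false;
    }
    probe += i;
    probe &= mask;
  }
}
```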

[Bug tree-optimization/46186] Clang creates code running 1600 times faster than gcc's

2017-11-27 Thread hiraditya at msn dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=46186

AK  changed:

   What|Removed |Added

 CC||hiraditya at msn dot com

--- Comment #27 from AK  ---
Seems PR65855 is related to this. btw, it may be worthwhile to try the patch
posted by Sebastian https://gcc.gnu.org/bugzilla/attachment.cgi?id=22201

[Bug ipa/65972] New: ICE after applying a patch to enable the vectorizer.

2015-05-01 Thread hiraditya at msn dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65972

Bug ID: 65972
   Summary: ICE after applying a patch to enable the vectorizer.
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: ipa
  Assignee: unassigned at gcc dot gnu.org
  Reporter: hiraditya at msn dot com
  Target Milestone: ---

I was debugging a compiler crash that occurred while compiling a program with
auto-fdo enabled, and could reduce the bug to this small case. The issue here is
that the SSA is not in a valid form at the point where the compiler analyzes a
function for inlining.

diff --git a/gcc/ipa-inline-analysis.c b/gcc/ipa-inline-analysis.c
index 5d99887..d76f396 100644
--- a/gcc/ipa-inline-analysis.c
+++ b/gcc/ipa-inline-analysis.c
@@ -128,6 +128,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "ipa-utils.h"
 #include "cilk.h"
 #include "cfgexpand.h"
+#include "tree-ssa.h"

 /* Estimate runtime of function can easilly run into huge numbers with many
nested loops.  Be sure we can compute time * INLINE_SIZE_SCALE * 2 in an
@@ -2506,6 +2507,7 @@ estimate_function_body_sizes (struct cgraph_node *node,
bool early)
  <0,2>.  */
   basic_block bb;
   struct function *my_function = DECL_STRUCT_FUNCTION (node->decl);
+  verify_ssa (false, true);
   int freq;
   struct inline_summary *info = inline_summaries->get (node);
   struct predicate bb_predicate;



Configured with: ../configure --prefix=../install-crash --enable-shared
--enable-static --target=arm-foo-linux-gnueabihf
--with-sysroot=/compiler/tools/cross/arm-foo-linux-gnueabihf/gcc-5.1/2015-04-22-222321-2c9bf40-release-5.1/arm-foo-linux-gnueabihf/sysroot
--disable-__cxa_atexit --with-gnu-ld --disable-libssp --disable-multilib
--with-gmp=/compiler/tools/cross/arm-foo-linux-gnueabihf/gcc-5.1/2015-04-22-222321-2c9bf40-release-5.1
--with-mpfr=/compiler/tools/cross/arm-foo-linux-gnueabihf/gcc-5.1/2015-04-22-222321-2c9bf40-release-5.1
--enable-target-optspace --disable-libquadmath --enable-tls
--disable-libmudflap --enable-threads
--with-mpc=/compiler/tools/cross/arm-foo-linux-gnueabihf/gcc-5.1/2015-04-22-222321-2c9bf40-release-5.1
--disable-decimal-float --enable-languages=c,c++,lto
--with-build-time-tools=/compiler/tools/cross/arm-foo-linux-gnueabihf/gcc-5.1/2015-04-22-222321-2c9bf40-release-5.1/arm-foo-linux-gnueabihf/bin
--disable-libgomp --with-pkgversion='Built at foo from commit SHA:2c9bf40,
Rev:222321' --with-bugurl=foo...@samsung.com --with-abi=aapcs-linux
--with-cpu=cortex-a15 --with-fpu=neon --with-float=hard --with-mode=arm
--disable-bootstrap --enable-languages=c,c++
Thread model: posix

The compiler crashes like this:

../../../libgcc/config/arm/unwind-arm.c: In function ‘__gnu_unwind_pr_common’:
../../../libgcc/config/arm/unwind-arm.c:511:1: error: virtual definition of
statement not up-to-date
 }
 ^
# .MEM_102 = VDEF <.MEM_97>
_103 = _Unwind_decode_typeinfo_ptr.isra.0 (_101);
../../../libgcc/config/arm/unwind-arm.c:511:1: internal compiler error:
verify_ssa failed

gcc/build-crash/./gcc/xgcc -Bgcc/build-crash/./gcc/
-Bgcc/build-crash/../install-crash/arm-foo-linux-gnueabihf/bin/
-Bgcc/build-crash/../install-crash/arm-foo-linux-gnueabihf/lib/ -isystem
gcc/build-crash/../install-crash/arm-foo-linux-gnueabihf/include -isystem
gcc/build-crash/../install-crash/arm-foo-linux-gnueabihf/sys-include -g -O2
-g -Os -O2  -g -O2 -g -Os -DIN_GCC  -DCROSS_DIRECTORY_STRUCTURE  -W -Wall
-Wno-narrowing -Wwrite-strings -Wcast-qual -Wstrict-prototypes
-Wmissing-prototypes -Wold-style-definition  -isystem ./include   -fPIC
-fno-inline -g -DIN_LIBGCC2 -fbuilding-libgcc -fno-stack-protector   -fPIC
-fno-inline -I. -I. -I../.././gcc -I../../../libgcc -I../../../libgcc/.
-I../../../libgcc/../gcc -I../../../libgcc/../include  -DHAVE_CC_TLS  -o
unwind-arm.o -MT unwind-arm.o -MD -MP -MF unwind-arm.dep -fexceptions -c
../../../libgcc/config/arm/unwind-arm.c


0xb80500 convert_callers_for_node
../../gcc/tree-sra.c:4940
0xb853cb cgraph_node::call_for_symbol_and_aliases(bool (*)(cgraph_node*,
void*), void*, bool)
../../gcc/cgraph.h:3025
0xb853cb convert_callers
../../gcc/tree-sra.c:4955
0xb853cb modify_function
../../gcc/tree-sra.c:5011
0xb8e861 ipa_early_sra
../../gcc/tree-sra.c:5239
0xb8e861 execute
../../gcc/tree-sra.c:5286
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.



make[2]: *** [unwind-arm.o] Error 1
make[2]: *** Waiting for unfinished jobs
0xc966ab verify_ssa(bool, bool)
../../gcc/tree-ssa.c:1068
0x8ef1d0 estimate_function_body_sizes
../../gcc/ipa-inline-analysis.c:2510
0x8f1f43 compute_inline_parameters(cgraph_node*, bool)
../../gcc/ipa-inline-analysis.c:2973
0xb80500 convert_cal

[Bug ipa/65972] ICE after applying a patch to enable verify_ssa

2015-05-02 Thread hiraditya at msn dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65972

--- Comment #1 from AK  ---
PS: The bootstrap fails after applying this patch and emits the error reported
above.


[Bug ipa/65972] ICE after applying a patch to enable verify_ssa

2015-05-04 Thread hiraditya at msn dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65972

--- Comment #3 from AK  ---
Created attachment 35457
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=35457&action=edit
Preprocessed unwind-arm.i


[Bug middle-end/65947] Vectorizer misses conditional assignment of constant

2015-05-04 Thread hiraditya at msn dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65947

AK  changed:

   What|Removed |Added

 CC||hiraditya at msn dot com

--- Comment #2 from AK  ---
Can you please explain how the conditional is commutative? That would be very
helpful.
Thanks,


[Bug ipa/65972] ICE after applying a patch to enable verify_ssa

2015-05-15 Thread hiraditya at msn dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65972

--- Comment #6 from AK  ---
Your patch did fix the problem. Thanks!


[Bug rtl-optimization/66206] New: Address of stack memory associated with local variable returned to caller

2015-05-19 Thread hiraditya at msn dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66206

Bug ID: 66206
   Summary: Address of stack memory associated with local variable
returned to caller
   Product: gcc
   Version: 6.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: rtl-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: hiraditya at msn dot com
  Target Milestone: ---

Clang static analyzer reported this potential bug.
File:   gcc/bt-load.c
Location:   line 234, column 6
Description:Address of stack memory associated with local variable 'x'
returned to caller


220 static rtx *
221 find_btr_use (rtx x, rtx *excludep = 0)
222 {
223 subrtx_ptr_iterator::array_type array;
224 FOR_EACH_SUBRTX_PTR (iter, array, &x, NONCONST)
225 {
226 rtx *loc = *iter;
227 if (loc == excludep)
228 iter.skip_subrtxes ();
229 else
230 {
231 const_rtx x = *loc;
232 if (REG_P (x)
233 && overlaps_hard_reg_set_p (all_btrs, GET_MODE (x), REGNO (x)))
234 return loc; // <-   Address of stack memory associated with local variable 'x' returned to caller
235 }
236 }
237 return 0;
238 }


[Bug tree-optimization/48052] loop not vectorized if index is "unsigned int"

2015-05-22 Thread hiraditya at msn dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=48052

--- Comment #13 from AK  ---
We have an updated patch that works for both the cases.
https://gcc.gnu.org/ml/gcc-patches/2015-05/msg01991.html


[Bug tree-optimization/67700] New: [graphite] miscompile due to wrong codegen

2015-09-23 Thread hiraditya at msn dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67700

Bug ID: 67700
   Summary: [graphite] miscompile due to wrong codegen
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: hiraditya at msn dot com
  Target Milestone: ---

A reduced test case is below:
compile with trunk gcc: 

gcc -O2 -fgraphite-identity test.c
./a.out
int main(): Assertion `abcd->a[8] == 29' failed


struct abc {
  int a[81];
} *abcd;

#define FPMATH_SSE 2
int global;

void __attribute__ ((noinline)) foo()
{
   int pos = 0;
   int i;

   if (!((global & FPMATH_SSE) != 0)) 
for (i = 8; i <= 15; i++)
  abcd->a[pos++] = i;

   for (i = 29; i <= 36; i++)
abcd->a[pos++] = i;
}

#include <assert.h>
#include <stdlib.h>

int main()
{
  int i;
  abcd = (struct abc*) malloc (sizeof (abc));
  for (i = 0; i <= 80; i++)
abcd->a[i] = 0;

  foo();

  assert (abcd->a[8] == 29);

  return 0;
}
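
For reference, the value the assertion expects can be worked out by hand. The following is only a sketch: the helper name `expected_fill` and the use of `std::array` are mine and are not part of the reduced test case. With `global == 0` the first loop stores 8..15 into `a[0..7]` and the second loop stores 29..36 into `a[8..15]`, so `a[8]` has to be 29.

```cpp
#include <array>

// Hypothetical helper, not part of the bug report: replays the two loops of
// foo() on a local array so the expected values can be inspected without
// -fgraphite-identity being involved.
std::array<int, 81> expected_fill(int global_flags) {
    constexpr int kFpmathSse = 2;  // same value as FPMATH_SSE in the test case
    std::array<int, 81> a{};       // zero-initialized, like the loop in main()
    int pos = 0;

    // The first loop runs only when the FPMATH_SSE bit is clear.
    if ((global_flags & kFpmathSse) == 0)
        for (int i = 8; i <= 15; i++)
            a[pos++] = i;

    // The second loop always runs and continues from wherever pos stopped.
    for (int i = 29; i <= 36; i++)
        a[pos++] = i;

    return a;
}
```

If the FPMATH_SSE bit were set instead, the first loop would be skipped and 29 would land in `a[0]`, leaving `a[8]` at zero.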


[Bug tree-optimization/67700] [graphite] miscompile due to wrong codegen

2015-09-23 Thread hiraditya at msn dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67700

--- Comment #1 from AK  ---
The problem seems to be in 

static void canonicalize_loop_closed_ssa (loop_p loop)

which generates phi node at a wrong place in this case.


[Bug tree-optimization/67842] Incorrect check in sese.h:bb_in_region

2015-12-24 Thread hiraditya at msn dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67842

AK  changed:

   What|Removed |Added

 CC||hiraditya at msn dot com

--- Comment #1 from AK  ---
This check is irrelevant now. I'll put up a patch to remove this check.

[Bug tree-optimization/65850] [5/6 Regression] [graphite]: isl_constraint.c:625: expecting integer value

2015-07-17 Thread hiraditya at msn dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65850

AK  changed:

   What|Removed |Added

 CC||hiraditya at msn dot com

--- Comment #4 from AK  ---
This bug appears to be a duplicate of #61929. The fix (r225942) has been pushed
today. Please verify if this fixes your problem.


[Bug middle-end/64394] ICE: in build_linearized_memory_access, at graphite-interchange.c:121 (isl_constraint.c:558: expecting integer value) with -floop-interchange

2015-07-17 Thread hiraditya at msn dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64394

AK  changed:

   What|Removed |Added

 CC||hiraditya at msn dot com

--- Comment #3 from AK  ---
This bug appears to be a duplicate of bug 61929. The fix (r225942) has been
pushed today. Please verify if this fixes your problem.


[Bug middle-end/70159] missed CSE optimization

2016-07-02 Thread hiraditya at msn dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70159

AK  changed:

   What|Removed |Added

 CC||hiraditya at msn dot com

--- Comment #11 from AK  ---
Just as an update, the new gvn-hoist pass in llvm hoists the common
computations:

@cat test.c
float foo_p(float d, float min, float max, float a)
{
  float tmin;
  float tmax;

  float inv = 1.0f / d;
  if (inv >= 0) {
tmin = (min - a) * inv;
tmax = (max - a) * inv;
  } else {
tmin = (max - a) * inv;
tmax = (min - a) * inv;
  }

  return tmax + tmin;
}


clang -c -Ofast test.c -mllvm -print-after-all


*** IR Dump Before Early GVN Hoisting of Expressions ***
; Function Attrs: nounwind uwtable
define float @_Z5foo_p(float %d, float %min, float %max, float %a) #0 {
entry:
  %div = fdiv fast float 1.00e+00, %d
  %cmp = fcmp fast oge float %div, 0.00e+00
  br i1 %cmp, label %if.then, label %if.else

if.then:  ; preds = %entry
  %sub = fsub fast float %min, %a
  %mul = fmul fast float %sub, %div
  %sub1 = fsub fast float %max, %a
  %mul2 = fmul fast float %sub1, %div
  br label %if.end

if.else:  ; preds = %entry
  %sub3 = fsub fast float %max, %a
  %mul4 = fmul fast float %sub3, %div
  %sub5 = fsub fast float %min, %a
  %mul6 = fmul fast float %sub5, %div
  br label %if.end

if.end:   ; preds = %if.else, %if.then
  %tmax.0 = phi float [ %mul2, %if.then ], [ %mul6, %if.else ]
  %tmin.0 = phi float [ %mul, %if.then ], [ %mul4, %if.else ]
  %add = fadd fast float %tmax.0, %tmin.0
  ret float %add
}


*** IR Dump After Early GVN Hoisting of Expressions ***
; Function Attrs: nounwind uwtable
define float @_Z5foo_p(float %d, float %min, float %max, float %a) #0 {
entry:
  %div = fdiv fast float 1.00e+00, %d
  %cmp = fcmp fast oge float %div, 0.00e+00
  %sub = fsub fast float %min, %a
  %mul = fmul fast float %sub, %div
  %sub1 = fsub fast float %max, %a
  %mul2 = fmul fast float %sub1, %div
  br i1 %cmp, label %if.then, label %if.else

if.then:  ; preds = %entry
  br label %if.end

if.else:  ; preds = %entry
  br label %if.end

if.end:   ; preds = %if.else, %if.then
  %tmax.0 = phi float [ %mul2, %if.then ], [ %mul, %if.else ]
  %tmin.0 = phi float [ %mul, %if.then ], [ %mul2, %if.else ]
  %add = fadd fast float %tmax.0, %tmin.0
  ret float %add
}
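
In source terms, the transformation in the two IR dumps corresponds roughly to the hand-hoisted version below. This is a sketch for illustration only: `foo_p_hoisted` is a name I made up, and the comments map the locals to the `%mul` and `%mul2` values in the dump above.

```cpp
// Original shape from test.c: both branches compute the same two products.
float foo_p(float d, float min, float max, float a) {
    float tmin;
    float tmax;
    float inv = 1.0f / d;
    if (inv >= 0) {
        tmin = (min - a) * inv;
        tmax = (max - a) * inv;
    } else {
        tmin = (max - a) * inv;
        tmax = (min - a) * inv;
    }
    return tmax + tmin;
}

// Hand-hoisted equivalent (hypothetical name): each product is computed once
// before the branch, and the branch only selects which one is tmin or tmax.
float foo_p_hoisted(float d, float min, float max, float a) {
    float inv = 1.0f / d;
    float lo = (min - a) * inv;  // corresponds to %mul
    float hi = (max - a) * inv;  // corresponds to %mul2
    float tmin = (inv >= 0) ? lo : hi;
    float tmax = (inv >= 0) ? hi : lo;
    return tmax + tmin;
}
```

Both versions perform the same subtractions and multiplications on the same operands, so they should agree bit for bit when compiled without fast-math flags.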

[Bug tree-optimization/100004] New: Dead write not removed when indirection is introduced.

2021-04-09 Thread hiraditya at msn dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100004

Bug ID: 100004
   Summary: Dead write not removed when indirection is introduced.
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: hiraditya at msn dot com
  Target Milestone: ---

struct Foo {
int x;
};

struct Bar {
int x;
};

void alias(Foo* foo, Bar* bar) {
foo->x = 5;
foo->x = bar->x;
}

struct Wrap1 {
Foo foo;
};

struct Wrap2 {
Foo foo;
};

void assign_direct(Wrap1* w1, Wrap2* w2)
{
w1->foo.x = 5;
w1->foo.x = w2->foo.x;
}

void assign_via_pointer(Wrap1* w1, Wrap2* w2)
{
Foo* f1 = &w1->foo;
Foo* f2 = &w2->foo;
f1->x = 5;
f1->x = f2->x;
}


$ gcc-arm64 -O2 -std=c++17 -fstrict-aliasing -S -o -

alias(Foo*, Bar*):
ldr w1, [x1]
str w1, [x0]
ret
assign_direct(Wrap1*, Wrap2*):
ldr w1, [x1]
str w1, [x0]
ret
assign_via_pointer(Wrap1*, Wrap2*):
mov w2, 5
str w2, [x0]
ldr w1, [x1]
str w1, [x0]
ret

[Bug tree-optimization/100004] Dead write not removed when indirection is introduced.

2021-04-09 Thread hiraditya at msn dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100004

--- Comment #1 from AK  ---
godbolt link: https://gcc.godbolt.org/z/f7Y6G1svf

[Bug tree-optimization/98497] New: [Potential Perf regression] jne to hot branch instead je to cold

2021-01-01 Thread hiraditya at msn dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98497

Bug ID: 98497
   Summary: [Potential Perf regression] jne to hot branch instead
je to cold
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: hiraditya at msn dot com
  Target Milestone: ---

In the following code generated by gcc 10.2
```
.L2:
movups  xmm3, XMMWORD PTR [rax]
add rax, 16
addps   xmm0, xmm3
cmp rax, rdx
je  .L6
jmp .L2

matrix_sum_column_major.cold:
.L6:
movaps  xmm2, xmm0
# .

```

I think `jne .L2` followed by `jmp .L6` would be more efficient, as it removes
one jump instruction from the hot path.

C code:
```
float matrix_sum_column_major(float* x, int n) {
n = 32767;
float sum = 0;
for (int i = 0; i < n; i++)
for (int j = 0; j < n; j++)
sum += x[j * n + i];
return sum;
}
```

gcc -Ofast -floop-nest-optimize -o -
```
matrix_sum_column_major:
mov eax, 4294836212
lea rdx, [rdi+131056]
pxor    xmm1, xmm1
lea rcx, [rdi+rax]
.L3:
mov rax, rdi
pxor    xmm0, xmm0
.L2:
movups  xmm3, XMMWORD PTR [rax]
add rax, 16
addps   xmm0, xmm3
cmp rax, rdx
je  .L6
jmp .L2
matrix_sum_column_major.cold:
.L6:
movaps  xmm2, xmm0
addss   xmm1, DWORD PTR [rax+8]
lea rdx, [rax+131068]
add rdi, 131068
movhlps xmm2, xmm0
addps   xmm2, xmm0
movaps  xmm0, xmm2
shufps  xmm0, xmm2, 85
addps   xmm0, xmm2
movss   xmm2, DWORD PTR [rax+4]
addss   xmm2, DWORD PTR [rax]
addss   xmm1, xmm2
addss   xmm1, xmm0
cmp rdx, rcx
jne .L3
movaps  xmm0, xmm1
ret
```


Link to godbolt: https://gcc.godbolt.org/z/ac7YY1

[Bug libstdc++/59048] operator== between std::string and const char* slower than strcmp

2021-02-11 Thread hiraditya at msn dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59048

AK  changed:

   What|Removed |Added

 CC||hiraditya at msn dot com

--- Comment #17 from AK  ---
Now that we have string_view, will it be possible to avoid creating a copy?
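
To make the question concrete, here is a sketch of what a copy-free comparison could look like from user code. The helper name `equals_no_copy` is hypothetical, and I am not claiming this is how libstdc++ implements or should implement `operator==`; it only shows that `std::string_view` (C++17) can wrap both sides without allocating.

```cpp
#include <string>
#include <string_view>

// Hypothetical helper: compares a std::string with a C string without
// constructing a temporary std::string. rhs must not be null.
bool equals_no_copy(const std::string& lhs, const char* rhs) {
    // string_view(const char*) only computes the length of rhs; it does not
    // allocate or copy the characters.
    return std::string_view(lhs) == std::string_view(rhs);
}
```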

[Bug tree-optimization/101116] New: missed peephole optimization not of bitwise and

2021-06-17 Thread hiraditya at msn dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101116

Bug ID: 101116
   Summary: missed peephole optimization not of bitwise and
   Product: gcc
   Version: 11.1.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: hiraditya at msn dot com
  Target Milestone: ---

$ cat test.c

bool foo(unsigned i) {
return !(i & 1);
}

gcc -O2 test.c -S -o-
foo(unsigned int):
mov eax, edi
not eax
and eax, 1
ret

clang -O2 test.c -S -o-

foo(unsigned int): # @foo(unsigned int)
  testb $1, %dil
  sete %al
  retq


Ref: https://godbolt.org/z/Tndb1dM8Y
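
For clarity, the three spellings involved are all equivalent at the source level. The sketch below (function names are mine) pairs each one with the instruction shape it resembles in the listings above.

```cpp
// Three spellings of "bit 0 of i is clear". All names are hypothetical.

// The form used in the report.
constexpr bool low_bit_clear_not(unsigned i) { return !(i & 1u); }

// Mirrors the shape of the gcc output: invert, then mask.
constexpr bool low_bit_clear_andn(unsigned i) { return (~i & 1u) != 0; }

// Mirrors the shape of the clang output: test the bit, then set on equal.
constexpr bool low_bit_clear_test(unsigned i) { return (i & 1u) == 0; }

static_assert(low_bit_clear_not(2u) && low_bit_clear_andn(2u) &&
                  low_bit_clear_test(2u),
              "even input: all three forms agree");
static_assert(!low_bit_clear_not(3u) && !low_bit_clear_andn(3u) &&
                  !low_bit_clear_test(3u),
              "odd input: all three forms agree");
```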

[Bug c++/101138] New: Ambiguous code (with operator==) compiled without error

2021-06-19 Thread hiraditya at msn dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101138

Bug ID: 101138
   Summary: Ambiguous code (with operator==) compiled without
error
   Product: gcc
   Version: 12.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: hiraditya at msn dot com
  Target Milestone: ---

$ cat test.cpp

#include <iostream>
using namespace std;

template
struct D {
template bool operator==(Y a) const { 
cout << "f" < 
bool operator==(T a, D b) { 
cout << "fD" < a, b;
if (a == b)
return 0;
return 1;
}

gcc compiles this code fine, but clang errors out.

https://godbolt.org/z/c13EExxeY

[Bug c/108915] New: invalid pointer access preserved in optimized code

2023-02-23 Thread hiraditya at msn dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108915

Bug ID: 108915
   Summary: invalid pointer access preserved in optimized code
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: hiraditya at msn dot com
  Target Milestone: ---

Testcase has been reduced from u-boot's linker-list macro:
https://github.com/u-boot/u-boot/blob/master/include/linker_lists.h#L127


#include <stdio.h>

char* bar() {
static char start_bar[0] __attribute__((aligned(16)))
   __attribute__((unused))
   __attribute__((section("__u_boot_list_2_1")));
char *p = (char *)start_bar;
for (int i = p[0]; i < p[9]; i++)
printf("asdfasd");
return 0;
}



$ gcc -O3 -fno-unroll-loops -S -o -

.LC0:
.string "asdfasd"
bar:
push    rbx
movsx   eax, BYTE PTR start_bar.1[rip+9]
movsx   ebx, BYTE PTR start_bar.1[rip]
cmp ebx, eax
jge .L2
.L3:
mov edi, OFFSET FLAT:.LC0
xor eax, eax
add ebx, 1
call    printf
movsx   eax, BYTE PTR start_bar.1[rip+9]
cmp eax, ebx
jg  .L3
.L2:
xor eax, eax
pop rbx
ret

-
$ clang -O3 -fno-unroll-loops -S -o -

bar:# @bar
xor eax, eax
ret

[Bug tree-optimization/108915] invalid pointer access preserved in optimized code

2023-02-23 Thread hiraditya at msn dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108915

AK  changed:

   What|Removed |Added

 Resolution|INVALID |FIXED

--- Comment #4 from AK  ---
Adding `__attribute__((used))` also fixed it. Does it reflect the same behavior
as using `asm` as you suggested?

[Bug c++/109017] ICE on unexpanded pack from C++20 explicit-template-parameter lambda syntax

2023-03-04 Thread hiraditya at msn dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109017

AK  changed:

   What|Removed |Added

 CC||hiraditya at msn dot com

--- Comment #1 from AK  ---
Example from twitter: https://twitter.com/seanbax/status/1631689332007337985
which had discussion on similar bug.

```
template struct outer1_t {
  void g() {
// Compiles for mysterious reasons.
int array[] { [](){
  int i = Is2;
  return i;
}.template operator()() ... };
  }
};

int main() {
  // Compiles OKAY when this is commented out.
  // ICEs when it's compiled.
  outer1_t<1, 5, 10>().g();
}
```

clang issues a compiler error: https://godbolt.org/z/7f6E55svM

```
<source>:6:15: error: initializer contains unexpanded parameter pack 'Is2'
  int i = Is2;
```

[Bug other/92396] -ftime-trace support

2022-08-12 Thread hiraditya at msn dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92396

AK  changed:

   What|Removed |Added

 CC||hiraditya at msn dot com

--- Comment #12 from AK  ---
I was building a giant file that takes around 100 minutes to compile. The
-ftime-report output gave nothing useful for finding hotspots. It is also not
clear what is being reported, as there is no documentation for it in man gcc.
The percentages don't add up to 100, which makes it confusing.

I'm wondering if making this task a GSoC project will get more attention?

[Bug libstdc++/78717] no definition of string::find when lowered to gimple

2022-08-12 Thread hiraditya at msn dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78717

--- Comment #3 from AK  ---
Even with a high inline limit, string::find didn't inline.

g++-11.0.2 -O3  -finline-limit=10  -S -o a.s s.cpp

cat a.s

```
_Z3fooRKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEES6_i:
.LFB1240:
.cfi_startproc
endbr64
pushq   %rbx
.cfi_def_cfa_offset 16
.cfi_offset 3, -16
movq    8(%rsi), %rcx
movslq  %edx, %rbx
xorl    %edx, %edx
movq    (%rsi), %rsi
call    _ZNKSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEE4findEPKcmm@PLT
cmpq    %rax, %rbx
popq    %rbx
.cfi_def_cfa_offset 8
sete    %al
movzbl  %al, %eax
ret

```
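
For readers without s.cpp at hand, the listing is consistent with a function along the following lines. This is a guess reconstructed from the mangled name, which demangles to `foo(const std::string&, const std::string&, int)`, and from the argument setup before the call. It is not the actual test file, and the parameter names and return type are assumptions.

```cpp
#include <cstddef>
#include <string>

// Guessed shape of the test function, inferred from the assembly above.
// haystack.find(needle) becomes the out-of-line call to
// basic_string::find(const char*, size_t, size_t) with pos == 0.
int foo(const std::string& haystack, const std::string& needle, int pos) {
    // find() returns size_t; the movslq in the listing matches widening the
    // int argument to 64 bits before the comparison.
    return haystack.find(needle) == static_cast<std::size_t>(pos);
}
```

Note that passing -1 for `pos` compares against `std::string::npos`, since the widened value has all bits set.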

[Bug rtl-optimization/82889] Unnecessary sign extension of int32 to int64

2022-08-12 Thread hiraditya at msn dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82889

--- Comment #4 from AK  ---
Seems like clang doesn't sign extend.

$ clang -O3 -std=c++14 -g0

```
.text
.intel_syntax noprefix
.file   "example.cpp"
.globl  lol(int*, int*, unsigned int, unsigned int)   # -- Begin function lol(int*, int*, unsigned int, unsigned int)
.p2align 4, 0x90
.type   lol(int*, int*, unsigned int, unsigned int),@function
lol(int*, int*, unsigned int, unsigned int):   # @lol(int*, int*, unsigned int, unsigned int)
.cfi_startproc
# %bb.0:
# kill: def $edx killed $edx def $rdx
and edx, ecx
mov r8d, ecx
mov ecx, 1
jmp .LBB0_1
.p2align 4, 0x90
.LBB0_4:#   in Loop: Header=BB0_1 Depth=1
test    al, 1
jne .LBB0_5
.LBB0_7:#   in Loop: Header=BB0_1 Depth=1
add edx, ecx
and edx, r8d
inc rcx
.LBB0_1:# =>This Inner Loop Header: Depth=1
mov eax, dword ptr [rsi + 4*rdx]
test    eax, eax
js  .LBB0_4
# %bb.2:#   in Loop: Header=BB0_1 Depth=1
cmp dword ptr [rdi + 4*rax], 42
jne .LBB0_7
# %bb.3:
mov eax, 1
ret
.LBB0_5:
xor eax, eax
ret
.Lfunc_end0:
.size   lol(int*, int*, unsigned int, unsigned int), .Lfunc_end0-lol(int*, int*, unsigned int, unsigned int)
.cfi_endproc
# -- End function
.ident  "clang version 16.0.0 (https://github.com/llvm/llvm-project.git 5e22ef3198d1686f7978dd150a3eefad4f737bfc)"
.section        ".note.GNU-stack","",@progbits
.addrsig
```


$ gcc -O3 -std=c++14 -g0

```
lol(int*, int*, unsigned int, unsigned int):
and edx, ecx
mov r8d, 1
mov ecx, ecx
jmp .L5
.L10:
cmp DWORD PTR [rdi+rax*4], 42
je  .L9
.L4:
add rdx, r8
add r8, 1
and rdx, rcx
.L5:
movsx   rax, DWORD PTR [rsi+rdx*4] <--- sign extend
test    eax, eax
jns .L10
test    al, 1
je  .L4
xor eax, eax
ret
.L9:
mov eax, 1
ret
```

[Bug rtl-optimization/82889] Unnecessary sign extension of int32 to int64

2022-08-12 Thread hiraditya at msn dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82889

--- Comment #5 from AK  ---
Link to compiler explorer: https://godbolt.org/z/dGYG4dG15

[Bug c++/87628] Redundant check of pointer when operator delete is called

2022-09-20 Thread hiraditya at msn dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87628

--- Comment #3 from AK  ---
Still happening with gcc trunk.

https://godbolt.org/z/5K94665GK

[Bug c++/106991] New: new+delete pair not optimized by g++ at -O3 but optimized at -Os

2022-09-20 Thread hiraditya at msn dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106991

Bug ID: 106991
   Summary: new+delete pair not optimized by g++ at -O3 but
optimized at -Os
   Product: gcc
   Version: 13.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: hiraditya at msn dot com
  Target Milestone: ---

https://godbolt.org/z/PeYcoqTKn

---
#include <cassert>
#include <cstdlib>

int volatile gv = 0;

void* operator new(long unsigned sz ) {
++gv;
return malloc( sz );
}

void operator delete(void *p) noexcept {
--gv;
free(p);
}

class c {
int l;
public:
c() : l(0) {}
int get(){ return l; }
};

int caller( void ){
c *f = new c();
assert( f->get() == 0 );
delete f;
return gv;
}
---

$ g++ -std=c++20 -O3

operator new(unsigned long):
mov eax, DWORD PTR gv[rip]
add eax, 1
mov DWORD PTR gv[rip], eax
jmp malloc
operator delete(void*):
mov eax, DWORD PTR gv[rip]
sub eax, 1
mov DWORD PTR gv[rip], eax
jmp free
caller():
sub rsp, 8
mov eax, DWORD PTR gv[rip]
mov edi, 4
add eax, 1
mov DWORD PTR gv[rip], eax
call    malloc
mov esi, 4
mov rdi, rax
call    operator delete(void*, unsigned long)
mov eax, DWORD PTR gv[rip]
add rsp, 8
ret
gv:
.zero   4
---
$ g++ -std=c++20 -Os

operator new(unsigned long):
mov eax, DWORD PTR gv[rip]
inc eax
mov DWORD PTR gv[rip], eax
jmp malloc
operator delete(void*):
mov eax, DWORD PTR gv[rip]
dec eax
mov DWORD PTR gv[rip], eax
jmp free
caller():
mov eax, DWORD PTR gv[rip]
ret
gv:
.zero   4

[Bug c++/87628] Redundant check of pointer when operator delete is called

2022-09-20 Thread hiraditya at msn dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87628

--- Comment #4 from AK  ---
Seems like clang now added the check:

$ clang++ -Oz -fno-exceptions

if_delete(char*): # @if_delete(char*)
test    rdi, rdi
jne operator delete(void*)@PLT  # TAILCALL
ret

[Bug ipa/106991] new+delete pair not optimized by g++ at -O3 but optimized at -Os

2022-09-21 Thread hiraditya at msn dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106991

--- Comment #3 from AK  ---
Thanks for identifying the underlying issue, @Jan.
After modifying the definition of operator delete, gcc does optimize it at -O3
as well.

https://godbolt.org/z/1WPqaWrEr

// source code
#include <cassert>
#include <cstdlib>

int volatile gv = 0;

void* operator new(long unsigned sz ) {
++gv;
return malloc( sz );
}

void operator delete(void *p, unsigned long) noexcept {
--gv;
free(p);
}

class c {
int l;
public:
c() : l(0) {}
int get(){ return l; }
};

int caller( void ){
c *f = new c();
assert( f->get() == 0 );
delete f;
return gv;
}

$ g++ -std=c++20 -O3

```
operator new(unsigned long):
mov eax, DWORD PTR gv[rip]
add eax, 1
mov DWORD PTR gv[rip], eax
jmp malloc
operator delete(void*, unsigned long):
mov eax, DWORD PTR gv[rip]
sub eax, 1
mov DWORD PTR gv[rip], eax
jmp free
caller():
mov eax, DWORD PTR gv[rip]
add eax, 1
mov DWORD PTR gv[rip], eax
mov eax, DWORD PTR gv[rip]
sub eax, 1
mov DWORD PTR gv[rip], eax
mov eax, DWORD PTR gv[rip]
ret
gv:
.zero   4
```
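
The difference between the two test cases is which `operator delete` the `delete f` expression ends up calling: in the original -O3 listing, `caller()` calls `operator delete(void*, unsigned long)`, which the test case had not replaced. One way to avoid depending on that is to replace both forms and have the sized one forward to the unsized one. The sketch below is an illustration under my own assumptions: the counter `live_allocations`, the type `Counter`, and the function `allocate_and_release` are hypothetical names, and I use a plain `int` counter instead of the `volatile` one from the report.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>

static int live_allocations = 0;  // hypothetical counter, plays the role of gv

void* operator new(std::size_t size) {
    void* p = std::malloc(size != 0 ? size : 1);
    if (p == nullptr) throw std::bad_alloc();
    live_allocations = live_allocations + 1;
    return p;
}

// Unsized form: holds the real bookkeeping.
void operator delete(void* p) noexcept {
    if (p == nullptr) return;
    live_allocations = live_allocations - 1;
    std::free(p);
}

// Sized form: forwards, so the compiler sees a user-provided definition
// whichever of the two it decides to call for `delete ptr`.
void operator delete(void* p, std::size_t) noexcept {
    ::operator delete(p);
}

struct Counter {
    int value = 0;
};

// Allocates and frees one object, then reports the counter. The result should
// equal the counter's value before the call, whether or not the compiler
// removes the new/delete pair.
int allocate_and_release() {
    Counter* c = new Counter();
    delete c;
    return live_allocations;
}
```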

[Bug tree-optimization/107005] New: gcc not exploiting undefined behavior to optimize away the result of division

2022-09-21 Thread hiraditya at msn dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107005

Bug ID: 107005
   Summary: gcc not exploiting undefined behavior to optimize away
the result of division
   Product: gcc
   Version: 13.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: hiraditya at msn dot com
  Target Milestone: ---

#include <limits.h>
int main() { return INT_MIN / -1; }

gcc -O2

main:
mov eax, -2147483648
ret


clang -O2

main:   # @main
ret



https://godbolt.org/z/Tjxx3KGdK
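
As background for why this expression is undefined: the mathematical result of `INT_MIN / -1` is `INT_MAX + 1`, which does not fit in `int`. Code that wants to avoid relying on what a particular compiler does here can check for it explicitly. A minimal sketch follows; `checked_div` is a hypothetical helper name and it assumes C++17 for `std::optional`.

```cpp
#include <climits>
#include <optional>

// Hypothetical helper: returns std::nullopt for the two input combinations
// where int division has undefined behavior, and the quotient otherwise.
std::optional<int> checked_div(int num, int den) {
    if (den == 0)
        return std::nullopt;  // division by zero
    if (num == INT_MIN && den == -1)
        return std::nullopt;  // quotient would be INT_MAX + 1
    return num / den;
}
```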

[Bug tree-optimization/95565] [Feature request] add a flag to only instrument function entry.

2022-09-21 Thread hiraditya at msn dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95565

--- Comment #2 from AK  ---
clang has `-finstrument-function-entry-bare` to this effect: 
https://reviews.llvm.org/D40276

[Bug tree-optimization/107011] New: instruction with undefined behavior not optimized away

2022-09-22 Thread hiraditya at msn dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107011

Bug ID: 107011
   Summary: instruction with undefined behavior not optimized away
   Product: gcc
   Version: 13.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: hiraditya at msn dot com
  Target Milestone: ---

#include <limits.h>

int main() {
return INT_MIN / -1;
}


$ gcc -O3

main:
mov eax, -2147483648
ret


$ clang -O3

main:   # @main
ret


https://godbolt.org/z/393EMqs1E

PS: I reported this bug yesterday as well but for some reason it does not
appear in bugzilla so I'm creating another one.

[Bug tree-optimization/107011] instruction with undefined behavior not optimized away

2022-09-22 Thread hiraditya at msn dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107011

--- Comment #2 from AK  ---
ah ok. sorry for the noise.

[Bug rtl-optimization/107063] New: [X86_64 codegen] Using inc eax instead of inc dword ptr

2022-09-27 Thread hiraditya at msn dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107063

Bug ID: 107063
   Summary: [X86_64 codegen] Using inc eax instead of inc dword
ptr
   Product: gcc
   Version: 13.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: rtl-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: hiraditya at msn dot com
  Target Milestone: ---

int volatile gv = 0;

void foo() {
++gv;
}


$ gcc -Os

foo():
mov eax, DWORD PTR gv[rip]
inc eax
mov DWORD PTR gv[rip], eax
ret
gv:
.zero   4


$ clang -Os

foo():# @foo()
inc dword ptr [rip + gv]
ret
gv:
.long   0   


https://godbolt.org/z/vzq4jr5vj

[Bug tree-optimization/85611] Suboptimal code generation for (potentially) redundant atomic loads

2022-09-29 Thread hiraditya at msn dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85611

AK  changed:

   What|Removed |Added

 Status|WAITING |RESOLVED
 Resolution|--- |INVALID

--- Comment #2 from AK  ---
Don't remember what I was expecting.

[Bug c++/107335] New: call to throw_bad_cast even with -fno-exceptions

2022-10-20 Thread hiraditya at msn dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107335

Bug ID: 107335
   Summary: call to throw_bad_cast even with -fno-exceptions
   Product: gcc
   Version: 13.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: hiraditya at msn dot com
  Target Milestone: ---

Testcase:
#include <iostream>

void foo() {
std::cout << std::endl;
}

$ g++ -std=c++17 -O3 -fno-exceptions

```asm
foo():
mov rax, QWORD PTR std::cout[rip]
push    rbx
mov rax, QWORD PTR [rax-24]
mov rbx, QWORD PTR std::cout[rax+240]
test    rbx, rbx
je  .L10
cmp BYTE PTR [rbx+56], 0
je  .L5
movsx   esi, BYTE PTR [rbx+67]
.L6:
mov edi, OFFSET FLAT:std::cout
call    std::basic_ostream<char, std::char_traits<char> >::put(char)
pop rbx
mov rdi, rax
jmp std::basic_ostream<char, std::char_traits<char> >::flush()
.L5:
mov rdi, rbx
call    std::ctype<char>::_M_widen_init() const
mov rax, QWORD PTR [rbx]
mov esi, 10
mov rax, QWORD PTR [rax+48]
cmp rax, OFFSET FLAT:_ZNKSt5ctypeIcE8do_widenEc
je  .L6
mov rdi, rbx
call    rax
movsx   esi, al
jmp .L6
.L10: 
call    std::__throw_bad_cast() <--- call to __throw_bad_cast
_GLOBAL__sub_I_foo():
sub rsp, 8
mov edi, OFFSET FLAT:_ZStL8__ioinit
call    std::ios_base::Init::Init() [complete object constructor]
mov edx, OFFSET FLAT:__dso_handle
mov esi, OFFSET FLAT:_ZStL8__ioinit
mov edi, OFFSET FLAT:_ZNSt8ios_base4InitD1Ev
add rsp, 8
jmp __cxa_atexit
```
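
For context, `std::endl` is specified as `os.put(os.widen('\n'))` followed by `os.flush()`, and the `widen` step is what needs the stream's ctype facet; the null check on that facet is the path that ends in `__throw_bad_cast` in the listing. For `char` streams the widening can be skipped by writing the newline directly. The sketch below uses a helper name of my own, `newline_and_flush`, and I have not checked what code gcc generates for it.

```cpp
#include <ostream>
#include <sstream>

// Hypothetical alternative to `os << std::endl` for char streams: writes the
// newline character directly instead of going through os.widen('\n').
std::ostream& newline_and_flush(std::ostream& os) {
    os.put('\n');       // no ctype facet lookup is needed for this call
    return os.flush();  // same flush that std::endl performs
}
```

Usage would be `newline_and_flush(std::cout);` in place of `std::cout << std::endl;`.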

[Bug libstdc++/107335] call to throw_bad_cast even with -fno-exceptions

2022-10-20 Thread hiraditya at msn dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107335

--- Comment #4 from AK  ---
I wasn't sure if this is expected. Thanks for clarifying.

[Bug libstdc++/107335] call to throw_bad_cast even with -fno-exceptions

2022-10-20 Thread hiraditya at msn dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107335

--- Comment #5 from AK  ---
Is this the definition of throw_bad_cast?

https://github.com/gcc-mirror/gcc/blob/16e2427f50c208dfe07d07f18009969502c25dc8/gcc/cp/rtti.c#L221

[Bug libstdc++/107335] call to throw_bad_cast even with -fno-exceptions

2022-10-20 Thread hiraditya at msn dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107335

AK  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |INVALID

--- Comment #7 from AK  ---
not a bug

[Bug c++/105796] New: error: no matching function for call with template function

2022-05-31 Thread hiraditya at msn dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105796

Bug ID: 105796
   Summary: error: no matching function for call with template
function
   Product: gcc
   Version: 12.1.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: hiraditya at msn dot com
  Target Milestone: ---

test.cpp
```
int func(int, char);

template<typename... TArgs>
int testFunc(int (*)(TArgs..., char));

int x = testFunc<int>(func);
```

With gcc trunk:
g++ -std=c++20 test.cpp -c


<source>:6:22: error: no matching function for call to 'testFunc<int>(int
(&)(int, char))'
    6 | int x = testFunc<int>(func);
      |         ~~~~~~~~~~~~~^~~~~~
<source>:4:5: note: candidate: 'template<class ... TArgs> int testFunc(int
(*)(TArgs ..., char))'
    4 | int testFunc(int (*)(TArgs..., char));
      |     ^~~~~~~~
<source>:4:5: note:   template argument deduction/substitution failed:
<source>:6:22: note:   mismatched types 'char' and 'int'
    6 | int x = testFunc<int>(func);
      |         ~~~~~~~~~~~~~^~~~~~
Compiler returned: 1
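
A parameter pack that is not at the end of a function parameter list is a non-deduced context, which is what makes the `(TArgs..., char)` pattern awkward. Independent of whether gcc should accept the code above, one possible workaround is to deduce the whole parameter list and check the last type separately. The sketch below is only an illustration: `leading_param_count` is a hypothetical name, and `func` is given a body here only so that the example links.

```cpp
#include <cstddef>
#include <tuple>
#include <type_traits>

int func(int, char) { return 0; }  // defined here so the sketch links

// Hypothetical workaround: deduce every parameter type with a trailing pack,
// then verify that the last one is char instead of pattern-matching
// (TArgs..., char) directly.
template <typename... Args>
constexpr std::size_t leading_param_count(int (*)(Args...)) {
    static_assert(sizeof...(Args) >= 1, "need at least one parameter");
    using Last = std::tuple_element_t<sizeof...(Args) - 1, std::tuple<Args...>>;
    static_assert(std::is_same<Last, char>::value,
                  "last parameter must be char");
    return sizeof...(Args) - 1;  // number of parameters before the char
}
```

With this shape, `leading_param_count(func)` needs no explicit template arguments.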

[Bug tree-optimization/105830] New: call to memcpy when -nostdlib -nodefaultlibs flags provided

2022-06-02 Thread hiraditya at msn dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105830

Bug ID: 105830
   Summary: call to memcpy when -nostdlib -nodefaultlibs flags
provided
   Product: gcc
   Version: 12.1.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: hiraditya at msn dot com
  Target Milestone: ---

https://godbolt.org/z/jTEa6ajn3


```
// test.c

// Type your code here, or load an example.
/* Nonzero if either X or Y is not aligned on a "long" boundary.  */
#define UNALIGNED(X, Y) \
  (((unsigned long)X & (sizeof (unsigned long) - 1)) | \
   ((unsigned long)Y & (sizeof (unsigned long) - 1)))

  #define UNALIGNED1(a) \
((unsigned long)(a) & (sizeof(unsigned long)-1))

/* How many bytes are copied each iteration of the 4X unrolled loop.  */
#define BIGBLOCKSIZE    (sizeof (unsigned long) * 4)

/* How many bytes are copied each iteration of the word copy loop.  */
#define LITTLEBLOCKSIZE (sizeof (unsigned long))

/* Threshold for punting to the byte copier.  */
#define TOO_SMALL(LEN)  ((LEN) < BIGBLOCKSIZE)

void * memcpy (void *__restrict dst0,
const void *__restrict src0,
unsigned long len0)
{
  unsigned char *dst = dst0;
  const unsigned char *src = src0;


  /* If the size is small, or either SRC or DST is unaligned,
 then punt into the byte copy loop.  This should be rare.  */
  if (len0 >= LITTLEBLOCKSIZE && !UNALIGNED (src, dst))
{
unsigned long *aligned_dst;
const unsigned long *aligned_src;
  aligned_dst = (unsigned long*)dst;
  aligned_src = (const unsigned long*)src;

  /* Copy one long word at a time if possible.  */
  do
{
  *aligned_dst++ = *aligned_src++;
  len0 -= LITTLEBLOCKSIZE;
} while (len0 >= LITTLEBLOCKSIZE);

   /* Pick up any residual with a byte copier.  */
  dst = (unsigned char*)aligned_dst;
  src = (const unsigned char*)aligned_src;
}

  for (; len0; len0--)
*dst++ = *src++;

  return dst0;
}

// ARM gcc trunk
gcc -O3 -nostdlib -nodefaultlibs -S -o -

memcpy:
push {r3, r4, r5, r6, r7, lr}
cmp r2, #3
mov r4, r2
mov r5, r0
mov r6, r1
bls .L5
orr r3, r0, r1
lsls r3, r3, #30
beq .L9
.L3:
mov r2, r4
mov r1, r6
bl  memcpy ; <- call to memcpy
mov r0, r5
pop {r3, r4, r5, r6, r7, pc}
.L9:
subs r7, r2, #4
and r4, r2, #3
bic r7, r7, #3
adds r7, r7, #4
mov r2, r7
add r6, r6, r7
bl  memcpy ; <- call to memcpy
adds r0, r5, r7
.L5:
cmp r4, #0
bne .L3
mov r0, r5
pop {r3, r4, r5, r6, r7, pc}
```

[Bug tree-optimization/105830] call to memcpy when -nostdlib -nodefaultlibs flags provided

2022-06-04 Thread hiraditya at msn dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105830

--- Comment #3 from AK  ---
With -ffreestanding, the calls to memcpy did disappear. Thanks.
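
A minimal sketch of the pattern involved, assuming GCC's loop-idiom
recognition (-ftree-loop-distribute-patterns) is what rewrites the copy loop
into a library call; `byte_copy` is a hypothetical name. In a hosted build
with optimization, a loop of this shape may be compiled to a call to memcpy,
which becomes self-recursive when the enclosing function is itself memcpy:

```cpp
#include <cstddef>

// Hypothetical byte-wise copy with the same shape as the tail loop in the
// report above.
void* byte_copy(void* dst, const void* src, std::size_t len) {
  auto* d = static_cast<unsigned char*>(dst);
  const auto* s = static_cast<const unsigned char*>(src);
  for (; len; --len)
    *d++ = *s++;  // candidate for memcpy idiom recognition
  return dst;
}
```

Besides -ffreestanding, -fno-tree-loop-distribute-patterns is the option I
would expect to keep the loop as a loop; I have not verified that against the
test case in this report.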

[Bug libstdc++/80331] unused const std::string not optimized away

2022-06-04 Thread hiraditya at msn dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80331

--- Comment #9 from AK  ---
Can't repro this with gcc 12.1. Seems like this is fixed?

https://godbolt.org/z/e6n94zK4E

[Bug c++/114342] New: suboptimal codegen of vector::vector(range)

2024-03-14 Thread hiraditya at msn dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114342

Bug ID: 114342
   Summary: suboptimal codegen of vector::vector(range)
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: hiraditya at msn dot com
  Target Milestone: ---

#include <vector>
#include <ranges>

std::vector<int> td() {
  int arr[]{-5, 10, 15, -5, 10, 15, -5, 10, 15, -5, 10, 15, -5, 10, 15, -5, 10,
15, -5, 10, 15,-5, 10, 15 -5, 10, 15, -5, 10, 15, -5, 10, 15, -5, 10, 15, -5,
10, 15, -5, 10, 15, -5, 10,-5, 10, 15, -5, 10, 15, -5, 10, 15, -5, 10, 15, -5,
10, 15, -5, 10,-5, 10, 15, -5, 10, 15, -5, 10, 15, -5, 10, 15, -5, 10, 15, -5,
10,-5, 10, 15, -5, 10, 15, -5, 10, 15, -5, 10, 15, -5, 10, 15, -5, 10,-5, 10,
15, -5, 10, 15, -5, 10, 15, -5, 10, 15, -5, 10, 15, -5, 10,};
  auto b = std::ranges::begin(arr);
  auto e = std::ranges::end(arr);
  std::vector dd(b, e);
  return dd;
}

What is the reason for calling `rep movsq` twice?

$ gcc -O3 -std=c++23
```
td():
push rbp
mov esi, OFFSET FLAT:.LC0
mov ecx, 55
pxor xmm0, xmm0
push rbx
mov rbx, rdi
sub rsp, 456
mov QWORD PTR [rbx+16], 0
mov rbp, rsp
movups  XMMWORD PTR [rbx], xmm0
mov rdi, rbp
rep movsq
mov eax, DWORD PTR [rsi]
mov DWORD PTR [rdi], eax
mov edi, 444
call operator new(unsigned long)
lea rdx, [rax+444]
mov QWORD PTR [rbx], rax
lea rdi, [rax+8]
mov rsi, rbp
mov QWORD PTR [rbx+16], rdx
mov rcx, QWORD PTR [rsp]
and rdi, -8
mov QWORD PTR [rax], rcx
mov rcx, QWORD PTR [rsp+436]
mov QWORD PTR [rax+436], rcx
sub rax, rdi
sub rsi, rax
add eax, 444
shr eax, 3
mov ecx, eax
mov rax, rbx
rep movsq
mov QWORD PTR [rbx+8], rdx
add rsp, 456
pop rbx
pop rbp
ret
mov rbp, rax
jmp .L2
td() [clone .cold]:
.L2:
mov rdi, QWORD PTR [rbx]
mov rsi, QWORD PTR [rbx+16]
sub rsi, rdi
test rdi, rdi
je  .L3
call operator delete(void*, unsigned long)
.L3:
mov rdi, rbp
call _Unwind_Resume
```

https://godbolt.org/z/5333db8Px

[Bug middle-end/114342] suboptimal codegen of vector::vector(range)

2024-03-19 Thread hiraditya at msn dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114342

AK  changed:

   What|Removed |Added

 Resolution|--- |DUPLICATE
Version|unknown |14.0
 Status|NEW |RESOLVED

--- Comment #3 from AK  ---
I see. marking as duplicate. Thanks for clarifying!

*** This bug has been marked as a duplicate of bug 59863 ***
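
For readers landing here: in the assembly above, the first `rep movsq` copies
the initializer from .LC0 onto the stack and the second copies the stack array
into the vector's storage. A sketch of a source-level workaround, assuming the
data can be given static storage duration; `make_from_static` and `kData` are
hypothetical names and the array is shortened for illustration:

```cpp
#include <iterator>
#include <vector>

// Static storage: the initializer lives in read-only data, so no stack copy
// is needed before the vector copies from it.
static constexpr int kData[] = {-5, 10, 15, -5, 10, 15};

std::vector<int> make_from_static() {
  return std::vector<int>(std::begin(kData), std::end(kData));
}
```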

[Bug middle-end/59863] const array in function is placed on stack

2024-03-19 Thread hiraditya at msn dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59863

AK  changed:

   What|Removed |Added

 CC||hiraditya at msn dot com

--- Comment #9 from AK  ---
*** Bug 114342 has been marked as a duplicate of this bug. ***

[Bug tree-optimization/107263] Memcpy not elided when initializing struct

2024-03-19 Thread hiraditya at msn dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107263

AK  changed:

   What|Removed |Added

 CC||hiraditya at msn dot com

--- Comment #3 from AK  ---
Seems like a duplicate of bug 59863?

[Bug c++/110819] New: Missed optimization: when vector size is 0 but vector::reserve has been called.

2023-07-26 Thread hiraditya at msn dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110819

Bug ID: 110819
   Summary: Missed optimization: when vector size is 0 but
vector::reserve has been called.
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: hiraditya at msn dot com
  Target Milestone: ---

#include <vector>

void f(int);

void use_idx_const_size_reserve() {
    std::vector<int> v;
    v.reserve(10);
    auto s = v.size();
    for (std::vector<int>::size_type i = 0; i < s; i++)
        f(v[i]);
}

$ g++ -O3

use_idx_const_size_reserve():
sub rsp, 8
mov edi, 40
call operator new(unsigned long)
mov esi, 40
add rsp, 8
mov rdi, rax
jmp operator delete(void*, unsigned long)


$ clang++ -O3 -stdlib=libc++

use_idx_const_size_reserve():# @use_idx_const_size_reserve()
ret
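
For reference, a small sketch of why the loop body is dead: `reserve` changes
only the capacity, so `size()` stays 0 and the loop never runs. What remains
for the optimizer is the allocation/deallocation pair seen in the g++ output.
`count_visits` is a hypothetical helper that counts iterations instead of
calling `f`:

```cpp
#include <cstddef>
#include <vector>

std::size_t count_visits() {
  std::vector<int> v;
  v.reserve(10);               // capacity >= 10, size is still 0
  std::size_t visits = 0;
  const auto s = v.size();
  for (std::vector<int>::size_type i = 0; i < s; i++)
    ++visits;                  // stands in for f(v[i])
  return visits;
}
```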

[Bug tree-optimization/110819] Missed optimization: when vector's size is 0 but vector::reserve has been called.

2023-07-28 Thread hiraditya at msn dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110819

--- Comment #2 from AK  ---
> When compiled with clang, libstdc++'s std::vector uses __builtin_operator_new 
> which always has the -fassume-sane-operator-new semantics, and so can be 
> optimized.

Yes, clang optimizes this with libstdc++ as well. What can be done in gcc so
that it detects that the new+delete pair can be optimized away?

[Bug c++/110137] implement clang -fassume-sane-operator-new

2023-08-01 Thread hiraditya at msn dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110137

--- Comment #3 from AK  ---
1. Clang also marks the nothrow versions of operator new as noalias. Will
`-fassume-sane-operator-new` enable that as well?

2. As per http://eel.is/c++draft/basic.stc.dynamic#allocation-2:

"""If the request succeeds, the value returned by a replaceable allocation
function is a non-null pointer value ([basic.compound]) p0 different from any
previously returned value p1, unless that value p1 was subsequently passed to a
replaceable deallocation function."""

Does this mean that every successful new allocation can be assumed to be
noalias as long as the pointer has not been passed to a deallocation function?
In that case, when possible, can the compiler infer from a bottom-up analysis
that an allocation is noalias?
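
A small sketch of the guarantee quoted above, assuming the default
(non-replaced) allocation functions are in use; `distinct_live_allocations` is
a hypothetical name. Two successful allocations that are both still live must
return different pointers, which is the property a noalias annotation would
encode:

```cpp
#include <new>

bool distinct_live_allocations() {
  void* p0 = ::operator new(sizeof(int));
  void* p1 = ::operator new(sizeof(int));  // p0 has not been freed yet
  const bool distinct = (p0 != p1);
  ::operator delete(p1);
  ::operator delete(p0);
  return distinct;
}
```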

[Bug c++/110909] New: Suboptimal codegen in vector copy assignment

2023-08-04 Thread hiraditya at msn dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110909

Bug ID: 110909
   Summary: Suboptimal codegen in vector copy assignment
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: hiraditya at msn dot com
  Target Milestone: ---

#include <vector>

using Container = std::vector<int>;
int copy_assignment(const Container &v1, Container &v2) {
  v2 = v1;
  return 0;
}


I'd expect this to generate only a memcpy, so I'm not sure why memmoves are
generated.

$ gcc -std=c++2a -O3  -fno-exceptions

copy_assignment(std::vector<int, std::allocator<int> > const&, std::vector<int, std::allocator<int> >&):
cmp rsi, rdi
je  .L21
push r13
push r12
push rbp
mov rbp, rdi
push rbx
mov rbx, rsi
sub rsp, 8
mov rax, QWORD PTR [rdi+8]
mov r13, QWORD PTR [rdi]
mov rdx, QWORD PTR [rsi+16]
mov rdi, QWORD PTR [rsi]
mov r12, rax
sub r12, r13
sub rdx, rdi
cmp rdx, r12
jb  .L25
mov rcx, QWORD PTR [rsi+8]
mov rdx, rcx
sub rdx, rdi
cmp rdx, r12
jnb .L26
cmp rdx, 4
jle .L12
mov rsi, r13
call memmove
mov rcx, QWORD PTR [rbx+8]
mov rdi, QWORD PTR [rbx]
mov rax, QWORD PTR [rbp+8]
mov r13, QWORD PTR [rbp+0]
mov rdx, rcx
sub rdx, rdi
.L13:
lea rsi, [r13+0+rdx]
sub rax, rsi
mov rdx, rax
cmp rax, 4
jle .L14
mov rdi, rcx
call memmove
mov rax, QWORD PTR [rbx]
add rax, r12
.L8:
mov QWORD PTR [rbx+8], rax
add rsp, 8
xor eax, eax
pop rbx
pop rbp
pop r12
pop r13
ret
.L21:
xor eax, eax
ret
.L25:
movabs  rax, 9223372036854775804
cmp rax, r12
jb  .L27
mov rdi, r12
call operator new(unsigned long)
mov rbp, rax
cmp r12, 4
jle .L5
mov rdx, r12
mov rsi, r13
mov rdi, rax
call memcpy
.L6:
mov rdi, QWORD PTR [rbx]
test rdi, rdi
je  .L7
mov rsi, QWORD PTR [rbx+16]
sub rsi, rdi
call operator delete(void*, unsigned long)
.L7:
lea rax, [rbp+0+r12]
mov QWORD PTR [rbx], rbp
mov QWORD PTR [rbx+16], rax
jmp .L8
.L26:
cmp r12, 4
jle .L10
mov rdx, r12
mov rsi, r13
call memmove
mov rax, QWORD PTR [rbx]
add rax, r12
jmp .L8
.L14:
lea rax, [rdi+r12]
jne .L8
mov edx, DWORD PTR [rsi]
mov DWORD PTR [rcx], edx
jmp .L8
.L12:
jne .L13
mov esi, DWORD PTR [r13+0]
mov DWORD PTR [rdi], esi
jmp .L13
.L10:
lea rax, [rdi+r12]
jne .L8
mov edx, DWORD PTR [r13+0]
mov DWORD PTR [rdi], edx
jmp .L8
.L5:
mov eax, DWORD PTR [r13+0]
mov DWORD PTR [rbp+0], eax
jmp .L6
.L27:
call std::__throw_bad_array_new_length()



Ideally, the above C++ code should translate to an equivalent of the following
C++ code:

using Container = std::vector<int>;
int copy_assignment(const Container &v1, Container &v2) {
  v2.reserve(v1.size());
  std::memcpy(&v2[0], &v1[0], v1.size()*sizeof(int));
  // change the size: v2.size() = v1.size()
  return 0;
}
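
The snippet above is pseudocode: writing through `&v2[0]` after only `reserve`
is undefined when `v2.size()` is smaller than `v1.size()`, and the size update
is left as a comment. A runnable approximation, assuming `Container` is
`std::vector<int>`; `copy_assignment_sketch` is a hypothetical name, and
`resize` value-initializes any new elements, so this is not a zero-cost
equivalent of what the report asks for:

```cpp
#include <cstring>
#include <vector>

using Container = std::vector<int>;

int copy_assignment_sketch(const Container& v1, Container& v2) {
  if (&v1 == &v2) return 0;   // self-assignment: nothing to copy
  v2.resize(v1.size());       // sets the size and allocates if needed
  if (!v1.empty())            // avoid passing null pointers to memcpy
    std::memcpy(v2.data(), v1.data(), v1.size() * sizeof(int));
  return 0;
}
```

The explicit self-assignment check mirrors the `cmp rsi, rdi` / `je .L21` at
the top of the generated code.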

[Bug tree-optimization/111393] New: ICE: Segmentation fault src/gcc/toplev.cc:314

2023-09-12 Thread hiraditya at msn dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111393

Bug ID: 111393
   Summary: ICE: Segmentation fault src/gcc/toplev.cc:314
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: hiraditya at msn dot com
  Target Milestone: ---

riscv64-gnu-linux (version Debian 13.1) building llvm-project
(GlobalModuleIndex.cpp) crashed with ICE.

src/gcc/toplev.cc:314
profile_count::operator==(profile_count const&) const
 ../../src/gcc/profile-count.h:865
profile_count::apply_probability(profile_probability) const
 ../../src/gcc/profile-count.h:1104

[Bug tree-optimization/111393] ICE: Segmentation fault src/gcc/toplev.cc:314

2023-09-12 Thread hiraditya at msn dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111393

--- Comment #1 from AK  ---
oot/d2fc9f48-c166-4a9e-9868-133a1db7af88/llvm-project/build# ninja clang
check-clang
[100/845] Building CXX object
tools/clang/lib/Serialization/CMakeFiles/obj.clangSerialization.dir/GlobalModuleIndex.cpp.o
FAILED:
tools/clang/lib/Serialization/CMakeFiles/obj.clangSerialization.dir/GlobalModuleIndex.cpp.o
 
/usr/bin/c++ -DGTEST_HAS_RTTI=0 -D_DEBUG -D_GLIBCXX_ASSERTIONS -D_GNU_SOURCE
-D_LIBCPP_ENABLE_HARDENED_MODE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS
-D__STDC_LIMIT_MACROS
-I/media/root/d2fc9f48-c166-4a9e-9868-133a1db7af88/llvm-project/build/tools/clang/lib/Serialization
-I/media/root/d2fc9f48-c166-4a9e-9868-133a1db7af88/llvm-project/clang/lib/Serialization
-I/media/root/d2fc9f48-c166-4a9e-9868-133a1db7af88/llvm-project/clang/include
-I/media/root/d2fc9f48-c166-4a9e-9868-133a1db7af88/llvm-project/build/tools/clang/include
-I/media/root/d2fc9f48-c166-4a9e-9868-133a1db7af88/llvm-project/build/include
-I/media/root/d2fc9f48-c166-4a9e-9868-133a1db7af88/llvm-project/llvm/include
-fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror=date-time
-fno-lifetime-dse -Wall -Wextra -Wno-unused-parameter -Wwrite-strings
-Wcast-qual -Wno-missing-field-initializers -pedantic -Wno-long-long
-Wimplicit-fallthrough -Wno-maybe-uninitialized -Wno-nonnull
-Wno-class-memaccess -Wno-redundant-move -Wno-pessimizing-move
-Wno-noexcept-type -Wdelete-non-virtual-dtor -Wsuggest-override -Wno-comment
-Wno-misleading-indentation -Wctad-maybe-unsupported -fdiagnostics-color
-ffunction-sections -fdata-sections -fno-common -Woverloaded-virtual
-fno-strict-aliasing  -fno-exceptions -funwind-tables -fno-rtti -UNDEBUG
-std=c++17 -MD -MT
tools/clang/lib/Serialization/CMakeFiles/obj.clangSerialization.dir/GlobalModuleIndex.cpp.o
-MF
tools/clang/lib/Serialization/CMakeFiles/obj.clangSerialization.dir/GlobalModuleIndex.cpp.o.d
-o
tools/clang/lib/Serialization/CMakeFiles/obj.clangSerialization.dir/GlobalModuleIndex.cpp.o
-c
/media/root/d2fc9f48-c166-4a9e-9868-133a1db7af88/llvm-project/clang/lib/Serialization/GlobalModuleIndex.cpp
In file included from
/media/root/d2fc9f48-c166-4a9e-9868-133a1db7af88/llvm-project/llvm/include/llvm/ADT/DenseMapInfo.h:20,
 from
/media/root/d2fc9f48-c166-4a9e-9868-133a1db7af88/llvm-project/llvm/include/llvm/ADT/DenseMap.h:17,
 from
/media/root/d2fc9f48-c166-4a9e-9868-133a1db7af88/llvm-project/clang/include/clang/Serialization/GlobalModuleIndex.h:18,
 from
/media/root/d2fc9f48-c166-4a9e-9868-133a1db7af88/llvm-project/clang/lib/Serialization/GlobalModuleIndex.cpp:13:
/usr/include/c++/13/tuple: In instantiation of ‘struct std::_Tuple_impl<0,
clang::ModuleFileExtensionReader*,
std::default_delete >’:
/usr/include/c++/13/tuple:1232:11:   required from ‘class
std::tuple >’
/usr/include/c++/13/bits/unique_ptr.h:232:27:   required from ‘class
std::__uniq_ptr_impl >’
/usr/include/c++/13/bits/unique_ptr.h:239:12:   required from ‘struct
std::__uniq_ptr_data, true, true>’
/usr/include/c++/13/bits/unique_ptr.h:283:33:   required from ‘class
std::unique_ptr’
/usr/include/c++/13/bits/stl_vector.h:367:35:   required from
‘std::_Vector_base<_Tp, _Alloc>::~_Vector_base() [with _Tp =
std::unique_ptr; _Alloc =
std::allocator >]’
/usr/include/c++/13/bits/stl_vector.h:528:7:   required from here
/usr/include/c++/13/tuple:269:7: internal compiler error: Segmentation fault
  269 |   _M_head(_Tuple_impl& __t) noexcept { return _Base::_M_head(__t);
}
  |   ^~~
0x85d7c5 crash_signal
../../src/gcc/toplev.cc:314
0xa0d5e0 profile_count::operator==(profile_count const&) const
../../src/gcc/profile-count.h:865
0xa0d5e0 profile_count::apply_probability(profile_probability) const
../../src/gcc/profile-count.h:1104
0xa0d5e0 edge_def::count() const
../../src/gcc/basic-block.h:639
0xa0d5e0 eliminate_tail_call
../../src/gcc/tree-tailcall.cc:982
0xa0d5e0 optimize_tail_call
../../src/gcc/tree-tailcall.cc:1053
0xa0d5e0 tree_optimize_tail_calls_1
../../src/gcc/tree-tailcall.cc:1193
Please submit a full bug report, with preprocessed source (by using
-freport-bug).
Please include the complete backtrace with any bug report.
See  for instructions.

[Bug tree-optimization/111393] ICE: Segmentation fault src/gcc/toplev.cc:314

2023-09-12 Thread hiraditya at msn dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111393

--- Comment #3 from AK  ---
gcc -v

COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/riscv64-linux-gnu/13/lto-wrapper
Target: riscv64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Debian 13.1.0-6'
--with-bugurl=file:///usr/share/doc/gcc-13/README.Bugs
--enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2,rust --prefix=/usr
--with-gcc-major-version-only --program-suffix=-13
--program-prefix=riscv64-linux-gnu- --enable-shared --enable-linker-build-id
--libexecdir=/usr/libexec --without-included-gettext --enable-threads=posix
--libdir=/usr/lib --enable-nls --enable-clocale=gnu --enable-libstdcxx-debug
--enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new
--enable-gnu-unique-object --disable-libitm --disable-libquadmath
--disable-libquadmath-support --enable-plugin --enable-default-pie
--with-system-zlib --enable-libphobos-checking=release
--with-target-system-zlib=auto --enable-objc-gc=auto --enable-multiarch
--disable-werror --disable-multilib --with-arch=rv64gc --with-abi=lp64d
--enable-checking=release --build=riscv64-linux-gnu --host=riscv64-linux-gnu
--target=riscv64-linux-gnu --with-build-config=bootstrap-lto-lean
--enable-link-serialization=32
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 13.1.0 (Debian 13.1.0-6) 
root@lpi4a:/media/root/d2fc9f48-c166-4a9

[Bug tree-optimization/111393] ICE: Segmentation fault src/gcc/toplev.cc:314

2023-09-12 Thread hiraditya at msn dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111393

--- Comment #5 from AK  ---
Created attachment 55890
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55890&action=edit
GlobalModuleIndex.cpp preprocessed files

Every time, the crash is in a different file. It could just be due to memory
issues.

[Bug tree-optimization/111393] ICE: Segmentation fault src/gcc/toplev.cc:314

2023-09-13 Thread hiraditya at msn dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111393

--- Comment #8 from AK  ---

> this does seem like a HW issue. Are you sure you have a decent RISCV machine 
> without any memory issues?
> I suspect ninja is building with all of the cores which pushes the memory 
> usage high.

possible. I have the https://sipeed.com/licheepi4a (licheepi 4a board)


> Maybe lower the clock speed of the CPU you are using.

will do. thanks

[Bug tree-optimization/111393] ICE: Segmentation fault src/gcc/toplev.cc:314

2023-09-14 Thread hiraditya at msn dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111393

--- Comment #9 from AK  ---
I think it is okay to close this bug, as it doesn't seem to be related to gcc.

[Bug c/111420] New: relocation truncated to fit: R_RISCV_JAL against `.L12287'

2023-09-14 Thread hiraditya at msn dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111420

Bug ID: 111420
   Summary: relocation truncated to fit: R_RISCV_JAL against
`.L12287'
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: hiraditya at msn dot com
  Target Milestone: ---

CGBuiltin.cpp:(.text._ZN5clang7CodeGen15CodeGenFunction20EmitRISCVBuiltinExprEjPKNS_8CallExprENS0_15ReturnValueSlotE+0x10d0):
relocation truncated to fit: R_RISCV_JAL against `.L12287'


command:


: && /usr/bin/c++ -fPIC -fPIC -fno-semantic-interposition
-fvisibility-inlines-hidden -Werror=date-time -fno-lifetime-dse -Wall -Wextra
-Wno-unused-parameter -Wwrite-strings -Wcast-qual
-Wno-missing-field-initializers -pedantic -Wno-long-long -Wimplicit-fallthrough
-Wno-maybe-uninitialized -Wno-nonnull -Wno-class-memaccess -Wno-redundant-move
-Wno-pessimizing-move -Wno-noexcept-type -Wdelete-non-virtual-dtor
-Wsuggest-override -Wno-comment -Wno-misleading-indentation
-Wctad-maybe-unsupported -fdiagnostics-color -ffunction-sections
-fdata-sections -fno-common -Woverloaded-virtual -fno-strict-aliasing 
-Wl,-z,defs -Wl,-z,nodelete  
-Wl,-rpath-link,/media/root/d2fc9f48-c166-4a9e-9868-133a1db7af88/llvm-project/build/./lib
 -Wl,--gc-sections -shared -Wl,-soname,libclangCodeGen.so.18git -o
lib/libclangCodeGen.so.18git
tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/ABIInfo.cpp.o
tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/ABIInfoImpl.cpp.o
tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/BackendUtil.cpp.o
tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CGAtomic.cpp.o
tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CGBlocks.cpp.o
tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CGBuiltin.cpp.o
tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CGCUDANV.cpp.o
tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CGCUDARuntime.cpp.o
tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CGCXX.cpp.o
tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CGCXXABI.cpp.o
tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CGCall.cpp.o
tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CGClass.cpp.o
tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CGCleanup.cpp.o
tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CGCoroutine.cpp.o
tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CGDebugInfo.cpp.o
tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CGDecl.cpp.o
tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CGDeclCXX.cpp.o
tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CGException.cpp.o
tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CGExpr.cpp.o
tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CGExprAgg.cpp.o
tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CGExprCXX.cpp.o
tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CGExprComplex.cpp.o
tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CGExprConstant.cpp.o
tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CGExprScalar.cpp.o
tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CGGPUBuiltin.cpp.o
tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CGHLSLRuntime.cpp.o
tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CGLoopInfo.cpp.o
tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CGNonTrivialStruct.cpp.o
tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CGObjC.cpp.o
tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CGObjCGNU.cpp.o
tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CGObjCMac.cpp.o
tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CGObjCRuntime.cpp.o
tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CGOpenCLRuntime.cpp.o
tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CGOpenMPRuntime.cpp.o
tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CGOpenMPRuntimeGPU.cpp.o
tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CGRecordLayoutBuilder.cpp.o
tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CGStmt.cpp.o
tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CGStmtOpenMP.cpp.o
tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CGVTT.cpp.o
tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CGVTables.cpp.o
tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CodeGenABITypes.cpp.o
tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CodeGenAction.cpp.o
tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CodeGenFunction.cpp.o
tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CodeGenModule.cpp.o
tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CodeGenPGO.cpp.o
tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CodeGenTBAA.cpp.o
tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CodeGenTypes.cpp.o
tools/clang/lib/CodeGen/CMakeFiles/ob

[Bug c/111420] relocation truncated to fit: R_RISCV_JAL against `.L12287'

2023-09-14 Thread hiraditya at msn dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111420

--- Comment #1 from AK  ---
I got this error while building clang (ninja clang) on a riscv machine.

root@lpi4a:~# gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/riscv64-linux-gnu/13/lto-wrapper
Target: riscv64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Debian 13.1.0-6'
--with-bugurl=file:///usr/share/doc/gcc-13/README.Bugs
--enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2,rust --prefix=/usr
--with-gcc-major-version-only --program-suffix=-13
--program-prefix=riscv64-linux-gnu- --enable-shared --enable-linker-build-id
--libexecdir=/usr/libexec --without-included-gettext --enable-threads=posix
--libdir=/usr/lib --enable-nls --enable-clocale=gnu --enable-libstdcxx-debug
--enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new
--enable-gnu-unique-object --disable-libitm --disable-libquadmath
--disable-libquadmath-support --enable-plugin --enable-default-pie
--with-system-zlib --enable-libphobos-checking=release
--with-target-system-zlib=auto --enable-objc-gc=auto --enable-multiarch
--disable-werror --disable-multilib --with-arch=rv64gc --with-abi=lp64d
--enable-checking=release --build=riscv64-linux-gnu --host=riscv64-linux-gnu
--target=riscv64-linux-gnu --with-build-config=bootstrap-lto-lean
--enable-link-serialization=32
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 13.1.0 (Debian 13.1.0-6) 

--

root@lpi4a:~# uname -a
Linux lpi4a 5.10.113-g7b352f5ac2ba #1 SMP PREEMPT Wed Apr 12 12:06:11 UTC 2023
riscv64 GNU/Linux

[Bug target/111420] relocation truncated to fit: R_RISCV_JAL against `.L12287'

2023-09-14 Thread hiraditya at msn dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111420

--- Comment #4 from AK  ---
Good catch. I built at -O0 by mistake; I wanted to build at -O3.

[Bug target/111420] relocation truncated to fit: R_RISCV_JAL against `.L12287'

2023-09-14 Thread hiraditya at msn dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111420

AK  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |MOVED

--- Comment #5 from AK  ---
Created: https://sourceware.org/bugzilla/show_bug.cgi?id=30855

[Bug target/111420] relocation truncated to fit: R_RISCV_JAL against `.L12287'

2023-09-15 Thread hiraditya at msn dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111420

--- Comment #6 from AK  ---
To confirm what Andrew mentioned, the release build (-O3) built successfully.
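
For background, the RISC-V `jal` instruction encodes a signed 21-bit,
2-byte-aligned offset, so its target must lie within roughly +/-1 MiB of the
instruction. R_RISCV_JAL is reported as truncated when a label is further away
than that, which seems more likely in very large functions compiled at -O0. A
small sketch of the range check; `fits_riscv_jal` is a hypothetical helper,
not a binutils API:

```cpp
#include <cstdint>

// True if a PC-relative byte offset can be encoded in a RISC-V JAL:
// even, and within [-2^20, 2^20).
constexpr bool fits_riscv_jal(std::int64_t offset) {
  constexpr std::int64_t kLimit = std::int64_t{1} << 20;  // 1 MiB
  return offset >= -kLimit && offset < kLimit && (offset % 2 == 0);
}
```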

[Bug tree-optimization/108915] invalid pointer access preserved in optimized code

2023-03-23 Thread hiraditya at msn dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108915

--- Comment #6 from AK  ---
For reference, I had opened a related bug in clang:
https://github.com/llvm/llvm-project/issues/60967

[Bug tree-optimization/109440] New: Missed optimization of vector::at when a function is called inside the loop

2023-04-06 Thread hiraditya at msn dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109440

Bug ID: 109440
   Summary: Missed optimization of vector::at when a function is
called inside the loop
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: hiraditya at msn dot com
  Target Milestone: ---

#include
#include
using namespace std;

bool bar();

using T = int;

T vat(std::vector<T> v) {
T s;
for (auto i = 0; i < v.size(); ++i) {
if (bar())
s += v.at(i);
}

return s;
}


$ gcc -O2 -fexceptions -fno-unroll-loops


.LC0:
.string "vector::_M_range_check: __n (which is %zu) >= this->size() (which is %zu)"
vat(std::vector<int, std::allocator<int> >):
mov rax, QWORD PTR [rdi]
cmp QWORD PTR [rdi+8], rax
je  .L9
push r12
push rbp
mov rbp, rdi
push rbx
xor ebx, ebx
jmp .L6
.L14:
mov rax, QWORD PTR [rbp+8]
sub rax, QWORD PTR [rbp+0]
add rbx, 1
sar rax, 2
cmp rbx, rax
jnb .L13
.L6:
call bar()
test al, al
je  .L14
mov rcx, QWORD PTR [rbp+0]
mov rdx, QWORD PTR [rbp+8]
sub rdx, rcx
sar rdx, 2
mov rax, rdx
cmp rbx, rdx
jnb .L15
add r12d, DWORD PTR [rcx+rbx*4]
add rbx, 1
cmp rbx, rax
jb  .L6
.L13:
mov eax, r12d
pop rbx
pop rbp
pop r12
ret
.L9:
mov eax, r12d
ret
.L15:
mov rsi, rbx
mov edi, OFFSET FLAT:.LC0
xor eax, eax
call std::__throw_out_of_range_fmt(char const*, ...)
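
One way to read the assembly: because `bar()` is opaque to the optimizer, the
vector's begin/end pointers are reloaded and the `at()` range check is
repeated after every call. A hedged sketch of a manual rewrite that hoists the
size and data pointer out of the loop; `sum_if` and its predicate parameter
are hypothetical stand-ins for `vat` and `bar`, and the accumulator is
zero-initialized here, unlike in the test case:

```cpp
#include <cstddef>
#include <vector>

using T = int;

T sum_if(const std::vector<T>& v, bool (*pred)()) {
  T s = 0;
  const std::size_t n = v.size();  // read once; assumes pred() does not modify v
  const T* p = v.data();
  for (std::size_t i = 0; i < n; ++i) {
    if (pred())
      s += p[i];                   // i < n holds, so no range check is needed
  }
  return s;
}
```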

[Bug tree-optimization/109441] New: missed optimization when all elements of vector are known

2023-04-06 Thread hiraditya at msn dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109441

Bug ID: 109441
   Summary: missed optimization when all elements of vector are
known
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: hiraditya at msn dot com
  Target Milestone: ---

Reference: https://godbolt.org/z/af4x6zhz9

When all elements of vector are 0, then the compiler should be able to remove
the loop and just return 0.

Testcase:

#include <vector>
using namespace std;

using T = int;


T v() {
T s;
std::vector<T> v;
v.resize(1000, 0);
for (auto i = 0; i < v.size(); ++i) {
s += v[i];
}

return s;
}



$ g++ -O3 -std=c++17

.LC0:
  .string "vector::_M_fill_insert"
v():
  push rbx
  pxor xmm0, xmm0
  mov edx, 1000
  xor esi, esi
  sub rsp, 48
  lea rcx, [rsp+12]
  lea rdi, [rsp+16]
  mov QWORD PTR [rsp+32], 0
  mov DWORD PTR [rsp+12], 0
  movaps XMMWORD PTR [rsp+16], xmm0
  call std::vector<int, std::allocator<int> >::_M_fill_insert(__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >, unsigned long, int const&)
  mov rdx, QWORD PTR [rsp+24]
  mov rdi, QWORD PTR [rsp+16]
  mov rax, rdx
  sub rax, rdi
  mov rsi, rax
  sar rsi, 2
  cmp rdx, rdi
  je .L99
  test rax, rax
  mov ecx, 1
  cmovne rcx, rsi
  cmp rax, 12
  jbe .L107
  mov rdx, rcx
  pxor xmm0, xmm0
  mov rax, rdi
  shr rdx, 2
  sal rdx, 4
  add rdx, rdi
.L101:
  movdqu xmm2, XMMWORD PTR [rax]
  add rax, 16
  paddd xmm0, xmm2
  cmp rdx, rax
  jne .L101
  movdqa xmm1, xmm0
  psrldq xmm1, 8
  paddd xmm0, xmm1
  movdqa xmm1, xmm0
  psrldq xmm1, 4
  paddd xmm0, xmm1
  movd ebx, xmm0
  test cl, 3
  je .L99
  and rcx, -4
  mov eax, ecx
.L100:
  lea edx, [rax+1]
  add ebx, DWORD PTR [rdi+rcx*4]
  movsx rdx, edx
  cmp rdx, rsi
  jnb .L99
  add eax, 2
  lea rcx, [0+rdx*4]
  add ebx, DWORD PTR [rdi+rdx*4]
  cdqe
  cmp rax, rsi
  jnb .L99
  add ebx, DWORD PTR [rdi+4+rcx]
.L99:
  test rdi, rdi
  je .L98
  mov rsi, QWORD PTR [rsp+32]
  sub rsi, rdi
  call operator delete(void*, unsigned long)
.L98:
  add rsp, 48
  mov eax, ebx
  pop rbx
  ret
.L107:
  xor eax, eax
  xor ecx, ecx
  jmp .L100
  mov rbx, rax
  jmp .L105
v() [clone .cold]:

[Bug tree-optimization/109441] missed optimization when all elements of vector are known

2023-04-06 Thread hiraditya at msn dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109441

--- Comment #1 from AK  ---
I guess a better test case is this:

#include <vector>
using namespace std;

using T = int;


T v(std::vector<T> v) {
T s;
std::fill(v.begin(), v.end(), T());
for (auto i = 0; i < v.size(); ++i) {
s += v[i];
}

return s;
}

which has a similar effect.

$ g++ -O3 -std=c++17

v(std::vector<int, std::allocator<int> >):
push rbp
push rbx
sub rsp, 8
mov rbp, QWORD PTR [rdi+8]
mov rcx, QWORD PTR [rdi]
cmp rcx, rbp
je  .L7
sub rbp, rcx
mov rdi, rcx
xor esi, esi
mov rbx, rcx
mov rdx, rbp
call memset
mov rdi, rbp
mov edx, 1
mov rcx, rbx
sar rdi, 2
test rbp, rbp
cmovne  rdx, rdi
cmp rbp, 12
jbe .L8
mov rax, rdx
pxor xmm0, xmm0
shr rax, 2
sal rax, 4
add rax, rbx
.L4:
movdqu  xmm2, XMMWORD PTR [rbx]
add rbx, 16
paddd   xmm0, xmm2
cmp rbx, rax
jne .L4
movdqa  xmm1, xmm0
psrldq  xmm1, 8
paddd   xmm0, xmm1
movdqa  xmm1, xmm0
psrldq  xmm1, 4
paddd   xmm0, xmm1
movd eax, xmm0
test dl, 3
je  .L1
and rdx, -4
mov esi, edx
.L3:
add eax, DWORD PTR [rcx+rdx*4]
lea edx, [rsi+1]
movsx   rdx, edx
cmp rdx, rdi
jnb .L1
add esi, 2
lea r8, [0+rdx*4]
add eax, DWORD PTR [rcx+rdx*4]
movsx   rsi, esi
cmp rsi, rdi
jnb .L1
add eax, DWORD PTR [rcx+4+r8]
.L1:
add rsp, 8
pop rbx
pop rbp
ret
.L7:
add rsp, 8
xor eax, eax
pop rbx
pop rbp
ret
.L8:
xor eax, eax
xor esi, esi
xor edx, edx
jmp .L3
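
For comparison, a sketch of the result the report is asking the optimizer to
prove. `sum_after_zero_fill` is a hypothetical name; unlike the test case, the
accumulator is initialized, since reading the uninitialized `s` is undefined
behavior and could itself affect what the compiler is allowed to do:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

using T = int;

T sum_after_zero_fill(std::vector<T> v) {
  T s = 0;
  std::fill(v.begin(), v.end(), T());
  for (std::size_t i = 0; i < v.size(); ++i)
    s += v[i];  // every element is T(), so the sum stays 0
  return s;
}
```

With the fill visible in the same function, the whole body should in
principle fold to `return 0;`.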
