[Bug c/69616] New: optimization of 8 movb

2016-02-02 Thread izaberina at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69616

Bug ID: 69616
   Summary: optimization of 8 movb
   Product: gcc
   Version: 5.3.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: izaberina at gmail dot com
  Target Milestone: ---

I'm on arch linux on x86_64, using gcc 5.3.0.
From this code:

char tape[65536];
void f() {
tape[0] = 0;
tape[1] = 0;
tape[2] = 0;
tape[3] = 0;
tape[4] = 0;
tape[5] = 0;
tape[6] = 0;
tape[7] = 0;
}

gcc produces 8 movb at any -O level, while clang produces 1 movq.

Why is that not being optimized?
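
For comparison, a minimal sketch of what the merged form looks like when written by hand. The function names are made up for illustration, and the memcpy form is just one portable way to express an 8-byte store; it is not claimed to be what clang does internally.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

char tape[65536];

/* Eight separate byte stores, as in the report. */
void clear8_bytewise(void)
{
    tape[0] = 0; tape[1] = 0; tape[2] = 0; tape[3] = 0;
    tape[4] = 0; tape[5] = 0; tape[6] = 0; tape[7] = 0;
}

/* One 8-byte store written by hand.  memcpy with a constant size is a
   portable way to request a single wide store without alignment or
   aliasing problems. */
void clear8_merged(void)
{
    const uint64_t zero = 0;
    assert(sizeof zero == 8);   /* the store must cover exactly tape[0..7] */
    memcpy(tape, &zero, sizeof zero);
}
```

Both functions leave tape[0..7] zeroed and tape[8] untouched; the question in this report is only whether the compiler derives the second form from the first.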

$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-unknown-linux-gnu/5.3.0/lto-wrapper
Target: x86_64-unknown-linux-gnu
Configured with: /build/gcc/src/gcc-5.3.0/configure --prefix=/usr
--libdir=/usr/lib --libexecdir=/usr/lib --mandir=/usr/share/man
--infodir=/usr/share/info --with-bugurl=https://bugs.archlinux.org/
--enable-languages=c,c++,ada,fortran,go,lto,objc,obj-c++ --enable-shared
--enable-threads=posix --enable-libmpx --with-system-zlib --with-isl
--enable-__cxa_atexit --disable-libunwind-exceptions --enable-clocale=gnu
--disable-libstdcxx-pch --disable-libssp --enable-gnu-unique-object
--enable-linker-build-id --enable-lto --enable-plugin
--enable-install-libiberty --with-linker-hash-style=gnu
--enable-gnu-indirect-function --disable-multilib --disable-werror
--enable-checking=release
Thread model: posix
gcc version 5.3.0 (GCC)

[Bug target/69616] optimization of 8 movb

2016-02-02 Thread trippels at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69616

Markus Trippelsdorf  changed:

   What|Removed |Added

 Target||x86_64-*-*, i?86-*-*
 Status|UNCONFIRMED |NEW
   Last reconfirmed||2016-02-02
 CC||trippels at gcc dot gnu.org
  Component|c   |target
 Ever confirmed|0   |1

--- Comment #1 from Markus Trippelsdorf  ---
https://goo.gl/k9lDZQ

[Bug target/69577] [5/6 Regression] wrong code with -fno-forward-propagate -mavx and 128bit arithmetics since r215450

2016-02-02 Thread rsandifo at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69577

--- Comment #7 from rsandifo at gcc dot gnu.org ---
(In reply to Uroš Bizjak from comment #6)
> IMO, we should revert r215450, and fix a couple of cases using narrowing
> conversions with gen_lowpart that were introduced after r215450.

Please give me a few days to look at it first.  I still think r215450 is
correct and reverting it is likely to regress code quality.

[Bug target/69616] optimization of 8 movb

2016-02-02 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69616

Richard Biener  changed:

   What|Removed |Added

   Keywords||missed-optimization

--- Comment #2 from Richard Biener  ---
Generic BB vectorization could do this, I have partial patches to enable it. 
And we can combine stores on RTL.  We have several duplicate bugreports here.

[Bug target/69532] FAIL: gcc.target/arm/{vect-,}fmaxmin.c execution test on armv7

2016-02-02 Thread david.sherwood at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69532

--- Comment #4 from david.sherwood at arm dot com ---
(In reply to vries from comment #3)
> Also for the non-vect version:
> ...
> FAIL: gcc.target/arm/fmaxmin.c execution test
> ...

Hi, if you are not already fixing this, I can take a look if you want?

[Bug tree-optimization/69615] 0 to limit signed range checks don't always use unsigned compare

2016-02-02 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69615

Richard Biener  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2016-02-02
 CC||rguenth at gcc dot gnu.org
  Component|rtl-optimization|tree-optimization
 Ever confirmed|0   |1

--- Comment #1 from Richard Biener  ---
GCC relies on some fold() routine to do this and we end up with

;; Function r0_to_imax_2 (null)
;; enabled by -tree-original


{
  if ((unsigned int) x <= 2147483645)
{
  ext ();
}
}


;; Function r0_to_imax_1 (null)
;; enabled by -tree-original


{
  if (x >= 0 && x != 2147483647)
{
  ext ();
}
}

;; Function r0_to_imax (null)
;; enabled by -tree-original


{
  if (x >= 0)
{
  ext ();
}


so it seems we are confused by the trick that triggers first, replacing
the <= INT_MAX-1 compare with a != INT_MAX compare and that not being handled
in the range construction code.

Looks like ifcombine doesn't handle it either (maybe_fold_and_comparisons).

void ext(void);
void r0_to_imax_1(int x){ if (x>=0 && x<=(__INT_MAX__-1)) ext(); }
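
A small sketch of the equivalence being discussed, assuming the usual case where unsigned int can represent values above INT_MAX. The helper names are hypothetical and only serve to compare the two forms.

```c
#include <assert.h>
#include <limits.h>

/* The two-sided signed check from the testcase. */
int in_range_signed(int x)
{
    return x >= 0 && x <= INT_MAX - 1;
}

/* The single unsigned compare.  A negative x converts to a value above
   INT_MAX, so it fails the test just like x == INT_MAX does. */
int in_range_unsigned(int x)
{
    return (unsigned int)x <= (unsigned int)INT_MAX - 1u;
}

/* Hypothetical self-check: both forms must agree for the given x. */
void check_agree(int x)
{
    assert(in_range_signed(x) == in_range_unsigned(x));
}
```

Calling check_agree on 0, -1, INT_MIN, INT_MAX - 1 and INT_MAX covers the boundary cases.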

[Bug target/69614] [6 Regression] wrong code with -Os -fno-expensive-optimizations -fschedule-insns -mtpcs-leaf-frame -fira-algorithm=priority @ armv7a

2016-02-02 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69614

Richard Biener  changed:

   What|Removed |Added

   Target Milestone|--- |6.0

[Bug rtl-optimization/69606] [5/6 Regression] wrong code at -Os and above on x86_64-linux-gnu

2016-02-02 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69606

Richard Biener  changed:

   What|Removed |Added

 Status|NEW |ASSIGNED
   Assignee|unassigned at gcc dot gnu.org  |rguenth at gcc dot gnu.org

--- Comment #2 from Richard Biener  ---
Mine.

[Bug tree-optimization/69615] 0 to limit signed range checks don't always use unsigned compare

2016-02-02 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69615

Jakub Jelinek  changed:

   What|Removed |Added

 CC||jakub at gcc dot gnu.org

--- Comment #2 from Jakub Jelinek  ---
Shouldn't be hard to teach the range code in reassoc about this, but stage1
material.

[Bug target/69617] New: PowerPC/e6500: Atomic byte/halfword operations not properly supported

2016-02-02 Thread sebastian.hu...@embedded-brains.de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69617

Bug ID: 69617
   Summary: PowerPC/e6500: Atomic byte/halfword operations not
properly supported
   Product: gcc
   Version: 6.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: sebastian.hu...@embedded-brains.de
  Target Milestone: ---

The PowerPC/e6500 support lacks support for the load/store byte/halfword with
decoration indexed instructions:

#include <stdatomic.h>

unsigned char inc_uchar(atomic_uchar *a)
{
return atomic_fetch_add(a, 1);
}

unsigned short inc_ushort(atomic_ushort *a)
{
return atomic_fetch_add(a, 1);
}

powerpc-rtems4.12-gcc -O2 -Wall -Wextra -pedantic -mcpu=e6500 -m32 -S atomic.c
cat atomic.s
.file   "atomic.c"
.machine power4
.section        ".text"
.align 2
.p2align 4,,15
.globl inc_uchar
.type   inc_uchar, @function
inc_uchar:
rlwinm 8,3,3,27,28
li 7,255
xori 8,8,0x18
li 6,1
sync
slw 7,7,8
slw 6,6,8
rlwinm 9,3,0,0,29
.L2:
lwarx 3,0,9
add 5,3,6
andc 10,3,7
and 5,5,7
or 10,10,5
stwcx. 10,0,9
bne- 0,.L2
isync
srw 3,3,8
rlwinm 3,3,0,0xff
blr
.size   inc_uchar, .-inc_uchar
.align 2
.p2align 4,,15
.globl inc_ushort
.type   inc_ushort, @function
inc_ushort:
rlwinm 8,3,3,27,27
li 7,0
xori 8,8,0x10
ori 7,7,0x
li 6,1
sync
slw 7,7,8
slw 6,6,8
rlwinm 9,3,0,0,29
.L6:
lwarx 3,0,9
add 5,3,6
andc 10,3,7
and 5,5,7
or 10,10,5
stwcx. 10,0,9
bne- 0,.L6
isync
srw 3,3,8
rlwinm 3,3,0,0x
blr
.size   inc_ushort, .-inc_ushort
.ident  "GCC: (GNU) 6.0.0 20160202 (experimental)
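
The listing above emulates the narrow atomic with a masked read-modify-write loop on the containing 32-bit word. A rough C sketch of that technique follows, using a compare-exchange loop in place of lwarx/stwcx. The function name is hypothetical, and the byte index is assumed to count from the least significant byte, whereas the real code derives the shift from the address.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical helper: atomically add inc to one byte of a 32-bit word
   and return the previous value of that byte.  index 0 is the least
   significant byte (an assumption of this sketch). */
uint8_t fetch_add_byte_in_word(_Atomic uint32_t *word, unsigned index,
                               uint8_t inc)
{
    assert(index < 4);
    unsigned shift = index * 8;
    uint32_t mask = (uint32_t)0xff << shift;
    uint32_t old = atomic_load(word);
    uint32_t desired;

    do {
        /* Add within the selected byte only; the mask drops any carry
           into the neighbouring byte. */
        uint32_t sum = (old + ((uint32_t)inc << shift)) & mask;
        desired = (old & ~mask) | sum;
        /* On failure, old is reloaded with the current word value. */
    } while (!atomic_compare_exchange_weak(word, &old, desired));

    return (uint8_t)(old >> shift);
}
```

Instructions that operate directly on a byte or halfword would make the shift, mask and merge steps unnecessary, which is the point of this report.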

[Bug target/69616] optimization of 8 movb

2016-02-02 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69616

Jakub Jelinek  changed:

   What|Removed |Added

 CC||jakub at gcc dot gnu.org

--- Comment #3 from Jakub Jelinek  ---
Like PR22141.  Note that we probably want to deal with this at the GIMPLE level
instead of RTL, that is just too late, and handle bitfields and other adjacent
memory operations in there shortly before expansion, then at least for
bitfields go through some GIMPLE passes (ccp, forwprop?) to clean that up.

[Bug target/69532] FAIL: gcc.target/arm/{vect-,}fmaxmin.c execution test on armv7

2016-02-02 Thread vries at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69532

--- Comment #5 from vries at gcc dot gnu.org ---
(In reply to david.sherwood from comment #4)
> (In reply to vries from comment #3)
> > Also for the non-vect version:
> > ...
> > FAIL: gcc.target/arm/fmaxmin.c execution test
> > ...
> 
> Hi, if you are not already fixing this, I can take a look if you want?

Please do :)

Thanks,
- Tom

[Bug tree-optimization/67921] [6 Regression] "internal compiler error: in build_polynomial_chrec, at tree-chrec.h:147" when using -fsanitize=undefined

2016-02-02 Thread amker at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67921

amker at gcc dot gnu.org changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #11 from amker at gcc dot gnu.org ---
Should be fixed.

[Bug rtl-optimization/69609] [6 Regression] block reordering consumes an inordinate amount of time, REE consumes much memory

2016-02-02 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69609

Richard Biener  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Keywords||compile-time-hog,
   ||memory-hog
   Last reconfirmed||2016-02-02
  Component|c   |rtl-optimization
 CC||rguenth at gcc dot gnu.org
 Ever confirmed|0   |1
Summary|block reordering consumes   |[6 Regression] block
   |an inordinate amount of |reordering consumes an
   |time|inordinate amount of time,
   ||REE consumes much memory
   Target Milestone|--- |6.0

--- Comment #1 from Richard Biener  ---
Also takes a lot of memory with GCC 4.9 at least (killed at 2Gb).  GCC 5 seems
to peak at ~1.6GB.

Note that with this kind of generated code I _always_ recommend -O1.  -O2 simply
has too many quadraticnesses.

This case is a load of indirect jumps and loops and thus a very twisted CFG.
I'd say the number of BBs hits the BB reordering slowness.

First memory peak is for RTL pre (400MB), then rest_of_handle_ud_dce (800MB),
then REE (on trunk 2.1GB!).  All basically DF issues.

It looks like we leak somewhere as well.

Thus first tracking the memory-use regression towards GCC 5 at -O2.

-O1 behaves reasonably (300MB memory use, 30sec compile-time for GCC 6), only
outlier is

 df live&initialized regs:  11.51 (39%) usr   0.08 ( 9%) sys  11.47 (37%) wall  0 kB ( 0%) ggc

well, we know DF is slow and memory hungry.

[Bug libgomp/69597] execution failure for libgomp.oacc-c-c++-common/atomic_capture-1.c with -flto

2016-02-02 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69597

--- Comment #2 from Richard Biener  ---
OACC uses IPA PTA unconditionally, right?

[Bug target/69618] New: PowerPC/e6500: Atomic fence operations not properly supported

2016-02-02 Thread sebastian.hu...@embedded-brains.de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69618

Bug ID: 69618
   Summary: PowerPC/e6500: Atomic fence operations not properly
supported
   Product: gcc
   Version: 6.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: sebastian.hu...@embedded-brains.de
  Target Milestone: ---

On the PowerPC/e6500 elemental synchronization operations should be used for
acquire/release barriers. Currently a lwsync is used instead:

#include <stdatomic.h>

void release(void)
{
atomic_thread_fence(memory_order_release);
}

void acquire(void)
{
atomic_thread_fence(memory_order_acquire);
}

powerpc-rtems4.12-gcc -O2 -Wall -Wextra -pedantic -mcpu=e6500 -m32 -S fence.c
cat fence.s
.file   "fence.c"
.machine power4
.section        ".text"
.align 2
.p2align 4,,15
.globl release
.type   release, @function
release:
lwsync
blr
.size   release, .-release
.align 2
.p2align 4,,15
.globl acquire
.type   acquire, @function
acquire:
lwsync
blr
.size   acquire, .-acquire
.ident  "GCC: (GNU) 6.0.0 20160202 (experimental)

See also e6500 Core Reference Manual, 5.5.5.2.1 (Simplified memory barrier
recommendations) and EREF: A Programmer’s Reference Manual for Freescale Power
Architecture Processors (06/2014), 7.4.8.3 (Forcing Load and Store Ordering
(Memory Barriers)).

For acquire semantics an "ESYNC 12" instruction should be used (Load-Load and
Load-Store barrier) and for release semantics an "ESYNC 5" instruction
(Store-Store and Load-Store barrier). See also Memory Model Rationales,
section "Why ordering constraints are never limited to loads or stores"
(www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2176.html).

In contrast to the general PowerPC architecture, the e6500 core honours
Load-Store ordering (EREF, 7.4.8.2 (Architecture Ordering Requirements), number
4), so an "ESYNC 8" instruction (Load-Load barrier) may be sufficient for
acquire semantics and an "ESYNC 1" instruction (Store-Store barrier) for
release semantics. See EREF, 7.4.8.4 (Architectural Memory Access
Ordering), 7.4.8.6.1 (Acquire Lock and Import Shared Memory) and 7.4.8.7.1
(Export Shared Memory and Release Lock).
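
For context, a minimal sketch of the export/import pattern these fences protect, using only the standard C11 fence API. The variable and function names are made up for illustration; which barrier instruction a fence maps to is up to the target back end and is not shown here.

```c
#include <assert.h>
#include <stdatomic.h>

static int payload;       /* plain data published through the flag */
static atomic_int ready;  /* zero-initialized: nothing published yet */

/* Export shared memory: the release fence keeps the payload store from
   being ordered after the flag store. */
void publish(int value)
{
    payload = value;
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&ready, 1, memory_order_relaxed);
}

/* Import shared memory: the acquire fence keeps the payload load from
   being ordered before the flag load.  Returns 1 and fills *out if the
   payload has been published, 0 otherwise. */
int try_consume(int *out)
{
    assert(out != 0);
    if (!atomic_load_explicit(&ready, memory_order_relaxed))
        return 0;
    atomic_thread_fence(memory_order_acquire);
    *out = payload;
    return 1;
}
```

The release side needs Store-Store and Load-Store ordering and the acquire side needs Load-Load and Load-Store ordering, which matches the barrier pairs named in the report.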

[Bug libgomp/69597] execution failure for libgomp.oacc-c-c++-common/atomic_capture-1.c with -flto

2016-02-02 Thread vries at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69597

--- Comment #3 from vries at gcc dot gnu.org ---
(In reply to Richard Biener from comment #2)
> OACC uses IPA PTA unconditionally, right?


It uses it by default. I think -fno-ipa-pta should work as expected.

[Bug target/69614] [6 Regression] wrong code with -Os -fno-expensive-optimizations -fschedule-insns -mtpcs-leaf-frame -fira-algorithm=priority @ armv7a

2016-02-02 Thread ktkachov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69614

ktkachov at gcc dot gnu.org changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2016-02-02
 CC||ktkachov at gcc dot gnu.org
  Known to work||4.9.4, 5.3.1
 Ever confirmed|0   |1

--- Comment #1 from ktkachov at gcc dot gnu.org ---
Reproduced with the command line:
-Os -fno-expensive-optimizations -fschedule-insns -mtpcs-leaf-frame
-fira-algorithm=priority -march=armv7-a -mfloat-abi=hard -marm -mfpu=vfpv4

Note that you didn't specify a --with-fpu value in your gcc configuration and
didn't use an -mfpu option in your command line, so gcc defaulted to the value
'vfp' for the fpu. For an architecture like armv7-a it's usually more common to
use values like neon or neon-vfpv4 or vfpv4 if you don't want NEON.  'vfp' is a
very old fpu level. Though this particular bug reproduces with -mfpu=vfp as
well.

That's just for future reference if you want a gcc targeting a more realistic
armv7-a setup by default.

[Bug rtl-optimization/69606] [5/6 Regression] wrong code at -Os and above on x86_64-linux-gnu

2016-02-02 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69606

--- Comment #3 from Richard Biener  ---
So before VRP2 we have

  :
  load_dst_16 = b;
  # RANGE [2, 65535] NONZERO 65535
  _12 = (int) load_dst_16;
  # RANGE [0, 255]
  _9 = (unsigned char) load_dst_16;
  e = _9;
  # RANGE [0, 255] NONZERO 255
  _11 = (int) _9;
  d = _12;
  # RANGE [0, 1] NONZERO 1
  _14 = 1 % 0;
  c = _14;
  return 0;

the range on _12 is bogus, it was once conditional:

  :
  a.0_4 = a;
  # RANGE [-128, 127]
  _5 = (int) a.0_4;
  # RANGE [-128, 127]
  g_6 = ~_5;
  b.1_7 = b;
  # RANGE [0, 65535] NONZERO 65535
  _8 = (int) b.1_7;
  if (_8 > 1)
goto ;
  else
goto ;

  :
  # RANGE [0, 255]
  _9 = (unsigned char) b.1_7;
  e = _9;
  # RANGE [0, 255] NONZERO 255
  _11 = (int) _9;
  # RANGE [2, 65535] NONZERO 65535
  _12 = _8 | _11;
  d = _12;

  :
  # RANGE [-128, 127]
  # g_1 = PHI 
  # RANGE [0, 1] NONZERO 1
  _14 = 1 % g_1;
  c = _14;
  return 0;

And it's the bswap pass breaking it:

16 bit load in target endianness found at: _12 = (int) load_dst_16;
...
  :
  a.0_4 = a;
  # RANGE [-128, 127]
  _5 = (int) a.0_4;
  # RANGE [-128, 127]
  g_6 = ~_5;
  load_dst_16 = b;
  # RANGE [2, 65535] NONZERO 65535
  _12 = (int) load_dst_16;

because it does

  if (!useless_type_conversion_p (TREE_TYPE (tgt), load_type))
{
  val_tmp = make_temp_ssa_name (aligned_load_type, NULL,
"load_dst");
  load_stmt = gimple_build_assign (val_tmp, val_expr);
  gimple_set_vuse (load_stmt, n->vuse);
  gsi_insert_before (&gsi, load_stmt, GSI_SAME_STMT);
  gimple_assign_set_rhs_with_ops (&gsi, NOP_EXPR, val_tmp);
}
  else
{
  gimple_assign_set_rhs_with_ops (&gsi, MEM_REF, val_expr);
  gimple_set_vuse (cur_stmt, n->vuse);
}

thus simply replaces the RHS and before:

  /* Move cur_stmt just before  one of the load of the original
 to ensure it has the same VUSE.  See PR61517 for what could
 go wrong.  */
  gsi_move_before (&gsi, &gsi_ins);
  gsi = gsi_for_stmt (cur_stmt);

which in this case moves the computation to a different BB.
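
To see why the recorded range stops being valid once the statement leaves its guarded block, here is a small stand-alone sketch. It is not GCC code; the function names are invented, and unsigned short is assumed to be 16 bits wide.

```c
#include <assert.h>

/* The value computed as _12 = _8 | _11 in the dump: b OR-ed with its
   own low byte. */
int or_low_byte(unsigned short b)
{
    return (int)b | (int)(unsigned char)b;
}

/* The range recorded on _12: [2, 65535]. */
int in_recorded_range(int v)
{
    return v >= 2 && v <= 65535;
}

/* Original placement: the computation only runs under b > 1, so the
   recorded range holds whenever the value exists. */
int guarded(unsigned short b)
{
    if (b > 1) {
        int v = or_low_byte(b);
        assert(in_recorded_range(v));
        return v;
    }
    return -1;  /* computation not executed on this path */
}
```

Evaluating or_low_byte(0) or or_low_byte(1) outside the guard gives 0 or 1, both outside [2, 65535], so a range derived from the condition cannot be kept when the statement is moved above it.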

[Bug middle-end/68542] [6 Regression] 10% 481.wrf performance regression

2016-02-02 Thread ienkovich at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68542

--- Comment #8 from Ilya Enkovich  ---
Author: ienkovich
Date: Tue Feb  2 09:46:26 2016
New Revision: 233068

URL: https://gcc.gnu.org/viewcvs?rev=233068&root=gcc&view=rev
Log:
gcc/

2016-02-02  Yuri Rumyantsev  

PR middle-end/68542
* config/i386/i386.c (ix86_expand_branch): Add support for conditional
branch with vector comparison.
* config/i386/sse.md (VI48_AVX): New mode iterator.
(define_expand "cbranch4): Add support for conditional branch
with vector comparison.
* tree-vect-loop.c (optimize_mask_stores): New function.
* tree-vect-stmts.c (vectorizable_mask_load_store): Initialize
has_mask_store field of vect_info.
* tree-vectorizer.c (vectorize_loops): Invoke optimize_mask_stores for
vectorized loops having masked stores after vec_info destroy.
* tree-vectorizer.h (loop_vec_info): Add new has_mask_store field and
corresponding macros.
(optimize_mask_stores): Add prototype.

gcc/testsuite

2016-02-02  Yuri Rumyantsev  

PR middle-end/68542
* gcc.dg/vect/vect-mask-store-move-1.c: New test.
* gcc.target/i386/avx2-vect-mask-store-move1.c: New test.

Added:
trunk/gcc/testsuite/gcc.dg/vect/vect-mask-store-move-1.c
trunk/gcc/testsuite/gcc.target/i386/avx2-vect-mask-store-move1.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/i386/i386.c
trunk/gcc/config/i386/sse.md
trunk/gcc/testsuite/ChangeLog
trunk/gcc/tree-vect-loop.c
trunk/gcc/tree-vect-stmts.c
trunk/gcc/tree-vectorizer.c
trunk/gcc/tree-vectorizer.h

[Bug rtl-optimization/69606] [5/6 Regression] wrong code at -Os and above on x86_64-linux-gnu

2016-02-02 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69606

--- Comment #4 from Richard Biener  ---
Testing

Index: gcc/tree-ssa-math-opts.c
===
*** gcc/tree-ssa-math-opts.c(revision 233067)
--- gcc/tree-ssa-math-opts.c(working copy)
*** bswap_replace (gimple *cur_stmt, gimple
*** 2622,2627 
--- 2622,2629 
/* Move cur_stmt just before  one of the load of the original
 to ensure it has the same VUSE.  See PR61517 for what could
 go wrong.  */
+   if (gimple_bb (cur_stmt) != gimple_bb (src_stmt))
+   reset_flow_sensitive_info (gimple_assign_lhs (cur_stmt));
gsi_move_before (&gsi, &gsi_ins);
gsi = gsi_for_stmt (cur_stmt);

[Bug tree-optimization/69619] New: [6 Regression] compilation doesn't terminate during CCMP expansion

2016-02-02 Thread ktkachov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69619

Bug ID: 69619
   Summary: [6 Regression] compilation doesn't terminate during
CCMP expansion
   Product: gcc
   Version: 6.0
Status: UNCONFIRMED
  Keywords: compile-time-hog, memory-hog
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: ktkachov at gcc dot gnu.org
  Target Milestone: ---
Target: aarch64

Testcase:
int a, b, c, d;
int e[1];
void
fn1 ()
{
  int *f = &d;
  c = 6;
  for (; c; c--)
{
  b = 0;
  for (; b <= 5; b++)
{
  short g = e[(b + 2) * 9 + c];
  *f = *f == a && e[(b + 2) * 9 + c];
}
}
}

Given -O3 for aarch64 GCC doesn't terminate compilation and seems to keep
eating more and more memory. The testcase does contain undefined behaviour and
GCC warns about it:
mycrash.c: In function 'fn1':
mycrash.c:13:22: warning: iteration 1 invokes undefined behavior
[-Waggressive-loop-optimizations]
   short g = e[(b + 2) * 9 + c];
 ~^
mycrash.c:11:7: note: within this loop
   for (; b <= 5; b++)


but GCC shouldn't go into an infinite loop. At -O2 the testcase compiles
instantaneously.

Interrupting compilation in gdb and dumping the callstack shows that it's
recursing deeply into the ccmp expansion code. After a few seconds of
compilation I see a stack frame more than 400 levels deep with expand_ccmp_expr
appearing periodically in there, though I don't know if that's the ccmp expand
code's fault or the tree optimisers' fault.

I can provide the tree dumps if needed, though they should be easy to reproduce
with a recent aarch64 compiler

[Bug tree-optimization/69619] [6 Regression] compilation doesn't terminate during CCMP expansion

2016-02-02 Thread ktkachov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69619

ktkachov at gcc dot gnu.org changed:

   What|Removed |Added

 CC||wdijkstr at arm dot com
   Target Milestone|--- |6.0
  Known to fail||6.0

[Bug testsuite/69620] New: gcc.dg/tree-ssa/loop-19.c scan-tree-dump-times optimized fails for powerpc64-linux-gnu

2016-02-02 Thread tiago.brusamarello at datacom dot ind.br
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69620

Bug ID: 69620
   Summary: gcc.dg/tree-ssa/loop-19.c scan-tree-dump-times
optimized fails for powerpc64-linux-gnu
   Product: gcc
   Version: 4.9.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: testsuite
  Assignee: unassigned at gcc dot gnu.org
  Reporter: tiago.brusamarello at datacom dot ind.br
  Target Milestone: ---

Hi.

The test "gcc.dg/tree-ssa/loop-19.c scan-tree-dump-times optimized" is failing
for gcc 4.9.1 on powerpc64-gnu-linux:

Executing on host: powerpc64-unknown-linux-gnu-gcc
/opt/x-tools/powerpc64-unknown-linux-gnu_v1.0/test-suite/gcc/testsuite/gcc.dg/tree-ssa/loop-19.c
 -fno-diagnostics-show-caret -fdiagnostics-color=never   -O3
-fno-tree-loop-distribute-patterns -fno-prefetch-loop-arrays
-fdump-tree-optimized -fno-common -S  -o loop-19.s(timeout = 300)
spawn powerpc64-unknown-linux-gnu-gcc
/opt/x-tools/powerpc64-unknown-linux-gnu_v1.0/test-suite/gcc/testsuite/gcc.dg/tree-ssa/loop-19.c
-fno-diagnostics-show-caret -fdiagnostics-color=never -O3
-fno-tree-loop-distribute-patterns -fno-prefetch-loop-arrays
-fdump-tree-optimized -fno-common -S -o loop-19.s
PASS: gcc.dg/tree-ssa/loop-19.c (test for excess errors)
FAIL: gcc.dg/tree-ssa/loop-19.c scan-tree-dump-times optimized "MEM.(base:
&|symbol: )a," 2
FAIL: gcc.dg/tree-ssa/loop-19.c scan-tree-dump-times optimized "MEM.(base:
&|symbol: )c," 2

[Bug c/69602] [6 Regression] over-ambitious logical-op warning on EAGAIN vs EWOULDBLOCK

2016-02-02 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69602

Jakub Jelinek  changed:

   What|Removed |Added

 CC||jakub at gcc dot gnu.org

--- Comment #5 from Jakub Jelinek  ---
Even if we look through macros, I'd actually think we should warn here.
Because this is actually:
#define EAGAIN 11
#define EWOULDBLOCK EAGAIN
extern int *__errno_location (void) __attribute__ ((__nothrow__, __leaf__,
__const__));
#define errno (*__errno_location ())

int
foo ()
{
  if (errno == EAGAIN || errno == EWOULDBLOCK)
return 1;
  return 0;
}
and even errno.h claims that EWOULDBLOCK is EAGAIN.
The warning on this started with r222408.
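
One way user code can stay portable without tripping the warning is to let the preprocessor decide whether the second comparison is needed. This is a sketch of a common idiom, not something proposed in this thread; the function name is hypothetical.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical helper: true if err means "try again later".  The second
   comparison is only compiled where the two macros really differ. */
int is_would_block(int err)
{
    assert(err >= 0);   /* errno codes are positive; 0 means no error */
#if EAGAIN == EWOULDBLOCK
    return err == EAGAIN;
#else
    return err == EAGAIN || err == EWOULDBLOCK;
#endif
}
```

A caller would then write is_would_block(errno) instead of spelling out both comparisons.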

[Bug c++/69277] [6 Regression] ICE mangling a flexible array member

2016-02-02 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69277

Jakub Jelinek  changed:

   What|Removed |Added

 CC||jakub at gcc dot gnu.org

--- Comment #5 from Jakub Jelinek  ---
So, any progress on this?  If the updated patch needs pinging, please do ping
it, patches should be pinged after a week.

[Bug tree-optimization/69615] 0 to limit signed range checks don't always use unsigned compare

2016-02-02 Thread peter at cordes dot ca
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69615

--- Comment #3 from Peter Cordes  ---
@Richard and Jakub:

That's just addressing the first part of my report, the problem with  x <=
(INT_MAX-1), right?

You may have missed the second part of the problem, since I probably buried it
under too much detail with the first:

In the case where the limit is variable, but can easily be proven to itself be
in the range [0 .. INT_MAX-1) or much smaller:

// gcc always fails to optimize this to an unsigned compare, but clang succeeds
void rangecheck_var(int64_t x, int64_t lim2) {
  //lim2 >>= 60;
  lim2 &= 0xf;  // let the compiler figure out the limited range of limit
  if (x>=0 && x
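
A sketch of the variable-limit case, assuming the check compares x against the masked limit (the function names and the exact condition are assumptions for illustration). Because the limit is known to be non-negative after masking, the two-sided signed check and a single unsigned compare agree.

```c
#include <assert.h>
#include <stdint.h>

/* Two-sided signed check against a limit known to be in [0, 15]. */
int rangecheck_signed(int64_t x, int64_t lim)
{
    lim &= 0xf;
    return x >= 0 && x < lim;
}

/* Single unsigned compare: a negative x becomes a huge unsigned value
   and therefore fails, exactly like x >= lim does. */
int rangecheck_unsigned(int64_t x, int64_t lim)
{
    lim &= 0xf;
    return (uint64_t)x < (uint64_t)lim;
}

/* Hypothetical self-check: both forms must agree. */
void check_same(int64_t x, int64_t lim)
{
    assert(rangecheck_signed(x, lim) == rangecheck_unsigned(x, lim));
}
```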

[Bug target/69577] [5/6 Regression] wrong code with -fno-forward-propagate -mavx and 128bit arithmetics since r215450

2016-02-02 Thread rsandifo at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69577

rsandifo at gcc dot gnu.org  changed:

   What|Removed |Added

 Status|NEW |ASSIGNED
   Assignee|unassigned at gcc dot gnu.org  |rsandifo at gcc dot gnu.org

--- Comment #8 from rsandifo at gcc dot gnu.org ---
Testing a patch.

[Bug tree-optimization/69615] 0 to limit signed range checks don't always use unsigned compare

2016-02-02 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69615

--- Comment #4 from Jakub Jelinek  ---
I suppose even that is doable in the reassoc framework, or it could be done in
match.pd just using the recorded value ranges, like richi has handled PR69595,
or both.

[Bug target/69619] [6 Regression] compilation doesn't terminate during CCMP expansion

2016-02-02 Thread ktkachov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69619

--- Comment #1 from ktkachov at gcc dot gnu.org ---
Just increasing the size of 'e' avoids undefined behaviour.
The following doesn't give a warning and still shows the bug:
int a, b, c, d;
int e[100];
void
fn1 ()
{
  int *f = &d;
  c = 6;
  for (; c; c--)
{
  b = 0;
  for (; b <= 5; b++)
{
  short g = e[(b + 2) * 9 + c];
  *f = *f == a && e[(b + 2) * 9 + c];
}
}
}
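
A quick way to check the claim about the array size is to walk the same loop bounds and record the indices used. This is only an illustrative sketch with made-up helper names.

```c
#include <assert.h>

/* Index expression from the testcase. */
static int index_of(int b, int c)
{
    return (b + 2) * 9 + c;
}

/* Largest index reached: c runs from 6 down to 1, b from 0 to 5. */
int max_index(void)
{
    int max = index_of(0, 6);
    for (int c = 6; c; c--)
        for (int b = 0; b <= 5; b++)
            if (index_of(b, c) > max)
                max = index_of(b, c);
    return max;
}

/* Smallest index reached over the same iteration space. */
int min_index(void)
{
    int min = index_of(0, 6);
    for (int c = 6; c; c--)
        for (int b = 0; b <= 5; b++)
            if (index_of(b, c) < min)
                min = index_of(b, c);
    return min;
}

/* Hypothetical check that an array of array_len elements is big enough. */
void check_fits(int array_len)
{
    assert(max_index() < array_len);
}
```

The indices span 19 to 69, so every access is out of bounds for e[1] and in bounds for e[100].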

[Bug c++/69621] New: extern std::string used as reference template-argument does not have [abi:cxx11] tag applied

2016-02-02 Thread ed at catmur dot co.uk
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69621

Bug ID: 69621
   Summary: extern std::string used as reference template-argument
does not have [abi:cxx11] tag applied
   Product: gcc
   Version: 5.3.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: ed at catmur dot co.uk
  Target Milestone: ---

#include <string>
template<std::string &> struct S { static void f(); };
extern std::string s;
void g() { S<s>::f(); }

asm emitted for g():
jmp S::f()

Expected:
jmp S::f()

The latter will be emitted if there is a definition for the std::string in the
translation unit.

Broken versions: 5.1.0, 5.2.0, 5.3.0

Workaround: change S to take its template-parameter by pointer:
jmp S<&(s[abi:cxx11])>::f()

Original reporter:
http://stackoverflow.com/questions/35151104/stdstring-as-template-parameter-and-abi-tag-in-gcc-5

Possibly related: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66971

[Bug rtl-optimization/69570] [6 Regression] if-conversion bug on i?86

2016-02-02 Thread bernds at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69570

--- Comment #8 from Bernd Schmidt  ---
Looks like it can be slightly reduced, removing not executed paths.

template < typename T > constexpr inline const T &
min (const T &a, const T &b)
{
  if (b < a)
return b;
  return a;
}

template < typename T > constexpr inline const T &
max (const T &a, const T &b)
{
  if (a < b)
return b;
  return a;
}

static inline void
foo (unsigned x, unsigned y, unsigned z, double &h, double &s, double &l)
{
  double r = x / 255.0;
  double g = y / 255.0;
  double b = z / 255.0;
  double m = max (r, max (g, b));
  double n = min (r, min (g, b));
  double d = m - n;
  double e = m + n;
  h = 0.0, s = 0.0, l = e / 2.0;
  if (d > 0.0)
{
  s = l > 0.5 ? d / (2.0 - e) : d / e;
  if (m == g)
h = (b - r) / d + 2.0;
  h /= 6.0;
}
}

__attribute__ ((noinline, noclone))
void bar (unsigned x[3], double y[3])
{
  double h, s, l;
  foo (x[0], x[1], x[2], h, s, l);
  y[0] = h;
  y[1] = s;
  y[2] = l;
}

int
main ()
{
  unsigned x[3] = { 0, 128, 0 };
  double y[3];

  bar (x, y);
  if (__builtin_fabs (y[0] - 0.3) > 0.001)
__builtin_abort ();

  return 0;
}

[Bug target/69622] New: compiler reordering of non-temporal (write-combining) stores produces significant performance hit

2016-02-02 Thread peter at cordes dot ca
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69622

Bug ID: 69622
   Summary: compiler reordering of non-temporal (write-combining)
stores produces significant performance hit
   Product: gcc
   Version: 5.3.0
Status: UNCONFIRMED
  Keywords: missed-optimization
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: peter at cordes dot ca
  Target Milestone: ---
Target: i386-linux-gnu, x86_64-linux-gnu

IDK whether to mark this as "target" or something else.  Other architectures
might have similar write-combining stores that are sensitive to writing whole
cache-lines at once.


For background, see this SO question:
http://stackoverflow.com/questions/25778302/wrong-gcc-generated-assembly-ordering-results-in-performance-hit

In an unrolled copy loop, gcc decides to emit  vmovntdq  stores in a different
order than they appear in the source.  There's no correctness issue, but the
number of fill buffers is very limited (maybe each core has 10 or so?).  So
it's *much* better to write all of one cacheline, then all of the next
cacheline.  See my answer on that SO question for lots of discussion and links.

The poster of that question got a 33% speedup (from ~10.2M packets per second
to ~13.3M packets per second) by putting the loads and stores in source order in
the binary.  (Unknown hardware and surrounding code, but presumably this loop
is *the* bottleneck in his app.)  Anyway, real numbers show that this isn't
just a theoretical argument that some code would be better.


Compilable test-case that demonstrates the issue:

#include <stddef.h>
#include <immintrin.h>

//#define compiler_writebarrier()   __asm__ __volatile__ ("")
#define compiler_writebarrier()   // empty.

void copy_mcve(void *const destination, const void *const source, const size_t
bytes)
{
  __m256i *dst  = destination;
const __m256i *src  = source;
const __m256i *dst_endp = (destination + bytes);

while (dst < dst_endp)  { 
  __m256i m0 = _mm256_load_si256( src + 0 );
  __m256i m1 = _mm256_load_si256( src + 1 );
  __m256i m2 = _mm256_load_si256( src + 2 );
  __m256i m3 = _mm256_load_si256( src + 3 );

  _mm256_stream_si256( dst+0, m0 );
  compiler_writebarrier();   // even one anywhere in the loop is enough for current gcc
  _mm256_stream_si256( dst+1, m1 );
  compiler_writebarrier();
  _mm256_stream_si256( dst+2, m2 );
  compiler_writebarrier();
  _mm256_stream_si256( dst+3, m3 );
  compiler_writebarrier();

  src += 4;
  dst += 4;
}

}

compiles (with the barriers defined as a no-op) to (gcc 5.3.0 -O3
-march=haswell:  http://goo.gl/CwtpS7): 

copy_mcve:
        addq    %rdi, %rdx
        cmpq    %rdx, %rdi
        jnb     .L7
.L5:
        vmovdqa 32(%rsi), %ymm2
        subq    $-128, %rdi
        subq    $-128, %rsi
        vmovdqa -64(%rsi), %ymm1
        vmovdqa -32(%rsi), %ymm0
        vmovdqa -128(%rsi), %ymm3
        # If dst is aligned, the four halves of two cache lines are {A B} {C D}:
        vmovntdq        %ymm2, -96(%rdi)        # B
        vmovntdq        %ymm1, -64(%rdi)        # C
        vmovntdq        %ymm0, -32(%rdi)        # D
        vmovntdq        %ymm3, -128(%rdi)       # A
        cmpq    %rdi, %rdx
        ja      .L5
        vzeroupper
.L7:    ret


If the output buffer is aligned, that B C D A store ordering maximally
separates the two halves of the first cache line, giving the most opportunity
for partially-full fill buffers to get flushed.
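
To make the ordering argument concrete, here is a small sketch that counts how often consecutive stores in a sequence land in a different cache line. It is a toy model with hypothetical names, assuming 64-byte lines and offsets measured from a 64-byte-aligned base; it says nothing about how the hardware actually schedules fill buffers.

```c
#include <assert.h>
#include <stddef.h>

enum { CACHE_LINE = 64 };

/* Count how often a store targets a different cache line than the store
   issued just before it. */
int count_line_switches(const size_t *offsets, size_t n)
{
    assert(n == 0 || offsets != NULL);
    int switches = 0;
    for (size_t i = 1; i < n; i++)
        if (offsets[i] / CACHE_LINE != offsets[i - 1] / CACHE_LINE)
            switches++;
    return switches;
}
```

With 32-byte stores, the source order A B C D (offsets 0, 32, 64, 96) switches lines once, while the emitted order B C D A (offsets 32, 64, 96, 0) switches twice and leaves the first line half-written while the second line is filled.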


Doing the +32 load first makes no sense with that placement of the
pointer-increment instructions.  Doing the +0 load first could save a byte of
code-size by not needing a displacement byte.  I'm guessing that's what one
optimizer function was going for when it put the subs there, but then something
else came along and re-ordered the loads.

Is there something that tries to touch both cache-lines as early as possible,
to trigger the loads?  Assuming the buffer is 64B-aligned?

Doing the subs after the last store would save another insn byte, because one
of the stores could use an empty displacement as well.  That's where clang puts
the pointer increments (and it keeps the loads and stores in source order). 
clang also uses vmovaps / vmovntps.  It's probably a holdover from saving an
insn byte in the non-VEX encoding of the 128b insn, but does make the output
work with AVX1 instead of requiring AVX2.


Using a 2-register addressing mode for the loads could save a sub instruction
inside the loop.  Increment dst normally, but reference src with a 2-register
addressing mode with dst and a register initialized with src-dst.  (In the
godbolt link, uncomment the #define ADDRESSING_MODE_HACK.  With ugly enough
source, gcc can be bludgeoned into making code like that.  It wastes insns in
the intro, though, apparently to avoid 3-component (base+index+disp) addresses.)
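
A rough C rendering of that addressing idea, assuming plain 8-byte memcpy
chunks instead of AVX loads/stores so that it stays portable.  The function
name and the chunk size are illustrative, and nothing guarantees that a given
compiler turns this into a base+index load:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Sketch: only dst is incremented; src is reached as dst + (src - dst). */
void copy_single_increment(void *destination, const void *source, size_t bytes)
{
    assert(bytes % 8 == 0);            /* illustrative restriction */

    unsigned char *dst = destination;
    unsigned char *const end = dst + bytes;
    /* Computed once, outside the loop.  Integer arithmetic avoids
       subtracting pointers that point into different objects. */
    const uintptr_t src_minus_dst = (uintptr_t)source - (uintptr_t)destination;

    while (dst < end) {
        const void *src = (const void *)((uintptr_t)dst + src_minus_dst);
        memcpy(dst, src, 8);           /* load from dst+delta, store to dst */
        dst += 8;                      /* the only pointer update in the loop */
    }
}
```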

[Bug tree-optimization/68715] [6 Regression] ice: in harmful_stmt_in_region, at graphite-scop-detection.c:1043

2016-02-02 Thread ktkachov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68715

ktkachov at gcc dot gnu.org changed:

   What|Removed |Added

   Last reconfirmed|2016-01-07 00:00:00 |2016-2-2
 CC||ktkachov at gcc dot gnu.org

--- Comment #4 from ktkachov at gcc dot gnu.org ---
Seeing this on aarch64 with current trunk at -Ofast -floop-interchange with the
C testcase with ISL 0.15:

int a[1], c[1];
int b, d, e;

void
fn1 (int p1)
{
  for (;;)
;
}

int
fn3 ()
{
  for (; e; e++)
c[e] = 2;
  for (; d; d--)
a[d] = 8;
  return 0;
}

int fn5 (int);

int
fn2 ()
{
  fn3 ();
}

void
fn4 ()
{
  fn1 (b || fn5 (fn2 ()));
}

[Bug target/69619] [6 Regression] compilation doesn't terminate during CCMP expansion

2016-02-02 Thread wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69619

--- Comment #2 from Wilco  ---
Changing to c = 3 generates code after a short time. The issue is recursive
calls to expand_ccmp_expr during the 2 possible options tried to determine
costs. That makes the algorithm exponential.

A fix would be to expand the LHS and RHS of both gs0 and gs1 in
expand_ccmp_expr_1 (and not in gen_ccmp_first) as these expansions will be
identical no matter what order we choose to expand gs0 and gs1 in. It's not
clear how easily that can be achieved using the existing interfaces though.
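
To see why trying both expansion orders recursively blows up, here is a toy
model.  It is not GCC code: toy_expand and its counter are invented for
illustration, and the real cost computation is more involved.  Each level
re-expands its sub-expression once per candidate order, so the number of calls
doubles with every nesting level:

```c
/* Toy model only: counts how often an "expand" routine runs when each
   level evaluates its sub-expression once for each of two orders. */
static unsigned long toy_calls;

static int toy_expand(int depth)
{
    toy_calls++;
    if (depth == 0)
        return 1;                               /* leaf comparison: unit cost */
    int cost_first  = toy_expand(depth - 1);    /* try order gs0, gs1 */
    int cost_second = toy_expand(depth - 1);    /* try order gs1, gs0 */
    return 1 + (cost_first < cost_second ? cost_first : cost_second);
}

/* Number of calls needed for a chain of the given nesting depth. */
unsigned long toy_count_calls(int depth)
{
    toy_calls = 0;
    toy_expand(depth);
    return toy_calls;
}
```

In this model a depth of n costs 2^(n+1) - 1 calls.  Expanding the operands
once, outside the per-order trial, would remove the doubling.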

[Bug c++/69623] New: CWG 1388; Invalid deduction of non-trailing template parameter pack

2016-02-02 Thread colu...@gmx-topmail.de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69623

Bug ID: 69623
   Summary: CWG 1388; Invalid deduction of non-trailing template
parameter pack
   Product: gcc
   Version: 6.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: colu...@gmx-topmail.de
  Target Milestone: ---

template <typename... T, typename... U>
void f(T..., U...) {}

int main() {f();}

---

This shouldn't compile. Although U is deduced to the empty pack via
[temp.arg.explicit]/3, T isn't, because it isn't trailing. I.e. this code
should yield a deduction failure.

Also see https://llvm.org/bugs/show_bug.cgi?id=26435.

[Bug c/69602] [6 Regression] over-ambitious logical-op warning on EAGAIN vs EWOULDBLOCK

2016-02-02 Thread manu at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69602

--- Comment #6 from Manuel López-Ibáñez  ---
(In reply to Jakub Jelinek from comment #5)
> Even if we look through macros, I'd actually think we should warn here.

I think we should NOT look through macros. The purpose of the warning is to
catch mistakes like xxx && !!xxx or 0 < A || A > 0, but if the user writes f()
&& g() and both functions return the same value, it is clearly not a bug, even
if GCC knows that they are the same function. For the purposes of this warning
(and probably several others), we should treat macros as variables and
functions. Thus,

(errno == EAGAIN || errno == EWOULDBLOCK) /* no warning: EAGAIN and EWOULDBLOCK
are not the same macro */

(errno == EAGAIN || EAGAIN == errno) /* warn:  logical 'or' of the same
expressions */


Of course, this is not trivial to implement at the moment (mostly because we
don't have locations for macros that expand to constants, thus even finding
that something comes from a macro is not possible in most cases).

[Bug c/69602] [6 Regression] over-ambitious logical-op warning on EAGAIN vs EWOULDBLOCK

2016-02-02 Thread manu at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69602

--- Comment #7 from Manuel López-Ibáñez  ---
(In reply to Eric Blake from comment #0)
> However, as shown by the sample code below, gcc 6.0's new warning is
> over-ambitious, and is likely to _cause_ rather than cure user bugs, when
> uninformed users unaware that Linux has the two errno values equal dumb down
> the code to silence the warning, but in the process break their code on
> other platforms where it is important to check for both values.

As a work-around, something like:

if (0 || errno == EAGAIN || errno == EWOULDBLOCK)

silences the warning (although it should not).

[Bug rtl-optimization/69606] [5 Regression] wrong code at -Os and above on x86_64-linux-gnu

2016-02-02 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69606

Richard Biener  changed:

   What|Removed |Added

  Known to work||6.0
Summary|[5/6 Regression] wrong code |[5 Regression] wrong code
   |at -Os and above on |at -Os and above on
   |x86_64-linux-gnu|x86_64-linux-gnu
  Known to fail||5.3.0

--- Comment #5 from Richard Biener  ---
Fixed on trunk so far.

[Bug rtl-optimization/69606] [5 Regression] wrong code at -Os and above on x86_64-linux-gnu

2016-02-02 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69606

--- Comment #6 from Richard Biener  ---
Author: rguenth
Date: Tue Feb  2 12:39:36 2016
New Revision: 233069

URL: https://gcc.gnu.org/viewcvs?rev=233069&root=gcc&view=rev
Log:
2016-02-02  Richard Biener  

PR tree-optimization/69606
* tree-ssa-math-opts.c (bswap_replace): Clear flow sensitive
info on the result before moving a stmt.

* gcc.dg/torture/pr69606.c: New testcase.

Added:
trunk/gcc/testsuite/gcc.dg/torture/pr69606.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/testsuite/ChangeLog
trunk/gcc/tree-ssa-math-opts.c

[Bug c/69602] [6 Regression] over-ambitious logical-op warning on EAGAIN vs EWOULDBLOCK

2016-02-02 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69602

--- Comment #8 from Jakub Jelinek  ---
(In reply to Manuel López-Ibáñez from comment #7)
> (In reply to Eric Blake from comment #0)
> > However, as shown by the sample code below, gcc 6.0's new warning is
> > over-ambitious, and is likely to _cause_ rather than cure user bugs, when
> > uninformed users unaware that Linux has the two errno values equal dumb down
> > the code to silence the warning, but in the process break their code on
> > other platforms where it is important to check for both values.
> 
> As a work-around, something like:
> 
> if (0 || errno == EAGAIN || errno == EWOULDBLOCK)
> 
> silences the warning (although it should not).

That is something that should be fixed.
But
  if (errno == EAGAIN || (EWOULDBLOCK != EAGAIN && errno == EWOULDBLOCK))
could be a better workaround.
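
Wrapped in a helper, that check could look like the sketch below.  The
function name is made up, and whether a particular GCC version stays quiet on
this form has not been verified here:

```c
#include <errno.h>
#include <stdbool.h>

/* True if err means "try again later" on this platform. */
bool is_would_block(int err)
{
    /* The second test only contributes where the two values differ;
       where they are equal it folds to a constant false. */
    return err == EAGAIN || (EWOULDBLOCK != EAGAIN && err == EWOULDBLOCK);
}
```

The result is the same as the plain two-way comparison on every platform; only
the spelling differs.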

[Bug rtl-optimization/69307] [4.9/5/6 Regression] wrong code with -O2 -fselective-scheduling @ armv7a

2016-02-02 Thread abel at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69307

--- Comment #6 from Andrey Belevantsev  ---
Created attachment 37551
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=37551&action=edit
proposed patch

Here, before reload, we're trying to rename a hard register.  At the very final
point of choosing the new register we forget to properly check hard_regno_nregs
when checking liveness restrictions (though we did so all the way up to that
point).  Then we incorrectly choose the original register, as it seems to be
good enough.  Fixed by looping over all registers specified in
hard_regno_nregs at that place, too.
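
The kind of check being described can be pictured with a toy model.  This is
not the attached patch: the bitmask representation and the function names are
assumptions made for illustration.  A value that occupies several consecutive
hard registers is only safe in a candidate register if every register it
covers is free:

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model: bit r of live_mask is set when hard register r is live.
   Callers must keep regno + nregs <= 32. */

/* Incomplete variant: looks only at the first register of the group. */
bool first_reg_free(uint32_t live_mask, unsigned regno)
{
    return (live_mask & (UINT32_C(1) << regno)) == 0;
}

/* Complete variant: loops over all nregs registers the value occupies. */
bool all_regs_free(uint32_t live_mask, unsigned regno, unsigned nregs)
{
    for (unsigned r = regno; r < regno + nregs; r++)
        if (live_mask & (UINT32_C(1) << r))
            return false;
    return true;
}
```

With register 5 live, a two-register value starting at register 4 passes the
first check but fails the second.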

[Bug target/69622] compiler reordering of non-temporal (write-combining) stores produces significant performance hit

2016-02-02 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69622

Richard Biener  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2016-02-02
 Ever confirmed|0   |1

--- Comment #1 from Richard Biener  ---
A workaround is -fno-schedule-insns2.  I suppose the compiler is trying to
increase the distance of the loads and stores (in a greedy way) to reduce
the impact on load latency in the general premise of moving loads up and
stores down.

In fact with -fno-schedule-insns2 you can see that we end up with

.L5:
        vmovdqa 32(%rsi), %ymm2
        vmovdqa 64(%rsi), %ymm1
        vmovdqa 96(%rsi), %ymm0
        vmovdqa (%rsi), %ymm3
        vmovntdq        %ymm3, (%rdi)
        vmovntdq        %ymm2, 32(%rdi)
        vmovntdq        %ymm1, 64(%rdi)
        vmovntdq        %ymm0, 96(%rdi)

which is because we do TER the zero-offset load (thus RTL expand it right
before the store).  Possibly scheduling tries to fix that up but does a
miserable job.

[Bug target/67032] [4.9/5/6 Regression] Geode optimizations incorrectly return -NaN

2016-02-02 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67032

Uroš Bizjak  changed:

   What|Removed |Added

   Keywords|ra  |
 Status|NEW |ASSIGNED
   Assignee|unassigned at gcc dot gnu.org  |ubizjak at gmail dot com

--- Comment #13 from Uroš Bizjak  ---
Well... in the end, it was a target problem. For the RA to work as expected, we
have to provide some sensible cost values for MMX and SSE register moves.
The following patch fixes the problem:

--cut here--
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index b500233..121e802 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -595,17 +595,17 @@ struct processor_costs geode_cost = {
   {4, 6, 6},   /* cost of storing fp registers
   in SFmode, DFmode and XFmode */

-  1,   /* cost of moving MMX register */
-  {1, 1},  /* cost of loading MMX registers
+  2,   /* cost of moving MMX register */
+  {2, 2},  /* cost of loading MMX registers
   in SImode and DImode */
-  {1, 1},  /* cost of storing MMX registers
+  {2, 2},  /* cost of storing MMX registers
   in SImode and DImode */
-  1,   /* cost of moving SSE register */
-  {1, 1, 1},   /* cost of loading SSE registers
+  2,   /* cost of moving SSE register */
+  {2, 2, 8},   /* cost of loading SSE registers
   in SImode, DImode and TImode */
-  {1, 1, 1},   /* cost of storing SSE registers
+  {2, 2, 8},   /* cost of storing SSE registers
   in SImode, DImode and TImode */
-  1,   /* MMX or SSE register to integer */
+  6,   /* MMX or SSE register to integer */
   64,  /* size of l1 cache.  */
   128, /* size of l2 cache.  */
   32,  /* size of prefetch block */
--cut here--

[Bug target/69614] [6 Regression] wrong code with -Os -fno-expensive-optimizations -fschedule-insns -mtpcs-leaf-frame -fira-algorithm=priority @ armv7a

2016-02-02 Thread ktkachov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69614

--- Comment #2 from ktkachov at gcc dot gnu.org ---
Bisection showed this started with r228302.
But I'm not sure if that's the cause or just exposes a latent bug.

[Bug target/69619] [6 Regression] compilation doesn't terminate during CCMP expansion

2016-02-02 Thread wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69619

--- Comment #3 from Wilco  ---
A simple workaround is to calculate cost1 early and only try the 2nd option if
the cost is low (ie. it's not a huge expression that may evaluate into lots of
ccmps). A slightly more advanced way would be to walk prep_seq_1 to count the
actual number of ccmp instructions.

[Bug target/69614] [6 Regression] wrong code with -Os -fno-expensive-optimizations -fschedule-insns -mtpcs-leaf-frame -fira-algorithm=priority @ armv7a

2016-02-02 Thread ktkachov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69614

--- Comment #3 from ktkachov at gcc dot gnu.org ---
Note that -mtpcs-leaf-frame was deprecated in GCC 5 due to a number of bugs
with it: https://gcc.gnu.org/gcc-5/changes.html

There are a number of known issues with these options relating to the old ABI:
https://gcc.gnu.org/bugzilla/buglist.cgi?quicksearch=mapcs-frame&list_id=139237

[Bug c/69624] New: sanitize-coverage=trace-pc miscompiles kernel

2016-02-02 Thread jirislaby at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69624

Bug ID: 69624
   Summary: sanitize-coverage=trace-pc miscompiles kernel
   Product: gcc
   Version: 6.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: jirislaby at gmail dot com
  Target Milestone: ---

I have
commit a8175057d14fa8ff8cc4589edf55a6855d9afdf4
Author: Dmitry Vyukov 
Date:   Mon Nov 9 19:59:08 2015 +0100

new coverage that uses shared buffer

applied to kernel 4.4.

I am seeing crashes in netlink_bind at 0xd5dc:
d5bd:   4c 89 e2                mov    %r12,%rdx
d5c0:   e8 00 00 00 00          callq  d5c5
d5c1: R_X86_64_PC32 __sw_hweight32-0x4
d5c5:   03 83 d0 02 00 00       add    0x2d0(%rbx),%eax
d5cb:   48 c1 ea 03             shr    $0x3,%rdx
d5cf:   41 89 c5                mov    %eax,%r13d
d5d2:   48 b8 00 00 00 00 00    movabs $0xdc00,%rax
d5d9:   fc ff df
d5dc:   80 3c 02 00             cmpb   $0x0,(%rdx,%rax,1)

because rdx is 0.

rdx is loaded from r12, then __sw_hweight32 is called and zeroes rdx. The
(%rdx,%rax,1) dereference then becomes a dereference of rax == 0xdc00, which
leads to a crash.

[Bug target/69613] [6 Regression] wrong code with -O and simple 128bit arithmetics and vectors @ aarch64

2016-02-02 Thread ktkachov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69613

ktkachov at gcc dot gnu.org changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2016-02-02
 CC||ktkachov at gcc dot gnu.org
  Known to work||4.9.4, 5.3.1
 Ever confirmed|0   |1

--- Comment #1 from ktkachov at gcc dot gnu.org ---
Confirmed

[Bug c/69624] sanitize-coverage=trace-pc miscompiles kernel

2016-02-02 Thread jirislaby at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69624

--- Comment #1 from Jiri Slaby  ---
Created attachment 37552
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=37552&action=edit
__sw_hweight32 assembly

[Bug c/69624] sanitize-coverage=trace-pc miscompiles kernel

2016-02-02 Thread jirislaby at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69624

--- Comment #2 from Jiri Slaby  ---
Created attachment 37553
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=37553&action=edit
__sanitizer_cov_trace_pc implementation

This guy actually changes rdx.

[Bug c/69624] sanitize-coverage=trace-pc miscompiles kernel

2016-02-02 Thread jirislaby at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69624

--- Comment #3 from Jiri Slaby  ---
Preprocessed code:
http://www.fi.muni.cz/~xslaby/sklad/af_netlink.i

This one results in the code from initial description. I.e. rdx is loaded
before a call.

[Bug c++/69621] extern std::string used as reference template-argument does not have [abi:cxx11] tag applied

2016-02-02 Thread redi at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69621

Jonathan Wakely  changed:

   What|Removed |Added

   Keywords||wrong-code
 Status|UNCONFIRMED |NEW
   Last reconfirmed||2016-02-02
 Ever confirmed|0   |1

[Bug libgomp/69625] New: deadlock in libgomp.c/doacross-1.c test

2016-02-02 Thread vogt at linux dot vnet.ibm.com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69625

Bug ID: 69625
   Summary: deadlock in libgomp.c/doacross-1.c test
   Product: gcc
   Version: 6.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: libgomp
  Assignee: unassigned at gcc dot gnu.org
  Reporter: vogt at linux dot vnet.ibm.com
CC: jakub at gcc dot gnu.org
  Target Milestone: ---
Target: s390x

Created attachment 37554
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=37554&action=edit
.s file of test program

On s390x with -march=z196 -O2/-O3 the test hangs with a deadlock (and also
doacross-[2.3].c and doacross-1.C, but I haven't looked at them yet).  I've
stripped down the test to this:

-- snip --
#include <stdio.h>
#define N 64
int b[N / 16][8][4];

int
main ()
{
  int i, j, k, l;
  (void)l;
  #pragma omp parallel
  {
printf("+++\n");
#pragma omp for schedule(static, 0) ordered (3) nowait
for (i = 2; i < N / 16 - 1; i++)
  for (j = 0; j < 8; j += 2)
for (k = 1; k <= 3; k++)
  {
#pragma omp atomic write
b[i][j][k] = 11;
#pragma omp ordered depend(sink: i, j - 2, k - 1) \
depend(sink: i - 2, j - 2, k + 1)
#pragma omp ordered depend(sink: i - 3, j + 2, k - 2)
if (j >= 2 && k > 1)
  {
#pragma omp atomic read
l = b[i][j - 2][k - 1];
  }
#pragma omp atomic write
b[i][j][k] = 22;
if (i >= 4 && j >= 2 && k < 3)
  {
#pragma omp atomic read
l = b[i - 2][j - 2][k + 1];
  }
#pragma omp ordered depend(source)
#pragma omp atomic write
b[i][j][k] = 33;
  }
printf("---\n");
  }
printf("done\n");
  return 0;
}
-- snip --

(See attachment for full .s file.)
(Running on an LPAR with 17 cores inside gdb.)

The function GOMP_parallel starts threads 2 to 17 which enter and leave the
parallel region (they print both "+++" and "---"), then hang in a
team_barrier_wait_final() call in gomp_thread_start.  Only then thread 1 runs
the thread function.

  gomp_team_start (fn, data, num_threads, flags, gomp_new_team (num_threads));
  fn (data);

Thread 1 comes across

   0x8b7a <+522>:   brasl   %r14,0x87b0


with %r10 == 2 (which presumably contains k), then continues through 

   0x8cf6 <+902>:   brasl   %r14,0x86f0


and finally comes back to

   0x8b7a <+522>:   brasl   %r14,0x87b0


with %r10 == 3.  In GOMP_doacross_wait() it ends up calling doacross_spin() and
never gets out of that again:

   doacross_spin (array, flattened, cur);

   0x03fff7ef5562 <+282>:   lg      %r1,0(%r5)
   0x03fff7ef5568 <+288>:   clgr    %r1,%r2
   0x03fff7ef556c <+292>:   jle     0x3fff7ef5562

The value of r1 (= *r5 (= *array?)) remains 6 (since there's no other thread
left that could modify it) while the value of r2 is 0xfffb4a1.  To me this
looks as if doacross_spin() compares an integer value with an address or
rubbish.

Any ideas what's going on?

[Bug libstdc++/69626] New: [6 Regression] std::strtoll no longer defined in c++98 mode

2016-02-02 Thread redi at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69626

Bug ID: 69626
   Summary: [6 Regression] std::strtoll no longer defined in c++98
mode
   Product: gcc
   Version: 6.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: libstdc++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: redi at gcc dot gnu.org
  Target Milestone: ---

for __cplusplus < 201103L bits/c++config.h does:

# ifndef _GLIBCXX_USE_C99_STDLIB
# define _GLIBCXX_USE_C99_STDLIB _GLIBCXX98_USE_C99_STDLIB
# endif

but acinclude.m4 never defines _GLIBCXX98_USE_C99_STDLIB so the C99 stdlib.h
functions are no longer defined for C++98.

PR 69350 says we shouldn't be defining those non-C++98 functions anyway, but
removing them now was not intentional (and we probably don't want to remove
them for -std=gnu++98 anyway).
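
The mechanism can be reproduced in isolation.  In this sketch the macro names
are shortened stand-ins (hypothetical, not the real libstdc++ macros), and it
assumes the consuming header tests the macro with #if.  An identifier that is
still undefined when #if is evaluated counts as 0, so the feature silently
switches off:

```c
/* Stand-in for the c++config.h fallback: FEATURE_98 plays the role of a
   macro that configure was supposed to define but never does. */
#ifndef USE_FEATURE
# define USE_FEATURE FEATURE_98
#endif

/* USE_FEATURE expands to FEATURE_98, which is not a macro, so the
   preprocessor evaluates it as 0 here. */
#if USE_FEATURE
# define FEATURE_ENABLED 1
#else
# define FEATURE_ENABLED 0
#endif

int feature_enabled(void)
{
    return FEATURE_ENABLED;
}
```

No diagnostic is issued by default, which is consistent with the removal
having gone unnoticed.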



#include <cstdlib>

int main()
{
  &std::strtoll;
}

$ g++ -std=c++98 ll.cc
ll.cc: In function ‘int main()’:
ll.cc:5:4: error: ‘strtoll’ is not a member of ‘std’
   &std::strtoll;
^~~

ll.cc:5:4: note: suggested alternative:
In file included from /home/jwakely/gcc/6/include/c++/6.0.0/cstdlib:75:0,
 from ll.cc:1:
/usr/include/stdlib.h:209:22: note:   ‘strtoll’
 extern long long int strtoll (const char *__restrict __nptr,
  ^~~

[Bug c++/69627] New: [6 Regression] Conditional jump or move depends on uninitialised value(s) in (anonymous namespace)::layout::get_state_at_point

2016-02-02 Thread marxin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69627

Bug ID: 69627
   Summary: [6 Regression] Conditional jump or move depends on
uninitialised value(s) in (anonymous
namespace)::layout::get_state_at_point
   Product: gcc
   Version: 6.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: marxin at gcc dot gnu.org
  Target Milestone: ---

Hello.

$ cat tc2.c
typedef float __m128;
void test_1 () {
__m128 myvec[2];
int const *ptr;
myvec[1]/ptr;
}

$ valgrind --leak-check=yes --trace-children=yes ./gcc/xgcc -B gcc tc2.c

produces:
==14470== Conditional jump or move depends on uninitialised value(s)
==14470==at 0x1901E24: (anonymous
namespace)::layout::get_state_at_point(int, int, int, int, (anonymous
namespace)::point_state*) (diagnostic-show-locus.c:725)
==14470==by 0x1901861: (anonymous
namespace)::layout::print_source_line(int, (anonymous namespace)::line_bounds*)
(diagnostic-show-locus.c:561)
==14470==by 0x190210A: diagnostic_show_locus(diagnostic_context*,
diagnostic_info const*) (diagnostic-show-locus.c:835)
==14470==by 0x8B50E6: c_diagnostic_finalizer(diagnostic_context*,
diagnostic_info*) (c-opts.c:167)
==14470==by 0x18FEBCC: diagnostic_report_diagnostic(diagnostic_context*,
diagnostic_info*) (diagnostic.c:800)
==14470==by 0x18FFF96: error_at_rich_loc(rich_location*, char const*, ...)
(diagnostic.c:1173)
==14470==by 0x85E415: binary_op_error(rich_location*, tree_code,
tree_node*, tree_node*) (c-common.c:3865)
==14470==by 0x7FF1E0: build_binary_op(unsigned int, tree_code, tree_node*,
tree_node*, int) (c-typeck.c:11577)
==14470==by 0x7DF958: parser_build_binary_op(unsigned int, tree_code,
c_expr, c_expr) (c-typeck.c:3515)
==14470==by 0x81D970: c_parser_binary_expression(c_parser*, c_expr*,
tree_node*) (c-parser.c:6636)
==14470==by 0x81C27B: c_parser_conditional_expression(c_parser*, c_expr*,
tree_node*) (c-parser.c:6279)
==14470==by 0x81BF8B: c_parser_expr_no_commas(c_parser*, c_expr*,
tree_node*) (c-parser.c:6196)
==14470== 
==14470== Conditional jump or move depends on uninitialised value(s)
==14470==at 0x1901E24: (anonymous
namespace)::layout::get_state_at_point(int, int, int, int, (anonymous
namespace)::point_state*) (diagnostic-show-locus.c:725)
==14470==by 0x190199F: (anonymous
namespace)::layout::print_annotation_line(int, (anonymous
namespace)::line_bounds) (diagnostic-show-locus.c:603)
==14470==by 0x1902129: diagnostic_show_locus(diagnostic_context*,
diagnostic_info const*) (diagnostic-show-locus.c:837)
==14470==by 0x8B50E6: c_diagnostic_finalizer(diagnostic_context*,
diagnostic_info*) (c-opts.c:167)
==14470==by 0x18FEBCC: diagnostic_report_diagnostic(diagnostic_context*,
diagnostic_info*) (diagnostic.c:800)
==14470==by 0x18FFF96: error_at_rich_loc(rich_location*, char const*, ...)
(diagnostic.c:1173)
==14470==by 0x85E415: binary_op_error(rich_location*, tree_code,
tree_node*, tree_node*) (c-common.c:3865)
==14470==by 0x7FF1E0: build_binary_op(unsigned int, tree_code, tree_node*,
tree_node*, int) (c-typeck.c:11577)
==14470==by 0x7DF958: parser_build_binary_op(unsigned int, tree_code,
c_expr, c_expr) (c-typeck.c:3515)
==14470==by 0x81D970: c_parser_binary_expression(c_parser*, c_expr*,
tree_node*) (c-parser.c:6636)
==14470==by 0x81C27B: c_parser_conditional_expression(c_parser*, c_expr*,
tree_node*) (c-parser.c:6279)
==14470==by 0x81BF8B: c_parser_expr_no_commas(c_parser*, c_expr*,
tree_node*) (c-parser.c:6196)

Thanks,
Martin

[Bug c++/69628] New: [6 Regression] Conditional jump or move depends on uninitialised value(s)

2016-02-02 Thread marxin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69628

Bug ID: 69628
   Summary: [6 Regression] Conditional jump or move depends on
uninitialised value(s)
   Product: gcc
   Version: 6.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: marxin at gcc dot gnu.org
  Target Milestone: ---

Hello.

Running:
$ echo "0'';" | valgrind --leak-check=yes --trace-children=yes ./gcc/xg++ -Bgcc
-std=c++14 -xc++ -

produces:
==14976== Conditional jump or move depends on uninitialised value(s)
==14976==at 0xB1A59D: lex_charconst(cpp_token const*) (c-lex.c:1252)
==14976==by 0xB18692: c_lex_with_flags(tree_node**, unsigned int*, unsigned
char*, int) (c-lex.c:550)
==14976==by 0x92889A: cp_lexer_get_preprocessor_token(cp_lexer*, cp_token*)
(parser.c:792)
==14976==by 0x928545: cp_lexer_new_main() (parser.c:656)
==14976==by 0x92BD1A: cp_parser_new() (parser.c:3687)
==14976==by 0x97B2C0: c_parse_file() (parser.c:37354)
==14976==by 0xB23FDA: c_common_parse_file() (c-opts.c:1064)
==14976==by 0x11225EE: compile_file() (toplev.c:465)
==14976==by 0x1124B96: do_compile() (toplev.c:1988)
==14976==by 0x1124E21: toplev::main(int, char**) (toplev.c:2096)
==14976==by 0x1B4DE9F: main (main.c:39)

Thanks,
Martin

[Bug target/69619] [6 Regression] compilation doesn't terminate during CCMP expansion

2016-02-02 Thread ktkachov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69619

ktkachov at gcc dot gnu.org changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2016-02-02
 Ever confirmed|0   |1

--- Comment #4 from ktkachov at gcc dot gnu.org ---
Confirmed then

[Bug c/69624] sanitize-coverage=trace-pc miscompiles kernel

2016-02-02 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69624

Jakub Jelinek  changed:

   What|Removed |Added

 Status|UNCONFIRMED |WAITING
   Last reconfirmed||2016-02-02
 Ever confirmed|0   |1

--- Comment #4 from Jakub Jelinek  ---
What gcc options are you using on the preprocessed source to trigger this?

[Bug c/69624] sanitize-coverage=trace-pc miscompiles kernel

2016-02-02 Thread jirislaby at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69624

--- Comment #5 from Jiri Slaby  ---
(In reply to Jakub Jelinek from comment #4)
> What gcc options are you using on the preprocessed source to trigger this?

By default this:
gcc-6 -nostdinc -fno-strict-aliasing -fno-common -std=gnu89 -mno-sse -mno-mmx
-mno-sse2 -mno-3dnow -mno-avx -m64 -falign-jumps=1 -falign-loops=1 -mno-80387
-mno-fp-ret-in-387 -mpreferred-stack-boundary=3 -mskip-rax-setup -mtune=generic
-mno-red-zone -mcmodel=kernel -funit-at-a-time -maccumulate-outgoing-args -pipe
-fno-delete-null-pointer-checks -O2 --param=allow-store-data-races=0 
-fstack-protector -Wno-unused-but-set-variable -fno-omit-frame-pointer
-fno-optimize-sibling-calls -fno-var-tracking-assignments
-fasynchronous-unwind-tables -pg -mfentry  -fno-inline-functions-called-once 
-fno-strict-overflow -fconserve-stack  -fsanitize=kernel-address
-fasan-shadow-offset=0xdc00 --param asan-stack=1 --param
asan-globals=1 --param asan-instrumentation-with-call-threshold=1
-fsanitize-coverage=trace-pc   -S -o - af_netlink.i

And this simplified one produces the same around the call:
gcc-6 -nostdinc -O2 -std=gnu89   -fsanitize=kernel-address
-fasan-shadow-offset=0xdc00 --param asan-stack=1 --param
asan-globals=1 --param asan-instrumentation-with-call-threshold=1
-fsanitize-coverage=trace-pc   -S -o - af_netlink.i

[Bug c/69624] sanitize-coverage=trace-pc miscompiles kernel

2016-02-02 Thread dvyukov at google dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69624

--- Comment #6 from Dmitry Vyukov  ---
Also what gcc version?

I've tried:
gcc version 6.0.0 20160105 (experimental) (GCC) 
$ gcc /tmp/af_netlink.c -c -O2 -fsanitize-coverage=trace-pc
-fsanitize=kernel-address --param asan-stack=1 --param asan-globals=1 --param
asan-instrumentation-with-call-threshold=1
-fasan-shadow-offset=0xdc00

(which should resemble what kernel uses), but I don't see any similar code
fragments.

[Bug c/69624] sanitize-coverage=trace-pc miscompiles kernel

2016-02-02 Thread jirislaby at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69624

--- Comment #7 from Jiri Slaby  ---
(In reply to Dmitry Vyukov from comment #6)
> Also what gcc version?

$ gcc-6 --version
gcc-6 (SUSE Linux) 6.0.0 20160121 (experimental) [trunk revision 232670]

> I've tried:
> gcc version 6.0.0 20160105 (experimental) (GCC) 
> $ gcc /tmp/af_netlink.c -c -O2 -fsanitize-coverage=trace-pc
> -fsanitize=kernel-address --param asan-stack=1 --param asan-globals=1
> --param asan-instrumentation-with-call-threshold=1
> -fasan-shadow-offset=0xdc00

With this I see that too:
        movq    %r12, %rdx
#APP
# 28 "../arch/x86/include/asm/arch_hweight.h" 1
661:
        call __sw_hweight32
662:
#...more ALTINST crap
#NO_APP
        addl    720(%rbx), %eax
        shrq    $3, %rdx
        movl    %eax, %r13d
        movabsq $-2305847407260205056, %rax
        cmpb    $0, (%rdx,%rax)

[Bug c/69624] sanitize-coverage=trace-pc miscompiles kernel

2016-02-02 Thread dvyukov at google dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69624

--- Comment #8 from Dmitry Vyukov  ---
First of all, are you sure that r12 is not 0 before the call?

Dereference of 0xdc00 is how KASAN reacts to a NULL deref: it does a
shadow check before memory accesses. If the original address is NULL, the
shadow check will go to 0xdc00. I see such GPFs quite frequently,
so that's what I would assume first.
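
That mapping can be written down directly.  A sketch, assuming the usual scale
of 8 bytes of memory per shadow byte (matching the shr $0x3 in the disassembly
from the initial report) and taking the shadow offset as a parameter; the
function name is illustrative:

```c
#include <stdint.h>

/* ASan-style mapping: one shadow byte describes 8 bytes of memory. */
uint64_t shadow_address(uint64_t addr, uint64_t shadow_offset)
{
    return (addr >> 3) + shadow_offset;
}
```

For a NULL pointer the shifted address is 0, so the check reads from the
shadow offset itself, which is the (%rdx,%rax,1) access with rdx == 0
described in the report.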

If you just switched to gcc6, then it can be some latent bug (undefined
behavior), which started to fire with a new compiler.

p.s. I can reproduce the generated code now.

[Bug c/69624] sanitize-coverage=trace-pc miscompiles kernel

2016-02-02 Thread jirislaby at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69624

--- Comment #9 from Jiri Slaby  ---
(In reply to Dmitry Vyukov from comment #8)
> First of all, are you sure that r12 is not 0 before the call?

Yes.

> Deference of 0xdc00 is how KASAN reacts on NULL deref, it does
> shadow check before the memory accesses. If original address is NULL, the
> shadow check will go to 0xdc00. I see such GPFs quite
> frequently, so that's what I would assume first.

I know, I thought so first too. But later, I debugged that to a gcc bug :).

> If you just switched to gcc6, then it can be some latent bug (undefined
> behavior), which started to fire with a new compiler.

W/ CONFIG_KCOV=n (i.e. no -fsanitize-coverage), it works, apparently.

[Bug c++/69628] [6 Regression] Conditional jump or move depends on uninitialised value(s) in lex_charconst(cpp_token const*) (c-lex.c:1252)

2016-02-02 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69628

Richard Biener  changed:

   What|Removed |Added

   Target Milestone|--- |6.0

[Bug c++/69627] [6 Regression] Conditional jump or move depends on uninitialised value(s) in (anonymous namespace)::layout::get_state_at_point

2016-02-02 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69627

Richard Biener  changed:

   What|Removed |Added

   Target Milestone|--- |6.0

[Bug libstdc++/69626] [6 Regression] std::strtoll etc. no longer defined in c++98 mode

2016-02-02 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69626

Richard Biener  changed:

   What|Removed |Added

   Target Milestone|--- |6.0

[Bug c/69624] sanitize-coverage=trace-pc miscompiles kernel

2016-02-02 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69624

--- Comment #10 from Jakub Jelinek  ---
If you are calling a function (__sw_hweight32) without letting gcc know you do
that, are you sure that function call does not modify any registers other than
"flags" and "rax"?

[Bug target/69577] [5/6 Regression] wrong code with -fno-forward-propagate -mavx and 128bit arithmetics since r215450

2016-02-02 Thread rsandifo at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69577

rsandifo at gcc dot gnu.org  changed:

   What|Removed |Added

URL||https://gcc.gnu.org/ml/gcc-patches/2016-02/msg00127.html

--- Comment #9 from rsandifo at gcc dot gnu.org ---
Patch posted as: https://gcc.gnu.org/ml/gcc-patches/2016-02/msg00127.html
Would definitely appreciate wider testing, e.g. for SPEC performance.
(Assuming the patch is OK in principle of course.)

[Bug c/69624] sanitize-coverage=trace-pc miscompiles kernel

2016-02-02 Thread jirislaby at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69624

--- Comment #11 from Jiri Slaby  ---
(In reply to Jakub Jelinek from comment #10)
> If you are calling a function (__sw_hweight32) without letting gcc know you
> do that, are you sure that function call does not modify any registers other
> than "flags" and "rax"?

Not at all, I suppose. See attachment #37552. __sw_hweight32 changes only
retval (rax) and parameter (rdi).

But __sw_hweight32 proper is also instrumented by the coverage hook, and that
can change whatever...

In any case, what is the proper way to annotate an asm() statement when there is
a call inside?
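
A minimal sketch of one way to describe such a call to the compiler, assuming
x86-64 Linux and GNU extended asm (GCC or Clang): every register the callee may
change is listed as an output or a clobber, and the red zone is stepped over
before the call. demo_sw_hweight32 and hweight32_via_asm are hypothetical names,
not the kernel's code; the kernel builds hweight.c with -fcall-saved-* and can
therefore use a much shorter clobber list.

```cpp
// Hypothetical software popcount standing in for the kernel's __sw_hweight32.
// extern "C" keeps the symbol name unmangled so the asm below can call it;
// "used" keeps it from being discarded, since the compiler cannot see the call.
extern "C" __attribute__((used, noinline))
unsigned int demo_sw_hweight32(unsigned int w)
{
    unsigned int n = 0;
    while (w) {
        w &= w - 1;   // clear the lowest set bit
        ++n;
    }
    return n;
}

unsigned int hweight32_via_asm(unsigned int w)
{
#if defined(__x86_64__) && defined(__linux__)
    unsigned int res;
    __asm__ volatile(
        "lea -128(%%rsp), %%rsp\n\t"   // step over the red zone the caller may be using
        "call demo_sw_hweight32\n\t"
        "lea 128(%%rsp), %%rsp"
        : "=a"(res),                   // return value arrives in rax
          "+D"(w)                      // argument in rdi, which the callee may overwrite
        :
        // Remaining call-clobbered integer registers of the SysV ABI.
        // Vector registers are left out on the assumption that this
        // integer-only callee never touches them.
        : "rcx", "rdx", "rsi", "r8", "r9", "r10", "r11", "memory", "cc");
    return res;
#else
    // Other targets: fall back to an ordinary call.
    return demo_sw_hweight32(w);
#endif
}
```

The clobber list only covers what the named callee does itself. If the callee
is instrumented and calls something else, as with the coverage hook here, the
list has to cover that code too.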

[Bug c/69624] sanitize-coverage=trace-pc miscompiles kernel

2016-02-02 Thread jirislaby at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69624

--- Comment #12 from Jiri Slaby  ---
(In reply to Jiri Slaby from comment #11)
> __sw_hweight32 changes only retval (rax) and parameter (rdi).

... and rdi is stored to and restored from stack.

[Bug c/69624] sanitize-coverage=trace-pc miscompiles kernel

2016-02-02 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69624

--- Comment #13 from Jakub Jelinek  ---
Seems hweight.c is compiled with
-fcall-saved-rdi -fcall-saved-rsi -fcall-saved-rdx -fcall-saved-rcx
-fcall-saved-r8 -fcall-saved-r9 -fcall-saved-r10 -fcall-saved-r11
but that of course expects that all the functions in there are leaf functions.
If they aren't, you'd also need to compile all the functions that might be
called from there with the same flags.
So I bet you want to arrange for hweight.c to be compiled without
-fsanitize-coverage=trace-pc and without other options that might introduce
calls.

[Bug rtl-optimization/69570] [6 Regression] if-conversion bug on i?86

2016-02-02 Thread bernds at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69570

Bernd Schmidt  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |INVALID

--- Comment #9 from Bernd Schmidt  ---
Ah, of course.

 804856f:   df ec   fucomip %st(4),%st

pc   0x804856f  0x804856f
st0  0.5019607843137254902230771913540508 (raw 0x3ffe8080808080808081)
st1  1 (raw 0x3fff8000)
st2  0.2509803921568627451115385956770254 (raw 0x3ffd8080808080808081)
st3  0 (raw 0x)
st4  0.5019607843137254832299731788225472 (raw 0x3ffe8080808080808000)

An equality comparison of floating point numbers, on x87. One of those was just
loaded from a stack slot, the other was kept in a register the whole time.

This code needs -ffloat-store, or -mpc64, when compiled for 32-bit x86.

[Bug c/69624] sanitize-coverage=trace-pc miscompiles kernel

2016-02-02 Thread dvyukov at google dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69624

--- Comment #14 from Dmitry Vyukov  ---
Wait, I already disabled instrumentation of hweight.c because of this:

+# Kernel does not boot if we instrument this file as it uses custom calling
+# convention (see CONFIG_ARCH_HWEIGHT_CFLAGS).
+KCOV_INSTRUMENT_hweight.o := n

If you apply the latest kcov patch "[PATCH v6] kernel: add kcov code coverage",
it should work.

[Bug rtl-optimization/69570] [6 Regression] if-conversion bug on i?86

2016-02-02 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69570

--- Comment #10 from Jakub Jelinek  ---
(In reply to Bernd Schmidt from comment #9)
> Ah, of course.
> 
>  804856f: df ec   fucomip %st(4),%st
> 
> pc   0x804856f  0x804856f
> st0  0.5019607843137254902230771913540508 (raw 0x3ffe8080808080808081)
> st1  1 (raw 0x3fff8000)
> st2  0.2509803921568627451115385956770254 (raw 0x3ffd8080808080808081)
> st3  0 (raw 0x)
> st4  0.5019607843137254832299731788225472 (raw 0x3ffe8080808080808000)
> 
> An equality comparison of floating point numbers, on x87. One of those was
> just loaded from a stack slot, the other was kept in a register the whole
> time.
> 
> This code needs -ffloat-store, or -mpc64, when compiled for 32-bit x86.

Or -fexcess-precision=standard.

[Bug c/69624] sanitize-coverage=trace-pc miscompiles kernel

2016-02-02 Thread jirislaby at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69624

--- Comment #15 from Jiri Slaby  ---
(In reply to Dmitry Vyukov from comment #14)
> If you apply the latest kcov patch "[PATCH v6] kernel: add kcov code
> coverage", it should work.

Could you please push that to the syzkaller tree [1] then?

[1] https://github.com/dvyukov/linux/commits/kcov

[Bug c/69624] sanitize-coverage=trace-pc miscompiles kernel

2016-02-02 Thread dvyukov at google dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69624

--- Comment #16 from Dmitry Vyukov  ---
> Could you please push that to the syzkaller tree [1] then?

Sorry, the syzkaller page referred to an outdated patch. I was hoping that Andrew
would take it soon, so that I could update the link to a more respected location.
Updated now:
https://github.com/dvyukov/linux/commit/33787098ffaaa83b8a7ccf519913ac5fd6125931

If you have any other issues, feel free to contact
syzkal...@googlegroups.com.

Re the original issue.

We could call a special __sanitizer_cov_trace_pc_special callback which would
save/restore all registers when a file is compiled with -fcall-saved*. But it
is probably not worth it while we have a single case.

We could also add a finer-grained function attribute which disables kcov
instrumentation of a single function. But the same reasoning applies.
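
For illustration only, a sketch of what a per-function opt-out can look like.
It assumes a compiler that offers the no_sanitize_coverage function attribute
(to my knowledge GCC 12 and later and recent Clang; it did not exist when this
thread was written) and falls back to no attribute otherwise. The buffer,
DEMO_NO_COVERAGE, recorded_pc_count and sw_hweight32_demo are hypothetical
names. Only __sanitizer_cov_trace_pc is the callback name that
-fsanitize-coverage=trace-pc expects.

```cpp
#include <cstddef>
#include <cstdint>

// Use the per-function opt-out where the compiler provides it.
#if defined(__has_attribute)
#  if __has_attribute(no_sanitize_coverage)
#    define DEMO_NO_COVERAGE __attribute__((no_sanitize_coverage))
#  endif
#endif
#ifndef DEMO_NO_COVERAGE
#  define DEMO_NO_COVERAGE
#endif

constexpr std::size_t kMaxPcs = 1024;
static std::uintptr_t g_pcs[kMaxPcs];
static std::size_t g_num_pcs = 0;

// Callback the compiler calls when building with -fsanitize-coverage=trace-pc.
// It must not be instrumented itself, or it would recurse.
extern "C" DEMO_NO_COVERAGE void __sanitizer_cov_trace_pc(void)
{
    if (g_num_pcs < kMaxPcs)
        g_pcs[g_num_pcs++] =
            reinterpret_cast<std::uintptr_t>(__builtin_return_address(0));
}

DEMO_NO_COVERAGE std::size_t recorded_pc_count()
{
    return g_num_pcs;
}

// A function that must stay free of inserted calls, e.g. because its callers
// assume it preserves registers that an ordinary call would clobber.
DEMO_NO_COVERAGE unsigned int sw_hweight32_demo(unsigned int w)
{
    unsigned int n = 0;
    for (; w != 0; w &= w - 1)   // clear the lowest set bit each round
        ++n;
    return n;
}
```

Without such an attribute, the per-file switch quoted earlier in this bug
(KCOV_INSTRUMENT_hweight.o := n) has the same effect at file granularity.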

[Bug c/69624] sanitize-coverage=trace-pc miscompiles kernel

2016-02-02 Thread dvyukov at google dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69624

--- Comment #17 from Dmitry Vyukov  ---
Jakub, I guess you can close this.
Sorry again.

[Bug tree-optimization/69595] [6 Regression] Bogus -Warray-bound warning due to missed optimization

2016-02-02 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69595

--- Comment #3 from Richard Biener  ---
Author: rguenth
Date: Tue Feb  2 15:19:32 2016
New Revision: 233076

URL: https://gcc.gnu.org/viewcvs?rev=233076&root=gcc&view=rev
Log:
2016-02-02  Richard Biener  

PR tree-optimization/69595
* match.pd: Add range test simplifications to true/false.

* gcc.dg/Warray-bounds-17.c: New testcase.

Added:
trunk/gcc/testsuite/gcc.dg/Warray-bounds-17.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/match.pd
trunk/gcc/testsuite/ChangeLog

[Bug target/69613] [6 Regression] wrong code with -O and simple 128bit arithmetics and vectors @ aarch64

2016-02-02 Thread ktkachov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69613

--- Comment #2 from ktkachov at gcc dot gnu.org ---
Bisection shows this started with r226901, the big copyrename dropping patch.
I didn't investigate whether it's actually the cause of the bug or just exposes
another latent one.

[Bug tree-optimization/69599] [6 Regression] libgomp.c fipa-pta tests compiled with -flto -flto-partition=max fail in execution

2016-02-02 Thread vries at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69599

vries at gcc dot gnu.org changed:

   What|Removed |Added

   Target Milestone|--- |6.0
Summary|libgomp.c fipa-pta tests compiled with -flto -flto-partition=max fail in execution |[6 Regression] libgomp.c fipa-pta tests compiled with -flto -flto-partition=max fail in execution

--- Comment #2 from vries at gcc dot gnu.org ---
omp-nested-2.c is omp-nested-1.c with -fipa-pta.

omp-nested-1.c with -fipa-pta -flto -flto-partition=max passes for
gcc-5-branch.

So, marking as 6 regression.

[Bug tree-optimization/69595] [6 Regression] Bogus -Warray-bound warning due to missed optimization

2016-02-02 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69595

Richard Biener  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #4 from Richard Biener  ---
Fixed.

[Bug rtl-optimization/69570] [6 Regression] if-conversion bug on i?86

2016-02-02 Thread tom at compton dot nu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69570

--- Comment #11 from Tom Hughes  ---
This is C++ so -fexcess-precision=standard is no help as that is C only.

Likewise -ffloat-store is, as I understand it, not much help in real world code
because you need to make sure that you force stores in order to trigger it?

Using -mpc64 seems very scary as I believe it alters the global state of the
program.

So short of -mfpmath=sse I suspect the only solution is to replace the equality
with a comparison that allows some variation.

I'll just slink off and go back to hating x87 FP math I think...
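
A minimal sketch of such a comparison, assuming a relative tolerance suits the
values involved. nearly_equal and its default tolerances are hypothetical
choices, not taken from the code under discussion.

```cpp
#include <algorithm>
#include <cmath>

// True when a and b differ by no more than the larger of a relative and an
// absolute tolerance. This stays stable when one operand was rounded to
// double by a store and the other still carries x87 extended precision.
bool nearly_equal(double a, double b,
                  double rel_tol = 1e-9, double abs_tol = 0.0)
{
    const double diff  = std::fabs(a - b);
    const double scale = std::max(std::fabs(a), std::fabs(b));
    return diff <= std::max(rel_tol * scale, abs_tol);
}
```

The two st0/st4 values in comment #9 differ only around the 17th significant
digit, so a test like nearly_equal(x, y) treats them as equal where x == y does
not.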

[Bug lto/69630] New: [6 Regression] LTO ICE in types_same_for_odr at ipa-devirt.c:402

2016-02-02 Thread burnus at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69630

Bug ID: 69630
   Summary: [6 Regression] LTO ICE in types_same_for_odr at
ipa-devirt.c:402
   Product: gcc
   Version: 6.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: lto
  Assignee: unassigned at gcc dot gnu.org
  Reporter: burnus at gcc dot gnu.org
  Target Milestone: ---

Compiling the attached C++ (.ii) file fails with:

$ g++ -r -nostdlib -O2 -flto -Wsuggest-final-methods test.ii

lto1: internal compiler error: Segmentation fault
0xa78a4f crash_signal
../../gcc/toplev.c:335
0x87b824 types_same_for_odr(tree_node const*, tree_node const*, bool)
../../gcc/ipa-devirt.c:402
0x885bc5 possible_polymorphic_call_targets(tree_node*, long,
ipa_polymorphic_call_context, bool*, void**, bool)
../../gcc/ipa-devirt.c:3200
0x887427 possible_polymorphic_call_targets(cgraph_edge*, bool*, void**, bool)
../../gcc/ipa-utils.h:114
0x887427 ipa_devirt
../../gcc/ipa-devirt.c:3575

[Bug lto/69630] [6 Regression] LTO ICE in types_same_for_odr at ipa-devirt.c:402

2016-02-02 Thread burnus at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69630

--- Comment #1 from Tobias Burnus  ---
Created attachment 37557
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=37557&action=edit
test.ii  test case

[Bug libstdc++/69626] [6 Regression] std::strtoll etc. no longer defined in c++98 mode

2016-02-02 Thread redi at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69626

Jonathan Wakely  changed:

   What|Removed |Added

 Status|UNCONFIRMED |ASSIGNED
   Last reconfirmed||2016-02-02
   Assignee|unassigned at gcc dot gnu.org  |redi at gcc dot gnu.org
 Ever confirmed|0   |1

[Bug target/67032] [4.9/5/6 Regression] Geode optimizations incorrectly return -NaN

2016-02-02 Thread uros at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67032

--- Comment #14 from uros at gcc dot gnu.org ---
Author: uros
Date: Tue Feb  2 16:07:24 2016
New Revision: 233079

URL: https://gcc.gnu.org/viewcvs?rev=233079&root=gcc&view=rev
Log:
PR target/67032
* config/i386/i386.c (geode_cost): Increase cost of MMX and SSE moves.


Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/i386/i386.c

[Bug target/67032] [4.9/5/6 Regression] Geode optimizations incorrectly return -NaN

2016-02-02 Thread uros at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67032

--- Comment #15 from uros at gcc dot gnu.org ---
Author: uros
Date: Tue Feb  2 16:08:56 2016
New Revision: 233080

URL: https://gcc.gnu.org/viewcvs?rev=233080&root=gcc&view=rev
Log:
PR target/67032
* config/i386/i386.c (geode_cost): Increase cost of MMX and SSE moves.


Modified:
branches/gcc-5-branch/gcc/ChangeLog
branches/gcc-5-branch/gcc/config/i386/i386.c

[Bug target/67032] [4.9/5/6 Regression] Geode optimizations incorrectly return -NaN

2016-02-02 Thread uros at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67032

--- Comment #16 from uros at gcc dot gnu.org ---
Author: uros
Date: Tue Feb  2 16:10:04 2016
New Revision: 233081

URL: https://gcc.gnu.org/viewcvs?rev=233081&root=gcc&view=rev
Log:
PR target/67032
* config/i386/i386.c (geode_cost): Increase cost of MMX and SSE moves.


Modified:
branches/gcc-4_9-branch/gcc/ChangeLog
branches/gcc-4_9-branch/gcc/config/i386/i386.c

[Bug rtl-optimization/67609] [5 Regression] Generates wrong code for SSE2 _mm_load_pd

2016-02-02 Thread rsandifo at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67609

rsandifo at gcc dot gnu.org  changed:

   What|Removed |Added

 CC||rsandifo at gcc dot gnu.org

--- Comment #43 from rsandifo at gcc dot gnu.org ---
FWIW, the proposed patch for PR69577 fixes this testcase
with the aarch64_cannot_change_mode_class change reverted.
The code quality looks slightly better too.

[Bug target/67032] [4.9/5/6 Regression] Geode optimizations incorrectly return -NaN

2016-02-02 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67032

Uroš Bizjak  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #17 from Uroš Bizjak  ---
Fixed everywhere.

[Bug c++/69631] New: Bogus overflow in constant expression error

2016-02-02 Thread mpolacek at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69631

Bug ID: 69631
   Summary: Bogus overflow in constant expression error
   Product: gcc
   Version: 6.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: mpolacek at gcc dot gnu.org
  Target Milestone: ---

Starting with the C++ delayed folding merge, we reject this test with -fwrapv:

struct C {
  static const unsigned short max = static_cast<unsigned short>((32767 * 2 + 1));
};

q.C:2:80: error: overflow in constant expression [-fpermissive]
   static const unsigned short max = static_cast<unsigned short>((32767 * 2 + 1));
                                                                                ^
q.C:2:80: error: overflow in constant expression [-fpermissive]
q.C:2:80: error: overflow in constant expression [-fpermissive]
q.C:2:80: error: overflow in constant expression [-fpermissive]
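
A short check of why the diagnostic looks bogus, assuming int is at least 32
bits wide and taking unsigned short (the type of max) as the cast's target:
32767 * 2 + 1 is 65535, which fits in int, and converting it to a 16-bit
unsigned short keeps the value. max_value is a hypothetical helper added only
to make the result observable.

```cpp
struct C {
  static const unsigned short max = static_cast<unsigned short>((32767 * 2 + 1));
};

// The int arithmetic does not overflow, so these hold at compile time
// (the second one assumes a 16-bit unsigned short).
static_assert(32767 * 2 + 1 == 65535, "65535 fits in a 32-bit int");
static_assert(C::max == 65535, "the cast to unsigned short keeps the value");

unsigned int max_value()
{
    return C::max;
}
```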

[Bug tree-optimization/67282] Wrong code with -floop-nest-optimize

2016-02-02 Thread mpolacek at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67282

--- Comment #2 from Marek Polacek  ---
I can reproduce this one with:
Using built-in specs.
COLLECT_GCC=./xgcc
Target: x86_64-unknown-linux-gnu
Configured with: /home/marek/src/gcc/configure --enable-languages=c,c++
--enable-checking=yes -with-system-zlib --disable-bootstrap --disable-libvtv
--disable-libcilkrts --disable-libitm --disable-libgomp --disable-libcc1
--disable-libstdcxx-pch --disable-libssp --enable-isl
Thread model: posix
gcc version 5.3.1 20160202 (GCC)

But not anymore with 6.

[Bug lto/69630] [6 Regression] LTO ICE in types_same_for_odr at ipa-devirt.c:402

2016-02-02 Thread burnus at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69630

Tobias Burnus  changed:

   What|Removed |Added

   Target Milestone|--- |6.0

[Bug rtl-optimization/69570] [6 Regression] if-conversion bug on i?86

2016-02-02 Thread bernds at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69570

--- Comment #12 from Bernd Schmidt  ---
Or lose the equality tests on the max values, instead use something like
  if (b > r && b >= g)
I suppose that could still have problems if b and g are equal and one of them
is spilled. Someone who knows that code would have to say whether that's a
problem or not.
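
A sketch of that idea, assuming the code in question picks the largest of three
colour components. dominant_channel and the Channel enum are hypothetical. The
inequalities are arranged so that exactly one branch is taken for any input,
with ties resolved in a fixed order.

```cpp
enum Channel { Red, Green, Blue };

// Select the largest component using only ordering comparisons, never ==.
// Ties: blue wins over green, red wins over both.
Channel dominant_channel(double r, double g, double b)
{
    if (b > r && b >= g)
        return Blue;
    if (g > r)
        return Green;   // here b <= r or b < g, so g is the maximum
    return Red;         // g <= r, and b cannot exceed r on this path
}
```

As noted above, this does not remove the x87 hazard. If two components are
mathematically equal and only one of them was spilled, the tie may resolve
differently than expected.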

[Bug c++/69632] New: No error issued for declaring a parameter having a late-specified return type without the 'auto' type specifier

2016-02-02 Thread ppalka at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69632

Bug ID: 69632
   Summary: No error issued for declaring a parameter having a
late-specified return type without the 'auto' type
specifier
   Product: gcc
   Version: 6.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: ppalka at gcc dot gnu.org
  Target Milestone: ---

g++ does not issue an error for the following invalid declaration:

int foo (long (int) -> char);

This does not appear to be a regression. All versions of g++ (since the
introduction of C++11 support) accept this code.

[Bug c++/69277] [6 Regression] ICE mangling a flexible array member

2016-02-02 Thread msebor at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69277

--- Comment #6 from Martin Sebor  ---
In response to another patch for a related problem Jason asked me to change the
representation of flexible array members in C++.  The alternate representation
has an impact on how this bug is dealt with so it's on hold pending Jason's
decision.  The patch with the alternate representation was posted last week and
pinged last night:
  https://gcc.gnu.org/ml/gcc-patches/2016-01/msg01901.html
In his response from this morning Jason requests some additional tweaks to the
patch.  I should have an updated one by tomorrow.

[Bug rtl-optimization/69633] New: [6 Regression] Redundant move is generated after r228097

2016-02-02 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69633

Bug ID: 69633
   Summary: [6 Regression] Redundant move is generated after
r228097
   Product: gcc
   Version: 6.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: rtl-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: ysrumyan at gmail dot com
  Target Milestone: ---

Sorry that we noticed this regression only now rather than in September.
After Makarov's fix for 61578 (and the s390 regression) we noticed that, for the
attached simple test-case extracted from a real benchmark, one more redundant move
instruction is generated (up to the 20160202 compiler build):

before fix (postreload dump)
   86: NOTE_INSN_BASIC_BLOCK 4
   40: dx:QI=[si:SI]
   41: ax:QI=[si:SI+0x1]
   42: {si:SI=si:SI+0x3;clobber flags:CC;}
   43: dx:SI=zero_extend(dx:QI)
   44: ax:SI=zero_extend(ax:QI)
   45: cx:SI=zero_extend([si:SI-0x1])
   46: {di:SI=dx:SI*0x4c8b;clobber flags:CC;}
   47: {bx:SI=ax:SI*0x9646;clobber flags:CC;}
   48: {bx:SI=bx:SI+di:SI;clobber flags:CC;}
   49: {di:SI=cx:SI*0x1d2f;clobber flags:CC;}
   50: NOTE_INSN_DELETED
   51: bx:SI=bx:SI+di:SI+0x8000
   52: {bx:SI=bx:SI>>0x10;clobber flags:CC;}
   53: [bp:SI]=bx:QI
   96: bx:SI=dx:SI
   55: {bx:SI=bx:SI<<0xf;clobber flags:CC;}
   57: {bx:SI=bx:SI-dx:SI;clobber flags:CC;}

after fix
   86: NOTE_INSN_BASIC_BLOCK 4
   40: dx:QI=[si:SI]
   41: ax:QI=[si:SI+0x1]
   42: {si:SI=si:SI+0x3;clobber flags:CC;}
   43: dx:SI=zero_extend(dx:QI)
   44: ax:SI=zero_extend(ax:QI)
   45: cx:SI=zero_extend([si:SI-0x1])
   46: {di:SI=dx:SI*0x4c8b;clobber flags:CC;}
   47: {bx:SI=ax:SI*0x9646;clobber flags:CC;}
   48: {bx:SI=bx:SI+di:SI;clobber flags:CC;}
   49: {di:SI=cx:SI*0x1d2f;clobber flags:CC;}
   50: NOTE_INSN_DELETED
   51: bx:SI=bx:SI+di:SI+0x8000
   52: {bx:SI=bx:SI>>0x10;clobber flags:CC;}
   53: [bp:SI]=bx:QI
   96: bx:SI=dx:SI
   55: {bx:SI=bx:SI<<0xf;clobber flags:CC;}
   98: di:SI=bx:SI   !! redundant move
   57: {di:SI=di:SI-dx:SI;clobber flags:CC;}

As a result, we see a >3% slowdown on Silvermont in PIE & 32-bit mode.
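
For readers of the dumps, a source-level reading of insns 40-53, inferred from
the RTL rather than taken from the attached test-case: three consecutive bytes
are loaded, scaled by 16.16 fixed-point weights that sum to 0x10000, rounded by
adding 0x8000, and shifted back down. weighted_sum_q16 is a hypothetical name.

```cpp
#include <cstdint>

// (c0*0x4c8b + c1*0x9646 + c2*0x1d2f + 0x8000) >> 16, as in insns 46-52.
// The weights add up to 0x10000, so the result stays within 0..255.
std::uint8_t weighted_sum_q16(std::uint8_t c0, std::uint8_t c1, std::uint8_t c2)
{
    const std::int32_t acc = c0 * 0x4c8b + c1 * 0x9646 + c2 * 0x1d2f + 0x8000;
    return static_cast<std::uint8_t>(acc >> 16);
}
```

Insns 96, 55 and 57 then start the next term, (c0 << 15) - c0. That is where
the extra di:SI=bx:SI copy shows up after the fix.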

[Bug c++/69632] No error issued for declaring a parameter having a late-specified return type without the 'auto' type specifier

2016-02-02 Thread redi at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69632

Jonathan Wakely  changed:

   What|Removed |Added

   Keywords||accepts-invalid
 Status|UNCONFIRMED |NEW
   Last reconfirmed||2016-02-02
 Ever confirmed|0   |1

[Bug rtl-optimization/69633] [6 Regression] Redundant move is generated after r228097

2016-02-02 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69633

--- Comment #1 from Yuri Rumyantsev  ---
Created attachment 37559
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=37559&action=edit
test-case to reproduce

It needs to be compiled with -O2 -m32 -pie -fPIE.
I assume that -march=slm is not needed.
