[Bug target/88498] [9 Regression] FAIL: gcc.target/i386/avx512vl-pr79299-1.c (internal compiler error)

2018-12-14 Thread xuepeng.guo at intel dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88498

--- Comment #3 from Terry Guo  ---
I just tried, and both failures are gone with Jakub's patch.

[Bug c++/58372] internal compiler error: ix86_compute_frame_layout

2018-10-30 Thread xuepeng.guo at intel dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58372

--- Comment #21 from xuepeng guo  ---
Thanks for the fix. I am glad to help test it.

[Bug c++/58372] internal compiler error: ix86_compute_frame_layout

2018-10-30 Thread xuepeng.guo at intel dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58372

--- Comment #22 from xuepeng guo  ---
(In reply to Uroš Bizjak from comment #20)
> Created attachment 44928 [details]
> Proposed patch
> 
> It turned out that functions, called directly through emit_library_call (as
> the above testcase, which builds call to _Unwind_SjLj_Register from
> sjlj_emit_function_enter) miss a whole lot of stack realignment setup. There
> is an update to crtl->preferred_stack_boundary, but several updates for
> SUPPORTS_STACK_ALIGNMENT targets are missing, including eventual DRAP setup.
> 
> Attached patch fixes the path through emit_library_call.
> 
> Can someone please test the patch on SJLJ target?

On an x86_64 Linux platform, I simply configured GCC with the command:
../gcc/configure --enable-sjlj-exceptions

Then, with Uroš's patch, the GCC bootstrap has no problem and the case compiles
successfully. I am running the GCC regression tests to make sure that Uroš's
patch introduces no regressions; this will take some time to finish. I am also
attempting a cross-build for i686-w64-mingw32 to verify Comment #14.

If I am missing something, please let me know. Thanks very much.

[Bug c++/58372] internal compiler error: ix86_compute_frame_layout

2018-10-30 Thread xuepeng.guo at intel dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58372

--- Comment #23 from Terry Guo  ---
Hi Uroš:

With your fix, I identified two regressions so far: one is that we should run
the case you provided with c++ standard newer than c++11. The 'noexcept' was
introduced in c++14. Guess we need a directive like "{ ! target c++14_down }".
The other regression is related to -fsanitize=address, as shown below:

./gcc/cc1plus use-after-scope-types-5.ii -quiet -m32 -O0 -fsanitize=address -o use-after-scope-types-5.s
during RTL pass: expand
In file included from
/export/users/xuepengg/58372-ice-stack-alignment/gcc/gcc/testsuite/g++.dg/asan/use-after-scope-types-5.C:4:
/export/users/xuepengg/58372-ice-stack-alignment/gcc/gcc/testsuite/g++.dg/asan/use-after-scope-types.h:
In function ‘void test() [with T = char [1000]]’:
/export/users/xuepengg/58372-ice-stack-alignment/gcc/gcc/testsuite/g++.dg/asan/use-after-scope-types.h:22:51:
internal compiler error: in safe_as_a, at is-a.h:210
   22 | template <class T> __attribute__((noinline)) void test() {
      |                                                   ^~~~
0x11d5b1f rtx_insn* safe_as_a<rtx_insn*, rtx_def>(rtx_def*)
../../gcc/gcc/is-a.h:210
0x11d5b1f NEXT_INSN(rtx_insn const*)
../../gcc/gcc/rtl.h:1461
0x11d5b1f ix86_get_drap_rtx
../../gcc/gcc/config/i386/i386.c:12050
0xa92e12 emit_library_call_value_1(int, rtx_def*, rtx_def*, libcall_type,
machine_mode, int, std::pair<rtx_def*, machine_mode>*)
../../gcc/gcc/calls.c:4757
0xecc975 emit_library_call(rtx_def*, libcall_type, machine_mode, rtx_def*,
machine_mode, rtx_def*, machine_mode, rtx_def*, machine_mode)
../../gcc/gcc/rtl.h:4149
0xecc975 asan_emit_stack_protection(rtx_def*, rtx_def*, unsigned int, long*,
tree_node**, int)
../../gcc/gcc/asan.c:1500
0xaa53ff expand_used_vars
../../gcc/gcc/cfgexpand.c:2273
0xaa6d13 execute
../../gcc/gcc/cfgexpand.c:6268
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.

[Bug c++/58372] internal compiler error: ix86_compute_frame_layout

2018-10-30 Thread xuepeng.guo at intel dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58372

--- Comment #24 from Terry Guo  ---
Created attachment 44934
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=44934&action=edit
case to reproduce problem related to sanitize

[Bug c++/58372] internal compiler error: ix86_compute_frame_layout

2018-10-31 Thread xuepeng.guo at intel dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58372

--- Comment #25 from Terry Guo  ---
I debugged the ICE further and found that the line below, in function
ix86_get_drap_rtx, is causing the ICE:

12050 insn = emit_insn_before (seq, NEXT_INSN (entry_of_function ()));

It is called when generating the call to __asan_stack_free_5 via
emit_library_call_value_1. At that point entry_of_function () returns something
invalid.

[Bug c++/58372] internal compiler error: ix86_compute_frame_layout

2018-10-31 Thread xuepeng.guo at intel dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58372

--- Comment #26 from Terry Guo  ---
Hi Uroš:

I think I found why your proposed patch causes the problem in Comment 23. It is
all about timing. The code below, from the patch, tries to set up the DRAP
register at a rather early stage, when the function is not yet fully expanded to
RTL.

+  if (crtl->drap_reg == NULL_RTX)
+   {
+ rtx drap_rtx = targetm.calls.get_drap_rtx ();

targetm.calls.get_drap_rtx () is hooked to ix86_get_drap_rtx (), where we have
the code:

12046 drap_vreg = copy_to_reg (arg_ptr);
(gdb) 
12047 seq = get_insns ();
(gdb) 
12048 end_sequence ();
(gdb) 
12050 insn = emit_insn_before (seq, NEXT_INSN (entry_of_function ()));

At this stage, what entry_of_function () returns is actually the GIMPLE form of
the function, not the RTL form we are expecting, so NEXT_INSN
(something_in_gimple) goes wrong.

[Bug c++/58372] internal compiler error: ix86_compute_frame_layout

2018-10-31 Thread xuepeng.guo at intel dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58372

--- Comment #31 from Terry Guo  ---
(In reply to Uroš Bizjak from comment #30)
> (In reply to Jakub Jelinek from comment #29)
> > > Let's ask Jakub about asan, if it is possible to move generation of the
> > > call after the function is already expanded to RTL.
> > 
> > I'm afraid no.
> 
> Hm...
> 
> ... maybe we could go with following patch:
> 
> +  if (SUPPORTS_STACK_ALIGNMENT)
> +{
> +  if (preferred_stack_boundary > crtl->stack_alignment_estimated)
> + crtl->stack_alignment_estimated = preferred_stack_boundary;
> +  if (preferred_stack_boundary > crtl->stack_alignment_needed)
> + crtl->stack_alignment_needed = preferred_stack_boundary;
> +}
> 
> This means that for functions, emitted through emit_library_call, stack
> won't be realigned. This would cure the assert (and would follow a bit more
> expand_stack_alignment from cfgrtl.c).

I had the same thought. I will test this one.
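
To make the intent of the quoted hunk concrete, here is a toy model in plain C.
It is only a sketch: the struct and function names (demo_crtl,
demo_note_call_boundary) are hypothetical, only the three field names come from
the discussion above, and the SUPPORTS_STACK_ALIGNMENT guard is left out. It
shows that each alignment field is only ever raised, so the estimated and needed
alignments never lag behind the preferred boundary.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the alignment fields of GCC's crtl.
   Only the three fields discussed in this thread are modelled.  */
struct demo_crtl
{
  unsigned int preferred_stack_boundary;
  unsigned int stack_alignment_estimated;
  unsigned int stack_alignment_needed;
};

/* Record that an emitted call wants BOUNDARY bits of stack alignment.
   Every field is raised to BOUNDARY if it is currently smaller and is
   never lowered.  */
void
demo_note_call_boundary (struct demo_crtl *c, unsigned int boundary)
{
  assert (c != NULL);
  if (boundary > c->preferred_stack_boundary)
    c->preferred_stack_boundary = boundary;
  if (boundary > c->stack_alignment_estimated)
    c->stack_alignment_estimated = boundary;
  if (boundary > c->stack_alignment_needed)
    c->stack_alignment_needed = boundary;
}
```

Starting from 64 in all three fields, a call that wants 128 raises all of them
to 128, and a later call that wants only 32 changes nothing.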

[Bug c++/58372] internal compiler error: ix86_compute_frame_layout

2018-10-31 Thread xuepeng.guo at intel dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58372

--- Comment #32 from Terry Guo  ---
(In reply to David Grayson from comment #27)
> Thanks to everyone who is working on this.  I can confirm that the patch in
> comment #20 by Uroš Bizjak applies cleanly to GCC 7.3.0, and I successfully
> used the resulting toolchain targeting i686-w64-mingw32 to build Qt and
> several Qt GUI examples, all of which run correctly.
> 
> Just in case it helps you find more bugs: I noticed there are several other
> places in the code (of gcc-8-20181019) where crtl->preferred_stack_boundary
> gets updated without any obvious update of crtl->stack_alignment_needed:
> 
> gcc/explow.c:1247 in get_dynamic_stack_size
> gcc/explow.c:1595 in get_dynamic_stack_base
> gcc/calls.c:3811 in expand_call
> gcc/config/i386/i386.c:12593 in ix86_update_stack_boundary

Hello David,
Do you have instructions for building a toolchain targeting
i686-w64-mingw32? I searched around and found only:
https://sourceforge.net/p/mingw-w64/code/HEAD/tree/trunk/mingw-w64-doc/howto-build/mingw-w64-howto-build.txt

[Bug c++/58372] internal compiler error: ix86_compute_frame_layout

2018-10-31 Thread xuepeng.guo at intel dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58372

--- Comment #34 from Terry Guo  ---
(In reply to David Grayson from comment #33)
> Hello, Terry.  I'd be happy to help.  I hope you have access to a Linux
> computer.  I've actually spent a lot of time working on build scripts for
> cross-compilers running on Linux and here's what I have come up with for you:
> 
> https://gist.github.com/DavidEGrayson/d5ca447cca1ea23d5adca2f353dbb67a

Thanks David. I will give it a try.

[Bug c++/58372] internal compiler error: ix86_compute_frame_layout

2018-11-01 Thread xuepeng.guo at intel dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58372

--- Comment #36 from Terry Guo  ---
(In reply to Uroš Bizjak from comment #35)
> 
> Actually, we can use crtl->stack_realign_processed to delay DRAP generation.
> The condition in the patch should be changed to:
> 
>   crtl->stack_realign_needed
>   = INCOMING_STACK_BOUNDARY < crtl->stack_alignment_estimated;
>   crtl->stack_realign_tried = crtl->stack_realign_needed;
> 
> --->  if (crtl->stack_realign_processed && crtl->drap_reg == NULL_RTX)
>   {
> rtx drap_rtx = targetm.calls.get_drap_rtx ();
> 
> Can you please test this change? The testcase from Comment #23 does not fail
> for me.

OK. I am doing it right now.

[Bug c++/58372] internal compiler error: ix86_compute_frame_layout

2018-11-01 Thread xuepeng.guo at intel dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58372

--- Comment #39 from Terry Guo  ---
(In reply to Uroš Bizjak from comment #38)
> (In reply to Terry Guo from comment #36)
> 
> > OK. I am doing it right now.
> 
> I think that latest attachment is the one that should be tested.
> Functionally it is the same, but avoids unnecessary variable updates before
> expand_stack_alignment is called. expand_stack_alignment will do everything
> for us.

Yes. The latest one works perfectly. Bootstrap and regression tests on x86_64
show no problems. I also managed to build a GCC for i686-w64-mingw32 with SJLJ
enabled, and the case compiles successfully.

[Bug target/87853] _mm_cmpgt_epi8 broken with -funsigned-char

2018-11-01 Thread xuepeng.guo at intel dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87853

--- Comment #3 from Terry Guo  ---
(In reply to H.J. Lu from comment #2)
> Xuepeng, can you take a look?

OK. Working on it now.

[Bug target/87853] _mm_cmpgt_epi8 broken with -funsigned-char

2018-11-01 Thread xuepeng.guo at intel dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87853

--- Comment #4 from Terry Guo  ---
Thanks, Derek, for reporting this. I can reproduce what you described, and I do
think it is a bug.

_mm_cmpgt_epi8 (__m128i __A, __m128i __B)
{
  return (__m128i) ((__v16qi)__A > (__v16qi)__B);
}

This intrinsic is supposed to perform a signed comparison. However, __v16qi is
defined in terms of plain char, which is signed by default here but is
implicitly changed to unsigned char by the option -funsigned-char:

typedef char __v16qi __attribute__ ((__vector_size__ (16)));

We may need a new definition in gcc like:

typedef signed char __v16qs __attribute__ ((__vector_size__ (16)));

I will sort out a patch to test this idea.
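
To illustrate the signedness issue outside of the intrinsics header, here is a
minimal sketch using GCC's generic vector extension. The typedef and function
names (demo_v16qs, demo_v16qu, gt_lane0_signed, gt_lane0_unsigned) are made up
for this example and are not part of emmintrin.h. It compares the same two bit
patterns once as signed char lanes and once as unsigned char lanes.

```c
#include <assert.h>

/* Hypothetical typedefs for illustration; not the emmintrin.h ones.  */
typedef signed char   demo_v16qs __attribute__ ((__vector_size__ (16)));
typedef unsigned char demo_v16qu __attribute__ ((__vector_size__ (16)));

/* Lane 0 of (a > b) with the bytes interpreted as signed char.
   A GCC vector comparison yields -1 in a lane for true and 0 for false.  */
int
gt_lane0_signed (int a, int b)
{
  assert (a >= -128 && a <= 127 && b >= -128 && b <= 127);
  demo_v16qs va = { 0 }, vb = { 0 };
  va[0] = (signed char) a;
  vb[0] = (signed char) b;
  demo_v16qs r = va > vb;
  return r[0];
}

/* The same bit patterns, but with the lanes interpreted as unsigned char.  */
int
gt_lane0_unsigned (int a, int b)
{
  assert (a >= -128 && a <= 127 && b >= -128 && b <= 127);
  demo_v16qu va = { 0 }, vb = { 0 };
  va[0] = (unsigned char) a;   /* -1 becomes 255 */
  vb[0] = (unsigned char) b;
  demo_v16qs r = (demo_v16qs) (va > vb);
  return r[0];
}
```

For a = -1 and b = 1 the signed version returns 0 (false), while the unsigned
version returns -1 (true) because -1 is reinterpreted as 255. That is the kind
of flip a plain-char vector typedef is exposed to under -funsigned-char.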

[Bug target/87853] _mm_cmpgt_epi8 broken with -funsigned-char

2018-11-02 Thread xuepeng.guo at intel dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87853

--- Comment #5 from Terry Guo  ---
Hi folks,

What about a fix like the one below? I tested it with bootstrap and the
regression tests, and found no problems.

diff --git a/gcc/config/i386/emmintrin.h b/gcc/config/i386/emmintrin.h
index 7a6ff80..3c1f04b 100644
--- a/gcc/config/i386/emmintrin.h
+++ b/gcc/config/i386/emmintrin.h
@@ -45,6 +45,7 @@ typedef unsigned int __v4su __attribute__ ((__vector_size__ (16)));
 typedef short __v8hi __attribute__ ((__vector_size__ (16)));
 typedef unsigned short __v8hu __attribute__ ((__vector_size__ (16)));
 typedef char __v16qi __attribute__ ((__vector_size__ (16)));
+typedef signed char __v16qs __attribute__ ((__vector_size__ (16)));
 typedef unsigned char __v16qu __attribute__ ((__vector_size__ (16)));

 /* The Intel API is flexible enough that we must allow aliasing with other
@@ -1295,7 +1296,7 @@ _mm_xor_si128 (__m128i __A, __m128i __B)
 extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _mm_cmpeq_epi8 (__m128i __A, __m128i __B)
 {
-  return (__m128i) ((__v16qi)__A == (__v16qi)__B);
+  return (__m128i) ((__v16qs)__A == (__v16qs)__B);
 }

 extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
@@ -1313,7 +1314,7 @@ _mm_cmpeq_epi32 (__m128i __A, __m128i __B)
 extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _mm_cmplt_epi8 (__m128i __A, __m128i __B)
 {
-  return (__m128i) ((__v16qi)__A < (__v16qi)__B);
+  return (__m128i) ((__v16qs)__A < (__v16qs)__B);
 }

 extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
@@ -1331,7 +1332,7 @@ _mm_cmplt_epi32 (__m128i __A, __m128i __B)
 extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _mm_cmpgt_epi8 (__m128i __A, __m128i __B)
 {
-  return (__m128i) ((__v16qi)__A > (__v16qi)__B);
+  return (__m128i) ((__v16qs)__A > (__v16qs)__B);
 }

 extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))

[Bug rtl-optimization/87718] [9 Regression] FAIL: gcc.target/i386/avx512dq-concatv2si-1.c

2018-11-13 Thread xuepeng.guo at intel dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87718

--- Comment #4 from Terry Guo  ---
(In reply to Uroš Bizjak from comment #2)
> Following testcase:
> 
> --cut here--
> typedef int V __attribute__((vector_size (8)));
> 
> void foo (int x, int y)
> {
>   register int a __asm ("xmm1");
>   register int b __asm ("xmm2");
>   register V c __asm ("xmm3");
>   a = x;
>   b = y;
>   asm volatile ("" : "+v" (a), "+v" (b));
>   c = (V) { a, b };
>   asm volatile ("" : "+v" (c));
> }
> --cut here--
> 
> gets compiled with -O2 -mavx -mtune=intel:
> 
> vmovd   %edi, %xmm1
> vmovd   %esi, %xmm2
> vmovd   %xmm2, %eax
> vpinsrd $1, %eax, %xmm1, %xmm3
> ret
> 
> The relevant pattern is defined as:
> 
> (define_insn "*vec_concatv2si_sse4_1"
>   [(set (match_operand:V2SI 0 "register_operand"
> "=Yr,*x, x, v,Yr,*x, v, v, *y,*y")
>   (vec_concat:V2SI
> (match_operand:SI 1 "nonimmediate_operand"
> "  0, 0, x,Yv, 0, 0,Yv,rm,  0,rm")
> (match_operand:SI 2 "nonimm_or_0_operand"
> " rm,rm,rm,rm,Yr,*x,Yv, C,*ym, C")))]
>   "TARGET_SSE4_1 && !(MEM_P (operands[1]) && MEM_P (operands[2]))"
>   "@
>pinsrd\t{$1, %2, %0|%0, %2, 1}
>pinsrd\t{$1, %2, %0|%0, %2, 1}
>vpinsrd\t{$1, %2, %1, %0|%0, %1, %2, 1}
>vpinsrd\t{$1, %2, %1, %0|%0, %1, %2, 1}
>punpckldq\t{%2, %0|%0, %2}
>punpckldq\t{%2, %0|%0, %2}
>vpunpckldq\t{%2, %1, %0|%0, %1, %2}
>%vmovd\t{%1, %0|%0, %1}
>punpckldq\t{%2, %0|%0, %2}
>movd\t{%1, %0|%0, %1}"
> 
> but for some reason RA chooses alternative 2 (x<-x,rm) instead of
> alternative 6 (v<-Yv,Yv), although alternative 2 needs an extra reload from
> %xmm2 to %eax.

I dug into this a bit, and it looks like we miss something in the combine pass
and hence fail to get a pattern that can match alternative 6. The combine pass
dump of the old GCC shows:
---
  REG_UNUSED flags:CC
insn_cost 4 for    10: r82:SI=xmm16:SI
  REG_DEAD xmm16:SI
insn_cost 4 for    11: r83:SI=xmm17:SI
  REG_DEAD xmm17:SI
insn_cost 4 for    12: r87:V2SI=vec_concat(r82:SI,r83:SI)
  REG_DEAD r83:SI
  REG_DEAD r82:SI
---

Then we got:
---
Trying 10 -> 12:
   10: r82:SI=xmm16:SI
  REG_DEAD xmm16:SI
   12: r87:V2SI=vec_concat(r82:SI,r83:SI)
  REG_DEAD r83:SI
  REG_DEAD r82:SI
Successfully matched this instruction:
(set (reg:V2SI 87)
(vec_concat:V2SI (reg/v:SI 52 xmm16 [ a ])
(reg:SI 83 [ b.1_2 ])))
allowing combination of insns 10 and 12
original costs 4 + 4 = 8
replacement cost 4
deferring deletion of insn with uid = 10.
modifying insn i3    12: r87:V2SI=vec_concat(xmm16:SI,r83:SI)
  REG_DEAD xmm16:SI
  REG_DEAD r83:SI
deferring rescan insn with uid = 12.

Trying 11 -> 12:
   11: r83:SI=xmm17:SI
  REG_DEAD xmm17:SI
   12: r87:V2SI=vec_concat(xmm16:SI,r83:SI)
  REG_DEAD xmm16:SI
  REG_DEAD r83:SI
Successfully matched this instruction:
(set (reg:V2SI 87)
(vec_concat:V2SI (reg/v:SI 52 xmm16 [ a ])
(reg/v:SI 53 xmm17 [ b ])))
allowing combination of insns 11 and 12
original costs 4 + 4 = 8
replacement cost 4
deferring deletion of insn with uid = 11.
modifying insn i3    12: r87:V2SI=vec_concat(xmm16:SI,xmm17:SI)
  REG_DEAD xmm17:SI
  REG_DEAD xmm16:SI
deferring rescan insn with uid = 12.
---

There are two successful combine attempts, and we end up with a pattern that can
match alternative 6.

However, the dump from current GCC trunk shows:
---
insn_cost 4 for    19: r90:SI=xmm16:SI
  REG_DEAD xmm16:SI
insn_cost 4 for    10: r82:SI=r90:SI
  REG_DEAD r90:SI
insn_cost 4 for    20: r91:SI=xmm17:SI
  REG_DEAD xmm17:SI
insn_cost 4 for    11: r83:SI=r91:SI
  REG_DEAD r91:SI
insn_cost 4 for    12: r87:V2SI=vec_concat(r82:SI,r83:SI)
  REG_DEAD r83:SI
  REG_DEAD r82:SI
insn_cost 4 for    13: xmm3:V2SI=r87:V2SI
  REG_DEAD r87:V2SI
---
Trying 11 -> 12:
   11: r83:SI=r91:SI
  REG_DEAD r91:SI
   12: r87:V2SI=vec_concat(r90:SI,r83:SI)
  REG_DEAD r90:SI
  REG_DEAD r83:SI
Successfully matched this instruction:
(set (reg:V2SI 87)
(vec_concat:V2SI (reg:SI 90)
(reg:SI 91)))
allowing combination of insns 11 and 12
original costs 4 + 4 = 8
replacement cost 4
deferring deletion of insn with uid = 11.
modifying insn i3    12: r87:V2SI=vec_concat(r90:SI,r91:SI)
  REG_DEAD r91:SI
  REG_DEAD r90:SI
deferring rescan insn with uid = 12.
---

We end up with "12: r87:V2SI=vec_concat(r90:SI,r91:SI)". Later, in the LRA pass,
operand r90 is replaced with an XMM register, while r91 is kept as a general
register. After that there is no chance to match the preferred alternative 6.

[Bug target/87853] _mm_cmpgt_epi8 broken with -funsigned-char

2018-11-20 Thread xuepeng.guo at intel dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87853

--- Comment #10 from Terry Guo  ---
(In reply to Uroš Bizjak from comment #9)
> (In reply to Martin Liška from comment #8)
> > Uros: Can the bug be marked as resolved? Or please update Known to work.
> 
> Patch needs to be backported to release branches.

I am doing it now.