[Bug target/85473] internal compiler error: in emit_move_insn, at expr.c:3722

2018-04-20 Thread sebastian.peryt at intel dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85473

--- Comment #3 from Sebastian Peryt  ---
Proposed patch sent to ML:
https://gcc.gnu.org/ml/gcc-patches/2018-04/msg01011.html

[Bug target/81616] Update -mtune=generic for the current Intel and AMD processors

2017-12-14 Thread sebastian.peryt at intel dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616

--- Comment #39 from Sebastian Peryt  ---
I have tested it on SKX with SPEC2006INT and SPEC2017INT and don't see any
regressions.

[Bug target/83546] -march=silvermont doesn't enable rdrnd by default despite what docs say

2018-01-15 Thread sebastian.peryt at intel dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83546

Sebastian Peryt  changed:

   What|Removed |Added

 CC||sebastian.peryt at intel dot com

--- Comment #1 from Sebastian Peryt  ---
Patch sent to mailing list:
https://gcc.gnu.org/ml/gcc-patches/2018-01/msg01338.html

[Bug middle-end/84200] r256888 causes 30% performance regression of 519.lbm_r at -Ofast generic tuning on Zen

2018-02-05 Thread sebastian.peryt at intel dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84200

Sebastian Peryt  changed:

   What|Removed |Added

 CC||sebastian.peryt at intel dot com

--- Comment #1 from Sebastian Peryt  ---
I'm not sure whether this can be treated as a duplicate, but the performance
degradation looks related to PR84149.

[Bug c/84431] Suboptimal code for masked shifts (x86/x86-64)

2018-02-19 Thread sebastian.peryt at intel dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84431

Sebastian Peryt  changed:

   What|Removed |Added

 CC||sebastian.peryt at intel dot com

--- Comment #1 from Sebastian Peryt  ---
Ruslan, can you tell us which compilation options you used to reproduce this
issue?

[Bug c++/84783] Missing _mm256_permutexvar_epi64() intrinsic for AVX512VL

2018-03-12 Thread sebastian.peryt at intel dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84783

--- Comment #1 from Sebastian Peryt  ---
It was added in r249759; I can see it in the latest trunk. Maybe you have an
old version of GCC?

[Bug c++/84783] Missing _mm256_permutexvar_epi64() intrinsic for AVX512VL

2018-03-12 Thread sebastian.peryt at intel dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84783

--- Comment #2 from Sebastian Peryt  ---
Oh, OK, I now see the version in the report. Sorry, my mistake. It was added to
trunk and not backported.

[Bug c++/84783] Missing _mm256_permutexvar_epi64() intrinsic for AVX512VL

2018-03-22 Thread sebastian.peryt at intel dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84783

--- Comment #3 from Sebastian Peryt  ---
Proposed patch sent to list
https://gcc.gnu.org/ml/gcc-patches/2018-03/msg01181.html

[Bug target/80862] New: [x86] Wrong rounding results for some test cases

2017-05-23 Thread sebastian.peryt at intel dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80862

Bug ID: 80862
   Summary: [x86] Wrong rounding results for some test cases
   Product: gcc
   Version: 8.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: sebastian.peryt at intel dot com
CC: julia.koval at intel dot com, ubizjak at gmail dot com
  Target Milestone: ---
Target: X86

Created attachment 41408
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=41408&action=edit
Patch to reproduce described error.

Recently I found that the rounding intrinsics produce wrong results in some
particular cases. Three specific conditions have to be fulfilled to reproduce
this:
- the test has to be compiled with O1 or O2 (it doesn't appear with O0),
- the test case has to contain only two intrinsics: a regular one (e.g.
_mm512_cvtps_epi32) and a rounding one (e.g. _mm512_cvt_roundps_epi32),
- both intrinsics must use the same input argument.

As a result, the value from the first (regular) intrinsic is copied to the
result of the second (round) intrinsic. In the asm output it can be seen that
the same register is used for both assignments:

vcvtps2dq %zmm0, %zmm1
vmovdqa64 %zmm1, -368(%rbp)
pushq -312(%rbp)
pushq -320(%rbp)
pushq -328(%rbp)
vcvtps2dq {rz-sae}, %zmm0, %zmm0
pushq -336(%rbp)
vmovdqa64 %zmm1, -304(%rbp)

From what I have gathered so far, this happens because the rounding md template
in i386/subst.md uses a parallel side effect. Because the side effects of a
parallel are at first treated individually, the cse1 pass optimizes the part
that is the same for both intrinsics. After that, the same register is assigned
to the move operation for both results, and effectively the regular and round
intrinsics produce the same result.

Probably some other side effect has to be used to set the rounding flags to fix
this issue, but I am not sure which one it should be. Alternatively, some
modifications have to be made in cse.c to properly handle such use of parallel.
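
To illustrate why the two results must not be identical, here is a minimal
scalar sketch of what one lane of each conversion should compute. It is not the
attached reproducer and uses no AVX-512 intrinsics. The function names are
hypothetical, the regular intrinsic is assumed to run under the default
round-to-nearest-even mode, and the rounding intrinsic is assumed to use
round-toward-zero, as the {rz-sae} operand in the asm above suggests. The
model is only meant for small magnitudes that fit in an int.

```c
#include <assert.h>

/* Hypothetical model of one lane of the rounding intrinsic with {rz-sae}:
   a C float-to-int cast truncates toward zero. */
int cvt_toward_zero(float x)
{
    return (int)x;
}

/* Hypothetical model of one lane of the regular intrinsic, assuming the
   default rounding mode (round to nearest, ties to even). */
int cvt_nearest_even(float x)
{
    int t = (int)x;             /* truncated value */
    float frac = x - (float)t;  /* remaining fraction, in (-1, 1) */

    if (frac > 0.5f || (frac == 0.5f && (t & 1)))
        return t + 1;
    if (frac < -0.5f || (frac == -0.5f && (t & 1)))
        return t - 1;
    return t;
}

/* Inputs for which the two conversions must give different results. */
void check_rounding_models(void)
{
    assert(cvt_nearest_even(1.5f) == 2);
    assert(cvt_toward_zero(1.5f) == 1);
    assert(cvt_nearest_even(-1.5f) == -2);
    assert(cvt_toward_zero(-1.5f) == -1);
}
```

For an input lane such as 1.5 the regular conversion should give 2 and the
{rz-sae} one should give 1, so reusing %zmm1 for both stores cannot be correct
for such inputs.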

[Bug web/80941] New: Broken bookmarks on GCC internals PDF available online

2017-06-01 Thread sebastian.peryt at intel dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80941

Bug ID: 80941
   Summary: Broken bookmarks on GCC internals PDF available online
   Product: gcc
   Version: 8.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: web
  Assignee: unassigned at gcc dot gnu.org
  Reporter: sebastian.peryt at intel dot com
  Target Milestone: ---

There is a possible bug in the GCC internals documentation PDF file available
on the GCC website at: https://gcc.gnu.org/onlinedocs/gccint.pdf

Whether the document has been downloaded or is browsed online, two bookmarks
appear to be broken:

- Machine Descriptions
- Constraints for Particular Machines under Machine Descriptions -> Operand
Constraints

When either bookmark is clicked, the viewer jumps to the beginning of the
document (when the PDF has been downloaded, or in IE) or stays where it is (in
Chrome). This bug is present in the current version as well as in 7.1.0
(https://gcc.gnu.org/onlinedocs/gcc-7.1.0/gccint.pdf), but not in docs built
from sources using make pdf.

[Bug target/81034] New: [x86] Broken IRA pass when printing results of intrinsic execution

2017-06-09 Thread sebastian.peryt at intel dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81034

Bug ID: 81034
   Summary: [x86] Broken IRA pass when printing results of
intrinsic execution
   Product: gcc
   Version: 8.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: sebastian.peryt at intel dot com
  Target Milestone: ---
Target: x86_64-*-*, i?86-*-*

Created attachment 41516
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=41516&action=edit
Reproducible

While implementing missing intrinsics I found a strange bug connected to
printf. The reproducer is attached.

The intrinsics I've been implementing are _mm_mask_load_sd and
_mm_maskz_load_sd. Both of them use the same md. Unfortunately, with
_mm_maskz_load_sd, when I printf the values generated for some particular
cases, compilation breaks with the following error:

during RTL pass: ira
In file included from
/gcc/gcc/testsuite/gcc.target/i386/avx512f-vmovsd-2.c:5:0:
/gcc/gcc/testsuite/gcc.target/i386/avx512f-check.h: In function ‘do_test’:
/gcc/gcc/testsuite/gcc.target/i386/avx512f-check.h:11:1: internal compiler
error: in wide_int_to_tree, at tree.c:1487
0xe9e9f3 wide_int_to_tree(tree_node*,
generic_wide_int > const&)
../../gcc/gcc/tree.c:1487
0x895df6 make_tree(tree_node*, rtx_def*)
../../gcc/gcc/expmed.c:5113
0x895e8b make_tree(tree_node*, rtx_def*)
../../gcc/gcc/expmed.c:5139
0xee83a2 force_const_mem(machine_mode, rtx_def*)
../../gcc/gcc/varasm.c:3733
0xa1652b setup_reg_equiv
../../gcc/gcc/ira.c:3992
0xa1652b ira
../../gcc/gcc/ira.c:5244
0xa1652b execute
../../gcc/gcc/ira.c:5580

To reproduce this error two conditions have to be met:
- the mask value has to be either 0 or 2
- the optimization level has to be O2, O3 or Ofast

Interestingly, it works with Os.

When the printf of res4 in the attached code is commented out, it also
compiles. On the other hand, for res3 the printf doesn't make a difference - it
always works.

I have compared the pass dumps for the versions with and without the res4
printf and found some interesting discrepancies:
1. In the 029t.einline pass, in the compiling (non-printf) version, function
do_test () has been partially expanded with what looks like the content of the
avx512f_test () function.
2. In the 051i.ipa_oacc pass the order of functions in the dump files has been
changed and optimized. For the non-compiling version the order is:
  a. avx512f_test ()
  b. main ()
  c. do_test ()
and for the compiling one:
  a. main ()
  b. do_test ()
3. In the 087t.fixup_cfg4 pass main () is completely deleted from the
non-compiling version, leaving only do_test ().

I would appreciate any input on that issue.

[Bug target/81034] [x86] Broken IRA pass when printing results of intrinsic execution

2017-06-09 Thread sebastian.peryt at intel dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81034

--- Comment #2 from Sebastian Peryt  ---
I agree that vec_merge takes 3 operands, and there are 3 in my md (naming
according to GCC internals):

vec1:
(vec_merge:V2DF
(match_operand:V2DF 1 "nonimmediate_operand" "m")
(match_operand:V2DF 2 "vector_move_operand" "0C")
(match_operand:QI 3 "register_operand" "Yk"))

vec2:
(const_vector:V2DF [(const_int 0) (const_int 0)])

items:
(const_int 1)

I am not sure whether the const_vector here should actually have been a
match_operand. Maybe not with vec_merge, but a similar use can already be seen
in sse.md, e.g. floatv2div2sf2_mask.

Also, if the md were wrong I'd expect some other issue to show up as well, and
both intrinsics not to work.

My best guess is that the problem might be due to the fact that with mask 0 or
2 all of the elements in the vector are actually 0 and this might be getting
optimized.
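
The guess above can be illustrated with a minimal scalar sketch of the two
intrinsics. It is not the md pattern and not the attached reproducer. The
vec2d type and the function names are hypothetical, and the behaviour is
assumed to follow the documented intrinsic semantics: only mask bit 0 is
consulted, and the upper element is always zeroed.

```c
#include <assert.h>

/* Hypothetical scalar stand-in for __m128d: two double lanes. */
typedef struct { double lane[2]; } vec2d;

/* Model of _mm_maskz_load_sd: lane 0 is loaded only when mask bit 0 is set,
   otherwise it is zeroed; lane 1 is always zeroed. */
vec2d model_maskz_load_sd(unsigned char k, const double *mem)
{
    vec2d r;
    r.lane[0] = (k & 1) ? *mem : 0.0;
    r.lane[1] = 0.0;
    return r;
}

/* Model of _mm_mask_load_sd: same, but lane 0 falls back to src. */
vec2d model_mask_load_sd(vec2d src, unsigned char k, const double *mem)
{
    vec2d r;
    r.lane[0] = (k & 1) ? *mem : src.lane[0];
    r.lane[1] = 0.0;
    return r;
}

/* Returns 1 when every lane of v is zero. */
int all_zero(vec2d v)
{
    return v.lane[0] == 0.0 && v.lane[1] == 0.0;
}

/* Masks 0 and 2 both have bit 0 clear, so the zero-masked result is a
   constant zero vector regardless of the memory operand. */
void check_load_models(void)
{
    double m = 3.5;
    assert(all_zero(model_maskz_load_sd(0, &m)));
    assert(all_zero(model_maskz_load_sd(2, &m)));
    assert(!all_zero(model_maskz_load_sd(1, &m)));
}
```

Under these assumptions, masks 0 and 2 turn the zero-masked load into an
all-zero constant, whereas the merge-masked variant still depends on src. That
would be consistent with only _mm_maskz_load_sd being affected.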

[Bug web/80941] Broken bookmarks on GCC internals PDF available online

2017-06-10 Thread sebastian.peryt at intel dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80941

Sebastian Peryt  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |FIXED

--- Comment #1 from Sebastian Peryt  ---
Looking at the current version of GCC Internals available online, it looks like
the issue has been fixed.

[Bug target/82268] [8 regression] i386/pr82196-1.c fail

2017-10-23 Thread sebastian.peryt at intel dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82268

--- Comment #3 from Sebastian Peryt  ---
It passes with the provided modification.

[Bug target/82767] [8 regression] gcc.target/i386/pr71321.c scan-assembler-times fail

2017-10-30 Thread sebastian.peryt at intel dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82767

Sebastian Peryt  changed:

   What|Removed |Added

 CC||sebastian.peryt at intel dot com

--- Comment #1 from Sebastian Peryt  ---
The fix seems straightforward in this case. My proposal is below:

diff --git a/gcc/testsuite/gcc.target/i386/pr71321.c b/gcc/testsuite/gcc.target/i386/pr71321.c
index 7b00097..4931b88 100644
--- a/gcc/testsuite/gcc.target/i386/pr71321.c
+++ b/gcc/testsuite/gcc.target/i386/pr71321.c
@@ -12,5 +12,5 @@ unsigned cvt_to_2digit_ascii(uint8_t i)
 {
   return cvt_to_2digit(i, 10) + 0x0a3030;
 }
-/* { dg-final { scan-assembler-times "lea.\t\\(%\[0-9a-z\]+,%\[0-9a-z\]+,4" 3 } } */
+/* { dg-final { scan-assembler-times "lea.\t\\(%\[0-9a-z\]+,%\[0-9a-z\]+,4" 2 } } */
 /* { dg-final { scan-assembler-times "lea.\t\\(%\[0-9a-z\]+,%\[0-9a-z\]+,8" 1 } } */

[Bug target/82767] [8 regression] gcc.target/i386/pr71321.c scan-assembler-times fail

2017-11-05 Thread sebastian.peryt at intel dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82767

--- Comment #2 from Sebastian Peryt  ---
Candidate patch:
https://gcc.gnu.org/ml/gcc-patches/2017-11/msg00308.html

[Bug target/82767] [8 regression] gcc.target/i386/pr71321.c scan-assembler-times fail

2017-11-06 Thread sebastian.peryt at intel dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82767

--- Comment #3 from Sebastian Peryt  ---
As per Uros's suggestion
(https://gcc.gnu.org/ml/gcc-patches/2017-11/msg00329.html ) I checked the
-mtune=generic idea, and it works without additional changes to either the
testcase or the cost model.

From what I gathered, the cause of this test failing is the change below in the
core cost model:
/gcc/config/i386/x86-tune-costs.h
@@ -2253,7 +2253,7 @@ struct processor_costs core_cost = {
COSTS_N_INSNS (4),  /*   DI */
COSTS_N_INSNS (4)}, /*other */
   0,   /* cost of multiply per each bit set */
+  {COSTS_N_INSNS (8),  /* cost of a divide/mod for QI */
-  {COSTS_N_INSNS (18), /* cost of a divide/mod for QI */
COSTS_N_INSNS (8),  /*  HI */
/* 8-11 */
COSTS_N_INSNS (11), /*  SI */

Because most Intel CPUs (including haswell) use the core_cost model, this
testcase fails unless it is additionally tuned to the generic cost model, which
still uses the old cost values.

[Bug target/82942] Generate vzeroupper with -mavx512f -mno-avx512er -O2

2017-11-14 Thread sebastian.peryt at intel dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82942

Sebastian Peryt  changed:

   What|Removed |Added

 CC||sebastian.peryt at intel dot com

--- Comment #6 from Sebastian Peryt  ---
Patch has been sent: https://gcc.gnu.org/ml/gcc-patches/2017-11/msg01052.html

[Bug target/82941] Missing vzeroupper with -march=skylake-avx512 -O2

2017-11-14 Thread sebastian.peryt at intel dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82941

Sebastian Peryt  changed:

   What|Removed |Added

 CC||sebastian.peryt at intel dot com

--- Comment #1 from Sebastian Peryt  ---
Patch has been sent: https://gcc.gnu.org/ml/gcc-patches/2017-11/msg01052.html

[Bug target/82990] Update the default -mzeroupper setting

2017-11-15 Thread sebastian.peryt at intel dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82990

--- Comment #5 from Sebastian Peryt  ---
(In reply to H.J. Lu from comment #3)
> Created attachment 42611 [details]
> A better patch
> 
> Sebastian, please take a look.
LGTM

[Bug target/82767] [8 regression] gcc.target/i386/pr71321.c scan-assembler-times fail

2017-11-17 Thread sebastian.peryt at intel dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82767

--- Comment #4 from Sebastian Peryt  ---
Created attachment 42632
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=42632&action=edit
Proposed patch to fix PR.

Better patch.