[Bug target/85473] internal compiler error: in emit_move_insn, at expr.c:3722
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85473 --- Comment #3 from Sebastian Peryt --- Proposed patch sent to ML: https://gcc.gnu.org/ml/gcc-patches/2018-04/msg01011.html
[Bug target/81616] Update -mtune=generic for the current Intel and AMD processors
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616 --- Comment #39 from Sebastian Peryt --- I have tested it on SKX with SPEC2006INT and SPEC2017INT and don't see any regressions.
[Bug target/83546] -march=silvermont doesn't enable rdrnd by default despite what docs say
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83546
Sebastian Peryt changed:
           What    |Removed                     |Added
                 CC|                            |sebastian.peryt at intel dot com
--- Comment #1 from Sebastian Peryt ---
Patch sent to mailing list: https://gcc.gnu.org/ml/gcc-patches/2018-01/msg01338.html
[Bug middle-end/84200] r256888 causes 30% performance regression of 519.lbm_r at -Ofast generic tuning on Zen
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84200
Sebastian Peryt changed:
           What    |Removed                     |Added
                 CC|                            |sebastian.peryt at intel dot com
--- Comment #1 from Sebastian Peryt ---
I'm not sure whether this can be treated as a duplicate, but the performance degradation looks related to PR84149.
[Bug c/84431] Suboptimal code for masked shifts (x86/x86-64)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84431
Sebastian Peryt changed:
           What    |Removed                     |Added
                 CC|                            |sebastian.peryt at intel dot com
--- Comment #1 from Sebastian Peryt ---
Ruslan, can you tell us which compilation options you used to reproduce this issue?
[Bug c++/84783] Missing _mm256_permutexvar_epi64() intrinsic for AVX512VL
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84783 --- Comment #1 from Sebastian Peryt --- It was added in r249759; I can see it in the latest trunk. Maybe you have an older version of GCC?
[Bug c++/84783] Missing _mm256_permutexvar_epi64() intrinsic for AVX512VL
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84783 --- Comment #2 from Sebastian Peryt --- Oh, OK, I now see the version in the report. Sorry, my mistake. It was added to trunk and not backported.
[Bug c++/84783] Missing _mm256_permutexvar_epi64() intrinsic for AVX512VL
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84783 --- Comment #3 from Sebastian Peryt --- Proposed patch sent to list https://gcc.gnu.org/ml/gcc-patches/2018-03/msg01181.html
[Bug target/80862] New: [x86] Wrong rounding results for some test cases
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80862

            Bug ID: 80862
           Summary: [x86] Wrong rounding results for some test cases
           Product: gcc
           Version: 8.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: sebastian.peryt at intel dot com
                CC: julia.koval at intel dot com, ubizjak at gmail dot com
  Target Milestone: ---
            Target: X86

Created attachment 41408
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=41408&action=edit
Patch to reproduce described error.

Recently I found that rounding intrinsics produce wrong results in some particular cases. Three specific conditions have to be fulfilled to reproduce it:
- the test has to be compiled with O1 or O2 (it doesn't appear at O0),
- the test case has to contain only two intrinsics: a regular one (e.g. _mm512_cvtps_epi32) and a round one (e.g. _mm512_cvt_roundps_epi32),
- both intrinsics must use the same input argument.

As a result, the value from the first (regular) intrinsic is copied to the result of the second (round) intrinsic. In the asm output it can be seen that the same register is used for both assignments:

    vcvtps2dq %zmm0, %zmm1
    vmovdqa64 %zmm1, -368(%rbp)
    pushq -312(%rbp)
    pushq -320(%rbp)
    pushq -328(%rbp)
    vcvtps2dq {rz-sae}, %zmm0, %zmm0
    pushq -336(%rbp)
    vmovdqa64 %zmm1, -304(%rbp)

From what I have gathered so far, this happens because of the parallel side effect used for the rounding md template in i386/subst.md. Because parallel executes each side effect individually at first, the part that is the same for both intrinsics gets optimized in the cse1 pass. After that, the same register is assigned for the move operation in both assignments of the results, and effectively the regular and the round intrinsic produce the same result.

Probably some other side effect has to be used to set the rounding flags to fix this issue, but I am not sure which one it should be. Alternatively, some modifications have to be made in cse.c to properly handle such a use of parallel.
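To make the symptom in this report easier to check by hand, here is a minimal scalar sketch of the two conversions involved. It is not the attached reproducer and does not use the AVX-512 intrinsics, so it runs on any machine. The helper names (cvt_nearest_even, cvt_toward_zero, looks_miscompiled) are hypothetical, and the sketch assumes that the regular intrinsic rounds to nearest with ties to even (the default MXCSR mode) and that the {rz-sae} form rounds toward zero.

```c
/* Scalar model of the default conversion: round to nearest, ties to even.
   Hypothetical helper; only meaningful for values that fit in an int. */
int cvt_nearest_even(float x)
{
    int t = (int)x;               /* a C cast truncates toward zero */
    float frac = x - (float)t;    /* same sign as x, magnitude below 1 */

    if (frac > 0.5f)
        return t + 1;
    if (frac < -0.5f)
        return t - 1;
    if (frac == 0.5f)
        return t + (t & 1);       /* tie: pick the even neighbour */
    if (frac == -0.5f)
        return t - (t & 1);
    return t;
}

/* Scalar model of the {rz-sae} conversion: round toward zero. */
int cvt_toward_zero(float x)
{
    return (int)x;
}

/* Returns 1 when an input for which the two modes must disagree
   nevertheless yielded identical results from the compiled code. */
int looks_miscompiled(float x, int regular_result, int rz_result)
{
    return cvt_nearest_even(x) != cvt_toward_zero(x)
           && regular_result == rz_result;
}
```

For an input element such as 1.7f the two models give 2 and 1, so a compiled test that reports the same value for both intrinsics on that element matches the behaviour described above.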
[Bug web/80941] New: Broken bookmarks on GCC internals PDF available online
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80941

            Bug ID: 80941
           Summary: Broken bookmarks on GCC internals PDF available online
           Product: gcc
           Version: 8.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: web
          Assignee: unassigned at gcc dot gnu.org
          Reporter: sebastian.peryt at intel dot com
  Target Milestone: ---

There is a possible bug in the GCC internals documentation PDF available on the GCC website at:
https://gcc.gnu.org/onlinedocs/gccint.pdf

Whether the document has been downloaded or is browsed online, two bookmarks appear to be broken:
- Machine Descriptions
- Constraints for Particular Machines under Machine Descriptions -> Operand Constraints

Whichever bookmark is pressed, the viewer either jumps to the beginning of the document (when the PDF has been downloaded, or in IE) or stays where it is (in Chrome).

This bug is present in the current version as well as in 7.1.0 (https://gcc.gnu.org/onlinedocs/gcc-7.1.0/gccint.pdf), but not in docs built from sources using make pdf.
[Bug target/81034] New: [x86] Broken IRA pass when printing results of intrinsic execution
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81034

            Bug ID: 81034
           Summary: [x86] Broken IRA pass when printing results of intrinsic execution
           Product: gcc
           Version: 8.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: sebastian.peryt at intel dot com
  Target Milestone: ---
            Target: x86_64-*-*, i?86-*-*

Created attachment 41516
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=41516&action=edit
Reproducible

While implementing missing intrinsics I found a strange bug connected to printf. The reproducer is attached. The intrinsics I've been implementing are _mm_mask_load_sd and _mm_maskz_load_sd. Both of them use the same md. Unfortunately, with _mm_maskz_load_sd, when I printf the values that were generated for some particular cases, compilation breaks with the following error:

during RTL pass: ira
In file included from /gcc/gcc/testsuite/gcc.target/i386/avx512f-vmovsd-2.c:5:0:
/gcc/gcc/testsuite/gcc.target/i386/avx512f-check.h: In function ‘do_test’:
/gcc/gcc/testsuite/gcc.target/i386/avx512f-check.h:11:1: internal compiler error: in wide_int_to_tree, at tree.c:1487
0xe9e9f3 wide_int_to_tree(tree_node*, generic_wide_int > const&)
        ../../gcc/gcc/tree.c:1487
0x895df6 make_tree(tree_node*, rtx_def*)
        ../../gcc/gcc/expmed.c:5113
0x895e8b make_tree(tree_node*, rtx_def*)
        ../../gcc/gcc/expmed.c:5139
0xee83a2 force_const_mem(machine_mode, rtx_def*)
        ../../gcc/gcc/varasm.c:3733
0xa1652b setup_reg_equiv
        ../../gcc/gcc/ira.c:3992
0xa1652b ira
        ../../gcc/gcc/ira.c:5244
0xa1652b execute
        ../../gcc/gcc/ira.c:5580

To reproduce this error, two conditions have to be met:
- the mask value has to be either 0 or 2,
- the optimization level has to be O2, O3 or Ofast.

It is also interesting that it works with Os. When the printf of res4 in the attached code is commented out, it also compiles. On the other hand, for res3 the printf doesn't make a difference: it always works.

I have compared the pass dumps for the versions with and without the res4 printf and found some interesting discrepancies:
1. In the 029t.einline pass, in the compiling (non-printf) version, function do_test () has been partially expanded with what looks to be the content of the avx512f_test () function.
2. In the 051i.ipa_oacc pass, the order of functions in the dump files has been changed and optimized. For the non-compiling version the order is:
   a. avx512f_test ()
   b. main ()
   c. do_test ()
   and for the compiling one:
   a. main ()
   b. do_test ()
3. In the 087t.fixup_cfg4 pass, main () is completely deleted from the non-compiling version, leaving only do_test ().

I would appreciate any input on this issue.
[Bug target/81034] [x86] Broken IRA pass when printing results of intrinsic execution
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81034 --- Comment #2 from Sebastian Peryt ---
I agree that vec_merge takes 3 operands. And there are 3 in my md (naming according to GCC internals):

vec1:  (vec_merge:V2DF
         (match_operand:V2DF 1 "nonimmediate_operand" "m")
         (match_operand:V2DF 2 "vector_move_operand" "0C")
         (match_operand:QI 3 "register_operand" "Yk"))
vec2:  (const_vector:V2DF [(const_int 0) (const_int 0)])
items: (const_int 1)

I am not sure whether the const_vector should actually have been a match_operand here. Maybe not with vec_merge, but a similar use can already be seen in sse.md, e.g. floatv2div2sf2_mask. Also, if the md were wrong I'd expect some other issue to show up as well, and both intrinsics to fail.

My best guess is that the problem might be due to the fact that with mask 0 or 2 all of the elements in the vector are actually 0, and this might be getting optimized.
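The guess about mask values 0 and 2 can be illustrated with a small scalar model of the two intrinsics. This is only a sketch of their documented semantics (only mask bit 0 is consulted, and the upper element of the result is always zeroed); it is not the md pattern or the attached reproducer. The type v2df_model and both function names are hypothetical stand-ins so that the code runs without AVX-512 support.

```c
/* Hypothetical stand-in for a two-element double vector (__m128d). */
typedef struct {
    double lo;
    double hi;
} v2df_model;

/* Model of _mm_mask_load_sd: bit 0 of k selects between the memory
   operand and the low element of src; the upper element is zeroed. */
v2df_model mask_load_sd_model(v2df_model src, unsigned char k, const double *mem)
{
    v2df_model dst;
    dst.lo = (k & 1) ? *mem : src.lo;
    dst.hi = 0.0;
    return dst;
}

/* Model of _mm_maskz_load_sd: bit 0 of k selects between the memory
   operand and zero; the upper element is zeroed. */
v2df_model maskz_load_sd_model(unsigned char k, const double *mem)
{
    v2df_model dst;
    dst.lo = (k & 1) ? *mem : 0.0;
    dst.hi = 0.0;
    return dst;
}
```

With k equal to 0 or 2, bit 0 is clear, so the maskz variant returns an all-zero vector regardless of the memory operand, while the mask variant still carries the low element of src. That would be consistent with only the maskz case collapsing to a constant, although the model says nothing about where in IRA the failure occurs.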
[Bug web/80941] Broken bookmarks on GCC internals PDF available online
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80941
Sebastian Peryt changed:
           What    |Removed                     |Added
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |FIXED
--- Comment #1 from Sebastian Peryt ---
Looking at the current version of GCC Internals available online, it looks like the issue has been fixed.
[Bug target/82268] [8 regression] i386/pr82196-1.c fail
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82268 --- Comment #3 from Sebastian Peryt --- It passes with the provided modification.
[Bug target/82767] [8 regression] gcc.target/i386/pr71321.c scan-assembler-times fail
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82767
Sebastian Peryt changed:
           What    |Removed                     |Added
                 CC|                            |sebastian.peryt at intel dot com
--- Comment #1 from Sebastian Peryt ---
The fix seems simple in this case. My proposal is below:

diff --git a/gcc/testsuite/gcc.target/i386/pr71321.c b/gcc/testsuite/gcc.target/i386/pr71321.c
index 7b00097..4931b88 100644
--- a/gcc/testsuite/gcc.target/i386/pr71321.c
+++ b/gcc/testsuite/gcc.target/i386/pr71321.c
@@ -12,5 +12,5 @@ unsigned cvt_to_2digit_ascii(uint8_t i)
 {
   return cvt_to_2digit(i, 10) + 0x0a3030;
 }
-/* { dg-final { scan-assembler-times "lea.\t\\(%\[0-9a-z\]+,%\[0-9a-z\]+,4" 3 } } */
+/* { dg-final { scan-assembler-times "lea.\t\\(%\[0-9a-z\]+,%\[0-9a-z\]+,4" 2 } } */
 /* { dg-final { scan-assembler-times "lea.\t\\(%\[0-9a-z\]+,%\[0-9a-z\]+,8" 1 } } */
[Bug target/82767] [8 regression] gcc.target/i386/pr71321.c scan-assembler-times fail
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82767 --- Comment #2 from Sebastian Peryt --- Candidate patch: https://gcc.gnu.org/ml/gcc-patches/2017-11/msg00308.html
[Bug target/82767] [8 regression] gcc.target/i386/pr71321.c scan-assembler-times fail
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82767 --- Comment #3 from Sebastian Peryt ---
As per Uros's suggestion (https://gcc.gnu.org/ml/gcc-patches/2017-11/msg00329.html ) I checked the -mtune=generic idea, and it works without additional changes to either the testcase or the cost model.

From what I gathered, the cause of this test failing is the change below in the core cost model:

/gcc/config/i386/x86-tune-costs.h
@@ -2253,7 +2253,7 @@ struct processor_costs core_cost = {
    COSTS_N_INSNS (4),    /* DI */
    COSTS_N_INSNS (4)},   /*other */
   0,                     /* cost of multiply per each bit set */
+  {COSTS_N_INSNS (8),    /* cost of a divide/mod for QI */
-  {COSTS_N_INSNS (18),   /* cost of a divide/mod for QI */
    COSTS_N_INSNS (8),    /* HI */
   /* 8-11 */
    COSTS_N_INSNS (11),   /* SI */

Because most of Intel's CPUs use the core_cost model (including haswell), this testcase fails unless it is tuned to the generic cost model, which still uses the old cost values.
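For readers without the testsuite at hand, the following sketch shows the kind of code the QI divide/mod cost entry applies to. Only the body of cvt_to_2digit_ascii is taken from the diff context quoted in comment #1; the definition of cvt_to_2digit is an assumption reconstructed for illustration and may differ from the real pr71321.c.

```c
#include <stdint.h>

/* Assumed helper: packs the quotient into the low byte and the
   remainder into the next byte. Both i / base and i % base work on
   8-bit operands, which is what the QI divide/mod cost describes. */
unsigned cvt_to_2digit(uint8_t i, uint8_t base)
{
    return (unsigned)(i / base) | ((unsigned)(i % base) << 8);
}

/* Body as quoted in the diff context: adds ASCII '0' to both digits
   and places a newline (0x0a) in the third byte. */
unsigned cvt_to_2digit_ascii(uint8_t i)
{
    return cvt_to_2digit(i, 10) + 0x0a3030;
}
```

Under this assumption, cvt_to_2digit_ascii(42) evaluates to 0x0a3234. How the division by the constant 10 is expanded, and therefore how many lea instructions the scan-assembler-times directives see, can plausibly depend on the divide cost in the selected tuning model, which fits the explanation above.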
[Bug target/82942] Generate vzeroupper with -mavx512f -mno-avx512er -O2
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82942
Sebastian Peryt changed:
           What    |Removed                     |Added
                 CC|                            |sebastian.peryt at intel dot com
--- Comment #6 from Sebastian Peryt ---
Patch has been sent: https://gcc.gnu.org/ml/gcc-patches/2017-11/msg01052.html
[Bug target/82941] Missing vzeroupper with -march=skylake-avx512 -O2
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82941
Sebastian Peryt changed:
           What    |Removed                     |Added
                 CC|                            |sebastian.peryt at intel dot com
--- Comment #1 from Sebastian Peryt ---
Patch has been sent: https://gcc.gnu.org/ml/gcc-patches/2017-11/msg01052.html
[Bug target/82990] Update the default -mzeroupper setting
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82990 --- Comment #5 from Sebastian Peryt ---
(In reply to H.J. Lu from comment #3)
> Created attachment 42611 [details]
> A better patch
>
> Sebastian, please take a look.
LGTM
[Bug target/82767] [8 regression] gcc.target/i386/pr71321.c scan-assembler-times fail
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82767 --- Comment #4 from Sebastian Peryt --- Created attachment 42632 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=42632&action=edit Proposed patch to fix PR. Better patch.