Re: [PATCH] Fix SCEV ICE during cunroll (PR tree-optimization/84111)
On Mon, 29 Jan 2018, Jeff Law wrote: > On 01/29/2018 04:37 PM, Jakub Jelinek wrote: > > Hi! > > > > As mentioned in the PR, cunroll sometimes registers some SSA_NAMEs for > > renaming and then invokes some SCEV using functions before finally updating > > the SSA form. On the testcase we end up with 3 degenerate PHIs pointing at > > each other, so follow_copies_to_constant loops forever. > I'm at a loss to understand how we could get in that situation. Was it > edge removal (if so for which PHI) or the duplicate process that created > the loop? The issue is that static bool tree_unroll_loops_completely_1 (bool may_increase_size, bool unroll_outer, bitmap father_bbs, struct loop *loop) { struct loop *loop_father; bool changed = false; struct loop *inner; enum unroll_level ul; /* Process inner loops first. */ for (inner = loop->inner; inner != NULL; inner = inner->next) changed |= tree_unroll_loops_completely_1 (may_increase_size, unroll_outer, father_bbs, inner); when recursing tree_unroll_loops_completely_1 appends unrolled bodies and loops contained therein in the loop->inner chain. We specifically do _not_ allow processing of outer loops here due to not up-to-date SSA form and then iterate the unrolling process instead: /* If we changed an inner loop we cannot process outer loops in this iteration because SSA form is not up-to-date. Continue with siblings of outer loops instead. */ if (changed) return true; so I think the proper fix is to avoid visiting loops that were added in the above iteration over loop->inner. For example by collecting the loop->inner chain in a vec and then iterating over that (much like FOR_EACH_LOOP does). Or rely on the fact that loop->num of new loops will be >= number_of_loops () [number_of_loops () is really mis-named, would be a good time to add max_loop_num () or so] Richard. > We've seen one case where this happened inside DOM, but it was only > because we ignored an unexecutable edge which then caused a PHI to > become a degenerate.One or more of the PHI args in the degenerate > was marked as EDGE_DFS_BACK on its associated edge. > > Are any of the PHI args in pr84111 marked in a similar manner? > > > > > The following patch has 2 different fixes for this, one is to avoid looking > > at SSA_NAMEs registered for update (in theory all we care are the old ssa > > names, but we don't have an API to query just those), the other is to put > > some upper bound on how many times we follow SSA_NAMEs (either single defs > > or degenerate phis). Either of those changes is sufficient to fix this. > Interestingly enough I was looking at a bug several weeks ago where the > ability to query one of the sets (I can't remember if it was old or new) > of names for rewriting would have been useful. > > > > > > During x86_64-linux and i686-linux bootstraps/regtests > > follow_copies_to_constant > > actually never returned anything useful, so the patch doesn't regress at > > least on these targets anything, there are quite a few cases where SSA_NAME > > is name_registered_for_update_p but the function would never return anything > > but var in those cases. And the highest depth ever seen was 3, except for > > the newly added testcase if the && !name_registered_for_update_p (res) part > > is commented out. > > > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? > > > > 2018-01-29 Jakub Jelinek > > > > PR tree-optimization/84111 > > * tree-scalar-evolution.c: Include tree-into-ssa.h. > > (follow_copies_to_constant): Stop on SSA_NAMEs registered for update, > > loop at most 128 times. > > > > * gcc.c-torture/compile/pr84111.c: New test. > Probably reasonable. Though I would like to better understand how we > got the degenerates forming a loop before final ack. > > jeff > > -- Richard Biener SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
Re: [PATCH] Fix SCEV ICE during cunroll (PR tree-optimization/84111)
On Mon, Jan 29, 2018 at 11:58:45PM -0700, Jeff Law wrote: > On 01/29/2018 04:37 PM, Jakub Jelinek wrote: > > Hi! > > > > As mentioned in the PR, cunroll sometimes registers some SSA_NAMEs for > > renaming and then invokes some SCEV using functions before finally updating > > the SSA form. On the testcase we end up with 3 degenerate PHIs pointing at > > each other, so follow_copies_to_constant loops forever. > I'm at a loss to understand how we could get in that situation. Was it > edge removal (if so for which PHI) or the duplicate process that created > the loop? The original loop before unrolling had following PHIs for the y parameter: [local count: 118111601]: # y_37 = PHI [local count: 3277506]: # y_29 = PHI [local count: 3498303]: # y_8 = PHI and actually no other PHIs, so effectively all uses of y in the function could be y_20(D), but we haven't cleaned it up yet; before pre we had: [local count: 118111601]: # y_29 = PHI <0(4), y_37(3)> which prevented optimizing everything into y_20(D). In the cunroll emergency dump I've done when it was looping I got: ;; basic block 15, loop depth 0 ;;pred: 2 # y_25 = PHI ;; basic block 17, loop depth 1 ;;pred: 16 ;;19 # y_10 = PHI ;; basic block 20, loop depth 0 ;;pred: 18 # y_5 = PHI ;; basic block 3, loop depth 2 ;;pred: 11 ;;14 # y_37 = PHI ;; basic block 13, loop depth 1 ;;pred: 5 # y_29 = PHI ;; basic block 6, loop depth 1 ;;pred: 20 ;;13 # y_8 = PHI where bb 3, 13 and 6 are the same PHIs as before which aren't going to be renamed, and bb20 is the new cunroll copy of bb13, bb17 copy of bb3. The PHI results of the PHIs in the new bbs already have a new SSA_NAME, but the PHI arguments are waiting to be renamed. Already before the cunroll, 2 of the PHIs are degenerate and only one is not. When SCEV is called, the only non-degenerate among those 3 previously looks like degenerate too, but we'll be replacing y_29(20) in there with y_5(20) during ssa update. Now, with the patch at the end of cunroll after ssa update and cfg cleanup we have: [local count: 18182121]: # y_10 = PHI [local count: 2063999]: # y_5 = PHI PHIs for y and nothing else and finally phicprop2 optimizes all y references to just y_20(D). > > The following patch has 2 different fixes for this, one is to avoid looking > > at SSA_NAMEs registered for update (in theory all we care are the old ssa > > names, but we don't have an API to query just those), the other is to put > > some upper bound on how many times we follow SSA_NAMEs (either single defs > > or degenerate phis). Either of those changes is sufficient to fix this. > Interestingly enough I was looking at a bug several weeks ago where the > ability to query one of the sets (I can't remember if it was old or new) > of names for rewriting would have been useful. tree-cfg.c has a comment like that: tree-cfg.c- /* Technically only new names matter. */ tree-cfg.c: if (name_registered_for_update_p (PHI_RESULT (phi))) tree-cfg.c- return false; Jakub
Contents of PO file 'cpplib-8.1-b20180128.vi.po'
cpplib-8.1-b20180128.vi.po.gz Description: Binary data The Translation Project robot, in the name of your translation coordinator.
New Vietnamese PO file for 'cpplib' (version 8.1-b20180128)
Hello, gentle maintainer. This is a message from the Translation Project robot. A revised PO file for textual domain 'cpplib' has been submitted by the Vietnamese team of translators. The file is available at: http://translationproject.org/latest/cpplib/vi.po (This file, 'cpplib-8.1-b20180128.vi.po', has just now been sent to you in a separate email.) All other PO files for your package are available in: http://translationproject.org/latest/cpplib/ Please consider including all of these in your next release, whether official or a pretest. Whenever you have a new distribution with a new version number ready, containing a newer POT file, please send the URL of that distribution tarball to the address below. The tarball may be just a pretest or a snapshot, it does not even have to compile. It is just used by the translators when they need some extra translation context. The following HTML page has been updated: http://translationproject.org/domain/cpplib.html If any question arises, please contact the translation coordinator. Thank you for all your work, The Translation Project robot, in the name of your translation coordinator.
[PATCH, i386, AVX] PR target/83828: fix AVX-512BITALG test failures
Hello, If masked variant of vpopcnt[w,q] is being issued: there's no way for reload to put 32/64 bit mask into mask register, since kmov[d,q] are only available under -mavx512bw switch. We can insist user to issue -mavx512bw along w/ -mavx512bitalg if she is going to use masked variants of corresponding intrinsics. gcc/testsuite PR target/83828 * gcc.target/i386/avx512bitalg-vpopcntb-1.c: Fix test. * gcc.target/i386/avx512bitalg-vpopcntw-1.c: Ditto. * gcc.target/i386/avx512bitalgvl-vpopcntb-1.c: Ditto. * gcc.target/i386/avx512bitalgvl-vpopcntw-1.c: Ditto. Bootstrapped & regtested. Checked into main trunk. -- Thanks, K Index: gcc/testsuite/gcc.target/i386/avx512bitalg-vpopcntb-1.c === --- gcc/testsuite/gcc.target/i386/avx512bitalg-vpopcntb-1.c (revision 257172) +++ gcc/testsuite/gcc.target/i386/avx512bitalg-vpopcntb-1.c (working copy) @@ -1,6 +1,7 @@ /* { dg-do run } */ -/* { dg-options "-O2 -mavx512bitalg" } */ +/* { dg-options "-O2 -mavx512bitalg -mavx512bw" } */ /* { dg-require-effective-target avx512bitalg } */ +/* { dg-require-effective-target avx512bw } */ #include "avx512f-helper.h" Index: gcc/testsuite/gcc.target/i386/avx512bitalg-vpopcntw-1.c === --- gcc/testsuite/gcc.target/i386/avx512bitalg-vpopcntw-1.c (revision 257172) +++ gcc/testsuite/gcc.target/i386/avx512bitalg-vpopcntw-1.c (working copy) @@ -1,6 +1,7 @@ /* { dg-do run } */ -/* { dg-options "-O2 -mavx512bitalg" } */ +/* { dg-options "-O2 -mavx512bitalg -mavx512bw" } */ /* { dg-require-effective-target avx512bitalg } */ +/* { dg-require-effective-target avx512bw } */ #include "avx512f-helper.h" Index: gcc/testsuite/gcc.target/i386/avx512bitalgvl-vpopcntb-1.c === --- gcc/testsuite/gcc.target/i386/avx512bitalgvl-vpopcntb-1.c (revision 257172) +++ gcc/testsuite/gcc.target/i386/avx512bitalgvl-vpopcntb-1.c (working copy) @@ -1,7 +1,8 @@ /* { dg-do run } */ -/* { dg-options "-O2 -mavx512vl -mavx512bitalg" } */ +/* { dg-options "-O2 -mavx512vl -mavx512bitalg -mavx512bw" } */ /* { dg-require-effective-target avx512vl } */ /* { dg-require-effective-target avx512bitalg } */ +/* { dg-require-effective-target avx512bw } */ #define AVX512VL #define AVX512F_LEN 256 Index: gcc/testsuite/gcc.target/i386/avx512bitalgvl-vpopcntw-1.c === --- gcc/testsuite/gcc.target/i386/avx512bitalgvl-vpopcntw-1.c (revision 257172) +++ gcc/testsuite/gcc.target/i386/avx512bitalgvl-vpopcntw-1.c (working copy) @@ -1,7 +1,8 @@ /* { dg-do run } */ -/* { dg-options "-O2 -mavx512vl -mavx512bitalg" } */ +/* { dg-options "-O2 -mavx512vl -mavx512bitalg -mavx512bw" } */ /* { dg-require-effective-target avx512vl } */ /* { dg-require-effective-target avx512bitalg } */ +/* { dg-require-effective-target avx512bw } */ #define AVX512VL #define AVX512F_LEN 256
New Russian PO file for 'gcc' (version 8.1-b20180128)
Hello, gentle maintainer. This is a message from the Translation Project robot. A revised PO file for textual domain 'gcc' has been submitted by the Russian team of translators. The file is available at: http://translationproject.org/latest/gcc/ru.po (This file, 'gcc-8.1-b20180128.ru.po', has just now been sent to you in a separate email.) All other PO files for your package are available in: http://translationproject.org/latest/gcc/ Please consider including all of these in your next release, whether official or a pretest. Whenever you have a new distribution with a new version number ready, containing a newer POT file, please send the URL of that distribution tarball to the address below. The tarball may be just a pretest or a snapshot, it does not even have to compile. It is just used by the translators when they need some extra translation context. The following HTML page has been updated: http://translationproject.org/domain/gcc.html If any question arises, please contact the translation coordinator. Thank you for all your work, The Translation Project robot, in the name of your translation coordinator.
Re: [PATCH] Fix SCEV ICE during cunroll (PR tree-optimization/84111)
On Tue, Jan 30, 2018 at 08:59:43AM +0100, Richard Biener wrote: > The issue is that > > static bool > tree_unroll_loops_completely_1 (bool may_increase_size, bool unroll_outer, > bitmap father_bbs, struct loop *loop) > { > struct loop *loop_father; > bool changed = false; > struct loop *inner; > enum unroll_level ul; > > /* Process inner loops first. */ > for (inner = loop->inner; inner != NULL; inner = inner->next) > changed |= tree_unroll_loops_completely_1 (may_increase_size, >unroll_outer, father_bbs, >inner); > > when recursing tree_unroll_loops_completely_1 appends unrolled > bodies and loops contained therein in the loop->inner chain. We > specifically do _not_ allow processing of outer loops here due > to not up-to-date SSA form and then iterate the unrolling process > instead: > > /* If we changed an inner loop we cannot process outer loops in this > iteration because SSA form is not up-to-date. Continue with > siblings of outer loops instead. */ > if (changed) > return true; > > so I think the proper fix is to avoid visiting loops that were added > in the above iteration over loop->inner. For example by collecting > the loop->inner chain in a vec and then iterating over that (much > like FOR_EACH_LOOP does). Or rely on the fact that loop->num of new > loops will be >= number_of_loops () [number_of_loops () is really > mis-named, would be a good time to add max_loop_num () or so] So like this if it passes bootstrap/regtest? OT, why is loop->num signed rather than unsigned? 2018-01-30 Richard Biener Jakub Jelinek PR tree-optimization/84111 * tree-ssa-loop-ivcanon.c (tree_unroll_loops_completely_1): Skip inner loops added during recursion, as they don't have up-to-date SSA form. * gcc.c-torture/compile/pr84111.c: New test. --- gcc/tree-ssa-loop-ivcanon.c.jj 2018-01-29 14:05:46.689153783 +0100 +++ gcc/tree-ssa-loop-ivcanon.c 2018-01-30 09:30:21.932069737 +0100 @@ -1373,13 +1373,17 @@ tree_unroll_loops_completely_1 (bool may bool changed = false; struct loop *inner; enum unroll_level ul; + unsigned num = number_of_loops (cfun); - /* Process inner loops first. */ + /* Process inner loops first. Don't walk loops added by the recursive + calls because SSA form is not up-to-date. They can be handled in the + next iteration. */ for (inner = loop->inner; inner != NULL; inner = inner->next) -changed |= tree_unroll_loops_completely_1 (may_increase_size, - unroll_outer, father_bbs, - inner); - +if ((unsigned) inner->num < num) + changed |= tree_unroll_loops_completely_1 (may_increase_size, +unroll_outer, father_bbs, +inner); + /* If we changed an inner loop we cannot process outer loops in this iteration because SSA form is not up-to-date. Continue with siblings of outer loops instead. */ --- gcc/testsuite/gcc.c-torture/compile/pr84111.c.jj2018-01-29 20:38:10.236528914 +0100 +++ gcc/testsuite/gcc.c-torture/compile/pr84111.c 2018-01-29 20:37:53.227522763 +0100 @@ -0,0 +1,31 @@ +/* PR tree-optimization/84111 */ + +void +foo (int x, int y, int z) +{ + int a = 0; + int *b = &x; + + while (a < 1) +{ + int c = y; + *b = x; + lab: + for (a = 0; a < 36; ++a) +{ + *b = 0; + if (x != 0) +y = 0; + while (c < 1) + ; +} +} + if (z < 33) +{ + b = (int *) 0; + ++y; + ++z; + if (x / *b != 0) +goto lab; +} +} Jakub
RE: [patch][x86] -march=icelake
Renamed it. Ok for trunk? gcc/c-family/ * c-common.h (omp_clause_mask): Move to wide_int_bitmask.h gcc/ * config/i386/i386.c (ix86_option_override_internal): Change flags type to wide_int_bitmask. * wide-int-bitmask.h: New. Thanks, Julia > -Original Message- > From: Richard Biener [mailto:rguent...@suse.de] > Sent: Wednesday, January 24, 2018 12:18 PM > To: Koval, Julia > Cc: Jakub Jelinek ; Uros Bizjak ; GCC > Patches ; Kirill Yukhin > Subject: RE: [patch][x86] -march=icelake > > On Wed, 24 Jan 2018, Koval, Julia wrote: > > > I think we may want to extend it to more than 2 ints someday, when we run > out of bits again. It won't break the existing functionality if 3rd int will > be zero by > default. That's why I tried to avoid "two" in the name. > > > > Julia > > > > > -Original Message- > > > From: Jakub Jelinek [mailto:ja...@redhat.com] > > > Sent: Wednesday, January 24, 2018 12:06 PM > > > To: Uros Bizjak ; Richard Biener > > > Cc: Koval, Julia ; GCC Patches > > patc...@gcc.gnu.org>; Kirill Yukhin > > > Subject: Re: [patch][x86] -march=icelake > > > > > > On Wed, Jan 24, 2018 at 12:00:26PM +0100, Uros Bizjak wrote: > > > > On Mon, Jan 22, 2018 at 3:44 PM, Koval, Julia > wrote: > > > > > Yes, you are right, any() is not required. Here is the patch. > > > > > > > > Please also attach ChangeLog. > > > > > > > > The patch is OK for x86 target, it needs global reviewer approval > > > > (Maybe Jakub, as the patch touches OMP part). > > > > > > I don't like the new class name nor header name, bit_mask is way too > generic > > > name for something very specialized (double hwi bitmask). > > > > > > Richard, any suggestions for this? > > Maybe wide_int_bitmask? You could then even use fixed_wide_int <> as > "implementation". > > Richard. 0001-bitmask.patch Description: 0001-bitmask.patch
New Swedish PO file for 'cpplib' (version 8.1-b20180128)
Hello, gentle maintainer. This is a message from the Translation Project robot. A revised PO file for textual domain 'cpplib' has been submitted by the Swedish team of translators. The file is available at: http://translationproject.org/latest/cpplib/sv.po (This file, 'cpplib-8.1-b20180128.sv.po', has just now been sent to you in a separate email.) All other PO files for your package are available in: http://translationproject.org/latest/cpplib/ Please consider including all of these in your next release, whether official or a pretest. Whenever you have a new distribution with a new version number ready, containing a newer POT file, please send the URL of that distribution tarball to the address below. The tarball may be just a pretest or a snapshot, it does not even have to compile. It is just used by the translators when they need some extra translation context. The following HTML page has been updated: http://translationproject.org/domain/cpplib.html If any question arises, please contact the translation coordinator. Thank you for all your work, The Translation Project robot, in the name of your translation coordinator.
Contents of PO file 'cpplib-8.1-b20180128.sv.po'
cpplib-8.1-b20180128.sv.po.gz Description: Binary data The Translation Project robot, in the name of your translation coordinator.
Re: [patch][x86] -march=icelake
On Tue, Jan 30, 2018 at 08:35:38AM +, Koval, Julia wrote: > * c-common.h (omp_clause_mask): Move to wide_int_bitmask.h Missing dot ad the end. + wide_int_bitmask PTA_3DNOW (HOST_WIDE_INT_1U << 0); Can't all these be const wide_int_bitmask instead of just wide_int_bitmask? ... + + wide_int_bitmask PTA_CORE2 = PTA_64BIT | PTA_MMX | PTA_SSE | PTA_SSE2 +| PTA_SSE3 | PTA_SSSE3 | PTA_CX16 | PTA_FXSR; + wide_int_bitmask PTA_NEHALEM = PTA_CORE2 | PTA_SSE4_1 | PTA_SSE4_2 +| PTA_POPCNT; + wide_int_bitmask PTA_WESTMERE = PTA_NEHALEM | PTA_AES | PTA_PCLMUL; + wide_int_bitmask PTA_SANDYBRIDGE = PTA_WESTMERE | PTA_AVX | PTA_XSAVE +| PTA_XSAVEOPT; + wide_int_bitmask PTA_IVYBRIDGE = PTA_SANDYBRIDGE | PTA_FSGSBASE | PTA_RDRND +| PTA_F16C; + wide_int_bitmask PTA_HASWELL = PTA_IVYBRIDGE | PTA_AVX2 | PTA_BMI | PTA_BMI2 +| PTA_LZCNT | PTA_FMA | PTA_MOVBE | PTA_HLE; + wide_int_bitmask PTA_BROADWELL = PTA_HASWELL | PTA_ADX | PTA_PRFCHW +| PTA_RDSEED; + wide_int_bitmask PTA_SKYLAKE = PTA_BROADWELL | PTA_CLFLUSHOPT | PTA_XSAVEC +| PTA_XSAVES; + wide_int_bitmask PTA_SKYLAKE_AVX512 = PTA_SKYLAKE | PTA_AVX512F | PTA_AVX512CD +| PTA_AVX512VL | PTA_AVX512BW | PTA_AVX512DQ | PTA_PKU | PTA_CLWB; + wide_int_bitmask PTA_CANNONLAKE = PTA_SKYLAKE_AVX512 | PTA_AVX512VBMI +| PTA_AVX512IFMA | PTA_SHA; + wide_int_bitmask PTA_KNL = PTA_BROADWELL | PTA_AVX512PF | PTA_AVX512ER +| PTA_AVX512F | PTA_AVX512CD; + wide_int_bitmask PTA_BONNELL = PTA_CORE2 | PTA_MOVBE; + wide_int_bitmask PTA_SILVERMONT = PTA_WESTMERE | PTA_MOVBE | PTA_RDRND; + wide_int_bitmask PTA_KNM = PTA_KNL | PTA_AVX5124VNNIW | PTA_AVX5124FMAPS +| PTA_AVX512VPOPCNTDQ; Likewise for these. --- /dev/null +++ b/gcc/wide-int-bitmask.h @@ -0,0 +1,145 @@ +/* Operation with 128 bit bitmask. + Copyright (C) 1987-2018 Free Software Foundation, Inc. Please use 2013-2018 instead, all the omp_clause_mask stuff was introduced in 2013. + +#ifndef GCC_BIT_MASK_H +#define GCC_BIT_MASK_H The macro hasn't been renamed for the header file rename. + +#endif /* ! GCC_BIT_MASK_H */ Here as well. Otherwise LGTM. Jakub
Re: [PATCH] Fix SCEV ICE during cunroll (PR tree-optimization/84111)
On Tue, 30 Jan 2018, Jakub Jelinek wrote: > On Tue, Jan 30, 2018 at 08:59:43AM +0100, Richard Biener wrote: > > The issue is that > > > > static bool > > tree_unroll_loops_completely_1 (bool may_increase_size, bool unroll_outer, > > bitmap father_bbs, struct loop *loop) > > { > > struct loop *loop_father; > > bool changed = false; > > struct loop *inner; > > enum unroll_level ul; > > > > /* Process inner loops first. */ > > for (inner = loop->inner; inner != NULL; inner = inner->next) > > changed |= tree_unroll_loops_completely_1 (may_increase_size, > >unroll_outer, father_bbs, > >inner); > > > > when recursing tree_unroll_loops_completely_1 appends unrolled > > bodies and loops contained therein in the loop->inner chain. We > > specifically do _not_ allow processing of outer loops here due > > to not up-to-date SSA form and then iterate the unrolling process > > instead: > > > > /* If we changed an inner loop we cannot process outer loops in this > > iteration because SSA form is not up-to-date. Continue with > > siblings of outer loops instead. */ > > if (changed) > > return true; > > > > so I think the proper fix is to avoid visiting loops that were added > > in the above iteration over loop->inner. For example by collecting > > the loop->inner chain in a vec and then iterating over that (much > > like FOR_EACH_LOOP does). Or rely on the fact that loop->num of new > > loops will be >= number_of_loops () [number_of_loops () is really > > mis-named, would be a good time to add max_loop_num () or so] > > So like this if it passes bootstrap/regtest? Yes. > OT, why is loop->num signed rather than unsigned? History. Like so many others (block->index, etc.). Possibly to allow better optimization of loops over those indexes (relying on undefined overflow). Richard. > 2018-01-30 Richard Biener > Jakub Jelinek > > PR tree-optimization/84111 > * tree-ssa-loop-ivcanon.c (tree_unroll_loops_completely_1): Skip > inner loops added during recursion, as they don't have up-to-date > SSA form. > > * gcc.c-torture/compile/pr84111.c: New test. > > --- gcc/tree-ssa-loop-ivcanon.c.jj2018-01-29 14:05:46.689153783 +0100 > +++ gcc/tree-ssa-loop-ivcanon.c 2018-01-30 09:30:21.932069737 +0100 > @@ -1373,13 +1373,17 @@ tree_unroll_loops_completely_1 (bool may >bool changed = false; >struct loop *inner; >enum unroll_level ul; > + unsigned num = number_of_loops (cfun); > > - /* Process inner loops first. */ > + /* Process inner loops first. Don't walk loops added by the recursive > + calls because SSA form is not up-to-date. They can be handled in the > + next iteration. */ >for (inner = loop->inner; inner != NULL; inner = inner->next) > -changed |= tree_unroll_loops_completely_1 (may_increase_size, > -unroll_outer, father_bbs, > -inner); > - > +if ((unsigned) inner->num < num) > + changed |= tree_unroll_loops_completely_1 (may_increase_size, > + unroll_outer, father_bbs, > + inner); > + >/* If we changed an inner loop we cannot process outer loops in this > iteration because SSA form is not up-to-date. Continue with > siblings of outer loops instead. */ > --- gcc/testsuite/gcc.c-torture/compile/pr84111.c.jj 2018-01-29 > 20:38:10.236528914 +0100 > +++ gcc/testsuite/gcc.c-torture/compile/pr84111.c 2018-01-29 > 20:37:53.227522763 +0100 > @@ -0,0 +1,31 @@ > +/* PR tree-optimization/84111 */ > + > +void > +foo (int x, int y, int z) > +{ > + int a = 0; > + int *b = &x; > + > + while (a < 1) > +{ > + int c = y; > + *b = x; > + lab: > + for (a = 0; a < 36; ++a) > +{ > + *b = 0; > + if (x != 0) > +y = 0; > + while (c < 1) > + ; > +} > +} > + if (z < 33) > +{ > + b = (int *) 0; > + ++y; > + ++z; > + if (x / *b != 0) > +goto lab; > +} > +} > > > Jakub > > -- Richard Biener SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
Re: [PATCH][PR debug/83758] look more carefully for internal_arg_pointer in vt_add_function_parameter()
Segher Boessenkool writes: > On Mon, Jan 29, 2018 at 11:41:32PM +0100, Jakub Jelinek wrote: >> On Mon, Jan 29, 2018 at 04:26:50PM -0600, Segher Boessenkool wrote: >> > > The code actually meant pointer comparison, the question is what is >> > > different on powerpc* that you end up with a different REG. >> > > >From what I can see, function.c uses crtl->args.internal_arg_pointer >> > > directly rather than a REG with the same REGNO. >> > > Where does it become something different and why? >> > >> > There is a lot of code that copies any RTX that isn't obviously unique. >> > Here we have a PLUS of some things, which always needs copying, can >> > never be shared. >> >> Yes, PLUS needs to be unshared. >> So it ought to be handled by the PLUS handling code. >> || (GET_CODE (XEXP (incoming, 0)) == PLUS >> && XEXP (XEXP (incoming, 0), 0) >> == crtl->args.internal_arg_pointer >> && CONST_INT_P (XEXP (XEXP (incoming, 0), 1) >> Here we are talking about MEMs on the PARM_DECL DECL_RTLs, those are not >> instantiated as normal IL in the RTL stream is. > > This is a PLUS of something with the internal_arg_pointer. Our > internal_arg_pointer _itself_ is a PLUS, so it should be copy_rtx'd > everywhere it is used (or risk RTL sharing). Is that needed even in DECL_RTL? >> I'm not saying the var-tracking.c change is wrong, but I'd like to see >> analysis on what is going on. > > The rtx_equal_p is equal to the == for anything that uses just a single > register. For us it makes the compiler bootstrap again (with Go enabled), > not unnice to have in stage4 ;-) > > I agree the code does not seem to be set up for PLUS to work here. > >> Is the patch for -fsplit-stack where rs6000_internal_arg_pointer >> returns a PLUS of some pseudo and some offset? > > Yeah. > >> In that case I wonder how does the patch with rtx_equal_p actually work, >> because function.c then uses plus_constant on this result, and if there are > > Not everywhere, in places it does > > gen_rtx_PLUS (Pmode, stack_parm, offset_rtx); > > (so we end up with non-canonical RTL). But in that case, what does the copying? That's what seems strange. I can see why we'd have two nested pluses with the inner plus being pointer-equal to internal_arg_ptr. And I can see why we'd have a single canonical plus (which IMO would be better, but I agree it's not stage 4 material). It's having the two nested pluses without maintaining pointer equality that seems strange. Thanks, Richard
Re: [PATCH] [PR testsuite/81010] Fix PPC test
On Jan 29 2018, Jeff Law wrote: > On 01/07/2018 10:09 AM, Segher Boessenkool wrote: >> Hi! >> >> On Sun, Jan 07, 2018 at 12:58:25AM -0700, Jeff Law wrote: >>> diff --git a/gcc/testsuite/gcc.target/powerpc/pr56605.c >>> b/gcc/testsuite/gcc.target/powerpc/pr56605.c >>> index 3bc335f..be6456c 100644 >>> --- a/gcc/testsuite/gcc.target/powerpc/pr56605.c >>> +++ b/gcc/testsuite/gcc.target/powerpc/pr56605.c >>> @@ -1,7 +1,7 @@ >>> /* PR rtl-optimization/56605 */ >>> -/* { dg-do compile { target { powerpc64-*-* && lp64 } } } */ >>> +/* { dg-do compile { target { powerpc64*-*-* && lp64 } } } */ >> >> You could as well do powerpc*-*-* afaics? > I guess, but I suspect the lp64 test would prevent it from running on > the 32bit configurations anyway. -m64 will pass the lp64 test. Andreas. -- Andreas Schwab, SUSE Labs, sch...@suse.de GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE 1748 E4D4 88E3 0EEA B9D7 "And now for something completely different."
Re: [PATCH][PR debug/83758] look more carefully for internal_arg_pointer in vt_add_function_parameter()
On Tue, Jan 30, 2018 at 09:41:47AM +, Richard Sandiford wrote: > Segher Boessenkool writes: > > On Mon, Jan 29, 2018 at 11:41:32PM +0100, Jakub Jelinek wrote: > >> On Mon, Jan 29, 2018 at 04:26:50PM -0600, Segher Boessenkool wrote: > >> > > The code actually meant pointer comparison, the question is what is > >> > > different on powerpc* that you end up with a different REG. > >> > > >From what I can see, function.c uses crtl->args.internal_arg_pointer > >> > > directly rather than a REG with the same REGNO. > >> > > Where does it become something different and why? > >> > > >> > There is a lot of code that copies any RTX that isn't obviously unique. > >> > Here we have a PLUS of some things, which always needs copying, can > >> > never be shared. > >> > >> Yes, PLUS needs to be unshared. > >> So it ought to be handled by the PLUS handling code. > >> || (GET_CODE (XEXP (incoming, 0)) == PLUS > >> && XEXP (XEXP (incoming, 0), 0) > >> == crtl->args.internal_arg_pointer > >> && CONST_INT_P (XEXP (XEXP (incoming, 0), 1) > >> Here we are talking about MEMs on the PARM_DECL DECL_RTLs, those are not > >> instantiated as normal IL in the RTL stream is. > > > > This is a PLUS of something with the internal_arg_pointer. Our > > internal_arg_pointer _itself_ is a PLUS, so it should be copy_rtx'd > > everywhere it is used (or risk RTL sharing). > > Is that needed even in DECL_RTL? When it is copied to the RTL stream, it has to yes. No sharing allowed, even if nothing inside this is ever modified. > >> I'm not saying the var-tracking.c change is wrong, but I'd like to see > >> analysis on what is going on. > > > > The rtx_equal_p is equal to the == for anything that uses just a single > > register. For us it makes the compiler bootstrap again (with Go enabled), > > not unnice to have in stage4 ;-) > > > > I agree the code does not seem to be set up for PLUS to work here. > > > >> Is the patch for -fsplit-stack where rs6000_internal_arg_pointer > >> returns a PLUS of some pseudo and some offset? > > > > Yeah. > > > >> In that case I wonder how does the patch with rtx_equal_p actually work, > >> because function.c then uses plus_constant on this result, and if there are > > > > Not everywhere, in places it does > > > > gen_rtx_PLUS (Pmode, stack_parm, offset_rtx); > > > > (so we end up with non-canonical RTL). > > But in that case, what does the copying? I don't know. Aaron will look at it, but timezones etc. :-) > That's what seems strange. I can see why we'd have two nested > pluses with the inner plus being pointer-equal to internal_arg_ptr. > And I can see why we'd have a single canonical plus (which IMO would > be better, but I agree it's not stage 4 material). It's having the two > nested pluses without maintaining pointer equality that seems strange. The inner plus is *not* pointer-equal, that is the problem. Something did copy_rtx (or such) on it, many things do. We can tell you what exactly later today. Segher
[PATCH] Fix PR84101, account for function ABI details in vectorization costs
This patch tries to deal with the "easy" part of a function ABI, the return value location, in vectorization costing. The testcase shows that if we vectorize the returned value but the function doesn't return in memory or in a vector register but as in this case in an integer register pair (reg:TI ax) (bah, ABI details exposed late? why's this not a parallel?) we end up spilling badly. The idea is to account for such spilling so if vectorization benefits outweight the spilling we'll vectorize anyway. I think the particular testcase could be fixed in the subreg pass basically undoing the vectorization but I realize that generally this is a too hard problem and avoiding vectorization is better. Still this patch is somewhat fragile in that it depends on us "seeing" that the stored to decl is returned (see cfun_returns). Bootstrap & regtest running on x86_64-unknown-linux-gnu. I'd like to hear opinions on my use of hard_function_value and also from other target maintainers. I'm not sure we have sufficient testsuite coverage of _profitable_ vectorization of a return value. Feel free to add to this for your target. Ok for trunk? Thanks, Richard. 2018-01-30 Richard Biener PR tree-optimization/84101 * tree-vect-stmts.c: Include explow.h for hard_function_value. (cfun_returns): New helper. (vect_model_store_cost): When vectorizing a store to a decl we return and the function ABI returns via a non-vector register account for the possible spilling that will happen. * gcc.target/i386/pr84101.c: New testcase. Index: gcc/tree-vect-stmts.c === --- gcc/tree-vect-stmts.c (revision 257139) +++ gcc/tree-vect-stmts.c (working copy) @@ -43,6 +43,7 @@ along with GCC; see the file COPYING3. #include "tree-cfg.h" #include "tree-ssa-loop-manip.h" #include "cfgloop.h" +#include "explow.h" #include "tree-ssa-loop.h" #include "tree-scalar-evolution.h" #include "tree-vectorizer.h" @@ -893,6 +894,22 @@ vect_model_promotion_demotion_cost (stmt "prologue_cost = %d .\n", inside_cost, prologue_cost); } +/* Returns true if the current function returns DECL. */ + +static bool +cfun_returns (tree decl) +{ + edge_iterator ei; + edge e; + FOR_EACH_EDGE (e, ei, EXIT_BLOCK_PTR_FOR_FN (cfun)->preds) +{ + greturn *ret = safe_dyn_cast (last_stmt (e->src)); + if (ret && gimple_return_retval (ret) == decl) + return true; +} + return false; +} + /* Function vect_model_store_cost Models cost for stores. In the case of grouped accesses, one access @@ -971,6 +988,36 @@ vect_model_store_cost (stmt_vec_info stm vec_to_scalar, stmt_info, 0, vect_body); } + /* When vectorizing an SLP store into the function result assign + a penalty if the function returns in a non-vector register. + In this case we assume we'll end up with having to spill the + vector result and do element loads. */ + if (slp_node) +{ + tree base = get_base_address (dr->ref); + if (base + && (TREE_CODE (base) == RESULT_DECL + || (DECL_P (base) && cfun_returns (base + { + rtx reg = hard_function_value (TREE_TYPE (base), cfun->decl, 0, 1); + if ((REG_P (reg) && ! VECTOR_MODE_P (GET_MODE (reg))) + || GET_CODE (reg) == PARALLEL + || GET_CODE (reg) == CONCAT) + { + /* Spill. */ + inside_cost += record_stmt_cost (body_cost_vec, + ncopies, vector_store, + stmt_info, 0, vect_body); + /* Element loads. */ + unsigned int assumed_nunits = vect_nunits_for_cost (vectype); + inside_cost += record_stmt_cost (body_cost_vec, + ncopies * assumed_nunits, + scalar_load, + stmt_info, 0, vect_body); + } + } +} + if (dump_enabled_p ()) dump_printf_loc (MSG_NOTE, vect_location, "vect_model_store_cost: inside_cost = %d, " Index: gcc/testsuite/gcc.target/i386/pr84101.c === --- gcc/testsuite/gcc.target/i386/pr84101.c (nonexistent) +++ gcc/testsuite/gcc.target/i386/pr84101.c (working copy) @@ -0,0 +1,21 @@ +/* { dg-do compile } */ +/* { dg-options "-O3 -fdump-tree-slp2-details" } */ + +typedef struct uint64_pair uint64_pair_t ; +struct uint64_pair +{ + unsigned long w0 ; + unsigned long w1 ; +} ; + +uint64_pair_t pair(int num) +{ + uint64_pair_t p ; + + p.w0 = num << 1 ; + p.w1 = num >> 1 ; + + return p ; +} + +/* { dg-final { scan-tree-dump-not "basic block vectorized" "slp2" } } */
Re: [PATCH][PR debug/83758] look more carefully for internal_arg_pointer in vt_add_function_parameter()
Segher Boessenkool writes: > On Tue, Jan 30, 2018 at 09:41:47AM +, Richard Sandiford wrote: >> Segher Boessenkool writes: >> > On Mon, Jan 29, 2018 at 11:41:32PM +0100, Jakub Jelinek wrote: >> >> On Mon, Jan 29, 2018 at 04:26:50PM -0600, Segher Boessenkool wrote: >> >> > > The code actually meant pointer comparison, the question is what is >> >> > > different on powerpc* that you end up with a different REG. >> >> > > >From what I can see, function.c uses crtl->args.internal_arg_pointer >> >> > > directly rather than a REG with the same REGNO. >> >> > > Where does it become something different and why? >> >> > >> >> > There is a lot of code that copies any RTX that isn't obviously unique. >> >> > Here we have a PLUS of some things, which always needs copying, can >> >> > never be shared. >> >> >> >> Yes, PLUS needs to be unshared. >> >> So it ought to be handled by the PLUS handling code. >> >> || (GET_CODE (XEXP (incoming, 0)) == PLUS >> >> && XEXP (XEXP (incoming, 0), 0) >> >> == crtl->args.internal_arg_pointer >> >> && CONST_INT_P (XEXP (XEXP (incoming, 0), 1) >> >> Here we are talking about MEMs on the PARM_DECL DECL_RTLs, those are not >> >> instantiated as normal IL in the RTL stream is. >> > >> > This is a PLUS of something with the internal_arg_pointer. Our >> > internal_arg_pointer _itself_ is a PLUS, so it should be copy_rtx'd >> > everywhere it is used (or risk RTL sharing). >> >> Is that needed even in DECL_RTL? > > When it is copied to the RTL stream, it has to yes. No sharing allowed, > even if nothing inside this is ever modified. Right. But DECL_RTL isn't in the RTL stream. It needs to be copied before being used there. And this code is looking at DECL_RTL, not at rtl stream (insn patterns). >> >> I'm not saying the var-tracking.c change is wrong, but I'd like to see >> >> analysis on what is going on. >> > >> > The rtx_equal_p is equal to the == for anything that uses just a single >> > register. For us it makes the compiler bootstrap again (with Go enabled), >> > not unnice to have in stage4 ;-) >> > >> > I agree the code does not seem to be set up for PLUS to work here. >> > >> >> Is the patch for -fsplit-stack where rs6000_internal_arg_pointer >> >> returns a PLUS of some pseudo and some offset? >> > >> > Yeah. >> > >> >> In that case I wonder how does the patch with rtx_equal_p actually work, >> >> because function.c then uses plus_constant on this result, and if there >> >> are >> > >> > Not everywhere, in places it does >> > >> > gen_rtx_PLUS (Pmode, stack_parm, offset_rtx); >> > >> > (so we end up with non-canonical RTL). >> >> But in that case, what does the copying? > > I don't know. Aaron will look at it, but timezones etc. :-) > >> That's what seems strange. I can see why we'd have two nested >> pluses with the inner plus being pointer-equal to internal_arg_ptr. >> And I can see why we'd have a single canonical plus (which IMO would >> be better, but I agree it's not stage 4 material). It's having the two >> nested pluses without maintaining pointer equality that seems strange. > > The inner plus is *not* pointer-equal, that is the problem. Something > did copy_rtx (or such) on it, many things do. We can tell you what > exactly later today. Yeah, I realise that. And I'm saying that's what seems strange :-) If something is deliberately avoiding folding the plus so that we can still see internal_arg_ptr, why does it end up copying the internal_arg_ptr regardless? Thanks, Richard
[PATCH] Fix PR83008
I have been asked to push this change, fixing (somewhat) the impreciseness of costing constant/invariant vector uses in SLP stmts. The previous code always just considered a single constant to be generated in the prologue irrespective of how many we'd need. With this patch we properly handle this count and optimize for the case when we can use a vector splat. It doesn't yet handle CSE (or CSE among stmts) which means it could in theory regress cases it overall costed correctly before "optimistically" (aka by accident). But at least the costing now matches code generation. Bootstrapped and tested on x86_64-unknown-linux-gnu. On x86_64 Haswell with AVX2 SPEC 2k6 shows no off-noise changes. The patch is said to help the case in the PR when additional backend costing changes are done (for AVX512). Ok for trunk at this stage? Thanks, Richard. 2018-01-30 Richard Biener PR tree-optimization/83008 * tree-vect-slp.c (vect_analyze_slp_cost_1): Properly cost invariant and constant vector uses in stmts when they need more than one stmt. Index: gcc/tree-vect-slp.c === --- gcc/tree-vect-slp.c (revision 257047) +++ gcc/tree-vect-slp.c (working copy) @@ -1911,18 +1911,56 @@ vect_analyze_slp_cost_1 (slp_instance in enum vect_def_type dt; if (!op || op == lhs) continue; - if (vect_is_simple_use (op, stmt_info->vinfo, &def_stmt, &dt)) + if (vect_is_simple_use (op, stmt_info->vinfo, &def_stmt, &dt) + && (dt == vect_constant_def || dt == vect_external_def)) { /* Without looking at the actual initializer a vector of constants can be implemented as load from the constant pool. -??? We need to pass down stmt_info for a vector type -even if it points to the wrong stmt. */ - if (dt == vect_constant_def) - record_stmt_cost (prologue_cost_vec, 1, vector_load, - stmt_info, 0, vect_prologue); - else if (dt == vect_external_def) - record_stmt_cost (prologue_cost_vec, 1, vec_construct, - stmt_info, 0, vect_prologue); +When all elements are the same we can use a splat. */ + tree vectype = get_vectype_for_scalar_type (TREE_TYPE (op)); + unsigned group_size = SLP_TREE_SCALAR_STMTS (node).length (); + unsigned num_vects_to_check; + unsigned HOST_WIDE_INT const_nunits; + unsigned nelt_limit; + if (TYPE_VECTOR_SUBPARTS (vectype).is_constant (&const_nunits) + && ! multiple_p (const_nunits, group_size)) + { + num_vects_to_check = SLP_TREE_NUMBER_OF_VEC_STMTS (node); + nelt_limit = const_nunits; + } + else + { + /* If either the vector has variable length or the vectors +are composed of repeated whole groups we only need to +cost construction once. All vectors will be the same. */ + num_vects_to_check = 1; + nelt_limit = group_size; + } + tree elt = NULL_TREE; + unsigned nelt = 0; + for (unsigned j = 0; j < num_vects_to_check * nelt_limit; ++j) + { + unsigned si = j % group_size; + if (nelt == 0) + elt = gimple_op (SLP_TREE_SCALAR_STMTS (node)[si], i); + /* ??? We're just tracking whether all operands of a single +vector initializer are the same, ideally we'd check if +we emitted the same one already. */ + else if (elt != gimple_op (SLP_TREE_SCALAR_STMTS (node)[si], i)) + elt = NULL_TREE; + nelt++; + if (nelt == nelt_limit) + { + /* ??? We need to pass down stmt_info for a vector type +even if it points to the wrong stmt. */ + record_stmt_cost (prologue_cost_vec, 1, + dt == vect_external_def + ? (elt ? scalar_to_vec : vec_construct) + : vector_load, + stmt_info, 0, vect_prologue); + nelt = 0; + } + } } }
Re: [PATCH][PR debug/83758] look more carefully for internal_arg_pointer in vt_add_function_parameter()
On Tue, Jan 30, 2018 at 03:55:58AM -0600, Segher Boessenkool wrote: > > But in that case, what does the copying? > > I don't know. Aaron will look at it, but timezones etc. :-) > > > That's what seems strange. I can see why we'd have two nested > > pluses with the inner plus being pointer-equal to internal_arg_ptr. > > And I can see why we'd have a single canonical plus (which IMO would > > be better, but I agree it's not stage 4 material). It's having the two > > nested pluses without maintaining pointer equality that seems strange. > > The inner plus is *not* pointer-equal, that is the problem. Something > did copy_rtx (or such) on it, many things do. We can tell you what > exactly later today. Most likely unshare_all_rtl, which does: for (tree decl = DECL_ARGUMENTS (cfun->decl); decl; decl = DECL_CHAIN (decl)) { if (DECL_RTL_SET_P (decl)) SET_DECL_RTL (decl, copy_rtx_if_shared (DECL_RTL (decl))); DECL_INCOMING_RTL (decl) = copy_rtx_if_shared (DECL_INCOMING_RTL (decl)); } Anyway, my preference would be to change that gen_rtx_PLUS into stack_parm = crtl->args.internal_arg_pointer; if (!CONST_INT_P (offset_rtx)) stack_parm = gen_rtx_PLUS (Pmode, stack_parm, offset_rtx); else if (offset_rtx != const0_rtx) stack_parm = plus_constant (Pmode, stack_parm, INTVAL (offset_rtx)); stack_parm = gen_rtx_MEM (data->promoted_mode, stack_parm); and deal specially with GET_CODE (crtl->args.internal_arg_pointer) in var-tracking.c. rs6000/powerpcspe with -fsplit-stack are the only cases where crtl->args.internal_arg_pointer is not a REG, so just running libgo testsuite on powerpc{,64,64le} should cover it all. Jakub
Re: [PATCH] Fix PR83008
On Tue, Jan 30, 2018 at 11:07:50AM +0100, Richard Biener wrote: > > I have been asked to push this change, fixing (somewhat) the impreciseness > of costing constant/invariant vector uses in SLP stmts. The previous > code always just considered a single constant to be generated in the > prologue irrespective of how many we'd need. With this patch we > properly handle this count and optimize for the case when we can use > a vector splat. It doesn't yet handle CSE (or CSE among stmts) which > means it could in theory regress cases it overall costed correctly > before "optimistically" (aka by accident). But at least the costing > now matches code generation. > > Bootstrapped and tested on x86_64-unknown-linux-gnu. On x86_64 > Haswell with AVX2 SPEC 2k6 shows no off-noise changes. > > The patch is said to help the case in the PR when additional backend > costing changes are done (for AVX512). > > Ok for trunk at this stage? LGTM. > 2018-01-30 Richard Biener > > PR tree-optimization/83008 > * tree-vect-slp.c (vect_analyze_slp_cost_1): Properly cost > invariant and constant vector uses in stmts when they need > more than one stmt. Jakub
RE: [patch][x86] -march=icelake
Thank you for your comments, fixed them and rebased Ice Lake patch on top of it. Ok for trunk? Bitmask patch changelog: gcc/c-family/ * c-common.h (omp_clause_mask): Move to wide_int_bitmask.h. gcc/ * config/i386/i386.c (ix86_option_override_internal): Change flags type to wide_int_bitmask. * wide-int-bitmask.h: New. Icelake patch changelog: gcc/ * config.gcc: Add -march=icelake. * config/i386/driver-i386.c (host_detect_local_cpu): Detect icelake. * config/i386/i386-c.c (ix86_target_macros_internal): Handle icelake. * config/i386/i386.c (processor_costs): Add m_ICELAKE. (PTA_ICELAKE, PTA_AVX512VNNI, PTA_GFNI, PTA_VAES, PTA_AVX512VBMI2, PTA_VPCLMULQDQ, PTA_RDPID, PTA_AVX512BITALG): New. (processor_target_table): Add icelake. (ix86_option_override_internal): Handle new PTAs. (get_builtin_code_for_version): Handle icelake. (M_INTEL_COREI7_ICELAKE): New. (fold_builtin_cpu): Handle icelake. * config/i386/i386.h (TARGET_ICELAKE, PROCESSOR_ICELAKE): New. * doc/invoke.texi: Add -march=icelake. gcc/testsuite/ * gcc.target/i386/funcspec-56.inc: Handle new march. * g++.dg/ext/mv16.C: Ditto. libgcc/ * config/i386/cpuinfo.h (processor_subtypes): Add INTEL_COREI7_ICELAKE. Thanks, Julia > -Original Message- > From: Jakub Jelinek [mailto:ja...@redhat.com] > Sent: Tuesday, January 30, 2018 9:47 AM > To: Koval, Julia > Cc: Richard Biener ; Uros Bizjak ; > GCC Patches ; Kirill Yukhin > > Subject: Re: [patch][x86] -march=icelake > > On Tue, Jan 30, 2018 at 08:35:38AM +, Koval, Julia wrote: > > * c-common.h (omp_clause_mask): Move to wide_int_bitmask.h > > Missing dot ad the end. > > + wide_int_bitmask PTA_3DNOW (HOST_WIDE_INT_1U << 0); > > Can't all these be const wide_int_bitmask instead of just > wide_int_bitmask? > > ... > + > + wide_int_bitmask PTA_CORE2 = PTA_64BIT | PTA_MMX | PTA_SSE | > PTA_SSE2 > +| PTA_SSE3 | PTA_SSSE3 | PTA_CX16 | PTA_FXSR; > + wide_int_bitmask PTA_NEHALEM = PTA_CORE2 | PTA_SSE4_1 | PTA_SSE4_2 > +| PTA_POPCNT; > + wide_int_bitmask PTA_WESTMERE = PTA_NEHALEM | PTA_AES | > PTA_PCLMUL; > + wide_int_bitmask PTA_SANDYBRIDGE = PTA_WESTMERE | PTA_AVX | > PTA_XSAVE > +| PTA_XSAVEOPT; > + wide_int_bitmask PTA_IVYBRIDGE = PTA_SANDYBRIDGE | PTA_FSGSBASE | > PTA_RDRND > +| PTA_F16C; > + wide_int_bitmask PTA_HASWELL = PTA_IVYBRIDGE | PTA_AVX2 | PTA_BMI | > PTA_BMI2 > +| PTA_LZCNT | PTA_FMA | PTA_MOVBE | PTA_HLE; > + wide_int_bitmask PTA_BROADWELL = PTA_HASWELL | PTA_ADX | > PTA_PRFCHW > +| PTA_RDSEED; > + wide_int_bitmask PTA_SKYLAKE = PTA_BROADWELL | PTA_CLFLUSHOPT | > PTA_XSAVEC > +| PTA_XSAVES; > + wide_int_bitmask PTA_SKYLAKE_AVX512 = PTA_SKYLAKE | PTA_AVX512F | > PTA_AVX512CD > +| PTA_AVX512VL | PTA_AVX512BW | PTA_AVX512DQ | PTA_PKU | > PTA_CLWB; > + wide_int_bitmask PTA_CANNONLAKE = PTA_SKYLAKE_AVX512 | > PTA_AVX512VBMI > +| PTA_AVX512IFMA | PTA_SHA; > + wide_int_bitmask PTA_KNL = PTA_BROADWELL | PTA_AVX512PF | > PTA_AVX512ER > +| PTA_AVX512F | PTA_AVX512CD; > + wide_int_bitmask PTA_BONNELL = PTA_CORE2 | PTA_MOVBE; > + wide_int_bitmask PTA_SILVERMONT = PTA_WESTMERE | PTA_MOVBE | > PTA_RDRND; > + wide_int_bitmask PTA_KNM = PTA_KNL | PTA_AVX5124VNNIW | > PTA_AVX5124FMAPS > +| PTA_AVX512VPOPCNTDQ; > > Likewise for these. > > --- /dev/null > +++ b/gcc/wide-int-bitmask.h > @@ -0,0 +1,145 @@ > +/* Operation with 128 bit bitmask. > + Copyright (C) 1987-2018 Free Software Foundation, Inc. > > Please use 2013-2018 instead, all the omp_clause_mask stuff was > introduced in 2013. > > + > +#ifndef GCC_BIT_MASK_H > +#define GCC_BIT_MASK_H > > The macro hasn't been renamed for the header file rename. > > + > +#endif /* ! GCC_BIT_MASK_H */ > > Here as well. Otherwise LGTM. > > Jakub 0001-bitmask.patch Description: 0001-bitmask.patch 0002-icelake_rebased.patch Description: 0002-icelake_rebased.patch
New Brazilian Portuguese PO file for 'cpplib' (version 8.1-b20180128)
Hello, gentle maintainer. This is a message from the Translation Project robot. A revised PO file for textual domain 'cpplib' has been submitted by the Brazilian Portuguese team of translators. The file is available at: http://translationproject.org/latest/cpplib/pt_BR.po (This file, 'cpplib-8.1-b20180128.pt_BR.po', has just now been sent to you in a separate email.) All other PO files for your package are available in: http://translationproject.org/latest/cpplib/ Please consider including all of these in your next release, whether official or a pretest. Whenever you have a new distribution with a new version number ready, containing a newer POT file, please send the URL of that distribution tarball to the address below. The tarball may be just a pretest or a snapshot, it does not even have to compile. It is just used by the translators when they need some extra translation context. The following HTML page has been updated: http://translationproject.org/domain/cpplib.html If any question arises, please contact the translation coordinator. Thank you for all your work, The Translation Project robot, in the name of your translation coordinator.
Contents of PO file 'cpplib-8.1-b20180128.pt_BR.po'
cpplib-8.1-b20180128.pt_BR.po.gz Description: Binary data The Translation Project robot, in the name of your translation coordinator.
Re: [PATCH][PR debug/83758] look more carefully for internal_arg_pointer in vt_add_function_parameter()
On Tue, 2018-01-30 at 11:13 +0100, Jakub Jelinek wrote: > On Tue, Jan 30, 2018 at 03:55:58AM -0600, Segher Boessenkool wrote: > > > But in that case, what does the copying? > > > > I don't know. Aaron will look at it, but timezones etc. :-) Indeed I did see unshare_all_rtl() copying internal_arg_pointer. But also several places in function.c: assign_parm_adjust_entry_rtl: move_block_from_reg (REGNO (entry_parm), validize_mem (copy_rtx (stack_parm)), data->partial / UNITS_PER_WORD); assign_parm_setup_reg: /* Copy the value into the register, thus bridging between assign_parm_find_data_types and expand_expr_real_1. */ equiv_stack_parm = data->stack_parm; validated_mem = validize_mem (copy_rtx (data->entry_parm)); assign_parm_setup_block: mem = validize_mem (copy_rtx (stack_parm)); in expr.c: expand_expr_real_1: /* DECL_MODE might change when TYPE_MODE depends on attribute target settings for VECTOR_TYPE_P that might switch for the function. */ if (currently_expanding_to_rtl && code == VAR_DECL && MEM_P (decl_rtl) && VECTOR_TYPE_P (type) && exp && DECL_MODE (exp) != mode) decl_rtl = change_address (decl_rtl, TYPE_MODE (type), 0); else decl_rtl = copy_rtx (decl_rtl); > > > > > That's what seems strange. I can see why we'd have two nested > > > pluses with the inner plus being pointer-equal to > > > internal_arg_ptr. > > > And I can see why we'd have a single canonical plus (which IMO > > > would > > > be better, but I agree it's not stage 4 material). It's having > > > the two > > > nested pluses without maintaining pointer equality that seems > > > strange. > > > > The inner plus is *not* pointer-equal, that is the > > problem. Something > > did copy_rtx (or such) on it, many things do. We can tell you what > > exactly later today. > > Most likely unshare_all_rtl, which does: > for (tree decl = DECL_ARGUMENTS (cfun->decl); decl; decl = > DECL_CHAIN (decl)) > { > if (DECL_RTL_SET_P (decl)) > SET_DECL_RTL (decl, copy_rtx_if_shared (DECL_RTL (decl))); > DECL_INCOMING_RTL (decl) = copy_rtx_if_shared > (DECL_INCOMING_RTL (decl)); > } > > Anyway, my preference would be to change that gen_rtx_PLUS into > stack_parm = crtl->args.internal_arg_pointer; > if (!CONST_INT_P (offset_rtx)) > stack_parm = gen_rtx_PLUS (Pmode, stack_parm, offset_rtx); > else if (offset_rtx != const0_rtx) > stack_parm = plus_constant (Pmode, stack_parm, INTVAL > (offset_rtx)); > stack_parm = gen_rtx_MEM (data->promoted_mode, stack_parm); > and deal specially with GET_CODE (crtl->args.internal_arg_pointer) > in var-tracking.c. > rs6000/powerpcspe with -fsplit-stack are the only cases where > crtl->args.internal_arg_pointer is not a REG, so just running libgo > testsuite on powerpc{,64,64le} should cover it all. I'll give this a try today when I get to the office. Thanks, Aaron > > Jakub > -- Aaron Sawdey, Ph.D. acsaw...@linux.vnet.ibm.com 050-2/C113 (507) 253-7520 home: 507/263-0782 IBM Linux Technology Center - PPC Toolchain
Re: [PING^3] [PATCH] Remove CANADIAN, that break compilation for foreign target
On 30/01/18 10:19 +0300, Petr Ovtchenkov wrote: On Tue, 19 Dec 2017 11:37:43 +0300 Petr Ovtchenkov wrote: ping^3 I don't fully understand the consequences (or need) for this patch. I asked some other people to look at it, and didn't get confirmation it's OK. So I'm reluctant to make the change. Especially this late in the GCC 8 cycle. On Thu, 16 Nov 2017 20:55:37 +0300 Petr Ovtchenkov wrote: > On Wed, 20 Sep 2017 13:44:59 +0300 > Petr Ovtchenkov wrote: > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71212 > > > > On Fri, 20 May 2016 16:10:50 +0300 > > Petr Ovtchenkov wrote: > > > > > Some old ad-hoc (adding -I/usr/include to compiler > > > flags) break compilation of libstdc++ for foreign > > > target architecture (due to compiler see includes > > > of native). > > Reference for terms: > > https://gcc.gnu.org/onlinedocs/gccint/Configure-Terms.html > > Present of "CANADIAN=yes" lead to inclusion of > headers from build (-I/usr/include). "CANADIAN=yes" used _only_ > to set "-I/usr/include". > > Inclusion of build headers in cross-compilation > process is not a mistake only in case of native (i.e. it is mistake > for cross, for canadian, for crossed native and for crossback), > but sometimes give "success". > > Note, that build/host/target may be different not only due to > different architectures, but due to different sysroots > (libc, kernel, binutils, etc.). > > CANADIAN is set to "yes" by code > > - # If Canadian cross, then don't pick up tools from the build directory. > - # Used only in GLIBCXX_EXPORT_INCLUDES. > - if test -n "$with_cross_host" && > - test x"$build_alias" != x"$with_cross_host" && > - test x"$build" != x"$target"; > - then > -CANADIAN=yes > - else > -CANADIAN=no > - fi > > and it add "-I/usr/include" to compiler flags for building libstdc++. > This is wrong. > > Reference to patch: > https://gcc.gnu.org/ml/gcc-patches/2017-09/msg01332.html
Re: [PATCH][PR debug/83758] look more carefully for internal_arg_pointer in vt_add_function_parameter()
On Tue, Jan 30, 2018 at 06:50:50AM -0600, Aaron Sawdey wrote: > > Anyway, my preference would be to change that gen_rtx_PLUS into > > stack_parm = crtl->args.internal_arg_pointer; > > if (!CONST_INT_P (offset_rtx)) > > stack_parm = gen_rtx_PLUS (Pmode, stack_parm, offset_rtx); > > else if (offset_rtx != const0_rtx) > > stack_parm = plus_constant (Pmode, stack_parm, INTVAL > > (offset_rtx)); > > stack_parm = gen_rtx_MEM (data->promoted_mode, stack_parm); > > and deal specially with GET_CODE (crtl->args.internal_arg_pointer) > > in var-tracking.c. > > rs6000/powerpcspe with -fsplit-stack are the only cases where > > crtl->args.internal_arg_pointer is not a REG, so just running libgo > > testsuite on powerpc{,64,64le} should cover it all. > > I'll give this a try today when I get to the office. On IRC when discussing it with Segher this morning we've come to the conclusion that it would be best if rs6000 just followed what all other ports to, i.e. return a pseudo from the target hook, like: --- gcc/config/rs6000/rs6000.c 2018-01-30 12:30:27.416360076 +0100 +++ gcc/config/rs6000/rs6000.c 2018-01-30 13:59:07.360639803 +0100 @@ -29602,8 +29602,9 @@ rs6000_internal_arg_pointer (void) emit_insn_after (pat, get_insns ()); pop_topmost_sequence (); } - return plus_constant (Pmode, cfun->machine->split_stack_arg_pointer, - FIRST_PARM_OFFSET (current_function_decl)); + rtx ret = plus_constant (Pmode, cfun->machine->split_stack_arg_pointer, + FIRST_PARM_OFFSET (current_function_decl)); + return copy_to_reg (ret); } return virtual_incoming_args_rtx; } copy_to_reg is what e.g. the generic or pa target hook conditionally uses. Optionally, check also whether the push_topmost_sequence (); emit_insn_after (pat, get_insns ()); pop_topmost_sequence (); stuff is needed instead of just emit_insn (pat); and whether it needs to be guarded with cfun->machine->split_stack_arg_pointer == NULL, because the target hook is called exactly once for each function. Jakub
Fix ipa-inline ICE
Hi, this patch fixed ICE that was introduced by Richard Standiford's change to reorder can and want_inline predicates to reduce amount of work done to verify inlining limits. This bypasses check that the function is optimized that makes inliner to ICE because function summary is missing. This patch breaks out the expensive limits checking from can predicate to new one which makes code bit more convoluted but I hope to clean things up next stage1. Bootstrapped/regtested x86_64-linux, will commit it later today. Honza PR ipa/81360 * ipa-inline.c (can_inline_edge_p): Break out late tests to... (can_inline_edge_by_limits_p): ... here. (can_early_inline_edge_p, check_callers, update_caller_keys, update_callee_keys, recursive_inlining, add_new_edges_to_heap, speculation_useful_p, inline_small_functions, inline_small_functions, flatten_function, inline_to_all_callers_1): Update. * g++.dg/torture/pr81360.C: New testcase Index: ipa-inline.c === --- ipa-inline.c(revision 257174) +++ ipa-inline.c(working copy) @@ -289,18 +289,16 @@ sanitize_attrs_match_for_inline_p (const (opts_for_fn (caller->decl)->x_##flag\ != opts_for_fn (callee->decl)->x_##flag) - /* Decide if we can inline the edge and possibly update +/* Decide if we can inline the edge and possibly update inline_failed reason. We check whether inlining is possible at all and whether caller growth limits allow doing so. - if REPORT is true, output reason to the dump file. - - if DISREGARD_LIMITS is true, ignore size limits.*/ + if REPORT is true, output reason to the dump file. */ static bool can_inline_edge_p (struct cgraph_edge *e, bool report, - bool disregard_limits = false, bool early = false) + bool early = false) { gcc_checking_assert (e->inline_failed); @@ -316,9 +314,6 @@ can_inline_edge_p (struct cgraph_edge *e cgraph_node *caller = e->caller->global.inlined_to ? e->caller->global.inlined_to : e->caller; cgraph_node *callee = e->callee->ultimate_alias_target (&avail, caller); - tree caller_tree = DECL_FUNCTION_SPECIFIC_OPTIMIZATION (caller->decl); - tree callee_tree -= callee ? DECL_FUNCTION_SPECIFIC_OPTIMIZATION (callee->decl) : NULL; if (!callee->definition) { @@ -379,12 +374,47 @@ can_inline_edge_p (struct cgraph_edge *e e->inline_failed = CIF_ATTRIBUTE_MISMATCH; inlinable = false; } + if (!inlinable && report) +report_inline_failed_reason (e); + return inlinable; +} + +/* Decide if we can inline the edge and possibly update + inline_failed reason. + We check whether inlining is possible at all and whether + caller growth limits allow doing so. + + if REPORT is true, output reason to the dump file. + + if DISREGARD_LIMITS is true, ignore size limits. */ + +static bool +can_inline_edge_by_limits_p (struct cgraph_edge *e, bool report, +bool disregard_limits = false, bool early = false) +{ + gcc_checking_assert (e->inline_failed); + + if (cgraph_inline_failed_type (e->inline_failed) == CIF_FINAL_ERROR) +{ + if (report) +report_inline_failed_reason (e); + return false; +} + + bool inlinable = true; + enum availability avail; + cgraph_node *caller = e->caller->global.inlined_to + ? e->caller->global.inlined_to : e->caller; + cgraph_node *callee = e->callee->ultimate_alias_target (&avail, caller); + tree caller_tree = DECL_FUNCTION_SPECIFIC_OPTIMIZATION (caller->decl); + tree callee_tree += callee ? DECL_FUNCTION_SPECIFIC_OPTIMIZATION (callee->decl) : NULL; /* Check if caller growth allows the inlining. */ - else if (!DECL_DISREGARD_INLINE_LIMITS (callee->decl) - && !disregard_limits - && !lookup_attribute ("flatten", -DECL_ATTRIBUTES (caller->decl)) - && !caller_growth_limits (e)) + if (!DECL_DISREGARD_INLINE_LIMITS (callee->decl) + && !disregard_limits + && !lookup_attribute ("flatten", +DECL_ATTRIBUTES (caller->decl)) + && !caller_growth_limits (e)) inlinable = false; /* Don't inline a function with a higher optimization level than the caller. FIXME: this is really just tip of iceberg of handling @@ -541,7 +571,8 @@ can_early_inline_edge_p (struct cgraph_e fprintf (dump_file, " edge not inlinable: not in SSA form\n"); return false; } - if (!can_inline_edge_p (e, true, false, true)) + if (!can_inline_edge_p (e, true, true) + || !can_inline_edge_by_limits_p (e, true, false, true)) return false; return true; } @@ -925,6 +956,8 @@ check_callers (struct cgraph_node *node, return true; if (e->recursive_p ()) return true; + if (!can_inline_edge_
Fix template for ipa-inline-2.c and ipa-inline-3.c
Hi, this PR is about rounding issue of sreal->integer conversion. I converted the whole code to sreals but the change seems bit too big and with no practical improvement to be justifiable, so this patch goes by simply fixing the template for current roundoff error. Hopefully counter code is now stable enough so we won't more random flips in GCC 8 release window. Bootstrapped/regtested x86_64-linux, comitted. Honza PR ipa/83179 * gcc.dg/ipa/inline-2.c: Fix template. * gcc.dg/ipa/inline-3.c: Fix template. Index: testsuite/gcc.dg/ipa/inline-2.c === --- testsuite/gcc.dg/ipa/inline-2.c (revision 257182) +++ testsuite/gcc.dg/ipa/inline-2.c (working copy) @@ -28,6 +28,6 @@ void foo (int invariant) } } /* After inlining bar into foo, op2 is invariant within inner loop. */ -/* { dg-final { scan-ipa-dump "op2 change 10.00. of time" "inline" } } */ +/* { dg-final { scan-ipa-dump "op2 change 9.99. of time" "inline" } } */ /* After inlining bar into foo, op3 is invariant within both loops. */ /* { dg-final { scan-ipa-dump "op3 change 1.00. of time" "inline" } } */ Index: testsuite/gcc.dg/ipa/inline-3.c === --- testsuite/gcc.dg/ipa/inline-3.c (revision 257182) +++ testsuite/gcc.dg/ipa/inline-3.c (working copy) @@ -21,4 +21,4 @@ int foo (int invariant) } -/* { dg-final { scan-ipa-dump "Scaling time by probability:0.10" "inline" } } */ +/* { dg-final { scan-ipa-dump "Scaling time by probability:0.099900" "inline" } } */
Re: Link with correct values-*.o files on Solaris (PR target/40411)
Hi Joseph, > On Fri, 12 Jan 2018, Rainer Orth wrote: > >> At the same time, I had a new look at when values-Xc.o is used by the >> Studio compilers. It selects strict ISO C mode in a couple of cases, >> and the latest Studio 12.6 cc, which is about to do away with the >> previous -Xc option which enabled that mode in favour of gcc-compatible >> -pedantic, uses the latter to control its use. So I've changed gcc to >> follow suit. > > gcc -pedantic is only ever a diagnostic option, not a semantic one; it's > incorrect to use it to change library semantics. The relevant options for > strict ISO C (and C++) modes are -ansi, -std=c*, -std=iso9899:199409 and > aliases for those options. that's what you get for changing your code at the eleventh hour ;-) Before introducing -pedantic, I had #define STARTFILE_ARCH_SPEC \ "%{ansi|std=c90|std=iso9899\\:199409:values-Xc.o%s; :values-Xa.o%s} \ (which isn't correct either since it only handled C90 and C94). I don't think you need to handle the aliases explicitly: last time I checked they never arrive in specs. Prompted by the realization that -ansi applies to both C and C++ and I only meant to affect C, I had a look at what the (then recently released) Studio 12.6 compilers do, which strive for more GCC compatibility. I've now also checked the OpenSolaris libc and libm sources for uses of _lib_version == strict_ansi (as is set in values-Xc.o). libc has none, and the uses in libm all follow this pattern (ignoring C Issue 4.2 compatibility mode handling no longer activated): if (lib_version == strict_ansi) { errno = EDOM; } else if (!matherr(&exc)) { errno = EDOM; } The default implementation of matherr (overridable by the user) just returns 0. So it seems the following snippet #define STARTFILE_ARCH_SPEC \ [...] %{std=c9*|std=iso9899\\:199409|std=c1*:values-Xc.o%s; :values-Xa.o%s} \ seems like the right thing to do, as you said. > -ansi is not defined as a .opt alias of -std=c90 (because the option it's > an alias for depends on whether C or C++ is being compiled), so I'd expect > the specs to need to handle it along with -std=c90. I'd also expect > -std=iso9899:199409 to need to be handled there, supposing handling it > like -std=c90 is correct. And what about C++ versions not based on C99 or > later? I'm of mixed minds about whether or not to include -ansi in the above list: for C it's correct, for C++ it's less clear. AFAIK there's no way to distinguish between different languages in specs (like via an -x switch always passed in). OTOH, it has always been there already. The following patch implements the above with corresponding comment adjustments. I'm open to suggestions what to do about -ansi. Rainer -- - Rainer Orth, Center for Biotechnology, Bielefeld University 2018-01-29 Rainer Orth PR target/40411 * config/sol2.h (STARTFILE_ARCH_SPEC): Use -std=c9*, -std=iso9899:199409, -std=c1* instead of -pedantic to select values-Xc.o. # HG changeset patch # Parent 85f8b72d36b77c7c044e6383f825596017ad Fix use of Solaris values-Xc.o (PR target/40411) diff --git a/gcc/config/sol2.h b/gcc/config/sol2.h --- a/gcc/config/sol2.h +++ b/gcc/config/sol2.h @@ -180,7 +180,9 @@ along with GCC; see the file COPYING3. The values-X[ac].o objects set the variable _lib_version. The Studio C compilers use values-Xc.o with either -Xc or (since Studio 12.6) -pedantic to select strictly conformant ISO C behaviour, otherwise - values-Xa.o. + values-Xa.o. Since -pedantic is a diagnostic option only in GCC, we + need to specifiy the -std=c* options and -std=iso9899:199409. -ansi is + omitted from that list to avoid influencing C++. The values-xpg[46].o objects define either or both __xpg[46] variables, selecting XPG4 mode (__xpg4) and conforming C99/SUSv3 behavior (__xpg6). @@ -195,7 +197,7 @@ along with GCC; see the file COPYING3. #undef STARTFILE_ARCH_SPEC #define STARTFILE_ARCH_SPEC \ "%{!shared:%{!symbolic: \ - %{pedantic:values-Xc.o%s; :values-Xa.o%s} \ + %{std=c9*|std=iso9899\\:199409|std=c1*:values-Xc.o%s; :values-Xa.o%s} \ %{std=c90|std=gnu90:values-xpg4.o%s; :values-xpg6.o%s}}}" #if defined(HAVE_LD_PIE) && defined(HAVE_SOLARIS_CRTS)
[PATCH][PR target/84066] Wrong shadow stack register size is saved for x32
x32 is a 64-bit process with 32-bit software pointer and kernel may place x32 shadow stack above 4GB. We need to save and restore 64-bit shadow stack register for x32. builtin jmp buf size is 5 pointers. We have space to save 64-bit shadow stack pointer: 32-bit SP, 32-bit FP, 32-bit IP, 64-bit SSP for x32. PR target/84066 * gcc/config/i386/i386.md: Replace Pmode with word_mode in builtin_setjmp_setup and builtin_longjmp to support x32. * gcc/testsuite/gcc.target/i386/cet-sjlj-6.c: New test. Ok for trunk? Igor 0001-PR84066-Wrong-shadow-stack-register-size-is-saved-fo.patch Description: 0001-PR84066-Wrong-shadow-stack-register-size-is-saved-fo.patch
Re: [PATCH] Fix PR84101, account for function ABI details in vectorization costs
On Tue, 30 Jan 2018, Richard Biener wrote: > > This patch tries to deal with the "easy" part of a function ABI, > the return value location, in vectorization costing. The testcase > shows that if we vectorize the returned value but the function > doesn't return in memory or in a vector register but as in this > case in an integer register pair (reg:TI ax) (bah, ABI details > exposed late? why's this not a parallel?) we end up spilling > badly. > > The idea is to account for such spilling so if vectorization > benefits outweight the spilling we'll vectorize anyway. > > I think the particular testcase could be fixed in the subreg > pass basically undoing the vectorization but I realize that > generally this is a too hard problem and avoiding vectorization > is better. Still this patch is somewhat fragile in that it > depends on us "seeing" that the stored to decl is returned > (see cfun_returns). > > Bootstrap & regtest running on x86_64-unknown-linux-gnu. > > I'd like to hear opinions on my use of hard_function_value > and also from other target maintainers. I'm not sure we > have sufficient testsuite coverage of _profitable_ vectorization > of a return value. Feel free to add to this for your > target. > > Ok for trunk? Ok, looks like hard_function_value doesn't like being called with type we then ICE: #0 fancy_abort ( file=0x2094ef0 "/space/rguenther/src/svn/early-lto-debug/gcc/machmode.h", line=292, function=0x2095534 ::require() const::__FUNCTION__> "require") at /space/rguenther/src/svn/early-lto-debug/gcc/diagnostic.c:1500 #1 0x00c34790 in opt_mode::require ( this=0x7fffcf20) at /space/rguenther/src/svn/early-lto-debug/gcc/machmode.h:292 #2 0x00db271a in hard_function_value (valtype=0x765009d8, func=0x764fff00, fntype=0x0, outgoing=1) at /space/rguenther/src/svn/early-lto-debug/gcc/explow.c:2212 2201 /* int_size_in_bytes can return -1. We don't need a check here 2202 since the value of bytes will then be large enough that no 2203 mode will match anyway. */ 2204 2205 FOR_EACH_MODE_IN_CLASS (tmpmode, MODE_INT) 2206{ (gdb) l 2207 /* Have we found a large enough mode? */ 2208 if (GET_MODE_SIZE (tmpmode.require ()) >= bytes) 2209break; 2210} 2211 2212 PUT_MODE (val, tmpmode.require ()); which means we didn't find an integer mode of the wanted size. The backend simply returned (reg:BLK 0 ax). I suppose I have to guard hard_function_value with sth ... Richard. > Thanks, > Richard. > > 2018-01-30 Richard Biener > > PR tree-optimization/84101 > * tree-vect-stmts.c: Include explow.h for hard_function_value. > (cfun_returns): New helper. > (vect_model_store_cost): When vectorizing a store to a decl > we return and the function ABI returns via a non-vector > register account for the possible spilling that will happen. > > * gcc.target/i386/pr84101.c: New testcase. > > Index: gcc/tree-vect-stmts.c > === > --- gcc/tree-vect-stmts.c (revision 257139) > +++ gcc/tree-vect-stmts.c (working copy) > @@ -43,6 +43,7 @@ along with GCC; see the file COPYING3. > #include "tree-cfg.h" > #include "tree-ssa-loop-manip.h" > #include "cfgloop.h" > +#include "explow.h" > #include "tree-ssa-loop.h" > #include "tree-scalar-evolution.h" > #include "tree-vectorizer.h" > @@ -893,6 +894,22 @@ vect_model_promotion_demotion_cost (stmt > "prologue_cost = %d .\n", inside_cost, prologue_cost); > } > > +/* Returns true if the current function returns DECL. */ > + > +static bool > +cfun_returns (tree decl) > +{ > + edge_iterator ei; > + edge e; > + FOR_EACH_EDGE (e, ei, EXIT_BLOCK_PTR_FOR_FN (cfun)->preds) > +{ > + greturn *ret = safe_dyn_cast (last_stmt (e->src)); > + if (ret && gimple_return_retval (ret) == decl) > + return true; > +} > + return false; > +} > + > /* Function vect_model_store_cost > > Models cost for stores. In the case of grouped accesses, one access > @@ -971,6 +988,36 @@ vect_model_store_cost (stmt_vec_info stm > vec_to_scalar, stmt_info, 0, vect_body); > } > > + /* When vectorizing an SLP store into the function result assign > + a penalty if the function returns in a non-vector register. > + In this case we assume we'll end up with having to spill the > + vector result and do element loads. */ > + if (slp_node) > +{ > + tree base = get_base_address (dr->ref); > + if (base > + && (TREE_CODE (base) == RESULT_DECL > + || (DECL_P (base) && cfun_returns (base > + { > + rtx reg = hard_function_value (TREE_TYPE (base), cfun->decl, 0, 1); > + if ((REG_P (reg) && ! VECTOR_MODE_P (GET_MODE (reg))) > + || GET_CODE (reg) ==
Re: [PATCH][PR target/84066] Wrong shadow stack register size is saved for x32
On Tue, Jan 30, 2018 at 3:19 PM, Tsimbalist, Igor V wrote: > x32 is a 64-bit process with 32-bit software pointer and kernel may > place x32 shadow stack above 4GB. We need to save and restore 64-bit > shadow stack register for x32. builtin jmp buf size is 5 pointers. We > have space to save 64-bit shadow stack pointer: 32-bit SP, 32-bit FP, > 32-bit IP, 64-bit SSP for x32. > > PR target/84066 > * gcc/config/i386/i386.md: Replace Pmode with word_mode in > builtin_setjmp_setup and builtin_longjmp to support x32. > * gcc/testsuite/gcc.target/i386/cet-sjlj-6.c: New test. > > Ok for trunk? LGTM, but please check the testcase with -mx32 -maddress-mode={short,long} nevertheless to catch any incosistencies. Thanks, Uros.
Re: Link with correct values-*.o files on Solaris (PR target/40411)
On Tue, 30 Jan 2018, Rainer Orth wrote: > So it seems the following snippet > > #define STARTFILE_ARCH_SPEC \ > [...] > %{std=c9*|std=iso9899\\:199409|std=c1*:values-Xc.o%s; :values-Xa.o%s} \ > > seems like the right thing to do, as you said. That version would need updating when we add -std=c2x (once there's a C2x working draft with features being added to it). If you use std=c* instead of separate std=c9* and std=c1* you'd avoid needing such a change - but then of course it would cover -std=c++* for C++. -- Joseph S. Myers jos...@codesourcery.com
Re: [PATCH][PR target/84066] Wrong shadow stack register size is saved for x32
On Tue, Jan 30, 2018 at 6:38 AM, Uros Bizjak wrote: > On Tue, Jan 30, 2018 at 3:19 PM, Tsimbalist, Igor V > wrote: >> x32 is a 64-bit process with 32-bit software pointer and kernel may >> place x32 shadow stack above 4GB. We need to save and restore 64-bit >> shadow stack register for x32. builtin jmp buf size is 5 pointers. We >> have space to save 64-bit shadow stack pointer: 32-bit SP, 32-bit FP, >> 32-bit IP, 64-bit SSP for x32. >> >> PR target/84066 >> * gcc/config/i386/i386.md: Replace Pmode with word_mode in >> builtin_setjmp_setup and builtin_longjmp to support x32. >> * gcc/testsuite/gcc.target/i386/cet-sjlj-6.c: New test. >> >> Ok for trunk? > > LGTM, but please check the testcase with -mx32 > -maddress-mode={short,long} nevertheless to catch any incosistencies. > Speaking of -maddress-mode=, shouldn't + reg_adj = gen_rtx_SUBREG (Pmode, reg_ssp, 0); tmp = gen_rtx_SET (reg_adj, gen_rtx_LSHIFTRT (Pmode, negate_rtx (Pmode, reg_adj), - GEN_INT ((Pmode == SImode) + GEN_INT ((word_mode == SImode) ? 2 : 3))); be + reg_adj = gen_rtx_SUBREG (ptr_mode, reg_ssp, 0); tmp = gen_rtx_SET (reg_adj, gen_rtx_LSHIFTRT (ptr_mode, negate_rtx (ptr_mode, reg_adj), - GEN_INT ((Pmode == SImode) + GEN_INT ((word_mode == SImode) ? 2 : 3))); Pmode == word_mode for -maddress-mode=long. +++ b/gcc/testsuite/gcc.target/i386/cet-sjlj-6.c @@ -0,0 +1,19 @@ +/* { dg-do compile } */ +/* { dg-options "-O -fcf-protection -mcet -mx32" } */ +/* { dg-final { scan-assembler-times "endbr64" 2 } } */ +/* { dg-final { scan-assembler-times "movq\t.*buf\\+12" 1 } } */ +/* { dg-final { scan-assembler-times "subq\tbuf\\+12" 1 } } */ +/* { dg-final { scan-assembler-times "rdsspq" 2 } } */ +/* { dg-final { scan-assembler-times "incsspq" 2 } } */ Please add a test for tmp = gen_rtx_SET (reg_adj, gen_rtx_LSHIFTRT (Pmode, negate_rtx (Pmode, reg_adj), - GEN_INT ((Pmode == SImode) + GEN_INT ((word_mode == SImode) ? 2 : 3))); -- H.J.
Re: Link with correct values-*.o files on Solaris (PR target/40411)
Hi Joseph, > On Tue, 30 Jan 2018, Rainer Orth wrote: > >> So it seems the following snippet >> >> #define STARTFILE_ARCH_SPEC \ >> [...] >> %{std=c9*|std=iso9899\\:199409|std=c1*:values-Xc.o%s; :values-Xa.o%s} \ >> >> seems like the right thing to do, as you said. > > That version would need updating when we add -std=c2x (once there's a C2x > working draft with features being added to it). If you use std=c* instead > of separate std=c9* and std=c1* you'd avoid needing such a change - but > then of course it would cover -std=c++* for C++. I know, that why I used the current form. Admittedly it's a maintenance problem (likely to be forgotten in the future). If we add in -ansi, we can just as well add -std=c++* (thus -std=c*), too. I guess that's the safest option overall, unless Jonathan has objections from a C++ standards perspective. Rainer -- - Rainer Orth, Center for Biotechnology, Bielefeld University
Re: Link with correct values-*.o files on Solaris (PR target/40411)
On 30/01/18 15:51 +0100, Rainer Orth wrote: Hi Joseph, On Tue, 30 Jan 2018, Rainer Orth wrote: So it seems the following snippet #define STARTFILE_ARCH_SPEC \ [...] %{std=c9*|std=iso9899\\:199409|std=c1*:values-Xc.o%s; :values-Xa.o%s} \ seems like the right thing to do, as you said. That version would need updating when we add -std=c2x (once there's a C2x working draft with features being added to it). If you use std=c* instead of separate std=c9* and std=c1* you'd avoid needing such a change - but then of course it would cover -std=c++* for C++. I know, that why I used the current form. Admittedly it's a maintenance problem (likely to be forgotten in the future). If we add in -ansi, we can just as well add -std=c++* (thus -std=c*), too. I guess that's the safest option overall, unless Jonathan has objections from a C++ standards perspective. No objections from me, I'm happy either way.
Re: [PR83370][AARCH64]Use tighter register constraints for sibcall patterns.
Hi Richard, Thanks for the review! On 29/01/18 20:23, Richard Sandiford wrote: The patch looks good to me FWIW. How about adding something like the following testcase? /* { dg-do run } */ /* { dg-options "-O2" } */ typedef void (*fun) (void); void __attribute__ ((noipa)) f (fun x1) { register fun x2 asm ("x16"); int arr[5000]; int *volatile ptr = arr; asm ("mov %0, %1" : "=r" (x2) : "r" (x1)); x2 (); } void g (void) {} int main (void) { f (g); } It was hard for me to construct an test case at that time. Your example here exactly reflect the problem. The code-gen before the change is: f: mov x16, 20016 sub sp, sp, x16 add x0, sp, 16 mov x16, 20016 str x0, [sp, 8] add sp, sp, x16 br x16 After the change to the register constraint: f: mov x16, 20016 sub sp, sp, x16 // Start of user assembly // 9 "indirect_tail_call_reg.c" 1 mov x16, x0 // 0 "" 2 // End of user assembly add x0, sp, 16 str x0, [sp, 8] mov x0, x16 mov x16, 20016 add sp, sp, x16 br x0 I updated the patch with the new test case, the wording about the register constraint is also updated. Thanks, Renlin gcc/ChangeLog: 2018-01-30 Renlin Li * config/aarch64/aarch64.c (aarch64_class_max_nregs): Handle TAILCALL_ADDR_REGS. (aarch64_register_move_cost): Likewise. * config/aarch64/aarch64.h (reg_class): Rename CALLER_SAVE_REGS to TAILCALL_ADDR_REGS. (REG_CLASS_NAMES): Likewise. (REG_CLASS_CONTENTS): Rename CALLER_SAVE_REGS to TAILCALL_ADDR_REGS. Remove IP registers. * config/aarch64/aarch64.md (Ucs): Update register constraint. gcc/testsuite/ChangeLog: 2018-01-30 Richard Sandiford * gcc.target/aarch64/indirect_tail_call_reg.c: New. [...] diff --git a/gcc/config/aarch64/constraints.md b/gcc/config/aarch64/constraints.md index af4143ef756464afac29d17f124b436520f90451..c3791aa89562a5d5542098d2f7951afc57901150 100644 --- a/gcc/config/aarch64/constraints.md +++ b/gcc/config/aarch64/constraints.md @@ -21,8 +21,8 @@ (define_register_constraint "k" "STACK_REG" "@internal The stack register.") -(define_register_constraint "Ucs" "CALLER_SAVE_REGS" - "@internal The caller save registers.") +(define_register_constraint "Ucs" "TAILCALL_ADDR_REGS" + "@internal The indirect tail call address registers") (define_register_constraint "w" "FP_REGS" "Floating point and SIMD vector registers.") Maybe "@internal Registers suitable for an indirect tail call"? Unlike the caller-save registers, these aren't a predefined set. Thanks, Richard diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h index 93d29b84d47b7017661a2129d61e7d740bbf7c93..322b7f4628aa69cf331c12ff2c8df351890da9ef 100644 --- a/gcc/config/aarch64/aarch64.h +++ b/gcc/config/aarch64/aarch64.h @@ -446,7 +446,7 @@ extern unsigned aarch64_architecture_version; enum reg_class { NO_REGS, - CALLER_SAVE_REGS, + TAILCALL_ADDR_REGS, GENERAL_REGS, STACK_REG, POINTER_REGS, @@ -462,7 +462,7 @@ enum reg_class #define REG_CLASS_NAMES\ { \ "NO_REGS", \ - "CALLER_SAVE_REGS",\ + "TAILCALL_ADDR_REGS",\ "GENERAL_REGS",\ "STACK_REG", \ "POINTER_REGS",\ @@ -475,7 +475,7 @@ enum reg_class #define REG_CLASS_CONTENTS \ { \ { 0x, 0x, 0x }, /* NO_REGS */ \ - { 0x0007, 0x, 0x }, /* CALLER_SAVE_REGS */ \ + { 0x0004, 0x, 0x }, /* TAILCALL_ADDR_REGS */\ { 0x7fff, 0x, 0x0003 }, /* GENERAL_REGS */ \ { 0x8000, 0x, 0x }, /* STACK_REG */ \ { 0x, 0x, 0x0003 }, /* POINTER_REGS */ \ diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 1da313f57e0eed4df36dbd15aecbae9fd73f7388..59ca95019ddf0491c382e7ee2b99966694d0b36d 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -6062,7 +6062,7 @@ aarch64_class_max_nregs (reg_class_t regclass, machine_mode mode) { switch (regclass) { -case CALLER_SAVE_REGS: +case TAILCALL_ADDR_REGS: case POINTER_REGS: case GENERAL_REGS: case ALL_REGS: @@ -8228,10 +8228,10 @@ aarch64_register_move_cost (machine_mode mode, = aarch64_tune_params.regmove_cost; /* Caller save and pointer regs are equivalent to GENERAL_REGS. */ - if (to == CALLER_SAVE_REGS || to == POINTER_REGS) + if (to == TAILCALL_ADDR_REGS || to == POINTER_REGS) to = GENERAL_REGS; - if (from == CALLER_SAVE_REGS || from == POINTER_REGS) + if (from == TAILCALL_ADDR_REGS || from == POINTER_REGS) from = GENERAL_REGS; /* Moving between GPR and stack cost is the same as GP2GP. */ diff --git a/gcc/config/aarch64/constraints.md b/gcc/c
[PATCH] Fix sched-deps.c handling of frame related insns (PR rtl-optimization/83986)
Hi! We don't want to move /f instructions across jumps in either direction, sched_analyze_insn has code for that by adding the insn into sched_before_next_jump list and by adding anti dependence against pending_jump_insns. That works fine, as long as we don't flush the dependency lists, flush_pending_lists will flush them and just add the insn that was flushed into last_pending_memory_flush. For frame related insns, we are looking solely for deps->pending_jump_insns and thus if we've flushed, we miss anti dependency against insn that has anti dependencies against jumps. The following patch fixes it by also adding an anti dependency against last_pending_memory_flush insn if any. Bootstrapped on {x86_64,i686,powerpc64le,powerpc64}-linux and regtested on the first 3, with powerpc64-linux regtest -m32/-m64 still pending, ok for trunk if it passes even there? I believe this matches what we do elsewhere, but just in case it would look too costly (as it could add dependencies also in cases where there weren't any pending jump insns, just insns using memory and too many dependencies for them), there might be another option like have a flag in deps whether in the bb/ebb we've ever added something into pending_jump_insns and only do the newly added call if that bool is true. Or tweak flush_pending_lists, such that if deps->pending_jump_insns used to be non-empty then add at the end the insn in question to deps->pending_jump_insns in addition to deps->last_pending_memory_flush, and tweak the: if (deps->pending_flush_length++ >= MAX_PENDING_LIST_LENGTH) flush_pending_lists (deps, insn, true, true); else deps->pending_jump_insns = alloc_INSN_LIST (insn, deps->pending_jump_insns); so that it would for the case when calling flush_pending_lists also do the else part if before the flush_pending_lists pending_jump_insns used to be non-empty. 2018-01-30 Jakub Jelinek PR rtl-optimization/83986 * sched-deps.c (sched_analyze_insn): For frame related insns, add anti dependence against last_pending_memory_flush in addition to pending_jump_insns. * gcc.dg/pr83986.c: New test. --- gcc/sched-deps.c.jj 2018-01-27 07:27:51.723937094 +0100 +++ gcc/sched-deps.c2018-01-30 13:09:16.891173722 +0100 @@ -2922,6 +2922,8 @@ sched_analyze_insn (struct deps_desc *de = alloc_INSN_LIST (insn, deps->sched_before_next_jump); /* Make sure epilogue insn is scheduled after preceding jumps. */ + add_dependence_list (insn, deps->last_pending_memory_flush, 1, + REG_DEP_ANTI, true); add_dependence_list (insn, deps->pending_jump_insns, 1, REG_DEP_ANTI, true); } --- gcc/testsuite/gcc.dg/pr83986.c.jj 2018-01-30 13:34:13.254828800 +0100 +++ gcc/testsuite/gcc.dg/pr83986.c 2018-01-30 13:33:55.973832214 +0100 @@ -0,0 +1,14 @@ +/* PR rtl-optimization/83986 */ +/* { dg-do compile } */ +/* { dg-options "-g -O2 -fsched2-use-superblocks -funwind-tables --param max-pending-list-length=1" } */ + +int v; + +int +foo (int x) +{ + v &= !!v && !!x; + if (v != 0) +foo (0); + return 0; +} Jakub
Re: [PATCH, rs6000][PR debug/83758] v2 rs6000_internal_arg_pointer should only return a register
On Tue, 2018-01-30 at 14:04 +0100, Jakub Jelinek wrote: > On IRC when discussing it with Segher this morning we've come to the > conclusion that it would be best if rs6000 just followed what all > other > ports to, i.e. return a pseudo from the target hook, like: > > --- gcc/config/rs6000/rs6000.c2018-01-30 12:30:27.416360076 > +0100 > +++ gcc/config/rs6000/rs6000.c2018-01-30 13:59:07.360639803 > +0100 > @@ -29602,8 +29602,9 @@ rs6000_internal_arg_pointer (void) > emit_insn_after (pat, get_insns ()); > pop_topmost_sequence (); > } > - return plus_constant (Pmode, cfun->machine- > >split_stack_arg_pointer, > - FIRST_PARM_OFFSET > (current_function_decl)); > + rtx ret = plus_constant (Pmode, cfun->machine- > >split_stack_arg_pointer, > +FIRST_PARM_OFFSET > (current_function_decl)); > + return copy_to_reg (ret); > } >return virtual_incoming_args_rtx; > } > > copy_to_reg is what e.g. the generic or pa target hook conditionally > uses. This fix looks good, passes bootstrap, go tests run. Segher is currently regtesting on ppc64le power9. OK for trunk if tests pass? 2018-01-30 Aaron Sawdey * config/rs6000/rs6000.c (rs6000_internal_arg_pointer ): Only return a reg rtx. -- Aaron Sawdey, Ph.D. acsaw...@linux.vnet.ibm.com 050-2/C113 (507) 253-7520 home: 507/263-0782 IBM Linux Technology Center - PPC ToolchainIndex: gcc/config/rs6000/rs6000.c === --- gcc/config/rs6000/rs6000.c (revision 257188) +++ gcc/config/rs6000/rs6000.c (working copy) @@ -29602,8 +29602,9 @@ emit_insn_after (pat, get_insns ()); pop_topmost_sequence (); } - return plus_constant (Pmode, cfun->machine->split_stack_arg_pointer, - FIRST_PARM_OFFSET (current_function_decl)); + rtx ret = plus_constant (Pmode, cfun->machine->split_stack_arg_pointer, + FIRST_PARM_OFFSET (current_function_decl)); + return copy_to_reg (ret); } return virtual_incoming_args_rtx; }
Re: [PATCH, rs6000][PR debug/83758] v2 rs6000_internal_arg_pointer should only return a register
On Tue, Jan 30, 2018 at 10:21:33AM -0600, Aaron Sawdey wrote: > 2018-01-30 Aaron Sawdey > > * config/rs6000/rs6000.c (rs6000_internal_arg_pointer ): Only return No space before ): . Will defer ack to Segher or David. > a reg rtx. > Index: gcc/config/rs6000/rs6000.c > === > --- gcc/config/rs6000/rs6000.c(revision 257188) > +++ gcc/config/rs6000/rs6000.c(working copy) > @@ -29602,8 +29602,9 @@ > emit_insn_after (pat, get_insns ()); > pop_topmost_sequence (); > } > - return plus_constant (Pmode, cfun->machine->split_stack_arg_pointer, > - FIRST_PARM_OFFSET (current_function_decl)); > + rtx ret = plus_constant (Pmode, cfun->machine->split_stack_arg_pointer, > +FIRST_PARM_OFFSET (current_function_decl)); > + return copy_to_reg (ret); > } >return virtual_incoming_args_rtx; > } Jakub
Re: [PATCH] PR 78534 Reinstate better string copy algorithm
PING On Tue, Jan 23, 2018 at 3:31 PM, Janne Blomqvist wrote: > As part of the change to larger character lengths, the string copy > algorithm was temporarily pessimized to get around some spurious > -Wstringop-overflow warnings. Having tried a number of variations of > this algorithm I have managed to get it down to one spurious warning, > only with -O1 optimization, in the testsuite. This patch reinstates > the optimized variant and modifies this one testcase to ignore the > warning. > > Regtested on x86_64-pc-linux-gnu, Ok for trunk? > > gcc/fortran/ChangeLog: > > 2018-01-23 Janne Blomqvist > > PR 78534 > * trans-expr.c (fill_with_spaces): Use memset instead of > generating loop. > (gfc_trans_string_copy): Improve opportunity to use builtins with > constant lengths. > > gcc/testsuite/ChangeLog: > > 2018-01-23 Janne Blomqvist > > PR 78534 > * gfortran.dg/allocate_deferred_char_scalar_1.f03: Prune > -Wstringop-overflow warnings due to spurious warning with -O1. > * gfortran.dg/char_cast_1.f90: Update dump scan pattern. > * gfortran.dg/transfer_intrinsic_1.f90: Likewise. > --- > gcc/fortran/trans-expr.c | 52 > -- > .../allocate_deferred_char_scalar_1.f03| 2 + > gcc/testsuite/gfortran.dg/char_cast_1.f90 | 6 +-- > gcc/testsuite/gfortran.dg/transfer_intrinsic_1.f90 | 2 +- > 4 files changed, 35 insertions(+), 27 deletions(-) > > diff --git a/gcc/fortran/trans-expr.c b/gcc/fortran/trans-expr.c > index e90036f..4da6cee 100644 > --- a/gcc/fortran/trans-expr.c > +++ b/gcc/fortran/trans-expr.c > @@ -6407,8 +6407,6 @@ fill_with_spaces (tree start, tree type, tree size) >tree i, el, exit_label, cond, tmp; > >/* For a simple char type, we can call memset(). */ > - /* TODO: This code does work and is potentially more efficient, but > - causes spurious -Wstringop-overflow warnings. >if (compare_tree_int (TYPE_SIZE_UNIT (type), 1) == 0) > return build_call_expr_loc (input_location, > builtin_decl_explicit (BUILT_IN_MEMSET), > @@ -6416,7 +6414,6 @@ fill_with_spaces (tree start, tree type, tree size) > build_int_cst (gfc_get_int_type (gfc_c_int_kind), >lang_hooks.to_target_charset (' > ')), > fold_convert (size_type_node, size)); > - */ > >/* Otherwise, we use a loop: > for (el = start, i = size; i > 0; el--, i+= TYPE_SIZE_UNIT (type)) > @@ -6522,11 +6519,20 @@ gfc_trans_string_copy (stmtblock_t * block, tree > dlength, tree dest, > >/* The string copy algorithm below generates code like > > - if (dlen > 0) { > - memmove (dest, src, min(dlen, slen)); > - if (slen < dlen) > - memset(&dest[slen], ' ', dlen - slen); > - } > + if (destlen > 0) > + { > + if (srclen < destlen) > + { > + memmove (dest, src, srclen); > + // Pad with spaces. > + memset (&dest[srclen], ' ', destlen - srclen); > + } > + else > + { > + // Truncate if too long. > + memmove (dest, src, destlen); > + } > + } >*/ > >/* Do nothing if the destination length is zero. */ > @@ -6555,21 +6561,16 @@ gfc_trans_string_copy (stmtblock_t * block, tree > dlength, tree dest, >else > src = gfc_build_addr_expr (pvoid_type_node, src); > > - /* First do the memmove. */ > - tmp2 = fold_build2_loc (input_location, MIN_EXPR, TREE_TYPE (dlen), dlen, > - slen); > - tmp2 = build_call_expr_loc (input_location, > - builtin_decl_explicit (BUILT_IN_MEMMOVE), > - 3, dest, src, > - fold_convert (size_type_node, tmp2)); > - stmtblock_t tmpblock2; > - gfc_init_block (&tmpblock2); > - gfc_add_expr_to_block (&tmpblock2, tmp2); > - > - /* If the destination is longer, fill the end with spaces. */ > + /* Truncate string if source is too long. */ >cond2 = fold_build2_loc (input_location, LT_EXPR, logical_type_node, slen, >dlen); > > + /* Copy and pad with spaces. */ > + tmp3 = build_call_expr_loc (input_location, > + builtin_decl_explicit (BUILT_IN_MEMMOVE), > + 3, dest, src, > + fold_convert (size_type_node, slen)); > + >/* Wstringop-overflow appears at -O3 even though this warning is not > explicitly available in fortran nor can it be switched off. If the > source length is a constant, its negative appears as a very large > @@ -6584,14 +6585,19 @@ gfc_trans_string_copy (stmtblock_t * block, tree > dlength, tree dest, >tmp4 = fill_with_spaces (tmp4, chartype, tmp); > >gfc_init_block (&tempblock); > + gfc_add_expr_to
Re: [PATCH] Use pointer sized array indices.
PING**2 On Wed, Jan 17, 2018 at 8:02 PM, Janne Blomqvist wrote: > PING > > On Fri, Dec 29, 2017 at 8:28 PM, Janne Blomqvist > wrote: >> On Fri, Dec 29, 2017 at 7:17 PM, Thomas Koenig wrote: >>> Hi Janne, >>> Using pointer sized variables (e.g. size_t / ptrdiff_t) when the variables are used as array indices allows accessing larger arrays, and can be a slight performance improvement due to no need for sign or zero extending, or masking. >>> >>> >>> Unless I have missed something, all the examples are for cases where >>> the array is of maximum size GFC_MAX_DIMENSIONS. >> >> Many, but not all. >> >>> This is why they >>> were left as int in the first place (because it is unlikely we will have >>> arrays of more than 2^31-1 dimensions soon :-). >>> >>> Do you really think this change is necessary? If not, I'd rather avoid >>> such a change. >> >> I'm not planning on supporting > 2**31-1 dimensions, no. :) >> >> But even if we know that the maximum value is always going to be >> smaller, by using pointer-sized variables the compiler can generate >> slightly more efficient code. >> >> See e.g. https://godbolt.org/g/oAvw5L ; in the functions with a loop, >> the ones which use pointer-sized indices have shorter preambles as >> well as loop bodies. And for the simple functions that just index an >> array, by using pointer-sized indices there is no need to zero the >> upper half of the registers. >> >> I mean, it's not a huge improvement, but it might be a tiny one in some >> cases. >> >> Also, by moving the induction variable from the beginning of the >> function into the loop header, it makes it easier for both readers and >> the compiler to see the scope of the variable. >> >> -- >> Janne Blomqvist > > > > -- > Janne Blomqvist -- Janne Blomqvist
Re: [patch, libfortran] Adapt handling of derived types to new dtype
On Mon, Jan 29, 2018 at 11:51 PM, Thomas Koenig wrote: > Hello world, > > this patch fixes the library issues left over from Paul's rank-15 patch > by removing the GFC_DTYPE_DERIVED_* macros and (mostly) handling these > cases separately. > > The problem was with folding the type and size into a single word. For > all normal data types, this is perfectly fine, because the sizes are > known to fit. For derived types or characters, there could be a problem > if a user specified a very large size. > > Regression-tested. OK for trunk? Ok, thanks. -- Janne Blomqvist
Re: [PATCH, rs6000][PR debug/83758] v2 rs6000_internal_arg_pointer should only return a register
Hi! On Tue, Jan 30, 2018 at 10:21:33AM -0600, Aaron Sawdey wrote: > This fix looks good, passes bootstrap, go tests run. > > Segher is currently regtesting on ppc64le power9. OK for trunk if tests > pass? > > 2018-01-30 Aaron Sawdey > > * config/rs6000/rs6000.c (rs6000_internal_arg_pointer ): Only return > a reg rtx. Yes please, with the changelog typo fixed as Jakub said. Thanks! Glad this is over with :-) I think it needs backports, too? Those are okay as well. Segher > Index: gcc/config/rs6000/rs6000.c > === > --- gcc/config/rs6000/rs6000.c(revision 257188) > +++ gcc/config/rs6000/rs6000.c(working copy) > @@ -29602,8 +29602,9 @@ > emit_insn_after (pat, get_insns ()); > pop_topmost_sequence (); > } > - return plus_constant (Pmode, cfun->machine->split_stack_arg_pointer, > - FIRST_PARM_OFFSET (current_function_decl)); > + rtx ret = plus_constant (Pmode, cfun->machine->split_stack_arg_pointer, > +FIRST_PARM_OFFSET (current_function_decl)); > + return copy_to_reg (ret); > } >return virtual_incoming_args_rtx; > }
Re: [PATCH] make -Wrestrict for strcat more meaningful (PR 83698)
Ping: https://gcc.gnu.org/ml/gcc-patches/2018-01/msg01488.html This is a minor improvement but there have been a couple of comments lately pointing out that the numbers in the -Wrestrict messages can make them confusing. Jakub, since you made one of those comments (the other came up in the context of bug 84095 - [8 Regression] false-positive -Wrestrict warnings for memcpy within array). Can you please either approve this patch or suggest changes? On 01/16/2018 05:35 PM, Martin Sebor wrote: On 01/16/2018 02:32 PM, Jakub Jelinek wrote: On Tue, Jan 16, 2018 at 01:36:26PM -0700, Martin Sebor wrote: --- gcc/gimple-ssa-warn-restrict.c(revision 256752) +++ gcc/gimple-ssa-warn-restrict.c(working copy) @@ -384,6 +384,12 @@ builtin_memref::builtin_memref (tree expr, tree si base = SSA_NAME_VAR (base); } + if (DECL_P (base) && TREE_CODE (TREE_TYPE (base)) == ARRAY_TYPE) +{ + if (offrange[0] < 0 && offrange[1] > 0) +offrange[0] = 0; +} Why the 2 nested ifs? No particular reason. There may have been more code in there that I ended up removing. Or a comment. I can remove the extra braces when the patch is approved. @@ -1079,14 +1085,35 @@ builtin_access::strcat_overlap () return false; /* When strcat overlap is certain it is always a single byte: - the terminatinn NUL, regardless of offsets and sizes. When + the terminating NUL, regardless of offsets and sizes. When overlap is only possible its range is [0, 1]. */ acs.ovlsiz[0] = dstref->sizrange[0] == dstref->sizrange[1] ? 1 : 0; acs.ovlsiz[1] = 1; - acs.ovloff[0] = (dstref->sizrange[0] + dstref->offrange[0]).to_shwi (); - acs.ovloff[1] = (dstref->sizrange[1] + dstref->offrange[1]).to_shwi (); You use to_shwi many times in the patch, do the callers or something earlier in this function guarantee that you aren't throwing away any bits (unlike tree_to_shwi, to_shwi method doesn't ICE, just throws away upper bits). Especially when you perform additions like here, even if both wide_ints fit into a shwi, the result might not. No, I'm not sure. In fact, it wouldn't surprise me if it did happen. It doesn't cause false positives or negatives but it can make the offsets less than meaningful in cases where they are within valid bounds. There are also cases where they are meaningless to begin with and there is little the pass can do about that. IMO, the ideal solution to the first problem is to add a format specifier for wide ints to the pretty printer and get rid of the conversions. It's probably too late for something like that now but I'd like to do it for GCC 9. Unless someone files a bug/regression, it's also too late for me to go and try to find and fix these conversions now. Martin PS While looking for a case you asked about I came up with the following. I don't think there's any slicing involved but the offsets are just as meaningless as if there were. I think the way to do significantly better is to detect out-of-bounds offsets earlier (e.g., as in this patch: https://gcc.gnu.org/ml/gcc-patches/2017-10/msg02143.html) $ cat z.c && gcc -O2 -S -Warray-bounds -m32 z.c extern int a[]; void f (__PTRDIFF_TYPE__ i) { if (i < __PTRDIFF_MAX__ - 7 || __PTRDIFF_MAX__ - 5 < i) i = __PTRDIFF_MAX__ - 7; const int *s = a + i; __builtin_memcpy (a, &s[i], 3); } z.c: In function ‘f’: z.c:10:3: warning: ‘__builtin_memcpy’ offset [-64, -48] is out of the bounds of object ‘a’ with type ‘int[]’ [-Warray-bounds] __builtin_memcpy (a, &s[i], 3); ^~ z.c:1:12: note: ‘a’ declared here extern int a[]; ^
Re: [PR81611] improve auto-inc
On Jan 25, 2018, Alexandre Oliva wrote: > Thanks, I'll retest with the simplified test (just in case; for I can't > recall why I ended up with all those redundant conditions), As suspected, removing the redundant tests didn't regress anything. I suppose they mattered in some earlier experimental version of the patch (I vaguely recall having a long sequence of tests within the loop at some point), and I just failed to further simplify the final form. Thanks for catching the further simplification opportunities! FTR, here's the patch I'm installing, while awaiting a review from someone else on the rtl auto-inc patch (the second and last patch in https://gcc.gnu.org/ml/gcc-patches/2018-01/msg01994.html). [PR81611] accept copies in simple_iv_increment_p If there are copies between the GIMPLE_PHI at the loop body and the increment that reaches it (presumably through a back edge), still regard it as a simple_iv_increment, so that we won't consider the value in the back edge eligible for forwprop. Doing so would risk making the phi node and the incremented conflicting value live within the loop, and the phi node to be preserved for propagated uses after the loop. for gcc/ChangeLog PR tree-optimization/81611 * tree-ssa-dom.c (simple_iv_increment_p): Skip intervening copies. --- gcc/tree-ssa-dom.c | 18 ++ 1 file changed, 14 insertions(+), 4 deletions(-) diff --git a/gcc/tree-ssa-dom.c b/gcc/tree-ssa-dom.c index 2b371667253a..a6f176c5def0 100644 --- a/gcc/tree-ssa-dom.c +++ b/gcc/tree-ssa-dom.c @@ -1276,8 +1276,11 @@ record_equality (tree x, tree y, class const_and_copies *const_and_copies) /* Returns true when STMT is a simple iv increment. It detects the following situation: - i_1 = phi (..., i_2) - i_2 = i_1 +/- ... */ + i_1 = phi (..., i_k) + [...] + i_j = i_{j-1} for each j : 2 <= j <= k-1 + [...] + i_k = i_{k-1} +/- ... */ bool simple_iv_increment_p (gimple *stmt) @@ -1305,8 +1308,15 @@ simple_iv_increment_p (gimple *stmt) return false; phi = SSA_NAME_DEF_STMT (preinc); - if (gimple_code (phi) != GIMPLE_PHI) -return false; + while (gimple_code (phi) != GIMPLE_PHI) +{ + /* Follow trivial copies, but not the DEF used in a back edge, +so that we don't prevent coalescing. */ + if (!gimple_assign_ssa_name_copy_p (phi)) + return false; + preinc = gimple_assign_rhs1 (phi); + phi = SSA_NAME_DEF_STMT (preinc); +} for (i = 0; i < gimple_phi_num_args (phi); i++) if (gimple_phi_arg_def (phi, i) == lhs) -- Alexandre Oliva, freedom fighterhttp://FSFLA.org/~lxoliva/ You must be the change you wish to see in the world. -- Gandhi Be Free! -- http://FSFLA.org/ FSF Latin America board member Free Software Evangelist|Red Hat Brasil GNU Toolchain Engineer
Re: [SFN+LVU+IEPM v4 7/9] [LVU] Introduce location views
On Jan 26, 2018, Jakub Jelinek wrote: > Having to tweak debug info consumers so that they treat DW_LLE_* of 9 > one way for .debug_loclist of version 5 and another way for .debug_loclist > of version 6 isn't a good idea. Maybe we don't have to do that. The reason I implemented the proposed format was to have a reference implementation, but since it's an extension to a non-extensible part of DWARF5, it's hard to justify consumer implementations for that. > Why don't you emit the DW_LLE_GNU_view_pair stuff in .debug_loclists > already with -gdwarf-5, if needed paired without some TU attribute that says > that views are emitted? Because that would make the loclists unreadable for any DWARF5-compliant consumer. > Haven't looked for it it in the coding standards, but it is something > I've been asked to do in my code by various reviewers over the years and > what I've been asking others in my patch reviews from others too. Thanks. It was the first (well, second; you'd requested similar changes another time before) I heard of that. I'm still unconvinced the way I used is not compliant, but I'll keep that in mind. Hopefully I won't run into someone who insists the other one (the one you reject) is the only correct one ;-) > I feel strongly about indenting stuff right, I'm with you on that, we just have different ideas about what "right" stands for ;-) > which if it wouldn't fit would be > return verylongcondition > && otherlongcondition__; > rather than > return verylongcondition > && otherlongcondition__; > or similar, it would surprise me if it is not in the coding standard. I agree neither of these two forms is correct. But it is my understanding that both of the following are correct: return (verylongcondition && otherlongcondition__); return verylongcondition && otherlongcondition__; The first, because the parenthesized expression is continued with indentation to match the parenthesis, the second because the return statement is continued with the correct indentation for the continuation of a statement. >> Personally, I don't see that line breaks before operators are only >> permitted when the operand that follows wouldn't fit; the standards are > Yes, you can break it, but why, it doesn't make the code more readable, > but less in this case. I thought it made it more readable, especially in the context of the patch. I guess this means we'll have to agree to disagree. > So, what is the reason not to emit the format you are proposing already with > -gdwarf-5, perhaps with some extra attribute on the TU (of course not when > -gstrict-dwarf)? See above. We don't want to create unreadable loclists for standard-compliant consumers, do we? -- Alexandre Oliva, freedom fighterhttp://FSFLA.org/~lxoliva/ You must be the change you wish to see in the world. -- Gandhi Be Free! -- http://FSFLA.org/ FSF Latin America board member Free Software Evangelist|Red Hat Brasil GNU Toolchain Engineer
Re: [PATCH] Fix sched-deps.c handling of frame related insns (PR rtl-optimization/83986)
On 01/30/2018 11:11 AM, Jakub Jelinek wrote: Hi! We don't want to move /f instructions across jumps in either direction, sched_analyze_insn has code for that by adding the insn into sched_before_next_jump list and by adding anti dependence against pending_jump_insns. That works fine, as long as we don't flush the dependency lists, flush_pending_lists will flush them and just add the insn that was flushed into last_pending_memory_flush. Yes, we don't think to move frame related insns.As most frame insns are moves of args from hard regs to pseudos it might result in troubles in LRA can not find a spill regs because it has limited possibilities to spill hard regs. RA actually will try to remove these moves by assigning the same hard regs to the pseudos so scheduling them has a little sense. For frame related insns, we are looking solely for deps->pending_jump_insns and thus if we've flushed, we miss anti dependency against insn that has anti dependencies against jumps. The following patch fixes it by also adding an anti dependency against last_pending_memory_flush insn if any. Bootstrapped on {x86_64,i686,powerpc64le,powerpc64}-linux and regtested on the first 3, with powerpc64-linux regtest -m32/-m64 still pending, ok for trunk if it passes even there? OK. Thanks, Jakub. 2018-01-30 Jakub Jelinek PR rtl-optimization/83986 * sched-deps.c (sched_analyze_insn): For frame related insns, add anti dependence against last_pending_memory_flush in addition to pending_jump_insns. * gcc.dg/pr83986.c: New test. --- gcc/sched-deps.c.jj 2018-01-27 07:27:51.723937094 +0100 +++ gcc/sched-deps.c2018-01-30 13:09:16.891173722 +0100 @@ -2922,6 +2922,8 @@ sched_analyze_insn (struct deps_desc *de = alloc_INSN_LIST (insn, deps->sched_before_next_jump); /* Make sure epilogue insn is scheduled after preceding jumps. */ + add_dependence_list (insn, deps->last_pending_memory_flush, 1, + REG_DEP_ANTI, true); add_dependence_list (insn, deps->pending_jump_insns, 1, REG_DEP_ANTI, true); } --- gcc/testsuite/gcc.dg/pr83986.c.jj 2018-01-30 13:34:13.254828800 +0100 +++ gcc/testsuite/gcc.dg/pr83986.c 2018-01-30 13:33:55.973832214 +0100 @@ -0,0 +1,14 @@ +/* PR rtl-optimization/83986 */ +/* { dg-do compile } */ +/* { dg-options "-g -O2 -fsched2-use-superblocks -funwind-tables --param max-pending-list-length=1" } */ + +int v; + +int +foo (int x) +{ + v &= !!v && !!x; + if (v != 0) +foo (0); + return 0; +} Jakub
[PR lto/84105] handle TYPE_IDENTIFIERs for FUNCTION_TYPEs in the gimple pretty printer
Hi! As discussed in the PR, the ICE here happens in dump_generic_node: case FUNCTION_TYPE: case METHOD_TYPE: ... if (TYPE_NAME (node) && DECL_NAME (TYPE_NAME (node))) dump_decl_name (pp, TYPE_NAME (node), flags); (gdb) print node $21 = (gdb) print node.type_common.name $22 = TYPE_NAME is an IDENTIFIER_NODE, whereas we're expecting a DECL_P, and bad things happen. OK pending tests? gcc/ PR lto/84105 * tree-pretty-print.c (dump_generic_node): Handle a TYPE_NAME with an IDENTIFIER_NODE for FUNCTION_TYPE's. diff --git a/gcc/tree-pretty-print.c b/gcc/tree-pretty-print.c index 54a8dfa3b6f..73eb27c8e8f 100644 --- a/gcc/tree-pretty-print.c +++ b/gcc/tree-pretty-print.c @@ -1822,7 +1822,9 @@ dump_generic_node (pretty_printer *pp, tree node, int spc, dump_flags_t flags, pp_string (pp, ""); pp_colon_colon (pp); } - if (TYPE_NAME (node) && DECL_NAME (TYPE_NAME (node))) + if (TYPE_IDENTIFIER (node)) + dump_generic_node (pp, TYPE_NAME (node), spc, flags, false); + else if (TYPE_NAME (node) && DECL_NAME (TYPE_NAME (node))) dump_decl_name (pp, TYPE_NAME (node), flags); else if (flags & TDF_NOUID) pp_printf (pp, "");
Re: [PING^3] [PATCH] Remove CANADIAN, that break compilation for foreign target
On Tue, 30 Jan 2018 12:54:21 + Jonathan Wakely wrote: > On 30/01/18 10:19 +0300, Petr Ovtchenkov wrote: > >On Tue, 19 Dec 2017 11:37:43 +0300 > >Petr Ovtchenkov wrote: > > > >ping^3 > > > I don't fully understand the consequences (or need) for this patch. > > I asked some other people to look at it, and didn't get confirmation > it's OK. So I'm reluctant to make the change. Especially this late in > the GCC 8 cycle. Hmm, almost two years for this words. Do you really think that -I/usr/include may be correct in build of cross of some kind (CANADIAN is just variant of cross)? Of cause, I don't consider case with rolling target headers in /usr/include. > > > >> On Thu, 16 Nov 2017 20:55:37 +0300 > >> Petr Ovtchenkov wrote: > >> > >> > On Wed, 20 Sep 2017 13:44:59 +0300 > >> > Petr Ovtchenkov wrote: > >> > > >> > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71212 > >> > > > >> > > On Fri, 20 May 2016 16:10:50 +0300 > >> > > Petr Ovtchenkov wrote: > >> > > > >> > > > Some old ad-hoc (adding -I/usr/include to compiler > >> > > > flags) break compilation of libstdc++ for foreign > >> > > > target architecture (due to compiler see includes > >> > > > of native). > >> > > >> > Reference for terms: > >> > > >> > https://gcc.gnu.org/onlinedocs/gccint/Configure-Terms.html > >> > > >> > Present of "CANADIAN=yes" lead to inclusion of > >> > headers from build (-I/usr/include). "CANADIAN=yes" used _only_ > >> > to set "-I/usr/include". > >> > > >> > Inclusion of build headers in cross-compilation > >> > process is not a mistake only in case of native (i.e. it is mistake > >> > for cross, for canadian, for crossed native and for crossback), > >> > but sometimes give "success". > >> > > >> > Note, that build/host/target may be different not only due to > >> > different architectures, but due to different sysroots > >> > (libc, kernel, binutils, etc.). > >> > > >> > CANADIAN is set to "yes" by code > >> > > >> > - # If Canadian cross, then don't pick up tools from the build > >> > directory. > >> > - # Used only in GLIBCXX_EXPORT_INCLUDES. > >> > - if test -n "$with_cross_host" && > >> > - test x"$build_alias" != x"$with_cross_host" && > >> > - test x"$build" != x"$target"; > >> > - then > >> > -CANADIAN=yes > >> > - else > >> > -CANADIAN=no > >> > - fi > >> > > >> > and it add "-I/usr/include" to compiler flags for building libstdc++. > >> > This is wrong. > >> > > >> > Reference to patch: > >> > https://gcc.gnu.org/ml/gcc-patches/2017-09/msg01332.html
Re: [PATCH] Use pointer sized array indices.
Hi Janne, PING**2 OK. Regards, Thomas
C++ PATCH for c++/84098, ICE with lambda in template NSDMI
When instantiating the members of a class template, we don't need to consider lambdas, as instantiating them will be handled when we get to tsubst_lambda_expr. And now the new assert catches places we were failing to ignore them, as here. Tested x86_64-pc-linux-gnu, applied to trunk. commit d97240d44096258f2c5218560861631b908d1024 Author: Jason Merrill Date: Mon Jan 29 17:31:42 2018 -0500 PR c++/84098 - ICE with lambda in template NSDMI. * pt.c (instantiate_class_template_1): Ignore more lambdas. diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c index 6c5d06b9ebb..9516be893aa 100644 --- a/gcc/cp/pt.c +++ b/gcc/cp/pt.c @@ -10527,6 +10527,11 @@ instantiate_class_template_1 (tree type) { if (TYPE_P (t)) { + if (LAMBDA_TYPE_P (t)) + /* A closure type for a lambda in an NSDMI or default argument. + Ignore it; it will be regenerated when needed. */ + continue; + /* Build new CLASSTYPE_NESTED_UTDS. */ tree newtag; @@ -10594,11 +10599,10 @@ instantiate_class_template_1 (tree type) && DECL_OMP_DECLARE_REDUCTION_P (r)) cp_check_omp_declare_reduction (r); } - else if (DECL_CLASS_TEMPLATE_P (t) + else if ((DECL_CLASS_TEMPLATE_P (t) || DECL_IMPLICIT_TYPEDEF_P (t)) && LAMBDA_TYPE_P (TREE_TYPE (t))) - /* A closure type for a lambda in a default argument for a - member template. Ignore it; it will be instantiated with - the default argument. */; + /* A closure type for a lambda in an NSDMI or default argument. + Ignore it; it will be regenerated when needed. */; else { /* Build new TYPE_FIELDS. */ diff --git a/gcc/testsuite/g++.dg/cpp1z/constexpr-lambda19.C b/gcc/testsuite/g++.dg/cpp1z/constexpr-lambda19.C new file mode 100644 index 000..a16d31c59eb --- /dev/null +++ b/gcc/testsuite/g++.dg/cpp1z/constexpr-lambda19.C @@ -0,0 +1,13 @@ +// PR c++/84098 +// { dg-options -std=c++17 } + +struct A{}; + +template < typename > +struct Test{ +static constexpr auto var = []{}; +}; + +int main(){ +(void)Test< A >::var; +}
C++ PATCH for c++/84091, ICE with local class in lambda in template
Another refinement to local class handling in determine_visibility. If we're looking at a local class inside a lambda, it thinks it is in a template but the lambda isn't a template instantiation. So look for the enclosing template context to use. Tested x86_64-pc-linux-gnu, applying to trunk. commit cc2f223a50dc6477844c7658403fb4be896036e5 Author: Jason Merrill Date: Tue Jan 30 07:45:01 2018 -0500 PR c++/84091 - ICE with local class in lambda in template. * decl2.c (determine_visibility): Look for outer containing template instantiation. diff --git a/gcc/cp/decl2.c b/gcc/cp/decl2.c index ef7e6de41c3..2da6f9023c5 100644 --- a/gcc/cp/decl2.c +++ b/gcc/cp/decl2.c @@ -2418,6 +2418,16 @@ determine_visibility (tree decl) by that. */ if (DECL_LANG_SPECIFIC (fn) && DECL_USE_TEMPLATE (fn)) template_decl = fn; + else if (template_decl) + { + /* FN must be a regenerated lambda function, since they don't +have template arguments. Find a containing non-lambda +template instantiation. */ + tree ctx = fn; + while (ctx && !get_template_info (ctx)) + ctx = get_containing_scope (ctx); + template_decl = ctx; + } } else if (VAR_P (decl) && DECL_TINFO_P (decl) && flag_visibility_ms_compat) diff --git a/gcc/testsuite/g++.dg/cpp0x/lambda/lambda-local1.C b/gcc/testsuite/g++.dg/cpp0x/lambda/lambda-local1.C new file mode 100644 index 000..a2dd350369a --- /dev/null +++ b/gcc/testsuite/g++.dg/cpp0x/lambda/lambda-local1.C @@ -0,0 +1,13 @@ +// PR c++/84091 +// { dg-do compile { target c++11 } } + +template < typename > void f () +{ + [] { struct A {} a; } (); +} + +int main () +{ + f < int > (); + return 0; +}
Re: Fix ipa-inline ICE
Hi Jan, > this patch fixed ICE that was introduced by Richard Standiford's change to > reorder > can and want_inline predicates to reduce amount of work done to verify > inlining limits. > This bypasses check that the function is optimized that makes inliner to ICE > because > function summary is missing. > > This patch breaks out the expensive limits checking from can predicate to new > one > which makes code bit more convoluted but I hope to clean things up next > stage1. [...] > * g++.dg/torture/pr81360.C: New testcase the new testcase comes out UNRESOLVED everywhere: pr81360.C -O0 scan-ipa-dump icf "Equal symbols: 0" +UNRESOLVED: g++.dg/torture/pr81360.C -O1 scan-ipa-dump icf "Equal symbols: 0" +UNRESOLVED: g++.dg/torture/pr81360.C -O2 scan-ipa-dump icf "Equal symbols: 0" +UNRESOLVED: g++.dg/torture/pr81360.C -O2 -flto scan-ipa-dump icf "Equal symbols: 0" +UNRESOLVED: g++.dg/torture/pr81360.C -O2 -flto -flto-partition=none scan-ipa-dump icf "Equal symbols: 0" +UNRESOLVED: g++.dg/torture/pr81360.C -O3 -g scan-ipa-dump icf "Equal symbols: 0" +UNRESOLVED: g++.dg/torture/pr81360.C -Os scan-ipa-dump icf "Equal symbols: 0" with g++.dg/torture/pr81360.C -O0 : dump file does not exist in the log. The following patch fixes that, tested with the appropriate runtest invocation. I guess this is obvious? Rainer -- - Rainer Orth, Center for Biotechnology, Bielefeld University 2018-01-30 Rainer Orth * g++.dg/torture/pr81360.C: Add -fdump-ipa-icf to dg-options. diff --git a/gcc/testsuite/g++.dg/torture/pr81360.C b/gcc/testsuite/g++.dg/torture/pr81360.C --- a/gcc/testsuite/g++.dg/torture/pr81360.C +++ b/gcc/testsuite/g++.dg/torture/pr81360.C @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options "-O2 -fno-early-inlining" } */ +/* { dg-options "-O2 -fno-early-inlining -fdump-ipa-icf" } */ template class B; template class TriaObjectAccessor;
Re: Fix ipa-inline ICE
On January 30, 2018 9:05:31 PM GMT+01:00, Rainer Orth wrote: >Hi Jan, > >> this patch fixed ICE that was introduced by Richard Standiford's >change to reorder >> can and want_inline predicates to reduce amount of work done to >verify inlining limits. >> This bypasses check that the function is optimized that makes inliner >to ICE because >> function summary is missing. >> >> This patch breaks out the expensive limits checking from can >predicate to new one >> which makes code bit more convoluted but I hope to clean things up >next stage1. >[...] >> * g++.dg/torture/pr81360.C: New testcase > >the new testcase comes out UNRESOLVED everywhere: > >pr81360.C -O0 scan-ipa-dump icf "Equal symbols: 0" >+UNRESOLVED: g++.dg/torture/pr81360.C -O1 scan-ipa-dump icf "Equal >symbols: 0" >+UNRESOLVED: g++.dg/torture/pr81360.C -O2 scan-ipa-dump icf "Equal >symbols: 0" >+UNRESOLVED: g++.dg/torture/pr81360.C -O2 -flto scan-ipa-dump icf >"Equal symbols: 0" >+UNRESOLVED: g++.dg/torture/pr81360.C -O2 -flto -flto-partition=none > scan-ipa-dump icf "Equal symbols: 0" >+UNRESOLVED: g++.dg/torture/pr81360.C -O3 -g scan-ipa-dump icf >"Equal symbols: 0" >+UNRESOLVED: g++.dg/torture/pr81360.C -Os scan-ipa-dump icf "Equal >symbols: 0" > >with > >g++.dg/torture/pr81360.C -O0 : dump file does not exist > >in the log. The following patch fixes that, tested with the >appropriate >runtest invocation. > >I guess this is obvious? Yes. > Rainer
C++ PATCH to fix ICE in generic lambda with user-defined conversion (PR c++/84125)
This testcase breaks since r256550 because we end up trying to build_address of a CONSTRUCTOR, but that doesn't work because we hit gcc_checking_assert (TREE_CODE (t) != CONSTRUCTOR); finish_static_assert gets {} as 'condition'. In the testcase we have a user-defined conversion, so {} should be turned to false, via perform_implicit_conversion_flags -> ... -> build_user_type_conversion_1, but that crashes as explained above. This only happens while processing generic lambda because processing_template_decl is 1, so finish_compound_literal returns {} instead of a TARGET_EXPR. Part of r256550 was this change: @@ -8640,9 +8642,9 @@ finish_static_assert (tree condition, tree message, location_t location, } /* Fold the expression and convert it to a boolean value. */ - condition = instantiate_non_dependent_expr (condition); - condition = cp_convert (boolean_type_node, condition, tf_warning_or_error); - condition = maybe_constant_value (condition); + condition = perform_implicit_conversion_flags (boolean_type_node, condition, +complain, LOOKUP_NORMAL); + condition = fold_non_dependent_expr (condition); if (TREE_CODE (condition) == INTEGER_CST && !integer_zerop (condition)) /* Do nothing; the condition is satisfied. */ where instantiate_non_dependent_expr turned {} into TARGET_EXPR , which we can process just fine. So perhaps we need to put the call back. Bootstrapped/regtested on x86_64-linux, ok for trunk? 2018-01-30 Marek Polacek PR c++/84125 * semantics.c (finish_static_assert): Call instantiate_non_dependent_expr. * g++.dg/cpp1y/lambda-generic-84125.C: New test. diff --git gcc/cp/semantics.c gcc/cp/semantics.c index b758051965e..38829e327c7 100644 --- gcc/cp/semantics.c +++ gcc/cp/semantics.c @@ -8642,6 +8642,7 @@ finish_static_assert (tree condition, tree message, location_t location, } /* Fold the expression and convert it to a boolean value. */ + condition = instantiate_non_dependent_expr (condition); condition = perform_implicit_conversion_flags (boolean_type_node, condition, complain, LOOKUP_NORMAL); condition = fold_non_dependent_expr (condition); diff --git gcc/testsuite/g++.dg/cpp1y/lambda-generic-84125.C gcc/testsuite/g++.dg/cpp1y/lambda-generic-84125.C index e69de29bb2d..8bf6a09652e 100644 --- gcc/testsuite/g++.dg/cpp1y/lambda-generic-84125.C +++ gcc/testsuite/g++.dg/cpp1y/lambda-generic-84125.C @@ -0,0 +1,10 @@ +// PR c++/84125 +// { dg-do compile { target c++14 } } + +struct X { constexpr operator bool() const { return true; } }; + +int main(){ +[](auto) { +static_assert(X{}, ""); +}; +} Marek
Fix gnat.dg/lto20.adb XPASS
This patch 2018-01-30 Jan Hubicka gcc: PR lto/83954 * lto-symtab.c (warn_type_compatibility_p): Silence false positive for type match warning on arrays of pointers. gcc/testsuite: PR lto/83954 * gcc.dg/lto/pr83954.h: New testcase. * gcc.dg/lto/pr83954_0.c: New testcase. * gcc.dg/lto/pr83954_1.c: New testcase. (which I didn't see posted on gcc-patches yet) seems to have caused +XPASS: gnat.dg/lto20.adb (test for excess errors) (seen on i386-pc-solaris2.11 and sparc-sun-solaris2.11). The original warning was /vol/gcc/src/hg/trunk/local/gcc/testsuite/gnat.dg/lto20_pkg.ads:7:13: warning: type of 'lto20_pkg__proc' does not match original declaration [-Wlto-type-mismatch] so just removing the dg-excess-errors seems the right thing to do. Tested with the appropriate runtest invocation on sparc-sun-solaris2.11. Ok for mainline? Rainer -- - Rainer Orth, Center for Biotechnology, Bielefeld University 2018-01-30 Rainer Orth PR lto/83954 * gnat.dg/lto20.adb: Remove dg-excess-errors. diff --git a/gcc/testsuite/gnat.dg/lto20.adb b/gcc/testsuite/gnat.dg/lto20.adb --- a/gcc/testsuite/gnat.dg/lto20.adb +++ b/gcc/testsuite/gnat.dg/lto20.adb @@ -1,6 +1,5 @@ -- { dg-do run } -- { dg-options "-flto" { target lto } } --- { dg-excess-errors "does not match original declaration" } with Lto20_Pkg;
patch to fix PR84112
The following patch fixes https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84112 The patch was successfully bootstrapped and tested on x86-64 and ppc64. Committed as rev. 257204. Index: ChangeLog === --- ChangeLog (revision 257203) +++ ChangeLog (working copy) @@ -1,3 +1,9 @@ +2018-01-30 Vladimir Makarov + + PR target/84112 + * lra-constraints.c (curr_insn_transform): Process AND in the + address. + 2018-01-30 Jakub Jelinek PR rtl-optimization/83986 Index: lra-constraints.c === --- lra-constraints.c (revision 257203) +++ lra-constraints.c (working copy) @@ -4216,7 +4216,17 @@ curr_insn_transform (bool check_only_p) GET_MODE_SIZE (GET_MODE (op))); else if (get_reload_reg (OP_IN, Pmode, *loc, rclass, FALSE, "offsetable address", &new_reg)) - lra_emit_move (new_reg, *loc); + { + rtx addr = *loc; + enum rtx_code code = GET_CODE (addr); + + if (code == AND && CONST_INT_P (XEXP (addr, 1))) + /* (and ... (const_int -X)) is used to align to X bytes. */ + addr = XEXP (*loc, 0); + lra_emit_move (new_reg, addr); + if (addr != *loc) + emit_move_insn (new_reg, gen_rtx_AND (GET_MODE (new_reg), new_reg, XEXP (*loc, 1))); + } before = get_insns (); end_sequence (); *loc = new_reg; Index: testsuite/ChangeLog === --- testsuite/ChangeLog (revision 257203) +++ testsuite/ChangeLog (working copy) @@ -1,3 +1,8 @@ +2018-01-30 Vladimir Makarov + + PR target/84112 + * pr84112.c: New. + 2018-01-30 Jakub Jelinek PR rtl-optimization/83986 Index: testsuite/gcc.target/powerpc/pr84112.c === --- testsuite/gcc.target/powerpc/pr84112.c (nonexistent) +++ testsuite/gcc.target/powerpc/pr84112.c (working copy) @@ -0,0 +1,33 @@ +/* { dg-do compile { target powerpc*-*-* } }*/ +/* { dg-options "-mcpu=power8 -O3 -fstack-protector-strong -fpic" } */ + +char *b; +int c, d, e, f; + +void +foo (char *h, int k, int l, int m, int j, int q) +{ + char *i = b; + int n = d; + int s = e; + while (c) +{ + for (; j <= 0; j += 12) + { + i[j] = n & k - h[j] >> 31 | q & ~(k - h[j] >> 31); + i[j + 1] = n & l - h[j + 1] >> 31 | q & ~(l - h[j + 1] >> 31); + i[j + 2] = n & m - h[j + 2] >> 31 | s & ~(m - h[j + 2] >> 31); + i[j + 3] = n & k - h[j + 3] >> 31 | q & ~(k - h[j + 3] >> 31); + i[j + 4] = n & l - h[j + 4] >> 31 | q & ~(l - h[j + 4] >> 31); + i[j + 5] = n & m - h[j + 5] >> 31 | s & ~(m - h[j + 5] >> 31); + i[j + 6] = n & k - h[j + 6] >> 31 | q & ~(k - h[j + 6] >> 31); + i[j + 7] = n & l - h[j + 7] >> 31 | q & ~(l - h[j + 7] >> 31); + i[j + 8] = n & m - h[j + 8] >> 31 | s & ~(m - h[j + 8] >> 31); + i[j + 9] = n & k - h[j + 9] >> 31 | q & ~(k - h[j + 9] >> 31); + i[j + 10] = n & l - h[j + 10] >> 31 | q & ~(l - h[j + 10] >> 31); + i[j + 11] = n & m - h[j + 11] >> 31 | s & ~(m - h[j + 11] >> 31); + } + while (j < f) + ; +} +}
Re: patch to fix PR84112
Hi Vladimir, > The following patch fixes > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84112 > > The patch was successfully bootstrapped and tested on x86-64 and ppc64. > > Committed as rev. 257204. [...] > Index: testsuite/ChangeLog > === > --- testsuite/ChangeLog (revision 257203) > +++ testsuite/ChangeLog (working copy) > @@ -1,3 +1,8 @@ > +2018-01-30 Vladimir Makarov > + > + PR target/84112 > + * pr84112.c: New. this should be gcc.target/powerpc/pr84112.c instead. > Index: testsuite/gcc.target/powerpc/pr84112.c Rainer -- - Rainer Orth, Center for Biotechnology, Bielefeld University
Re: patch to fix PR84112
On 01/30/2018 03:37 PM, Rainer Orth wrote: Hi Vladimir, The following patch fixes https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84112 The patch was successfully bootstrapped and tested on x86-64 and ppc64. Committed as rev. 257204. [...] Index: testsuite/ChangeLog === --- testsuite/ChangeLog (revision 257203) +++ testsuite/ChangeLog (working copy) @@ -1,3 +1,8 @@ +2018-01-30 Vladimir Makarov + + PR target/84112 + * pr84112.c: New. this should be gcc.target/powerpc/pr84112.c instead. Thank you for fixing my changelog entry.
[patch, fortran, committed] Fix PR 84133, regression with matmul
Hello world, I have just committed the attached patch as obvious, after regtesting. Regards Thomas 2018-01-30 Thomas Koenig PR fortran/84133 * frontend-passes (matmul_to_var_expr): Return early if in association list. (inline_matmul_assign): Likewise. 2018-01-30 Thomas Koenig PR fortran/84133 * gfortran.dg/inline_matmul_21.f90: New test case. Index: frontend-passes.c === --- frontend-passes.c (Revision 257131) +++ frontend-passes.c (Arbeitskopie) @@ -2763,7 +2763,7 @@ matmul_to_var_expr (gfc_expr **ep, int *walk_subtr return 0; if (forall_level > 0 || iterator_level > 0 || in_omp_workshare - || in_where) + || in_where || in_assoc_list) return 0; /* Check if this is already in the form c = matmul(a,b). */ @@ -3728,7 +3728,7 @@ inline_matmul_assign (gfc_code **c, int *walk_subt if (co->op != EXEC_ASSIGN) return 0; - if (in_where) + if (in_where || in_assoc_list) return 0; /* The BLOCKS generated for the temporary variables and FORALL don't ! { dg-do compile } ! { dg-additional-options "-ffrontend-optimize" } ! PR 84133 - this used to ICE. Original test case by ! Gerhard Steinmetz. program p real :: x(2,2) = 1.0 real :: z(2,2) associate (y => matmul(x,x)) z = y end associate print *, z end
Re: [SFN+LVU+IEPM v4 7/9] [LVU] Introduce location views
Alexandre Oliva writes: > On Jan 26, 2018, Jakub Jelinek wrote: > >> Having to tweak debug info consumers so that they treat DW_LLE_* of 9 >> one way for .debug_loclist of version 5 and another way for .debug_loclist >> of version 6 isn't a good idea. > > Maybe we don't have to do that. The reason I implemented the proposed > format was to have a reference implementation, but since it's an > extension to a non-extensible part of DWARF5, it's hard to justify > consumer implementations for that. > >> Why don't you emit the DW_LLE_GNU_view_pair stuff in .debug_loclists >> already with -gdwarf-5, if needed paired without some TU attribute that says >> that views are emitted? > > Because that would make the loclists unreadable for any DWARF5-compliant > consumer. > > >> Haven't looked for it it in the coding standards, but it is something >> I've been asked to do in my code by various reviewers over the years and >> what I've been asking others in my patch reviews from others too. > > Thanks. It was the first (well, second; you'd requested similar changes > another time before) I heard of that. I'm still unconvinced the way I > used is not compliant, but I'll keep that in mind. Hopefully I won't > run into someone who insists the other one (the one you reject) is the > only correct one ;-) > >> I feel strongly about indenting stuff right, > > I'm with you on that, we just have different ideas about what "right" > stands for ;-) > >> which if it wouldn't fit would be >> return verylongcondition >> && otherlongcondition__; >> rather than >> return verylongcondition >> && otherlongcondition__; >> or similar, it would surprise me if it is not in the coding standard. > > I agree neither of these two forms is correct. > > But it is my understanding that both of the following are correct: > > return (verylongcondition > && otherlongcondition__); > > return verylongcondition > && otherlongcondition__; > > The first, because the parenthesized expression is continued with > indentation to match the parenthesis, the second because the return > statement is continued with the correct indentation for the continuation > of a statement. Thought it had to be the first. When they talk about indenting leading operators, the conventions say: Insert extra parentheses so that Emacs will indent the code properly. which at least implies that not inserting parantheses and indenting by two spaces isn't "properly". And I think most GCC code does stick to that (i.e. use the bracketed form). Thanks, Richard
Re: Link with correct values-*.o files on Solaris (PR target/40411)
Hi Jonathan, > On 30/01/18 15:51 +0100, Rainer Orth wrote: >>Hi Joseph, >> >>> On Tue, 30 Jan 2018, Rainer Orth wrote: >>> So it seems the following snippet #define STARTFILE_ARCH_SPEC \ [...] %{std=c9*|std=iso9899\\:199409|std=c1*:values-Xc.o%s; :values-Xa.o%s} \ seems like the right thing to do, as you said. >>> >>> That version would need updating when we add -std=c2x (once there's a C2x >>> working draft with features being added to it). If you use std=c* instead >>> of separate std=c9* and std=c1* you'd avoid needing such a change - but >>> then of course it would cover -std=c++* for C++. >> >>I know, that why I used the current form. Admittedly it's a maintenance >>problem (likely to be forgotten in the future). If we add in -ansi, we >>can just as well add -std=c++* (thus -std=c*), too. >> >>I guess that's the safest option overall, unless Jonathan has objections >>from a C++ standards perspective. > > No objections from me, I'm happy either way. thanks. So I've opted to include -ansi for C++ backwards compatibility and thus -std=c* for ease of maintenance. Here's what I've commited after another round of testing. Rainer -- - Rainer Orth, Center for Biotechnology, Bielefeld University 2018-01-29 Rainer Orth PR target/40411 * config/sol2.h (STARTFILE_ARCH_SPEC): Use -std=c*, -std=iso9899:199409 instead of -pedantic to select values-Xc.o. # HG changeset patch # Parent 53e957fef7f285d1a5bb84f370d5158d238b2ceb Fix use of Solaris values-Xc.o (PR target/40411) diff --git a/gcc/config/sol2.h b/gcc/config/sol2.h --- a/gcc/config/sol2.h +++ b/gcc/config/sol2.h @@ -180,7 +180,10 @@ along with GCC; see the file COPYING3. The values-X[ac].o objects set the variable _lib_version. The Studio C compilers use values-Xc.o with either -Xc or (since Studio 12.6) -pedantic to select strictly conformant ISO C behaviour, otherwise - values-Xa.o. + values-Xa.o. Since -pedantic is a diagnostic option only in GCC, we + need to specifiy the -std=c* options and -std=iso9899:199409. We + traditionally include -ansi, which affects C and C++, and also -std=c++* + for consistency. The values-xpg[46].o objects define either or both __xpg[46] variables, selecting XPG4 mode (__xpg4) and conforming C99/SUSv3 behavior (__xpg6). @@ -195,7 +198,7 @@ along with GCC; see the file COPYING3. #undef STARTFILE_ARCH_SPEC #define STARTFILE_ARCH_SPEC \ "%{!shared:%{!symbolic: \ - %{pedantic:values-Xc.o%s; :values-Xa.o%s} \ + %{ansi|std=c*|std=iso9899\\:199409:values-Xc.o%s; :values-Xa.o%s} \ %{std=c90|std=gnu90:values-xpg4.o%s; :values-xpg6.o%s}}}" #if defined(HAVE_LD_PIE) && defined(HAVE_SOLARIS_CRTS)
[Patch, Fortran, committed] fix bracket in dejagnu directive
Hi all, I have just committed an obvious patch that adds a closing bracket in a dg-options directive in the testsuite as r257208: Index: gcc/testsuite/gfortran.dg/pr68318_1.f90 === --- gcc/testsuite/gfortran.dg/pr68318_1.f90(revision 257207) +++ gcc/testsuite/gfortran.dg/pr68318_1.f90(working copy) @@ -1,5 +1,5 @@ ! { dg-do compile } -! { dg-options "-O0" +! { dg-options "-O0" } ! PR fortran/68318 ! Original code submitted by Gerhard Steinmetz ! Cheers, Janus 2018-01-30 Janus Weil * gfortran.dg/pr68318_1.f90: Add closing bracket in dejagnu directive.
Re: [C++ PATCH] Speed up inplace_merge algorithm & fix inefficient logic(PR c++/83938)
Your change looks good, don't hesitate to re-submit a proper patch with the point 1 below fix. You should add gcc-patches@gcc.gnu.org when you do so. Remaining questions will be: Copyright assignment, your change seems limited enough to avoid it but you will need an official maintainer confirmation. Stage 1: We are in a Fix-Only development phase. Your change brings such an important improvement that it could be considered as a bug fix but once again an official maintainer will have to decide. Otherwise you will have to wait for a couple of month to see you patch committed. François On 25/01/2018 23:37, chang jc wrote: Hi: 1. The __len = (__len + 1) / 2; is as you suggested, need to modify as __len = (__len == 1) ? 0 : ((__len + 1) / 2); 2. The coding gain has been shown PR c++/83938; I re-post here 21 20 19 18 17 16 0.471136 0.625695 0.767262 0.907461 1.04838 1.19508 0.340845 0.48651 0.639139 0.770133 0.898454 1.04632 it means when Merge [0, 4325376, 16777216); A is a sorted integer with 4325376 & B with 12451840 elements, total with 16M integers The proposed method has the speed up under given buffer size =, ex 2^16, 2^17, ... 2^21 in unit of sizeof(int), for example, 2^16 means given sizof(int)*64K bytes. 3. As your suggestion, _TmpBuf __buf should be rewrite. 4. It represents a fact that the intuitive idea to split from larger part is wrong. For example, if you have an input sorted array A & B, A has 8 integers & B has 24 integers. Given tmp buffer whose capacity = 4 integers. Current it tries to split from B, right? Then we have: A1 | A2 B1 | B2 B1 & B2 has 12 integers each, right? Current algorithm selects pivot as 13th integer from B, right? If the corresponding upper bound of A is 6th integer. Then it split in A1 = 5 | A2 = 3 | B1 = 12 | B2 = 12 After rotate, we have two arrays to merge [A1 = 5 | B1 = 12] & [A2 = 3 | B2 = 12] Great, [A2 = 3 | B2 = 12] can use tmp buffer to merge. Sadly, [A1 = 5 | B1 = 12] CANNOT. So we do rotate again, split & merge the two split arrays from [A1 = 5 | B1 = 12] again. But wait, if we always split from the smaller one instead of larger one. After rotate, it promises two split arrays both contain ceiling[small/2]. Since tmp buffer also allocate by starting from sizeof(small) & recursively downgrade by ceiling[small/2^(# of fail allocate)]. It means the allocated tmp buffer promises to be sufficient at the level of (# of fail allocate). Instead, you can see if split from large at level (# of fail allocate) several split array still CANNOT use tmp buffer to do buffered merge. As you know, buffered merge is far speed then (split, rotate, and merge two sub-arrays) (PR c++/83938 gives the profiling figures), the way should provide speedup. Thanks. On 24/01/2018 18:23, François Dumont wrote: Hi It sounds like a very sensitive change to make but nothing worth figures. Do you have any bench showing the importance of the gain ? At least the memory usage optimization is obvious. On 19/01/2018 10:43, chang jc wrote: Current std::inplace_merge() suffers from performance issue by inefficient logic under limited memory, It leads to performance downgrade. Please help to review it. Index: include/bits/stl_algo.h === --- include/bits/stl_algo.h(revision 256871) +++ include/bits/stl_algo.h(working copy) @@ -2437,7 +2437,7 @@ _BidirectionalIterator __second_cut = __middle; _Distance __len11 = 0; _Distance __len22 = 0; - if (__len1 > __len2) + if (__len1 < __len2) { __len11 = __len1 / 2; std::advance(__first_cut, __len11); @@ -2539,9 +2539,15 @@ const _DistanceType __len1 = std::distance(__first, __middle); const _DistanceType __len2 = std::distance(__middle, __last); + typedef _Temporary_buffer<_BidirectionalIterator, _ValueType> _TmpBuf; - _TmpBuf __buf(__first, __last); - + _BidirectionalIterator __start, __end; + if (__len1 < __len2) { +__start = __first; __end = __middle; + } else { +__start = __middle; __end = __last; + } + _TmpBuf __buf(__start, ___end); Note another optimization, we could introduce a _Temporary_buffer<> constructor in order to write: _TmpBuf __buf(std::min(__len1, __len2), __first); So that std::distance do not need to be called again. if (__buf.begin() == 0) std::__merge_without_buffer (__first, __middle, __last, __len1, __len2, __comp); Index: include/bits/stl_tempbuf.h === --- include/bits/stl_tempbuf.h(revision 256871) +++ include/bits/stl_tempbuf.h(working copy) @@ -95,7 +95,7 @@ std::nothrow)); if (__tmp != 0) return std::pair<_Tp*, ptrdiff_t>(__tmp, __len); -
[PR tree-optimization/84047] missing -Warray-bounds on an out-of-bounds index
Hi! [Note: Jakub has mentioned that missing -Warray-bounds regressions should be punted to GCC 9. I think this particular one is easy pickings, but if this and/or the rest of the -Warray-bounds regressions should be marked as GCC 9 material, please let me know so we can adjust all relevant PRs.] This is a -Warray-bounds regression that happens because the IL now has an MEM_REF instead on ARRAY_REF. Previously we had an ARRAY_REF we could diagnose: D.2720_5 = "12345678"[1073741824]; But now this is represented as: _1 = MEM[(const char *)"12345678" + 1073741824B]; I think we can just allow check_array_bounds() to handle MEM_REF's and everything should just work. The attached patch fixes both regressions mentioned in the PR. Tested on x86-64 Linux. OK? gcc/ PR tree-optimization/84047 * tree-vrp.c (search_for_addr_array): Handle ADDR_EXPRs. (check_array_bounds): Same. diff --git a/gcc/testsuite/gcc.dg/pr84047.c b/gcc/testsuite/gcc.dg/pr84047.c new file mode 100644 index 000..9f9c6ca4a9e --- /dev/null +++ b/gcc/testsuite/gcc.dg/pr84047.c @@ -0,0 +1,18 @@ +/* { dg-compile } */ +/* { dg-options "-O2 -Warray-bounds" } */ + +int f (void) +{ + const char *s = "12345678"; + int i = 1U << 30; + return s[i]; /* { dg-warning "array bounds" } */ +} + +char a[8]; + +int g (void) +{ + const char *s = a + 4; + int i = 8; + return s[i]; /* { dg-warning "array bounds" } */ +} diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c index 3af81f70872..07d39bab8a4 100644 --- a/gcc/tree-vrp.c +++ b/gcc/tree-vrp.c @@ -4922,14 +4922,15 @@ void vrp_prop::search_for_addr_array (tree t, location_t location) { /* Check each ARRAY_REFs in the reference chain. */ - do -{ - if (TREE_CODE (t) == ARRAY_REF) - check_array_ref (location, t, true /*ignore_off_by_one*/); + if (TREE_CODE (t) == ADDR_EXPR) +do + { + if (TREE_CODE (t) == ARRAY_REF) + check_array_ref (location, t, true /*ignore_off_by_one*/); - t = TREE_OPERAND (t, 0); -} - while (handled_component_p (t)); + t = TREE_OPERAND (t, 0); + } +while (handled_component_p (t)); if (TREE_CODE (t) == MEM_REF && TREE_CODE (TREE_OPERAND (t, 0)) == ADDR_EXPR @@ -5012,7 +5013,9 @@ check_array_bounds (tree *tp, int *walk_subtree, void *data) if (TREE_CODE (t) == ARRAY_REF) vrp_prop->check_array_ref (location, t, false /*ignore_off_by_one*/); - else if (TREE_CODE (t) == ADDR_EXPR) + else if (TREE_CODE (t) == ADDR_EXPR + /* Handle (MEM_REF (ADDR_EXPR STRING_CST) INTEGER_CST). */ + || TREE_CODE (t) == MEM_REF) { vrp_prop->search_for_addr_array (t, location); *walk_subtree = FALSE;
Re: [PATCH] PR 78534 Reinstate better string copy algorithm
OK. Thought I already sent OK, but must of got side track with work. -- steve On Tue, Jan 30, 2018 at 06:25:31PM +0200, Janne Blomqvist wrote: > PING > > On Tue, Jan 23, 2018 at 3:31 PM, Janne Blomqvist > wrote: > > As part of the change to larger character lengths, the string copy > > algorithm was temporarily pessimized to get around some spurious > > -Wstringop-overflow warnings. Having tried a number of variations of > > this algorithm I have managed to get it down to one spurious warning, > > only with -O1 optimization, in the testsuite. This patch reinstates > > the optimized variant and modifies this one testcase to ignore the > > warning. > > > > Regtested on x86_64-pc-linux-gnu, Ok for trunk? > > > > gcc/fortran/ChangeLog: > > > > 2018-01-23 Janne Blomqvist > > > > PR 78534 > > * trans-expr.c (fill_with_spaces): Use memset instead of > > generating loop. > > (gfc_trans_string_copy): Improve opportunity to use builtins with > > constant lengths. > > > > gcc/testsuite/ChangeLog: > > > > 2018-01-23 Janne Blomqvist > > > > PR 78534 > > * gfortran.dg/allocate_deferred_char_scalar_1.f03: Prune > > -Wstringop-overflow warnings due to spurious warning with -O1. > > * gfortran.dg/char_cast_1.f90: Update dump scan pattern. > > * gfortran.dg/transfer_intrinsic_1.f90: Likewise. > > --- > > gcc/fortran/trans-expr.c | 52 > > -- > > .../allocate_deferred_char_scalar_1.f03| 2 + > > gcc/testsuite/gfortran.dg/char_cast_1.f90 | 6 +-- > > gcc/testsuite/gfortran.dg/transfer_intrinsic_1.f90 | 2 +- > > 4 files changed, 35 insertions(+), 27 deletions(-) > > > > diff --git a/gcc/fortran/trans-expr.c b/gcc/fortran/trans-expr.c > > index e90036f..4da6cee 100644 > > --- a/gcc/fortran/trans-expr.c > > +++ b/gcc/fortran/trans-expr.c > > @@ -6407,8 +6407,6 @@ fill_with_spaces (tree start, tree type, tree size) > >tree i, el, exit_label, cond, tmp; > > > >/* For a simple char type, we can call memset(). */ > > - /* TODO: This code does work and is potentially more efficient, but > > - causes spurious -Wstringop-overflow warnings. > >if (compare_tree_int (TYPE_SIZE_UNIT (type), 1) == 0) > > return build_call_expr_loc (input_location, > > builtin_decl_explicit (BUILT_IN_MEMSET), > > @@ -6416,7 +6414,6 @@ fill_with_spaces (tree start, tree type, tree size) > > build_int_cst (gfc_get_int_type > > (gfc_c_int_kind), > >lang_hooks.to_target_charset (' > > ')), > > fold_convert (size_type_node, size)); > > - */ > > > >/* Otherwise, we use a loop: > > for (el = start, i = size; i > 0; el--, i+= TYPE_SIZE_UNIT (type)) > > @@ -6522,11 +6519,20 @@ gfc_trans_string_copy (stmtblock_t * block, tree > > dlength, tree dest, > > > >/* The string copy algorithm below generates code like > > > > - if (dlen > 0) { > > - memmove (dest, src, min(dlen, slen)); > > - if (slen < dlen) > > - memset(&dest[slen], ' ', dlen - slen); > > - } > > + if (destlen > 0) > > + { > > + if (srclen < destlen) > > + { > > + memmove (dest, src, srclen); > > + // Pad with spaces. > > + memset (&dest[srclen], ' ', destlen - srclen); > > + } > > + else > > + { > > + // Truncate if too long. > > + memmove (dest, src, destlen); > > + } > > + } > >*/ > > > >/* Do nothing if the destination length is zero. */ > > @@ -6555,21 +6561,16 @@ gfc_trans_string_copy (stmtblock_t * block, tree > > dlength, tree dest, > >else > > src = gfc_build_addr_expr (pvoid_type_node, src); > > > > - /* First do the memmove. */ > > - tmp2 = fold_build2_loc (input_location, MIN_EXPR, TREE_TYPE (dlen), dlen, > > - slen); > > - tmp2 = build_call_expr_loc (input_location, > > - builtin_decl_explicit (BUILT_IN_MEMMOVE), > > - 3, dest, src, > > - fold_convert (size_type_node, tmp2)); > > - stmtblock_t tmpblock2; > > - gfc_init_block (&tmpblock2); > > - gfc_add_expr_to_block (&tmpblock2, tmp2); > > - > > - /* If the destination is longer, fill the end with spaces. */ > > + /* Truncate string if source is too long. */ > >cond2 = fold_build2_loc (input_location, LT_EXPR, logical_type_node, > > slen, > >dlen); > > > > + /* Copy and pad with spaces. */ > > + tmp3 = build_call_expr_loc (input_location, > > + builtin_decl_explicit (BUILT_IN_MEMMOVE), > > + 3, dest, src, > > + fold_convert (size_type_node, slen)); > > + > >/* Wstringop-overflow appears a
[patch, fortran, committed] Fix PR 84134
Hello world, another obvious fix, this time for an ice-on-invalid regression, committed after regression-testing. Regards Thomas 2017-01-30 Thomas Koenig PR fortran/84134 * array.c (gfc_ref_dimen_size): Whitespace fixes. If stride is zero, return false. 2017-01-30 Thomas Koenig PR fortran/84134 Index: array.c === --- array.c (Revision 257131) +++ array.c (Arbeitskopie) @@ -2245,9 +2245,12 @@ gfc_ref_dimen_size (gfc_array_ref *ar, int dimen, else { stride_expr = gfc_copy_expr(ar->stride[dimen]); + if(!gfc_simplify_expr(stride_expr, 1)) gfc_internal_error("Simplification error"); - if (stride_expr->expr_type != EXPR_CONSTANT) + + if (stride_expr->expr_type != EXPR_CONSTANT + || mpz_cmp_ui (stride_expr->value.integer, 0) == 0) { mpz_clear (stride); return false; ! { dg-do compile } ! PR fortran/84134 - this used to ICE. ! Test case by Gerhard Steinmetz program p integer :: i, x(3) data (x(i+1:i+2:i),i=0,1) /1,2,3/ ! { dg-error "Nonconstant array section" } end
Re: Fix ipa-inline ICE
On Tue, Jan 30, 2018 at 09:05:31PM +0100, Rainer Orth wrote: > > this patch fixed ICE that was introduced by Richard Standiford's change to > > reorder > > can and want_inline predicates to reduce amount of work done to verify > > inlining limits. > > This bypasses check that the function is optimized that makes inliner to > > ICE because > > function summary is missing. > > > > This patch breaks out the expensive limits checking from can predicate to > > new one > > which makes code bit more convoluted but I hope to clean things up next > > stage1. > [...] > > * g++.dg/torture/pr81360.C: New testcase > > the new testcase comes out UNRESOLVED everywhere: > > pr81360.C -O0 scan-ipa-dump icf "Equal symbols: 0" > +UNRESOLVED: g++.dg/torture/pr81360.C -O1 scan-ipa-dump icf "Equal > symbols: 0" > +UNRESOLVED: g++.dg/torture/pr81360.C -O2 scan-ipa-dump icf "Equal > symbols: 0" > +UNRESOLVED: g++.dg/torture/pr81360.C -O2 -flto scan-ipa-dump icf "Equal > symbols: 0" > +UNRESOLVED: g++.dg/torture/pr81360.C -O2 -flto -flto-partition=none > scan-ipa-dump icf "Equal symbols: 0" > +UNRESOLVED: g++.dg/torture/pr81360.C -O3 -g scan-ipa-dump icf "Equal > symbols: 0" > +UNRESOLVED: g++.dg/torture/pr81360.C -Os scan-ipa-dump icf "Equal > symbols: 0" > > with > > g++.dg/torture/pr81360.C -O0 : dump file does not exist > > in the log. The following patch fixes that, tested with the appropriate > runtest invocation. > > I guess this is obvious? Yes, but I wonder why the test is in g++.dg/torture/, that makes no sense for a test with -O2 in dg-options which overrides all the optimization options it cycles through. Move it into g++.dg/ipa/ instead? Jakub
[PATCH, rs6000] Fix VSX/altivec assumptions in altivec-13.c testcase
Hi, Some VSX function has previosly crept into the altivec-13 testcase. In particular, anything 'vector long long' and 'vector double', causing issues on platforms without VSX support. So, break that content out into it's own testcase, allowing better testcase coverage. Sniff-tested on Power6, altivec-13.c test now passes there, where it used to fail. Still plan to do a sniff-test on AIX, but have expectations that this should help. OK for trunk? Thanks, -Will [testsuite] 2018-01-30 Will Schmidt * gcc.target/powerpc/altivec-13.c: Remove VSX-requiring built-ins. * gcc.target/powerpc/vsx-13.c: New. diff --git a/gcc/testsuite/gcc.target/powerpc/altivec-13.c b/gcc/testsuite/gcc.target/powerpc/altivec-13.c index 2315f6e..31ff509 100644 --- a/gcc/testsuite/gcc.target/powerpc/altivec-13.c +++ b/gcc/testsuite/gcc.target/powerpc/altivec-13.c @@ -6,54 +6,47 @@ /* This test case exercises intrinsic/argument combinations that, while not in the Motorola AltiVec PIM, have nevertheless crept into the AltiVec vernacular over the years. */ +/* Tests requiring VSX support (vector long long and vector double) have + been moved over to vsx-13.c. */ + #include -void foo (void) +void foo (void) { vector bool int boolVec1 = (vector bool int) vec_splat_u32(3); vector bool short boolVec2 = (vector bool short) vec_splat_u16(3); vector bool char boolVec3 = (vector bool char) vec_splat_u8(3); vector signed char vsc1, vsc2, vscz; vector unsigned char vuc1, vuc2, vucz; vector signed short int vssi1, vssi2, vssiz; vector signed int vsi1, vsi2, vsiz; vector unsigned int vui1, vui2, vuiz; vector unsigned short int vusi1, vusi2, vusiz; - vector bool long long vubll1, vubll2, vubllz; - vector signed int long long vsill1, vsill2, vsillz; - vector unsigned int long long vuill1, vuill2, vuillz; vector pixel vp1, vp2, vpz; vector float vf1, vf2, vfz; - vector double vd1, vd2, vdz; boolVec1 = vec_sld( boolVec1, boolVec1, 4 ); boolVec2 = vec_sld( boolVec2, boolVec2, 2 ); boolVec3 = vec_sld( boolVec3, boolVec3, 1 ); vscz = vec_sld( vsc1, vsc2, 1 ); vucz = vec_sld( vuc1, vuc2, 1 ); vsiz = vec_sld( vsi1, vsi2, 1 ); vuiz = vec_sld( vui1, vui2, 1 ); - vubllz = vec_sld( vubll1, vubll2, 1 ); - vsillz = vec_sld( vsill1, vsill2, 1 ); - vuillz = vec_sld( vuill1, vuill2, 1 ); vssiz = vec_sld( vssi1, vssi2, 1 ); vusiz = vec_sld( vusi1, vusi2, 1 ); vfz = vec_sld( vf1, vf2, 1 ); - vdz = vec_sld( vd1, vd2, 1 ); vpz = vec_sld( vp1, vp2, 1 ); vucz = vec_srl(vuc1, vuc2); vsiz = vec_srl(vsi1, vuc2); vuiz = vec_srl(vui1, vuc2); - vsillz = vec_srl(vsill1, vuc2); - vuillz = vec_srl(vuill1, vuc2); vpz = vec_srl(vp1, vuc2); vssiz = vec_srl(vssi1, vuc2); vusiz = vec_srl(vusi1, vuc2); vscz = vec_sro(vsc1, vsc2); @@ -62,14 +55,10 @@ void foo (void) vucz = vec_sro(vuc1, vuc2); vsiz = vec_sro(vsi1, vsc2); vsiz = vec_sro(vsi1, vuc2); vuiz = vec_sro(vui1, vsc2); vuiz = vec_sro(vui1, vuc2); - vsillz = vec_sro(vsill1, vsc2); - vsillz = vec_sro(vsill1, vuc2); - vuillz = vec_sro(vuill1, vsc2); - vuillz = vec_sro(vuill1, vuc2); vpz = vec_sro(vp1, vsc2); vpz = vec_sro(vp1, vuc2); vssiz = vec_sro(vssi1, vsc2); vssiz = vec_sro(vssi1, vuc2); vusiz = vec_sro(vusi1, vsc2); @@ -81,8 +70,8 @@ void foo (void) /* Expected results: vec_sld vsldoi vec_srl vsr vec_sro vsro */ -/* { dg-final { scan-assembler-times "vsldoi" 15 } } */ -/* { dg-final { scan-assembler-times "vsr " 8 } } */ -/* { dg-final { scan-assembler-times "vsro" 20 } } */ +/* { dg-final { scan-assembler-times "vsldoi" 11 } } */ +/* { dg-final { scan-assembler-times "vsr " 6 } } */ +/* { dg-final { scan-assembler-times "vsro" 16 } } */ diff --git a/gcc/testsuite/gcc.target/powerpc/vsx-13.c b/gcc/testsuite/gcc.target/powerpc/vsx-13.c new file mode 100644 index 000..5b4eb68 --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/vsx-13.c @@ -0,0 +1,42 @@ +/* { dg-do compile { target powerpc*-*-* } } */ +/* { dg-require-effective-target powerpc_vsx_ok } */ +/* { dg-options "-mvsx" } */ + +/* Variations of tests that require VSX support. This is a variation of + the altivec-13.c testcase. */ + +#include + +void foo (void) +{ + + vector signed char vsc1, vsc2, vscz; + vector unsigned char vuc1, vuc2, vucz; + vector bool long long vubll1, vubll2, vubllz; + vector signed int long long vsill1, vsill2, vsillz; + vector unsigned int long long vuill1, vuill2, vuillz; + vector double vd1, vd2, vdz; + + vubllz = vec_sld( vubll1, vubll2, 1 ); + vsillz = vec_sld( vsill1, vsill2, 1 ); + vuillz = vec_sld( vuill1, vuill2, 1 ); + + vsillz = vec_srl(vsill1, vuc2); + vuillz = vec_srl(vuill1, vuc2); + + vsillz = vec_sro(vsill1, vsc2); + vsillz = vec_sro(vsill1, vuc2); + vuillz = vec_sro(vuill1, vsc2); + vuillz = vec_sro(vuill1, vuc2); + + vdz = vec_sld( vd1, vd2, 1 );
[PATCH] Mark falign-*= as Optimization (PR c/84100)
Hi! While -falign-{functions,jumps,labels,loops} are marked Optimization, -falign-{functions,jumps,labels,loops}= are not. They use the same Var(), for that it makes no difference, but it means we reject them in optimize attribute starting with r237174. Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2018-01-30 Jakub Jelinek PR c/84100 * common.opt (falign-functions=, falign-jumps=, falign-labels=, falign-loops=): Add Optimization flag. * gcc.dg/pr84100.c: New test. --- gcc/common.opt.jj 2018-01-18 21:11:58.664207128 +0100 +++ gcc/common.opt 2018-01-30 14:53:25.091842202 +0100 @@ -954,7 +954,7 @@ Common Report Var(align_functions,0) Opt Align the start of functions. falign-functions= -Common RejectNegative Joined UInteger Var(align_functions) +Common RejectNegative Joined UInteger Var(align_functions) Optimization flimit-function-alignment Common Report Var(flag_limit_function_alignment) Optimization Init(0) @@ -964,21 +964,21 @@ Common Report Var(align_jumps,0) Optimiz Align labels which are only reached by jumping. falign-jumps= -Common RejectNegative Joined UInteger Var(align_jumps) +Common RejectNegative Joined UInteger Var(align_jumps) Optimization falign-labels Common Report Var(align_labels,0) Optimization UInteger Align all labels. falign-labels= -Common RejectNegative Joined UInteger Var(align_labels) +Common RejectNegative Joined UInteger Var(align_labels) Optimization falign-loops Common Report Var(align_loops,0) Optimization UInteger Align the start of loops. falign-loops= -Common RejectNegative Joined UInteger Var(align_loops) +Common RejectNegative Joined UInteger Var(align_loops) Optimization fargument-alias Common Ignore --- gcc/testsuite/gcc.dg/pr84100.c.jj 2018-01-30 15:01:59.243703278 +0100 +++ gcc/testsuite/gcc.dg/pr84100.c 2018-01-30 15:00:30.569726685 +0100 @@ -0,0 +1,14 @@ +/* PR c/84100 */ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ + +void bar (void); + +__attribute__((optimize ("align-loops=16", "align-jumps=16", +"align-labels=16", "align-functions=16"))) +void +foo (void) +{ /* { dg-bogus "bad option" } */ + for (int i = 0; i < 1024; ++i) +bar (); +} Jakub
[PATCH] Fix -traditional-cpp preprocessing ICE (PR preprocessor/69869)
Hi! This restores GCC 3.3 behavior on this testcase, before 3.4 started ICEing on it when it started to use the common block comment skipping code for -traditional-cpp and just a simple function for macro block comments has been added. Even the macro block comments can be not properly terminated and we shouldn't crash on that, just diagnose it. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2018-01-30 Jakub Jelinek PR preprocessor/69869 * traditional.c (skip_macro_block_comment): Return bool, true if the macro block comment is unterminated. (copy_comment): Use return value from skip_macro_block_comment instead of always false. * gcc.dg/cpp/trad/pr69869.c: New test. --- libcpp/traditional.c.jj 2018-01-03 10:42:55.965763452 +0100 +++ libcpp/traditional.c2018-01-30 17:31:00.251952284 +0100 @@ -120,7 +120,7 @@ check_output_buffer (cpp_reader *pfile, /* Skip a C-style block comment in a macro as a result of -CC. Buffer->cur points to the initial asterisk of the comment. */ -static void +static bool skip_macro_block_comment (cpp_reader *pfile) { const uchar *cur = pfile->buffer->cur; @@ -131,10 +131,15 @@ skip_macro_block_comment (cpp_reader *pf /* People like decorating comments with '*', so check for '/' instead for efficiency. */ - while(! (*cur++ == '/' && cur[-2] == '*') ) -; + while (! (*cur++ == '/' && cur[-2] == '*')) +if (cur[-1] == '\n') + { + pfile->buffer->cur = cur - 1; + return true; + } pfile->buffer->cur = cur; + return false; } /* CUR points to the asterisk introducing a comment in the current @@ -158,7 +163,7 @@ copy_comment (cpp_reader *pfile, const u buffer->cur = cur; if (pfile->context->prev) -unterminated = false, skip_macro_block_comment (pfile); +unterminated = skip_macro_block_comment (pfile); else unterminated = _cpp_skip_block_comment (pfile); --- gcc/testsuite/gcc.dg/cpp/trad/pr69869.c.jj 2018-01-30 17:41:41.309544998 +0100 +++ gcc/testsuite/gcc.dg/cpp/trad/pr69869.c 2018-01-30 17:45:06.783501149 +0100 @@ -0,0 +1,8 @@ +/* PR preprocessor/69869 */ +/* { dg-do preprocess } */ +/* { dg-options "-traditional-cpp" } */ + +#define C(a,b)a/**/b +C (foo/,**/) +C (foo/,*) +/* { dg-error "-:unterminated comment" "" {target "*-*-*"} .-1 } */ Jakub
[PATCH] Fix debug info after the fortran descriptor changes (PR debug/84131)
Hi! DW_AT_data_location needs to be the data pointer, which is at offset 0, but we were emitting a bogus +8 in that case (+4 for 32-bit targets). With this patch, the change on -g -dA is: - .uleb128 0x4# DW_AT_data_location + .uleb128 0x2# DW_AT_data_location .byte 0x97# DW_OP_push_object_address - .byte 0x23# DW_OP_plus_uconst - .uleb128 0x8 .byte 0x6 # DW_OP_deref and gdb can debug the array in there again. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2018-01-30 Jakub Jelinek PR debug/84131 * trans-array.c (gfc_get_descriptor_offsets_for_info): Set *data_off to DATA_FIELD's offset rather than OFFSET_FIELD's offset. --- gcc/fortran/trans-array.c.jj2018-01-26 12:43:25.164922494 +0100 +++ gcc/fortran/trans-array.c 2018-01-30 19:34:01.844232363 +0100 @@ -511,7 +511,7 @@ gfc_get_descriptor_offsets_for_info (con tree type; type = TYPE_MAIN_VARIANT (desc_type); - field = gfc_advance_chain (TYPE_FIELDS (type), OFFSET_FIELD); + field = gfc_advance_chain (TYPE_FIELDS (type), DATA_FIELD); *data_off = byte_position (field); field = gfc_advance_chain (TYPE_FIELDS (type), DTYPE_FIELD); *dtype_off = byte_position (field); Jakub
Re: [PATCH] Fix debug info after the fortran descriptor changes (PR debug/84131)
On Tue, Jan 30, 2018 at 11:53:40PM +0100, Jakub Jelinek wrote: > > DW_AT_data_location needs to be the data pointer, which is at offset 0, > but we were emitting a bogus +8 in that case (+4 for 32-bit targets). > > With this patch, the change on -g -dA is: > - .uleb128 0x4# DW_AT_data_location > + .uleb128 0x2# DW_AT_data_location > .byte 0x97# DW_OP_push_object_address > - .byte 0x23# DW_OP_plus_uconst > - .uleb128 0x8 > .byte 0x6 # DW_OP_deref > and gdb can debug the array in there again. > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? > Yes. Thanks for looking at this! -- Steve
[PATCH] correct -Wrestrict handling of arrays of arrays (PR 84095)
Testing GCC 8 with recent Linux kernel sources has uncovered a bug in the handling of arrays of arrays by the -Wrestrict checker where it fails to take references to different array elements into consideration, issuing false positives. The attached patch corrects this mistake. In addition, to make warnings involving excessive offset bounds more meaningful (less confusing), I've made a cosmetic change to constrain them to the bounds of the accessed object. I've done this in response to multiple comments indicating that the warnings are hard to interpret. This change is meant to be applied on top of the patch for bug 83698 (submitted mainly to improve the readability of the offsets): https://gcc.gnu.org/ml/gcc-patches/2018-01/msg01488.html Martin PR middle-end/84095 - false-positive -Wrestrict warnings for memcpy within array gcc/ChangeLog: PR middle-end/84095 * gimple-ssa-warn-restrict.c (builtin_memref::builtin_memref): Handle inner references. gcc/testsuite/ChangeLog: PR middle-end/84095 * c-c++-common/Warray-bounds-3.c: Adjust text of expected warnings. * c-c++-common/Wrestrict.c: Same. * gcc.dg/Wrestrict-6.c: Same. * gcc.dg/Wrestrict-8.c: New test. diff --git a/gcc/gimple-ssa-warn-restrict.c b/gcc/gimple-ssa-warn-restrict.c index 528eb5b..8bdf928 100644 --- a/gcc/gimple-ssa-warn-restrict.c +++ b/gcc/gimple-ssa-warn-restrict.c @@ -311,19 +311,40 @@ builtin_memref::builtin_memref (tree expr, tree size) if (TREE_CODE (expr) == ADDR_EXPR) { - poly_int64 off; tree op = TREE_OPERAND (expr, 0); - /* Determine the base object or pointer of the reference - and its constant offset from the beginning of the base. */ - base = get_addr_base_and_unit_offset (op, &off); + /* Determine the base object or pointer of the reference and + the constant bit offset from the beginning of the base. + If the offset has a non-constant component, it will be in + VAR_OFF. MODE, SIGN, REVERSE, and VOL are write only and + unused here. */ + poly_int64 bitsize, bitpos; + tree var_off; + machine_mode mode; + int sign, reverse, vol; + base = get_inner_reference (op, &bitsize, &bitpos, &var_off, + &mode, &sign, &reverse, &vol); + + poly_int64 bytepos = exact_div (bitpos, BITS_PER_UNIT); HOST_WIDE_INT const_off; - if (base && off.is_constant (&const_off)) + if (base && bytepos.is_constant (&const_off)) { offrange[0] += const_off; offrange[1] += const_off; + if (var_off) + { + if (TREE_CODE (var_off) == INTEGER_CST) + { + offset_int cstoff = wi::to_offset (var_off); + offrange[0] += cstoff; + offrange[1] += cstoff; + } + else + offrange[1] += maxobjsize; + } + /* Stash the reference for offset validation. */ ref = op; @@ -375,12 +396,6 @@ builtin_memref::builtin_memref (tree expr, tree size) } } - if (DECL_P (base) && TREE_CODE (TREE_TYPE (base)) == ARRAY_TYPE) -{ - if (offrange[0] < 0 && offrange[1] > 0) - offrange[0] = 0; -} - if (size) { tree range[2]; @@ -396,6 +411,29 @@ builtin_memref::builtin_memref (tree expr, tree size) } else sizrange[1] = maxobjsize; + + tree basetype = TREE_TYPE (base); + if (DECL_P (base) && TREE_CODE (basetype) == ARRAY_TYPE) +{ + /* If the offset could be in range of the referenced object + constrain its bounds so neither exceeds those of the obhect. */ + if (offrange[0] < 0 && offrange[1] > 0) + offrange[0] = 0; + + offset_int maxoff = maxobjsize; + if (ref && array_at_struct_end_p (ref)) + ; /* Use the maximum possible offset for last member arrays. */ + else if (tree basesize = TYPE_SIZE_UNIT (basetype)) + maxoff = wi::to_offset (basesize); + + if (offrange[0] >= 0) + { + if (offrange[1] < 0) + offrange[1] = offrange[0] <= maxoff ? maxoff : maxobjsize; + else if (offrange[0] <= maxoff && offrange[1] > maxoff) + offrange[1] = maxoff; + } +} } /* Return error_mark_node if the signed offset exceeds the bounds diff --git a/gcc/testsuite/c-c++-common/Warray-bounds-3.c b/gcc/testsuite/c-c++-common/Warray-bounds-3.c index 3445b95..2ee8146 100644 --- a/gcc/testsuite/c-c++-common/Warray-bounds-3.c +++ b/gcc/testsuite/c-c++-common/Warray-bounds-3.c @@ -61,7 +61,7 @@ void test_memcpy_bounds (char *d, const char *s, size_t n) they appear as large positive in the source. It would be nice if they retained their type but unfortunately that's not how it works so be prepared for both in case it even gets fixed. */ - T (char, 1, a + UR (3, SIZE_MAX - 1), s, n); /* { dg-warning "offset \\\[3, -2] is out of the bounds \\\[0, 1] of object" "memcpy" } */ + T (char, 1, a + UR (3, SIZE_MAX - 1), s, n); /* { dg-warning "offset \\\[3, -?\[0-9\]+] is out of the bounds \\\[0, 1] of object" "memcpy" } */ /* Verify that invalid offsets into an array of unknown size are detected. */ @@ -226,7 +226,7 @@ T (
Re: [PR tree-optimization/84047] missing -Warray-bounds on an out-of-bounds index
On 01/30/2018 03:11 PM, Aldy Hernandez wrote: Hi! [Note: Jakub has mentioned that missing -Warray-bounds regressions should be punted to GCC 9. I think this particular one is easy pickings, but if this and/or the rest of the -Warray-bounds regressions should be marked as GCC 9 material, please let me know so we can adjust all relevant PRs.] Let me first say that I have no concern with adopting this patch for GCC 8 (and older) and deferring more comprehensive solutions for the remaining regressions until GCC 9. With that said, as we discussed last week in IRC, the patch I submitted for one of these regressions, bug 83776, also happens to fix one of the two test cases submitted for this bug: https://gcc.gnu.org/ml/gcc-patches/2018-01/msg02144.html I put the patch together because it's a P2 bug and has a Target Milestone set to 6.5. I took that to mean that it's expected to be fixed in 8.1 in addition to 7.x and 6.x, but in light of Jakub's reservations and Ady's question above I'm not sure my understanding is correct, or if it is, that there necessarily is consensus to fix them in GCC 8. I also spent some time over the weekend extending my patch to handle the additional test case from comment #2 on bug 84047. That one is also due to the same underlying root cause. I'm close to done with it but I had to set it aside to handle a bug in my own code that was reported yesterday. I mention this because I too would find it helpful if it were made clear whether these regressions are expected to be fixed in GCC 8, so we can focus on the right bugs and avoid duplicating effort. If we want to fix all the reported -Warray-bounds regressions in GCC 8 then I'll finish my patch in progress and post it as a followup. Martin This is a -Warray-bounds regression that happens because the IL now has an MEM_REF instead on ARRAY_REF. Previously we had an ARRAY_REF we could diagnose: D.2720_5 = "12345678"[1073741824]; But now this is represented as: _1 = MEM[(const char *)"12345678" + 1073741824B]; I think we can just allow check_array_bounds() to handle MEM_REF's and everything should just work. The attached patch fixes both regressions mentioned in the PR. Tested on x86-64 Linux. OK?
Re: [wwwdocs] GCC-7.3 changes: Add note about LEON3FT erratum workaround
> I would like to commit the following comment about LEON3-FT errata now > available in GCC-7.3. Fine with me, thanks. -- Eric Botcazou
Re: Fix LRA subreg calculation for big-endian targets
> OK. Makes me wonder how many big endian LRA targets are getting significant > use. Debian still has an active SPARC64 port now based on GCC 7 with LRA. -- Eric Botcazou
New Finnish PO file for 'cpplib' (version 8.1-b20180128)
Hello, gentle maintainer. This is a message from the Translation Project robot. A revised PO file for textual domain 'cpplib' has been submitted by the Finnish team of translators. The file is available at: http://translationproject.org/latest/cpplib/fi.po (This file, 'cpplib-8.1-b20180128.fi.po', has just now been sent to you in a separate email.) All other PO files for your package are available in: http://translationproject.org/latest/cpplib/ Please consider including all of these in your next release, whether official or a pretest. Whenever you have a new distribution with a new version number ready, containing a newer POT file, please send the URL of that distribution tarball to the address below. The tarball may be just a pretest or a snapshot, it does not even have to compile. It is just used by the translators when they need some extra translation context. The following HTML page has been updated: http://translationproject.org/domain/cpplib.html If any question arises, please contact the translation coordinator. Thank you for all your work, The Translation Project robot, in the name of your translation coordinator.
Contents of PO file 'cpplib-8.1-b20180128.fi.po'
cpplib-8.1-b20180128.fi.po.gz Description: Binary data The Translation Project robot, in the name of your translation coordinator.
Re: [PATCH] Mark falign-*= as Optimization (PR c/84100)
On Tue, 30 Jan 2018, Jakub Jelinek wrote: > Hi! > > While -falign-{functions,jumps,labels,loops} are marked Optimization, > -falign-{functions,jumps,labels,loops}= are not. They use the same Var(), > for that it makes no difference, but it means we reject them in optimize > attribute starting with r237174. > > Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for > trunk? OK. -- Joseph S. Myers jos...@codesourcery.com
Re: [PATCH] Fix -traditional-cpp preprocessing ICE (PR preprocessor/69869)
On Tue, 30 Jan 2018, Jakub Jelinek wrote: > Hi! > > This restores GCC 3.3 behavior on this testcase, before 3.4 started ICEing > on it when it started to use the common block comment skipping code for > -traditional-cpp and just a simple function for macro block comments has > been added. Even the macro block comments can be not properly terminated > and we shouldn't crash on that, just diagnose it. > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? OK with the comment on skip_macro_block_comment updated to document the semantics of the return value. -- Joseph S. Myers jos...@codesourcery.com
Go patch committed: Function_type and Backend_function_type are not identical
This patch by Cherry Zhang fixes the Go frontend so that Function_type and Backend_function_type are not identical, since they have different backend representations. Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu. Committed to mainline. Ian Index: gcc/go/gofrontend/MERGE === --- gcc/go/gofrontend/MERGE (revision 257171) +++ gcc/go/gofrontend/MERGE (working copy) @@ -1,4 +1,4 @@ -bbce8a9af264b25c5f70bafb2ce95d4fed158d68 +a347356d0f432cafb69f0cc5833d27663736a042 The first line of this file holds the git revision number of the last merge done from the gofrontend repository. Index: gcc/go/gofrontend/types.cc === --- gcc/go/gofrontend/types.cc (revision 257069) +++ gcc/go/gofrontend/types.cc (working copy) @@ -4442,6 +4442,9 @@ Function_type::is_identical(const Functi Cmp_tags cmp_tags, bool errors_are_identical, std::string* reason) const { + if (this->is_backend_function_type() != t->is_backend_function_type()) +return false; + if (!ignore_receiver) { const Typed_identifier* r1 = this->receiver(); Index: gcc/go/gofrontend/types.h === --- gcc/go/gofrontend/types.h (revision 257069) +++ gcc/go/gofrontend/types.h (working copy) @@ -2074,6 +2074,11 @@ class Function_type : public Type Btype* get_backend_fntype(Gogo*); + // Return whether this is a Backend_function_type. + virtual bool + is_backend_function_type() const + { return false; } + protected: int do_traverse(Traverse*); @@ -2167,6 +2172,12 @@ class Backend_function_type : public Fun : Function_type(receiver, parameters, results, location) { } + // Return whether this is a Backend_function_type. This overrides + // Function_type::is_backend_function_type. + bool + is_backend_function_type() const + { return true; } + protected: Btype* do_get_backend(Gogo* gogo)
[PATCH] libgcc: xtensa: fix build with -mtext-section-literals
libgcc/ 2018-01-31 Max Filippov * config/xtensa/ieee754-df.S (__adddf3_aux): Add .literal_position directive. * config/xtensa/ieee754-sf.S (__addsf3_aux): Likewise. --- libgcc/config/xtensa/ieee754-df.S | 1 + libgcc/config/xtensa/ieee754-sf.S | 1 + 2 files changed, 2 insertions(+) diff --git a/libgcc/config/xtensa/ieee754-df.S b/libgcc/config/xtensa/ieee754-df.S index 2662a6600751..a997c1b42632 100644 --- a/libgcc/config/xtensa/ieee754-df.S +++ b/libgcc/config/xtensa/ieee754-df.S @@ -55,6 +55,7 @@ __negdf2: #ifdef L_addsubdf3 + .literal_position /* Addition */ __adddf3_aux: diff --git a/libgcc/config/xtensa/ieee754-sf.S b/libgcc/config/xtensa/ieee754-sf.S index d48b230a7588..695c67b0fc8f 100644 --- a/libgcc/config/xtensa/ieee754-sf.S +++ b/libgcc/config/xtensa/ieee754-sf.S @@ -55,6 +55,7 @@ __negsf2: #ifdef L_addsubsf3 + .literal_position /* Addition */ __addsf3_aux: -- 2.1.4
Re: [PATCH][PR target/84064] Fix assertion with -fstack-clash-protection
On 01/30/2018 12:23 AM, Uros Bizjak wrote: > On Tue, Jan 30, 2018 at 5:48 AM, Jeff Law wrote: >> >> >> >> stack-clash-protection will sometimes force a prologue to use pushes to >> save registers. We try to limit how often that happens as it constrains >> options for generating efficient prologue sequences for common cases. >> >> We check the value of TO_ALLOCATE in ix86_compute_frame_layout and if >> it's greater than the probing interval, then we force the prologue to >> use pushes to save integer registers. That is conservatively correct >> and allows freedom in the most common cases. This is good. >> >> In expand_prologue we assert that the integer registers were saved via a >> push anytime we *might* generate a probe, even if the size of the >> allocated frame is too small to require explicit probing. Naturally, >> this conflicts with the code in ix86_compute_frame_layout that tries to >> avoid forcing pushes instead of moves. >> >> This patch moves the assertion inside expand_prologue down to the points >> where it actually needs to be true. Specifically when we're generating >> probes in a loop for -fstack-check or -fstack-clash-protection. >> >> [ Probing via calls to chkstk_ms is not affected by this change. ] >> >> Sorry to have to change this stuff for the 3rd time! > > Now you see how many paths stack frame formation code has ;) Never any doubt about that... When it became clear that we were going to need to mitigate stack-clash with new prologue sequences I nearly cried knowing how painful this would be, particularly across multiple architectures. The good news is that with it being turned on by default in F28 we're finding some of the dusty corners... Jeff
[PATCH][RFA][PR target/84128] Fix restore of scratch register used in probing loops
Whee, fun, this appears to have been broken for a very long time, possibly since the introduction of -fstack-check in 2010. Thankfully it only affects 32 bit and only in relatively constrained circumstances. -fstack-check and -fstack-clash both potentially create a loop for stack probing. In that case they both need a scratch register to hold the loop upper bound. The code to allocate a scratch register first starts with the caller-saved registers as they're zero cost. Then it'll use any callee saved register that is actually saved. If neither is available (say because all the caller-saved regs are used for parameter passing and there are no callee saved register used), then the allocation routine will push %eax on the stack and the deallocation routine will pop it off to restore its value. Of course there is a *stack allocation* between those two points. So the pop ends up restoring garbage into the register. Obviously the restore routine needs to use reg+d addressing to get to the stack slot and the allocated space needs to be deallocated by the epilogue. But sadly there's enough assertions sprinkled around that prevent that from working as-is. So what this patch does is continue to use the push to allocate the register. And it uses reg+d to restore the register after the probing loops. The "trick" is that the space allocated by the push becomes part of the normal stack frame after we restore the scratch register's value. Ie, if we push a 4 byte register, then we reduce the size of the main allocation request by 4 bytes. And everything just works. Clearly this code had not be exercised. So I hacked up things so that we always generated a probing loop and always assumed that we needed to save/restore a scratch register and enabled stack clash protections by default. I bootstrapped that and compared testsuite runs against a run which just had stack clash protection on by default. That did turn up an issue, but it was with my testing hacks, not with this patch :-) I (of course) also did the usual bootstrap and regression tests, using x86_64 and i686. Hopefully this is the last iteration on the x86/x86_64 stack clash bits :-) The one concern I have is do we need to tell the CFI machinery that %eax's value was restored to its entry value? OK for the trunk? Or do we need some magic CFI bit to describe the restore of %eax? Thanks, Jeff PR target/84128 * config/i386/i386.c (release_scratch_register_on_entry): Add new OFFSET argument. Use it to restore the scratch register rathern than a pop insn. (ix86_adjust_stack_and_probe_stack_clash): Un-constify SIZE. If we have to save a temporary register, decrement SIZE appropriately. Pass SIZE as the offset to find the saved register into release_scratch_register_on_entry. (ix86_adjust_stack_and_probe): Likewise. (ix86_emit_probe_stack_range): Likewise. PR target/84128 * gcc.target/i386/pr84128.c: New test. diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index fef34a1..93ce79c 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -12567,22 +12567,25 @@ get_scratch_register_on_entry (struct scratch_reg *sr) } } -/* Release a scratch register obtained from the preceding function. */ +/* Release a scratch register obtained from the preceding function. + + Note there will always be some kind of stack adjustment between + allocation and releasing the scratch register. So we can't just + pop the scratch register off the stack if we were forced to save it + (the stack pointer itself has a different value). + + Instead we're passed the offset into the stack where the value will + be found and the space becomes part of the local frame that is + deallocated by the epilogue. */ static void -release_scratch_register_on_entry (struct scratch_reg *sr) +release_scratch_register_on_entry (struct scratch_reg *sr, HOST_WIDE_INT offset) { if (sr->saved) { - struct machine_function *m = cfun->machine; - rtx x, insn = emit_insn (gen_pop (sr->reg)); - - /* The RTX_FRAME_RELATED_P mechanism doesn't know about pop. */ - RTX_FRAME_RELATED_P (insn) = 1; - x = gen_rtx_PLUS (Pmode, stack_pointer_rtx, GEN_INT (UNITS_PER_WORD)); - x = gen_rtx_SET (stack_pointer_rtx, x); - add_reg_note (insn, REG_FRAME_RELATED_EXPR, x); - m->fs.sp_offset -= UNITS_PER_WORD; + rtx x = gen_rtx_PLUS (Pmode, stack_pointer_rtx, GEN_INT (offset)); + x = gen_rtx_SET (sr->reg, gen_rtx_MEM (word_mode, x)); + emit_insn (x); } } @@ -12597,7 +12600,7 @@ release_scratch_register_on_entry (struct scratch_reg *sr) pushed on the stack. */ static void -ix86_adjust_stack_and_probe_stack_clash (const HOST_WIDE_INT size, +ix86_adjust_stack_and_probe_stack_clash (HOST_WIDE_INT size, const bool int_registers_saved) { struct machine_fun
Go patch committed: Use VIEW_CONVERT_EXPR where needed
This patch to the Go frontend's GCC interface uses VIEW_CONVERT_EXPR where needed. The code was using fold_convert_loc in constructor_expression and temporary_variable, which almost always worked. However, it failed in https://golang.org/issue/23606. This patch fixes the problem. Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu. Committed to mainline. Ian 2018-01-30 Ian Lance Taylor * go-gcc.cc (Gcc_backend::convert_tree): New private method. (Gcc_backend::constructor_expression): Call it. (Gcc_backend::assignment_statement): Likewise. (Gcc_backend::temporary_variable): Likewise. Index: gcc/go/go-gcc.cc === --- gcc/go/go-gcc.cc(revision 256593) +++ gcc/go/go-gcc.cc(working copy) @@ -541,6 +541,9 @@ class Gcc_backend : public Backend tree non_zero_size_type(tree); + tree + convert_tree(tree, tree, Location); + private: void define_builtin(built_in_function bcode, const char* name, const char* libname, @@ -1785,8 +1788,7 @@ Gcc_backend::constructor_expression(Btyp constructor_elt empty = {NULL, NULL}; constructor_elt* elt = init->quick_push(empty); elt->index = field; - elt->value = fold_convert_loc(location.gcc_location(), TREE_TYPE(field), -val); + elt->value = this->convert_tree(TREE_TYPE(field), val, location); if (!TREE_CONSTANT(elt->value)) is_constant = false; } @@ -2055,29 +2057,7 @@ Gcc_backend::assignment_statement(Bfunct return this->compound_statement(this->expression_statement(bfn, lhs), this->expression_statement(bfn, rhs)); - // Sometimes the same unnamed Go type can be created multiple times - // and thus have multiple tree representations. Make sure this does - // not confuse the middle-end. - if (TREE_TYPE(lhs_tree) != TREE_TYPE(rhs_tree)) -{ - tree lhs_type_tree = TREE_TYPE(lhs_tree); - gcc_assert(TREE_CODE(lhs_type_tree) == TREE_CODE(TREE_TYPE(rhs_tree))); - if (POINTER_TYPE_P(lhs_type_tree) - || INTEGRAL_TYPE_P(lhs_type_tree) - || SCALAR_FLOAT_TYPE_P(lhs_type_tree) - || COMPLEX_FLOAT_TYPE_P(lhs_type_tree)) - rhs_tree = fold_convert_loc(location.gcc_location(), lhs_type_tree, - rhs_tree); - else if (TREE_CODE(lhs_type_tree) == RECORD_TYPE - || TREE_CODE(lhs_type_tree) == ARRAY_TYPE) - { - gcc_assert(int_size_in_bytes(lhs_type_tree) -== int_size_in_bytes(TREE_TYPE(rhs_tree))); - rhs_tree = fold_build1_loc(location.gcc_location(), -VIEW_CONVERT_EXPR, -lhs_type_tree, rhs_tree); - } -} + rhs_tree = this->convert_tree(TREE_TYPE(lhs_tree), rhs_tree, location); return this->make_statement(fold_build2_loc(location.gcc_location(), MODIFY_EXPR, @@ -2507,6 +2487,43 @@ Gcc_backend::non_zero_size_type(tree typ gcc_unreachable(); } +// Convert EXPR_TREE to TYPE_TREE. Sometimes the same unnamed Go type +// can be created multiple times and thus have multiple tree +// representations. Make sure this does not confuse the middle-end. + +tree +Gcc_backend::convert_tree(tree type_tree, tree expr_tree, Location location) +{ + if (type_tree == TREE_TYPE(expr_tree)) +return expr_tree; + + if (type_tree == error_mark_node + || expr_tree == error_mark_node + || TREE_TYPE(expr_tree) == error_mark_node) +return error_mark_node; + + gcc_assert(TREE_CODE(type_tree) == TREE_CODE(TREE_TYPE(expr_tree))); + if (POINTER_TYPE_P(type_tree) + || INTEGRAL_TYPE_P(type_tree) + || SCALAR_FLOAT_TYPE_P(type_tree) + || COMPLEX_FLOAT_TYPE_P(type_tree)) +return fold_convert_loc(location.gcc_location(), type_tree, expr_tree); + else if (TREE_CODE(type_tree) == RECORD_TYPE + || TREE_CODE(type_tree) == ARRAY_TYPE) +{ + gcc_assert(int_size_in_bytes(type_tree) +== int_size_in_bytes(TREE_TYPE(expr_tree))); + if (TYPE_MAIN_VARIANT(type_tree) + == TYPE_MAIN_VARIANT(TREE_TYPE(expr_tree))) + return fold_build1_loc(location.gcc_location(), NOP_EXPR, + type_tree, expr_tree); + return fold_build1_loc(location.gcc_location(), VIEW_CONVERT_EXPR, +type_tree, expr_tree); +} + + gcc_unreachable(); +} + // Make a global variable. Bvariable* @@ -2717,8 +2734,7 @@ Gcc_backend::temporary_variable(Bfunctio } if (this->type_size(btype) != 0 && init_tree != NULL_TREE) -DECL_INITIAL(var) = fold_convert_loc(location.gcc_location(), type_tree, - init_tree); +DECL_INITIAL(var) = this->convert_tree(type_tree, init_tree, location); if (is_address_taken) TREE_ADDRESSABLE(var) = 1;