Re: [PATCH] Fix A < 0 ? C : 0 optimization (PR tree-optimization/78720)
On 09/12/2016 20:26, Jakub Jelinek wrote: > +tree ctype = TREE_TYPE (@1); > + } > + (if (shift >= 0) > +(bit_and > + (convert (rshift @0 { build_int_cst (integer_type_node, shift); })) > + @1) > +(bit_and > + (lshift (convert:ctype @0) { build_int_cst (integer_type_node, -shift); > }) > + @1) The ":ctype" shouldn't be needed because the COND_EXPR already has @1's type. Paolo
Re: [PATCH] Fix A < 0 ? C : 0 optimization (PR tree-optimization/78720)
On Sat, Dec 10, 2016 at 10:05:50AM +0100, Paolo Bonzini wrote: > > > On 09/12/2016 20:26, Jakub Jelinek wrote: > > +tree ctype = TREE_TYPE (@1); > > + } > > + (if (shift >= 0) > > +(bit_and > > + (convert (rshift @0 { build_int_cst (integer_type_node, shift); })) > > + @1) > > +(bit_and > > + (lshift (convert:ctype @0) { build_int_cst (integer_type_node, > > -shift); }) > > + @1) > > The ":ctype" shouldn't be needed because the COND_EXPR already has @1's > type. Note I'm actually bootstrapping/regtesting today following patch instead, where it clearly isn't needed. But for the above case with the shift, it isn't clear for me how it would work without :ctype - how the algorithm of figuring out what type to convert to works. Because the first operand of lshift isn't used next to @1. 2016-12-09 Jakub Jelinek Marc Glisse PR tree-optimization/78720 * match.pd (A < 0 ? C : 0): Only optimize for signed A. If shift is negative, first convert to @1's type and then lshift it by -shift. * gcc.c-torture/execute/pr78720.c: New test. --- gcc/match.pd.jj 2016-12-09 10:19:10.909735559 +0100 +++ gcc/match.pd2016-12-10 09:21:26.260516596 +0100 @@ -2768,17 +2768,22 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) (ncmp (convert:stype @0) { build_zero_cst (stype); }) /* If we have A < 0 ? C : 0 where C is a power of 2, convert - this into a right shift followed by ANDing with C. */ + this into a right shift or sign extension followed by ANDing with C. */ (simplify (cond (lt @0 integer_zerop) integer_pow2p@1 integer_zerop) - (with { + (if (!TYPE_UNSIGNED (TREE_TYPE (@0))) + (with { int shift = element_precision (@0) - wi::exact_log2 (@1) - 1; - } - (bit_and - (convert (rshift @0 { build_int_cst (integer_type_node, shift); })) - @1))) + } + (if (shift >= 0) +(bit_and + (convert (rshift @0 { build_int_cst (integer_type_node, shift); })) + @1) +/* Otherwise ctype must be wider than TREE_TYPE (@0) and pure + sign extension followed by AND with C will achieve the effect. */ +(bit_and (convert @0) @1) /* When the addresses are not directly of decls compare base and offset. This implements some remaining parts of fold_comparison address --- gcc/testsuite/gcc.c-torture/execute/pr78720.c.jj2016-12-10 09:18:43.386574179 +0100 +++ gcc/testsuite/gcc.c-torture/execute/pr78720.c 2016-12-10 09:18:43.386574179 +0100 @@ -0,0 +1,29 @@ +/* PR tree-optimization/78720 */ + +__attribute__((noinline, noclone)) long int +foo (signed char x) +{ + return x < 0 ? 0x8L : 0L; +} + +__attribute__((noinline, noclone)) long int +bar (signed char x) +{ + return x < 0 ? 0x80L : 0L; +} + +__attribute__((noinline, noclone)) long int +baz (signed char x) +{ + return x < 0 ? 0x20L : 0L; +} + +int +main () +{ + if (foo (-1) != 0x8L || bar (-1) != 0x80L || baz (-1) != 0x20L + || foo (0) != 0L || bar (0) != 0L || baz (0) != 0L + || foo (31) != 0L || bar (31) != 0L || baz (31) != 0L) +__builtin_abort (); + return 0; +} Jakub
Re: [PATCH] Fix A < 0 ? C : 0 optimization (PR tree-optimization/78720)
On Sat, 10 Dec 2016, Jakub Jelinek wrote: * match.pd (A < 0 ? C : 0): Only optimize for signed A. If shift is negative, first convert to @1's type and then lshift it by -shift. Thanks, the ChangeLog needs updating. --- gcc/match.pd.jj 2016-12-09 10:19:10.909735559 +0100 +++ gcc/match.pd2016-12-10 09:21:26.260516596 +0100 @@ -2768,17 +2768,22 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) (ncmp (convert:stype @0) { build_zero_cst (stype); }) /* If we have A < 0 ? C : 0 where C is a power of 2, convert - this into a right shift followed by ANDing with C. */ + this into a right shift or sign extension followed by ANDing with C. */ (simplify (cond (lt @0 integer_zerop) integer_pow2p@1 integer_zerop) - (with { + (if (!TYPE_UNSIGNED (TREE_TYPE (@0))) + (with { int shift = element_precision (@0) - wi::exact_log2 (@1) - 1; - } - (bit_and - (convert (rshift @0 { build_int_cst (integer_type_node, shift); })) - @1))) + } + (if (shift >= 0) +(bit_and + (convert (rshift @0 { build_int_cst (integer_type_node, shift); })) + @1) +/* Otherwise ctype must be wider than TREE_TYPE (@0) and pure + sign extension followed by AND with C will achieve the effect. */ +(bit_and (convert @0) @1) It is funny to notice that the fact that @1 is a power of 2 has become mostly irrelevant. When C fits in the type of A, we can do an arithmetic shift right of precision-1 to obtain a mask of 0 or -1, then bit_and works not just for powers of 2. Otherwise, imagining that A is int32_t and C int64_t, all we care about is that the low 31 bits of C are 0 (otherwise we could also do an arithmetic right shift, either before or after the convert). But that's another patch, fixing the PR is what matters for now. -- Marc Glisse
Re: [PATCH] Fix A < 0 ? C : 0 optimization (PR tree-optimization/78720)
On Sat, Dec 10, 2016 at 11:45:35AM +0100, Marc Glisse wrote: > On Sat, 10 Dec 2016, Jakub Jelinek wrote: > > > * match.pd (A < 0 ? C : 0): Only optimize for signed A. If shift > > is negative, first convert to @1's type and then lshift it by -shift. > > Thanks, the ChangeLog needs updating. Indeed, here it is with updated ChangeLog and as an added benefit, also successfully bootstrapped/regtested on x86_64-linux and i686-linux. > It is funny to notice that the fact that @1 is a power of 2 has become > mostly irrelevant. When C fits in the type of A, we can do an arithmetic > shift right of precision-1 to obtain a mask of 0 or -1, then bit_and works > not just for powers of 2. Otherwise, imagining that A is int32_t and C > int64_t, all we care about is that the low 31 bits of C are 0 (otherwise we > could also do an arithmetic right shift, either before or after the > convert). But that's another patch, fixing the PR is what matters for now. You're right. 2016-12-09 Jakub Jelinek Marc Glisse PR tree-optimization/78720 * match.pd (A < 0 ? C : 0): Only optimize for signed A. If shift is negative, sign extend to @1's type and than AND with C. * gcc.c-torture/execute/pr78720.c: New test. --- gcc/match.pd.jj 2016-12-09 10:19:10.909735559 +0100 +++ gcc/match.pd2016-12-10 09:21:26.260516596 +0100 @@ -2768,17 +2768,22 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) (ncmp (convert:stype @0) { build_zero_cst (stype); }) /* If we have A < 0 ? C : 0 where C is a power of 2, convert - this into a right shift followed by ANDing with C. */ + this into a right shift or sign extension followed by ANDing with C. */ (simplify (cond (lt @0 integer_zerop) integer_pow2p@1 integer_zerop) - (with { + (if (!TYPE_UNSIGNED (TREE_TYPE (@0))) + (with { int shift = element_precision (@0) - wi::exact_log2 (@1) - 1; - } - (bit_and - (convert (rshift @0 { build_int_cst (integer_type_node, shift); })) - @1))) + } + (if (shift >= 0) +(bit_and + (convert (rshift @0 { build_int_cst (integer_type_node, shift); })) + @1) +/* Otherwise ctype must be wider than TREE_TYPE (@0) and pure + sign extension followed by AND with C will achieve the effect. */ +(bit_and (convert @0) @1) /* When the addresses are not directly of decls compare base and offset. This implements some remaining parts of fold_comparison address --- gcc/testsuite/gcc.c-torture/execute/pr78720.c.jj2016-12-10 09:18:43.386574179 +0100 +++ gcc/testsuite/gcc.c-torture/execute/pr78720.c 2016-12-10 09:18:43.386574179 +0100 @@ -0,0 +1,29 @@ +/* PR tree-optimization/78720 */ + +__attribute__((noinline, noclone)) long int +foo (signed char x) +{ + return x < 0 ? 0x8L : 0L; +} + +__attribute__((noinline, noclone)) long int +bar (signed char x) +{ + return x < 0 ? 0x80L : 0L; +} + +__attribute__((noinline, noclone)) long int +baz (signed char x) +{ + return x < 0 ? 0x20L : 0L; +} + +int +main () +{ + if (foo (-1) != 0x8L || bar (-1) != 0x80L || baz (-1) != 0x20L + || foo (0) != 0L || bar (0) != 0L || baz (0) != 0L + || foo (31) != 0L || bar (31) != 0L || baz (31) != 0L) +__builtin_abort (); + return 0; +} Jakub
[C++ PATCH] Demangle decltype(auto) (PR c++/78761)
Hi! Mangling of decltype(auto) has been added back in April 2013: https://gcc.gnu.org/ml/gcc-patches/2013-04/msg01273.html and matches the https://mentorembedded.github.io/cxx-abi/abi.html#mangling ::= Da # auto ::= Dc # decltype(auto) part, but the corresponding demangler change has never been added. The following patch adds it. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2016-12-10 Jakub Jelinek PR c++/78761 * cp-demangle.c (cplus_demangle_type): Demangle Dc as decltype(auto). * testsuite/demangle-expected: Add test for decltype(auto). --- libiberty/cp-demangle.c.jj 2016-11-30 08:57:16.0 +0100 +++ libiberty/cp-demangle.c 2016-12-10 09:31:17.698046004 +0100 @@ -2590,7 +2590,11 @@ cplus_demangle_type (struct d_info *di) /* auto */ ret = d_make_name (di, "auto", 4); break; - + case 'c': + /* decltype(auto) */ + ret = d_make_name (di, "decltype(auto)", 14); + break; + case 'f': /* 32-bit decimal floating point */ ret = d_make_builtin_type (di, &cplus_demangle_builtin_types[26]); --- libiberty/testsuite/demangle-expected.jj2016-11-16 07:13:50.0 +0100 +++ libiberty/testsuite/demangle-expected 2016-12-10 09:39:44.277651914 +0100 @@ -4200,6 +4200,9 @@ decltype (new auto({parm#1})) f(int _Z1fIiERDaRKT_S1_ auto& f(int const&, int) --format=gnu-v3 +_Z1gIiEDcRKT_S0_ +decltype(auto) g(int const&, int) +--format=gnu-v3 _Z1gILi1EEvR1AIXT_EER1BIXscbT_EE void g<1>(A<1>&, B(1)>&) --format=gnu-v3 Jakub
Re: [PATCH, GCC/ARM] Define arm_arch_core_flags in a single file
On 09/12/16 16:56, Thomas Preudhomme wrote: > Hi, > > This patch moves the definition of arm_arch_core_flags along with the > declaration of its structure element type to > common/config/arm/arm-common.c so that it is not *defined* in all object > files including tm.h. Otherwise, GCC gets bloated with as many copy of > that array as there is file including tm.h and executable as well given > that crtbegin gets a copy too. > > ChangeLog entry is as follows: > > > *** gcc/ChangeLog *** > > 2016-12-09 Thomas Preud'homme > > * config/arm/arm-opts.h: Move struct arm_arch_core_flag and > arm_arch_core_flags to ... > * common/config/arm/arm-common.c: There. > > > Testing: successfully built GCC configured with --with-cpu=cortex-a15 > and optional_thumb testcases pass (without any option for -1 and -2, > with -mcpu=cortex-m3 for -3 to satisfy requirements): build is working > and feature as well. > > Is this ok for stage3? > > Best regards, > > Thomas OK. R.
[PATCH] gcc: config: tilegx: Reserve prev insn when delete useless insn
From: Chen Gang When check bundle, gcc may still need modify the prev insn. Original implementation will lose the prev insn. Also correct the coding styles of relate code. 2016-12-10 Chen Gang gcc/ PR target/78222 * tilegx.c (tilegx_gen_bundle): Reserve prev insn when delete useless insn. --- gcc/config/tilegx/tilegx.c | 22 -- 1 file changed, 12 insertions(+), 10 deletions(-) diff --git a/gcc/config/tilegx/tilegx.c b/gcc/config/tilegx/tilegx.c index 76a7455..f846ca7 100644 --- a/gcc/config/tilegx/tilegx.c +++ b/gcc/config/tilegx/tilegx.c @@ -4469,8 +4469,7 @@ tilegx_gen_bundles (void) rtx_insn *end = NEXT_INSN (BB_END (bb)); prev = NULL; - for (insn = next_insn_to_bundle (BB_HEAD (bb), end); insn; - prev = insn, insn = next) + for (insn = next_insn_to_bundle (BB_HEAD (bb), end); insn; insn = next) { next = next_insn_to_bundle (NEXT_INSN (insn), end); @@ -4499,14 +4498,17 @@ tilegx_gen_bundles (void) /* Delete barrier insns, because they can mess up the emitting of bundle braces. If it is end-of-bundle, then the previous insn must be marked end-of-bundle. */ - if (get_attr_type (insn) == TYPE_NOTHING) { - if (GET_MODE (insn) == QImode && prev != NULL - && GET_MODE (prev) == SImode) - { - PUT_MODE (prev, QImode); - } - delete_insn (insn); - } + if (get_attr_type (insn) == TYPE_NOTHING) + { + if (GET_MODE (insn) == QImode && prev != NULL + && GET_MODE (prev) == SImode) + { + PUT_MODE (prev, QImode); + } + delete_insn (insn); + } + else + prev = insn; } } } -- 2.5.0
Re: [PATCH] Add support for Fuchsia (OS)
On 08/12/16 22:55, Josh Conner wrote: > + arm*-*-fuchsia*) > + tm_file="${tm_file} fuchsia.h arm/fuchsia-elf.h glibc-stdint.h" > + tmake_file="${tmake_file} arm/t-bpabi" > + ;; This will leave the default cpu as arm7tdmi. Is that what you want? It's fine if it is, but if not, you should consider setting target_cpu_cname here as well. R.
Re: [PATCH] Fix A < 0 ? C : 0 optimization (PR tree-optimization/78720)
On December 10, 2016 12:01:37 PM GMT+01:00, Jakub Jelinek wrote: >On Sat, Dec 10, 2016 at 11:45:35AM +0100, Marc Glisse wrote: >> On Sat, 10 Dec 2016, Jakub Jelinek wrote: >> >> >* match.pd (A < 0 ? C : 0): Only optimize for signed A. If shift >> >is negative, first convert to @1's type and then lshift it by >-shift. >> >> Thanks, the ChangeLog needs updating. > >Indeed, here it is with updated ChangeLog and as an added benefit, also >successfully bootstrapped/regtested on x86_64-linux and i686-linux. OK. Richard. >> It is funny to notice that the fact that @1 is a power of 2 has >become >> mostly irrelevant. When C fits in the type of A, we can do an >arithmetic >> shift right of precision-1 to obtain a mask of 0 or -1, then bit_and >works >> not just for powers of 2. Otherwise, imagining that A is int32_t and >C >> int64_t, all we care about is that the low 31 bits of C are 0 >(otherwise we >> could also do an arithmetic right shift, either before or after the >> convert). But that's another patch, fixing the PR is what matters for >now. > >You're right. > >2016-12-09 Jakub Jelinek > Marc Glisse > > PR tree-optimization/78720 > * match.pd (A < 0 ? C : 0): Only optimize for signed A. If shift > is negative, sign extend to @1's type and than AND with C. > > * gcc.c-torture/execute/pr78720.c: New test. > >--- gcc/match.pd.jj2016-12-09 10:19:10.909735559 +0100 >+++ gcc/match.pd 2016-12-10 09:21:26.260516596 +0100 >@@ -2768,17 +2768,22 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) > (ncmp (convert:stype @0) { build_zero_cst (stype); }) > > /* If we have A < 0 ? C : 0 where C is a power of 2, convert >- this into a right shift followed by ANDing with C. */ >+ this into a right shift or sign extension followed by ANDing with >C. */ > (simplify > (cond > (lt @0 integer_zerop) > integer_pow2p@1 integer_zerop) >- (with { >+ (if (!TYPE_UNSIGNED (TREE_TYPE (@0))) >+ (with { > int shift = element_precision (@0) - wi::exact_log2 (@1) - 1; >- } >- (bit_and >- (convert (rshift @0 { build_int_cst (integer_type_node, shift); })) >- @1))) >+ } >+ (if (shift >= 0) >+(bit_and >+ (convert (rshift @0 { build_int_cst (integer_type_node, shift); >})) >+ @1) >+/* Otherwise ctype must be wider than TREE_TYPE (@0) and pure >+ sign extension followed by AND with C will achieve the effect. >*/ >+(bit_and (convert @0) @1) > >/* When the addresses are not directly of decls compare base and >offset. >This implements some remaining parts of fold_comparison address >--- gcc/testsuite/gcc.c-torture/execute/pr78720.c.jj 2016-12-10 >09:18:43.386574179 +0100 >+++ gcc/testsuite/gcc.c-torture/execute/pr78720.c 2016-12-10 >09:18:43.386574179 +0100 >@@ -0,0 +1,29 @@ >+/* PR tree-optimization/78720 */ >+ >+__attribute__((noinline, noclone)) long int >+foo (signed char x) >+{ >+ return x < 0 ? 0x8L : 0L; >+} >+ >+__attribute__((noinline, noclone)) long int >+bar (signed char x) >+{ >+ return x < 0 ? 0x80L : 0L; >+} >+ >+__attribute__((noinline, noclone)) long int >+baz (signed char x) >+{ >+ return x < 0 ? 0x20L : 0L; >+} >+ >+int >+main () >+{ >+ if (foo (-1) != 0x8L || bar (-1) != 0x80L || baz (-1) != 0x20L >+ || foo (0) != 0L || bar (0) != 0L || baz (0) != 0L >+ || foo (31) != 0L || bar (31) != 0L || baz (31) != 0L) >+__builtin_abort (); >+ return 0; >+} > > > Jakub
Re: [PATCH] gcc: config: tilegx: Reserve prev insn when delete useless insn
On 12/10/16 12:23, cheng...@emindsoft.com.cn wrote: > From: Chen Gang > > When check bundle, gcc may still need modify the prev insn. Original > implementation will lose the prev insn. > > Also correct the coding styles of relate code. > > 2016-12-10 Chen Gang > > gcc/ > PR target/78222 > * tilegx.c (tilegx_gen_bundle): Reserve prev insn when delete > useless insn. I think this was already fixed by Walt: r242617 | walt | 2016-11-19 03:34:17 +0100 (Sat, 19 Nov 2016) | 5 lines Changed paths: M /trunk/gcc/ChangeLog M /trunk/gcc/config/tilegx/tilegx.c TILE-Gx: Fix bundling when encountering consecutive barriers. * config/tilegx/tilegx.c (tilegx_gen_bundles): Preserve end-of-bundle marker for consecutive barriers. But the formatting here is still odd, and should be fixed: TAB usage, single statements in braces, { not in a line of its own. I am however not sure about this statement: /* Never wrap {} around inline asm. */ if (GET_CODE (PATTERN (insn)) != ASM_INPUT) ... because this does only exclude asm(""); that is basic asm with empty assembler string. To exclude all other forms of asm statements that are hidden in PARALLEL constructs we would need: /* Never wrap {} around inline asm. */ if (GET_CODE (PATTERN (insn)) != ASM_INPUT && asm_noperands (PATTERN (insn)) < 0) I think this if-condition is probably unnecessary, because it does apparently not create any problems although it is completely broken. Bernd.
New Spanish PO file for 'gcc' (version 6.2.0)
Hello, gentle maintainer. This is a message from the Translation Project robot. A revised PO file for textual domain 'gcc' has been submitted by the Spanish team of translators. The file is available at: http://translationproject.org/latest/gcc/es.po (This file, 'gcc-6.2.0.es.po', has just now been sent to you in a separate email.) All other PO files for your package are available in: http://translationproject.org/latest/gcc/ Please consider including all of these in your next release, whether official or a pretest. Whenever you have a new distribution with a new version number ready, containing a newer POT file, please send the URL of that distribution tarball to the address below. The tarball may be just a pretest or a snapshot, it does not even have to compile. It is just used by the translators when they need some extra translation context. The following HTML page has been updated: http://translationproject.org/domain/gcc.html If any question arises, please contact the translation coordinator. Thank you for all your work, The Translation Project robot, in the name of your translation coordinator.
Re: [RFC] combine: Improve change_zero_ext, call simplify_set afterwards.
On Fri, Dec 09, 2016 at 04:23:44PM +0100, Dominik Vogt wrote: > 0001-* > > Deal with mode expanding zero_extracts in change_zero_ext. The > patch looks good to me, but not sure whether endianness is > handled properly. Is the second argument of gen_rtx_SUBREG > correct? > >From 600ed3dadd5bc2568ab53be8466686abaf27ff3f Mon Sep 17 00:00:00 2001 > From: Dominik Vogt > Date: Fri, 9 Dec 2016 02:48:30 +0100 > Subject: [PATCH 1/2] combine: Handle mode expanding zero_extracts in > change_zero_ext. > > Example: > > (zero_extract:DI (reg:SI) >(const_int 24) >(const_int 0)) > > --> > > (and:DI (subreg:DI (lshiftrt:SI (reg:SI) (const_int 8)) > 0) > (const_int 16777215)) > --- > gcc/combine.c | 12 +--- > 1 file changed, 9 insertions(+), 3 deletions(-) > > diff --git a/gcc/combine.c b/gcc/combine.c > index b429453..e14a08f 100644 > --- a/gcc/combine.c > +++ b/gcc/combine.c > @@ -11237,18 +11237,24 @@ change_zero_ext (rtx pat) >if (GET_CODE (x) == ZERO_EXTRACT > && CONST_INT_P (XEXP (x, 1)) > && CONST_INT_P (XEXP (x, 2)) > - && GET_MODE (XEXP (x, 0)) == mode) > + && (GET_MODE (XEXP (x, 0)) == mode > + || GET_MODE_PRECISION (GET_MODE (XEXP (x, 0))) > + < GET_MODE_PRECISION (mode))) If you make this <= you can collapse the two cases into one. > { > + machine_mode inner_mode = GET_MODE (XEXP (x, 0)); > + > size = INTVAL (XEXP (x, 1)); > > int start = INTVAL (XEXP (x, 2)); > if (BITS_BIG_ENDIAN) > - start = GET_MODE_PRECISION (mode) - size - start; > + start = GET_MODE_PRECISION (inner_mode) - size - start; > > if (start) > - x = gen_rtx_LSHIFTRT (mode, XEXP (x, 0), GEN_INT (start)); > + x = gen_rtx_LSHIFTRT (inner_mode, XEXP (x, 0), GEN_INT (start)); > else > x = XEXP (x, 0); > + if (mode != inner_mode) > + x = gen_rtx_SUBREG (mode, x, 0); gen_lowpart_SUBREG instead? It's easier to read, and code a little further down does the same thing. Okay for trunk with those changes (did you bootstrap+regcheck this already?) Thanks, Segher
Re: [PATCH] [x86] Don't use builtins for SIMD shifts
On Sat, 10 Dec 2016, Allan Sandfeld Jensen wrote: Replaces the definitions of the shift intrinsics with GCC extension syntax to allow GCC to reason about what the instructions does. Tests are added to ensure the intrinsics still produce the right instructions, and that a few basic optimizations now work. I don't think we can do it in such a straightforward way. Those intrinsics are well defined for any shift argument, while operator<< and operator>> are considered undefined for many input values. I believe you may need to use an unsigned type for the lhs of left shifts, and write a << (b & 31) to match the semantics for the rhs (I haven't checked Intel's doc). Which might require adding a few patterns in sse.md to avoid generating code for that AND. (also, the patch can't go in until next stage 1, around March or April, but that doesn't prevent from discussing it) -- Marc Glisse
Re: [PATCH] [x86] Don't use builtins for SIMD shifts
On Sat, 10 Dec 2016, Marc Glisse wrote: On Sat, 10 Dec 2016, Allan Sandfeld Jensen wrote: Replaces the definitions of the shift intrinsics with GCC extension syntax to allow GCC to reason about what the instructions does. Tests are added to ensure the intrinsics still produce the right instructions, and that a few basic optimizations now work. I don't think we can do it in such a straightforward way. Those intrinsics are well defined for any shift argument, while operator<< and operator>> are considered undefined for many input values. I believe you may need to use an unsigned type for the lhs of left shifts, and write a << (b & 31) to match the semantics for the rhs (I haven't checked Intel's doc). Which might require adding a few patterns in sse.md to avoid generating code for that AND. Oups, apparently I got that wrong, got confused with the scalar case. Left shift by more than precision is well defined for vectors, but it gives 0, it doesn't take the shift count modulo the precision. Which is even harder to explain to gcc (maybe ((unsigned)b<=31)?a
Re: [PATCH] PR fortran/65173 -- kill off old_cl_list from gfc_namespace.
Dear Steve and Andre, It turns out that Andre's instrumented gfortran came up trumps. There was a block in resolve_structure_cons that attempted to remove a charlen and, in so doing, broke the charlen chain. I have left it to namespace cleanup to do the job in revision 243517. The module.c stuff was a very red fish of some kind. Cheers Paul 2016-12-10 Paul Thomas PR fortran/78350 * resolve.c (resolve_structure_cons): Remove the block that tried to remove a charlen and rely on namespace cleanup. On 9 December 2016 at 20:50, Steven G. Kargl wrote: > On Fri, Dec 09, 2016 at 01:54:34PM +0200, Janne Blomqvist wrote: >> On Fri, Dec 9, 2016 at 1:47 PM, Paul Richard Thomas >> wrote: >> >> I'm seeing the same thing, I guess. Or rather that the ts->type == 81 >> which is non-sensical. No idea where that comes from.. >> >> Backtrace from gdb: >> >> #0 gfc_code2string (m=m@entry=0x182fe00 , >> code=code@entry=81) at ../../trunk-git/gcc/fortran/misc.c:193 >> #1 0x0065645f in mio_name (t=81, m=m@entry=0x182fe00 >> ) at ../../trunk-git/gcc/fortran/module.c:1722 >> #2 0x0065acb9 in mio_name_bt (m=0x182fe00 , >> t=) at ../../trunk-git/gcc/fortran/module.c:2106 >> #3 mio_typespec (ts=ts@entry=0x2420918) at >> ../../trunk-git/gcc/fortran/module.c:2530 >> #4 0x0065ae50 in mio_expr (ep=0x24217d0) at >> ../../trunk-git/gcc/fortran/module.c:3432 >> #5 0x0065b300 in mio_charlen (clp=clp@entry=0x241fa80) at >> ../../trunk-git/gcc/fortran/module.c:2500 >> #6 0x0065ad50 in mio_typespec (ts=ts@entry=0x241fa78) at >> ../../trunk-git/gcc/fortran/module.c:2558 >> #7 0x0065ae50 in mio_expr (ep=ep@entry=0x24208a8) at >> ../../trunk-git/gcc/fortran/module.c:3432 >> #8 0x0065bb73 in mio_component (c=c@entry=0x2420830, >> vtype=vtype@entry=0) at ../../trunk-git/gcc/fortran/module.c:2799 >> #9 0x0065bbdb in mio_component_list (cp=cp@entry=0x2422138, >> vtype=0) at ../../trunk-git/gcc/fortran/module.c:2818 >> #10 0x0065c913 in mio_symbol (sym=sym@entry=0x2422090) at >> ../../trunk-git/gcc/fortran/module.c:4238 >> #11 0x0065cb8b in write_symbol (n=2, sym=sym@entry=0x2422090) >> at ../../trunk-git/gcc/fortran/module.c:5613 >> #12 0x0065e87f in write_symbol0 (st=0x2422680) at >> ../../trunk-git/gcc/fortran/module.c:5653 >> #13 0x0065e7f2 in write_symbol0 (st=0x2420650) at >> ../../trunk-git/gcc/fortran/module.c:5632 >> #14 0x0065ea03 in write_module () at >> ../../trunk-git/gcc/fortran/module.c:5992 >> #15 0x0065ec59 in dump_module (name=name@entry=0x76853060 >> "m", dump_flag=dump_flag@entry=1) at >> ../../trunk-git/gcc/fortran/module.c:6120 >> #16 0x0065edf8 in gfc_dump_module (name=0x76853060 "m", >> dump_flag=1) at ../../trunk-git/gcc/fortran/module.c:6163 >> #17 0x00675edd in gfc_parse_file () at >> ../../trunk-git/gcc/fortran/parse.c:6158 >> #18 0x006baa5b in gfc_be_parse_file () at >> ../../trunk-git/gcc/fortran/f95-lang.c:202 > > Ugh. I can only reproduce the ICE under valgrind. There is > some interesting stuff happening in module.c > > -- > Steve > http://troutmask.apl.washington.edu/~kargl/ > https://www.youtube.com/watch?v=6hwgPfCcpyQ -- If you're walking down the right path and you're willing to keep walking, eventually you'll make progress. Barack Obama
Re: [RFC] combine: Improve change_zero_ext, call simplify_set afterwards.
On Sat, Dec 10, 2016 at 10:37:38AM -0600, Segher Boessenkool wrote: > Okay for trunk with those changes (did you bootstrap+regcheck this > already?) Thanks, Oh, and a changelog :-) Segher
Re: [PATCH] [x86] Don't use builtins for SIMD shifts
On Sat, 10 Dec 2016, Allan Sandfeld Jensen wrote: On Saturday 10 December 2016, Marc Glisse wrote: On Sat, 10 Dec 2016, Marc Glisse wrote: On Sat, 10 Dec 2016, Allan Sandfeld Jensen wrote: Replaces the definitions of the shift intrinsics with GCC extension syntax to allow GCC to reason about what the instructions does. Tests are added to ensure the intrinsics still produce the right instructions, and that a few basic optimizations now work. I don't think we can do it in such a straightforward way. Those intrinsics are well defined for any shift argument, while operator<< and operator>> are considered undefined for many input values. I believe you may need to use an unsigned type for the lhs of left shifts, and write a << (b & 31) to match the semantics for the rhs (I haven't checked Intel's doc). Which might require adding a few patterns in sse.md to avoid generating code for that AND. Oups, apparently I got that wrong, got confused with the scalar case. Left shift by more than precision is well defined for vectors, but it gives 0, it doesn't take the shift count modulo the precision. Which is even harder to explain to gcc (maybe ((unsigned)b<=31)?a
Re: cprop fix for PR78626
On Thu, Dec 08, 2016 at 01:21:04PM +0100, Bernd Schmidt wrote: > This is another case where an optimization turns a trap_if > unconditional. We have to defer changing the CFG, since the rest of > cprop seems to blow up when we modify things while scanning. > > Bootstrapped and tested on x86_64-linux, and reportedly fixes the > problem on ppc, ok? This fixes PR78626, but not yet PR78727. > @@ -1825,11 +1828,26 @@ one_cprop_pass (void) > insn into a NOTE, or deleted the insn. */ > if (! NOTE_P (insn) && ! insn->deleted ()) > mark_oprs_set (insn); > + > + if (first_uncond_trap == NULL > + && GET_CODE (PATTERN (insn)) == TRAP_IF > + && XEXP (PATTERN (insn), 0) == const1_rtx) > + first_uncond_trap = insn; > } > + if (first_uncond_trap != NULL && first_uncond_trap != BB_END (bb)) > + uncond_traps.safe_push (first_uncond_trap); > } The problem for PR78727 is that we also need to do this for insns that already are the last insn in the block: > + while (!uncond_traps.is_empty ()) > + { > + rtx_insn *insn = uncond_traps.pop (); > + basic_block to_split = BLOCK_FOR_INSN (insn); > + remove_edge (split_block (to_split, insn)); > + emit_barrier_after_bb (to_split); > + } We need that barrier, and we also need the successor edges removed (which split_block+remove_edge does). (PR78727 works fine with just that BB_END test deleted). Segher
Re: [PATCHv3] [AARCH64] Add variant support to -m="native"and add thunderxt88p1.
On Sat, Nov 26, 2016 at 1:54 PM, Andrew Pinski wrote: > On Tue, Nov 1, 2016 at 11:08 AM, Andrew Pinski wrote: >> On Tue, Nov 17, 2015 at 2:10 PM, Andrew Pinski wrote: >>> Since ThunderX T88 pass 1 (variant 0) is a ARMv8 part while pass 2 (variant >>> 1) >>> is an ARMv8.1 part, I needed to add detecting of the variant also for this >>> difference. Also I simplify a little bit and combined the single core and >>> arch detecting cases so it would be easier to add variant. >> >> Actually it is a bit more complex than what I said here, see below for >> the full table of options and what are enabled/disabled now. >> >>> OK? Bootstrapped and tested on aarch64-linux-gnu with no regressions. >>> Tested -mcpu=native on both T88 pass 1 and T88 pass 2 to make sure it is >>> deecting the two seperately. >> >> >> Here is the final patch in this series updated; I changed the cpu name >> slightly and made sure I updated invoke.texi too. >> >> The names are going to match the names in LLVM (worked with our LLVM >> engineer here at Cavium about the names). >> Here are the names recorded and >> -mpcu=thunderx: >> *Matches part num 0xA0 (reserved for ThunderX 8x series) >> *T88 Pass 2 scheduling >> *Hardware prefetching (software prefetching disabled) >> *LSE enabled >> *no v8.1 >> >> -mcpu=thunderxt88: >> *Matches part num 0xA1 >> *T88 Pass 2 scheduling >> *software prefetching enabled >> *LSE enabled >> *no v8.1 >> >> -mcpu=thunderxt88p1 (only for GCC): >> *Matches part num 0xA1, variant 0 >> *T88 Pass 1 scheduling >> *software prefetching enabled >> *no LSE enabled >> *no v8.1 >> >> -mcpu=thunderxt81 and -mcpu=thunderxt83: >> *Matches part num 0xA2/0xA3 >> *T88 Pass 2 scheduling >> *Hardware prefetching (software prefetching disabled) >> *LSE enabled >> *v8.1 >> >> >> I have not hooked up software vs hardware prefetching and the >> scheduler parts (the next patch will do part of that); both ARMv8.1-a >> and LSE parts are hooked up as those parts are only in >> aarch64-cores.def. >> >> OK? Bootstrapped and tested on ThunderX T88 and ThunderX T81 >> (aarch64-linux-gnu). > > Here is the latest version of the patch. Updated for the latest > additions of "falkor". Also added a comment about the order of > "thunderxt88p1" and "thunderxt88". > > OK? Bootstrapped and tested on arrch64-linux-gnu with no regressions. Ping? > > Thanks, > Andrew Pinski > > ChangeLog: > > * config/aarch64/aarch64-cores.def: Add -1 as the variant to all of the cores. > (thunderx): Update to include LSE by default. > (thunderxt88p1): New core. > (thunderxt88): New core. > (thunderxt81): New core. > (thunderxt83): New core. > * config/aarch64/driver-aarch64.c (struct aarch64_core_data): Add variant > field. > (ALL_VARIANTS): New define. > (AARCH64_CORE): Support VARIANT operand. > (cpu_data): Likewise. > (host_detect_local_cpu): Parse variant field of /proc/cpuinfo. Combine the > arch > and single core case and support variant searching. > * common/config/aarch64/aarch64-common.c (AARCH64_CORE): Add VARIANT operand. > * config/aarch64/aarch64-opts.h (AARCH64_CORE): Likewise. > * config/aarch64/aarch64.c (AARCH64_CORE): Likewise. > * config/aarch64/aarch64.h (AARCH64_CORE): Likewise. > * config/aarch64/aarch64-tune.md: Regenerate. > > * doc/invoke.texi (AARCH64/mtune): Document thunderxt88, > thunderxt88p1, thunderxt81, thunderxt83 as available options.
committed: add i386/t-crtstuff to i[34567]86-*-netbsdelf tmake_file
I have committed the attached patch to add i386/t-crtstuff to tmake_file for i[34567]86-*-netbsdelf*. Bootstrapped and tested on i386-unknown-netbsdelf6.1 (fixes 29378 failures) /Krister 2016-12-10 Krister Walfridsson * config.host (i[34567]86-*-netbsdelf*): Add i386/t-crtstuff to tmake_file.Index: libgcc/config.host === --- libgcc/config.host (revision 243517) +++ libgcc/config.host (revision 243518) @@ -599,6 +599,7 @@ md_unwind_header=i386/freebsd-unwind.h ;; i[34567]86-*-netbsdelf*) + tmake_file="${tmake_file} i386/t-crtstuff" ;; x86_64-*-netbsd*) tmake_file="${tmake_file} i386/t-crtstuff"
Re: [PATCH/AARCH64] Handle ILP32 multi-arch
On Thu, Nov 10, 2016 at 6:58 PM, Andrew Pinski wrote: > On Tue, Oct 25, 2016 at 3:25 PM, Matthias Klose wrote: >> On 07.10.2016 23:08, Andrew Pinski wrote: >>> Hi, >>> This patch adds ilp32 multi-arch support. This is needed to support >>> multi-arch on Debian like systems. >>> >>> OK? Bootstrapped and tested on aarch64-linux-gnu with no regressions. >>> Also tested with ilp32 with a newly built toolchain that supports >>> ILP32 with Ubuntu 1604 base. >>> >>> Thanks, >>> Andrew >>> >>> ChangeLog: >>> * config/aarch64/t-aarch64-linux (MULTILIB_OSDIRNAMES): Handle >>> multi-arch for ilp32. >> >> I can't approve that, but it looks like a reasonable change, but we should >> document the multiarch triplet at https://wiki.debian.org/Multiarch/Tuples > > > Ping? Ping? This is the only outstanding GCC ILP32 related patch. > > Thanks, > Andrew > >> >> Matthias >>
committed: make i486 default arch for x86 NetBSD
I have committed the attached patch to make i486 the default arch on NetBSD in the same way as for FreeBSD, as 386 CPUs are not supported on any maintained version of NetBSD. Bootstrapped and tested on i386-unknown-netbsdelf6.1 /Krister 2016-12-10 Krister Walfridsson * config.gcc (i386-*-netbsd*): Make i486 the default arch on NetBSD. Generally use cpu generic.Index: gcc/config.gcc === --- gcc/config.gcc (revision 243518) +++ gcc/config.gcc (revision 243519) @@ -3061,6 +3061,12 @@ arch_without_sse2=yes arch_without_64bit=yes ;; + i386-*-netbsd*) +arch=i486 +cpu=generic +arch_without_sse2=yes +arch_without_64bit=yes +;; i386-*-*) arch=i386 cpu=i386
Re: [PATCH, Fortran, pr78672, ctp1, v1] Gfortran test suite failures with a sanitized compiler
Hello, Le 09/12/2016 à 11:55, Andre Vehreschild a écrit : Hi Mikael, thanks a lot for your comments. Note I also have added the reply to your latest email here. On Thu, 8 Dec 2016 23:49:57 +0100 Mikael Morin wrote: diff --git a/gcc/fortran/data.c b/gcc/fortran/data.c index 139ce88..4f835b3 100644 --- a/gcc/fortran/data.c +++ b/gcc/fortran/data.c @@ -186,7 +186,7 @@ create_character_initializer (gfc_expr *init, gfc_typespec *ts, for (i = 0; i < len; i++) dest[start+i] = rvalue->representation.string[i]; } - else + else if (rvalue->value.character.string) This one looks fishy. Either rvalue is a character constant and its string should be set, or it’s not a character constant and the value.character.string should not be accessed at all. You are completely right. This can *only* occur when invalid-code is given to the compiler. In this case the offending code was: data c / = NULL() / The syntax may not be correct (just out of my head), but I hope you get the idea. The sanitizers complaint was that the second argument to the memcpy below must not be NULL. The above if () makes sure the memcpy does not get called in this case. So this merely to prevent the compiler from ICEing on systems whose memcpy is not robust. Ok, I have been able to reproduce this case, and indeed the function is called with null() as rvalue expression. But my comment still holds. Accessing rvalue->value.character.string when rvalue hasn’t type EXPR_CONSTANT is undefined. A slightly better patch would use rvalue->expr_type == EXPR_CONSTANT as else if condition, but the problem remains earlier in the function in the assignment to len. I think the function just should not be called if rvalue is null(). memcpy (&dest[start], rvalue->value.character.string, len * sizeof (gfc_char_t)); diff --git a/gcc/fortran/interface.c b/gcc/fortran/interface.c index 8afba84..4e4d17c 100644 --- a/gcc/fortran/interface.c +++ b/gcc/fortran/interface.c @@ -2803,6 +2803,7 @@ compare_actual_formal (gfc_actual_arglist **ap, gfc_formal_arglist *formal, int i, n, na; unsigned long actual_size, formal_size; bool full_array = false; + gfc_ref *actual_arr_ref; actual = *ap; @@ -2942,37 +2943,38 @@ compare_actual_formal (gfc_actual_arglist **ap, gfc_formal_arglist *formal, and assumed-shape dummies, the string length needs to match exactly. */ if (a->expr->ts.type == BT_CHARACTER - && a->expr->ts.u.cl && a->expr->ts.u.cl->length - && a->expr->ts.u.cl->length->expr_type == EXPR_CONSTANT - && f->sym->ts.u.cl && f->sym->ts.u.cl && f->sym->ts.u.cl->length - && f->sym->ts.u.cl->length->expr_type == EXPR_CONSTANT - && (f->sym->attr.pointer || f->sym->attr.allocatable - || (f->sym->as && f->sym->as->type == AS_ASSUMED_SHAPE)) - && (mpz_cmp (a->expr->ts.u.cl->length->value.integer, - f->sym->ts.u.cl->length->value.integer) != 0)) -{ - if (where && (f->sym->attr.pointer || f->sym->attr.allocatable)) -gfc_warning (OPT_Wargument_mismatch, - "Character length mismatch (%ld/%ld) between actual " - "argument and pointer or allocatable dummy argument " - "%qs at %L", - mpz_get_si (a->expr->ts.u.cl->length->value.integer), - mpz_get_si (f->sym->ts.u.cl->length->value.integer), - f->sym->name, &a->expr->where); - else if (where) -gfc_warning (OPT_Wargument_mismatch, - "Character length mismatch (%ld/%ld) between actual " - "argument and assumed-shape dummy argument %qs " - "at %L", - mpz_get_si (a->expr->ts.u.cl->length->value.integer), - mpz_get_si (f->sym->ts.u.cl->length->value.integer), - f->sym->name, &a->expr->where); - return 0; -} + && a->expr->ts.u.cl && a->expr->ts.u.cl->length + && a->expr->ts.u.cl->length->expr_type == EXPR_CONSTANT + && f->sym->ts.type == BT_CHARACTER && f->sym->ts.u.cl ^^^ + && f->sym->ts.u.cl->length + && f->sym->ts.u.cl->length->expr_type == EXPR_CONSTANT + && (f->sym->attr.pointer || f->sym->attr.allocatable + || (f->sym->as && f->sym->as->type == AS_ASSUMED_SHAPE)) + && (mpz_cmp (a->expr->ts.u.cl->length->value.integer, + f->sym->ts.u.cl->length->value.integer) != 0)) + { + && a->expr->ts.type == BT_CHARACTER) { if (where) gfc_error ("Actual argument at %L to allocatable or " That one was just reformatting, right? No, the check for the correct has been added at ^^^. I agree that reformatting and the change was not a good idea. @@ -3039,13 +3041,28