Re: [Patch, Fortran] PR51055 - accept non-spec-expr "i" in allocate(character(len=i)::s)
*ping* On Tue, 15 May 2012 12:26, Tobias Burnus wrote: A rather simple patch. Built and regtested on x86-64-linux. OK for the trunk? I think that is the last patch required for commonly used code. Remaining are issues with array constructors and concatenations - and, of course, deferred-length components. Tobias
[Ada] fix gnat_write_global_declarations glitch in LTO mode
The routine uses an anonymous static variable and this breaks in LTO mode because a DECL_NAME is expected. Tested on i586-suse-linux, applied on the mainline and 4.7 branch. 2012-05-20 Eric Botcazou * gcc-interface/utils.c (gnat_write_global_declarations): Put a name on the dummy global variable. 2012-05-20 Eric Botcazou * gnat.dg/lto13.adb: New test. * gnat.dg/lto13_pkg.ad[sb]: New helper. -- Eric Botcazou Index: gcc-interface/utils.c === --- gcc-interface/utils.c (revision 187691) +++ gcc-interface/utils.c (working copy) @@ -5586,8 +5586,12 @@ gnat_write_global_declarations (void) if (!VEC_empty (tree, types_used_by_cur_var_decl)) { struct varpool_node *node; + char *label; + + ASM_FORMAT_PRIVATE_NAME (label, first_global_object_name, 0); dummy_global - = build_decl (BUILTINS_LOCATION, VAR_DECL, NULL_TREE, void_type_node); + = build_decl (BUILTINS_LOCATION, VAR_DECL, get_identifier (label), + void_type_node); TREE_STATIC (dummy_global) = 1; TREE_ASM_WRITTEN (dummy_global) = 1; node = varpool_node (dummy_global); -- { dg-do link } -- { dg-options "-g -flto" { target lto } } with Lto13_Pkg; use Lto13_Pkg; procedure Lto13 is begin Proc; end; package Lto13_Pkg is procedure Proc; private type T; end Lto13_Pkg; package body Lto13_Pkg is procedure Proc is begin raise Constraint_Error; end; type T is null record; end Lto13_Pkg;
Re: Symbol table 22/many: handle all aliases through the symtab
> * cgraphunit.c (handle_alias_pairs): Cleanup; handle all types of > aliases. This slightly broke error recovery for: extern int var __attribute__ ((alias ("undef"))); Before: ~/build/gcc/native32/gcc/cc1 -quiet t.c t.c:1:12: error: 'var' aliased to undefined symbol 'undef' extern int var __attribute__ ((alias ("undef"))); ^ After: ~/build/gcc/native32/gcc/cc1 -quiet t.c t.c:1:12: error: 'var' aliased to undefined symbol 'undef' extern int var __attribute__ ((alias ("undef"))); ^ t.c:1:12: error: 'var' aliased to undefined symbol 'undef' -- Eric Botcazou
Re: [Fortran, patch] PR 48831 - Constant expression (PARAMETER array element) rejected as nonconstant
Hi Alessandro, Alessandro Fanfarillo wrote: in attachment there's a patch for PR 48831, it also includes a new test case suggested by Tobias Burnus. The patch is bootstrapped and tested on x86_64-unknown-linux-gnu. Please try to ensure that your patch has a text mime type - it shows up as Content-Type: application/octet-stream; which makes reading, reviewing and quoting your patch more difficult. PR fortran/48831 * gfortran.h: Add non-static prototype declaration of check_init_expr function. * check.c (kind_check): Change if condition related to check_init_expr. * expr.c: Remove prototype declaration of check_init_expr function and static keyword. You should add the name of the function you change in parentheses, e.g. * gfortran.h (check_init_expr): Add prototype declaration of function. (The "non-static" is superfluous as static functions shouldn't be in header files.) For "check_init_expr" I'd use "Remove forward declaration" instead of "Remove prototype declaration" but that's personal style. But again, you should include the function name in parentheses. The reason is that one can more quickly find it, if it is always at the same spot. As mentioned before, the gfortran convention is to prefix functions (gfc_) - at least those which are nonstatic. Please change the function name. - if (k->expr_type != EXPR_CONSTANT) + if (check_init_expr(k) != SUCCESS) GNU style: Add a space before the "(" of the function argument: "check_init_expr (k)". +/* Check an intrinsic arithmetic operation to see if it is consistent + with some type of expression. */ +gfc_try check_init_expr (gfc_expr *); I have to admit that after reading only the comment, I had no idea what the function does - especially the "some type" is not really helpful. How about a simple "Check whether an expression is an initialization/constant expression." Initialization and constant expressions are well defined in the Fortran standard. 
(Actually, I find the function name speaks already for itself, thus, I do not see the need for a comment, but I also do not mind a comment.) (One problem with the name "constant expression" vs. "initialization expression" is that Fortran 90/95 distinguish between them while Fortran 2003/2008 have merged them to a single type of expression; Fortran 2003 calls the merged expression type "initialization expression" while Fortran 2008 calls them "constant expressions". In principle, gfortran should make the distinction with -std=f95 and reject expressions which are nonconstant and only initexpressions when the standard demands it, but I am not sure whether gfortran does. That part of gfortran is a bit unclean and the distinction between init/const expr is nowadays largely ignored by the gfortran developers.) Otherwise, the patch looks OK. Tobias
Re: Symbol table 22/many: handle all aliases through the symtab
> > * cgraphunit.c (handle_alias_pairs): Cleanup; handle all types of > > aliases. > > This slightly broke error recovery for: > > extern int var __attribute__ ((alias ("undef"))); > > Before: > > ~/build/gcc/native32/gcc/cc1 -quiet t.c > t.c:1:12: error: 'var' aliased to undefined symbol 'undef' > extern int var __attribute__ ((alias ("undef"))); > ^ > > After: > > ~/build/gcc/native32/gcc/cc1 -quiet t.c > t.c:1:12: error: 'var' aliased to undefined symbol 'undef' > extern int var __attribute__ ((alias ("undef"))); > ^ > t.c:1:12: error: 'var' aliased to undefined symbol 'undef' I belive this should be cured by the followup removing the code emitting warnings in finish_aliases. I will double check. Thanks! Honza > > > -- > Eric Botcazou
PATCH: PR target/53425: No warnings are given for -mno-sse
Hi, We should warn that passing an SSE vector argument without SSE enabled changes the ABI for 64-bit. Tested on Linux/x86-64. OK to install? Thanks. H.J. --- gcc/ 2012-05-20 H.J. Lu PR target/53425 * config/i386/i386.c (type_natural_mode): Warn that passing an SSE vector argument without SSE enabled changes the ABI. gcc/testsuite/ 2012-05-20 H.J. Lu PR target/53425 * gcc.target/i386/pr53425.c: New file. diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index eca542c..a56847a 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -5828,7 +5833,22 @@ type_natural_mode (const_tree type, const CUMULATIVE_ARGS *cum) return TYPE_MODE (type); } else - return mode; + { + if (size == 16 && !TARGET_SSE) + { + static bool warnedsse; + + if (cum + && !warnedsse + && cum->warn_sse) + { + warnedsse = true; + warning (0, "SSE vector argument without SSE " +"enabled changes the ABI"); + } + } + return mode; + } } gcc_unreachable (); diff --git a/gcc/testsuite/gcc.target/i386/pr53425.c b/gcc/testsuite/gcc.target/i386/pr53425.c new file mode 100644 index 000..2446c0f --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr53425.c @@ -0,0 +1,14 @@ +/* PR target/53425 */ +/* { dg-do compile { target { ! { ia32 } } } } */ +/* { dg-options "-O2 -mno-sse" } */ + +typedef double __v2df __attribute__ ((__vector_size__ (16))); + +extern __v2df x; + +extern void bar (__v2df); +void +foo (void) +{ + bar (x); /* { dg-message "warning: SSE vector argument without SSE enabled changes the ABI" } */ +}
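The hunk above relies on a function-local static flag so that the ABI warning is emitted at most once per compilation. A minimal sketch of that warn-once pattern outside GCC follows; `warn_sse_abi_once`, its parameters and its return value are hypothetical and only mirror the shape of the patched code, they are not GCC interfaces.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the check added to type_natural_mode.
   Returns 1 if this call printed the warning, 0 otherwise.  */
int
warn_sse_abi_once (int size, bool sse_enabled, bool warn_requested)
{
  /* Function-local static: it keeps its value across calls, so the
     message is printed at most once.  */
  static bool warned;

  if (size == 16 && !sse_enabled && warn_requested && !warned)
    {
      warned = true;
      fprintf (stderr,
               "warning: SSE vector argument without SSE enabled "
               "changes the ABI\n");
      return 1;
    }
  return 0;
}
```

Calling the function repeatedly with a 16-byte argument and SSE disabled prints the message on the first qualifying call only; later calls return 0 silently.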
PATCH: PR target/53383: Allow -mpreferred-stack-boundary=3 on x86-64
Hi, This patch allows -mpreferred-stack-boundary=3 on x86-64 when SSE is disabled. Since this option changes the ABI, I also added a warning for -mpreferred-stack-boundary=3. OK for trunk? Thanks. H.J. PR target/53383 * doc/invoke.texi: Add a warning for -mpreferred-stack-boundary=3. * config/i386/i386.c (ix86_option_override_internal): Allow -mpreferred-stack-boundary=3 for 64-bit if SSE is disabled. * config/i386/i386.h (MIN_STACK_BOUNDARY): Set to 64 for 64-bit if SSE is disabled. diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index eca542c..338d387 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -3660,7 +3660,7 @@ ix86_option_override_internal (bool main_args_p) ix86_preferred_stack_boundary = PREFERRED_STACK_BOUNDARY_DEFAULT; if (global_options_set.x_ix86_preferred_stack_boundary_arg) { - int min = (TARGET_64BIT ? 4 : 2); + int min = (TARGET_64BIT ? (TARGET_SSE ? 4 : 3) : 2); int max = (TARGET_SEH ? 4 : 12); if (ix86_preferred_stack_boundary_arg < min diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h index ddb3645..f7f13d2 100644 --- a/gcc/config/i386/i386.h +++ b/gcc/config/i386/i386.h @@ -708,7 +708,7 @@ enum target_cpu_default #define MAIN_STACK_BOUNDARY (TARGET_64BIT ? 128 : 32) /* Minimum stack boundary. */ -#define MIN_STACK_BOUNDARY (TARGET_64BIT ? 128 : 32) +#define MIN_STACK_BOUNDARY (TARGET_64BIT ? (TARGET_SSE ? 128 : 64) : 32) /* Boundary (in *bits*) on which the stack pointer prefers to be aligned; the compiler cannot rely on having this alignment. */ diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index 4c5c79f..daa1f3a 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -13521,6 +13521,12 @@ Attempt to keep the stack boundary aligned to a 2 raised to @var{num} byte boundary. If @option{-mpreferred-stack-boundary} is not specified, the default is 4 (16 bytes or 128 bits). 
+@strong{Warning:} When generating code for the x86-64 architecture with +SSE extensions disabled, @option{-mpreferred-stack-boundary=3} can be +used to keep the stack boundary aligned to 8 byte boundary. You must +build all modules with @option{-mpreferred-stack-boundary=3}, including +any libraries. This includes the system libraries and startup modules. + @item -mincoming-stack-boundary=@var{num} @opindex mincoming-stack-boundary Assume the incoming stack is aligned to a 2 raised to @var{num} byte
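The option argument is an exponent, so the accepted minimum and the resulting alignment can be checked by hand. A small sketch, assuming nothing beyond the expressions in the patch; `min_stack_boundary_arg` and `stack_boundary_bytes` are illustrative names and not part of GCC:

```c
#include <stdbool.h>

/* Smallest accepted -mpreferred-stack-boundary argument, following the
   expression in the patched ix86_option_override_internal.  */
int
min_stack_boundary_arg (bool target_64bit, bool target_sse)
{
  return target_64bit ? (target_sse ? 4 : 3) : 2;
}

/* The argument is a power-of-two exponent; this gives the alignment
   in bytes.  */
unsigned int
stack_boundary_bytes (int arg)
{
  return 1u << arg;
}
```

With SSE disabled on x86-64 the minimum argument is 3, i.e. 8 bytes or 64 bits, which matches the new MIN_STACK_BOUNDARY value; with SSE enabled it stays at 4, i.e. 16 bytes or 128 bits.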
Updating general info in tree-parloops.c
Hi, I updated some of the info in tree-parloops.c, like adding myself to the contributors, and updating the TODO list, both long overdue... I also updated the wiki http://gcc.gnu.org/wiki/AutoParInGCC and added a link to it from tree-parloops.c. If there are no objections, I will commit as obvious. 2012-05-20 Razya Ladelsky * tree-parloops.c: Add myself to contributors, update TODO list, add link to wiki. Thanks, Razya Index: tree-parloops.c === --- tree-parloops.c (revision 187694) +++ tree-parloops.c (working copy) @@ -1,8 +1,8 @@ /* Loop autoparallelization. Copyright (C) 2006, 2007, 2008, 2009, 2010, 2011, 2012 Free Software Foundation, Inc. - Contributed by Sebastian Pop and - Zdenek Dvorak . + Contributed by Sebastian Pop + Zdenek Dvorak and Razya Ladelsky . This file is part of GCC. @@ -54,9 +54,9 @@ along with GCC; see the file COPYING3. If not see -- if there are several parallelizable loops in a function, it may be possible to generate the threads just once (using synchronization to ensure that cross-loop dependences are obeyed). - -- handling of common scalar dependence patterns (accumulation, ...) - -- handling of non-innermost loops */ - + -- handling of common reduction patterns for outer loops. + + More info can also be found at http://gcc.gnu.org/wiki/AutoParInGCC */ /* Reduction handling: currently we use vect_force_simple_reduction() to detect reduction patterns. =
[committed] Fix PR rtl-optimzation/53373 on PA
The attached patch changes the PIC PA call patterns to hide the internal games we play with the PIC register until after reload. As such, the call value patterns are now single sets. Tested on hppa-unknown-linux-gnu and hppa64-hp-hpux11.11. Committed to trunk. Dave -- J. David Anglin dave.ang...@nrc-cnrc.gc.ca National Research Council of Canada (613) 990-0752 (FAX: 952-6602) 2012-05-20 John David Anglin PR rtl-optimzation/53373 * config/pa/pa.md (call_symref_pic): Don't expose PIC register save in call pattern. Update split patterns. (call_symref_64bit, call_reg_pic, call_reg_64bit, call_val_symref_pic, call_val_symref_64bit, call_val_reg_pic, call_val_reg_64bit): Likewise. Index: config/pa/pa.md === --- config/pa/pa.md (revision 187620) +++ config/pa/pa.md (working copy) @@ -7190,12 +7190,11 @@ (set (attr "length") (symbol_ref "pa_attr_length_call (insn, 0)"))]) (define_insn "call_symref_pic" - [(set (match_operand:SI 2 "register_operand" "=&r") (reg:SI 19)) - (call (mem:SI (match_operand 0 "call_operand_address" "")) + [(call (mem:SI (match_operand 0 "call_operand_address" "")) (match_operand 1 "" "i")) (clobber (reg:SI 1)) (clobber (reg:SI 2)) - (use (match_dup 2)) + (clobber (match_operand 2)) (use (reg:SI 19)) (use (const_int 0))] "!TARGET_PORTABLE_RUNTIME && !TARGET_64BIT" @@ -7211,12 +7210,11 @@ ;; terminate the basic block. The split has to contain more than one ;; insn. 
(define_split - [(parallel [(set (match_operand:SI 2 "register_operand" "") (reg:SI 19)) - (call (mem:SI (match_operand 0 "call_operand_address" "")) + [(parallel [(call (mem:SI (match_operand 0 "call_operand_address" "")) (match_operand 1 "" "")) (clobber (reg:SI 1)) (clobber (reg:SI 2)) - (use (match_dup 2)) + (clobber (match_operand 2)) (use (reg:SI 19)) (use (const_int 0))])] "!TARGET_PORTABLE_RUNTIME && !TARGET_64BIT && reload_completed @@ -7231,12 +7229,11 @@ "") (define_split - [(parallel [(set (match_operand:SI 2 "register_operand" "") (reg:SI 19)) - (call (mem:SI (match_operand 0 "call_operand_address" "")) + [(parallel [(call (mem:SI (match_operand 0 "call_operand_address" "")) (match_operand 1 "" "")) (clobber (reg:SI 1)) (clobber (reg:SI 2)) - (use (match_dup 2)) + (clobber (match_operand 2)) (use (reg:SI 19)) (use (const_int 0))])] "!TARGET_PORTABLE_RUNTIME && !TARGET_64BIT && reload_completed" @@ -7269,12 +7266,11 @@ ;; This pattern is split if it is necessary to save and restore the ;; PIC register. (define_insn "call_symref_64bit" - [(set (match_operand:DI 2 "register_operand" "=&r") (reg:DI 27)) - (call (mem:SI (match_operand 0 "call_operand_address" "")) + [(call (mem:SI (match_operand 0 "call_operand_address" "")) (match_operand 1 "" "i")) (clobber (reg:DI 1)) (clobber (reg:DI 2)) - (use (match_dup 2)) + (clobber (match_operand 2)) (use (reg:DI 27)) (use (reg:DI 29)) (use (const_int 0))] @@ -7291,12 +7287,11 @@ ;; terminate the basic block. The split has to contain more than one ;; insn. 
(define_split - [(parallel [(set (match_operand:DI 2 "register_operand" "") (reg:DI 27)) - (call (mem:SI (match_operand 0 "call_operand_address" "")) + [(parallel [(call (mem:SI (match_operand 0 "call_operand_address" "")) (match_operand 1 "" "")) (clobber (reg:DI 1)) (clobber (reg:DI 2)) - (use (match_dup 2)) + (clobber (match_operand 2)) (use (reg:DI 27)) (use (reg:DI 29)) (use (const_int 0))])] @@ -7313,12 +7308,11 @@ "") (define_split - [(parallel [(set (match_operand:DI 2 "register_operand" "") (reg:DI 27)) - (call (mem:SI (match_operand 0 "call_operand_address" "")) + [(parallel [(call (mem:SI (match_operand 0 "call_operand_address" "")) (match_operand 1 "" "")) (clobber (reg:DI 1)) (clobber (reg:DI 2)) - (use (match_dup 2)) + (clobber (match_operand 2)) (use (reg:DI 27)) (use (reg:DI 29)) (use (const_int 0))])] @@ -7368,12 +7362,11 @@ ;; This pattern is split if it is necessary to save and restore the ;; PIC register. (define_insn "call_reg_pic" - [(set (match_operand:SI 1 "register_operand" "=&r") (reg:SI 19)) - (call (mem:SI (reg:SI 22)) + [(call (mem:SI (reg:SI 22)) (match_operand 0 "" "i")) (clobber (reg:SI 1)) (clobber (reg:SI 2)) - (use (match_dup 1)) + (clobber (match_operand 1)) (use (reg:SI 19)) (use (const_int 1))] "!TARGET_6
PATCH: Add RDRND, F16C and FSGSBASE support to -march=native
Hi, This patch adds RDRND, F16C and FSGSBASE support to -march=native. Tested on Linux/x86-64. OK for trunk, 4.7 and 4.6? Thanks. H.J. --- 2012-05-20 H.J. Lu * config/i386/driver-i386.c (host_detect_local_cpu): Support RDRND, F16C and FSGSBASE. diff --git a/gcc/config/i386/driver-i386.c b/gcc/config/i386/driver-i386.c index e93e8d9..94f3819 100644 --- a/gcc/config/i386/driver-i386.c +++ b/gcc/config/i386/driver-i386.c @@ -398,6 +398,7 @@ const char *host_detect_local_cpu (int argc, const char **argv) unsigned int has_fma = 0, has_fma4 = 0, has_xop = 0; unsigned int has_bmi = 0, has_bmi2 = 0, has_tbm = 0, has_lzcnt = 0; unsigned int has_hle = 0, has_rtm = 0; + unsigned int has_rdrnd = 0, has_f16c = 0, has_fsgsbase = 0; bool arch; @@ -445,6 +446,8 @@ const char *host_detect_local_cpu (int argc, const char **argv) has_aes = ecx & bit_AES; has_pclmul = ecx & bit_PCLMUL; has_fma = ecx & bit_FMA; + has_f16c = ecx & bit_F16C; + has_rdrnd = ecx & bit_RDRND; has_cmpxchg8b = edx & bit_CMPXCHG8B; has_cmov = edx & bit_CMOV; @@ -461,6 +464,7 @@ const char *host_detect_local_cpu (int argc, const char **argv) has_rtm = ebx & bit_RTM; has_avx2 = ebx & bit_AVX2; has_bmi2 = ebx & bit_BMI2; + has_fsgsbase = ebx & bit_FSGSBASE; } /* Check cpuid level of extended features. */ @@ -733,11 +737,14 @@ const char *host_detect_local_cpu (int argc, const char **argv) const char *lzcnt = has_lzcnt ? " -mlzcnt" : " -mno-lzcnt"; const char *hle = has_hle ? " -mhle" : " -mno-hle"; const char *rtm = has_rtm ? " -mrtm" : " -mno-rtm"; + const char *rdrnd = has_rdrnd ? " -mrdrnd" : " -mno-rdrnd"; + const char *f16c = has_f16c ? " -mf16c" : " -mno-f16c"; + const char *fsgsbase = has_fsgsbase ? " -mfsgsbase" : " -mno-fsgsbase"; options = concat (options, cx16, sahf, movbe, ase, pclmul, popcnt, abm, lwp, fma, fma4, xop, bmi, bmi2, tbm, avx, avx2, sse4_2, sse4_1, lzcnt, rtm, - hle, NULL); + hle, rdrnd, f16c, fsgsbase, NULL); } done:
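The detection boils down to testing three CPUID feature bits and turning each into a -m or -mno- option. A self-contained sketch of that mapping is below. It takes the raw register values as parameters instead of executing CPUID, so it runs on any host; the function name and the `SKETCH_BIT_*` macros are made up for illustration, while the bit positions (ECX bits 29 and 30 of leaf 1, EBX bit 0 of leaf 7) are the architecturally documented ones.

```c
#include <stddef.h>
#include <stdio.h>

/* CPUID feature bits; the macro names are local to this sketch.  */
#define SKETCH_BIT_F16C      (1u << 29)  /* leaf 1, ECX */
#define SKETCH_BIT_RDRND     (1u << 30)  /* leaf 1, ECX */
#define SKETCH_BIT_FSGSBASE  (1u << 0)   /* leaf 7, subleaf 0, EBX */

/* Write the option string for the three features into BUF (at most LEN
   bytes, always NUL-terminated) and return the snprintf result.  */
int
native_feature_options (unsigned int leaf1_ecx, unsigned int leaf7_ebx,
                        char *buf, size_t len)
{
  return snprintf (buf, len, "%s%s%s",
                   (leaf1_ecx & SKETCH_BIT_RDRND)
                   ? " -mrdrnd" : " -mno-rdrnd",
                   (leaf1_ecx & SKETCH_BIT_F16C)
                   ? " -mf16c" : " -mno-f16c",
                   (leaf7_ebx & SKETCH_BIT_FSGSBASE)
                   ? " -mfsgsbase" : " -mno-fsgsbase");
}
```

The real driver obtains the register values through the cpuid.h helpers and appends these strings to the rest of the option list with concat.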
PATCH: PR target/53416: Wrong code when optimising loop involving _rdrand32_step
Hi, rdrand_1 must be marked with unspec_volatile since it returns a different value every time. OK for trunk, 4.7 and 4.6? Thanks. H.J. PR target/53416 * config/i386/i386.md (UNSPEC_RDRAND): Renamed to ... (UNSPECV_RDRAND): This. (rdrand_1): Updated. diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md index cce78b5..9327acf 100644 --- a/gcc/config/i386/i386.md +++ b/gcc/config/i386/i386.md @@ -176,9 +176,6 @@ ;; For CRC32 support UNSPEC_CRC32 - ;; For RDRAND support - UNSPEC_RDRAND - ;; For BMI support UNSPEC_BEXTR @@ -208,6 +205,9 @@ UNSPECV_WRFSBASE UNSPECV_WRGSBASE + ;; For RDRAND support + UNSPECV_RDRAND + ;; For RTM support UNSPECV_XBEGIN UNSPECV_XEND @@ -18399,9 +18399,9 @@ (define_insn "rdrand_1" [(set (match_operand:SWI248 0 "register_operand" "=r") - (unspec:SWI248 [(const_int 0)] UNSPEC_RDRAND)) + (unspec_volatile:SWI248 [(const_int 0)] UNSPECV_RDRAND)) (set (reg:CCC FLAGS_REG) - (unspec:CCC [(const_int 0)] UNSPEC_RDRAND))] + (unspec_volatile:CCC [(const_int 0)] UNSPECV_RDRAND))] "TARGET_RDRND" "rdrand\t%0" [(set_attr "type" "other")
Re: PATCH: PR target/53416: Wrong code when optimising loop involving _rdrand32_step
On Sun, May 20, 2012 at 10:04:26AM -0700, H.J. Lu wrote: > rdrand_1 must be marked with unspec_volatile since it returns > a different value every time. OK for trunk, 4.7 and 4.6? A testcase for this would be nice (runtime is not possible, since the RNG in theory could return the same value twice, but scanning assembly for a particular number of the rdrand insns would be nice). > PR target/53416 > * config/i386/i386.md (UNSPEC_RDRAND): Renamed to ... > (UNSPECV_RDRAND): This. > (rdrand_1): Updated. Jakub
fix cross build
In building a ppc cross compiler using a freshly built native compiler, I encountered an ICE in iterative_hash_expr compiling c-lex.c. I extracted the attached testcase, showing the problem is with statement expressions. Investigation showed I_H_E seeing BLOCK and BIND_EXPR nodes, which it was unprepared for. These two nodes are never considered equal by operand_equal_p, so we don't need to look into them further to refine the hash. I'm not sure why a native i686-pc-linux-gnu bootstrap doesn't encounter this problem. The attached patch resolves the ICE. Built and tested on i686-pc-linux-gnu, OK? nathan 2012-05-20 Nathan Sidwell * tree.c (iterative_hash_expr): Add BLOCK and BIND_EXPR cases. * gcc.dg/stmt-expr-4.c: New. Index: tree.c === --- tree.c (revision 187628) +++ tree.c (working copy) @@ -6998,6 +6998,11 @@ iterative_hash_expr (const_tree t, hashv } return val; } +case BLOCK: +case BIND_EXPR: + /* These are never equal operands. The contain nodes we're not + prepared for, so stop now. */ + return val; case MEM_REF: { /* The type of the second operand is relevant, except for Index: testsuite/gcc.dg/stmt-expr-4.c === --- testsuite/gcc.dg/stmt-expr-4.c (revision 0) +++ testsuite/gcc.dg/stmt-expr-4.c (revision 0) @@ -0,0 +1,22 @@ + +/* { dg-options "-O2 -std=gnu99" } */ +/* Internal compiler error in iterative_hash_expr */ + +struct tree_string +{ + char str[1]; +}; + +union tree_node +{ + struct tree_string string; +}; + +char *Foo (union tree_node * num_string) +{ + char *str = ((union {const char * _q; char * _nq;}) + ((const char *)(({ __typeof (num_string) const __t + = num_string; __t; }) + ->string.str)))._nq; + return str; +}
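The reasoning in the patch is that a hash only has to agree for operands that compare equal, so a node kind that never compares equal can return the running hash without being inspected. A toy model of that argument follows; `struct node`, `K_OPAQUE`, `node_equal` and `node_hash` are invented for the sketch and are unrelated to GCC's tree representation.

```c
#include <stdbool.h>
#include <stddef.h>

enum kind { K_CONST, K_PLUS, K_OPAQUE };

struct node
{
  enum kind kind;
  int value;                 /* used by K_CONST */
  const struct node *op[2];  /* used by K_PLUS */
};

/* Structural equality.  K_OPAQUE plays the role of BLOCK and BIND_EXPR:
   it is never considered equal, not even to itself.  */
bool
node_equal (const struct node *a, const struct node *b)
{
  if (a->kind != b->kind)
    return false;
  switch (a->kind)
    {
    case K_CONST:
      return a->value == b->value;
    case K_PLUS:
      return (node_equal (a->op[0], b->op[0])
              && node_equal (a->op[1], b->op[1]));
    case K_OPAQUE:
      return false;
    }
  return false;
}

/* Iterative hash.  Equal nodes must hash equally; for K_OPAQUE there are
   no equal pairs, so stopping early cannot break that rule.  */
unsigned int
node_hash (const struct node *n, unsigned int val)
{
  val = val * 31u + (unsigned int) n->kind;
  switch (n->kind)
    {
    case K_CONST:
      return val * 31u + (unsigned int) n->value;
    case K_PLUS:
      val = node_hash (n->op[0], val);
      return node_hash (n->op[1], val);
    case K_OPAQUE:
      /* Do not look inside: just return the hash so far.  */
      return val;
    }
  return val;
}
```

The cost of stopping early is only that expressions containing such nodes may collide more often, which affects speed but not correctness.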
Re: PATCH: PR target/53416: Wrong code when optimising loop involving _rdrand32_step
On Sun, May 20, 2012 at 10:19 AM, Jakub Jelinek wrote: > On Sun, May 20, 2012 at 10:04:26AM -0700, H.J. Lu wrote: >> rdrand_1 must be marked with unspec_volatile since it returns >> a different value every time. OK for trunk, 4.7 and 4.6? > > A testcase for this would be nice (runtime is not possible, since the > RNG in theory could return the same value twice, but scanning assembly > for a particular number of the rdrand insns would be nice). > For unsigned int number = 0; volatile int result = 0; for (register int i = 0; i < 4; ++i) { result = _rdrand32_step(&number); printf("%d: %d\n", result, number); } the issue isn't about number of rdrand insns. As long as it isn't hoisted out of the loop, one rdrand insn is OK. I don't know how to scan for inside or outside of loop. -- H.J.
[patch] Fix array type merging in LTO mode
Hi, since http://gcc.gnu.org/ml/gcc-patches/2011-05/msg00833.html, canonical type merging for arrays takes hours instead of minutes for big Ada applications. The problem is that iterative_hash_canonical_type doesn't hash TYPE_MIN_VALUE and TYPE_MAX_VALUE for integer types anymore, so TYPE_DOMAIN is effectively not hashed anymore and the number of collisions goes through the roof in Ada. Fixed by the attached patch, which also removes a bogus comparison of the TYPE_SIZE of TYPE_DOMAIN in gimple_[canonical]types_compatible_p. LTO bootstrapped on x86_64-suse-linux, OK for mainline and 4.7 branch? 2012-05-20 Eric Botcazou * gimple.c (gimple_types_compatible_p_1) : Remove bogus size handling. (gimple_canonical_types_compatible_p) : Likewise. (iterative_hash_gimple_type): Adjust comment. (iterative_hash_canonical_type): Likewise. Hash the bounds of the domain for an array type. -- Eric Botcazou Index: gimple.c === --- gimple.c (revision 187680) +++ gimple.c (working copy) @@ -3445,13 +3445,6 @@ gimple_types_compatible_p_1 (tree t1, tr goto same_types; else if (i1 == NULL_TREE || i2 == NULL_TREE) goto different_types; - /* If for a complete array type the possibly gimplified sizes - are different the types are different. */ - else if (((TYPE_SIZE (i1) != NULL) ^ (TYPE_SIZE (i2) != NULL)) - || (TYPE_SIZE (i1) - && TYPE_SIZE (i2) - && !operand_equal_p (TYPE_SIZE (i1), TYPE_SIZE (i2), 0))) - goto different_types; else { tree min1 = TYPE_MIN_VALUE (i1); @@ -3962,9 +3955,8 @@ iterative_hash_gimple_type (tree type, h v = iterative_hash_hashval_t (TYPE_STRING_FLAG (type), v); } - /* For array types hash their domain and the string flag. */ - if (TREE_CODE (type) == ARRAY_TYPE - && TYPE_DOMAIN (type)) + /* For array types hash the domain and the string flag. 
*/ + if (TREE_CODE (type) == ARRAY_TYPE && TYPE_DOMAIN (type)) { v = iterative_hash_hashval_t (TYPE_STRING_FLAG (type), v); v = visit (TYPE_DOMAIN (type), state, v, @@ -4191,16 +4183,21 @@ iterative_hash_canonical_type (tree type v = iterative_hash_hashval_t (TREE_CODE (TREE_TYPE (type)), v); } - /* For integer types hash the types min/max values and the string flag. */ + /* For integer types hash the sizetype flag and the string flag. */ if (TREE_CODE (type) == INTEGER_TYPE) v = iterative_hash_hashval_t (TYPE_STRING_FLAG (type), v); - /* For array types hash their domain and the string flag. */ - if (TREE_CODE (type) == ARRAY_TYPE - && TYPE_DOMAIN (type)) + /* For array types hash the domain and its bounds, and the string flag. */ + if (TREE_CODE (type) == ARRAY_TYPE && TYPE_DOMAIN (type)) { v = iterative_hash_hashval_t (TYPE_STRING_FLAG (type), v); v = iterative_hash_canonical_type (TYPE_DOMAIN (type), v); + /* OMP lowering can introduce error_mark_node in place of + random local decls in types. */ + if (TYPE_MIN_VALUE (TYPE_DOMAIN (type)) != error_mark_node) + v = iterative_hash_expr (TYPE_MIN_VALUE (TYPE_DOMAIN (type)), v); + if (TYPE_MAX_VALUE (TYPE_DOMAIN (type)) != error_mark_node) + v = iterative_hash_expr (TYPE_MAX_VALUE (TYPE_DOMAIN (type)), v); } /* Recurse for aggregates with a single element type. */ @@ -4468,13 +4465,6 @@ gimple_canonical_types_compatible_p (tre return true; else if (i1 == NULL_TREE || i2 == NULL_TREE) return false; - /* If for a complete array type the possibly gimplified sizes - are different the types are different. */ - else if (((TYPE_SIZE (i1) != NULL) ^ (TYPE_SIZE (i2) != NULL)) - || (TYPE_SIZE (i1) - && TYPE_SIZE (i2) - && !operand_equal_p (TYPE_SIZE (i1), TYPE_SIZE (i2), 0))) - return false; else { tree min1 = TYPE_MIN_VALUE (i1);
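To see why leaving the bounds out of the hash hurts, consider a toy model in which an array type is just an element kind plus a lower and an upper bound. Everything here (`struct array_type`, `mix`, the two hash functions) is invented for illustration and is far simpler than the real gimple.c code.

```c
struct array_type
{
  int elem_kind;  /* stand-in for the element type */
  long lo, hi;    /* stand-in for the bounds of the domain */
};

/* Simple multiplicative mixing step.  */
static unsigned long
mix (unsigned long h, unsigned long v)
{
  return h * 31ul + v;
}

/* Hash that ignores the bounds: every array of the same element kind
   lands in the same bucket.  */
unsigned long
hash_without_bounds (const struct array_type *t)
{
  return mix (17ul, (unsigned long) t->elem_kind);
}

/* Hash that also mixes in the bounds, as the patch does for the domain
   of an array type.  */
unsigned long
hash_with_bounds (const struct array_type *t)
{
  unsigned long h = hash_without_bounds (t);
  h = mix (h, (unsigned long) t->lo);
  return mix (h, (unsigned long) t->hi);
}
```

With the first function, arrays indexed 1 .. 10 and 1 .. 20 of the same element kind collide and have to be told apart by the expensive comparison; with the second they get different hash values. Ada programs tend to declare many array types that differ only in their bounds, which is presumably why the effect is so pronounced there.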
Re: Turn check macros into functions. (issue6188088)
On 05/18/2012 04:48 PM, Diego Novillo wrote: We can do this in trunk today using a variant of Lawrence's original patch (http://gcc.gnu.org/ml/gcc-patches/2011-09/msg01649.html). This uses no C++ features, though it weakens type checking by casting away constness. In the cxx-conversion branch, we can use overloads, which will DTRT with const. My question is, what do folks prefer? a) The trunk patch today, using no C++ features. b) Wait for the cxx-conversion variant? Surely (check(t), t) also works, and also DTRT wrt const. r~
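A sketch of the (check(t), t) idea in plain C, under the assumption that the checker can take a pointer to const; `struct tree_node`, `check_code` and `CHECKED` are hypothetical names and not the real tree-check macros. The comma expression runs the check and then yields its right operand, so the result keeps the type of the argument, const or not.

```c
#include <assert.h>

struct tree_node
{
  int code;
  int value;
};

/* Hypothetical checker.  It takes a pointer to const so that it accepts
   both const and non-const trees, and aborts on a code mismatch.  */
void
check_code (const struct tree_node *t, int code)
{
  assert (t->code == code);
}

/* Run the check, then yield T itself.  The result has the type of T, so
   constness is neither added nor cast away.  */
#define CHECKED(t, code) (check_code ((t), (code)), (t))

/* Works on a const tree: the access only reads.  */
int
read_value (const struct tree_node *t)
{
  return CHECKED (t, 1)->value;
}

/* Works on a non-const tree: the result is still writable.  */
void
write_value (struct tree_node *t, int v)
{
  CHECKED (t, 1)->value = v;
}
```

Writing through `CHECKED (t, 1)` inside read_value would be rejected by the compiler, which is the type checking that a cast-based macro loses. One caveat of this sketch is that the macro evaluates its argument twice, so arguments with side effects need care.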
Re: PATCH: PR target/53416: Wrong code when optimising loop involving _rdrand32_step
On Sun, May 20, 2012 at 10:37:13AM -0700, H.J. Lu wrote: > On Sun, May 20, 2012 at 10:19 AM, Jakub Jelinek wrote: > > On Sun, May 20, 2012 at 10:04:26AM -0700, H.J. Lu wrote: > >> rdrand_1 must be marked with unspec_volatile since it returns > >> a different value every time. OK for trunk, 4.7 and 4.6? > > > > A testcase for this would be nice (runtime is not possible, since the > > RNG in theory could return the same value twice, but scanning assembly > > for a particular number of the rdrand insns would be nice). > > > > For > > unsigned int number = 0; > volatile int result = 0; > > for (register int i = 0; i < 4; ++i) { > result = _rdrand32_step(&number); > printf("%d: %d\n", result, number); > } Try it without the loop, unroll it by hand, see if without the patch the rdrand insns are still CSEd together? Jakub
Re: PATCH: PR target/53416: Wrong code when optimising loop involving _rdrand32_step
On Sun, May 20, 2012 at 11:15 AM, Jakub Jelinek wrote: > On Sun, May 20, 2012 at 10:37:13AM -0700, H.J. Lu wrote: >> On Sun, May 20, 2012 at 10:19 AM, Jakub Jelinek wrote: >> > On Sun, May 20, 2012 at 10:04:26AM -0700, H.J. Lu wrote: >> >> rdrand_1 must be marked with unspec_volatile since it returns >> >> a different value every time. OK for trunk, 4.7 and 4.6? >> > >> > A testcase for this would be nice (runtime is not possible, since the >> > RNG in theory could return the same value twice, but scanning assembly >> > for a particular number of the rdrand insns would be nice). >> > >> >> For >> >> unsigned int number = 0; >> volatile int result = 0; >> >> for (register int i = 0; i < 4; ++i) { >> result = _rdrand32_step(&number); >> printf("%d: %d\n", result, number); >> } > > Try it without the loop, unroll it by hand, see if without the patch > the rdrand insns are still CSEd together? > It doesn't: [hjl@gnu-ivb-1 tmp]$ cat x.c #include int main(int argc, char **argv) { unsigned int number = 0; volatile int result = 0; result = __builtin_ia32_rdrand32_step (&number); printf("%d: %d\n", result, number); result = __builtin_ia32_rdrand32_step (&number); printf("%d: %d\n", result, number); result = __builtin_ia32_rdrand32_step (&number); printf("%d: %d\n", result, number); result = __builtin_ia32_rdrand32_step (&number); printf("%d: %d\n", result, number); return 0; } [hjl@gnu-ivb-1 tmp]$ /export/gnu/import/git/gcc-regression/master/187369/usr/bin/gcc -mrdrnd -O3 -S x.c [hjl@gnu-ivb-1 tmp]$ grep rdrand x.s rdrand %ebx rdrand %eax rdrand %eax rdrand %eax [hjl@gnu-ivb-1 tmp]$ -- H.J.
Re: PATCH: Add RDRND, F16C and FSGSBASE support to -march=native
On Sun, May 20, 2012 at 6:43 PM, H.J. Lu wrote: > This patch adds RDRND, F16C and FSGSBASE support to -march=native. > Tested on Linux/x86-64. OK for trunk, 4.7 and 4.6? > > 2012-05-20 H.J. Lu > > * config/i386/driver-i386.c (host_detect_local_cpu): Support > RDRND, F16C and FSGSBASE. OK everywhere. Thanks, Uros.
Re: PATCH: PR target/53416: Wrong code when optimising loop involving _rdrand32_step
On Sun, May 20, 2012 at 11:37 AM, H.J. Lu wrote: > On Sun, May 20, 2012 at 11:15 AM, Jakub Jelinek wrote: >> On Sun, May 20, 2012 at 10:37:13AM -0700, H.J. Lu wrote: >>> On Sun, May 20, 2012 at 10:19 AM, Jakub Jelinek wrote: >>> > On Sun, May 20, 2012 at 10:04:26AM -0700, H.J. Lu wrote: >>> >> rdrand_1 must be marked with unspec_volatile since it returns >>> >> a different value every time. OK for trunk, 4.7 and 4.6? >>> > >>> > A testcase for this would be nice (runtime is not possible, since the >>> > RNG in theory could return the same value twice, but scanning assembly >>> > for a particular number of the rdrand insns would be nice). >>> > >>> >>> For >>> >>> unsigned int number = 0; >>> volatile int result = 0; >>> >>> for (register int i = 0; i < 4; ++i) { >>> result = _rdrand32_step(&number); >>> printf("%d: %d\n", result, number); >>> } >> >> Try it without the loop, unroll it by hand, see if without the patch >> the rdrand insns are still CSEd together? >> > > It doesn't: > > [hjl@gnu-ivb-1 tmp]$ cat x.c > #include > > int > main(int argc, char **argv) > { > unsigned int number = 0; > volatile int result = 0; > > result = __builtin_ia32_rdrand32_step (&number); > printf("%d: %d\n", result, number); > result = __builtin_ia32_rdrand32_step (&number); > printf("%d: %d\n", result, number); > result = __builtin_ia32_rdrand32_step (&number); > printf("%d: %d\n", result, number); > result = __builtin_ia32_rdrand32_step (&number); > printf("%d: %d\n", result, number); > return 0; > } > [hjl@gnu-ivb-1 tmp]$ > /export/gnu/import/git/gcc-regression/master/187369/usr/bin/gcc > -mrdrnd -O3 -S x.c > [hjl@gnu-ivb-1 tmp]$ grep rdrand x.s > rdrand %ebx > rdrand %eax > rdrand %eax > rdrand %eax > [hjl@gnu-ivb-1 tmp]$ Try: #include int main(int argc, char **argv) { unsigned int number = 0; int result0, result1, result2, result3; result0 = __builtin_ia32_rdrand32_step (&number); result1 = __builtin_ia32_rdrand32_step (&number); result2 = __builtin_ia32_rdrand32_step 
(&number); result3 = __builtin_ia32_rdrand32_step (&number); printf("%d: %d\n", result0, number); printf("%d: %d\n", result1, number); printf("%d: %d\n", result2, number); printf("%d: %d\n", result3, number); return 0; } which I know for a fact fails before the patch: pinskia@server:~$ grep rdrand t.s rdrand %edx pinskia@server:~$ Thanks, Andrew Pinski > > > -- > H.J.
[DF] Generate REFs in REGNO order
Hello list, I'm resubmitting this patch from last year's GSOC which speeds up compilation by avoiding thousands of calls to qsort(). I measured its impact on compilation speed again, this time compiling gcc's reload.c with cc1 on i386: orig: 0.734s patched: 0.720s Tested on i686, ppc64. No regressions. Paolo: I couldn't find a single test-case where the mw_reg_pool was heavily used so I reduced its size. Do you think it's OK for all archs? 2012-05-20 Dimitrios Apostolou Paolo Bonzini Provide almost 2% speedup on -O0 compilations by generating the DF_REF_BASE register defs in REGNO order, so that collection_rec is already sorted and most qsort() calls are avoided. In detail: (df_def_record_1): Assert a parallel must contain an EXPR_LIST at this point. Receive the LOC and move its extraction... (df_defs_record): ... here. Rewrote logic with a switch statement instead of multiple if-else. (df_find_hard_reg_defs, df_find_hard_reg_defs_1): New functions that duplicate the logic of df_defs_record() and df_def_record_1() but without actually recording any DEFs, only marking them in the defs HARD_REG_SET. (df_get_call_refs): Call df_find_hard_reg_defs() to mark DEFs that are the result of the call. Record DF_REF_BASE DEFs in REGNO order. Use HARD_REG_SET instead of bitmap for regs_invalidated_by_call. Changed defs_generated from bitmap to HARD_REG_SET, it's much faster. (df_insn_refs_collect): Record DF_REF_REGULAR DEFs after df_get_call_refs(). (df_scan_alloc): Rounded up allocation pools size, reduced the mw_reg_pool size, it was unnecessarily large. 
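The core idea, collecting the registers in a set first and then emitting them by walking the set in increasing register number, can be sketched without any GCC types. Here a 64-bit word stands in for HARD_REG_SET; `NUM_REGS`, `regset_add` and `regset_emit_sorted` are hypothetical names for this sketch, and register numbers must be below 64.

```c
#include <stddef.h>

#define NUM_REGS 64

/* Mark register REGNO (which must be below NUM_REGS) in the set.  */
void
regset_add (unsigned long long *set, unsigned int regno)
{
  *set |= 1ull << regno;
}

/* Write the members of SET into OUT in increasing register number and
   return how many were written.  The walk itself is ordered, so the
   output is sorted without any call to qsort.  */
size_t
regset_emit_sorted (unsigned long long set, unsigned int *out)
{
  size_t n = 0;

  for (unsigned int regno = 0; regno < NUM_REGS; regno++)
    if (set & (1ull << regno))
      out[n++] = regno;
  return n;
}
```

Adding registers 40, 3 and 17 in that order and then emitting yields 3, 17, 40. Duplicates are also merged for free, since setting a bit twice has no further effect.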
Thanks, Dimitris === modified file 'gcc/df-scan.c' --- gcc/df-scan.c 2012-02-29 08:12:04 + +++ gcc/df-scan.c 2012-05-20 15:46:06 + @@ -111,7 +111,7 @@ static void df_ref_record (enum df_ref_c rtx, rtx *, basic_block, struct df_insn_info *, enum df_ref_type, int ref_flags); -static void df_def_record_1 (struct df_collection_rec *, rtx, +static void df_def_record_1 (struct df_collection_rec *, rtx *, basic_block, struct df_insn_info *, int ref_flags); static void df_defs_record (struct df_collection_rec *, rtx, @@ -318,7 +318,7 @@ df_scan_alloc (bitmap all_blocks ATTRIBU { struct df_scan_problem_data *problem_data; unsigned int insn_num = get_max_uid () + 1; - unsigned int block_size = 400; + unsigned int block_size = 512; basic_block bb; /* Given the number of pools, this is really faster than tearing @@ -347,7 +347,7 @@ df_scan_alloc (bitmap all_blocks ATTRIBU sizeof (struct df_reg_info), block_size); problem_data->mw_reg_pool = create_alloc_pool ("df_scan mw_reg", -sizeof (struct df_mw_hardreg), block_size); +sizeof (struct df_mw_hardreg), block_size / 16); bitmap_obstack_initialize (&problem_data->reg_bitmaps); bitmap_obstack_initialize (&problem_data->insn_bitmaps); @@ -2917,40 +2917,27 @@ df_read_modify_subreg_p (rtx x) } -/* Process all the registers defined in the rtx, X. +/* Process all the registers defined in the rtx pointed by LOC. Autoincrement/decrement definitions will be picked up by df_uses_record. */ static void df_def_record_1 (struct df_collection_rec *collection_rec, - rtx x, basic_block bb, struct df_insn_info *insn_info, + rtx *loc, basic_block bb, struct df_insn_info *insn_info, int flags) { - rtx *loc; - rtx dst; - - /* We may recursively call ourselves on EXPR_LIST when dealing with PARALLEL - construct. */ - if (GET_CODE (x) == EXPR_LIST || GET_CODE (x) == CLOBBER) -loc = &XEXP (x, 0); - else -loc = &SET_DEST (x); - dst = *loc; + rtx dst = *loc; /* It is legal to have a set destination be a parallel. 
*/ if (GET_CODE (dst) == PARALLEL) { int i; - for (i = XVECLEN (dst, 0) - 1; i >= 0; i--) { rtx temp = XVECEXP (dst, 0, i); - if (GET_CODE (temp) == EXPR_LIST || GET_CODE (temp) == CLOBBER - || GET_CODE (temp) == SET) - df_def_record_1 (collection_rec, - temp, bb, insn_info, -GET_CODE (temp) == CLOBBER -? flags | DF_REF_MUST_CLOBBER : flags); + gcc_assert (GET_CODE (temp) == EXPR_LIST); + df_def_record_1 (collection_rec, &XEXP (temp, 0), + bb, insn_info, flags); } return; } @@ -3004,26 +2991,98 @@ df_defs_record (struct df_collection_rec int flags) { RTX_CODE code = GET_CODE (x); + int i; - if (code == SET || code == CLOBBER) -{ - /* Mark the singl
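The ChangeLog for this patch relies on one idea: when records are generated in key order, a cheap sortedness check can replace most sort calls. The following is a minimal sketch of that idea only, assuming a plain array of register numbers rather than the real df_ref records. The names `cmp_regno` and `sort_regnos_if_needed` are hypothetical and do not come from df-scan.c.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical comparator: orders plain register numbers ascending.  */
static int
cmp_regno (const void *a, const void *b)
{
  unsigned int x = *(const unsigned int *) a;
  unsigned int y = *(const unsigned int *) b;
  return (x > y) - (x < y);
}

/* Sort V[0..N) unless it is already in ascending order.
   Returns true if qsort() had to be called.  */
static bool
sort_regnos_if_needed (unsigned int *v, size_t n)
{
  size_t i;

  /* One linear pass: stop at the first out-of-order pair.  */
  for (i = 1; i < n; i++)
    if (v[i - 1] > v[i])
      break;

  if (i >= n)
    return false;  /* Already sorted, or fewer than two elements.  */

  qsort (v, n, sizeof v[0], cmp_regno);
  return true;
}
```

If the producer emits elements in REGNO order, as the patch arranges for DF_REF_BASE defs, the linear pass succeeds and the O(n log n) sort is skipped for that collection.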
Re: PATCH: PR target/53425: No warnings are given for -mno-sse
On Sun, May 20, 2012 at 4:15 PM, H.J. Lu wrote: > We should warn passing SSE vector argument without SSE enabled changes > the ABI for 64-bit. Tested on Linux/x86-64. OK to install? > > 2012-05-20 H.J. Lu > > PR target/53425 > * config/i386/i386.c (type_natural_mode): Warn passing SSE > vector argument without SSE enabled changes the ABI. > > gcc/testsuite/ > > 2012-05-20 H.J. Lu > > PR target/53425 > * gcc.target/i386/pr53425.c: New file. > > diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c > index eca542c..a56847a 100644 > --- a/gcc/config/i386/i386.c > +++ b/gcc/config/i386/i386.c > @@ -5828,7 +5833,22 @@ type_natural_mode (const_tree type, const > CUMULATIVE_ARGS *cum) > return TYPE_MODE (type); > } > else > - return mode; > + { No need for these outermost braces. BTW: Can you please also add MMX warning for -mno-mmx to be consistent with 32bit targets? OK with these changes. Thanks, Uros.
Re: PATCH: PR target/53416: Wrong code when optimising loop involving _rdrand32_step
On Sun, May 20, 2012 at 8:43 PM, Andrew Pinski wrote: > #include <stdio.h> > > int > main(int argc, char **argv) > { > unsigned int number = 0; > int result0, result1, result2, result3; > > result0 = __builtin_ia32_rdrand32_step (&number); > result1 = __builtin_ia32_rdrand32_step (&number); > result2 = __builtin_ia32_rdrand32_step (&number); > result3 = __builtin_ia32_rdrand32_step (&number); > printf("%d: %d\n", result0, number); > printf("%d: %d\n", result1, number); > printf("%d: %d\n", result2, number); > printf("%d: %d\n", result3, number); > return 0; > } > int test (void) { unsigned int number = 0; int result0, result1, result2, result3; result0 = __builtin_ia32_rdrand32_step (&number); result1 = __builtin_ia32_rdrand32_step (&number); result2 = __builtin_ia32_rdrand32_step (&number); result3 = __builtin_ia32_rdrand32_step (&number); return result0 + result1 +result2 + result3;; } This is the simplest test, and also a good one. Uros.
Re: Turn check macros into functions. (issue6188088)
On 12-05-20 13:59, Richard Henderson wrote:
> On 05/18/2012 04:48 PM, Diego Novillo wrote:
>> We can do this in trunk today using a variant of Lawrence's original
>> patch (http://gcc.gnu.org/ml/gcc-patches/2011-09/msg01649.html).  This
>> uses no C++ features, though it weakens type checking by removing
>> constness.
>>
>> In the cxx-conversion branch, we can use overloads, which will DTRT
>> with const.
>>
>> My question is, what do folks prefer?  a) The trunk patch today, using
>> no C++ features.  b) Wait for the cxx-conversion variant?
>
> Surely (check(t), t) also works, and also DTRT wrt const.

My concern with (check(t), t) is that it evaluates 't' twice. It may not be a big deal, however. In which case, I'm OK with that alternative.

Diego.
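The double-evaluation concern about (check(t), t) can be made concrete with a small C sketch. Everything named here (`check_positive`, `CHECKED_COMMA`, `checked_fn`, `next_value`) is hypothetical and chosen for illustration; GCC's real check macros operate on tree nodes, where the operand usually has no side effects.

```c
#include <assert.h>

static int eval_calls;    /* how many times the operand was evaluated */

/* Hypothetical checker: aborts on a non-positive value.  */
static void
check_positive (int v)
{
  assert (v > 0);
}

/* Comma-expression form: T appears twice, so it is evaluated twice.  */
#define CHECKED_COMMA(t) (check_positive (t), (t))

/* Function form: the argument is evaluated exactly once.  */
static inline int
checked_fn (int v)
{
  check_positive (v);
  return v;
}

/* Hypothetical operand with a visible side effect.  */
static int
next_value (void)
{
  return ++eval_calls;
}
```

With `eval_calls` starting at zero, `CHECKED_COMMA (next_value ())` calls `next_value` twice and yields the value of the second call, while `checked_fn (next_value ())` calls it once. For side-effect-free operands the two forms behave the same, which matches the remark that it may not be a big deal.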
Re: PATCH: PR target/53425: No warnings are given for -mno-sse
On Sun, May 20, 2012 at 11:57 AM, Uros Bizjak wrote: > On Sun, May 20, 2012 at 4:15 PM, H.J. Lu wrote: > >> We should warn passing SSE vector argument without SSE enabled changes >> the ABI for 64-bit. Tested on Linux/x86-64. OK to install? >> >> 2012-05-20 H.J. Lu >> >> PR target/53425 >> * config/i386/i386.c (type_natural_mode): Warn passing SSE >> vector argument without SSE enabled changes the ABI. >> >> gcc/testsuite/ >> >> 2012-05-20 H.J. Lu >> >> PR target/53425 >> * gcc.target/i386/pr53425.c: New file. >> >> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c >> index eca542c..a56847a 100644 >> --- a/gcc/config/i386/i386.c >> +++ b/gcc/config/i386/i386.c >> @@ -5828,7 +5833,22 @@ type_natural_mode (const_tree type, const >> CUMULATIVE_ARGS *cum) >> return TYPE_MODE (type); >> } >> else >> - return mode; >> + { > > No need for these outermost braces. They are needed to avoid /export/gnu/import/git/gcc/gcc/config/i386/i386.c: In function ‘type_natural_mode’: /export/gnu/import/git/gcc/gcc/config/i386/i386.c:5813:9: warning: suggest explicit braces to avoid ambiguous ‘else’ [-Wparentheses] > BTW: Can you please also add MMX warning for -mno-mmx to be consistent > with 32bit targets? 64-bit passes an 8-byte vector in an SSE register, not an MMX register. I updated the patch to warn for 8-byte vectors. Is this patch OK? Thanks. -- H.J. gcc/ 2012-05-20 H.J. Lu PR target/53425 * config/i386/i386.c (type_natural_mode): Warn passing SSE vector argument without SSE enabled changes the ABI. gcc/testsuite/ 2012-05-20 H.J. Lu PR target/53425 * gcc.target/i386/pr53425-1.c: New file. * gcc.target/i386/pr53425-2.c: Likewise.
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index eca542c..3c0b81c 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -5828,7 +5828,22 @@ type_natural_mode (const_tree type, const CUMULATIVE_ARGS *cum) return TYPE_MODE (type); } else - return mode; + { + if ((size == 8 || size == 16) && !TARGET_SSE) + { + static bool warnedsse; + + if (cum + && !warnedsse + && cum->warn_sse) + { + warnedsse = true; + warning (0, "SSE vector argument without SSE " +"enabled changes the ABI"); + } + } + return mode; + } } gcc_unreachable (); diff --git a/gcc/testsuite/gcc.target/i386/pr53425-1.c b/gcc/testsuite/gcc.target/i386/pr53425-1.c new file mode 100644 index 000..2446c0f --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr53425-1.c @@ -0,0 +1,14 @@ +/* PR target/53425 */ +/* { dg-do compile { target { ! { ia32 } } } } */ +/* { dg-options "-O2 -mno-sse" } */ + +typedef double __v2df __attribute__ ((__vector_size__ (16))); + +extern __v2df x; + +extern void bar (__v2df); +void +foo (void) +{ + bar (x); /* { dg-message "warning: SSE vector argument without SSE enabled changes the ABI" } */ +} diff --git a/gcc/testsuite/gcc.target/i386/pr53425-2.c b/gcc/testsuite/gcc.target/i386/pr53425-2.c new file mode 100644 index 000..b89a5b1 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr53425-2.c @@ -0,0 +1,14 @@ +/* PR target/53425 */ +/* { dg-do compile { target { ! { ia32 } } } } */ +/* { dg-options "-O2 -mno-sse" } */ + +typedef float __v2sf __attribute__ ((__vector_size__ (8))); + +extern __v2sf x; + +extern void bar (__v2sf); +void +foo (void) +{ + bar (x); /* { dg-message "warning: SSE vector argument without SSE enabled changes the ABI" } */ +}
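The hunk in type_natural_mode uses a function-local static flag so the ABI warning is issued at most once. Below is a minimal standalone sketch of that warn-once pattern. It is an illustration under assumptions, not the GCC code: `warn_vector_abi_once` and its parameters are hypothetical stand-ins for the `size`, `TARGET_SSE` and `cum->warn_sse` conditions, and `fprintf` replaces GCC's `warning ()`.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the check in the patch.
   Returns true if a warning was printed by this call.  */
static bool
warn_vector_abi_once (unsigned int size, bool sse_enabled, bool want_warning)
{
  /* Persists across calls, like warnedsse in the patch.  */
  static bool warned;

  if ((size == 8 || size == 16) && !sse_enabled && want_warning && !warned)
    {
      warned = true;
      fprintf (stderr, "warning: SSE vector argument without SSE "
                       "enabled changes the ABI\n");
      return true;
    }
  return false;
}
```

The first qualifying call prints the message; later calls with either vector size stay silent, so a translation unit that passes many vectors yields a single diagnostic.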
Re: PATCH: PR target/53416: Wrong code when optimising loop involving _rdrand32_step
On Sun, May 20, 2012 at 12:03 PM, Uros Bizjak wrote: > On Sun, May 20, 2012 at 8:43 PM, Andrew Pinski wrote: > >> #include <stdio.h> >> >> int >> main(int argc, char **argv) >> { >> unsigned int number = 0; >> int result0, result1, result2, result3; >> >> result0 = __builtin_ia32_rdrand32_step (&number); >> result1 = __builtin_ia32_rdrand32_step (&number); >> result2 = __builtin_ia32_rdrand32_step (&number); >> result3 = __builtin_ia32_rdrand32_step (&number); >> printf("%d: %d\n", result0, number); >> printf("%d: %d\n", result1, number); >> printf("%d: %d\n", result2, number); >> printf("%d: %d\n", result3, number); >> return 0; >> } >> > > int test (void) > { > unsigned int number = 0; > int result0, result1, result2, result3; > > result0 = __builtin_ia32_rdrand32_step (&number); > result1 = __builtin_ia32_rdrand32_step (&number); > result2 = __builtin_ia32_rdrand32_step (&number); > result3 = __builtin_ia32_rdrand32_step (&number); > > return result0 + result1 +result2 + result3;; > } > > This is the simplest, and also good test. > Is this patch OK for trunk, 4.7 and 4.6? Thanks. -- H.J. --- gcc/ 2012-05-20 H.J. Lu PR target/53416 * config/i386/i386.md (UNSPEC_RDRAND): Renamed to ... (UNSPECV_RDRAND): This. (rdrand_1): Updated. gcc/testsuite/ 2012-05-20 Uros Bizjak H.J. 
Lu diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md index cce78b5..9327acf 100644 --- a/gcc/config/i386/i386.md +++ b/gcc/config/i386/i386.md @@ -176,9 +176,6 @@ ;; For CRC32 support UNSPEC_CRC32 - ;; For RDRAND support - UNSPEC_RDRAND - ;; For BMI support UNSPEC_BEXTR @@ -208,6 +205,9 @@ UNSPECV_WRFSBASE UNSPECV_WRGSBASE + ;; For RDRAND support + UNSPECV_RDRAND + ;; For RTM support UNSPECV_XBEGIN UNSPECV_XEND @@ -18399,9 +18399,9 @@ (define_insn "rdrand_1" [(set (match_operand:SWI248 0 "register_operand" "=r") - (unspec:SWI248 [(const_int 0)] UNSPEC_RDRAND)) + (unspec_volatile:SWI248 [(const_int 0)] UNSPECV_RDRAND)) (set (reg:CCC FLAGS_REG) - (unspec:CCC [(const_int 0)] UNSPEC_RDRAND))] + (unspec_volatile:CCC [(const_int 0)] UNSPECV_RDRAND))] "TARGET_RDRND" "rdrand\t%0" [(set_attr "type" "other") diff --git a/gcc/testsuite/gcc.target/i386/pr53416.c b/gcc/testsuite/gcc.target/i386/pr53416.c new file mode 100644 index 000..d0a159b --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr53416.c @@ -0,0 +1,17 @@ +/* PR target/53416 */ +/* { dg-options "-O2 -mrdrnd" } */ + +int test (void) +{ + unsigned int number = 0; + int result0, result1, result2, result3; + + result0 = __builtin_ia32_rdrand32_step (&number); + result1 = __builtin_ia32_rdrand32_step (&number); + result2 = __builtin_ia32_rdrand32_step (&number); + result3 = __builtin_ia32_rdrand32_step (&number); + + return result0 + result1 +result2 + result3;; +} + +/* { dg-final { scan-assembler-times "rdrand" 4 } } */
[PATCH, 4.6] Fix PR53170: missing target c++11 selector
The testsuite for PR52796 uses the 'target c++11' selector which doesn't exist in 4.6. This patch backports the selector, clearing the 'ERROR: g++.dg/cpp0x/variadic-value1.C: syntax error in target selector "target c++11" for " dg-do 2 run { target c++11 } "' errors which have appeared in recent 4.6 builds. Tested on x86_64-linux-gnu with no regressions. Changes the ERROR to UNSUPPORTED. OK for 4.6? -- Michael 2012-05-21 Michael Hope PR 53170 Backport from mainline 2011-11-08 Jason Merrill * lib/target-supports.exp (check_effective_target_c++11): New. === modified file 'gcc/testsuite/lib/target-supports.exp' --- gcc/testsuite/lib/target-supports.exp 2012-02-22 17:38:22 + +++ gcc/testsuite/lib/target-supports.exp 2012-05-18 01:57:51 + @@ -3822,6 +3822,17 @@ return 0 } +# Check which language standard is active by checking for the presence of +# one of the C++11 -std flags. This assumes that the default for the +# compiler is C++98, and that there will never be multiple -std= arguments +# on the command line. +proc check_effective_target_c++11 { } { +if ![check_effective_target_c++] { + return 0 +} +return [check-flags { { } { } { -std=c++0x -std=gnu++0x -std=c++11 -std=gnu++11 } }] +} + # Return 1 if the language for the compiler under test is C++. proc check_effective_target_c++ { } {
Re: [RFA] leb128.h: New file.
Ping. On Thu, May 17, 2012 at 11:29 AM, Doug Evans wrote: > Hi. > > This is a slightly modified version of my previous patch. > > ref: http://gcc.gnu.org/ml/gcc-patches/2012-05/msg00962.html > > The only change is to make the result of the functions an int > instead of a const pointer. > This lets them be used in places where the code is using > non-const pointers without having to apply ugly casts. > > Ok to check in? > > 2012-05-17 Doug Evans > > * leb128.h: New file. > > Index: leb128.h > === > RCS file: leb128.h > diff -N leb128.h > --- /dev/null 1 Jan 1970 00:00:00 - > +++ leb128.h 17 May 2012 18:23:29 - > @@ -0,0 +1,130 @@ > +/* Utilities for reading leb128 values. > + Copyright (C) 2012 Free Software Foundation, Inc. > + > +This file is part of the libiberty library. > +Libiberty is free software; you can redistribute it and/or > +modify it under the terms of the GNU Library General Public > +License as published by the Free Software Foundation; either > +version 2 of the License, or (at your option) any later version. > + > +Libiberty is distributed in the hope that it will be useful, > +but WITHOUT ANY WARRANTY; without even the implied warranty of > +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > +Library General Public License for more details. > + > +You should have received a copy of the GNU Library General Public > +License along with libiberty; see the file COPYING.LIB. If not, write > +to the Free Software Foundation, Inc., 51 Franklin Street - Fifth Floor, > +Boston, MA 02110-1301, USA. */ > + > +/* The functions defined here can be speed critical. > + Since they are all pretty small we keep things simple and just define > + them all as "static inline". */ > + > +#ifndef LEB128_H > +#define LEB128_H > + > +#include "ansidecl.h" > + > +/* Get a definition for NULL. 
*/ > +#include <stddef.h> > + > +#ifdef HAVE_STDINT_H > +#include <stdint.h> > +#endif > +#ifdef HAVE_INTTYPES_H > +#include <inttypes.h> > +#endif > + > +/* Decode the unsigned LEB128 constant at BUF into the variable pointed to > + by R, and return the number of bytes read. > + If we read off the end of the buffer, zero is returned, > + and nothing is stored in R. > + > + Note: The result is an int instead of a pointer to the next byte to be > + read to avoid const-vs-non-const problems. */ > + > +static inline int > +read_uleb128 (const unsigned char *buf, const unsigned char *buf_end, > + uint64_t *r) > +{ > + const unsigned char *p = buf; > + unsigned int shift = 0; > + uint64_t result = 0; > + unsigned char byte; > + > + while (1) > + { > + if (p >= buf_end) > + return 0; > + > + byte = *p++; > + result |= ((uint64_t) (byte & 0x7f)) << shift; > + if ((byte & 0x80) == 0) > + break; > + shift += 7; > + } > + > + *r = result; > + return p - buf; > +} > + > +/* Decode the signed LEB128 constant at BUF into the variable pointed to > + by R, and return the number of bytes read. > + If we read off the end of the buffer, zero is returned, > + and nothing is stored in R. > + > + Note: The result is an int instead of a pointer to the next byte to be > + read to avoid const-vs-non-const problems. */ > + > +static inline int > +read_sleb128 (const unsigned char *buf, const unsigned char *buf_end, > + int64_t *r) > +{ > + const unsigned char *p = buf; > + unsigned int shift = 0; > + int64_t result = 0; > + unsigned char byte; > + > + while (1) > + { > + if (p >= buf_end) > + return 0; > + > + byte = *p++; > + result |= ((uint64_t) (byte & 0x7f)) << shift; > + shift += 7; > + if ((byte & 0x80) == 0) > + break; > + } > + if (shift < (sizeof (*r) * 8) && (byte & 0x40) != 0) > + result |= -(((uint64_t) 1) << shift); > + > + *r = result; > + return p - buf; > +} > + > +/* Return the number of bytes to read to skip past an LEB128 number in BUF. > + If the end isn't found before reaching BUF_END, return zero. 
> + > + Note: The result is an int instead of a pointer to the next byte to be > + read to avoid const-vs-non-const problems. */ > + > +static inline int > +skip_leb128 (const unsigned char *buf, const unsigned char *buf_end) > +{ > + const unsigned char *p = buf; > + unsigned char byte; > + > + while (1) > + { > + if (p == buf_end) > + return 0; > + > + byte = *p++; > + if ((byte & 128) == 0) > + return p - buf; > + } > +} > + > +#endif /* LEB128_H */
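A short usage sketch for the decoders in the quoted patch follows. So that it stands alone, the block carries its own lightly adapted copies of the two readers under the hypothetical names `demo_read_uleb128` and `demo_read_sleb128`; the signed copy accumulates in a `uint64_t` and converts at the end, which is an adaptation and not the patch text. The byte sequences are the usual worked examples for LEB128 (624485 unsigned, -123456 signed).

```c
#include <stddef.h>
#include <stdint.h>

/* Local copy of the unsigned decoder, following the quoted patch.
   Returns the number of bytes read, or 0 if the buffer ends first.  */
static inline int
demo_read_uleb128 (const unsigned char *buf, const unsigned char *buf_end,
                   uint64_t *r)
{
  const unsigned char *p = buf;
  unsigned int shift = 0;
  uint64_t result = 0;
  unsigned char byte;

  while (1)
    {
      if (p >= buf_end)
        return 0;               /* Ran off the end: R is left untouched.  */
      byte = *p++;
      result |= ((uint64_t) (byte & 0x7f)) << shift;  /* Low 7 bits.  */
      if ((byte & 0x80) == 0)   /* High bit clear marks the last byte.  */
        break;
      shift += 7;
    }
  *r = result;
  return (int) (p - buf);
}

/* Local, adapted copy of the signed decoder.  */
static inline int
demo_read_sleb128 (const unsigned char *buf, const unsigned char *buf_end,
                   int64_t *r)
{
  const unsigned char *p = buf;
  unsigned int shift = 0;
  uint64_t result = 0;
  unsigned char byte;

  while (1)
    {
      if (p >= buf_end)
        return 0;
      byte = *p++;
      result |= ((uint64_t) (byte & 0x7f)) << shift;
      shift += 7;
      if ((byte & 0x80) == 0)
        break;
    }
  /* Sign-extend when bit 6 of the final byte is set.  */
  if (shift < (sizeof (*r) * 8) && (byte & 0x40) != 0)
    result |= -(((uint64_t) 1) << shift);

  *r = (int64_t) result;
  return (int) (p - buf);
}
```

Decoding the bytes E5 8E 26 with the unsigned reader gives 624485 and a return value of 3; decoding C0 BB 78 with the signed reader gives -123456. Passing only the first two bytes of either sequence returns 0 and leaves the output variable unchanged, which is the truncation behaviour the patch comments describe. Neither copy guards against encodings longer than ten bytes, where the shift count would reach 64 or more.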
[PATCH] Fix PR53183, libgcc does not always figure out the size of double/long double
The problem here is that when libgcc tries to figure out the size of double/long double, it includes some headers. But those headers do not exist when doing a "stage1" Linux cross compiler build. This fixes the problem by having configure not include those headers. OK for the trunk and the 4.7 branch? Bootstrapped and tested on x86_64-linux-gnu, and by doing a full build of mips64-linux-gnu. Thanks, Andrew Pinski ChangeLog: * configure.ac: Define the default includes to being none. * configure: Regenerate. Index: configure.ac === --- configure.ac(revision 187695) +++ configure.ac(working copy) @@ -14,6 +14,11 @@ AC_PREREQ(2.64) AC_INIT([GNU C Runtime Library], 1.0,,[libgcc]) AC_CONFIG_SRCDIR([static-object.mk]) +# The libgcc should not depend on any header files +AC_DEFUN([_AC_INCLUDES_DEFAULT_REQUIREMENTS], + [m4_divert_text([DEFAULTS], +[ac_includes_default='/* none */'])]) + AC_ARG_WITH(target-subdir, [ --with-target-subdir=SUBDIR Configuring in a subdirectory for target]) AC_ARG_WITH(cross-host, Index: configure === --- configure (revision 187695) +++ configure (working copy) @@ -552,42 +552,7 @@ PACKAGE_BUGREPORT='' PACKAGE_URL='http://www.gnu.org/software/libgcc/' ac_unique_file="static-object.mk" -# Factoring default headers for most tests. 
-ac_includes_default="\ -#include -#ifdef HAVE_SYS_TYPES_H -# include -#endif -#ifdef HAVE_SYS_STAT_H -# include -#endif -#ifdef STDC_HEADERS -# include -# include -#else -# ifdef HAVE_STDLIB_H -# include -# endif -#endif -#ifdef HAVE_STRING_H -# if !defined STDC_HEADERS && defined HAVE_MEMORY_H -# include -# endif -# include -#endif -#ifdef HAVE_STRINGS_H -# include -#endif -#ifdef HAVE_INTTYPES_H -# include -#endif -#ifdef HAVE_STDINT_H -# include -#endif -#ifdef HAVE_UNISTD_H -# include -#endif" - +ac_includes_default='/* none */' ac_subst_vars='LTLIBOBJS LIBOBJS asm_hidden_op @@ -605,8 +570,6 @@ enable_decimal_float decimal_float long_double_type_size double_type_size -EGREP -GREP CPP OBJEXT EXEEXT @@ -1732,35 +1695,6 @@ rm -f conftest.val return $ac_retval } # ac_fn_c_compute_int - -# ac_fn_c_check_header_preproc LINENO HEADER VAR -# -- -# Tests whether HEADER is present, setting the cache variable VAR accordingly. -ac_fn_c_check_header_preproc () -{ - as_lineno=${as_lineno-"$1"} as_lineno_stack=as_lineno_stack=$as_lineno_stack - { $as_echo "$as_me:${as_lineno-$LINENO}: checking for $2" >&5 -$as_echo_n "checking for $2... " >&6; } -if { as_var=$3; eval "test \"\${$as_var+set}\" = set"; }; then : - $as_echo_n "(cached) " >&6 -else - cat confdefs.h - <<_ACEOF >conftest.$ac_ext -/* end confdefs.h. */ -#include <$2> -_ACEOF -if ac_fn_c_try_cpp "$LINENO"; then : - eval "$3=yes" -else - eval "$3=no" -fi -rm -f conftest.err conftest.$ac_ext -fi -eval ac_res=\$$3 - { $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_res" >&5 -$as_echo "$ac_res" >&6; } - eval $as_lineno_stack; test "x$as_lineno_stack" = x && { as_lineno=; unset as_lineno;} - -} # ac_fn_c_check_header_preproc cat >config.log <<_ACEOF This file contains any messages produced by compilers while running configure, to aid debugging if configure makes a mistake. 
@@ -2117,6 +2051,9 @@ ac_compiler_gnu=$ac_cv_c_compiler_gnu +# The libgcc should not depend on any header files + + # Check whether --with-target-subdir was given. if test "${with_target_subdir+set}" = set; then : @@ -4029,264 +3966,6 @@ ac_c_preproc_warn_flag=yes -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for grep that handles long lines and -e" >&5 -$as_echo_n "checking for grep that handles long lines and -e... " >&6; } -if test "${ac_cv_path_GREP+set}" = set; then : - $as_echo_n "(cached) " >&6 -else - if test -z "$GREP"; then - ac_path_GREP_found=false - # Loop through the user's path and test for each of PROGNAME-LIST - as_save_IFS=$IFS; IFS=$PATH_SEPARATOR -for as_dir in $PATH$PATH_SEPARATOR/usr/xpg4/bin -do - IFS=$as_save_IFS - test -z "$as_dir" && as_dir=. -for ac_prog in grep ggrep; do -for ac_exec_ext in '' $ac_executable_extensions; do - ac_path_GREP="$as_dir/$ac_prog$ac_exec_ext" - { test -f "$ac_path_GREP" && $as_test_x "$ac_path_GREP"; } || continue -# Check for GNU ac_path_GREP and select it if it is found. - # Check for GNU $ac_path_GREP -case `"$ac_path_GREP" --version 2>&1` in -*GNU*) - ac_cv_path_GREP="$ac_path_GREP" ac_path_GREP_found=:;; -*) - ac_count=0 - $as_echo_n 0123456789 >"conftest.in" - while : - do -cat "conftest.in" "conftest.in" >"conftest.tmp" -mv "conftest.tmp" "conftest.in" -cp "conftest.in" "conftest.nl" -$as_echo 'GREP' >> "conftest.nl" -"$ac_path_GREP" -e 'GREP$' -e '-(cannot match)-' < "conftest.nl" >"conftest.out" 2>/dev/null || break -diff "conftest.out" "conf
Re: Symbol table 21/many: analyze initializers of external vars
On Thu, May 17, 2012 at 9:42 AM, Jan Hubicka wrote: > Hi, > C++ virtual tables keyed to other compilation units are represented as > DECL_EXTERNAL > variables with constructor known. Knowing the constructor helps constant > folding to do > devirtualization. > > At the moment these constructors are not seen by varpool and thus they are not > represented by ipa-ref and thus WHOPR partitioning is not seeing them and we > later in can_refer_decl_in_current_unit_p try to work out if the partitioning > was done in lucky or unlucky way. > > This patch makes external variables to be handled similarly to external > functions. > That is the variables get finalized and analyzed by varpool. They go in > similar > way to partitioning as comdat functions. > > Code removing unreachable nodes treats them as normal variables until after > inlining when vars/functions referred only by those are considered unreachable. > This also allows us to remove the constructors from memory when we know they > are no longer needed saving couple hundred KB on compiling Mozilla with LTO. > > The patch also enables about 3000 extra foldings on Mozilla LTO build. > Mostly those are devirtualized calls to libstdc++. > > Bootstrapped/regtested x86_64-linux, committed. > Index: ChangeLog > === > *** ChangeLog (revision 187630) > --- ChangeLog (working copy) > *** > *** 1,3 > --- 1,26 > + 2012-05-17 Jan Hubicka > + > + * lto-symtab.c (lto_symtab_resolve_symbols): Prefer decl with > constructor > + over decl without. > + * cgraph.c (cgraph_remove_node): Clear also body of unanalyzed nodes. > + * cgraph.h (varpool_can_remove_if_no_refs): Handle external correctly. > + * cgraphunit.c (process_function_and_variable_attributes): Finalize > + external decls. > + (mark_functions_to_output): Also accept bodies for functions with > clones. > + (output_in_order): Skip external vars. > + * lto-cgraph.c (lto_output_node): External functions are never in > other > + partition. > + (lto_output_varpool_node): Likewise. 
> + * lto-streamer-out.c (lto_write_tree): Always use error_mark_nodes for > + forgotten initializers. > + * ipa.c (process_references): Handle external vars. > + (symtab_remove_unreachable_nodes): Update to handle external vars. > + (varpool_externally_visible_p): External vars are externally visible. > + * gimple-fold.c (can_refer_decl_in_current_unit_p): Update. > + * varpool.c (varpool_remove_node): Remove constructor. > + (decide_is_variable_needed): Handle externals. > + (varpool_remove_unreferenced_decls): Likewise. > + This caused: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53428 -- H.J.
rs6000.c forward declaration cleanup
This removes rather a lot of forward declarations in rs6000.c, most of which existed to satisfy an early TARGET_INITIALIZER. Now that the TARGET_INITIALIZER has been moved to the end of rs6000.c, they become unnecessary and wrongly give the impression that rs6000.c style is to declare functions at the beginning of the file. Bootstrapped etc. powerpc-linux. OK to apply? * config/rs6000/rs6000.c: Delete unnecessary forward declarations. Move those with ATTRIBUTE_UNUSED to immediately before definitions. Move function pointer variables after forward declarations. (rs6000_builtin_support_vector_misalignment): Make static. (rs6000_legitimate_address_p, rs6000_gimplify_va_arg): Likewise. (rs6000_function_value, rs6000_can_eliminate): Likewise. Index: gcc/config/rs6000/rs6000.c === --- gcc/config/rs6000/rs6000.c (revision 187699) +++ gcc/config/rs6000/rs6000.c (working copy) @@ -920,310 +920,103 @@ static tree (*rs6000_veclib_handler) (tree, tree, tree); -static bool rs6000_function_ok_for_sibcall (tree, tree); -static const char *rs6000_invalid_within_doloop (const_rtx); -static bool rs6000_legitimate_address_p (enum machine_mode, rtx, bool); static bool rs6000_debug_legitimate_address_p (enum machine_mode, rtx, bool); -static rtx rs6000_generate_compare (rtx, enum machine_mode); static bool spe_func_has_64bit_regs_p (void); -static rtx gen_frame_mem_offset (enum machine_mode, rtx, int); -static unsigned rs6000_hash_constant (rtx); -static unsigned toc_hash_function (const void *); -static int toc_hash_eq (const void *, const void *); -static bool reg_offset_addressing_ok_p (enum machine_mode); -static bool virtual_stack_registers_memory_p (rtx); -static bool constant_pool_expr_p (rtx); -static bool legitimate_small_data_p (enum machine_mode, rtx); -static bool legitimate_lo_sum_address_p (enum machine_mode, rtx, int); static struct machine_function * rs6000_init_machine_status (void); -static bool rs6000_assemble_integer (rtx, unsigned int, int); -#if defined 
(HAVE_GAS_HIDDEN) && !TARGET_MACHO -static void rs6000_assemble_visibility (tree, int); -#endif static int rs6000_ra_ever_killed (void); -static bool rs6000_attribute_takes_identifier_p (const_tree); static tree rs6000_handle_longcall_attribute (tree *, tree, tree, int, bool *); static tree rs6000_handle_altivec_attribute (tree *, tree, tree, int, bool *); -static bool rs6000_ms_bitfield_layout_p (const_tree); static tree rs6000_handle_struct_attribute (tree *, tree, tree, int, bool *); -static void rs6000_eliminate_indexed_memrefs (rtx operands[2]); -static const char *rs6000_mangle_type (const_tree); -static void rs6000_set_default_type_attributes (tree); -static bool rs6000_reg_live_or_pic_offset_p (int); static tree rs6000_builtin_vectorized_libmass (tree, tree, tree); -static tree rs6000_builtin_vectorized_function (tree, tree, tree); -static bool rs6000_output_addr_const_extra (FILE *, rtx); -static void rs6000_output_function_prologue (FILE *, HOST_WIDE_INT); -static void rs6000_output_function_epilogue (FILE *, HOST_WIDE_INT); -static void rs6000_output_mi_thunk (FILE *, tree, HOST_WIDE_INT, HOST_WIDE_INT, - tree); static rtx rs6000_emit_set_long_const (rtx, HOST_WIDE_INT, HOST_WIDE_INT); -static bool rs6000_return_in_memory (const_tree, const_tree); -static rtx rs6000_function_value (const_tree, const_tree, bool); -static void rs6000_file_start (void); -#if TARGET_ELF -static int rs6000_elf_reloc_rw_mask (void); -static void rs6000_elf_asm_out_constructor (rtx, int) ATTRIBUTE_UNUSED; -static void rs6000_elf_asm_out_destructor (rtx, int) ATTRIBUTE_UNUSED; -static void rs6000_elf_file_end (void) ATTRIBUTE_UNUSED; -static void rs6000_elf_asm_init_sections (void); -static section *rs6000_elf_select_rtx_section (enum machine_mode, rtx, - unsigned HOST_WIDE_INT); -static void rs6000_elf_encode_section_info (tree, rtx, int) - ATTRIBUTE_UNUSED; -#endif -static bool rs6000_use_blocks_for_constant_p (enum machine_mode, const_rtx); -static void 
rs6000_alloc_sdmode_stack_slot (void); -static void rs6000_instantiate_decls (void); -#if TARGET_XCOFF -static void rs6000_xcoff_asm_output_anchor (rtx); -static void rs6000_xcoff_asm_globalize_label (FILE *, const char *); -static void rs6000_xcoff_asm_init_sections (void); -static int rs6000_xcoff_reloc_rw_mask (void); -static void rs6000_xcoff_asm_named_section (const char *, unsigned int, tree); -static section *rs6000_xcoff_select_section (tree, int, -unsigned HOST_WIDE_INT); -static void rs6000_xcoff_unique_section (tree, int); -static section *rs6000_xcoff_select_rtx_section - (enum machine_mode, rtx, unsigned HOST_WIDE_INT); -static const char * rs6000_xcoff_strip_name_encoding (const char *); -static unsigned int rs6000_xcoff_section_type_flags (tree,
Re: [PATCH preprocessor, diagnostics] PR preprocessor/53229 - Fix diagnostics location when pasting tokens
On 05/15/2012 10:17 AM, Dodji Seketeli wrote:
> It fixes the test case gcc.dg/cpp/paste12.c because that test case runs
> with -ftrack-macro-expansion turned off.  Otherwise, you are right that
> the issue exists only when we aren't tracking virtual locations.

I still don't understand why this change should be needed; it seems like a kludge disguised as a trivial change from one function to another. If we're going to kludge, I'd rather explicitly set the flag with a comment.

But why do we need the set_invocation_location flag at all? When do we not want to set invocation_location if we're beginning to expand a macro?

Jason
[RS6000] save/restore reg tidy
On Tue, May 08, 2012 at 08:02:39PM +0930, Alan Modra wrote: > I also make use of gen_frame_store and siblings that I invented for > generating the eh info, elsewhere in rs6000.c where doing so is > blindingly obvious. We could probably use them in other places too, > but I'll leave that for later. Like so. The part that isn't completely obvious is removing calls to emit_move_insn, which can transform the rtl (rs6000_emit_move). However, we're past reload, the insns emitted are always one set involving a hard reg and mem, and we don't want any addressing mode substitutions going on that avoid rs6000_emit_prologue tracking of r0, r11 and r12 usage. This patch also fixes a couple of places that call df_regs_ever_live_p without checking call_used_regs to test for global asm regs. Not serious bugs as they just result in larger stack frames. Bootstrapped and regression tested powerpc-linux. OK to apply? * config/rs6000/rs6000.c (save_reg_p): New function. (first_reg_to_save, first_fp_reg_to_save): Use it here. (first_altivec_reg_to_save, restore_saved_cr): Likewise. (emit_frame_save): Use gen_frame_store. (gen_frame_mem_offset): Correct SPE condition requiring reg+reg. (rs6000_emit_prologue): Use save_reg_p. Use gen_frame_store for vrsave and toc. (rs6000_emit_epilogue): Use save_reg_p. Use gen_frame_load for vrsave, toc, gp and fp restores. Index: gcc/config/rs6000/rs6000.c === --- gcc/config/rs6000/rs6000.c (revision 187699) +++ gcc/config/rs6000/rs6000.c (working copy) @@ -17236,6 +17079,12 @@ /* This page contains routines that are used to determine what the function prologue and epilogue code will do and write them out. */ +static inline bool +save_reg_p (int r) +{ + return !call_used_regs[r] && df_regs_ever_live_p (r); +} + /* Return the first fixed-point register that is required to be saved. 32 if none. */ @@ -17246,14 +17095,16 @@ /* Find lowest numbered live register. 
*/ for (first_reg = 13; first_reg <= 31; first_reg++) -if (df_regs_ever_live_p (first_reg) - && (! call_used_regs[first_reg] - || (first_reg == RS6000_PIC_OFFSET_TABLE_REGNUM - && ((DEFAULT_ABI == ABI_V4 && flag_pic != 0) - || (DEFAULT_ABI == ABI_DARWIN && flag_pic) - || (TARGET_TOC && TARGET_MINIMAL_TOC) +if (save_reg_p (first_reg)) break; + if (first_reg > RS6000_PIC_OFFSET_TABLE_REGNUM + && ((DEFAULT_ABI == ABI_V4 && flag_pic != 0) + || (DEFAULT_ABI == ABI_DARWIN && flag_pic) + || (TARGET_TOC && TARGET_MINIMAL_TOC)) + && df_regs_ever_live_p (RS6000_PIC_OFFSET_TABLE_REGNUM)) +first_reg = RS6000_PIC_OFFSET_TABLE_REGNUM; + #if TARGET_MACHO if (flag_pic && crtl->uses_pic_offset_table @@ -17273,7 +17124,7 @@ /* Find lowest numbered live register. */ for (first_reg = 14 + 32; first_reg <= 63; first_reg++) -if (df_regs_ever_live_p (first_reg)) +if (save_reg_p (first_reg)) break; return first_reg; @@ -17299,7 +17150,7 @@ /* Find lowest numbered live register. */ for (i = FIRST_ALTIVEC_REGNO + 20; i <= LAST_ALTIVEC_REGNO; ++i) -if (df_regs_ever_live_p (i)) +if (save_reg_p (i)) break; return i; @@ -18995,7 +18904,7 @@ emit_frame_save (rtx frame_reg, enum machine_mode mode, unsigned int regno, int offset, HOST_WIDE_INT frame_reg_to_sp) { - rtx reg, insn, mem, addr; + rtx reg, insn; /* Some cases that need register indexed addressing. 
*/ gcc_checking_assert (!((TARGET_ALTIVEC_ABI && ALTIVEC_VECTOR_MODE (mode)) @@ -19006,9 +18915,7 @@ && !SPE_CONST_OFFSET_OK (offset; reg = gen_rtx_REG (mode, regno); - addr = gen_rtx_PLUS (Pmode, frame_reg, GEN_INT (offset)); - mem = gen_frame_mem (mode, addr); - insn = emit_move_insn (mem, reg); + insn = emit_insn (gen_frame_store (reg, frame_reg, offset)); return rs6000_frame_related (insn, frame_reg, frame_reg_to_sp, NULL_RTX, NULL_RTX); } @@ -19023,7 +18930,7 @@ int_rtx = GEN_INT (offset); - if ((TARGET_SPE_ABI && SPE_VECTOR_MODE (mode)) + if ((TARGET_SPE_ABI && SPE_VECTOR_MODE (mode) && !SPE_CONST_OFFSET_OK (offset)) || (TARGET_E500_DOUBLE && mode == DFmode)) { offset_rtx = gen_rtx_REG (Pmode, FIXED_SCRATCH); @@ -19652,8 +19559,7 @@ { int i; for (i = 0; i < 64 - info->first_fp_reg_save; i++) - if (df_regs_ever_live_p (info->first_fp_reg_save + i) - && ! call_used_regs[info->first_fp_reg_save + i]) + if (save_reg_p (info->first_fp_reg_save + i)) emit_frame_save (frame_reg_rtx, (TARGET_HARD_FLOAT && TARGET_DOUBLE_FLOAT ? DFmode : SFmode), @@ -20103,7 +20009,7 @@ && TARGET_ALTIVEC_VRSAVE && info->vrsave_mask
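
For readers without a GCC tree at hand, the effect of routing the scans through save_reg_p can be tried in isolation. The following is only a sketch under stated assumptions: mock_call_used and mock_ever_live are hypothetical tables standing in for call_used_regs and df_regs_ever_live_p, and first_reg_to_save_in is a made-up helper that mirrors the shape of the scan loops in the patch. None of this is the rs6000.c code itself.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_REGS 64

/* Hypothetical mock tables, not GCC's real data structures.  */
static bool mock_call_used[NUM_REGS];  /* stands in for call_used_regs */
static bool mock_ever_live[NUM_REGS];  /* stands in for df_regs_ever_live_p */

/* Same predicate as in the patch, applied to the mock tables:
   a register needs saving only if it is live and not call-used.  */
static bool
save_reg_p (int r)
{
  assert (r >= 0 && r < NUM_REGS);
  return !mock_call_used[r] && mock_ever_live[r];
}

/* Lowest register in [lo, hi] that needs saving, or hi + 1 if none.
   Hypothetical helper mirroring the loops in first_fp_reg_to_save etc.  */
static int
first_reg_to_save_in (int lo, int hi)
{
  int r;

  for (r = lo; r <= hi; r++)
    if (save_reg_p (r))
      break;
  return r;
}
```

With a register that is live but marked call-used (the global asm reg case mentioned above, under the assumption that such registers show up in call_used_regs), the scan skips it and starts the save area at the next live callee-saved register, which is where the smaller stack frame comes from.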
Fix negation in stack_protect_test docs. Heads-up: prediction bug
The label is for branching *around* a block calling a noreturn function. See also the open-coded version which calls emit_cmp_and_jump_insns with EQ. All ports defining it seem ok. They'd notice very quickly; no code compiled with -fstack-protector and requiring stack protection would work.

Also, there's a corresponding bug in the prediction set in function.c:stack_protect_epilogue; the prediction is set as per the documentation. However, I can't see how to fix it, as there doesn't seem to be any predict.def value for the opposite, to fix parameters in the predict_insn_def call, i.e. to "branch around a basic block calling a noreturn function; predict as taken". Help?

I'll install this patch as obvious.

gcc:
        * doc/md.texi (stack_protect_test): Remove negation of branch to label.

Index: doc/md.texi
===================================================================
--- doc/md.texi (revision 187558)
+++ doc/md.texi (working copy)
@@ -6136,7 +6136,7 @@ If this pattern is not defined, then a p
 This pattern, if defined, compares a @code{ptr_mode} value from the
 memory in operand 1 with the memory in operand 0 without leaving the
 value in a register afterward and branches to operand 2 if the values
-weren't equal.
+were equal.
 
 If this pattern is not defined, then a plain compare pattern and
 conditional branch pattern is used.

brgds, H-P
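
The corrected wording may be easier to see as plain control flow. This is a rough C model only, not what any port emits: stack_check_fails, guard and canary are hypothetical names, and the "return true" stands in for the call to the noreturn failure handler.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the epilogue check.  Returns true when the
   failure handler would be reached, false when it is branched around.  */
static bool
stack_check_fails (uintptr_t guard, uintptr_t canary)
{
  /* stack_protect_test: compare and branch to the label (operand 2)
     when the two values are equal.  */
  if (guard == canary)
    goto ok;

  /* Fall through: this is the block calling the noreturn function.  */
  return true;

ok:
  return false;
}
```

In this model the taken branch is the normal path, which is why the prediction attached to the branch matters.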
[RS6000] out-of-line save/restore conditions
Currently, powerpc-linux gcc -Os -mno-multiple uses out-of-line gpr save and restore functions when saving/restoring just one gpr. That's quite silly since the function call requires more instructions and is slower than an inline save/restore. The only case where it might win is when no fprs are restored and the restore function can tear down the frame and exit (also loading up lr on ppc64). I guess that's how GP_SAVE_INLINE came to be like it is, ie. it's optimised for the common case using ldm in the prologue and no fprs.

Still, it isn't difficult to choose the best combination in all cases, but it does mean different logic is needed for restores. I could have implemented GP_RESTORE_INLINE and FP_RESTORE_INLINE macros but it seemed simpler to just move everything into the one place the macros are invoked. AIX and Darwin register cutoff doesn't change with this patch.

This patch also enables out-of-line restores in cases that were previously disabled due to using inline saves.

Bootstrapped and regression tested powerpc-linux. OK to apply?

        * aix.h (FP_SAVE_INLINE, GP_SAVE_INLINE): Delete.
        * darwin.h (FP_SAVE_INLINE, GP_SAVE_INLINE): Delete.
        * sysv4.h (FP_SAVE_INLINE, GP_SAVE_INLINE, V_SAVE_INLINE): Delete.
        * config/rs6000/rs6000.c (V_SAVE_INLINE): Delete.
        (rs6000_savres_strategy): Reimplement GP/FP/V_SAVE_INLINE logic.
        For ELF targets, use out-of-line restores for -Os and any number
        of regs if the restore exits, and out-of-line gp save for two or
        more regs. Use save_reg_p to test for holes in reg restore set.
        Replace "#if" with "if".

Index: gcc/config/rs6000/aix.h
===================================================================
--- gcc/config/rs6000/aix.h (revision 187699)
+++ gcc/config/rs6000/aix.h (working copy)
@@ -207,11 +207,6 @@
   { "link_syscalls", LINK_SYSCALLS_SPEC }, \
   { "link_libg", LINK_LIBG_SPEC }
 
-/* Define cutoff for using external functions to save floating point. */
-#define FP_SAVE_INLINE(FIRST_REG) ((FIRST_REG) == 62 || (FIRST_REG) == 63)
-/* And similarly for general purpose registers. */
-#define GP_SAVE_INLINE(FIRST_REG) ((FIRST_REG) < 32)
-
 #define PROFILE_HOOK(LABEL) output_profile_hook (LABEL)
 
 /* No version of AIX fully supports AltiVec or 64-bit instructions in
Index: gcc/config/rs6000/darwin.h
===================================================================
--- gcc/config/rs6000/darwin.h (revision 187699)
+++ gcc/config/rs6000/darwin.h (working copy)
@@ -173,16 +173,6 @@
   (RS6000_ALIGN (crtl->outgoing_args_size, 16) \
    + (STACK_POINTER_OFFSET))
 
-/* Define cutoff for using out-of-line functions to save registers.
-   Currently on Darwin, we implement FP and GPR out-of-line-saves plus the
-   special routine for 'save everything'. */
-
-#undef FP_SAVE_INLINE
-#define FP_SAVE_INLINE(FIRST_REG) ((FIRST_REG) > 60 && (FIRST_REG) < 64)
-
-#undef GP_SAVE_INLINE
-#define GP_SAVE_INLINE(FIRST_REG) ((FIRST_REG) > 29 && (FIRST_REG) < 32)
-
 /* Darwin uses a function call if everything needs to be saved/restored. */
 
 #undef WORLD_SAVE_P
Index: gcc/config/rs6000/sysv4.h
===================================================================
--- gcc/config/rs6000/sysv4.h (revision 187699)
+++ gcc/config/rs6000/sysv4.h (working copy)
@@ -243,19 +243,6 @@
 #define BYTES_BIG_ENDIAN (TARGET_BIG_ENDIAN)
 #define WORDS_BIG_ENDIAN (TARGET_BIG_ENDIAN)
 
-/* Define cutoff for using external functions to save floating point.
-   When optimizing for size, use external functions when profitable. */
-#define FP_SAVE_INLINE(FIRST_REG) ((FIRST_REG) == 62 \
-                                   || (FIRST_REG) == 63 \
-                                   || !optimize_size)
-
-/* And similarly for general purpose registers. */
-#define GP_SAVE_INLINE(FIRST_REG) (!optimize_size)
-
-/* And vector registers. */
-#define V_SAVE_INLINE(FIRST_REG) ((FIRST_REG) == LAST_ALTIVEC_REGNO \
-                                  || !optimize_size)
-
 /* Put jump tables in read-only memory, rather than in .text. */
 #define JUMP_TABLES_IN_TEXT_SECTION 0
 
Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c (revision 187699)
+++ gcc/config/rs6000/rs6000.c (working copy)
@@ -17443,10 +17294,6 @@
     REST_INLINE_VRS = 0x200
   };
 
-#ifndef V_SAVE_INLINE
-#define V_SAVE_INLINE(FIRST_REG) 1
-#endif
-
 static int
 rs6000_savres_strategy (rs6000_stack_t *info,
                         bool using_static_chain_p)
@@ -17468,7 +17315,6 @@
               | SAVE_INLINE_VRS | REST_INLINE_VRS);
 
   if (info->first_fp_reg_save == 64
-      || FP_SAVE_INLINE (info->first_fp_reg_save)
       /* The out-of-line FP routines use double-precision stores;
          we can't use those routines if we don't have such stores. */
       || (TARGET_HARD_F
Re: PATCH: PR target/53425: No warnings are given for -mno-sse
On Sun, May 20, 2012 at 11:53 PM, H.J. Lu wrote:

>>> We should warn passing SSE vector argument without SSE enabled changes
>>> the ABI for 64-bit. Tested on Linux/x86-64. OK to install?

>>> @@ -5828,7 +5833,22 @@ type_natural_mode (const_tree type, const
>>> CUMULATIVE_ARGS *cum)
>>>          return TYPE_MODE (type);
>>>        }
>>>      else
>>> -      return mode;
>>> +      {
>>
>> No need for these outermost braces.
>
> It is needed to avoid
>
> /export/gnu/import/git/gcc/gcc/config/i386/i386.c: In function
> ‘type_natural_mode’:
> /export/gnu/import/git/gcc/gcc/config/i386/i386.c:5813:9: warning:
> suggest explicit braces to avoid ambiguous ‘else’
> [-Wparentheses]

Sure, you need

else if ( ... )
  {
    ...
  }
else
  return mode;

there.

>> BTW: Can you please also add MMX warning for -mno-mmx to be consistent
>> with 32bit targets?
>
> 64-bit passes 8-byte vector in SSE register, not MMX register.
> I updated the patch to warn for 8-byte vector. Is this patch OK?
>
> Thanks.
>
> --
> H.J.
>
> gcc/
>
> 2012-05-20 H.J. Lu
>
> PR target/53425
> * config/i386/i386.c (type_natural_mode): Warn passing SSE
> vector argument without SSE enabled changes the ABI.
>
> gcc/testsuite/
>
> 2012-05-20 H.J. Lu
>
> PR target/53425
> * gcc.target/i386/pr53425-1.c: New file.
> * gcc.target/i386/pr53425-2.c: Likewise.

OK with the above change.

Thanks,
Uros.
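
For anyone following the brace discussion without the full patch, here is a small stand-alone illustration of the two shapes. It is not the i386.c code: classify_braced, classify_chain and their parameters are hypothetical, and the return values are arbitrary stand-ins for "warn and use this mode".

```c
#include <stdbool.h>

/* The shape that draws "suggest explicit braces to avoid ambiguous
   'else'" is an unbraced if nested directly inside another if:

     if (a)
       if (b) ...;
       else ...;     <- binds to the inner if, whatever the indentation

   Braced variant: a nested if inside a braced else block.  */
static int
classify_braced (bool avx_sized, bool sse_sized)
{
  if (avx_sized)
    return 2;
  else
    {
      if (sse_sized)
        return 1;
      return 0;
    }
}

/* Flat else-if chain, as suggested in the review: same result,
   no extra brace level and nothing for -Wparentheses to flag.  */
static int
classify_chain (bool avx_sized, bool sse_sized)
{
  if (avx_sized)
    return 2;
  else if (sse_sized)
    return 1;
  else
    return 0;
}
```

Both functions return the same value for every input; the chain simply keeps the final "else return" at the same level as the tests before it.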
Re: PATCH: PR target/53416: Wrong code when optimising loop involving _rdrand32_step
On Mon, May 21, 2012 at 12:56 AM, H.J. Lu wrote:

>>> #include <stdio.h>
>>>
>>> int
>>> main(int argc, char **argv)
>>> {
>>>   unsigned int number = 0;
>>>   int result0, result1, result2, result3;
>>>
>>>   result0 = __builtin_ia32_rdrand32_step (&number);
>>>   result1 = __builtin_ia32_rdrand32_step (&number);
>>>   result2 = __builtin_ia32_rdrand32_step (&number);
>>>   result3 = __builtin_ia32_rdrand32_step (&number);
>>>   printf("%d: %d\n", result0, number);
>>>   printf("%d: %d\n", result1, number);
>>>   printf("%d: %d\n", result2, number);
>>>   printf("%d: %d\n", result3, number);
>>>   return 0;
>>> }
>>>
>>
>> int test (void)
>> {
>>   unsigned int number = 0;
>>   int result0, result1, result2, result3;
>>
>>   result0 = __builtin_ia32_rdrand32_step (&number);
>>   result1 = __builtin_ia32_rdrand32_step (&number);
>>   result2 = __builtin_ia32_rdrand32_step (&number);
>>   result3 = __builtin_ia32_rdrand32_step (&number);
>>
>>   return result0 + result1 +result2 + result3;;
>> }
>>
>> This is the simplest, and also good test.
>>
>
> Is this patch OK for trunk, 4.7 and 4.6?

OK everywhere, without double semicolon in the test.

Thanks,
Uros.