RE: [PATCH, FT32] initial support
> On Mon, 16 Feb 2015, James Bowman wrote:
>
> > I have updated the target options. Space-saving is now enabled by
> > -Os. There is also a new option -msim to enable building for the
> > simulator (the simulator is pending submission to gdb-binutils).
>
> The documentation in this patch doesn't seem to have been updated for
> those changes.

Ping. Also, have attached updated patchset for the current gcc.

Thanks.

--
James Bowman
FTDI Open Source Liaison

Attachment: gcc-ft32.txt.gz
[PATCH] Backport ubsan fix to 4.9
I'd like to backport the following patch that suppresses bogus ubsan errors. I had to tweak the testcase a bit since 4.9 doesn't know -fno-sanitize-recover. Bootstrapped/regtested on x86_64-linux, ok for 4.9? 2015-03-10 Marek Polacek Backported from mainline 2014-12-04 Marek Polacek PR middle-end/56917 * fold-const.c (fold_unary_loc): Perform the negation in A's type when transforming ~ (A - 1) or ~ (A + -1) to -A. * c-c++-common/ubsan/pr56917.c: New test. --- gcc/fold-const.c +++ gcc/fold-const.c @@ -8324,9 +8324,14 @@ fold_unary_loc (location_t loc, enum tree_code code, tree type, tree op0) && integer_onep (TREE_OPERAND (arg0, 1))) || (TREE_CODE (arg0) == PLUS_EXPR && integer_all_onesp (TREE_OPERAND (arg0, 1) - return fold_build1_loc (loc, NEGATE_EXPR, type, - fold_convert_loc (loc, type, - TREE_OPERAND (arg0, 0))); + { + /* Perform the negation in ARG0's type and only then convert +to TYPE as to avoid introducing undefined behavior. */ + tree t = fold_build1_loc (loc, NEGATE_EXPR, + TREE_TYPE (TREE_OPERAND (arg0, 0)), + TREE_OPERAND (arg0, 0)); + return fold_convert_loc (loc, type, t); + } /* Convert ~(X ^ Y) to ~X ^ Y or X ^ ~Y if ~X or ~Y simplify. 
*/ else if (TREE_CODE (arg0) == BIT_XOR_EXPR && (tem = fold_unary_loc (loc, BIT_NOT_EXPR, type, --- gcc/testsuite/c-c++-common/ubsan/pr56917.c +++ gcc/testsuite/c-c++-common/ubsan/pr56917.c @@ -0,0 +1,43 @@ +/* PR middle-end/56917 */ +/* { dg-do run } */ +/* { dg-options "-fsanitize=undefined" } */ + +#include + +#define INT_MIN (-__INT_MAX__ - 1) +#define LONG_MIN (-__LONG_MAX__ - 1L) +#define LLONG_MIN (-__LONG_LONG_MAX__ - 1LL) + +int __attribute__ ((noinline,noclone)) +fn1 (unsigned int u) +{ + return (-(int) (u - 1U)) - 1; +} + +long __attribute__ ((noinline,noclone)) +fn2 (unsigned long int ul) +{ + return (-(long) (ul - 1UL)) - 1L; +} + +long long __attribute__ ((noinline,noclone)) +fn3 (unsigned long long int ull) +{ + return (-(long long) (ull - 1ULL)) - 1LL; +} + +int +main (void) +{ + fputs ("UBSAN TEST START\n", stderr); + + if (fn1 (__INT_MAX__ + 1U) != INT_MIN + || fn2 (__LONG_MAX__ + 1UL) != LONG_MIN + || fn3 (__LONG_LONG_MAX__ + 1ULL) != LLONG_MIN) +__builtin_abort (); + + fputs ("UBSAN TEST END\n", stderr); + return 0; +} + +/* { dg-output "UBSAN TEST START(\n|\r\n|\r)UBSAN TEST END" } */ Marek
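The identity behind this fold can be checked outside the compiler. What follows is only a minimal sketch in plain C, not GCC code, and the function names are made up for illustration. It shows that in the unsigned type `~(a - 1)` and `-a` agree for every value, including `INT_MAX + 1U` from the testcase, which is why negating in A's type and converting afterwards is safe. Negating after converting to the signed type would instead evaluate `-INT_MIN` for that input, which is the overflow ubsan was reporting.

```c
#include <assert.h>
#include <limits.h>

/* The expression the folder sees: ~(a - 1), computed in a's own
   unsigned type, where wraparound is well defined.  */
static unsigned int
not_of_minus_one (unsigned int a)
{
  return ~(a - 1U);
}

/* The folded form: negate while still in the unsigned type.  */
static unsigned int
negate_in_unsigned (unsigned int a)
{
  return 0U - a;
}

/* Compare both forms on a few boundary values, including the one
   used by the testcase (INT_MAX + 1U).  Returns 1 if they agree.  */
static int
identity_holds (void)
{
  const unsigned int samples[] = {
    0U, 1U, 2U, (unsigned int) INT_MAX, (unsigned int) INT_MAX + 1U, UINT_MAX
  };
  unsigned int i;

  for (i = 0; i < sizeof samples / sizeof samples[0]; i++)
    if (not_of_minus_one (samples[i]) != negate_in_unsigned (samples[i]))
      return 0;
  return 1;
}
```

The comparison is deliberately done in the unsigned domain, so the sketch itself never relies on the implementation-defined conversion of an out-of-range value to `int`.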
Re: [PATCH/AARCH64] Add missing definition of crypto instruction on cortex-a57.md
On 11/03/2015 02:11, 박준모 wrote:
> Hi all,
>
> This patch only affect sha2 crypto instruction's order when gcc
> performs instruction scheduling(rtl-sched1,2).
>
> There are no definition for crypto_sha256_fast, crypto_sha256_slow on
> "cortex-a57.md".
>
> This makes poor result of instruction scheduling when we use sha2 crypto
> instructions.
>
> This idea already applied on "cortex-a53.md". so I think it can apply on
> GCC5(even we only accepts regression fixes.).
>
> Is this ok?

The approach makes sense - however please resubmit the patch with a
Changelog entry and a proper plain text email so that the patch is
archived on the GCC mailing lists.

HTML email is bounced from the lists, please only use plain text when
submitting patches or writing emails to the GCC mailing lists.

regards
Ramana

> Thanks,
>
> Junmo Park.
>
Re: [PATCH, PR target/65103, 1/3] Fix cost of PIC register in ix86_address_cost
Hello!

> > > Test       O2 ref   patched      Ofast + LTO ref   patched
> > > 164.gzip       12   0 (-100%)                 39   0 (-100%)
> > > 175.vpr         0   0 (-0%)                    4   0 (-100%)
> > > 176.gcc       141   6 (-96%)                 294   10 (-97%)
> > > 181.mcf         4   0 (-100%)                  4   2 (-50%)

Do you also have executable sizes at hand?

> 2015-03-10  Ilya Enkovich
>
>     PR target/65103
>     * config/i386/i386.c (ix86_address_cost): Fix cost of a PIC
>     register.
>
> gcc/testsuite/
>
> 2015-03-10  Ilya Enkovich
>
>     PR target/65103
>     * gcc.target/i386/pr65103-1.c: New.

LGTM, just a nit below.

Otherwise, OK for mainline as a bugfix (but please wait for a day if
there are any objections from release managers).

+  /* Attempt to minimize number of registers in the address.

This is now a displaced comment. Please integrate it in the main comment.

Thanks,
Uros.
Re: [C++ Patch] PR 65370
On 03/10/2015 01:03 PM, Paolo Carlini wrote:
> Good question, but we don't have this issue, because for that we emit
> anyway:
>
> 65370.C:11:36: error: default argument specified in explicit
> specialization [-fpermissive]
> C::C(const C&, bool = false);
>
> nothing changes about that kind of testcase, usual behavior.

Ah. So here we can ignore any template instantiation or specialization,
with a comment that check_explicit_specialization will handle them.

But I suspect that checking the decl itself will be better; I would
expect checking the context to lead you to accept

template<> class C {
  template C(const C&, bool);
};

template C::C(const C&, bool = false);

Since here C is a specialization of C, but the constructor is not itself
a partial instantiation.

Jason
Re: Fwd: [PATCH] Remove xfail for wrapped target from libstdc++-v3/testsuite/27_io/ios_base/sync_with_stdio/1.cc
On 05/02/15 11:28 +, Renlin Li wrote:
> Hi all,
>
> This patch simply removes the target selector. It should pass for all
> targets to which it applies.
>
> The comment in the code is not correct. stderr is redirected, not
> stdout. Therefore, the return status which is streamed into stdout
> should be properly captured even by a wrapped target.

The history of this test is curious. Paolo changed the redirect to fix
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=14866 but then a year
later Mark added the support for the "unwrapped" effective target to
XFAIL this test, adding the incorrect comment ... even though
presumably it wasn't actually failing after Paolo's fix! Maybe Mark
was merging something from a CodeSourcery branch where the test still
failed.

The "unwrapped" target is used elsewhere in gcc/testsuite so it's
still useful even if we remove it from this libstdc++ test.

> Okay for trunk?

OK, thanks.

> libstdc++-v3/ChangeLog:
>
> 2015-02-03  Renlin Li
>
>     * testsuite/27_io/ios_base/sync_with_stdio/1.cc: Remove xfail for
>     wrapped target.
>
> diff --git a/libstdc++-v3/testsuite/27_io/ios_base/sync_with_stdio/1.cc b/libstdc++-v3/testsuite/27_io/ios_base/sync_with_stdio/1.cc
> index 6edaef3..1c9fa60 100644
> --- a/libstdc++-v3/testsuite/27_io/ios_base/sync_with_stdio/1.cc
> +++ b/libstdc++-v3/testsuite/27_io/ios_base/sync_with_stdio/1.cc
> @@ -23,12 +23,6 @@
>  // @require@ %-*.tst
>  // @diff@ %-*.tst %-*.txt
>
> -// This test fails on platforms using a wrapper, because this test
> -// redirects stdout to a file and so the exit status printed by the
> -// wrapper is not visibile to DejaGNU.  DejaGNU then assumes that the
> -// test exited with a non-zero exit status.
> -// { dg-do run { xfail { ! unwrapped } } }
> -
>  #include
>  #include
>  #include
[patch] disable libmpx x32 multilib builds
current trunk fails to build on x86*-linux, when configured for x32
multilibs because libmpx doesn't support these. Disable them.

ok for the trunk?

    * Disable libmpx x32 multilib builds.

--- a/config-ml.in
+++ b/config-ml.in
@@ -102,6 +102,7 @@
 Makefile=${ac_file-Makefile}
 ml_config_shell=${CONFIG_SHELL-/bin/sh}
 ml_realsrcdir=${srcdir}
+ml_srcbase=`basename $ml_realsrcdir`

 # Scan all the arguments and set all the ones we need.

@@ -220,6 +221,10 @@
     if [ "${dir}" = "." ]; then
       true
     else
+      # libmpx is not supported on x32
+      if [ "${ml_srcbase}-${dir}" = libmpx-x32 ]; then
+        continue
+      fi
       if [ -z "${multidirs}" ]; then
         multidirs="${dir}"
       else
[PATCH, TSAN] Fix a crash in ScopedReport::AddThread
Hi Jakub,

with my OPC UA Server, I observe a reproducible crash in
ScopedReport::AddThread: tctx==NULL in
"if ((u32)rep_->threads[i]->id == tctx->tid)".

Apparently, Dmitry has already fixed that in the obvious way.
So we should cherry pick these two changes from LLVM: 224508 and 224755

See attachment. Builds cleanly and fixes the problem for me.

OK for trunk?

Thanks
Bernd.

Attachment: patch-tsan-crash.diff
Re: [PATCH] PR target/65242, Fix powerpc abort in gen_add2_insn
On 03/11/15 08:44, David Edelsohn wrote:
> On Mon, Mar 9, 2015 at 7:30 PM, Michael Meissner wrote:
>> This bug was one I unfortunately introduced with the -mupper-regs
>> support. If the reload pass needed to reload a PLUS operation (for
>> example, due to using odd address with the LD/STD instructions), it
>> would go through all of the registers you could load DImode into, and
>> see if it is a preferred register class. This led the compiler to
>> believe it could do integer arithmetic in the floating point registers.
>>
>> This patch fixes the problem, by not allowing PLUS to be reloaded into
>> FPR registers.
>>
>> I have done bootstraps and make checks on both a big endian Power7 and
>> a little endian Power8 system, and there were no regressions. Is the
>> patch ok to apply? I do not believe it needs to be back ported to GCC
>> 4.9 since the -mupper-regs changes are not installed currently on that
>> branch.
>>
>> [gcc]
>> 2015-03-09  Michael Meissner
>>
>>     PR target/65242
>>     * config/rs6000/rs6000.c (rs6000_preferred_reload_class): Do not
>>     allow reloads of PLUS in floating point/VSX registers.
>>
>> [gcc/testsuite]
>> 2015-03-09  Michael Meissner
>>
>>     PR target/65242
>>     * g++.dg/pr65242.C: New test.
>
> This is okay.
>
> What about Jeff Law's Bugzilla comment #6 to change ?m to !m in the
> movdi_internal64 pattern?

That also seems reasonable. It doesn't matter much to me either way as
long as it gets fixed :-)

Avoiding floating point registers via preferred reload class is a valid
approach. My only concern then would be cases where we have similar
looking arithmetic and even though we no longer prefer the FP classes,
we still end up selecting that problematical alternative -- say perhaps
because the pseudos in question have many other uses where FP regs make
sense.

I know we could get into those kind of situations on the PA because of
the weird way in which integer multiplies were implemented (FP unit,
using FP regs) -- which could occur even when using '?' to disparage
those alternatives. I'm not familiar enough with PPC implementations to
know if we can get into that same situation with that port.

Jeff
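For readers unfamiliar with the hook under discussion, here is a toy model of the idea in the ChangeLog entry ("do not allow reloads of PLUS in floating point/VSX registers"). It is only a sketch: the enums and the function below are hypothetical stand-ins and are not the real rs6000 code, whose register classes and RTL codes are far richer.

```c
#include <assert.h>

/* Hypothetical stand-ins for RTL codes and register classes.  */
enum toy_code { TOY_REG, TOY_MEM, TOY_PLUS };
enum toy_class { TOY_NO_REGS, TOY_GENERAL_REGS, TOY_FLOAT_REGS, TOY_ALL_REGS };

/* Given the expression being reloaded and the class reload proposes,
   return the class we are willing to use.  An integer PLUS is narrowed
   to the general registers, or rejected outright when the proposed
   class offers only floating point registers.  */
static enum toy_class
toy_preferred_reload_class (enum toy_code code, enum toy_class rclass)
{
  if (code == TOY_PLUS)
    {
      if (rclass == TOY_GENERAL_REGS || rclass == TOY_ALL_REGS)
        return TOY_GENERAL_REGS;
      return TOY_NO_REGS;
    }

  /* Everything else keeps the proposed class in this toy model.  */
  return rclass;
}
```

Jeff's concern maps onto this model as follows: the function only expresses a preference for one reload, so it does not by itself stop the allocator from having chosen a floating point alternative earlier for other reasons.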
Re: [PATCH/AARCH64] Add missing definition of crypto instruction on cortex-a57.md
> > Attached patch as text.
> >
> > 2015-03-11  Junmo Park
> >
> >     * config/arm/cortex-a57.md (cortex_a57_crypto_simple): Add
> >     crypto_sha256_fast.
> >     (cortex_a57_crypto_complex): Add crypto_sha256_slow.
> >
> > Ok to commit to trunk?
>
> OK, Thanks Sebastian.
>
> regards
> Ramana

Thanks,
Sebastian
Re: [PATCH/AARCH64] Add missing definition of crypto instruction on cortex-a57.md
James Greenhalgh wrote:
> On Wed, Mar 11, 2015 at 04:24:07PM +, Ramana Radhakrishnan wrote:
> > > Attached patch as text.
> > >
> > > 2015-03-11  Junmo Park
> > >
> > >     * config/arm/cortex-a57.md (cortex_a57_crypto_simple): Add
> > >     crypto_sha256_fast.
> > >     (cortex_a57_crypto_complex): Add crypto_sha256_slow.
> > >
> > > Ok to commit to trunk?
> >
> > OK, Thanks Sebastian.
>
> As far as I can see, this patch still hasn't made it to gcc-patches.
> Could you please send a copy (or a commit revision number), for those
> of us interested?

Committed r221349.

Sebastian
[CHKP, PATCH] Fix instrumented indirect calls with propagated pointers
Hi, Instrumented function pointer may be propagated into not instrumented indirect call and vice versa. It requires additional call modifications (either remove bounds or change callee). Bootstrapped and tested on x86_64-unknown-linux-gnu. OK for trunk? Thanks, Ilya -- gcc/ 2015-03-12 Ilya Enkovich * cgraph.c (cgraph_edge::redirect_call_stmt_to_callee): Add redirection for instrumented calls. * tree-chkp.h (chkp_copy_call_skip_bounds): New. (chkp_redirect_edge): New. * tree-chkp.c (chkp_copy_call_skip_bounds): New. (chkp_redirect_edge): New. gcc/testsuite/ 2015-03-12 Ilya Enkovich * gcc.target/i386/mpx/chkp-fix-calls-1.c: New. * gcc.target/i386/mpx/chkp-fix-calls-2.c: New. diff --git a/gcc/cgraph.c b/gcc/cgraph.c index 5ca1901..a0b0465 100644 --- a/gcc/cgraph.c +++ b/gcc/cgraph.c @@ -1278,14 +1278,25 @@ cgraph_edge::redirect_call_stmt_to_callee (void) { cgraph_edge *e = this; - tree decl = gimple_call_fndecl (e->call_stmt); - tree lhs = gimple_call_lhs (e->call_stmt); + tree decl; + tree lhs; gcall *new_stmt; gimple_stmt_iterator gsi; + bool skip_bounds = false; #ifdef ENABLE_CHECKING cgraph_node *node; #endif + /* We might propagate instrumented function pointer into + not instrumented function and vice versa. In such a + case we need to either fix function declaration or + remove bounds from call statement. 
*/ + if (callee) +skip_bounds = chkp_redirect_edge (e); + + decl = gimple_call_fndecl (e->call_stmt); + lhs = gimple_call_lhs (e->call_stmt); + if (e->speculative) { cgraph_edge *e2; @@ -1391,7 +1402,8 @@ cgraph_edge::redirect_call_stmt_to_callee (void) } if (e->indirect_unknown_callee - || decl == e->callee->decl) + || (decl == e->callee->decl + && !skip_bounds)) return e->call_stmt; #ifdef ENABLE_CHECKING @@ -1416,13 +1428,19 @@ cgraph_edge::redirect_call_stmt_to_callee (void) } } - if (e->callee->clone.combined_args_to_skip) + if (e->callee->clone.combined_args_to_skip + || skip_bounds) { int lp_nr; - new_stmt - = gimple_call_copy_skip_args (e->call_stmt, - e->callee->clone.combined_args_to_skip); + new_stmt = e->call_stmt; + if (e->callee->clone.combined_args_to_skip) + new_stmt + = gimple_call_copy_skip_args (new_stmt, + e->callee->clone.combined_args_to_skip); + if (skip_bounds) + new_stmt = chkp_copy_call_skip_bounds (new_stmt); + gimple_call_set_fndecl (new_stmt, e->callee->decl); gimple_call_set_fntype (new_stmt, gimple_call_fntype (e->call_stmt)); diff --git a/gcc/testsuite/gcc.target/i386/mpx/chkp-fix-calls-1.c b/gcc/testsuite/gcc.target/i386/mpx/chkp-fix-calls-1.c new file mode 100644 index 000..cb4d229 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/mpx/chkp-fix-calls-1.c @@ -0,0 +1,16 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fcheck-pointer-bounds -mmpx" } */ + +#include "math.h" + +double +test1 (double x, double y, double (*fn)(double, double)) +{ + return fn (x, y); +} + +double +test2 (double x, double y) +{ + return test1 (x, y, copysign); +} diff --git a/gcc/testsuite/gcc.target/i386/mpx/chkp-fix-calls-2.c b/gcc/testsuite/gcc.target/i386/mpx/chkp-fix-calls-2.c new file mode 100644 index 000..951e7de --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/mpx/chkp-fix-calls-2.c @@ -0,0 +1,16 @@ +/* { dg-do compile } */ +/* { dg-options "-O3 -fcheck-pointer-bounds -mmpx -fno-inline" } */ + +#include "math.h" + +double +test1 (double x, double 
y, double (*fn)(double, double)) +{ + return fn (x, y); +} + +double +test2 (double x, double y) +{ + return test1 (x, y, copysign); +} diff --git a/gcc/tree-chkp.c b/gcc/tree-chkp.c index d2df4ba..2d2090f 100644 --- a/gcc/tree-chkp.c +++ b/gcc/tree-chkp.c @@ -500,6 +500,62 @@ chkp_expand_bounds_reset_for_mem (tree mem, tree ptr) expand_normal (bndstx); } +/* Build a GIMPLE_CALL identical to CALL but skipping bounds + arguments. */ + +gcall * +chkp_copy_call_skip_bounds (gcall *call) +{ + bitmap bounds; + unsigned i; + + bitmap_obstack_initialize (NULL); + bounds = BITMAP_ALLOC (NULL); + + for (i = 0; i < gimple_call_num_args (call); i++) +if (POINTER_BOUNDS_P (gimple_call_arg (call, i))) + bitmap_set_bit (bounds, i); + + call = gimple_call_copy_skip_args (call, bounds); + gimple_call_set_with_bounds (call, false); + + BITMAP_FREE (bounds); + bitmap_obstack_release (NULL); + + return call; +} + +/* Redirect edge E to the correct node according to call_stmt. + Return 1 if bounds removal from call_stmt should be done + instead of redirection. */ + +bool +chkp_redirect_edge (cgraph_edge *e) +{ + bool instrumented = false; + + if (e->callee->instrumentation_clone + || chkp_function_instrumented_p (e->callee->decl)) +instrumented = true; +
Re: [RS6000] bswapdi2 pattern, reload and lra
On Wed, Dec 18, 2013 at 09:53:38AM -0500, David Edelsohn wrote:
https://gcc.gnu.org/ml/gcc-patches/2013-12/msg01599.html
> Why change the code from swapping the words at the initial
> change_address() to swapping the words in the call to gen_bswapsi2()?

Sorry for dropping this on the floor for so long. I've been prodded
back into action by Redhat people and pr63150.

I don't recall a compelling technical reason for the change. It was
probably to make my life easier in tracking the lifetimes of addr1 and
addr2, necessary due to losing one of the scratch registers along with
early clobbers. (In the splitter you question, addr1 might be the same
register as dest/dest_32.) I suppose it also makes those splitters
look a little more like the one for bswapdi2_32bit, so a plus for
maintenance.

The patch applies with some minor changes (see pr63150) and I've
checked for regressions on a current powerpc64le build. OK to apply,
and on the branches?

--
Alan Modra
Australia Development Lab, IBM
[PING^2] [PATCH] [AArch64, NEON] Improve vmulX intrinsics
Hi, This is a ping for: https://gcc.gnu.org/ml/gcc-patches/2014-12/msg00775.html Regtested with aarch64-linux-gnu on QEMU. This patch has no regressions for aarch64_be-linux-gnu big-endian target too. OK for the trunk? Thanks. Index: gcc/ChangeLog === --- gcc/ChangeLog (revision 219845) +++ gcc/ChangeLog (working copy) @@ -1,3 +1,38 @@ +2014-12-11 Felix Yang + Jiji Jiang + + * config/aarch64/aarch64-simd.md (aarch64_mul_n, + aarch64_mull_n, aarch64_mull, + aarch64_simd_mull2_n, aarch64_mull2_n, + aarch64_mull_lane, aarch64_mull2_lane_internal, + aarch64_mull_laneq, aarch64_mull2_laneq_internal, + aarch64_smull2_lane, aarch64_umull2_lane, + aarch64_smull2_laneq, aarch64_umull2_laneq, + aarch64_fmulx, aarch64_fmulx, aarch64_fmulx_lane, + aarch64_pmull2v16qi, aarch64_pmullv8qi): New patterns. + * config/aarch64/aarch64-simd-builtins.def (vec_widen_smult_hi_, + vec_widen_umult_hi_, umull, smull, smull_n, umull_n, mul_n, smull2_n, + umull2_n, smull_lane, umull_lane, smull_laneq, umull_laneq, pmull, + umull2_lane, smull2_laneq, umull2_laneq, fmulx, fmulx_lane, pmull2, + smull2_lane): New builtins. 
+ * config/aarch64/arm_neon.h (vmul_n_f32, vmul_n_s16, vmul_n_s32, + vmul_n_u16, vmul_n_u32, vmulq_n_f32, vmulq_n_f64, vmulq_n_s16, + vmulq_n_s32, vmulq_n_u16, vmulq_n_u32, vmull_high_lane_s16, + vmull_high_lane_s32, vmull_high_lane_u16, vmull_high_lane_u32, + vmull_high_laneq_s16, vmull_high_laneq_s32, vmull_high_laneq_u16, + vmull_high_laneq_u32, vmull_high_n_s16, vmull_high_n_s32, + vmull_high_n_u16, vmull_high_n_u32, vmull_high_p8, vmull_high_s8, + vmull_high_s16, vmull_high_s32, vmull_high_u8, vmull_high_u16, + vmull_high_u32, vmull_lane_s16, vmull_lane_s32, vmull_lane_u16, + vmull_lane_u32, vmull_laneq_s16, vmull_laneq_s32, vmull_laneq_u16, + vmull_laneq_u32, vmull_n_s16, vmull_n_s32, vmull_n_u16, vmull_n_u32, + vmull_p8, vmull_s8, vmull_s16, vmull_s32, vmull_u8, vmull_u16, + vmull_u32, vmulx_f32, vmulx_lane_f32, vmulxd_f64, vmulxq_f32, + vmulxq_f64, vmulxq_lane_f32, vmulxq_lane_f64, vmulxs_f32): Rewrite + using builtin functions. + * config/aarch64/iterators.md (UNSPEC_FMULX, UNSPEC_FMULX_LANE, + VDQF_Q): New unspec and int iterator. 
+ 2015-01-19 Jiong Wang Andrew Pinski Index: gcc/config/aarch64/arm_neon.h === --- gcc/config/aarch64/arm_neon.h (revision 219845) +++ gcc/config/aarch64/arm_neon.h (working copy) @@ -7580,671 +7580,6 @@ vmovn_u64 (uint64x2_t a) return result; } -__extension__ static __inline float32x2_t __attribute__ ((__always_inline__)) -vmul_n_f32 (float32x2_t a, float32_t b) -{ - float32x2_t result; - __asm__ ("fmul %0.2s,%1.2s,%2.s[0]" - : "=w"(result) - : "w"(a), "w"(b) - : /* No clobbers */); - return result; -} - -__extension__ static __inline int16x4_t __attribute__ ((__always_inline__)) -vmul_n_s16 (int16x4_t a, int16_t b) -{ - int16x4_t result; - __asm__ ("mul %0.4h,%1.4h,%2.h[0]" - : "=w"(result) - : "w"(a), "x"(b) - : /* No clobbers */); - return result; -} - -__extension__ static __inline int32x2_t __attribute__ ((__always_inline__)) -vmul_n_s32 (int32x2_t a, int32_t b) -{ - int32x2_t result; - __asm__ ("mul %0.2s,%1.2s,%2.s[0]" - : "=w"(result) - : "w"(a), "w"(b) - : /* No clobbers */); - return result; -} - -__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__)) -vmul_n_u16 (uint16x4_t a, uint16_t b) -{ - uint16x4_t result; - __asm__ ("mul %0.4h,%1.4h,%2.h[0]" - : "=w"(result) - : "w"(a), "x"(b) - : /* No clobbers */); - return result; -} - -__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__)) -vmul_n_u32 (uint32x2_t a, uint32_t b) -{ - uint32x2_t result; - __asm__ ("mul %0.2s,%1.2s,%2.s[0]" - : "=w"(result) - : "w"(a), "w"(b) - : /* No clobbers */); - return result; -} - -#define vmull_high_lane_s16(a, b, c)\ - __extension__ \ -({ \ - int16x4_t b_ = (b); \ - int16x8_t a_ = (a); \ - int32x4_t result;\ - __asm__ ("smull2 %0.4s, %1.8h, %2.h[%3]" \ -: "=w"(result) \ -: "w"(a_), "x"(b_), "i"(c) \ -: /* No clobbers */);
Re: [PATCH, CHKP, PR target/65044] Restrict pointer bounds checker with Sanitizer
2015-03-12 12:02 GMT+03:00 Jakub Jelinek :
> On Thu, Mar 12, 2015 at 11:51:51AM +0300, Ilya Enkovich wrote:
>> On 09 Mar 15:51, Jakub Jelinek wrote:
>> > On Mon, Mar 02, 2015 at 01:25:43PM +0300, Ilya Enkovich wrote:
>> > > > --- a/gcc/toplev.c
>> > > > +++ b/gcc/toplev.c
>> > > > @@ -1376,6 +1376,11 @@ process_options (void)
>> > > >      {
>> > > >        if (targetm.chkp_bound_mode () == VOIDmode)
>> > > >          error ("-fcheck-pointer-bounds is not supported for this target");
>> > > > +
>> > > > +      if (flag_sanitize & SANITIZE_ADDRESS)
>> > > > +        error ("-fcheck-pointer-bounds is not supported with Address Sanitizer");
>> > > > +
>> > > > +      flag_check_pointer_bounds = 0;
>> > > >      }
>> >
>> > Doesn't this disable -fcheck-pointer-bounds always?
>> > I'd expect you want to clear flag_check_pointer_bounds only if you issued
>> > one of the two errors...
>> >
>> > Jakub
>>
>> Whoops! Here is a less destructive version.
>
> Ok for trunk. Did the old version pass make check? If so, perhaps you want
> to add (incrementally) some test that would actually verify that
> -fcheck-pointer-bounds does what it should do (e.g. by scanning tree dumps
> etc.).

Thanks! I sent previous version before make check. There are several
chkp tests which would fail.

Ilya

>
> Jakub
Re: [PATCH, CHKP, PR target/65044] Restrict pointer bounds checker with Sanitizer
On 09 Mar 15:51, Jakub Jelinek wrote:
> On Mon, Mar 02, 2015 at 01:25:43PM +0300, Ilya Enkovich wrote:
> > > --- a/gcc/toplev.c
> > > +++ b/gcc/toplev.c
> > > @@ -1376,6 +1376,11 @@ process_options (void)
> > >      {
> > >        if (targetm.chkp_bound_mode () == VOIDmode)
> > >          error ("-fcheck-pointer-bounds is not supported for this target");
> > > +
> > > +      if (flag_sanitize & SANITIZE_ADDRESS)
> > > +        error ("-fcheck-pointer-bounds is not supported with Address Sanitizer");
> > > +
> > > +      flag_check_pointer_bounds = 0;
> > >      }
>
> Doesn't this disable -fcheck-pointer-bounds always?
> I'd expect you want to clear flag_check_pointer_bounds only if you issued
> one of the two errors...
>
> Jakub

Whoops! Here is a less destructive version.

Thanks,
Ilya
--
gcc/

2015-03-11  Ilya Enkovich

    PR target/65044
    * toplev.c (process_options): Restrict Pointer Bounds Checker
    usage with Address Sanitizer.

gcc/testsuite/

2015-03-11  Ilya Enkovich

    PR target/65044
    * gcc.target/i386/pr65044.c: New.

diff --git a/gcc/testsuite/gcc.target/i386/pr65044.c b/gcc/testsuite/gcc.target/i386/pr65044.c
new file mode 100644
index 000..4f318d6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr65044.c
@@ -0,0 +1,12 @@
+/* { dg-error "-fcheck-pointer-bounds is not supported with Address Sanitizer" } */
+/* { dg-do compile } */
+/* { dg-require-effective-target mpx } */
+/* { dg-options "-fcheck-pointer-bounds -mmpx -fsanitize=address" } */
+
+extern int x[];
+
+void
+foo ()
+{
+  x[0] = 0;
+}
diff --git a/gcc/toplev.c b/gcc/toplev.c
index 99cf180..b06eed3 100644
--- a/gcc/toplev.c
+++ b/gcc/toplev.c
@@ -1375,7 +1375,17 @@ process_options (void)
   if (flag_check_pointer_bounds)
     {
       if (targetm.chkp_bound_mode () == VOIDmode)
-        error ("-fcheck-pointer-bounds is not supported for this target");
+        {
+          error ("-fcheck-pointer-bounds is not supported for this target");
+          flag_check_pointer_bounds = 0;
+        }
+
+      if (flag_sanitize & SANITIZE_ADDRESS)
+        {
+          error ("-fcheck-pointer-bounds is not supported with "
+                 "Address Sanitizer");
+          flag_check_pointer_bounds = 0;
+        }
     }

   /* One region RA really helps to decrease the code size.  */
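The difference between the two versions of this hunk is easy to see in isolation. The following is a small stand-alone model, not toplev.c itself: the struct and both functions are hypothetical, and the error calls are reduced to a counter. The first function mirrors the earlier version Jakub objected to, the second mirrors the version posted above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical reduction of the option state involved.  */
struct toy_options
{
  bool check_pointer_bounds;   /* models flag_check_pointer_bounds */
  bool target_has_bound_mode;  /* models chkp_bound_mode () != VOIDmode */
  bool sanitize_address;       /* models flag_sanitize & SANITIZE_ADDRESS */
};

/* Earlier version: the flag is cleared whenever it was set, even when
   neither error fired.  Returns the number of errors issued.  */
static int
toy_process_options_v1 (struct toy_options *o)
{
  int errors = 0;

  if (o->check_pointer_bounds)
    {
      if (!o->target_has_bound_mode)
        errors++;
      if (o->sanitize_address)
        errors++;
      o->check_pointer_bounds = false;  /* runs unconditionally */
    }
  return errors;
}

/* Revised version: the flag is cleared only alongside an error.  */
static int
toy_process_options_v2 (struct toy_options *o)
{
  int errors = 0;

  if (o->check_pointer_bounds)
    {
      if (!o->target_has_bound_mode)
        {
          errors++;
          o->check_pointer_bounds = false;
        }
      if (o->sanitize_address)
        {
          errors++;
          o->check_pointer_bounds = false;
        }
    }
  return errors;
}
```

With a supported target and no Address Sanitizer, the first function silently turns the checker off while reporting no error, which matches Ilya's remark that several chkp tests would have failed.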
Re: [PATCH] Fix PR44563 more
On Tue, 10 Mar 2015, Richard Biener wrote:

>
> CFG cleanup currently searches for calls that became noreturn and
> fixes them up (splitting block and removing the fallthru). Previously
> that was technically necessary as propagation may have turned an
> indirect call into a direct noreturn call and the CFG verifier would
> have barfed. Today we guard that with GF_CALL_CTRL_ALTERING and
> thus we "remember" the previous call analysis.
>
> The following patch removes the CFG cleanup code (which is expensive
> because gimple_call_flags () is quite expensive, not to talk about
> walking all stmts). This leaves the fixup_cfg passes to perform the
> very same optimization (relevant propagators can also be taught
> to call fixup_noreturn_call, but I don't think that's very important).
>
> Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.
>
> I'm somewhat undecided whether this is ok at this stage and if we
> _do_ want to make propagators fix those (previously indirect) calls up
> earlier at the same time.
>
> Honza - I think we performed this in CFG cleanup for the sake of CFG
> checking, not for the sake of prompt optimization, no?
>
> This would make PR44563 a pure IPA pass issue.

Soo - testing revealed a single case where we mess up things (and
the verifier noticing only because of a LHS on a noreturn call...).

The following patch makes all propagators handle the noreturn
transition (the paths in all but PRE are not exercised by bootstrap
or testsuite :/).

This patch makes CFG cleanup independent of BB size (during analysis,
merge_blocks and delete_basic_block are still O(n)) - which is
a very much desired property.

It also changes fixup_cfg to produce a dump only when run as
separate pass (otherwise the .optimized dump changes and I get
tons of scan related fails) - that also reduces noise in the
very many places we dump functions (they are dumped anyway for
all cases).

Bootstrap and regtest running on x86_64-unknown-linux-gnu.
I wonder if you can throw this on firefox/chromium - the critical
paths are devirtualization introducing __builtin_unreachable.

This patch should get a good speedup on all compiles (we run
CFG-cleanup a _lot_), by removing pointless IL walks and expensive
gimple_call_flags calls on calls.

Thanks,
Richard.

2015-03-10  Richard Biener

    PR middle-end/44563
    * tree-cfgcleanup.c (split_bb_on_noreturn_calls): Remove.
    (cleanup_tree_cfg_1): Do not call it.
    (execute_cleanup_cfg_post_optimizing): Fixup the CFG here.
    (fixup_noreturn_call): Mark the stmt as control altering.
    * tree-cfg.c (execute_fixup_cfg): Do not dump the function
    here.
    (pass_data_fixup_cfg): Produce a dump file.
    * tree-ssa-dom.c: Include tree-cfgcleanup.h.
    (need_noreturn_fixup): New global.
    (pass_dominator::execute): Fixup queued noreturn calls.
    (optimize_stmt): Queue calls that became noreturn for fixup.
    * tree-ssa-forwprop.c (pass_forwprop::execute): Likewise.
    * tree-ssa-pre.c: Include tree-cfgcleanup.h.
    (el_to_fixup): New global.
    (eliminate_dom_walker::before_dom_children): Queue calls that
    became noreturn for fixup.
    (eliminate): Fixup queued noreturn calls.
    * tree-ssa-propagate.c: Include tree-cfgcleanup.h.
    (substitute_and_fold_dom_walker): New member stmts_to_fixup.
    (substitute_and_fold_dom_walker::before_dom_children): Queue
    calls that became noreturn for fixup.
    (substitute_and_fold): Fixup queued noreturn calls.

Index: gcc/tree-cfg.c
===================================================================
--- gcc/tree-cfg.c (revision 221379)
+++ gcc/tree-cfg.c (working copy)
@@ -8721,10 +8721,6 @@ execute_fixup_cfg (void)
   if (count_scale != REG_BR_PROB_BASE)
     compute_function_frequency ();

-  /* Dump a textual representation of the flowgraph.  */
-  if (dump_file)
-    gimple_dump_cfg (dump_file, dump_flags);
-
   if (current_loops && (todo & TODO_cleanup_cfg))
     loops_state_set (LOOPS_NEED_FIXUP);

@@ -8737,7 +8733,7 @@ namespace {
 const pass_data pass_data_fixup_cfg =
 {
   GIMPLE_PASS, /* type */
-  "*free_cfg_annotations", /* name */
+  "fixup_cfg", /* name */
   OPTGROUP_NONE, /* optinfo_flags */
   TV_NONE, /* tv_id */
   PROP_cfg, /* properties_required */
Index: gcc/tree-cfgcleanup.c
===================================================================
--- gcc/tree-cfgcleanup.c (revision 221379)
+++ gcc/tree-cfgcleanup.c (working copy)
@@ -625,35 +625,13 @@ fixup_noreturn_call (gimple stmt)
       update_stmt (stmt);
     }

+  /* Mark the call as altering control flow.  */
+  gimple_call_set_ctrl_altering (stmt, true);
+
   return remove_fallthru_edge (bb->succs);
 }

-/* Split basic blocks on calls in the middle of a basic block that are now
-   known not to return, and remove the unreachable code.  */
-
-static bool
-split_bb_on_noreturn_calls (basic_block bb)
[wwwdocs] Update 4.9.2 status link from RC announcement to release announcement
This just updates the status link on the homepage from the 4.9.2-rc1
announcement to the final release announcement a week later.

Committed to CVS.

Index: index.html
===================================================================
RCS file: /cvs/gcc/wwwdocs/htdocs/index.html,v
retrieving revision 1.955
diff -u -r1.955 index.html
--- index.html 6 Feb 2015 22:36:02 - 1.955
+++ index.html 12 Mar 2015 09:37:46 -
@@ -188,7 +188,7 @@
   Status:
-  <a href="https://gcc.gnu.org/ml/gcc/2014-10/msg00195.html">2014-10-23</a>
+  <a href="https://gcc.gnu.org/ml/gcc/2014-10/msg00260.html">2014-10-30</a>
   (regression fixes and docs only).
Re: [PING^2] [PATCH] [AArch64, NEON] Improve vmulX intrinsics
Hi Jiangjiji,

This is definitely stage 1 material by now...
At a glance it all looks like the right approach; I have a question below:

On 12/03/15 09:20, Jiangjiji wrote:
+
+(define_insn "aarch64_fmulx_lane"
+  [(set (match_operand:VDQF 0 "register_operand" "=w")
+        (unspec:VDQF [(match_operand:VDQF 1 "register_operand" "w")
+                      (match_operand: 2 "register_operand" "w")
+                      (match_operand:SI 3 "immediate_operand" "i")]
+                     UNSPEC_FMULX_LANE))]
+  "TARGET_SIMD"
+  "fmulx\\t%0., %1., %2."
+  [(set_attr "type" "neon_mul_s")]
+)

Where did operand 3 go? Shouldn't this be the lane-element variant of fmulx?

Thanks,
Kyrill
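To make the question about operand 3 concrete, here is a scalar model of what a by-lane FMULX is expected to compute. This is only a sketch in portable C under stated assumptions: the helper names are invented, the vectors are fixed at two floats, and the special case encodes the architectural description of FMULX (zero times infinity gives 2.0, negative when exactly one input is negative). It is not the intrinsic implementation from the patch.

```c
#include <assert.h>
#include <math.h>

/* Scalar model of FMULX: an ordinary multiply, except that
   zero * infinity yields +/-2.0 instead of NaN.  */
static float
toy_fmulx (float a, float b)
{
  if ((a == 0.0f && isinf (b)) || (isinf (a) && b == 0.0f))
    {
      int negative = (signbit (a) != 0) != (signbit (b) != 0);
      return negative ? -2.0f : 2.0f;
    }
  return a * b;
}

/* By-lane form: every element of A is multiplied by the single
   element of B selected by LANE.  LANE plays the role of operand 3
   in the pattern quoted above.  */
static void
toy_fmulx_lane (float out[2], const float a[2], const float b[2], int lane)
{
  int i;

  for (i = 0; i < 2; i++)
    out[i] = toy_fmulx (a[i], b[lane]);
}
```

In this model the result clearly depends on `lane`, which is why an output template that never mentions operand 3 looks suspicious: as written it would assemble to the plain vector form rather than the by-element one.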
Re: [PATCH] Speedup gimple_split_block
On Tue, 10 Mar 2015, Richard Biener wrote: > On Tue, 10 Mar 2015, Richard Biener wrote: > > > > > This removes the old vestige loop to find a gsi for a stmt (from times > > where gsi_for_stmt was O(n)). > > > > PR44563 shows gimple_split_block quite high in the profile (this > > patch doesn't fix that) as the tail loop setting BB on all stmts > > moved to the new block shows quadratic behavior when inlining > > N calls in a basic-block. > > > > Bootstrap and regtest scheduled on x86_64-unknown-linux-gnu. > > Ok, reveals two errors in my fix and two oddities in omp-low.c - removing > a stmt and then splitting its basic-block after it. > > Hopefully the following will finish bootstrap & regtest ok. Not. But the following did. Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk. Richard. 2015-03-12 Richard Biener * tree-cfg.c (gimple_split_block): Remove loop finding stmt to split on. * omp-low.c (expand_omp_taskreg): Split block before removing the stmt. (expand_omp_target): Likewise. * ubsan.c (ubsan_expand_null_ifn): Adjust stmt if we replaced it. * tree-parloops.c (create_call_for_reduction_1): Pass a proper stmt to split_block. Index: gcc/tree-cfg.c === *** gcc/tree-cfg.c (revision 221324) --- gcc/tree-cfg.c (working copy) *** gimple_split_block (basic_block bb, void *** 5683,5689 { gimple_stmt_iterator gsi; gimple_stmt_iterator gsi_tgt; - gimple act; gimple_seq list; basic_block new_bb; edge e; --- 5683,5688 *** gimple_split_block (basic_block bb, void *** 5697,5722 FOR_EACH_EDGE (e, ei, new_bb->succs) e->src = new_bb; ! if (stmt && gimple_code ((gimple) stmt) == GIMPLE_LABEL) ! stmt = NULL; ! ! /* Move everything from GSI to the new basic block. */ ! for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi)) { ! act = gsi_stmt (gsi); ! if (gimple_code (act) == GIMPLE_LABEL) ! continue; ! ! if (!stmt) ! break; ! ! if (stmt == act) ! { ! gsi_next (&gsi); ! break; ! } } ! 
if (gsi_end_p (gsi)) return new_bb; --- 5696,5711 FOR_EACH_EDGE (e, ei, new_bb->succs) e->src = new_bb; ! /* Get a stmt iterator pointing to the first stmt to move. */ ! if (!stmt || gimple_code ((gimple) stmt) == GIMPLE_LABEL) ! gsi = gsi_after_labels (bb); ! else { ! gsi = gsi_for_stmt ((gimple) stmt); ! gsi_next (&gsi); } ! ! /* Move everything from GSI to the new basic block. */ if (gsi_end_p (gsi)) return new_bb; Index: gcc/omp-low.c === *** gcc/omp-low.c (revision 221324) --- gcc/omp-low.c (working copy) *** expand_omp_taskreg (struct omp_region *r *** 5514,5521 stmt = gsi_stmt (gsi); gcc_assert (stmt && (gimple_code (stmt) == GIMPLE_OMP_PARALLEL || gimple_code (stmt) == GIMPLE_OMP_TASK)); - gsi_remove (&gsi, true); e = split_block (entry_bb, stmt); entry_bb = e->dest; single_succ_edge (entry_bb)->flags = EDGE_FALLTHRU; --- 5514,5521 stmt = gsi_stmt (gsi); gcc_assert (stmt && (gimple_code (stmt) == GIMPLE_OMP_PARALLEL || gimple_code (stmt) == GIMPLE_OMP_TASK)); e = split_block (entry_bb, stmt); + gsi_remove (&gsi, true); entry_bb = e->dest; single_succ_edge (entry_bb)->flags = EDGE_FALLTHRU; *** expand_omp_target (struct omp_region *re *** 8889,8896 stmt = gsi_stmt (gsi); gcc_assert (stmt && gimple_code (stmt) == gimple_code (entry_stmt)); - gsi_remove (&gsi, true); e = split_block (entry_bb, stmt); entry_bb = e->dest; single_succ_edge (entry_bb)->flags = EDGE_FALLTHRU; --- 8889,8896 stmt = gsi_stmt (gsi); gcc_assert (stmt && gimple_code (stmt) == gimple_code (entry_stmt)); e = split_block (entry_bb, stmt); + gsi_remove (&gsi, true); entry_bb = e->dest; single_succ_edge (entry_bb)->flags = EDGE_FALLTHRU; Index: gcc/ubsan.c === *** gcc/ubsan.c (revision 221324) --- gcc/ubsan.c (working copy) *** ubsan_expand_null_ifn (gimple_stmt_itera *** 864,869 --- 864,870 /* Replace the UBSAN_NULL with a GIMPLE_COND stmt. 
*/ gsi_replace (&gsi, g, false); + stmt = g; } if (check_align) Index: gcc/tree-parloops.c === *** gcc/tree-parloops.c (revision 221324) --- gcc/tree-parloops.c (working copy) *** create_call_for_reduction_1 (r
Re: [PATCH, CHKP, PR target/65044] Restrict pointer bounds checker with Sanitizer
On Thu, Mar 12, 2015 at 11:51:51AM +0300, Ilya Enkovich wrote:
> On 09 Mar 15:51, Jakub Jelinek wrote:
> > On Mon, Mar 02, 2015 at 01:25:43PM +0300, Ilya Enkovich wrote:
> > > > --- a/gcc/toplev.c
> > > > +++ b/gcc/toplev.c
> > > > @@ -1376,6 +1376,11 @@ process_options (void)
> > > >  {
> > > >    if (targetm.chkp_bound_mode () == VOIDmode)
> > > >      error ("-fcheck-pointer-bounds is not supported for this
> > > > target");
> > > > +
> > > > +  if (flag_sanitize & SANITIZE_ADDRESS)
> > > > +    error ("-fcheck-pointer-bounds is not supported with Address
> > > > Sanitizer");
> > > > +
> > > > +  flag_check_pointer_bounds = 0;
> > > >  }
> >
> > Doesn't this disable -fcheck-pointer-bounds always?
> > I'd expect you want to clear flag_check_pointer_bounds only if you issued
> > one of the two errors...
> >
> > Jakub
>
> Whoops! Here is a less destructive version. Ok for trunk.

Did the old version pass make check? If so, perhaps you want to add (incrementally) some test that would actually verify that -fcheck-pointer-bounds does what it should do (e.g. by scanning tree dumps etc.).

Jakub
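To make the problem with the first version concrete, here is a small stand-alone C model. It is a sketch only: `struct options`, `SANITIZE_ADDRESS_BIT` and both function names are hypothetical stand-ins, it assumes the quoted hunk sits inside a block that runs only when -fcheck-pointer-bounds was requested, and it is not the revised patch (which is not shown in this message).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SANITIZE_ADDRESS_BIT 0x1u   /* hypothetical stand-in flag bit */

struct options
{
  bool check_pointer_bounds;    /* models flag_check_pointer_bounds */
  unsigned sanitize;            /* models flag_sanitize */
  bool target_has_bounds_mode;  /* models chkp_bound_mode () != VOIDmode */
};

/* Models the hunk as first posted: the flag is cleared even when
   neither error was issued.  Returns the number of errors.  */
static int
validate_as_posted (struct options *o)
{
  int errors = 0;
  assert (o != NULL);
  if (o->check_pointer_bounds)
    {
      if (!o->target_has_bounds_mode)
        errors++;
      if (o->sanitize & SANITIZE_ADDRESS_BIT)
        errors++;
      o->check_pointer_bounds = false;
    }
  return errors;
}

/* Models the behaviour asked for in the review: clear the flag only
   if one of the two errors was issued.  */
static int
validate_guarded (struct options *o)
{
  int errors = 0;
  assert (o != NULL);
  if (o->check_pointer_bounds)
    {
      if (!o->target_has_bounds_mode)
        errors++;
      if (o->sanitize & SANITIZE_ADDRESS_BIT)
        errors++;
      if (errors)
        o->check_pointer_bounds = false;
    }
  return errors;
}
```

With a supported target and no Address Sanitizer, the first function silently turns the checker off while the second leaves it enabled, which is the kind of regression a dump-scanning test would catch.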
[PING] [PATCH, AArch64] [4.8] [4.9] Backport PR64304 fix (miscompilation with -mgeneral-regs-only )
This backports the fixes for PR target/64304 , miscompilation with -mgeneral-regs-only, to the 4.8 & 4.9 branch from trunk r219844. Tested on x86_64 by using qemu of aarch64. OK for 4.8 & 4.9 ? ---gcc-4.8--- diff -rupN gcc-4.8-20150226/gcc/ChangeLog gcc-4.8-20150226.pr64304//gcc/ChangeLog --- gcc-4.8-20150226/gcc/ChangeLog2015-03-04 21:13:46.0 -0500 +++ gcc-4.8-20150226.pr64304//gcc/ChangeLog2015-03-04 21:19:49.0 -0500 @@ -1,3 +1,13 @@ +2015-03-05 Shanyao Chen + +Backported from mainline +2015-01-19 Jiong Wang +Andrew Pinski + +PR target/64304 +* config/aarch64/aarch64.md (define_insn "*ashl3_insn"): Deleted. +(ashl3): Don't expand if operands[2] is not constant. + 2015-02-26 Peter Bergner Backport from mainline diff -rupN gcc-4.8-20150226/gcc/config/aarch64/aarch64.md gcc-4.8-20150226.pr64304//gcc/config/aarch64/aarch64.md --- gcc-4.8-20150226/gcc/config/aarch64/aarch64.md2015-03-04 21:14:29.0 -0500 +++ gcc-4.8-20150226.pr64304//gcc/config/aarch64/aarch64.md 2015-03-04 21:21:54.0 -0500 @@ -2612,6 +2612,8 @@ DONE; } } +else + FAIL; } ) @@ -2681,16 +2683,6 @@ (set_attr "mode" "SI")] ) -(define_insn "*ashl3_insn" - [(set (match_operand:SHORT 0 "register_operand" "=r") -(ashift:SHORT (match_operand:SHORT 1 "register_operand" "r") - (match_operand:QI 2 "aarch64_reg_or_shift_imm_si" "rUss")))] - "" - "lsl\\t%0, %1, %2" - [(set_attr "v8type" "shift") - (set_attr "mode" "")] -) - (define_insn "*3_insn" [(set (match_operand:SHORT 0 "register_operand" "=r") (ASHIFT:SHORT (match_operand:SHORT 1 "register_operand" "r") diff -rupN gcc-4.8-20150226/gcc/testsuite/ChangeLog gcc-4.8-20150226.pr64304//gcc/testsuite/ChangeLog --- gcc-4.8-20150226/gcc/testsuite/ChangeLog2015-03-04 21:16:54.0 -0500 +++ gcc-4.8-20150226.pr64304//gcc/testsuite/ChangeLog2015-03-04 21:22:58.0 -0500 @@ -1,3 +1,10 @@ +2015-03-05 Shanyao chen + +Backported from mainline +2015-01-19 Jiong Wang + +* gcc.target/aarch64/pr64304.c: New testcase. 
+ 2015-02-26 Peter Bergner Backport from mainline diff -rupN gcc-4.8-20150226/gcc/testsuite/gcc.target/aarch64/pr64304.c gcc-4.8-20150226.pr64304//gcc/testsuite/gcc.target/aarch64/pr64304.c --- gcc-4.8-20150226/gcc/testsuite/gcc.target/aarch64/pr64304.c 1969-12-31 19:00:00.0 -0500 +++ gcc-4.8-20150226.pr64304//gcc/testsuite/gcc.target/aarch64/pr64304.c 2015-03-04 21:12:15.0 -0500 @@ -0,0 +1,19 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 --save-temps" } */ + +unsigned char byte = 0; + +void +set_bit (unsigned int bit, unsigned char value) +{ + unsigned char mask = (unsigned char) (1 << (bit & 7)); + + if (! value) +byte &= (unsigned char)~mask; + else +byte |= mask; +/* { dg-final { scan-assembler "and\tw\[0-9\]+, w\[0-9\]+, 7" } } */ +} + +/* { dg-final { cleanup-saved-temps } } */ + ---gcc-4.9--- diff -rupN gcc-4.9-20150225/gcc/ChangeLog gcc-4.9-20150225.pr64304//gcc/ChangeLog --- gcc-4.9-20150225/gcc/ChangeLog2015-03-04 20:48:30.0 -0500 +++ gcc-4.9-20150225.pr64304//gcc/ChangeLog2015-03-04 20:55:59.0 -0500 @@ -1,3 +1,13 @@ +2015-03-05 Shanyao Chen + +Backported from mainline +2015-01-19 Jiong Wang +Andrew Pinski + +PR target/64304 +* config/aarch64/aarch64.md (define_insn "*ashl3_insn"): Deleted. +(ashl3): Don't expand if operands[2] is not constant. 
+ 2015-02-25 Kai Tietz PR tree-optimization/61917 diff -rupN gcc-4.9-20150225/gcc/config/aarch64/aarch64.md gcc-4.9-20150225.pr64304//gcc/config/aarch64/aarch64.md --- gcc-4.9-20150225/gcc/config/aarch64/aarch64.md2015-03-04 20:41:03.0 -0500 +++ gcc-4.9-20150225.pr64304//gcc/config/aarch64/aarch64.md 2015-03-04 20:46:44.0 -0500 @@ -2719,6 +2719,8 @@ DONE; } } +else + FAIL; } ) @@ -2947,15 +2949,6 @@ [(set_attr "type" "shift_reg")] ) -(define_insn "*ashl3_insn" - [(set (match_operand:SHORT 0 "register_operand" "=r") -(ashift:SHORT (match_operand:SHORT 1 "register_operand" "r") - (match_operand:QI 2 "aarch64_reg_or_shift_imm_si" "rUss")))] - "" - "lsl\\t%0, %1, %2" - [(set_attr "type" "shift_reg")] -) - (define_insn "*3_insn" [(set (match_operand:SHORT 0 "register_operand" "=r") (ASHIFT:SHORT (match_operand:SHORT 1 "register_operand" "r") diff -rupN gcc-4.9-20150225/gcc/testsuite/ChangeLog gcc-4.9-20150225.pr64304//gcc/testsuite/ChangeLog --- gcc-4.9-20150225/gcc/testsuite/ChangeLog2015-03-04 21:00:24.0 -0500 +++ gcc-4.9-20150225.pr64304//gcc/testsuite/ChangeLog2015-03-04 21:03:21.0 -0500 @@ -1,3 +1,10 @@ +2015-03-05 Shanyao chen + +Back
[PING][PATCH] ASan on unaligned accesses
On 03/04/2015 11:00 AM, Marat Zakirov wrote: Hi all! Here is the patch which forces ASan to work on memory access without proper alignment. it's useful because some programs like linux kernel often cheat with alignment which may cause false negatives. This patch needs additional support for proper work on unaligned accesses in global data and heap. It will be implemented in libsanitizer by separate patch. --Marat gcc/ChangeLog: 2015-02-25 Marat Zakirov * asan.c (asan_emit_stack_protection): Support for misalign accesses. (asan_expand_check_ifn): Likewise. * params.def: New option asan-catch-misaligned. * params.h: New param ASAN_CATCH_MISALIGNED. gcc/testsuite/ChangeLog: 2015-02-25 Marat Zakirov * c-c++-common/asan/misalign-catch.c: New test. diff --git a/gcc/asan.c b/gcc/asan.c index b7c2b11..49d0da4 100644 --- a/gcc/asan.c +++ b/gcc/asan.c @@ -1050,7 +1050,6 @@ asan_emit_stack_protection (rtx base, rtx pbase, unsigned int alignb, rtx_code_label *lab; rtx_insn *insns; char buf[30]; - unsigned char shadow_bytes[4]; HOST_WIDE_INT base_offset = offsets[length - 1]; HOST_WIDE_INT base_align_bias = 0, offset, prev_offset; HOST_WIDE_INT asan_frame_size = offsets[0] - base_offset; @@ -1059,6 +1058,7 @@ asan_emit_stack_protection (rtx base, rtx pbase, unsigned int alignb, unsigned char cur_shadow_byte = ASAN_STACK_MAGIC_LEFT; tree str_cst, decl, id; int use_after_return_class = -1; + bool misalign = (flag_sanitize & SANITIZE_KERNEL_ADDRESS) || ASAN_CATCH_MISALIGNED; if (shadow_ptr_types[0] == NULL_TREE) asan_init_shadow_ptr_types (); @@ -1193,11 +1193,37 @@ asan_emit_stack_protection (rtx base, rtx pbase, unsigned int alignb, if (STRICT_ALIGNMENT) set_mem_align (shadow_mem, (GET_MODE_ALIGNMENT (SImode))); prev_offset = base_offset; + + vec shadow_mems; + vec shadow_bytes; + + shadow_mems.create(0); + shadow_bytes.create(0); + for (l = length; l; l -= 2) { if (l == 2) cur_shadow_byte = ASAN_STACK_MAGIC_RIGHT; offset = offsets[l - 1]; + if (l != length && misalign) + { + 
HOST_WIDE_INT aoff + = base_offset + ((offset - base_offset) + & ~(ASAN_RED_ZONE_SIZE - HOST_WIDE_INT_1)) + - ASAN_RED_ZONE_SIZE; + if (aoff > prev_offset) + { + shadow_mem = adjust_address (shadow_mem, VOIDmode, + (aoff - prev_offset) + >> ASAN_SHADOW_SHIFT); + prev_offset = aoff; + shadow_bytes.safe_push (0); + shadow_bytes.safe_push (0); + shadow_bytes.safe_push (0); + shadow_bytes.safe_push (0); + shadow_mems.safe_push (shadow_mem); + } + } if ((offset - base_offset) & (ASAN_RED_ZONE_SIZE - 1)) { int i; @@ -1212,13 +1238,13 @@ asan_emit_stack_protection (rtx base, rtx pbase, unsigned int alignb, if (aoff < offset) { if (aoff < offset - (1 << ASAN_SHADOW_SHIFT) + 1) - shadow_bytes[i] = 0; + shadow_bytes.safe_push (0); else - shadow_bytes[i] = offset - aoff; + shadow_bytes.safe_push (offset - aoff); } else - shadow_bytes[i] = ASAN_STACK_MAGIC_PARTIAL; - emit_move_insn (shadow_mem, asan_shadow_cst (shadow_bytes)); + shadow_bytes.safe_push (ASAN_STACK_MAGIC_PARTIAL); + shadow_mems.safe_push(shadow_mem); offset = aoff; } while (offset <= offsets[l - 2] - ASAN_RED_ZONE_SIZE) @@ -1227,12 +1253,21 @@ asan_emit_stack_protection (rtx base, rtx pbase, unsigned int alignb, (offset - prev_offset) >> ASAN_SHADOW_SHIFT); prev_offset = offset; - memset (shadow_bytes, cur_shadow_byte, 4); - emit_move_insn (shadow_mem, asan_shadow_cst (shadow_bytes)); + shadow_bytes.safe_push (cur_shadow_byte); + shadow_bytes.safe_push (cur_shadow_byte); + shadow_bytes.safe_push (cur_shadow_byte); + shadow_bytes.safe_push (cur_shadow_byte); + shadow_mems.safe_push(shadow_mem); offset += ASAN_RED_ZONE_SIZE; } cur_shadow_byte = ASAN_STACK_MAGIC_MIDDLE; } + for (unsigned i = 0; misalign && i < shadow_bytes.length () - 1; i++) +if (shadow_bytes[i] == 0 && shadow_bytes[i + 1] > 0) + shadow_bytes[i] = 8 + (shadow_bytes[i + 1] > 7 ? 
0 : shadow_bytes[i + 1]); + for (unsigned i = 0; i < shadow_mems.length (); i++) +emit_move_insn (shadow_mems[i], asan_shadow_cst (&shadow_bytes[i * 4])); + do_pending_stack_adjust (); /* Construct epilogue sequence. */ @@ -1285,34 +1320,8 @@ asan_emit_stack_protection (rtx base, rtx pbase, unsigned int alignb, if (STRICT_ALIGNMENT) set_mem_align (shadow_mem, (GET_MODE_ALIGNMENT (SImode))); - prev_offset = base_offset; - last_offset = base_offset; - last_size = 0; - for (l = length; l; l -= 2) -{ - offset = base_offset + ((offsets[l - 1] - base_offset) - & ~(ASAN_RED_ZONE_SIZE - HOST_WIDE_INT_1)); - if (last_offset + last_size != offset) - { - shadow_mem = adjust_address (shadow_mem, VOIDmode, - (last_offset - prev_offset) - >> ASA
Re: [PATCH, i386 testsuite]: Require nonpic target for some tests
On 12-03-15 10:57, Uros Bizjak wrote:
> On Thu, Mar 12, 2015 at 9:11 AM, Tom de Vries wrote:
>
>>> Attached patch adds nonpic target requirement for some (obvious)
>>> cases, where data access or PIC register setup confuses scan-asms.
>>>
>>> 2015-01-30 Uros Bizjak
>>>
>>> * gcc.target/i386/fuse-caller-save-rec.c: Require nonpic target.
>>> * gcc.target/i386/fuse-caller-save-xmm.c: Ditto.
>>> * gcc.target/i386/fuse-caller-save.c: Ditto.
>>
>> Hi,
>>
>> I've reverted this part of the patch. The scans were failing because the
>> -fipa-ra optimization was broken for -m32 -fpic (PR64895).
>
> Not really. Allocator is free to allocate %ebx (or other call-saved
> register) as PIC register. In this case, unwanted push/pop sequence
> will be emitted.

Sure, but I don't see what that has to do with the test-cases. I don't see a pic register used in fuse-caller-save.c and fuse-caller-save-rec.c. I do see a pic register used in gcc.target/i386/fuse-caller-save-xmm.c, but there's no scan for push/pop sequence in there.

Thanks,
- Tom
[CHKP, PATCH] Fix LTO cgraph merge for instrumented functions
Hi,

Currently cgraph merge has several issues with instrumented code:

- original function node may be removed => no assembler name conflict is detected between function and variable
- only orig_decl name is privatized for instrumented function => node still shares assembler name, which causes an infinite privatization loop
- information about changed name is stored in file_data of instrumented node => original section name may not be found for original function
- chkp reference is not fixed when nodes are merged

This patch should fix these problems by keeping instrumentation thunks reachable, privatizing both nodes and fixing chkp references. Bootstrapped and tested on x86_64-unknown-linux-gnu. OK for trunk?

Thanks,
Ilya
--
gcc/

2015-03-12 Ilya Enkovich

* ipa-chkp.h (chkp_maybe_fix_chkp_ref): New.
* ipa-chkp.c (chkp_maybe_fix_chkp_ref): New.
* ipa.c (symbol_table::remove_unreachable_nodes): Don't remove instrumentation thunks calling reachable functions.
* lto-cgraph.c: Include ipa-chkp.h.
(input_symtab): Fix chkp references for boundary nodes.
* lto/lto-partition.c (privatize_symbol_name_1): New.
(privatize_symbol_name): Privatize both decl and orig_decl names for instrumented functions.
* lto/lto-symtab.c: Include ipa-chkp.h.
(lto_cgraph_replace_node): Fix chkp references for merged function nodes.

gcc/testsuite/

2015-03-12 Ilya Enkovich

* gcc.dg/lto/chkp-privatize-1_0.c: New.
* gcc.dg/lto/chkp-privatize-1_1.c: New.
* gcc.dg/lto/chkp-privatize-2_0.c: New.
* gcc.dg/lto/chkp-privatize-2_1.c: New.

diff --git a/gcc/ipa-chkp.c b/gcc/ipa-chkp.c
index 0b857ff..223f4ed 100644
--- a/gcc/ipa-chkp.c
+++ b/gcc/ipa-chkp.c
@@ -414,6 +414,36 @@ chkp_instrumentable_p (tree fndecl)
 	  && (!fn || !copy_forbidden (fn, fndecl)));
 }
 
+/* Check NODE has a correct IPA_REF_CHKP reference.
+   Create a new reference if required.  */
+
+void
+chkp_maybe_fix_chkp_ref (cgraph_node *node)
+{
+  /* Firstly check node needs IPA_REF_CHKP. 
*/ + if (node->instrumentation_clone + || !node->instrumented_version) +return; + + /* Check we already have a proper IPA_REF_CHKP. + Remove incorrect refs. */ + int i; + ipa_ref *ref = NULL; + for (i = 0; node->iterate_reference (i, ref); i++) +if (ref->use == IPA_REF_CHKP) + { + /* Found proper reference. */ + if (ref->referred == node->instrumented_version) + return; + + /* Need to recreate reference. */ + ref->remove_reference (); + break; + } + + node->create_reference (node->instrumented_version, IPA_REF_CHKP, NULL); +} + /* Return clone created for instrumentation of NODE or NULL. */ cgraph_node * diff --git a/gcc/ipa-chkp.h b/gcc/ipa-chkp.h index 6708fe9..5fa7d88 100644 --- a/gcc/ipa-chkp.h +++ b/gcc/ipa-chkp.h @@ -24,5 +24,6 @@ extern tree chkp_copy_function_type_adding_bounds (tree orig_type); extern tree chkp_maybe_clone_builtin_fndecl (tree fndecl); extern cgraph_node *chkp_maybe_create_clone (tree fndecl); extern bool chkp_instrumentable_p (tree fndecl); +extern void chkp_maybe_fix_chkp_ref (cgraph_node *node); #endif /* GCC_IPA_CHKP_H */ diff --git a/gcc/ipa.c b/gcc/ipa.c index b3752de..ae6269f 100644 --- a/gcc/ipa.c +++ b/gcc/ipa.c @@ -492,7 +492,18 @@ symbol_table::remove_unreachable_nodes (FILE *file) } else if (cnode->thunk.thunk_p) enqueue_node (cnode->callees->callee, &first, &reachable); - + + /* For instrumentation clones we always need original +function node for proper LTO privatization. */ + if (cnode->instrumentation_clone + && reachable.contains (cnode) + && cnode->definition) + { + gcc_assert (cnode->instrumented_version); + enqueue_node (cnode->instrumented_version, &first, &reachable); + reachable.add (cnode->instrumented_version); + } + /* If any reachable function has simd clones, mark them as reachable as well. */ if (cnode->simd_clones) diff --git a/gcc/lto-cgraph.c b/gcc/lto-cgraph.c index c875fed..b9196eb 100644 --- a/gcc/lto-cgraph.c +++ b/gcc/lto-cgraph.c @@ -80,6 +80,7 @@ along with GCC; see the file COPYING3. 
If not see #include "pass_manager.h" #include "ipa-utils.h" #include "omp-low.h" +#include "ipa-chkp.h" /* True when asm nodes has been output. */ bool asm_nodes_output = false; @@ -1888,6 +1889,10 @@ input_symtab (void) context of the nested function. */ if (node->lto_file_data) node->aux = NULL; + + /* May need to fix chkp reference because we don't stream +them for boundary symbols. */ + chkp_maybe_fix_chkp_ref (node); } } diff --git a/gcc/lto/lto-partition.c b/gcc/lto/lto-partition.c index 235b735..7d117e9 100644 --- a/gcc/lto/lto-partition.c +++ b/gcc/lto/lto-par
RFA: Update gcc test 20101011-1.c with more targets that do not trap
Hi Guys, The patch below updates the 20101011-1.c test in the gcc testsuite to add a few more targets whose (simulated) runtime does not support trapping on division by zero. OK to apply ? Cheers Nick gcc/testsuite/ChangeLog 2015-03-12 Nick Clifton * gcc.c-torture/execute/20101011-1.c: Skip this test for the V850, MSP430, RL78 and RX targets. Index: gcc/testsuite/gcc.c-torture/execute/20101011-1.c === --- gcc/testsuite/gcc.c-torture/execute/20101011-1.c(revision 221346) +++ gcc/testsuite/gcc.c-torture/execute/20101011-1.c(working copy) @@ -12,6 +12,18 @@ #elif defined (__sh__) /* On SH division by zero does not trap. */ # define DO_TEST 0 +#elif defined (__v850__) + /* On V850 division by zero does not trap. */ +# define DO_TEST 0 +#elif defined (__MSP430__) + /* On MSP430 division by zero does not trap. */ +# define DO_TEST 0 +#elif defined (__RL78__) + /* On RL78 division by zero does not trap. */ +# define DO_TEST 0 +#elif defined (__RX__) + /* On RX division by zero does not trap. */ +# define DO_TEST 0 #elif defined (__aarch64__) /* On AArch64 integer division by zero does not trap. */ # define DO_TEST 0
Re: [PATCH, i386 testsuite]: Require nonpic target for some tests
On Thu, Mar 12, 2015 at 9:11 AM, Tom de Vries wrote: >> Attached patch adds nonpic target requirement for some (obvious) >> cases, where data access or PIC register setup confuses scan-asms. >> >> 2015-01-30 Uros Bizjak >> >> * gcc.target/i386/fuse-caller-save-rec.c: Require nonpic target. >> * gcc.target/i386/fuse-caller-save-xmm.c: Ditto. >> * gcc.target/i386/fuse-caller-save.c: Ditto. > > > Hi, > > I've reverted this part of the patch. The scans were failing because the > -fipa-ra optimization was broken for -m32 -fpic (PR64895). Not really. Allocator is free to allocate %ebx (or other call-saved register) as PIC register. In this case, unwanted push/pop sequence will be emitted. Uros.
Re: [PATCH, PR target/65103, 1/3] Fix cost of PIC register in ix86_address_cost
On 10 Mar 19:08, Uros Bizjak wrote:
> Hello!
>
> > > > Test       O2 ref  patched     Ofast + LTO ref  patched
> > > > 164.gzip   12      0 (-100%)   39               0 (-100%)
> > > > 175.vpr    0       0 (-0%)     4                0 (-100%)
> > > > 176.gcc    141     6 (-96%)    294              10 (-97%)
> > > > 181.mcf    4       0 (-100%)   4                2 (-50%)
>
> Do you also have executable sizes at hand?

Summary size change for SPEC2000 on -O2 is -0,11%.

>
> > 2015-03-10 Ilya Enkovich
> >
> > PR target/65103
> > * config/i386/i386.c (ix86_address_cost): Fix cost of a PIC
> > register.
> >
> > gcc/testsuite/
> >
> > 2015-03-10 Ilya Enkovich
> >
> > PR target/65103
> > * gcc.target/i386/pr65103-1.c: New.
>
> LGTM, just a nit below.
>
> Otherwise, OK for mainline as a bugfix (but please wait for a day if
> there are any objections from release managers).
>
> + /* Attempt to minimize number of registers in the address.
>
> This is now a displaced comment. Please integrate it in the main comment.
>
> Thanks,
> Uros.

Here is a final version.

Thanks,
Ilya
--
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index ab8f03a..47deda7 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -12931,30 +12931,26 @@ ix86_address_cost (rtx x, machine_mode, addr_space_t, bool)
   if (parts.index && GET_CODE (parts.index) == SUBREG)
     parts.index = SUBREG_REG (parts.index);
 
-  /* Attempt to minimize number of registers in the address.  */
-  if ((parts.base
-       && (!REG_P (parts.base) || REGNO (parts.base) >= FIRST_PSEUDO_REGISTER))
-      || (parts.index
-	  && (!REG_P (parts.index)
-	      || REGNO (parts.index) >= FIRST_PSEUDO_REGISTER)))
-    cost++;
-
-  /* When address base or index is "pic_offset_table_rtx" we don't increase
-     address cost.  When a memopt with "pic_offset_table_rtx" is not invariant
-     itself it most likely means that base or index is not invariant.
-     Therefore only "pic_offset_table_rtx" could be hoisted out, which is not
-     profitable for x86.  */
+  /* Attempt to minimize number of registers in the address by increasing
+     address cost for each used register. 
We don't increase address cost
+     for "pic_offset_table_rtx".  When a memopt with "pic_offset_table_rtx"
+     is not invariant itself it most likely means that base or index is not
+     invariant.  Therefore only "pic_offset_table_rtx" could be hoisted out,
+     which is not profitable for x86.  */
   if (parts.base
-      && (current_pass->type == GIMPLE_PASS
-	  || (!pic_offset_table_rtx
-	      || REGNO (pic_offset_table_rtx) != REGNO(parts.base)))
       && (!REG_P (parts.base) || REGNO (parts.base) >= FIRST_PSEUDO_REGISTER)
-      && parts.index
       && (current_pass->type == GIMPLE_PASS
-	  || (!pic_offset_table_rtx
-	      || REGNO (pic_offset_table_rtx) != REGNO(parts.index)))
+	  || !pic_offset_table_rtx
+	  || !REG_P (parts.base)
+	  || REGNO (pic_offset_table_rtx) != REGNO (parts.base)))
+    cost++;
+
+  if (parts.index
       && (!REG_P (parts.index) || REGNO (parts.index) >= FIRST_PSEUDO_REGISTER)
-      && parts.base != parts.index)
+      && (current_pass->type == GIMPLE_PASS
+	  || !pic_offset_table_rtx
+	  || !REG_P (parts.index)
+	  || REGNO (pic_offset_table_rtx) != REGNO (parts.index)))
     cost++;
 
   /* AMD-K6 don't like addresses with ModR/M set to 00_xxx_100b,
diff --git a/gcc/testsuite/gcc.target/i386/pr65103-1.c b/gcc/testsuite/gcc.target/i386/pr65103-1.c
new file mode 100644
index 000..4e3a7a3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr65103-1.c
@@ -0,0 +1,19 @@
+/* { dg-do compile { target ia32 } } */
+/* { dg-require-effective-target pie } */
+/* { dg-options "-O2 -fPIE" } */
+/* { dg-final { scan-assembler-not "GOTOFF," } } */
+
+typedef struct S
+{
+  int a;
+  int sum;
+  int delta;
+} S;
+
+S gs;
+int global_opt (int max)
+{
+  while (gs.sum < max)
+    gs.sum += gs.delta;
+  return gs.a;
+}
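The register-counting part of the final hunk can be modelled outside GCC as follows. This is a minimal sketch under stated assumptions: `struct operand`, `FIRST_PSEUDO` and the function names are hypothetical stand-ins for the rtx operands, FIRST_PSEUDO_REGISTER and the inline conditions in ix86_address_cost, and the rest of the cost computation is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define FIRST_PSEUDO 53u   /* hypothetical stand-in for FIRST_PSEUDO_REGISTER */

struct operand
{
  bool is_reg;      /* models REG_P (x) */
  unsigned regno;   /* models REGNO (x), meaningful only if is_reg */
};

/* Does OP add one unit of cost?  Hard registers are free; the PIC
   register is free too, except during GIMPLE passes.  */
static bool
operand_adds_cost (const struct operand *op, const struct operand *pic,
                   bool in_gimple_pass)
{
  if (op == NULL)
    return false;
  if (op->is_reg && op->regno < FIRST_PSEUDO)
    return false;
  return in_gimple_pass
         || pic == NULL
         || !op->is_reg
         || pic->regno != op->regno;
}

/* Base and index are now counted independently.  */
static int
address_register_cost (const struct operand *base,
                       const struct operand *index,
                       const struct operand *pic, bool in_gimple_pass)
{
  assert (pic == NULL || pic->is_reg);
  return (operand_adds_cost (base, pic, in_gimple_pass) ? 1 : 0)
         + (operand_adds_cost (index, pic, in_gimple_pass) ? 1 : 0);
}
```

In this model an address whose base is the PIC pseudo and whose index is another pseudo costs 1 in RTL passes and 2 in GIMPLE passes, while hard registers never add cost.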
Re: [PATCH, i386 testsuite]: Require nonpic target for some tests
On Thu, Mar 12, 2015 at 11:41 AM, Tom de Vries wrote:

>>>> Attached patch adds nonpic target requirement for some (obvious)
>>>> cases, where data access or PIC register setup confuses scan-asms.
>>>>
>>>> 2015-01-30 Uros Bizjak
>>>>
>>>> * gcc.target/i386/fuse-caller-save-rec.c: Require nonpic target.
>>>> * gcc.target/i386/fuse-caller-save-xmm.c: Ditto.
>>>> * gcc.target/i386/fuse-caller-save.c: Ditto.
>>>
>>> Hi,
>>>
>>> I've reverted this part of the patch. The scans were failing because the
>>> -fipa-ra optimization was broken for -m32 -fpic (PR64895).
>>
>> Not really.
>>
>> Allocator is free to allocate %ebx (or other call-saved
>> register) as PIC register.
>>
>> In this case, unwanted push/pop sequence
>> will be emitted.
>
> Sure, but I don't see what that has to do with the test-cases. I don't see a
> pic register used in fuse-caller-save.c and fuse-caller-save-rec.c. I do see
> a pic register used in gcc.target/i386/fuse-caller-save-xmm.c, but there's
> no scan for push/pop sequence in there.

You are right, the call is (obviously) to a local function. There is no need for PIC reg, so this clears my concerns.

Thanks,
Uros.
Re: [PATCH, PR target/65103, 1/3] Fix cost of PIC register in ix86_address_cost
On Thu, Mar 12, 2015 at 10:50 AM, Ilya Enkovich wrote:

>> > > > Test       O2 ref  patched     Ofast + LTO ref  patched
>> > > > 164.gzip   12      0 (-100%)   39               0 (-100%)
>> > > > 175.vpr    0       0 (-0%)     4                0 (-100%)
>> > > > 176.gcc    141     6 (-96%)    294              10 (-97%)
>> > > > 181.mcf    4       0 (-100%)   4                2 (-50%)
>>
>> Do you also have executable sizes at hand?
>
> Summary size change for SPEC2000 on -O2 is -0,11%.

Nice!

>> > 2015-03-10 Ilya Enkovich
>> >
>> > PR target/65103
>> > * config/i386/i386.c (ix86_address_cost): Fix cost of a PIC
>> > register.
>> >
>> > gcc/testsuite/
>> >
>> > 2015-03-10 Ilya Enkovich
>> >
>> > PR target/65103
>> > * gcc.target/i386/pr65103-1.c: New.
>>
>> LGTM, just a nit below.
>>
>> Otherwise, OK for mainline as a bugfix (but please wait for a day if
>> there are any objections from release managers).
>>
>> + /* Attempt to minimize number of registers in the address.
>>
>> This is now a displaced comment. Please integrate it in the main comment.
>>
>> Thanks,
>> Uros.
>
> Here is a final version.

OK for mainline.

Thanks,
Uros.
Re: [PATCH, i386 testsuite]: Require nonpic target for some tests
On 30-01-15 20:49, Uros Bizjak wrote: Hello! Attached patch adds nonpic target requirement for some (obvious) cases, where data access or PIC register setup confuses scan-asms. 2015-01-30 Uros Bizjak * gcc.target/i386/fuse-caller-save-rec.c: Require nonpic target. * gcc.target/i386/fuse-caller-save-xmm.c: Ditto. * gcc.target/i386/fuse-caller-save.c: Ditto. Hi, I've reverted this part of the patch. The scans were failing because the -fipa-ra optimization was broken for -m32 -fpic (PR64895). Thanks, - Tom 2015-03-12 Tom de Vries PR rtl-optimization/64895 * gcc.target/i386/fuse-caller-save-rec.c: Revert require nonpic target. * gcc.target/i386/fuse-caller-save-xmm.c: Ditto. * gcc.target/i386/fuse-caller-save.c: Ditto. diff --git a/gcc/testsuite/gcc.target/i386/fuse-caller-save-rec.c b/gcc/testsuite/gcc.target/i386/fuse-caller-save-rec.c index ed0984c..c660e01 100644 --- a/gcc/testsuite/gcc.target/i386/fuse-caller-save-rec.c +++ b/gcc/testsuite/gcc.target/i386/fuse-caller-save-rec.c @@ -1,5 +1,4 @@ /* { dg-do compile } */ -/* { dg-require-effective-target nonpic } */ /* { dg-options "-O2 -fipa-ra -fomit-frame-pointer -fno-optimize-sibling-calls" } */ /* { dg-additional-options "-mregparm=1" { target ia32 } } */ diff --git a/gcc/testsuite/gcc.target/i386/fuse-caller-save-xmm.c b/gcc/testsuite/gcc.target/i386/fuse-caller-save-xmm.c index 261ba07..1d02844 100644 --- a/gcc/testsuite/gcc.target/i386/fuse-caller-save-xmm.c +++ b/gcc/testsuite/gcc.target/i386/fuse-caller-save-xmm.c @@ -1,5 +1,4 @@ /* { dg-do compile } */ -/* { dg-require-effective-target nonpic } */ /* { dg-options "-O2 -msse2 -mno-avx -fipa-ra -fomit-frame-pointer" } */ typedef double v2df __attribute__((vector_size (16))); diff --git a/gcc/testsuite/gcc.target/i386/fuse-caller-save.c b/gcc/testsuite/gcc.target/i386/fuse-caller-save.c index b9494ac..7cfd22a 100644 --- a/gcc/testsuite/gcc.target/i386/fuse-caller-save.c +++ b/gcc/testsuite/gcc.target/i386/fuse-caller-save.c @@ -1,5 +1,4 @@ /* { dg-do compile } 
*/ -/* { dg-require-effective-target nonpic } */ /* { dg-options "-O2 -fipa-ra -fomit-frame-pointer" } */ /* { dg-additional-options "-mregparm=1" { target ia32 } } */ -- 1.9.1
[Committed][PR64895] Use actual_call_used_reg_set to find conflicting regs
Hi,

This patch fixes PR64895, related to the gcc.target/i386/fuse-caller-save*.c failures for -m32 -fpic.

Bootstrapped and reg-tested on x86_64 for unix/ and unix/-m32.
Built and reg-tested on x86_64 for unix/fpic and unix/fpic/-m32.

Approved here ( https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64895#c7 ):
...
The patch looks ok to me. Tom, could you prepare the patch (check it mostly for x86-64 bootstrap and testsuite) and commit it to the trunk. I approve it.
...

Thanks,
- Tom

2015-03-12 Tom de Vries

PR rtl-optimization/64895
* lra-lives.c (check_pseudos_live_through_calls): Use actual_call_used_reg_set instead of call_used_reg_set, if available.

diff --git a/gcc/lra-lives.c b/gcc/lra-lives.c
index 9dfffb6..5d759ca 100644
--- a/gcc/lra-lives.c
+++ b/gcc/lra-lives.c
@@ -636,8 +636,12 @@ check_pseudos_live_through_calls (int regno)
   if (! sparseset_bit_p (pseudos_live_through_calls, regno))
     return;
   sparseset_clear_bit (pseudos_live_through_calls, regno);
+  bool actual_call_used_reg_set_available_p
+    = !hard_reg_set_empty_p (lra_reg_info[regno].actual_call_used_reg_set);
   IOR_HARD_REG_SET (lra_reg_info[regno].conflict_hard_regs,
-		    call_used_reg_set);
+		    (actual_call_used_reg_set_available_p
+		     ? lra_reg_info[regno].actual_call_used_reg_set
+		     : call_used_reg_set));
 
   for (hr = 0; hr < FIRST_PSEUDO_REGISTER; hr++)
     if (HARD_REGNO_CALL_PART_CLOBBERED (hr, PSEUDO_REGNO_MODE (regno)))
--
1.9.1
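The selection made by the patch can be illustrated with plain bit masks. This is only a sketch: `hard_reg_set` as a 64-bit mask, `struct pseudo_info` and `add_call_conflicts` are hypothetical stand-ins for HARD_REG_SET, the lra_reg_info entry and the IOR_HARD_REG_SET call, and an empty actual set is taken to mean that no -fipa-ra information was recorded.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t hard_reg_set;   /* hypothetical: one bit per hard register */

struct pseudo_info
{
  hard_reg_set conflict_hard_regs;
  hard_reg_set actual_call_used_reg_set;   /* empty if nothing was recorded */
};

/* For a pseudo that lives through a call, add the registers the call
   may clobber to its conflicts.  Prefer the recorded, usually smaller,
   set and fall back to the generic call-used set.  */
static void
add_call_conflicts (struct pseudo_info *info, hard_reg_set call_used_reg_set)
{
  assert (info != NULL);
  hard_reg_set clobbered = info->actual_call_used_reg_set != 0
                           ? info->actual_call_used_reg_set
                           : call_used_reg_set;
  info->conflict_hard_regs |= clobbered;
}
```

When a recorded set exists, the pseudo conflicts only with the registers those calls really clobber, so it can still be allocated to a nominally call-clobbered register that they leave alone.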
Re: [patch] disable libmpx x32 multilib builds
On 11 Mar 19:11, Ilya Enkovich wrote:
> 2015-03-11 18:59 GMT+03:00 H.J. Lu :
> > On Wed, Mar 11, 2015 at 7:37 AM, Matthias Klose wrote:
> >> current trunk fails to build on x86*-linux, when configured for x32
> >> multilibs
> >> because libmpx doesn't support these. Disable them.
> >>
> >> ok for the trunk?
> >>
> >> * Disable libmpx x32 multilib builds.
> >>
> >> --- a/config-ml.in
> >> +++ b/config-ml.in
> >> @@ -102,6 +102,7 @@
> >>  Makefile=${ac_file-Makefile}
> >>  ml_config_shell=${CONFIG_SHELL-/bin/sh}
> >>  ml_realsrcdir=${srcdir}
> >> +ml_srcbase=`basename $ml_realsrcdir`
> >>
> >>  # Scan all the arguments and set all the ones we need.
> >>
> >> @@ -220,6 +221,10 @@
> >>    if [ "${dir}" = "." ]; then
> >>      true
> >>    else
> >> +    # libmpx is not supported on x32
> >> +    if [ "${ml_srcbase}-${dir}" = libmpx-x32 ]; then
> >> +      continue
> >> +    fi
> >>      if [ -z "${multidirs}" ]; then
> >>        multidirs="${dir}"
> >>      else
> >
> > This is incorrect. Ilya and I are working on a proper fix.
> >
> > --
> > H.J.
>
> Current libmpx configure has a check for x32 but it doesn't work due
> to square brackets removed from the test by autoconf. Will test this
> patch:
>
> diff --git a/libmpx/configure.ac b/libmpx/configure.ac
> index 4669525..fe0d3f2 100644
> --- a/libmpx/configure.ac
> +++ b/libmpx/configure.ac
> @@ -28,7 +28,7 @@ GCC_LIBSTDCXX_RAW_CXX_FLAGS
>  # See if supported.
>  unset LIBMPX_SUPPORTED
>  AC_MSG_CHECKING([for target support for Intel MPX runtime library])
> -echo "int i[sizeof (void *) == 4 ? 1 : -1] = { __x86_64__ };" > conftest.c
> +echo "int i[[sizeof (void *) == 4 ? 1 : -1]] = { __x86_64__ };" > conftest.c
>  if AC_TRY_COMMAND([${CC} ${CFLAGS} -c -o conftest.o conftest.c
> 1>&AS_MESSAGE_LOG_FD])
>  then
>  LIBMPX_SUPPORTED=no
>
>
> Thanks,
> Ilya

Successfully bootstrapped on x86_64-unknown-linux-gnu with '--enable-libmpx --with-multilib-list=m32,m64,mx32'. Applied to trunk.

Thanks,
Ilya
--
2015-03-12 Ilya Enkovich

PR other/65384
* configure.ac: Fix x32 test.
* configure: Regenerate.

diff --git a/libmpx/configure.ac b/libmpx/configure.ac
index 4669525..fe0d3f2 100644
--- a/libmpx/configure.ac
+++ b/libmpx/configure.ac
@@ -28,7 +28,7 @@ GCC_LIBSTDCXX_RAW_CXX_FLAGS
 # See if supported.
 unset LIBMPX_SUPPORTED
 AC_MSG_CHECKING([for target support for Intel MPX runtime library])
-echo "int i[sizeof (void *) == 4 ? 1 : -1] = { __x86_64__ };" > conftest.c
+echo "int i[[sizeof (void *) == 4 ? 1 : -1]] = { __x86_64__ };" > conftest.c
 if AC_TRY_COMMAND([${CC} ${CFLAGS} -c -o conftest.o conftest.c 1>&AS_MESSAGE_LOG_FD])
 then
 LIBMPX_SUPPORTED=no
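For context on why the doubled brackets matter: the probe relies on the old C trick of declaring an array whose size is -1 when a compile-time condition is false, so conftest.c compiles only when pointers are 4 bytes wide and __x86_64__ is defined, which is the x32 case. M4 treats square brackets as quote characters and strips one level, so with single brackets the generated probe was not valid C and could never succeed. The sketch below shows the trick in isolation plus a truth table of the probe. The macro `COMPILE_TIME_CHECK` and the function `probe_compiles` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Compilation fails if COND is false, because the array size becomes -1.  */
#define COMPILE_TIME_CHECK(name, cond) typedef char name[(cond) ? 1 : -1]

/* Always true, so this translation unit compiles everywhere.  */
COMPILE_TIME_CHECK (char_is_one_byte, sizeof (char) == 1);

/* Models when the configure probe compiles: it needs 4-byte pointers
   (otherwise the array size is -1) and a defined __x86_64__ (otherwise
   the initializer names an undeclared identifier).  A successful
   compile makes configure set LIBMPX_SUPPORTED=no.  */
static int
probe_compiles (size_t pointer_size, int have_x86_64_macro)
{
  assert (pointer_size > 0);
  return pointer_size == 4 && have_x86_64_macro;
}
```

So ia32 (4-byte pointers, no __x86_64__) and x86-64 (8-byte pointers) both fail the probe and keep libmpx enabled, and only x32 disables it.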
[Patch, Fortran] Reject unsupported coarray communication
There are two groups of features which are not properly implemented with remote access:

* "caf(:)[i]%a" might have a byte stride which is not compatible with the size of "a". (Fix: new array descriptor.)
* All accesses which involve dereferencing pointers in a remote coarray (e.g. "caf[i]%ptr_comp = 5") are not supported.

This patch now rejects them - instead of accepting them silently and doing the wrong thing at runtime.

Built and regtested on x86-64-gnu-linux.
OK for the trunk?

Tobias

2015-03-11 Tobias Burnus

* trans-expr.c (gfc_get_tree_for_caf_expr): Reject unimplemented coindexed coarray accesses.

* gfortran.dg/coarray_38.f90: New.
* gfortran.dg/coarray_39.f90: New.
* gfortran.dg/coarray/coindexed_3.f90: Add dg-error, turn into compile test.

gcc/fortran/trans-expr.c | 57 +-
gcc/testsuite/gfortran.dg/coarray/coindexed_3.f90 | 10 +-
gcc/testsuite/gfortran.dg/coarray_38.f90 | 124 ++
gcc/testsuite/gfortran.dg/coarray_39.f90 | 124 ++
4 files changed, 309 insertions(+), 6 deletions(-)

diff --git a/gcc/fortran/trans-expr.c b/gcc/fortran/trans-expr.c
index 353d012..87d3a2d 100644
--- a/gcc/fortran/trans-expr.c
+++ b/gcc/fortran/trans-expr.c
@@ -1498,10 +1498,65 @@ gfc_get_tree_for_caf_expr (gfc_expr *expr)
 {
   tree caf_decl;
   bool found = false;
-  gfc_ref *ref;
+  gfc_ref *ref, *comp_ref = NULL;

   gcc_assert (expr && expr->expr_type == EXPR_VARIABLE);

+  /* Not-implemented diagnostic.
*/ + for (ref = expr->ref; ref; ref = ref->next) +if (ref->type == REF_COMPONENT) + { +comp_ref = ref; + if ((ref->u.c.component->ts.type == BT_CLASS + && !CLASS_DATA (ref->u.c.component)->attr.codimension + && (CLASS_DATA (ref->u.c.component)->attr.pointer + || CLASS_DATA (ref->u.c.component)->attr.allocatable)) + || (ref->u.c.component->ts.type != BT_CLASS + && !ref->u.c.component->attr.codimension + && (ref->u.c.component->attr.pointer + || ref->u.c.component->attr.allocatable))) + gfc_error ("Sorry, coindexed access to a pointer or allocatable " + "component of the coindexed coarray at %L is not yet " + "supported", &expr->where); + } + if ((!comp_ref + && ((expr->symtree->n.sym->ts.type == BT_CLASS + && CLASS_DATA (expr->symtree->n.sym)->attr.alloc_comp) + || (expr->symtree->n.sym->ts.type == BT_DERIVED + && expr->symtree->n.sym->ts.u.derived->attr.alloc_comp))) + || (comp_ref + && ((comp_ref->u.c.component->ts.type == BT_CLASS + && CLASS_DATA (comp_ref->u.c.component)->attr.alloc_comp) + || (comp_ref->u.c.component->ts.type == BT_DERIVED + && comp_ref->u.c.component->ts.u.derived->attr.alloc_comp +gfc_error ("Sorry, coindexed coarray at %L with allocatable component is " + "not yet supported", &expr->where); + + if (expr->rank) +{ + /* Without the new array descriptor, access like "caf[i]%a(:)%b" is in + general not possible as the required stride multiplier might be not + a multiple of c_sizeof(b). In case of noncoindexed access, the + scalarizer often takes care of it - for coarrays, it always fails. 
*/ + for (ref = expr->ref; ref; ref = ref->next) +if (ref->type == REF_COMPONENT + && ((ref->u.c.component->ts.type == BT_CLASS + && CLASS_DATA (ref->u.c.component)->attr.codimension) + || (ref->u.c.component->ts.type != BT_CLASS + && ref->u.c.component->attr.codimension))) + break; + if (ref == NULL) + ref = expr->ref; + for ( ; ref; ref = ref->next) + if (ref->type == REF_ARRAY && ref->u.ar.dimen) + break; + for ( ; ref; ref = ref->next) + if (ref->type == REF_COMPONENT) + gfc_error ("Sorry, coindexed access at %L to a scalar component " + "with an array partref is not yet supported", + &expr->where); +} + caf_decl = expr->symtree->n.sym->backend_decl; gcc_assert (caf_decl); if (expr->symtree->n.sym->ts.type == BT_CLASS) diff --git a/gcc/testsuite/gfortran.dg/coarray/coindexed_3.f90 b/gcc/testsuite/gfortran.dg/coarray/coindexed_3.f90 index 46488f3..4642f2c 100644 --- a/gcc/testsuite/gfortran.dg/coarray/coindexed_3.f90 +++ b/gcc/testsuite/gfortran.dg/coarray/coindexed_3.f90 @@ -1,4 +1,4 @@ -! { dg-do run } +! { dg-do compile } ! ! Contributed by Reinhold Bader ! @@ -45,8 +45,8 @@ program pmup allocate(t :: a(3)[*]) IF (this_image() == num_images()) THEN SELECT TYPE (a) - TYPE IS (t) - a(:)[1]%a = 4.0 + TYPE IS (t) ! FIXME: When implemented, turn into "do-do run" + a(:)[1]%a = 4.0 ! { dg-error "Sorry, coindexed access at \\(1\\) to a scalar component with an array partref is not yet supported" } END SELECT END IF SYNC ALL @@ -56,8 +56,8 @@ program pmup TYPE IS (real) ii = a(1)[1] call abort() -TYPE IS (t) - IF (ALL(A(:)[1]%a == 4.0)) THEN +TYPE IS (t) ! FIXME: When implemented, turn into
Re: [C++ Patch] PR 65323
Hi,

On 03/11/2015 09:26 PM, Jason Merrill wrote:
> On 03/06/2015 03:36 AM, Paolo Carlini wrote:
>> this is a regression about duplicate warnings with
>> -Wzero-as-null-pointer-constant. The regression is rather old, affects
>> 4_8-branch too, and started when check_default_argument got a
>> perform_implicit_conversion_flags call which warns a first time, then
>> maybe_warn_zero_as_null_pointer_constant as called by
>> check_default_argument itself warns a second time. The latter call is
>> even older, dates back to c++/52718, I think we can now safely remove it
>> and keep on returning nullptr_node to avoid warning later still at the
>> call sites (that was the point of c++/52718). Tested x86_64-linux.
>
> Do we need this special handling at all? When I remove that whole 'if'
> block I still only get one warning from the 52718 testcase.

I just tried again for this reduced version of 52718 (I added a 0 && to the 'if'):

void* fun(void* a = 0);
void* f2 = fun();

and I got (removed the irrelevant carets):

52718_red.C:1:22: warning: zero as null pointer constant [-Wzero-as-null-pointer-constant]
 void* fun(void* a = 0);
52718_red.C:2:16: warning: zero as null pointer constant [-Wzero-as-null-pointer-constant]
 void* f2 = fun();

That is, as far as I can see, the rationale that led to an early *return* for 52718 still stands: no matter what we do at the beginning of check_default_argument, whether we warn via perform_implicit_conversion_flags or immediately, we still want to return early to avoid warning again at the call site.

Paolo.
[PATCH] Fix recent OEP_ADDRESS_OF change
This fixes sth noticed by Honza - I was resetting OEP_ADDRESS_OF before actually testing for it in MEM_REF/TARGET_MEM_REF handling. Bootstrapped and tested on x86_64-unknown-linux-gnu, applied. Richard. 2015-03-12 Richard Biener PR middle-end/65270 * fold-const.c (operand_equal_p): Fix ordering of resetting OEP_ADDRESS_OF and checking for it in the [TARGET_]MEM_REF case. Index: gcc/fold-const.c === *** gcc/fold-const.c(revision 221324) --- gcc/fold-const.c(working copy) *** operand_equal_p (const_tree arg0, const_ *** 2934,2954 return OP_SAME (0); case TARGET_MEM_REF: - flags &= ~(OEP_CONSTANT_ADDRESS_OF|OEP_ADDRESS_OF); - /* Require equal extra operands and then fall through to MEM_REF -handling of the two common operands. */ - if (!OP_SAME_WITH_NULL (2) - || !OP_SAME_WITH_NULL (3) - || !OP_SAME_WITH_NULL (4)) - return 0; - /* Fallthru. */ case MEM_REF: - flags &= ~(OEP_CONSTANT_ADDRESS_OF|OEP_ADDRESS_OF); /* Require equal access sizes, and similar pointer types. We can have incomplete types for array references of variable-sized arrays from the Fortran frontend though. Also verify the types are compatible. */ ! return ((TYPE_SIZE (TREE_TYPE (arg0)) == TYPE_SIZE (TREE_TYPE (arg1)) || (TYPE_SIZE (TREE_TYPE (arg0)) && TYPE_SIZE (TREE_TYPE (arg1)) && operand_equal_p (TYPE_SIZE (TREE_TYPE (arg0)), --- 2934,2945 return OP_SAME (0); case TARGET_MEM_REF: case MEM_REF: /* Require equal access sizes, and similar pointer types. We can have incomplete types for array references of variable-sized arrays from the Fortran frontend though. Also verify the types are compatible. */ ! if (!((TYPE_SIZE (TREE_TYPE (arg0)) == TYPE_SIZE (TREE_TYPE (arg1)) || (TYPE_SIZE (TREE_TYPE (arg0)) && TYPE_SIZE (TREE_TYPE (arg1)) && operand_equal_p (TYPE_SIZE (TREE_TYPE (arg0)), *** operand_equal_p (const_tree arg0, const_ *** 2963,2970 && (MR_DEPENDENCE_BASE (arg0) == MR_DEPENDENCE_BASE (arg1)) && (TYPE_ALIGN (TREE_TYPE (arg0)) ! == TYPE_ALIGN (TREE_TYPE (arg1) ! 
&& OP_SAME (0) && OP_SAME (1)); case ARRAY_REF: case ARRAY_RANGE_REF: --- 2954,2968 && (MR_DEPENDENCE_BASE (arg0) == MR_DEPENDENCE_BASE (arg1)) && (TYPE_ALIGN (TREE_TYPE (arg0)) ! == TYPE_ALIGN (TREE_TYPE (arg1))) ! return 0; ! flags &= ~(OEP_CONSTANT_ADDRESS_OF|OEP_ADDRESS_OF); ! return (OP_SAME (0) && OP_SAME (1) ! /* TARGET_MEM_REF require equal extra operands. */ ! && (TREE_CODE (arg0) != TARGET_MEM_REF ! || (OP_SAME_WITH_NULL (2) ! && OP_SAME_WITH_NULL (3) ! && OP_SAME_WITH_NULL (4; case ARRAY_REF: case ARRAY_RANGE_REF:
[PATCH] Make split_block and create_basic_block type-safe
After noticing tree-parloops.c passing crap to split_block (a tree rather than a gimple or an rtx) I noticed those CFG functions simply take void * pointers. The following patch fixes that and adds two overloads, one for GIMPLE use and one for RTL use.

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

Ok at this stage?

Thanks,
Richard.

2015-03-12 Richard Biener

* cfghooks.h (create_basic_block): Replace with two overloads for RTL and GIMPLE.
(split_block): Likewise.
* cfghooks.c (split_block): Rename to ...
(split_block_1): ... this.
(split_block): Add two type-safe overloads for RTL and GIMPLE.
(split_block_after_labels): Call split_block_1.
(create_basic_block): Rename to ...
(create_basic_block_1): ... this.
(create_basic_block): Add two type-safe overloads for RTL and GIMPLE.
(create_empty_bb): Call create_basic_block_1.
* cfgrtl.c (fixup_fallthru_exit_predecessor): Use split_block_after_labels.
* omp-low.c (expand_parallel_call): Likewise.
(expand_omp_target): Likewise.
(simd_clone_adjust): Likewise.
* tree-chkp.c (chkp_get_entry_block): Likewise.
* cgraphunit.c (init_lowered_empty_function): Use the GIMPLE create_basic_block overload.
(cgraph_node::expand_thunk): Likewise.
* tree-cfg.c (make_blocks): Likewise.
(handle_abnormal_edges): Likewise.
* tree-inline.c (copy_bb): Likewise.

Index: gcc/cfghooks.c
===
--- gcc/cfghooks.c (revision 221379)
+++ gcc/cfghooks.c (working copy)
@@ -505,8 +505,8 @@ redirect_edge_and_branch_force (edge e,
    the labels). If I is NULL, splits just after labels. The newly created edge is returned. The new basic block is created just after the old one.
*/ -edge -split_block (basic_block bb, void *i) +static edge +split_block_1 (basic_block bb, void *i) { basic_block new_bb; edge res; @@ -550,12 +550,24 @@ split_block (basic_block bb, void *i) return res; } +edge +split_block (basic_block bb, gimple i) +{ + return split_block_1 (bb, i); +} + +edge +split_block (basic_block bb, rtx i) +{ + return split_block_1 (bb, i); +} + /* Splits block BB just after labels. The newly created edge is returned. */ edge split_block_after_labels (basic_block bb) { - return split_block (bb, NULL); + return split_block_1 (bb, NULL); } /* Moves block BB immediately after block AFTER. Returns false if the @@ -696,8 +708,8 @@ split_edge (edge e) HEAD and END are the first and the last statement belonging to the block. If both are NULL, an empty block is created. */ -basic_block -create_basic_block (void *head, void *end, basic_block after) +static basic_block +create_basic_block_1 (void *head, void *end, basic_block after) { basic_block ret; @@ -714,12 +726,25 @@ create_basic_block (void *head, void *en return ret; } +basic_block +create_basic_block (gimple_seq seq, basic_block after) +{ + return create_basic_block_1 (seq, NULL, after); +} + +basic_block +create_basic_block (rtx head, rtx end, basic_block after) +{ + return create_basic_block_1 (head, end, after); +} + + /* Creates an empty basic block just after basic block AFTER. */ basic_block create_empty_bb (basic_block after) { - return create_basic_block (NULL, NULL, after); + return create_basic_block_1 (NULL, NULL, after); } /* Checks whether we may merge blocks BB1 and BB2. 
*/ Index: gcc/cfghooks.h === --- gcc/cfghooks.h (revision 221379) +++ gcc/cfghooks.h (working copy) @@ -196,12 +196,14 @@ extern edge redirect_edge_succ_nodup (ed extern bool can_remove_branch_p (const_edge); extern void remove_branch (edge); extern void remove_edge (edge); -extern edge split_block (basic_block, void *); +extern edge split_block (basic_block, rtx); +extern edge split_block (basic_block, gimple); extern edge split_block_after_labels (basic_block); extern bool move_block_after (basic_block, basic_block); extern void delete_basic_block (basic_block); extern basic_block split_edge (edge); -extern basic_block create_basic_block (void *, void *, basic_block); +extern basic_block create_basic_block (rtx, rtx, basic_block); +extern basic_block create_basic_block (gimple_seq, basic_block); extern basic_block create_empty_bb (basic_block); extern bool can_merge_blocks_p (basic_block, basic_block); extern void merge_blocks (basic_block, basic_block); Index: gcc/cfgrtl.c === --- gcc/cfgrtl.c(revision 221379) +++ gcc/cfgrtl.c(working copy) @@ -4047,7 +4047,7 @@ fixup_fallthru_exit_predecessor (void) edge, we have to split that block. */ if (c == bb) { - bb = split_block (bb, NULL)->dest; + bb = split_block_
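As a reading aid for the patch above, here is a small stand-alone C++ sketch of the pattern it applies: keep one untyped worker and expose only typed overloads, so a caller passing the wrong kind of pointer gets a compile error. All types and the return value are simplified stand-ins invented for this sketch, not GCC's real gimple, rtx or edge definitions.

```cpp
// Hypothetical stand-ins for GCC's IR types (illustration only).
struct gimple_stmt { int uid; };
struct rtl_insn { int uid; };
struct tree_node { int code; };

// What the worker decided to do; the real function returns an edge.
enum class split_point { after_labels, after_stmt };

// Untyped worker in the role of split_block_1: it accepts any pointer,
// so only the typed wrappers below are meant to call it.
constexpr split_point split_block_1(const void *i) {
  return i == nullptr ? split_point::after_labels : split_point::after_stmt;
}

// Type-safe overloads: one for GIMPLE, one for RTL.
constexpr split_point split_block(const gimple_stmt *i) { return split_block_1(i); }
constexpr split_point split_block(const rtl_insn *i) { return split_block_1(i); }

// Dedicated entry point for "no statement": with two pointer overloads,
// split_block(nullptr) would be ambiguous in this sketch.
constexpr split_point split_block_after_labels() { return split_block_1(nullptr); }

constexpr gimple_stmt some_stmt{1};
constexpr rtl_insn some_insn{2};

static_assert(split_block(&some_stmt) == split_point::after_stmt, "GIMPLE overload");
static_assert(split_block(&some_insn) == split_point::after_stmt, "RTL overload");
static_assert(split_block_after_labels() == split_point::after_labels, "NULL case");

// A tree is rejected at compile time, which is the point of the change:
// constexpr tree_node some_tree{0};
// split_block(&some_tree);   // error: no matching function
```

The ambiguity of a null argument in this sketch may also be why the patch switches the NULL callers over to split_block_after_labels, but that is a guess from the ChangeLog, not something the mail states.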
Re: [PATCH][simplify-rtx] PR 65235: Calculate element size correctly when simplifying (vec_select (vec_concat (const_int) (...)) [...])
>> The patch fixes that by calculating the size of the first element by
>> taking the size of the outer mode and subtracting the size of the second
>> element.
>>
>> I've added an assert to make sure that the second element is not also a
>> const_int, as a vec_concat of const_ints doesn't make sense as far as I can
>> see.
>
> I'm not sure about the assert, can't we just punt in this case?

Ok, here's a patch returning 0 in that case.
The assert had never triggered in my testing anyway, but I agree we
want to just cancel the simplification rather than ICE.

>
>> Bootstrapped and tested on aarch64-none-linux-gnu,
>> arm-none-linux-gnueabihf, x86_64-linux-gnu.
>> This bug appears on trunk, 4.9 and 4.8, so it's not a regression on the
>> release branches but it is a wrong-code bug.
>
> I think that the fix would be acceptable for GCC 5 without the assert.
>

Thanks for reviewing.
Richard, do you think this can go in for GCC 5 now?
What about 4.9 and 4.8? The bug appears there as well.

Thanks,
Kyrill

2015-03-12 Kyrylo Tkachov

PR rtl-optimization 65235
* simplify-rtx.c (simplify_binary_operation_1, VEC_SELECT case): When first element of vec_concat is const_int, calculate its size using second element.

2015-03-12 Kyrylo Tkachov

PR rtl-optimization 65235
* gcc.target/aarch64/pr65235_1.c: New test.

commit 9946603f73e89f50d6610a943f770627ed533dbc
Author: Kyrylo Tkachov
Date: Thu Feb 26 16:40:52 2015 +

[simplify-rtx] Calculate vector size correctly when simplifying (vec_select (vec_concat (const_int) (...)) [...])

diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c
index a003b41..5d17498 100644
--- a/gcc/simplify-rtx.c
+++ b/gcc/simplify-rtx.c
@@ -3555,7 +3555,21 @@ simplify_binary_operation_1 (enum rtx_code code, machine_mode mode,
   while (GET_MODE (vec) != mode && GET_CODE (vec) == VEC_CONCAT)
     {
-      HOST_WIDE_INT vec_size = GET_MODE_SIZE (GET_MODE (XEXP (vec, 0)));
+      HOST_WIDE_INT vec_size;
+
+      if (CONST_INT_P (XEXP (vec, 0)))
+        {
+          /* vec_concat of two const_ints doesn't make sense with
+             respect to modes. */
+          if (CONST_INT_P (XEXP (vec, 1)))
+            return 0;
+
+          vec_size = GET_MODE_SIZE (GET_MODE (trueop0))
+                     - GET_MODE_SIZE (GET_MODE (XEXP (vec, 1)));
+        }
+      else
+        vec_size = GET_MODE_SIZE (GET_MODE (XEXP (vec, 0)));
+
       if (offset < vec_size)
         vec = XEXP (vec, 0);
       else
diff --git a/gcc/testsuite/gcc.target/aarch64/pr65235_1.c b/gcc/testsuite/gcc.target/aarch64/pr65235_1.c
new file mode 100644
index 000..ca12cd5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/pr65235_1.c
@@ -0,0 +1,30 @@
+/* { dg-do run } */
+/* { dg-options "-O2" } */
+
+#include "arm_neon.h"
+
+int
+main (int argc, char** argv)
+{
+  int64x1_t val1;
+  int64x1_t val2;
+  int64x1_t val3;
+  uint64x1_t val13;
+  uint64x2_t val14;
+  uint64_t got;
+  uint64_t exp;
+  val1 = vcreate_s64(UINT64_C(0x80008000));
+  val2 = vcreate_s64(UINT64_C(0xf38d));
+  val3 = vcreate_s64(UINT64_C(0x7fff809b));
+  /* Expect: "val13" = 80001553. */
+  val13 = vcreate_u64 (UINT64_C(0x80001553));
+  /* Expect: "val14" = 0010 0002 . */
+  val14 = vcombine_u64(vcgt_s64(vqrshl_s64(val1, val2),
+                                vshr_n_s64(val3, 18)),
+                       vshr_n_u64(val13, 11));
+  /* Should be . */
+  got = vgetq_lane_u64(val14, 0);
+  exp = 0;
+  if(exp != got)
+    __builtin_abort ();
+}
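To make the size arithmetic in the patch concrete, here is a small stand-alone C++ sketch. It assumes, as in GCC RTL, that a const_int has VOIDmode and therefore a mode size of 0. The function names and the byte sizes used in the checks are invented for this illustration and do not come from simplify-rtx.c.

```cpp
// Sketch only: models the vec_size computation from the patch with plain
// integers (sizes in bytes). Names are hypothetical.

// Size of the first operand of a two-operand vec_concat. If that operand
// is a const_int it has no mode of its own, so derive its size from the
// outer mode and the second operand, as the patch does.
constexpr int first_operand_size(int outer_size, int first_mode_size,
                                 int second_mode_size, bool first_is_const_int) {
  return first_is_const_int ? outer_size - second_mode_size : first_mode_size;
}

// Mirrors "if (offset < vec_size) vec = XEXP (vec, 0); else ...":
// returns which operand a byte offset falls into.
constexpr int selected_operand(int offset, int vec_size) {
  return offset < vec_size ? 0 : 1;
}

// 16-byte outer vector, 8-byte second operand, const_int first operand.
static_assert(first_operand_size(16, 0, 8, true) == 8, "size derived from outer mode");

// With the derived size, offset 0 picks the first operand.
static_assert(selected_operand(0, first_operand_size(16, 0, 8, true)) == 0, "fixed");

// With the old computation (mode size 0 for the const_int), offset 0
// is never below vec_size and the second operand is picked instead.
static_assert(selected_operand(0, 0) == 1, "old behaviour");
```

The last check shows why the old code could select the wrong half: with a size of 0 for the first operand, no offset is ever considered to lie inside it.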
Re: [PATCH][simplify-rtx] PR 65235: Calculate element size correctly when simplifying (vec_select (vec_concat (const_int) (...)) [...])
On Thu, Mar 12, 2015 at 2:28 PM, Kyrill Tkachov wrote:
>>> The patch fixes that by calculating the size of the first element by
>>> taking the size of the outer mode and subtracting the size of the second
>>> element.
>>>
>>> I've added an assert to make sure that the second element is not also a
>>> const_int, as a vec_concat of const_ints doesn't make sense as far as I
>>> can see.
>>
>> I'm not sure about the assert, can't we just punt in this case?
>
> Ok, here's a patch returning 0 in that case.
> The assert had never triggered in my testing anyway, but I agree we
> want to just cancel the simplification rather than ICE.
>
>>
>>> Bootstrapped and tested on aarch64-none-linux-gnu,
>>> arm-none-linux-gnueabihf, x86_64-linux-gnu.
>>> This bug appears on trunk, 4.9 and 4.8, so it's not a regression on the
>>> release branches but it is a wrong-code bug.
>>
>> I think that the fix would be acceptable for GCC 5 without the assert.
>>
>
> Thanks for reviewing.
> Richard, do you think this can go in for GCC 5 now?
> What about 4.9 and 4.8? The bug appears there as well.

Sure - it's a wrong-code fix. Ok for trunk and branches (after a while).

Thanks,
Richard.

> Thanks,
> Kyrill
>
>
> 2015-03-12 Kyrylo Tkachov
>
> PR rtl-optimization 65235
> * simplify-rtx.c (simplify_binary_operation_1, VEC_SELECT case):
> When first element of vec_concat is const_int, calculate its size
> using second element.
>
> 2015-03-12 Kyrylo Tkachov
>
>
> PR rtl-optimization 65235
> * gcc.target/aarch64/pr65235_1.c: New test.
[linaro/gcc-4_9-branch] Merge from gcc-4_9-branch and backports
Hi all,

we have merged the gcc-4_9-branch into linaro/gcc-4_9-branch up to revision 221341 as r221360.

We have also backported this set of revisions:

* r212011 as r221216 : PR tree-optimization/61607
* r214942 as r221216 : Abstract away marking loops for removal
* r214957 as r221216 : Sanity check removed loops
* r215012 as r221216 : PR bootstrap/63204
* r215016 as r221216 : PR ipa/63196
* r215612 as r221194 : Tighten predicates on SIMD shift intrinsics
* r215722 as r221196 : Wire up vqdmullh_laneq_s16 and vqdmullh_laneq_s32
* r216663 as r221239 : [testsuite] revert changes on check_effective_target_arm_*_ok
* r217706 as r221240 : [testsuite] new set of Neon intrinsics tests
* r217707 as r221241 : [testsuite] fix vbic/vorn Neon tests
* r217725 as r221339 : Improve modeled latency between FP operations and FP->GP register moves
* r217780 as r221302 : Adjust generic move costs
* r217852 as r221300 : Add range-check for Symbol + offset addressing
* r217938 as r221301 : Add vector pattern for __builtin_ctz
* r218115 as r221216 : PR tree-optimization/64083
* r218463 as r221242 : [testsuite] Fix vaddl and vaddw tests
* r218486 as r221344 : Bics instruction generation for aarch64
* r218503 as r221344 : additional bics patterns
* r218733 as r221216 : PR tree-optimization/64284
* r218746 as r221216 : PR middle-end/64246
* r219764 as r221242 : [testsuite] Add explicit dependency on Neon Cumulative Saturation flag
* r219765 as r221242 : [testsuite] Be more verbose, and actually confirm that a test was checked.
* r219767 as r221242 : [testsuite] Add vld1_lane tests
* r219914 as r221242 : [testsuite] Add vldX_dup test.
* r219917 as r221242 : [testsuite] Add vmla and vmls tests.
* r219918 as r221242 : [testsuite] Add vmla_lane and vmls_lane tests.
* r219919 as r221242 : [testsuite] Add vtrn tests. Refactor vzup and vzip tests.
* r219920 as r221242 : [testsuite] Add vmlal and vmlsl tests.
* r219921 as r221242 : [testsuite] Add vmlal_lane and vmlsl_lane tests.
* r219922 as r221242 : [testsuite] Add vmlal_n and vmlsl_n tests.
* r219930 as r221242 : [testsuite] Add vqdmlal and vqdmlsl tests.
* r219931 as r221242 : [testsuite] Add vqdmlal_lane and vqdmlsl_lane tests
* r219932 as r221242 : [testsuite] Add vqdmlal_n and vqdmlsl_n tests.
* r219934 as r221242 : [testsuite] Add vsli_n and vsri_n tests.
* r219937 as r221242 : [testsuite] Add vsubl tests, put most of the code in common with vaddl in vXXXl.inc.
* r219938 as r221242 : [testsuite] Add vsubw tests, putting most of the code in common with vaddw
* r219939 as r221242 : [testsuite] Add vmovn tests.
* r219940 as r221242 : [testsuite] Add vmul_lane tests.
* r219941 as r221242 : [testsuite] Add vmul_n tests.
* r219942 as r221242 : [testsuite] Add vmull tests.
* r219943 as r221242 : [testsuite] Add vmull_lane tests.
* r219944 as r221242 : [testsuite] Add vmull_n tests.
* r219945 as r221242 : [testsuite] Add vqdmulh tests.
* r219946 as r221242 : [testsuite] Add vqdmulh_lane tests.
* r219947 as r221242 : [testsuite] Add vqdmulh_n tests.
* r219948 as r221242 : [testsuite] Add vqdmull tests.
* r219949 as r221242 : [testsuite] Add vqdmull_lane tests.
* r219950 as r221242 : [testsuite] Add vqdmull_n tests.
* r220117 as r221242 : [testsuite] Add vsubhn, vraddhn and vrsubhn tests.
* r220118 as r221242 : [testsuite] Add vmla_n and vmls_n tests.
* r220119 as r221242 : [testsuite] Add vpadd, vpmax and vpmin tests.
* r220121 as r221242 : [testsuite] Add vmovl tests.
* r220122 as r221242 : [testsuite] Add vmnv tests.
* r220123 as r221242 : [testsuite] Add vpadal tests.
* r220124 as r221242 : [testsuite] Add vpaddl tests.
* r220126 as r221242 : Fix incorrect ChangeLog formatting.
* r220353 as r221242 : [testsuite] Add vmax, vmin, vhadd, vhsub and vrhadd tests.
* r220491 as r221216 : PR tree-optimization/64878
* r220751 as r221343 : [Haifa Scheduler] Fix latent bug in macro-fusion/instruction grouping
* r220860 as r221215 : [AArch64] Fix wrong-code bug in right-shift SISD patterns

This will be part of our 2015.03 4.9 release.

Thanks
Yvan
[Ada] handle 'Code_Address on targets with function descriptors
For P a subprogram, P'Code_Address is expected to return the address at which the machine code for P starts.

It differs from 'Address on targets where function symbol names denote the address of a function descriptor, a record from which the code address can be fetched (e.g. on ppc-aix).

On such targets, P'Address is expected to return the descriptor address, and it does. P'Code_Address should fetch the code address from the descriptor, but we have nothing in place to achieve that today. It just returns the same as 'Address.

The attached patch is the gigi part of a change to fix this, relying on a tm definition that we'll be submitting later on.

With everything in place, the testcase below is expected to display "OK".

Bootstrapped and regtested on x86_64-pc-linux-gnu

Olivier

2015-03-12 Olivier Hainque

* gcc-interface/trans.c (Attribute_to_gnu) : On targets where a function symbol designates a function descriptor, fetch the function code address from the descriptor.

--

with System, Ada.Unchecked_Conversion;
with Ada.Text_IO; use Ada.Text_IO;

procedure Code_Addr_P is
   Addr, Code_Addr : System.Address;

   type Fn_Descriptor is record
      Fn_Address : System.Address;
   end record;

   type Descriptor_Access is access all Fn_Descriptor;

   function To_Descriptor_Access is new Ada.Unchecked_Conversion (System.Address, Descriptor_Access);

   Da : Descriptor_Access;

   use type System.Address;
begin
   Addr := Code_Addr_P'Address;
   Code_Addr := Code_Addr_P'Code_Address;
   Da := To_Descriptor_Access (Addr);
   if Da.Fn_Address /= Code_Addr then
      raise Program_Error;
   end if;
   Put_Line ("OK");
end;

fndesc.diff Description: Binary data
[PATCH][OpenMP] Fix declare target variables in fortran modules
Hi,

We have a problem with declare target variables in Fortran modules, here is a small reproducer:

+ share.f90:
module share
  integer :: var_x
  !$omp declare target(var_x)
end module

+ test.f90:
use share
var_x = 10
!$omp target update to(var_x)
end

+
$ gfortran -fopenmp -c share.f90
$ gfortran -fopenmp -c test.f90
$ gfortran -fopenmp share.o test.o
$ ./a.out
libgomp: Duplicate node
+

This happens because the var_x is added into offload tables for both share.o and test.o. The patch below fixes this issue. Regtested on x86_64-linux and i686-linux. However I'm not sure how to create a regression test, which would compile 2 separate objects, and check run-time result.

diff --git a/gcc/varpool.c b/gcc/varpool.c
index 707f62f..5929d92 100644
--- a/gcc/varpool.c
+++ b/gcc/varpool.c
@@ -173,7 +173,7 @@ varpool_node::get_create (tree decl)
   node = varpool_node::create_empty ();
   node->decl = decl;

-  if ((flag_openacc || flag_openmp)
+  if ((flag_openacc || flag_openmp) && !DECL_EXTERNAL (decl)
       && lookup_attribute ("omp declare target", DECL_ATTRIBUTES (decl)))
     {
       node->offloadable = 1;

Thanks,
-- Ilya
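The effect of the added !DECL_EXTERNAL test can be sketched in a few lines of stand-alone C++. This is only a model: decl_info and the helper functions are invented names, and it assumes, as the patch implies, that the use-associated var_x seen while compiling test.f90 is an external declaration while the one in share.f90 is the definition.

```cpp
// Sketch only: a toy model of the condition in varpool_node::get_create.
// decl_info, is_offloadable and offload_table_entries are hypothetical.
struct decl_info {
  bool is_external;         // plays the role of DECL_EXTERNAL (decl)
  bool has_declare_target;  // "omp declare target" attribute present
};

// Patched condition: only a defining declaration becomes offloadable.
constexpr bool is_offloadable(bool offload_enabled, decl_info d) {
  return offload_enabled && !d.is_external && d.has_declare_target;
}

// Condition before the patch, for comparison.
constexpr bool is_offloadable_old(bool offload_enabled, decl_info d) {
  return offload_enabled && d.has_declare_target;
}

// Number of offload table entries produced for the same variable when it
// is seen in two objects.
constexpr int offload_table_entries(decl_info a, decl_info b, bool patched) {
  return patched
    ? (is_offloadable(true, a) ? 1 : 0) + (is_offloadable(true, b) ? 1 : 0)
    : (is_offloadable_old(true, a) ? 1 : 0) + (is_offloadable_old(true, b) ? 1 : 0);
}

constexpr decl_info var_x_in_share{false, true};  // definition in share.o
constexpr decl_info var_x_in_test{true, true};    // external reference in test.o

static_assert(offload_table_entries(var_x_in_share, var_x_in_test, false) == 2,
              "old condition: two entries, hence the duplicate");
static_assert(offload_table_entries(var_x_in_share, var_x_in_test, true) == 1,
              "patched condition: one entry");
```

Two entries for one variable correspond to the "Duplicate node" failure reported by libgomp in the reproducer; with the extra test only the defining object contributes an entry.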
Re: [PATCH][OpenMP] Fix declare target variables in fortran modules
On Thu, Mar 12, 2015 at 04:56:35PM +0300, Ilya Verbin wrote:
> This happens because the var_x is added into offload tables for both share.o
> and test.o. The patch below fixes this issue. Regtested on x86_64-linux and
> i686-linux. However I'm not sure how to create a regression test, which would
> compile 2 separate objects, and check run-time result.

Ok with proper ChangeLog entry.

As for testcase, won't dg-additional-sources help?
I mean, does it fail without your patch even if you just do
gfortran -fopenmp -o a.out share.f90 test.f90; ./a.out
?

> --- a/gcc/varpool.c
> +++ b/gcc/varpool.c
> @@ -173,7 +173,7 @@ varpool_node::get_create (tree decl)
>    node = varpool_node::create_empty ();
>    node->decl = decl;
>
> -  if ((flag_openacc || flag_openmp)
> +  if ((flag_openacc || flag_openmp) && !DECL_EXTERNAL (decl)
>        && lookup_attribute ("omp declare target", DECL_ATTRIBUTES (decl)))
>      {
>        node->offloadable = 1;

Jakub
Re: [PATCH] PR target/65240, Fix Power{7,8} insn constraint issue with -O3 -ffast-math
On Wed, Mar 11, 2015 at 08:52:54PM -0400, David Edelsohn wrote:
> On Wed, Mar 11, 2015 at 6:21 PM, Michael Meissner wrote:
> > On Wed, Mar 11, 2015 at 01:02:06PM -0400, David Edelsohn wrote:
> >> I am concerned with the create_TOC_reference use for TARGET_TOC. Has
> >> this been tested with big endian -mcmodel=small?
> >
> > Yes, that was a problem. Patch coming up soon. Thanks.
>
> Can you call rs6000_emit_move_directly?

Well, I can, but I would have to have some sort of flag that says after the split1 pass not to allow FP constants in move (other than 0.0). It is doable, but it does touch more areas in the rs6000 back end.

I am starting to think that it is just simpler to rip out all of the special fast math handling of constants, considering the multiply by reciprocal support has moved to SSA/tree and away from RTL. Did you want me to investigate the performance implications of removing it now (rather than waiting to GCC 6.0), or just do the more limited patch that I've been pursuing.

--
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
Re: [PATCH] PR target/65240, Fix Power{7,8} insn constraint issue with -O3 -ffast-math
On Thu, Mar 12, 2015 at 11:29 AM, Michael Meissner wrote:
> On Wed, Mar 11, 2015 at 08:52:54PM -0400, David Edelsohn wrote:
>> On Wed, Mar 11, 2015 at 6:21 PM, Michael Meissner wrote:
>> > On Wed, Mar 11, 2015 at 01:02:06PM -0400, David Edelsohn wrote:
>> >> I am concerned with the create_TOC_reference use for TARGET_TOC. Has
>> >> this been tested with big endian -mcmodel=small?
>> >
>> > Yes, that was a problem. Patch coming up soon. Thanks.
>>
>> Can you call rs6000_emit_move_directly?
>
> Well, I can, but I would have to have some sort of flag that says after the
> split1 pass not to allow FP constants in move (other than 0.0). It is doable,
> but it does touch more areas in the rs6000 back end.
>
> I am starting to think that it is just simpler to rip out all of the special
> fast math handling of constants, considering the multiply by reciprocal
> support has moved to SSA/tree and away from RTL. Did you want me to
> investigate the performance implications of removing it now (rather than
> waiting to GCC 6.0), or just do the more limited patch that I've been pursuing.

Please check on the performance implications of removing the special constant support. I know that it is late, but I think that ripping it out is less risky than trying to fix this, if the performance impact is not bad.

Thanks,
David
libgo patch committed: It's OK to use cgo on PPC
The cgo tool installed by gccgo works fine on 32-bit PPC. This patch notes that fact in the gccgo version of the go tool. This is GCC PR 65404. Bootstrapped and ran Go testsuite on x86_64-unknown-linux-gnu. Committed to mainline.

Ian

diff -r 81cc50c9140d libgo/go/go/build/build.go
--- a/libgo/go/go/build/build.go Mon Mar 09 17:13:50 2015 -0700
+++ b/libgo/go/go/build/build.go Thu Mar 12 09:32:03 2015 -0700
@@ -268,6 +268,7 @@
 	"linux/alpha": true,
 	"linux/amd64": true,
 	"linux/arm": true,
+	"linux/ppc": true,
 	"linux/ppc64": true,
 	"linux/ppc64le": true,
 	"linux/s390": true,
gotools patch committed: Build gotools with compiler options
This patch changes the gotools to add GOCFLAGS to the build command, since the command is both compiling and linking. The main effect of this is to, by default, build with -g -O2, which previously was not happening. Bootstrapped and ran Go testsuite on x86_64-unknown-linux-gnu. Committed to mainline.

Ian

2015-03-12 Ian Lance Taylor

* Makefile.am (GOLINK): Add GOCFLAGS.
* Makefile.in: Rebuild.

Index: gotools/Makefile.am
===
--- gotools/Makefile.am (revision 220066)
+++ gotools/Makefile.am (working copy)
@@ -39,7 +39,7 @@
 GOCFLAGS = $(CFLAGS_FOR_TARGET)
 GOCOMPILE = $(GOCOMPILER) $(GOCFLAGS)

 AM_LDFLAGS = -L $(libgodir) -L $(libgodir)/.libs
-GOLINK = $(GOCOMPILER) $(AM_GOCFLAGS) $(LDFLAGS) $(AM_LDFLAGS) -o $@
+GOLINK = $(GOCOMPILER) $(GOCFLAGS) $(AM_GOCFLAGS) $(LDFLAGS) $(AM_LDFLAGS) -o $@

 cmdsrcdir = $(srcdir)/../libgo/go/cmd
Re: [C++ Patch] PR 65323
On 03/12/2015 06:13 AM, Paolo Carlini wrote:
> 52718_red.C:1:22: warning: zero as null pointer constant [-Wzero-as-null-pointer-constant]
>  void* fun(void* a = 0);
> 52718_red.C:2:16: warning: zero as null pointer constant [-Wzero-as-null-pointer-constant]
>  void* f2 = fun();

OK, then your second patch is OK. But please add a comment that the code is there to avoid a redundant warning at the call site.

Jason
Re: libgo patch committed: It's OK to use cgo on PPC
On 03/12/2015 05:41 PM, Ian Lance Taylor wrote:
> The cgo tool installed by gccgo works fine on 32-bit PPC. This patch
> notes that fact in the gccgo version of the go tool. This is GCC PR
> 65404. Bootstrapped and ran Go testsuite on x86_64-unknown-linux-gnu.
> Committed to mainline.

same thing needs to be done for arm64:

From 391fba3b788628ef6431765c382a51f52a93cddf Mon Sep 17 00:00:00 2001
From: Michael Hudson-Doyle
Date: Wed, 27 Aug 2014 14:57:07 +1200
Subject: [PATCH 3/3] Enable cgo by default on linux/arm64.

---
 src/libgo/go/go/build/build.go | 1 +
 1 file changed, 1 insertion(+)

Index: b/src/libgo/go/go/build/build.go
===
--- a/src/libgo/go/go/build/build.go
+++ b/src/libgo/go/go/build/build.go
@@ -268,6 +268,7 @@ var cgoEnabled = map[string]bool{
 	"linux/alpha": true,
 	"linux/amd64": true,
 	"linux/arm": true,
+	"linux/arm64": true,
 	"linux/ppc64": true,
 	"linux/ppc64le": true,
 	"linux/s390": true,

>
> Ian
>
Re: libgo patch committed: It's OK to use cgo on PPC
On Thu, Mar 12, 2015 at 10:00 AM, Matthias Klose wrote:
> On 03/12/2015 05:41 PM, Ian Lance Taylor wrote:
>> The cgo tool installed by gccgo works fine on 32-bit PPC. This patch
>> notes that fact in the gccgo version of the go tool. This is GCC PR
>> 65404. Bootstrapped and ran Go testsuite on x86_64-unknown-linux-gnu.
>> Committed to mainline.
>
> same thing needs to be done for arm64:

Thanks. Committed.

Ian
[C PATCH] Fix up file-scope _Atomic expansion (PR c/65345)
The PR shows that the compiler ICEs whenever it tries to expand an atomic
operation at file scope.  That happens because it creates temporaries
via create_tmp_var, which also pushes the variable into the current binding,
but that can't work if current_function_decl is NULL.

The fix, I think, is to generate the temporaries only during gimplification.
It turned out that TARGET_EXPRs are tailor-made for this, so I've used them
along with changing the create_tmp_var calls to create_tmp_var_raw, which does
not push the variable into the current binding.

But this wasn't enough to handle the following case:

  _Atomic int i = 5;
  void f (int a[i += 1]) {}

To make it work I had to tweak the artificial labels that build_atomic_assign
creates so that they do not ICE in gimplification.  The comment in
store_parm_decls sums it up.  It uses walk_tree, but I think this will be only
rarely exercised in practice, if ever; I think programs using such a
construction are thin on the ground.

I tried comparing .gimple dumps with/without the patch on

  _Atomic int q = 4;
  void f (void) { q += 2; }

and I see no code changes.

This is not a regression, so not sure if I shouldn't defer this patch to the
next stage1 at this juncture...

Comments?

Bootstrapped/regtested on x86_64-linux.

2015-03-12  Marek Polacek

        PR c/65345
        * c-decl.c (set_labels_context_r): New function.
        (store_parm_decls): Call it via walk_tree_without_duplicates.
        * c-typeck.c (convert_lvalue_to_rvalue): Use create_tmp_var_raw
        instead of create_tmp_var.  Build TARGET_EXPR instead of
        COMPOUND_EXPR.
        (build_atomic_assign): Use create_tmp_var_raw instead of
        create_tmp_var.  Build TARGET_EXPRs instead of MODIFY_EXPR.

        * gcc.dg/pr65345-1.c: New test.
        * gcc.dg/pr65345-2.c: New test.

--- gcc/c/c-decl.c
+++ gcc/c/c-decl.c
@@ -8799,6 +8799,21 @@ store_parm_decls_from (struct c_arg_info *arg_info)
   store_parm_decls ();
 }

+/* Called by walk_tree to look for and update context-less labels.  */
+
+static tree
+set_labels_context_r (tree *tp, int *walk_subtrees, void *data)
+{
+  if (TREE_CODE (*tp) == LABEL_EXPR
+      && DECL_CONTEXT (LABEL_EXPR_LABEL (*tp)) == NULL_TREE)
+    {
+      DECL_CONTEXT (LABEL_EXPR_LABEL (*tp)) = static_cast<tree>(data);
+      *walk_subtrees = 0;
+    }
+
+  return NULL_TREE;
+}
+
 /* Store the parameter declarations into the current function declaration.
    This is called after parsing the parameter declarations, before
    digesting the body of the function.
@@ -8853,7 +8868,21 @@ store_parm_decls (void)
      thus won't naturally see the SAVE_EXPR containing the increment.  All
      other pending sizes would be handled by gimplify_parameters.  */
   if (arg_info->pending_sizes)
-    add_stmt (arg_info->pending_sizes);
+    {
+      /* In very special circumstances, e.g. for code like
+           _Atomic int i = 5;
+           void f (int a[i += 2]) {}
+         we need to execute the atomic assignment on function entry.
+         But in this case, it is not just a straight store, it has the
+         op= form, which means that build_atomic_assign has generated
+         gotos, labels, etc.  Because at that time the function decl
+         for F has not been created yet, those labels do not have any
+         function context.  But we have the fndecl now, so update the
+         labels accordingly.  gimplify_expr would crash otherwise.  */
+      walk_tree_without_duplicates (&arg_info->pending_sizes,
+                                    set_labels_context_r, fndecl);
+      add_stmt (arg_info->pending_sizes);
+    }
 }

 /* Store PARM_DECLs in PARMS into scope temporarily.  Used for
--- gcc/c/c-typeck.c
+++ gcc/c/c-typeck.c
@@ -2039,7 +2039,7 @@ convert_lvalue_to_rvalue (location_t loc, struct c_expr exp,
       /* Remove the qualifiers for the rest of the expressions and
          create the VAL temp variable to hold the RHS.  */
       nonatomic_type = build_qualified_type (expr_type, TYPE_UNQUALIFIED);
-      tmp = create_tmp_var (nonatomic_type);
+      tmp = create_tmp_var_raw (nonatomic_type);
       tmp_addr = build_unary_op (loc, ADDR_EXPR, tmp, 0);
       TREE_ADDRESSABLE (tmp) = 1;
       TREE_NO_WARNING (tmp) = 1;
@@ -2055,7 +2055,8 @@ convert_lvalue_to_rvalue (location_t loc, struct c_expr exp,
       mark_exp_read (exp.value);

       /* Return tmp which contains the value loaded.  */
-      exp.value = build2 (COMPOUND_EXPR, nonatomic_type, func_call, tmp);
+      exp.value = build4 (TARGET_EXPR, nonatomic_type, tmp, func_call,
+                          NULL_TREE, NULL_TREE);
     }
   return exp;
 }
@@ -3686,10 +3687,11 @@ build_atomic_assign (location_t loc, tree lhs, enum tree_code modifycode,
      the VAL temp variable to hold the RHS.  */
   nonatomic_lhs_type = build_qualified_type (lhs_type, TYPE_UNQUALIFIED);
   nonatomic_rhs_type = build_qualified_type (rhs_type, TYPE_UNQUALIFIED);
-  val = create_tmp_var (nonatomic_rhs_type);
+  val = create_tmp_var_raw (
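The comment added to store_parm_decls notes that the op= form of an atomic assignment is not a plain store but a sequence containing gotos and labels. A minimal sketch of that shape follows, assuming a compare-and-exchange retry loop. `std::atomic<int>` is used here only as a stand-in for a C11 `_Atomic int`, and `atomic_add_assign` is a hypothetical helper, not a GCC function.

```cpp
#include <atomic>
#include <cassert>

// Sketch of `obj += val` on an atomic object, written out as the retry loop
// that a compound atomic assignment conceptually expands to.
int atomic_add_assign(std::atomic<int>& obj, int val) {
  int old_value = obj.load();
  int new_value;
  do {
    // Recompute the result from the most recently observed value.
    new_value = old_value + val;
    // On failure, compare_exchange_strong stores the current value of obj
    // into old_value, and the loop runs again ("goto loop").
  } while (!obj.compare_exchange_strong(old_value, new_value));
  // Falling out of the loop corresponds to the "done" label.
  return new_value;
}
```

The loop back-edge and the exit are the places where a compiler needs labels, which is why such an expansion built before the function declaration exists would carry labels without a function context.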
Ping: [Patch, fortran] PR61138 Wrong code with pointer-bounds remapping
Ping: https://gcc.gnu.org/ml/fortran/2015-02/msg00045.html
Re: [Patch, fortran] PR64952 - Missing temporary in assignment from elemental function
Hello Paul, have you had time to look at this again? Mikael
[patch] libstdc++/64847 add autoconf checks for pthread_rwlock_t
I assumed that Pthreads support was enough to guarantee pthread_rwlock_t, but
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64847 shows that isn't
true for HP-UX (it seems it was optional prior to POSIX 1003.1-2001).

This adds an autoconf check to decide whether to use pthread_rwlock_t
or the fallback implementation in terms of std::condition_variable and
std::mutex.

This also includes some fixes from Torvald so that we loop and retry
if libc returns EAGAIN, to handle a difference in semantics between
POSIX and C++14.

And as an optimization I've made the _M_rwlock member use the
PTHREAD_RWLOCK_INITIALIZER macro if available.

Tested x86_64-linux, ppc64le-linux and x86_64-dragonfly.

I plan to commit this to trunk tomorrow.

commit 7212446ada7d741f6fe0fc9d9fca9d5b55322384
Author: Jonathan Wakely
Date:   Thu Mar 12 17:29:42 2015 +

    2015-03-12  Jonathan Wakely
                Torvald Riegel

        PR libstdc++/64847
        * acinclude.m4 (GLIBCXX_CHECK_GTHREADS): Check for pthread_rwlock_t.
        * config.h.in: Regenerate.
        * configure: Regenerate.
        * include/std/shared_mutex: Check _GLIBCXX_USE_PTHREADS_RWLOCKS.
        (shared_timed_mutex::_M_rwlock): Use PTHREAD_RWLOCK_INITIALIZER.
        (shared_timed_mutex::lock_shared()): Retry on EAGAIN.
        (shared_timed_mutex::try_lock_shared_until()): Retry on EAGAIN and
        EDEADLK.

diff --git a/libstdc++-v3/acinclude.m4 b/libstdc++-v3/acinclude.m4
index 1727140..86628c0 100644
--- a/libstdc++-v3/acinclude.m4
+++ b/libstdc++-v3/acinclude.m4
@@ -3563,6 +3563,13 @@ AC_DEFUN([GLIBCXX_CHECK_GTHREADS], [
   if test x"$ac_has_gthreads" = x"yes"; then
     AC_DEFINE(_GLIBCXX_HAS_GTHREADS, 1,
               [Define if gthreads library is available.])
+
+    # Also check for pthread_rwlock_t for std::shared_timed_mutex in C++14
+    AC_CHECK_TYPE([pthread_rwlock_t],
+            [AC_DEFINE([_GLIBCXX_USE_PTHREADS_RWLOCKS], 1,
+            [Define if POSIX read/write locks are available in <gthr.h>.])],
+            [],
+            [#include "gthr.h"])
   fi

   CXXFLAGS="$ac_save_CXXFLAGS"
diff --git a/libstdc++-v3/include/std/shared_mutex b/libstdc++-v3/include/std/shared_mutex
index 5dcc295..61251b0 100644
--- a/libstdc++-v3/include/std/shared_mutex
+++ b/libstdc++-v3/include/std/shared_mutex
@@ -57,10 +57,17 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   /// shared_timed_mutex
   class shared_timed_mutex
   {
-#if defined(__GTHREADS_CXX0X)
+#ifdef _GLIBCXX_USE_PTHREADS_RWLOCKS
     typedef chrono::system_clock __clock_t;

-    pthread_rwlock_t _M_rwlock;
+#ifdef PTHREAD_RWLOCK_INITIALIZER
+    pthread_rwlock_t _M_rwlock = PTHREAD_RWLOCK_INITIALIZER;
+
+  public:
+    shared_timed_mutex() = default;
+    ~shared_timed_mutex() = default;
+#else
+    pthread_rwlock_t _M_rwlock;

   public:
     shared_timed_mutex()
@@ -82,6 +89,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
       // Errors not handled: EBUSY, EINVAL
       _GLIBCXX_DEBUG_ASSERT(__ret == 0);
     }
+#endif

     shared_timed_mutex(const shared_timed_mutex&) = delete;
     shared_timed_mutex& operator=(const shared_timed_mutex&) = delete;
@@ -165,12 +173,16 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
     void
     lock_shared()
     {
-      int __ret = pthread_rwlock_rdlock(&_M_rwlock);
+      int __ret;
+      do
+        __ret = pthread_rwlock_rdlock(&_M_rwlock);
+      // We retry if we exceeded the maximum number of read locks supported by
+      // the POSIX implementation; this can result in busy-waiting, but this
+      // is okay based on the current specification of forward progress
+      // guarantees by the standard.
+      while (__ret == EAGAIN);
       if (__ret == EDEADLK)
         __throw_system_error(int(errc::resource_deadlock_would_occur));
-      if (__ret == EAGAIN)
-        // Maximum number of read locks has been exceeded.
-        __throw_system_error(int(errc::device_or_resource_busy));
       // Errors not handled: EINVAL
       _GLIBCXX_DEBUG_ASSERT(__ret == 0);
     }
@@ -210,11 +222,19 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
           static_cast<long>(__ns.count())
         };

-      int __ret = pthread_rwlock_timedrdlock(&_M_rwlock, &__ts);
+      int __ret;
+      do
+        __ret = pthread_rwlock_timedrdlock(&_M_rwlock, &__ts);
       // If the maximum number of read locks has been exceeded, or we would
-      // deadlock, we just fail to acquire the lock.  Unlike for lock(),
-      // we are not allowed to throw an exception.
-      if (__ret == ETIMEDOUT || __ret == EAGAIN || __ret == EDEADLK)
+      // deadlock, we just try to acquire the lock again (and will time out
+      // eventually).  Unlike for lock(), we are not allowed to throw an
+      // exception.  In cases where we would exceed the maximum number of
+      // read locks throughout the whole time until the timeout, we will
+      // fail to acquire the lock even if it would be logically free;
+      // however, this is allowed by the standard, and we made a "strong
+      // effort" (see C++14 30.4.1.4p26).
+      while (__ret == EAGAIN || __ret == EDEADLK);
+      if (__ret == ETIMEDOUT)
         return false;
       // Errors not handled: EINVAL
       _GLIBCXX_DEBUG_ASSERT(__ret
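The retry-on-EAGAIN pattern in lock_shared() can be tried in isolation with a small sketch. `FakeRwlock` and `lock_shared_with_retry` are hypothetical names introduced only for this illustration; the fake lock stands in for pthread_rwlock_rdlock and is not part of the patch.

```cpp
#include <cassert>
#include <cerrno>

// Hypothetical stand-in for a POSIX read/write lock whose read-lock call
// reports EAGAIN for the first `busy_attempts` calls and then succeeds.
class FakeRwlock {
 public:
  explicit FakeRwlock(int busy_attempts)
      : busy_attempts_(busy_attempts), calls_(0) {}

  int rdlock() {
    ++calls_;
    return calls_ <= busy_attempts_ ? EAGAIN : 0;
  }

  int calls() const { return calls_; }

 private:
  int busy_attempts_;
  int calls_;
};

// Same control flow as the patched lock_shared(): keep retrying while the
// lock reports that the maximum number of read locks was exceeded, and hand
// any other result back to the caller.
int lock_shared_with_retry(FakeRwlock& lock) {
  int ret;
  do
    ret = lock.rdlock();
  while (ret == EAGAIN);
  return ret;
}
```

With a fake lock that is busy three times, the helper makes four calls and returns 0. As the comment in the patch says, this can busy-wait, which the patch accepts given the forward progress guarantees in the standard.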
Re: [PATCH][OpenMP] Fix declare target variables in fortran modules
On Thu, Mar 12, 2015 at 15:21:35 +0100, Jakub Jelinek wrote: > On Thu, Mar 12, 2015 at 04:56:35PM +0300, Ilya Verbin wrote: > > This happens because the var_x is added into offload tables for both > > share.o and > > test.o. The patch below fixes this issue. Regtested on x86_64-linux and > > i686-linux. However I'm not sure how to create a regression test, which > > would > > compile 2 separate objects, and check run-time result. > > Ok with proper ChangeLog entry. > > As for testcase, won't dg-additional-sources help? > I mean, does it fail without your patch even if you just do > gfortran -fopenmp -o a.out share.f90 test.f90; ./a.out ? Yes, this works. Here is what I will commit tomorrow, if no objections. gcc/ * varpool.c (varpool_node::get_create): Don't set 'offloadable' flag for the external decls. libgomp/ * testsuite/libgomp.fortran/declare-target-1.f90: New test. * testsuite/libgomp.fortran/declare-target-2.f90: New file. diff --git a/gcc/varpool.c b/gcc/varpool.c index b583693..ce64279 100644 --- a/gcc/varpool.c +++ b/gcc/varpool.c @@ -173,7 +173,7 @@ varpool_node::get_create (tree decl) node = varpool_node::create_empty (); node->decl = decl; - if ((flag_openacc || flag_openmp) + if ((flag_openacc || flag_openmp) && !DECL_EXTERNAL (decl) && lookup_attribute ("omp declare target", DECL_ATTRIBUTES (decl))) { node->offloadable = 1; diff --git a/libgomp/testsuite/libgomp.fortran/declare-target-1.f90 b/libgomp/testsuite/libgomp.fortran/declare-target-1.f90 new file mode 100644 index 000..fd9c26f --- /dev/null +++ b/libgomp/testsuite/libgomp.fortran/declare-target-1.f90 @@ -0,0 +1,15 @@ +! { dg-do run } +! 
{ dg-additional-sources declare-target-2.f90 } + +module declare_target_1_mod + integer :: var_x + !$omp declare target(var_x) +end module declare_target_1_mod + + interface +subroutine foo () +end subroutine foo + end interface + + call foo () +end diff --git a/libgomp/testsuite/libgomp.fortran/declare-target-2.f90 b/libgomp/testsuite/libgomp.fortran/declare-target-2.f90 new file mode 100644 index 000..f8d3ab2 --- /dev/null +++ b/libgomp/testsuite/libgomp.fortran/declare-target-2.f90 @@ -0,0 +1,18 @@ +! Don't compile this anywhere, it is just auxiliary +! file compiled together with declare-target-1.f90 +! to verify inter-CU module handling of omp declare target. +! { dg-do compile { target { lp64 && { ! lp64 } } } } + +subroutine foo + use declare_target_1_mod + + var_x = 10 + !$omp target update to(var_x) + + !$omp target +var_x = var_x * 2; + !$omp end target + + !$omp target update from(var_x) + if (var_x /= 20) call abort +end subroutine foo -- Ilya
Re: [PATCH][OpenMP] Fix declare target variables in fortran modules
On Thu, Mar 12, 2015 at 10:22:37PM +0300, Ilya Verbin wrote: > On Thu, Mar 12, 2015 at 15:21:35 +0100, Jakub Jelinek wrote: > > On Thu, Mar 12, 2015 at 04:56:35PM +0300, Ilya Verbin wrote: > > > This happens because the var_x is added into offload tables for both > > > share.o and > > > test.o. The patch below fixes this issue. Regtested on x86_64-linux and > > > i686-linux. However I'm not sure how to create a regression test, which > > > would > > > compile 2 separate objects, and check run-time result. > > > > Ok with proper ChangeLog entry. > > > > As for testcase, won't dg-additional-sources help? > > I mean, does it fail without your patch even if you just do > > gfortran -fopenmp -o a.out share.f90 test.f90; ./a.out ? > > Yes, this works. Here is what I will commit tomorrow, if no objections. Ok, thanks. > gcc/ > * varpool.c (varpool_node::get_create): Don't set 'offloadable' flag for > the external decls. > libgomp/ > * testsuite/libgomp.fortran/declare-target-1.f90: New test. > * testsuite/libgomp.fortran/declare-target-2.f90: New file. > > > diff --git a/gcc/varpool.c b/gcc/varpool.c > index b583693..ce64279 100644 > --- a/gcc/varpool.c > +++ b/gcc/varpool.c > @@ -173,7 +173,7 @@ varpool_node::get_create (tree decl) >node = varpool_node::create_empty (); >node->decl = decl; > > - if ((flag_openacc || flag_openmp) > + if ((flag_openacc || flag_openmp) && !DECL_EXTERNAL (decl) >&& lookup_attribute ("omp declare target", DECL_ATTRIBUTES (decl))) > { >node->offloadable = 1; > diff --git a/libgomp/testsuite/libgomp.fortran/declare-target-1.f90 > b/libgomp/testsuite/libgomp.fortran/declare-target-1.f90 > new file mode 100644 > index 000..fd9c26f > --- /dev/null > +++ b/libgomp/testsuite/libgomp.fortran/declare-target-1.f90 > @@ -0,0 +1,15 @@ > +! { dg-do run } > +! 
{ dg-additional-sources declare-target-2.f90 } > + > +module declare_target_1_mod > + integer :: var_x > + !$omp declare target(var_x) > +end module declare_target_1_mod > + > + interface > +subroutine foo () > +end subroutine foo > + end interface > + > + call foo () > +end > diff --git a/libgomp/testsuite/libgomp.fortran/declare-target-2.f90 > b/libgomp/testsuite/libgomp.fortran/declare-target-2.f90 > new file mode 100644 > index 000..f8d3ab2 > --- /dev/null > +++ b/libgomp/testsuite/libgomp.fortran/declare-target-2.f90 > @@ -0,0 +1,18 @@ > +! Don't compile this anywhere, it is just auxiliary > +! file compiled together with declare-target-1.f90 > +! to verify inter-CU module handling of omp declare target. > +! { dg-do compile { target { lp64 && { ! lp64 } } } } > + > +subroutine foo > + use declare_target_1_mod > + > + var_x = 10 > + !$omp target update to(var_x) > + > + !$omp target > +var_x = var_x * 2; > + !$omp end target > + > + !$omp target update from(var_x) > + if (var_x /= 20) call abort > +end subroutine foo > > > -- Ilya Jakub
[PATCH][ARM] New testcase to check parameter passing bug
Hi,

I have written a testcase that reproduces an argument-overwriting bug
during ARM code generation.  I wrote this testcase with the help of
Mikael Pettersson.  If the format is not suitable for the GCC testsuite
framework, please correct me.

Please refer to the following bugzilla link for details:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65358

Honggyu
---
 gcc/testsuite/ChangeLog                |  5 +
 gcc/testsuite/gcc.target/arm/pr65358.c | 34
 2 files changed, 39 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/arm/pr65358.c

diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index 5302dbd..9acd12a 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,8 @@
+2015-03-13  Honggyu Kim
+
+        PR target/65235
+        * gcc.target/arm/pr65358.c: New test for sibcall argument passing bug.
+
 2015-03-12  Kyrylo Tkachov

         PR rtl-optimization/65235
diff --git a/gcc/testsuite/gcc.target/arm/pr65358.c b/gcc/testsuite/gcc.target/arm/pr65358.c
new file mode 100644
index 000..d663dcf
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/pr65358.c
@@ -0,0 +1,34 @@
+/* PR target/65358 */
+/* { dg-do compile { target arm*-*-* } } */
+/* { dg-options "-O2" } */
+
+struct pack
+{
+  int fine;
+  int victim;
+  int killer;
+};
+
+int __attribute__ ((__noinline__, __noclone__))
+bar (int a, int b, struct pack p)
+{
+  if (a != 20 || b != 30)
+    __builtin_abort ();
+  if (p.fine != 40 || p.victim != 50 || p.killer != 60)
+    __builtin_abort ();
+  return 0;
+}
+
+int __attribute__ ((__noinline__, __noclone__))
+foo (int arg1, int arg2, int arg3, struct pack p)
+{
+  return bar (arg2, arg3, p);
+}
+
+int main (void)
+{
+  struct pack p = { 40, 50, 60 };
+
+  (void) foo (10, 20, 30, p);
+  return 0;
+}
--
1.7.9.5
Re: [PATCH] Fix PR44563 more
> > CFG cleanup currently searches for calls that became noreturn and
> > fixes them up (splitting block and removing the fallthru).  Previously
> > that was technically necessary as propagation may have turned an
> > indirect call into a direct noreturn call and the CFG verifier would
> > have barfed.  Today we guard that with GF_CALL_CTRL_ALTERING and
> > thus we "remember" the previous call analysis.

Yep, I remember introducing this back in the tree-SSA branch days as a
kind of afterthought.

> >
> > The following patch removes the CFG cleanup code (which is expensive
> > because gimple_call_flags () is quite expensive, not to talk about
> > walking all stmts).  This leaves the fixup_cfg passes to perform the
> > very same optimization (relevant propagators can also be teached
> > to call fixup_noreturn_call, but I don't think that's very important).
> >
> > Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.
> >
> > I'm somewhat undecided whether this is ok at this stage and if we
> > _do_ want to make propagators fix those (previously indirect) calls up
> > earlier at the same time.
> >
> > Honza - I think we performed this in CFG cleanup for the sake of CFG
> > checking, not for the sake of prompt optimization, no?

This is the first time I hear this.  We have verify_flow_info.  I think
most CFG cleanups were scheduled because we need update-ssa, and that
would bomb on unreachable basic blocks.

> >
> > This would make PR44563 a pure IPA pass issue.
>
> Soo - testing revealed a single case where we mess up things (and
> the verifier noticing only because of a LHS on a noreturn call...).
>
> The following patch makes all propagators handle the noreturn transition
> (the paths in all but PRE are not exercised by bootstrap or testsuite :/).
>
> This patch makes CFG cleanup independent on BB size (during analysis,
> merge_blocks and delete_basic_block are still O(n)) - which is
> a very much desired property.
>
> It also changes fixup_cfg to produce a dump only when run as
> separate pass (otherwise the .optimized dump changes and I get
> tons of scan related fails) - that also reduces noise in the
> very many places we dump functions (they are dumped anyway for
> all cases).
>
> Bootstrap and regtest running on x86_64-unknown-linux-gnu.
>
> I wonder if you can throw this on firefox/chromium - the critical
> paths are devirtualization introducing __builtin_unreachable.

I built chromium and firefox without problems with your patch.

>
> This patch should get a good speedup on all compiles (we run
> CFG-cleanup a _lot_), by removing pointless IL walks and expensive
> gimple_call_flags calls on calls.

Yes, I definitely like it.  The expense of cfg-cleanup has always
bothered me quite a bit.

On an unrelated note, I think it is possible to do cfg-cleanup with only
a fixed number of passes over the CFG (my old RTL code did that, but it
has been changed since then by adding crossjumping).  Both the RTL and
tree cfg cleanups have a complicated history; there are probably quite a
few ways to make them cheaper.

Thanks for working on it.  As I noted in the PR, I plan to finally fix the
inliner non-linearity with the sreal metrics next stage1 too.

Honza
Re: [PATCH] Speed-up def_builtin_const (ix86_valid_target_attribute)
> 2015-03-09  Martin Liska
>
>         * config/i386/i386.c (def_builtin): Collect union of all
>         possible masks.
>         (ix86_add_new_builtins): Do not iterate over all builtins
>         in cases that isa value has no intersection with possible masks
>         and(or) last passed value is equal to the provided.
> ---
>  gcc/config/i386/i386.c | 11 +++
>  1 file changed, 11 insertions(+)
>
> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> index ab8f03a..5f180b6 100644
> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -30592,6 +30592,8 @@ struct builtin_isa {
>
>  static struct builtin_isa ix86_builtins_isa[(int) IX86_BUILTIN_MAX];
>
> +/* Union of all masks that are part of builtin_isa structures.  */
> +static HOST_WIDE_INT defined_isa_values = 0;
>
>  /* Add an ix86 target builtin function with CODE, NAME and TYPE.  Save the MASK
>     of which isa_flags to use in the ix86_builtins_isa array.  Stores the
> @@ -30619,6 +30621,7 @@ def_builtin (HOST_WIDE_INT mask, const char *name,
>    if (!(mask & OPTION_MASK_ISA_64BIT) || TARGET_64BIT)
>      {
>        ix86_builtins_isa[(int) code].isa = mask;
> +      defined_isa_values |= mask;

I think you can move this down to where set_and_not_built_p is set.
Please also add a comment explaining the caching mechanism.

>
>        mask &= ~OPTION_MASK_ISA_64BIT;
>        if (mask == 0
> @@ -30670,6 +30673,14 @@ def_builtin_const (HOST_WIDE_INT mask, const char *name,
>  static void
>  ix86_add_new_builtins (HOST_WIDE_INT isa)
>  {
> +  /* Last cached isa value.  */
> +  static HOST_WIDE_INT last_tested_isa_value = 0;
> +
> +  if ((isa & defined_isa_values) == 0 || isa == last_tested_isa_value)

Here you need to compare
(isa & defined_isa_values) == (isa & last_tested_isa_value), right?
That is because we have isa flags that enable no builtins.

Honza
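The caching idea under review can be sketched outside GCC as follows. This is only one possible reading of the review comment: the sketch masks both the new and the previously tested isa value with the union of registered masks before comparing them, so that flags which enable no builtins never force a rescan. `BuiltinCache`, `isa_mask` and the `full_scans` counter are hypothetical names, and the sketch assumes every mask is registered before the first lookup.

```cpp
#include <cassert>
#include <cstdint>

typedef std::int64_t isa_mask;  // stand-in for HOST_WIDE_INT

struct BuiltinCache {
  isa_mask defined_isa_values = 0;  // union of all registered builtin masks
  isa_mask last_tested = 0;         // isa value seen by the last full scan
  int full_scans = 0;               // how often the expensive walk ran

  // Corresponds to recording a builtin's mask when it is defined.
  void def_builtin(isa_mask mask) { defined_isa_values |= mask; }

  // Returns true when the (expensive) walk over all builtins was needed.
  bool add_new_builtins(isa_mask isa) {
    const isa_mask relevant = isa & defined_isa_values;
    // Skip when no registered builtin depends on any requested flag, or
    // when the relevant flags match those of the previous scan.
    if (relevant == 0 || relevant == (last_tested & defined_isa_values))
      return false;
    last_tested = isa;
    ++full_scans;  // the real code would iterate over all builtins here
    return true;
  }
};
```

With masks 0x1 and 0x4 registered, a request for 0x2 alone is skipped, 0x1 triggers a scan, 0x1 | 0x2 is skipped because the extra flag enables nothing, and 0x5 triggers a second scan.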
Fix polymorphic type matching in ipa-icf
Hi,
this patch fixes IPA-ICF's polymorphic type matching.  Basically,
ipa-polymorphic-call looks for the following cases:

 - data living in automatic or static variables of polymorphic type
 - dynamic type changes done by explicit calls to a constructor or by writes
   of the virtual table pointer
 - parameters of THIS pointer of methods
 - data living in parameters/return values of polymorphic types passed
   by invisible reference.

In these cases it may derive the type of the instance from them.

The current implementation of ipa-icf mixes type compatibility checks with
polymorphic type checks and checks the types of everything, which is useless
and somewhat expensive.  It disables the checks for leaf functions (that are
commonly merged).  This is however not 100% safe, because constructor calls
and THIS pointer writes may get inlined and the information may be propagated
outside of the function code itself.

This patch restructures the checks to be safe in this case and to do
polymorphic type matching independently of the type checking.  The common
reason why merging fails is a mismatch of the THIS pointer type (not very
surprisingly).  Next stage1 I think we can lift this restriction and simply
merge polymorphic type info in ipa-prop's jump functions.

Bootstrapped/regtested x86_64-linux and also tested with Firefox and Chromium.

Honza

        * ipa-icf.c (sem_function::equals_wpa): Match CXX_CONSTRUCTOR_P
        and CXX_DESTRUCTOR_P.  For constructors match ODR type of class they
        are building; for methods check ODR type of class they belong to if
        they may lead to a polymorphic call.
        (sem_function::compare_polymorphic_p): Be a bit smarter about testing
        when function may lead to a polymorphic call.
        (sem_function::compare_type_list): Remove.
        (sem_variable::equals): Update use of compatible_types_p.
        (sem_variable::parse_tree_refs): Remove.
        (sem_item_optimizer::filter_removed_items): Do not filter out CXX cdtor.
        * ipa-icf-gimple.c (func_checker::compare_decl): Do polymorphic matching
        here.
        (func_checker::compatible_polymorphic_types_p): Break out from ...
        (func_checker::compatible_types_p): ... here.
        * ipa-icf-gimple.h (func_checker::compatible_polymorphic_types_p):
        Declare.
        (func_checker::compatible_types_p): Update.
        * ipa-icf.h (compare_type_list, parse_tree_refs, compare_sections):
        Remove.

Index: ipa-icf.c
===
--- ipa-icf.c (revision 221405)
+++ ipa-icf.c (working copy)
@@ -429,9 +429,29 @@ sem_function::equals_wpa (sem_item *item
   if (DECL_NO_LIMIT_STACK (decl) != DECL_NO_LIMIT_STACK (item->decl))
     return return_false_with_msg ("no stack limit attributes are different");

+  if (DECL_CXX_CONSTRUCTOR_P (decl) != DECL_CXX_CONSTRUCTOR_P (item->decl))
+    return return_false_with_msg ("DELC_CXX_CONSTRUCTOR mismatch");
+
+  if (DECL_CXX_DESTRUCTOR_P (decl) != DECL_CXX_DESTRUCTOR_P (item->decl))
+    return return_false_with_msg ("DELC_CXX_DESTRUCTOR mismatch");
+
   if (flags_from_decl_or_type (decl) != flags_from_decl_or_type (item->decl))
     return return_false_with_msg ("decl_or_type flags are different");

+  /* Do not match polymorphic constructors of different types.  They calls
+     type memory location for ipa-polymorphic-call and we do not want
+     it to get confused by wrong type.  */
+  if (DECL_CXX_CONSTRUCTOR_P (decl)
+      && TREE_CODE (TREE_TYPE (decl)) == METHOD_TYPE)
+    {
+      if (TREE_CODE (TREE_TYPE (item->decl)) != METHOD_TYPE)
+        return return_false_with_msg ("DECL_CXX_CONSTURCTOR type mismatch");
+      else if (!func_checker::compatible_polymorphic_types_p
+                (method_class_type (TREE_TYPE (decl)),
+                 method_class_type (TREE_TYPE (item->decl)), false))
+        return return_false_with_msg ("ctor polymorphic type mismatch");
+    }
+
   /* Checking function TARGET and OPTIMIZATION flags.  */
   cl_target_option *tar1 = target_opts_for_fn (decl);
   cl_target_option *tar2 = target_opts_for_fn (item->decl);
@@ -473,13 +493,8 @@ sem_function::equals_wpa (sem_item *item
       if (!arg_types[i] || !m_compared_func->arg_types[i])
         return return_false_with_msg ("NULL argument type");

-      /* Polymorphic comparison is executed just for non-leaf functions.  */
-      bool is_not_leaf = get_node ()->callees != NULL
-                         || get_node ()->indirect_calls != NULL;
-
       if (!func_checker::compatible_types_p (arg_types[i],
-                                             m_compared_func->arg_types[i],
-                                             is_not_leaf, i == 0))
+                                             m_compared_func->arg_types[i]))
         return return_false_with_msg ("argument type is different");
       if (POINTER_TYPE_P (arg_types[i])
           && (TYPE_RESTRICT (arg_types[i])
@@ -494,6 +509,24 @@ sem_function::e