[PATCH v2] Rerun loop-header-copying just before vectorization

2015-06-19 Thread Alan Lawrence
This is a respin of https://gcc.gnu.org/ml/gcc-patches/2015-05/msg02139.html . Changes are: * Separate the two passes by descending from a common base class, allowing different predicates; * Test flag_tree_vectorize, and loop->force_vectorize/dont_vectorize - this fixes the test failing

Re: [PATCH/RFC] Make loop-header-copying more aggressive, rerun before tree-if-conversion

2015-06-19 Thread Alan Lawrence
Richard Biener wrote: Apart from Jeff's comment - the usual fix for the undesired vectorization is to put a __asm__ volatile (""); in the loop. In vect-strided-a-u16-i4.c, narrowing the scope of the declaration seemed to preserve the original intent. I've been able to drop the other testsuite c

Re: [PATCH/RFC] Make loop-header-copying more aggressive, rerun before tree-if-conversion

2015-06-19 Thread Alan Lawrence
Jeff Law wrote: On 05/22/2015 09:42 AM, Alan Lawrence wrote: This patch does so (and makes slightly less conservative, to tackle the example above). I found I had to make this a separate pass, so that the phi nodes were cleaned up at the end of the pass before running tree_if_conversion. What

Re: fix PR46029: reimplement if conversion of loads and stores

2015-06-22 Thread Alan Lawrence
Abe Skolnik wrote: Hi everybody! In the current implementation of if conversion, loads and stores are if-converted in a thread-unsafe way: * loads were always executed, even when they should have not been. Some source code could be rendered invalid due to null pointers that were OK in

[PATCH 2/3][AArch64 nofp] Clarify docs for +nofp/-mgeneral-regs-only

2015-06-23 Thread Alan Lawrence
James Greenhalgh wrote: -Generate code which uses only the general registers. +Generate code which uses only the general registers. Equivalent to feature The ARMARM uses "general-purpose registers" to refer to these registers, we should match that style. s/Equivalent to feature/This is equi

[PATCH 1/3][AArch64 nofp] Fix ICEs with +nofp/-mgeneral-regs-only and improve error messages

2015-06-23 Thread Alan Lawrence
James Greenhalgh wrote: Submissions on this list should be one patch per mail, it makes tracking review easier. OK here's a respin of the first, I've added a third patch after I found another route to get to an ICE. +void +aarch64_err_no_fpadvsimd (machine_mode mode, const char *msg) +{ +

[PATCH 3/3][AArch64 nofp] Fix another ICE with +nofp/-mgeneral-regs-only

2015-06-23 Thread Alan Lawrence
This fixes another ICE, obtained with the attached testcase - yes, there was a way to get hold of a float, without passing an argument or going through movsf/movdf! Bootstrapped + check-gcc on aarch64-none-linux-gnu. gcc/ChangeLog: * config/aarch64/aarch64.md (2): Condition on

Re: fix PR46029: reimplement if conversion of loads and stores

2015-06-26 Thread Alan Lawrence
Sebastian Pop wrote: On Thu, Jun 25, 2015 at 4:43 AM, Richard Biener wrote: when the new scheme triggers vectorization cannot succeed on the result as we get if (cond) *p = val; if-converted to tem = cond ? p : &scratch; *tem = val; That's correct. and if (cond) val =
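
The store transformation quoted in the excerpt above can be illustrated in plain C. This is only a sketch of the idea at source level, not the GIMPLE the pass emits; the function names and the local `scratch` slot are hypothetical stand-ins chosen for illustration.

```c
#include <stddef.h>

/* Original, branchy form: the store happens only when cond is true. */
void store_branchy(int cond, int *p, int val)
{
    if (cond)
        *p = val;
}

/* If-converted form: the store always executes, but when cond is false
   it is redirected to a private scratch location, so *p is never
   written (and p may even be NULL in that case). */
void store_scratchpad(int cond, int *p, int val)
{
    int scratch;                      /* hypothetical scratchpad slot */
    int *tem = cond ? p : &scratch;
    *tem = val;
}

/* Returns 1 if both forms leave the destination in the same state. */
int forms_agree(int cond, int initial, int val)
{
    int a = initial, b = initial;
    store_branchy(cond, &a, val);
    store_scratchpad(cond, &b, val);
    return a == b;
}
```

The point made in the thread is that the selected address `tem` differs per iteration, which is why the vectorizer struggles with this form even though it is safe.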

Re: fix PR46029: reimplement if conversion of loads and stores

2015-06-30 Thread Alan Lawrence
Abe Skolnik wrote: In tree-if-conv.c: […] if it doesn't trap, but has_non_addressable_refs, can't we use ifcvt_can_use_mask_load_store there too? If an access could trap, but is addressable, can't we use the scratchpad technique to get round the trapping problem? That's how we deal with loa

Re: [PATCH/RFC] Make loop-header-copying more aggressive, rerun before tree-if-conversion

2015-06-30 Thread Alan Lawrence
Jeff Law wrote: Thanks. Does running the phi-only propagator after the loop header copying help? At first glance it would seem that it ought to propagate the values of those degenerate PHIs then eliminate those PHIs. It was written to cleanup after jump threading which has a tendency to cre

Re: [PATCH] fix PR46029: reimplement if conversion of loads and stores [2nd submitted version of patch]

2015-07-02 Thread Alan Lawrence
Thanks, Abe. A couple comments below... @@ -883,7 +733,7 @@ if_convertible_gimple_assign_stmt_p (gimple stmt, if (flag_tree_loop_if_convert_stores) { - if (ifcvt_could_trap_p (stmt, refs)) + if (ifcvt_could_trap_p (stmt)) { if (ifcvt_can_use_mask_load_store

Re: [PATCH][RFC] Add FRE in pass_vectorize

2015-07-02 Thread Alan Lawrence
Jeff Law wrote: On 06/24/2015 01:59 AM, Richard Biener wrote: And then there is the possibility of making passes generate less needs to perform cleanups after them - like in the present case with the redundant IVs make them more apparently redundant by CSEing the initial value and step during vec

Re: [PATCH v2] Rerun loop-header-copying just before vectorization

2015-07-02 Thread Alan Lawrence
With those comment fixes, this is OK for the trunk. jeff Thank you for review - I've pushed r225311 with what I hope are appropriate comment fixes. Cheers, Alan

Re: [PATCH 0/3] [ARM] PR63870 improve error messages for NEON vldN_lane/vstN_lane

2015-07-03 Thread Alan Lawrence
Charles Baylis wrote: These patches are a port of the changes that do the same thing for AArch64 (see https://gcc.gnu.org/ml/gcc-patches/2015-06/msg01984.html). The first patch ports over some infrastructure, and the second converts the vldN_lane and vstN_lane intrinsics. The changes required for vget

[PATCH 0/2][trunk+5 backport][ARM] PR/65956 Implement AAPCS updates for alignment attribute

2015-07-03 Thread Alan Lawrence
This patch series implements the changes/additions to the ARM ABI proposed at https://gcc.gnu.org/ml/gcc/2015-07/msg00040.html . The first patch is the ABI update. This is an ABI-breaking change for any code using __attribute__((aligned(...))) on a public interface (a case not previously defin

[PATCH 1/2][ARM] PR/65956 AAPCS update for alignment attribute

2015-07-03 Thread Alan Lawrence
These include tests of structs, scalars, and vectors - only general-purpose registers are affected by the ABI rules for alignment, but we can restrict the vector test to use the base AAPCS. Prior to this patch, align2.c, align3.c and align_rec1.c were failing (the latter showing an internal in

[PATCH 2/2][ARM] fix movdi expander to avoid illegal ldrd/strd

2015-07-03 Thread Alan Lawrence
The previous patch caused a regression in gcc.c-torture/execute/20040709-1.c at -O0 (only), and the new align_rec2.c test fails, both outputting an illegal assembler instruction (ldrd on an odd-numbered reg) from output_move_double in arm.c. Most routes have checks against such an illegal instru

Re: [PATCH 1/2][ARM] PR/65956 AAPCS update for alignment attribute

2015-07-06 Thread Alan Lawrence
Richard Biener wrote: I also believe this loop is equivalent to checking TYPE_ALIGN of the aggregate type? Jakub is correct: the intention is to discard any top-level alignment attribute on a struct declaration. I'll double check your wording in the abi document, but it seems to be unclea

Re: [PATCH 1/3] [ARM] PR63870 NEON error messages

2015-07-06 Thread Alan Lawrence
I note some parts of this duplicate my https://gcc.gnu.org/ml/gcc-patches/2015-01/msg01422.html , which has been pinged a couple of times. Both Charles' patch, and my two, contain parts the other does not... Cheers, Alan Charles Baylis wrote: gcc/ChangeLog: Charles Baylis * con

Re: [PATCH 1/2][ARM] PR/65956 AAPCS update for alignment attribute

2015-07-06 Thread Alan Lawrence
have a working Ada compiler with which to bootstrap gcc's Ada frontend. Working on this now. --Alan gcc/ChangeLog: * config/arm/arm.c (arm_needs_doubleword_align) : Drop any outer alignment attribute, exploring one level down for records and arrays. commit f8bd310d65f2b8fd8

Re: [PATCH] fix PR46029: reimplement if conversion of loads and stores [2nd submitted version of patch]

2015-07-06 Thread Alan Lawrence
Abe wrote: On 7/2/15 4:49 AM, Alan Lawrence wrote: As before, I'm still confused here. This still returns false, i.e. bails out of if-conversion, if the statement could trap. Doesn't the scratchpad let us handle that? Or do we just not care because it won't be vectorizable a

Re: [PATCH 1/2][ARM] PR/65956 AAPCS update for alignment attribute

2015-07-06 Thread Alan Lawrence
12:00, Alan Lawrence wrote: Eric Botcazou wrote: Technically this is incorrect since AGGREGATE_TYPE_P includes ARRAY_TYPE and ARRAY_TYPE doesn't have TYPE_FIELDS. I doubt we could reach that case though (unless there's a language that allows passing arrays by value). Ada passes small a

Re: [PATCH 1/2][ARM] PR/65956 AAPCS update for alignment attribute

2015-07-06 Thread Alan Lawrence
Ramana Radhakrishnan wrote: On 06/07/15 17:38, Alan Lawrence wrote: Trying to push these now (svn!), patch 2 is going first. I realize my second iteration of patch 1/2, dropped the testcases from the first version. Okay to include those as per https://gcc.gnu.org/ml/gcc-patches/2015-07

Re: [PATCH 2/2][ARM] fix movdi expander to avoid illegal ldrd/strd

2015-07-06 Thread Alan Lawrence
Richard Earnshaw wrote: On 03/07/15 16:27, Alan Lawrence wrote: The previous patch caused a regression in gcc.c-torture/execute/20040709-1.c at -O0 (only), and the new align_rec2.c test fails, both outputting an illegal assembler instruction (ldrd on an odd-numbered reg) from output_move_double

Re: [PATCH 1/2][ARM] PR/65956 AAPCS update for alignment attribute

2015-07-07 Thread Alan Lawrence
Ramana Radhakrishnan wrote: This is OK, the ada testing can go in parallel and we should take this in to not delay rc1 any further. I can confirm, no regressions in check-ada (gcc/testsuite/gnats and gcc/testsuite/acats) following an ada bootstrap on cortex-a15/neon/hard-float. That's the

Re: [PATCH 1/3] [ARM] PR63870 NEON error messages

2015-07-07 Thread Alan Lawrence
Alan Lawrence wrote: I note some parts of this duplicate my https://gcc.gnu.org/ml/gcc-patches/2015-01/msg01422.html , which has been pinged a couple of times. Both Charles' patch, and my two, contain parts the other does not... Cheers, Alan Charles Baylis wrote: gcc/ChangeLog: Ch

[PATCH 0/16][ARM/AArch64] Float16_t support, v2

2015-07-07 Thread Alan Lawrence
This is a respin of the series at https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01332.html, plus the two ARM patches on which these depend (https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01333.html). These two somewhat duplicate Charles Baylis' lane-bounds-checking patch at https://gcc.gnu.org/

[PATCH 1/16][ARM] PR/63870 Add qualifier to check lane bounds in expand

2015-07-07 Thread Alan Lawrence
As per https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01333.html (While this falls under PR/63870, and I will link to that in the ChangeLog, it is only a small step towards fixing that PR.) commit 9812db88cff20a505365f68f4065d2fbab998c9c Author: Alan Lawrence Date: Mon Dec 8 11:04:49 2014

[PATCH 4/16][ARM] Add float16x8_t type

2015-07-07 Thread Alan Lawrence
Unchanged since https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01336.html commit b9ccac6243415b304024443b74bdc97b3a5954f2 Author: Alan Lawrence Date: Mon Dec 8 18:40:24 2014 + Add float16x8_t + V8HFmode support (regardless of -mfp16-format) diff --git a/gcc/config/arm/arm-builtins.c b

[PATCH 3/16][ARM] Add float16x4_t intrinsics

2015-07-07 Thread Alan Lawrence
As per https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01335.html commit 54a89a084fbd00e4de036f549ca893b74b8f58fb Author: Alan Lawrence Date: Mon Dec 8 18:40:03 2014 + ARM: float16x4_t intrinsics (v2 - fix v[sg]et_lane_f16 at -O0, no vdup_n/vmov_n) diff --git a/gcc/config/arm

[PATCH 2/16][ARM] PR/63870 Add __builtin_arm_lane_check.

2015-07-07 Thread Alan Lawrence
As per https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01334.html commit 1bb1b208a2c8c8b1ee1186c6128a498583fd64fe Author: Alan Lawrence Date: Mon Dec 8 18:36:30 2014 + Add __builtin_arm_lane_check diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c index 7f5bf87

[PATCH 6/16][ARM] Remaining float16 intrinsics: vld..., vst..., vget_low/high, vcombine

2015-07-07 Thread Alan Lawrence
As per https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01341.html commit ae6264b144d25fadcbf219e68ddf3d8c5f40be34 Author: Alan Lawrence Date: Thu Dec 11 11:53:59 2014 + ARM 4/4 v2: v(ld|st)[234](q?|_lane|_dup), vcombine, vget_(low|high) (v2 w/ V_uf_sclr) All are tied together

[PATCH 5/16][ARM] Add float16x8_t intrinsics

2015-07-07 Thread Alan Lawrence
As per https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01337.html commit 336eb16d3061131fe8d28fad4a473d00768bfe5c Author: Alan Lawrence Date: Tue Dec 9 15:06:38 2014 + ARM float16x8_t intrinsics (v2 - fix v[sg]etq_lane_f16, add vreinterpretq_p16_f16, no vdup_n/lane/vmov_n) diff --git

[PATCH 7/16][AArch64] Add basic fp16 support

2015-07-07 Thread Alan Lawrence
: New test. commit 989af1492bbf268be1ecfae06f3303b90ae514c8 Author: Alan Lawrence Date: Tue Dec 2 12:57:39 2014 + AArch64 1/6: Basic HFmode support (less tests), aarch64_fp16_type_node, patterns, mangling, predefines. No --fp16-format option. Disable constants as NYI. di

[PATCH 8/16][ARM/AArch64 Testsuite] Add basic fp16 tests

2015-07-07 Thread Alan Lawrence
/fp16/fp16.exp: New. * gcc.target/aarch64/fp16/f16_convs_1.c: New. * gcc.target/aarch64/fp16/f16_convs_2.c: New. commit bc5045c0d3dd34b8cb94910281384f9ab9880325 Author: Alan Lawrence Date: Thu May 7 10:08:12 2015 +0100 (ARM+AArch64) Add gcc.target/aarch64/fp16, f16_conv_[12].c

[PATCH 9/16][AArch64] Add support for float16x{4,8}_t vectors/builtins

2015-07-07 Thread Alan Lawrence
As https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01341.html commit 49cb53a94a44fcda845c3f6ef11e88f9be458aad Author: Alan Lawrence Date: Tue Dec 2 13:08:15 2014 + AArch64 2/N: Vector/__builtin basics: define+support types, movs, test ABI. Patterns, builtins, intrinsics for

[PATCH 11/16][AArch64] Implement vcvt_{,high_}f16_f32

2015-07-07 Thread Alan Lawrence
): Use BUILTIN_VDF iterator. * config/aarch64/arm_neon.h (vcvt_f16_f32, vcvt_high_f16_f32): New. * config/aarch64/iterators.md (VDF, Vdtype): New. (VWIDE, Vmwtype): Add cases for V4HF and V2SF. commit 5007fafedc8469ab645edfe65fbf41f75fc74750 Author: Alan Lawrence Date: Tue

[PATCH 10/16][AArch64] vld{2,3,4}{,_lane,_dup},vcombine,vcreate

2015-07-07 Thread Alan Lawrence
As per https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01342.html commit ef719e5d3d6eccc5cf621851283b7c0ba1a9ee6c Author: Alan Lawrence Date: Tue Aug 5 17:52:28 2014 +0100 AArch64 3/N: v(create|combine|v(ld|st|ld...dup/lane|st...lane)[234](q?))_f16; tests vldN{,_lane,_dup} inc bigendian

[PATCH 12/16][AArch64] vreinterpret(q?), vget_(low|high), vld1(q?)_dup

2015-07-07 Thread Alan Lawrence
. commit beb21a6bce76d4fbedb13fcf25796563b27f6bae Author: Alan Lawrence Date: Mon Jun 29 18:46:49 2015 +0100 [AArch64 5/N v2] vreinterpret, vget_(low|high), vld1(q?)_dup. update tests for vget_low/high diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h index b915754..ff1a45c 100644 --- a/gc

[PATCH 13/16][AArch64] Add vcvt(_high)?_f32_f16 intrinsics, with BE RTL fix

2015-07-07 Thread Alan Lawrence
Unchanged since https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01345.html commit 214fcc00475a543a79ed444f9a64061215397cc8 Author: Alan Lawrence Date: Wed Jan 28 13:01:31 2015 + AArch64 6/N: vcvt{,_high}_f32_f16 (using vect_par_cnst_hi_half, fixing bigendian indices) diff --git a/gcc

[PATCH 15/16][fold-const.c] Fix bigendian HFmode in native_interpret_real

2015-07-07 Thread Alan Lawrence
As per https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01346.html. Fixes FAIL of advsimd-intrinsics vcreate.c on aarch64_be-none-elf from previous patch. commit e2e7ca148960a82fc88128820f17e7cbd14173cb Author: Alan Lawrence Date: Thu Apr 9 10:54:40 2015 +0100 Fix native_interpret_real for

[PATCH 14/16][ARM/AArch64 testsuite] Update advsimd-intrinsics tests to add float16 vectors

2015-07-07 Thread Alan Lawrence
This is a respin of https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01347.html, removing many default values of 0x333, to complete that I introduced new macros CHECK_RESULTS{,_NAMED}_NO_FP16 as writing the same list of vector types in four places seemed too many. gcc/testsuite/ChangeLog:

[PATCH 16/16][ARM/AArch64 Testsuite] Add test of vcvt{,_high}_{f16_f32,f32_f16}

2015-07-07 Thread Alan Lawrence
/testsuite/ChangeLog: * gcc.target/aarch64/advsimd-intrinsics/advsimd-intrinsics.exp: set additional flags for neon-fp16 support. * gcc.target/aarch64/advsimd-intrinsics/vcvt_f16.c: New. commit e6cc7467ddf5702d3a122b8ac4163621d0164b37 Author: Alan Lawrence Date: Wed Jan 28 13

Re: [PATCH 3/16][ARM] Add float16x4_t intrinsics

2015-07-07 Thread Alan Lawrence
Kyrill Tkachov wrote: On 07/07/15 14:09, Kyrill Tkachov wrote: Hi Alan, On 07/07/15 13:34, Alan Lawrence wrote: As per https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01335.html For some context, the reference for these is at: http://infocenter.arm.com/help/topic/com.arm.doc.ihi0073a

Re: [PATCH 3/16][ARM] Add float16x4_t intrinsics

2015-07-07 Thread Alan Lawrence
Kyrill Tkachov wrote: On 07/07/15 17:34, Alan Lawrence wrote: Kyrill Tkachov wrote: On 07/07/15 14:09, Kyrill Tkachov wrote: Hi Alan, On 07/07/15 13:34, Alan Lawrence wrote: As per https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01335.html For some context, the reference for these is at

Re: fix PR46029: reimplement if conversion of loads and stores

2015-07-08 Thread Alan Lawrence
Abe wrote: I'm uncertain to what that is intended to refer, but I believe Sebastian would agree that the new if converter is safer than the old one in terms of correctness at the time of running the code being compiled. > even if they take us a step backwards from a performance standpoint.

Re: [PATCH 15/16][fold-const.c] Fix bigendian HFmode in native_interpret_real

2015-07-08 Thread Alan Lawrence
Richard Biener wrote: On Wed, Jul 8, 2015 at 12:07 AM, Jeff Law wrote: On 07/07/2015 06:37 AM, Alan Lawrence wrote: [snip] Fix native_interpret_real for HFmode floats on Bigendian with UNITS_PER_WORD>=4 (with missing space) OK with ChangeLog in proper form. Err - but now off

Re: [PATCH] fix PR46029: reimplement if conversion of loads and stores [2nd submitted version of patch]

2015-07-08 Thread Alan Lawrence
Abe wrote: [Alan wrote:] Where can I find info on what the different flag values mean? (I had thought they were booleans [...] [Abe wrote:] Sorry; I don't know if that is documented anywhere yet. In this case, (-1) simply means "defaulted": on if the vectorizer is on, and off if it is

Re: [PATCH 15/16][fold-const.c] Fix bigendian HFmode in native_interpret_real

2015-07-09 Thread Alan Lawrence
Jeff Law wrote: On 07/08/2015 03:43 AM, Richard Biener wrote: On Wed, Jul 8, 2015 at 12:07 AM, Jeff Law wrote: On 07/07/2015 06:37 AM, Alan Lawrence wrote: As per https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01346.html. Fixes FAIL of advsimd-intrinsics vcreate.c on aarch64_be-none-elf from

Re: [PATCH 15/16][fold-const.c] Fix bigendian HFmode in native_interpret_real

2015-07-09 Thread Alan Lawrence
Richard Biener wrote: I wonder why wi::from_buffer doesn't have the same issue though for HImode ints. It's structured differently, without magic '4's as well. I don't claim to understand the rest of wi::from_buffer and why it is different. However, wrt. HImode, I think the key line is: o

[PATCH 1/2][ARM] PR/63870: Add qualifier to check lane bounds in expand

2015-01-16 Thread Alan Lawrence
This is based loosely upon svn r217440, "[AArch64] Add bounds checking to vqdm_lane intrinsics...", but applies to more intrinsics (including e.g. vget_lane), and does not do the endianness-flipping present on AArch64: the objective is to exactly preserve behaviour on all valid code. (Yes, the n

[PATCH 0/4][ARM Intrinsics][RFTesting] Add missing float16x8_t type, and float16x[48] intrinsics

2015-01-16 Thread Alan Lawrence
These add all the V[48]HFmode insns and corresponding intrinsics for ARM. Depends on the two patches at https://gcc.gnu.org/ml/gcc-patches/2015-01/msg01422.html . Unfortunately I don't at present have a testsuite. I've done some testing both manually and on a large internal testsuite for Neon/

[PATCH 2/2][ARM] PR/63870: Add a __builtin_lane_check

2015-01-16 Thread Alan Lawrence
This parallels the present form of __builtin_aarch64_im_lane_boundsi, and allows to check lane indices for intrinsics that can otherwise be written in terms of GCC vector extensions. The new builtin is not used in this patch but is used in my series of float16_t intrinsics (https://gcc.gnu.org

[PATCH 1/4][ARM Intrinsics]float16x4_t intrinsics: vget_lane, vset_lane, vcreate, vdup_n, vdup_lane, vld1_lane, vld1_dup, vreinterpret

2015-01-16 Thread Alan Lawrence
This adds a bunch of new intrinsics, implemented with GCC vector extensions to maximise mid-end optimization (the same approach as AArch64). Note that unlike AArch64, no attempt is made to support bigendian. gcc/ChangeLog: * config/arm/arm_neon.h (vcreate_f16, vdup_lane_f16, vld1_lane_f16,

[PATCH 2/4][ARM Intrinsics] Add missing float16x8_t type

2015-01-16 Thread Alan Lawrence
This defines arm_neon.h's float16x8_t type, although no intrinsics yet (see next patch). Adding V8HFmode does mean programmers can define a GCC vector of same size themselves. gcc/ChangeLog: * config/arm/arm.h (VALID_NEON_QREG_MODE): Add V8HFmode. * config/arm/arm.c (arm_vector_mode_s

[PATCH 3/4][ARM Intrinsics]float16x8_t intrinsics: vgetq_lane, vsetq_lane, vdupq_n, vdupq_lane, vld1q_lane, vld1q_dup, vreinterpretq

2015-01-16 Thread Alan Lawrence
Much like the first patch, this adds the equivalent ...q... intrinsics for float16x8_t, using GCC vector extensions. gcc/ChangeLog: * config/arm/arm_neon.h (vdupq_lane_f16, vld1q_lane_f16, vld1q_dup_f16, vreinterpretq_p8_f16, vreinterpretq_f16_p8, vreinterpretq_f16_p16, vreinterpret

[PATCH 4/4][ARM Intrinsics] Add float16 v(ld|st)[234](q?|_lane|_dup),vcombine,vget_(low|high)

2015-01-16 Thread Alan Lawrence
These intrinsics are all made from patterns in neon.md, and are all tied together by iterators - I've tried to reduce coupling a bit but there is possibly more that could be done here. gcc/ChangeLog: * config/arm/arm-builtins.c (VAR11, VAR12): New. * config/arm/arm_neon_builtins.def (v

Re: [PATCH 0/4][ARM Intrinsics][RFTesting] Add missing float16x8_t type, and float16x[48] intrinsics

2015-01-22 Thread Alan Lawrence
s, Alan Christophe Lyon wrote: On 16 January 2015 at 18:22, Alan Lawrence wrote: These add all the V[48]HFmode insns and corresponding intrinsics for ARM. Depends on the two patches at https://gcc.gnu.org/ml/gcc-patches/2015-01/msg01422.html . Unfortunately I don't at present have a t

Retracted: [PATCH 0/4][ARM Intrinsics][RFTesting] Add missing float16x8_t type, and float16x[48] intrinsics

2015-01-26 Thread Alan Lawrence
There are still bugs in these patches, they should not go in. Hope to have something ready, with tests, in the next stage 1. Cheers, Alan Alan Lawrence wrote: These add all the V[48]HFmode insns and corresponding intrinsics for ARM. Depends on the two patches at https://gcc.gnu.org/ml/gcc

[PATCH][AArch64] Fix illegal assembly 'eon v1, v2, v3'

2015-01-28 Thread Alan Lawrence
Hi, The split rule introduced in r218961 uses as its split condition 'reload_completed && (which_alternative == 1)', but which_alternative does not seem to be set reliably during split phases, even after reload. This can lead to the split rule not being used even for insns using FP/SIMD regist

Re: [PATCH][AArch64 Intrinsics] Replace temporary assembler for vst1_lane

2015-01-30 Thread Alan Lawrence
This was posted towards the end of stage 3, a few days before stage 4 started. Is it now too late to "ping" ? --Alan Alan Lawrence wrote: Nowadays, just storing the (bigendian-corrected) vector element to the address, generates exactly the same assembler for all cases except {floa

Re: [PATCH] Relax check against commuting XOR and ASHIFTRT in combine.c

2015-02-02 Thread Alan Lawrence
Rainer Orth wrote: I'm still not really comfortable with those target lists; they tend to artificially exclude tests on targets where they are perfectly capable of running. At least with the comments added, it's better than before with no explanation whatsoever. Perhaps Mike can weigh in here?

Re: [PATCH/AARCH64] Fix 64893: ICE with vget_lane_u32 with C++ front-end at -O0

2015-02-03 Thread Alan Lawrence
Andrew Pinski wrote: While trying to build the GCC 5 with GCC 5, I ran into an ICE when building libcpp at -O0. The problem is the C++ front-end was not folding sizeof(a)/sizeof(a[0]) when passed to a function at -O0. The C++ front-end keeps around sizeof until the gimplifier and there is no wa

[Obvious][Testsuite] Remove extraneous target from gcc.target/arm/macro_defs0.c

2015-02-09 Thread Alan Lawrence
This was giving an UNRESOLVED after my first attempt to apply the patch ran into trouble with line wrapping, and in diagnosing the problem I'd introduced an extra 'target' vs. the original (https://gcc.gnu.org/ml/gcc-patches/2015-01/msg00215.html). Sorry! Pushed as r220542. --Alan gcc/testsu

Re: [PATCH][AArch64] Fix illegal assembly 'eon v1, v2, v3'

2015-02-11 Thread Alan Lawrence
ames Greenhalgh wrote: On Wed, Jan 28, 2015 at 12:32:45PM +, Alan Lawrence wrote: Ok for stage 4? This is a regression from 4.9, so once we iron out some nits, it should be. gcc/ChangeLog: * config/aarch64/aarch64.md (*xor_one_cmpl3): Use FP_REGNUM_P as split condition. And a

Re: FW: [AArch64] [BE] [1/2] Make large opaque integer modes endianness-safe.

2014-10-28 Thread Alan Lawrence
When you say a patch by Alan Hayward that's "coming soon", I take it you mean this one? https://gcc.gnu.org/ml/gcc-patches/2014-10/msg00952.html Just so that we know it has now arrived :). --Alan David Sherwood wrote: Hi, I forgot to mention that this patch needs was tested in combination wi

Re: [COMMITTED][PATCH PR63173] [AARCH64, NEON] Improve vld[234](q?)_dup intrinsics

2014-11-03 Thread Alan Lawrence
So we've been seeing FAIL: gcc.target/aarch64/vldN_dup_1.c on aarch64_be-none-elf, since this patch went in. Felix, did you test for bigendian? However, this failure is fixed if I apply David Sherwood's patch set: https://gcc.gnu.org/ml/gcc-patches/2014-10/msg00942.html https://gcc.gnu.org/ml

[PATCH][AArch64] Add bounds checking to vqdm*_lane intrinsics via a qualifier that also flips endianness

2014-11-06 Thread Alan Lawrence
This generates out-of-range errors at compile- (rather than assemble-)time for the vqdm*_lane intrinsics, and also provides a single place to do bigendian lane-swapping for all those intrinsics (and others to follow in later patches). This allows us to remove many define_expands that just do a r
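
As a rough illustration of what a compile-time lane check buys, the sketch below rejects a constant out-of-range lane index during compilation rather than at assembly time. It is not the qualifier mechanism the patch adds to the AArch64 backend; `LANES`, `CHECK_LANE`, `GET_LANE` and `third_lane` are hypothetical names, and the negative-array-size trick is a stand-in technique that only works for constant lane indices.

```c
#define LANES 4

/* Forces a compile-time error (array of negative size) when a constant
   lane index is outside [0, LANES). Hypothetical stand-in for the
   backend's bounds check. */
#define CHECK_LANE(lane) \
    ((void)sizeof(char[((lane) >= 0 && (lane) < LANES) ? 1 : -1]))

/* Check the lane, then read it. */
#define GET_LANE(vec, lane) (CHECK_LANE(lane), (vec)[(lane)])

int third_lane(const int vec[LANES])
{
    return GET_LANE(vec, 2);      /* in range: compiles */
    /* GET_LANE(vec, 4) would fail to compile. */
}
```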

Re: [PATCH 10/11][RS6000] Migrate reduction optabs to reduc_..._scal

2014-11-06 Thread Alan Lawrence
Hmmm. I am a little surprised by your mention of "saturation points" as I would not expect any variety of reduc_plus to be a saturating operation??? A. Bill Schmidt wrote: On Fri, 2014-10-24 at 19:49 -0400, David Edelsohn wrote: On Fri, Oct 24, 2014 at 8:06 AM, Alan Lawrence wr

Re: [PATCH 10/11][RS6000] Migrate reduction optabs to reduc_..._scal

2014-11-07 Thread Alan Lawrence
Ah I see now! Thank you for explaining that bit, I was a bit puzzled when I saw it, but it makes sense now! Cheers, Alan Bill Schmidt wrote: On Thu, 2014-11-06 at 16:44 +, Alan Lawrence wrote: Hmmm. I am a little surprised by your mention of "saturation points" as I would not

Re: [PATCH][AArch64] Add bounds checking to vqdm*_lane intrinsics via a qualifier that also flips endianness

2014-11-11 Thread Alan Lawrence
n settled, but there's still ARM, indeed. If you have any way/ideas to get better error messages (i.e. line numbers), that'd be particularly good, tho :) Cheers, Alan Charles Baylis wrote: On 6 November 2014 10:19, Alan Lawrence wrote: Thi

Re: [PATCH 10/11][RS6000] Migrate reduction optabs to reduc_..._scal

2014-11-12 Thread Alan Lawrence
So I'm no expert on RS6000 here, but following on from Segher's observation about the change in pattern...so the difference in 'expand' is exactly that, a vsx_reduc_splus_v2df followed by a vec_extract to DF, becomes a vsx_reduc_splus_v2df_scalar - as I expected the combiner to produce by combin

Re: [PATCH][AArch64] Add bounds checking to vqdm*_lane intrinsics via a qualifier that also flips endianness

2014-11-12 Thread Alan Lawrence
Nice! One nit - can the extra "tree" argument be a "const_tree" ? - I'll defer to the maintainers on the use of C++ default arguments in the AArch64 backend. But LGTM. --Alan Charles Baylis wrote: On 11 November 2014 15:25, Alan Lawrence wrote: [Resending in gcc-pa

[PATCH 0/4][Vectorizer] Reductions: replace VEC_RSHIFT_EXPR with VEC_PERM_EXPR

2014-11-12 Thread Alan Lawrence
In response to https://gcc.gnu.org/ml/gcc-patches/2014-09/msg01803.html, this series removes the VEC_RSHIFT_EXPR, instead using a VEC_PERM_EXPR (with a second argument full of constant zeroes) to represent the shift. I've kept the use of vec_shr optab for platforms that define it, as even on p
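
The idea of expressing a whole-vector shift as a permute against a vector of zeroes, and using it for a reduction, can be modelled with ordinary arrays. This is a scalar sketch only: the helper names are hypothetical, the 4-element width is arbitrary, and the choice that lane 0 ends up holding the total is an assumption made here for illustration (the series itself deals with how that direction depends on endianness).

```c
#define N 4

/* Two-input permute: sel[i] indexes the 2*N-element concatenation of
   a followed by b. */
void vec_perm(const int a[N], const int b[N],
              const unsigned sel[N], int out[N])
{
    for (unsigned i = 0; i < N; i++)
        out[i] = sel[i] < N ? a[sel[i]] : b[sel[i] - N];
}

/* Shift v by k elements towards lane 0, filling with zeroes, written as
   a permute whose second input is all zeroes. */
void shift_via_perm(const int v[N], unsigned k, int out[N])
{
    const int zeros[N] = {0};
    unsigned sel[N];
    for (unsigned i = 0; i < N; i++)
        sel[i] = i + k;           /* indices >= N select from zeros */
    vec_perm(v, zeros, sel, out);
}

/* Sum reduction in log2(N) shift-and-add steps; lane 0 holds the total. */
int reduce_plus(const int v[N])
{
    int acc[N], shifted[N];
    for (unsigned i = 0; i < N; i++)
        acc[i] = v[i];
    for (unsigned k = N / 2; k >= 1; k /= 2) {
        shift_via_perm(acc, k, shifted);
        for (unsigned i = 0; i < N; i++)
            acc[i] += shifted[i];
    }
    return acc[0];
}
```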

[PATCH 1/4][Vectorizer] Split vect_gen_perm_mask into _checked and _any variants

2014-11-12 Thread Alan Lawrence
This is a preliminary to patch 2, which wants functionality equivalent to vect_gen_perm_mask (converting a char* to an RTL const_vector) but without the check of can_vec_perm_p. All existing calls to vect_gen_perm_mask barring that in perm_mask_for_reverse, assert the return value is non-null.

[PATCH 2/4][Vectorizer] Use a VEC_PERM_EXPR instead of VEC_RSHIFT_EXPR; expand appropriate VEC_PERM_EXPRs using vec_shr_optab

2014-11-12 Thread Alan Lawrence
This makes the vectorizer use VEC_PERM_EXPRs when doing reductions via shifts, rather than VEC_RSHIFT_EXPR. VEC_RSHIFT_EXPR presently has an endianness-dependent meaning (paralleling vec_shr_optab). While the overall destination of this patch series is to make these endianness-neutral, this pa

[PATCH 3/4] Remove VEC_RSHIFT_EXPR tree code, now unused

2014-11-12 Thread Alan Lawrence
Tested (with patches 1+2): Bootstrap + check-gcc on x64-none-linux-gnu cross-tested check-gcc on aarch64-none-elf and aarch64_be-none-elf as these platforms stand (i.e. without vec_shr_optab). also cross-tested check-gcc on aarch64-none-elf and aarch64_be-none-elf after applying https://gcc.

[PATCH 4/4][Vectorizer]Make reductions-via-shifts and vec_shr_optab endianness-neutral

2014-11-12 Thread Alan Lawrence
This redefines vec_shr optab to be the same (in terms of gcc vectors) regardless of target endianness. The vectorizer uses this to do reductions via shifts, so also change the vectorizer to shift things always the same way (from the midend's POV of vectors). cross-tested check-gcc on (1) aarch

Re: [PATCH 10/11][RS6000] Migrate reduction optabs to reduc_..._scal

2014-11-12 Thread Alan Lawrence
Have run check-gcc on gcc110.fsffrance.org (powerpc64-unknown-linux-gnu) using this snippet on top of original patch; no regressions. Alan Lawrence wrote: So I'm no expert on RS6000 here, but following on from Segher's observation about the change in pattern...so the difference in

Re: [PATCH][AArch64] Add bounds checking to vqdm*_lane intrinsics via a qualifier that also flips endianness

2014-11-12 Thread Alan Lawrence
Pushed as r217440, also with Charles' whitespace fixes ('' -> tab) - good spot! Cheers, Alan Marcus Shawcroft wrote: On 6 November 2014 10:19, Alan Lawrence wrote: This generates out-of-range errors at compile- (rather than assemble-)time for the vqdm*_lane in

[PATCH] Add -funconstrained-commons to work around PR/69368 (and others) in SPEC2006 (was: Re: [PATCH] Add -funknown-commons ...)

2016-03-07 Thread Alan Lawrence
he same wording in invoke.texi, unless you think there is more to add. On 04/03/16 13:33, Jakub Jelinek wrote: > Also, isn't the *.opt description line supposed to end with a full stop? Ah, yes, thanks. Is this version OK for trunk? gcc/ChangeLog: DATE Alan Lawrence Jaku

Re: [PATCH] Add -funconstrained-commons to work around PR/69368 (and others) in SPEC2006

2016-03-09 Thread Alan Lawrence
On 07/03/16 11:02, Alan Lawrence wrote: On 04/03/16 13:27, Richard Biener wrote: I think to make it work with LTO you need to mark it 'Optimization'. Also it's about arrays so maybe 'Assume common declarations may be overridden with ones with a larger trailing array'

[PATCH] Fix PR70013

2016-03-11 Thread Alan Lawrence
In this PR, a packed structure containing bitfields loses part of its constant-pool initialization in SRA. A fuller explanation is on the PR: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70013#c11. In short, we need to treat constant-pool entries like function parameters, as both come 'pre-initi

Re: [PATCH 1/2][AArch64] Implement AAPCS64 updates for alignment attribute

2016-03-11 Thread Alan Lawrence
On 04/03/16 17:24, Alan Lawrence wrote: On 26/02/16 14:52, James Greenhalgh wrote: gcc/ChangeLog: * gcc/config/aarch64/aarch64.c (aarch64_function_arg_alignment): Rewrite, looking one level down for records and arrays. --- gcc/config/aarch64/aarch64.c | 31

Re: [PATCH] Add -funconstrained-commons to work around PR/69368 (and others) in SPEC2006

2016-03-11 Thread Alan Lawrence
On 10/03/16 16:18, Dominique d'Humières wrote: > The test gfortran.dg/unconstrained_commons.f fails in the 32 bit mode. It > needs some regexp Indeed, confirmed on ARM, sorry for not spotting this earlier. I believe the variable, if there is one, should always be called 'j', as it is in the sour

Re: [PATCH] PR target/67127: [ARM] Avoiding odd-number ldrd/strd in movdi introduced a regression on armeb-linux-gnueabihf

2015-08-10 Thread Alan Lawrence
Yvan Roux wrote: Hi, this patch is a fix for PR67127. It avoids splitting the DI registers into SI ones if it is not allowed, which breaks the introduced loop. I haven't added a testcase as the bug is already exhibited by several regressions (like g++.dg/ext/attribute-test-2.C or g++.dg/eh/simd

Re: [PATCH 1/15][ARM] Hide existing float16 intrinsics unless we have a scalar __fp16 type

2015-08-20 Thread Alan Lawrence
Thanks, pushed with comment and ChangeLog fix as r227033. --Alan Kyrill Tkachov wrote: Hi Alan, On 28/07/15 12:23, Alan Lawrence wrote: This makes the existing float16 vector intrinsics available only when we have an __fp16 type (i.e. when one of the ARM_FP16_FORMAT_... macros is defined

Re: [PATCH 11/15][AArch64] vreinterpret(q?), vget_(low|high), vld1(q?)_dup

2015-08-24 Thread Alan Lawrence
James Greenhalgh wrote: Did you check that these actually emit the expected instruction? Applying your patch set I see some fairly unpleasant code generation, but I might have made an error, or perhaps you have another patch in waiting? Thanks, James Yes, you are right, some of the code gen

Re: [PATCH 12/15][AArch64] Add vcvt(_high)?_f32_f16 intrinsics, with BE RTL fix

2015-08-25 Thread Alan Lawrence
James Greenhalgh wrote: >> >> - VAR1 (UNOP, vec_unpacks_hi_, 10, v4sf) >> + VAR2 (UNOP, vec_unpacks_hi_, 10, v4sf, v8hf) > > Should this not use the appropriate "BUILTIN_..." iterator? Indeed; BUILTIN_VQ_HSF it is. >>VAR1 (BINOP, float_truncate_hi_, 0, v4sf) >>VAR1 (BINOP, float_truncat

[PATCH 0/5][tree-sra.c] PR/63679 Make SRA replace constant pool loads

2015-08-25 Thread Alan Lawrence
ssa-dom-cse-2.c fails on a number of platforms because the input array is pushed out to the constant pool, preventing later stages from folding away the entire computation. This patch series fixes the failure by extending SRA to pull the constants back in. This is my first patch(set) to SRA and as
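To make the problem concrete, the following is a sketch of the kind of function involved. It is written in the spirit of ssa-dom-cse-2.c rather than copied from the testsuite, and the name sum_const_array is hypothetical. The whole computation depends only on compile-time constants, so a compiler that can see the array contents is free to fold the function down to a single constant return; whether a given GCC version and target actually does so is exactly what the thread is about.

```c
#include <stddef.h>

/* A local constant array that some targets place in the constant pool.
   If the optimizers can look through that load, the loop below can be
   folded away entirely.  */
int sum_const_array(void)
{
    const int a[8] = { 0, 1, 2, 3, 4, 5, 6, 7 };
    int sum = 0;

    for (size_t i = 0; i < sizeof a / sizeof a[0]; i++)
        sum += a[i];
    return sum;
}
```

The function returns 28 regardless of optimization level; the difference shows up only in the generated code, for example when inspecting the optimized tree dump.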

[RFC 4/5] Handle constant-pool entries

2015-08-25 Thread Alan Lawrence
This makes SRA replace loads of records/arrays from constant pool entries with elementwise assignments of the constant values, hence overcoming the fundamental problem in PR/63679. As a first pass, the approach I took was to look for constant-pool loads as we scanned through other accesses, and

[RFC 5/5] Always completely replace constant pool entries

2015-08-25 Thread Alan Lawrence
I used this as a means of better testing the previous changes, as it exercises the constant replacement code a whole lot more. Indeed, quite a few tests are now optimized away to nothing on AArch64... Always pulling in constants is almost certainly not what we want, but we may nonetheless want so

[PATCH 2/5] completely_scalarize arrays as well as records

2015-08-25 Thread Alan Lawrence
This changes the completely_scalarize_record path to also work on arrays (thus allowing records containing arrays, etc.). This just required extending the existing type_consists_of_records_p and completely_scalarize_record methods to handle things of ARRAY_TYPE as well as RECORD_TYPE. Hence, I rena

[PATCH 1/5] Refactor completely_scalarize_var

2015-08-25 Thread Alan Lawrence
This is a small refactoring/renaming patch; it just moves the call to "completely_scalarize_record" out from completely_scalarize_var, and renames the latter to create_total_scalarization_access. This is because the next patch needs to drop the "_record" suffix and I felt it would be confusing to

[PATCH 3/5] Build ARRAY_REFs when the base is of ARRAY_TYPE.

2015-08-25 Thread Alan Lawrence
When SRA completely scalarizes an array, this patch changes the generated accesses from e.g. MEM[(int[8] *)&a + 4B] = 1; to a[1] = 1; This overcomes a limitation in dom2, that accesses to equivalent chunks of e.g. MEM[(int[8] *)&a] are not hashable_expr_equal_p with accesses to e.g. ME
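The two access forms quoted above are GIMPLE dump notation, not C, but their equivalence can be sketched at source level. The example below is illustrative only: store_by_offset, store_by_index and same_result are hypothetical names, and the 4B offset in the dump corresponds to a[1] only under the assumption of a 4-byte int, so the sketch spells the offset as sizeof(int).

```c
#include <string.h>

#define LEN 8

/* Store through a byte offset from the array base, mirroring the
   MEM[(int[8] *)&a + 4B] = 1 form.  The offset is one int past the
   start of the array.  */
void store_by_offset(int a[LEN])
{
    int one = 1;
    memcpy((char *)a + sizeof(int), &one, sizeof one);
}

/* Store through an array index, mirroring the a[1] = 1 form.  */
void store_by_index(int a[LEN])
{
    a[1] = 1;
}

/* Returns 1 when both stores leave the array in the same state.  */
int same_result(void)
{
    int x[LEN] = { 0 }, y[LEN] = { 0 };

    store_by_offset(x);
    store_by_index(y);
    return memcmp(x, y, sizeof x) == 0;
}
```

Both functions write the same memory, so same_result returns 1. The point made in the message is that the compiler's own comparison of such expressions did not treat the two spellings as equal, which is why emitting the array-index form helps later passes.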

Re: [PATCH 0/15][ARM/AArch64] Add support for float16_t vectors (v3)

2015-08-25 Thread Alan Lawrence
Alan Lawrence wrote: All AArch64 patches are unchanged from previous version. However, in response to discussion, the ARM patches are changed (much as I suggested https://gcc.gnu.org/ml/gcc-patches/2015-07/msg02249.html); this version: * Hides the existing vcvt_f16_f32 and vcvt_f32_f16

Re: [PATCH 13/15][ARM/AArch64 Testsuite] Add float16 tests to advsimd-intrinsics testsuite

2015-08-25 Thread Alan Lawrence
Christophe Lyon wrote: On 28 July 2015 at 13:26, Alan Lawrence wrote: This is a respin of https://gcc.gnu.org/ml/gcc-patches/2015-07/msg00488.html, fixing up the testsuite for float16 vectors. Relative to the previous version, most of the additions to the tests are now within #if..#endif such

Re: [PATCH 14/15][ARM/AArch64 Testsuite]Add test of vcvt{,_high}_i{f32_f16,f16_f32}

2015-08-25 Thread Alan Lawrence
Sorry - wrong version posted. The hunk for add_options_for_arm_neon_fp16 has moved to the previous patch! This version also fixes some whitespace issues. gcc/testsuite/ChangeLog: * gcc.target/aarch64/advsimd-intrinsics/vcvt_f16.c: New. * lib/target-supports.exp (check_effe

Re: [PATCH 14/15][ARM/AArch64 Testsuite]Add test of vcvt{,_high}_{f16_f32,f32_f16}

2015-08-25 Thread Alan Lawrence
Christophe Lyon wrote: On 28 July 2015 at 13:27, Alan Lawrence wrote: gcc/testsuite/ChangeLog: * gcc.target/aarch64/advsimd-intrinsics/advsimd-intrinsics.exp: set additional flags for neon-fp16 support. * gcc.target/aarch64/advsimd-intrinsics/vcvt_f16.c: New. Is that

[PATCH][AArch64 0/8] Add D-registers to TARGET_ARRAY_MODE_SUPPORTED_P

2015-08-26 Thread Alan Lawrence
The end goal of this series of patches is to enable 64-bit vector modes for TARGET_ARRAY_MODE_SUPPORTED_P, achieved in the last patch. At present, doing so causes ICEs with illegal subregs (e.g. returning the middle bits from a large int mode covering 3 vectors); the patchset avoids these by first r
