Re: [Neon intrinsics] Literal vector construction through vcombine is poor

2017-06-27 Thread Richard Earnshaw (lists)
arns...@arm.com] > Sent: Friday, June 23, 2017 2:09 AM > To: Michael Collison ; GCC Patches > > Cc: nd > Subject: Re: [Neon intrinsics] Literal vector construction through vcombine > is poor > > On 23/06/17 00:10, Michael Collison wrote: >> Richard, >> >>

RE: [Neon intrinsics] Literal vector construction through vcombine is poor

2017-06-26 Thread Michael Collison
: [Neon intrinsics] Literal vector construction through vcombine is poor On 23/06/17 00:10, Michael Collison wrote: > Richard, > > I reworked the patch and retested on big endian as well as little. The > original code was performing two swaps in the big endian case which works out

Re: [Neon intrinsics] Literal vector construction through vcombine is poor

2017-06-23 Thread Richard Earnshaw (lists)
_simd_combine): > Allow register and subreg operands. > > -Original Message- > From: Richard Earnshaw (lists) [mailto:richard.earns...@arm.com] > Sent: Monday, June 19, 2017 6:37 AM > To: Michael Collison ; GCC Patches > > Cc: nd > Subject: Re: [Neo

RE: [Neon intrinsics] Literal vector construction through vcombine is poor

2017-06-22 Thread Michael Collison
Earnshaw (lists) [mailto:richard.earns...@arm.com] Sent: Monday, June 19, 2017 6:37 AM To: Michael Collison ; GCC Patches Cc: nd Subject: Re: [Neon intrinsics] Literal vector construction through vcombine is poor On 16/06/17 22:08, Michael Collison wrote: > This patch improves code generation

Re: [Neon intrinsics] Literal vector construction through vcombine is poor

2017-06-19 Thread Richard Earnshaw (lists)
On 16/06/17 22:08, Michael Collison wrote: > This patch improves code generation for literal vector construction by > expanding and exposing the pattern to RTL optimization earlier. The current > implementation delays splitting the pattern until after reload, which results > in poor code generati

[Neon intrinsics] Literal vector construction through vcombine is poor

2017-06-16 Thread Michael Collison
This patch improves code generation for literal vector construction by expanding and exposing the pattern to RTL optimization earlier. The current implementation delays splitting the pattern until after reload, which results in poor code generation for the following code: #include "arm_neon.h"
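
The archive snippet above is cut off before the test case, so the original code is not reproduced here. As a rough illustration only, the kind of "literal vector construction through vcombine" the subject line refers to might look like the sketch below. The function names build_literal and check_literal, and the lane values 0 and 8, are hypothetical and are not taken from the patch. On targets without Neon the sketch falls back to a plain-C stand-in so that it still compiles.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__ARM_NEON)
#include <arm_neon.h>

/* Hypothetical example: build a 128-bit vector from two 64-bit halves
   that are each compile-time constants, then store its lanes to memory. */
void build_literal(int16_t out[8])
{
    /* vcombine_s16 places its first argument in lanes 0-3 and its
       second argument in lanes 4-7 of the result. */
    int16x8_t v = vcombine_s16(vdup_n_s16(0), vdup_n_s16(8));
    vst1q_s16(out, v);
}
#else
/* Portable stand-in (an assumption for non-Neon hosts): models the same
   lane layout with ordinary stores. */
void build_literal(int16_t out[8])
{
    for (int i = 0; i < 4; i++) {
        out[i] = 0;     /* low half  */
        out[i + 4] = 8; /* high half */
    }
}
#endif

/* Checks the lane layout that either variant should produce. */
void check_literal(void)
{
    int16_t out[8];
    build_literal(out);
    for (int i = 0; i < 4; i++) {
        assert(out[i] == 0);
        assert(out[i + 4] == 8);
    }
}
```

Because both halves are constants, the whole result is a constant vector, so a compiler that sees the combine early in RTL has the chance to fold it rather than materialising each half separately. Whether a given GCC version does so depends on the target and the patch discussed in this thread.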