On Fri, Nov 06, 2015 at 09:37:17PM +0100, Christophe Lyon wrote:
> On 6 November 2015 at 18:03, James Greenhalgh <james.greenha...@arm.com> 
> wrote:
> > On Fri, Nov 06, 2015 at 02:49:38PM +0100, Christophe Lyon wrote:
> >> Hi,
> >>
> >> As mentioned by James a few weeks ago, the vqtbl[lx][234] intrinsics
> >> are failing on aarch64_be.
> >>
> >> The attached patch fixes them, and rewrites them using new builtins
> >> instead of inline assembly.
> >>
> >> I wondered about the names of the new builtins, I hope I got them
> >> right: qtbl3, qtbl4, qtbx3, qtbx4 with v8qi and v16qi modes.
> >>
> >> I have modified the existing aarch64_tbl3v8qi and aarch64_tbx4v8qi to
> >> use <mode> and share the code with the v16qi variants.
> >>
> >> In arm_neon.h, I moved the rewritten intrinsics to the bottom of the
> >> file, in alphabetical order, although the comment says "Start of
> >> optimal implementations in approved order": the previous ones really
> >> seem to be in alphabetical order.
> >>
> >> And I added a new testcase, skipped for arm* targets.
> >>
> >> This has been tested on aarch64-none-elf and aarch64_be-none-elf
> >> targets, using the Foundation model.
> >>
> >> OK?
> >
> > Hi Christophe,
> >
> > Thanks for this. With this patch I think we can finally say that
> > aarch64_be Neon intrinsics are in as good a state as aarch64 Neon
> > intrinsics. On our internal testsuite the pass rate is now equivalent
> > between the two. I'm very grateful for your work in this area!
> 
> Thanks for the quick review, committed as r229886.
> 
> We are still missing many tests for most of the armv8 intrinsics.
> A significant effort, apparently not worth it since you say your
> internal testsuite is now clean.

The internal testsuiite is of no use to the rest community and is
unlikely to be feasible to submit upstream, so I wouldn't write off
extending the (excellent) set of GCC tests you've been adding so far
as "not worth it". Certainly they were a big help for the big-endian
work.

> Actually, you say the pass rate is equivalent on little and
> big-endian: does it mean that it not 100%?

Yes, I picked my words carefully :-)

The remaining failures are missing intrinsics and conformance issues when
the intrinsics are combined and folded. For an idea of what is missing,
take a look at the LLVM test-suite I pointed you at a few weeks ago:

    
<llvm-testsuite>/SingleSource/UnitTests/Vector/AArch64/aarch64_neon_intrinsics.c

I'll try to get some of the "folding" examples in to the upstream
bugzilla - generally they are issues where the semantics of the
intrinsic are well defined for signed overflow, but our use of C
constructs means the midend considers signed overflow undefined, and
performs more aggressive optimisation.

Thanks,
James

> >
> > This patch is OK for trunk.
> >
> > Thanks again,
> > James
> >
> >>
> >> Christophe.
> >
> >> 2015-11-06  Christophe Lyon  <christophe.l...@linaro.org>
> >>
> >>       gcc/testsuite/
> >>       * gcc.target/aarch64/advsimd-intrinsics/vqtbX.c: New test.
> >>
> >>       gcc/
> >>       * config/aarch64/aarch64-simd-builtins.def: Update builtins
> >>       tables: add tbl3v16qi, qtbl[34]*, tbx4v16qi, qtbx[34]*.
> >>       * config/aarch64/aarch64-simd.md (aarch64_tbl3v8qi): Rename to...
> >>       (aarch64_tbl3<mode>) ... this, which supports v16qi too.
> >>       (aarch64_tbx4v8qi): Rename to...
> >>       aarch64_tbx4<mode>): ... this.
> >>       (aarch64_qtbl3<mode>): New pattern.
> >>       (aarch64_qtbx3<mode>): New pattern.
> >>       (aarch64_qtbl4<mode>): New pattern.
> >>       (aarch64_qtbx4<mode>): New pattern.
> >>       * config/aarch64/arm_neon.h (vqtbl2_s8, vqtbl2_u8, vqtbl2_p8)
> >>       (vqtbl2q_s8, vqtbl2q_u8, vqtbl2q_p8, vqtbl3_s8, vqtbl3_u8)
> >>       (vqtbl3_p8, vqtbl3q_s8, vqtbl3q_u8, vqtbl3q_p8, vqtbl4_s8)
> >>       (vqtbl4_u8, vqtbl4_p8, vqtbl4q_s8, vqtbl4q_u8, vqtbl4q_p8)
> >>       (vqtbx2_s8, vqtbx2_u8, vqtbx2_p8, vqtbx2q_s8, vqtbx2q_u8)
> >>       (vqtbx2q_p8, vqtbx3_s8, vqtbx3_u8, vqtbx3_p8, vqtbx3q_s8)
> >>       (vqtbx3q_u8, vqtbx3q_p8, vqtbx4_s8, vqtbx4_u8, vqtbx4_p8)
> >>       (vqtbx4q_s8, vqtbx4q_u8, vqtbx4q_p8): Rewrite using builtin
> >>       functions.
> >
> 

Reply via email to