On Fri, Nov 06, 2015 at 09:37:17PM +0100, Christophe Lyon wrote: > On 6 November 2015 at 18:03, James Greenhalgh <james.greenha...@arm.com> > wrote: > > On Fri, Nov 06, 2015 at 02:49:38PM +0100, Christophe Lyon wrote: > >> Hi, > >> > >> As mentioned by James a few weeks ago, the vqtbl[lx][234] intrinsics > >> are failing on aarch64_be. > >> > >> The attached patch fixes them, and rewrites them using new builtins > >> instead of inline assembly. > >> > >> I wondered about the names of the new builtins, I hope I got them > >> right: qtbl3, qtbl4, qtbx3, qtbx4 with v8qi and v16qi modes. > >> > >> I have modified the existing aarch64_tbl3v8qi and aarch64_tbx4v8qi to > >> use <mode> and share the code with the v16qi variants. > >> > >> In arm_neon.h, I moved the rewritten intrinsics to the bottom of the > >> file, in alphabetical order, although the comment says "Start of > >> optimal implementations in approved order": the previous ones really > >> seem to be in alphabetical order. > >> > >> And I added a new testcase, skipped for arm* targets. > >> > >> This has been tested on aarch64-none-elf and aarch64_be-none-elf > >> targets, using the Foundation model. > >> > >> OK? > > > > Hi Christophe, > > > > Thanks for this. With this patch I think we can finally say that > > aarch64_be Neon intrinsics are in as good a state as aarch64 Neon > > intrinsics. On our internal testsuite the pass rate is now equivalent > > between the two. I'm very grateful for your work in this area! > > Thanks for the quick review, committed as r229886. > > We are still missing many tests for most of the armv8 intrinsics. > A significant effort, apparently not worth it since you say your > internal testsuite is now clean.
The internal testsuiite is of no use to the rest community and is unlikely to be feasible to submit upstream, so I wouldn't write off extending the (excellent) set of GCC tests you've been adding so far as "not worth it". Certainly they were a big help for the big-endian work. > Actually, you say the pass rate is equivalent on little and > big-endian: does it mean that it not 100%? Yes, I picked my words carefully :-) The remaining failures are missing intrinsics and conformance issues when the intrinsics are combined and folded. For an idea of what is missing, take a look at the LLVM test-suite I pointed you at a few weeks ago: <llvm-testsuite>/SingleSource/UnitTests/Vector/AArch64/aarch64_neon_intrinsics.c I'll try to get some of the "folding" examples in to the upstream bugzilla - generally they are issues where the semantics of the intrinsic are well defined for signed overflow, but our use of C constructs means the midend considers signed overflow undefined, and performs more aggressive optimisation. Thanks, James > > > > This patch is OK for trunk. > > > > Thanks again, > > James > > > >> > >> Christophe. > > > >> 2015-11-06 Christophe Lyon <christophe.l...@linaro.org> > >> > >> gcc/testsuite/ > >> * gcc.target/aarch64/advsimd-intrinsics/vqtbX.c: New test. > >> > >> gcc/ > >> * config/aarch64/aarch64-simd-builtins.def: Update builtins > >> tables: add tbl3v16qi, qtbl[34]*, tbx4v16qi, qtbx[34]*. > >> * config/aarch64/aarch64-simd.md (aarch64_tbl3v8qi): Rename to... > >> (aarch64_tbl3<mode>) ... this, which supports v16qi too. > >> (aarch64_tbx4v8qi): Rename to... > >> aarch64_tbx4<mode>): ... this. > >> (aarch64_qtbl3<mode>): New pattern. > >> (aarch64_qtbx3<mode>): New pattern. > >> (aarch64_qtbl4<mode>): New pattern. > >> (aarch64_qtbx4<mode>): New pattern. > >> * config/aarch64/arm_neon.h (vqtbl2_s8, vqtbl2_u8, vqtbl2_p8) > >> (vqtbl2q_s8, vqtbl2q_u8, vqtbl2q_p8, vqtbl3_s8, vqtbl3_u8) > >> (vqtbl3_p8, vqtbl3q_s8, vqtbl3q_u8, vqtbl3q_p8, vqtbl4_s8) > >> (vqtbl4_u8, vqtbl4_p8, vqtbl4q_s8, vqtbl4q_u8, vqtbl4q_p8) > >> (vqtbx2_s8, vqtbx2_u8, vqtbx2_p8, vqtbx2q_s8, vqtbx2q_u8) > >> (vqtbx2q_p8, vqtbx3_s8, vqtbx3_u8, vqtbx3_p8, vqtbx3q_s8) > >> (vqtbx3q_u8, vqtbx3q_p8, vqtbx4_s8, vqtbx4_u8, vqtbx4_p8) > >> (vqtbx4q_s8, vqtbx4q_u8, vqtbx4q_p8): Rewrite using builtin > >> functions. > > >