On Fri, Feb 4, 2022 at 7:34 PM Andrew Pinski <pins...@gmail.com> wrote:
>
> On Fri, Feb 4, 2022 at 3:21 AM Richard Sandiford via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
> >
> > Sorry, just realised I'd never replied to this.
> >
> > Marc Poulhies <poulh...@adacore.com> writes:
> > > Eric Botcazou <botca...@adacore.com> writes:
> > >>> The new variables seem to be unused, so I think slightly stronger
> > >>> DCE could remove the calls even after the patch. Perhaps the containing
> > >>> functions should take an int32x4_t *ptr or something, with the calls
> > >>> assigning to different ptr[] indices.
> > >>
> > >> We run a minimal DCE pass at -O0 in our compiler to eliminate all the
> > >> garbage generated by the gimplifier for variable-sized types (people
> > >> care about code size at -O0 in specific contexts) but it does not touch
> > >> anything written by the user (and debugging is unaffected of course).
> > >> Given that the builtins are pure functions and the arguments have no
> > >> side effects, it eliminates the calls, but adding a LHS blocks that
> > >> because this minimal DCE pass preserves anything user-related, in
> > >> particular assignments to user variables.
> > >>
> > >>> I think it would be better to do that using new calls though,
> > >>> and xfail the existing ones when they no longer work. For example:
> > >>>
> > >>> /* { dg-error "lane -1 out of range 0 - 7" "" {target *-*-*} 0 } */
> > >>> vqdmlal_high_laneq_s16 (int32x4_a, int16x8_b, int16x8_c, -1);
> > >>> /* { dg-error "lane -1 out of range 0 - 7" "" {target *-*-*} 0 } */
> > >>> ptr[0] = vqdmlal_high_laneq_s16 (int32x4_a, int16x8_b, int16x8_c, -1);
> > >>>
> > >>> That way we don't lose the existing tests.
> > >>
> > >> Frankly I'm not quite sure of what we can lose by adding a LHS here, can
> > >> you elaborate a bit? We would need a solution that works out of the box
> > >> with our compiler in the future, i.e. without having to tweak 50
> > >> testcases again.
> > >
> > > Hi Richard,
> > >
> > > Thanks for your reply!
> > >
> > > Like Éric, I'm also wondering why having a LHS in the existing tests
> > > would make us lose them. I guess I'm not familiar enough with this part
> > > of the testsuite and I'm missing something.
> >
> > The problem is that we only enforce lane bounds via calls to
> > __builtin_aarch64_im_lane_boundsi. In previous releases, the check
> > only happened at RTL expansion time, so the check would be skipped if
> > any gimple pass removed the call. Now we do the checking during
> > folding, but that still misses cases. E.g., compare the -O0 and -O1
> > behaviour for:
>
> Actually I looked into the below testcase and
> __builtin_aarch64_im_lane_boundsi is not part of the intrinsic.
> Basically some intrinsics have their own bounds checking as part of
> the builtin rather than using __builtin_aarch64_im_lane_boundsi.
> That is, the problem shows up in GCC 11, where the folding of
> __builtin_aarch64_im_lane_boundsi on the gimple level didn't happen.
> I will file a bug report on this regression later tonight or tomorrow.

I opened PR 104396 for this regression.

Thanks,
Andrew Pinski

>
> Here are the uses of aarch64_simd_lane_bounds which emit the error
> (besides the __builtin_aarch64_im_lane_boundsi builtin itself):
>
> function:
> aarch64_expand_fcmla_builtin
>
> builtin_simd_arg args:
> SIMD_ARG_STRUCT_LOAD_STORE_LANE_INDEX
> SIMD_ARG_LANE_INDEX
> SIMD_ARG_LANE_PAIR_INDEX
> SIMD_ARG_LANE_QUADTUP_INDEX
>
> rtl named patterns:
> aarch64_ld<nregs>_lane<vstruct_elt>
> aarch64_st<nregs>_lane<vstruct_elt>
>
> Thanks,
> Andrew Pinski
>
> >
> > #include <arm_neon.h>
> >
> > void f(int32x4_t *p0, int16x8_t *p1) {
> >   vqdmlal_high_laneq_s16(p0[0], p1[0], p1[1], -1);
> >   //p0[0] = vqdmlal_high_laneq_s16(p0[0], p1[0], p1[1], -1);
> > }
> >
> > -O0 gives the error but -O1 doesn't [https://godbolt.org/z/1KosTY43T].
> > The -O1 behaviour here is wrong: badly-formed calls should be rejected
> > with a diagnostic even if the calls are unused. Clang gets this right
> > in both cases [https://godbolt.org/z/EGxs8jq97].
> >
> > I think keeping the lhs-free calls is important for making sure that
> > the -O0 behaviour doesn't regress without the DCE.
> >
> > Your DCE will regress it, but that's the fault of the arm_neon.h
> > implementation rather than the fault of your pass. Having the
> > tests but XFAILing them seems like the best way of dealing with that.
> > Hopefully we'll then see some progression if the arm_neon.h implementation
> > is improved in future.
> >
> > Thanks,
> > Richard
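
For readers without an AArch64 toolchain, the point that a badly-formed call should be rejected even when its result is unused can be sketched in portable C11. This is only an illustration under stated assumptions, not how arm_neon.h or GCC implement the check: vec8_t, get_lane_unchecked, GET_LANE and lane_demo are hypothetical names invented here. The lane check lives in a _Static_assert that is evaluated when the call is parsed, so it cannot be lost to any later dead-code elimination.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 8-lane vector type standing in for int16x8_t. */
typedef struct { int16_t lane[8]; } vec8_t;

/* Hypothetical helper: performs no bounds check of its own. */
static inline int16_t get_lane_unchecked(vec8_t v, int lane)
{
    return v.lane[lane];
}

/* Hypothetical checked wrapper.  The _Static_assert sits inside an
   unevaluated sizeof, so the lane index is validated at parse time,
   independently of whether the call's result is used or of the
   optimization level.  The lane argument must be an integer constant
   expression. */
#define GET_LANE(v, lane)                                              \
    ((void) sizeof(struct {                                            \
         _Static_assert((lane) >= 0 && (lane) < 8,                     \
                        "lane out of range 0 - 7");                    \
         int dummy;                                                    \
     }),                                                               \
     get_lane_unchecked((v), (lane)))

/* Usage: both the LHS-free call and the assigned call are checked. */
int16_t lane_demo(void)
{
    vec8_t v = { { 10, 20, 30, 40, 50, 60, 70, 80 } };

    (void) GET_LANE(v, 7);        /* result unused: still checked */
    int16_t x = GET_LANE(v, 2);   /* result used */

    /* (void) GET_LANE(v, -1);  would fail to compile at any -O level. */

    assert(x == 30);
    return x;
}
```

With a front-end check of this kind, the LHS-free and LHS-assigning forms of a test would behave the same, which is the progression the XFAILed tests are meant to detect.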