On Fri, Feb 4, 2022 at 7:34 PM Andrew Pinski <pins...@gmail.com> wrote:
>
> On Fri, Feb 4, 2022 at 3:21 AM Richard Sandiford via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
> >
> > Sorry, just realised I'd never replied to this.
> >
> > Marc Poulhies <poulh...@adacore.com> writes:
> > > Eric Botcazou <botca...@adacore.com> writes:
> > >>> The new variables seem to be unused, so I think slightly stronger
> > >>> DCE could remove the calls even after the patch.  Perhaps the containing
> > >>> functions should take an int32x4_t *ptr or something, with the calls
> > >>> assigning to different ptr[] indices.
> > >>
> > >> We run a minimal DCE pass at -O0 in our compiler to eliminate all the garbage
> > >> generated by the gimplifier for variable-sized types (people care about code
> > >> size at -O0 in specific contexts) but it does not touch anything written by
> > >> the user (and debugging is unaffected of course).  Given that the builtins are
> > >> pure functions and the arguments have no side effects, it eliminates the
> > >> calls, but adding a LHS blocks that because this minimal DCE pass preserves
> > >> anything user-related, in particular assignments to user variables.
> > >>
> > >>> I think it would be better to do that using new calls though,
> > >>> and xfail the existing ones when they no longer work.  For example:
> > >>>
> > >>>   /* { dg-error "lane -1 out of range 0 - 7" "" {target *-*-*} 0 } */
> > >>>   vqdmlal_high_laneq_s16 (int32x4_a, int16x8_b, int16x8_c, -1);
> > >>>   /* { dg-error "lane -1 out of range 0 - 7" "" {target *-*-*} 0 } */
> > >>>   ptr[0] = vqdmlal_high_laneq_s16 (int32x4_a, int16x8_b, int16x8_c, -1);
> > >>>
> > >>> That way we don't lose the existing tests.
> > >>
> > >> Frankly I'm not quite sure of what we can lose by adding a LHS here, can you
> > >> elaborate a bit?  We would need a solution that works out of the box with our
> > >> compiler in the future, i.e. without having to tweak 50 testcases again.
> > >
> > > Hi Richard,
> > >
> > > Thanks for your reply!
> > >
> > > Like Éric, I'm also wondering why having a LHS in the existing tests would
> > > make us lose them. I guess I'm not familiar enough with this part of
> > > the testsuite and I'm missing something.
> >
> > The problem is that we only enforce lane bounds via calls to
> > __builtin_aarch64_im_lane_boundsi.  In previous releases, the check
> > only happened at RTL expansion time, so the check would be skipped if
> > any gimple pass removed the call.  Now we do the checking during
> > folding, but that still misses cases.  E.g., compare the -O0 and -O1
> > behaviour for:
>
> Actually I looked into the below testcase and
> __builtin_aarch64_im_lane_boundsi is not part of the intrinsic.
> Basically some intrinsics have their own bounds checking as part of
> the builtin rather than using __builtin_aarch64_im_lane_boundsi.
> That is why the problem shows up in GCC 11, where the folding of
> __builtin_aarch64_im_lane_boundsi at the gimple level didn't happen.
> I will file a bug report on this regression later tonight or tomorrow.

I opened PR 104396 for this regression.

Thanks,
Andrew Pinski

>
> Here are the uses of aarch64_simd_lane_bounds which emit the error
> (besides the __builtin_aarch64_im_lane_boundsi builtin itself):
>
> function:
> aarch64_expand_fcmla_builtin
>
> builtin_simd_arg args:
> SIMD_ARG_STRUCT_LOAD_STORE_LANE_INDEX
> SIMD_ARG_LANE_INDEX
> SIMD_ARG_LANE_PAIR_INDEX
> SIMD_ARG_LANE_QUADTUP_INDEX
>
> rtl named patterns:
> aarch64_ld<nregs>_lane<vstruct_elt>
> aarch64_st<nregs>_lane<vstruct_elt>
>
> Thanks,
> Andrew Pinski
>
> >
> > #include <arm_neon.h>
> >
> > void f(int32x4_t *p0, int16x8_t *p1) {
> >     vqdmlal_high_laneq_s16(p0[0], p1[0], p1[1], -1);
> >     //p0[0] = vqdmlal_high_laneq_s16(p0[0], p1[0], p1[1], -1);
> > }
> >
> > -O0 gives the error but -O1 doesn't [https://godbolt.org/z/1KosTY43T].
> > The -O1 behaviour here is wrong: badly-formed calls should be rejected
> > with a diagnostic even if the calls are unused.  Clang gets this right
> > in both cases [https://godbolt.org/z/EGxs8jq97].
> >
> > I think keeping the lhs-free calls is important for making sure that
> > the -O0 behaviour doesn't regress without the DCE.
> >
> > Your DCE will regress it, but that's the fault of the arm_neon.h
> > implementation rather than the fault of your pass.  Having the
> > tests but XFAILing them seems like the best way of dealing with that.
> > Hopefully we'll then see some progression if the arm_neon.h implementation
> > is improved in future.
> >
> > Thanks,
> > Richard
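
To illustrate the point above that a badly-formed call should be rejected
even when its result is unused, here is a minimal portable sketch. It is not
how arm_neon.h or the aarch64 builtins are implemented: vec4_t, LANE_CHECK,
GET_LANE and lane_demo are hypothetical names invented for this example, and
the check relies only on C11 _Static_assert. Because the bound is tested as a
constant expression in the front end, no later pass can remove it, whether or
not the call has a LHS.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 4-lane vector type; stands in for int32x4_t. */
typedef struct { int32_t lane[4]; } vec4_t;

/* Evaluates to 0, but compilation fails unless 0 <= LANE < N.
   LANE must be an integer constant expression. */
#define LANE_CHECK(LANE, N) \
  (0 * (int) sizeof (struct { \
    _Static_assert ((LANE) >= 0 && (LANE) < (N), "lane out of range"); \
    int unused; }))

/* Hypothetical lane accessor with a front-end bounds check. */
#define GET_LANE(V, LANE) ((V).lane[(LANE) + LANE_CHECK (LANE, 4)])

static int32_t
lane_demo (void)
{
  vec4_t v = { { 10, 20, 30, 40 } };

  /* No LHS: the result is discarded, but the lane is still checked. */
  (void) GET_LANE (v, 1);

  /* Uncommenting the next line is a compile-time error at any
     optimization level, since the check never reaches DCE.  */
  /* (void) GET_LANE (v, -1); */

  int32_t r = GET_LANE (v, 3);
  assert (r == 40);
  return r;
}
```

Calling lane_demo () returns the value of lane 3. The design choice being
illustrated is only where the check runs (before any optimization), not a
proposal for the actual intrinsic implementation.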
