Hi Saurabh,

> On 30 Sep 2024, at 10:36 PM, Saurabh Jha <saurabh....@arm.com> wrote:
>
> Hi Soumya,
>
> Thank you for the patch. Two clarifications:
>
> In the instruction pattern's output string, why did you add the 'Z'
> prefix before operands? (%0 -> %Z0).

The 'Z' prefix is added to ensure that the register name is printed correctly.

By default, %n prints the register assigned to operand 'n' using the
PRINT_OPERAND target hook, as mentioned here:
https://gcc.gnu.org/onlinedocs/gccint/Output-Template.html

Writing %<char>n instead customizes the name that the hook prints. In most
cases this is not necessary, since the operands are SVE registers. In this
case, however, non-SVE registers are used at the RTL level, so we need the
'Z' prefix to ensure that something like:

	fscale v0.s, p7/m, v0.s, v31.s

does not get printed.

> Also, maybe you can make your test cases more precise by specifying
> which functions generate which instructions. I don't have an SVE test
> off the top of my head but have a look at
> /gcc/testsuite/gcc.target/aarch64/simd/faminmax-codegen.c
> for example.

Thanks for the suggestion! I'll update the test case accordingly.

Regards,
Soumya

> Regards,
> Saurabh
>
>
>
> On 9/30/2024 5:26 PM, Soumya AR wrote:
>> This patch uses the FSCALE instruction provided by SVE to implement the
>> standard ldexp family of functions.
>>
>> Currently, with '-Ofast -mcpu=neoverse-v2', GCC generates libcalls for the
>> following code:
>>
>> float
>> test_ldexpf (float x, int i)
>> {
>>   return __builtin_ldexpf (x, i);
>> }
>>
>> double
>> test_ldexp (double x, int i)
>> {
>>   return __builtin_ldexp(x, i);
>> }
>>
>> GCC Output:
>>
>> test_ldexpf:
>>         b ldexpf
>>
>> test_ldexp:
>>         b ldexp
>>
>> Since SVE has support for an FSCALE instruction, we can use this to process
>> scalar floats by moving them to a vector register and performing an fscale
>> call, similar to how LLVM tackles an ldexp builtin as well.
>>
>> New Output:
>>
>> test_ldexpf:
>>         fmov s31, w0
>>         ptrue p7.b, all
>>         fscale z0.s, p7/m, z0.s, z31.s
>>         ret
>>
>> test_ldexp:
>>         sxtw x0, w0
>>         ptrue p7.b, all
>>         fmov d31, x0
>>         fscale z0.d, p7/m, z0.d, z31.d
>>         ret
>>
>> The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.
>> OK for mainline?
>>
>> Signed-off-by: Soumya AR <soum...@nvidia.com>
>>
>> gcc/ChangeLog:
>>
>> * config/aarch64/aarch64-sve.md
>> (ldexp<mode>3): Added a new pattern to match ldexp calls with scalar
>> floating modes and expand to the existing pattern for FSCALE.
>> (@aarch64_pred_<optab><mode>): Extended the pattern to accept SVE
>> operands as well as scalar floating modes.
>>
>> * config/aarch64/iterators.md:
>> SVE_FULL_F_SCALAR: Added an iterator to match all FP SVE modes as well
>> as SF and DF.
>> VPRED: Extended the attribute to handle GPF modes.
>>
>> gcc/testsuite/ChangeLog:
>>
>> * gcc.target/aarch64/sve/fscale.c: New test.
>>
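
For reference, the behaviour that either code sequence (libcall or FSCALE)
has to preserve is ldexp (x, i) == x * 2^i. Below is a minimal sketch of a
standalone check, not part of the patch. It reuses the two functions quoted
above; ref_scale and ldexp_self_check are hypothetical helper names
introduced only for illustration, and the reference is exact only as long
as no overflow or underflow occurs.

```c
#include <assert.h>

/* Hypothetical reference: scale X by 2^I with repeated multiplication
   or division by 2, which is exact in the absence of overflow and
   underflow.  */
double
ref_scale (double x, int i)
{
  for (; i > 0; i--)
    x *= 2.0;
  for (; i < 0; i++)
    x /= 2.0;
  return x;
}

/* The two functions from the patch description.  */
float
test_ldexpf (float x, int i)
{
  return __builtin_ldexpf (x, i);
}

double
test_ldexp (double x, int i)
{
  return __builtin_ldexp (x, i);
}

/* Hypothetical self-check: aborts via assert on a mismatch and
   returns 1 when every comparison holds.  */
int
ldexp_self_check (void)
{
  assert (test_ldexpf (1.5f, 4) == 24.0f);
  assert (test_ldexp (3.0, -2) == 0.75);
  assert (test_ldexp (0.1, 10) == ref_scale (0.1, 10));
  return 1;
}
```

Calling ldexp_self_check () from a small main should behave the same
whether the compiler emits the libcall or the FSCALE sequence, since both
compute the same scaling.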