Hmm,

After reducing the iterations to "1 step for float" and "2 steps for double",
I get VE (miscompares) on the following benchmarks:

416.gamess
453.povray
454.calculix
459.GemsFDTD

Benedikt, I get an ICE for 444.namd with your patch; I am not sure whether
something is wrong in my local tree.

Regards,
Venkat.

> -----Original Message-----
> From: pins...@gmail.com [mailto:pins...@gmail.com]
> Sent: Sunday, June 28, 2015 8:35 PM
> To: Kumar, Venkataramanan
> Cc: Dr. Philipp Tomsich; Benedikt Huber; gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH] [aarch64] Implemented reciprocal square root (rsqrt)
> estimation in -ffast-math
>
> > On Jun 25, 2015, at 9:44 AM, Kumar, Venkataramanan
> > <venkataramanan.ku...@amd.com> wrote:
> >
> > I got around ~12% gain with -Ofast -mcpu=cortex-a57.
>
> I get around 11/12% on thunderX with the patch plus the reduced
> iteration count (1/2), compared to without the patch.
>
> Thanks,
> Andrew
>
> > Regards,
> > Venkat.
> >
> >> -----Original Message-----
> >> From: gcc-patches-ow...@gcc.gnu.org
> >> [mailto:gcc-patches-ow...@gcc.gnu.org] On Behalf Of Dr. Philipp Tomsich
> >> Sent: Thursday, June 25, 2015 9:13 PM
> >> To: Kumar, Venkataramanan
> >> Cc: Benedikt Huber; pins...@gmail.com; gcc-patches@gcc.gnu.org
> >> Subject: Re: [PATCH] [aarch64] Implemented reciprocal square root
> >> (rsqrt) estimation in -ffast-math
> >>
> >> Kumar,
> >>
> >> what is the relative gain that you see on Cortex-A57?
> >>
> >> Thanks,
> >> Philipp.
> >>
> >>> On 25 Jun 2015, at 17:35, Kumar, Venkataramanan
> >>> <venkataramanan.ku...@amd.com> wrote:
> >>>
> >>> Changing to "1 step for float" and "2 steps for double" gives
> >>> better gains now for gromacs on cortex-a57.
> >>>
> >>> Regards,
> >>> Venkat.
> >>>
> >>>> -----Original Message-----
> >>>> From: gcc-patches-ow...@gcc.gnu.org
> >>>> [mailto:gcc-patches-ow...@gcc.gnu.org] On Behalf Of Benedikt Huber
> >>>> Sent: Thursday, June 25, 2015 4:09 PM
> >>>> To: pins...@gmail.com
> >>>> Cc: gcc-patches@gcc.gnu.org; philipp.tomsich@theobroma-systems.com
> >>>> Subject: Re: [PATCH] [aarch64] Implemented reciprocal square root
> >>>> (rsqrt) estimation in -ffast-math
> >>>>
> >>>> Andrew,
> >>>>
> >>>>> This is NOT a win on thunderX, at least for single precision,
> >>>>> because you have to do the divide and sqrt in the same time as it
> >>>>> takes 5 multiplies (estimate and step are multiplies in the
> >>>>> thunderX pipeline). Doubles is 10 multiplies, which is just the
> >>>>> same as what the patch does (but it is really slightly less than
> >>>>> 10, I rounded up). So in the end this is NOT a win at all for
> >>>>> thunderX unless we do one less step for both single and double.
> >>>>
> >>>> Yes, the expected benefit from rsqrt estimation is implementation
> >>>> specific. If one has a better initial rsqrte or an application that
> >>>> can trade precision for execution time, we could offer a command
> >>>> line option to do only 2 steps for double and 1 step for float,
> >>>> similar to -mrecip-precision for PowerPC.
> >>>> What are your thoughts on that?
> >>>>
> >>>> Best regards,
> >>>> Benedikt
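
For anyone following the step-count discussion: below is a minimal sketch of
how the number of Newton-Raphson steps trades precision for speed when
refining a reciprocal square root estimate. It is not the code from the
patch. The names rsqrt_estimate, approx_rsqrt and rsqrt_residual are
hypothetical, and the initial guess is the well-known integer bit trick used
here only as a stand-in for the hardware estimate instruction, so the
absolute error figures will differ from what real hardware gives.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for the hardware estimate (an assumption for illustration):
   the classic integer bit trick, accurate to within a few percent. */
static double rsqrt_estimate(double x)
{
    uint64_t bits;
    double y;

    memcpy(&bits, &x, sizeof bits);
    bits = UINT64_C(0x5fe6eb50c7b537a9) - (bits >> 1);
    memcpy(&y, &bits, sizeof y);
    return y;
}

/* One Newton-Raphson step for y ~ 1/sqrt(x): y' = y * (3 - x*y*y) / 2. */
static double rsqrt_step(double x, double y)
{
    return y * (1.5 - 0.5 * x * y * y);
}

/* Refine the estimate with a caller-chosen number of steps.
   Valid for positive, finite, normal x. */
double approx_rsqrt(double x, int steps)
{
    double y;

    assert(steps >= 0);
    y = rsqrt_estimate(x);
    for (int i = 0; i < steps; i++)
        y = rsqrt_step(x, y);
    return y;
}

/* |x*y*y - 1| is zero for an exact result, so it measures how far the
   approximation is from 1/sqrt(x) without calling sqrt(). */
double rsqrt_residual(double x, int steps)
{
    double y = approx_rsqrt(x, steps);
    double r = x * y * y - 1.0;

    return r < 0.0 ? -r : r;
}
```

Calling rsqrt_residual(2.0, n) for n = 0, 1, 2, 3 shows the residual shrinking
quadratically: each step roughly doubles the number of correct bits. Dropping
one step therefore saves a few multiplies but roughly halves the number of
correct bits, which is the precision-for-time trade-off discussed above.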