On Sat, Sep 03, 2011 at 11:11:37PM +0200, Uros Bizjak wrote:
> On Sat, Sep 3, 2011 at 5:42 PM, Michael Matz <m...@suse.de> wrote:
> 
> >> > I've decided to not use four new bits from target_flags, and instead
> >> > created a new mask (recip_mask).  Four bits would have fit in target
> >> > bits right now,  but in the future we might want to add more
> >> > specialization, like modes for which the reciprocals are active.
> >> >
> >> > What do you think?
> >>
> >> These new flags look like a nice addition, but I wonder why we need
> >> separate options to handle vector recip. A vector rsqrt or rdiv is
> >> generated automatically in the same way as scalar rsqrt or rdiv is
> >> generated, so IMO, -mrecip-sqrt and -mrecip-div should be enough.
> >
> > No, the difference does matter.  Using reciprocal estimates for scalar
> > divs often results in errors in benchmarks because those sometimes are
> > used to feed integer conversions for either index calculations or
> > printouts.  The small rounding errors with the reciprocals lead to
> > incorrect outputs then.  Contexts where the div can be vectorized often
> > don't have this problem (they're then used purely for calculations over
> > arrays of float data).  For instance spec2006 and polyhedron break with
> > -mrecip purely because of the scalar reciprocals, but work with only
> > vectorized ones.  I.e. users really want to distinguish between the two.
> 
> I agree with your analysis.
> 
> > Also, when this patch goes in I plan to submit another one that activates
> > vectorized rcp/rsqrt under -ffast-math already (that's what ICC happens to
> > do too).
> 
> Great! In the past, we tried to use -mrecip with -ffast-math. IIRC,
> polyhedron broke on scalar rdiv and spec2006 broke on rsqrt. Taking
> into account your analysis above, using separate options and
> activating vectorized ones for -ffast-math makes much sense.

It depends on how accurate the initial guess is and how many fixup passes you
need.  I would imagine AMD and Intel machines would have slightly different
guesses.
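
To make the guess/fixup tradeoff concrete, here is a minimal sketch of
Newton-Raphson refinement of a reciprocal estimate.  It is not the code GCC
emits: crude_recip() is a hypothetical stand-in for a hardware estimate
instruction, and the relative error passed to it is an assumption chosen for
illustration.

```c
#include <assert.h>

/* Hypothetical stand-in for a hardware reciprocal estimate: the exact
   reciprocal perturbed by a chosen relative error.  */
static float crude_recip(float d, float rel_err)
{
    return (1.0f / d) * (1.0f + rel_err);
}

/* Refine an estimate x0 of 1/d with `steps` Newton-Raphson passes:
       x_{n+1} = x_n * (2 - d * x_n)
   In exact arithmetic each pass squares the relative error.  */
static float refine_recip(float d, float x0, int steps)
{
    assert(steps >= 0);
    float x = x0;
    for (int i = 0; i < steps; i++)
        x = x * (2.0f - d * x);
    return x;
}

/* Relative error of x as an estimate of 1/d, computed as |d*x - 1|.  */
static float rel_error(float d, float x)
{
    float e = d * x - 1.0f;
    return e < 0.0f ? -e : e;
}
```

Because the error is squared on each pass, a guess good to about 12 bits
needs one pass to get close to single precision, while a coarser guess needs
two or more; that is why the number of fixup passes depends on the machine's
estimate.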

> >> For the future - could rs6000 and x86 use the same compile options to
> >> handle reciprocals?
> >
> > I'd guess so.  rs6000 uses a hand-written comma-splitter, which we could
> > reuse.
> 
> Perhaps rs6000 could adopt our approach in addition to its
> comma-splitter? OTOH, whatever is more convenient, I don't care that
> much. I have CC'd rs6000 maintainer for his opinion.

I wrote the comma splitting some time ago, but who knows who is using it now,
so we have to keep it on ppc for a while yet, even if we eventually deprecate
it.  FWIW, on spec2006, I use -mrecip=rsqrt.  Allowing division slows down one
of the benchmarks because we run out of registers.
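
For anyone following along, a comma-splitter of this kind can be sketched as
below.  This is not the rs6000 implementation: the mask bits, the option
names in the table, and the '!' prefix for switching an option off are all
assumptions made for the example.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical mask bits; the layout of the real recip_mask may differ.  */
enum {
    RECIP_DIV      = 1 << 0,
    RECIP_SQRT     = 1 << 1,
    RECIP_VEC_DIV  = 1 << 2,
    RECIP_VEC_SQRT = 1 << 3,
    RECIP_ALL      = 0xF
};

struct recip_opt { const char *name; unsigned mask; };

/* Illustrative option names, not taken from any GCC port.  */
static const struct recip_opt recip_opts[] = {
    { "all",      RECIP_ALL },
    { "div",      RECIP_DIV },
    { "sqrt",     RECIP_SQRT },
    { "vec-div",  RECIP_VEC_DIV },
    { "vec-sqrt", RECIP_VEC_SQRT },
};

/* Parse a comma-separated list such as "all,!div" into *mask.
   Items are applied left to right; a leading '!' clears the bits instead
   of setting them.  Returns 0 on success, -1 on an unknown or empty item
   (in which case *mask is left untouched).  */
static int parse_recip(const char *arg, unsigned *mask)
{
    assert(arg != NULL && mask != NULL);
    const size_t n = sizeof recip_opts / sizeof recip_opts[0];
    unsigned result = 0;
    const char *p = arg;

    while (*p != '\0') {
        size_t len = strcspn(p, ",");   /* length of the current item */
        const char *tok = p;
        size_t tok_len = len;
        int invert = 0;
        size_t i;

        if (tok_len > 0 && tok[0] == '!') {
            invert = 1;
            tok++;
            tok_len--;
        }
        for (i = 0; i < n; i++)
            if (strlen(recip_opts[i].name) == tok_len
                && memcmp(recip_opts[i].name, tok, tok_len) == 0)
                break;
        if (i == n)
            return -1;

        if (invert)
            result &= ~recip_opts[i].mask;
        else
            result |= recip_opts[i].mask;

        p += len;
        if (*p == ',')
            p++;                        /* skip the separator */
    }
    *mask = result;
    return 0;
}
```

With this table, "sqrt" sets only the scalar square-root bit, and "all,!div"
enables everything except scalar division.  A table-driven parser like this
could in principle be shared between ports, with each port supplying its own
names and bits.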

-- 
Michael Meissner, IBM
5 Technology Place Drive, M/S 2757, Westford, MA 01886-3141, USA
meiss...@linux.vnet.ibm.com     fax +1 (978) 399-6899
