On Tue, Sep 26, 2023 at 09:28:08AM +0000, Tamar Christina wrote:
> > -----Original Message-----
> > From: Gcc <gcc-bounces+tamar.christina=arm....@gcc.gnu.org> On Behalf
> > Of Paul Iannetta via Gcc
> > Sent: Tuesday, September 26, 2023 9:54 AM
> > To: Richard Biener <richard.guent...@gmail.com>
> > Cc: Sylvain Noiry <sno...@kalrayinc.com>; gcc@gcc.gnu.org;
> > sylvain.no...@hotmail.fr
> > Subject: Re: Complex numbers support: discussions summary
> > 
> > On Tue, Sep 26, 2023 at 09:30:21AM +0200, Richard Biener via Gcc wrote:
> > > On Mon, Sep 25, 2023 at 5:17 PM Sylvain Noiry via Gcc <gcc@gcc.gnu.org>
> > wrote:
> > > >
> > > > Hi,
> > > >
> > > > We had very interesting discussions during our presentation with
> > > > Paul on the support of complex numbers in gcc at the Cauldron.
> > > >
> > > > Thank you all for your participation !
> > > >
> > > > Here is a small summary from our viewpoint:
> > > >
> > > > - Replace CONCAT with a backend defined internal representation in
> > > > RTL
> > > > --> No particular problems
> > > >
> > > > - Allow backend to write patterns for operation on complex modes
> > > > --> No particular problems
> > > >
> > > > - Conditional lowering depending on whether a pattern exists or not
> > > > --> Concerns when the vectorization of split complex operations
> > > >     performs better than non-vectorized unified complex operations
> > > >
> > > > - Centralize complex lowering in cplxlower
> > > > --> No particular problems if it doesn't prevent IEEE compliance and
> > > >     optimizations (like const folding)
> > > >
> > > > - Vectorization of complex operations
> > > > --> 2 representations (interleaved and separated real/imag): cannot
> > > >     impose one if some machines prefer the other
> > > > --> Complex modes are composite modes; the vectorizer assumes that the
> > > >     inner mode is scalar to do some optimizations (which ones?)
> > > > --> Mixed split/unified complex operations cannot be vectorized easily
> > > > --> Assuming that the inner representation of complex vectors is left
> > > >     to target backends, the vectorizer doesn't know it, which prevents
> > > >     some optimizations (which ones?)
> > > >
> > > > - Explicit vectors of complex
> > > > --> Cplxlower cannot lower them, and moving veclower before cplxlower
> > > >     is a bad idea as it prevents some optimizations
> > > > --> Teaching cplxlower how to deal with vectors of complex seems to
> > > >     be a reasonable alternative
> > > > --> Concerns about ABI or indexing if the internal representation is
> > > >     left to the backend and differs from the representation in memory
> > > >
> > > > - Impact of the current SLP pattern matching of complex operations
> > > > --> Only with -ffast-math
> > > > --> It can match user-defined operations (not C99) that can be
> > > >     simplified with a complex instruction
> > > > --> Dedicated opcode and real vector type chosen VS standard opcode
> > > >     and complex mode in our implementation
> > > > --> Need to preserve SLP pattern matching as too many applications
> > > >     redefine complex and bypass the C99 standard.
> > > > --> So need to harmonize with our implementation
> > > >
> > > > - Support of the pure imaginary type (_Imaginary)
> > > > --> Still not supported by gcc (and llvm), nor in our implementation
> > > > --> Issues come from the fact that an imaginary is not a complex with
> > > >     real part set to 0
> > > > --> The same issue arises with complex multiplication by a real (which
> > > >     is split in the frontend, and our implementation hasn't changed it
> > > >     yet)
> > > > --> Idea: Add an attribute to the Tree complex type which specifies
> > > >     pure real / pure imaginary / full complex ?
> > > >
> > > > - Fast pattern for IEEE compliant emulated operations
> > > > --> Not enough time to discuss it
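
The two vector representations mentioned in the summary (interleaved and separated real/imag) can be pictured with a minimal C sketch. The function names below are hypothetical and purely illustrative; they are not GCC internals. C99 specifies that a complex value is laid out like an array of two reals (real part first), so an array of _Complex values is interleaved in memory, and a backend preferring the separated form would need a conversion of roughly this shape:

```c
#include <stddef.h>

/* Interleaved layout: re0 im0 re1 im1 ...  (how an array of C99
   _Complex values sits in memory).
   Separated layout:   re0 re1 ... in one array, im0 im1 ... in another.
   Both helpers are hypothetical illustrations, not GCC code. */

/* Convert n interleaved complex values into separated re[] and im[]. */
void split_from_interleaved(const float *inter, float *re, float *im, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        re[i] = inter[2 * i];
        im[i] = inter[2 * i + 1];
    }
}

/* Convert separated re[] and im[] back into n interleaved values. */
void interleaved_from_split(const float *re, const float *im, float *inter, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        inter[2 * i]     = re[i];
        inter[2 * i + 1] = im[i];
    }
}
```

If the in-register representation differs from the in-memory one, every load and store implies such a shuffle, which is one way to read the ABI and indexing concerns listed above.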
> > > >
> > > > Don't hesitate to add something or bring more precision if you want.
> > > >
> > > > As I said at the end of the presentation, we have written a paper
> > > > which explains our implementation in detail. You can find it on the
> > > > wiki page of the Cauldron
> > > > (https://gcc.gnu.org/wiki/cauldron2023talks?action=AttachFile&do=view&target=Exposing+Complex+Numbers+to+Target+Back-ends+%28paper%29.pdf).
> > >
> > > Thanks for the detailed presentation at the Cauldron.
> > >
> > > My personal summary is that I'm less convinced delaying lowering is
> > > the way to go.
> > 
> > This is not only delayed lowering: if the SPN are there, there is no
> > lowering at all.
> > 
> > > I do think that if targets implement complex optabs we should use them
> > > but eventually re-discovering complex operations from lowered form is
> > > going to be more useful.
> > 
> > I would not be opposed to rediscovering complex operations, but I think
> > that even though rediscovering a + b and a - b is easy and a * b would
> > still be doable, a / b will be hard.  That said, I doubt we will see a
> > hardware complex division, but who knows.  However, once lowered,
> > re-associating a * b * c and more complex expressions is going to be hard.
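
To make the difficulty concrete, here is a minimal C sketch of what the textbook lowered forms look like. The `cplx` struct and the helper names are hypothetical stand-ins, not what cplxlower emits. The sketch also assumes the relaxed semantics (no special handling of infinities and NaNs, no scaling against overflow in the division); the default C99 Annex G behaviour needs extra code that is omitted here.

```c
/* Hypothetical user-defined complex type, standing in for lowered code. */
typedef struct { double re, im; } cplx;

/* a + b: one independent scalar operation per component, so the
   complex operation is easy to recognize again. */
static cplx cplx_add(cplx a, cplx b)
{
    return (cplx){ a.re + b.re, a.im + b.im };
}

/* a * b: four multiplies, one subtract and one add (textbook form). */
static cplx cplx_mul(cplx a, cplx b)
{
    return (cplx){ a.re * b.re - a.im * b.im,
                   a.re * b.im + a.im * b.re };
}

/* a / b: multiply by conj(b) and divide by |b|^2 (textbook form,
   with no protection against overflow or underflow). */
static cplx cplx_div(cplx a, cplx b)
{
    double d = b.re * b.re + b.im * b.im;
    return (cplx){ (a.re * b.re + a.im * b.im) / d,
                   (a.im * b.re - a.re * b.im) / d };
}
```

Once a chain such as a * b * c has been expanded this way, the scalar multiplies and adds of the two products are mixed together, so a matcher has to recover each product before any re-association can be considered.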
> > 
> > > That's because, as you said, use of _Complex is limited and people
> > > are inventing their own representation.
> > 
> > Yes, this would be a step back at first, but proper support for _Complex
> > would probably be an incentive for library writers to take them into
> > account.
> > 
> > > SLP vectorization can discover some ops already with the limiting
> > > factor being that we don't specifically search for only complex
> > > operations (plus we expose the result as vector operations, requiring
> > > target support for the vector ops rather than [SD]Cmode operations).
> > 
> > Our only concern with SLP is that it only works within loops.  If we want
> > to rediscover complex numbers, we could either add a dedicated pass before
> > the SLP vectorizer or rely on match.pd?
> 
> SLP doesn't work in just loops. SLP works on scalar statements inside BBs,
> starting from sinks (constructors, stores, reductions, etc.).
> I think you're confusing Loop-Aware SLP and SLP (in GCC these are two
> different passes that share much common code).
> 

Indeed, we conflated both.  Thanks for pointing this out!
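
As an illustration of the distinction, a minimal C sketch of loop-free code of the kind basic-block SLP looks at is given below. The `cplxf` struct and the function `cmul2` are hypothetical, chosen to resemble an application that defines its own complex type instead of using _Complex. The adjacent stores to `out` are the sinks from which the analysis would start. Whether GCC actually vectorizes this, or matches it to a complex multiply, depends on the target, the optimization level and flags such as -ffast-math.

```c
/* Hypothetical user-defined complex type (not C99 _Complex). */
typedef struct { float re, im; } cplxf;

/* Two complex multiplications written as straight-line code: there is
   no loop here, only scalar statements in a single basic block ending
   in four adjacent stores. */
void cmul2(cplxf *restrict out, const cplxf *restrict a, const cplxf *restrict b)
{
    out[0].re = a[0].re * b[0].re - a[0].im * b[0].im;
    out[0].im = a[0].re * b[0].im + a[0].im * b[0].re;
    out[1].re = a[1].re * b[1].re - a[1].im * b[1].im;
    out[1].im = a[1].re * b[1].im + a[1].im * b[1].re;
}
```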

Paul

> Tamar
> > 
> > >
> > > There's the gimple-isel.cc or the widen-mul pass that perform
> > > instruction selection which could be enhanced to discover scalar
> > > [SD]Cmode operations.
> > 
> > We'll have another look there.
> > 
> > Thanks,
> > Paul
> > >
> > > Richard.
> > >
> > > > Sylvain