Re: Expansion of narrowing math built-ins into power instructions

2019-08-11 Thread Segher Boessenkool
Hi Tejas,

On Sun, Aug 11, 2019 at 10:34:26AM +0530, Tejas Joshi wrote:
> > As far as I understand that flag should set the behaviour of the fadd
> > function, not the __builtin_fadd one.  So I don't know.
> 
> According to ISO/IEC TS 18661, I am supposed to implement the fadd
> variants for folding and expand them inline.  They take double and long
> double arguments and return the sum in the appropriate narrower type,
> float or double.  As far as I know, we use __builtin_ to call the
> internal functions?  I do not know which one the plain fadd function is.

See the manual, section "Other Built-in Functions Provided by GCC":

  @opindex fno-builtin
  GCC includes built-in versions of many of the functions in the standard
  C library.  These functions come in two forms: one whose names start with
  the @code{__builtin_} prefix, and the other without.  Both forms have the
  same type (including prototype), the same address (when their address is
  taken), and the same meaning as the C library functions even if you specify
  the @option{-fno-builtin} option (@pxref{C Dialect Options}).  Many of these
  functions are only optimized in certain cases; if they are not optimized in
  a particular case, a call to the library function is emitted.
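
A minimal sketch of the two forms described in the manual excerpt, using abs
so that nothing here depends on the math library.  The function names
abs_plain, abs_builtin and abs_via_pointer are hypothetical and only for
illustration; the sketch assumes GCC or a compatible compiler that provides
__builtin_abs.

```c
#include <stdlib.h>

/* Plain form: the compiler may expand this inline, unless -fno-builtin
   is given, in which case a call to the library abs is emitted. */
int abs_plain(int x)
{
    return abs(x);
}

/* Prefixed form: keeps its built-in meaning even with -fno-builtin. */
int abs_builtin(int x)
{
    return __builtin_abs(x);
}

/* Taking the address refers to the library function, so the call through
   the pointer is an ordinary call and gives the same result. */
int abs_via_pointer(int x)
{
    int (*fp)(int) = abs;
    return fp(x);
}
```

All three return the same value for the same argument; only the way the
compiler is allowed to implement the call differs.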

> > double precision one.  But instead you want to add two double precision
> > numbers, producing a single precision one?  The fadds instruction fits
> 
> Yes.
> 
> > well to that, but you'll have to check exactly how the fadd() function
> > should behave with respect to rounding and exceptions and the like.

I read 18661-1 now...  and yup, "fadds" will work fine, and there are
no complications like this as far as I see.

For QP to either DP or SP, you can do round-to-odd followed by one of the
conversion instructions.  The ISA manual describes this; I can help you
with it, but first get DP->SP (fadd) working?
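
To illustrate why round-to-odd followed by a conversion avoids double
rounding, here is a rough sketch in portable C for the double to float case.
It is not what the hardware or any GCC pattern does: it emulates round-to-odd
in software with an error-free TwoSum step, which is an assumption of this
sketch.  It also assumes the default round-to-nearest mode, IEEE binary64
double evaluated without excess precision, and no -ffast-math.  The names
narrow_add_round_to_odd and narrow_add_naive are hypothetical.

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Add two doubles and round the exact sum to float only once. */
static float narrow_add_round_to_odd(double a, double b)
{
    double s = a + b;                 /* sum rounded to nearest double */
    if (!isfinite(s))
        return (float)s;              /* Inf or NaN: nothing to correct */

    /* TwoSum: a + b == s + e exactly (no overflow at this point). */
    double bb = s - a;
    double e = (a - (s - bb)) + (b - bb);

    if (e != 0.0) {
        uint64_t bits;
        memcpy(&bits, &s, sizeof bits);
        if ((bits & 1) == 0) {
            /* s is inexact with an even significand: move one step toward
               the exact sum, which lands on the odd neighbour. */
            int away_from_zero = (e > 0.0) == (s > 0.0);
            bits = away_from_zero ? bits + 1 : bits - 1;
            memcpy(&s, &bits, sizeof s);
        }
    }
    return (float)s;                  /* the only rounding that matters */
}

/* Naive version: rounds to double first, then to float (double rounding). */
static float narrow_add_naive(double a, double b)
{
    return (float)(a + b);
}
```

With a = 1.0 and b = 0x1p-24 + 0x1p-76 the naive version returns 1.0f,
because the double sum lands exactly on a float tie, while the round-to-odd
version returns 1.0f + 0x1p-23f, the correctly rounded result.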

For the non-QP long doubles we have...  There is the option of using DP
for it, which isn't standard-compliant; many other archs have it too,
and it is simple anyway, because you have all the code for the operations
already.  You can mostly just ignore this option.

For double-double...  Well firstly, double-double is on the way out, so
adding new features to it is pretty useless?  Just ignore it unless you
have time left, I'd say.

> Joseph's initial mail, which describes what should be carried out in
> the course of the project, covers rounding and exceptions.  I have
> strictly followed this description for my folding patch:
> 
> * The narrowing functions, e.g. fadd, faddl, daddl, are a bit different
> from most other built-in math.h functions because the return type is
> different from the argument types.  You could start by adding them to
> builtins.def similarly to roundeven (with new macros to handle adding such
> functions for relevant pairs of _FloatN, _FloatNx types).  These functions
> could be folded for constant arguments only if the result is exact, or if
> -fno-rounding-math -fno-trapping-math (and -fno-math-errno if the result
> involves overflow / underflow).
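
The "folded only if the result is exact" condition can be sketched in plain C
for the double to float case.  This is not GCC's folding code, which works on
its internal real-number representation; it is a hypothetical helper,
narrowing_add_is_exact, assuming round-to-nearest, IEEE binary64 double
without excess precision, and no -ffast-math.

```c
#include <math.h>
#include <stdbool.h>

/* True if a + b, computed exactly, is representable as a float, so that
   a narrowing add could be folded without depending on the rounding mode
   or raising inexact. */
static bool narrowing_add_is_exact(double a, double b)
{
    double s = a + b;
    if (!isfinite(s))
        return false;                 /* overflow, Inf or NaN: do not fold */

    /* TwoSum: e is the exact rounding error of the double addition. */
    double bb = s - a;
    double e = (a - (s - bb)) + (b - bb);
    if (e != 0.0)
        return false;                 /* the sum is not even an exact double */

    float f = (float)s;
    return (double)f == s;            /* narrowing must not lose anything */
}
```

For example, 0.5 + 0.25 is exact in float, while 1.0 + 0x1p-30 is an exact
double but needs more significand bits than float has.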

For Power, all five basic operations (add, sub, mul, div, fma) work fine
wrt rounding mode if using the fadds etc. insns, for DP->SP.  All
exceptions work as expected, except maybe underflow and overflow, but
18661 doesn't require much at all for those anyway :-)


Segher


Re: Expansion of narrowing math built-ins into power instructions

2019-08-11 Thread Tejas Joshi
Hello.

> with it, but first get DP->SP (fadd) working?

Can you please review what I have been trying, and the issues I am
facing, in this patch:


Thanks,
Tejas


On Sun, 11 Aug 2019 at 12:50, Segher Boessenkool
 wrote:
> [...]


SSA Trees and Lockless Implementation

2019-08-11 Thread xerofoify
Greetings,
After briefly looking at the SSA dominator trees, it seems possible to
make them lockless, at least partly, for insertion/deletion/removal of
both standard SSA nodes and PHI nodes.  Am I right in assuming this, or
are there some subtle details that I'm missing in my reading of the code?

Thanks,

Nick


Re: Expansion of narrowing math built-ins into power instructions

2019-08-11 Thread Segher Boessenkool
Hi Tejas,

On Sat, Aug 10, 2019 at 04:00:53PM +0530, Tejas Joshi wrote:
> +(define_expand "add_truncdfsf3"
> +  [(set (float_extend:DF (match_operand:SF 0 "gpc_reg_operand"))
> + (plus:DF (match_operand:DF 1 "gpc_reg_operand")
> +  (match_operand:DF 2 "gpc_reg_operand")))]
> +  "TARGET_HARD_FLOAT"
> +  "")

float_extend on the LHS is never correct.  I think the following should
work, never mind that it looks like it does double rounding, because it
doesn't (famous last words ;-) ):

(define_expand "add_truncdfsf3"
  [(set (match_operand:SF 0 "gpc_reg_operand")
        (float_truncate:SF
          (plus:DF (match_operand:DF 1 "gpc_reg_operand")
                   (match_operand:DF 2 "gpc_reg_operand"))))]
  "TARGET_HARD_FLOAT"
  "")

> +(define_insn "*add_truncdfsf3_fpr"
> +  [(set (float_extend:DF (match_operand:SF 0 "gpc_reg_operand" "="))
> + (plus:DF (match_operand:DF 1 "gpc_reg_operand" "%")
> +  (match_operand:DF 2 "gpc_reg_operand" "")))]
> +  "TARGET_HARD_FLOAT"
> +  "fadd %0,%1,%2"
> +  [(set_attr "type" "fp")])

The constraints should be "f", "%d", "d", respectively.  A <...> attribute
says to display something for the mode in a mode iterator.  There is no mode
iterator here.  (In what you copied this from, there was SFDF).

You want to output "fadds", not "fadd".

Maybe it is easier to immediately write the VSX scalar version for this
as well?  That's xsaddsp.  Oh, and you need to restrict all of this to
more recent CPUs, we'll have to do some new TARGET_* flag for that I
think.

Finally: please send patches to gcc-patches@ (not gcc@).

Thanks,


Segher


gcc-10-20190811 is now available

2019-08-11 Thread gccadmin
Snapshot gcc-10-20190811 is now available on
  ftp://gcc.gnu.org/pub/gcc/snapshots/10-20190811/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 10 SVN branch
with the following options: svn://gcc.gnu.org/svn/gcc/trunk revision 274268

You'll find:

 gcc-10-20190811.tar.xz   Complete GCC

  SHA256=9dc8980af4efc54ce52f29e9f1c00dea238ec2b70264814df40d3cd08feb2fe3
  SHA1=89bfdb7b220c6d042de7094267f5346bded3663f

Diffs from 10-20190804 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-10
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.


Re: Indirect memory addresses vs. lra

2019-08-11 Thread John Darrington
On Sat, Aug 10, 2019 at 11:12:18AM -0500, Segher Boessenkool wrote:
 Hi!
 
 On Sat, Aug 10, 2019 at 08:05:53AM +0200, John Darrington wrote:
 >   Choosing alt 5 in insn 14:  (0) m  (1) m {*movsi}
 >14: [r40:PSI+0x20]=[r41:PSI]
 > Inserting insn reload before:
 >48: r40:PSI=r34:PSI
 >49: r41:PSI=[y:PSI+0x2f]
 
 insn 14 is a mem-to-mem move (another feature not many more modern /
 more RISCy CPUs have).  That requires both of your address registers.
 So far, so good.  The reloads (insn 48 and 49) require address
 registers themselves; that isn't necessarily a problem either.

So far as I can see, insn 48 is completely redundant.  It's copying a
pseudo reg (74) into another pseudo reg (40).
This is pointless and a waste, since insn 14 does not modify 74.
I don't understand why lra feels the need to do it.

If lra knew about (mem (mem ...)) style addressing, then insn 49 would
also be redundant (which is why I raised the topic).

In summary, what we have is:

(insn 48 84 49 2 (set (reg/f:PSI 40 [34])
        (reg/f:PSI 74 [34]))
     (nil))
(insn 49 48 14 2 (set (reg:PSI 41)
        (mem/f/c:PSI (plus:PSI (reg/f:PSI 9 y)
                (const_int 47 [0x2f])) [3 p+0 S4 A8]))
     (nil))
(insn 14 49 15 2 (set (mem:SI (plus:PSI (reg/f:PSI 40 [34])
                (const_int 32 [0x20])) [2  S4 A64])
        (mem:SI (reg:PSI 41) [2 *p_5(D)+0 S4 A8]))

where, like you say, insns 48 and 49 are reloads.  But these two reloads 
are unnecessary and cause the machine to run out of PSImode registers.
The above could be done more simply and more efficiently as:

(insn 14 11 15 2 (set
        (mem:SI (plus:PSI (reg/f:PSI 74 [34])
                (const_int 32 [0x20])) [2  S4 A64])
        (mem/f/c:PSI (mem:PSI (plus:PSI (reg/f:PSI 9 y)
                    (const_int 47 [0x2f])) [3 p+0 S4 A8])))


This is exactly what we had before lra messed with things.  It can be
represented in the ISA with one assembler instruction: 
  mov.p (32, x), [47, y]
and if I'm not mistaken, alternative 5 of my "movpsi" pattern should do
this just fine.


 But
 this requires careful juggling.  Maybe you will need some backend code

Could you give a hint as to which set of hooks/constraints/predicates
this backend code should go into?
 
