Re: Indirect memory addresses vs. lra

2019-08-12 Thread Segher Boessenkool
Hi John,

On Mon, Aug 12, 2019 at 08:47:43AM +0200, John Darrington wrote:
> On Sat, Aug 10, 2019 at 11:12:18AM -0500, Segher Boessenkool wrote:
>  On Sat, Aug 10, 2019 at 08:05:53AM +0200, John Darrington wrote:
>  > Choosing alt 5 in insn 14:  (0) m  (1) m {*movsi}
>  >14: [r40:PSI+0x20]=[r41:PSI]
>  > Inserting insn reload before:
>  >48: r40:PSI=r34:PSI
>  >49: r41:PSI=[y:PSI+0x2f]
>  
>  insn 14 is a mem-to-mem move (another feature not many more modern /
>  more RISCy CPUs have).  That requires both of your address registers.
>  So far, so good.  The reloads (insn 48 and 49) require address
>  registers themselves; that isn't necessarily a problem either.
> 
> So far as I can see, insn 48 is completely redundant.  It's copying a
> pseudo reg (74) into another pseudo reg (40).
> This is pointless and a waste, since insn 14 does not modify 74.
> I don't understand why lra feels the need to do it.

LRA always does this, I think...  it reloads all inputs to all insns
that may need reloading.  It later optimises most of that away again,
but this gives it a lot of freedom to move things around.

Or that is what it always looked like to me.  I haven't looked at the
code to see if that is the real reason, blush.

> If lra knew about (mem (mem ...)) style addressing, then insn 49 would
> also be redundant (which is why I raised the topic).

Yes.  But it probably should be able to deal with things like this, too,
or some other testcases will die a horrible death.

> In summary, what we have is:
> 
> (insn 48 84 49 2 (set (reg/f:PSI 40 [34])
> (reg/f:PSI 74 [34]))
>  (nil))
> (insn 49 48 14 2 (set (reg:PSI 41)
> (mem/f/c:PSI (plus:PSI (reg/f:PSI 9 y)
> (const_int 47 [0x2f])) [3 p+0 S4 A8]))
>  (nil))
> (insn 14 49 15 2 (set (mem:SI (plus:PSI (reg/f:PSI 40 [34])
> (const_int 32 [0x20])) [2  S4 A64])
> (mem:SI (reg:PSI 41) [2 *p_5(D)+0 S4 A8])) 
> 
> where, like you say, insns 48 and 49 are reloads.  But these two reloads 
> are unnecessary and cause the machine to run out of PSImode registers.

Anyway, please have patience, and see what Vladimir comes up with.  These
things take time.


Segher


GCC 9.2 Released

2019-08-12 Thread Jakub Jelinek
The GNU Compiler Collection version 9.2 has been released.

GCC 9.2 is a bug-fix release from the GCC 9 branch
containing important fixes for regressions and serious bugs in
GCC 9.1 with more than 68 bugs fixed since the previous release.
This release is available from the FTP servers listed at:

  http://www.gnu.org/order/ftp.html

Please do not contact me directly regarding questions or comments
about this release.  Instead, use the resources available from
http://gcc.gnu.org.

As always, a vast number of people contributed to this GCC release
-- far too many to thank them individually!


GCC 9.3 Status Report (2019-08-12)

2019-08-12 Thread Jakub Jelinek
Status
==

GCC 9.2 has been released and the branch is again open for regression
and documentation fixes.  History makes us expect a GCC 9.3 release
at the end of this or the beginning of next year.


Quality Data


Priority  #   Change from last report
---   ---
P10
P2  198
P3   51   +   2
P4  141
P5   24
---   ---
Total P1-P3 249   +   2
Total   414   +   2


Previous Report
===

https://gcc.gnu.org/ml/gcc/2019-08/msg00016.html


Re: Indirect memory addresses vs. lra

2019-08-12 Thread Vladimir Makarov



On 2019-08-10 2:05 a.m., John Darrington wrote:

On Fri, Aug 09, 2019 at 01:34:36PM -0400, Vladimir Makarov wrote:
  
  If you provide LRA dump for such test (it is better to use

  -fira-verbose=15 to output full RA info into stderr), I probably could
  say more.

I've attached such a dump (generated from 
gcc/testsuite/gcc.c-torture/compile/pr53410-2.c).


Unfortunately, this info is not enough for me to say what is the 
problem.  I only found suspicious that LRA is trying to assign a few 
registers to a pseudo register and fails even though these registers are 
not assigned to anything.  Probably HARD_REGNO_MODE_OK prevents this.  
So it would be interesting to know how many registers of Pmode are 
actually available.


In any case I'll try to look at this problem more on this week using 
your built gcc on gcc135.


  
  The less regs the architecture has, thoke easier to run into such error

  message if something described wrong in the back-end.?? I see your
  architecture is 16-bit micro-controller with only 8 regs, some of them is
  specialized.?? So your architecture is really register constrained.

That's not quite correct.  It is a 24-bit micro-controller (the address
space is 24 bits wide).  There are 2 address registers (plus stack
pointer and program counter) and there are 8 general purpose data
registers (of differing sizes).
  



Re: Expansion of narrowing math built-ins into power instructions

2019-08-12 Thread Tejas Joshi
Hi,
I have the following code in my rs6000.md (I haven't used new TARGET_* yet) :

(define_expand "add_truncdfsf3"
  [(set (match_operand:SF 0 "gpc_reg_operand")
   (float_truncate:SF
   (plus:DF (match_operand:DF 1 "gpc_reg_operand")
(match_operand:DF 2 "gpc_reg_operand"]
  "TARGET_HARD_FLOAT"
  "")

(define_insn "*add_truncdfsf3_fpr"
  [(set (match_operand:SF 0 "gpc_reg_operand" "=f,wa")
   (float_truncate:SF
   (plus:DF (match_operand:DF 1 "gpc_reg_operand" "%d,wa")
(match_operand:DF 2 "gpc_reg_operand" "d,wa"]
  "TARGET_HARD_FLOAT"
  "@
   fadds %0,%1,%2
   xsaddsp %x0,%x1,%x2"
  [(set_attr "type" "fp")])

with following optab in optabs.def :

OPTAB_CD(fadd_optab, "add_trunc$b$a3") (what is the
difference between $b$a and $a$b?)

I have also tried adding fadd, add_truncdfsf3 in rs6000-builtin.def,
examined rtl dumps multiple times but couldn't get fadd to be
exapanded. What am I missing here?

Thanks,
Tejas

On Sun, 11 Aug 2019 at 22:29, Segher Boessenkool
 wrote:
>
> Hi Tejas,
>
> On Sat, Aug 10, 2019 at 04:00:53PM +0530, Tejas Joshi wrote:
> > +(define_expand "add_truncdfsf3"
> > +  [(set (float_extend:DF (match_operand:SF 0 "gpc_reg_operand"))
> > + (plus:DF (match_operand:DF 1 "gpc_reg_operand")
> > +  (match_operand:DF 2 "gpc_reg_operand")))]
> > +  "TARGET_HARD_FLOAT"
> > +  "")
>
> float_extend on the LHS is never correct.  I think the following should
> work, never mind that it looks like it does double rounding, because it
> doesn't (famous last words ;-) ):
>
> (define_expand "add_truncdfsf3"
>   [(set (match_operand:SF 0 "gpc_reg_operand")
> (float_truncate:SF
>   (plus:DF (match_operand:DF 1 "gpc_reg_operand")
>(match_operand:DF 2 "gpc_reg_operand"]
>   "TARGET_HARD_FLOAT"
>   "")
>
> > +(define_insn "*add_truncdfsf3_fpr"
> > +  [(set (float_extend:DF (match_operand:SF 0 "gpc_reg_operand" "="))
> > + (plus:DF (match_operand:DF 1 "gpc_reg_operand" "%")
> > +  (match_operand:DF 2 "gpc_reg_operand" "")))]
> > +  "TARGET_HARD_FLOAT"
> > +  "fadd %0,%1,%2"
> > +  [(set_attr "type" "fp")])
>
> The constraints should be "f", "%d", "d", respectively.   says to
> display something for the mode in a mode iterator.  There is no mode
> iterator here.  (In what you copied this from, there was SFDF).
>
> You want to output "fadds", not "fadd".
>
> Maybe it is easier to immediately write the VSX scalar version for this
> as well?  That's xsaddsp.  Oh, and you need to restrict all of this to
> more recent CPUs, we'll have to do some new TARGET_* flag for that I
> think.
>
> Finally: please send patches to gcc-patches@ (not gcc@).
>
> Thanks,
>
>
> Segher


Re: Expansion of narrowing math built-ins into power instructions

2019-08-12 Thread Segher Boessenkool
On Mon, Aug 12, 2019 at 11:01:11PM +0530, Tejas Joshi wrote:
> I have the following code in my rs6000.md (I haven't used new TARGET_* yet) :
> 
> (define_expand "add_truncdfsf3"
>   [(set (match_operand:SF 0 "gpc_reg_operand")
>(float_truncate:SF
>(plus:DF (match_operand:DF 1 "gpc_reg_operand")
> (match_operand:DF 2 "gpc_reg_operand"]
>   "TARGET_HARD_FLOAT"
>   "")
> 
> (define_insn "*add_truncdfsf3_fpr"
>   [(set (match_operand:SF 0 "gpc_reg_operand" "=f,wa")
>(float_truncate:SF
>(plus:DF (match_operand:DF 1 "gpc_reg_operand" "%d,wa")
> (match_operand:DF 2 "gpc_reg_operand" "d,wa"]
>   "TARGET_HARD_FLOAT"
>   "@
>fadds %0,%1,%2
>xsaddsp %x0,%x1,%x2"
>   [(set_attr "type" "fp")])

Those look fine.  You can also merge them into one:

(define_insn "add_truncdfsf3"
  [(set (match_operand:SF 0 "gpc_reg_operand" "=f,wa")
(float_truncate:SF
  (plus:DF (match_operand:DF 1 "gpc_reg_operand" "%d,wa")
   (match_operand:DF 2 "gpc_reg_operand" "d,wa"]
  "TARGET_HARD_FLOAT"
  "@
   fadds %0,%1,%2
   xsaddsp %x0,%x1,%x2"
  [(set_attr "type" "fp")])

> with following optab in optabs.def :
> 
> OPTAB_CD(fadd_optab, "add_trunc$b$a3") (what is the
> difference between $b$a and $a$b?)

Which of the two modes becomes $a and which becomes $b?  It depends on
the definition of fadd_optab what order is expected, I think.


Segher


Re: Expansion of narrowing math built-ins into power instructions

2019-08-12 Thread Joseph Myers
On Mon, 12 Aug 2019, Segher Boessenkool wrote:

> (define_insn "add_truncdfsf3"
>   [(set (match_operand:SF 0 "gpc_reg_operand" "=f,wa")
>   (float_truncate:SF
> (plus:DF (match_operand:DF 1 "gpc_reg_operand" "%d,wa")
>  (match_operand:DF 2 "gpc_reg_operand" "d,wa"]

That sort of pattern is incorrect for a fused operation such as fadd, 
because combine could match it for code that is supposed to do separate 
addition and narrowing conversion.  The RTL needs to be something that 
does *not* match the combination of separate operations (just as fma has 
its own RTL, and a separate pass is responsible for converting separate 
operations to fused ones in the -ffp-contract=fast case where it's 
permitted).

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: Expansion of narrowing math built-ins into power instructions

2019-08-12 Thread Segher Boessenkool
On Mon, Aug 12, 2019 at 09:20:18PM +, Joseph Myers wrote:
> On Mon, 12 Aug 2019, Segher Boessenkool wrote:
> 
> > (define_insn "add_truncdfsf3"
> >   [(set (match_operand:SF 0 "gpc_reg_operand" "=f,wa")
> > (float_truncate:SF
> >   (plus:DF (match_operand:DF 1 "gpc_reg_operand" "%d,wa")
> >(match_operand:DF 2 "gpc_reg_operand" "d,wa"]
> 
> That sort of pattern is incorrect for a fused operation such as fadd, 
> because combine could match it for code that is supposed to do separate 
> addition and narrowing conversion.  The RTL needs to be something that 
> does *not* match the combination of separate operations (just as fma has 
> its own RTL, and a separate pass is responsible for converting separate 
> operations to fused ones in the -ffp-contract=fast case where it's 
> permitted).

Ugh, we allow disabling contraction, I forgot.  Rats.


Segher