Re: Minor correction

2024-06-14 Thread Gerald Pfeifer
On Sun, 2 Jul 2023, ppw0--- via Gcc wrote:
> just wanted to let you know that while going over 
> https://gcc.gnu.org/onlinedocs/gcc/x86-Options.html , I've noticed 
> certain sections, namely haswell, broadwell, skylake, knl, knm, 
> skylake-avx512, cannonlake, icelake-client, icelake-server, cascadelake, 
> cooperlake, tigerlake, sapphirerapids, rocketlake and graniterapids, 
> have the MOVBE instruction mentioned twice.

Thank you for your report and sorry it took us a while to get back to 
this; I'll fix this in a minute.

Gerald

PS: A more descriptive subject might have helped attract the attention 
of the respective maintainers.


Re: GCC 12.4 Release Candidate available from gcc.gnu.org

2024-06-14 Thread Jonathan Wakely via Gcc
On Thu, 13 Jun 2024 at 09:14, Richard Biener via Gcc  wrote:
>
>
> The first release candidate for GCC 12.4 is available from
>
> https://gcc.gnu.org/pub/gcc/snapshots/12.4.0-RC-20240613/
>
> and shortly its mirrors.  It has been generated from git commit
> r12-10557-g6693b1f3929771.
>
> I have so far bootstrapped and tested the release candidate on
> x86_64-linux.
> Please test it and report any issues to bugzilla.
>
> If all goes well, we'd like to release 12.4 on Thursday, June 20th.

I've just been informed of a regression for --disable-hosted-libstdcxx
(i.e. freestanding) builds which I've fixed for trunk, 14 and 13 with:
https://gcc.gnu.org/pipermail/gcc-patches/2024-June/654726.html

It would be nice to fix it for gcc 12.4, although it's not essential
(it was already broken for 14.1 and 13.3).


Re: How to target a processor with very primitive addressing modes?

2024-06-14 Thread Georg-Johann Lay

On 10.06.24 at 18:35, Paul Koning wrote:

On Jun 10, 2024, at 11:48 AM, Georg-Johann Lay  wrote:
On 08.06.24 at 11:32, Mikael Pettersson via Gcc wrote:

On Thu, Jun 6, 2024 at 8:59 PM Dimitar Dimitrov  wrote:

Have you tried defining TARGET_LEGITIMIZE_ADDRESS for your target? From
a quick search I see that the iq2000 and rx backends are rewriting some
PLUS expression addresses with an insn sequence to calculate the address.
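
For reference, a minimal sketch of such a hook (for a hypothetical target
"foo"; not the actual iq2000 or rx code) might look like:

/* Rewrite a reg+const address the machine cannot encode into an explicit
   add into a fresh pseudo, so that only a bare REG reaches the memory
   access.  Sketch only; this would live in the target's .cc file with the
   usual includes.  */

static rtx
foo_legitimize_address (rtx x, rtx /*oldx*/, machine_mode /*mode*/)
{
  if (GET_CODE (x) == PLUS
      && REG_P (XEXP (x, 0))
      && CONST_INT_P (XEXP (x, 1)))
    {
      /* Emit  tmp = base + offset  and use tmp as the address.  */
      rtx tmp = gen_reg_rtx (Pmode);
      emit_insn (gen_rtx_SET (tmp, gen_rtx_PLUS (Pmode, XEXP (x, 0),
                                                 XEXP (x, 1))));
      return tmp;
    }
  return x;
}

#undef  TARGET_LEGITIMIZE_ADDRESS
#define TARGET_LEGITIMIZE_ADDRESS foo_legitimize_address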

I have partial success.
The key was to define both TARGET_LEGITIMATE_ADDRESS_P and an
addptr3 insn.
I had tried TARGET_LEGITIMATE_ADDRESS_P before, together with various
combinations of TARGET_LEGITIMIZE_ADDRESS and
LEGITIMIZE_RELOAD_ADDRESS, but they all threw gcc into reload loops.
My add3 insn clobbers the CC register. The docs say to define
addptr3 in this case, and that eliminated the reload loops.
The issue now is that the machine cannot perform an add without
clobbering the CC register, so I'll have to hide that somehow. When
emitting the asm code, can one check if the CC register is LIVE-OUT
from the insn? If it isn't, I shouldn't have to generate code to
preserve it.
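
(For illustration, what I have in mind is a helper the output code could
call, roughly like the sketch below; CC_REGNUM is a placeholder for this
hypothetical target's flags register, and it assumes the REG notes are
still valid at asm-output time.)

/* Return true if the CC register need not be preserved past INSN:
   a REG_UNUSED note means the CC value set here is never read, and a
   REG_DEAD note means a CC use in this insn is the last one.  */

static bool
cc_dead_after_insn_p (rtx_insn *insn)
{
  return (find_regno_note (insn, REG_UNUSED, CC_REGNUM) != NULL_RTX
          || find_regno_note (insn, REG_DEAD, CC_REGNUM) != NULL_RTX);
}
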
/Mikael


There is a different approach, like the one taken by AVR (and maybe some
more targets):

Don't introduce CC until after reload, i.e. keep cbranch insns
and only split them to compare + branch after reload in the
first post reload split pass.

It's some effort because the number of insns is effectively
doubled: one pre-reload version of the insn without CC,
and a post-reload version with CC.  On AVR, most insns don't
set CCmode in a usable way, so that works, though not as
well as the killed cc0 representation did.


Yes, PDP11 does this also.  And it uses define_subst to create two post-reload 
flavors, one that clobbers CC, one that sets it.  (The CC set from most 
instructions is pretty usable in the sense that it's generally what's needed 
for a compare against zero.)


I am not sure, though, whether TARGET_LEGITIMIZE_ADDRESS works in
all situations, in particular when it comes to accessing frame
locations. It might be required to fake-supply reg+offset
addressing, and then split that post-reload.

An example is the Reduced Tiny cores (-mavrtiny) of the AVR
port that only support POST_INC or byte addressing.  Splitting
locations after reload improved code quality a lot.


Did you find a good way to handle POST_INC or similar modes in LRA?  PDP11 
would like to use that (and PRE_DEC) but it seems that LRA is even less willing 
than recent versions of old reload to generate such modes.

paul


avr still uses reload, and even there POST_INC-only and REG-only
addressing isn't well supported.  I'd guess that LRA isn't going
to be an improvement in that regard.

Even when you expand to always use POST_INC, the rtl passes will
ignore that and load the address into a new register and do
arithmetic there instead of using the post-inc address.

The problem with X-only addressing is that after regalloc, you
end up with code that does / must do fake addressing modes like

;; insn1: reg1 = FP[off1]
FP += off1 * size1
reg1 = *FP++
FP -= (off1+1) * size1

;; insn2: reg2 = FP[off2]
FP += off2 * size2
reg2 = *FP++
FP -= (off2+1) * size2

You cannot change the frame pointer like that in regalloc
or LEGITIMATE_ADDRESS etc., and when you post-reload split
into real instructions in order to combine

FP -= (off1+1) * size1
FP += off2 * size2

into one PLUS insn, you run into bugs like

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114208

where DSE deletes a store that is NOT dead.

For POST_INC optimization in general, there was an announcement
quite some time ago that someone would have a go at
improving pre- / post-modify situations, but I don't know
whether that was a dead end somewhere in the loop passes...

The old lreg/greg allocator had some astonishing tricks up
its sleeves, but those days are over since SSA.

Johann


Re: “ira_may_move_out_cost” vs “ira_register_move_cost”

2024-06-14 Thread Vladimir Makarov via Gcc



On 6/13/24 00:34, Surya Kumari Jangala wrote:

Hi Vladimir,
With my patch for PR111673 (scale the spill/restore cost of a callee-save
register with the frequency of the entry bb in the routine assign_hard_reg():
https://gcc.gnu.org/pipermail/gcc-patches/2023-October/631849.html), the
following Linaro aarch64 test failed due to an extra 'mov' instruction:

__SVBool_t __attribute__((noipa))
callee_pred (__SVBool_t p0, __SVBool_t p1, __SVBool_t p2, __SVBool_t p3,
  __SVBool_t mem0, __SVBool_t mem1)
{
   p0 = svbrkpa_z (p0, p1, p2);
   p0 = svbrkpb_z (p0, p3, mem0);
   return svbrka_z (p0, mem1);
}

With trunk:
 addvl   sp, sp, #-1
 str p14, [sp]
 str p15, [sp, #1, mul vl]
 ldr p14, [x0]
 ldr p15, [x1]
 brkpa   p0.b, p0/z, p1.b, p2.b
 brkpb   p0.b, p0/z, p3.b, p14.b
 brka    p0.b, p0/z, p15.b
 ldr p14, [sp]
 ldr p15, [sp, #1, mul vl]
 addvl   sp, sp, #1
 ret

With patch:
   addvl   sp, sp, #-1
 str p14, [sp]
 str p15, [sp, #1, mul vl]
 mov p14.b, p3.b  // extra mov insn
 ldr p15, [x0]
 ldr p3, [x1]
 brkpa   p0.b, p0/z, p1.b, p2.b
 brkpb   p0.b, p0/z, p14.b, p15.b
 brka    p0.b, p0/z, p3.b
 ldr p14, [sp]
 ldr p15, [sp, #1, mul vl]
 addvl   sp, sp, #1
 ret

p0-p15 are predicate registers on aarch64 where p0-p3 are caller-save while
p4-p15 are callee-save.

The input RTL for ira pass:

1:   set r112, r68   # p0
2:   set r113, r69   # p1
3:   set r114, r70   # p2
4:   set r115, r71   # p3
5:   set r116, x0    # mem0, the 5th parameter
6:   set r108, mem(r116)
7:   set r117, x1    # mem1, the 6th parameter
8:   set r110, mem(r117)
9:   set r100, unspec_brkpa(r112, r113, r114)
10:  set r101, unspec_brkpb(r100, r115, r108)
11:  set r68,  unspec_brka(r101, r110)
12:  ret r68

Here, r68-r71 represent predicate hard regs p0-p3.
With my patch, r113 and r114 are being assigned memory by ira, but with trunk
they are assigned registers. This in turn leads to a difference in decisions
taken by LRA, ultimately leading to the extra mov instruction.

Register assignment w/ patch:

   Popping a5(r112,l0)  -- assign reg p0
   Popping a2(r100,l0)  -- assign reg p0
   Popping a0(r101,l0)  -- assign reg p0
   Popping a1(r110,l0)  -- assign reg p3
   Popping a3(r115,l0)  -- assign reg p2
   Popping a4(r108,l0)  -- assign reg p1
   Popping a6(r113,l0)  -- (memory is more profitable 8000 vs 9000) spill!
   Popping a7(r114,l0)  -- (memory is more profitable 8000 vs 9000) spill!
   Popping a8(r117,l0)  -- assign reg 1
   Popping a9(r116,l0)  -- assign reg 0


With the patch, the cost of memory is 8000, which is less than the cost of a
callee-save register (9000), and hence memory is assigned to r113 and r114.
It is interesting to see that all the callee-save registers are free but none
is chosen.

The two instructions in which r113 is referenced are:
2:  set r113, r69   # p1
9:  set r100, unspec_brkpa(r112, r113, r114)

IRA computes the memory cost of an allocno in find_costs_and_classes(). In
this routine IRA scans each insn and computes memory cost and cost of
register classes for each operand in the insn.

So for insn 2, the memory cost of r113 is set to 4000 because this is the
cost of storing r69 to memory if r113 is assigned to memory. The possible
register classes of r113 are ALL_REGS, PR_REGS, PR_HI_REGS and PR_LO_REGS.
The cost of moving r69 to r113 is computed for each of these possible
register classes: if r113 is assigned a reg in ALL_REGS, the cost of the
move is 18000, while if r113 is assigned a register from any of the
predicate register classes, the cost of the move is 2000. This cost is
obtained from the array “ira_register_move_cost”. After scanning insn 9,
the memory cost of r113 is increased to 8000 because if r113 is assigned
memory, we need a load to read the value before using it in the
unspec_brkpa. But the register class cost is unchanged.

Later in setup_allocno_class_and_costs(), the ALLOCNO_CLASS of r113 is set
to PR_REGS. The ALLOCNO_MEMORY_COST of r113 is set to 8000. The
ALLOCNO_HARD_REG_COSTS of each register in PR_REGS is set to 2000.

During coloring, when r113 has to be assigned a register, the cost of the
callee-save registers in PR_REGS is increased by the spill/restore cost. So
the cost of callee-save registers increases from 2000 to 9000. All the
caller-save registers have been assigned to other allocnos, so for r113
memory is assigned, as memory is cheaper than callee-save registers.
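
Spelling out the comparison for r113 (these are just the numbers from the
dump above; the 7000 spill/restore component is inferred as 9000 - 2000):

/* Not IRA code, only the r113 cost comparison restated.  */
const int reg_move_cost    = 2000;           /* ALLOCNO_HARD_REG_COSTS for a PR_REGS reg */
const int spill_restore    = 9000 - 2000;    /* callee-save save/restore penalty added in coloring */
const int callee_save_cost = reg_move_cost + spill_restore;   /* 9000 */
const int memory_cost      = 4000 + 4000;    /* store for insn 2 + load for insn 9 */
const bool r113_spilled    = memory_cost < callee_save_cost;  /* true: memory wins */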

However, for r108, the cost is 0 for register classes PR_REGS, PR_HI_REGS
and PR_LO_REGS.

References of r108:
6:   set r108, mem(r116)
10:  set r101, unspec_brkpb(r100, r115, r108)

It was surprising that while for 

gcc-13-20240614 is now available

2024-06-14 Thread GCC Administrator via Gcc
Snapshot gcc-13-20240614 is now available on
  https://gcc.gnu.org/pub/gcc/snapshots/13-20240614/
and on various mirrors, see https://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 13 git branch
with the following options: git://gcc.gnu.org/git/gcc.git branch 
releases/gcc-13 revision 24dbdd20dcbd4c560f852cce51aa0754464476f5

You'll find:

 gcc-13-20240614.tar.xz   Complete GCC

  SHA256=f46dbd52e17884a0dee91c1bd4bdca8bb993a1e88066e98093c6bc7fac1ecdd2
  SHA1=f7014ca559392cb9ef30739538f9c75c41c4a031

Diffs from 13-20240607 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-13
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.