Sourceware infrastructure updates for Q4 2023

2023-11-28 Thread Mark Wielaard
Sourceware infrastructure community updates for Q4 2023

- 6 months with the Software Freedom Conservancy
- Sourceware @ FOSDEM
- OSUOSL provides additional, larger arm64 and x86_64 buildbot servers
- No more From rewriting for patches mailing lists

= 6 months with the Software Freedom Conservancy

Sourceware thanks Conservancy for its support and encourages the
community to support Conservancy in return.

Sourceware has been a Software Freedom Conservancy member project for
just six months. But the story started long before that, and a lot has
happened in that time:

https://sfconservancy.org/blog/2023/nov/27/sourceware-thanks-conservancy/

We hope the community will support the Software Freedom Conservancy
2023 Fundraiser and become a Conservancy Sustainer
https://sfconservancy.org/sustainer

= Sourceware @ FOSDEM

Various Sourceware projects will be present at FOSDEM, along with some
overseers and, of course, Conservancy staff.

Get your talk submissions in before the end of the week (December 1st)
for these developer rooms:

Debuggers and Analysis tools
gdb, libabigail, systemtap, valgrind, binutils, elfutils, gnupoke, cgen
https://inbox.sourceware.org/6a2e8cbf-0d63-24e7-e3c2-c3d286e2e...@redhat.com/

GCC compiler devroom
gcc, binutils, glibc, newlib
https://inbox.sourceware.org/36fadb0549c3dca716eb3b923d66a11be2c67a61.ca...@redhat.com/

And if you would like to organize a virtual mini-BoF around some topic
or project, the @conservancy BBB server is available to all Sourceware
projects.

https://inbox.sourceware.org/9ca90cd013675a960d47ee09fa4403f69405e9f2.ca...@klomp.org/

= OSUOSL provides additional, larger arm64 and x86_64 buildbot servers

There have been complaints about overloaded builders, so OSUOSL has
provided us with another arm64 server and another x86_64 server. The
new servers take the larger gcc and glibc builds, so the other
builders can do quicker (smaller) CI builds without having to wait on
the big jobs.

This also frees up the other container builders for more automated
jobs, like the recently added checker for autotools-generated files in
gcc, binutils and gdb:
https://inbox.sourceware.org/20231115194803.gw31...@gnu.wildebeest.org/

Please contact the builder project at build...@sourceware.org if you
want to run automated jobs on https://builder.sourceware.org/

= No more From rewriting for patches mailing lists

Because of DKIM, strict DMARC policies and an old mailman setup,
Sourceware mailing lists used From rewriting.

No more! We upgraded mailman and gave up subject prefixes, mail
footers, HTML stripping and reply-to mangling.

After tests on the libc-alpha and gcc-patches mailing lists showed
that avoiding From rewriting worked out nicely, we enabled the same
settings on some other mailing lists: the gcc patches lists for
libstdc++, libgccjit, fortran and gcc-rust, and the lists of projects
that use patchwork: newlib, elfutils, libabigail and gdb.

This hopefully makes mailing patches, and running git am on them, a
bit nicer.
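With clean From headers, a patch saved from the list applies with git am and keeps the author's real identity. A minimal round-trip sketch in a scratch repository (the names and addresses below are made up for illustration):

```shell
# Sketch: why unmangled From headers matter for "git am".
# The author identity below is made up for illustration.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "List Reader"
git config user.email "reader@example.org"
git commit -q --allow-empty -m "base"

# Author a change under a different identity, as a list contributor would.
echo "hello" > file.txt
git add file.txt
git -c user.name="Jane Hacker" -c user.email="jane@example.org" \
    commit -q -m "demo: add file.txt"

# format-patch writes the author into the From: header; with no From
# rewriting on the list, that header reaches the recipient intact.
git format-patch -1 --stdout > demo.patch
git reset -q --hard HEAD~1

# Applying the patch records the original author, not the applier.
git am -q demo.patch
git log -1 --format='%an <%ae>'   # prints: Jane Hacker <jane@example.org>
```

With From rewriting, the patch header would instead carry the list's address, and git am would record the wrong author.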

Outgoing Sourceware email now also includes ARC headers:
https://en.wikipedia.org/wiki/Authenticated_Received_Chain
Feedback on whether this helps email delivery is appreciated.

Please contact overseers if you would like the new settings for any
other Sourceware mailing list.

Thanks to the FSF tech team for walking us through their setup for
lists.gnu.org.



Re: Discussion about arm/aarch64 testcase failures seen with patch for PR111673

2023-11-28 Thread Surya Kumari Jangala via Gcc
Hi Richard,
Thanks a lot for your response!

Another failure reported by the Linaro CI is as follows:
(Note: I am planning to send a separate mail for each failure, as this will make
the discussion easier to track)

FAIL: gcc.target/aarch64/sve/acle/general/cpy_1.c -march=armv8.2-a+sve 
-moverride=tune=none  check-function-bodies dup_x0_m 

Expected code:

  ...
  add (x[0-9]+), x0, #?1
  mov (p[0-7])\.b, p15\.b
  mov z0\.d, \2/m, \1
  ...
  ret


Code obtained w/o patch:
addvl   sp, sp, #-1
str p15, [sp]
add x0, x0, 1
mov p3.b, p15.b
mov z0.d, p3/m, x0
ldr p15, [sp]
addvl   sp, sp, #1
ret

Code obtained w/ patch:
addvl   sp, sp, #-1
str p15, [sp]
mov p3.b, p15.b
add x0, x0, 1
mov z0.d, p3/m, x0
ldr p15, [sp]
addvl   sp, sp, #1
ret

As we can see, with the patch, the following two instructions are interchanged:
add x0, x0, 1
mov p3.b, p15.b

I believe that this is fine and the test can be modified to allow it to pass on
aarch64. Please let me know what you think.

Regards,
Surya


On 24/11/23 4:18 pm, Richard Earnshaw wrote:
> 
> 
> On 24/11/2023 08:09, Surya Kumari Jangala via Gcc wrote:
>> Hi Richard,
>> Ping. Please let me know if the test failure that I mentioned in the mail 
>> below can be handled by changing the expected generated code. I am not 
>> conversant with arm, and hence would appreciate your help.
>>
>> Regards,
>> Surya
>>
>> On 03/11/23 4:58 pm, Surya Kumari Jangala wrote:
>>> Hi Richard,
>>> I had submitted a patch for review 
>>> (https://gcc.gnu.org/pipermail/gcc-patches/2023-October/631849.html)
>>> regarding scaling save/restore costs of callee save registers with block
>>> frequency in the IRA pass (PR111673).
>>>
>>> This patch has been approved by VMakarov
>>> (https://gcc.gnu.org/pipermail/gcc-patches/2023-October/632089.html).
>>>
>>> With this patch, we are seeing performance improvements with spec on x86
>>> (exchange: 5%, xalancbmk: 2.5%) and on Power (perlbench: 5.57%).
>>>
>>> I received a mail from Linaro about some failures seen in the CI pipeline 
>>> with
>>> this patch. I have analyzed the failures and I wish to discuss the analysis 
>>> with you.
>>>
>>> One failure reported by the Linaro CI is:
>>>
>>> FAIL: gcc.target/arm/pr111235.c scan-assembler-times ldrexd\tr[0-9]+, 
>>> r[0-9]+, \\[r[0-9]+\\] 2
>>>
>>> The diff in the assembly between trunk and patch is:
>>>
>>> 93c93
>>> <   push    {r4, r5}
>>> ---
>>> >     push    {fp}
>>> 95c95
>>> <   ldrexd  r4, r5, [r0]
>>> ---
>>> >     ldrexd  fp, ip, [r0]
>>> 99c99
>>> <   pop {r4, r5}
>>> ---
>>> >     ldr fp, [sp], #4
>>>
>>>
>>> The test fails with patch because the ldrexd insn uses fp & ip registers 
>>> instead
>>> of r[0-9]+
>>>
>>> But the code produced by patch is better because it is pushing and 
>>> restoring only
>>> one register (fp) instead of two registers (r4, r5). Hence, this test can be
>>> modified to allow it to pass on arm. Please let me know what you think.
>>>
>>> If you need more information, please let me know. I will be sending 
>>> separate mails
>>> for the other test failures.
>>>
> 
> Thanks for looking at this.
> 
> 
> The key part of this test is that the compiler generates LDREXD.  The 
> registers used for that are pretty much irrelevant as we don't match them to 
> any other operations within the test.  So I'd recommend just testing for the 
> mnemonic and not for any of the operands (ie just match "ldrexd\t").
> 
> R.
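Following that recommendation, the relaxed test directive might look like the following sketch (the expected count of 2 is taken from the original scan-assembler-times line; the exact directive in pr111235.c may differ):

```c
/* Only require that two LDREXD instructions are emitted; the
   registers they use are irrelevant to this test.  */
/* { dg-final { scan-assembler-times "ldrexd\t" 2 } } */
```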
> 
>>> Regards,
>>> Surya
>>>
>>>
>>>


Re: Discussion about arm/aarch64 testcase failures seen with patch for PR111673

2023-11-28 Thread Richard Earnshaw via Gcc




On 28/11/2023 12:52, Surya Kumari Jangala wrote:
> Hi Richard,
> Thanks a lot for your response!
> 
> Another failure reported by the Linaro CI is as follows:
> (Note: I am planning to send a separate mail for each failure, as this will make
> the discussion easy to track)
> 
> FAIL: gcc.target/aarch64/sve/acle/general/cpy_1.c -march=armv8.2-a+sve 
> -moverride=tune=none  check-function-bodies dup_x0_m
> 
> Expected code:
> 
>    ...
>    add (x[0-9]+), x0, #?1
>    mov (p[0-7])\.b, p15\.b
>    mov z0\.d, \2/m, \1
>    ...
>    ret
> 
> 
> Code obtained w/o patch:
>  addvl   sp, sp, #-1
>  str p15, [sp]
>  add x0, x0, 1
>  mov p3.b, p15.b
>  mov z0.d, p3/m, x0
>  ldr p15, [sp]
>  addvl   sp, sp, #1
>  ret
> 
> Code obtained w/ patch:
>  addvl   sp, sp, #-1
>  str p15, [sp]
>  mov p3.b, p15.b
>  add x0, x0, 1
>  mov z0.d, p3/m, x0
>  ldr p15, [sp]
>  addvl   sp, sp, #1
>  ret
> 
> As we can see, with the patch, the following two instructions are interchanged:
>  add x0, x0, 1
>  mov p3.b, p15.b


Indeed, both results look acceptable to me, especially given that we 
don't schedule at -O1.


There are two ways of fixing this:
1) Simply swap the order to what the compiler currently generates (which 
is a little fragile, since it might flip back someday).

2) Write the test as


** (
**   add (x[0-9]+), x0, #?1
**   mov (p[0-7])\.b, p15\.b
**   mov z0\.d, \2/m, \1
** |
**   mov (p[0-7])\.b, p15\.b
**   add (x[0-9]+), x0, #?1
**   mov z0\.d, \1/m, \2
** )

Note, we need to swap the match names in the third insn to account for 
the different order of the earlier instructions.


Neither is ideal, but the second is perhaps a little more bomb proof.

I don't really have a strong feeling either way, but perhaps the second 
is slightly preferable.


Richard S: thoughts?

R.


> I believe that this is fine and the test can be modified to allow it to pass on
> aarch64. Please let me know what you think.
> 
> Regards,
> Surya





Re: Discussion about arm/aarch64 testcase failures seen with patch for PR111673

2023-11-28 Thread Richard Sandiford via Gcc
Richard Earnshaw  writes:
> On 28/11/2023 12:52, Surya Kumari Jangala wrote:
>> Hi Richard,
>> Thanks a lot for your response!
>> 
>> Another failure reported by the Linaro CI is as follows :
>> (Note: I am planning to send a separate mail for each failure, as this will 
>> make
>> the discussion easy to track)
>> 
>> FAIL: gcc.target/aarch64/sve/acle/general/cpy_1.c -march=armv8.2-a+sve 
>> -moverride=tune=none  check-function-bodies dup_x0_m
>> 
>> Expected code:
>> 
>>...
>>add (x[0-9]+), x0, #?1
>>mov (p[0-7])\.b, p15\.b
>>mov z0\.d, \2/m, \1
>>...
>>ret
>> 
>> 
>> Code obtained w/o patch:
>>  addvl   sp, sp, #-1
>>  str p15, [sp]
>>  add x0, x0, 1
>>  mov p3.b, p15.b
>>  mov z0.d, p3/m, x0
>>  ldr p15, [sp]
>>  addvl   sp, sp, #1
>>  ret
>> 
>> Code obtained w/ patch:
>>  addvl   sp, sp, #-1
>>  str p15, [sp]
>>  mov p3.b, p15.b
>>  add x0, x0, 1
>>  mov z0.d, p3/m, x0
>>  ldr p15, [sp]
>>  addvl   sp, sp, #1
>>  ret
>> 
>> As we can see, with the patch, the following two instructions are 
>> interchanged:
>>  add x0, x0, 1
>>  mov p3.b, p15.b
>
> Indeed, both look acceptable results to me, especially given that we 
> don't schedule results at -O1.
>
> There's two ways of fixing this:
> 1) Simply swap the order to what the compiler currently generates (which 
> is a little fragile, since it might flip back someday).
> 2) Write the test as
>
>
> ** (
> **   add (x[0-9]+), x0, #?1
> **   mov (p[0-7])\.b, p15\.b
> **   mov z0\.d, \2/m, \1
> ** |
> **   mov (p[0-7])\.b, p15\.b
> **   add (x[0-9]+), x0, #?1
> **   mov z0\.d, \1/m, \2
> ** )
>
> Note, we need to swap the match names in the third insn to account for 
> the different order of the earlier instructions.
>
> Neither is ideal, but the second is perhaps a little more bomb proof.
>
> I don't really have a strong feeling either way, but perhaps the second 
> is slightly preferable.
>
> Richard S: thoughts?

Yeah, I agree the second is probably better.  The | doesn't reset the
capture numbers, so I think the final instruction needs to be:

**   mov z0\.d, \3/m, \4

Thanks,
Richard
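The numbering rule can be checked with any ordinary regex engine: capture groups are numbered by the position of their opening parenthesis across the whole pattern, so the two alternation branches get distinct group numbers. A minimal Python sketch of the same structure (the GCC testsuite matcher is Tcl-based, but group numbering works the same way):

```python
import re

# Capture groups are numbered by the position of their opening
# parenthesis across the WHOLE pattern, so the second alternation
# branch fills groups 3 and 4, not 1 and 2 again.
pattern = re.compile(
    r"add (x[0-9]+), x0, #?1\nmov (p[0-7])\.b, p15\.b"
    r"|"
    r"mov (p[0-7])\.b, p15\.b\nadd (x[0-9]+), x0, #?1"
)

# First branch matches: groups 1 and 2 are filled, 3 and 4 are None.
m1 = pattern.search("add x1, x0, #1\nmov p3.b, p15.b")
print(m1.group(1), m1.group(2))   # x1 p3

# Second branch matches: groups 3 and 4 are filled instead.
m2 = pattern.search("mov p3.b, p15.b\nadd x1, x0, #1")
print(m2.group(3), m2.group(4))   # p3 x1
```

This is why the final `mov` in the second branch of the check-function-bodies sketch must back-reference \3 and \4 rather than \1 and \2.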

>
> R.
>
>> I believe that this is fine and the test can be modified to allow it to pass 
>> on
>> aarch64. Please let me know what you think.
>> 
>> Regards,
>> Surya
>> 
>> 
>> On 24/11/23 4:18 pm, Richard Earnshaw wrote:
>>>
>>>
>>> On 24/11/2023 08:09, Surya Kumari Jangala via Gcc wrote:
>>>> Hi Richard,
>>>> Ping. Please let me know if the test failure that I mentioned in the mail 
>>>> below can be handled by changing the expected generated code. I am not 
>>>> conversant with arm, and hence would appreciate your help.
>>>>
>>>> Regards,
>>>> Surya
>>>>
>>>> On 03/11/23 4:58 pm, Surya Kumari Jangala wrote:
> Hi Richard,
> I had submitted a patch for review 
> (https://gcc.gnu.org/pipermail/gcc-patches/2023-October/631849.html)
> regarding scaling save/restore costs of callee save registers with block
> frequency in the IRA pass (PR111673).
>
> This patch has been approved by VMakarov
> (https://gcc.gnu.org/pipermail/gcc-patches/2023-October/632089.html).
>
> With this patch, we are seeing performance improvements with spec on x86
> (exchange: 5%, xalancbmk: 2.5%) and on Power (perlbench: 5.57%).
>
> I received a mail from Linaro about some failures seen in the CI pipeline 
> with
> this patch. I have analyzed the failures and I wish to discuss the 
> analysis with you.
>
> One failure reported by the Linaro CI is:
>
> FAIL: gcc.target/arm/pr111235.c scan-assembler-times ldrexd\tr[0-9]+, 
> r[0-9]+, \\[r[0-9]+\\] 2
>
> The diff in the assembly between trunk and patch is:
>
> 93c93
> <   push    {r4, r5}
> ---
>>     push    {fp}
> 95c95
> <   ldrexd  r4, r5, [r0]
> ---
>>     ldrexd  fp, ip, [r0]
> 99c99
> <   pop {r4, r5}
> ---
>>     ldr fp, [sp], #4
>
>
> The test fails with patch because the ldrexd insn uses fp & ip registers 
> instead
> of r[0-9]+
>
> But the code produced by patch is better because it is pushing and 
> restoring only
> one register (fp) instead of two registers (r4, r5). Hence, this test can 
> be
> modified to allow it to pass on arm. Please let me know what you think.
>
> If you need more information, please let me know. I will be sending 
> separate mails
> for the other test failures.
>
>>>
>>> Thanks for looking at this.
>>>
>>>
>>> The key part of this test is that the compiler generates LDREXD.  The 
>>> registers used for that are pretty much irrelevant as we don't match them 
>>> to any other operation