Sourceware infrastructure updates for Q4 2023
Sourceware infrastructure community updates for Q4 2023

- 6 months with the Software Freedom Conservancy
- Sourceware @ Fosdem
- OSUOSL provides extra larger arm64 and x86_64 buildbot servers
- No more From rewriting for patches mailinglists

= 6 months with the Software Freedom Conservancy

Sourceware thanks Conservancy for their support and urges the community
to support Conservancy. Sourceware has been a Software Freedom
Conservancy member project for just 6 months, but the story started a
long time ago and a lot has happened in that time:
https://sfconservancy.org/blog/2023/nov/27/sourceware-thanks-conservancy/

We hope the community will support the Software Freedom Conservancy 2023
Fundraiser and become a Conservancy Sustainer:
https://sfconservancy.org/sustainer

= Sourceware @ Fosdem

Various Sourceware projects will be present at Fosdem, plus some
overseers and of course Conservancy staff. Get your talk submissions in
before the end of the week (December 1st) to these developer rooms:

Debuggers and Analysis tools devroom
(gdb, libabigail, systemtap, valgrind, binutils, elfutils, gnupoke, cgen)
https://inbox.sourceware.org/6a2e8cbf-0d63-24e7-e3c2-c3d286e2e...@redhat.com/

GCC compiler devroom
(gcc, binutils, glibc, newlib)
https://inbox.sourceware.org/36fadb0549c3dca716eb3b923d66a11be2c67a61.ca...@redhat.com/

And if you would like to organize an online virtual mini-BoF around some
topic or project, the @conservancy BBB server is available for all
Sourceware projects:
https://inbox.sourceware.org/9ca90cd013675a960d47ee09fa4403f69405e9f2.ca...@klomp.org/

= OSUOSL provides extra larger arm64 and x86_64 buildbot servers

There have been complaints about overloaded builders, so OSUOSL has
provided us with another arm64 and x86_64 server. The new servers do the
larger gcc and glibc builds so the other builders can do quicker
(smaller) CI builds without having to wait on the big jobs.
This also frees up the other container builders to do more automated
jobs, like the recently added checker for autotools-generated files in
gcc, binutils and gdb:
https://inbox.sourceware.org/20231115194803.gw31...@gnu.wildebeest.org/

Please contact the builder project, build...@sourceware.org, if you want
to run some automated jobs on https://builder.sourceware.org/

= No more From rewriting for patches mailinglists

Because of DKIM, strict DMARC policies and an old mailman setup,
Sourceware mailinglists used From rewriting. No more! We upgraded
mailman and gave up subject prefixes, mail footers, html stripping and
reply-to mangling. After the tests to avoid From rewriting on the
libc-alpha and gcc-patches mailinglists worked out nicely, we enabled
the same settings on some other mailinglists: the gcc patches lists for
libstdc++, libgccjit, fortran and gcc-rust, and the lists for those
projects that use patchwork: newlib, elfutils, libabigail and gdb. This
hopefully makes mailing patches, and using git am on them, a bit nicer.

Outgoing Sourceware email now also includes ARC headers:
https://en.wikipedia.org/wiki/Authenticated_Received_Chain
Feedback on whether this helps email delivery is appreciated. Please
contact overseers if you would like the new settings for any other
Sourceware mailinglist.

Thanks to the FSF tech-team for walking us through their setup for
lists.gnu.org.
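For readers curious what the new headers look like: ARC adds three
header fields per forwarding hop (RFC 8617). Below is a minimal sketch
in Python using only the standard library; the header values are made
up for illustration and are not real Sourceware signatures:

```python
from email import message_from_string

# Made-up message carrying the three ARC header fields a relaying hop
# adds; the b= signature values are placeholders, not real data.
raw = """\
ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=default; cv=none; b=...
ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=default; b=...
ARC-Authentication-Results: i=1; sourceware.org; dkim=pass; spf=pass
From: developer@example.org
Subject: [PATCH] example

patch body
"""

msg = message_from_string(raw)
# Collect the ARC headers that a receiving server would evaluate.
arc_headers = {name: value for name, value in msg.items()
               if name.lower().startswith("arc-")}
print(sorted(arc_headers))
```

A receiving server that trusts the sealing domain can use these headers
to see the original DKIM/SPF results even after a mailing list has
modified the message.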
Re: Discussion about arm/aarch64 testcase failures seen with patch for PR111673
Hi Richard,
Thanks a lot for your response!

Another failure reported by the Linaro CI is as follows:
(Note: I am planning to send a separate mail for each failure, as this
will make the discussion easy to track)

FAIL: gcc.target/aarch64/sve/acle/general/cpy_1.c -march=armv8.2-a+sve -moverride=tune=none check-function-bodies dup_x0_m

Expected code:

   ...
   add (x[0-9]+), x0, #?1
   mov (p[0-7])\.b, p15\.b
   mov z0\.d, \2/m, \1
   ...
   ret

Code obtained w/o patch:
   addvl sp, sp, #-1
   str p15, [sp]
   add x0, x0, 1
   mov p3.b, p15.b
   mov z0.d, p3/m, x0
   ldr p15, [sp]
   addvl sp, sp, #1
   ret

Code obtained w/ patch:
   addvl sp, sp, #-1
   str p15, [sp]
   mov p3.b, p15.b
   add x0, x0, 1
   mov z0.d, p3/m, x0
   ldr p15, [sp]
   addvl sp, sp, #1
   ret

As we can see, with the patch, the following two instructions are
interchanged:
   add x0, x0, 1
   mov p3.b, p15.b

I believe that this is fine and the test can be modified to allow it to
pass on aarch64. Please let me know what you think.

Regards,
Surya

On 24/11/23 4:18 pm, Richard Earnshaw wrote:
>
> On 24/11/2023 08:09, Surya Kumari Jangala via Gcc wrote:
>> Hi Richard,
>> Ping. Please let me know if the test failure that I mentioned in the
>> mail below can be handled by changing the expected generated code. I
>> am not conversant with arm, and hence would appreciate your help.
>>
>> Regards,
>> Surya
>>
>> On 03/11/23 4:58 pm, Surya Kumari Jangala wrote:
>>> Hi Richard,
>>> I had submitted a patch for review
>>> (https://gcc.gnu.org/pipermail/gcc-patches/2023-October/631849.html)
>>> regarding scaling save/restore costs of callee save registers with
>>> block frequency in the IRA pass (PR111673).
>>>
>>> This patch has been approved by VMakarov
>>> (https://gcc.gnu.org/pipermail/gcc-patches/2023-October/632089.html).
>>>
>>> With this patch, we are seeing performance improvements with spec on
>>> x86 (exchange: 5%, xalancbmk: 2.5%) and on Power (perlbench: 5.57%).
>>>
>>> I received a mail from Linaro about some failures seen in the CI
>>> pipeline with this patch. I have analyzed the failures and I wish to
>>> discuss the analysis with you.
>>>
>>> One failure reported by the Linaro CI is:
>>>
>>> FAIL: gcc.target/arm/pr111235.c scan-assembler-times ldrexd\tr[0-9]+, r[0-9]+, \\[r[0-9]+\\] 2
>>>
>>> The diff in the assembly between trunk and patch is:
>>>
>>> 93c93
>>> < push {r4, r5}
>>> ---
>>> > push {fp}
>>> 95c95
>>> < ldrexd r4, r5, [r0]
>>> ---
>>> > ldrexd fp, ip, [r0]
>>> 99c99
>>> < pop {r4, r5}
>>> ---
>>> > ldr fp, [sp], #4
>>>
>>> The test fails with the patch because the ldrexd insn uses the fp &
>>> ip registers instead of r[0-9]+.
>>>
>>> But the code produced by the patch is better because it is pushing
>>> and restoring only one register (fp) instead of two registers (r4,
>>> r5). Hence, this test can be modified to allow it to pass on arm.
>>> Please let me know what you think.
>>>
>>> If you need more information, please let me know. I will be sending
>>> separate mails for the other test failures.
>
> Thanks for looking at this.
>
> The key part of this test is that the compiler generates LDREXD. The
> registers used for that are pretty much irrelevant as we don't match
> them to any other operations within the test. So I'd recommend just
> testing for the mnemonic and not for any of the operands (ie just
> match "ldrexd\t").
>
> R.
>
>>> Regards,
>>> Surya
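Richard's suggested relaxation can be checked quickly outside DejaGnu.
The following is a hypothetical Python sketch, not the actual testsuite
machinery: the assembly fragment is modeled on the post-patch diff
quoted above (with a second ldrexd invented to mirror the test's
expected count of 2), and it shows why the register-specific pattern
stops matching while the bare-mnemonic pattern still counts both
occurrences:

```python
import re

# Fragment modeled on the post-patch output: ldrexd now uses fp/ip.
# The second ldrexd line is invented to mirror the expected count of 2.
asm_with_patch = (
    "\tpush {fp}\n"
    "\tldrexd\tfp, ip, [r0]\n"
    "\tldrexd\tfp, ip, [r1]\n"
    "\tldr fp, [sp], #4\n"
)

# The test's original pattern insists on r<N> registers, so fp/ip break it.
strict = re.findall(r"ldrexd\tr[0-9]+, r[0-9]+, \[r[0-9]+\]", asm_with_patch)
# The suggested pattern matches only the mnemonic, ignoring operands.
relaxed = re.findall(r"ldrexd\t", asm_with_patch)

print(len(strict), len(relaxed))  # 0 2
```

The relaxed pattern still verifies the property the test actually cares
about (that LDREXD is generated, twice) without pinning the register
allocator's choices.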
Re: Discussion about arm/aarch64 testcase failures seen with patch for PR111673
On 28/11/2023 12:52, Surya Kumari Jangala wrote:
> Hi Richard,
> Thanks a lot for your response!
>
> Another failure reported by the Linaro CI is as follows:
> (Note: I am planning to send a separate mail for each failure, as this
> will make the discussion easy to track)
>
> FAIL: gcc.target/aarch64/sve/acle/general/cpy_1.c -march=armv8.2-a+sve -moverride=tune=none check-function-bodies dup_x0_m
>
> Expected code:
>
>    ...
>    add (x[0-9]+), x0, #?1
>    mov (p[0-7])\.b, p15\.b
>    mov z0\.d, \2/m, \1
>    ...
>    ret
>
> Code obtained w/o patch:
>    addvl sp, sp, #-1
>    str p15, [sp]
>    add x0, x0, 1
>    mov p3.b, p15.b
>    mov z0.d, p3/m, x0
>    ldr p15, [sp]
>    addvl sp, sp, #1
>    ret
>
> Code obtained w/ patch:
>    addvl sp, sp, #-1
>    str p15, [sp]
>    mov p3.b, p15.b
>    add x0, x0, 1
>    mov z0.d, p3/m, x0
>    ldr p15, [sp]
>    addvl sp, sp, #1
>    ret
>
> As we can see, with the patch, the following two instructions are
> interchanged:
>    add x0, x0, 1
>    mov p3.b, p15.b

Indeed, both look acceptable results to me, especially given that we
don't schedule results at -O1.

There's two ways of fixing this:
1) Simply swap the order to what the compiler currently generates (which
   is a little fragile, since it might flip back someday).
2) Write the test as

** (
**    add (x[0-9]+), x0, #?1
**    mov (p[0-7])\.b, p15\.b
**    mov z0\.d, \2/m, \1
** |
**    mov (p[0-7])\.b, p15\.b
**    add (x[0-9]+), x0, #?1
**    mov z0\.d, \1/m, \2
** )

Note, we need to swap the match names in the third insn to account for
the different order of the earlier instructions.

Neither is ideal, but the second is perhaps a little more bomb proof.

I don't really have a strong feeling either way, but perhaps the second
is slightly preferable.

Richard S: thoughts?

R.

> I believe that this is fine and the test can be modified to allow it
> to pass on aarch64. Please let me know what you think.
>
> Regards,
> Surya
>
> On 24/11/23 4:18 pm, Richard Earnshaw wrote:
>> On 24/11/2023 08:09, Surya Kumari Jangala via Gcc wrote:
>>> Hi Richard,
>>> Ping.
>>> Please let me know if the test failure that I mentioned in the mail
>>> below can be handled by changing the expected generated code. I am
>>> not conversant with arm, and hence would appreciate your help.
>>>
>>> Regards,
>>> Surya
>>>
>>> On 03/11/23 4:58 pm, Surya Kumari Jangala wrote:
>>>> Hi Richard,
>>>> I had submitted a patch for review
>>>> (https://gcc.gnu.org/pipermail/gcc-patches/2023-October/631849.html)
>>>> regarding scaling save/restore costs of callee save registers with
>>>> block frequency in the IRA pass (PR111673).
>>>>
>>>> This patch has been approved by VMakarov
>>>> (https://gcc.gnu.org/pipermail/gcc-patches/2023-October/632089.html).
>>>>
>>>> With this patch, we are seeing performance improvements with spec
>>>> on x86 (exchange: 5%, xalancbmk: 2.5%) and on Power (perlbench:
>>>> 5.57%).
>>>>
>>>> I received a mail from Linaro about some failures seen in the CI
>>>> pipeline with this patch. I have analyzed the failures and I wish
>>>> to discuss the analysis with you.
>>>>
>>>> One failure reported by the Linaro CI is:
>>>>
>>>> FAIL: gcc.target/arm/pr111235.c scan-assembler-times ldrexd\tr[0-9]+, r[0-9]+, \\[r[0-9]+\\] 2
>>>>
>>>> The diff in the assembly between trunk and patch is:
>>>>
>>>> 93c93
>>>> < push {r4, r5}
>>>> ---
>>>> > push {fp}
>>>> 95c95
>>>> < ldrexd r4, r5, [r0]
>>>> ---
>>>> > ldrexd fp, ip, [r0]
>>>> 99c99
>>>> < pop {r4, r5}
>>>> ---
>>>> > ldr fp, [sp], #4
>>>>
>>>> The test fails with the patch because the ldrexd insn uses the fp
>>>> & ip registers instead of r[0-9]+.
>>>>
>>>> But the code produced by the patch is better because it is pushing
>>>> and restoring only one register (fp) instead of two registers (r4,
>>>> r5). Hence, this test can be modified to allow it to pass on arm.
>>>> Please let me know what you think.
>>>>
>>>> If you need more information, please let me know. I will be
>>>> sending separate mails for the other test failures.
>>
>> Thanks for looking at this.
>>
>> The key part of this test is that the compiler generates LDREXD. The
>> registers used for that are pretty much irrelevant as we don't match
>> them to any other operations within the test. So I'd recommend just
>> testing for the mnemonic and not for any of the operands (ie just
>> match "ldrexd\t").
>>
>> R.
>
>>>> Regards,
>>>> Surya
Re: Discussion about arm/aarch64 testcase failures seen with patch for PR111673
Richard Earnshaw writes:
> On 28/11/2023 12:52, Surya Kumari Jangala wrote:
>> Hi Richard,
>> Thanks a lot for your response!
>>
>> Another failure reported by the Linaro CI is as follows:
>> (Note: I am planning to send a separate mail for each failure, as
>> this will make the discussion easy to track)
>>
>> FAIL: gcc.target/aarch64/sve/acle/general/cpy_1.c -march=armv8.2-a+sve -moverride=tune=none check-function-bodies dup_x0_m
>>
>> Expected code:
>>
>>    ...
>>    add (x[0-9]+), x0, #?1
>>    mov (p[0-7])\.b, p15\.b
>>    mov z0\.d, \2/m, \1
>>    ...
>>    ret
>>
>> Code obtained w/o patch:
>>    addvl sp, sp, #-1
>>    str p15, [sp]
>>    add x0, x0, 1
>>    mov p3.b, p15.b
>>    mov z0.d, p3/m, x0
>>    ldr p15, [sp]
>>    addvl sp, sp, #1
>>    ret
>>
>> Code obtained w/ patch:
>>    addvl sp, sp, #-1
>>    str p15, [sp]
>>    mov p3.b, p15.b
>>    add x0, x0, 1
>>    mov z0.d, p3/m, x0
>>    ldr p15, [sp]
>>    addvl sp, sp, #1
>>    ret
>>
>> As we can see, with the patch, the following two instructions are
>> interchanged:
>>    add x0, x0, 1
>>    mov p3.b, p15.b
>
> Indeed, both look acceptable results to me, especially given that we
> don't schedule results at -O1.
>
> There's two ways of fixing this:
> 1) Simply swap the order to what the compiler currently generates
>    (which is a little fragile, since it might flip back someday).
> 2) Write the test as
>
> ** (
> **    add (x[0-9]+), x0, #?1
> **    mov (p[0-7])\.b, p15\.b
> **    mov z0\.d, \2/m, \1
> ** |
> **    mov (p[0-7])\.b, p15\.b
> **    add (x[0-9]+), x0, #?1
> **    mov z0\.d, \1/m, \2
> ** )
>
> Note, we need to swap the match names in the third insn to account for
> the different order of the earlier instructions.
>
> Neither is ideal, but the second is perhaps a little more bomb proof.
>
> I don't really have a strong feeling either way, but perhaps the
> second is slightly preferable.
>
> Richard S: thoughts?

Yeah, I agree the second is probably better.
The | doesn't reset the capture numbers, so I think the final
instruction needs to be:

**    mov z0\.d, \3/m, \4

Thanks,
Richard

>
> R.
>
>> I believe that this is fine and the test can be modified to allow it
>> to pass on aarch64. Please let me know what you think.
>>
>> Regards,
>> Surya
>>
>> On 24/11/23 4:18 pm, Richard Earnshaw wrote:
>>>
>>> On 24/11/2023 08:09, Surya Kumari Jangala via Gcc wrote:
>>>> Hi Richard,
>>>> Ping. Please let me know if the test failure that I mentioned in
>>>> the mail below can be handled by changing the expected generated
>>>> code. I am not conversant with arm, and hence would appreciate
>>>> your help.
>>>>
>>>> Regards,
>>>> Surya
>>>>
>>>> On 03/11/23 4:58 pm, Surya Kumari Jangala wrote:
>>>>> Hi Richard,
>>>>> I had submitted a patch for review
>>>>> (https://gcc.gnu.org/pipermail/gcc-patches/2023-October/631849.html)
>>>>> regarding scaling save/restore costs of callee save registers
>>>>> with block frequency in the IRA pass (PR111673).
>>>>>
>>>>> This patch has been approved by VMakarov
>>>>> (https://gcc.gnu.org/pipermail/gcc-patches/2023-October/632089.html).
>>>>>
>>>>> With this patch, we are seeing performance improvements with spec
>>>>> on x86 (exchange: 5%, xalancbmk: 2.5%) and on Power (perlbench:
>>>>> 5.57%).
>>>>>
>>>>> I received a mail from Linaro about some failures seen in the CI
>>>>> pipeline with this patch. I have analyzed the failures and I wish
>>>>> to discuss the analysis with you.
>>>>>
>>>>> One failure reported by the Linaro CI is:
>>>>>
>>>>> FAIL: gcc.target/arm/pr111235.c scan-assembler-times ldrexd\tr[0-9]+, r[0-9]+, \\[r[0-9]+\\] 2
>>>>>
>>>>> The diff in the assembly between trunk and patch is:
>>>>>
>>>>> 93c93
>>>>> < push {r4, r5}
>>>>> ---
>>>>> > push {fp}
>>>>> 95c95
>>>>> < ldrexd r4, r5, [r0]
>>>>> ---
>>>>> > ldrexd fp, ip, [r0]
>>>>> 99c99
>>>>> < pop {r4, r5}
>>>>> ---
>>>>> > ldr fp, [sp], #4
>>>>>
>>>>> The test fails with the patch because the ldrexd insn uses the fp
>>>>> & ip registers instead of r[0-9]+.
>>>>>
>>>>> But the code produced by the patch is better because it is
>>>>> pushing and restoring only one register (fp) instead of two
>>>>> registers (r4, r5).
>>>>> Hence, this test can be modified to allow it to pass on arm.
>>>>> Please let me know what you think.
>>>>>
>>>>> If you need more information, please let me know. I will be
>>>>> sending separate mails for the other test failures.
>>>
>>> Thanks for looking at this.
>>>
>>> The key part of this test is that the compiler generates LDREXD.
>>> The registers used for that are pretty much irrelevant as we don't
>>> match them to any other operations within the test.
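Richard Sandiford's correction about capture numbers can be illustrated
outside the testsuite. check-function-bodies drives a Tcl regex engine,
but group numbering across an alternation works the same way in
Python's re module, so this hypothetical sketch uses Python: groups are
numbered left to right across the whole pattern, so the second branch
captures into \3 and \4, and a backreference to \1 or \2 there refers
to groups that never matched on that path:

```python
import re

# The two instruction orderings seen without and with the IRA patch.
without_patch = "add x0, x0, 1\nmov p3.b, p15.b\nmov z0.d, p3/m, x0"
with_patch = "mov p3.b, p15.b\nadd x0, x0, 1\nmov z0.d, p3/m, x0"

# Correct version: the second branch's groups are \3 and \4 because
# numbering continues across the | alternation.
correct = re.compile(
    r"(?:add (x[0-9]+), x0, #?1\n"
    r"mov (p[0-7])\.b, p15\.b\n"
    r"mov z0\.d, \2/m, \1"
    r"|"
    r"mov (p[0-7])\.b, p15\.b\n"
    r"add (x[0-9]+), x0, #?1\n"
    r"mov z0\.d, \3/m, \4)"
)

# Buggy version: reusing \1/\2 in the second branch refers to groups
# that never participate on that path, so the branch can never match.
wrong = re.compile(
    r"(?:add (x[0-9]+), x0, #?1\n"
    r"mov (p[0-7])\.b, p15\.b\n"
    r"mov z0\.d, \2/m, \1"
    r"|"
    r"mov (p[0-7])\.b, p15\.b\n"
    r"add (x[0-9]+), x0, #?1\n"
    r"mov z0\.d, \1/m, \2)"
)

print(bool(correct.search(without_patch)),  # True: first branch
      bool(correct.search(with_patch)),     # True: second branch, \3/\4
      bool(wrong.search(with_patch)))       # False: \1/\2 never set here
```

Under Python's semantics the \1/\2 variant can never match the swapped
ordering, which is exactly why the final instruction in the second
branch needs \3/\4.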