CFP deadline for Toolchains Track @ LPC 2024 extended
The call for proposals deadline for the Toolchains Track at the Linux Plumbers Conference has been extended to the 31st of July!

https://lpc.events/event/18/abstracts/

The aim of the Toolchains track is to fix particular toolchain issues which are of interest to the kernel and, ideally, to find solutions in situ, making the best use of the opportunity for live discussion with kernel developers and maintainers.

The track will be composed of activities of variable length, depending on the topic being discussed. Each activity is intended to cover a particular topic or issue involving both the Linux kernel and one or more of its associated toolchains and development tools. This includes compilers, linkers, assemblers, debuggers and debugging formats, ABI analysis tools, object manipulation tools, etc.
gcc-11-20240710 is now available
Snapshot gcc-11-20240710 is now available on
  https://gcc.gnu.org/pub/gcc/snapshots/11-20240710/
and on various mirrors, see https://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 11 git branch
with the following options: git://gcc.gnu.org/git/gcc.git branch releases/gcc-11 revision d67566cefe7325998cc2471a28e9d3a3016455a0

You'll find:

 gcc-11-20240710.tar.xz               Complete GCC

  SHA256=b1a85f3a6a6613db48c332736c3c8082439de640e8e28e1a7eeef3e29df4f908
  SHA1=359d31b2e70e32a2507f3f0425ead3d7cbc612e0

Diffs from 11-20240703 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-11
link is updated and a message is sent to the gcc list. Please do not use
a snapshot before it has been announced that way.
Help with vector cost model
I need some help with the vector cost model for aarch64.

I am adding V2HI and V4QI mode support by emulating them using the native V4HI/V8QI instructions (similar to how the x86 backend implements MMX modes using SSE). The problem is that I am running into a cost model issue with gcc.target/aarch64/pr98772.c (wminus is similar to gcc.dg/vect/slp-gap-1.c, just with slightly different offsets for the address). It seems like the cost model is overestimating the number of loads for the V8QI case. With the new cost model usage (-march=armv9-a+nosve), I get:

```
t.c:7:21: note: * Analysis succeeded with vector mode V4QI
t.c:7:21: note: Comparing two main loops (V4QI at VF 1 vs V8QI at VF 2)
t.c:7:21: note: Issue info for V4QI loop:
t.c:7:21: note:   load operations = 2
t.c:7:21: note:   store operations = 1
t.c:7:21: note:   general operations = 4
t.c:7:21: note:   reduction latency = 0
t.c:7:21: note:   estimated min cycles per iteration = 2.00
t.c:7:21: note: Issue info for V8QI loop:
t.c:7:21: note:   load operations = 12
t.c:7:21: note:   store operations = 1
t.c:7:21: note:   general operations = 6
t.c:7:21: note:   reduction latency = 0
t.c:7:21: note:   estimated min cycles per iteration = 4.33
t.c:7:21: note: Weighted cycles per iteration of V4QI loop ~= 4.00
t.c:7:21: note: Weighted cycles per iteration of V8QI loop ~= 4.33
t.c:7:21: note: Preferring loop with lower cycles per iteration
t.c:7:21: note: * Preferring vector mode V4QI to vector mode V8QI
```

That is totally wrong: instead of vectorizing using V8QI, we vectorize using V4QI, and the resulting code is worse.

Attached is my current patch for adding V4QI/V2HI to the aarch64 backend. (Note that I have not finished the changelog or the testcases yet; I have secondary patches that already add the testcases.)

Is there something I am missing here, or are we just overestimating the V8QI cost, and is this something easy to fix?

Thanks,
Andrew

Attachment: 0001-RFC-aarch64-Start-to-support-v4qi-modes-for-SLP.patch