CFP deadline for Toolchains Track @ LPC 2024 extended

2024-07-10 Thread Jose E. Marchesi via Gcc


The call for proposals deadline for the Toolchains Track at the Linux
Plumbers Conference has been extended to the 31st of July!

https://lpc.events/event/18/abstracts/

The aim of the Toolchains track is to fix particular toolchain issues
that are of interest to the kernel and, ideally, to find solutions in
situ, making the best use of the opportunity for live discussion with
kernel developers and maintainers.

The track will be composed of activities of variable length, depending
on the topic being discussed. Each activity is intended to cover a
particular topic or issue involving both the Linux kernel and one or
more of its associated toolchains and development tools. This includes
compilers, linkers, assemblers, debuggers and debugging formats, ABI
analysis tools, object manipulation tools, etc.


gcc-11-20240710 is now available

2024-07-10 Thread GCC Administrator via Gcc
Snapshot gcc-11-20240710 is now available on
  https://gcc.gnu.org/pub/gcc/snapshots/11-20240710/
and on various mirrors, see https://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 11 git branch
with the following options: git://gcc.gnu.org/git/gcc.git branch 
releases/gcc-11 revision d67566cefe7325998cc2471a28e9d3a3016455a0

You'll find:

 gcc-11-20240710.tar.xz   Complete GCC

  SHA256=b1a85f3a6a6613db48c332736c3c8082439de640e8e28e1a7eeef3e29df4f908
  SHA1=359d31b2e70e32a2507f3f0425ead3d7cbc612e0

Diffs from 11-20240703 are available in the diffs/ subdirectory.
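
As a minimal sketch of how the published SHA256 could be checked after
downloading the tarball: the helper names `sha256_of` and `verify` are
hypothetical, and the example assumes the file was saved under its
original name in the current directory.

```python
import hashlib

# SHA256 value copied from the announcement above.
EXPECTED_SHA256 = "b1a85f3a6a6613db48c332736c3c8082439de640e8e28e1a7eeef3e29df4f908"


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected=EXPECTED_SHA256):
    """True if the file's digest matches the expected value."""
    return sha256_of(path) == expected.lower()
```

Calling `verify("gcc-11-20240710.tar.xz")` should return True for an
intact download, assuming that local file name.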

When a particular snapshot is ready for public consumption the LATEST-11
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.


Help with vector cost model

2024-07-10 Thread Andrew Pinski via Gcc
I need some help with the vector cost model for aarch64.
I am adding V2HI and V4QI mode support by emulating them using the
native V4HI/V8QI instructions (similar to how MMX is done using SSE). The
problem is I am running into a cost model issue with
gcc.target/aarch64/pr98772.c (wminus is similar to
gcc.dg/vect/slp-gap-1.c, just slightly different offsets for the
address).
It seems like the cost model is overestimating the number of loads for
the V8QI case.
With the new cost model usage (-march=armv9-a+nosve), I get:
```
t.c:7:21: note:  * Analysis succeeded with vector mode V4QI
t.c:7:21: note:  Comparing two main loops (V4QI at VF 1 vs V8QI at VF 2)
t.c:7:21: note:  Issue info for V4QI loop:
t.c:7:21: note:load operations = 2
t.c:7:21: note:store operations = 1
t.c:7:21: note:general operations = 4
t.c:7:21: note:reduction latency = 0
t.c:7:21: note:estimated min cycles per iteration = 2.00
t.c:7:21: note:  Issue info for V8QI loop:
t.c:7:21: note:load operations = 12
t.c:7:21: note:store operations = 1
t.c:7:21: note:general operations = 6
t.c:7:21: note:reduction latency = 0
t.c:7:21: note:estimated min cycles per iteration = 4.33
t.c:7:21: note:  Weighted cycles per iteration of V4QI loop ~= 4.00
t.c:7:21: note:  Weighted cycles per iteration of V8QI loop ~= 4.33
t.c:7:21: note:  Preferring loop with lower cycles per iteration
t.c:7:21: note:  * Preferring vector mode V4QI to vector mode V8QI
```

That is totally wrong: instead of vectorizing using V8QI, we
vectorize using V4QI, and the resulting code is worse.
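
The arithmetic behind the dump can be sketched roughly as follows. This
is only an illustration, not the backend's code: the two issue rates are
hypothetical values picked because they reproduce the figures in the
dump, and the function names are made up. Each loop's cycle estimate is
scaled by the other loop's VF so that both cover the same amount of
scalar work.

```python
from fractions import Fraction

# ASSUMPTION: hypothetical issue rates, chosen only because they reproduce
# the numbers in the dump; the real values come from the backend's tuning data.
LOADS_STORES_PER_CYCLE = 3
GENERAL_OPS_PER_CYCLE = 2


def min_cycles(loads, stores, general):
    """Throughput bound: the busiest class of operations limits the loop."""
    mem = Fraction(loads + stores, LOADS_STORES_PER_CYCLE)
    alu = Fraction(general, GENERAL_OPS_PER_CYCLE)
    return max(mem, alu)


def weighted(cycles, other_vf):
    """Scale by the other loop's VF so both loops cover the same scalar work."""
    return cycles * other_vf


# Operation counts taken from the dump (V4QI at VF 1, V8QI at VF 2).
v4qi = weighted(min_cycles(loads=2, stores=1, general=4), other_vf=2)
v8qi = weighted(min_cycles(loads=12, stores=1, general=6), other_vf=1)
print(f"V4QI ~= {float(v4qi):.2f}, V8QI ~= {float(v8qi):.2f}")

# Largest V8QI load count for which V8QI would still come out cheaper.
max_loads = max(n for n in range(1, 13)
                if weighted(min_cycles(n, 1, 6), 1) < v4qi)
print("V8QI preferred with at most", max_loads, "loads")
```

Under these assumed rates the V8QI loop only loses because of its load
count: with 10 or fewer loads per iteration it would be preferred.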

Attached is my current patch for adding V4QI/V2HI to the aarch64
backend (Note I have not finished up the changelog nor the testcases;
I have secondary patches that add the testcases already).
Is there something I am missing here, or are we just overestimating
the V8QI cost, and is that something easy to fix?

Thanks,
Andrew


0001-RFC-aarch64-Start-to-support-v4qi-modes-for-SLP.patch