From: Andi Kleen
... that uses -march=native -mtune=native to build a compiler optimized
for the host.
config/ChangeLog:
* bootstrap-native.mk: New file.
gcc/ChangeLog:
* doc/install.texi: Document bootstrap-native.
---
config/bootstrap-native.mk | 1 +
gcc/doc/install.texi
This adds an automatic downloader for the latest test results from
the mailing list archive and supports diffing test_summary to it.
Useful if you don't want to run your own baseline.
contrib/ChangeLog:
* diffsummary.py: New file.
---
contrib/diffsummary.py | 104
Right now ggc has a single free list for multiple sizes. In some cases
the list can get mixed by orders and then the allocator may spend a lot
of time walking the free list to find the right sizes.
This patch splits the free list into multiple free lists by order
which allows O(1) access in most c
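The per-order free list idea described above can be sketched roughly as follows. This is a minimal illustration, not the ggc code: `order_pool`, `pool_release`, `pool_take` and `NUM_ORDERS` are hypothetical names, and the sketch assumes the caller already knows the order of each block.

```c
#include <stddef.h>

/* Hypothetical names throughout; this is not GCC's ggc implementation. */
#define NUM_ORDERS 8

struct free_block {
  struct free_block *next;
};

struct order_pool {
  /* One list head per order instead of a single mixed list. */
  struct free_block *free_lists[NUM_ORDERS];
};

/* Return a freed block to the list for its order: constant time. */
void pool_release(struct order_pool *pool, void *block, unsigned order)
{
  struct free_block *b = block;
  b->next = pool->free_lists[order];
  pool->free_lists[order] = b;
}

/* Take a cached block of the given order, or NULL if none is available.
   Constant time: blocks of other orders are never walked. */
void *pool_take(struct order_pool *pool, unsigned order)
{
  struct free_block *b = pool->free_lists[order];
  if (b != NULL)
    pool->free_lists[order] = b->next;
  return b;
}
```

With a single mixed list, `pool_take` would have to skip entries of the wrong size; indexing by order removes that walk.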
When -fprofile-generate is used musttail often fails because the
compiler adds instrumentation after the tail calls.
This patch prevents adding extra exit edges after musttail because for a
tail call the execution leaves the function and can never come back
even on an unwind or exception.
This is
From: Andi Kleen
This isn't a regression, but it's a very simple patch with high
performance improvement, so perhaps suitable in the current stage.
---
bitmap_set_bit checks the original value of the bit to return it to the
caller and then only writes the new value back if it cha
> I'd like to ping the
> https://gcc.gnu.org/pipermail/gcc-patches/2025-March/679182.html
> patch.
> I know it is quite controversial and if clang wouldn't be the first
> to implement this I'd certainly not go that way; I am willing to change
> the warning option names or move the maybe one from -W
> You're right (although I don't remember which targets are
> non-external_musttail).
Several flavors of ARM and Power at least.
Jakub Jelinek writes:
> --- gcc/testsuite/g++.dg/opt/musttail2.C.jj 2025-03-24 13:27:44.329204196 +0100
> +++ gcc/testsuite/g++.dg/opt/musttail2.C 2025-03-24 13:28:08.975867389 +0100
> @@ -0,0 +1,14 @@
> +// PR ipa/119376
> +// { dg-do compile { target musttail } }
I think this need
> This can be rewritten as
>
> void foo(int v)
> {
>   {
>     int a;
>     capture(&a);
>     if (condition)
>       goto tail_position;
>     // do something with a
>   }
> tail_position:
>   tailcall(v);
> }
>
> or with 'do { ... if (...) break; ...} while (0)' when one prefers that to
> goto
On Tue, Mar 25, 2025 at 07:43:28PM +0300, Alexander Monakov wrote:
> Hello,
>
> FWIW I think Clang made a mistake in bending semantics in a way that is clearly
> misaligned with the general design of C and C++, where a language-native, so to
> speak, solution was available: introduce a scope
> 2025-03-25 Jakub Jelinek
> Andi Kleen
>
> PR gcov-profile/118442
> * profile.cc (branch_prob): Ignore EDGE_FAKE edges from musttail calls
> to EXIT.
>
> * c-c++-common/pr118442.c: New test.
>
> --- gcc/profile.cc.jj 2025-
From: Andi Kleen
When -fprofile-generate is used musttail often fails because the
compiler adds instrumentation after the tail calls.
This patch prevents adding extra exit edges after musttail because for a
tail call the execution leaves the function and can never come back
even on an unwind or
On Thu, Mar 20, 2025 at 06:25:26PM +0100, Jakub Jelinek wrote:
> On Thu, Mar 20, 2025 at 10:01:02AM -0700, Andi Kleen wrote:
> > So it could be as simple as that patch? It solves your test case at least
> > for x86.
>
> Not sure I like this, but if others (e.g. Richi, Josep
On Thu, Mar 20, 2025 at 05:28:48PM +0100, Jakub Jelinek wrote:
> On Thu, Mar 20, 2025 at 09:19:02AM -0700, Andi Kleen wrote:
> > The inlining was just one of the issues, there are some related to
> > different semantics of escaped locals. gcc always errors out while
> > LLVM
On Thu, Mar 20, 2025 at 11:45:33AM -0400, Jason Merrill wrote:
> On 3/19/25 9:31 PM, Andi Kleen wrote:
> > From: Andi Kleen
> >
> > There are multiple reports (see PR 119376) now where semantic differences
> > in the gcc musttail implementation break existing programs
From: Andi Kleen
There are multiple reports (see PR 119376) now where semantic differences
in the gcc musttail implementation break existing programs written for the clang
variant.
Even though all of that can hopefully be fixed eventually,
for the gcc 15 release it seems safer to disable clang
From: Andi Kleen
When -fprofile-generate is used musttail often fails because the
compiler adds instrumentation after the tail calls.
This patch prevents adding extra exit edges after musttail because for a
tail call the execution leaves the function and can never come back
even on an unwind or
> This looks wrong to me. Even tail calls can be terminated with exit,
> perform longjmp, do other things for which stmt_can_terminate_bb_p
> should return true. stmt_can_terminate_bb_p is used in many places, not
> just in the predict instrumentation.
Okay so the check should be only used for s
Andi Kleen writes:
> diff --git a/gcc/input.cc b/gcc/input.cc
> index fabfbfb6eaa..d3b12037ba8 100644
> --- a/gcc/input.cc
> +++ b/gcc/input.cc
> @@ -1325,6 +1325,8 @@ dump_line_table_statistics (void)
>    if (s.num_expanded_macros != 0)
> fprintf (stderr, "Av
"James K. Lowden" writes:
>> Having a minimal harness in GCCs testsuite is critical - I'd expect a
>> gcc/testsuite/gcobol.dg/dg.exp supporting execution tests. I assume
>> Cobol has a way to exit OK or fatally and this should be
>> distinguished as testsuite PASS or FAIL.
>
> Yes, a COBOL pro
From: Andi Kleen
The file-cache-lines / file-cache-files tunables were documented in the
wrong section. Fix that.
Reported-by: Filip Kastl
Committed as obvious.
gcc/ChangeLog:
* doc/invoke.texi:
---
gcc/doc/invoke.texi | 20 ++--
1 file changed, 10 insertions(+), 10
From: Andi Kleen
Document new params in invoke.texi.
The auto tuning description was on the wrong tunable, move it to lines.
Committed as obvious.
gcc/ChangeLog:
* doc/invoke.texi: Document file cache tunables.
* params.opt: Move auto tuning description to lines.
---
gcc/doc
On Tue, Jan 28, 2025 at 09:50:41AM +0100, Richard Biener wrote:
> On Mon, Jan 27, 2025 at 9:59 PM David Malcolm wrote:
> >
> > On Sat, 2025-01-25 at 23:31 -0800, Andi Kleen wrote:
> > > From: Andi Kleen
> > >
> > > This is the hot function in input.cc
>
> If I'm reading this right, calls to get_next_line lead to insertions into
> the ring buffer whilst the buffer is empty or the last line in the ring
> buffer cache is m_line_num - 1.
>
> There are a few places where we update m_line_num, but this caching
> code doesn't seem to touch those places
On Sun, Feb 02, 2025 at 09:35:52PM -0800, Andi Kleen wrote:
> > Patch 7 is OK otherwise, and I'm taking a look at the rest of the
> > patches now; thanks.
>
> Any comments on the other patches?
Never mind, I see you already commented. Somehow I missed that.
-Andi
> Patch 7 is OK otherwise, and I'm taking a look at the rest of the
> patches now; thanks.
Any comments on the other patches?
Thanks,
-Andi
From: Andi Kleen
While the input line cache size is now tunable, it's better if the compiler
auto-tunes it. Otherwise large files needing random file access will
still have to search many lines to find the right lines.
Add support for allocating one line anchor per hundred input lines.
This
This is a fix for slowness accessing random lines in the source file
for diagnostics.
In this version I added a unit test as requested by David, and also
added an x86 vectorization hint for the hot line search function (with the
early break work the vectorizer is powerful enough to handle it now).
If
From: Andi Kleen
gcc/ChangeLog:
* input.cc (check_line): New.
(test_replacement): New function to test line caching.
(input_cc_tests): Call test_replacement
---
gcc/input.cc | 46 ++
1 file changed, 46 insertions(+)
diff
From: Andi Kleen
This is the hot function in input.cc
The vectorizer can vectorize it now, but in a generic-CPU -O2 x86 build it
isn't vectorized. Add an automatic target clone to handle it for x86 and build
that function with -O3.
The ifdef here is ugly, perhaps gcc should have a more convenient
"
From: Andi Kleen
The input machinery to read the source code independent of the lexer
has a range of hard coded maximum array sizes that can impact performance.
Make them tunable.
input.cc is part of libcommon so it cannot directly access params
without a level of indirection.
gcc/ChangeLog
From: Andi Kleen
The input context file_cache maintains an array of anchors
to speed up accessing lines before the previous line.
The array has a fixed upper size and the algorithm relies
on the linemap reporting the maximum number of lines in the file
in advance to compute the position of each
From: Andi Kleen
With the new cache maintenance algorithm we don't need the
maximum number of lines anymore. Remove all the code for that.
gcc/ChangeLog:
PR preprocessor/118168
* input.cc (total_lines_num): Remove.
(file_cache_slot::evict):
From: Andi Kleen
For larger files the file_cache line index will be spread out to make
the index fit into the fixed buffer, so any access to the non latest line
will need some skipping of lines.
Most line accesses are near the latest line because
a diagnostic is likely near where the
From: Andi Kleen
Correct the description of inline assembler to say that gcc does
limited assembler parsing to estimate the length of inline assembler
statements, and document that certain assembler primitives can confuse
it.
gcc/ChangeLog:
* doc/extend.texi: Document assembler parsing
From: Andi Kleen
Committed as obvious.
gcc/ChangeLog:
* config/i386/x86-tune-sched-core.cc: Fix incorrect comment.
---
gcc/config/i386/x86-tune-sched-core.cc | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/gcc/config/i386/x86-tune-sched-core.cc
b/gcc/config/i386
On Wed, Jan 15, 2025 at 10:41:11PM +0100, Jakub Jelinek wrote:
> Hi!
>
> When writing the gcc-15/changes.html patch posted earlier, I've been
> wondering where significant part of the Basic asm chapter went and the
> problem was the insertion of a new @node in the middle of the Basic Asm
> @node,
On Wed, Jan 08, 2025 at 07:47:27PM -0500, David Malcolm wrote:
> On Wed, 2025-01-08 at 07:48 -0800, Andi Kleen wrote:
> >
> > I wanted to ping this patch series. Thanks.
> >
> > -Andi
> >
>
> Thanks for the patches, and sorry about not getting back
I wanted to ping this patch series. Thanks.
-Andi
On Tue, Jan 07, 2025 at 08:36:29PM +0100, Jakub Jelinek wrote:
> Hi!
>
> The following patch fixes ICEs when the new inline asm syntax
> to use C++26 static_assert-like constant expressions in place
> of string literals is used in templates.
> As finish_asm_stmt doesn't do any checking for
> proce
Mark Wielaard writes:
> commit 56946c801a7c ("gimple: Add limit after which slower switchlower
> algs are used [PR117091] [PR117352]") introduced a limit on the number
> of cases of a switch. It also bails out on finding jump tables if the
> switch is too large. This introduces a compile time reg
From: Andi Kleen
While the input line cache size is now tunable, it's better if the compiler
auto-tunes it. Otherwise large files needing random file access will
still have to search many lines to find the right lines.
Add support for allocating one line anchor per hundred input lines.
This
From: Andi Kleen
With the new cache maintenance algorithm we don't need the
maximum number of lines anymore. Remove all the code for that.
gcc/ChangeLog:
PR preprocessor/118168
* input.cc (total_lines_num): Remove.
(file_cache_slot::evict):
From: Andi Kleen
The input context file_cache maintains an array of anchors
to speed up accessing lines before the previous line.
The array has a fixed upper size and the algorithm relies
on the linemap reporting the maximum number of lines in the file
in advance to compute the position of each
From: Andi Kleen
For larger files the file_cache line index will be spread out to make
the index fit into the fixed buffer, so any access to the non latest line
will need some skipping of lines.
Most line accesses are near the latest line because
a diagnostic is likely near where the
This patch kit fixes scaling issues for the input cache,
especially for C, motivated by PR118168.
Overall it is practically neutral in number of lines:
gcc/input.cc | 261
--
gcc/inp
From: Andi Kleen
glibc ferror is surprisingly expensive. Move it out of the hot loop
of finding lines by setting a flag after the actual IO operations.
gcc/ChangeLog:
PR preprocessor/118168
* input.cc (file_cache_slot::m_error): New field.
(file_cache_slot::create
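A minimal sketch of the technique described in this patch, assuming a simplified reader: the error state is recorded once, right after the I/O call, and the hot loop only tests the cached flag. The `line_reader` struct and both functions are hypothetical and are not the `file_cache_slot` code from input.cc.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical reader state; field names are illustrative only. */
struct line_reader {
  FILE *fp;
  bool error;      /* sticky flag, set right after an I/O call fails */
  char buf[4096];
  size_t len;      /* number of valid bytes in buf */
};

/* Refill the buffer. This is the only place that calls ferror. */
bool reader_fill(struct line_reader *r)
{
  r->len = fread(r->buf, 1, sizeof r->buf, r->fp);
  if (r->len < sizeof r->buf && ferror(r->fp))
    r->error = true;
  return r->len > 0;
}

/* Hot path: count newlines in the buffered data. It tests the cached
   flag instead of calling ferror while scanning. */
size_t reader_count_lines(const struct line_reader *r)
{
  size_t lines = 0;
  if (r->error)
    return 0;
  for (size_t i = 0; i < r->len; i++)
    if (r->buf[i] == '\n')
      lines++;
  return lines;
}
```

The design choice is simply to pay for the library call once per read rather than once per line lookup.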
From: Andi Kleen
The input machinery to read the source code independent of the lexer
has a range of hard coded maximum array sizes that can impact performance.
Make them tunable.
input.cc is part of libcommon so it cannot directly access params
without a level of indirection.
gcc/ChangeLog
"James K. Lowden" writes:
> The following 8 patches constitute the 80 files needed to build and
> document the COBOL front end. They assume that following exist:
>
> gcc/cobol/ChangeLog
> libgcobol/ChangeLog
>
> The messages are grouped by files in a more or less logical order,
> but gro
> > What do you think, Andi and Richi? I myself slightly prefer keeping the DP but
> > I would be fine with either option.
>
> I think we can keep both, though I have no strong opinion.
Keeping both is fine for me.
-Andi
> > But yeah, thinking about it some more, 1 seems like a lot. Maybe the limit
> > could be 1000. That's also big enough. I could try to run the testcase set to
> > 1000 on my not-so-powerful laptop this time and check that even on that machine
> > it finishes "fast" (under a
On Tue, Nov 26, 2024 at 04:06:37PM -0800, Andrew Pinski wrote:
> On Thu, Oct 31, 2024 at 1:41 PM Andi Kleen wrote:
> >
> > From: Andi Kleen
> >
> > autofdo looks up inline stacks and tries to match them with the profile
> > data using their symbol name. Mak
On Fri, Nov 15, 2024 at 10:43:57AM +0100, Filip Kastl wrote:
> Hi,
>
> Andi's greedy bit test finding algorithm was reverted. I found a fix for the
> problem that caused the revert. I made this patch to reintroduce the greedy
> alg into GCC. However I think we should keep the old slow but more
On Tue, Jul 30, 2024 at 09:40:42AM -0700, Andi Kleen wrote:
> From: Andi Kleen
>
> ... that uses -march=native -mtune=native to build a compiler optimized
> for the host.
>
> config/ChangeLog:
>
> * bootstrap-native.mk: New file.
>
> gcc/ChangeLog:
On Fri, Nov 01, 2024 at 02:01:18PM -0400, John David Anglin wrote:
> This breaks build on hppa64-hp-hpux11.11. This target has clock_gettime
> but it doesn't have CLOCK_MONOTONIC. It has CLOCK_REALTIME. I modified
> timevar.cc as follows to restore build.
Alternative would be to check for CLOCK
On Tue, Nov 05, 2024 at 09:47:17AM +0100, Richard Biener wrote:
> On Tue, Nov 5, 2024 at 2:02 AM Jason Merrill wrote:
> >
> > On 10/31/24 4:40 PM, Andi Kleen wrote:
> > > From: Andi Kleen
> > >
> > > autofdo looks up inline stacks and tries to match th
From: Andi Kleen
- Fix warnings with newer Python versions about bad escapes by
making all the Python strings raw.
- Add a fallback for using the builtin perf event list if the
CPU model number is unknown.
- Regenerate the shipped gcc-auto-profile with the changes.
contrib/ChangeLog
From: Andi Kleen
When autofdo bootstrap support was originally implemented there were
issues with the LTO bootstrap, which is why it wasn't enabled
for it. I retested this now and it works on x86_64-linux.
Fortran was also missing, not sure why. It is also enabled now.
gcc/fortran/Chan
From: Andi Kleen
autofdo looks up inline stacks and tries to match them with the profile
data using their symbol name. Make sure all decls that can be in an inline stack
have a valid assembler name.
This fixes a bootstrap problem with autoprofiledbootstrap and LTO.
2024-10-30 Jason Merrill
> I'm getting a build failure:
>
> timevar.cc:163: undefined reference to `clock_gettime'
>
> Our frozen build tools are intended to produce binaries that work
> "everywhere", so they're a few years old, but apparently something didn't
> configure correctly.
>
> I see that libbacktrace configure
On Wed, Oct 23, 2024 at 02:56:51PM +0200, Richard Biener wrote:
> On Wed, Oct 9, 2024 at 6:18 PM Andi Kleen wrote:
> >
> > From: Andi Kleen
> >
> > Retrieving sys/user time in timevars is quite expensive because it
> > always needs a system call. Only getting
Qing Zhao writes:
> Control this with a new option -fdiagnostics-details.
It would also be useful to be able to print the inline call stack,
maybe with a separate option.
In some array bounds cases I looked at the problem was hidden in some inlines
and it wasn't trivial to figure it out.
I wro
> > However this exposes PR117352 which is a negative interaction of the
> > more aggressive bit test conversion. I don't think it's a show stopper,
> > this can be sorted out later.
>
> I think it is a show stopper for GCC 15 because it is a pretty big
> performance regression with targets that
On Tue, Oct 29, 2024 at 01:50:57PM +0100, Richard Biener wrote:
> On Mon, Oct 28, 2024 at 9:58 PM Andi Kleen wrote:
> >
> > From: Andi Kleen
> >
> > The current switch bit test clustering enumerates all possible case
> > clusters combinations to find ones
From: Andi Kleen
The bit cluster code generation strategy is only beneficial when
multiple case labels point to the same code. Do a quick check if
that is the case before trying to cluster.
This fixes the switch part of PR117091 where all case labels are unique
however it doesn't addres
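The quick pre-check described above could look roughly like this sketch. It assumes each case label's destination is available as a small integer id below `num_blocks`; the function name and that representation are hypothetical and do not come from tree-switch-conversion.cc.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical pre-check: targets[i] is the id of the block that case
   label i jumps to, with every id below num_blocks. Returns true if at
   least two labels share a destination, i.e. clustering may pay off.
   Runs in time linear in the number of labels. */
bool any_shared_target(const unsigned *targets, size_t n, unsigned num_blocks)
{
  bool shared = false;
  unsigned char *seen = calloc(num_blocks ? num_blocks : 1, 1);
  if (seen == NULL)
    return true;  /* be conservative: let the caller try clustering */
  for (size_t i = 0; i < n && !shared; i++) {
    if (seen[targets[i]])
      shared = true;
    seen[targets[i]] = 1;
  }
  free(seen);
  return shared;
}
```

When every label has a unique destination the function returns false and the caller can skip the more expensive clustering attempt.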
From: Andi Kleen
The current switch bit test clustering enumerates all possible case
cluster combinations to find ones that fit the bit test constraints
best. This causes performance problems with very large switches.
For bit test clustering which happens naturally in word sized chunks
I don
From: Andi Kleen
gcc/ChangeLog:
* common.opt: Enable -fbit-tests and -fjump-tables only at -O1.
* opts.cc (default_options_table): Ditto.
---
gcc/common.opt | 4 ++--
gcc/opts.cc | 2 ++
2 files changed, 4 insertions(+), 2 deletions(-)
diff --git a/gcc/common.opt b/gcc
From: Andi Kleen
The bit cluster code generation strategy is only beneficial when
multiple case labels point to the same code. Do a quick check if
that is the case before trying to cluster.
This fixes the switch part of PR117091 where all case labels are unique
however it doesn't addres
From: Andi Kleen
gcc/ChangeLog:
* common.opt: Enable -fbit-tests and -fjump-tables only at -O1.
* tree-switch-conversion.h (jump_table_cluster::is_enabled):
Ditto.
---
gcc/common.opt | 4 ++--
gcc/tree-switch-conversion.h | 5 +++--
2 files changed, 5
From: Andi Kleen
Passes test suite. Ok to commit?
gcc/ChangeLog:
PR middle-end/116510
* tree-if-conv.cc (predicate_bbs): Add missing fold_converts.
gcc/testsuite/ChangeLog:
* gcc.dg/vect/vect-switch-ifcvt-3.c: New test.
---
gcc/testsuite/gcc.dg/vect/vect-switch-ifcvt
From: Andi Kleen
Retrieving sys/user time in timevars is quite expensive because it
always needs a system call. Only getting the wall time is much
cheaper because operating systems have optimized paths for this.
The sys time isn't that interesting for a compiler and wall time
is usually
> So, shouldn't we go without the new option and simply change
> -ftime-report behavior?
I think it's fine (given the constraints I outlined earlier).
It will slightly change the output, but I guess there aren't that many
users that parse it mechanically.
I can do that unless someone else objects.
From: Andi Kleen
Time vars normally use times(2) to get the user/sys/wall time, which is always a
system call. I don't think the system time is very useful because most overhead
is in user time. If we only use the wall (or monotonic) time, modern OSes have an
optimized path to get it directly
> The only consumer I know of for the JSON time report data is in the
> integration tests I wrote for -fanalyzer, which assumes that all fields
> are present when printing, and then goes on to use the "user" times for
> summarizing; see this commit FWIW:
> https://github.com/davidmalcolm/gcc-analyz
On Thu, Oct 03, 2024 at 01:48:35PM +, Christophe Lyon wrote:
> Some of the musttail tests (e.g. musttail7.c) fail on arm-eabi because
> check_effective_target_musttail passes, but the actual code in the test
> is rejected.
Looks good to me. Thanks.
-Andi
> Note that if the user requests SARIF output e.g. with
> -fdiagnostics-format=sarif-stderr
> then any timevar data from -ftime-report is written in JSON form as
> part of the SARIF, rather than in text form to stderr (see
> 75d623946d4b6ea80a777b789b116d4b4a2298dc).
>
> I see that the proposed
From: Andi Kleen
Time vars normally use times(2) to get the user/sys/wall time, which is always a
system call. I don't think the system time is very useful because most overhead
is in user time. If we only use the wall (or monotonic) time, modern OSes have an
optimized path to get it directly
Jakub Jelinek writes:
> And for kernel perhaps we should add some new option which allows
> some dumb parsing of the toplevel asms and gather something from that
> parsing.
See also https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107779
> The restrictions I've implemented are:
> 1) asm qualifiers
On Tue, Sep 10, 2024 at 03:29:08AM +, Ramana Radhakrishnan wrote:
> > diff --git a/config/bootstrap-native.mk b/config/bootstrap-native.mk
> > new file mode 100644
> > index ..a4a3d8594089
> > --- /dev/null
> > +++ b/config/bootstrap-native.mk
> > @@ -0,0 +1
Andi Kleen writes:
Ping^4
Could someone please approve this (nearly trivial) patch?
Thanks,
-Andi
> Andi Kleen writes:
>
> Ping^3
>
>> Andi Kleen writes:
>>
>> PING^2 for the patch.
>>
>> (not sure if there is any maintainer to cc here, this is gen
Andi Kleen writes:
PING^3
> Andi Kleen writes:
>
> PING^2 for https://gcc.gnu.org/pipermail/gcc-patches/2024-July/658602.html
>
> This fixes some musttail related test suite failures that cause noise on
> various targets.
>
>> Andi Kleen writes:
>>
>>
Andi Kleen writes:
Ping^3
> Andi Kleen writes:
>
> PING^2 for the patch.
>
> (not sure if there is any maintainer to cc here, this is generic build
> infrastructure)
>
>> Andi Kleen writes:
>>
>> I wanted to ping this patch:
>>
>> https:/
From: Andi Kleen
SPARC does not support vectorizing conditions, which this test relies
on. Use vect_condition as effective target.
Committed as obvious.
PR testsuite/116500
gcc/testsuite/ChangeLog:
* gcc.dg/vect/vect-switch-ifcvt-1.c: Use vect_condition to
check if
Andi Kleen writes:
PING^2 for the patch.
(not sure if there is any maintainer to cc here, this is generic build
infrastructure)
> Andi Kleen writes:
>
> I wanted to ping this patch:
>
> https://gcc.gnu.org/pipermail/gcc-patches/2024-July/658729.html
>
>
>> From:
Andi Kleen writes:
PING^2 for https://gcc.gnu.org/pipermail/gcc-patches/2024-July/658602.html
This fixes some musttail related test suite failures that cause noise on
various targets.
> Andi Kleen writes:
>
> I wanted to ping this patch. It fixes test suite noise on various
Andi Kleen writes:
I wanted to ping this patch. I believe Richard OK'ed most of it earlier,
but I need an OK for the changes resulting from his review too
(they were mostly test suite and comment fixes,
apart from some minor tweaks).
-Andi
> The gimple-if-to-switch pass con
Andi Kleen writes:
I wanted to ping this patch. It fixes test suite noise on various
targets.
https://gcc.gnu.org/pipermail/gcc-patches/2024-July/658602.html
> From: Andi Kleen
>
> This is a new attempt to fix PR116080. The previous try was reverted
> because it just broke a bu
Andi Kleen writes:
I wanted to ping this patch:
https://gcc.gnu.org/pipermail/gcc-patches/2024-July/658729.html
> From: Andi Kleen
>
> ... that uses -march=native -mtune=native to build a compiler optimized
> for the host.
>
> config/ChangeLog:
>
> * boots
From: Andi Kleen
It is using a class now with a different name.
I will commit as obvious unless someone complains.
Also I included this patch by mistake in my earlier if conversion v2
patch. Please ignore that hunk there.
gcc/ChangeLog:
* doc/cfg.texi: Fix references to dom_walker
The gimple-if-to-switch pass converts if statements with
multiple equal checks on the same value to a switch. This breaks
vectorization which cannot handle switches.
Teach the tree-if-conv pass used by the vectorizer to handle
simple switch statements, like those created by if-to-switch earlier.
T
> > But your comment made me realize there is a major bug.
> >
> > if_convertible_switch_p also needs to check that the labels don't fall
> > through, so the flow graph is diamond shape. Need some easy way to
> > verify that.
>
> Do we verify this for if()s? That is,
No we do not. Afte
> > + /* Create chain of switch tests for each case. */
> > + tree switch_cond = NULL_TREE;
> > + tree index = gimple_switch_index (sw);
> > + for (unsigned i = 1; i < gimple_switch_num_labels (sw); i++)
> > + {
> > + tree label = gimple_switch
> > Okay for trunk? I would like to check that one in to avoid the noise
> > in the regression reports.
>
> I've tested this version in a few trees.
Thanks Thomas.
> That's because of effective-target 'struct_musttail' for '-m32'
> reporting:
>
> struct_musttail1494739.cc: In function 'foo
On Tue, Aug 06, 2024 at 11:50:00AM -0700, Andi Kleen wrote:
> > - s += 16;
> > + v16qi data, t;
> > + /* Unaligned load. Reading beyond the final newline is safe, since
> > + files.cc:read_file_guts pads the allocation. */
>
> You need to chang
> - s += 16;
> + v16qi data, t;
> + /* Unaligned load. Reading beyond the final newline is safe, since
> + files.cc:read_file_guts pads the allocation. */
You need to change that function to use 32 byte padding as Jakub
pointed out (I forgot that too).
> + data = *(const
> Andi, can you push your own patch?).
Done.
-Andi
Cassio Neri writes:
> Implement the template function teju_jagua which finds the shortest
> representation of a floating-point number. The floating-point type is a
> template parameter and the implementation is generic enough to handle all
> floating-point types of interest, namely, IEEE 754, std
The gimple-if-to-switch pass converts if statements with
multiple equal checks on the same value to a switch. This breaks
vectorization which cannot handle switches.
Teach the tree-if-conv pass used by the vectorizer to handle
simple switch statements, like those created by if-to-switch earlier.
T
> BTW, I noticed that in LLVM there is FP8 support for ARM currently
> undergoing. I will have a look on it to see if everything is mature.
There's even FP8 work for ARM under way for gcc, see
https://gcc.gnu.org/pipermail/gcc-patches/2024-August/659248.html
-andi
Andi Kleen writes:
> From: Andi Kleen
>
> This is a new attempt to fix PR116080. The previous try was reverted
> because it just broke a bunch of tests, hiding the problem.
The previous version still had one failure on powerpc because
of a template call that needs a dg-err