On Wed, 20 Jul 2022, Eric Botcazou wrote:
> > Eric is probably most familiar with this, but can you make sure to bootstrap
> > and test this on a SJLJ EH target? I'm not sure --enable-sjlj-exceptions
> > is well tested anywhere but on targets not supporting DWARF EH and the
> > configury is a b
Ping^2.
On Wed, 20 Jul 2022, Alexander Monakov wrote:
>
> Ping.
>
> On Thu, 7 Jul 2022, Alexander Monakov via Gcc-patches wrote:
>
> > From: Artem Klimov
> >
> > Fix PR99619, which asks to optimize TLS model based on visibility.
> > The fix is implemen
Ping. OK for trunk?
On Mon, 5 Jun 2023, Alexander Monakov wrote:
> Ping for the front-end maintainers' input.
>
> On Mon, 22 May 2023, Richard Biener wrote:
>
> > On Thu, May 18, 2023 at 11:04 PM Alexander Monakov via Gcc-patches
> > wrote:
> > >
> >
On Mon, 10 Jul 2023, liuhongt via Gcc-patches wrote:
> False dependency happens when destination is only updated by
> pternlog. There is no false dependency when destination is also used
> in source. So either a pxor should be inserted, or input operand
> should be set with constraint '0'.
>
>
On Mon, 10 Jul 2023, Michael Matz via Gcc-patches wrote:
> Hello,
>
> the ELF psABI for x86-64 doesn't have any callee-saved SSE
> registers (there were actual reasons for that, but those don't
> matter anymore). This starts to hurt some uses, as it means that
> as soon as you have a call (say
On Mon, 10 Jul 2023, Alexander Monakov wrote:
> > I chose to make it possible to write function definitions with that
> > attribute with GCC adding the necessary callee save/restore code in
> > the xlogue itself.
>
> But you can't trivially restore if the callee i
On Tue, 11 Jul 2023, Richard Biener wrote:
> > > If a function contains calls then GCC can't know which
> > > parts of the XMM regset is clobbered by that, it may be parts
> > > which don't even exist yet (say until avx2048 comes out), so we must
> > > restrict ourself to only save/restore the S
On Tue, 11 Jul 2023, Michael Matz wrote:
> > > To that end I introduce actually two related attributes (for naming
> > > see below):
> > > * nosseclobber: claims (and ensures) that xmm8-15 aren't clobbered
> >
> > This is the weak/active form; I'd suggest "preserve_high_sse".
>
> But it preser
On Tue, 11 Jul 2023, Michael Matz wrote:
> Hey,
>
> On Tue, 11 Jul 2023, Alexander Monakov via Gcc-patches wrote:
>
> > > > > * nosseclobber: claims (and ensures) that xmm8-15 aren't clobbered
> > > >
> > > > This is the weak/active fo
On Mon, 20 Feb 2023, Richard Biener via Gcc-patches wrote:
> On Sun, Feb 19, 2023 at 2:15 AM Maciej W. Rozycki wrote:
> >
> > > The problem is you don't see it as a divmod in expand_divmod unless you
> > > expose
> > > a divmod optab. See tree-ssa-mathopts.cc's divmod handling.
> >
> > That'
Hi,
On Mon, 6 Mar 2023, Richard Biener via Gcc-patches wrote:
> --- a/gcc/realmpfr.h
> +++ b/gcc/realmpfr.h
> @@ -24,6 +24,26 @@
> #include
> #include
>
> +class auto_mpfr
> +{
> +public:
> + auto_mpfr () { mpfr_init (m_mpfr); }
> + explicit auto_mpfr (mpfr_prec_t prec) { mpfr_init2 (m_mp
On Tue, 7 Mar 2023, Jonathan Wakely wrote:
> > Shouldn't this use the idiom suggested in ansidecl.h, i.e.
> >
> > private:
> > DISABLE_COPY_AND_ASSIGN (auto_mpfr);
>
>
> Why? A macro like that (or a base class like boost::noncopyable) has
> some value in a code base that wants to work fo
On Sat, 13 May 2023, Andrew Pinski via Gcc-patches wrote:
> +/* signbit(x) != 0 ? -x : x -> abs(x)
> + signbit(x) == 0 ? -x : x -> -abs(x) */
> +(for sign (SIGNBIT)
Surprised to see a dummy iterator here. Was this meant to include
float and long double versions of the builtin too (SIGNBITF an
On Sun, 14 May 2023, Alexander Monakov wrote:
> On Sat, 13 May 2023, Andrew Pinski via Gcc-patches wrote:
>
> > +/* signbit(x) != 0 ? -x : x -> abs(x)
> > + signbit(x) == 0 ? -x : x -> -abs(x) */
> > +(for sign (SIGNBIT)
>
> Surprised to see a dummy iter
On Sun, 14 May 2023, Andrew Pinski wrote:
> It is NOT a dummy iterator. SIGNBIT is an operator list that expands to
> "BUILT_IN_SIGNBITF BUILT_IN_SIGNBIT BUILT_IN_SIGNBITL IFN_SIGNBIT".
Ah, it's in cfn-operators.pd in the build tree, not the source tree.
> > On the other hand, the following cl
Since tree-ssa-math-opts may freely contract across statement boundaries
we should enable it only for -ffp-contract=fast instead of disabling it
for -ffp-contract=off.
No functional change, since -ffp-contract=on is not exposed yet.
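
To make the statement-boundary distinction concrete, here is a minimal sketch. The function names are hypothetical and do not come from the patch; the comments restate the rule described in the message rather than the pass's actual logic.

```c
/* Contraction candidate inside a single expression: ISO C permits
   fusing a * b + c here, so both -ffp-contract=on and =fast may emit
   an FMA on targets that have one.  */
double
mul_add_one_expr (double a, double b, double c)
{
  return a * b + c;
}

/* The product is assigned to t in its own statement, so fusing it with
   the addition contracts across a statement boundary.  Per the message
   above, that belongs to -ffp-contract=fast only.  */
double
mul_add_two_stmts (double a, double b, double c)
{
  double t = a * b;
  return t + c;
}
```

For inputs where the product is exactly representable the two forms agree regardless of contraction; they can differ only when the intermediate rounding of a * b matters.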
gcc/ChangeLog:
* tree-ssa-math-opts.cc (convert_mult_to
Implement -ffp-contract=on for C and C++ without changing default
behavior (=off for -std=cNN, =fast for C++ and -std=gnuNN).
gcc/c-family/ChangeLog:
* c-gimplify.cc (fma_supported_p): New helper.
(c_gimplify_expr) [PLUS_EXPR, MINUS_EXPR]: Implement FMA
contraction.
gcc/C
On Mon, 22 May 2023, Richard Biener wrote:
> On Thu, May 18, 2023 at 11:04 PM Alexander Monakov via Gcc-patches
> wrote:
> >
> > Implement -ffp-contract=on for C and C++ without changing default
> > behavior (=off for -std=cNN, =fast for C++ and -std=gnuNN).
>
On Tue, 23 May 2023, Richard Biener wrote:
> > Ah, no, I deliberately decided against that, because that way we would go
> > via gimplify_arg, which would emit all side effects in *pre_p. That seems
> > wrong if arguments had side-effects that should go in *post_p.
>
> Ah, true - that warrants a
Explicitly say that bitwise shifts for narrow types work similar to
element-wise C shifts with integer promotions, which coincides with
OpenCL semantics.
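
As an illustration of the intended reading, the sketch below compares a vector shift on 8-bit lanes with a scalar reference that goes through integer promotion. The type and function names are made up for this example, and only counts below the element width are exercised.

```c
/* GCC/Clang vector extension: eight unsigned 8-bit lanes.  */
typedef unsigned char u8x8 __attribute__ ((vector_size (8)));

/* Scalar reference for one lane: x is promoted to int, shifted, and the
   result is converted back to the 8-bit element type.  */
static unsigned char
shl_lane (unsigned char x, int n)
{
  return (unsigned char) (x << n);
}

/* Element-wise vector shift with the same count in every lane.  */
static u8x8
shl_vec (u8x8 v, int n)
{
  u8x8 cnt;
  for (int i = 0; i < 8; i++)
    cnt[i] = (unsigned char) n;
  return v << cnt;
}

/* Return 1 if every lane of the vector shift matches the scalar
   reference for counts 0..7, 0 otherwise.  */
int
shifts_agree (void)
{
  u8x8 v = { 0x01, 0x7f, 0x80, 0xff, 0x55, 0xaa, 0x10, 0x00 };
  for (int n = 0; n < 8; n++)
    {
      u8x8 r = shl_vec (v, n);
      for (int i = 0; i < 8; i++)
        if (r[i] != shl_lane (v[i], n))
          return 0;
    }
  return 1;
}
```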
gcc/ChangeLog:
* doc/extend.texi (Vector Extensions): Clarify bitwise shift
semantics.
---
gcc/doc/extend.texi | 7 ++-
1
On Wed, 24 May 2023, Richard Biener wrote:
> On Wed, May 24, 2023 at 2:54 PM Alexander Monakov via Gcc-patches
> wrote:
> >
> > Explicitly say that bitwise shifts for narrow types work similar to
> > element-wise C shifts with integer promotions, which coincides w
On Wed, 24 May 2023, Richard Biener via Gcc-patches wrote:
> I’d have to check the ISAs what they actually do here - it of course depends
> on RTL semantics as well but as you say those are not strictly defined here
> either.
Plus, we can add the following executable test to the testsuite:
#in
Can you supply a tar with tree dumps for me to look at please?
Also, if you can check if the problem can be triggered without a
collapsed loop (e.g. try removing collapse(2), remove mentions of
d2) and if so supply dumps from that instead, I'd appreciate that too.
Alexander
On Wed, 16 Sep 2020,
On Wed, 16 Sep 2020, Tom de Vries wrote:
> [ cc-ing author omp support for nvptx. ]
The issue looks familiar. I recognized it back in 2017 (and LLVM people
recognized it too for their GPU targets). In an attempt to get agreement
to fix the issue "properly" for GCC I found a similar issue that
On Mon, 21 Sep 2020, Martin Liška wrote:
> On 9/6/20 1:24 PM, Sergei Trofimovich wrote:
> > From: Sergei Trofimovich
> >
> > Before the change gcc did not stream correctly TOPN counters
> > if counters belonged to a non-local shared object.
> >
> > As a result zero-section optimization generate
On Mon, 5 Oct 2020, Tom de Vries wrote:
> I've had to modify this patch in two ways:
> - the original test-case stopped failing, though not the
> minimized one, so I added that one as a test-case
> - only testing for ENTER_ALLOC and EXIT, and not explicitly for VOTE_ANY
> in ignore_bb_p also s
On Fri, 16 Sep 2022, Uros Bizjak via Gcc-patches wrote:
> On Fri, Sep 16, 2022 at 3:32 AM Jeff Law via Gcc-patches
> wrote:
> >
> >
> > On 9/15/22 19:06, liuhongt via Gcc-patches wrote:
> > > There's peephole2 submit in 1990s which split cmp mem, 0 to load mem,
> > > reg + test reg, reg. I don't
Hi.
On the high level, I'd be highly uncomfortable with this. I guess we are in
vague agreement that it cannot be efficiently implemented. It also goes
against the good practice of accelerator programming, which requires queueing
work on the accelerator and letting it run asynchronously with the
Hi.
My main concerns remain not addressed:
1) what I said in the opening paragraphs of my previous email;
2) device-issued atomics are not guaranteed to appear atomic to the host
unless using atom.sys and translating for CUDA compute capability 6.0+.
Item 2 is a correctness issue. Item 1 I th
On Tue, 27 Sep 2022, Tobias Burnus wrote:
> Ignoring (1), does the overall patch and this part otherwise look okay(ish)?
>
>
> Caveat: The .sys scope works well with >= sm_60 but does not handle
> older versions. For those, the __atomic_{load/store}_n are used. I do not
> see a good solut
Ping^3.
On Fri, 5 Aug 2022, Alexander Monakov wrote:
> Ping^2.
>
> On Wed, 20 Jul 2022, Alexander Monakov wrote:
>
> >
> > Ping.
> >
> > On Thu, 7 Jul 2022, Alexander Monakov via Gcc-patches wrote:
> >
> > > From: Artem Klimov
> >
The crc32q instruction takes 64-bit operands, but ignores high 32 bits
of the destination operand, and zero-extends the result from 32 bits.
Let's model this in the RTL pattern to avoid zero-extension when the
_mm_crc32_u64 intrinsic is used with a 32-bit type.
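
The operand behavior described above can be mirrored by a portable software model. This is only a sketch for illustration, not the RTL pattern from the patch: the function names are hypothetical, and the model assumes the usual bit-reflected CRC-32C polynomial with no pre- or post-inversion.

```c
#include <stdint.h>

/* One CRC-32C (Castagnoli) step over a single byte, bit by bit, using
   the reflected polynomial 0x82F63B78.  */
static uint32_t
crc32c_step_u8 (uint32_t crc, uint8_t byte)
{
  crc ^= byte;
  for (int i = 0; i < 8; i++)
    crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
  return crc;
}

/* Model of the 64-bit form: only the low 32 bits of the accumulator
   are read, the data is consumed least significant byte first, and
   the 32-bit result is zero-extended to 64 bits.  */
uint64_t
crc32c_u64_model (uint64_t crc, uint64_t data)
{
  uint32_t c = (uint32_t) crc;
  for (int i = 0; i < 8; i++)
    c = crc32c_step_u8 (c, (uint8_t) (data >> (8 * i)));
  return (uint64_t) c;
}
```

Because the high half of the accumulator never influences the result and the result always fits in 32 bits, a caller that keeps the CRC in a 32-bit variable needs no extra zero-extension around the operation.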
PR target/106453
gcc/Chang
On Tue, 23 Aug 2022, Alexander Monakov via Gcc-patches wrote:
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr106453.c
> @@ -0,0 +1,13 @@
> +/* { dg-do compile } */
> +/* { dg-options "-msse4.2 -O2 -fdump-rtl-final" } */
> +/* { dg-final { scan-rtl-dump-
On Fri, 26 Aug 2022, Martin Jambor wrote:
> > +/* Check if promoting general-dynamic TLS access model to local-dynamic is
> > + desirable for DECL. */
> > +
> > +static bool
> > +optimize_dyn_tls_for_decl_p (const_tree decl)
> > +{
> > + if (optimize)
> > +return true;
>
> ...this. This
On Fri, 26 Aug 2022, Tobias Burnus wrote:
> @Tom and Alexander: Better suggestions are welcome for the busy loop in
> libgomp/plugin/plugin-nvptx.c regarding the variable placement and checking
> its value.
I think to do that without polling you can use PTX 'brkpt' instruction on the
device and
On Tue, 30 Aug 2022, Martin Jambor wrote:
> There is still the optimize attribute so in fact no, even in non-LTO
> mode if there is no current function, you cannot trust the "global"
> "optimize" thing.
>
> Ideally we would assert that no "analysis" phase of an IPA pass reads
> the global optimiz
gcc.dg/tls/vis-flag-hidden-gd.c: New test.
* gcc.dg/tls/vis-flag-hidden.c: New test.
* gcc.dg/tls/vis-pragma-hidden-gd.c: New test.
* gcc.dg/tls/vis-pragma-hidden.c: New test.
Co-Authored-By: Alexander Monakov
Signed-off-by: Artem Klimov
---
gcc/ipa-visibility.cc
On Mon, 5 Sep 2022, Philipp Tomsich wrote:
> +riscv_mode_rep_extended (scalar_int_mode mode, scalar_int_mode mode_rep)
> +{
> + /* On 64-bit targets, SImode register values are sign-extended to DImode.
> */
> + if (TARGET_64BIT && mode == SImode && mode_rep == DImode)
> +return SIGN_EXTEND
On Sun, 15 May 2022, Rui Ueyama wrote:
[snip]
> > So get_symbols_v3 allows the linker to discard an LTO .o file to solve this.
> >
> > In absence of get_symbols_v3 mold tries to ensure correctness by restarting
> > itself while appending a list of .o files to be discarded to its command
> > line.
On Sun, 15 May 2022, Rui Ueyama wrote:
> > Is that a good tradeoff in the LTO case though? I believe you cannot assume
> > the plugin to be thread-safe, so you're serializing its API calls, right?
> > But the plugin is doing a lot of work, so using the index to feed it with as
> > few LTO objects
On Sun, 15 May 2022, Rui Ueyama wrote:
> > Makes sense, but I still don't understand why mold wants to discover in
> > advance whether the plugin is going to use get_symbols_v3. How would it
> > help with what mold does today to handle the _v2 case?
>
> Currently, mold restarts itself to reset th
On Sun, 15 May 2022, Rui Ueyama wrote:
> > Can you simply restart the linker on first call to get_symbols_v2 instead?
>
> I could, but it may not be a safe timing to call exec(2). I believe we
> are expected to call cleanup_hook after calling all_symbols_read_hook,
> and it is not clear what will
On Sun, 15 May 2022, Rui Ueyama wrote:
> > Regarding files, as far as I can tell, GCC plugin will leave a 'resolution
> > file'
> > on disk, but after re-exec it would recreate it anyway.
>
> Does it recreate a temporary file with the same file name so that
> there's no temporary file left on th
On Mon, 16 May 2022, Rui Ueyama wrote:
> If it is a guaranteed behavior that GCC of all versions that support
> only get_symbols_v2 don't leave a temporary file behind if it is
> suddenly disrupted during get_symbols_v2 execution, then yes, mold can
> restart itself when get_symbols_v2 is called f
On Mon, 16 May 2022, Richard Biener wrote:
> Is there an API document besides the header itself somewhere?
It's on the wiki: https://gcc.gnu.org/wiki/whopr/driver
(sadly the v3 entrypoint was added there without documentation)
Alexander
On Mon, 16 May 2022, Rui Ueyama wrote:
> > @Rui: Am I correct that you're interested in thread-safe claim_file? Is
> > there any
> > other function being called in parallel?
>
> Yes, I want a thread-safe claim_file. And that function seems to be
> the only function in mold that is called in paralle
On Mon, 16 May 2022, Martin Liška wrote:
> I've implemented first version of the patch, please take a look.
I'll comment on the patch, feel free to inform me when I should back off
with forcing my opinion in this thread :)
> --- a/include/plugin-api.h
> +++ b/include/plugin-api.h
> @@ -483,6 +48
On Mon, 9 May 2022, Jan Hubicka wrote:
> > On second thought, it might be better to keep the assert, and place the loop
> > under 'if (optimize)'?
>
> The problem is that at IPA level it does not make sense to check
> optimize flag as it is function specific. (shlib is OK to check it
> anywhere
On Fri, 20 May 2022, Richard Biener via Gcc-patches wrote:
> > Still waiting for a suggestion, since "side effect" is the description
> > that made sense to me :-)
>
> I think side-effect captures it quite well even if it overlaps with a term
> used in language standards. Doing c = a << b has th
On Fri, 20 May 2022, Richard Biener wrote:
> On Fri, May 20, 2022 at 8:38 AM Alexander Monakov wrote:
> >
> > On Fri, 20 May 2022, Richard Biener via Gcc-patches wrote:
> >
> > > > Still waiting for a suggestion, since "side effect" is th
On Fri, 20 May 2022, Richard Biener wrote:
> > > > I suggest 'deduce', 'deduction', 'deducing a range'. What the code is
> > > > actually
> > > > doing is deducing that 'b' in 'a / b' cannot be zero. Function in GCC
> > > > might be
> > > > called like 'deduce_ranges_from_stmt'.
> > >
> > > So h
On Mon, 16 May 2022, Alexander Monakov wrote:
> On Mon, 9 May 2022, Jan Hubicka wrote:
>
> > > On second thought, it might be better to keep the assert, and place the
> > > loop
> > > under 'if (optimize)'?
> >
> > The problem is that at
On Mon, 26 Oct 2020, Jakub Jelinek wrote:
> On Mon, Oct 26, 2020 at 07:14:48AM -0700, Julian Brown wrote:
> > This patch adds caching for the stack block allocated for offloaded
> > OpenMP kernel launches on NVPTX. This is a performance optimisation --
> > we observed an average 11% or so performa
Hello Jakub,
On Fri, 12 Nov 2021, Jakub Jelinek via Gcc-patches wrote:
> On Fri, Nov 12, 2021 at 02:27:16PM +0100, Jakub Jelinek via Gcc-patches wrote:
> > On Fri, Nov 12, 2021 at 02:20:23PM +0100, Jakub Jelinek via Gcc-patches
> > wrote:
> > > This patch assumes that .shared variables are initi
On Fri, 12 Nov 2021, Jakub Jelinek via Gcc-patches wrote:
> --- libgomp/config/nvptx/team.c.jj 2021-05-25 13:43:02.793121350 +0200
> +++ libgomp/config/nvptx/team.c 2021-11-12 17:49:02.847341650 +0100
> @@ -32,6 +32,7 @@
> #include
>
> struct gomp_thread *nvptx_thrs __attribute__((sh
On Fri, 12 Nov 2021, Jakub Jelinek via Gcc-patches wrote:
> On Fri, Nov 12, 2021 at 08:47:09PM +0100, Jakub Jelinek wrote:
> > The problem is that the argument of the num_teams clause isn't always known
> > before target is launched.
>
> There was a design mistake that the clause has been put
On Thu, 10 Jun 2021, Richard Biener wrote:
> This makes it possible to apply GCCs stable sort algorithm to vec<>
> and also use it with the qsort_r compatible interface.
>
> Alex, any comments?
I'm afraid the patch is not correct, see below; (I'll also point out
errors in comments while at it).
On Thu, 8 Jul 2021, Richard Biener via Gcc-patches wrote:
> You made me lookup idiv and I figured we're not optimally
> handling
>
> int foo (long x, int y)
> {
> return x / y;
> }
>
> by using a 32:32 / 32 bit divide. combine manages to
> see enough to eventually do this though.
We cannot d
nder
On Fri, 9 Dec 2022, Alexander Monakov wrote:
> Model the divider in Lujiazui processors as a separate automaton to
> significantly reduce the overall model size. This should also result
> in improved accuracy, as pipe 0 should be able to accept new
> instructions while the
d `f'
> is known to call longjmp.
>
> As discussed in BZ 57067, the root cause for this is the fact that
> setjmp is not properly modeled in RTL, and therefore the backend
> passes have no normalized way to handle this situation.
>
> As Alexander Monakov noted in the BZ
On Thu, 22 Dec 2022, Qing Zhao wrote:
> > I think scheduling across calls in the pre-RA scheduler is simply an
> > oversight,
> > we do not look at dataflow information and with 50% chance risk extending
> > lifetime of a pseudoregister across a call, causing higher register
> > pressure at
>
On Fri, 23 Dec 2022, Qing Zhao wrote:
> >> I am a little confused, you mean pre-RA scheduler does not look at the
> >> data flow
> >> information at all when scheduling insns across calls currently?
> >
> > I think it does not inspect liveness info, and may extend lifetime of a
> > pseudo
> > ac
On Fri, 23 Dec 2022, Jose E. Marchesi wrote:
> > (scheduling across calls in sched2 is somewhat dubious as well, but
> > it doesn't risk register pressure issues, and on VLIW CPUs it at least
> > can result in better VLIW packing)
>
> Does sched2 actually schedule across calls? All the comment
On Fri, 23 Dec 2022, Qing Zhao wrote:
> Then, sched2 still can move insn across calls?
> So does sched2 have the same issue of incorrectly moving the insn across a
> call which has unknown control flow?
I think problems are unlikely because register allocator assigns pseudos that
cross setj
On Fri, 23 Dec 2022, Qing Zhao wrote:
> BTW, Why sched1 is not enabled on x86 by default?
Register allocation is tricky on x86 due to small number of general-purpose
registers, and sched1 can make it even more difficult. I think before register
pressure modeling was added, sched1 could not be e
On Sat, 24 Dec 2022, Jose E. Marchesi wrote:
> However, there is something I don't understand: wouldn't sched2
> introduce the same problem when -fsched2-use-superblocks is specified?
Superblocks are irrelevant, a call instruction does not end a basic block
and the problematic motion happens wi
On Tue, 3 Jan 2023, Jan Hubicka wrote:
> > * gcc/common/config/i386/i386-common.cc (processor_alias_table):
> > Use CPU_ZNVER4 for znver4.
> > * config/i386/i386.md: Add znver4.md.
> > * config/i386/znver4.md: New.
> OK,
> thanks!
Honza, I'm curious what are your further plans f
On Wed, 11 Mar 2020, Martin Liška wrote:
> > Is there a comprehensive list of plugins out in the wild using the LD
> > plugin API?
>
> I know only about:
> $ ls /usr/lib/bfd-plugins
> liblto_plugin.so.0.0.0 LLVMgold.so
>
> and I know about Alexander Monakov (some
On Sat, 14 Mar 2020, Alexey Neyman wrote:
> Attached is a patch that does it: at -g1, the type attributes are not
> generated.
Two small issues I pointed out the last time are still present:
https://gcc.gnu.org/legacy-ml/gcc-patches/2020-02/msg01646.html
(I did not review the new patch on a more
On Thu, 16 Apr 2020, Martin Liška wrote:
> On 4/16/20 9:57 AM, Richard Biener wrote:
> > Ah, tab vs. spaces. Changed to all spaces now and pushed.
>
> Ah, I've also hit the issue. That's caused by our local vimrc.
> We should exclude tab options for .py files.
I think your patch is correct.
On Thu, 16 Apr 2020, Martin Liška wrote:
> To be honest I have:
> autocmd Filetype python setlocal expandtab tabstop=4 shiftwidth=4
> softtabstop=4
>
> in my default vim config.
> But I'm wondering what's default for 'python' Filetype?
Since October 2013 Vim ftplugin/python.vim has:
" As sugges
On Tue, 5 May 2020, Richard Biener wrote:
>
> Pushed as obvious.
>
> C++ makes mismatched prototype and implementation OK.
(because of overloads)
I think this would have been caught if GCC enabled -Wmissing-declarations
during bootstrap, and the main reason we have this problem is that the
On Fri, 8 May 2020, Richard Biener wrote:
>
> Currently we fail to optimize those which are used when MIN/MAX_EXPR
> cannot be used for FP values but the target has IEEE conforming
> implementations.
i386 ieee_s{min,max} patterns are definitely not IEEE-compliant,
their comment alludes to tha
On Fri, 8 May 2020, Uros Bizjak wrote:
> > Am I missing something?
>
> Is the above enough to declare min/max as IEEE compliant?
No. SSE min/max instructions semantics match C expression x < y ? x : y.
IEEE min/max operations are commutative when exactly one operand is a NaN,
and so are C fmin/f
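
A small sketch of the asymmetry, assuming only the C expression quoted above (the helper names are hypothetical):

```c
#include <math.h>   /* NAN macro only; no libm functions are called */

/* The semantics the message attributes to SSE min: the C expression
   x < y ? x : y.  Any comparison with a NaN is false, so the second
   operand is returned whenever either input is a NaN.  */
double
sse_style_min (double x, double y)
{
  return x < y ? x : y;
}

/* Self-comparison NaN test, avoiding any library dependency.  */
int
is_nan (double x)
{
  return x != x;
}
```

With exactly one NaN operand, sse_style_min (1.0, NAN) yields NaN while sse_style_min (NAN, 1.0) yields 1.0, so the result depends on operand order, which a commutative operation would not allow.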
On Sun, 10 May 2020, Uros Bizjak wrote:
> So, I found [1], that tries to explain this issue.
>
> [1] https://2pi.dk/2016/05/ieee-min-max
I would also recommend reading this report that covers a few more
architectures and issues with IEEE754 definitions:
http://grouper.ieee.org/groups/msc/ANS
On Mon, 11 May 2020, Richard Sandiford wrote:
> Like you say, the idea is that since the operation is commutative and
> is the same in both vector and scalar form, there's no reason to require
> any -ffast-math flags.
Note that PR88540 that Richard is referencing uses open-coded x < y ? x : y
(no
On Sun, 31 May 2020, H.J. Lu via Gcc-patches wrote:
> --- a/gcc/config/i386/i386-expand.c
> +++ b/gcc/config/i386/i386-expand.c
> @@ -7656,6 +7656,90 @@ ix86_expand_set_or_cpymem (rtx dst, rtx src, rtx
> count_exp, rtx val_exp,
>return true;
> }
>
> +/* Expand cmpstrn or memcmp. */
> +
>
Hi,
On Wed, 2 Dec 2020, Martin Liška wrote:
> Hey.
>
> I see the current help description of GCC hooks not much useful:
>
> $ help user-defined
[snip]
> trt -- GCC hook: trt [tree]
>
> It's quite hard to be familiar what each hooks means and rather suggest:
>
[snip]
> trt -- GCC hook: TREE_TY
On Tue, 8 Dec 2020, Julian Brown wrote:
> Ping?
This has addressed my concerns, thanks.
Alexander
> On Fri, 13 Nov 2020 20:54:54 +
> Julian Brown wrote:
>
> > Hi Alexander,
> >
> > Thanks for the review! Comments below.
> >
> > On Tue, 10 Nov
On Wed, 9 Nov 2022, Philipp Tomsich wrote:
> > To give a specific example that will be problematic if you go far enough
> > down
> > the road of matching MIPS64 behavior:
> >
> > long f(void)
> > {
> > int x;
> > asm("" : "=r"(x));
> > return x;
> > }
> >
> > here GCC (unlike LLVM) om
can't easily jump to Current development documentation.
For this I would suggest using the tag to neatly fold links for
old releases. Please see the attached patch.
Alexander
From ab6ce8c24aa17dba8ed79f3c3f7a5e8038dd3205 Mon Sep 17 00:00:00 2001
From: Alexander Monakov
Date: Wed, 9 Nov 2022 22:17:1
On Thu, 10 Nov 2022, Martin Liška wrote:
> On 11/10/22 08:29, Gerald Pfeifer wrote:
> > On Wed, 9 Nov 2022, Alexander Monakov wrote:
> >> For this I would suggest using the tag to neatly fold links
> >> for old releases. Please see the attached patch.
>
On Mon, 7 Nov 2022, Alexander Monakov wrote:
>
> On Tue, 1 Nov 2022, Alexander Monakov wrote:
>
> > Hi,
> >
> > I'm sending followup fixes for combinatorial explosion of znver scheduling
> > automaton tables as described in the earlier thread:
>
On Mon, 14 Nov 2022, Joshi, Tejas Sanjay wrote:
> [Public]
>
> Hi,
Hi. I'm still waiting for feedback on fixes for existing models:
https://inbox.sourceware.org/gcc-patches/5ae6fc21-edc6-133-aee2-a41e16eb...@ispras.ru/T/#t
did you have a chance to look at those?
> PFA the patch which adds znv
On Tue, 15 Nov 2022, Joshi, Tejas Sanjay wrote:
> > > +;; AVX instructions
> > > +(define_insn_reservation "znver4_sse_log" 1
> > > + (and (eq_attr "cpu" "znver4")
> > > + (and (eq_attr "type" "sselog,sselog1")
> > > +
On Tue, 15 Nov 2022, Jonathan Wakely via Gcc-patches wrote:
> > @item -mrelax-cmpxchg-loop
> > @opindex mrelax-cmpxchg-loop
> >-Relax cmpxchg loop by emitting an early load and compare before cmpxchg,
> >-execute pause if load value is not expected. This reduces excessive
> >-cachline bouncing whe
On Tue, 15 Nov 2022, Jonathan Wakely wrote:
> > How about the following:
> >
> > When emitting a compare-and-swap loop for @ref{__sync Builtins}
> > and @ref{__atomic Builtins} lacking a native instruction, optimize
> > for the highly contended case by issuing an atomic load before the
> > @code
On Wed, 16 Nov 2022, Hongyu Wang wrote:
> > When emitting a compare-and-swap loop for @ref{__sync Builtins}
> > and @ref{__atomic Builtins} lacking a native instruction, optimize
> > for the highly contended case by issuing an atomic load before the
> > @code{CMPXCHG} instruction, and using the
On Wed, 16 Nov 2022, Jan Hubička wrote:
> This looks really promising. I will experiment with the patch for separate
> znver3 model, but I think we should be able to keep
> them unified and hopefully get both less code duplication and table sizes.
Do you mean separate znver4 (not '3') model (i
When instrumentation is requested via -fsanitize-coverage=trace-pc, GCC
emits calls to __sanitizer_cov_trace_pc callback into each basic block.
This callback is supposed to be implemented by the user, and should be
able to identify the containing basic block by inspecting its return
address. Tailca
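
For readers unfamiliar with the callback, a user-side implementation might look like the sketch below. The bookkeeping variables and accessor functions are hypothetical; only the callback name and the use of the return address come from the description above.

```c
#include <stdint.h>

/* Hypothetical bookkeeping; a real tool would record into a buffer.  */
static uintptr_t last_pc;
static unsigned long hit_count;

/* Called by instrumented code from each basic block.  The return
   address points into the block that made the call, which is how the
   callback identifies it.  This file itself should be built without
   -fsanitize-coverage=trace-pc.  */
void
__sanitizer_cov_trace_pc (void)
{
  last_pc = (uintptr_t) __builtin_return_address (0);
  hit_count++;
}

/* Accessors for the recorded state.  */
unsigned long
cov_hit_count (void)
{
  return hit_count;
}

uintptr_t
cov_last_pc (void)
{
  return last_pc;
}
```

The scheme relies on the call returning into the block that issued it, since the return address is the only information the callback receives.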
Clean up confusing changes from the recent refactoring for
parallel match.pd build.
gimple-match-head.o is not built. Remove related flags adjustment.
Autogenerated gimple-match-N.o files do not depend on
gimple-match-exports.cc.
{gimple,generic}-match-auto.h only depend on the prerequisites of
On Fri, 5 May 2023, Tamar Christina wrote:
> > > Am 05.05.2023 um 19:03 schrieb Alexander Monakov via Gcc-patches > patc...@gcc.gnu.org>:
> > >
> > > Clean up confusing changes from the recent refactoring for parallel
> > > match.pd build.
> >
On Fri, 5 May 2023, Tamar Christina wrote:
> > -Original Message-
> > From: Alexander Monakov
> > Sent: Friday, May 5, 2023 6:59 PM
> > To: Tamar Christina
> > Cc: Richard Biener ; gcc-patches@gcc.gnu.org
> > Subject: RE: [PATCH] Makefile.in: cl
On Fri, 5 May 2023, Alexander Monakov wrote:
> > > gimple-head-export.cc does not exist.
> > >
> > > gimple-match-exports.cc is not a generated file. It's under source
> > > control and
> > > edited independently from genmatch.cc. It is compil
I'm trying to study match.pd/genmatch with the eventual goal of
improving match-and-simplify code generation. Here's some trivial
cleanups for the recent refactoring in the meantime.
Alexander Monakov (3):
genmatch: clean up emit_func
genmatch: clean up showUsage
genma
Display usage more consistently and get rid of camelCase.
gcc/ChangeLog:
* genmatch.cc (showUsage): Reimplement as ...
(usage): ...this. Adjust all uses.
(main): Print usage when no arguments. Add missing 'return 1'.
---
gcc/genmatch.cc | 21 ++---
1 fil
get_out_file did not follow the coding conventions (mixing three-space
and two-space indentation, missing linebreak before function name).
Take that as an excuse to reimplement it in a more terse manner and
rename as 'choose_output', which is hopefully more descriptive.
gcc/ChangeLog:
*
Eliminate boolean parameters of emit_func. The first ('open') just
prints 'extern' to generated header, which is unnecessary. Introduce a
separate function to use when finishing a declaration in place of the
second ('close').
Rename emit_func to 'fp_decl' (matching 'fprintf' in length) to unbreak
On Wed, 25 May 2022, liuhongt via Gcc-patches wrote:
> Right now, mem_cost for separate mem alternative is 1 * frequency which
> is pretty small and caused the unnecessary SSE spill in the PR, I've tried
> to rework backend cost model, but RA still not happy with that(regress
> somewhere else). I t
> > In the PR, the spill happens in the initial basic block of the function,
> > i.e.
> > the one with the highest frequency.
> >
> > Also as noted in the PR, swapping the 'unlikely' branch to 'likely' avoids
> > the spill,
> > even though it does not affect the frequency of the initial basic bl