Re: memory model, READ_ONCE

2025-03-01 Thread Alexander Monakov


On Sat, 1 Mar 2025, Martin Uecker via Gcc wrote:

> Sorry for being a bit slow.  This is still not clear to me.
> 
> In vect/pr65206.c the following loop can be vectorized
> with -fallow-store-data-races.
> 
> #define N 1024
> 
> double a[N], b[N];
> 
> void foo ()
> {
>   for (int i = 0; i < N; ++i)
>     if (b[i] < 3.)
>       a[i] += b[i];
> }
> 
> But even though this invents stores these would still seem
> to be allowed by C11 because they are not detectable to
> another reader because they write back the same value as
> read.
> 
> Is this understanding correct? 

No. Imagine another thread running a similar loop, but with
the opposite condition in the 'if' (b[i] > 3). Then there are
no data races in the abstract machine, but the vectorized code
may end up with the wrong result.

(suppose all odd elements in 'b' are less than three, all even ones
are greater than three, and the two threads proceed through the
loop in lockstep; then half of the updates to 'a' are lost)

> Does -fallow-store-data-races also create other data races
> which would not be allowed by C11?

Yes, for instance it allows an unsafe form of store motion where
a conditional store is moved out of the loop and made unconditional:

int position;

void f(int n, float *a)
{
  for (int i = 0; i < n; i++)
    if (a[i] < 0)
      position = i;
}

> What was the reason to disallow those optimizations that 
> could be allowed by default?

-fallow-store-data-races guards transforms that we know to be incorrect
for source code that may become a part of a multithreaded program.
Are you asking about something else?

Alexander


Re: memory model, READ_ONCE

2025-03-01 Thread Martin Uecker via Gcc
On Saturday, 01.03.2025 at 16:52 +0300, Alexander Monakov wrote:
> On Sat, 1 Mar 2025, Martin Uecker via Gcc wrote:
> 
> > Sorry for being a bit slow.  This is still not clear to me.
> > 
> > In vect/pr65206.c the following loop can be vectorized
> > with -fallow-store-data-races.
> > 
> > #define N 1024
> > 
> > double a[N], b[N];
> > 
> > void foo ()
> > {
> >   for (int i = 0; i < N; ++i)
> >     if (b[i] < 3.)
> >       a[i] += b[i];
> > }
> > 
> > But even though this invents stores these would still seem
> > to be allowed by C11 because they are not detectable to
> > another reader because they write back the same value as
> > read.
> > 
> > Is this understanding correct? 
> 
> No. Imagine another thread running a similar loop, but with
> the opposite condition in the 'if' (b[i] > 3). Then there are
> no data races in the abstract machine, but the vectorized code
> may end up with the wrong result.
> 
> (suppose all odd elements in 'b' are less than three, all even ones
> are greater than three, and the two threads proceed through the
> loop in lockstep; then half of the updates to 'a' are lost)

Ah, right.  Thank you! So this example would violate C11. 

> 
> > Does -fallow-store-data-races also create other data races
> > which would not be allowed by C11?
> 
> Yes, for instance it allows an unsafe form of store motion where
> a conditional store is moved out of the loop and made unconditional:
> 
> int position;
> 
> void f(int n, float *a)
> {
>   for (int i = 0; i < n; i++)
>     if (a[i] < 0)
>       position = i;
> }

Now I wonder why this is not allowed in C11.  This
is transformed to:

void f(int n, float *a)
{
  int tmp = position;
  for (int i = 0; i < n; i++)
    if (a[i] < 0)
      tmp = i;
  position = tmp;
}

So if there is a write, no other thread can read or write
position.  Thus, the interesting case is when there should
be no write at all, i.e.

void f(int n, float *a)
{
  int tmp = position;
  position = tmp;
}

and this could undo the store from another thread, so violates
the C11 memory model.

> 
> > What was the reason to disallow those optimizations that 
> > could be allowed by default?
> 
> -fallow-store-data-races guards transforms that we know to be incorrect
> for source code that may become a part of a multithreaded program.
> Are you asking about something else?

I am looking for an example where stores are invented but this
is allowed by C11.  So something such as

if (x != 1)
  x = 1;

being transformed to

x = 1;

This should not harm another reader when x == 1, and if
there is a concurrent writer it is UB anyway, so this should be OK. Does GCC
do such things?  And is this then also guarded by the
-fallow-store-data-races flag or not (because it is allowed in C11)?

Martin



Re: RFC: A redesign of `-Mmodules` output

2025-03-01 Thread vspefs via Gcc
I read a few mails from the autoconf thread. I'll try to read all now. However,
a maybe-off-topic-but-could-be-on-topic question: what exactly is the state of
Autotools now? The whole Autotools build system seems to be on a very slow
release cycle. They seem to lack enough contributors/maintainers as well.

So if Autotools is having release struggles, I would personally prioritize
solutions that require less effort on the build system side.

Also, I forgot to "reply all" on the last mail. Here's the mail that answers
some questions from NightStrike:

> GCC conjures up both .o file and .gcm file in one invocation when possible,
> too. And yes, that can be managed well with grouped target - but a rule with
> grouped target must have a recipe, which I think is a little beyond `gcc -M`'s
> scope.
>
> Thanks for bringing up the pattern rule workaround, though. That's something I
> didn't know. (I'm not super familiar with Make!) I'll see what can be done.
>
> But generally, I try to avoid the problem, because a workaround here could
> cause trouble elsewhere. And compiling a CMI is relatively fast, so
> occasionally recompiling a CMI at most one extra time is tolerable to me,
> compared to other problems that need to be fixed.
>
> Of course, if you have any good idea, please do share it.
>
> And yes, Fortran modules resonate quite a lot with C++ modules. I'll be
> reading about them.


On Saturday, March 1st, 2025 at 05:18, Ben Boeckel via Gcc wrote:

> On Fri, Feb 28, 2025 at 13:54:45 -0500, Paul Smith wrote:
> 
> > On Fri, 2025-02-28 at 19:26 +0100, Ben Boeckel via Gcc wrote:
> > 
> > > > In POSIX make, including GNU Make, if a command doesn't modify the
> > > > modification time of the target then that target is not considered
> > > > updated, and other targets which list it as a prerequisite are not
> > > > invoked:
> > > 
> > > Hmm. My understanding was that if a recipe is run, its outputs are
> > > assumed to have been updated. I forget where I saw that, but it
> > > sticks in my mind for some reason. Experimenting with GNU Make indeed
> > > does exhibit `restat = 1` behavior out-of-the-box. That's good to
> > > know :). I'll update my priors.
> > 
> > There is one issue related to this (I don't know whether Ninja addresses
> > it); maybe this is what people were thinking of?
> >
> > Because the modification time of the target is not updated, the recipe
> > for the target will always be run (but targets that depend on it
> > will not be run).
> 
> 
> Ah! Yes, that is a difference. Ninja records the "last run" mtime as a
> "shadow mtime" for the rule in its `.ninja_log` file. This is the mtime
> used when deciding to rerun the rule or not. Another difference is that
> this also encodes the command line last used for the output(s), so if
> that changes, it is also considered out-of-date. CMake has command
> line change detection, but it is much coarser: there's a file with
> information about all command lines in a given target and if that file
> changes, all commands in the target rerun.
> 
> > This is the best that make can do, because it relies solely on the
> > filesystem as its database of "build state" and doesn't preserve any
> > build state of its own. Since "two"'s timestamp was not modified, make
> > has no way to know that its recipe was actually run previously.
> > 
> > If the recipe to build "two" is expensive, it's probably better to just
> > bite the bullet and allow "two" to be updated whenever its
> > prerequisites change.
> 
> 
> Yes, this is reminding me of details I had eventually compressed to
> "restat behavior isn't supported in make": it works, but is not
> useful because it never results in a "do nothing" build.
> 
> > The only way for make to do better would be for it to keep its own
> > database of up-to-date notations, alongside or in addition to the
> > filesystem's modification time. That would be a major structural
> > change, of course.
> 
> 
> If it does happen, it would be wonderful to reuse `ninja`'s
> `.ninja_log` format. That would allow the usage of tools like
> `ninjatracing`[1] to get performance graphs of arbitrary `make` builds.
> 
> Note that there is one fairly annoying thing with this in `ninja`: when
> using `sudo ninja`, the log file is written with elevated permissions,
> which makes later non-`sudo` invocations fail. See this issue:
> 
> https://github.com/ninja-build/ninja/issues/1302
> 
> --Ben
> 
> [1] https://github.com/nico/ninjatracing


gcc 3.2.3 i370 with z/PDOS-generic and mfemul

2025-03-01 Thread Paul Edwards via Gcc
Hello.

I have a mainframe operating system called z/PDOS-generic
available from https://pdos.org

I have ported a slightly modified gcc 3.2.3 (called
gccmvs) to it, and when run under Hercules/380
(mainframe emulator) it works fine and gccmvs is
able to reproduce itself byte-exact.

But I have a new emulator called mfemul.c, and it
isn't very mature and almost certainly has a bug in
it that is affecting gcc 3.2.3. And since that is the only
known issue at this stage, this is where I am trying
to debug, which means delving into the internals
of gcc to some degree.

Since I have a good (both PC and Hercules/380)
version and a bad version, I can add printf calls to
compare what is happening, without actually having
to understand the internals.

However, I'm not sure where to get started on this one.

I do know about "-da", and it has shown that between:

tempr.c.22.postreload
and
tempr.c.23.flow2

this "plus" has gone away:

(insn 84 15 16 (set (reg/v:SI 3 3 [27])
        (mem/f:SI (plus:SI (reg/f:SI 11 11)
                (const_int 4 [0x4])) [3 ymode+0 S4 A32])) 15 {movsi} (nil)
    (expr_list:REG_EQUIV (mem/f:SI (plus:SI (reg/f:SI 11 11)
                (const_int 4 [0x4])) [3 ymode+0 S4 A32])
        (nil)))

(it's still there on the PC)

This is the code I am compiling:

D:\devel\gcc\gcc>type tempr.c
extern unsigned char mode_size[50];

int foo(int xmode, int ymode)
{
  int mode_multiple;

  mode_multiple = mode_size[xmode] / mode_size[ymode];
  if (mode_multiple == 0)
  {
    for (;;) ;
  }
  return 0;
}

D:\devel\gcc\gcc>


which is stripped down from:

mode_multiple = GET_MODE_SIZE (xmode) / GET_MODE_SIZE (ymode);
if (mode_multiple == 0)

in function:

unsigned int
subreg_regno_offset (xregno, xmode, offset, ymode)

of rtlanal.c


And yes, I know gcc 3.2.3 is long out of support, but I'm
interested in the i370 target which no longer exists.

Any suggestions on where to put some printfs so that
I can start looking for a divergence? The generated code
is meant to be identical, with the possible exception of
register selection, which depends on floating-point
calculations, and floating point is known to differ across
my environments: the PC uses IEEE, Hercules has proper IBM
hex float, and mfemul has a bastardized IBM hex float, which
you can see below if interested (but that is unrelated to
my question).

https://sourceforge.net/p/pdos/gitcode/ci/master/tree/util/mfemul.c

Thanks. Paul.


Re: RFC: A redesign of `-Mmodules` output

2025-03-01 Thread Ben Boeckel via Gcc
On Sat, Mar 01, 2025 at 20:01:21 +, vspefs wrote:
> Supporting C++ modules is easy, but I don't think "properly" is possible for any
> build system under current circumstances. Industry consensus is needed on many
> subjects:

I believe CMake is the furthest along in this regard (though I haven't looked
at what xmake does between projects), but it currently only offers the
information through CMake target properties which is not all that useful
outside of CMake itself. The likely solution is to use CPS:

https://github.com/cps-org/cps

to propagate the information in a more tool-agnostic way.

> For example, the "permission to import" question for modules is also a general
> distribution issue. We are still not clear about how we should distribute
> modules, with questions like:

I encourage you to join the Ecosystem Evolution group where these things
are being discussed:

https://groups.google.com/g/cxx-ecosystem-evolution/about

There's also the #ecosystem_evolution channel on cpplang.slack.com if
you're interested.

> 1. Is there a system-wide, universal way of finding installed modules?

I don't believe there should be a way to just "find" modules the way it
has been done for headers. Modules may require their own `-D` flag
settings (e.g., export macros for shared vs. static usage) and have
their own dependencies. Just finding files on disk offers none of that.

> 2. How much information should we expose when distributing a module? In what
>    way?

What CMake exports is a minimum, but there may be more. Beyond normal
CMake target things like flags and dependent targets, for C++ modules
there is:

- the list of module interface files part of the target
- a list containing tuples of the form:
  - module name
  - module interface unit
  - list of installed CMIs available (currently unused; really needs
    something like
    https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2022/p2581r0.pdf
    to be used reliably)

Right now CMake exports this as CMake properties, but encoding it in CPS
is something we'll need to figure out down the line.

> GCC and Clang both install a .module.json file under the /lib path, which
> kind of attempts to answer the first question, and answers the second question
> with "look, here we try to have a structured description of distributed modules".

MSVC has one too; the formats differ slightly, but they're pretty close.
Again, something for the Ecosystem Evolution group to work on. GCC and
Clang are using the format discussed here:

https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2024/p3286r0.pdf

> They're obviously very early, immature, and underdeveloped attempts that
> aren't designed to solve this problem anyway. But a well-designed "structured
> description of distributed modules" can solve a lot of problems. However, the
> question could grow bigger when you throw in more ideas like versioning.
> 
> We don't necessarily need to have the answers standardized, but some form of
> consensus needs to exist.
> 
> Another example is the "multiple CMIs" question. From one aspect, if C++
> standard could officially define/validate/standardize compile options, it
> would be much easier to have a making-everyone-happy answer to this.
> (although it's simply offloading the annoying-everyone part to the committee)
> 
> I know of some SG15 attempts (like P3335), but with standardization you never know.

ISO is a…complicated beast with its rules that make it difficult to have
public official documents, so Ecosystem Evolution is the place to be.

> I made this RFC to see if we can have a usable `-Mmodules` output now, and if
> we can design it so well that it could be trivially useful when build systems
> implement module-supported Makefile generation. Also to see where we should draw
> the line between "build system scope" and "compiler scope".

I'm not sure how useful one-compiler-one-tool solutions are. The issue I
see is that the build *system* needs to be involved to translate what
"this TU creates a CMI file" *means* in the context of its semantics.
Basically, P1689 is the mechanism by which build systems can implement
their policies. I don't know what kind of useful policy is possible from
the compiler's point of view without imposing some level of "this TU is
part of a group X which participates in a larger semantic usage graph
not visible purely from the Makefile recipes".

> > Clang does support separate `.cc -> .pcm` and `.pcm -> .o` commands,
> > but, IIRC, this is not actually equivalent to `.cc -> {.pcm, .o}` in
> > that the codegen may differ in subtle ways that may matter. I'm interested
> > in correctness before we start really squeezing performance out of the
> > setup, so I'd like to avoid adding more bumpy roads along the way before
> > we have a smoothed out path to compare it against if/when bugs crop up.
> 
> I totally agree. My opinion is that it's easier to change compiler
> implementations/behaviours than established building practices, so we must try
> to make sure we get the latter right.

gcc-14-20250301 is now available

2025-03-01 Thread GCC Administrator via Gcc
Snapshot gcc-14-20250301 is now available on
  https://gcc.gnu.org/pub/gcc/snapshots/14-20250301/
and on various mirrors, see https://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 14 git branch
with the following options: git://gcc.gnu.org/git/gcc.git branch 
releases/gcc-14 revision 451791981b9630489b6935491720c6ecc97f6830

You'll find:

 gcc-14-20250301.tar.xz   Complete GCC

  SHA256=68fc16f6d559e5e9ffb993a2e9974bbcd3805a50b3af7578a4f9368de065c0ea
  SHA1=b68b9b49b1c721d3585d8bfc4d65cdbeab2a88fe

Diffs from 14-20250222 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-14
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.


Re: memory model, READ_ONCE

2025-03-01 Thread Martin Uecker via Gcc
On Friday, 28.02.2025 at 21:39 +0300, Alexander Monakov wrote:
> On Fri, 28 Feb 2025, Martin Uecker via Gcc wrote:
> 
> > 
> > I have one follow-up question:  What is the reason
> > that we have stronger semantics for stores by default (i.e.
> > when not using -fallow-store-data-races) than for reads
> > given that the standard would allow more freedom.
> 
> Why would it? On the contrary, it makes an explicit note that
> introducing new writes that could create a racing store is
> not allowed, see note 13 in C11 5.1.2.4.

My original question was about inventing stores that currently
seem to be allowed.  Turning

if (x != 1)
  x = 1;

into

x = 1;

should be ok even though it invents a store for x == 1.

While this technically introduces a race with another
reader, it does so only in a seemingly harmless way,
by overwriting 1 with 1.

Are you saying -fallow-store-data-races also introduces
other kinds of races not allowed by C11?

Martin

> 
> > Only that for reads this is more difficult to have?
> > Or other specific reasons why data races for stores
> > are problematic?
> 
> Introducing racing loads is generally not harmful, see note 14.







Re: RFC: A redesign of `-Mmodules` output

2025-03-01 Thread vspefs via Gcc
GCC conjures up both .o file and .gcm file in one invocation when possible, too.
And yes, that can be managed well with grouped target - but a rule with grouped
target must have a recipe, which I think is a little beyond `gcc -M`'s scope.

Thanks for bringing up the pattern rule workaround, though. That's something I
didn't know. (I'm not super familiar with Make!) I'll see what can be done.

But generally, I try to avoid the problem, because a workaround here could cause
trouble elsewhere. And compiling a CMI is relatively fast, so occasionally
recompiling a CMI at most one extra time is tolerable to me, compared to other
problems that need to be fixed.

Of course, if you have any good idea, please do share it.

And yes, Fortran modules resonate quite a lot with C++ modules. I'll be reading
about them.



On Friday, February 28th, 2025 at 23:59, NightStrike via Gcc wrote:

> On Thu, Feb 27, 2025 at 21:39 vspefs via Gcc gcc@gcc.gnu.org wrote:
> 
> > Current `-Mmodules` output is based on P1602R0, which speaks about a set
> > of Makefile rules that can handle modules, with the help of module mappers
> > and a modified GNU Make.
> >
> > The proposal came out in 2019, and the output of those rules was
> > implemented in GCC in 2020. However, so far we still don't have a new
> > release of GNU Make which implements P1602R0.
> >
> > What's more, the rules described in P1602R0 are not ideal. They set up
> > phony prerequisites for real-file targets, causing guaranteed rebuilds.
> > They are also unable to handle dependencies among module interfaces -
> > that is to say, if module A imports and exports B, then the interface of
> > module A depends on that of module B, so its CMI should be rebuilt if the
> > interface of B changes.
> >
> > I tried a few approaches to fix the current implementation, but in vain.
> >
> > It is possible, however, to have another set of Makefile rules generated,
> > which solves all the problems, and doesn't need a new GNU Make. I've
> > posted it on reddit.
> > 
> > See here.
> > 
> > To briefly summarize the idea:
> > 
> > > If an object target is built from a module interface unit, the rules
> > > generated are:
> > >
> > >     target.o: source.cc regular_prereqs header_unit_prereqs | header_unit_prereqs \
> > >         module_prereqs
> > >     source_cmi.gcm: source.cc regular_prereqs header_unit_prereqs \
> > >         module_prereqs | target.o
> > >
> > > If an object target is not, the rule generated is:
> > >
> > >     target.o: source_files regular_prereqs \
> > >         header_unit_prereqs | header_unit_prereqs module_prereqs
> > >
> > > The `header_unit_prereqs` and `module_prereqs` are paths to the
> > > corresponding CMI files.
> > 
> > The patched GCC can be found here.
> > 
> > An example project can be found here.
> > 
> > Best regards,
> > vspefs
> 
> 
> 
> Could your approach simultaneously be used to have better dependency
> information for Fortran modules? I feel like there’s at least some overlap
> there.
> 
> Regarding Fortran modules, it is possible to create both the .o and any
> needed .mod files from one compiler execution. You can tell GNU Make that a
> particular rule creates multiple files using a grouped target (
> https://www.gnu.org/software/make/manual/make.html#Multiple-Targets), and
> older make has a workaround using pattern rules. With a grouped target and
> knowledge of the modules that a translation unit will create, the make
> rules are straightforward.
> 
> There’s not a whole lot of conceptual difference between that and these gcm
> files, right? If gcc could output a rule that a .o creates one or more .mod
> files for Fortran, or a whole lot of .gcm files for c++, then the same
> machinery could be used for both.


Re: memory model, READ_ONCE

2025-03-01 Thread Richard Biener via Gcc



> On 01.03.2025 at 15:24, Martin Uecker wrote:
> 
> On Saturday, 01.03.2025 at 16:52 +0300, Alexander Monakov wrote:
>>> On Sat, 1 Mar 2025, Martin Uecker via Gcc wrote:
>>> 
>>> Sorry for being a bit slow.  This is still not clear to me.
>>> 
>>> In vect/pr65206.c the following loop can be vectorized
>>> with -fallow-store-data-races.
>>> 
>>> #define N 1024
>>> 
>>> double a[N], b[N];
>>> 
>>> void foo ()
>>> {
>>>   for (int i = 0; i < N; ++i)
>>>     if (b[i] < 3.)
>>>       a[i] += b[i];
>>> }
>>> 
>>> But even though this invents stores these would still seem
>>> to be allowed by C11 because they are not detectable to
>>> another reader because they write back the same value as
>>> read.
>>> 
>>> Is this understanding correct?
>> 
>> No. Imagine another thread running a similar loop, but with
>> the opposite condition in the 'if' (b[i] > 3). Then there are
>> no data races in the abstract machine, but the vectorized code
>> may end up with the wrong result.
>> 
>> (suppose all odd elements in 'b' are less than three, all even ones
>> are greater than three, and the two threads proceed through the
>> loop in lockstep; then half of the updates to 'a' are lost)
> 
> Ah, right.  Thank you! So this example would violate C11. 
> 
>> 
>>> Does -fallow-store-data-races also create other data races
>>> which would not be allowed by C11?
>> 
>> Yes, for instance it allows an unsafe form of store motion where
>> a conditional store is moved out of the loop and made unconditional:
>> 
>> int position;
>> 
>> void f(int n, float *a)
>> {
>>   for (int i = 0; i < n; i++)
>>     if (a[i] < 0)
>>       position = i;
>> }
> 
> Now I wonder why this is not allowed in C11.  This
> is transformed to:
> 
> void f(int n, float *a)
> {
>   int tmp = position;
>   for (int i = 0; i < n; i++)
>     if (a[i] < 0)
>       tmp = i;
>   position = tmp;
> }
> 
> So if there is a write, no other thread can read or write
> position.  Thus, the interesting case is when there should
> be no write at all, i.e.
> 
> void f(int n, float *a)
> {
>   int tmp = position;
>   position = tmp;
> }
> 
> and this could undo the store from another thread, so violates
> the C11 memory model.
> 
>> 
>>> What was the reason to disallow those optimizations that 
>>> could be allowed by default?
>> 
>> -fallow-store-data-races guards transforms that we know to be incorrect
>> for source code that may become a part of a multithreaded program.
>> Are you asking about something else?
> 
> I am looking for an example where stores are invented but this
> is allowed by C11.  So something such as
> 
> if (x != 1)
>   x = 1;
> 
> being transformed to
> 
> x = 1;

GCC also does this, but in two steps.

> This should not harm another reader when x == 1, and if
> there is a concurrent writer it is UB anyway, so this should be OK. Does GCC
> do such things?  And is this then also guarded by the
> -fallow-store-data-races flag or not (because it is allowed in C11)?

And it’s guarded the same way.

> 
> Martin
> 


gcc-13-20250301 is now available

2025-03-01 Thread GCC Administrator via Gcc
Snapshot gcc-13-20250301 is now available on
  https://gcc.gnu.org/pub/gcc/snapshots/13-20250301/
and on various mirrors, see https://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 13 git branch
with the following options: git://gcc.gnu.org/git/gcc.git branch 
releases/gcc-13 revision 722be964341ad089e255aa85a4505bb8a09daed5

You'll find:

 gcc-13-20250301.tar.xz   Complete GCC

  SHA256=3a492b18901d546b7bfa36106c85cc9729797f85148ccf09d0fa8d1b7800b1ac
  SHA1=f271ab7f5b88fc09e2dded00cc0962b69c5e939f

Diffs from 13-20250221 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-13
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.


Re: memory model, READ_ONCE

2025-03-01 Thread Alexander Monakov


On Sat, 1 Mar 2025, Martin Uecker via Gcc wrote:

> > > What was the reason to disallow those optimizations that 
> > > could be allowed by default?
> > 
> > -fallow-store-data-races guards transforms that we know to be incorrect
> > for source code that may become a part of a multithreaded program.
> > Are you asking about something else?
> 
> I am looking for an example where stores are invented but this
> is allowed by C11.  So something such as
> 
> if (x != 1)
>   x = 1;
> 
> being transformed to
> 
> x = 1;
> 
> This should not harm another reader when x == 1, and if
> there is a concurrent writer it is UB anyway, so this should be OK. Does GCC
> do such things?

I'm not aware of any. In addition, for the transform to be correct in
the abstract, the compiler would need to be sure that 'x' does not reside
in read-only memory. And then, as Richi said, this may be a de-optimization
if you consider the costs of dirtying a cache line conditionally versus always
(or, in a more dramatic case, triggering a page fault or not).

Alexander


Re: RFC: A redesign of `-Mmodules` output

2025-03-01 Thread Ben Boeckel via Gcc
On Sat, Mar 01, 2025 at 16:15:12 +, vspefs wrote:
> I read a few mails from the autoconf thread. I'll try to read all now. However,
> a maybe-off-topic-but-could-be-on-topic question: what exactly is the state of
> Autotools now? The whole Autotools build system seems to be on a very slow
> release cycle. They seem to lack enough contributors/maintainers as well.

It was unmaintained for years, but Zack Weinberg took up maintenance in
2020. See these articles:

https://www.owlfolio.org/development/autoconf-swot/
https://lwn.net/Articles/834682/

AFAICT, it's in good hands.

> So if Autotools is having release struggles, I would personally prioritize
> solutions that require less effort on the build system side.

I think the 9-year gap is "done", but yes, there needs to be substantial
work for automake to support C++ modules properly. I have no idea who
might be interested in funding that (Red Hat for GCC dogfooding of C++
modules perhaps?) never mind doing the necessary work.

> Also, I forgot to "reply all" on the last mail. Here's the mail that answers
> some questions from NightStrike:
> 
> > GCC conjures up both .o file and .gcm file in one invocation when possible,
> > too. And yes, that can be managed well with grouped target - but a rule with
> > grouped target must have a recipe, which I think is a little beyond `gcc -M`'s
> > scope.

It's not a significant problem on the build graph side. I only
implemented single-command CMI-and-object rules so far because:

- it's what Fortran does today and is already supported
- it is supported by the three major C++ compilers

Clang does support separate `.cc -> .pcm` and `.pcm -> .o` commands,
but, IIRC, this is not actually equivalent to `.cc -> {.pcm, .o}` in
that the codegen may differ in subtle ways that may matter. I'm interested
in correctness before we start really squeezing performance out of the
setup, so I'd like to avoid adding more bumpy roads along the way before
we have a smoothed out path to compare it against if/when bugs crop up.

Performance ideas I know of (any of them may turn out worse in specific
circumstances; we need actual numbers to know):

- batch scanning: scan a set of sources in a single shot (supported by
  P1689 already)
  pro: fewer scanning commands
  con: scan many files if any change
- combine scanning and collation: a batch scanner can also do the
  collation work
  pro: fewer commands
  con: collation requires build system and build tool knowledge;
   probably hard to truly abstract it out into a combined tool given
   that scanning is so toolchain-dependent
- "two phase" module generation: fissioning codegen into `.cc -> .cmi`
  and `.cmi -> .o` steps
  pro: better parallelism unlocked (importers can run before codegen of
   what they import)
  con: harder to implement on the toolchain side? without build system
   knowledge of "makes a CMI", requires dynamic rule generation
   during the build

--Ben


Re: RFC: A redesign of `-Mmodules` output

2025-03-01 Thread vspefs via Gcc
On Sunday, March 2nd, 2025 at 02:13, Ben Boeckel via Gcc wrote:

> On Sat, Mar 01, 2025 at 16:15:12 +, vspefs wrote:
> 
> > I read a few mails from the autoconf thread. I'll try to read all now. However,
> > a maybe-off-topic-but-could-be-on-topic question: what exactly is the state of
> > Autotools now? The whole Autotools build system seems to be on a very slow
> > release cycle. They seem to lack enough contributors/maintainers as well.
> 
> 
> It was unmaintained for years, but Zack Weinberg took up maintenance in
> 2020. See these articles:
> 
> https://www.owlfolio.org/development/autoconf-swot/
> https://lwn.net/Articles/834682/
> 
> AFAICT, it's in good hands.

Good to hear!

> > So if Autotools is having release struggles, I would personally prioritize
> > solutions that require less effort on the build system side.
> 
> 
> I think the 9-year gap is "done", but yes, there needs to be substantial
> work for automake to support C++ modules properly. I have no idea who
> might be interested in funding that (Red Hat for GCC dogfooding of C++
> modules perhaps?) never mind doing the necessary work.

Supporting C++ modules is easy, but I don't think "properly" is possible for any
build system under current circumstances. Industry consensus is needed on many
subjects:

For example, the "permission to import" question for modules is also a general
distribution issue. We are still not clear about how we should distribute
modules, with questions like:

1. Is there a system-wide, universal way of finding installed modules?

2. How much information should we expose when distributing a module? In what
   way?

GCC and Clang both install a .module.json file under the /lib path, which
kind of attempts to answer the first question, and answers the second question
with "look, here we try to have a structured description of distributed modules".

They're obviously very early, immature, and underdeveloped attempts that
aren't designed to solve this problem anyway. But a well-designed "structured
description of distributed modules" can solve a lot of problems. However, the
question could grow bigger when you throw in more ideas like versioning.

We don't necessarily need to have the answers standardized, but some form of
consensus needs to exist.

Another example is the "multiple CMIs" question. From one aspect, if C++
standard could officially define/validate/standardize compile options, it
would be much easier to have a making-everyone-happy answer to this.
(although it's simply offloading the annoying-everyone part to the committee)

I know of some SG15 attempts (like P3335), but with standardization you never know.

I made this RFC to see if we can have a usable `-Mmodules` output now, and if
we can design it so well that it could be trivially useful when build systems
implement module-supported Makefile generation. Also to see where we should draw
the line between "build system scope" and "compiler scope".

> > Also, I forgot to "reply all" on the last mail. Here's the mail that answers
> > some questions from NightStrike:
> > 
> > > GCC conjures up both .o file and .gcm file in one invocation when possible,
> > > too. And yes, that can be managed well with grouped target - but a rule with
> > > grouped target must have a recipe, which I think is a little beyond `gcc -M`'s
> > > scope.
> 
> 
> It's not a significant problem on the build graph side. I only
> implemented single-command CMI-and-object rules so far because:
> 
> - it's what Fortran does today and is already supported
> - it is supported by the three major C++ compilers

I'll have to learn the Fortran practices soon.

> Clang does support separate `.cc -> .pcm` and `.pcm -> .o` commands,
> but, IIRC, this is not actually equivalent to `.cc -> {.pcm, .o}` in
> that the codegen may differ in subtle ways that may matter. I'm interested
> in correctness before we start really squeezing performance out of the
> setup, so I'd like to avoid adding more bumpy roads along the way before
> we have a smoothed out path to compare it against if/when bugs crop up.

I totally agree. My opinion is that it's easier to change compiler
implementations/behaviours than established building practices, so we must try
to make sure we get the latter right.