Data race within write intrinsic with output into a character variable

2021-03-10 Thread Martin Stein via Fortran
Hi,

I am seeing rare but reproducible memory corruption, which I can trace back to 
lines like

write(out,'(a,i8)') 'short string', k

where out is a (sufficiently large) character(len=...) variable and k is a 
small integer. The line itself occurs in a subroutine called from within an 
OpenMP region.
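The pattern described above can be sketched as a small stress test: an internal WRITE into a character dummy, in a routine called from an OpenMP loop, with the number read back by an internal READ. The module and routine names (`debug_msg_mod`, `format_msg`, `stress_internal_write`) and the buffer length are hypothetical; the report contains no code beyond the WRITE line. Compiled with `gfortran -fopenmp` (optionally adding `-fsanitize=thread`), it runs many internal I/O statements concurrently, but since the corruption is described as rare, there is no guarantee that it triggers the bug.

```fortran
module debug_msg_mod
  implicit none
  private
  public :: format_msg, stress_internal_write
contains
  ! Same shape as the reported line: an internal WRITE into a character
  ! dummy, from a routine that is called inside an OpenMP region.
  subroutine format_msg(k, out)
    integer, intent(in) :: k
    character(len=*), intent(out) :: out
    write(out, '(a,i8)') 'short string', k
  end subroutine format_msg

  ! Hypothetical stress driver: each iteration formats into a thread-private
  ! buffer, reads the number back and counts mismatches.
  ! Without -fopenmp the directives are ignored and the loop runs serially.
  function stress_internal_write(n) result(nbad)
    integer, intent(in) :: n
    integer :: nbad, k, kk, nerr
    character(len=64) :: out

    nerr = 0
    !$omp parallel do private(out, kk) reduction(+:nerr)
    do k = 1, n
      call format_msg(k, out)
      read(out(13:20), '(i8)') kk   ! second internal I/O statement per iteration
      if (out(1:12) /= 'short string' .or. kk /= k) nerr = nerr + 1
    end do
    !$omp end parallel do
    nbad = nerr
  end function stress_internal_write
end module debug_msg_mod
```

A nonzero return value, or a crash shortly after the loop, would point at the runtime library rather than at the calling code.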

I have seen this in two rather different circumstances. If I change the line to

out = 'short string' // toStr(k)

and write my own small toStr function, which converts an integer to its 
string representation, then the memory corruption (which usually occurs shortly 
afterwards, in seemingly unrelated code) disappears.
As out is usually not even used (it is set in a debugging routine that only uses 
the output when something goes wrong), I am pretty sure that the problem is 
within the write code.
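Martin's own toStr is not shown in the thread. A hypothetical version could look like the module below (the module name `int_str_mod` is an assumption, and only default-kind integers are handled). It builds the digits with integer arithmetic alone, so no internal WRITE, and therefore no libgfortran I/O unit, is involved.

```fortran
module int_str_mod
  implicit none
  private
  public :: toStr
contains
  ! Decimal representation of a default integer without internal WRITE.
  ! The value is kept non-positive while peeling off digits, so the most
  ! negative integer never has to be negated (which would overflow).
  pure function toStr(k) result(s)
    integer, intent(in) :: k
    character(len=:), allocatable :: s
    character(len=range(k) + 2) :: buf   ! room for all digits plus a sign
    integer :: n, pos

    n = k
    if (n > 0) n = -n
    pos = len(buf) + 1
    do
      pos = pos - 1
      ! For n <= 0, mod(n, 10) lies in -9..0, so this yields '0'..'9'
      buf(pos:pos) = achar(iachar('0') - mod(n, 10))
      n = n / 10
      if (n == 0) exit
    end do
    if (k < 0) then
      pos = pos - 1
      buf(pos:pos) = '-'
    end if
    s = buf(pos:)
  end function toStr
end module int_str_mod
```

Unlike the `(i8)` edit descriptor, this produces no leading blanks, so `'short string' // toStr(k)` is not character-for-character identical to the WRITE output.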

Unfortunately, I cannot create a small reproducer. As I have already seen data 
races/memory corruption with WRITE before (see 
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88899 and 
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88768), I am inclined to conclude 
that the WRITE statement (i.e. the libgfortran I/O runtime) is at fault here.

Any ideas on how this could be investigated further? If WRITE is indeed at 
fault, that would be pretty bad.

Best regards
Martin



Re: [Patch] Fortran: Fix libgfortran I/O race with newunit_free [PR99529]

2021-03-11 Thread Martin Stein via Fortran
I have seen the lock inversion with helgrind as well. Otherwise, I
have not seen anything more in my real code. I will give the
thread sanitizer another try. The problem with both is the huge
number of false positives with typical OpenMP code (e.g. the
master thread allocates a big vector or matrix outside the
OpenMP region, and all threads then access it inside an OpenMP
loop, without locks of course -> possible data race warning).

BTW, I will do some more tests, but it looks like the patch fixes
the memory corruption issue. This morning I used a wrong build
setup: I compiled with the patched gfortran but linked with the
unpatched gcc, which probably picked up the unpatched libgfortran...

Thanks!
Martin

Sent: Thursday, 11 March 2021, 11:38
From: "Tobias Burnus" 
To: "gcc-patches" , "fortran" , 
"Jerry DeLisle" , ms...@gmx.net
Subject: Re: [Patch] Fortran: Fix libgfortran I/O race with newunit_free 
[PR99529]

Revised version: the previous one had a lock inversion, found by
-fsanitize=thread, which could lead to a deadlock. See transfer.c for
the changes.

OK?

Tobias

On 11.03.21 10:42, Tobias Burnus wrote:
> Hi all,
>
> as found by Martin (thanks!), there is a race in newunit_free.
> While the calls in io/unit.c are made while holding the unit lock,
> the call in transfer.c did not take any lock.
>
> Additionally,
>   unit = get_gfc_unit (dtp->common.unit, do_create);
>   set_internal_unit (dtp, unit, kind);
> first obtains the unit (with proper locking, looking it up by the unit
> number dtp->common.unit), but set_internal_unit then sets the unit
> number again, to the same value, without locking. That causes race
> warnings, and if the assignment is not atomic, it is a true race.
>
> OK for mainline? What about GCC 10?
>
> As Martin notes in the email thread and in the PR, there are more
> race warnings (and likely true races).
>
> Tobias
>


Re: Help with long compile time of all-USE module

2021-05-12 Thread Martin Stein via Fortran
Hi Matthew,

Please have a look at https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98426 but 
ignore any patches I have attached there, because they are simply wrong. It 
definitely sounds like you hit the same problem. As described there, if the 
mod files pulled in by the USE statements contain a lot of symbols, an O(N^2) 
search increases compilation time considerably.
 
Best regards,
Martin

Sent: Wednesday, 12 May 2021, 15:52
From: "Thompson, Matt (GSFC-610.1)[SCIENCE SYSTEMS AND APPLICATIONS INC] via 
Fortran" 
To: "fortran@gcc.gnu.org" 
Subject: Help with long compile time of all-USE module

All,

I'm hoping to rely on the wisdom of the gfortran gurus for some help on this.

I've been trying to speed up the compilation of a library I help maintain 
because, well, it's always a worthy goal, especially with CI and Git. I did 
some profiling and found that the most expensive file to compile in our library 
(doing a debug build with GCC 10.3) is a "meta-module" that only has 'use foo' 
statements:

https://github.com/GEOS-ESM/MAPL/blob/main/base/MAPL_Mod.F90

It allows us to "gather" many use-statements so users don't need to "use" 50 
different modules.
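For readers unfamiliar with the pattern, a minimal sketch of such a meta-module, with two hypothetical component modules (`geom_mod`, `units_mod`) standing in for MAPL's real ones. Because module entities are PUBLIC by default, everything the meta-module uses is re-exported, so client code needs a single USE.

```fortran
! Hypothetical component modules standing in for the ~50 real ones.
module geom_mod
  implicit none
  real, parameter :: pi = 3.14159265
end module geom_mod

module units_mod
  implicit none
  real, parameter :: km_per_m = 1.0e-3
end module units_mod

! The meta-module: nothing but USE statements. With the default PUBLIC
! accessibility, everything it uses is visible to its own clients.
module all_mod
  use geom_mod
  use units_mod
  implicit none
end module all_mod
```

A client then writes `use all_mod` instead of one USE per component; `use all_mod, only: pi` still lets it narrow what it imports.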

This really surprised me, as there are other files in this library that are 
infinitely more complex, and I can think of ways we could maybe refactor or 
break them up, but this file is...boring. I guess I could split it up, but it 
almost doesn't seem worth the effort.

Now, it is the slowest compile when building with debugging flags, which for 
this model are:

FLAGS = -O0 -g -fcheck=all,no-array-temps -finit-real=snan 
-ffree-line-length-none -fno-range-check -Wno-missing-include-dirs -fbacktrace 
-ffpe-trap=zero,overflow -fbacktrace -fallow-argument-mismatch 
-fallow-invalid-boz -falign-commons -Jinclude/MAPL.base -fPIC 
-ffixed-line-length-132 -pthread -fopenmp

My naïve thought when I first saw that this was the tall pole was "well, I'll 
lower the optimization level", but, well, it's already at -O0. So perhaps one 
of these other flags is triggering some sort of weirdness with an "only USE" 
file?

I might try a binary exclusion experiment to figure it out (remove half the 
flags, etc.), but maybe it's obvious to the experts.

Thanks,
Matt

--
Matt Thompson, SSAI, Ld Scientific Programmer/Analyst
NASA GSFC, Global Modeling and Assimilation Office
Code 610.1, 8800 Greenbelt Rd, Greenbelt, MD 20771
Phone: 301-614-6712 Fax: 301-614-6246
http://science.gsfc.nasa.gov/sed/bio/matthew.thompson