Data race within write intrinsic with output into a character variable
Hi,

I am seeing rare but reproducible memory corruption which I can trace back to lines like

    write(out,'(a,i8)') 'short string', k

where out is a (sufficiently large) character(len=...) variable and k is some small integer. The line itself occurs in a subroutine called from within an OpenMP region. I have seen this in two rather different circumstances.

If I change the line to

    out = 'short string' // toStr(k)

and write my own small toStr function, which translates an integer to its string representation, then the memory corruption (which usually occurs shortly afterwards, but in seemingly unrelated code) disappears. As out is usually not even used (it is a debugging routine which only uses the output in case something goes wrong), I am pretty sure that the problem is within the write code.

Unfortunately I cannot create a small reproducer. As I have already seen data races/memory corruption with write (see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88899 and https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88768), I am inclined to conclude that the write intrinsic is at fault here.

Any idea on how this can be further investigated? If write is indeed at fault, that would be pretty bad.

Best regards
Martin
Re: [Patch] Fortran: Fix libgfortran I/O race with newunit_free [PR99529]
I have seen the lock inversion with helgrind as well. Otherwise I have not seen anything more in my real code. I will give the thread sanitizer another try. The problem with both is the huge number of false positives in conjunction with typical OpenMP code (e.g. the master thread allocates a big vector or matrix outside the OpenMP region, and then it is accessed within an OpenMP loop by all threads, without locks of course -> possible data race warning).

BTW, I will do some more tests, but it looks like the patch fixes the memory corruption issue. This morning I tried with a wrong build setup: I used the patched gfortran but linked with an unpatched gcc, which probably used the unpatched libgfortran...

Thanks!
Martin

Sent: Thursday, 11 March 2021 at 11:38
From: "Tobias Burnus"
To: "gcc-patches", "fortran", "Jerry DeLisle", ms...@gmx.net
Subject: Re: [Patch] Fortran: Fix libgfortran I/O race with newunit_free [PR99529]

Revised version: the previous one had a lock inversion, which could lead to a deadlock, found by -fsanitize=thread. See transfer.c for the changes.

OK?

Tobias

On 11.03.21 10:42, Tobias Burnus wrote:
> Hi all,
>
> as found by Martin (thanks!) there is a race for newunit_free.
> While that call is within the unitlock for the calls in io/unit.c,
> the call in transfer.c did not use locks.
>
> Additionally,
>   unit = get_gfc_unit (dtp->common.unit, do_create);
>   set_internal_unit (dtp, unit, kind);
> gets first the unit (with proper locking when using the unit number
> dtp->common.unit) but then in set_internal_unit it re-sets the
> unit number to the same number without locking. That causes
> race warnings and if the assignment is not atomic it is a true race.
>
> OK for mainline? What about GCC 10?
>
> As Martin notes in the email thread and in the PR there are more
> race warnings (and likely true race issues).
>
> Tobias
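For readers following the thread, the race described in the quoted message (one call site updating shared unit bookkeeping without holding the lock that the other call sites take) can be sketched in C with pthreads. This is only a minimal illustration under stated assumptions, not the actual libgfortran patch: pool_lock, pool_acquire, pool_release and run_pool_demo are hypothetical names, and the fixed-size pool merely stands in for whatever shared state newunit_free updates. The point it tries to show is that the release routine takes the lock itself, so no caller can touch the shared state unlocked.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a shared pool of unit numbers.
   None of these names come from libgfortran. */
#define POOL_SIZE   64
#define MAX_THREADS 16

static pthread_mutex_t pool_lock = PTHREAD_MUTEX_INITIALIZER;
static bool in_use[POOL_SIZE];
static int n_in_use;

/* Take a free slot; returns its index, or -1 if the pool is exhausted. */
static int pool_acquire(void)
{
    int slot = -1;
    pthread_mutex_lock(&pool_lock);
    for (int i = 0; i < POOL_SIZE; i++) {
        if (!in_use[i]) {
            in_use[i] = true;
            n_in_use++;
            slot = i;
            break;
        }
    }
    pthread_mutex_unlock(&pool_lock);
    return slot;
}

/* Give a slot back. The lock is taken inside the function, so every
   call site is protected, including ones added later. */
static void pool_release(int slot)
{
    pthread_mutex_lock(&pool_lock);
    assert(slot >= 0 && slot < POOL_SIZE && in_use[slot]);
    in_use[slot] = false;
    n_in_use--;
    pthread_mutex_unlock(&pool_lock);
}

/* Each worker repeatedly acquires and releases one slot. */
static void *worker(void *arg)
{
    int iterations = *(const int *)arg;
    for (int i = 0; i < iterations; i++) {
        int slot = pool_acquire();
        if (slot >= 0)
            pool_release(slot);
    }
    return NULL;
}

/* Run up to MAX_THREADS workers and return the number of slots still
   marked as in use afterwards (0 if the bookkeeping stayed consistent). */
int run_pool_demo(int n_threads, int iterations)
{
    pthread_t threads[MAX_THREADS];
    int created = 0;

    if (n_threads > MAX_THREADS)
        n_threads = MAX_THREADS;
    for (int t = 0; t < n_threads; t++) {
        if (pthread_create(&threads[created], NULL, worker, &iterations) != 0)
            break;              /* stop on failure, join what was started */
        created++;
    }
    for (int t = 0; t < created; t++)
        pthread_join(threads[t], NULL);
    return n_in_use;
}
```

Calling run_pool_demo(4, 10000) from a main function (built with -pthread) should return 0. If the lock/unlock pair were removed from pool_release, tools such as helgrind or -fsanitize=thread would be expected to report the unsynchronized accesses to in_use and n_in_use.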
Re: Help with long compile time of all-USE module
Hi Matthew,

please have a look at https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98426 but ignore any patches I have attached there, because they are just wrong. It definitely sounds like you hit the same problem. As described there, if you have a lot of symbols within the mod files pulled in by the use statements, then an O(N^2) search increases compilation time considerably.

Best regards
Martin

Sent: Wednesday, 12 May 2021 at 15:52
From: "Thompson, Matt (GSFC-610.1)[SCIENCE SYSTEMS AND APPLICATIONS INC] via Fortran"
To: "fortran@gcc.gnu.org"
Subject: Help with long compile time of all-USE module

All,

I'm hoping to rely on the wisdom of the gfortran gurus for some help on this. I've been looking at trying to speed up the compilation of a library I help maintain because, well, always a worthy goal, especially with CI and Git.

I did some profiling and found that the most expensive file to compile in our library (doing a debug build with GCC 10.3) is a "meta-module" that only has 'use foo' statements:

https://github.com/GEOS-ESM/MAPL/blob/main/base/MAPL_Mod.F90

It allows us to "gather" many use-statements so users don't need to "use" 50 different modules. This really surprised me, as there are other files in this library that are infinitely more complex and I can think of ways we could maybe refactor or break them up, but this file is...boring. I guess I could split it up, but it almost doesn't seem worth the effort.

Now, it is the slowest compile when building with debugging flags, which are, for this model:

FLAGS = -O0 -g -fcheck=all,no-array-temps -finit-real=snan -ffree-line-length-none -fno-range-check -Wno-missing-include-dirs -fbacktrace -ffpe-trap=zero,overflow -fbacktrace -fallow-argument-mismatch -fallow-invalid-boz -falign-commons -Jinclude/MAPL.base -fPIC -ffixed-line-length-132 -pthread -fopenmp

My naïve thought when I first saw that this was the tall pole was "well, I'll degrade the optimization" but, well, we are already at -O0.
So perhaps one of these other flags is triggering some sort of weirdness with an "only USE" file? I might try a binary exclusion experiment to figure it out (remove half the flags, etc.), but maybe it's obvious to the experts.

Thanks,
Matt

--
Matt Thompson, SSAI, Ld Scientific Programmer/Analyst
NASA GSFC, Global Modeling and Assimilation Office
Code 610.1, 8800 Greenbelt Rd, Greenbelt, MD 20771
Phone: 301-614-6712 Fax: 301-614-6246
http://science.gsfc.nasa.gov/sed/bio/matthew.thompson