Re: [Patch, fortran] PRs 105152, 100193, 87946, 103389, 104429 and 82774

2023-04-24 Thread Bernhard Reutner-Fischer via Fortran
On Sun, 23 Apr 2023 23:48:03 +0200
Harald Anlauf via Fortran  wrote:

> On 22.04.23 at 10:32, Paul Richard Thomas via Gcc-patches wrote:

> > PR103931 - I couldn't reproduce the bug, which involves 'ambiguous c_ptr'.
> > To judge by the comments, it seems that this bug is a bit elusive.

See Harald's comment 12: you need to remove the 'use CModule':
module DModule
   use AModule
   !comment 12, 'use CModule' should not be needed: use CModule
   !use CModule

   implicit none
   private

   public :: DType

   type, abstract :: DType
   end type
end module

> PR103931: it is indeed a bit elusive, but very sensitive to code
> changes.  Also Bernhard had a look at it.  Given that there are
> a couple of bugs related to module reading, and rename-on-use,
> I'd recommend to leave that open for further analysis.

I would mark the dt sym that is used *as* the generic interface with
attr.generic.

Like: https://gcc.gnu.org/PR103931#c18

This seems to work and sounds somewhat plausible (to me).
If that is not correct, then I'm running out of ideas and will stop
looking at that PR.

cheers,


Re: (GCC) 13.0.1: internal compiler error

2023-04-24 Thread Harald Anlauf via Fortran

Hi Patrick,

I did not see any similar report in bugzilla, so could you please
open a PR and attach a self-contained reproducer?  Ideally the
reproducer would be reduced to simplify the analysis for those
familiar with the status of the OpenACC implementation.

Thanks,
Harald

On 21.04.23 at 17:13, Patrick Begou wrote:

Hi,

I built the latest gfortran from a git clone this morning:

GNU Fortran (GCC) 13.0.1 20230421 (prerelease)

I'm trying this compiler on a large and complex Fortran 90 code with
offloading using OpenACC.

At this time:

- the code compiles with nvfortran and runs on an A100 GPU.

- the code compiles with Cray Fortran (with some difficulties) but does not
run on an MI250 GPU (we are investigating the problem, a segfault if OpenACC
is enabled)

- the code compiles with GNU GCC 13 without the -fopenacc option and runs on
the CPU (Epyc2 7302)

- a basic test code using OpenACC compiles and runs on the GPU.

- compiling my large code with gcc 13.0.1 using -fopenacc for the A100 GPU
produces an internal error in the compiler:


transforms_defs_m.f90:354:53:

   354 | !$acc enter data attach(atransform2%next)
   | ^
internal compiler error: in omp_group_base, at gimplify.cc:9412
0xa830c6 omp_group_base
     ../../gcc/gcc/gimplify.cc:9412
0xa830c6 omp_index_mapping_groups_1
     ../../gcc/gcc/gimplify.cc:9441
0xa833c7 omp_index_mapping_groups
     ../../gcc/gcc/gimplify.cc:9502
0xa96a9a gimplify_scan_omp_clauses
     ../../gcc/gcc/gimplify.cc:10802
0xa8660d gimplify_omp_target_update
     ../../gcc/gcc/gimplify.cc:15563
0xa8660d gimplify_expr(tree_node**, gimple**, gimple**, bool
(*)(tree_node*), int)
     ../../gcc/gcc/gimplify.cc:16928
0xa89826 gimplify_stmt(tree_node**, gimple**)
     ../../gcc/gcc/gimplify.cc:7219
0xa875a3 gimplify_statement_list
     ../../gcc/gcc/gimplify.cc:2019
0xa875a3 gimplify_expr(tree_node**, gimple**, gimple**, bool
(*)(tree_node*), int)
     ../../gcc/gcc/gimplify.cc:16821
0xa89826 gimplify_stmt(tree_node**, gimple**)
     ../../gcc/gcc/gimplify.cc:7219
0xa86e8a gimplify_and_add(tree_node*, gimple**)
     ../../gcc/gcc/gimplify.cc:492
0xa86e8a gimplify_loop_expr
     ../../gcc/gcc/gimplify.cc:1993
0xa86e8a gimplify_expr(tree_node**, gimple**, gimple**, bool
(*)(tree_node*), int)
     ../../gcc/gcc/gimplify.cc:16581
0xa89826 gimplify_stmt(tree_node**, gimple**)
     ../../gcc/gcc/gimplify.cc:7219
0xa875a3 gimplify_statement_list
     ../../gcc/gcc/gimplify.cc:2019
0xa875a3 gimplify_expr(tree_node**, gimple**, gimple**, bool
(*)(tree_node*), int)
     ../../gcc/gcc/gimplify.cc:16821
0xa89826 gimplify_stmt(tree_node**, gimple**)
     ../../gcc/gcc/gimplify.cc:7219
0xa89d2b gimplify_bind_expr
     ../../gcc/gcc/gimplify.cc:1430
0xa86d8e gimplify_expr(tree_node**, gimple**, gimple**, bool
(*)(tree_node*), int)
     ../../gcc/gcc/gimplify.cc:16577
0xa89826 gimplify_stmt(tree_node**, gimple**)
     ../../gcc/gcc/gimplify.cc:7219
Please submit a full bug report, with preprocessed source (by using
-freport-bug).


Options used (I've just added  -fopenacc for moving from cpu version to
OpenACC):

-fopenacc -freport-bug -g -fpic -x f95-cpp-input -std=gnu -ffree-form
-fall-intrinsics -fallow-argument-mismatch -Wall -Wextra -W
-Wno-unused-function -Wno-compare-reals -fno-omit-frame-pointer  -O3
-ftree-vectorize -ffast-math -funroll-loops -pipe

No additional files are produced with -freport-bug.

Attached is the script used to build the compiler.

Let me know how I can help with information to improve the GNU Fortran
compiler.

Patrick




Re: (GCC) 13.0.1: internal compiler error

2023-04-24 Thread Patrick Begou via Fortran

Hi Harald

as I said, it is a large, massively parallel Fortran code: more than 700
files, some with several thousand lines. It could be difficult to
create a small reproducer, but I will try if the problem is not already
known, since another git branch of this code also triggers an internal
error in another file.


Best regards

Patrick

On 24/04/2023 at 19:27, Harald Anlauf wrote:

Hi Patrick,

I did not see any similar report in bugzilla, so could you please
open a PR and attach a self-contained reproducer?  Ideally the
reproducer would be reduced to simplify the analysis for those
familiar with the status of the OpenACC implementation.

Thanks,
Harald



Re: (GCC) 13.0.1: internal compiler error

2023-04-24 Thread Bernhard Reutner-Fischer via Fortran
CCing Thomas, who knows OpenACC better.

There is a devel/omp/gcc-12 branch you might want to try. I don't know
how different that branch is with respect to OpenACC.

HTH,

On Mon, 24 Apr 2023 19:39:15 +0200
Patrick Begou via Fortran wrote:

> Hi Harald
> 
> as I said, it is a large, massively parallel Fortran code: more than 700
> files, some with several thousand lines. It could be difficult to
> create a small reproducer, but I will try if the problem is not already
> known, since another git branch of this code also triggers an internal
> error in another file.
> 
> Best regards
> 
> Patrick
> 
> On 24/04/2023 at 19:27, Harald Anlauf wrote:
> > Hi Patrick,
> >
> > I did not see any similar report in bugzilla, so could you please
> > open a PR and attach a self-contained reproducer?  Ideally the
> > reproducer would be reduced to simplify the analysis for those
> > familiar with the status of the OpenACC implementation.
> >
> > Thanks,
> > Harald



Re: [PATCH] Fortran: function results never have the ALLOCATABLE attribute [PR109500]

2023-04-24 Thread Steve Kargl via Fortran
On Sat, Apr 22, 2023 at 08:43:53PM +0200, Mikael Morin wrote:
> > 
> OK, let's go with your patch as originally submitted then.
> 

Mikael, thanks for looking at the original patch and
suggesting an alternative location to attempt to fix
the bug.  Harald, thanks for following up on Mikael's
suggestion.

-- 
Steve


Re: [PATCH v3] libgfortran: Replace mutex with rwlock

2023-04-24 Thread Bernhard Reutner-Fischer via Fortran
Hi!

[please do not top-post]

On Thu, 20 Apr 2023 21:13:08 +0800
"Zhu, Lipeng"  wrote:

> Hi Bernhard,
> 
> Thanks for your questions and suggestions.
> The rwlock allows multiple threads to have concurrent read-only
> access to the cache/unit list; only a single writer is allowed.

right.

> The write lock will not be acquired until all read locks are released.

So I must have confused rwlock with something else, something that
allows a thread to hold a read lock and upgrade it to a write lock,
purposely starving all subsequent incoming readers. I.e. just toggle
your RD_TO_WRLOCK impl, here, atomically. This proved to be beneficial in
some situations in the past. It doesn't seem to work with your rwlock,
does it?

> And I didn't change the mutex scope when refactoring the code; I only made
> a more fine-grained distinction between reading and writing the
> cache/unit list.

Yes of course, I can see you did that.

> I completed the comment according to your template. I will insert the
> comment in the source code in the next version of the patch, together
> with other refinements based on your suggestions.
> "
> Now we did not get a unit in cache and unit list, so we need to create a
> new unit, and update it to cache and unit list.

s/Now/By now/ or s/Now w/W/ and s/get/find/
"
We did not find a unit in the cache nor in the unit list, create a new
(locked) unit and insert into the unit list and cache.
Manipulating either or both the unit list and the unit cache requires
holding a write lock [for obvious reasons]"

Superfluous when talking about pthread_rwlock_wrlock since that
implies that even the process acquiring the wrlock has to first
release its very own rdlock.

> Prior to updating the cache and list, we need to release all read locks
> and then immediately acquire the write lock, thus ensuring an exclusive
> update of the cache and unit list.
> Either way, we will manipulate the cache and/or the unit list, so we must
> take a write lock now.
> We don't take the write bit in *addition* to the read lock because:
> 1. It would needlessly complicate releasing the respective lock;

Under pthread_rwlock_wrlock it will deadlock, so that's wrong?
Drop that point then? If not, what's your reasoning / observation?

Under my lock, you hold the R, additionally take the W and then
immediately release the R because you yourself won't read, just write.
But mine's not the pthread_rwlock you talk about, admittedly.

> 2. By separating the read and write locks, it greatly reduces the
> contention in the read part, while the write part is not always necessary
> and is most unlikely once the unit is found in the cache;

We know that.

> 3. We try to balance the implementation complexity against the performance
> gains for the cases we have observed so far.

.. by just using a pthread_rwlock. And that's the top point iff you
keep it at that. That's a fair step, sure. BTW, did you look at the
RELEASE semantics, respectively the note that one day (and now is that
very day), we might improve on the release semantics? Can of course be
incremental AFAIC

> "

If folks agree on this first step then you have my OK with a catchy
malloc and the discussion recorded here on the list. A second step would
be RELEASE.
And, depending on the underlying capabilities of available locks,
further tweaks, obviously.

PS: and, please, don't top-post

thanks,

> 
> Best Regards,
> Zhu, Lipeng
> 
> On 1/1/1970 8:00 AM, Bernhard Reutner-Fischer wrote:
> > On 19 April 2023 09:06:28 CEST, Lipeng Zhu via Fortran wrote:
> >> This patch introduces an rwlock and splits the read and write accesses
> >> to the unit_root tree and unit_cache, using the rwlock instead of the
> >> mutex to increase CPU efficiency. In the get_gfc_unit function, around
> >> 30% of the calls step into the insert_unit function; in most instances,
> >> we can get the unit while reading the unit_cache or unit_root tree. So
> >> splitting the read and write phases with an rwlock is one approach to
> >> make it more parallel.
> >>
> >> BTW, the IPC metric improves by around 9x on our test server with 220
> >> cores. The benchmark we used is https://github.com/rwesson/NEAT
> >>  
> >   
> >> +#define RD_TO_WRLOCK(rwlock) \
> >> +  RWUNLOCK (rwlock);\
> >> +  WRLOCK (rwlock);
> >> +#endif
> >> +  
> > 
> >   
> >> diff --git a/libgfortran/io/unit.c b/libgfortran/io/unit.c
> >> index 82664dc5f98..4312c5f36de 100644
> >> --- a/libgfortran/io/unit.c
> >> +++ b/libgfortran/io/unit.c  
> >   
> >> @@ -329,7 +335,7 @@ get_gfc_unit (int n, int do_create)
> >>int c, created = 0;
> >>
> >>NOTE ("Unit n=%d, do_create = %d", n, do_create);
> >> -  LOCK (&unit_lock);
> >> +  RDLOCK (&unit_rwlock);
> >>
> >> retry:
> >>for (c = 0; c < CACHE_SIZE; c++)
> >> @@ -350,6 +356,7 @@ retry:
> >>if (c == 0)
> >>break;
> >>  }
> >> +  RD_TO_WRLOCK (&unit_rwlock);  
> > 
> > So I'm trying to convince myself why it's safe to unlock and only then take 
> > the write lock.
> > 
> > Can you please elaborate/confirm why that's ok?
> > 
> > I wouldn't mind a c