[Bug fortran/58175] [OOP] Incorrect warning message on scalar finalizer

2016-07-07 Thread jhogg41 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58175

Jonathan Hogg  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jhogg41 at gmail dot com

--- Comment #9 from Jonathan Hogg  ---
Still present in 6.1.1, please fix.
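
For reference, a scalar finalizer of the kind the bug title refers to looks
like the following sketch (hypothetical names; the PR's actual reproducer is
not quoted in this thread). The complaint is that gfortran's warning text
about finalizer declarations is incorrect for the scalar case.

module final_demo
  implicit none
  type :: t
    integer :: val = 0
  contains
    final :: t_final          ! the only FINAL procedure, and it is scalar
  end type t
contains
  subroutine t_final(self)
    type(t), intent(inout) :: self
    self%val = 0              ! release/reset resources held by the object
  end subroutine t_final
end module final_demo

program use_final_demo
  use final_demo
  implicit none
  block
    type(t) :: x              ! finalized automatically when the block ends
    x%val = 1
  end block
end program use_final_demo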

[Bug fortran/58175] [OOP] Incorrect warning message on scalar finalizer

2016-07-07 Thread jhogg41 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58175

--- Comment #11 from Jonathan Hogg  ---
It looks like there's already a patch there. If you point me at a list of
what needs doing to get it into the code base, I'm happy to take a look.

Thanks,

Jonathan.

On Thu, Jul 7, 2016 at 4:14 PM, dominiq at lps dot ens.fr <
gcc-bugzi...@gcc.gnu.org> wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58175
>
> --- Comment #10 from Dominique d'Humieres  ---
> > Still present in 6.1.1, please fix.
>
> You're welcome to do it!
>

[Bug libgomp/71781] Severe performance degradation of short parallel for loop on hardware with lots of cores

2016-07-30 Thread jhogg41 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71781

Jonathan Hogg  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jhogg41 at gmail dot com

--- Comment #1 from Jonathan Hogg  ---
We also see similar behaviour when running task-based (as opposed to parallel
for) code. When the number of tasks is much smaller than the number of cores,
most of the time is spent spinning in libgomp, presumably because of too much
contention on the work-queue lock. We're running on 28 real cores (2x14-core
Intel Haswell-EP chips).
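
The pattern looks something like this sketch (hypothetical names, not our
actual SSIDS code): a handful of short tasks on a team of up to 28 threads,
so most threads have no work to pick up.

program few_tasks
  implicit none
  integer :: i
  real :: results(8)

  !$omp parallel default(shared)
  !$omp single
  ! Only 8 tasks are created, far fewer than the 28 threads available.
  do i = 1, 8
     !$omp task firstprivate(i) shared(results)
     results(i) = small_work(i)   ! each task is a short-lived unit of work
     !$omp end task
  end do
  ! Threads with no tasks wait at the barrier that ends the region.
  !$omp end single
  !$omp end parallel

  print *, sum(results)

contains

  pure function small_work(i) result(r)
    integer, intent(in) :: i
    real :: r
    r = real(i) * 0.5             ! stand-in for a small amount of work
  end function small_work

end program few_tasks

With so few tasks, most of the team is idle, which would be consistent with
gomp_mutex_lock_slow and gomp_team_barrier_wait_end dominating the profile
below.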

If we look at our task profile, we see that very little of the time is spent
inside our task code; this is confirmed by profile data from perf:
  27.60%  spral_ssids  libgomp.so.1.0.0  [.] gomp_mutex_lock_slow
   6.96%  spral_ssids  libgomp.so.1.0.0  [.] gomp_team_barrier_wait_end
   3.78%  spral_ssids  [kernel.kallsyms] [k] _spin_lock_irq
   2.91%  spral_ssids  [kernel.kallsyms] [k] smp_invalidate_interrupt
   2.21%  spral_ssids  spral_ssids   [.] __CreateCoarseGraphNoMask
   2.18%  spral_ssids  [kernel.kallsyms] [k] _spin_lock
   2.05%  spral_ssids  libmkl_avx2.so[.] mkl_blas_avx2_dgemm_kernel_0
   1.99%  spral_ssids  spral_ssids   [.] __FM_2WayNodeRefine_OneSided
   1.78%  spral_ssids  libgomp.so.1.0.0  [.] gomp_sem_wait_slow
   1.64%  spral_ssids  libc-2.12.so  [.] __GI_strtod_l_internal

Here's an example of what we're seeing:

Small problems (much less work than there are cores), times in seconds:
 4 cores | 28 cores
    0.02 |     0.17
    0.20 |     0.60
    0.20 |     0.58
    0.14 |     0.63
    0.75 |     2.37

Bigger problems (sufficient work exists), times in seconds:
 4 cores | 28 cores
   48.52 |    22.16
  153.49 |    61.77
  140.89 |    54.51
  189.75 |    71.43
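
To put numbers on it: the small cases run roughly 3x-9x slower on 28 cores
than on 4 (e.g. 0.17/0.02 = 8.5x for the first case), while the bigger cases
speed up by roughly 2.2x-2.7x (e.g. 48.52/22.16 ~= 2.2x). The overhead only
hurts when there is too little work to keep the cores busy.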