http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50098

             Bug #: 50098
           Summary: The OpenMP ordered construct blocks parallelism, when
                    appearing at the beginning of a loop body
    Classification: Unclassified
           Product: gcc
           Version: 4.4.3
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: libgomp
        AssignedTo: unassig...@gcc.gnu.org
        ReportedBy: terec...@gmail.com


Created attachment 25021
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=25021
program illustrating the performance issue of the ordered construct

According to the OpenMP spec, the ordered construct makes the ordered region
execute in sequential loop-iteration order. The GNU OpenMP implementation
handles this well when the ordered construct sits at the end of the loop body.
However, when it sits at the beginning, parallelism is blocked.
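
For readers without the attachment, the two placements can be sketched roughly
as follows. This is not the attached program: heavy_work(), the function names,
the iteration workload and the schedule(static, 1) clause are all assumptions
made for illustration only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the expensive part of each iteration. */
static double heavy_work(int i)
{
    double x = 0.0;
    int k;
    for (k = 0; k < 100000; k++)
        x += (double)((i + k) % 7);
    return x;
}

/* Ordered region at the BEGINNING of the loop body (the slow case reported). */
double run_ordered_first(int n, int *order)
{
    double sum = 0.0;
    int pos = 0;   /* shared, but only touched inside the ordered region */
    int i;
    assert(order != NULL);
    /* schedule(static, 1) is an assumption, not taken from the attachment. */
#pragma omp parallel for ordered schedule(static, 1) reduction(+:sum)
    for (i = 0; i < n; i++) {
#pragma omp ordered
        { order[pos++] = i; }    /* runs in iteration order */
        sum += heavy_work(i);    /* independent work, could overlap */
    }
    return sum;
}

/* Ordered region at the END of the loop body (the fast case reported). */
double run_ordered_last(int n, int *order)
{
    double sum = 0.0;
    int pos = 0;
    int i;
    assert(order != NULL);
#pragma omp parallel for ordered schedule(static, 1) reduction(+:sum)
    for (i = 0; i < n; i++) {
        sum += heavy_work(i);    /* independent work first */
#pragma omp ordered
        { order[pos++] = i; }    /* runs in iteration order */
    }
    return sum;
}
```

Both functions should produce the same sum and record the iterations in
ascending order; only the position of the ordered region differs. Built with
gcc -fopenmp the pragmas take effect; without -fopenmp they are ignored and
both loops run sequentially.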

The timings below were obtained with the attached C program on a 4-core Intel
CPU for three builds: sequential, ordered at the beginning of the loop body,
and ordered at the end of the loop body (-DFAST_ORDERED). The version with the
ordered construct at the beginning is about 14% slower than the sequential
code (6.155s vs. 5.411s real time), whereas the version with the ordered
construct at the end is about 1.76 times faster (3.082s real time).

andrei@jos:~/src$ gcc ordered_perf_bug.c -o /tmp/a.out
andrei@jos:~/src$ time /tmp/a.out

real    0m5.411s
user    0m5.400s
sys    0m0.000s
andrei@jos:~/src$ gcc -fopenmp ordered_perf_bug.c -o /tmp/a.out
andrei@jos:~/src$ time /tmp/a.out

real    0m6.155s
user    0m24.530s
sys    0m0.010s
andrei@jos:~/src$ gcc -DFAST_ORDERED -fopenmp ordered_perf_bug.c -o /tmp/a.out
andrei@jos:~/src$ time /tmp/a.out

real    0m3.082s
user    0m12.290s
sys    0m0.000s

andrei@jos:~/src$ gcc --version
gcc (Ubuntu 4.4.3-4ubuntu5) 4.4.3
Copyright (C) 2009 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

andrei@jos:~/src$ uname -a
Linux jos 2.6.32-27-generic #49-Ubuntu SMP Thu Dec 2 00:51:09 UTC 2010 x86_64
GNU/Linux
andrei@jos:~/src$ lshw 
jos                       
    description: Computer
    width: 64 bits
    capabilities: vsyscall64 vsyscall32
  *-core
       description: Motherboard
       physical id: 0
     *-memory
          description: System memory
          physical id: 0
          size: 3922MiB
     *-cpu
          product: Intel(R) Core(TM) i5 CPU         760  @ 2.80GHz
          vendor: Intel Corp.
          physical id: 1
          bus info: cpu@0
          size: 1197MHz
          capacity: 1197MHz
          width: 64 bits
          capabilities: fpu fpu_exception wp vme de pse tsc msr pae mce cx8
apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht
tm pbe syscall nx rdtscp x86-64 constant_tsc arch_perfmon pebs bts rep_good
xtopology nonstop_tsc aperfmperf pni dtes64 monitor ds_cpl vmx smx est tm2
ssse3 cx16 xtpr pdcm sse4_1 sse4_2 popcnt lahf_lm ida tpr_shadow vnmi
flexpriority ept vpid cpufreq
...

P.S. Looking at the GOMP_ordered_end(void) implementation, I suspect that it
needs some synchronization code to fix the reported performance issue.
