https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65589
Bug ID: 65589
Summary: OpenMP 3.1 produces random results for simple array copy
Product: gcc
Version: unknown
Status: UNCONFIRMED
Severity: blocker
Priority: P3
Component: libgomp
Assignee: unassigned at gcc dot gnu.org
Reporter: felix.ospald at gmx dot de
CC: jakub at gcc dot gnu.org

/*

gcc --version
gcc (SUSE Linux) 4.8.1 20130909 [gcc-4_8-branch revision 202388]

echo | cpp -fopenmp -dM | grep -i open
#define _OPENMP 201107

cat /proc/version
Linux version 3.11.10-17-desktop (geeko@buildhost) (gcc version 4.8.1 20130909 [gcc-4_8-branch revision 202388] (SUSE Linux) ) #1 SMP PREEMPT Mon Jun 16 15:28:13 UTC 2014 (fba7c1f)

cat /proc/meminfo
MemTotal: 529410640 kB

lsb_release -a
LSB Version: core-2.0-noarch:core-3.2-noarch:core-4.0-noarch:core-2.0-x86_64:core-3.2-x86_64:core-4.0-x86_64:desktop-4.0-amd64:desktop-4.0-noarch:graphics-2.0-amd64:graphics-2.0-noarch:graphics-3.2-amd64:graphics-3.2-noarch:graphics-4.0-amd64:graphics-4.0-noarch
Distributor ID: openSUSE project
Description: openSUSE 13.1 (Bottle) (x86_64)
Release: 13.1
Codename: Bottle

lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 64
On-line CPU(s) list: 0-63
Thread(s) per core: 2
Core(s) per socket: 8
Socket(s): 4
NUMA node(s): 4
Vendor ID: GenuineIntel
CPU family: 6
Model: 45
Model name: Intel(R) Xeon(R) CPU E5-4640 0 @ 2.40GHz
Stepping: 7
CPU MHz: 2712.000
BogoMIPS: 4815.64
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 20480K
NUMA node0 CPU(s): 0-7,32-39
NUMA node1 CPU(s): 8-15,40-47
NUMA node2 CPU(s): 16-23,48-55
NUMA node3 CPU(s): 24-31,56-63

Running this program results in an output like

loop 0
loop 1
loop 2
loop 3
loop 4
loop 5
loop 6
loop 7
loop 8
fault index=716440 value=0.5

So it seems that the value at index 716440 is never set to 1.
Sometimes several attempts (Ctrl+C and run again) and sometimes several hundred loops are required until the error occurs.
This error rarely occurs when run with fewer than 32 threads (never on 1 core).
The "rand()" function is called because it seems to make the error occur more often (but it also occurs without this line).
In general, code which produces a different runtime on each thread seems to make the error occur more often.
The "rand()" function seems to have internal thread locking, which causes different delays for each thread.
The error was reproduced on two different machines (so bad memory is unlikely; however, both run the same OS and gcc versions).

The following things do not have any influence:
- gcc optimization switch -O0
- gcc -march switch (native or x86-64)
- #pragma omp flush (at various places)
- schedule static/dynamic
- catching exceptions inside the loop

I have no clue what is going on. Any help is much appreciated.

*/

#include <iostream>
#include <cmath>
#include <stdlib.h>
#include <omp.h>

int main(int argc, char* argv[])
{
    // Use all processors unless a thread count is given on the command line.
    int num_threads = omp_get_num_procs();
    if (argc > 1) {
        num_threads = atoi(argv[1]);
    }

    omp_set_dynamic(0);
    omp_set_nested(0);
    omp_set_num_threads(num_threads);

    std::cout << "num_threads=" << omp_get_max_threads() << std::endl;

    const int n = 512*512*4;
    double* phi0 = new double[n];
    double* sigma0 = new double[n];

    for (int iter = 0;; iter++)
    {
        std::cout << "loop " << iter << std::endl;

        // Reset both arrays (serial).
        for (int k = 0; k < n; k++) {
            phi0[k] = 1;
            sigma0[k] = 0.5;
        }

        // Parallel copy; rand() is called only to vary the per-thread timing.
        #pragma omp parallel for schedule(static)
        for (int i = 0; i < n; i++) {
            //#pragma omp critical
            rand();
            sigma0[i] = phi0[i];
        }

        // Every element of sigma0 should now be 1 (serial check).
        for (int j = 0; j < n; j++) {
            if (sigma0[j] != 1) {
                std::cout << "fault index=" << j << " value=" << sigma0[j] << std::endl;
                return 1;
            }
        }
    }

    return 0;
}

The CMakeLists.txt:

cmake_minimum_required(VERSION 2.8)
SET(CMAKE_BUILD_TYPE Release)
SET(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wall -march=native -O2 -fopenmp")
#SET(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wall -march=x86-64 -mtune=generic -O0 -fopenmp -fstack-check -fbounds-check")
SET(SOURCES main.cpp)
ADD_EXECUTABLE(main ${SOURCES})