[libgomp] MEMMODEL_* constants and OMP_STACKSIZE: a few questions/proposals
Ping.

Hi,

I've recently started working on libgomp with the goal of proposing a new way of handling queues of tasks, based on the work done by a PhD student. While working on libgomp's code I noticed two things that puzzled me:

- The code uses gcc's atomic builtins but doesn't use the __ATOMIC_* constants to select the desired memory model for the operation. Instead it uses the MEMMODEL_* constants defined in libgomp.h. Is there a good reason for that? I'd be very tempted to (possibly) "fix" it with the attached patch.

- When using the OMP_STACKSIZE environment variable to select the stack size of the threads, it only applies to the threads created by pthread_create and has no effect on the main thread of the process. This behaviour looks compliant with version 3.1 of the spec: "OMP_STACKSIZE sets the stacksize-var ICV that specifies the size of the stack for threads created by the OpenMP implementation", but I was wondering whether it is the best thing to do. I discovered this when playing with the UTS benchmark of the BOTS benchmark suite, which can require quite big stacks for some input datasets. It uses OMP_STACKSIZE to set its requirements, but that doesn't prevent a task with stack requirements bigger than the default size from being scheduled on the main thread of the process, leading the application to crash because of a stack overflow. Should the application itself be setting the size of its main thread's stack (a possible workaround is sketched after the attached patch)? Shouldn't something be done in the compiler/runtime to handle this? That wouldn't be non-compliant with the spec and would be much more intuitive to the programmer: "I can expect that every thread will have OMP_STACKSIZE worth of stack".

I should hopefully write again soon with some more useful patches and proposals. In the meantime, thank you for your answers.

Regards,
Kévin

From b1a6f296d6c4304b41f735859e226213881f861e Mon Sep 17 00:00:00 2001
From: Kevin PETIT
Date: Mon, 20 May 2013 13:28:40 +0100
Subject: [PATCH] Don't redefine an equivalent to the __ATOMIC_* constants

If we can use the __atomic_* builtins it means that we also have the
__ATOMIC_* constants available. This patch removes the definition of the
MEMMODEL_* constants in libgomp.h and changes the code to use __ATOMIC_*.
---
 libgomp/config/linux/affinity.c |  2 +-
 libgomp/config/linux/bar.c      | 10 +-
 libgomp/config/linux/bar.h      |  8
 libgomp/config/linux/lock.c     | 10 +-
 libgomp/config/linux/mutex.c    |  6 +++---
 libgomp/config/linux/mutex.h    |  4 ++--
 libgomp/config/linux/ptrlock.c  |  6 +++---
 libgomp/config/linux/ptrlock.h  |  6 +++---
 libgomp/config/linux/sem.c      |  6 +++---
 libgomp/config/linux/sem.h      |  4 ++--
 libgomp/config/linux/wait.h     |  2 +-
 libgomp/critical.c              |  2 +-
 libgomp/libgomp.h               | 11 ---
 libgomp/ordered.c               |  4 ++--
 libgomp/task.c                  |  4 ++--
 15 files changed, 37 insertions(+), 48 deletions(-)

diff --git a/libgomp/config/linux/affinity.c b/libgomp/config/linux/affinity.c
index dc6c7e5..fcbc323 100644
--- a/libgomp/config/linux/affinity.c
+++ b/libgomp/config/linux/affinity.c
@@ -108,7 +108,7 @@ gomp_init_thread_affinity (pthread_attr_t *attr)
   unsigned int cpu;
   cpu_set_t cpuset;
 
-  cpu = __atomic_fetch_add (&affinity_counter, 1, MEMMODEL_RELAXED);
+  cpu = __atomic_fetch_add (&affinity_counter, 1, __ATOMIC_RELAXED);
   cpu %= gomp_cpu_affinity_len;
   CPU_ZERO (&cpuset);
   CPU_SET (gomp_cpu_affinity[cpu], &cpuset);
diff --git a/libgomp/config/linux/bar.c b/libgomp/config/linux/bar.c
index 35baa88..8a81f7c 100644
--- a/libgomp/config/linux/bar.c
+++ b/libgomp/config/linux/bar.c
@@ -38,14 +38,14 @@ gomp_barrier_wait_end (gomp_barrier_t *bar, gomp_barrier_state_t state)
       /* Next time we'll be awaiting TOTAL threads again.  */
       bar->awaited = bar->total;
       __atomic_store_n (&bar->generation, bar->generation + 4,
-                        MEMMODEL_RELEASE);
+                        __ATOMIC_RELEASE);
       futex_wake ((int *) &bar->generation, INT_MAX);
     }
   else
     {
       do
         do_wait ((int *) &bar->generation, state);
-      while (__atomic_load_n (&bar->generation, MEMMODEL_ACQUIRE) == state);
+      while (__atomic_load_n (&bar->generation, __ATOMIC_ACQUIRE) == state);
     }
 }
 
@@ -95,7 +95,7 @@ gomp_team_barrier_wait_end (gomp_barrier_t *bar, gomp_barrier_state_t state)
     }
   else
     {
-      __atomic_store_n (&bar->generation, state + 3, MEMMODEL_RELEASE);
+      __atomic_store_n (&bar->generation, state + 3, __ATOMIC_RELEASE);
       futex_wake ((int *) &bar->generation, INT_MAX);
       return;
     }
@@ -105,11 +105,11 @@ gomp_team_barrier_wait_end (gomp_barrier_t *bar, gomp_barrier_state_t state)
   do
     {
       do_wait ((int *) &bar->generation, generation);
-      gen = __atomic_load_n (&bar->gene
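For the OMP_STACKSIZE point above, one workaround the application could apply itself (rather than waiting for compiler/runtime support) is to raise the main thread's stack limit as early as possible in main(); on Linux the initial thread's stack typically grows on demand up to RLIMIT_STACK. The sketch below is purely illustrative and not part of the attached patch: it assumes a Linux/glibc target, handles only the plain B/K/M/G forms of OMP_STACKSIZE, and the function name grow_main_thread_stack is made up.

/* Illustrative workaround (not part of the patch): make RLIMIT_STACK at
   least as large as OMP_STACKSIZE so that tasks executed by the initial
   thread also get a big enough stack.  */
#include <stdio.h>
#include <stdlib.h>
#include <sys/resource.h>

static void
grow_main_thread_stack (void)
{
  const char *env = getenv ("OMP_STACKSIZE");
  if (env == NULL)
    return;

  char *end;
  unsigned long long size = strtoull (env, &end, 10);
  switch (*end)
    {
    case '\0':                        /* No suffix: the unit is kilobytes.  */
    case 'K': case 'k': size *= 1024ULL; break;
    case 'B': case 'b': break;
    case 'M': case 'm': size *= 1024ULL * 1024; break;
    case 'G': case 'g': size *= 1024ULL * 1024 * 1024; break;
    default: return;                  /* Unrecognised value: do nothing.  */
    }

  struct rlimit rl;
  if (getrlimit (RLIMIT_STACK, &rl) != 0)
    return;
  if (rl.rlim_cur != RLIM_INFINITY && rl.rlim_cur < size)
    {
      /* Raise the soft limit, capped at the hard limit so that no
         privileges are required.  */
      rl.rlim_cur = (rl.rlim_max == RLIM_INFINITY || size < rl.rlim_max)
                    ? size : rl.rlim_max;
      if (setrlimit (RLIMIT_STACK, &rl) != 0)
        perror ("setrlimit");
    }
}

int
main (void)
{
  grow_main_thread_stack ();  /* Before any stack-hungry OpenMP region.  */
  /* ... application code ... */
  return 0;
}

Whether enlarging the soft limit after process start actually lets the already-mapped initial stack keep growing is OS-dependent, so a more robust alternative is simply to raise the limit from the shell (ulimit -s) before launching the program.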
Target options
Hi,

when defining target options with Mask() and "Target" going into target_flags, can I use Init(1) to define the default, or is "Init" only used to initialize "Var(name)" kinds of options? If so, what's the proper way to define defaults? It wasn't clear to me when checking the mips/i386 definitions, for instance.

Thanks,
Hendrik Greving
Re: Target options
Along the same lines, what's the difference between target_flags (which I know from old compilers) and target_flags_explicit (which I do not know)?

Thanks,
Regards,
Hendrik Greving

On Mon, Jul 15, 2013 at 10:30 AM, Hendrik Greving wrote:
> Hi,
>
> when defining target options with Mask() and "Target" going into
> target_flags, can I use Init(1) to define the default, or is "Init"
> only used to initialize "Var(name)" kinds of options? If so, what's the
> proper way to define defaults? It wasn't clear to me when checking the
> mips/i386 definitions, for instance.
>
> Thanks,
> Hendrik Greving
gettext prereq vs po/zh_TW
The gcc prerequisites page says gettext 0.14.5 is the minimum version, but po/zh_TW.po has lines like this:

#, fuzzy
#~| msgid "Unexpected EOF"
#~ msgid "Unexpected type..."
#~ msgstr "未預期的型態…"

The "#~|" syntax appears to have been added in gettext 0.16, and gettext 0.14 can't process it. Seems to have been a result of this request: http://gcc.gnu.org/ml/gcc-patches/2013-06/msg01436.html
Re: Target options
2013/7/16 Hendrik Greving :
> Along the same lines, what's the difference between target_flags (which I know
> from old compilers) and target_flags_explicit (which I do not know)?
>
> Thanks,
> Regards,
> Hendrik Greving
>
> On Mon, Jul 15, 2013 at 10:30 AM, Hendrik Greving wrote:
>> Hi,
>>
>> when defining target options with Mask() and "Target" going into
>> target_flags, can I use Init(1) to define the default, or is "Init"
>> only used to initialize "Var(name)" kinds of options? If so, what's the
>> proper way to define defaults? It wasn't clear to me when checking the
>> mips/i386 definitions, for instance.
>>
>> Thanks,
>> Hendrik Greving

Hi, Greving

To my understanding, when an option uses Mask(F) we shouldn't use Init(); one approach is to initialize it in the option_override target hook:

  target_flags |= MASK_F;

target_flags_explicit records whether the user has given the flag a value (enabled or disabled it explicitly) or not. For example, suppose the flag's initial value depends on the CPU type: when the CPU type is A, flag F should be enabled. However, the user may have disabled the flag explicitly, and we want the user's choice to take effect. Therefore, the condition would be:

  if (cpu_type == A && !(target_flags_explicit & MASK_F))
    target_flags |= MASK_F;

Cheers,
Shiva
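To make the shape of that concrete, here is a minimal, hypothetical sketch of such an override hook for an imaginary port (port_option_override, MASK_F, cpu_type and CPU_TYPE_A are illustrative names, not taken from any real backend):

/* Hypothetical example: enable -mf (MASK_F, generated from Mask(F) in the
   port's .opt file) by default for CPU type A, unless the user passed
   -mf or -mno-f explicitly on the command line.  */

enum port_cpu_type { CPU_TYPE_A, CPU_TYPE_B };
static enum port_cpu_type cpu_type;   /* Normally set from -mcpu=.  */

static void
port_option_override (void)
{
  /* target_flags_explicit has the MASK_F bit set only when the user gave
     -mf or -mno-f themselves, so this does not override their choice.  */
  if (cpu_type == CPU_TYPE_A
      && !(target_flags_explicit & MASK_F))
    target_flags |= MASK_F;
}

#undef  TARGET_OPTION_OVERRIDE
#define TARGET_OPTION_OVERRIDE port_option_override

The corresponding .opt entry would then carry no Init() at all, i.e. something like "mf / Target Report Mask(F)" followed by its help text.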
Re: Delay scheduling due to possible future multiple issue in VLIW
Paulo,

The GCC scheduler is not particularly designed for VLIW architectures, but it handles them reasonably well. For the example of your code both schedules take the same time to execute:

38: 0: r1 = e[r0]
40: 4: [r0] = r1
41: 5: r0 = r0+4
43: 5: p0 = r1!=0
44: 6: jump p0

and

38: 0: r1 = e[r0]
41: 1: r0 = r0+4
40: 4: [r0] = r1
43: 5: p0 = r1!=0
44: 6: jump p0

[It is true that the first schedule takes less space due to fortunate VLIW packing.]

You are correct that the GCC scheduler is greedy and that it tries to issue instructions as soon as possible (i.e., it is better to issue something on the cycle than nothing at all), which is a sensible strategy. For small basic blocks the greedy algorithm may cause artifacts like the one you describe. You could try increasing the size of the regions on which the scheduler operates by switching your port to use the sched-ebb scheduler, which was originally developed for ia64.

Regards,

--
Maxim Kuvyrkov
KugelWorks

On 27/06/2013, at 8:35 PM, Paulo Matos wrote:

> Let me add to my own post saying that it seems that the problem is that the
> list scheduler is greedy in the sense that it will take an instruction from
> the ready list no matter what, when waiting and trying to pair it later on
> with another instruction might be more beneficial. In a sense it seems that
> the idea is that 'issuing instructions as soon as possible is better', which
> might be true for a single-issue chip, but a VLIW with multiple issue has to
> contend with other problems.
>
> Any thoughts on this?
>
> Paulo Matos
>
>
>> -----Original Message-----
>> From: gcc-ow...@gcc.gnu.org [mailto:gcc-ow...@gcc.gnu.org] On Behalf Of Paulo
>> Matos
>> Sent: 26 June 2013 15:08
>> To: gcc@gcc.gnu.org
>> Subject: Delay scheduling due to possible future multiple issue in VLIW
>>
>> Hello,
>>
>> We have a port for a VLIW machine using gcc head 4.8 with a maximum issue of
>> 2 per clock cycle (sometimes only 1 due to machine constraints).
>> We are seeing the following situation in sched2:
>>
>> ;;   --- forward dependences: ------------
>>
>> ;;   --- Region Dependences --- b 3 bb 0
>> ;;      insn  code  bb  dep  prio  cost  reservation
>> ;;      ----  ----  --  ---  ----  ----  -----------
>> ;;       38  1395   3    0     6     4   (p0+long_imm+ldst0+lock0),nothing*3 : 44m 43 41 40
>> ;;       40   491   3    1     2     2   (p0+long_imm+ldst0+lock0),nothing : 44m 41
>> ;;       41   536   3    2     1     1   (p0+no_stl2)|(p1+no_dual) : 44
>> ;;       43  1340   3    1     2     1   (p0+no_stl2)|(p1+no_dual) : 44m
>> ;;       44  1440   3    4     1     1   (p0+long_imm) :
>>
>> ;;      dependencies resolved: insn 38
>> ;;      tick updated: insn 38 into ready
>> ;;      dependencies resolved: insn 41
>> ;;      tick updated: insn 41 into ready
>> ;;      Advanced a state.
>> ;;      Ready list after queue_to_ready:    41:4  38:2
>> ;;      Ready list after ready_sort:        41:4  38:2
>> ;;      Ready list (t =   0):    41:4  38:2
>> ;;      Chosen insn : 38
>> ;;      0--> b  0: i  38    r1=zxn([r0+`b'])
>> :(p0+long_imm+ldst0+lock0),nothing*3
>> ;;      dependencies resolved: insn 43
>> ;;      Ready-->Q: insn 43: queued for 4 cycles (change queue index).
>> ;;      tick updated: insn 43 into queue with cost=4
>> ;;      dependencies resolved: insn 40
>> ;;      Ready-->Q: insn 40: queued for 4 cycles (change queue index).
>> ;;      tick updated: insn 40 into queue with cost=4
>> ;;      Ready-->Q: insn 41: queued for 1 cycles (resource conflict).
>> ;;      Ready list (t =   0):
>> ;;      Advanced a state.
>> ;;      Q-->Ready: insn 41: moving to ready without stalls
>> ;;      Ready list after queue_to_ready:    41:4
>> ;;      Ready list after ready_sort:        41:4
>> ;;      Ready list (t =   1):    41:4
>> ;;      Chosen insn : 41
>> ;;      1--> b  0: i  41    r0=r0+0x4
>> :(p0+no_stl2)|(p1+no_dual)
>>
>> So, it is scheduling first insn 38 followed by 41.
>> The insn chain for bb3 before sched2 looks like:
>>
>> (insn 38 36 40 3 (set (reg:DI 1 r1)
>>         (zero_extend:DI (mem:SI (plus:SI (reg:SI 0 r0 [orig:119 ivtmp.13 ] [119])
>>                 (symbol_ref:SI ("b") [flags 0x80] <var_decl 0x2b9c011f75a0 b>))
>>             [2 MEM[symbol: b, index: ivtmp.13_7, offset: 0B]+0 S4 A32])))
>>     pr3115b.c:13 1395 {zero_extendsidi2}
>>      (nil))
>> (insn 40 38 41 3 (set (mem:SI (plus:SI (reg:SI 0 r0 [orig:119 ivtmp.13 ] [119])
>>             (symbol_ref:SI ("a") [flags 0x80] <var_decl ... a>))
>>         [2 MEM[symbol: a, index: ivtmp.13_7, offset: 0B]+0 S4 A32])
>>         (reg:SI 1 r1 [orig:118 D.3048 ] [118])) pr3115b.c:13 491 {fp_movsi}
>>      (nil))
>> (insn 41 40 43 3 (set (reg:SI 0 r0 [orig:119 ivtmp.13 ] [119])
>>         (plus:SI (reg:SI 0 r0 [orig:119 ivtmp.13 ] [119])
>>             (const_int 4 [0x4]))) 536 {addsi3}
>>      (nil))
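As a footnote to the point about the greedy choice: a port that wants to influence which ready insn is picked can also do so through the TARGET_SCHED_REORDER hook, which the haifa scheduler calls at the start of each cycle and which may permute the ready list. The sketch below is purely illustrative (port_sched_reorder and port_insn_prefers_pairing are made-up names, and the "prefers pairing" test stands in for whatever machine-specific heuristic the port would use); it assumes the GCC 4.8-era hook signature, where the most preferred insn sits at ready[*n_readyp - 1].

/* Placeholder for a machine-specific heuristic deciding whether INSN would
   rather wait for a pairing partner (always false in this sketch).  */
static bool
port_insn_prefers_pairing (rtx insn ATTRIBUTE_UNUSED)
{
  return false;
}

/* Illustrative ready-list reorder hook for a 2-issue VLIW port.  */
static int
port_sched_reorder (FILE *dump ATTRIBUTE_UNUSED, int verbose ATTRIBUTE_UNUSED,
                    rtx *ready, int *n_readyp, int clock ATTRIBUTE_UNUSED)
{
  /* The most preferred candidate is at the end of the ready array.  If it
     would pair better with an insn that only becomes ready later, demote it
     by one slot so the scheduler picks the next candidate this cycle.  */
  if (*n_readyp > 1 && port_insn_prefers_pairing (ready[*n_readyp - 1]))
    {
      rtx tmp = ready[*n_readyp - 1];
      ready[*n_readyp - 1] = ready[*n_readyp - 2];
      ready[*n_readyp - 2] = tmp;
    }

  /* Return how many insns may be issued on this cycle (2-issue VLIW).  */
  return 2;
}

#undef  TARGET_SCHED_REORDER
#define TARGET_SCHED_REORDER port_sched_reorder

There is also a TARGET_SCHED_REORDER2 hook, called after insns have been issued within a cycle, which is sometimes a better fit for pairing decisions on multi-issue machines.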