On Mon, Oct 15, 2012 at 5:47 PM, Gary Funck <[email protected]> wrote:
> We have maintained the gupc (GNU Unified Parallel C) branch for
> a couple of years now, and would like to merge these changes into
> the GCC trunk.
>
> It is our goal to integrate the GUPC changes into the GCC 4.8
> trunk, in order to provide a UPC (Unified Parallel C) capability
> in the subsequent GCC 4.8 release.
>
> The purpose of this note is to introduce the GUPC project,
> to provide an overview of the UPC-related changes, and to
> introduce the subsequent sets of patches that merge the
> GUPC branch into GCC 4.8.
>
> For reference,
>
> The GUPC project page is here:
> http://gcc.gnu.org/projects/gupc.html
>
> The current GUPC release is distributed here:
> http://gccupc.org
>
> Roughly a year ago, we described the front-end-related
> changes as they stood at that time:
> http://gcc.gnu.org/ml/gcc-patches/2011-07/msg00081.html
>
> We merge the GCC trunk into the gupc branch on approximately
> a weekly basis. The current GUPC branch is based upon a recent
> version of the GCC trunk (192449 dated 2012-10-15), and has
> been bootstrapped on x86_64/i686 Linux, PPC/POWER7/Linux and
> IA64/Altix Linux. In earlier versions, GUPC was successfully
> ported to SGI/MIPS (big endian) and SciCortex/MIPS (little endian).
>
> The UPC-related source code differences
> can be viewed here in various formats:
> http://gccupc.org/gupc-changes
>
> In the discussion below, the changes are
> excerpted to highlight important aspects of
> the UPC-related changes. The trunk revision
> used in this presentation is 190707.
>
> UPC's Shared Qualifier and Layout Qualifier
> -------------------------------------------
>
> The UPC language specification describes
> the language syntax and semantics:
> http://upc.gwu.edu/docs/upc_specs_1.2.pdf
>
> UPC introduces a new qualifier, "shared",
> that indicates that the qualified object
> is located in a global shared address space
> that is accessible by all UPC threads.
> Additional qualifiers ("strict" and "relaxed")
> further specify the semantics of accesses to
> UPC shared objects.
>
> In UPC, a shared qualified array can further
> specify a "layout qualifier" that indicates
> how the shared data is blocked and distributed.
>
> UPC defines two predefined identifiers:
> THREADS, the number of threads that will be
> created when the program starts, and MYTHREAD,
> the current (zero-based) thread number.
> Typically, a UPC thread is implemented
> as an operating system process. Access to UPC
> shared memory may be implemented locally via
> OS-provided facilities (for example, mmap),
> or across nodes via a high-speed network
> interconnect (for example, InfiniBand).
>
> GUPC provides a runtime (libgupc) that targets
> SMP-based systems and uses mmap() to implement
> global shared memory.
>
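A minimal sketch of the process-per-thread model with mmap()-based shared memory, assuming a POSIX system. It is not libgupc code, and the function name spmd_shared_demo is hypothetical. The original process plays thread 0, forked children play threads 1..N-1, and all of them share one anonymous mmap() region, so a value written by one process is visible to the others.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1 /* exposes MAP_ANONYMOUS under strict -std= modes */
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo: run NTHREADS processes sharing one mapping.
   Process k acts as MYTHREAD == k and writes k into its own slot.
   Returns how many slots hold the expected value, or -1 on error. */
static int spmd_shared_demo(int nthreads)
{
  assert(nthreads > 0);
  size_t len = (size_t)nthreads * sizeof(int);
  int *slots = mmap(NULL, len, PROT_READ | PROT_WRITE,
                    MAP_SHARED | MAP_ANONYMOUS, -1, 0);
  if (slots == MAP_FAILED)
    return -1;
  for (int i = 0; i < nthreads; i++)
    slots[i] = -1;

  int forked = 0;
  for (int mythread = 1; mythread < nthreads; mythread++) {
    pid_t pid = fork();
    if (pid < 0)
      break;                      /* stop creating threads on failure */
    if (pid == 0) {
      slots[mythread] = mythread; /* visible to every process sharing the mapping */
      _exit(0);
    }
    forked++;
  }
  slots[0] = 0;                   /* the original process acts as thread 0 */

  /* Crude stand-in for a barrier: wait until every child has exited. */
  while (forked > 0 && wait(NULL) > 0)
    forked--;

  int ok = 0;
  for (int i = 0; i < nthreads; i++)
    if (slots[i] == i)
      ok++;
  munmap(slots, len);
  return ok;
}
```

Calling spmd_shared_demo(4) should return 4. A real runtime also needs barriers, locks, and address translation for each thread's segment, none of which is shown here.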
> Optionally, GUPC can use the more general and
> more capable Berkeley UPCR runtime:
> http://upc.lbl.gov/download/source.shtml#runtime
> The UPCR runtime supports a number of network
> topologies, and has been ported to most of the
> current High Performance Computing (HPC) systems.
>
> The following example illustrates
> the use of the UPC "shared" qualifier
> combined with a layout qualifier.
>
> #define BLKSIZE 5
> #define N_PER_THREAD (4 * BLKSIZE)
> shared [BLKSIZE] double A[N_PER_THREAD*THREADS];
>
> In the declaration above, "[BLKSIZE]" is the UPC
> layout qualifier; it specifies that the shared
> array A distributes its elements across the
> threads in blocks of 5 (BLKSIZE) elements. If the
> program is run with two threads, A has 40 elements
> (N_PER_THREAD*THREADS) and is distributed as shown below:
>
> Thread 0     Thread 1
> ---------    ---------
> A[ 0.. 4]    A[ 5.. 9]
> A[10..14]    A[15..19]
> A[20..24]    A[25..29]
> A[30..34]    A[35..39]
>
> Above, the elements shown for thread 0
> are defined as having "affinity" to thread 0.
> Similarly, those elements shown for thread 1
> have affinity to thread 1. In UPC, a pointer
> to a shared object can be cast to a thread-local
> pointer (a "C" pointer) when the
> designated shared object has affinity
> to the referencing thread.
>
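The distribution table and the affinity rule can be captured in a few lines of C. This sketch assumes the block-cyclic mapping the table shows (block k of the array goes to thread k % THREADS); the function names are hypothetical and are not GUPC or UPC library calls.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Thread with affinity to element I of an array declared with
   layout qualifier [BLKSIZE] and run on THREADS threads. */
static unsigned upc_affinity(size_t i, size_t blksize, unsigned threads)
{
  assert(blksize > 0 && threads > 0);
  return (unsigned)((i / blksize) % threads);
}

/* Position of element I within its owning thread's local portion:
   whole local blocks before it, plus its offset inside its block. */
static size_t upc_local_index(size_t i, size_t blksize, unsigned threads)
{
  assert(blksize > 0 && threads > 0);
  size_t block = i / blksize;
  return (block / threads) * blksize + i % blksize;
}

/* A pointer-to-shared to element I may be cast to a local "C" pointer
   on thread MYTHREAD only if the element has affinity to that thread. */
static bool upc_castable_to_local(size_t i, size_t blksize,
                                  unsigned threads, unsigned mythread)
{
  return upc_affinity(i, blksize, threads) == mythread;
}
```

With BLKSIZE 5 and two threads, upc_affinity(37, 5, 2) is 1 and upc_local_index(37, 5, 2) is 17, matching the A[35..39] row of the table (the fourth block on thread 1, third element within it).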
> A UPC "pointer-to-shared" (PTS) is a pointer
> that references a UPC shared object.
> A UPC pointer-to-shared is a "fat" pointer
> with the following logical fields:
> (virt_addr, thread, offset)
>
> The virtual address (virt_addr) field is combined with
> the thread number (thread) and the offset within the
> block (offset) to derive the location of the
> referenced object within the UPC shared address space.
>
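A sketch of pointer-to-shared arithmetic over the (virt_addr, thread, offset) triple. Assumptions, not taken from GUPC: virt_addr is a byte address within the owning thread's part of the shared space, the array starts at the same virt_addr on every thread, and offset counts elements within the current block. The type pts_t and the function pts_add are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical logical pointer-to-shared using the field names above. */
typedef struct {
  size_t virt_addr; /* byte address within the owning thread's shared part */
  unsigned thread;  /* thread with affinity to the referenced element */
  size_t offset;    /* element offset within the current block */
} pts_t;

/* Advance P by N (>= 0) elements of ELEM_SIZE bytes in an array with
   layout qualifier [BLKSIZE] distributed over THREADS threads. */
static pts_t pts_add(pts_t p, size_t n, size_t elem_size,
                     size_t blksize, unsigned threads)
{
  assert(blksize > 0 && threads > 0 && p.offset < blksize);
  assert(p.virt_addr >= p.offset * elem_size);

  size_t pos = p.offset + n;       /* elements from start of current block */
  size_t blocks = pos / blksize;   /* whole blocks crossed */
  size_t t = p.thread + blocks;    /* thread index before wrap-around */
  size_t wraps = t / threads;      /* full passes over all threads */
  size_t block_start = p.virt_addr - p.offset * elem_size;

  pts_t r;
  r.thread = (unsigned)(t % threads);
  r.offset = pos % blksize;
  /* Each pass over all threads moves to the next local block. */
  r.virt_addr = block_start + wraps * blksize * elem_size
                + r.offset * elem_size;
  return r;
}
```

For the array A above (8-byte doubles, BLKSIZE 5, two threads), A[7] is (virt_addr 16, thread 1, offset 2); adding 30 elements yields A[37] at (136, 1, 2), which again lies on thread 1.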
> GUPC implements pointer-to-shared objects using
> either a "packed" representation or a "struct"
> representation. The user can select the
> pointer-to-shared representation with a "configure"
> parameter. The packed representation is the default.
>
> The "packed" pointer-to-shared representation
> limits the range of the various fields within
> the pointer-to-shared in order to gain efficiency.
> Packed pointer-to-shared values encode the three-part
> shared address (described above) as a 64-bit
> value (on both 64-bit and 32-bit platforms).
>
> The "struct" representation provides a wider
> addressing range at the expense of doubling
> the size of a pointer-to-shared value
> to 128 bits.
>
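The trade-off between the two representations can be illustrated in C. The field widths (20 phase, 10 thread, 34 address bits), the field order, and all names below are hypothetical; the excerpt does not give GUPC's actual configuration. The point is only that a packed value must fit every field into 64 bits, while the struct form uses full-width fields.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical packed layout (high to low bits): phase | thread | vaddr. */
#define PTS_PHASE_BITS  20
#define PTS_THREAD_BITS 10
#define PTS_VADDR_BITS  34
_Static_assert(PTS_PHASE_BITS + PTS_THREAD_BITS + PTS_VADDR_BITS == 64,
               "packed fields must fill exactly 64 bits");

#define PTS_MASK(bits) ((UINT64_C(1) << (bits)) - 1)

static uint64_t pts_pack(uint64_t vaddr, uint64_t thread, uint64_t phase)
{
  /* Each field must fit its width: this is the range limit of "packed". */
  assert(vaddr <= PTS_MASK(PTS_VADDR_BITS));
  assert(thread <= PTS_MASK(PTS_THREAD_BITS));
  assert(phase <= PTS_MASK(PTS_PHASE_BITS));
  return (phase << (PTS_THREAD_BITS + PTS_VADDR_BITS))
         | (thread << PTS_VADDR_BITS)
         | vaddr;
}

static uint64_t pts_vaddr(uint64_t p) { return p & PTS_MASK(PTS_VADDR_BITS); }

static uint64_t pts_thread(uint64_t p)
{
  return (p >> PTS_VADDR_BITS) & PTS_MASK(PTS_THREAD_BITS);
}

static uint64_t pts_phase(uint64_t p)
{
  return (p >> (PTS_THREAD_BITS + PTS_VADDR_BITS)) & PTS_MASK(PTS_PHASE_BITS);
}

/* Hypothetical "struct" layout: full-width fields, 128 bits in total. */
typedef struct {
  uint64_t vaddr;
  uint32_t thread;
  uint32_t phase;
} pts_struct_t;
_Static_assert(sizeof(pts_struct_t) == 16, "struct PTS should be 128 bits");
```

With these widths, a packed value can name at most 1024 threads and a 16 GiB (2^34-byte) per-thread address range, whereas the struct form holds any 64-bit address and 32-bit thread number.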
> UPC-Related Front-End Changes
> -----------------------------
>
> GCC's internal tree representation is
> extended to record the UPC "shared",
> "strict", "relaxed" qualifiers,
> and the layout qualifier.
What immediately comes to my mind is that, apart from parsing,
the core machinery should be shareable with Cilk+, no?
Richard.
> Index: gcc/tree.h
> ===================================================================
> --- gcc/tree.h (.../trunk) (revision 190707)
> +++ gcc/tree.h (.../branches/gupc) (revision 190736)
> @@ -458,7 +458,10 @@ struct GTY(()) tree_base {
> unsigned packed_flag : 1;
> unsigned user_align : 1;
> unsigned nameless_flag : 1;
> - unsigned spare0 : 4;
> + unsigned upc_shared_flag : 1;
> + unsigned upc_strict_flag : 1;
> + unsigned upc_relaxed_flag : 1;
> + unsigned spare0 : 1;
>
> unsigned spare1 : 8;
>
>
> UPC defines a few additional tree node types:
>
> +++ gcc/upc/upc-tree.def (.../branches/gupc) (revision 190736)
> +/* UPC statements */
> +
> +/* Used to represent a `upc_forall' statement. The operands are
> + UPC_FORALL_INIT_STMT, UPC_FORALL_COND, UPC_FORALL_EXPR,
> + UPC_FORALL_BODY, and UPC_FORALL_AFFINITY respectively. */
> +
> +DEFTREECODE (UPC_FORALL_STMT, "upc_forall_stmt", tcc_statement, 5)
> +
> +/* Used to represent a UPC synchronization statement. The first
> + operand is the synchronization operation, UPC_SYNC_OP:
> + UPC_SYNC_NOTIFY_OP 1 Notify operation
> + UPC_SYNC_WAIT_OP 2 Wait operation
> + UPC_SYNC_BARRIER_OP 3 Barrier operation
> +
> + The second operand, UPC_SYNC_ID is the (optional) expression
> + whose value specifies the barrier identifier which is checked
> + by the various synchronization operations. */
> +
> +DEFTREECODE (UPC_SYNC_STMT, "upc_sync_stmt", tcc_statement, 2)
>
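A plain-C model of what UPC_SYNC_STMT records, as a sketch rather than GCC tree code. The operation codes are the ones listed in the comment above; the struct, the keyword mapping (mirroring the parser's switch), and the matching rule are hypothetical. The rule that an omitted barrier identifier matches any value is our reading of the UPC specification's barrier semantics.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Operation codes as listed for UPC_SYNC_OP. */
enum upc_sync_op {
  UPC_SYNC_NOTIFY_OP = 1,
  UPC_SYNC_WAIT_OP = 2,
  UPC_SYNC_BARRIER_OP = 3
};

/* Hypothetical model of the two UPC_SYNC_STMT operands. */
struct upc_sync_stmt {
  enum upc_sync_op op; /* UPC_SYNC_OP */
  bool has_id;         /* false when the optional UPC_SYNC_ID is omitted */
  int id;              /* value of UPC_SYNC_ID when present */
};

/* Map a statement keyword to its operation code, or 0 if it is not one. */
static int upc_sync_op_for_keyword(const char *kw)
{
  assert(kw != NULL);
  if (strcmp(kw, "upc_notify") == 0)
    return UPC_SYNC_NOTIFY_OP;
  if (strcmp(kw, "upc_wait") == 0)
    return UPC_SYNC_WAIT_OP;
  if (strcmp(kw, "upc_barrier") == 0)
    return UPC_SYNC_BARRIER_OP;
  return 0;
}

/* Two barrier identifiers are consistent if either is omitted
   or both have the same value. */
static bool upc_sync_ids_match(const struct upc_sync_stmt *a,
                               const struct upc_sync_stmt *b)
{
  assert(a != NULL && b != NULL);
  return !a->has_id || !b->has_id || a->id == b->id;
}
```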
> The "C" parser is extended to recognize UPC's syntactic
> extensions.
>
> --- gcc/c-family/c-common.c (.../trunk) (revision 190707)
> +++ gcc/c-family/c-common.c (.../branches/gupc) (revision 190736)
> @@ -30,6 +30,7 @@ along with GCC; see the file COPYING3.
> #include "ggc.h"
> #include "c-common.h"
> #include "c-objc.h"
> +#include "c-upc.h"
> #include "tm_p.h"
> #include "obstack.h"
> #include "cpplib.h"
> @@ -193,6 +194,24 @@ const char *pch_file;
> user's namespace. */
> int flag_iso;
>
> +/* Nonzero whenever UPC -fupc-threads-N is asserted.
> + The value N gives the number of UPC threads to be
> + defined at compile-time. */
> +int flag_upc_threads;
> +
> +/* Nonzero whenever UPC -fupc-pthreads-model-* is asserted. */
> +int flag_upc_pthreads;
> +
> +/* The -fupc-pthreads-per-process-N switch tells the UPC compiler
> + and runtime to map N UPC threads per process onto
> + N POSIX threads running inside the process. */
> +int flag_upc_pthreads_per_process;
> +
> +/* The implementation model for UPC threads that
> + are mapped to POSIX threads, specified at compilation
> + time by the -fupc-pthreads-model-* switch. */
> +upc_pthreads_model_kind upc_pthreads_model;
> +
> /* Warn about #pragma directives that are not recognized. */
>
> int warn_unknown_pragmas; /* Tri state variable. */
> @@ -389,8 +408,9 @@ static int resort_field_decl_cmp (const
> C --std=c89: D_C99 | D_CXXONLY | D_OBJC | D_CXX_OBJC
> C --std=c99: D_CXXONLY | D_OBJC
> ObjC is like C except that D_OBJC and D_CXX_OBJC are not set
> - C++ --std=c98: D_CONLY | D_CXXOX | D_OBJC
> - C++ --std=c0x: D_CONLY | D_OBJC
> + UPC is like C except that D_UPC is not set
> + C++ --std=c98: D_CONLY | D_CXXOX | D_OBJC | D_UPC
> + C++ --std=c0x: D_CONLY | D_OBJC | D_UPC
> ObjC++ is like C++ except that D_OBJC is not set
>
> If -fno-asm is used, D_ASM is added to the mask. If
> @@ -583,6 +603,19 @@ const struct c_common_resword c_common_r
> { "inout", RID_INOUT, D_OBJC },
> { "oneway", RID_ONEWAY, D_OBJC },
> { "out", RID_OUT, D_OBJC },
> +
> + /* UPC keywords */
> + { "shared", RID_SHARED, D_UPC },
> + { "relaxed", RID_RELAXED, D_UPC },
> + { "strict", RID_STRICT, D_UPC },
> + { "upc_barrier", RID_UPC_BARRIER, D_UPC },
> + { "upc_blocksizeof", RID_UPC_BLOCKSIZEOF, D_UPC },
> + { "upc_elemsizeof", RID_UPC_ELEMSIZEOF, D_UPC },
> + { "upc_forall", RID_UPC_FORALL, D_UPC },
> + { "upc_localsizeof", RID_UPC_LOCALSIZEOF, D_UPC },
> + { "upc_notify", RID_UPC_NOTIFY, D_UPC },
> + { "upc_wait", RID_UPC_WAIT, D_UPC },
> +
>
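The comment above describes a mask scheme: each reserved word carries "disable" bits, and a word is recognized only if none of those bits is set in the mask for the current language. A simplified sketch follows; the flag values, the small table, and keyword_enabled are hypothetical (GCC defines its own D_* constants, and its real entry for "template" carries more flags). The plain-C mask includes D_UPC, following "UPC is like C except that D_UPC is not set".

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical bit values; GCC defines its own D_* constants. */
#define D_CONLY   0x01u
#define D_CXXONLY 0x02u
#define D_OBJC    0x04u
#define D_UPC     0x08u

struct resword {
  const char *word;
  unsigned disable; /* keyword is off if any of these bits is in the mask */
};

/* UPC keywords plus one C++-only keyword for contrast. */
static const struct resword reswords[] = {
  { "shared",     D_UPC },
  { "strict",     D_UPC },
  { "relaxed",    D_UPC },
  { "upc_forall", D_UPC },
  { "template",   D_CXXONLY },
};

/* Per-language masks following the comment in the patch. */
static const unsigned mask_c   = D_CXXONLY | D_OBJC | D_UPC;
static const unsigned mask_upc = D_CXXONLY | D_OBJC;
static const unsigned mask_cxx = D_CONLY | D_OBJC | D_UPC;

static bool keyword_enabled(const char *word, unsigned mask)
{
  assert(word != NULL);
  for (size_t i = 0; i < sizeof reswords / sizeof reswords[0]; i++)
    if (strcmp(reswords[i].word, word) == 0)
      return (reswords[i].disable & mask) == 0;
  return false; /* not in the table: an ordinary identifier */
}
```

Under this scheme "shared" is a keyword only when compiling UPC, so existing C and C++ code that uses it as an identifier keeps compiling.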
> --- gcc/c/c-parser.c (.../trunk) (revision 190707)
> +++ gcc/c/c-parser.c (.../branches/gupc) (revision 190736)
> [...]
> @@ -498,6 +504,11 @@ c_token_starts_typename (c_token *token)
> case RID_ACCUM:
> case RID_SAT:
> return true;
> + /* UPC qualifiers */
> + case RID_SHARED:
> + case RID_STRICT:
> + case RID_RELAXED:
> + return true;
> [...]
> @@ -1224,6 +1245,14 @@ static void c_parser_objc_at_dynamic_dec
> static bool c_parser_objc_diagnose_bad_element_prefix
> (c_parser *, struct c_declspecs *);
>
> +/* These UPC parser functions are only ever called when
> + compiling UPC. */
> +static void c_parser_upc_forall_statement (c_parser *);
> +static void c_parser_upc_sync_statement (c_parser *, int);
> +static void c_parser_upc_shared_qual (source_location,
> + c_parser *,
> + struct c_declspecs *);
> +
> [...]
> + /* UPC qualifiers */
> + case RID_SHARED:
> + attrs_ok = true;
> + c_parser_upc_shared_qual (loc, parser, specs);
> + break;
> + case RID_STRICT:
> + case RID_RELAXED:
> + attrs_ok = true;
> + declspecs_add_qual (loc, specs,
> + c_parser_peek_token (parser)->value);
> + c_parser_consume_token (parser);
> + break;
> case RID_ATTRIBUTE:
> if (!attrs_ok)
> goto out;
> [...]
> @@ -4558,6 +4612,22 @@ c_parser_statement_after_labels (c_parse
> gcc_assert (c_dialect_objc ());
> c_parser_objc_synchronized_statement (parser);
> break;
> + case RID_UPC_FORALL:
> + gcc_assert (c_dialect_upc ());
> + c_parser_upc_forall_statement (parser);
> + break;
> + case RID_UPC_NOTIFY:
> + gcc_assert (c_dialect_upc ());
> + c_parser_upc_sync_statement (parser, UPC_SYNC_NOTIFY_OP);
> + goto expect_semicolon;
> + case RID_UPC_WAIT:
> + gcc_assert (c_dialect_upc ());
> + c_parser_upc_sync_statement (parser, UPC_SYNC_WAIT_OP);
> + goto expect_semicolon;
> + case RID_UPC_BARRIER:
> + gcc_assert (c_dialect_upc ());
> + c_parser_upc_sync_statement (parser, UPC_SYNC_BARRIER_OP);
> + goto expect_semicolon;
> default:
> goto expr_stmt;
> }
>
> --- gcc/c-family/c-pragma.c (.../trunk) (revision 190707)
> +++ gcc/c-family/c-pragma.c (.../branches/gupc) (revision 190736)
> @@ -30,6 +30,7 @@ along with GCC; see the file COPYING3.
> #include "c-pragma.h"
> #include "flags.h"
> #include "c-common.h"
> +#include "c-upc.h"
> #include "tm_p.h" /* For REGISTER_TARGET_PRAGMAS (why is
> this not a target hook?). */
> #include "vec.h"
> @@ -507,6 +508,242 @@ add_to_renaming_pragma_list (tree oldnam
> /* The current prefix set by #pragma extern_prefix. */
> GTY(()) tree pragma_extern_prefix;
>
> +/* variables used to implement #pragma upc semantics */
> +#ifndef UPC_CMODE_STACK_INCREMENT
> +#define UPC_CMODE_STACK_INCREMENT 32
> +#endif
> +static int pragma_upc_permitted;
> +static int upc_cmode;
> +static int *upc_cmode_stack;
> +static int upc_cmode_stack_in_use;
> +static int upc_cmode_stack_allocated;
> +
> +static void init_pragma_upc (void);
> +static void handle_pragma_upc (cpp_reader * ARG_UNUSED (dummy));
>
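The excerpt declares the state for #pragma upc but not the code that uses it. A current mode plus a stack with an in-use count and an allocated size that grows by UPC_CMODE_STACK_INCREMENT suggests push/pop semantics; the sketch below implements that reading with hypothetical helpers push_upc_cmode and pop_upc_cmode. What the mode values mean and when pushes occur are assumptions, and GCC code would normally use xrealloc rather than realloc.

```c
#include <assert.h>
#include <stdlib.h>

#ifndef UPC_CMODE_STACK_INCREMENT
#define UPC_CMODE_STACK_INCREMENT 32
#endif

/* Same variables as in the excerpt above. */
static int upc_cmode;
static int *upc_cmode_stack;
static int upc_cmode_stack_in_use;
static int upc_cmode_stack_allocated;

/* Hypothetical: save the current mode, then switch to NEW_MODE.
   Returns 0 on success, -1 if memory runs out. */
static int push_upc_cmode(int new_mode)
{
  assert(upc_cmode_stack_in_use <= upc_cmode_stack_allocated);
  if (upc_cmode_stack_in_use == upc_cmode_stack_allocated) {
    int new_size = upc_cmode_stack_allocated + UPC_CMODE_STACK_INCREMENT;
    int *p = realloc(upc_cmode_stack, (size_t)new_size * sizeof *p);
    if (p == NULL)
      return -1;
    upc_cmode_stack = p;
    upc_cmode_stack_allocated = new_size;
  }
  upc_cmode_stack[upc_cmode_stack_in_use++] = upc_cmode;
  upc_cmode = new_mode;
  return 0;
}

/* Hypothetical: restore the mode saved by the matching push.
   Returns 0 on success, -1 on an unbalanced pop. */
static int pop_upc_cmode(void)
{
  if (upc_cmode_stack_in_use == 0)
    return -1;
  upc_cmode = upc_cmode_stack[--upc_cmode_stack_in_use];
  return 0;
}
```

After 40 pushes the stack has grown twice, to 64 entries; 40 matching pops restore the initial mode.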
> c-decl.c handles the additional UPC qualifiers
> and declspecs. The layout qualifier is handled here:
>
> --- gcc/c/c-decl.c (.../trunk) (revision 190707)
> +++ gcc/c/c-decl.c (.../branches/gupc) (revision 190736)
> [...]
> @@ -8857,6 +9046,23 @@ declspecs_add_qual (source_location loc,
> bool dupe = false;
> specs->non_sc_seen_p = true;
> specs->declspecs_seen_p = true;
> +
> + /* A UPC layout qualifier is encoded as an ARRAY_REF,
> + further, it implies the presence of the 'shared' keyword. */
> + if (TREE_CODE (qual) == ARRAY_REF)
> + {
> + if (specs->upc_layout_qualifier)
> + {
> + error ("two or more layout qualifiers specified");
> + return specs;
> + }
> + else
> + {
> + specs->upc_layout_qualifier = qual;
> + qual = ridpointers[RID_SHARED];
> + }
> + }
> +
>
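The two rules in the declspecs_add_qual hunk (at most one layout qualifier, and a layout qualifier implies "shared") can be shown without GCC's tree machinery. In this sketch, struct demo_declspecs and add_layout_qualifier are hypothetical stand-ins, and a plain long block size replaces the ARRAY_REF tree that the patch uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical, heavily simplified stand-in for struct c_declspecs. */
struct demo_declspecs {
  bool shared;           /* 'shared' seen or implied */
  bool has_layout;       /* a layout qualifier such as [5] was seen */
  long layout_blocksize; /* block size from the layout qualifier */
};

/* Record a layout qualifier: reject a second one, and treat the
   first as implying 'shared'. Returns false on error. */
static bool add_layout_qualifier(struct demo_declspecs *specs, long blocksize)
{
  assert(specs != NULL);
  if (specs->has_layout) {
    fprintf(stderr, "error: two or more layout qualifiers specified\n");
    return false;
  }
  specs->has_layout = true;
  specs->layout_blocksize = blocksize;
  specs->shared = true;
  return true;
}
```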
> In UPC, a qualifier includes both the traditional
> "C" qualifier flags and the UPC "layout qualifier".
> Thus, the pointer_quals field of a declarator node
> is replaced by a struct (pointer) that holds both the
> qualifier flags (quals) and the UPC layout qualifier
> (upc_layout_qual), as shown below.
>
> @@ -5702,7 +5835,9 @@ grokdeclarator (const struct c_declarato
>
> /* Process type qualifiers (such as const or volatile)
> that were given inside the `*'. */
> - type_quals = declarator->u.pointer_quals;
> + type_quals = declarator->u.pointer.quals;
> + upc_layout_qualifier = declarator->u.pointer.upc_layout_qual;
> + sharedp = ((type_quals & TYPE_QUAL_SHARED) != 0);
>
> UPC shared variables are allocated at run time in the
> global shared memory managed by the UPC runtime.
> A separate link section is used to assign
> virtual addresses to UPC shared variables. On systems
> that support it, this section is designated as a
> "no load" section; in that case, it begins at
> virtual address zero. The logic below assigns UPC
> shared variables to their own link section. As the
> code comment notes, when compiling in pthreads mode,
> regular file-scope variables are made thread-local.
>
> @@ -6235,6 +6409,13 @@ grokdeclarator (const struct c_declarato
> [...]
> + /* Shared variables are given their own link section on
> + most target platforms, and if compiling in pthreads mode
> + regular local file scope variables are made thread local. */
> + if ((TREE_CODE(decl) == VAR_DECL)
> + && !threadp && (TREE_SHARED (decl) || flag_upc_pthreads))
> + upc_set_decl_section (decl);
> +
>
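Giving variables their own link section, and addressing them relative to that section's start, can be tried with ordinary GCC and a GNU-compatible ELF linker. In this sketch the section name upc_shared_demo and the variables are hypothetical (the excerpt does not show GUPC's section name), and the section is an ordinary loaded data section rather than a "no load" one. Subtracting the linker-defined __start_upc_shared_demo symbol mimics a section that begins at virtual address zero.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical section name; it must be a valid C identifier so that
   GNU ld defines __start_/__stop_ symbols for it. */
#define DEMO_SHARED_SECTION __attribute__((section("upc_shared_demo")))

/* Stand-ins for UPC shared variables placed in their own section. */
DEMO_SHARED_SECTION double demo_a[8] = { 1.0 };
DEMO_SHARED_SECTION int demo_b = 1;

/* Provided by the linker for the section above. */
extern char __start_upc_shared_demo[];
extern char __stop_upc_shared_demo[];

/* Offset of P from the start of the section: its address as if the
   section were linked at virtual address zero. */
static size_t shared_section_offset(const void *p)
{
  const char *c = p;
  assert(c >= __start_upc_shared_demo && c < __stop_upc_shared_demo);
  return (size_t)(c - __start_upc_shared_demo);
}
```

The order of variables within the section is up to the compiler and linker, so the sketch relies only on each offset lying inside the section.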
> Various UPC language-related checks and operations
> are called from the "C" front end and middle end.
> To ensure that these operations are defined
> when the other language front ends and compilers
> are linked, these functions are stubbed out,
> in a fashion similar to Objective-C:
>
> --- gcc/c-family/c-upc.h (.../trunk) (revision 0)
> +++ gcc/c-family/c-upc.h (.../branches/gupc) (revision 190736)
> [...]
> +
> +/* UPC entry points. */
> +
> +/* The following UPC functions are called by the C front-end;
> + * they all must have corresponding stubs in stub-upc.c. */
> +
> +extern int count_upc_threads_refs (tree);
> +extern void deny_pragma_upc (void);
> +extern int get_upc_consistency_mode (void);
> [...]
> +extern tree upc_rts_forall_depth_var (void);
> +extern void upc_set_decl_section (tree);
> +extern void upc_write_global_declarations (void);
>
> A few command-line option flags must also be
> stubbed out in order to link the other
> language front ends.
>
> --- gcc/c-family/stub-upc.c (.../trunk) (revision 0)
> +++ gcc/c-family/stub-upc.c (.../branches/gupc) (revision 190736)
> [...]
> +int compiling_upc;
> +int flag_upc;
> +int use_upc_dwarf2_extensions;
>
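A stub file of this kind only has to satisfy the linker for front ends that never enable UPC. The sketch assumes stubs that do nothing and return neutral values; the bodies are hypothetical (the excerpt shows only the declarations), and a typedef stands in for GCC's tree type so the fragment compiles on its own.

```c
#include <assert.h>
#include <stddef.h>

/* Opaque stand-in for GCC's tree type (hypothetical here). */
typedef struct tree_node *tree;

/* Flags named in the text; zero-initialized, i.e. "not compiling UPC". */
int compiling_upc;
int flag_upc;
int use_upc_dwarf2_extensions;

/* Hypothetical stub bodies returning neutral values. A non-UPC front
   end never sets flag_upc, so these should not be reached in practice. */
int count_upc_threads_refs(tree expr)
{
  (void)expr;
  return 0;
}

void deny_pragma_upc(void)
{
}

int get_upc_consistency_mode(void)
{
  return 0;
}

void upc_set_decl_section(tree decl)
{
  (void)decl;
}
```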
> The complete set of GUPC-related patches will be provided for
> review in a collection of 16 patch sets. A listing of those
> patch sets is attached.
>
> Each patch set will be sent in a separate email
> following this one for review.
>
> -- end --