http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50489
--- Comment #6 from Gary Funck <gary at intrepid dot com> 2011-09-25 19:58:58 UTC ---
(In reply to comment #5)
>   D.3059_11 = VIEW_CONVERT_EXPR<shared [8] struct foo[1] *>(D.3058);
>
> looks like bogus IL to me. You view D.3058, a struct of size 16, as
> a pointer (of size 8). I suppose you want to load D.3058.vaddr here?
>
>   D.3060_12 = (shared [8] struct foo *) D.3059_11;
>   D.3061_13 = VIEW_CONVERT_EXPR<struct upc_shared_ptr_t>(D.3060_12).phase;
>
> looks bogus IL to me. It views the pointer(!?) D.3060_12 as being a
> struct upc_shared_ptr_t and extracts a value that is not within that
> pointer.
>
> But maybe I'm missing something because I don't recognize that 'shared [8]'
> qualification.
[...]

The syntax (shared [8] struct foo *) above is unique to UPC. This is a pointer to a "shared" qualified object with a "blocking factor" (layout qualifier) of 8. This type of pointer is called a "pointer-to-shared" (PTS) in the UPC language definition; it is a pointer that can span nodes. On a 64-bit machine, using the "struct PTS" (as opposed to "packed PTS") representation, it is a 16-byte quantity. Thus the casts back and forth between (shared *) and "struct upc_shared_ptr_t" do not violate the size assumptions of VIEW_CONVERT_EXPR().

The "blocking factor" (the [8] in "shared [8] *" above) is unique to UPC. In UPC, arrays are "block distributed". This means that block 0 is on thread 0, block 1 is on thread 1, and so on. Thus, for a UPC program that is run with 2 threads, foo[0], foo[1] ... foo[7] are allocated on (have "affinity to") thread 0 and foo[8], foo[9] ... foo[13] are allocated on thread 1. The blocking factor makes it possible to cast a pointer to a block of shared storage into a regular "C" pointer (a "local" pointer), as long as the thread performing the cast has affinity to the block.

What is potentially troublesome for the "middle end" tree optimizations and "back end" RTL optimizations is that these pointers-to-shared (PTS's) are "fat" pointers.
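To make the block distribution and the 16-byte "struct PTS" size concrete, here is a minimal C sketch. It is not the libupc definition: the type name pts_sketch_t, the field order and widths, and the "thread" field are assumptions for illustration (only "vaddr" and "phase" appear in the IL quoted above), and affinity_of() and phase_of() are hypothetical helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the "struct PTS" representation on a 64-bit
   target: 8 + 4 + 4 = 16 bytes.  Field order and widths are assumptions,
   not the actual definition of struct upc_shared_ptr_t. */
typedef struct {
    uint64_t vaddr;   /* address within the owning thread's shared space */
    uint32_t thread;  /* thread that has affinity to the element */
    uint32_t phase;   /* element offset within the current block */
} pts_sketch_t;

/* Thread with affinity to element i of an array with blocking factor
   `block`, when the program runs on `threads` threads.  Blocks are dealt
   out round-robin: block 0 to thread 0, block 1 to thread 1, and so on. */
static unsigned affinity_of(size_t i, size_t block, unsigned threads)
{
    assert(block > 0 && threads > 0);
    return (unsigned)((i / block) % threads);
}

/* Position of element i inside its block. */
static unsigned phase_of(size_t i, size_t block)
{
    assert(block > 0);
    return (unsigned)(i % block);
}
```

With a blocking factor of 8 and 2 threads, affinity_of(7, 8, 2) yields 0 and affinity_of(8, 8, 2) yields 1, which matches the foo[] layout described above.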
Note that after the lowering pass (performed in upc/upc-genericize.c) there will be no *indirections* through a PTS. Instead, indirections of a PTS in a value context will be converted into "get" calls, which are implemented by the UPC runtime (libupc/smp). Indirections that are the targets of assignments are translated into "put" calls, also implemented by the UPC runtime.

The lowering pass also translates UPC pointer-to-shared arithmetic operations into equivalent operations which do not involve PTS's, but rather cast the PTS's to their representation type (struct upc_shared_ptr_t) and then operate on the component parts of the PTS. As you can see from the description of blocking factors above, the mapping of foo[i] to its (global) address requires a fairly complex arrangement of division and modulo operations.

The libupc runtime is unique in that parts of it may be inlined. Inlining of the runtime is enabled at optimization levels greater than 0, or it can be explicitly enabled or disabled via the -fupc-inline-lib switch. The inlining is accomplished via a pre-include of a runtime header file, implemented by the "upc" driver. Inlining is enabled in the test case documented in this bug report. Thus, a simple assignment statement involving array indexing of a UPC shared "blocked" array expands into a rather complex assortment of tree code and generated RTL. (This complexity makes it difficult to create an equivalent "C" test case.)

After lowering, any references to "shared *" (pointers-to-shared) should only occur in casts to/from the representation type and in moves/copies of the PTS container. We have run into a few places where the middle end makes some assumptions about regular pointers and tries to apply those assumptions to a UPC pointer-to-shared; we have been able to exclude PTS's by adding additional checks for them -- there are not many places that we have had to do this. Perhaps that sort of pointer-specific logic is kicking in here.
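As a rough illustration of the division and modulo arithmetic mentioned above, the following C sketch advances a pointer-to-shared by n elements by operating on its component parts. It is not the code that the lowering pass or libupc generates: pts_sketch_t, its "thread" field, and pts_add() are hypothetical, and the sketch assumes n is non-negative and that every thread lays out its portion of the array at the same local offsets.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 16-byte PTS representation (an assumption, not the
   actual struct upc_shared_ptr_t). */
typedef struct {
    uint64_t vaddr;   /* address within the owning thread's shared space */
    uint32_t thread;  /* thread that has affinity to the element */
    uint32_t phase;   /* element offset within the current block */
} pts_sketch_t;

/* Advance p by n elements (n >= 0) for an array with blocking factor
   `block`, element size `elem_size`, distributed over `threads` threads. */
static pts_sketch_t pts_add(pts_sketch_t p, size_t n, size_t block,
                            size_t elem_size, unsigned threads)
{
    assert(block > 0 && threads > 0);

    size_t phase_sum  = (size_t)p.phase + n;
    size_t new_phase  = phase_sum % block;   /* position in the new block */
    size_t blocks_fwd = phase_sum / block;   /* block boundaries crossed */

    size_t thread_sum = (size_t)p.thread + blocks_fwd;
    size_t new_thread = thread_sum % threads;
    size_t rows_fwd   = thread_sum / threads; /* full trips around all threads */

    /* The local address moves forward one block per full trip around the
       threads, adjusted by the change in phase (which may be negative). */
    int64_t delta_elems = (int64_t)(rows_fwd * block)
                        + (int64_t)new_phase - (int64_t)p.phase;

    p.vaddr  = (uint64_t)((int64_t)p.vaddr + delta_elems * (int64_t)elem_size);
    p.thread = (uint32_t)new_thread;
    p.phase  = (uint32_t)new_phase;
    return p;
}
```

For example, with a blocking factor of 8 and 2 threads, stepping from the last element of thread 0's first block (phase 7) by one element lands on thread 1 with phase 0, and the local address moves back by 7 elements.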
Arguably, the UPC lowering pass should fully lower PTS-typed expressions, so that they don't end up in the tree. Potentially, a PTS hanging around in the tree doesn't meet the strict (or even not-so-strict) definition of GENERIC. Fully lowering those expressions is on our "to do" list. When we do that, rather than using casts, we will likely rewrite the PTS type references into references to the PTS representation type. We have shied away from this because it makes the resulting tree code even more difficult to follow: it loses the logical correspondence to the original "C" source statements.

That said, this technique of casting a PTS to its representation type and then extracting its sub-parts has been working for quite a while on several different target architectures. However, this recast of a pointer-to-shared may be confusing the post-reload instruction scheduler and/or the logic that creates the MEM_REF.

We would like to see if we can find a way to make the current lowering pass approach work, because it does work in many contexts, and it will allow us to make forward progress without the lowering pass re-work becoming a critical-path task. Also, we don't know that the presence of a PTS-typed node in the tree is actually the cause of the problem.