More of a Loop distribution.
All:

Loop distribution considers the DDG (Data Dependence Graph) when deciding how to distribute loops. Loops with control statements like IF-THEN-ELSE can also be distributed. Instead of the Data Dependence Graph, the Control Dependence Graph should be considered in order to distribute loops in the presence of control statements.

The presence of multiple exits in a loop can also be considered for the loop distribution transformation.

These transformations help the libquantum benchmark in SPEC 2006.

The following articles look interesting to me:

"Loop Distribution in presence of arbitrary control flow", Ken Kennedy et al.
"Loop Distribution in presence of Multiple Exits", Bor-Ming Hsieh et al.

I don't think loop distribution in the presence of control flow is implemented in GCC/LLVM.

I think it is feasible to consider the above for implementation in GCC.

Thanks & Regards
Ajit
Re: More of a Loop distribution.
On Thu, Aug 13, 2015 at 9:37 AM, Ajit Kumar Agarwal wrote:
> All:
>
> Loop distribution considers DDG to decide on distributing the Loops. The
> Loops with control statements like IF-THEN-ELSE can also be
> Distributed. Instead of Data Dependency Graph, the Control Dependence Graph
> should be considered in order to distribute the loops
> In presence of control Statements.
>
> Also the presence of multiple exits in the Loop can also be considered for
> Loop distribution transformation.
>
> Thus the above transformation helps in the Libquantum benchmarks for SPEC
> 2006.
>
> There are following articles that looks interesting to me.
>
> "Loop Distribution in presence of arbitrarily control flow Ken Kennedy et.al."
> "Loop Distribution in presence of Multiple Exits Bor-Ming Hsieh etal."
>
>
> I don't think the loop distribution in presence of control flow is
> implemented in GCC/LLVM.
>
> I think it is feasible to consider the above for the implementation in GCC.

It's true that loop distribution does not try to distribute based on any control structure heuristics but it only considers data locality. It does however already compute the control dependence graph (and uses it to add control edges to the DDG to properly add data dependence edges to uses of control statements necessary in partitions).

So it should be a matter of specifying the proper set of starting statements it tries separating.

Not sure which kind of distribution you are after, can you give an example?

Richard.

> Thanks & Regards
> Ajit
Re: Results from SPEC2006 FP analysis done at Richard`s request {late July / early August}
On Thu, Aug 13, 2015 at 3:32 AM, Abe wrote:
> Dear all,
>
> Overall, I think the WIP new if converter is holding up
> relatively well, but there is obviously opportunity to do better,
> at least if the numbers mean what they look like they mean,
> i.e. the old converter`s code was fully OK and so is the new one`s.
> By "fully OK" I mean e.g. no crashing bugs were introduced by the
> conversion.
>
>
> In the following, all the integers over 1000 are loops-vectorized counts.
>
>
> "base": baseline compiler source code
>    Git hash: cb791e75379bc0c8b10bd13bcb24305c36fd504b
>    commit date: July 10 2015
>    committer: Richard
>
> "new": base + patches for new [GIMPLE-level] if converter
>
>
>
> -O3
> ===
>
>   no special flags
>
>     base: 5951
>      new: 5956
>
>   with only "-ftree-loop-if-convert" added

That is -ftree-loop-if-convert-stores?

>
>     base: 5954
>      new: 5956
>
>   with both if-conversion flags added

What's the other if-conversion flag? I suppose _this_ is -ftree-loop-if-convert-stores? That would match the numbers above which are mostly identical because -O3/-Ofast already enable -ftree-loop-if-convert by means of enabling vectorization.

>   ---
>     base: 5970
>      new: 5956
>
>
>
> -Ofast
> ==
>
>   no special flags
>
>     base: 7393
>      new: 7401
>
>   with only "-ftree-loop-if-convert" added
>
>     base: 7393
>      new: 7401
>
>   with both if-conversion flags added
>   ---
>     base: 7421
>      new: 7401
>

Can you please post individual benchmark numbers instead of just the overall score?

From the numbers above I can see the new if-converter removes any improvement we get from -ftree-loop-if-convert-stores (as expected - it's not a vectorization enabler with the new scheme).

Thanks,
Richard.

>
> I have a spreadsheet [and a PDF generated therefrom] that shows the above in
> a
> more visual format.  Please feel free to ask for the PDF as an email
> attachment.
>
>
> Regards,
>
> Abe
>
>
Finding insns to reorder using dataflow
Hi all,

I'm implementing a target-specific reorg pass, and one thing that I want to do is, for a given insn in the stream, to find another instruction in the stream that I can swap it with without violating any dataflow dependencies. The candidate instruction could be earlier or later in the stream.

I'm stuck on finding an approach to do this. It seems that using some of the dataflow infrastructure is the right way to go, but I can't figure out the details. can_move_insns_across looks relevant, but it looks too heavyweight, with quite a lot of arguments.

I suppose somehow constructing regions of interchangeable instructions would be the way to go, but I'm not sure how clean/cheap that would be outside the scheduler.

Any ideas would be appreciated.

Thanks,
Kyrill
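To make the question concrete, here is a minimal sketch of the dependence check that swapping two instructions implies. It is only a toy model and does not use GCC's df or scheduler APIs: the `insn_model` struct, the register bitmasks, and both function names are hypothetical, and memory accesses, control flow, and basic-block boundaries are deliberately ignored.

```c
#include <stddef.h>

/* Hypothetical model of an instruction: bit r set means register r. */
struct insn_model {
    unsigned defs;  /* registers written */
    unsigned uses;  /* registers read */
};

/* Two instructions are independent when there is no RAW, WAR or WAW
   register dependence between them. */
int insns_independent(const struct insn_model *a, const struct insn_model *b)
{
    return !(a->defs & b->uses)    /* read after write */
        && !(a->uses & b->defs)    /* write after read */
        && !(a->defs & b->defs);   /* write after write */
}

/* Return 1 if the instructions at positions i < j of one straight-line
   sequence can exchange places.  The earlier one moves down past
   everything in (i, j]; the later one moves up past everything in [i, j). */
int can_swap(const struct insn_model *stream, size_t i, size_t j)
{
    for (size_t k = i + 1; k <= j; k++)
        if (!insns_independent(&stream[i], &stream[k]))
            return 0;
    for (size_t k = i; k < j; k++)
        if (!insns_independent(&stream[j], &stream[k]))
            return 0;
    return 1;
}
```

A real pass would also have to account for memory dependences, volatile accesses, and instructions that may trap, which is part of why the existing dependence analysis is attractive.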
Re: Using libbacktrace in libgfortran: some questions
FX writes:

> 1. It appears that even on platforms with BACKTRACE_SUPPORTED == 0
> (such as x86_64-apple-darwin), libbacktrace is built and able to
> perform a nonsymbolic backtrace (which appears accurate). Is that a
> feature? Can I rely on it?

Yes, that is a feature. You should always get accurate PC values even on systems where libbacktrace does not yet generate file/line information.

> 2. The backtraces I get on x86_64-linux-gnu are missing symbols. The
> attached source file, compiled with “gfortran -g” with the attached
> patch, gives the following backtrace with libgfortran’s existing code,
> which uses unwind and calls to addr2line:
>
>> #0  0x7F4F6E333467
>> #1  0x7F4F6E334C42
>> #2  0x7F4F6E409308
>> #3  0x4008A3 in bar at a.f90:9
>> #4  0x4008C8 in foo at a.f90:5
>> #5  0x4008AF in test at a.f90:2
>
>
> with my patch using libbacktrace, I get:
>
>> 0x7f04f00f8c7d _gfortrani_show_backtrace
>>      ../../../trunk/libgfortran/runtime/backtrace.c:112
>> 0x7f04f00f9ac4 _gfortrani_sys_abort
>>      ../../../trunk/libgfortran/runtime/error.c:176
>> 0x7f04f01c8c78 _gfortran_abort
>>      ../../../trunk/libgfortran/intrinsics/abort.c:33
>> 0x4008a3 ???
>>      /home/fx/gcc/irun/a.f90:9
>> 0x4008c8 ???
>>      /home/fx/gcc/irun/a.f90:5
>> 0x4008af test
>>      /home/fx/gcc/irun/a.f90:2
>> 0x4008ff main
>>      /home/fx/gcc/irun/a.f90:2
>
>
> where the symbols for foo() and bar() are apparently not found, though
> the source location is. Am I missing something here? I’m attaching the
> output of “dwarfdump a.out” and the a.out executable file itself
> (gzipped).

I don't know why this is not working. Everything looks fine in the a.out that you sent. Unfortunately, I think you sent the one built without libbacktrace. Can you send me the one built with libbacktrace? Thanks.

Ian
Re: Using libbacktrace in libgfortran: some questions
Resending in text/plain, sorry for any extra spam...

On Thu, Aug 13, 2015 at 4:44 PM, Ian Lance Taylor wrote:
>
> FX writes:
>
> > 1. It appears that even on platforms with BACKTRACE_SUPPORTED == 0
> > (such as x86_64-apple-darwin), libbacktrace is built and able to
> > perform a nonsymbolic backtrace (which appears accurate). Is that a
> > feature? Can I rely on it?
>
> Yes, that is a feature.  You should always get accurate PC values even
> on systems where libbacktrace does not yet generate file/line
> information.
>
>
> > 2. The backtraces I get on x86_64-linux-gnu are missing symbols. The
> > attached source file, compiled with “gfortran -g” with the attached
> > patch, gives the following backtrace with libgfortran’s existing code,
> > which uses unwind and calls to addr2line:
> >
> >> #0  0x7F4F6E333467
> >> #1  0x7F4F6E334C42
> >> #2  0x7F4F6E409308
> >> #3  0x4008A3 in bar at a.f90:9
> >> #4  0x4008C8 in foo at a.f90:5
> >> #5  0x4008AF in test at a.f90:2

Yes, the current implementation cannot resolve addresses from dynamic libraries, since those are loaded at a random offset and addr2line looks at the binary on disk and not the process image in memory. A workaround is to compile with "-g -static".

> >
> >
> > with my patch using libbacktrace, I get:
> >
> >> 0x7f04f00f8c7d _gfortrani_show_backtrace
> >>      ../../../trunk/libgfortran/runtime/backtrace.c:112
> >> 0x7f04f00f9ac4 _gfortrani_sys_abort
> >>      ../../../trunk/libgfortran/runtime/error.c:176
> >> 0x7f04f01c8c78 _gfortran_abort
> >>      ../../../trunk/libgfortran/intrinsics/abort.c:33
> >> 0x4008a3 ???
> >>      /home/fx/gcc/irun/a.f90:9
> >> 0x4008c8 ???
> >>      /home/fx/gcc/irun/a.f90:5
> >> 0x4008af test
> >>      /home/fx/gcc/irun/a.f90:2
> >> 0x4008ff main
> >>      /home/fx/gcc/irun/a.f90:2
> >
> >
> > where the symbols for foo() and bar() are apparently not found, though
> > the source location is. Am I missing something here? I’m attaching the
> > output of “dwarfdump a.out” and the a.out executable file itself
> > (gzipped).
>
> I don't know why this is not working.  Everything looks fine in the
> a.out that you sent.  Unfortunately, I think you sent the one built
> without libbacktrace.  Can you send me the one built with libbacktrace?
> Thanks.
>
> Ian

You might also take a look at the patch posted to PR 54572 which was my attempt to use libbacktrace a few years ago. While I got symbolic backtraces working somewhat, unfortunately I never got it to work completely since it crashed somewhere in libbacktrace in some cases, but maybe whatever bugs caused that have been fixed in the meantime...

--
Janne Blomqvist
Re: Using libbacktrace in libgfortran: some questions
> You should always get accurate PC values even
> on systems where libbacktrace does not yet generate file/line
> information.

Cool! We’ll be able to use it unconditionally with all targets, which is very nice.

> I don't know why this is not working.  Everything looks fine in the
> a.out that you sent.  Unfortunately, I think you sent the one built
> without libbacktrace.  Can you send me the one built with libbacktrace?

Attached is the a.out with libgfortran (and thus libbacktrace) linked in statically. If this isn’t sufficient, I can send any file necessary (including the whole tree if need be).

Thanks for helping with this,
FX
Re: Using libbacktrace in libgfortran: some questions
> You might also take a look at the patch posted to PR 54572 which was my
> attempt to use libbacktrace a few years ago. While I got symbolic backtraces
> working somewhat, unfortunately I never got it to work completely since it
> crashed somewhere in libbacktrace in some cases, but maybe whatever bugs
> caused that have been fixed in the meantime…

I definitely did start from your patch at PR 54572!

libbacktrace has definitely improved, and now supports pecoff targets (i.e. Windows, I think). If we can get rid of the last few hurdles, then it will be a perfect solution for libgfortran, given that it is already used in the compiler itself (and thus well maintained).

Obviously, the major target for which support is missing is Darwin (Mach-O object files). I have looked at implementing it, but it is well beyond my simple understanding of object files’ inner workings :(

FX
Re: Using libbacktrace in libgfortran: some questions
On Thu, Aug 13, 2015 at 7:11 AM, FX wrote:
>
>> I don't know why this is not working.  Everything looks fine in the
>> a.out that you sent.  Unfortunately, I think you sent the one built
>> without libbacktrace.  Can you send me the one built with libbacktrace?
>
> Attached is the a.out with libgfortran (and thus libbacktrace) linked in
> statically. If this isn’t sufficient, I can send any file necessary
> (including the whole tree if need be).

Thanks. The problem seems to be that gfortran is generating DWARF info that looks like this:

    subprogram test
        subprogram foo
        subprogram bar

libbacktrace does not expect to see this structure, and it thinks that foo and bar have been inlined within test, which is not the case.

Please try this patch to libbacktrace and see if it helps.

Ian

diff -r 131549887d7c src/cmd/5g/cgen.c
--- a/src/cmd/5g/cgen.c Mon Apr 15 11:50:14 2013 -0700
+++ b/src/cmd/5g/cgen.c Tue Apr 23 16:45:14 2013 -0700
@@ -679,6 +679,19 @@
     case ODOT:
         agen(nl, res);
+        // explicit check for nil if struct is large enough
+        // that we might derive too big a pointer.
+        if(nl->type->width >= unmappedzero) {
+            regalloc(&n1, types[tptr], N);
+            gmove(res, &n1);
+            regalloc(&n2, types[TUINT8], &n1);
+            n1.op = OINDREG;
+            n1.type = types[TUINT8];
+            n1.xoffset = 0;
+            gmove(&n1, &n2);
+            regfree(&n1);
+            regfree(&n2);
+        }
         if(n->xoffset != 0) {
             nodconst(&n1, types[TINT32], n->xoffset);
             regalloc(&n2, n1.type, N);
@@ -694,20 +707,20 @@
     case ODOTPTR:
         cgen(nl, res);
+        // explicit check for nil if struct is large enough
+        // that we might derive too big a pointer.
+        if(nl->type->type->width >= unmappedzero) {
+            regalloc(&n1, types[tptr], N);
+            gmove(res, &n1);
+            regalloc(&n2, types[TUINT8], &n1);
+            n1.op = OINDREG;
+            n1.type = types[TUINT8];
+            n1.xoffset = 0;
+            gmove(&n1, &n2);
+            regfree(&n1);
+            regfree(&n2);
+        }
         if(n->xoffset != 0) {
-            // explicit check for nil if struct is large enough
-            // that we might derive too big a pointer.
-            if(nl->type->type->width >= unmappedzero) {
-                regalloc(&n1, types[tptr], N);
-                gmove(res, &n1);
-                regalloc(&n2, types[TUINT8], &n1);
-                n1.op = OINDREG;
-                n1.type = types[TUINT8];
-                n1.xoffset = 0;
-                gmove(&n1, &n2);
-                regfree(&n1);
-                regfree(&n2);
-            }
             nodconst(&n1, types[TINT32], n->xoffset);
             regalloc(&n2, n1.type, N);
             regalloc(&n3, types[tptr], N);
@@ -759,6 +772,19 @@
     case ODOT:
         igen(n->left, a, res);
+        // explicit check for nil if struct is large enough
+        // that we might derive too big a pointer.
+        if(0 && n->left->type->width >= unmappedzero) {
+            regalloc(&n1, types[tptr], N);
+            gmove(a, &n1);
+            regalloc(&n2, types[TUINT8], &n1);
+            n1.op = OINDREG;
+            n1.type = types[TUINT8];
+            n1.xoffset = 0;
+            gmove(&n1, &n2);
+            regfree(&n1);
+            regfree(&n2);
+        }
         a->xoffset += n->xoffset;
         a->type = n->type;
         return;
@@ -777,20 +803,18 @@
             regalloc(a, types[tptr], res);
             cgen(n->left, a);
         }
-        if(n->xoffset != 0) {
-            // explicit check for nil if struct is large enough
-            // that we might derive too big a pointer.
-            if(n->left->type->type->width >= unmappedzero) {
-                regalloc(&n1, types[tptr], N);
-                gmove(a, &n1);
-                regalloc(&n2, types[TUINT8], &n1);
-                n1.op = OINDREG;
-                n1.type = types[TUINT8];
-                n1.xoffset = 0;
-                gmove(&n1, &n2);
-                regfree(&n1);
-
Re: Using libbacktrace in libgfortran: some questions
On Thu, Aug 13, 2015 at 8:25 AM, Ian Lance Taylor wrote:
> On Thu, Aug 13, 2015 at 7:11 AM, FX wrote:
>>
>>> I don't know why this is not working.  Everything looks fine in the
>>> a.out that you sent.  Unfortunately, I think you sent the one built
>>> without libbacktrace.  Can you send me the one built with libbacktrace?
>>
>> Attached is the a.out with libgfortran (and thus libbacktrace) linked in
>> statically. If this isn’t sufficient, I can send any file necessary
>> (including the whole tree if need be).
>
> Thanks.  The problem seems to be that gfortran is generating DWARF
> info that looks like this:
>
>     subprogram test
>         subprogram foo
>         subprogram bar
>
> libbacktrace does not expect to see this structure, and it thinks that
> foo and bar have been inlined within test, which is not the case.
>
> Please try this patch to libbacktrace and see if it helps.

And yet, that patch has absolutely nothing to do with libbacktrace. Hmmm. Let's try this one.

Ian

Index: dwarf.c
===================================================================
--- dwarf.c (revision 226846)
+++ dwarf.c (working copy)
@@ -2250,7 +2250,8 @@ read_function_entry (struct backtrace_st
                      struct unit *u, uint64_t base, struct dwarf_buf *unit_buf,
                      const struct line_header *lhdr,
                      backtrace_error_callback error_callback, void *data,
-                     struct function_vector *vec)
+                     struct function_vector *vec_function,
+                     struct function_vector *vec_inlined)
 {
   while (unit_buf->left > 0)
     {
@@ -2258,6 +2259,7 @@ read_function_entry (struct backtrace_st
       const struct abbrev *abbrev;
       int is_function;
       struct function *function;
+      struct function_vector *vec;
       size_t i;
       uint64_t lowpc;
       int have_lowpc;
@@ -2279,6 +2281,11 @@ read_function_entry (struct backtrace_st
                      || abbrev->tag == DW_TAG_entry_point
                      || abbrev->tag == DW_TAG_inlined_subroutine);
 
+      if (abbrev->tag == DW_TAG_inlined_subroutine)
+        vec = vec_inlined;
+      else
+        vec = vec_function;
+
       function = NULL;
       if (is_function)
         {
@@ -2458,7 +2465,8 @@ read_function_entry (struct backtrace_st
           if (!is_function)
             {
               if (!read_function_entry (state, ddata, u, base, unit_buf, lhdr,
-                                        error_callback, data, vec))
+                                        error_callback, data, vec_function,
+                                        vec_inlined))
                 return 0;
             }
           else
@@ -2471,7 +2479,8 @@ read_function_entry (struct backtrace_st
               memset (&fvec, 0, sizeof fvec);
 
               if (!read_function_entry (state, ddata, u, base, unit_buf, lhdr,
-                                        error_callback, data, &fvec))
+                                        error_callback, data, vec_function,
+                                        &fvec))
                 return 0;
 
               if (fvec.count > 0)
@@ -2535,7 +2544,7 @@ read_function_info (struct backtrace_sta
   while (unit_buf.left > 0)
     {
       if (!read_function_entry (state, ddata, u, 0, &unit_buf, lhdr,
-                                error_callback, data, pfvec))
+                                error_callback, data, pfvec, pfvec))
         return;
     }
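As a rough illustration of what the dwarf.c patch in this message changes, the routing of nested entries can be modeled with a toy DIE tree. This is a sketch only, not libbacktrace code: `struct die`, `struct name_list`, `push_name` and `collect_functions` are hypothetical names, and the real reader works on abbreviations and PC ranges rather than names. The idea it mirrors is that a nested subprogram (as gfortran emits for foo and bar under test) is recorded in the function list, while only inlined subroutines are recorded under their parent.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins for DWARF tags. */
enum die_tag { TAG_SUBPROGRAM, TAG_INLINED_SUBROUTINE, TAG_LEXICAL_BLOCK };

struct die {
    enum die_tag tag;
    const char *name;
    const struct die *children;
    size_t nchildren;
};

/* Small fixed-capacity list of names, standing in for a function vector. */
struct name_list {
    const char *names[16];
    size_t count;
};

void push_name(struct name_list *list, const char *name)
{
    if (list->count < 16)
        list->names[list->count++] = name;
}

/* Walk the tree.  Subprograms always go to `functions`, even when nested
   inside another subprogram; inlined subroutines go to `inlined`, which
   for nested entries is a list private to the enclosing function. */
void collect_functions(const struct die *dies, size_t n,
                       struct name_list *functions, struct name_list *inlined)
{
    for (size_t i = 0; i < n; i++) {
        const struct die *d = &dies[i];
        if (d->tag == TAG_LEXICAL_BLOCK) {
            /* Not a function: pass both lists through unchanged. */
            collect_functions(d->children, d->nchildren, functions, inlined);
            continue;
        }
        push_name(d->tag == TAG_INLINED_SUBROUTINE ? inlined : functions,
                  d->name);
        /* Children of a function get a fresh inlined list (discarded in
           this toy model) but share the same function list. */
        struct name_list nested_inlined = { {0}, 0 };
        collect_functions(d->children, d->nchildren, functions,
                          &nested_inlined);
    }
}
```

With the tree from the message (test containing foo and bar as subprograms), all three names end up in the function list, which matches the symbol lookup that was previously failing for foo and bar.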
Re: Using libbacktrace in libgfortran: some questions
> And yet, that patch has absolutely nothing to do with libbacktrace.
> Hmmm.  Let's try this one.

Works perfectly with the patch:

Program aborted. Backtrace:
#0  0xf75e5b9b _gfortrani_show_backtrace
        ../../../../trunk/libgfortran/runtime/backtrace.c:113
#1  0xf75e6aa7 _gfortrani_sys_abort
        ../../../../trunk/libgfortran/runtime/error.c:176
#2  0xf769a7a7 _gfortran_abort
        ../../../../trunk/libgfortran/intrinsics/abort.c:33
#3  0x80486e4 bar
        /home/fx/gcc/irun/a.f90:9
#4  0x8048706 foo
        /home/fx/gcc/irun/a.f90:5
#5  0x80486f2 test
        /home/fx/gcc/irun/a.f90:2
#6  0x8048743 main
        /home/fx/gcc/irun/a.f90:2
Aborted (core dumped)

Thanks!
FX
Re: Finding insns to reorder using dataflow
On 08/13/2015 05:06 AM, Kyrill Tkachov wrote:
> Hi all,
>
> I'm implementing a target-specific reorg pass, and one thing that I want to do
> is for a given insn in the stream to find an instruction
> in the stream that I can swap it with, without violating any dataflow
> dependencies.
> The candidate instruction could be earlier or later in the stream.
>
> I'm stuck on finding an approach to do this. It seems that using some of the
> dataflow infrastructure is the right way to go, but I can't figure out the
> details.
> can_move_insns_across looks like relevant, but it looks too heavyweight with
> quite a lot of arguments.
>
> I suppose somehow constructing regions of interchangeable instructions would
> be the way to go, but I'm not sure how clean/cheap that would be outside the
> scheduler
>
> Any ideas would be appreciated.

I think you want all the dependency analysis done by the scheduler.

Which leads to the question, can you model what you're trying to do in the various scheduler hooks -- in particular walking through the ready list seems appropriate.

jeff
About loop unrolling and optimize for size
Hi

I'm using an ARM Thumb cross compiler for embedded systems and always optimize for small size with -Os. I've been experimenting with optimization flags and loop unrolling, though.

Normally loop unrolling is bad for size: code is duplicated and size increases. However, I discovered that in some special cases where the number of iterations is very small, e.g. a loop of 2-3 iterations, unrolling can make the code smaller, e.g. by freeing up the registers used for the loop index.

For example, when I use the flag "-fpeel-loops" together with -Os, I get smaller code for the ARM Thumb target in 99% of cases.

So my question is how unrolling works with -Os. Is it always totally disabled, or are there some cases where it could be tried, e.g. with a small number of iterations, so that the loop can be eliminated?

Could "-fpeel-loops", for example, be enabled by default for -Os? Right now I think it is only enabled for -O2 and above.

Thanks and Best Regards
Fredrik
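A small source-level illustration of the case described in this message, written by hand rather than taken from compiler output. The function names are hypothetical; the second function only shows what complete peeling of a three-iteration loop amounts to, where the index variable, the compare, and the branch all disappear. Whether GCC actually produces smaller code for a given loop depends on the version, target, and flags, so the sizes would need to be compared on the real toolchain.

```c
/* A loop with a small trip count known at compile time. */
int sum3_loop(const int *v)
{
    int sum = 0;
    for (int i = 0; i < 3; i++)
        sum += v[i];
    return sum;
}

/* Hand-written equivalent of completely peeling the loop above:
   no index register, no compare, no backward branch. */
int sum3_peeled(const int *v)
{
    return v[0] + v[1] + v[2];
}
```

Compiling both functions with -Os, with and without "-fpeel-loops", and comparing the object sizes is one way to check the effect on a particular target.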
Re: Using libbacktrace in libgfortran: some questions
On Thu, Aug 13, 2015 at 8:50 AM, FX wrote:
>> And yet, that patch has absolutely nothing to do with libbacktrace.
>> Hmmm.  Let's try this one.
>
> Works perfectly with the patch:

Patch tested and committed with this ChangeLog entry.

2015-08-13  Ian Lance Taylor

        * dwarf.c (read_function_entry): Add vec_inlined parameter.
        Change all callers.

Index: dwarf.c
===================================================================
--- dwarf.c (revision 226846)
+++ dwarf.c (working copy)
@@ -2250,7 +2250,8 @@ read_function_entry (struct backtrace_st
                      struct unit *u, uint64_t base, struct dwarf_buf *unit_buf,
                      const struct line_header *lhdr,
                      backtrace_error_callback error_callback, void *data,
-                     struct function_vector *vec)
+                     struct function_vector *vec_function,
+                     struct function_vector *vec_inlined)
 {
   while (unit_buf->left > 0)
     {
@@ -2258,6 +2259,7 @@ read_function_entry (struct backtrace_st
       const struct abbrev *abbrev;
       int is_function;
       struct function *function;
+      struct function_vector *vec;
       size_t i;
       uint64_t lowpc;
       int have_lowpc;
@@ -2279,6 +2281,11 @@ read_function_entry (struct backtrace_st
                      || abbrev->tag == DW_TAG_entry_point
                      || abbrev->tag == DW_TAG_inlined_subroutine);
 
+      if (abbrev->tag == DW_TAG_inlined_subroutine)
+        vec = vec_inlined;
+      else
+        vec = vec_function;
+
       function = NULL;
       if (is_function)
         {
@@ -2458,7 +2465,8 @@ read_function_entry (struct backtrace_st
           if (!is_function)
             {
               if (!read_function_entry (state, ddata, u, base, unit_buf, lhdr,
-                                        error_callback, data, vec))
+                                        error_callback, data, vec_function,
+                                        vec_inlined))
                 return 0;
             }
           else
@@ -2471,7 +2479,8 @@ read_function_entry (struct backtrace_st
               memset (&fvec, 0, sizeof fvec);
 
               if (!read_function_entry (state, ddata, u, base, unit_buf, lhdr,
-                                        error_callback, data, &fvec))
+                                        error_callback, data, vec_function,
+                                        &fvec))
                 return 0;
 
               if (fvec.count > 0)
@@ -2535,7 +2544,7 @@ read_function_info (struct backtrace_sta
   while (unit_buf.left > 0)
     {
       if (!read_function_entry (state, ddata, u, 0, &unit_buf, lhdr,
-                                error_callback, data, pfvec))
+                                error_callback, data, pfvec, pfvec))
         return;
     }
[powerpc64le] seq_cst memory order possibly not honored
Hi,

I'm having a problem with one of the Boost.Atomic tests on a PowerPC64 LE test platform. The test runs two threads which loop over code like this:

        Thread 1                      Thread 2

        [initially a == 0 && b == 0]

        a.store(1, seq_cst);          b.store(1, seq_cst);
        a.load(relaxed);              b.load(relaxed);
        x = b.load(relaxed);          y = a.load(relaxed);

On each iteration the test verifies that !(x == 0 && y == 0), and this check fails. As far as I can tell the test is valid, and it indeed passes on x86. Boost.Atomic uses the __atomic* intrinsics in both cases, so from the C++ perspective the implementation is the same.

I don't have access to the tester that fails, so I can't tell the exact GCC version used there, only that it is labeled as 6.0. I did some experimenting with version 4.9.2, which I have locally, and found that for code like this:

        __atomic_store_n(&n, 1, __ATOMIC_SEQ_CST);

gcc generates this assembly:

        1c:   01 00 40 39     li      r10,1
        20:   ac 04 00 7c     sync
        24:   00 00 49 91     stw     r10,0(r9)

I would expect a second 'sync' right after the store for seq_cst, but it is absent. If 6.0 generates similar code then this might explain the test failures I'm seeing.

So my questions are:

1. Is my test valid or is there a flaw that I'm missing?
2. Am I correct about the missing 'sync' instruction?

Thanks.

PS: If you're interested, here's the full code snippet I compiled:

        int n = 0;

        int main()
        {
            __atomic_store_n(&n, 1, __ATOMIC_SEQ_CST);
            return n;
        }

The command line was:

        powerpc64le-linux-gnu-g++ -g -O0 -o seq_cst_ppc64el.o -c ./seq_cst.cpp

And here's the link to the original Boost.Atomic test:

https://github.com/boostorg/atomic/blob/develop/test/ordering.cpp
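For readers who want to experiment, the two thread bodies from this message can be written directly with the GCC __atomic builtins that the message says Boost.Atomic uses. This is only a sketch: the function names and global variable layout are hypothetical, thread creation and the iteration loop are left out, and the memory orders simply mirror the message. One point worth keeping in mind when evaluating the test: in the C11/C++11 memory model the single total order is defined over seq_cst operations only, so the memory order chosen for the loads is relevant to whether x == 0 && y == 0 is actually ruled out.

```c
/* Shared flags, initially zero, as in the test description. */
int a = 0, b = 0;

/* Values observed by each thread; -1 means "not yet run". */
int x = -1, y = -1;

/* Body of thread 1: store to a, then read a and b. */
void thread1_body(void)
{
    __atomic_store_n(&a, 1, __ATOMIC_SEQ_CST);
    (void)__atomic_load_n(&a, __ATOMIC_RELAXED);
    x = __atomic_load_n(&b, __ATOMIC_RELAXED);
}

/* Body of thread 2: store to b, then read b and a. */
void thread2_body(void)
{
    __atomic_store_n(&b, 1, __ATOMIC_SEQ_CST);
    (void)__atomic_load_n(&b, __ATOMIC_RELAXED);
    y = __atomic_load_n(&a, __ATOMIC_RELAXED);
}

/* The outcome the test treats as a failure. */
int outcome_is_failure(void)
{
    return x == 0 && y == 0;
}
```

Running the two bodies on separate threads in a loop, and then repeating the experiment with __ATOMIC_SEQ_CST on the final loads, would show whether the observed failures depend on the load ordering.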
RE: More of a Loop distribution.
-----Original Message-----
From: Richard Biener [mailto:richard.guent...@gmail.com]
Sent: Thursday, August 13, 2015 3:23 PM
To: Ajit Kumar Agarwal
Cc: Jeff Law; gcc@gcc.gnu.org; Vinod Kathail; Shail Aditya Gupta; Vidhumouli Hunsigida; Nagaraju Mekala
Subject: Re: More of a Loop distribution.

On Thu, Aug 13, 2015 at 9:37 AM, Ajit Kumar Agarwal wrote:
> All:
>
> Loop distribution considers DDG to decide on distributing the Loops.
> The Loops with control statements like IF-THEN-ELSE can also be
> Distributed. Instead of Data Dependency Graph, the Control Dependence Graph
> should be considered in order to distribute the loops In presence of control
> Statements.
>
> Also the presence of multiple exits in the Loop can also be considered for
> Loop distribution transformation.
>
> Thus the above transformation helps in the Libquantum benchmarks for SPEC
> 2006.
>
> There are following articles that looks interesting to me.
>
> "Loop Distribution in presence of arbitrarily control flow Ken Kennedy et.al."
> "Loop Distribution in presence of Multiple Exits Bor-Ming Hsieh etal."
>
>
> I don't think the loop distribution in presence of control flow is
> implemented in GCC/LLVM.
>
> I think it is feasible to consider the above for the implementation in GCC.

>>It's true that loop distribution does not try to distribute based on any
>>control structure heuristics but it only considers data locality. It does
>>however already compute the control dependence graph (and uses it to add
>>control edges to the DDG to properly add data dependence edges to uses of
>>control statements necessary in partitions).

>>So it should be a matter of specifying the proper set of starting statements
>>it tries separating.

Thanks.

>>Not sure which kind of distribution you are after, can you give an example?

I would like to distribute a loop that has control flow. For example:

for (i = 2; i < N; i++)
{
  if (condition)
  {
    S1: A[i] = ...
    S2: D[i] = A[i-1] ...
  }
}

The above loop can be distributed into two loops, one with S1 inside the IF and another with S2 inside the IF. Two scenarios are possible.

1. The condition inside the IF checks A[i] and so is dependent on S1. In this case the distribution is difficult. The above article from Ken Kennedy et al. stores the partial results of the comparison in a temporary array, and that array is then tested in the IF condition. This makes the loop distributable. This is what I was looking for, and I found it in the above article.

2. The condition inside the IF in the above loop is not dependent on S1 or S2, and in this case the loop can be distributed.

In both of the above scenarios GCC can't distribute the loop, as the control dependence graph (control structure) is not used. The advantage of the above loop distribution is that it makes the loop vectorizable, which is otherwise not possible due to the presence of multiple statements inside the IF; the loop also may not be if-converted due to the presence of multiple statements. If we distribute the loop in the above two scenarios, the individual loops of the distributed loop can be vectorized, which is otherwise not possible.

Thanks & Regards
Ajit

Richard.

> Thanks & Regards
> Ajit
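The first scenario in this message (the IF condition reads A[i], which S1 then overwrites) can be sketched in C as follows. This is an illustration of the temporary-array idea the message attributes to the Kennedy et al. article, not code from GCC or from the article: the concrete condition `A[i] > 0`, the right-hand sides of S1 and S2, and the names `loop_fused`, `loop_distributed` and `guard` are all assumptions chosen to make the example runnable.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Original form: one loop, both statements under the same IF.
   The condition reads A[i], which S1 then overwrites. */
void loop_fused(int *A, int *D, int n)
{
    for (int i = 2; i < n; i++) {
        if (A[i] > 0) {
            A[i] = A[i] - 10;      /* S1 */
            D[i] = A[i - 1] + 1;   /* S2: reads the A written one iteration earlier */
        }
    }
}

/* Distributed form: the outcome of the condition is saved in a temporary
   array so the second loop tests the same predicate values the original
   loop saw, even though S1 has since modified A. */
void loop_distributed(int *A, int *D, int n)
{
    if (n <= 2)
        return;
    bool *guard = malloc((size_t)n * sizeof *guard);
    if (guard == NULL) {           /* fall back to the original loop */
        loop_fused(A, D, n);
        return;
    }
    for (int i = 2; i < n; i++) {  /* loop 1: predicate and S1 */
        guard[i] = A[i] > 0;
        if (guard[i])
            A[i] = A[i] - 10;
    }
    for (int i = 2; i < n; i++) {  /* loop 2: S2 under the saved predicate */
        if (guard[i])
            D[i] = A[i - 1] + 1;
    }
    free(guard);
}
```

In the second scenario, where the condition does not depend on S1 or S2, the temporary array is unnecessary and each distributed loop can simply re-evaluate the condition. Each resulting loop then has a single guarded statement.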
RE: More of a Loop distribution.
On August 14, 2015 4:59:07 AM GMT+02:00, Ajit Kumar Agarwal wrote:
>
>
>-----Original Message-----
>From: Richard Biener [mailto:richard.guent...@gmail.com]
>Sent: Thursday, August 13, 2015 3:23 PM
>To: Ajit Kumar Agarwal
>Cc: Jeff Law; gcc@gcc.gnu.org; Vinod Kathail; Shail Aditya Gupta;
>Vidhumouli Hunsigida; Nagaraju Mekala
>Subject: Re: More of a Loop distribution.
>
>On Thu, Aug 13, 2015 at 9:37 AM, Ajit Kumar Agarwal
> wrote:
>> All:
>>
>> Loop distribution considers DDG to decide on distributing the Loops.
>> The Loops with control statements like IF-THEN-ELSE can also be
>> Distributed. Instead of Data Dependency Graph, the Control Dependence
>Graph should be considered in order to distribute the loops In presence
>of control Statements.
>>
>> Also the presence of multiple exits in the Loop can also be
>considered for Loop distribution transformation.
>>
>> Thus the above transformation helps in the Libquantum benchmarks for
>SPEC 2006.
>>
>> There are following articles that looks interesting to me.
>>
>> "Loop Distribution in presence of arbitrarily control flow Ken
>Kennedy et.al."
>> "Loop Distribution in presence of Multiple Exits Bor-Ming Hsieh
>etal."
>>
>>
>> I don't think the loop distribution in presence of control flow is
>implemented in GCC/LLVM.
>>
>> I think it is feasible to consider the above for the implementation
>in GCC.
>
>>>It's true that loop distribution does not try to distribute based on
>any control structure heuristics but it only considers data locality.
>It does however already >>compute the control dependence graph (and
>uses it to add control edges to the DDG to properly add data dependence
>edges to uses of control statements >>necessary in partitions).
>
>>>So it should be a matter of specifying the proper set of starting
>statements it tries separating.
>
>Thanks.
>
>>>Not sure which kind of distribution you are after, can you give an
>example?
>
>I would like to have a distribution of the loop having control flow.
>For example
>
>For (I = 2 ; I < N; i++)
>{
>If (condition)
> {
> S1: A[i] = ...
> S2:D[i] = A[i-1]...
> }
>}
>
>The above loop can be distributed with two loops having one loop with
>S1 inside IF and another loop with S2 with the IF.
>The two scenario can be true.
>
>1. The condition inside IF have a check on A[i] and is dependent on S1.
>In this case the distribution is difficult. And the above article
>From Ken Kennedy et.al does store the partial results of comparison in
>an temporary array and that array is compared inside the IF
>Condition. This makes the loop distributed. This is what I was looking
>for which I found in the above article.
>
>2. The condition inside the IF in the above loop is not dependent on
>the S1 and S2 , and this case the loop can be distributed.
>
>In the above two scenario the GCC can't distribute the loop, as the
>control dependency graph ( control structure ) is not used.

The above loop can be distributed by gcc just fine.

Richard.

> The advantage of
>The above loop distribution makes the loop vectorizable which otherwise
>not possible due to presence of multiple statements inside the IF and
>Also may not be IF-converted due to presence of multiple statements. If
>we distribute the loop for the above two scenarios the individual loops
>
>in the distributed loop can be vectorized which is otherwise not
>possible.
>
>Thanks & Regards
>Ajit
>
>
>Richard.
>
>> Thanks & Regards
>> Ajit
RE: More of a Loop distribution.
-----Original Message-----
From: Richard Biener [mailto:richard.guent...@gmail.com]
Sent: Friday, August 14, 2015 11:30 AM
To: Ajit Kumar Agarwal
Cc: Jeff Law; gcc@gcc.gnu.org; Vinod Kathail; Shail Aditya Gupta; Vidhumouli Hunsigida; Nagaraju Mekala
Subject: RE: More of a Loop distribution.

On August 14, 2015 4:59:07 AM GMT+02:00, Ajit Kumar Agarwal wrote:
>
>
>-----Original Message-----
>From: Richard Biener [mailto:richard.guent...@gmail.com]
>Sent: Thursday, August 13, 2015 3:23 PM
>To: Ajit Kumar Agarwal
>Cc: Jeff Law; gcc@gcc.gnu.org; Vinod Kathail; Shail Aditya Gupta;
>Vidhumouli Hunsigida; Nagaraju Mekala
>Subject: Re: More of a Loop distribution.
>
>On Thu, Aug 13, 2015 at 9:37 AM, Ajit Kumar Agarwal
> wrote:
>> All:
>>
>> Loop distribution considers DDG to decide on distributing the Loops.
>> The Loops with control statements like IF-THEN-ELSE can also be
>> Distributed. Instead of Data Dependency Graph, the Control Dependence
>Graph should be considered in order to distribute the loops In presence
>of control Statements.
>>
>> Also the presence of multiple exits in the Loop can also be
>considered for Loop distribution transformation.
>>
>> Thus the above transformation helps in the Libquantum benchmarks for
>SPEC 2006.
>>
>> There are following articles that looks interesting to me.
>>
>> "Loop Distribution in presence of arbitrarily control flow Ken
>Kennedy et.al."
>> "Loop Distribution in presence of Multiple Exits Bor-Ming Hsieh
>etal."
>>
>>
>> I don't think the loop distribution in presence of control flow is
>implemented in GCC/LLVM.
>>
>> I think it is feasible to consider the above for the implementation
>in GCC.
>
>>>It's true that loop distribution does not try to distribute based on
>any control structure heuristics but it only considers data locality.
>It does however already >>compute the control dependence graph (and
>uses it to add control edges to the DDG to properly add data dependence
>edges to uses of control statements >>necessary in partitions).
>
>>>So it should be a matter of specifying the proper set of starting
>statements it tries separating.
>
>Thanks.
>
>>>Not sure which kind of distribution you are after, can you give an
>example?
>
>I would like to have a distribution of the loop having control flow.
>For example
>
>For (I = 2 ; I < N; i++)
>{
>If (condition)
> {
> S1: A[i] = ...
> S2:D[i] = A[i-1]...
> }
>}
>
>The above loop can be distributed with two loops having one loop with
>S1 inside IF and another loop with S2 with the IF.
>The two scenario can be true.
>
>1. The condition inside IF have a check on A[i] and is dependent on S1.
>In this case the distribution is difficult. And the above article From
>Ken Kennedy et.al does store the partial results of comparison in an
>temporary array and that array is compared inside the IF Condition.
>This makes the loop distributed. This is what I was looking for which I
>found in the above article.
>
>2. The condition inside the IF in the above loop is not dependent on
>the S1 and S2 , and this case the loop can be distributed.
>
>In the above two scenario the GCC can't distribute the loop, as the
>control dependency graph ( control structure ) is not used.

>>The above loop can be distributed by gcc just fine.

Does the existing loop distribution implementation in GCC distribute the above loop in both of the above scenarios?

Thanks & Regards
Ajit

Richard.

> The advantage of
>The above loop distribution makes the loop vectorizable which otherwise
>not possible due to presence of multiple statements inside the IF and
>Also may not be IF-converted due to presence of multiple statements. If
>we distribute the loop for the above two scenarios the individual loops
>
>in the distributed loop can be vectorized which is otherwise not
>possible.
>
>Thanks & Regards
>Ajit
>
>
>Richard.
>
>> Thanks & Regards
>> Ajit