More of a Loop distribution.

2015-08-13 Thread Ajit Kumar Agarwal
All:

Loop distribution uses the Data Dependence Graph (DDG) to decide how to 
distribute loops. Loops containing control statements such as IF-THEN-ELSE 
can also be distributed. Instead of the Data Dependence Graph, the Control 
Dependence Graph should be considered in order to distribute loops in the 
presence of control statements.

Loops with multiple exits can also be considered for the loop distribution 
transformation.

The above transformation helps the libquantum benchmark in SPEC 2006.

The following articles look interesting to me:

"Loop Distribution in presence of arbitrary control flow", Ken Kennedy et al.
"Loop Distribution in presence of Multiple Exits", Bor-Ming Hsieh et al.


I don't think the loop distribution in presence of control flow is implemented 
in GCC/LLVM.

I think it is feasible to consider the above for the implementation in GCC.

Thanks & Regards
Ajit


Re: More of a Loop distribution.

2015-08-13 Thread Richard Biener
On Thu, Aug 13, 2015 at 9:37 AM, Ajit Kumar Agarwal
 wrote:
> All:
>
> Loop distribution considers DDG to decide on distributing the Loops. The 
> Loops with control statements like IF-THEN-ELSE can also be
> Distributed. Instead of Data Dependency Graph, the Control Dependence Graph 
> should be considered in order to distribute the loops
> In presence of control Statements.
>
> Also the presence of multiple exits in the Loop can also be considered for 
> Loop distribution transformation.
>
> Thus the above transformation helps in the Libquantum benchmarks for SPEC 
> 2006.
>
> There are following articles that looks interesting to me.
>
> "Loop Distribution in presence of arbitrarily control flow Ken Kennedy et.al."
> "Loop Distribution in presence of Multiple Exits Bor-Ming Hsieh etal."
>
>
> I don't think the loop distribution in presence of control flow is 
> implemented in GCC/LLVM.
>
> I think it is feasible to consider the above for the implementation in GCC.

It's true that loop distribution does not try to distribute based on any
control structure heuristics but it only considers data locality.  It does
however already compute the control dependence graph (and uses it to add
control edges to the DDG to properly add data dependence edges to uses of
control statements necessary in partitions).

So it should be a matter of specifying the proper set of starting
statements it tries separating.

Not sure which kind of distribution you are after, can you give an example?

Richard.

> Thanks & Regards
> Ajit


Re: Results from SPEC2006 FP analysis done at Richard`s request {late July / early August}

2015-08-13 Thread Richard Biener
On Thu, Aug 13, 2015 at 3:32 AM, Abe  wrote:
> Dear all,
>
> Overall, I think the WIP new if converter is holding up
> relatively well, but there is obviously opportunity to do better,
> at least if the numbers mean what they look like they mean,
> i.e. the old converter`s code was fully OK and so is the new one`s.
> By "fully OK" I mean e.g. no crashing bugs were introduced by the
> conversion.
>
>
> In the following, all the integers over 1000 are loops-vectorized counts.
>
>
> "base": baseline compiler source code
> Git hash: cb791e75379bc0c8b10bd13bcb24305c36fd504b
> commit date: July 10 2015
> committer: Richard
>
> "new": base + patches for new [GIMPLE-level] if converter
>
>
>
> -O3
> ===
>
> no special flags
> 
> base: 5951
> new:  5956
>
> with only "-ftree-loop-if-convert" added

That is -ftree-loop-if-convert-stores?

> 
> base: 5954
> new:  5956
>
> with both if-conversion flags added

What's the other if-conversion flag?  I suppose _this_ is
-ftree-loop-if-convert-stores?  That would match the numbers
above which are mostly identical because -O3/-Ofast already
enable -ftree-loop-if-convert by means of enabling vectorization.

> ---
> base: 5970
> new:  5956
>
>
>
> -Ofast
> ==
>
> no special flags
> 
> base: 7393
> new:  7401
>
> with only "-ftree-loop-if-convert" added
> 
> base: 7393
> new:  7401
>
> with both if-conversion flags added
> ---
> base: 7421
> new:  7401
>

Can you please post individual benchmark numbers instead of just the
overall score?

From the numbers above I can see the new if-converter removes any improvement
we get from -ftree-loop-if-convert-stores (as expected - it's not a
vectorization enabler with the new scheme).

Thanks,
Richard.

>
> I have a spreadsheet [and a PDF generated therefrom] that shows the above in
> a
> more visual format.  Please feel free to ask for the PDF as an email
> attachment.
>
>
> Regards,
>
> Abe
>
>


Finding insns to reorder using dataflow

2015-08-13 Thread Kyrill Tkachov

Hi all,

I'm implementing a target-specific reorg pass, and one thing that I want to do
is, for a given insn in the stream, to find an instruction in the stream that
I can swap it with without violating any dataflow dependencies.
The candidate instruction could be earlier or later in the stream.
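
To make the question concrete, here is a minimal toy model of the check being
asked about. It is only a sketch under stated assumptions: it does not use
GCC's RTL or df infrastructure, registers are modelled as bits in a mask,
memory and control flow are ignored, and every name in it (toy_insn,
toy_independent, toy_can_swap, toy_stream) is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy instruction: bit k of defs/uses stands for hypothetical register k. */
struct toy_insn {
  uint32_t defs;  /* registers written */
  uint32_t uses;  /* registers read */
};

/* True if a and b have no RAW, WAR or WAW dependence between them. */
bool toy_independent(const struct toy_insn *a, const struct toy_insn *b)
{
  return (a->defs & (b->defs | b->uses)) == 0 && (b->defs & a->uses) == 0;
}

/* Conservative test: swapping insns i and j is accepted only if the earlier
   one is independent of everything up to and including the later one, and
   the later one is independent of everything from the earlier one on.  */
bool toy_can_swap(const struct toy_insn *insns, size_t i, size_t j)
{
  assert(insns != NULL);
  if (i > j) { size_t t = i; i = j; j = t; }
  for (size_t k = i + 1; k <= j; k++)
    if (!toy_independent(&insns[i], &insns[k]))
      return false;
  for (size_t k = i; k < j; k++)
    if (!toy_independent(&insns[j], &insns[k]))
      return false;
  return true;
}

/* Example stream:
   0: r0 = ...         defs r0
   1: r1 = r0 + 1      defs r1, uses r0
   2: r2 = ...         defs r2
   3: r3 = r1 + r2     defs r3, uses r1 and r2  */
const struct toy_insn toy_stream[] = {
  { 1u << 0, 0 },
  { 1u << 1, 1u << 0 },
  { 1u << 2, 0 },
  { 1u << 3, (1u << 1) | (1u << 2) },
};
```

In this stream only insns 1 and 2 can trade places; any swap involving insn 0
or insn 3 breaks a def-use chain. A real pass would also have to account for
memory references, condition codes and basic block boundaries.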

I'm stuck on finding an approach to do this. It seems that using some of the 
dataflow infrastructure is the right way to go, but I can't figure out the 
details. can_move_insns_across looks relevant, but it looks too heavyweight, 
with quite a lot of arguments.

I suppose somehow constructing regions of interchangeable instructions would be 
the way to go, but I'm not sure how clean/cheap that would be outside the 
scheduler.

Any ideas would be appreciated.

Thanks,
Kyrill



Re: Using libbacktrace in libgfortran: some questions

2015-08-13 Thread Ian Lance Taylor
FX  writes:

> 1. It appears that even on platforms with BACKTRACE_SUPPORTED == 0
> (such as x86_64-apple-darwin), libbacktrace is built and able to
> perform a nonsymbolic backtrace (which appears accurate). Is that a
> feature? Can I rely on it?

Yes, that is a feature.  You should always get accurate PC values even
on systems where libbacktrace does not yet generate file/line
information.


> 2. The backtraces I get on x86_64-linux-gnu are missing symbols. The
> attached source file, compiled with “gfortran -g” with the attached
> patch, gives the following backtrace with libgfortran’s existing code,
> which uses unwind and calls to addr2line:
>
>> #0  0x7F4F6E333467
>> #1  0x7F4F6E334C42
>> #2  0x7F4F6E409308
>> #3  0x4008A3 in bar at a.f90:9
>> #4  0x4008C8 in foo at a.f90:5
>> #5  0x4008AF in test at a.f90:2
>
>
> with my patch using libbacktrace, I get:
>
>> 0x7f04f00f8c7d _gfortrani_show_backtrace
>>  ../../../trunk/libgfortran/runtime/backtrace.c:112
>> 0x7f04f00f9ac4 _gfortrani_sys_abort
>>  ../../../trunk/libgfortran/runtime/error.c:176
>> 0x7f04f01c8c78 _gfortran_abort
>>  ../../../trunk/libgfortran/intrinsics/abort.c:33
>> 0x4008a3 ???
>>  /home/fx/gcc/irun/a.f90:9
>> 0x4008c8 ???
>>  /home/fx/gcc/irun/a.f90:5
>> 0x4008af test
>>  /home/fx/gcc/irun/a.f90:2
>> 0x4008ff main
>>  /home/fx/gcc/irun/a.f90:2
>
>
> where the symbols for foo() and bar() are apparently not found, though
> the source location is. Am I missing something here? I’m attaching the
> output of “dwarfdump a.out” and the a.out executable file itself
> (gzipped).

I don't know why this is not working.  Everything looks fine in the
a.out that you sent.  Unfortunately, I think you sent the one built
without libbacktrace.  Can you send me the one built with libbacktrace?
Thanks.

Ian


Re: Using libbacktrace in libgfortran: some questions

2015-08-13 Thread Janne Blomqvist
Resending in text/plain, sorry for any extra spam...

On Thu, Aug 13, 2015 at 4:44 PM, Ian Lance Taylor  wrote:
>
> FX  writes:
>
> > 1. It appears that even on platforms with BACKTRACE_SUPPORTED == 0
> > (such as x86_64-apple-darwin), libbacktrace is built and able to
> > perform a nonsymbolic backtrace (which appears accurate). Is that a
> > feature? Can I rely on it?
>
> Yes, that is a feature.  You should always get accurate PC values even
> on systems where libbacktrace does not yet generate file/line
> information.
>
>
> > 2. The backtraces I get on x86_64-linux-gnu are missing symbols. The
> > attached source file, compiled with “gfortran -g” with the attached
> > patch, gives the following backtrace with libgfortran’s existing code,
> > which uses unwind and calls to addr2line:
> >
> >> #0  0x7F4F6E333467
> >> #1  0x7F4F6E334C42
> >> #2  0x7F4F6E409308
> >> #3  0x4008A3 in bar at a.f90:9
> >> #4  0x4008C8 in foo at a.f90:5
> >> #5  0x4008AF in test at a.f90:2

Yes, the current implementation cannot resolve addresses from dynamic
libraries, since those are loaded at a random offset and addr2line
looks at the binary on disk and not the process image in memory. A
workaround is to compile with "-g -static".

> >
> >
> > with my patch using libbacktrace, I get:
> >
> >> 0x7f04f00f8c7d _gfortrani_show_backtrace
> >>  ../../../trunk/libgfortran/runtime/backtrace.c:112
> >> 0x7f04f00f9ac4 _gfortrani_sys_abort
> >>  ../../../trunk/libgfortran/runtime/error.c:176
> >> 0x7f04f01c8c78 _gfortran_abort
> >>  ../../../trunk/libgfortran/intrinsics/abort.c:33
> >> 0x4008a3 ???
> >>  /home/fx/gcc/irun/a.f90:9
> >> 0x4008c8 ???
> >>  /home/fx/gcc/irun/a.f90:5
> >> 0x4008af test
> >>  /home/fx/gcc/irun/a.f90:2
> >> 0x4008ff main
> >>  /home/fx/gcc/irun/a.f90:2
> >
> >
> > where the symbols for foo() and bar() are apparently not found, though
> > the source location is. Am I missing something here? I’m attaching the
> > output of “dwarfdump a.out” and the a.out executable file itself
> > (gzipped).
>
> I don't know why this is not working.  Everything looks fine in the
> a.out that you sent.  Unfortunately, I think you sent the one built
> without libbacktrace.  Can you send me the one built with libbacktrace?
> Thanks.
>
> Ian

You might also take a look at the patch posted to PR 54572 which was
my attempt to use libbacktrace a few years ago. While I got symbolic
backtraces working somewhat, unfortunately I never got it to work
completely since it crashed somewhere in libbacktrace in some cases,
but maybe whatever bugs caused that have been fixed in the meantime...


-- 
Janne Blomqvist


Re: Using libbacktrace in libgfortran: some questions

2015-08-13 Thread FX
> You should always get accurate PC values even
> on systems where libbacktrace does not yet generate file/line
> information.

Cool! We’ll be able to use it unconditionally with all targets, which is very 
nice.


> I don't know why this is not working.  Everything looks fine in the
> a.out that you sent.  Unfortunately, I think you sent the one built
> without libbacktrace.  Can you send me the one built with libbacktrace?

Attached is the a.out with libgfortran (and thus libbacktrace) linked in 
statically. If this isn’t sufficient, I can send any file necessary (including 
the whole tree if need be).

Thanks for helping with this,
FX


Re: Using libbacktrace in libgfortran: some questions

2015-08-13 Thread FX
> You might also take a look at the patch posted to PR 54572 which was my 
> attempt to use libbacktrace a few years ago. While I got symbolic backtraces 
> working somewhat, unfortunately I never got it to work completely since it 
> crashed somewhere in libbacktrace in some cases, but maybe whatever bugs 
> caused that have been fixed in the meantime…

I definitely did start from your patch at PR 54572!

libbacktrace definitely has improved, and now supports pecoff targets (i.e. 
Windows, I think). If we can get rid of the last few hurdles, then it will be a 
perfect solution for libgfortran, given it is already used in the compiler 
itself (and thus well-maintained).

Obviously, the major target for which support is missing is Darwin (Mach-O 
object files). I have looked at implementing it, but it is well beyond my 
simple understanding of object files’ inner workings :(

FX

Re: Using libbacktrace in libgfortran: some questions

2015-08-13 Thread Ian Lance Taylor
On Thu, Aug 13, 2015 at 7:11 AM, FX  wrote:
>
>> I don't know why this is not working.  Everything looks fine in the
>> a.out that you sent.  Unfortunately, I think you sent the one built
>> without libbacktrace.  Can you send me the one built with libbacktrace?
>
> Attached is the a.out with libgfortran (and thus libbacktrace) linked in 
> statically. If this isn’t sufficient, I can send any file necessary 
> (including the whole tree if need be).

Thanks.  The problem seems to be that gfortran is generating DWARF
info that looks like this:

subprogram test
  subprogram foo
  subprogram bar

libbacktrace does not expect to see this structure, and it thinks that
foo and bar have been inlined within test, which is not the case.
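
As an illustration of the distinction, here is a toy model of a walk over a
DIE tree that keeps nested subprograms in the list of real functions and
sends only inlined subroutines to the enclosing function's inlined list. It
is a sketch, not libbacktrace code: the types and helpers (toy_die,
toy_vector, toy_collect and so on) are hypothetical, and only the
vec_function / vec_inlined naming is borrowed from the dwarf.c patch posted
later in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-ins for DWARF tags; not the real DW_TAG_* values. */
enum toy_tag { TOY_SUBPROGRAM, TOY_INLINED_SUBROUTINE, TOY_LEXICAL_BLOCK };

struct toy_die {
  enum toy_tag tag;
  const char *name;
  const struct toy_die *children;
  size_t nchildren;
};

#define TOY_MAX 16
struct toy_vector {
  const char *names[TOY_MAX];
  size_t count;
};

static void toy_push(struct toy_vector *v, const char *name)
{
  assert(v->count < TOY_MAX);
  v->names[v->count++] = name;
}

int toy_contains(const struct toy_vector *v, const char *name)
{
  for (size_t i = 0; i < v->count; i++)
    if (strcmp(v->names[i], name) == 0)
      return 1;
  return 0;
}

/* Real functions always land in vec_function, even when nested inside
   another subprogram; only inlined calls go to vec_inlined.  */
void toy_collect(const struct toy_die *dies, size_t n,
                 struct toy_vector *vec_function,
                 struct toy_vector *vec_inlined)
{
  for (size_t i = 0; i < n; i++) {
    const struct toy_die *d = &dies[i];
    if (d->tag == TOY_LEXICAL_BLOCK) {
      /* Not a function: look through it with the same two lists. */
      toy_collect(d->children, d->nchildren, vec_function, vec_inlined);
      continue;
    }
    toy_push(d->tag == TOY_INLINED_SUBROUTINE ? vec_inlined : vec_function,
             d->name);
    /* Children of a function get a fresh inlined list of their own. */
    struct toy_vector inlined_here = { { 0 }, 0 };
    toy_collect(d->children, d->nchildren, vec_function, &inlined_here);
  }
}

/* The shape described in the message: test contains foo and bar. */
static const struct toy_die toy_nested[] = {
  { TOY_SUBPROGRAM, "foo", NULL, 0 },
  { TOY_SUBPROGRAM, "bar", NULL, 0 },
};
static const struct toy_die toy_top[] = {
  { TOY_SUBPROGRAM, "test", toy_nested, 2 },
};

struct toy_vector toy_collect_example(void)
{
  struct toy_vector functions = { { 0 }, 0 };
  toy_collect(toy_top, 1, &functions, &functions);
  return functions;
}
```

With this walk, toy_collect_example() reports test, foo and bar as three
separate functions, so a PC inside foo or bar would resolve to its own name
rather than being treated as code inlined into test.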

Please try this patch to libbacktrace and see if it helps.

Ian
diff -r 131549887d7c src/cmd/5g/cgen.c
--- a/src/cmd/5g/cgen.c Mon Apr 15 11:50:14 2013 -0700
+++ b/src/cmd/5g/cgen.c Tue Apr 23 16:45:14 2013 -0700
@@ -679,6 +679,19 @@
 
case ODOT:
agen(nl, res);
+   // explicit check for nil if struct is large enough
+   // that we might derive too big a pointer.
+   if(nl->type->width >= unmappedzero) {
+   regalloc(&n1, types[tptr], N);
+   gmove(res, &n1);
+   regalloc(&n2, types[TUINT8], &n1);
+   n1.op = OINDREG;
+   n1.type = types[TUINT8];
+   n1.xoffset = 0;
+   gmove(&n1, &n2);
+   regfree(&n1);
+   regfree(&n2);
+   }
if(n->xoffset != 0) {
nodconst(&n1, types[TINT32], n->xoffset);
regalloc(&n2, n1.type, N);
@@ -694,20 +707,20 @@
 
case ODOTPTR:
cgen(nl, res);
+   // explicit check for nil if struct is large enough
+   // that we might derive too big a pointer.
+   if(nl->type->type->width >= unmappedzero) {
+   regalloc(&n1, types[tptr], N);
+   gmove(res, &n1);
+   regalloc(&n2, types[TUINT8], &n1);
+   n1.op = OINDREG;
+   n1.type = types[TUINT8];
+   n1.xoffset = 0;
+   gmove(&n1, &n2);
+   regfree(&n1);
+   regfree(&n2);
+   }
if(n->xoffset != 0) {
-   // explicit check for nil if struct is large enough
-   // that we might derive too big a pointer.
-   if(nl->type->type->width >= unmappedzero) {
-   regalloc(&n1, types[tptr], N);
-   gmove(res, &n1);
-   regalloc(&n2, types[TUINT8], &n1);
-   n1.op = OINDREG;
-   n1.type = types[TUINT8];
-   n1.xoffset = 0;
-   gmove(&n1, &n2);
-   regfree(&n1);
-   regfree(&n2);
-   }
nodconst(&n1, types[TINT32], n->xoffset);
regalloc(&n2, n1.type, N);
regalloc(&n3, types[tptr], N);
@@ -759,6 +772,19 @@
 
case ODOT:
igen(n->left, a, res);
+   // explicit check for nil if struct is large enough
+   // that we might derive too big a pointer.
+   if(0 && n->left->type->width >= unmappedzero) {
+   regalloc(&n1, types[tptr], N);
+   gmove(a, &n1);
+   regalloc(&n2, types[TUINT8], &n1);
+   n1.op = OINDREG;
+   n1.type = types[TUINT8];
+   n1.xoffset = 0;
+   gmove(&n1, &n2);
+   regfree(&n1);
+   regfree(&n2);
+   }
a->xoffset += n->xoffset;
a->type = n->type;
return;
@@ -777,20 +803,18 @@
regalloc(a, types[tptr], res);
cgen(n->left, a);
}
-   if(n->xoffset != 0) {
-   // explicit check for nil if struct is large enough
-   // that we might derive too big a pointer.
-   if(n->left->type->type->width >= unmappedzero) {
-   regalloc(&n1, types[tptr], N);
-   gmove(a, &n1);
-   regalloc(&n2, types[TUINT8], &n1);
-   n1.op = OINDREG;
-   n1.type = types[TUINT8];
-   n1.xoffset = 0;
-   gmove(&n1, &n2);
-   regfree(&n1);
- 

Re: Using libbacktrace in libgfortran: some questions

2015-08-13 Thread Ian Lance Taylor
On Thu, Aug 13, 2015 at 8:25 AM, Ian Lance Taylor  wrote:
> On Thu, Aug 13, 2015 at 7:11 AM, FX  wrote:
>>
>>> I don't know why this is not working.  Everything looks fine in the
>>> a.out that you sent.  Unfortunately, I think you sent the one built
>>> without libbacktrace.  Can you send me the one built with libbacktrace?
>>
>> Attached is the a.out with libgfortran (and thus libbacktrace) linked in 
>> statically. If this isn’t sufficient, I can send any file necessary 
>> (including the whole tree if need be).
>
> Thanks.  The problem seems to be that gfortran is generating DWARF
> info that looks like this:
>
> subprogram test
>   subprogram foo
>   subprogram bar
>
> libbacktrace does not expect to see this structure, and it thinks that
> foo and bar have been inlined within test, which is not the case.
>
> Please try this patch to libbacktrace and see if it helps.

And yet, that patch has absolutely nothing to do with libbacktrace.
Hmmm.  Let's try this one.

Ian
Index: dwarf.c
===
--- dwarf.c (revision 226846)
+++ dwarf.c (working copy)
@@ -2250,7 +2250,8 @@ read_function_entry (struct backtrace_st
 struct unit *u, uint64_t base, struct dwarf_buf *unit_buf,
 const struct line_header *lhdr,
 backtrace_error_callback error_callback, void *data,
-struct function_vector *vec)
+struct function_vector *vec_function,
+struct function_vector *vec_inlined)
 {
   while (unit_buf->left > 0)
 {
@@ -2258,6 +2259,7 @@ read_function_entry (struct backtrace_st
   const struct abbrev *abbrev;
   int is_function;
   struct function *function;
+  struct function_vector *vec;
   size_t i;
   uint64_t lowpc;
   int have_lowpc;
@@ -2279,6 +2281,11 @@ read_function_entry (struct backtrace_st
 || abbrev->tag == DW_TAG_entry_point
 || abbrev->tag == DW_TAG_inlined_subroutine);
 
+  if (abbrev->tag == DW_TAG_inlined_subroutine)
+   vec = vec_inlined;
+  else
+   vec = vec_function;
+
   function = NULL;
   if (is_function)
{
@@ -2458,7 +2465,8 @@ read_function_entry (struct backtrace_st
  if (!is_function)
{
  if (!read_function_entry (state, ddata, u, base, unit_buf, lhdr,
-   error_callback, data, vec))
+   error_callback, data, vec_function,
+   vec_inlined))
return 0;
}
  else
@@ -2471,7 +2479,8 @@ read_function_entry (struct backtrace_st
  memset (&fvec, 0, sizeof fvec);
 
  if (!read_function_entry (state, ddata, u, base, unit_buf, lhdr,
-   error_callback, data, &fvec))
+   error_callback, data, vec_function,
+   &fvec))
return 0;
 
  if (fvec.count > 0)
@@ -2535,7 +2544,7 @@ read_function_info (struct backtrace_sta
   while (unit_buf.left > 0)
 {
   if (!read_function_entry (state, ddata, u, 0, &unit_buf, lhdr,
-   error_callback, data, pfvec))
+   error_callback, data, pfvec, pfvec))
return;
 }
 


Re: Using libbacktrace in libgfortran: some questions

2015-08-13 Thread FX
> And yet, that patch has absolutely nothing to do with libbacktrace.
> Hmmm.  Let's try this one.

Works perfectly with the patch:

Program aborted. Backtrace:
#0  0xf75e5b9b _gfortrani_show_backtrace
../../../../trunk/libgfortran/runtime/backtrace.c:113
#1  0xf75e6aa7 _gfortrani_sys_abort
../../../../trunk/libgfortran/runtime/error.c:176
#2  0xf769a7a7 _gfortran_abort
../../../../trunk/libgfortran/intrinsics/abort.c:33
#3  0x80486e4 bar
/home/fx/gcc/irun/a.f90:9
#4  0x8048706 foo
/home/fx/gcc/irun/a.f90:5
#5  0x80486f2 test
/home/fx/gcc/irun/a.f90:2
#6  0x8048743 main
/home/fx/gcc/irun/a.f90:2
Aborted (core dumped)


Thanks!

FX


Re: Finding insns to reorder using dataflow

2015-08-13 Thread Jeff Law

On 08/13/2015 05:06 AM, Kyrill Tkachov wrote:

> Hi all,
>
> I'm implementing a target-specific reorg pass, and one thing that I want
> to do
> is for a given insn in the stream to find an instruction
> in the stream that I can swap it with, without violating any dataflow
> dependencies.
> The candidate instruction could be earlier or later in the stream.
>
> I'm stuck on finding an approach to do this. It seems that using some of
> the dataflow
> infrastructure is the right way to go, but I can't figure out the details.
> can_move_insns_across looks like relevant, but it looks too heavyweight
> with quite a lot
> of arguments.
>
> I suppose somehow constructing regions of interchangeable instructions
> would be the way
> to go, but I'm not sure how clean/cheap that would be outside the scheduler
>
> Any ideas would be appreciated.

I think you want all the dependency analysis done by the scheduler.

Which leads to the question, can you model what you're trying to do in 
the various scheduler hooks -- in particular walking through the ready 
list seems appropriate.


jeff


About loop unrolling and optimize for size

2015-08-13 Thread sa...@hederstierna.com
Hi
I'm using an ARM Thumb cross compiler for embedded systems and always 
optimize for small size with -Os.

I've been experimenting with optimization flags and loop unrolling, though.

Normally loop unrolling is bad for size: code is duplicated and size 
increases.

However, I discovered that in some special cases where the number of iterations 
is very small, e.g. a loop of 2-3 iterations, unrolling can make the code 
smaller, e.g. by freeing up the registers used for loop indexes.

For example, when I use the flag "-fpeel-loops" together with -Os, I get 
smaller code size in 99% of cases for the ARM Thumb target.
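
As a small illustration of the kind of loop meant here, the sketch below shows
a three-iteration loop next to its hand-unrolled equivalent. The function
names are hypothetical, and no claim is made about which form is smaller for
a given target or compiler version; comparing the generated code for both is
the only way to tell.

```c
#include <assert.h>
#include <stddef.h>

/* Loop form: needs an index, a compare and a branch. */
int sum3_loop(const int *v)
{
  assert(v != NULL);
  int sum = 0;
  for (int i = 0; i < 3; i++)
    sum += v[i];
  return sum;
}

/* Hand-unrolled form: no index, no branch, three loads and two adds. */
int sum3_unrolled(const int *v)
{
  assert(v != NULL);
  return v[0] + v[1] + v[2];
}
```

Both functions compute the same result, so building each with -Os and 
comparing the object sizes shows whether eliminating the loop pays off.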

So my question is how unrolling works with -Os: is it always totally disabled,
or are there some cases where it could be tried, e.g. with a small number of 
iterations, so that the loop can be eliminated?

Could e.g. "-fpeel-loops" be enabled by default for -Os? Right now it's only 
enabled for -O2 and above, I think.

Thanks and Best Regards
Fredrik


Re: Using libbacktrace in libgfortran: some questions

2015-08-13 Thread Ian Lance Taylor
On Thu, Aug 13, 2015 at 8:50 AM, FX  wrote:
>> And yet, that patch has absolutely nothing to do with libbacktrace.
>> Hmmm.  Let's try this one.
>
> Works perfectly with the patch:

Patch tested and committed with this ChangeLog entry.

2015-08-13  Ian Lance Taylor  

* dwarf.c (read_function_entry): Add vec_inlined parameter.
Change all callers.
Index: dwarf.c
===
--- dwarf.c (revision 226846)
+++ dwarf.c (working copy)
@@ -2250,7 +2250,8 @@ read_function_entry (struct backtrace_st
 struct unit *u, uint64_t base, struct dwarf_buf *unit_buf,
 const struct line_header *lhdr,
 backtrace_error_callback error_callback, void *data,
-struct function_vector *vec)
+struct function_vector *vec_function,
+struct function_vector *vec_inlined)
 {
   while (unit_buf->left > 0)
 {
@@ -2258,6 +2259,7 @@ read_function_entry (struct backtrace_st
   const struct abbrev *abbrev;
   int is_function;
   struct function *function;
+  struct function_vector *vec;
   size_t i;
   uint64_t lowpc;
   int have_lowpc;
@@ -2279,6 +2281,11 @@ read_function_entry (struct backtrace_st
 || abbrev->tag == DW_TAG_entry_point
 || abbrev->tag == DW_TAG_inlined_subroutine);
 
+  if (abbrev->tag == DW_TAG_inlined_subroutine)
+   vec = vec_inlined;
+  else
+   vec = vec_function;
+
   function = NULL;
   if (is_function)
{
@@ -2458,7 +2465,8 @@ read_function_entry (struct backtrace_st
  if (!is_function)
{
  if (!read_function_entry (state, ddata, u, base, unit_buf, lhdr,
-   error_callback, data, vec))
+   error_callback, data, vec_function,
+   vec_inlined))
return 0;
}
  else
@@ -2471,7 +2479,8 @@ read_function_entry (struct backtrace_st
  memset (&fvec, 0, sizeof fvec);
 
  if (!read_function_entry (state, ddata, u, base, unit_buf, lhdr,
-   error_callback, data, &fvec))
+   error_callback, data, vec_function,
+   &fvec))
return 0;
 
  if (fvec.count > 0)
@@ -2535,7 +2544,7 @@ read_function_info (struct backtrace_sta
   while (unit_buf.left > 0)
 {
   if (!read_function_entry (state, ddata, u, 0, &unit_buf, lhdr,
-   error_callback, data, pfvec))
+   error_callback, data, pfvec, pfvec))
return;
 }
 


[powerpc64le] seq_cst memory order possibly not honored

2015-08-13 Thread Andrey Semashev

Hi,

I'm having a problem with one of the Boost.Atomic tests on a PowerPC64 
LE test platform. The test is running two threads which are looping code 
like this:


  Thread 1   Thread 2
 [initially a == 0 && b == 0]
  a.store(1, seq_cst);   b.store(1, seq_cst);
  a.load(relaxed);   b.load(relaxed);
  x = b.load(relaxed);   y = a.load(relaxed);

On each iteration the test verifies that !(x == 0 && y == 0) and this 
check fails. As far as I can tell the test is valid and it indeed passes 
on x86. Boost.Atomic uses __atomic* intrinsics in both cases, so from 
C++ perspective the implementation is the same.
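
One variation that may be worth comparing against is sketched below. It is not
the Boost.Atomic test: it writes one side of the pattern with GCC's __atomic
builtins and makes the load seq_cst as well as the store. In the C11/C++11
memory model the x == 0 && y == 0 outcome is ruled out when all four accesses
are seq_cst, because they all take part in a single total order; with relaxed
loads the model gives a weaker guarantee. The helper name sb_side is
hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* One side of the store-buffering pattern: publish our flag, then read the
   other side's flag.  Both the store and the load are seq_cst here.  */
int sb_side(int *mine, int *other)
{
  assert(mine != NULL && other != NULL);
  __atomic_store_n(mine, 1, __ATOMIC_SEQ_CST);
  return __atomic_load_n(other, __ATOMIC_SEQ_CST);
}
```

Thread 1 would call x = sb_side(&a, &b) and thread 2 would call 
y = sb_side(&b, &a), with a and b both initially 0.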


I don't have access to the tester that fails, so I can't tell the 
exact GCC version that is used there, only that it is labeled as 6.0. I 
did some experimenting with version 4.9.2, which I have locally, and 
found that for code like this:


  __atomic_store_n(&n, 1, __ATOMIC_SEQ_CST);

gcc generates this assembly:

  1c:   01 00 40 39 li  r10,1
  20:   ac 04 00 7c sync
  24:   00 00 49 91 stw r10,0(r9)

I would expect that for seq_cst there should be a second 'sync' right 
after the store, but it is absent. If 6.0 generates similar code, 
this might explain the test failures I'm seeing.


So my questions are:

1. Is my test valid or is there a flaw that I'm missing?
2. Am I correct about the missing 'sync' instruction?

Thanks.

PS: If you're interested, here's the full code snippet I compiled:

  int n = 0;

  int main()
  {
    __atomic_store_n(&n, 1, __ATOMIC_SEQ_CST);
    return n;
  }

The command line was:

  powerpc64le-linux-gnu-g++ -g -O0 -o seq_cst_ppc64el.o -c ./seq_cst.cpp

And here's the link to the original Boost.Atomic test:

https://github.com/boostorg/atomic/blob/develop/test/ordering.cpp


RE: More of a Loop distribution.

2015-08-13 Thread Ajit Kumar Agarwal


-Original Message-
From: Richard Biener [mailto:richard.guent...@gmail.com] 
Sent: Thursday, August 13, 2015 3:23 PM
To: Ajit Kumar Agarwal
Cc: Jeff Law; gcc@gcc.gnu.org; Vinod Kathail; Shail Aditya Gupta; Vidhumouli 
Hunsigida; Nagaraju Mekala
Subject: Re: More of a Loop distribution.

On Thu, Aug 13, 2015 at 9:37 AM, Ajit Kumar Agarwal 
 wrote:
> All:
>
> Loop distribution considers DDG to decide on distributing the Loops. 
> The Loops with control statements like IF-THEN-ELSE can also be 
> Distributed. Instead of Data Dependency Graph, the Control Dependence Graph 
> should be considered in order to distribute the loops In presence of control 
> Statements.
>
> Also the presence of multiple exits in the Loop can also be considered for 
> Loop distribution transformation.
>
> Thus the above transformation helps in the Libquantum benchmarks for SPEC 
> 2006.
>
> There are following articles that looks interesting to me.
>
> "Loop Distribution in presence of arbitrarily control flow Ken Kennedy et.al."
> "Loop Distribution in presence of Multiple Exits Bor-Ming Hsieh etal."
>
>
> I don't think the loop distribution in presence of control flow is 
> implemented in GCC/LLVM.
>
> I think it is feasible to consider the above for the implementation in GCC.

>>It's true that loop distribution does not try to distribute based on any 
>>control structure heuristics but it only considers data locality.  It does 
>>however already compute the control dependence graph (and uses it to add 
>>control edges to the DDG to properly add data dependence edges to uses of 
>>control statements necessary in partitions).

>>So it should be a matter of specifying the proper set of starting statements 
>>it tries separating.

Thanks.

>>Not sure which kind of distribution you are after, can you give an example?

I would like to distribute loops that contain control flow. For example:

for (i = 2; i < N; i++)
{
  if (condition)
    {
      S1: A[i] = ...
      S2: D[i] = A[i-1]...
    }
}

The above loop can be distributed into two loops, one containing S1 inside the 
IF and another containing S2 inside the IF. Two scenarios are possible.

1. The condition inside the IF checks A[i] and is dependent on S1. In this 
case the distribution is difficult. The above article from Ken Kennedy et al. 
stores the partial results of the comparison in a temporary array, and that 
array is tested inside the IF condition. This makes the loop distributable. 
This is what I was looking for and found in the above article.

2. The condition inside the IF in the above loop is not dependent on S1 and 
S2, and in this case the loop can be distributed.

In both of the above scenarios GCC can't distribute the loop, as the control 
dependence graph (control structure) is not used. The advantage of the above 
loop distribution is that it makes the loop vectorizable, which is otherwise 
not possible due to the presence of multiple statements inside the IF; the 
loop also may not be IF-converted due to the presence of multiple statements. 
If we distribute the loop for the above two scenarios, the individual loops 
in the distributed loop can be vectorized, which is otherwise not possible.
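
Scenario 1 can be sketched as follows. This is only an illustration under 
stated assumptions: the condition A[i] > 0 and the right-hand sides of S1 and 
S2 are made up (the example above leaves them as "..."), N is fixed at 16, and 
the function names are hypothetical. The condition is saved in a temporary 
array in the first loop because S1 overwrites the A[i] that it reads.

```c
#include <assert.h>
#include <string.h>

#define N 16

/* Original form: the condition reads A[i], which S1 then overwrites. */
void loop_original(int *A, const int *B, int *D)
{
  assert(A && B && D);
  for (int i = 2; i < N; i++) {
    if (A[i] > 0) {
      A[i] = B[i] - 10;      /* S1 */
      D[i] = A[i - 1] + 1;   /* S2 */
    }
  }
}

/* Distributed form: the condition is evaluated once and saved, so the
   second loop sees the same decisions as the first.  */
void loop_distributed(int *A, const int *B, int *D)
{
  assert(A && B && D);
  int cond[N] = { 0 };
  for (int i = 2; i < N; i++) {
    cond[i] = A[i] > 0;
    if (cond[i])
      A[i] = B[i] - 10;      /* S1 */
  }
  for (int i = 2; i < N; i++) {
    if (cond[i])
      D[i] = A[i - 1] + 1;   /* S2 */
  }
}

/* Returns 1 if both forms give identical A and D on a sample input. */
int distribution_matches(void)
{
  int A1[N], A2[N], B[N], D1[N], D2[N];
  for (int i = 0; i < N; i++) {
    A1[i] = A2[i] = (i % 4 == 0) ? -1 : i;  /* mostly positive */
    B[i] = i;
    D1[i] = D2[i] = -99;
  }
  loop_original(A1, B, D1);
  loop_distributed(A2, B, D2);
  return memcmp(A1, A2, sizeof A1) == 0 && memcmp(D1, D2, sizeof D1) == 0;
}
```

Running S1's loop first is legal here because the only dependence between the 
two statements goes from S1 in iteration i-1 to S2 in iteration i. Re-testing 
A[i] > 0 in the second loop instead of using cond[i] would give different 
results, since S1 has already changed A[i] by then.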

Thanks & Regards
Ajit


Richard.

> Thanks & Regards
> Ajit


RE: More of a Loop distribution.

2015-08-13 Thread Richard Biener
On August 14, 2015 4:59:07 AM GMT+02:00, Ajit Kumar Agarwal 
 wrote:
>
>
>-Original Message-
>From: Richard Biener [mailto:richard.guent...@gmail.com] 
>Sent: Thursday, August 13, 2015 3:23 PM
>To: Ajit Kumar Agarwal
>Cc: Jeff Law; gcc@gcc.gnu.org; Vinod Kathail; Shail Aditya Gupta;
>Vidhumouli Hunsigida; Nagaraju Mekala
>Subject: Re: More of a Loop distribution.
>
>On Thu, Aug 13, 2015 at 9:37 AM, Ajit Kumar Agarwal
> wrote:
>> All:
>>
>> Loop distribution considers DDG to decide on distributing the Loops. 
>> The Loops with control statements like IF-THEN-ELSE can also be 
>> Distributed. Instead of Data Dependency Graph, the Control Dependence
>Graph should be considered in order to distribute the loops In presence
>of control Statements.
>>
>> Also the presence of multiple exits in the Loop can also be
>considered for Loop distribution transformation.
>>
>> Thus the above transformation helps in the Libquantum benchmarks for
>SPEC 2006.
>>
>> There are following articles that looks interesting to me.
>>
>> "Loop Distribution in presence of arbitrarily control flow Ken
>Kennedy et.al."
>> "Loop Distribution in presence of Multiple Exits Bor-Ming Hsieh
>etal."
>>
>>
>> I don't think the loop distribution in presence of control flow is
>implemented in GCC/LLVM.
>>
>> I think it is feasible to consider the above for the implementation
>in GCC.
>
>>>It's true that loop distribution does not try to distribute based on
>any control structure heuristics but it only considers data locality. 
>It does however already >>compute the control dependence graph (and
>uses it to add control edges to the DDG to properly add data dependence
>edges to uses of control statements >>necessary in partitions).
>
>>>So it should be a matter of specifying the proper set of starting
>statements it tries separating.
>
>Thanks.
>
>>>Not sure which kind of distribution you are after, can you give an
>example?
>
>I would like to have a distribution of the loop having control flow.
>For example
>
>For (I = 2 ; I < N; i++)
>{
>If  (condition)
>   {
>  S1: A[i] = ...
>  S2:D[i] = A[i-1]...
>   }
>}
>
>The above loop can be distributed with two loops having one loop with
>S1  inside IF and another loop with S2 with the IF.
>The two scenario can be true.
>
>1. The condition inside IF have a check on A[i] and is dependent on S1.
>In this case the distribution is difficult. And the above article
>From Ken Kennedy et.al does store the partial results of comparison in
>an temporary array and that array is compared inside the IF
>Condition. This makes the loop distributed. This is what I was looking
>for which I found in the above article.
>
>2. The condition inside the IF in the above loop is not dependent on
>the S1 and S2 , and this case the loop can be distributed.
>
>In the above two scenario the GCC can't distribute the loop, as the
>control dependency graph ( control structure ) is not used.

The above loop can be distributed by gcc just fine.

Richard.

 The
>advantage of
>The above loop distribution makes the loop vectorizable which otherwise
>not possible due to presence of multiple statements inside the IF and
>Also may not be IF-converted due to presence of multiple statements. If
>we distribute the loop for the above two scenarios the individual loops
>
>in the distributed loop can be vectorized which is otherwise not
>possible.
>
>Thanks & Regards
>Ajit
>
>
>Richard.
>
>> Thanks & Regards
>> Ajit




RE: More of a Loop distribution.

2015-08-13 Thread Ajit Kumar Agarwal


-Original Message-
From: Richard Biener [mailto:richard.guent...@gmail.com] 
Sent: Friday, August 14, 2015 11:30 AM
To: Ajit Kumar Agarwal
Cc: Jeff Law; gcc@gcc.gnu.org; Vinod Kathail; Shail Aditya Gupta; Vidhumouli 
Hunsigida; Nagaraju Mekala
Subject: RE: More of a Loop distribution.

On August 14, 2015 4:59:07 AM GMT+02:00, Ajit Kumar Agarwal 
 wrote:
>
>
>-Original Message-
>From: Richard Biener [mailto:richard.guent...@gmail.com]
>Sent: Thursday, August 13, 2015 3:23 PM
>To: Ajit Kumar Agarwal
>Cc: Jeff Law; gcc@gcc.gnu.org; Vinod Kathail; Shail Aditya Gupta; 
>Vidhumouli Hunsigida; Nagaraju Mekala
>Subject: Re: More of a Loop distribution.
>
>On Thu, Aug 13, 2015 at 9:37 AM, Ajit Kumar Agarwal 
> wrote:
>> All:
>>
>> Loop distribution considers DDG to decide on distributing the Loops. 
>> The Loops with control statements like IF-THEN-ELSE can also be 
>> Distributed. Instead of Data Dependency Graph, the Control Dependence
>Graph should be considered in order to distribute the loops In presence 
>of control Statements.
>>
>> Also the presence of multiple exits in the Loop can also be
>considered for Loop distribution transformation.
>>
>> Thus the above transformation helps in the Libquantum benchmarks for
>SPEC 2006.
>>
>> There are following articles that looks interesting to me.
>>
>> "Loop Distribution in presence of arbitrarily control flow Ken
>Kennedy et.al."
>> "Loop Distribution in presence of Multiple Exits Bor-Ming Hsieh
>etal."
>>
>>
>> I don't think the loop distribution in presence of control flow is
>implemented in GCC/LLVM.
>>
>> I think it is feasible to consider the above for the implementation
>in GCC.
>
>>>It's true that loop distribution does not try to distribute based on
>any control structure heuristics but it only considers data locality. 
>It does however already >>compute the control dependence graph (and 
>uses it to add control edges to the DDG to properly add data dependence 
>edges to uses of control statements >>necessary in partitions).
>
>>>So it should be a matter of specifying the proper set of starting
>statements it tries separating.
>
>Thanks.
>
>>>Not sure which kind of distribution you are after, can you give an
>example?
>
>I would like to have a distribution of the loop having control flow.
>For example
>
>For (I = 2 ; I < N; i++)
>{
>If  (condition)
>   {
>  S1: A[i] = ...
>  S2:D[i] = A[i-1]...
>   }
>}
>
>The above loop can be distributed with two loops having one loop with
>S1  inside IF and another loop with S2 with the IF.
>The two scenario can be true.
>
>1. The condition inside IF have a check on A[i] and is dependent on S1.
>In this case the distribution is difficult. And the above article From 
>Ken Kennedy et.al does store the partial results of comparison in an 
>temporary array and that array is compared inside the IF Condition. 
>This makes the loop distributed. This is what I was looking for which I 
>found in the above article.
>
>2. The condition inside the IF in the above loop is not dependent on 
>the S1 and S2 , and this case the loop can be distributed.
>
>In the above two scenario the GCC can't distribute the loop, as the 
>control dependency graph ( control structure ) is not used.

>>The above loop can be distributed by gcc just fine.

Does the existing loop distribution implementation in GCC distribute the above 
loop in both of the above scenarios?

Thanks & Regards
Ajit

Richard.

 The
>advantage of
>The above loop distribution makes the loop vectorizable which otherwise 
>not possible due to presence of multiple statements inside the IF and 
>Also may not be IF-converted due to presence of multiple statements. If 
>we distribute the loop for the above two scenarios the individual loops
>
>in the distributed loop can be vectorized which is otherwise not 
>possible.
>
>Thanks & Regards
>Ajit
>
>
>Richard.
>
>> Thanks & Regards
>> Ajit