Re: aarch64 asm operand checking

2015-01-29 Thread Andrew Pinski
On Wed, Jan 28, 2015 at 11:54 PM, Jan Beulich  wrote:
> Hello,
>
> in the Xen project we had (meanwhile fixed) code like this (meant to
> be uniform between 32- and 64-bit):
>
> static inline int fls(unsigned int x) {
> int ret;
> asm("clz\t%0, %1" : "=r" (ret) : "r" (x));
> return BITS_PER_LONG - ret;
> }

You want:
asm("clz\t%w0, %w1" : "=r" (ret) : "r" (x));

The 'w' modifier should be documented, if it is not already.
>
> Being mainly an x86 person, when I first saw this I didn't understand
> how this could be correct, as for aarch64 BITS_PER_LONG is 64, and
> both operands being 32-bit I expected "clz w, w" to result.
> Yet I had to learn that no matter what size the C operands, x
> registers are always being picked. Which still doesn't mean the above
> is correct - a suitable call chain can leave a previous operation's
> 64-bit result unconverted, making the above produce a supposedly
> impossible result greater than 32.

That is because the full register is xN but you want only the 32bit part of it.
It is the same issue as on x86_64 where you want the lower 32bit part
of it that is eax vs rax.

>
> Therefore I wonder whether aarch64_print_operand() shouldn't,
> when neither the 'x' nor the 'w' modifier is given, either - like
> ix86_print_operand() (via print_reg()) - honor
> GET_MODE_SIZE (GET_MODE (x)), or at the very least warn
> when that one is more narrow than 64 bits. And yes, I realize that
> this isn't going to be optimal (and could even be considered
> inconsistent) as there's no way to express the low half word or
> byte of a general register, i.e. operands more narrow than 32 bits
> couldn't be fully checked without also knowing/evaluating the
> instruction suffix, e.g. by introducing a 'z' operand modifier like
> x86 has, or extending the existing 'e' one.

No, because sometimes you want to use the full register size, as not all
places where you use a register allow for wN (memory locations for
one).

Thanks,
Andrew Pinski
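
A minimal sketch of the fix suggested above, assuming GCC or Clang. The
name fls32, the explicit 32 in place of BITS_PER_LONG, and the fallback
for non-aarch64 hosts are illustrative assumptions, not the code that
went into Xen.

```c
#include <stdint.h>

/* Find last set bit, 1-based; returns 0 when x == 0.
   fls32 is a hypothetical name chosen for this sketch. */
static inline int fls32(uint32_t x)
{
#if defined(__GNUC__) && defined(__aarch64__)
    uint32_t lz;
    /* %w0 / %w1 print the operands as wN, so CLZ counts within 32 bits. */
    __asm__("clz\t%w0, %w1" : "=r"(lz) : "r"(x));
    return 32 - (int)lz;
#else
    /* Fallback for other hosts; __builtin_clz(0) is undefined, hence
       the explicit guard. */
    return x ? 32 - __builtin_clz(x) : 0;
#endif
}
```

With the 'w' modifier the instruction operates on wN registers, so the
count cannot exceed 32 whatever the upper half of the register holds.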

>
> Jan
>


Re: libgomp: Now known as the GNU Offloading and Multi Processing Runtime Library (was: libgomp: "GNU OpenMP Runtime Library")

2015-01-29 Thread Thomas Schwinge
Hi!

On Sat, 10 Jan 2015 20:21:46 +0100, I wrote:
> On Wed, 12 Nov 2014 15:43:06 -0500, David Malcolm  wrote:
> > On Wed, 2014-11-12 at 21:30 +0100, Jakub Jelinek wrote:
> > > On Wed, Nov 12, 2014 at 03:22:21PM -0500, David Malcolm wrote:
> > > > On Wed, 2014-11-12 at 14:47 +0100, Jakub Jelinek wrote:
> > > > > On Wed, Nov 12, 2014 at 08:33:34AM -0500, David Malcolm wrote:
> > > > > > Apologies for bikeshedding, and I normally dislike "cute" names, but
> > > > > > renaming it to
> > > > > > 
> > > > > >"GNU Offloading and Multi Processing library"
> 
> Oh, how cute!  ;-P
> 
> > > > > > would allow a backronym of "libgomp", thus preserving the existing
> > > > > > filenames/SONAME etc.

> As
> pointed out by Tobias in
> , we'll also
> need to update some more files outside of the GCC sources repository,
> that is, in the web pages repository as well as some wiki pages, I
> assume, which I'll do next week.
> 
> commit c35c9a626070a8660c10a37786cedf2d6e3742c9
> Author: tschwinge 
> Date:   Sat Jan 10 19:10:37 2015 +
> 
> libgomp: Now known as the GNU Offloading and Multi Processing Runtime 
> Library.

Changed in ,
and committed to wwwdocs:

Index: htdocs/onlinedocs/index.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/onlinedocs/index.html,v
retrieving revision 1.145
retrieving revision 1.146
diff -u -p -r1.145 -r1.146
--- htdocs/onlinedocs/index.html19 Dec 2014 13:24:58 -  1.145
+++ htdocs/onlinedocs/index.html29 Jan 2015 08:26:06 -  1.146
@@ -957,8 +957,8 @@ existing release.
href="https://gcc.gnu.org/onlinedocs/gccgo.ps.gz";>PostScript or 
https://gcc.gnu.org/onlinedocs/gccgo-html.tar.gz";>an
HTML tarball)
-https://gcc.gnu.org/onlinedocs/libgomp/";>GNU OpenMP
-   Manual (https://gcc.gnu.org/onlinedocs/libgomp/";>GNU Offloading and
+   Multi Processing Runtime Library Manual (https://gcc.gnu.org/onlinedocs/libgomp.pdf";>also in
PDF or https://gcc.gnu.org/onlinedocs/libgomp.ps.gz";>PostScript 
or 



limiting call clobbered registers for library functions

2015-01-29 Thread Paul Shortis
I've ported GCC to a small 16 bit CPU that has single bit shifts. 
So I've handled variable / multi-bit shifts using a mix of inline 
shifts and calls to assembler support functions.


The calls to the asm library functions clobber only one (by 
const) or two (variable) registers but of course calling these 
functions causes all of the standard call clobbered registers to 
be considered clobbered, thus wasting lots of candidate registers 
for use in expressions surrounding these shifts and causing 
unnecessary register saves in the surrounding function 
prologue/epilogue.


I've scrutinized and cloned the actions of other ports that do 
the same, however I'm unable to convince the various passes that 
only r1 and r2 can be clobbered by these library calls.


Is anyone able to point me in the proper direction for a solution 
to this problem ?


Thanks, Paul.





Re: aarch64 asm operand checking

2015-01-29 Thread Jan Beulich
>>> On 29.01.15 at 09:20,  wrote:
> On Wed, Jan 28, 2015 at 11:54 PM, Jan Beulich  wrote:
>> Hello,
>>
>> in the Xen project we had (meanwhile fixed) code like this (meant to
>> be uniform between 32- and 64-bit):
>>
>> static inline int fls(unsigned int x) {
>> int ret;
>> asm("clz\t%0, %1" : "=r" (ret) : "r" (x));
>> return BITS_PER_LONG - ret;
>> }
> 
> You want:
> asm("clz\t%w0, %w1" : "=r" (ret) : "r" (x));

I understand that - as said, we fixed the issue already.

>> Being mainly an x86 person, when I first saw this I didn't understand
>> how this could be correct, as for aarch64 BITS_PER_LONG is 64, and
>> both operands being 32-bit I expected "clz w, w" to result.
>> Yet I had to learn that no matter what size the C operands, x
>> registers are always being picked. Which still doesn't mean the above
>> is correct - a suitable call chain can leave a previous operation's
>> 64-bit result unconverted, making the above produce a supposedly
>> impossible result greater than 32.
> 
> That is because the full register is xN but you want only the 32bit part of 
> it.
> It is the same issue as on x86_64 where you want the lower 32bit part
> of it that is eax vs rax.

No, it's not: An "unsigned int" asm() operand will result in %eax to
be used (in the absence of any modifiers), while an "unsigned long"
one will produce %rax.

>> Therefore I wonder whether aarch64_print_operand() shouldn't,
>> when neither the 'x' nor the 'w' modifier is given, either - like
>> ix86_print_operand() (via print_reg()) - honor
>> GET_MODE_SIZE (GET_MODE (x)), or at the very least warn
>> when that one is more narrow than 64 bits. And yes, I realize that
>> this isn't going to be optimal (and could even be considered
>> inconsistent) as there's no way to express the low half word or
>> byte of a general register, i.e. operands more narrow than 32 bits
>> couldn't be fully checked without also knowing/evaluating the
>> instruction suffix, e.g. by introducing a 'z' operand modifier like
>> x86 has, or extending the existing 'e' one.
> 
> No, because sometimes you want to use the full register size, as not all
> places where you use a register allow for wN (memory locations for
> one).

Above I was specifically talking about register operands only. If
instead you meant the register used inside a memory reference,
that's true for the base address register (but not the second one
usable as offset, where again the C operand type could control
the one chosen), yet the respective operand should never be of
a 32-bit type (but instead ought to be pointer size).

Jan
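
The failure mode described in this thread can be modelled in portable C
without aarch64 hardware. The sketch below assumes the GCC/Clang
builtins; all function names are hypothetical, and the 64-bit integer
stands in for an x register whose upper half was left over from an
earlier 64-bit operation.

```c
#include <stdint.h>

/* Model of "clz xD, xN": leading zeros across all 64 bits. */
static int clz64(uint64_t reg)
{
    return reg ? __builtin_clzll(reg) : 64;
}

/* Model of "clz wD, wN": only the low 32 bits are examined. */
static int clz32(uint64_t reg)
{
    uint32_t low = (uint32_t)reg;
    return low ? __builtin_clz(low) : 32;
}

/* What the original fls() computed once the operands were printed
   as x registers (BITS_PER_LONG is 64 on aarch64). */
static int fls_x_regs(uint64_t reg)
{
    return 64 - clz64(reg);
}

/* What the version using the 'w' modifier computes. */
static int fls_w_regs(uint64_t reg)
{
    return 32 - clz32(reg);
}
```

With clean upper bits both variants agree; with bit 32 set in the
register, fls_x_regs reports 33 for a 32-bit value of 1, which is the
"impossible" result mentioned above.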



pass_stdarg problem when run after pass_lim

2015-01-29 Thread Tom de Vries

Jakub,

consider attached patch, which adds pass_lim after fre1 (a simplification of my 
oacc kernels patch series).


The included testcase lim-before-stdarg.c fails.

The first sign of trouble is in lim-before-stdarg.c.088t.stdarg (attached):
...
gen_rtvec: va_list escapes 0, needs to save 0 GPR units and 0 FPR units.
...

Because of the 'need to save 0 GPRs units', at expand no prologue is generated 
to dump the varargs in registers onto stack.


However, the varargs are still read from stack and are therefore undefined, 
which valgrind observes:

...
==6254== Conditional jump or move depends on uninitialised value(s)
==6254==at 0x4005AB: gen_rtvec (in a.out)
==6254==by 0x400411: main (in a.out)
...
and as a result the test executable aborts.

AFAIU, stdarg recognizes a va_arg item by looking for 'ap[0].field' references 
(in our example, p.gp_offset) of the form 'ap[0].field = temp' and 'temp = 
ap[0].field'.


With -fno-tree-loop-im, we find both read and write references in the loop:
...
  :
  # i_28 = PHI 
  _12 = p.gp_offset;<<<
  if (_12 > 47)
goto ;
  else
goto ;

  :
  _13 = p.reg_save_area;
  _14 = (sizetype) _12;
  addr.0_15 = _13 + _14;
  _16 = _12 + 8;
  p.gp_offset = _16;<<<
  goto ;

  :
  _18 = p.overflow_arg_area;
  _19 = _18 + 8;
  p.overflow_arg_area = _19;

  :
  # addr.0_3 = PHI 
  _21 = MEM[(void * * {ref-all})addr.0_3];
  rt_val_11->elem[i_28] = _21;
  i_23 = i_28 + 1;
  if (n_9(D) > i_23)
goto ;
  else
goto ;
...

But with -ftree-loop-im, that's no longer the case. We just find one reference, 
before the loop, a read:

...
  :
  __builtin_va_start (&p, 0);
  if (n_8(D) == 0)
goto ;
  else
goto ;

  :
  __builtin_va_end (&p);
  goto ;

  :
  rt_val_12 = rtvec_alloc (n_8(D));
  p_gp_offset_lsm.4_31 = p.gp_offset;   <<<
  _15 = p.reg_save_area;
  p_overflow_arg_area_lsm.6_33 = p.overflow_arg_area;
  if (n_8(D) > 0)
goto ;
  else
goto ;
...

pass_stdarg recognizes the reference as a read in va_list_counter_struct_op, and 
calls va_list_counter_op. But since it's a read that is only executed once, 
there's no effect on cfun->va_list_gpr_size:

...
va_list_counter_op (si=0x7fffd7f0, ap=0x76963540, var=0x7696b948, 
gpr_p=true, write_p=false)

at src/gcc/tree-stdarg.c:323
323   if (si->compute_sizes < 0)
(gdb) n
325   si->compute_sizes = 0;
(gdb)
326   if (si->va_start_count == 1
(gdb)
327   && reachable_at_most_once (si->bb, si->va_start_bb))
(gdb)
326   if (si->va_start_count == 1
(gdb)
328 si->compute_sizes = 1;
(gdb)
330   if (dump_file && (dump_flags & TDF_DETAILS))
(gdb)
339   && (increment = va_list_counter_bump (si, ap, var, gpr_p)) + 1 > 
1)
(gdb)
337   if (write_p
(gdb)
354   if (write_p || !si->compute_sizes)
(gdb)
361 }
...

Do I understand correctly that the assumptions of pass_stdarg are that:
- the reads and writes occur in pairs (I'm guessing that because the read above
  seems to be ignored. Also PR41089 seems to hint at this)
- the related memref occurs in the same loop nesting level as the pair
?

Any advice on how to fix this, or work around it?

Thanks,
- Tom
Run pass_lim after fre1

---
 gcc/passes.def   |  3 ++
 gcc/testsuite/gcc.dg/lim-before-stdarg.c | 67 
 gcc/tree-ssa-loop.c  |  2 +
 3 files changed, 72 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/lim-before-stdarg.c

diff --git a/gcc/passes.def b/gcc/passes.def
index 2bc5dcd..03d749e 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -86,6 +86,9 @@ along with GCC; see the file COPYING3.  If not see
 	 execute TODO_rebuild_alias at this point.  */
 	  NEXT_PASS (pass_build_ealias);
 	  NEXT_PASS (pass_fre);
+	  NEXT_PASS (pass_tree_loop_init);
+	  NEXT_PASS (pass_lim);
+	  NEXT_PASS (pass_tree_loop_done);
 	  NEXT_PASS (pass_merge_phi);
 	  NEXT_PASS (pass_cd_dce);
 	  NEXT_PASS (pass_early_ipa_sra);
diff --git a/gcc/testsuite/gcc.dg/lim-before-stdarg.c b/gcc/testsuite/gcc.dg/lim-before-stdarg.c
new file mode 100644
index 000..c7a6f03
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/lim-before-stdarg.c
@@ -0,0 +1,67 @@
+/* { dg-do run } */
+/* { dg-options "-O1" } */
+
+#include <stdarg.h>
+
+typedef void *rtx;
+
+struct rtvec
+{
+  rtx elem[100];
+};
+typedef struct rtvec *rtvec;
+
+#define NULL_RTVEC ((void *)0)
+
+rtvec __attribute__((noinline,noclone))
+rtvec_alloc (int n)
+{
+  static struct rtvec v;
+
+  if (n != 2)
+__builtin_abort ();
+
+  return &v;
+}
+
+rtvec __attribute__((noinline,noclone))
+gen_rtvec (int n, ...)
+{
+  int i;
+  rtvec rt_val;
+  va_list p;
+
+  va_start (p, n);
+
+  if (n == 0)
+{
+  va_end (p);
+  return NULL_RTVEC;
+}
+
+  rt_val = rtvec_alloc (n);
+
+  for (i = 0; i < n; i++)
+rt_val->elem[i] = va_arg (p, rtx);
+
+  va_end (p);
+  return rt_val;
+}
+
+int
+main ()
+{
+  int a;
+  in

Re: pass_stdarg problem when run after pass_lim

2015-01-29 Thread Jakub Jelinek
On Thu, Jan 29, 2015 at 06:19:45PM +0100, Tom de Vries wrote:
> consider attached patch, which adds pass_lim after fre1 (a simplification of
> my oacc kernels patch series).
> 
> The included testcase lim-before-stdarg.c fails.
> 
> The first sign of trouble is in lim-before-stdarg.c.088t.stdarg (attached):
> ...
> gen_rtvec: va_list escapes 0, needs to save 0 GPR units and 0 FPR units.
> ...
> 
> Because of the 'need to save 0 GPRs units', at expand no prologue is
> generated to dump the varargs in registers onto stack.

The stdarg pass can't grok too heavy optimizations, so if at all possible,
don't schedule such passes early, and if you for some reason do, avoid
optimizing in there the va_list related accesses.  I'm afraid that is the
only recommendation I can give here for that.

Jakub
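
For readers unfamiliar with the va_list record the dumps refer to, here
is a rough C model of the read/test/bump sequence on p.gp_offset. It is
a sketch under assumptions: struct sim_va_list and sim_va_arg are
hypothetical names, the fp_offset field is left out, and every argument
is treated as an 8-byte integer.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the va_list record; field names follow
   the GIMPLE dump quoted in this thread. */
struct sim_va_list {
    unsigned int gp_offset;            /* offset of next GPR slot */
    unsigned char *overflow_arg_area;  /* next stack-passed argument */
    unsigned char *reg_save_area;      /* registers dumped by prologue */
};

/* Fetch the next 8-byte argument. */
static uint64_t sim_va_arg(struct sim_va_list *ap)
{
    unsigned char *addr;
    uint64_t value;
    unsigned int off = ap->gp_offset;       /* the read: _12 = p.gp_offset */

    if (off > 47) {
        /* All six 8-byte GPR slots consumed: take it from the stack. */
        addr = ap->overflow_arg_area;
        ap->overflow_arg_area = addr + 8;
    } else {
        addr = ap->reg_save_area + off;
        ap->gp_offset = off + 8;            /* the write: p.gp_offset = _16 */
    }
    memcpy(&value, addr, sizeof value);
    return value;
}
```

The read and the write of gp_offset sit next to each other here, which
is the shape the stdarg pass appears to expect. If the prologue never
fills reg_save_area, the first branch reads undefined data, matching
the valgrind report above.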


Re: limiting call clobbered registers for library functions

2015-01-29 Thread Richard Henderson
On 01/29/2015 02:08 AM, Paul Shortis wrote:
> I've ported GCC to a small 16 bit CPU that has single bit shifts. So I've
> handled variable / multi-bit shifts using a mix of inline shifts and calls to
> assembler support functions.
> 
> The calls to the asm library functions clobber only one (by const) or two
> (variable) registers but of course calling these functions causes all of the
> standard call clobbered registers to be considered clobbered, thus wasting 
> lots
> of candidate registers for use in expressions surrounding these shifts and
> causing unnecessary register saves in the surrounding function 
> prologue/epilogue.
> 
> I've scrutinized and cloned the actions of other ports that do the same,
> however I'm unable to convince the various passes that only r1 and r2 can be
> clobbered by these library calls.
> 
> Is anyone able to point me in the proper direction for a solution to this
> problem ?

You wind up writing a pattern that contains a call,
but isn't represented in rtl as a call.

The SH port does this for its shifts too.  See



(define_expand "ashlsi3"
  [(set (match_operand:SI 0 "arith_reg_operand" "")
(ashift:SI (match_operand:SI 1 "arith_reg_operand" "")
   (match_operand:SI 2 "shift_count_operand" "")))]
...
  /* Expand a library call for the dynamic shift.  */
  if (!CONST_INT_P (operands[2]) && !TARGET_DYNSHIFT)
{
  emit_move_insn (gen_rtx_REG (SImode, R4_REG), operands[1]);
  rtx funcaddr = gen_reg_rtx (Pmode);
  function_symbol (funcaddr, "__ashlsi3_r0", SFUNC_STATIC);
  emit_insn (gen_ashlsi3_d_call (operands[0], operands[2], funcaddr));

  DONE;
}
})

...

;; If dynamic shifts are not available use a library function.
;; By specifying the pattern we reduce the number of call clobbered regs.
;; In order to make combine understand the truncation of the shift amount
;; operand we have to allow it to use pseudo regs for the shift operands.
(define_insn "ashlsi3_d_call"
  [(set (match_operand:SI 0 "arith_reg_dest" "=z")
(ashift:SI (reg:SI R4_REG)
   (and:SI (match_operand:SI 1 "arith_reg_operand" "z")
   (const_int 31))))
   (use (match_operand:SI 2 "arith_reg_operand" "r"))
   (clobber (reg:SI T_REG))
   (clobber (reg:SI PR_REG))]
  "TARGET_SH1 && !TARGET_DYNSHIFT"
  "jsr  @%2%#"
  [(set_attr "type" "sfunc")
   (set_attr "needs_delay_slot" "yes")])


r~


Re: pass_stdarg problem when run after pass_lim

2015-01-29 Thread Richard Biener
On January 29, 2015 6:25:35 PM CET, Jakub Jelinek  wrote:
>On Thu, Jan 29, 2015 at 06:19:45PM +0100, Tom de Vries wrote:
>> consider attached patch, which adds pass_lim after fre1 (a
>simplification of
>> my oacc kernels patch series).
>> 
>> The included testcase lim-before-stdarg.c fails.
>> 
>> The first sign of trouble is in lim-before-stdarg.c.088t.stdarg
>(attached):
>> ...
>> gen_rtvec: va_list escapes 0, needs to save 0 GPR units and 0 FPR
>units.
>> ...
>> 
>> Because of the 'need to save 0 GPRs units', at expand no prologue is
>> generated to dump the varargs in registers onto stack.
>
>The stdarg pass can't grok too heavy optimizations, so if at all
>possible,
>don't schedule such passes early, and if you for some reason do, avoid
>optimizing in there the va_list related accesses.  I'm afraid that is
>the
>only recommendation I can give here for that.

The other possibility (Matz has patches for that) is to delay vaarg lowering 
currently done by gimplification and combine it with the stdarg pass.

Richard.

>   Jakub




Re: pass_stdarg problem when run after pass_lim

2015-01-29 Thread Jakub Jelinek
On Thu, Jan 29, 2015 at 07:44:29PM +0100, Richard Biener wrote:
> On January 29, 2015 6:25:35 PM CET, Jakub Jelinek  wrote:
> >On Thu, Jan 29, 2015 at 06:19:45PM +0100, Tom de Vries wrote:
> >> consider attached patch, which adds pass_lim after fre1 (a
> >simplification of
> >> my oacc kernels patch series).
> >> 
> >> The included testcase lim-before-stdarg.c fails.
> >> 
> >> The first sign of trouble is in lim-before-stdarg.c.088t.stdarg
> >(attached):
> >> ...
> >> gen_rtvec: va_list escapes 0, needs to save 0 GPR units and 0 FPR
> >units.
> >> ...
> >> 
> >> Because of the 'need to save 0 GPRs units', at expand no prologue is
> >> generated to dump the varargs in registers onto stack.
> >
> >The stdarg pass can't grok too heavy optimizations, so if at all
> >possible,
> >don't schedule such passes early, and if you for some reason do, avoid
> >optimizing in there the va_list related accesses.  I'm afraid that is
> >the
> >only recommendation I can give here for that.
> 
> The other possibility (Matz has patches for that) is to delay vaarg lowering 
> currently done by gimplification and combine it with the stdarg pass.

Yeah, that should work too.  But stage1 material probably.

Jakub
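
To see why the stdarg pass loses track of the counter, it may help to
look at the effect of loop invariant motion with store motion in plain
C. This is only an illustration with hypothetical names (struct counter,
bump_in_loop, bump_hoisted), not what GCC literally emits; in
particular, the write-back after the loop is part of the sketch, while
the dump quoted earlier only shows the hoisted read.

```c
#include <stddef.h>

struct counter {
    unsigned int gp_offset;
};

/* Loop as written: the counter is read and written in every
   iteration, so both references are visible inside the loop. */
static unsigned int bump_in_loop(struct counter *c, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        unsigned int off = c->gp_offset;  /* read inside the loop */
        c->gp_offset = off + 8;           /* write inside the loop */
    }
    return c->gp_offset;
}

/* After store motion: one read before the loop, a local temporary
   inside it, and at most one write after it. */
static unsigned int bump_hoisted(struct counter *c, size_t n)
{
    unsigned int lsm = c->gp_offset;      /* the only read */
    for (size_t i = 0; i < n; i++)
        lsm += 8;                         /* no memory reference here */
    c->gp_offset = lsm;                   /* single write-back */
    return lsm;
}
```

Both functions compute the same value, but in the second one a pass
that looks for paired 'temp = ap[0].field' and 'ap[0].field = temp'
statements in the loop finds nothing to count.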


Re: limiting call clobbered registers for library functions

2015-01-29 Thread Jeff Law

On 01/29/15 10:32, Richard Henderson wrote:
> On 01/29/2015 02:08 AM, Paul Shortis wrote:
>> I've ported GCC to a small 16 bit CPU that has single bit shifts. So I've
>> handled variable / multi-bit shifts using a mix of inline shifts and calls to
>> assembler support functions.
>>
>> The calls to the asm library functions clobber only one (by const) or two
>> (variable) registers but of course calling these functions causes all of the
>> standard call clobbered registers to be considered clobbered, thus wasting lots
>> of candidate registers for use in expressions surrounding these shifts and
>> causing unnecessary register saves in the surrounding function
>> prologue/epilogue.
>>
>> I've scrutinized and cloned the actions of other ports that do the same,
>> however I'm unable to convince the various passes that only r1 and r2 can be
>> clobbered by these library calls.
>>
>> Is anyone able to point me in the proper direction for a solution to this
>> problem ?
>
> You wind up writing a pattern that contains a call,
> but isn't represented in rtl as a call.
>
> The SH port does this for its shifts too.  See

Richard is precisely correct, you can make a call in the asm output of a 
pattern, but not express the call in the RTL.


The trick here is expressing how parameters are passed and any clobbers. 
 The example that comes to my mind is the PSI extensions in the mn102 
port (long deprecated).


;; The last alternative is necessary because the second operand might
;; have been the frame pointer.  The frame pointer would get replaced
;; by (plus (stack_pointer) (const_int)).
;;
;; Reload would think that it only needed a PSImode register in
;; push_reload and at the start of allocate_reload_regs.  However,
;; at the end of allocate_reload_reg it would realize that the
;; reload register must also be valid for SImode, and if it was
;; not valid reload would abort.
(define_insn "zero_extendpsisi2"
  [(set (match_operand:SI 0 "register_operand" "=d,?d,?*d,?*d")
(zero_extend:SI (match_operand:PSI 1 "extendpsi_operand"
"m,?0,?*dai,Q")))]
  ""
  "@
  mov %L1,%L0\;movbu %H1,%H0
  jsr ___zero_extendpsisi2_%0
  mov %1,%L0\;jsr ___zero_extendpsisi2_%0
  mov a3,%L0\;add %Z1,%L0\;jsr ___zero_extendpsisi2_%0"
  [(set_attr "cc" "clobber")])


Except for the memory->register case these extensions were implemented 
by a library call.  But we don't express the call in RTL and the library 
call did not follow the usual mn102 calling conventions.  In fact, we 
created a specialized ABI for the extension calls.


If you look closely, you'll probably realize we actually had multiple 
zero_extendpsisi2 routines in libgcc (4 in total).  One with d0 as its 
input & result, another with d1, another with d2 and last one with d3.


That avoided lots of register shuffling, but without bloating the 
library with 16 variants had we allowed the input and output operand to 
be in different registers.


We did similar things for psi->si sign extensions, some truncations, 
some negations, prologue/epilogue.


https://stuff.mit.edu/afs/athena/astaff/source/src-8.2/third/gcc/config/mn10200/mn10200.md

Jeff


Re: pass_stdarg problem when run after pass_lim

2015-01-29 Thread Tom de Vries

On 29-01-15 18:25, Jakub Jelinek wrote:
> The stdarg pass can't grok too heavy optimizations, so if at all possible,
> don't schedule such passes early, and if you for some reason do, avoid
> optimizing in there the va_list related accesses.

This patch works for the example.

In pass_lim1, I get:
...
;; Function gen_rtvec (gen_rtvec, funcdef_no=1, decl_uid=1841, cgraph_uid=1, 
symbol_order=1)


va_list_related_stmt_p: no simple_mem_ref
_15 = p.gp_offset;
va_list_related_stmt_p: no simple_mem_ref
_16 = p.reg_save_area;
va_list_related_stmt_p: no simple_mem_ref
p.gp_offset = _21;
va_list_related_stmt_p: no simple_mem_ref
_23 = p.overflow_arg_area;
va_list_related_stmt_p: no simple_mem_ref
p.overflow_arg_area = _25;
va_list_related_stmt_p: MOVE_IMPOSSIBLE
_15 = p.gp_offset;
va_list_related_stmt_p: MOVE_IMPOSSIBLE
_16 = p.reg_save_area;
va_list_related_stmt_p: MOVE_IMPOSSIBLE
_23 = p.overflow_arg_area;
gen_rtvec (int n)
...

Thanks,
- Tom
Handle va_list conservatively in pass_lim

---
 gcc/tree-ssa-loop-im.c | 47 +++
 1 file changed, 47 insertions(+)

diff --git a/gcc/tree-ssa-loop-im.c b/gcc/tree-ssa-loop-im.c
index 9aba79b..2520fa2 100644
--- a/gcc/tree-ssa-loop-im.c
+++ b/gcc/tree-ssa-loop-im.c
@@ -70,6 +70,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-ssa-propagate.h"
 #include "trans-mem.h"
 #include "gimple-fold.h"
+#include "target.h"
+#include "gimple-walk.h"
 
 /* TODO:  Support for predicated code motion.  I.e.
 
@@ -289,6 +291,32 @@ enum move_pos
   };
 
 
+static tree
+va_list_related_tree_p (tree *t, int *walk_subtrees ATTRIBUTE_UNUSED,
+			void *data ATTRIBUTE_UNUSED)
+{
+  tree cfun_va_list = targetm.fn_abi_va_list (cfun->decl);
+  tree c1, c2, type;
+  if (!DECL_P (*t))
+return NULL_TREE;
+  type = TREE_TYPE (*t);
+  c1 = TYPE_CANONICAL (type);
+  c2 = TYPE_CANONICAL(cfun_va_list);
+
+  if (c1 == c2)
+return *t;
+
+  return NULL_TREE;
+}
+
+bool
+va_list_related_stmt_p (gimple stmt)
+{
+  gimple_stmt_iterator gsi = gsi_for_stmt (stmt);
+  tree res = walk_gimple_stmt (&gsi, NULL, va_list_related_tree_p, NULL);
+  return res != NULL_TREE;
+}
+
 /* If it is possible to hoist the statement STMT unconditionally,
returns MOVE_POSSIBLE.
If it is possible to hoist the statement STMT, but we must avoid making
@@ -384,6 +412,15 @@ movement_possibility (gimple stmt)
 	}
 }
 
+  if (va_list_related_stmt_p (stmt))
+{
+  if (dump_file)
+	{
+	  fprintf (dump_file, "va_list_related_stmt_p: MOVE_IMPOSSIBLE\n");
+	  print_gimple_stmt (dump_file, stmt, 2, 0);
+	}
+  return MOVE_IMPOSSIBLE;
+}
   return ret;
 }
 
@@ -593,6 +630,16 @@ simple_mem_ref_in_stmt (gimple stmt, bool *is_store)
   if (!gimple_assign_single_p (stmt))
 return NULL;
 
+  if (va_list_related_stmt_p (stmt))
+{
+  if (dump_file)
+	{
+	  fprintf (dump_file, "va_list_related_stmt_p: no simple_mem_ref\n");
+	  print_gimple_stmt (dump_file, stmt, 2, 0);
+	}
+  return NULL;
+}
+
   lhs = gimple_assign_lhs_ptr (stmt);
   rhs = gimple_assign_rhs1_ptr (stmt);
 
-- 
1.9.1



gcc-4.8-20150129 is now available

2015-01-29 Thread gccadmin
Snapshot gcc-4.8-20150129 is now available on
  ftp://gcc.gnu.org/pub/gcc/snapshots/4.8-20150129/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 4.8 SVN branch
with the following options: svn://gcc.gnu.org/svn/gcc/branches/gcc-4_8-branch 
revision 220264

You'll find:

 gcc-4.8-20150129.tar.bz2 Complete GCC

  MD5=58765fca6938b9f52c18c8ec03a0dd4f
  SHA1=c857ce08337066c3c65cb3fa0e6f1e71a093e748

Diffs from 4.8-20150122 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-4.8
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.


value not set via reference

2015-01-29 Thread Conrad S
Which compiler is correct here - gcc or clang?
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64870

Consider the following code:

#include <iostream>

struct blah {
  inline double setval(unsigned int& x) const
{
x = 123;
return 456.0;
}
  };


int
main(int argc, char** argv) {
  blah blah_instance;

  unsigned int val = ;

  std::cout << blah_instance.setval(val) << "  val: " << val << std::endl;
  std::cout << blah_instance.setval(val) << "  val: " << val << std::endl;

  return 0;
  }

when compiled with gcc 4.9.2, the above program produces:
456  val:   <-- unexpected
456  val: 123

when compiled with clang 3.5:
456  val: 123
456  val: 123

Clang has the least surprising result. Is gcc relying on a loophole in
C++ legalese to muck up the order of evaluation?


Re: value not set via reference

2015-01-29 Thread James Dennett
On Thu, Jan 29, 2015 at 10:38 PM, Conrad S  wrote:
>
> Which compiler is correct here - gcc or clang?

Both compilers are correct.

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64870
>
> Consider the following code:
>
> #include <iostream>
>
> struct blah {
>   inline double setval(unsigned int& x) const
> {
> x = 123;
> return 456.0;
> }
>   };
>
>
> int
> main(int argc, char** argv) {
>   blah blah_instance;
>
>   unsigned int val = ;
>
>   std::cout << blah_instance.setval(val) << "  val: " << val << std::endl;
>   std::cout << blah_instance.setval(val) << "  val: " << val << std::endl;
>
>   return 0;
>   }
>
> when compiled with gcc 4.9.2, the above program produces:
> 456  val:   <-- unexpected
> 456  val: 123
>
> when compiled with clang 3.5:
> 456  val: 123
> 456  val: 123
>
> Clang has the least surprising result. Is gcc relying on a loophole in
> C++ legalese to muck up the order of evaluation?

It's hardly just a loophole: C++ doesn't specify the order of evaluation,
so the code is wrong (i.e., non-portable, as you've found).

Arguably this is a design problem with IOStreams, given how tempting it can
be to write code that assumes left-to-right evaluation, but it's not a
compiler bug.

-- James
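
The same pitfall exists in C, where the order in which function
arguments are evaluated is unspecified. The sketch below is a C analogue
of the program above, not a translation of it; format_sequenced is a
hypothetical helper, and snprintf stands in for the stream insertion.

```c
#include <stddef.h>
#include <stdio.h>

/* C counterpart of blah::setval: writes through the pointer and
   returns an unrelated value. */
static double setval(unsigned int *x)
{
    *x = 123;
    return 456.0;
}

/* Portable version: the call is a separate statement, so its side
   effect on *val is complete before *val is read. */
static int format_sequenced(char *buf, size_t size, unsigned int *val)
{
    /* Non-portable alternative, shown for contrast only:
         snprintf(buf, size, "%g  val: %u", setval(val), *val);
       Here *val may be read before or after setval() runs. */
    double result = setval(val);
    return snprintf(buf, size, "%g  val: %u", result, *val);
}
```

Splitting the expression into two statements gives the same output with
any conforming compiler, which is the usual way to avoid depending on
evaluation order.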


Re: value not set via reference

2015-01-29 Thread Conrad S
On 30 January 2015 at 16:58, James Dennett wrote:
> It's hardly just a loophole: C++ doesn't specify the order of evaluation,
> so the code is wrong (i.e., non-portable, as you've found).
>
> Arguably this is a design problem with IOStreams, given how
> tempting it can be to write code that assumes left-to-right evaluation,
> but it's not a compiler bug.

Okay, but what is the reason for changing the "expected"
(left-to-right) order of evaluation?  Is there an optimisation
benefit?

If not, why change the order to something unexpected?


forcing to emit absolute addresses in the .debug_loc section

2015-01-29 Thread Umesh Kalappa
Hi Guys,

I am very new to the DWARF debugging format. We recently migrated our
GCC toolchain from 4.5.2 to 4.8.3, still using the same binutils
2.23.51.

We are seeing a weird issue with the .debug_loc entries: the assembler
fails with the errors below.


/tmp/ccUj1tbg.s: Assembler messages:
/tmp/ccUj1tbg.s:778: Error: can't resolve `.LVL0' {.text section} -
`.text._ZN10__cxxabiv117__array_type_infoD2Ev' {*UND* section}
/tmp/ccUj1tbg.s:779: Error: can't resolve `.LVL1' {.text section} -
`.text._ZN10__cxxabiv117__array_type_infoD2Ev' {*UND* section}
/tmp/ccUj1tbg.s:782: Error: can't resolve `.LVL1' {.text section} -
`.text._ZN10__cxxabiv117__array_type_infoD2Ev' {*UND* section}

The corresponding .debug_loc entries:

.section.debug_loc,info
.Ldebug_loc0:
.LLST0:
.4byte  .LVL0-.text._ZN10__cxxabiv117__array_type_infoD2Ev
.4byte  .LVL1-1-.text._ZN10__cxxabiv117__array_type_infoD2Ev
.2byte  0x1
.byte   0x54
.4byte  .LVL1-1-.text._ZN10__cxxabiv117__array_type_infoD2Ev
.4byte  .LFE72-.text._ZN10__cxxabiv117__array_type_infoD2Ev
.2byte  0x4
.byte   0xf3
.uleb128 0x1
.byte   0x54
.byte   0x9f
.4byte  0
.4byte  0


Googling the issue turned up nothing :( After going through the DWARF
format, we found that the .debug_loc entries above are relative, not
absolute. Please correct me if that assumption is wrong. Also, we need
to stick to the DWARF 2 format rather than 3, 4, or 5.

Second, we tweaked the compiler to force it to generate absolute
addresses, like this:

static void
output_loc_list (dw_loc_list_ref list_head)
{

  else if (/*!have_multiple_function_sections*/0) // our hacked thing, and weird too
{
  dw2_asm_output_delta (DWARF2_ADDR_SIZE, curr->begin, curr->section,
"Location list begin address (%s)",
list_head->ll_symbol);
  dw2_asm_output_delta (DWARF2_ADDR_SIZE, curr->end, curr->section,
"Location list end address (%s)",
list_head->ll_symbol);
}
  else
{
  dw2_asm_output_addr (DWARF2_ADDR_SIZE, curr->begin,
   "Location list begin address (%s)",
   list_head->ll_symbol);
  dw2_asm_output_addr (DWARF2_ADDR_SIZE, curr->end,
   "Location list end address (%s)",
   list_head->ll_symbol);
}

}

now the .debug_loc section looks like

.section.debug_loc,info
.Ldebug_loc0:
.LLST0:
.4byte  .LVL0
.4byte  .LVL1-1
.2byte  0x1
.byte   0x54
.4byte  .LVL1-1
.4byte  .LFE72
.2byte  0x4
.byte   0xf3
.uleb128 0x1
.byte   0x54
.byte   0x9f
.4byte  0
.4byte  0

Now everything works, but we are still looking for the cause and a
proper fix.


So please share any insights, suggestions, or comments on this.

Thank you
~Umesh
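
For reference, the layout of the listing above (two 4-byte addresses, a
2-byte expression length, the expression bytes, and a final pair of
zeros) can be walked as in the sketch below. It assumes 4-byte addresses
and little-endian data; the helper names are hypothetical, and the
numeric addresses used to exercise it would be made up, since the real
values of .LVL0 and friends are only known after assembly.

```c
#include <stddef.h>
#include <stdint.h>

/* Little-endian readers; byte order is an assumption of this sketch. */
static uint32_t read_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

static unsigned int read_le16(const unsigned char *p)
{
    return (unsigned int)p[0] | (unsigned int)p[1] << 8;
}

/* Count the location entries in one list.
   Returns -1 if the data ends before the terminating pair of zeros. */
static int count_loc_entries(const unsigned char *p, size_t size)
{
    size_t pos = 0;
    int count = 0;

    for (;;) {
        uint32_t begin, end;
        unsigned int len;

        if (size - pos < 8)
            return -1;
        begin = read_le32(p + pos);
        end = read_le32(p + pos + 4);
        pos += 8;
        if (begin == 0 && end == 0)
            return count;           /* end-of-list marker */
        if (begin == UINT32_MAX)
            continue;               /* base address selection (DWARF 3+) */
        if (size - pos < 2)
            return -1;
        len = read_le16(p + pos);   /* length of the location expression */
        pos += 2;
        if (size - pos < len)
            return -1;
        pos += len;                 /* skip the expression bytes */
        count++;
    }
}
```

Applied to a byte image of the list shown above, the walker would
report two entries, one with a 1-byte and one with a 4-byte expression.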


Re: limiting call clobbered registers for library functions

2015-01-29 Thread Yury Gribov

On 01/29/2015 08:32 PM, Richard Henderson wrote:
> On 01/29/2015 02:08 AM, Paul Shortis wrote:
>> I've ported GCC to a small 16 bit CPU that has single bit shifts. So I've
>> handled variable / multi-bit shifts using a mix of inline shifts and calls to
>> assembler support functions.
>>
>> The calls to the asm library functions clobber only one (by const) or two
>> (variable) registers but of course calling these functions causes all of the
>> standard call clobbered registers to be considered clobbered, thus wasting lots
>> of candidate registers for use in expressions surrounding these shifts and
>> causing unnecessary register saves in the surrounding function
>> prologue/epilogue.
>>
>> I've scrutinized and cloned the actions of other ports that do the same,
>> however I'm unable to convince the various passes that only r1 and r2 can be
>> clobbered by these library calls.
>>
>> Is anyone able to point me in the proper direction for a solution to this
>> problem ?
>
> You wind up writing a pattern that contains a call,
> but isn't represented in rtl as a call.


Could it be useful to provide a pragma for specifying function register 
usage? This would allow e.g. a library writer to write a hand-optimized 
assembly version and then inform the compiler of its binary interface.

Currently a surrogate of this can be achieved by putting inline asm code 
in static inline functions in public library headers, but this has its 
own disadvantages (e.g. code bloat).


-Y
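
A minimal sketch of the surrogate mentioned above: a static inline
wrapper whose inline asm lists exactly what it clobbers, so the
compiler does not assume the full call-clobbered set. Everything here is
an assumption for illustration. shl16 is a hypothetical name, and
x86-64 (GCC or Clang, AT&T syntax) is used only because it is widely
available; it is not the 16-bit CPU from the original question.

```c
#include <stdint.h>

/* Shift a 16-bit value left by a variable count (taken modulo 16). */
static inline uint16_t shl16(uint16_t value, unsigned int count)
{
#if defined(__GNUC__) && defined(__x86_64__)
    uint32_t v = value;
    /* "+r": v is read and written in a register of the compiler's
       choice; "c": the count is placed in ecx, whose low byte cl is
       what the instruction uses; the only extra clobber is the flags. */
    __asm__("shll %%cl, %0" : "+r"(v) : "c"(count & 15u) : "cc");
    return (uint16_t)v;
#else
    /* Plain C fallback for other targets. */
    return (uint16_t)((uint32_t)value << (count & 15u));
#endif
}
```

Because the asm body is expanded at every use, this trades code size for
register allocation freedom, which is the disadvantage noted above.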