Re: -mcx16 vs. not using CAS for atomic loads

2017-01-24 Thread Torvald Riegel
On Fri, 2017-01-20 at 09:55 -0800, Richard Henderson wrote:
> On 01/19/2017 10:23 AM, Torvald Riegel wrote:
> > I think I prefer Option 3b as the short-term solution.  It does not
> > break programs (except the __atomic_always_lock_free assertion scenario,
> > but that's likely to not work anyway given that the atomics will be
> > lock-free but not "fast").  It makes programs aware that the atomics
> > will not be fast when they are not fast indeed (ie, when getting loads
> > through cmpxchg).
> 
> I agree.  Let's go through the library for the loads, giving us a hook to fix 
> this in the future.

I'm working on a patch for this.
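
For readers following along, here is a minimal sketch of what "getting
loads through cmpxchg" means.  It uses a 64-bit value so that it builds
without -mcx16 or libatomic; the 16-byte case discussed in this thread has
the same shape.  The helper name load_via_cas is hypothetical and is not
taken from GCC or libatomic.

```c
#include <stdint.h>

/* Hypothetical helper: read *p using only a compare-and-swap.
   The CAS compares *p with 0 and, if equal, stores 0 again (no visible
   change).  If *p is not 0, the CAS fails and copies the current value
   of *p into 'expected'.  Either way 'expected' ends up holding *p.  */
static uint64_t load_via_cas(uint64_t *p)
{
    uint64_t expected = 0;
    __atomic_compare_exchange_n(p, &expected, 0, 0 /* strong */,
                                __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
    return expected;
}
```

Note that the pointer is not const: a CAS-based load needs write access to
the memory and contends with other accesses like a store would, which is
why such a load is lock-free but not "fast".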

> > I'm worried that Option 4 would not be possible until some time in the
> > future when we have actually gotten confirmation from the HW vendors
> > about 16-byte atomic loads.  The additional risk is that we may never
> > get such a confirmation (eg, because they do not want to constrain
> > future HW), or that this actually holds just for a few processors.
> 
> Indeed, I don't think we'll get any proper confirmation from the hw vendors
> any time soon.  Or possibly ever.
> 
> The only light on the horizon that I can see is that HTM is now working in 
> newly shipping Intel processors, and we could write a pure load path through 
> libatomic that uses that.  Over time the lack of guaranteed SSE atomicity 
> becomes less relevant.

Unless HW transactions are guaranteed to succeed for scenarios that are
sufficient for the atomics, HTM won't help because we'd have to consider
the worst-case, which would mean some non-HTM fallback.
Intel's current HTM does not make guarantees; IIRC, either Power or s390
have an HTM mode in which there are guarantees, provided that the user
follows a few rules.



implicit-fallthrough warnings in powerpc64le-linux GCC build

2017-01-24 Thread Sebastian Huber

Hello,

I noticed some implicit-fallthrough warnings in a powerpc64le-linux GCC 
build:


/home/sh/b-gcc-git/./gcc/xgcc -B/home/sh/b-gcc-git/./gcc/ 
-B/home/sh/install-gcc-git/powerpc64le-unknown-linux-gnu/bin/ 
-B/home/sh/install-gcc-git/powerpc64le-unknown-linux-gnu/lib/ -isystem 
/home/sh/install-gcc-git/powerpc64le-unknown-linux-gnu/include -isystem 
/home/sh/install-gcc-git/powerpc64le-unknown-linux-gnu/sys-include -g 
-O2 -O2  -g -O2 -DIN_GCC -W -Wall -Wno-narrowing -Wwrite-strings 
-Wcast-qual -Wno-format -Wstrict-prototypes -Wmissing-prototypes 
-Wold-style-definition  -isystem ./include -fPIC -mlong-double-128 
-mno-minimal-toc -g -DIN_LIBGCC2 -fbuilding-libgcc 
-fno-stack-protector   -fPIC -mlong-double-128 -mno-minimal-toc -I. -I. 
-I../.././gcc -I/home/sh/gcc-git/libgcc -I/home/sh/gcc-git/libgcc/. 
-I/home/sh/gcc-git/libgcc/../gcc -I/home/sh/gcc-git/libgcc/../include 
-I/home/sh/gcc-git/libgcc/../libdecnumber/dpd 
-I/home/sh/gcc-git/libgcc/../libdecnumber -DHAVE_CC_TLS  -o _ucmpdi2.o 
-MT _ucmpdi2.o -MD -MP -MF _ucmpdi2.dep -DL_ucmpdi2 -c 
/home/sh/gcc-git/libgcc/libgcc2.c -fvisibility=hidden -DHIDE_EXPORTS
/home/sh/gcc-git/libgcc/soft-fp/op-common.h:900:10: warning: this 
statement may fall through [-Wimplicit-fallthrough=]

R##_s = X##_s; \
/home/sh/gcc-git/libgcc/soft-fp/quad.h:308:29: note: in expansion of 
macro ‘_FP_MUL’

 # define FP_MUL_Q(R, X, Y)  _FP_MUL (Q, 2, R, X, Y)
 ^~~
mulkf3-sw.c:48:3: note: in expansion of macro ‘FP_MUL_Q’
   FP_MUL_Q (R, A, B);
   ^~~~
/home/sh/gcc-git/libgcc/soft-fp/op-common.h:902:2: note: here
  case _FP_CLS_COMBINE (FP_CLS_INF, FP_CLS_INF):  \
  ^
/home/sh/gcc-git/libgcc/soft-fp/quad.h:308:29: note: in expansion of 
macro ‘_FP_MUL’

 # define FP_MUL_Q(R, X, Y)  _FP_MUL (Q, 2, R, X, Y)
 ^~~
mulkf3-sw.c:48:3: note: in expansion of macro ‘FP_MUL_Q’
   FP_MUL_Q (R, A, B);
   ^~~~
/home/sh/gcc-git/libgcc/soft-fp/op-common.h:913:10: warning: this 
statement may fall through [-Wimplicit-fallthrough=]

R##_s = Y##_s; \
/home/sh/gcc-git/libgcc/soft-fp/quad.h:308:29: note: in expansion of 
macro ‘_FP_MUL’

 # define FP_MUL_Q(R, X, Y)  _FP_MUL (Q, 2, R, X, Y)
 ^~~
mulkf3-sw.c:48:3: note: in expansion of macro ‘FP_MUL_Q’
   FP_MUL_Q (R, A, B);
   ^~~~
/home/sh/gcc-git/libgcc/soft-fp/op-common.h:915:2: note: here
  case _FP_CLS_COMBINE (FP_CLS_NORMAL, FP_CLS_INF): \
  ^
/home/sh/gcc-git/libgcc/soft-fp/quad.h:308:29: note: in expansion of 
macro ‘_FP_MUL’

 # define FP_MUL_Q(R, X, Y)  _FP_MUL (Q, 2, R, X, Y)
 ^~~
mulkf3-sw.c:48:3: note: in expansion of macro ‘FP_MUL_Q’
   FP_MUL_Q (R, A, B);
   ^~~

I don't know this code well enough to fix them.
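
As a generic illustration of this warning (not the actual soft-fp code and
not a proposed fix), the sketch below shows a case that deliberately falls
through into the next one and marks that with GCC's fallthrough statement
attribute.  The macro name FALLTHROUGH and the function combine_class are
made up for this example.

```c
/* GCC 7 and later accept the 'fallthrough' statement attribute; other
   compilers get a harmless empty statement here.  */
#if defined(__GNUC__) && __GNUC__ >= 7
# define FALLTHROUGH __attribute__ ((fallthrough))
#else
# define FALLTHROUGH ((void) 0)
#endif

/* Hypothetical classifier: case 0 sets a flag and then intentionally
   continues into case 1.  Without the marker, -Wimplicit-fallthrough
   reports "this statement may fall through" on the assignment.  */
static int combine_class(int cls)
{
    int sign = 0;

    switch (cls) {
    case 0:
        sign = 1;
        FALLTHROUGH;
    case 1:
        return sign + 10;
    default:
        return -1;
    }
}
```

Whether a marker or a break is correct at op-common.h:900 and :913 depends
on what the soft-fp authors intended, which this sketch does not settle.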

--
Sebastian Huber, embedded brains GmbH

Address : Dornierstr. 4, D-82178 Puchheim, Germany
Phone   : +49 89 189 47 41-16
Fax : +49 89 189 47 41-09
E-Mail  : sebastian.hu...@embedded-brains.de
PGP : Public key available on request.

This message is not a business communication within the meaning of the EHUG.



IEEE 128-bit floating point support for PowerPC RTEMS

2017-01-24 Thread Sebastian Huber

Hello,

some time ago IEEE 128-bit floating point support for PowerPC was added 
to GCC:


https://gcc.gnu.org/wiki/Ieee128PowerPC

I noticed some issues for RTEMS in this area. Firstly, RTEMS had no
__powerpc__ builtin define, so some source files were effectively
disabled, e.g. ibm-ldouble.c. With __powerpc__ defined, ibm-ldouble.c
didn't compile due to:


In file included from 
/home/EB/sebastian_h/archive/gcc-git/libgcc/config/rs6000/ibm-ldouble.c:374:0:
/home/EB/sebastian_h/archive/gcc-git/libgcc/soft-fp/quad.h:72:1: error: 
unable to emulate 'TF'

  typedef float TFtype __attribute__ ((mode (TF)));
  ^~~

I added

#define RS6000_DEFAULT_LONG_DOUBLE_SIZE 128

#undef TARGET_IEEEQUAD
#define TARGET_IEEEQUAD 1

This fixed the problem above and changed the long double type from 8 
bytes to 16 bytes.


The new compiler now defines (powerpc-rtems target):

#define __LONG_DOUBLE_128__ 1
#define __LONGDOUBLE128 1
#define __LONG_DOUBLE_IEEE128__ 1
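
A small check like the following can confirm from C which long double
format a given compiler configuration ends up with.  It is only a sketch:
the function names are made up, and the mapping relies on the standard
<float.h> value LDBL_MANT_DIG (113 for IEEE binary128, 106 for IBM
double-double, 64 for x87 extended, 53 for a long double that is the same
as double).

```c
#include <float.h>

/* Hypothetical helper: name the long double format by its mantissa width. */
static const char *long_double_format(void)
{
#if LDBL_MANT_DIG == 113
    return "IEEE binary128";
#elif LDBL_MANT_DIG == 106
    return "IBM double-double";
#elif LDBL_MANT_DIG == 64
    return "x87 extended";
#elif LDBL_MANT_DIG == 53
    return "IEEE binary64 (same as double)";
#else
    return "unknown";
#endif
}

/* Reports whether the compiler predefines the macro listed above. */
static int has_ieee128_macro(void)
{
#ifdef __LONG_DOUBLE_IEEE128__
    return 1;
#else
    return 0;
#endif
}
```

With the patched powerpc-rtems compiler one would expect "IEEE binary128"
and sizeof (long double) == 16, matching the change from 8 to 16 bytes.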

However, the libgcc multilib build fails due to several ICEs. See 
attached errors.log.


Is this supposed to work for 32-bit PowerPC? Did I miss some magic 
configuration switch?


--
Sebastian Huber, embedded brains GmbH

Address : Dornierstr. 4, D-82178 Puchheim, Germany
Phone   : +49 89 189 47 41-16
Fax : +49 89 189 47 41-09
E-Mail  : sebastian.hu...@embedded-brains.de
PGP : Public key available on request.

This message is not a business communication within the meaning of the EHUG.


From 23951fde45ff6b81afd2432866166b0f43401401 Mon Sep 17 00:00:00 2001
From: Sebastian Huber 
Date: Tue, 24 Jan 2017 11:20:22 +0100
Subject: [PATCH] Enable IEEE 754R 128-bit FP for powerpc-rtems

---
 gcc/config/rs6000/rtems.h | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/gcc/config/rs6000/rtems.h b/gcc/config/rs6000/rtems.h
index 54a36de..449de0f 100644
--- a/gcc/config/rs6000/rtems.h
+++ b/gcc/config/rs6000/rtems.h
@@ -25,6 +25,7 @@
   do  \
 { \
   builtin_define_std ("PPC"); \
+  builtin_define_std ("powerpc"); \
   builtin_define ("__rtems__");   \
   builtin_define ("__USE_INIT_FINI__"); \
   builtin_assert ("system=rtems");\
@@ -58,3 +59,8 @@
 #undef  SUBSUBTARGET_EXTRA_SPECS
 #define SUBSUBTARGET_EXTRA_SPECS \
   { "cpp_os_rtems",		CPP_OS_RTEMS_SPEC }
+
+#define RS6000_DEFAULT_LONG_DOUBLE_SIZE 128
+
+#undef TARGET_IEEEQUAD
+#define TARGET_IEEEQUAD 1
-- 
1.8.4.5


make[4]: Entering directory `/build/powerpc-rtems4.12/m403/libgcc'
# If this is the top-level multilib, build all the other
# multilibs.
/build/./gcc/xgcc -B/build/./gcc/ -nostdinc -B/build/powerpc-rtems4.12/newlib/ 
-isystem /build/powerpc-rtems4.12/newlib/targ-include -isystem 
/gcc/newlib/libc/include -B/opt/rtems-4.12/powerpc-rtems4.12/bin/ 
-B/opt/rtems-4.12/powerpc-rtems4.12/lib/ -isystem 
/opt/rtems-4.12/powerpc-rtems4.12/include -isystem 
/opt/rtems-4.12/powerpc-rtems4.12/sys-include -g -O2 -mcpu=403 -O2 
-I/gcc/libgcc/../newlib/libc/sys/rtems/include -g -O2 -DIN_GCC  
-DCROSS_DIRECTORY_STRUCTURE  -W -Wall -Wno-narrowing -Wwrite-strings 
-Wcast-qual -Wstrict-prototypes -Wmissing-prototypes -Wold-style-definition  
-isystem ./include   -g -DIN_LIBGCC2 -fbuilding-libgcc -fno-stack-protector 
-Dinhibit_libc  -I. -I. -I../../.././gcc -I/gcc/libgcc -I/gcc/libgcc/. 
-I/gcc/libgcc/../gcc -I/gcc/libgcc/../include  -DHAVE_CC_TLS  -o _powitf2.o -MT 
_powitf2.o -MD -MP -MF _powitf2.dep -DL_powitf2 -c /gcc/libgcc/libgcc2.c 
-fvisibility=hidden -DHIDE_EXPORTS
/gcc/libgcc/libgcc2.c: In function '__powitf2':
/gcc/libgcc/libgcc2.c:1882:9: internal compiler error: Segmentation fault
   x = x * x;
   ~~^~~
0x9fa37f crash_signal
/gcc/gcc/toplev.c:333
0x72d1a3 assign_temp(tree_node*, int, int)
/gcc/gcc/function.c:956
0x6c9f56 emit_push_insn(rtx_def*, machine_mode, tree_node*, rtx_def*, unsigned 
int, int, rtx_def*, int, rtx_def*, rtx_def*, int, rtx_def*, bool)
/gcc/gcc/expr.c:4314
0x59ceb0 emit_library_call_value_1
/gcc/gcc/calls.c:4838
0x5a3b81 emit_library_call_value(rtx_def*, rtx_def*, libcall_type, 
machine_mode, int, ...)
/gcc/gcc/calls.c:5159
0x90763b expand_binop(machine_mode, optab_tag, rtx_def*, rtx_def*, rtx_def*, 
int, optab_methods)
/gcc/gcc/optabs.c:1758
0x6b020b expand_mult(machine_mode, rtx_def*, rtx_def*, rtx_def*, int)
/gcc/gcc/expmed.c:3358
0x6d559b expand_expr_real_2(separate_ops*, rtx_def*, machine_mode, 
expand_modifier)
/gcc/gcc/expr.c:8792
0x5b3291 expand_gimple_stmt_1
/gcc/gcc/cfgexpand.c:3677
0x5b3291 expand_gimple_stmt
/gcc/gcc/cfgexpand.c:3737
0x5b5399 expand_gimple_basic_block
/gcc/gcc/cfgexpand.c:5744
0x5badf6 execute
/gcc/gcc/cfgexpand.c:6357
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See  for instructions.
make[4]: [_powitf2.o] E

Re: IEEE 128-bit floating point support for PowerPC RTEMS

2017-01-24 Thread Joseph Myers
On Tue, 24 Jan 2017, Sebastian Huber wrote:

> I noticed some issues for RTEMS in this area. Firstly, RTEMS had no
> __powerpc__ builtin define, so some source files were effectively disabled,
> e.g. ibm-ldouble.c. With __powerpc__ defined, the ibm-ldouble.c didn't compile
> due to:

When you're using IEEE binary128, you should not be using ibm-ldouble.c.  
It's needed for GNU/Linux because of the existing ABI, but if your 
existing ABI does not use IBM long double, you should not introduce uses 
of it.

> Is this supposed to work for 32-bit PowerPC. Did I miss some magic
> configuration switch?

I think the binary128 support for 32-bit PowerPC (using _q_* names for 
library functions, and passing arguments by reference) dates back to 
PowerPC Solaris, c. 1995, and is very likely to be bitrotten.

-- 
Joseph S. Myers
jos...@codesourcery.com


HW subregs in machine description

2017-01-24 Thread Dimitar Dimitrov
Hello,

I'm a newbie working on a GCC port [1] for PRU [2]. In order to achieve ABI 
compatibility with the proprietary TI toolchain, I need my Machine Description 
to support HW register subfields as independent first-class registers. I could 
not find a relevant example in the GCC source. Looks like other ports declare 
subregisters to be the same entities as the corresponding main register (e.g. 
%al, %ah and %ax are treated by the GCC register allocator as one and the same 
i386 register).

So is it possible to describe subregs (independent fields) of 32-bit HW 
registers in GCC?

Currently I'm attempting to describe the  8-bit PRU subregisters as the "real" 
target register set, and then work on defining 16-bit and 32-bit ALU 
operations. But I'm not sure if that would be efficient for a 32-bit PRU 
target, or feasible at all.

To give an example, each 32-bit PRU register can hold either:
  - one 32-bit value (e.g. r10)
  - four independent 8-bit values (e.g. r10.b0, r10.b1, r10.b2, r10.b3).
  - two independent 16-bit values (e.g. r10.w0, r10.w2).
  - a mixture of the above (e.g. r10.w0, r10.b2, r10.b3).

And here is an example of zero-extending one 8-bit and one 16-bit value, 
performing a 32-bit addition, and storing the result  into another 8-bit 
subregister:
   add r10.b3, r10.b2, r10.w0
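
To make the semantics of that instruction concrete, here is a small C model
of one 32-bit register with byte and halfword fields.  It is a sketch under
an assumption that is not stated above: b0 is taken to be the least
significant byte, so w0 covers bits 15:0 and b3 covers bits 31:24.  All
function names are hypothetical.

```c
#include <stdint.h>

/* Mask covering 'width_bytes' bytes (1, 2 or 4). */
static uint32_t field_mask(unsigned width_bytes)
{
    return width_bytes == 4 ? 0xFFFFFFFFu
                            : ((1u << (8 * width_bytes)) - 1u);
}

/* Read a field starting at byte 'byte_off', zero-extended to 32 bits. */
static uint32_t get_field(uint32_t reg, unsigned byte_off, unsigned width_bytes)
{
    return (reg >> (8 * byte_off)) & field_mask(width_bytes);
}

/* Write the low bits of 'val' into a field, leaving other bytes untouched. */
static uint32_t set_field(uint32_t reg, unsigned byte_off,
                          unsigned width_bytes, uint32_t val)
{
    uint32_t mask = field_mask(width_bytes);
    unsigned shift = 8 * byte_off;
    return (reg & ~(mask << shift)) | ((val & mask) << shift);
}

/* Model of "add r10.b3, r10.b2, r10.w0": zero-extend both sources,
   add in 32 bits, then truncate the sum into the 8-bit destination. */
static uint32_t add_b3_b2_w0(uint32_t r10)
{
    uint32_t sum = get_field(r10, 2, 1) + get_field(r10, 0, 2);
    return set_field(r10, 3, 1, sum);
}
```

For example, with r10 = 0x00051234 the sources are b2 = 0x05 and
w0 = 0x1234; the 32-bit sum is 0x1239, and only its low byte 0x39 is stored
into b3, giving 0x39051234.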

Thanks,
Dimitar

[1] https://github.com/dinuxbg/gnupru/tree/master/patches/gcc
[2] http://elinux.org/images/d/da/Am335xPruReferenceGuide.pdf , section 
5.3.2.5.2 "Registers"


Re: HW subregs in machine description

2017-01-24 Thread Nathan Sidwell

On 01/24/2017 03:24 PM, Dimitar Dimitrov wrote:


> Currently I'm attempting to describe the 8-bit PRU subregisters as the "real"
> target register set, and then work on defining 16-bit and 32-bit ALU
> operations. But I'm not sure if that would be efficient for a 32-bit PRU
> target, or feasible at all.


This is what occurred to me as the way to go.  Some existing targets do 
the moral equivalent of that to hold a 64-bit double in an even/odd pair 
of 32-bit regs.  I don't think there's one that's quite as extreme as what 
you describe.
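
As a rough illustration of the bookkeeping involved: ports usually tell the
register allocator how many consecutive hard registers a value occupies
(via HARD_REGNO_NREGS), typically by rounding the value size up to a whole
number of registers.  The standalone helper below is a hypothetical
stand-in for that calculation, not code from any GCC port.

```c
/* Hypothetical helper: number of consecutive hard registers needed to
   hold a value of 'value_bytes' bytes when each register is 'reg_bytes'
   bytes wide (rounds up). */
static unsigned regs_needed(unsigned value_bytes, unsigned reg_bytes)
{
    return (value_bytes + reg_bytes - 1) / reg_bytes;
}
```

With 32-bit registers, a 64-bit double needs regs_needed(8, 4) = 2
registers (the even/odd pair).  With 8-bit PRU subregisters as the hard
register set, a 32-bit value would need regs_needed(4, 1) = 4 of them.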


nathan

--
Nathan Sidwell


Re: -mcx16 vs. not using CAS for atomic loads

2017-01-24 Thread Richard Henderson
On 01/24/2017 01:08 AM, Torvald Riegel wrote:
> Unless HW transactions are guaranteed to succeed for scenarios that are
> sufficient for the atomics, HTM won't help because we'd have to consider
> the worst-case, which would mean some non-HTM fallback.

We're talking about a 16 byte aligned load here -- one cacheline, probably 3-4
instructions.  If an HTM cannot succeed with that, I'm happy to call it useless.

The only possible concern I see might be with simulators that force HTM
failure, for the purpose of forcibly testing fallback paths.  I guess we'd have
to continue to fall back to the lock path for that case.


r~
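
The control flow being discussed (try a transaction a few times, then fall
back to the lock path) can be sketched as below.  This is illustrative
only: the transactional attempt is abstracted behind a hypothetical
callback rather than real HTM instructions, always_abort stands in for a
simulator that forces HTM failure, and none of the names come from
libatomic.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <string.h>

typedef struct { unsigned char bytes[16]; } value16;

/* Hypothetical hook: tries to copy *src to *dst inside one hardware
   transaction and returns false if the transaction aborted. */
typedef bool (*txn_attempt_fn)(const value16 *src, value16 *dst);

/* Simple spinlock guarding the fallback path.  For this to be correct,
   writers must take the same lock, and a real transactional path would
   also have to read the lock word inside the transaction. */
static atomic_flag fallback_lock = ATOMIC_FLAG_INIT;

/* Stand-in for an environment in which every transaction aborts. */
static bool always_abort(const value16 *src, value16 *dst)
{
    (void) src;
    (void) dst;
    return false;
}

static value16 load16(const value16 *src, txn_attempt_fn attempt,
                      int max_attempts)
{
    value16 out;

    /* Fast path: bounded number of transactional attempts. */
    for (int i = 0; i < max_attempts; i++)
        if (attempt(src, &out))
            return out;

    /* Worst case: no attempt succeeded, so use the lock path. */
    while (atomic_flag_test_and_set_explicit(&fallback_lock,
                                             memory_order_acquire))
        ;
    memcpy(&out, src, sizeof out);
    atomic_flag_clear_explicit(&fallback_lock, memory_order_release);
    return out;
}
```

Calling load16 (&v, always_abort, 3) exercises only the lock path, which is
the case that has to stay correct when the HTM gives no success guarantee.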


Re: -mcx16 vs. not using CAS for atomic loads

2017-01-24 Thread Peter Bergner

On 1/24/17 3:06 PM, Richard Henderson wrote:

> The only possible concern I see might be with simulators that force HTM
> failure, for the purpose of forcibly testing fallback paths.  I guess we'd have
> to continue to fall back to the lock path for that case.


IIRC, this was the path that valgrind was going to use all of the time,
because actually implementing the HTM instructions was too hard.

Peter




gcc-5-20170124 is now available

2017-01-24 Thread gccadmin
Snapshot gcc-5-20170124 is now available on
  ftp://gcc.gnu.org/pub/gcc/snapshots/5-20170124/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 5 SVN branch
with the following options: svn://gcc.gnu.org/svn/gcc/branches/gcc-5-branch 
revision 244883

You'll find:

 gcc-5-20170124.tar.bz2   Complete GCC

  MD5=47342d55733b509c4ea7da88b55f7a44
  SHA1=7e2ad934df38ffaba96d08502f7ffcd9a34f5d3d

Diffs from 5-20170117 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-5
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.