Re: builtin ffs vs. renamed ffs (vms-crtl.h)

2010-05-08 Thread Andrew Pinski



Sent from my iPhone

On May 7, 2010, at 11:20 PM, Jay K  wrote:



In gcc for VMS there is some mechanism to rename functions.
See the files:

/src/gcc-4.5.0/gcc/config/vms/vms-crtl-64.h
/src/gcc-4.5.0/gcc/config/vms/vms-crtl.h


which are mostly just lists of from/to function names.


gcc also has a mechanism for optimizing various "builtin" functions, like ffs.


For builtins you should look at how PowerPC Darwin handles the long double
builtins, since they are renamed when long double is 128-bit.


Thanks,
Andrew Pinski





These two mechanisms seem to conflict or be applied in the wrong  
order.

I didn't look at it deeply.


The symptom is that if you add ffs (mapped to decc$ffs) to vms-crtl.h, the
translation is not done, and you end up with an unresolved external ffs.
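
A minimal sketch of the kind of call that hits this (hypothetical example; it
assumes <strings.h> declares ffs on the target):

/* With ffs mapped to decc$ffs in vms-crtl.h, one would expect a call to
   decc$ffs here.  Because gcc treats ffs() as a builtin, it can instead
   expand it or emit a libcall named plain "ffs", bypassing the rename
   table and leaving the unresolved external described above.  */
#include <strings.h>

int first_set_bit (int x)
{
  return ffs (x);
}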


If you #if out the support for "builtin ffs", it works.


My local hack is below, but obviously that's not the right fix.


I'll enter a bug.


Thanks,
 - Jay


diff -u /src/orig/gcc-4.5.0/gcc/builtins.c ./builtins.c
--- /src/orig/gcc-4.5.0/gcc/builtins.c	2010-04-13 06:47:11.0 -0700
+++ ./builtins.c	2010-05-07 23:11:30.0 -0700
@@ -51,6 +51,8 @@
 #include "value-prof.h"
 #include "diagnostic.h"

+#define DISABLE_FFS
+
 #ifndef SLOW_UNALIGNED_ACCESS
 #define SLOW_UNALIGNED_ACCESS(MODE, ALIGN) STRICT_ALIGNMENT
 #endif
@@ -5899,6 +5901,7 @@
 return target;
   break;

+#ifndef DISABLE_FFS
 CASE_INT_FN (BUILT_IN_FFS):
 case BUILT_IN_FFSIMAX:
   target = expand_builtin_unop (target_mode, exp, target,
@@ -5906,6 +5909,7 @@
   if (target)
 return target;
   break;
+#endif

 CASE_INT_FN (BUILT_IN_CLZ):
 case BUILT_IN_CLZIMAX:
@@ -13612,6 +13616,7 @@
 case BUILT_IN_ABORT:
   abort_libfunc = set_user_assembler_libfunc ("abort", asmspec);
   break;
+#ifndef DISABLE_FFS
 case BUILT_IN_FFS:
   if (INT_TYPE_SIZE < BITS_PER_WORD)
 {
@@ -13620,6 +13625,7 @@
MODE_INT, 0), "ffs");
 }
   break;
+#endif
 default:
   break;
 }
diff -u /src/orig/gcc-4.5.0/gcc/optabs.c ./optabs.c
--- /src/orig/gcc-4.5.0/gcc/optabs.c	2010-03-19 12:45:01.0 -0700
+++ ./optabs.c	2010-05-07 23:11:36.0 -0700
@@ -45,6 +45,8 @@
 #include "basic-block.h"
 #include "target.h"

+#define DISABLE_FFS
+
 /* Each optab contains info on how this target machine
can perform a particular operation
for all sizes and kinds of operands.
@@ -3240,6 +3242,7 @@
 return temp;
 }

+#ifndef DISABLE_FFS
   /* Try implementing ffs (x) in terms of clz (x).  */
   if (unoptab == ffs_optab)
 {
@@ -3247,6 +3250,7 @@
   if (temp)
 return temp;
 }
+#endif

   /* Try implementing ctz (x) in terms of clz (x).  */
   if (unoptab == ctz_optab)
@@ -3268,7 +3272,11 @@

   /* All of these functions return small values.  Thus we choose to
      have them return something that isn't a double-word.  */
-  if (unoptab == ffs_optab || unoptab == clz_optab || unoptab == ctz_optab
+  if (
+#ifndef DISABLE_FFS
+  unoptab == ffs_optab ||
+#endif
+unoptab == clz_optab || unoptab == ctz_optab
   || unoptab == popcount_optab || unoptab == parity_optab)
 outmode
   = GET_MODE (hard_libcall_value (TYPE_MODE (integer_type_node),
@@ -6301,7 +6309,9 @@
   init_optab (addcc_optab, UNKNOWN);
   init_optab (one_cmpl_optab, NOT);
   init_optab (bswap_optab, BSWAP);
+#ifndef DISABLE_FFS
   init_optab (ffs_optab, FFS);
+#endif
   init_optab (clz_optab, CLZ);
   init_optab (ctz_optab, CTZ);
   init_optab (popcount_optab, POPCOUNT);
@@ -6558,9 +6568,11 @@
   one_cmpl_optab->libcall_basename = "one_cmpl";
   one_cmpl_optab->libcall_suffix = '2';
   one_cmpl_optab->libcall_gen = gen_int_libfunc;
+#ifndef DISABLE_FFS
   ffs_optab->libcall_basename = "ffs";
   ffs_optab->libcall_suffix = '2';
   ffs_optab->libcall_gen = gen_int_libfunc;
+#endif
   clz_optab->libcall_basename = "clz";
   clz_optab->libcall_suffix = '2';
   clz_optab->libcall_gen = gen_int_libfunc;
@@ -6643,11 +6655,13 @@
   satfractuns_optab->libcall_basename = "satfractuns";
   satfractuns_optab->libcall_gen = gen_satfractuns_conv_libfunc;

+#ifndef DISABLE_FFS
   /* The ffs function operates on `int'.  Fall back on it if we do not
      have a libgcc2 function for that width.  */
   if (INT_TYPE_SIZE < BITS_PER_WORD)
 set_optab_libfunc (ffs_optab, mode_for_size (INT_TYPE_SIZE, MODE_INT, 0),
		    "ffs");
+#endif

   /* Explicitly initialize the bswap libfuncs since we need them to be
      valid for things other than word_mode.  */


Thanks,
 - Jay



Re: What is the best way to resolve ARM alignment issues for large modules?

2010-05-08 Thread Martin Guy
On 5/7/10, Shaun Pinney  wrote:
>  Essentially, we have code which works fine on x86/PowerPC but fails on ARM due
>  to differences in how misaligned accesses are handled.  The failures occur in
>  multiple large modules developed outside of our team and we need to find a
>  solution.  The best question to sum this up is, how can we use the compiler to
>  arrive at a complete solution to quickly identify all code locations which
>  generate misaligned accesses and/or prevent the compiler from generating
>  misaligned accesses?

Dunno about the compiler, but if you use the Linux kernel you can fiddle with
/proc/cpu/alignment.

By default it's set to 0, which silently gives garbage results when
unaligned accesses are made.

echo 3 > /proc/cpu/alignment

will fix those misalignments using a kernel trap to emulate "correct"
behaviour (i.e. loading from bytes (char *)a to (char *)a + 3 in the
case of an int). Alternatively,

echo 5 > /proc/cpu/alignment

will make an unaligned access cause a Bus Error, which usually kills
the process and you can identify the offending code by running it
under gdb.

Eliminating the unaligned accesses is tedious work, but the result
will run slightly faster than relying on fixups, as well as making it
portable to any word-aligned system.
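
For reference, a minimal sketch of the kind of access being discussed
(hypothetical code, not taken from the modules in question):

/* buf + 1 is a misaligned int pointer.  On x86/PowerPC the direct load
   returns bytes buf[1]..buf[4]; on an ARMv5 core the load is rounded down
   to the aligned address and rotated, so the value differs -- or the access
   raises SIGBUS once traps are enabled via /proc/cpu/alignment.  The memcpy
   form is the portable way to read an unaligned value on any target.  */
#include <stdio.h>
#include <string.h>

int main (void)
{
  static unsigned char buf[8] __attribute__ ((aligned (4))) =
    { 0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77, 0x88 };
  unsigned int *p = (unsigned int *) (buf + 1);   /* misaligned pointer */
  unsigned int v;

  printf ("direct load: 0x%08x\n", *p);           /* undefined by ISO C */

  memcpy (&v, buf + 1, sizeof v);
  printf ("memcpy load: 0x%08x\n", v);
  return 0;
}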

   M


Re: What is the best way to resolve ARM alignment issues for large modules?

2010-05-08 Thread Mikael Pettersson
Shaun Pinney writes:
 > Hello all,
 > 
 > Essentially, we have code which works fine on x86/PowerPC but fails on ARM due
 > to differences in how misaligned accesses are handled.  The failures occur in
 > multiple large modules developed outside of our team and we need to find a
 > solution.  The best question to sum this up is, how can we use the compiler to
 > arrive at a complete solution to quickly identify all code locations which
 > generate misaligned accesses and/or prevent the compiler from generating
 > misaligned accesses?  Thanks for any advice.  I'll go into more detail below.
 > 
 > ---
 > We're using an ARM9 core (ARMv5) and notice that GCC generates misaligned load
 > instructions for certain modules in our platform.  For these modules, which work
 > correctly on x86/PowerPC, the misaligned loads cause failures.  This is because
 > the ARM rounds down misaligned addresses to the correct alignment, performs the
 > memory load, and rotates the data before placing it in a register.  As a result, a
 > misaligned multi-byte load instruction on ARM actually loads memory below the
 > requested address and does not load all upper bytes from "address" to "address +
 > size - 1", so it appears to these modules as incorrect data.  On x86/PowerPC,
 > loads do provide bytes from "address" to "address + size - 1" regardless of
 > alignment, so there are no problems.
 > 
 > Fixing the code manually for ARM alignment has difficulties.  Due to the large
 > code volume of these external modules, it is difficult to identify all locations
 > which may be affected by misaligned accesses so the code can be rewritten.
 > Currently, the only way to detect these issues is to use -Wcast-align and view
 > the output to get a list of potential alignment issues.  This appears to list a
 > large number of false positives, so sorting through and doing code investigation
 > to locate true problems looks very time-consuming.  On the runtime side, we've
 > enabled alignment exceptions to catch some additional cases, but the problem is
 > that exceptions are only thrown for running code.  There is always the chance
 > there is some more unexecuted 'hidden' code waiting to fail when the right
 > circumstance occurs.  I'd like to provably remove the problem entirely and
 > quickly.
 > 
 > One idea, to guarantee no load/store alignment problems will affect our product,
 > was to force the compiler to generate single-byte load/store instructions in
 > place of multi-byte load/store instructions when the alignment cannot be
 > verified by the compiler.  Such as, for pointer typecasts where the alignment is
 > increased (e.g. char * to int *), accesses to misaligned fields of packed data
 > structures, accesses to structure fields not allocated on the stack, etc.  Is
 > this available?  Obviously, this will add performance overhead, but would
 > clearly resolve the issue for affected modules.
 > 
 > Does the ARM compiler provide any other techniques to help with these types of
 > problems?  It'd be very helpful to find a fast and complete way to do this work.
 > Thanks!
 > 
 > Thanks again for your advice.
 > 
 > Best regards,
 > Shaun
 > 
 > BTW - our ARM also allows us to change the behavior of multi-byte load/store
 > instructions so they read from 'address' to 'address + size - 1'.  However, our
 > OS indicates that it intentionally uses misaligned loads/stores, so changing
 > the ARM's load/store behavior to fix the module alignment problems would break
 > the OS in unknown places.  Also, because of this we cannot permanently enable
 > alignment exceptions either.  I plan to discuss this more with our OS vendor.

You don't name the platform OS but the obvious solution (to me anyway) is to run
the code on ARM/Linux. On that platform you can instruct the kernel to take
various actions on alignment faults. In particular, by

> echo 5 > /proc/cpu/alignment

you tell the kernel to log misalignment traps and then kill the offending
process.

So you:

1. Run the application. It gets killed.
2. Retrieve the fault PC from the kernel message log.
3. Map it back to the application source. Fix the problem or add debugging code.
4. Repeat from step 1 until all alignment faults have been eliminated.

You can also instruct the kernel to (correctly) handle and emulate misaligned
loads/stores without killing the process. That allows you to run the code
correctly, though the fault handling will induce some performance overhead.

If you can't run Linux on your target HW then you could do the debugging in an
ARM emulator such as QEMU.


Re: C++0x Memory model and gcc

2010-05-08 Thread Jean-Marc Bourguet

-fmemory-model=single
    Assume single threaded execution, which also means no signal handlers.
-fmemory-model=fast
    The user is responsible for all synchronization.  Accessing the same
    memory words from different threads may break unpredictably.
-fmemory-model=safe
    The compiler will do its best to protect you.


With that description, I'd think that "safe" lets the user code assume
the sequential consistency model.  I'd use -fmemory-model=conformant or
something like that for the model where the compiler assumes that the user
code respects the constraints laid out for it by the standard.  As which
constraints are put on user code depends on the language -- Java has its
own memory model, which AFAIK is more constraining than C++'s, and I think
Ada has its own, but my Ada programming days are too far behind me to
comment on it -- one may prefer some other name.

Yours,

--
Jean-Marc Bourguet



Re: C++0x Memory model and gcc

2010-05-08 Thread Albert Cohen

Jean-Marc Bourguet wrote:

-fmemory-model=single
    Assume single threaded execution, which also means no signal handlers.
-fmemory-model=fast
    The user is responsible for all synchronization.  Accessing the same
    memory words from different threads may break unpredictably.
-fmemory-model=safe
    The compiler will do its best to protect you.


With that description, I'd think that "safe" lets the user code assume
the sequential consistency model.  I'd use -fmemory-model=conformant or
something like that for the model where the compiler assumes that the user
code respects the constraints laid out for it by the standard.  As which
constraints are put on user code depends on the language -- Java has its
own memory model, which AFAIK is more constraining than C++'s, and I think
Ada has its own, but my Ada programming days are too far behind me to
comment on it -- one may prefer some other name.


I agree. Or even, =c++0x or =gnu++0x

On the other hand, I fail to see the difference between =single and =fast,
and the explanation about "the same memory word" is not really relevant,
as memory models typically tell you about concurrent accesses to
"different memory words".


Albert


gcc-4.6-20100508 is now available

2010-05-08 Thread gccadmin
Snapshot gcc-4.6-20100508 is now available on
  ftp://gcc.gnu.org/pub/gcc/snapshots/4.6-20100508/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 4.6 SVN branch
with the following options: svn://gcc.gnu.org/svn/gcc/trunk revision 159191

You'll find:

gcc-4.6-20100508.tar.bz2              Complete GCC (includes all of below)
gcc-core-4.6-20100508.tar.bz2         C front end and core compiler
gcc-ada-4.6-20100508.tar.bz2          Ada front end and runtime
gcc-fortran-4.6-20100508.tar.bz2      Fortran front end and runtime
gcc-g++-4.6-20100508.tar.bz2          C++ front end and runtime
gcc-java-4.6-20100508.tar.bz2         Java front end and runtime
gcc-objc-4.6-20100508.tar.bz2         Objective-C front end and runtime
gcc-testsuite-4.6-20100508.tar.bz2    The GCC testsuite

Diffs from 4.6-20100501 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-4.6
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.


Compile times for gcc with ppl/cloog backend?

2010-05-08 Thread ajmcello
I've got a quad-core 3.2GHz FreeBSD-8 system with 8GB of RAM. I
compiled and installed Cloog-PPL and PPL, mpfr, gmp, mpc, polylib,
etc. I'm using make -j 4, and my gcc compile has been going for about
24 hours. Is this normal, or did something go terribly wrong? It is
still building, but each file uses about 20-30 hours of CPU time to
build (according to top). I think this is the 3rd compile. After the
first, I recompiled all of my tools (binutils, coreutils, etc.) with
the new gcc and then started the build over. This last time, I think
stage1 and stage2 went by relatively quickly, but the 3rd and final
stage has been taking forever. I wrote a test program and compiled it
with prev-gcc/xgcc with no arguments, and it took about 5 seconds.
Just curious.

Here are my configure options. Thanks.


./configure --prefix=/usr/gnu \
--disable-werror \
--disable-ppl-version-check \
--disable-cloog-version-check \
--enable-cxx \
--enable-shared \
--enable-static \
--enable-bootstrapp \
--enable-lto \
--enable-objc-gc \
--enable-gold \
--enable-stage1-checking=all \
--enable-64-bit-bfd  \
--enable-languages=c,c++,lto \
--with-pic \
--with-libiconv=/usr/gnu \
--with-gmp-include=/usr/gnu/include \
--with-gmp-lib=/usr/gnu/lib \
--with-gmp-build=/wl2k/src/gmp/gmp-5.0.1 \
--with-gmp=/usr/gnu \
--with-mpfr-include=/usr/gnu/include \
--with-mpfr-lib=/usr/gnu/lib \
--with-mpfr-build=/wl2k/src/mpfr/mpfr-2.4.2 \
--with-mpfr=/usr/gnu \
--with-cloog=/usr/gnu \
--with-cloog-include=/usr/gnu/include \
--with-cloog-lib=/usr/gnu/lib \
--with-ppl=/usr/gnu \
--with-ppl-include=/usr/gnu/include \
--with-ppl-lib=/usr/gnu/lib \
--with-mpc=/usr/gnu \
--with-mpc-include=/usr/gnu/include \
--with-mpc-lib=/usr/gnu/lib \
--with-gnu-ld \
--with-ld=/usr/gnu/bin/ld \
--with-gnu-as \
--with-as=/usr/gnu/bin/as