date:20130118

Re: Imprecise data flow analysis leads to code bloat

2013-01-18 Thread Richard Biener

On Thu, Jan 17, 2013 at 6:04 PM, Georg-Johann Lay  wrote:
> Richard Biener wrote:
>> On Thu, Jan 17, 2013 at 12:20 PM, Georg-Johann Lay wrote:
>>> Hi, suppose the following C code:
>>>
>>>
>>> static __inline__ __attribute__((__always_inline__))
>>> _Fract rbits (const int i)
>>> {
>>> _Fract f;
>>> __builtin_memcpy (&f, &i, sizeof (_Fract));
>>> return f;
>>> }
>>>
>>> _Fract func (void)
>>> {
>>> #if B == 1
>>> return rbits (0x1234);
>>> #elif B == 2
>>> return 0.14222r;
>>> #endif
>>> }
>>>
>>>
>>> Type-punning idioms like in rbits above are very common in libgcc, for
>>> example in fixed-bit.c.
>>>
>>> In this example, both compilation variants are equivalent (provided int and
>>> _Fract are 16 bits wide).  The problem with the B=1 variant is that it is
>>> inefficient:
>>>
>>> Variant B=1 needs 2 instructions.
>>
>> B == 1 shows that fold-const.c native_interpret/encode_expr lack
>> support for FIXED_POINT_TYPE.
>>
>>> Variant B=2 needs 11 instructions, 9 of them are not needed at all.
>
> I confused B=1 and B=2.  The inefficient case with 11 instructions is B=1, of
> course.
>
> Would a patch like below be acceptable in the current stage?

I'd be fine with it (not sure about test coverage).  But please also add
native_encode_fixed.

> It's only the native_interpret and pretty much like the int case.
>
> Difference is that it rejects if the sizes don't match exactly.

Hmm, yeah.  I'm not sure why the _interpret routines chose to ignore
tail padding ... was there any special correctness reason you did it
differently than the int variant?
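
To make the size question concrete, here is a minimal sketch in plain C of what encoding and interpreting a 16-bit fixed-point constant amounts to.  It is not the fold-const.c code: the names encode_fixed16, interpret_fixed16 and fract16_value are hypothetical, little-endian byte order is assumed, and the interpret side rejects any buffer whose length is not exactly two bytes, which is how the proposed patch is described.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: write the raw bits of a 16-bit fixed-point value
   into buf in little-endian order.  Returns the number of bytes written,
   or 0 if the buffer is too small.  */
size_t encode_fixed16(uint16_t bits, unsigned char *buf, size_t len)
{
    assert(buf != NULL);
    if (len < 2)
        return 0;
    buf[0] = (unsigned char)(bits & 0xff);
    buf[1] = (unsigned char)(bits >> 8);
    return 2;
}

/* Hypothetical helper: rebuild the raw bits from buf.  Unlike a variant
   that ignores tail padding, this one accepts only an exact size match.
   Returns 1 on success and 0 on rejection.  */
int interpret_fixed16(const unsigned char *buf, size_t len, uint16_t *bits)
{
    assert(buf != NULL && bits != NULL);
    if (len != 2)
        return 0;
    *bits = (uint16_t)(buf[0] | (buf[1] << 8));
    return 1;
}

/* Value of a signed 16-bit fraction with 15 fractional bits (an assumed
   layout for _Fract on a 16-bit target): raw / 2^15.  */
double fract16_value(uint16_t bits)
{
    return (double)(int16_t)bits / 32768.0;
}
```

Under these assumptions the bits 0x1234 from the example correspond to 4660/32768, roughly 0.1422, which is consistent with the 0.14222r literal in the B == 2 variant.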

>  A new function
> is used for low-level construction of a const_fixed_from_double_int.  Isn't
> there a better way?  I wonder that such functionality is not already there...

Good question - there are probably more places that could make use of
this.

> A quick test shows that it works reasonably on AVR.
>
>>> The problem goes as follows:
>>>
>>> The memcpy is represented as a VIEW_CONVERT_EXPR<_Fract> and expanded to
>>> memory moves through the frame:
>>>
>>> (insn 5 4 6 (set (reg:HI 45)
>>> (const_int 4660 [0x1234])) bloat.c:5 -1
>>>  (nil))
>>>
>>> (insn 6 5 7 (set (mem/c:HI (reg/f:HI 37 virtual-stack-vars) [2 S2 A8])
>>> (reg:HI 45)) bloat.c:5 -1
>>>  (nil))
>>>
>>> (insn 7 6 8 (set (reg:HQ 46)
>>> (mem/c:HQ (reg/f:HI 37 virtual-stack-vars) [2 S2 A8])) bloat.c:12 -1
>>>  (nil))
>>>
>>> (insn 8 7 9 (set (reg:HQ 43 [  ])
>>> (reg:HQ 46)) bloat.c:12 -1
>>>  (nil))
>>>
>>>
>>> Is there a specific reason why this is not expanded as subreg like this?
>>>
>>>
>>>   (set (reg:HQ 46)
>>>        (subreg:HQ [(reg:HI 45)] 0))
>>
>> Probably your target does not allow this?  Not sure, but then a testcase like
>
> This (movhq) is supported, similar for other fixed-point modes moves:
>
> QQ, UQQ, HQ, UHQ, HA, UHA, SQ, USQ, SA, USA, DQ, UDQ, DA, UDA, TA, UTA.
>
>
>> static __inline__ __attribute__((__always_inline__))
>> _Fract rbits (const int i)
>> {
>>   _Fract f;
>>   __builtin_memcpy (&f, &i, sizeof (_Fract));
>>   return f;
>> }
>>
>> _Fract func (int i)
>> {
>>   return rbits (i);
>> }
>>
>> would be more interesting, as your testcase boils down to a constant
>> folding issue (see above).
>
> This case is optimized fine and the generated code is nice.

I see - so it's really confused about the constant-ness.  Yeah, I can imagine
that.

Thanks,
Richard.

>>> The insns are analyzed in .dfinit:
>>>
>>> ;;  regs ever live   24[r24] 25[r25] 29[r29]
>>>
>>> 24/25 is the return register, 28/29 is the frame pointer:
>>>
>>>
>>> (insn 6 5 7 2 (set (mem/c:HI (plus:HI (reg/f:HI 28 r28)
>>> (const_int 1 [0x1])) [2 S2 A8])
>>> (reg:HI 45)) bloat.c:5 82 {*movhi}
>>>
>>>
>>> Is there a reason why R28 is not marked as live?
>>>
>>>
>>> The memory accesses are optimized out in .fwprop2 so that no frame pointer
>>> is needed any more.
>>>
>>> However, in the subsequent passes, R29 is still reported as "regs ever live"
>>> which is not correct.
>>>
>>> The "regs ever live" don't change until .ira, where R28 springs to life
>>> again:
>>>
>>> ;;  regs ever live   24[r24] 25[r25] 28[r28] 29[r29]
>>>
>>> And in .reload:
>>>
>>> ;;  regular block artificial uses 28 [r28] 32 [__SP_L__]
>>> ;;  eh block artificial uses 28 [r28] 32 [__SP_L__] 34 [argL]
>>> ;;  entry block defs 8 [r8] 9 [r9] 10 [r10] 11 [r11] 12 [r12] 13 [r13] 14 [r14] 15 [r15] 16 [r16] 17 [r17] 18 [r18] 19 [r19] 20 [r20] 21 [r21] 22 [r22] 23 [r23] 24 [r24] 25 [r25] 28 [r28] 32 [__SP_L__]
>>> ;;  exit block uses  24 [r24] 25 [r25] 28 [r28] 32 [__SP_L__]
>>> ;;  regs ever live   24[r24] 25[r25] 28[r28] 29[r29]
>>>
>>> Outcome is that the frame pointer is set up without need, which is very
>>> costly on avr.
>>>
>>>
>>> The compiler is trunk from 2013-01-16 configured for avr:
>>>
>>>   --target=avr --enable-languages=c,c++ --disable-nls --with-dwarf2
>>>
>>> The example

Re: Bootstrapping glibc vs. dependency on system headers

2013-01-18 Thread Thomas Schwinge

Hi!

On Thu, 17 Jan 2013 17:18:41 +, "Joseph S. Myers"  
wrote:
> Really, for glibc bootstrapping I don't think you want to include any 
> headers there.  If $CPP is defined and nonempty, use that, otherwise use 
> $CC -E; no testing for a "working" preprocessor is needed; we require GCC 
> 4.3 or later for building glibc.

OK.  I'd still be interested in hearing from the Autoconf folks whether
something should be done on Autoconf side, too.  Also, regarding the
/lib/cpp fallback in the cross-compilation case.

> > Issue 3: Assuming fixing Autoconf is the way to go, what do we do in
> > glibc until we upgrade to the respective future version of Autoconf?
> > Supply our own copy of _AC_PROG_PREPROC_WORKS_IFELSE (or AC_PROG_CPP)?
> 
> Yes.  There's already code in configure.in to do something special with 
> _AC_INCLUDES_DEFAULT_REQUIREMENTS.

Yep -- I had submitted an equivalent patch months before H.J.'s was then
applied.  ;-)

Here's now one for AC_PROG_CPP, borrowing from Autoconf's definition of
AC_PROG_CPP, leaving out the "details" we're not interested in (in
particular also the final »C preprocessor "$CPP" [...] sanity check«).
Tested on x86_64 GNU/Linux for an ARM GNU/Linux host.  OK to apply?

* configure.in (AC_PROG_CPP): New definition.

diff --git configure.in configure.in
index 05cbad5..ee72c17 100644
--- configure.in
+++ configure.in
@@ -17,6 +17,32 @@ AC_DEFUN([_AC_INCLUDES_DEFAULT_REQUIREMENTS],
   [m4_divert_text([DEFAULTS],
 [ac_includes_default='/* none */'])])
 
+# We require GCC, and by default use its preprocessor.  Override AC_PROG_CPP
+# here to work around the Autoconf issue discussed in
+# 
.
+AC_DEFUN([AC_PROG_CPP],
+[AC_REQUIRE([AC_PROG_CC])dnl
+AC_ARG_VAR([CPP],  [C preprocessor])dnl
+_AC_ARG_VAR_CPPFLAGS()dnl
+AC_MSG_CHECKING([how to run the C preprocessor])
+# On Suns, sometimes $CPP names a directory.
+if test -n "$CPP" && test -d "$CPP"; then
+  CPP=
+fi
+if test -z "$CPP"; then
+  AC_CACHE_VAL([ac_cv_prog_CPP],
+  [dnl
+ac_cv_prog_CPP="$CC -E"
+  ])dnl
+  CPP=$ac_cv_prog_CPP
+else
+  ac_cv_prog_CPP=$CPP
+fi
+AC_MSG_RESULT([$CPP])
+AC_SUBST(CPP)dnl
+])# AC_PROG_CPP
+
+
 dnl This is here so we can set $subdirs directly based on configure fragments.
 AC_CONFIG_SUBDIRS()
 


Grüße,
 Thomas



Re: Graphite TODO tasks

2013-01-18 Thread Richard Biener

On Fri, Jan 18, 2013 at 8:28 AM, Shakthi Kannan  wrote:
> Hi,
>
> --- On Thu, Jan 17, 2013 at 5:53 PM, Richard Biener
>  wrote:
> | It's ISL 0.11.1 actually.  0.18.0 is the current CLooG version.
> \--
>
> Thanks. I downloaded cloog-0.18.0, compiled and installed the same using:
>
>   $ cd cloog-0.18.0
>   $ mkdir build
>   $ cd build
>   $ ../configure
>   $ make
>   $ sudo make install
>
> It installed in /usr/local. 'cloog' is present in /usr/local/bin, and
> the following in /usr/local/lib:
>
> libcloog-isl.a
> libcloog-isl.la
> libcloog-isl.so
> libcloog-isl.so.4
> libcloog-isl.so.4.0.0
> libisl.a
> libisl.la
> libisl.so
> libisl.so.10
> libisl.so.10.1.1
> libisl.so.10.1.1-gdb.py
> pkgconfig
>
> I checked out gcc trunk from svn, and used the following to compile:
>
>   $ cd trunk
>   $ mkdir build
>   $ cd build
>   $ ../configure --with-cloog=/usr/local --with-isl=/usr/local
>
> configure fails with
>
>   checking for version 0.10 of ISL... no
>   checking for version 0.11 of ISL... no
>
> The relevant output in config.log is:
>
> === config.log ===
>
> configure:5838: checking for version 0.10 of ISL
> configure:5857: gcc -o conftest -g -O2 -I/usr/local/include
> -L/usr/local/lib conftest.c  -lisl >&5
> configure:5857: $? = 0
> configure:5857: ./conftest
> configure:5857: $? = 1
> configure: program exited with status 1
> configure: failed program was:
> | /* confdefs.h */
> | #define PACKAGE_NAME ""
> | #define PACKAGE_TARNAME ""
> | #define PACKAGE_VERSION ""
> | #define PACKAGE_STRING ""
> | #define PACKAGE_BUGREPORT ""
> | #define PACKAGE_URL ""
> | #define LT_OBJDIR ".libs/"
> | /* end confdefs.h.  */
> | #include 
> |#include 
> | int
> | main ()
> | {
> | if (strncmp (isl_version (), "isl-0.10", strlen ("isl-0.10")) != 0)
> |  return 1;
> |
> |   ;
> |   return 0;
> | }
> configure:5866: result: no
> configure:5887: checking for version 0.11 of ISL
> configure:5906: gcc -o conftest -g -O2 -I/usr/local/include
> -L/usr/local/lib conftest.c  -lisl >&5
> configure:5906: $? = 0
> configure:5906: ./conftest
> configure:5906: $? = 1
> configure: program exited with status 1
> configure: failed program was:
> | /* confdefs.h */
> | #define PACKAGE_NAME ""
> | #define PACKAGE_TARNAME ""
> | #define PACKAGE_VERSION ""
> | #define PACKAGE_STRING ""
> | #define PACKAGE_BUGREPORT ""
> | #define PACKAGE_URL ""
> | #define LT_OBJDIR ".libs/"
> | /* end confdefs.h.  */
> | #include 
> |#include 
> | int
> | main ()
> | {
> | if (strncmp (isl_version (), "isl-0.11", strlen ("isl-0.11")) != 0)
> |  return 1;
> |
> |   ;
> |   return 0;
> | }
> configure:5915: result: no
> configure:5950: error: Unable to find a usable ISL.  See config.log for details.
>
> === END ===
>
> What could I be missing? What is the recommended approach to build gcc with
> isl?

My wild guess is that /usr/local/lib{,64} is missing from your LD_LIBRARY_PATH,
so the configure test program builds but cannot be executed because the
dynamic linker does not find ISL.
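
As a side note on what the quoted probe tests: it compares the string returned by isl_version () against a fixed prefix and exits with status 1 on a mismatch.  A standalone sketch of that comparison, with no ISL dependency (version_has_prefix and the sample strings in the note below are made up for illustration), could look like this:

```c
#include <assert.h>
#include <string.h>

/* Returns 1 if version starts with prefix, mirroring the strncmp test
   in the quoted conftest program, and 0 otherwise.  */
int version_has_prefix(const char *version, const char *prefix)
{
    assert(version != NULL && prefix != NULL);
    return strncmp(version, prefix, strlen(prefix)) == 0;
}
```

With this helper, a string such as "isl-0.11.1" would pass a check for "isl-0.11" and fail a check for "isl-0.10".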

Richard.

> Appreciate any help in this regard,
>
> Thanks!
>
> SK
>
> --
> Shakthi Kannan
> http://www.shakthimaan.com

Re: GCC cannot move address calculation to store+load?

2013-01-18 Thread Richard Biener

On Fri, Jan 18, 2013 at 9:32 AM, Konstantin Vladimirov
 wrote:
> Hi,
>
> I ran into this problem in a private backend, but it is easily reproduced
> with GCC on x86:
>
> Sample code (test.c):
>
> int a;
>
> int foo(int *x, int y)
> {
>   a = x[(y << 1)];
>   x[(y << 1)] = y;
>   return 0;
> }
>
> Compile with gcc-4.7.2:
>
> $ gcc --version
> gcc (GCC) 4.7.2
> Copyright (C) 2012 Free Software Foundation, Inc.
>
> Command line is:
>
> $ gcc -O2 -S -m32 -dp test.c
>
> Yields code:
>
> foo:
> .LFB0:
>   .cfi_startproc
>   movl  8(%esp), %edx # 3 *movsi_internal/1 [length = 4]
>   leal  0(,%edx,8), %eax  # 25  *leasi  [length = 7]
>   addl  4(%esp), %eax # 9 *addsi_1/1  [length = 4]

I think you'd need to intermediate combine the lea and the add to

 movl 4(%esp), %ecx
 leal 0(%ecx,%edx,8), %eax

only then fwprop may consider combining this with the load/stores.
Note that combine does not apply because %eax is used multiple
times.  This also means that for code-size the combining is not a good
idea.
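
In C terms, the lea/add pair computes one address that both the load and the store then use.  A sketch of the source-level equivalent, assuming a 4-byte int so that the index y << 1 scales to a byte offset of y * 8 (foo_shared is a hypothetical name, not part of the test case):

```c
#include <assert.h>
#include <stddef.h>

int a;

/* Same behaviour as the quoted foo(), with the common address made
   explicit: p plays the role of %eax after the lea and the add.  */
int foo_shared(int *x, int y)
{
    assert(x != NULL);
    int *p = x + (y << 1);   /* address computed once */
    a = *p;                  /* load through the shared address */
    *p = y;                  /* store through the shared address */
    return 0;
}
```

Whether that address is kept in a register or folded into each memory operand is then up to the compiler.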

Richard.

>   movl  (%eax), %ecx  # 10  *movsi_internal/1 [length = 2]
>   movl  %ecx, a # 11  *movsi_internal/2 [length = 6]
>   movl  %edx, (%eax)  # 12  *movsi_internal/2 [length = 2]
>   xorl  %eax, %eax  # 29  *movsi_xor  [length = 2]
>   ret # 28  simple_return_internal  [length = 1]
>
> Obviously, this can be rewritten more compactly without the preceding leal:
>
> foo:
> .LFB0:
>   .cfi_startproc
>   movl  8(%esp), %eax # 3 *movsi_internal/1 [length = 4]
>   movl  4(%esp), %edx # 19  *movsi_internal/1 [length = 4]
>   movl  (%edx,%eax,8), %ecx  # 8 *movsi_internal/1 [length = 3]
>   movl  %ecx, a # 11  *movsi_internal/2 [length = 6]
>   movl  %eax, (%edx,%eax,8)  # 8 *movsi_internal/2 [length = 3]
>   xorl  %eax, %eax  # 29  *movsi_xor  [length = 2]
>   ret # 28  simple_return_internal  [length = 1]
>
> (this assembly is handwritten; total instruction count is reduced by 1,
> total length by 5 bytes)
>
> The key idea is that the common address computed by the leal is better not
> calculated separately, but folded into the address operands of both the
> store and the load.
>
> When there is only a store or only a load, the combine pass does this job.
>
> But it seems that the combine pass does not even try the store+load case.
>
> Am I missing something?
>
> P.S. I posted a similar question to gcc-help and got no response, so I am
> reposting here (sorry if this is off-topic).
>
> ---
> With best regards, Konstantin

Re: Graphite TODO tasks

2013-01-18 Thread Shakthi Kannan

Hi

--- On Fri, Jan 18, 2013 at 3:26 PM, Richard Biener
 wrote:
| Wild guess is that you miss /usr/local/lib{,64} in your LD_LIBRARY_PATH
| so the built configure test cannot be executed because the dynamic linker
| does not find ISL.
\--

By default, cloog installs to /usr/local/, so setting LD_LIBRARY_PATH
shouldn't be necessary. I was able to run configure in gcc trunk
using:

  $ ../configure --with-cloog=/usr/local --with-isl=/usr/local \
      --disable-isl-version-check

Thanks for your reply,

SK

-- 
Shakthi Kannan
http://www.shakthimaan.com

Re: Bootstrapping glibc vs. dependency on system headers

2013-01-18 Thread Thomas Schwinge

Hi!

On Thu, 17 Jan 2013 18:09:33 +0100, I wrote:
> Also known as: »I found another one«.

(That's the last one I'm currently seeing.)  Again depending on
 usability, we either get:

checking for [GCC] option to accept ISO C89... none needed

Or:

checking for [GCC] option to accept ISO C89... unsupported

As setting »ac_cv_prog_cc_c89=no« (which we never check for) is all that
this test does, and our compiler requirements mean we always have C89, it is
safe to simply elide it.  Tested on x86_64 GNU/Linux for an ARM GNU/Linux
host.  OK to apply?

* configure.in (_AC_PROG_CC_C89): New definition.

diff --git configure.in configure.in
index ee72c17..edc7f72 100644
--- configure.in
+++ configure.in
@@ -42,6 +42,10 @@ AC_MSG_RESULT([$CPP])
 AC_SUBST(CPP)dnl
 ])# AC_PROG_CPP
 
+# We require GCC.  Override _AC_PROG_CC_C89 here to work around the Autoconf
+# issue discussed in [idem].
+AC_DEFUN([_AC_PROG_CC_C89], [[$1]])
+
 
 dnl This is here so we can set $subdirs directly based on configure fragments.
 AC_CONFIG_SUBDIRS()


Grüße,
 Thomas



Re: Imprecise data flow analysis leads to code bloat

2013-01-18 Thread Georg-Johann Lay

Richard Biener wrote:
> On Thu, Jan 17, 2013 at 6:04 PM, Georg-Johann Lay wrote:
>> Richard Biener wrote:
>>> On Thu, Jan 17, 2013 at 12:20 PM, Georg-Johann Lay wrote:
>>>> Hi, suppose the following C code:
>>>>
>>>> static __inline__ __attribute__((__always_inline__))
>>>> _Fract rbits (const int i)
>>>> {
>>>> _Fract f;
>>>> __builtin_memcpy (&f, &i, sizeof (_Fract));
>>>> return f;
>>>> }
>>>>
>>>> _Fract func (void)
>>>> {
>>>> #if B == 1
>>>> return rbits (0x1234);
>>>> #elif B == 2
>>>> return 0.14222r;
>>>> #endif
>>>> }
>>>>
>>>> Type-punning idioms like in rbits above are very common in libgcc, for
>>>> example in fixed-bit.c.
>>>>
>>>> In this example, both compilation variants are equivalent (provided int and
>>>> _Fract are 16 bits wide).  The problem with the B=1 variant is that it is
>>>> inefficient:
>>>>
>>>> Variant B=1 needs 2 instructions.
>>> B == 1 shows that fold-const.c native_interpret/encode_expr lack
>>> support for FIXED_POINT_TYPE.
>>>
>>>> Variant B=2 needs 11 instructions, 9 of them are not needed at all.
>> I confused B=1 and B=2.  The inefficient case with 11 instructions is B=1, of
>> course.
>>
>> Would a patch like below be acceptable in the current stage?
> 
> I'd be fine with it (not sure about test coverage).  But please also add
> native_encode_fixed.

Yes, of course.  Just wanted to know if the change is in order in principle.

As far as test cases are concerned: Is, for instance, __xFRACT_EPSILON__ always
represented as 1 if the bits are regarded as an integer?  And what about padded
fixed-point types, as mentioned below?

>> It's only the native_interpret and pretty much like the int case.
>>
>> Difference is that it rejects if the sizes don't match exactly.
> 
> Hmm, yeah.  I'm not sure why the _interpret routines chose to ignore
> tail padding ... was there any special correctness reason you did it
> differently than the int variant?

Well, I am interested in this optimization.  But I am also interested in
learning (by doing) about GCC, which means I am unsure about most corners of
GCC.

In this case I don't know why padding should occur in the first place because
view_convert_expr only allows same-size transformations.
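
For readers without a fixed-point target at hand, the same-size restriction can be illustrated with the usual memcpy pun between float and a 32-bit integer.  This is only an analogy to rbits, not GCC code: the function names are hypothetical, and the bit pattern mentioned in the note below assumes IEEE 754 single precision.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* The pun is only meaningful when both types have exactly the same size,
   which is also what a same-size view conversion requires.  */
static_assert(sizeof(float) == sizeof(uint32_t),
              "float and uint32_t must have the same size");

/* Reinterpret the bits of an integer as a float, as rbits does for _Fract.  */
float bits_to_float(uint32_t bits)
{
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* The inverse direction: expose the bits of a float as an integer.  */
uint32_t float_to_bits(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

On an IEEE 754 host, bits_to_float (0x3f800000) yields 1.0f.  Because the sizes are checked at compile time, no padding can arise in the copy itself.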

Moreover, I am unsure about padding in a fixed-point type itself.  The mode
definition mumbles something about possible padding in the type, but the
compiler only allows setting IBIT, FBIT and the mode size.

Now suppose a 32-bit, little-endian target and an 8-bit type like QQ.  The
target wants to pad the QQ in such a way that it is at the high end of the
register, i.e. the QQ is stored as an 8-bit value, but when loaded into a
32-bit register it shall be placed at the high end.

How would one express this in GCC?  Obviously, implementing the QQ insns
appropriately is not enough because libgcc needs to know the layout.  Moreover,
it should even work without insns if everything is lowered to int operations by
optabs.
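
One way to picture the layout described above, purely as an illustration in portable C and not as a statement about how GCC models it: the 8-bit value occupies bits 24..31 of the 32-bit register image and the low 24 bits are padding.  The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Place an 8-bit fixed-point bit pattern at the high end of a 32-bit
   register image; the low 24 bits are padding (zero here).  */
uint32_t qq_to_high_reg(uint8_t qq)
{
    return (uint32_t)qq << 24;
}

/* Recover the 8-bit pattern, ignoring whatever is in the padding bits.  */
uint8_t high_reg_to_qq(uint32_t reg)
{
    return (uint8_t)(reg >> 24);
}
```

Any library code that operates on such values through integer operations would have to agree on this shift, which is the layout knowledge libgcc would need.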

>>  A new function
>> is used for low-level construction of a const_fixed_from_double_int.  Isn't
>> there a better way?  I wonder that such functionality is not already there...
> 
> Good question - there are probably more places that could make use of
> this.

Is there a specific reason why native_interpret_int looks the way it does?
Historical reasons?  Performance?

I'd like to move most of the buffer encode / decode stuff to double_int so that
these internals are inside double_int.  The int and fixed cases would be tidied
up and be clearer, but it would introduce overhead.

The code would effectively be serialization / deserialization with some special
treatment for endianness.  Isn't such code already present for LTO or PCH or
similar?

Johann

Re: mips16 and nomips16

2013-01-18 Thread Richard Sandiford

Sorry for the slow reply, only just saw this.

reed kotler  writes:
> On 01/14/2013 04:50 PM, David Daney wrote:
>> On 01/14/2013 04:32 PM, reed kotler wrote:
>>> I'm not understanding why mips16 and nomips16 are not simple inheritable
>>> attributes.
>>
>> The mips16ness of a function must be known by the caller so that the
>> appropriate version of the JAL/JALX instruction can be emitted.
>>
>>
>>>
>>> i.e., you should be able to say:
>>>
>>> void foo();
>>> void __attribute((nomips16)) foo();
>>>
>>>
>>> or
>>>
>>> void goo();
>>
>> Any call here would assume nomips16
>>
>>> void __attribute((mips16)) goo();
>>
>> A call here would assume mips16.
>>
>> Which is it?  If you allow it to change, one case will always be
>> incorrect.
>>
>> Or perhaps I misunderstand the question.
>>
>> David Daney
>>
> I would assume that foo would be nomips16 and goo would be mips16.
>
> The definition of plain foo() or goo() says that nothing is specified.
>
> What is not clear then?
>
> This is how all such other attributes in gcc are handled.

Well, in a way, these are the only such attributes in GCC :-)
I don't think any other port supports switching between different
ISA modes like this.

I think the original authors really wanted "mips16" and "nomips16" to be
type attributes rather than decl attributes.  nomips16 function pointers
and mips16 function pointers would be mutually-incompatible subtypes of
unannotated function pointers, so you would be able to implicitly
convert an annotated function pointer to an unannotated one, but not the
other way, and not between annotated function pointers.  GCC didn't at
the time (and as far as I know still doesn't) have hooks to enforce that
though.  The attributes therefore ended up being implemented as strict
decl attributes in the hope that they could be made type attributes in
future without breaking backwards compatibility.

Not having those hooks means that the validity and semantics of:

 void foo();
 void __attribute((subtype)) foo();

aren't really defined.  Does foo() keep its original type or change to
the subtype?  It's also not defined whether:

 void __attribute((subtype)) foo();
 void foo();

is invalid, or whether the subtype from the first declaration carries
over to the second.  Etc.  (To be clear, I'm not trying to start a
discussion on the right semantics, or anything like that.  I'm just
saying that I don't think the semantics are defined yet, although
I could be wrong.)

FWIW, the original implementation came from MTI, but it was a while ago
and I no longer have a record of the discussion.  (It was discussed and
submitted under a CodeSourcery contract.)  I might be misrepresenting things.

If you (MTI) are sure that we don't want them to be type attributes,
and that we should treat them more like optimisation switches, then we
can probably remove the check.  I think it's a one-way street though.

Richard

Re: Adding Rounding Mode to Operations Opcodes in Gimple and RTL

2013-01-18 Thread Michael Zolotukhin

Sure, the tests are of utmost importance here. By the way, in what
suite should they be?

As for the changes in the compiler itself: what do you think about
introducing a fake variable that reflects the rounding mode (similar
variables could be introduced for exception flags and other
properties)?  How difficult would it be to do that (we'll need to add
implicit dependencies from those variables to all FP operations and
kill their values after each call)?

Thanks, Michael

PS: I'll return to this discussion in two weeks, after my vacation.

On 17 January 2013 20:54, Joseph S. Myers  wrote:
> I should add that the recent observation of bugs on some platforms with
> unordered comparisons being wrongly used instead of ordered ones
> illustrates my point about the value of having proper test coverage for
> each individual operation, even though some bugs will only show in more
> complicated code.
>
> --
> Joseph S. Myers
> jos...@codesourcery.com

-- 
---
Best regards,
Michael V. Zolotukhin,
Software Engineer
Intel Corporation.

Re: Caller save mode on MIPS

2013-01-18 Thread Richard Sandiford

"Fu, Chao-Ying"  writes:
>   From testing, I found out that the whole width of a MIPS
> integer/floating-point register is saved and restored around a call.  This
> may hurt the performance.
>
> Ex:
> fu@debian6:/disk/fu/dev/test$ cat add2.c
> void test2(float);
>
> float test(float a, float b)
> {
>   test2(a*b);
>   return a;
> }
>
> fu@debian6:/disk/fu/dev/test$ gcc -S -O2 add2.c -mips64r2 -mabi=n32
>
> (# Or )
>
> fu@debian6:/disk/fu/dev/test$ gcc -S -O2 add2.c -mips32r2 -mfp64
>
> fu@debian6:/disk/fu/dev/test$ grep f0 add2.s
> mov.s   $f0,$f12 <
> sdc1    $f0,24($sp) <
> ldc1    $f0,24($sp) <
>
> The 64-bit $f0 is saved and restored via sdc1 and ldc1.  However, using lwc1
> and swc1 should be ok and faster.
>
>   From http://gcc.gnu.org/ml/gcc-patches/2001-02/msg01480.html,
> the patch defines HARD_REGNO_CALLER_SAVE_MODE to return proper mode for i386.
> For MIPS, we may have:
> Ex:
> #define HARD_REGNO_CALLER_SAVE_MODE(REGNO, NREGS, MODE) \
>((MODE) == VOIDmode ? choose_hard_reg_mode ((REGNO), (NREGS), false) \
>: (MODE))
>
>   Any feedback about adding HARD_REGNO_CALLER_SAVE_MODE to MIPS?  Thanks!

Sounds like a good idea.

Richard

Re: GCC cannot move address calculation to store+load?

2013-01-18 Thread Miles Bader

Richard Biener  writes:
> Note that combine does not apply because %eax is used multiple
> times.  This also means that for code-size the combining is not a good
> idea.

Though the lea instruction seems rather large, so in fact the code is
a fair bit smaller without it, e.g. as generated by clang/llvm:

clang/llvm 3.1 (-O2 -m32):

 :
   0:   8b 44 24 08             mov    0x8(%esp),%eax
   4:   8b 4c 24 04             mov    0x4(%esp),%ecx
   8:   8b 14 c1                mov    (%ecx,%eax,8),%edx
   b:   89 15 00 00 00 00       mov    %edx,0x0
  11:   89 04 c1                mov    %eax,(%ecx,%eax,8)
  14:   31 c0                   xor    %eax,%eax
  16:   c3                      ret

gcc 4.8 20130113 (-O2 -m32):

 :
   0:   8b 54 24 08             mov    0x8(%esp),%edx
   4:   8d 04 d5 00 00 00 00    lea    0x0(,%edx,8),%eax
   b:   03 44 24 04             add    0x4(%esp),%eax
   f:   8b 08                   mov    (%eax),%ecx
  11:   89 0d 00 00 00 00       mov    %ecx,0x0
  17:   89 10                   mov    %edx,(%eax)
  19:   31 c0                   xor    %eax,%eax
  1b:   c3                      ret

-miles

-- 
Ich bin ein Virus. Mach' mit und kopiere mich in Deine .signature.

Re: Adding Rounding Mode to Operations Opcodes in Gimple and RTL

2013-01-18 Thread Joseph S. Myers

On Fri, 18 Jan 2013, Michael Zolotukhin wrote:

> Sure, the tests are of utmost importance here. By the way, in what
> suite should they be?

I suspect that things such as testing for both -mfpmath=387 and 
-mfpmath=sse indicate a new .exp file.  That may also help for other 
things such as running tests using the same source files for (pragma 
FENV_ACCESS, -f options, ...), (volatile or non-volatile), and other such 
variations over which the tests should be iterated.

Given that a new .exp file is used, it probably goes in a subdirectory of 
gcc.dg, e.g. gcc.dg/ieee/ieee.exp.

> As for the changes in the compiler itself - what do you think about
> introduction of a fake variable, reflecting rounding mode (similar
> variables could be introduced for exception flags and other
> properties). How difficult would it be to do that (we'll need to add
> implicit dependencies from those variables to all FP-operations and
> kill their values after each call)?

I have no comments on this as an implementation approach (but thread-local 
variables, read and modified by various operations and function calls, are 
essentially what rounding mode and exception state are in C language 
terms).
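
To illustrate the "hidden variable" view in the simplest possible terms, here is a toy model in C.  It has nothing to do with GCC internals or with the real floating-point environment: toy_mode, toy_set_round and toy_div are invented names, and integer division stands in for an FP operation.  The point is only that the result of an operation depends on per-thread state that any call may change.

```c
#include <assert.h>

enum toy_round { TOY_ROUND_DOWN, TOY_ROUND_UP };

/* Hidden per-thread state, standing in for the FP rounding mode.  */
static _Thread_local enum toy_round toy_mode = TOY_ROUND_DOWN;

/* Any call may change the mode, so a compiler tracking it as a variable
   would have to treat its value as unknown after each call.  */
void toy_set_round(enum toy_round mode)
{
    toy_mode = mode;
}

/* Divide non-negative a by positive b.  The result implicitly depends
   on toy_mode, just as an FP operation depends on the rounding mode.  */
long toy_div(long a, long b)
{
    assert(a >= 0 && b > 0);
    long q = a / b;
    if (toy_mode == TOY_ROUND_UP && a % b != 0)
        q += 1;
    return q;
}
```

Two textually identical calls to toy_div can therefore give different results if a call to toy_set_round sits between them, which is why such operations cannot simply be merged or moved across calls.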

-- 
Joseph S. Myers
jos...@codesourcery.com

Pipe Line

2013-01-18 Thread Chassin

Hi, I am trying to understand the GCC 4.6 pipeline.  Everything was fine
until I arrived at lang_hooks.parse_file; I can't figure out whether the
following is right:


(CALL) -> finish_function { cgraph_finalize_function ->
cgraph_analyze_function { cgraph_lower_function } } ?




--
Chaddy Huussin Vazquez , chas...@ceis.cujae.edu.cu

Superior Polytechnic Institute ‘Jose Antonio Echeverrıa’
Informatics Engineering Faculty




gcc-4.6-20130118 is now available

2013-01-18 Thread gccadmin

Snapshot gcc-4.6-20130118 is now available on
  ftp://gcc.gnu.org/pub/gcc/snapshots/4.6-20130118/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 4.6 SVN branch
with the following options: svn://gcc.gnu.org/svn/gcc/branches/gcc-4_6-branch 
revision 195306

You'll find:

 gcc-4.6-20130118.tar.bz2 Complete GCC

  MD5=b2398056d14e219efb0139f5d088789f
  SHA1=dc9b80f836b6b96a0490c221d11f807ca1873139

Diffs from 4.6-20130111 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-4.6
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.
