Re: wrong assertion in caller-save.c

2014-01-11 Thread Richard Sandiford
Jeff Law  writes:
> On 01/10/14 14:44, Eric Botcazou wrote:
>>> In my backend, the movdi pattern unfortunately has code '0' (this depends
>>> on the pattern declaration order). When gcc tried to determine whether my
>>> DI regs can be saved and restored as 'caller saves' (in
>>> caller-save.c::init_caller_save()), it failed on this wrong assertion.
>>
>> I'd arrange for avoiding code 0 instead because this disables the cache.
> Agreed, but the assert in caller-save is still wrong and ought to be fixed.

FWIW, it was fixed in 4.8 and later by making CODE_FOR_nothing be 0
and starting the real instructions at 1.

Thanks,
Richard
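
A minimal sketch of why insn code 0 clashes with a cache that treats 0 as
"not computed yet", and why reserving 0 for CODE_FOR_nothing removes the
clash. The enum members, `cache`, and `remember()` below are hypothetical
stand-ins for illustration only, not GCC's actual declarations in
caller-save.c or the generated insn-codes.h.

```cpp
#include <cassert>

// Hypothetical numbering before the change: a real pattern can get code 0,
// depending on declaration order in the machine description.
enum old_insn_code { OLD_CODE_FOR_movdi = 0, OLD_CODE_FOR_movsi = 1 };

// Numbering after the change described above: 0 is reserved for "nothing"
// and real patterns start at 1.
enum new_insn_code { CODE_FOR_nothing = 0, CODE_FOR_movsi = 1, CODE_FOR_movdi = 2 };

// Zero-initialized cache; a 0 entry is read as "nothing cached yet".
static int cache[2];

// Store a code in the cache and report whether a later lookup
// would recognize the slot as filled.
bool remember(int slot, int code) {
  assert(slot >= 0 && slot < 2);
  cache[slot] = code;
  return cache[slot] != 0;
}
```

With the old numbering, `remember(0, OLD_CODE_FOR_movdi)` stores a valid
code that is indistinguishable from an empty slot, so the cache never hits
and an assertion of the form "the cached code is nonzero" fails. With the
new numbering every real code is nonzero.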


Re: wrong assertion in caller-save.c

2014-01-11 Thread Eric Botcazou
> FWIW, it was fixed in 4.8 and later by making CODE_FOR_nothing be 0
> and starting the real instructions at 1.

Indeed, thanks for the heads-up:

2012-07-09  Steven Bosscher  

* gensupport.c (init_rtx_reader_args_cb): Start counting code
generating patterns from 1 to free up 0 for CODE_FOR_nothing.
* gencodes.c (main): Give CODE_FOR_nothing the value 0.  Add
the LAST_INSN_CODE marker at the end.
* genoutput.c (nothing): New static struct data.
(idata): Initialize to &nothing.
(idata_end): Initialize to &nothing.next.
(init_insn_for_nothing): New function to create dummy 'nothing' insn.
(main): Use it.
* genpeep.c (insn_code_number): Remove global variable.
(gen_peephole): Take it as an argument instead.
(main): Take insn_code_number from read_md_rtx.
* optabs.h: Revert r161809:
(optab_handlers): Change type of insn_code back to insn_code.
(optab_handler, widening_optab_handler, set_optab_handler,
set_widening_optab_handler, convert_optab_handler,
set_convert_optab_handler, direct_optab_handler,
set_direct_optab_handler): Remove int casts.
Revert to treating the insn_code field as "insn_code".

-- 
Eric Botcazou



Re: [RFC, LRA] Incorrect subreg resolution?

2014-01-11 Thread Richard Sandiford
Tejas Belagod  writes:
> When I relaxed CANNOT_CHANGE_MODE_CLASS to undefined for AArch64, 
> gcc.c-torture/execute/copysign1.c generates incorrect code because LRA cannot 
> seem to handle subregs like
>
>   (subreg:DI (reg:TF hard_reg) 8)
>
> on hard registers where the subreg byte offset is unaligned to a hard
> register boundary (16 for AArch64). It seems to quietly ignore the 8 and
> resolves this to an incorrect hard register during reload.
>
> When I compile this test with -O3,
>
> long double
> cl (long double x, long double y)
> {
>   return __builtin_copysignl (x, y);
> }
>
> cs.c.213r.ira:
>
> (insn 26 10 33 2 (set (reg:DI 87 [ y+8 ])
>  (subreg:DI (reg:TF 33 v1 [ y ]) 8)) cs.c:4 34 {*movdi_aarch64}
>   (expr_list:REG_DEAD (reg:TF 33 v1 [ y ])
>  (nil)))
> (insn 33 26 35 2 (set (reg:TF 93)
>  (reg:TF 32 v0 [ x ])) cs.c:4 40 {*movtf_aarch64}
>   (expr_list:REG_DEAD (reg:TF 32 v0 [ x ])
>  (nil)))
> (insn 35 33 34 2 (set (reg:DI 92 [ x+8 ])
>  (subreg:DI (reg:TF 93) 8)) cs.c:4 34 {*movdi_aarch64}
>   (nil))
> (insn 34 35 23 2 (set (reg:DI 91 [ x ])
>  (subreg:DI (reg:TF 93) 0)) cs.c:4 34 {*movdi_aarch64}
>   (expr_list:REG_DEAD (reg:TF 93)
>  (nil)))
> 
>
> cs.c.214r.reload
>
> (insn 26 10 33 2 (set (reg:DI 2 x2 [orig:87 y+8 ] [87])
>  (reg:DI 33 v1 [ y+8 ])) cs.c:4 34 {*movdi_aarch64}
>   (nil))
> (insn 33 26 35 2 (set (reg:TF 0 x0 [93])
>  (reg:TF 32 v0 [ x ])) cs.c:4 40 {*movtf_aarch64}
>   (nil))
> (insn 35 33 34 2 (set (reg:DI 1 x1 [orig:92 x+8 ] [92])
>  (reg:DI 1 x1 [+8 ])) cs.c:4 34 {*movdi_aarch64}
>   (nil))
> (insn 34 35 8 2 (set (reg:DI 0 x0 [orig:91 x ] [91])
>  (reg:DI 0 x0 [93])) cs.c:4 34 {*movdi_aarch64}
>   (nil))
> .
>
> You can see the changes to insn 26 before and after reload - the SUBREG_BYTE 
> offset of 8 seems to have been translated to v0 instead of v0.d[1] by 
> get_hard_regno ().
>
> What's interesting here is that the SUBREG_BYTE that is generated for
>
> (subreg:DI (reg:TF 33 v1 [ y ]) 8)
>
> isn't aligned to a hard register boundary on SIMD regs where UNITS_PER_VREG
> for AArch64 is 16. Therefore when this subreg is resolved, it resolves to v1
> instead of v1.d[1]. Is this something going wrong in LRA or is this a more
> fundamental problem with generating subregs of hard regs with unaligned
> subreg byte offsets?
> The same subreg on a pseudo works OK because in insn 33, the TF mode is
> allocated integer registers and all is well.

I think this is the same problem that was being discussed for x86
after your no-op vec-select patch:

   http://gcc.gnu.org/ml/gcc-patches/2013-12/msg00801.html

and long following thread.

I'd still like to solve this in a target-independent way rather than add
an offset to CANNOT_CHANGE_MODE_CLASS, but I haven't had time to look at
it...

Thanks,
Richard
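
The arithmetic behind the report can be sketched as follows. This is a
simplified model under stated assumptions, not the code that LRA or
get_hard_regno () actually runs: each hard register is assumed to hold
`units_per_reg` bytes, and a subreg byte offset picks a register by integer
division. `SubregResolution` and `resolve_subreg` are hypothetical names.

```cpp
#include <cassert>

// Result of mapping a subreg byte offset onto a hard register number.
struct SubregResolution {
  int hard_regno;  // register the offset maps to
  bool exact;      // false if the offset points into the middle of a register
};

// Simplified model: base_regno is the first hard register of the wide value,
// byte_offset is the SUBREG_BYTE, units_per_reg is the register size in bytes.
SubregResolution resolve_subreg(int base_regno, int byte_offset, int units_per_reg) {
  assert(units_per_reg > 0 && byte_offset >= 0);
  return { base_regno + byte_offset / units_per_reg,
           byte_offset % units_per_reg == 0 };
}
```

For the TF value in v1 (register 33, 16 bytes per register), offset 8 gives
register 33 with `exact == false`: the offset is lost unless something else
records that the upper half is wanted. For the TF pseudo allocated to x0/x1
(8 bytes per register), offset 8 lands exactly on register 1, which matches
the reload dump for insn 35.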


Re: [RFC, LRA] Incorrect subreg resolution?

2014-01-11 Thread H.J. Lu
On Sat, Jan 11, 2014 at 2:12 AM, Richard Sandiford
 wrote:
> Tejas Belagod  writes:
>> When I relaxed CANNOT_CHANGE_MODE_CLASS to undefined for AArch64,
>> gcc.c-torture/execute/copysign1.c generates incorrect code because LRA cannot
>> seem to handle subregs like
>>
>>   (subreg:DI (reg:TF hard_reg) 8)
>>
>> on hard registers where the subreg byte offset is unaligned to a hard
>> register boundary (16 for AArch64). It seems to quietly ignore the 8 and
>> resolves this to an incorrect hard register during reload.
>>
>> When I compile this test with -O3,
>>
>> long double
>> cl (long double x, long double y)
>> {
>>   return __builtin_copysignl (x, y);
>> }
>>
>> cs.c.213r.ira:
>>
>> (insn 26 10 33 2 (set (reg:DI 87 [ y+8 ])
>>  (subreg:DI (reg:TF 33 v1 [ y ]) 8)) cs.c:4 34 {*movdi_aarch64}
>>   (expr_list:REG_DEAD (reg:TF 33 v1 [ y ])
>>  (nil)))
>> (insn 33 26 35 2 (set (reg:TF 93)
>>  (reg:TF 32 v0 [ x ])) cs.c:4 40 {*movtf_aarch64}
>>   (expr_list:REG_DEAD (reg:TF 32 v0 [ x ])
>>  (nil)))
>> (insn 35 33 34 2 (set (reg:DI 92 [ x+8 ])
>>  (subreg:DI (reg:TF 93) 8)) cs.c:4 34 {*movdi_aarch64}
>>   (nil))
>> (insn 34 35 23 2 (set (reg:DI 91 [ x ])
>>  (subreg:DI (reg:TF 93) 0)) cs.c:4 34 {*movdi_aarch64}
>>   (expr_list:REG_DEAD (reg:TF 93)
>>  (nil)))
>> 
>>
>> cs.c.214r.reload
>>
>> (insn 26 10 33 2 (set (reg:DI 2 x2 [orig:87 y+8 ] [87])
>>  (reg:DI 33 v1 [ y+8 ])) cs.c:4 34 {*movdi_aarch64}
>>   (nil))
>> (insn 33 26 35 2 (set (reg:TF 0 x0 [93])
>>  (reg:TF 32 v0 [ x ])) cs.c:4 40 {*movtf_aarch64}
>>   (nil))
>> (insn 35 33 34 2 (set (reg:DI 1 x1 [orig:92 x+8 ] [92])
>>  (reg:DI 1 x1 [+8 ])) cs.c:4 34 {*movdi_aarch64}
>>   (nil))
>> (insn 34 35 8 2 (set (reg:DI 0 x0 [orig:91 x ] [91])
>>  (reg:DI 0 x0 [93])) cs.c:4 34 {*movdi_aarch64}
>>   (nil))
>> .
>>
>> You can see the changes to insn 26 before and after reload - the SUBREG_BYTE
>> offset of 8 seems to have been translated to v0 instead of v0.d[1] by
>> get_hard_regno ().
>>
>> What's interesting here is that the SUBREG_BYTE that is generated for
>>
>> (subreg:DI (reg:TF 33 v1 [ y ]) 8)
>>
>> isn't aligned to a hard register boundary on SIMD regs where UNITS_PER_VREG
>> for AArch64 is 16. Therefore when this subreg is resolved, it resolves to v1
>> instead of v1.d[1]. Is this something going wrong in LRA or is this a more
>> fundamental problem with generating subregs of hard regs with unaligned
>> subreg byte offsets?
>> The same subreg on a pseudo works OK because in insn 33, the TF mode is
>> allocated integer registers and all is well.
>
> I think this is the same problem that was being discussed for x86
> after your no-op vec-select patch:
>
>http://gcc.gnu.org/ml/gcc-patches/2013-12/msg00801.html
>
> and long following thread.
>
> I'd still like to solve this in a target-independent way rather than add
> an offset to CANNOT_CHANGE_MODE_CLASS, but I haven't had time to look at
> it...

How about this patch

http://gcc.gnu.org/git/?p=gcc.git;a=patch;h=23023006b946e06b6fd93786585f2f8cd4837956

I tested it on Linux/x86-64 without any regressions.

-- 
H.J.


Surprising Behavior Comparing Floats

2014-01-11 Thread Nick
First, I know that floating point variables should not be compared "raw"
due to the way they're represented.  But the behavior I'm seeing has me
surprised.

Here's a small repro example:

---

#include <iostream>

using namespace std;

int main()
{
    float f1(4.94f + 0.2f), f2(5.14f), f3(4.94f), f4(0.2f), f5(f3 + f4);
    cout << "1) " << "5.14 < 5.14: " << (5.14 < 5.14) << endl;
    cout << "2) " << f1 << " < " << f1 << ": " << (f1 < f1) << endl;
    cout << "3) " << f2 << " < " << f2 << ": " << (f2 < f2) << endl;
    cout << "4) " << f1 << " < " << f2 << ": " << (f1 < f2) << endl;
    cout << "5) " << f2 << " < " << f1 << ": " << (f2 < f1) << endl;
    cout << "6) " << f2 << " < " << (f3 + f4) << ": " << (f2 < (f3 + f4))
         << endl;
    cout << "7) " << f2 << " < " << f5 << ": " << (f2 < f5) << endl;
}

---

And here's the output from running it:

nick@nimble ~/test2 $ g++ FloatCompare.cpp && ./a.out
1) 5.14 < 5.14: 0
2) 5.14 < 5.14: 0
3) 5.14 < 5.14: 0
4) 5.14 < 5.14: 0
5) 5.14 < 5.14: 0
6) 5.14 < 5.14: 1
7) 5.14 < 5.14: 0


I'm very surprised by the result in #6.  #7 seems to be doing the same
thing, except that it uses a local variable to hold the sum.


Here's my GCC version:

nick@nimble ~/test2 $ gcc -v
Using built-in specs.
COLLECT_GCC=/usr/i686-pc-linux-gnu/gcc-bin/4.7.3/gcc
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/i686-pc-linux-gnu/4.7.3/lto-wrapper
Target: i686-pc-linux-gnu
Configured with: /var/tmp/portage/sys-devel/gcc-4.7.3-r1/work/gcc-4.7.3/configure
--host=i686-pc-linux-gnu --build=i686-pc-linux-gnu --prefix=/usr
--bindir=/usr/i686-pc-linux-gnu/gcc-bin/4.7.3
--includedir=/usr/lib/gcc/i686-pc-linux-gnu/4.7.3/include
--datadir=/usr/share/gcc-data/i686-pc-linux-gnu/4.7.3
--mandir=/usr/share/gcc-data/i686-pc-linux-gnu/4.7.3/man
--infodir=/usr/share/gcc-data/i686-pc-linux-gnu/4.7.3/info
--with-gxx-include-dir=/usr/lib/gcc/i686-pc-linux-gnu/4.7.3/include/g++-v4
--with-python-dir=/share/gcc-data/i686-pc-linux-gnu/4.7.3/python
--enable-languages=c,c++,java --enable-obsolete --enable-secureplt
--disable-werror --with-system-zlib --disable-nls
--enable-checking=release --with-bugurl=https://bugs.gentoo.org/
--with-pkgversion='Gentoo 4.7.3-r1 p1.4, pie-0.5.5'
--enable-libstdcxx-time --enable-shared --enable-threads=posix
--enable-__cxa_atexit --enable-clocale=gnu --disable-multilib
--disable-altivec --disable-fixed-point --with-arch=i686
--enable-targets=all --enable-libgomp --enable-libmudflap
--disable-libssp --disable-libquadmath --enable-lto --without-cloog
--without-ppl
Thread model: posix
gcc version 4.7.3 (Gentoo 4.7.3-r1 p1.4, pie-0.5.5) 

I also have GCC 4.8 on my system and the result is the same.


Is this expected behavior?

Best regards,
Nick




Re: Surprising Behavior Comparing Floats

2014-01-11 Thread Marc Glisse

On Sat, 11 Jan 2014, Nick wrote:


> First, I know that floating point variables should not be compared "raw"
> due to the way they're represented.  But the behavior I'm seeing has me
> surprised.

First, this is not an appropriate list for this question. gcc-help would
be better. Second, there are hundreds of places on the internet answering
this same question.

> Is this expected behavior?

Yes.

--
Marc Glisse


Re: Surprising Behavior Comparing Floats

2014-01-11 Thread Rob

On Sat, 11 Jan 2014, Nick wrote:

> I'm very surprised by the result in #6.  #7 seems to be doing the same
> thing, except that it uses a local variable to hold the sum.

Sounds to me like it could be related to excess precision - check out the
-ffloat-store option. I don't see it on my machine either way, but I'm
on 4.7.2.

Rob
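
A portable sketch of the effect, assuming IEEE 754 single and double
precision. On x87 the sum f3 + f4 can be kept in a wider register for the
comparison in #6, while #7 first rounds it to float by storing it in f5.
The code below imitates the wider register with an explicit double so the
difference shows up on any target; the function names are made up for this
illustration and nothing here is taken from GCC itself.

```cpp
#include <cassert>

// Compare a against b + c with the sum kept in a wider type, which is
// roughly what happens when the x87 unit holds the intermediate result.
bool less_than_wide_sum(float a, float b, float c) {
  return a < static_cast<double>(b) + static_cast<double>(c);
}

// Compare a against b + c after the sum has been rounded to float.
// The volatile store forces the rounding, much like -ffloat-store does.
bool less_than_float_sum(float a, float b, float c) {
  volatile float sum = b + c;
  return a < sum;
}

// The values from the original report: the exact sum of 4.94f and 0.2f lies
// slightly above 5.14f but rounds to 5.14f when converted to float.
void check_reported_values() {
  assert(less_than_wide_sum(5.14f, 4.94f, 0.2f));    // like line 6)
  assert(!less_than_float_sum(5.14f, 4.94f, 0.2f));  // like line 7)
}
```

The two functions differ only in where the rounding to float happens, which
is the same difference as between lines 6) and 7) of the test program.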


Re: Surprising Behavior Comparing Floats

2014-01-11 Thread Nick

On Sat, 2014-01-11 at 16:24 +0100, Marc Glisse wrote:
> First, this is not an appropriate list for this question. gcc-help would 
> be better.
Sorry about that--my e-mail auto completed the address and I wasn't
paying enough attention.

> Second, there are hundreds of places on the internet answering 
> this same question.
> 
> > Is this expected behavior?
> 
> Yes.

Thanks for the quick reply.



Re: Surprising Behavior Comparing Floats

2014-01-11 Thread Nick

On Sat, 2014-01-11 at 15:24 +0000, Rob wrote:
> On Sat, 11 Jan 2014, Nick wrote:
> > I'm very surprised by the result in #6.  #7 seems to be doing the same
> > thing, except that it uses a local variable to hold the sum.
> 
> Sounds to me like it could be related to excess precision - check out the
> -ffloat-store option. I don't see it on my machine either way, but I'm
> on 4.7.2.

Thank you very much!  The -ffloat-store option not only addresses the
behavior I'm seeing, but the information in the man page for this option
gives me a great starting point for information about why it behaves
that way.

Best regards,
Nick




gcc-4.7-20140111 is now available

2014-01-11 Thread gccadmin
Snapshot gcc-4.7-20140111 is now available on
  ftp://gcc.gnu.org/pub/gcc/snapshots/4.7-20140111/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 4.7 SVN branch
with the following options: svn://gcc.gnu.org/svn/gcc/branches/gcc-4_7-branch 
revision 206559

You'll find:

 gcc-4.7-20140111.tar.bz2 Complete GCC

  MD5=dce7fbdc1db8ef8c984e202a9306cddb
  SHA1=babd73d44aa9b10b38573c4d0e71f63b65c3b043

Diffs from 4.7-20140104 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-4.7
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.


gcc 4.x and support of x87 FPU in libstdc++

2014-01-11 Thread Denis K
Hello,

I've been trying to compile gcc 4.5.4 from the sources using
--with-fpmath=387 but I'm getting this error: "Invalid
--with-fpmath=387". I looked in the configs and found that it doesn't
support this option:

case ${with_fpmath} in
  avx)
tm_file="${tm_file} i386/avxmath.h"
;;
  sse)
tm_file="${tm_file} i386/ssemath.h"
;;
  *)
echo "Invalid --with-fpmath=$with_fpmath" 1>&2
exit 1

Basically, I started this whole thing because I need to supply a
statically linked executable for an old target platform (in fact, it's
an old Celeron but without any SSE2 instructions that are apparently
used by libstdc++ by DEFAULT). The executable crashes at the first
instruction (movq XMM0,...) coming from copying routines in the
internals of libstdc++ with an "Illegal instruction" message. Is there
any way to resolve this? I need to be on a fairly recent g++ to be
able to port my existing code base and it should be all statically
linked as the target OS hardly has anything installed.

I was wondering if it's possible to supply these headers/sources from
an older build to enable support for regular x87 instructions, so that
no SSE instructions are referenced?

PS Right now I'm trying a hack in gcc\config\i386\ssemath.h where I replace

#undef TARGET_FPMATH_DEFAULT
#define TARGET_FPMATH_DEFAULT (TARGET_SSE2 ? FPMATH_SSE : FPMATH_387)

#undef TARGET_SUBTARGET32_ISA_DEFAULT
#define TARGET_SUBTARGET32_ISA_DEFAULT \
   (OPTION_MASK_ISA_MMX | OPTION_MASK_ISA_SSE | OPTION_MASK_ISA_SSE2)

with

#undef TARGET_FPMATH_DEFAULT
#define TARGET_FPMATH_DEFAULT (FPMATH_387)

#undef TARGET_SUBTARGET32_ISA_DEFAULT
#define TARGET_SUBTARGET32_ISA_DEFAULT \
   (OPTION_MASK_ISA_MMX )


But I'm not sure this is going to work or what sort of side
effects it could cause.


Thanks.


Re: gcc 4.x and support of x87 FPU in libstdc++

2014-01-11 Thread H.J. Lu
On Sat, Jan 11, 2014 at 6:54 PM, Denis K  wrote:
> Hello,
>
> I've been trying to compile gcc 4.5.4 from the sources using
> --with-fpmath=387 but I'm getting this error: "Invalid
> --with-fpmath=387". I looked in the configs and found that it doesn't
> support this option:
>
> case ${with_fpmath} in
>   avx)
> tm_file="${tm_file} i386/avxmath.h"
> ;;
>   sse)
> tm_file="${tm_file} i386/ssemath.h"
> ;;
>   *)
> echo "Invalid --with-fpmath=$with_fpmath" 1>&2
> exit 1
>
> Basically, I started this whole thing because I need to supply a
> statically linked executable for an old target platform (in fact, it's
> an old Celeron but without any SSE2 instructions that are apparently
> used by libstdc++ by DEFAULT). The executable crashes at the first

How did you configure GCC?


-- 
H.J.