Re: missing optimization "a << (b & 63) to a << b"

2013-01-17 Thread Richard Biener
On Thu, Jan 17, 2013 at 2:44 AM, Wei Mi  wrote:
> Hi,
>
> On x86, shift insns automatically mask the count to 5 bits for 32-bit
> operands and to 6 bits for 64-bit operands, so for the testcase below,
> buf_ << (-end & 63) could be optimized to buf_ << -end. But with the trunk
> compiler, one place in the testcase is not optimized.
>
> typedef unsigned long long uint64;
> typedef unsigned int uint32;
>
> class Decoder {
>  public:
>  Decoder() : k_minus_1_(0), buf_(0), bits_left_(0) {}
>  ~Decoder() {}
>
>  uint32 ExtractBits(uint64 end, uint64 start);
>  inline uint32 GetBits(int bits) {
>    uint32 val = ExtractBits(bits, 0);
>    buf_ >>= bits;
>    bits_left_ -= bits;
>    return val;
>  }
>
>  uint32 Get(uint32 bits);
>
>  uint32 k_minus_1_;
>  uint64 buf_;
>  unsigned long bits_left_;
> };
>
> uint32 Decoder::ExtractBits(uint64 end, uint64 start) {
>  return (buf_ << (-end & 63)) >> ((start - end) & 63);
> }
>
> uint32 Decoder::Get(uint32 bits) {
>  bits += k_minus_1_;
>  uint32 msbit = (bits > (k_minus_1_ + 1));
>  return GetBits(bits - msbit) | (msbit << (bits - 1));
> }
>
> The assembly generated by "g++ -O2 -S 1.C"
>
>         .file   "1.c"
>         .text
>         .align 2
>         .p2align 4,,15
>         .globl  _ZN7Decoder11ExtractBitsEyy
>         .type   _ZN7Decoder11ExtractBitsEyy, @function
> _ZN7Decoder11ExtractBitsEyy:
> .LFB7:
>         .cfi_startproc
>         movq    8(%rdi), %rax
>         movl    %esi, %ecx
>         negl    %ecx
>         salq    %cl, %rax       ===> Here (-end & 63) is optimized away.
>         movl    %edx, %ecx
>         subl    %esi, %ecx
>         shrq    %cl, %rax
>         ret
>         .cfi_endproc
> .LFE7:
>         .size   _ZN7Decoder11ExtractBitsEyy, .-_ZN7Decoder11ExtractBitsEyy
>         .align 2
>         .p2align 4,,15
>         .globl  _ZN7Decoder3GetEj
>         .type   _ZN7Decoder3GetEj, @function
> _ZN7Decoder3GetEj:
> .LFB8:
>         .cfi_startproc
>         movl    (%rdi), %eax
>         movq    8(%rdi), %r8
>         addl    %eax, %esi
>         addl    $1, %eax
>         movq    %r8, %r10
>         cmpl    %esi, %eax
>         movl    %esi, %ecx
>         setb    %al
>         movzbl  %al, %eax
>         subl    %eax, %ecx
>         movl    %ecx, %edx
>         shrq    %cl, %r10
>         movslq  %ecx, %r9
>         negl    %edx
>         movq    %r10, 8(%rdi)
>         subq    %r9, 16(%rdi)
>         andl    $63, %edx       ==> Inst A: the (-end & 63) is not optimized away.
>         movq    %r8, %rdi
>         movl    %edx, %ecx
>         salq    %cl, %rdi       ==> Inst B: use the result of (-end & 63) here
>         shrq    %cl, %rdi       ==> Inst C: use the result of (-end & 63) here
>         leal    -1(%rsi), %ecx
>         sall    %cl, %eax
>         orl     %edi, %eax
>         ret
>         .cfi_endproc
> .LFE8:
>
> In Decoder::Get(), (-end & 63) in Inst A is not optimized away because
> the two (-end & 63) exprs are CSEd after ExtractBits() is inlined into
> GetBits() and then into Get(), so there is a single def feeding multiple
> downstream uses, which the combine phase cannot optimize. This is an old
> problem in combine. Conversely, in _ZN7Decoder11ExtractBitsEyy each masked
> count has a single use, which is why (-end & 63) is optimized away there.
>
> To overcome this weakness of the combine phase for this testcase, I am
> wondering whether we can extend fwprop to solve the problem, because
> fwprop handles those "single def to multiple downstream uses" cases. The
> existing fwprop is restrictive: it only tries to propagate when the src of
> the def insn is a reg, subreg or const, and it only tries to propagate
> when simplification after propagation can collapse the expr to a const.
> Can we extend it to try propagating more complex exprs from def to use
> (here, propagating the andl operation from Inst A to the shift operands
> in Inst B and Inst C), and let try_fwprop_subst in fwprop.c decide
> whether to keep the change by comparing the costs?
>
> Or is there a better way to solve the problem? Any advice is much appreciated!

Combine / simplify-rtx should recognize this at the RTL level for
SHIFT_COUNT_TRUNCATED targets.

Richard.

> Thanks,
> Wei.
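
To make the equivalence under discussion concrete, here is a minimal plain-C sketch of the masked shift and of the ExtractBits expression from the testcase. The function names shl_masked and extract_bits are hypothetical and not from the original post. The explicit "& 63" keeps the C code well-defined for any count; on a target whose shift instruction already truncates the count to 6 bits, the mask is the part the compiler could drop.

```c
#include <stdint.h>

/* Shift left with the count masked to 6 bits, as written in the testcase.
   Without the mask, a count >= 64 would be undefined behavior in C. */
uint64_t shl_masked(uint64_t a, uint64_t count)
{
    return a << (count & 63);
}

/* Plain-C rendering of Decoder::ExtractBits: returns bits [start, end)
   of buf, assuming start < end <= 64 and end - start <= 32. */
uint32_t extract_bits(uint64_t buf, uint64_t end, uint64_t start)
{
    return (uint32_t)((buf << (-end & 63)) >> ((start - end) & 63));
}
```

With start == 0, as in GetBits(), both shift counts reduce to (-end & 63), which is the common subexpression that ends up as Inst A in the Get() assembly.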


Re: Identical basic blocks live long in RTL flow.

2013-01-17 Thread Bin.Cheng
On Thu, Jan 17, 2013 at 1:29 AM, Jan Hubicka  wrote:
>> >
>> >Basic blocks 8/9/10 are identical and live until pass jump2, which is
>> >after register allocation.
>> >I think these duplicated BBs do not contain additional information and
>> >should be better to be removed ASAP, because they might interfere with
>> >other passes like ifcvt.
>> >
>> >So should this issue be handled like in jump pass?
>> In this case, I'd expect DCE to optimize away the redundant assignments.
>
> Yeah, we also used to be able to optimize these out in crossjumping before
> regalloc but with SSA we now produce different pseudos in different copies so I
> think it was a patch by Steven that disabled crossjumping prior to jump2 (both
> passes also ought to be renamed)
>

Unfortunately, this case can't be handled by DCE because just before
pass pass_into_cfg_layout_mode, the rtl is like:


    1: NOTE_INSN_DELETED
    9: NOTE_INSN_BASIC_BLOCK 2
    2: r113:SI=r0:SI
    3: r114:SI=r1:SI
    4: NOTE_INSN_FUNCTION_BEG
   11: pc={(r113:SI==0)?L39:pc}
      REG_BR_PROB 0x3f6
   12: NOTE_INSN_BASIC_BLOCK 4
   13: r111:SI=zero_extend([r113:SI])
   14: pc={(r111:SI==0x2d)?L43:pc}
      REG_BR_PROB 0x7c7
   15: NOTE_INSN_BASIC_BLOCK 5
   16: pc={(r114:SI!=0)?L20:pc}
      REG_BR_PROB 0x1388
   22: L22:
   17: NOTE_INSN_BASIC_BLOCK 6
    6: r110:SI=0
   18: pc=L24
   19: barrier
   20: L20:
   21: NOTE_INSN_BASIC_BLOCK 7
   23: pc={(r111:SI!=0x2b)?L22:pc}
      REG_BR_PROB 0x1f49
   35: NOTE_INSN_BASIC_BLOCK 8
    8: r110:SI=0x1
   36: pc=L24
   37: barrier
   39: L39:
   38: NOTE_INSN_BASIC_BLOCK 9
    7: r110:SI=0x1
   40: pc=L24
   41: barrier
   43: L43:
   42: NOTE_INSN_BASIC_BLOCK 10
    5: r110:SI=0x1
   24: L24:
   25: NOTE_INSN_BASIC_BLOCK 11
   26: r112:SI=r110:SI
   30: r0:SI=r112:SI
   33: use r0:SI

Insns 36/40 are deleted in pass pass_into_cfg_layout_mode to make each
edge possibly fall-through. The edges <8, 11> and <9, 11> still exist
in control flow and insns 7/8 are not dead code. It's just a coincidence
that there are three identical basic blocks here. I don't think any passes
other than cross-jumping can handle this, right?

Thanks
--
Best Regards.
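
The C source behind this dump was not posted. As a rough illustration only, a function of the following shape would have the same control flow as the RTL above: three separate paths each store 1 into the result (basic blocks 8, 9 and 10) and one path stores 0 (basic block 6). The name classify and its parameters are hypothetical; the mapping to block numbers is read off the dump, not taken from the original program.

```c
#include <stddef.h>

/* Hypothetical reconstruction: r113 corresponds to s, r114 to flag,
   r111 to the zero-extended byte *s, and r110 to the result r. */
int classify(const unsigned char *s, int flag)
{
    int r;
    if (s == NULL)
        r = 1;                          /* basic block 9  */
    else if (*s == '-')                 /* compare with 0x2d */
        r = 1;                          /* basic block 10 */
    else if (flag != 0 && *s == '+')    /* compare with 0x2b */
        r = 1;                          /* basic block 8  */
    else
        r = 0;                          /* basic block 6  */
    return r;
}
```

None of the three "r = 1" stores is dead, since each is reached along a different edge; only merging the identical blocks (cross-jumping) removes the duplication.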


Imprecise data flow analysis leads to code bloat

2013-01-17 Thread Georg-Johann Lay
Hi, suppose the following C code:


static __inline__ __attribute__((__always_inline__))
_Fract rbits (const int i)
{
_Fract f;
__builtin_memcpy (&f, &i, sizeof (_Fract));
return f;
}

_Fract func (void)
{
#if B == 1
return rbits (0x1234);
#elif B == 2
return 0.14222r;
#endif
}


Type-punning idioms like in rbits above are very common in libgcc, for example
in fixed-bit.c.

In this example, both compilation variants are equivalent (provided int and
_Fract are 16 bits wide).  The problem with the B=1 variant is that it is
inefficient:

Variant B=1 needs 2 instructions.
Variant B=2 needs 11 instructions, 9 of them are not needed at all.

The problem goes as follows:

The memcpy is represented as a VIEW_CONVERT_EXPR<_Fract> and expanded to memory
moves through the frame:


(insn 5 4 6 (set (reg:HI 45)
(const_int 4660 [0x1234])) bloat.c:5 -1
 (nil))

(insn 6 5 7 (set (mem/c:HI (reg/f:HI 37 virtual-stack-vars) [2 S2 A8])
(reg:HI 45)) bloat.c:5 -1
 (nil))

(insn 7 6 8 (set (reg:HQ 46)
(mem/c:HQ (reg/f:HI 37 virtual-stack-vars) [2 S2 A8])) bloat.c:12 -1
 (nil))

(insn 8 7 9 (set (reg:HQ 43 [  ])
(reg:HQ 46)) bloat.c:12 -1
 (nil))


Is there a specific reason why this is not expanded as subreg like this?


  (set (reg:HQ 46)
   (subreg:HQ [(reg:HI 45)] 0))


The insns are analyzed in .dfinit:

;;  regs ever live   24[r24] 25[r25] 29[r29]

24/25 is the return register, 28/29 is the frame pointer:


(insn 6 5 7 2 (set (mem/c:HI (plus:HI (reg/f:HI 28 r28)
(const_int 1 [0x1])) [2 S2 A8])
(reg:HI 45)) bloat.c:5 82 {*movhi}


Is there a reason why R28 is not marked as live?


The memory accesses are optimized out in .fwprop2 so that no frame pointer is
needed any more.

However, in the subsequent passes, R29 is still reported as "regs ever live"
which is not correct.

The "regs ever live" don't change until .ira, where R28 springs to life again:

;;  regs ever live   24[r24] 25[r25] 28[r28] 29[r29]

And in .reload:

;;  regular block artificial uses 28 [r28] 32 [__SP_L__]
;;  eh block artificial uses 28 [r28] 32 [__SP_L__] 34 [argL]
;;  entry block defs 8 [r8] 9 [r9] 10 [r10] 11 [r11] 12 [r12] 13 [r13] 14
[r14] 15 [r15] 16 [r16] 17 [r17] 18 [r18] 19 [r19] 20 [r20] 21 [r21] 22 [r22]
23 [r23] 24 [r24] 25 [r25] 28 [r28] 32 [__SP_L__]
;;  exit block uses  24 [r24] 25 [r25] 28 [r28] 32 [__SP_L__]
;;  regs ever live   24[r24] 25[r25] 28[r28] 29[r29]

Outcome is that the frame pointer is set up without need, which is very costly
on avr.


The compiler is trunk from 2013-01-16 configured for avr:

   --target=avr --enable-languages=c,c++ --disable-nls --with-dwarf2

The example is compiled with

   avr-gcc bloat.c -S -dp -save-temps -Os -da -fdump-tree-optimized -DB=1


At what stage does the misoptimization happen?

Is this worth a PR? (I.e. is there a chance that anybody cares?)

Thanks.
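
For readers without a fixed-point capable target, the same memcpy type-punning idiom can be sketched with float instead of _Fract. This is an analogy under stated assumptions: float is assumed to be a 32-bit IEEE 754 binary32 value, and the name float_from_bits is hypothetical. The point is the same as with rbits: the memcpy only reinterprets bytes, so with a constant argument the result is itself a compile-time constant that the compiler can fold if it knows how to interpret the target type.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret the bits of a 32-bit integer as a float.
   Assumes sizeof(float) == sizeof(uint32_t) and IEEE 754 binary32. */
float float_from_bits(uint32_t bits)
{
    float f;
    memcpy(&f, &bits, sizeof f);   /* byte copy, no value conversion */
    return f;
}
```

Under those assumptions, float_from_bits(0x3f800000u) yields 1.0f.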


Re: not-a-number's

2013-01-17 Thread Vincent Lefevre
On 2013-01-17 06:53:45 +0100, Mischa Baars wrote:
> Also this was not what I intended to do, I was trying to work with quiet
> not-a-numbers explicitly to avoid the 'invalid operation' exception to be
> triggered, so that my program can work its way through the calculations
> without being terminated by the intervention of a signal handler. When any
> not-a-number shows up in results after execution, then you know something
> went wrong. When the program does what it is supposed to do, you could
> remove any remaining debug code, such as the boundary check in the
> trigonometric functions.

Checking whether a (final) result is NaN is not always the correct way
to detect whether something was wrong, because a NaN can disappear.
For instance, pow(NaN,0) gives 0, not NaN. You need to check the
"invalid operation" exception flag (if you don't want a trap). And
this flag will also be set by the <=, <, >, >= comparisons when one
of the operands is NaN, which should be fine for you.

-- 
Vincent Lefèvre  - Web: 
100% accessible validated (X)HTML - Blog: 
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)


Re: not-a-number's

2013-01-17 Thread Vincent Lefevre
On 2013-01-17 06:57:12 +0100, Mischa Baars wrote:
> >What exactly do you mean by "terminate the if"??  Do you mean skip the
> >whole compound statement, including any "else" clause?
> Yes, exactly. Skip it, including the 'else'.

This is not how C works. Perhaps instead of

  if (x < y)
...
  else
...

(where "else" means "in any other case"), you want:

  if (x < y)
...
  else if (x >= y)
...

Then if x is NaN and/or y is NaN, neither the "if" nor the "else"
will be executed.

-- 
Vincent Lefèvre  - Web: 
100% accessible validated (X)HTML - Blog: 
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)
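
A minimal sketch of the "if / else if" form described above, with a hypothetical function name compare_or_skip. Every ordered comparison with a NaN operand is false, so when x or y is NaN neither branch runs and the initial value is returned.

```c
#include <math.h>

/* Returns -1 if x < y, +1 if x >= y, and 0 if neither comparison holds,
   which happens exactly when at least one operand is NaN. */
int compare_or_skip(double x, double y)
{
    int result = 0;
    if (x < y)
        result = -1;
    else if (x >= y)
        result = 1;
    return result;
}
```

For example, compare_or_skip(NAN, 1.0) returns 0, whereas a plain "else" would have treated the NaN case like x >= y.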


Re: Imprecise data flow analysis leads to code bloat

2013-01-17 Thread Richard Biener
On Thu, Jan 17, 2013 at 12:20 PM, Georg-Johann Lay  wrote:
> Hi, suppose the following C code:
>
>
> static __inline__ __attribute__((__always_inline__))
> _Fract rbits (const int i)
> {
> _Fract f;
> __builtin_memcpy (&f, &i, sizeof (_Fract));
> return f;
> }
>
> _Fract func (void)
> {
> #if B == 1
> return rbits (0x1234);
> #elif B == 2
> return 0.14222r;
> #endif
> }
>
>
> Type-punning idioms like in rbits above are very common in libgcc, for example
> in fixed-bit.c.
>
> In this example, both compilation variants are equivalent (provided int and
> _Fract are 16 bits wide).  The problem with the B=1 variant is that it is
> inefficient:
>
> Variant B=1 needs 2 instructions.

B == 1 shows that fold-const.c native_interpret/encode_expr lack
support for FIXED_POINT_TYPE.

> Variant B=2 needs 11 instructions, 9 of them are not needed at all.
>
> The problem goes as follows:
>
> The memcpy is represented as a VIEW_CONVERT_EXPR<_Fract> and expanded to memory
> moves through the frame:
>
> (insn 5 4 6 (set (reg:HI 45)
> (const_int 4660 [0x1234])) bloat.c:5 -1
>  (nil))
>
> (insn 6 5 7 (set (mem/c:HI (reg/f:HI 37 virtual-stack-vars) [2 S2 A8])
> (reg:HI 45)) bloat.c:5 -1
>  (nil))
>
> (insn 7 6 8 (set (reg:HQ 46)
> (mem/c:HQ (reg/f:HI 37 virtual-stack-vars) [2 S2 A8])) bloat.c:12 -1
>  (nil))
>
> (insn 8 7 9 (set (reg:HQ 43 [  ])
> (reg:HQ 46)) bloat.c:12 -1
>  (nil))
>
>
> Is there a specific reason why this is not expanded as subreg like this?
>
>
>   (set (reg:HQ 46)
>    (subreg:HQ [(reg:HI 45)] 0))

Probably your target does not allow this?  Not sure, but then a testcase like

static __inline__ __attribute__((__always_inline__))
_Fract rbits (const int i)
{
  _Fract f;
  __builtin_memcpy (&f, &i, sizeof (_Fract));
  return f;
}

_Fract func (int i)
{
  return rbits (i);
}

would be more interesting, as your testcase boils down to a constant
folding issue (see above).

> The insns are analyzed in .dfinit:
>
> ;;  regs ever live   24[r24] 25[r25] 29[r29]
>
> 24/25 is the return register, 28/29 is the frame pointer:
>
>
> (insn 6 5 7 2 (set (mem/c:HI (plus:HI (reg/f:HI 28 r28)
> (const_int 1 [0x1])) [2 S2 A8])
> (reg:HI 45)) bloat.c:5 82 {*movhi}
>
>
> Is there a reason why R28 is not marked as live?
>
>
> The memory accesses are optimized out in .fwprop2 so that no frame pointer is
> needed any more.
>
> However, in the subsequent passes, R29 is still reported as "regs ever live"
> which is not correct.
>
> The "regs ever live" don't change until .ira, where R28 springs to life again:
>
> ;;  regs ever live   24[r24] 25[r25] 28[r28] 29[r29]
>
> And in .reload:
>
> ;;  regular block artificial uses 28 [r28] 32 [__SP_L__]
> ;;  eh block artificial uses 28 [r28] 32 [__SP_L__] 34 [argL]
> ;;  entry block defs 8 [r8] 9 [r9] 10 [r10] 11 [r11] 12 [r12] 13 [r13] 14
> [r14] 15 [r15] 16 [r16] 17 [r17] 18 [r18] 19 [r19] 20 [r20] 21 [r21] 22 [r22]
> 23 [r23] 24 [r24] 25 [r25] 28 [r28] 32 [__SP_L__]
> ;;  exit block uses  24 [r24] 25 [r25] 28 [r28] 32 [__SP_L__]
> ;;  regs ever live   24[r24] 25[r25] 28[r28] 29[r29]
>
> Outcome is that the frame pointer is set up without need, which is very costly
> on avr.
>
>
> The compiler is trunk from 2013-01-16 configured for avr:
>
>    --target=avr --enable-languages=c,c++ --disable-nls --with-dwarf2
>
> The example is compiled with
>
>    avr-gcc bloat.c -S -dp -save-temps -Os -da -fdump-tree-optimized -DB=1
>
>
> At what stage does the misoptimization happen?
>
> Is this worth a PR? (I.e. is there a chance that anybody cares?)

Well, AVR folks may be interested, or people caring for fixed-point support.

Richard.

> Thanks.


Re: not-a-number's

2013-01-17 Thread Gabriel Paubert
On Thu, Jan 17, 2013 at 12:21:04PM +0100, Vincent Lefevre wrote:
> On 2013-01-17 06:53:45 +0100, Mischa Baars wrote:
> > Also this was not what I intended to do, I was trying to work with quiet
> > not-a-numbers explicitly to avoid the 'invalid operation' exception to be
> > triggered, so that my program can work its way through the calculations
> > without being terminated by the intervention of a signal handler. When any
> > not-a-number shows up in results after execution, then you know something
> > went wrong. When the program does what it is supposed to do, you could
> > remove any remaining debug code, such as the boundary check in the
> > trigonometric functions.
> 
> Checking whether a (final) result is NaN is not always the correct way
> to detect whether something was wrong, because a NaN can disappear.
> For instance, pow(NaN,0) gives 0, not NaN. 

Actually 1, but that's a detail. NaN typically propagate, but there
are a few exceptions.

> You need to check the
> "invalid operation" exception flag (if you don't want a trap). And
> this flag will also be set by the <=, <, >, >= comparisons when one
> of the operands is NaN, which should be fine for you.

Hmm, are you sure that this part is properly implemented in gcc, using
unordered compare versus ordered compare instructions depending on 
the operator you use?

At least on PPC, floating point compares always use the unordered 
instruction (fcmpu) and never the ordered one (fcmpo), so exceptions
flags are never set. 

Regards,
Gabriel


Re: Graphite TODO tasks

2013-01-17 Thread Shakthi Kannan
Hi,

--- On Mon, Jan 14, 2013 at 9:37 PM, Tobias Grosser  wrote:
| 1) Use isl code generation
|
| isl 0.18 provides a new code generation.
\--

Where can I find isl 0.18 sources?

I see 0.11-1 at:

  http://repo.or.cz/w/isl.git

SK

-- 
Shakthi Kannan
http://www.shakthimaan.com


Re: Graphite TODO tasks

2013-01-17 Thread Richard Biener
On Thu, Jan 17, 2013 at 1:20 PM, Shakthi Kannan  wrote:
> Hi,
>
> --- On Mon, Jan 14, 2013 at 9:37 PM, Tobias Grosser  wrote:
> | 1) Use isl code generation
> |
> | isl 0.18 provides a new code generation.
> \--
>
> Where can I find isl 0.18 sources?

It's ISL 0.11.1 actually.  0.18.0 is the current CLooG version.

> I see 0.11-1 at:
>
>   http://repo.or.cz/w/isl.git
>
> SK
>
> --
> Shakthi Kannan
> http://www.shakthimaan.com


Re: not-a-number's

2013-01-17 Thread Vincent Lefevre
On 2013-01-17 13:08:17 +0100, Gabriel Paubert wrote:
> Hmm, are you sure that this part is properly implemented in gcc, using
> unordered compare versus ordered compare instructions depending on 
> the operator you use?
> 
> At least on PPC, floating point compares always use the unordered 
> instruction (fcmpu) and never the ordered one (fcmpo), so exceptions
> flags are never set. 

Not on x86_64 either. I've reported the following bug:

  http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56020

-- 
Vincent Lefèvre  - Web: 
100% accessible validated (X)HTML - Blog: 
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)


Re: not-a-number's

2013-01-17 Thread Andreas Schwab
Gabriel Paubert  writes:

> Hmm, are you sure that this part is properly implemented in gcc, using
> unordered compare versus ordered compare instructions depending on 
> the operator you use?

For quiet compare operations you need to use isgreater etc.
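
A short sketch of the quiet comparison macro mentioned here; the wrapper name quiet_greater is hypothetical. isgreater from <math.h> gives the same truth value as (x > y) but is specified not to raise the "invalid" floating-point exception when an operand is a quiet NaN.

```c
#include <math.h>

/* Quiet version of x > y: returns 1 if x is greater than y, 0 otherwise
   (including when either operand is NaN). */
int quiet_greater(double x, double y)
{
    return isgreater(x, y) != 0;
}
```

The companion macros isgreaterequal, isless, islessequal, islessgreater and isunordered follow the same pattern.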

> At least on PPC, floating point compares always use the unordered 
> instruction (fcmpu) and never the ordered one (fcmpo), so exceptions
> flags are never set. 

The C standard does not place any requirement on < wrt. exceptions or
lack thereof.

Andreas.

-- 
Andreas Schwab, SUSE Labs, sch...@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."


Re: not-a-number's

2013-01-17 Thread Andreas Schwab
Andreas Schwab  writes:

> The C standard does not place any requirement on < wrt. exceptions or
> lack thereof.

But IEC 559 does, of course.

Andreas.

-- 
Andreas Schwab, SUSE Labs, sch...@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."


Re: not-a-number's

2013-01-17 Thread Mischa Baars

On 01/17/2013 01:08 PM, Gabriel Paubert wrote:
> On Thu, Jan 17, 2013 at 12:21:04PM +0100, Vincent Lefevre wrote:
>> On 2013-01-17 06:53:45 +0100, Mischa Baars wrote:
>>> Also this was not what I intended to do, I was trying to work with quiet
>>> not-a-numbers explicitly to avoid the 'invalid operation' exception to be
>>> triggered, so that my program can work its way through the calculations
>>> without being terminated by the intervention of a signal handler. When any
>>> not-a-number shows up in results after execution, then you know something
>>> went wrong. When the program does what it is supposed to do, you could
>>> remove any remaining debug code, such as the boundary check in the
>>> trigonometric functions.
>>
>> Checking whether a (final) result is NaN is not always the correct way
>> to detect whether something was wrong, because a NaN can disappear.
>> For instance, pow(NaN,0) gives 0, not NaN.
>
> Actually 1, but that's a detail. NaN typically propagate, but there
> are a few exceptions.

Actually it is the correct way, as long as you stick to the conventions.
A QNaN is not supposed to change into anything, also not with the pow().
Only the other way around. Normal numbers can change into QNaN's.

>> You need to check the
>> "invalid operation" exception flag (if you don't want a trap). And
>> this flag will also be set by the <=, <, >, >= comparisons when one
>> of the operands is NaN, which should be fine for you.
>
> Hmm, are you sure that this part is properly implemented in gcc, using
> unordered compare versus ordered compare instructions depending on
> the operator you use?
>
> At least on PPC, floating point compares always use the unordered
> instruction (fcmpu) and never the ordered one (fcmpo), so exceptions
> flags are never set.
>
> Regards,
> Gabriel




Re: not-a-number's

2013-01-17 Thread Vincent Lefevre
On 2013-01-17 15:01:20 +0100, Andreas Schwab wrote:
> Andreas Schwab  writes:
> > The C standard does not place any requirement on < wrt. exceptions or
> > lack thereof.

Agreed, this is a "may" (7.12.14), but...

> But IEC 559 does, of course.

and in particular, if __STDC_IEC_559__ is defined, then raising
the exception is required.
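
Whether an implementation makes that claim can be tested at compile time. The following is a minimal sketch with a hypothetical function name; it only reports whether the macro is defined and says nothing about whether the generated comparisons actually raise the exception.

```c
/* Returns 1 if the implementation defines __STDC_IEC_559__, i.e. claims
   conformance to the IEC 60559 floating-point annex, and 0 otherwise. */
int claims_iec_559(void)
{
#ifdef __STDC_IEC_559__
    return 1;
#else
    return 0;
#endif
}
```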

-- 
Vincent Lefèvre  - Web: 
100% accessible validated (X)HTML - Blog: 
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)


Re: not-a-number's

2013-01-17 Thread Mischa Baars

On 01/17/2013 04:33 PM, Mischa Baars wrote:
> On 01/17/2013 01:08 PM, Gabriel Paubert wrote:
>> On Thu, Jan 17, 2013 at 12:21:04PM +0100, Vincent Lefevre wrote:
>>> On 2013-01-17 06:53:45 +0100, Mischa Baars wrote:
>>>> Also this was not what I intended to do, I was trying to work with quiet
>>>> not-a-numbers explicitly to avoid the 'invalid operation' exception to be
>>>> triggered, so that my program can work its way through the calculations
>>>> without being terminated by the intervention of a signal handler. When any
>>>> not-a-number shows up in results after execution, then you know something
>>>> went wrong. When the program does what it is supposed to do, you could
>>>> remove any remaining debug code, such as the boundary check in the
>>>> trigonometric functions.
>>>
>>> Checking whether a (final) result is NaN is not always the correct way
>>> to detect whether something was wrong, because a NaN can disappear.
>>> For instance, pow(NaN,0) gives 0, not NaN.
>>
>> Actually 1, but that's a detail. NaN typically propagate, but there
>> are a few exceptions.
>
> Actually it is the correct way, as long as you stick to the
> conventions. A QNaN is not supposed to change into anything, also not
> with the pow(). Only the other way around. Normal numbers can change
> into QNaN's.

I would be very happy to write you an update to the compiler, but it
seems I can't get gcc-4.7.2 to compile. I was just enjoying the newest
glibc, which finally is. Can I now please try to finish the arctan for
glibc?

>>> You need to check the
>>> "invalid operation" exception flag (if you don't want a trap). And
>>> this flag will also be set by the <=, <, >, >= comparisons when one
>>> of the operands is NaN, which should be fine for you.
>>
>> Hmm, are you sure that this part is properly implemented in gcc, using
>> unordered compare versus ordered compare instructions depending on
>> the operator you use?
>>
>> At least on PPC, floating point compares always use the unordered
>> instruction (fcmpu) and never the ordered one (fcmpo), so exceptions
>> flags are never set.
>>
>> Regards,
>> Gabriel






Re: not-a-number's

2013-01-17 Thread Vincent Lefevre
On 2013-01-17 16:33:56 +0100, Mischa Baars wrote:
> Actually it is the correct way, as long as you stick to the
> conventions. A QNaN is not supposed to change into anything, also
> not with the pow(). Only the other way around. Normal numbers can
> change into QNaN's.

The C standard specified pow(NaN,0) = 1, pow(1,NaN) = 1, and
hypot(inf,NaN) = +inf. You may consider that these are bad choices
(I don't like them much either), but this is how these cases have
been specified. You are free to write wrappers for your own programs
and never call pow and hypot directly...
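
A wrapper of the kind suggested here could look like the following sketch. All names (binary_fn, nan_sticky, power_zero) are hypothetical. The wrapper takes the two-argument function as a parameter so that the same code can guard pow, hypot, or anything else; power_zero is only a stand-in that mimics the pow(x, 0) = 1 behaviour so the sketch does not depend on the math library.

```c
#include <math.h>

typedef double (*binary_fn)(double, double);

/* Call f(x, y), but force a NaN result whenever either input is NaN,
   even if f itself would return a number for that input. */
double nan_sticky(binary_fn f, double x, double y)
{
    if (isnan(x) || isnan(y))
        return NAN;
    return f(x, y);
}

/* Stand-in for the special case pow(x, 0): returns 1 for any x,
   including NaN. Illustration only, not a real power function. */
double power_zero(double x, double y)
{
    (void)x;
    (void)y;
    return 1.0;
}
```

In a real program one would pass pow or hypot from <math.h> as the first argument, for example nan_sticky(pow, x, 0.0), and link against the math library.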

-- 
Vincent Lefèvre  - Web: 
100% accessible validated (X)HTML - Blog: 
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)


Re: not-a-number's

2013-01-17 Thread Mischa Baars

On 01/17/2013 04:46 PM, Vincent Lefevre wrote:
> On 2013-01-17 16:33:56 +0100, Mischa Baars wrote:
>> Actually it is the correct way, as long as you stick to the
>> conventions. A QNaN is not supposed to change into anything, also
>> not with the pow(). Only the other way around. Normal numbers can
>> change into QNaN's.
>
> The C standard specified pow(NaN,0) = 1, pow(1,NaN) = 1, and
> hypot(inf,NaN) = +inf. You may consider that these are bad choices
> (I don't like them much either), but this is how these cases have
> been specified. You are free to write wrappers for your own programs
> and never call pow and hypot directly...

Again, personally I think that might just be not the smartest choice.
QNaN are only useful when normal numbers can be changed into QNaN's but
not the other way around. Then you can see at the end of any calculation
if something went wrong or if it gave you the answer you had expected.


Re: Components no longer exist

2013-01-17 Thread Mischa Baars

On 01/17/2013 04:48 PM, Michael Witten wrote:
> The documentation here:
>
>    http://gcc.gnu.org/install/download.html
>
> says:
>
>    It is possible to download a full distribution or
>    specific components... If you choose to download
>    specific components, you must download the core
>    GCC distribution plus any language specific
>    distributions you wish to use.
>
> However, from what I can tell, this hasn't been the
> case since the release of 4.7.0; there is only a
> monolithic distribution of source (no separate
> components).
>
> Either the components need to be added, or the
> documentation needs to be updated.
>
> Sincerely,
> Michael Witten

Could you please be so kind to look into this matter, what specific
components you might want to add to make this distribution worth
downloading? I have a pretty good arctangent more or less waiting for
you, although some changes to the documentation are still in order and
the little problem with the NaN's still needs some attention. Other
trigonometric functions will be added shortly, but I need to be able to
compile my compiler to be able to help.

Hope you are willing to help too,

Thanks,
Mischa.



Re: Components no longer exist

2013-01-17 Thread Mischa Baars

On 01/17/2013 04:48 PM, Michael Witten wrote:

The documentation here:

   http://gcc.gnu.org/install/download.html

says:

   It is possible to download a full distribution or
   specific components... If you choose to download
   specific components, you must download the core
   GCC distribution plus any language specific
   distributions you wish to use.

However, from what I can tell, this hasn't been the
case since the release of 4.7.0; there is only a
monolithic distribution of source (no separate
components).

Either the components need to be added, or the
documentation needs to be updated.

Sincerely,
Michael Witten


Perhaps we can exchange parts? It's just that my documentation is not 
finished, you can look at it tough if you like.


Mischa.


Re: Identical basic blocks live long in RTL flow.

2013-01-17 Thread Jeff Law

On 01/17/13 03:39, Bin.Cheng wrote:
> On Thu, Jan 17, 2013 at 1:29 AM, Jan Hubicka  wrote:
>>>> Basic blocks 8/9/10 are identical and live until pass jump2, which is
>>>> after register allocation.
>>>> I think these duplicated BBs do not contain additional information and
>>>> should be better to be removed ASAP, because they might interfere with
>>>> other passes like ifcvt.
>>>>
>>>> So should this issue be handled like in jump pass?
>>> In this case, I'd expect DCE to optimize away the redundant assignments.
>>
>> Yeah, we also used to be able to optimize these out in crossjumping before
>> regalloc but with SSA we now produce different pseudos in different copies so I
>> think it was a patch by Steven that disabled crossjumping prior to jump2 (both
>> passes also ought to be renamed)
>
> Unfortunately, this case can't be handled by DCE because just before
> pass pass_into_cfg_layout_mode, the rtl is like:
>
>     1: NOTE_INSN_DELETED
>     9: NOTE_INSN_BASIC_BLOCK 2
>     2: r113:SI=r0:SI
>     3: r114:SI=r1:SI
>     4: NOTE_INSN_FUNCTION_BEG
>    11: pc={(r113:SI==0)?L39:pc}
>       REG_BR_PROB 0x3f6
>    12: NOTE_INSN_BASIC_BLOCK 4
>    13: r111:SI=zero_extend([r113:SI])
>    14: pc={(r111:SI==0x2d)?L43:pc}
>       REG_BR_PROB 0x7c7
>    15: NOTE_INSN_BASIC_BLOCK 5
>    16: pc={(r114:SI!=0)?L20:pc}
>       REG_BR_PROB 0x1388
>    22: L22:
>    17: NOTE_INSN_BASIC_BLOCK 6
>     6: r110:SI=0
>    18: pc=L24
>    19: barrier
>    20: L20:
>    21: NOTE_INSN_BASIC_BLOCK 7
>    23: pc={(r111:SI!=0x2b)?L22:pc}
>       REG_BR_PROB 0x1f49
>    35: NOTE_INSN_BASIC_BLOCK 8
>     8: r110:SI=0x1
>    36: pc=L24
>    37: barrier
>    39: L39:
>    38: NOTE_INSN_BASIC_BLOCK 9
>     7: r110:SI=0x1
>    40: pc=L24
>    41: barrier
>    43: L43:
>    42: NOTE_INSN_BASIC_BLOCK 10
>     5: r110:SI=0x1
>    24: L24:
>    25: NOTE_INSN_BASIC_BLOCK 11
>    26: r112:SI=r110:SI
>    30: r0:SI=r112:SI
>    33: use r0:SI
>
> Insns 36/40 are deleted in pass pass_into_cfg_layout_mode to make each
> edge possibly fall-through. The edges <8, 11> and <9, 11> still exist
> in control flow and insns 7/8 are not dead code. It's just a coincidence
> that there are three identical basic blocks here. I don't think any passes
> other than cross-jumping can handle this, right?

Right.  This has to be caught by tail merging/cross jumping which occurs
during jump2.


jeff


Re: Adding Rounding Mode to Operations Opcodes in Gimple and RTL

2013-01-17 Thread Joseph S. Myers
I should add that the recent observation of bugs on some platforms with 
unordered comparisons being wrongly used instead of ordered ones 
illustrates my point about the value of having proper test coverage for 
each individual operation, even though some bugs will only show in more 
complicated code.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: Imprecise data flow analysis leads to code bloat

2013-01-17 Thread Georg-Johann Lay
Richard Biener wrote:
> On Thu, Jan 17, 2013 at 12:20 PM, Georg-Johann Lay wrote:
>> Hi, suppose the following C code:
>>
>>
>> static __inline__ __attribute__((__always_inline__))
>> _Fract rbits (const int i)
>> {
>> _Fract f;
>> __builtin_memcpy (&f, &i, sizeof (_Fract));
>> return f;
>> }
>>
>> _Fract func (void)
>> {
>> #if B == 1
>> return rbits (0x1234);
>> #elif B == 2
>> return 0.14222r;
>> #endif
>> }
>>
>>
>> Type-punning idioms like in rbits above are very common in libgcc, for example
>> in fixed-bit.c.
>>
>> In this example, both compilation variants are equivalent (provided int and
>> _Fract are 16 bits wide).  The problem with the B=1 variant is that it is
>> inefficient:
>>
>> Variant B=1 needs 2 instructions.
> 
> B == 1 shows that fold-const.c native_interpret/encode_expr lack
> support for FIXED_POINT_TYPE.
> 
>> Variant B=2 needs 11 instructions, 9 of them are not needed at all.

I confused B=1 and B=2.  The inefficient case with 11 instructions is B=1, of
course.

Would a patch like below be acceptable in the current stage?

It's only the native_interpret and pretty much like the int case.

Difference is that it rejects if the sizes don't match exactly.  A new function
is used for low-level construction of a const_fixed_from_double_int.  Isn't
there a better way?  I wonder that such functionality is not already there...
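
To illustrate what interpreting the raw bytes as a fixed-point constant amounts to, here is a stand-alone sketch. It is not GCC's native_interpret code and does not use const_fixed_from_double_int; the function names are hypothetical, and it assumes a 16-bit signed fractional format with 15 fraction bits stored little-endian, which is consistent with 0x1234 and 0.14222r being equivalent in the example.

```c
#include <stdint.h>

/* Assemble a 16-bit raw value from two little-endian bytes.
   The conversion to int16_t for values above 0x7fff is
   implementation-defined in C; GCC wraps modulo 2^16. */
int16_t q15_raw_from_bytes(const unsigned char *p)
{
    return (int16_t)(uint16_t)(p[0] | (p[1] << 8));
}

/* Value represented by a raw s.15 pattern: raw / 2^15. */
double q15_to_double(int16_t raw)
{
    return raw / 32768.0;
}
```

Under these assumptions the raw pattern 0x1234 (4660) stands for 4660 / 32768, which is about 0.14221.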

A quick test shows that it works reasonably on AVR.

>> The problem goes as follows:
>>
>> The memcpy is represented as a VIEW_CONVERT_EXPR<_Fract> and expanded to memory
>> moves through the frame:
>>
>> (insn 5 4 6 (set (reg:HI 45)
>> (const_int 4660 [0x1234])) bloat.c:5 -1
>>  (nil))
>>
>> (insn 6 5 7 (set (mem/c:HI (reg/f:HI 37 virtual-stack-vars) [2 S2 A8])
>> (reg:HI 45)) bloat.c:5 -1
>>  (nil))
>>
>> (insn 7 6 8 (set (reg:HQ 46)
>> (mem/c:HQ (reg/f:HI 37 virtual-stack-vars) [2 S2 A8])) bloat.c:12 -1
>>  (nil))
>>
>> (insn 8 7 9 (set (reg:HQ 43 [  ])
>> (reg:HQ 46)) bloat.c:12 -1
>>  (nil))
>>
>>
>> Is there a specific reason why this is not expanded as subreg like this?
>>
>>
>>   (set (reg:HQ 46)
>>    (subreg:HQ [(reg:HI 45)] 0))
> 
> Probably your target does not allow this?  Not sure, but then a testcase like

This (movhq) is supported, similar for other fixed-point modes moves:

QQ, UQQ, HQ, UHQ, HA, UHA, SQ, USQ, SA, USA, DQ, UDQ, DA, UDA, TA, UTA.


> static __inline__ __attribute__((__always_inline__))
> _Fract rbits (const int i)
> {
>   _Fract f;
>   __builtin_memcpy (&f, &i, sizeof (_Fract));
>   return f;
> }
> 
> _Fract func (int i)
> {
>   return rbits (i);
> }
> 
> would be more interesting, as your testcase boils down to a constant
> folding issue (see above).

This case is optimized fine and the generated code is nice.

>> The insns are analyzed in .dfinit:
>>
>> ;;  regs ever live   24[r24] 25[r25] 29[r29]
>>
>> 24/25 is the return register, 28/29 is the frame pointer:
>>
>>
>> (insn 6 5 7 2 (set (mem/c:HI (plus:HI (reg/f:HI 28 r28)
>> (const_int 1 [0x1])) [2 S2 A8])
>> (reg:HI 45)) bloat.c:5 82 {*movhi}
>>
>>
>> Is there a reason why R28 is not marked as live?
>>
>>
>> The memory accesses are optimized out in .fwprop2 so that no frame pointer is
>> needed any more.
>>
>> However, in the subsequent passes, R29 is still reported as "regs ever live"
>> which is not correct.
>>
>> The "regs ever live" don't change until .ira, where R28 springs to life
>> again:
>>
>> ;;  regs ever live   24[r24] 25[r25] 28[r28] 29[r29]
>>
>> And in .reload:
>>
>> ;;  regular block artificial uses28 [r28] 32 [__SP_L__]
>> ;;  eh block artificial uses 28 [r28] 32 [__SP_L__] 34 [argL]
>> ;;  entry block defs 8 [r8] 9 [r9] 10 [r10] 11 [r11] 12 [r12] 13 [r13] 14
>> [r14] 15 [r15] 16 [r16] 17 [r17] 18 [r18] 19 [r19] 20 [r20] 21 [r21] 22 [r22]
>> 23 [r23] 24 [r24] 25 [r25] 28 [r28] 32 [__SP_L__]
>> ;;  exit block uses  24 [r24] 25 [r25] 28 [r28] 32 [__SP_L__]
>> ;;  regs ever live   24[r24] 25[r25] 28[r28] 29[r29]
>>
>> Outcome is that the frame pointer is set up without need, which is very
>> costly on avr.
>>
>>
>> The compiler is trunk from 2013-01-16 configured for avr:
>>
>>--target=avr --enable-languages=c,c++ --disable-nls --with-dwarf2
>>
>> The example is compiled with
>>
>>avr-gcc bloat.c -S -dp -save-temps -Os -da -fdump-tree-optimized -DB=1
>>
>>
>> In what stage happens the misoptimization?
>>
>> Is this worth a PR? (I.e. is there a chance that anybody cares?)
> 
> Well, AVR folks may be interested, or people caring for fixed-point support.

Besides the possible optimization in fold-const.c, data flow produces strange
results.

Presumably it misses some properties of the frame pointer which spans two hard
registers?  This is rather uncommon in GCC as far as I know.

There were attempts to use HARD_FRAME_POINTER in the avr BE but the results
were completely unusable because a frame was set up in every function :-(

Bootstrapping glibc vs. dependency on system headers

2013-01-17 Thread Thomas Schwinge
Hi!

Also known as: »I found another one«.

As we already know, glibc's configure script is in a difficult position
in that it uses "standard Autoconf", but its tests shall not depend on
any functionality (for example, system headers) that is to be supplied by
the glibc we're about to build.  In reality, often, this is not much of a
concern: most systems already have a C library installed and its header
files are sufficient for resolving the configure tests to the
correct/expected results.  But for the bootstrapping case (and for
correctness' sake!), it is desirable to avoid any such cases.  We have
fixed several such issues in the past; now I have stumbled over another
one.  Very early when configuring glibc for bootstrapping an ARM
toolchain, I get:

[...]
checking host system type... arm-none-linux-gnueabi
checking for arm-none-linux-gnueabi-gcc... 
[...]/bin/arm-none-linux-gnueabi-gcc [some flags]
[...]
checking how to run the C preprocessor... /lib/cpp
[...]

In the following checks, the C preprocessor is used for deciding whether
this is a hard-float or soft-float toolchain, based upon __ARM_PCS_VFP
being defined or not.  Using the build system's native /lib/cpp for this
test obviously can't yield the desired result.
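
To make the dependence on the preprocessor concrete, here is a minimal sketch (the function name float_abi is made up for illustration). __ARM_PCS_VFP is predefined only by a compiler that targets the hard-float calling convention, so the same source gives a different answer depending on which preprocessor processes it:

```c
/* Reports which branch the preprocessor took.  A native, non-ARM
   preprocessor never defines __ARM_PCS_VFP, so it always selects the
   "soft" branch, whatever the cross toolchain's actual float ABI is. */
const char *float_abi(void)
{
#ifdef __ARM_PCS_VFP
    return "hard";
#else
    return "soft";
#endif
}
```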

Issue 1: Can the fallback /lib/cpp for the C preprocessor ever be useful
in the cross-compilation case?  Instead of using that, in this case I'd
rather have configure stop with an error message.

What »checking how to run the C preprocessor« does first is testing
whether »$CC -E« is suitable.  For the curious: this failed in my case as
the sysroot's header files already did contain support for soft-float,
but they were not yet prepared to handle the hard-float case
(__ARM_PCS_VFP being defined), and so  tried to »#include
«, which has not yet been generated and installed into
the sysroot.  In turn, that file is included by means of limits.h, which
is used in the »checking how to run the C preprocessor« Autoconf test:

[autoconf]/lib/autoconf/c.m4:

# _AC_PROG_PREPROC_WORKS_IFELSE(IF-WORKS, IF-NOT)
# ---
# Check if $ac_cpp is a working preprocessor that can flag absent
# includes either by the exit status or by warnings.
# This macro is for all languages, not only C.
AC_DEFUN([_AC_PROG_PREPROC_WORKS_IFELSE],
[ac_preproc_ok=false
for ac_[]_AC_LANG_ABBREV[]_preproc_warn_flag in '' yes
do
  # Use a header file that comes with gcc, so configuring glibc
  # with a fresh cross-compiler works.
  # On the NeXT, cc -E runs the code through the compiler's parser,
  # not just through cpp. "Syntax error" is here to catch this case.
  _AC_PREPROC_IFELSE([AC_LANG_SOURCE([[@%:@include <limits.h>
 Syntax error]])],
 [],
 [# Broken: fails on valid input.
continue])
[...]

(The test in glibc's configure file is slightly older (missing Autoconf
commit f52459d158bcf5edd4ca75a453bb012efd0f4d90), but in essence is the
same.)  So despite »[using] a header file that comes with gcc«, limits.h
will actually include additional files down the road that are not GCC's
but glibc's.  (Run: »echo '#include <limits.h>' | gcc -E -x c -« to see.)

Issue 2: Though it will of course never be completely fail-safe,
candidate header files I identified to remedy this issue and to be used
in _AC_PROG_PREPROC_WORKS_IFELSE instead of limits.h are: float.h,
stdarg.h, stddef.h (though the latter might include additional files in
*BSD environments, which may be supported by glibc, so let's better not
use that one).  Is using one of float.h and stdarg.h correct in this
situation, and do you want me to write an Autoconf patch to change that?
By manually modifying configure, I tested using stdarg.h, and that looks
fine.

Issue 3: Assuming fixing Autoconf is the way to go, what do we do in
glibc until we upgrade to the respective future version of Autoconf?
Supply our own copy of _AC_PROG_PREPROC_WORKS_IFELSE (or AC_PROG_CPP)?


Grüße,
 Thomas




Re: Primary and secondary sysroot?

2013-01-17 Thread Dmitry Mikushin

Hello Ian,

Thanks for the reply. Unfortunately, setting extra flags does not help in
our case, as we want to preserve the original build rules for any application.

A brilliant solution suggested by Richard Guenther on IRC is to place
first-priority includes and headers into
lib/gcc/x86_64-unknown-linux-gnu/4.6.4/ (or also in 32/, if multilib).
Then, this content would have higher priority than system
include/library paths. Thus, lib/gcc/x86_64-unknown-linux-gnu/4.6.4/
becomes primary root and system becomes secondary root!

- D.

On 01/08/2013 08:37 AM, Ian Lance Taylor wrote:
> On Mon, Jan 7, 2013 at 2:47 AM, Dmitry Mikushin
>  wrote:
>> 
>> We have a version of GCC coming as additional toolchain for
>> several supported Linux distros. Moreover, package we are
>> shipping also contains modified glibc and some other libraries.
>> In this situation, applications built with this compiler should
>> first logically use its own sysroot, but in case when a
>> header/library from host's sysroot is needed - fallback to the 
>> default "/usr" sysroot. I think this could be accomplished by
>> providing two sysroots - one primary, and secondary - to
>> complement it for the rest of things. Looks like currently gcc
>> only supports a single setting for TARGET_SYSTEM_ROOT. Do you see
>> any other options regarding this issue?
> 
> You are correct: GCC only supports a single system root.  But I
> think you can do what you want with appropriate -I and -L options,
> perhaps via the C_INCLUDE_PATH and LIBRARY_PATH environment
> variables.
> 
> Ian
> 



Re: Bootstrapping glibc vs. dependency on system headers

2013-01-17 Thread Joseph S. Myers
On Thu, 17 Jan 2013, Thomas Schwinge wrote:

> Issue 2: Though it will of course never be completely fail-safe,
> candidate header files I identified to remedy this issue and to be used
> in _AC_PROG_PREPROC_WORKS_IFELSE instead of limits.h are: float.h,
> stdarg.h, stddef.h (though the latter might include additional files in
> *BSD environments, which may be supported by glibc, so let's better not
> use that one).  Is using one of float.h and stdarg.h correct in this
> situation, and do you want me to write an Autoconf patch to change that?
> By manually modifying configure, I tested using stdarg.h, and that looks
> fine.

limits.h is unsafe for such bootstrapping, as you found, because of how
the GCC and glibc copies include each other.

Really, for glibc bootstrapping I don't think you want to include any 
headers there.  If $CPP is defined and nonempty, use that, otherwise use 
$CC -E; no testing for a "working" preprocessor is needed; we require GCC 
4.3 or later for building glibc.

> Issue 3: Assuming fixing Autoconf is the way to go, what do we do in
> glibc until we upgrade to the respective future version of Autoconf?
> Supply our own copy of _AC_PROG_PREPROC_WORKS_IFELSE (or AC_PROG_CPP)?

Yes.  There's already code in configure.in to do something special with 
_AC_INCLUDES_DEFAULT_REQUIREMENTS.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: Components no longer exist

2013-01-17 Thread Jonathan Wakely
On 17 January 2013 15:48, Michael Witten :
> The documentation here:
>
>   http://gcc.gnu.org/install/download.html
>
> says:
>
>   It is possible to download a full distribution or
>   specific components... If you choose to download
>   specific components, you must download the core
>   GCC distribution plus any language specific
>   distributions you wish to use.
>
> However, from what I can tell, this hasn't been the
> case since the release of 4.7.0; there is only a
> monolithic distribution of source (no separate
> components).
>
> Either the components need to be added, or the
> documentation needs to be updated.

The documentation is out of date, thanks for reporting it.


Re: Components no longer exist

2013-01-17 Thread Mischa Baars

On 01/17/2013 06:23 PM, Jonathan Wakely wrote:
> On 17 January 2013 15:48, Michael Witten :
>> The documentation here:
>>
>>    http://gcc.gnu.org/install/download.html
>>
>> says:
>>
>>    It is possible to download a full distribution or
>>    specific components... If you choose to download
>>    specific components, you must download the core
>>    GCC distribution plus any language specific
>>    distributions you wish to use.
>>
>> However, from what I can tell, this hasn't been the
>> case since the release of 4.7.0; there is only a
>> monolithic distribution of source (no separate
>> components).
>>
>> Either the components need to be added, or the
>> documentation needs to be updated.
>
> The documentation is out of date, thanks for reporting it.

Does that mean that you are satisfied with the 'if / else' as is, and
that you also do not need an improvement of the arctangent in glibc?


Mischa.


Re: Components no longer exist

2013-01-17 Thread Jonathan Wakely
On 17 January 2013 17:29, Mischa Baars  wrote:
> On 01/17/2013 06:23 PM, Jonathan Wakely wrote:
>>
>> On 17 January 2013 15:48, Michael Witten :
>>>
>>> The documentation here:
>>>
>>>http://gcc.gnu.org/install/download.html
>>>
>>> says:
>>>
>>>It is possible to download a full distribution or
>>>specific components... If you choose to download
>>>specific components, you must download the core
>>>GCC distribution plus any language specific
>>>distributions you wish to use.
>>>
>>> However, from what I can tell, this hasn't been the
>>> case since the release of 4.7.0; there is only a
>>> monolithic distribution of source (no separate
>>> components).
>>>
>>> Either the components need to be added, or the
>>> documentation needs to be updated.
>>
>> The documentation is out of date, thanks for reporting it.
>
> Does that mean that you are satisfied with the 'if / else' as is, and that
> you also do not need an improvement of the arctangent in glibc?

You're confused.  This is nothing to do with glibc.

The "distribution" is the tarball of GCC sources.  The "components"
are the separate tarballs that used to be made available for
gcc-core-x.y.z.tar.gz, gcc-c++-x.y.z.tar.gz, gcc-fortran-x.y.z.tar.gz,
etc. but these are no longer made available, only the gcc-x.y.z.tar.gz
tarball containing everything.


Re: Components no longer exist

2013-01-17 Thread Jonathan Wakely
On 17 January 2013 17:31, Jonathan Wakely wrote:
> On 17 January 2013 17:29, Mischa Baars  wrote:
>>
>> Does that mean that you are satisfied with the 'if / else' as is, and that
>> you also do not need an improvement of the arctangent in glibc?
>
> You're confused.  This is nothing to do with glibc.
>
> The "distribution" is the tarball of GCC sources.  The "components"
> are the separate tarballs that used to be made available for
> gcc-core-x.y.z.tar.gz, gcc-c++-x.y.z.tar.gz, gcc-fortran-x.y.z.tar.gz,
> etc. but these are no longer made available, only the gcc-x.y.z.tar.gz
> tarball containing everything.

For example, compare ftp://ftp.gnu.org/gnu/gcc/gcc-4.6.0/ to
ftp://ftp.gnu.org/gnu/gcc/gcc-4.7.0/


Re: missing optimization "a << (b & 63) to a << b"

2013-01-17 Thread Xinliang David Li
>> Or better way to solve the problem? Appreciated a lot!
>
> Combine / simplify-rtx should recognize this at the RTL level for
> SHIFT_COUNT_TRUNCATED targets.

IIUC, combine uses UD chains (LOG links), which are best suited to
cases where there is only one downward use of the def. The example
here has a single def and multiple uses, which combine won't handle.

David

>
> Richard.
>
>> Thanks,
>> Wei.


Re: Components no longer exist

2013-01-17 Thread Mischa Baars

On 01/17/2013 06:31 PM, Jonathan Wakely wrote:
> On 17 January 2013 17:29, Mischa Baars wrote:
>> On 01/17/2013 06:23 PM, Jonathan Wakely wrote:
>>> On 17 January 2013 15:48, Michael Witten :
>>>> The documentation here:
>>>>
>>>>    http://gcc.gnu.org/install/download.html
>>>>
>>>> says:
>>>>
>>>>    It is possible to download a full distribution or
>>>>    specific components... If you choose to download
>>>>    specific components, you must download the core
>>>>    GCC distribution plus any language specific
>>>>    distributions you wish to use.
>>>>
>>>> However, from what I can tell, this hasn't been the
>>>> case since the release of 4.7.0; there is only a
>>>> monolithic distribution of source (no separate
>>>> components).
>>>>
>>>> Either the components need to be added, or the
>>>> documentation needs to be updated.
>>>
>>> The documentation is out of date, thanks for reporting it.
>>
>> Does that mean that you are satisfied with the 'if / else' as is, and that
>> you also do not need an improvement of the arctangent in glibc?
>
> You're confused.  This is nothing to do with glibc.
>
> The "distribution" is the tarball of GCC sources.  The "components"
> are the separate tarballs that used to be made available for
> gcc-core-x.y.z.tar.gz, gcc-c++-x.y.z.tar.gz, gcc-fortran-x.y.z.tar.gz,
> etc. but these are no longer made available, only the gcc-x.y.z.tar.gz
> tarball containing everything.

Indeed I am, I thought you were trying to say that gcc-x.y.z.tar.gz has
missing components. I had some trouble compiler: unable to compute
suffix for object files, but now it seems to work?!


Let me get back to you. See if I can fix the 'if / else' for you, and 
finish the algorithm. It will be my first update to the compiler so it 
will probably take a while.


Mischa.


Re: Components no longer exist

2013-01-17 Thread Jonathan Wakely
On 17 January 2013 17:48, Mischa Baars wrote:
>
> Indeed I am, I thought you were trying to say that gcc-x.y.z.tar.gz has
> missing components. I had some trouble compiler: unable to compute suffix
> for object files, but now it seems to work?!

Did you read http://gcc.gnu.org/wiki/FAQ#configure_suffix ?

> Let me get back to you. See if I can fix the 'if / else' for you, and finish
> the algorithm. It will be my first update to the compiler so it will
> probably take a while.

I have no idea what "if / else" you're talking about.


Re: Components no longer exist

2013-01-17 Thread Mischa Baars

On 01/17/2013 07:40 PM, Jonathan Wakely wrote:
> On 17 January 2013 17:48, Mischa Baars wrote:
>> Indeed I am, I thought you were trying to say that gcc-x.y.z.tar.gz has
>> missing components. I had some trouble compiler: unable to compute suffix
>> for object files, but now it seems to work?!
>
> Did you read http://gcc.gnu.org/wiki/FAQ#configure_suffix ?
>
>> Let me get back to you. See if I can fix the 'if / else' for you, and finish
>> the algorithm. It will be my first update to the compiler so it will
>> probably take a while.
>
> I have no idea what "if / else" you're talking about.

Please read the topic on not-a-number's then. At the very beginning
there's one of my attachments.


Thanks,
Mischa.


Re: microblaze unroll loops optimization

2013-01-17 Thread David Holsgrove
Hi Richard,

On 16 January 2013 06:58, Richard Henderson  wrote:
>
> You could, however, use two CCmodes for the result of the compares:
>
>   (set (reg:CC r) (compare:CC (reg:SI x) (reg:SI y)))
>   => cmp r, x, y
>
>   (set (reg:CCU r) (compare:CCU (reg:SI x) (reg:SI y)))
>   => cmpu r, x, y
>
> and then the branch insns consume CC and CCU mode inputs:
>
>   (set (pc)
>(if_then_else
>  (match_operator 1 "mb_signed_cmp_op"   // eq, lt, le
>[(match_operand:CC 2 "register_operand" "r")
> (const_int 0)])
>  (label_ref (match_operand 0))
>  (pc)))
>
> and similar for "mb_unsigned_cmp_op" (eq, ltu, leu) with CCUmode.
>
> I believe you'll find that MODE_CC modes default to word size
> already, so if you arrange for mov{cc,ccu} patterns, reload will
> spill/reload these values as required and everything will Just Work.
>
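
As a reminder of why the signed and unsigned compares need separate modes, here is a small sketch (the function names are made up for illustration): the same 32-bit pattern orders differently depending on signedness, which is the distinction the cmp / cmpu pair in the quoted text encodes.

```c
#include <stdbool.h>
#include <stdint.h>

/* Signed ordering: the all-ones pattern is -1 and sorts below 1. */
bool less_signed(int32_t x, int32_t y)
{
    return x < y;
}

/* Unsigned ordering: the all-ones pattern is UINT32_MAX and sorts above 1. */
bool less_unsigned(uint32_t x, uint32_t y)
{
    return x < y;
}
```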

Thanks for the reply and suggestion. I had implemented combined
branch_compare insns, which, as I had indicated, worked for us in the past
with our out-of-tree gcc 4.1.2, and I can confirm that unrolling of loops
was successful.

I've also tried the CCmode / CCUmode method, and while it works and
produces correct code, passing the testsuite with no regressions, it
unfortunately doesn't unroll loops. This appears to be due to the
get_condition check in loop-iv.c:check_simple_exit, which has
allow_cc_mode=false, which means we return 0 from canonicalize_condition:

  /* If OP0 is the result of a comparison, we weren't able to find what
     was really being compared, so fail.  */
  if (!allow_cc_mode
      && GET_MODE_CLASS (GET_MODE (op0)) == MODE_CC)
    return 0;

and mark the loop as 'not simple'.
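
To spell out the effect of that check, here is a toy model, not GCC code: the enum and the function are simplified stand-ins invented for illustration. It mirrors only the quoted condition, where an operand of class MODE_CC is rejected unless the caller allows CC modes.

```c
#include <stdbool.h>

/* Simplified stand-in for GCC's mode classes; only two are modelled. */
enum toy_mode_class { TOY_MODE_INT, TOY_MODE_CC };

/* Returns true if a condition whose first operand has class OP0_CLASS
   passes the check, false if it is rejected (the "return 0" case). */
bool toy_condition_usable(enum toy_mode_class op0_class, bool allow_cc_mode)
{
    if (!allow_cc_mode && op0_class == TOY_MODE_CC)
        return false;
    return true;
}
```

With allow_cc_mode false, as described above for check_simple_exit, a compare result held in a CC-mode register is rejected, while the same condition on an integer-mode operand goes through.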

I'm still looking, but these are the instructions as I implemented them;

(define_insn "signed_compare"
  [(set (match_operand:CC 0 "register_operand" "=d")
        (compare:CC
          (match_operand:SI 1 "register_operand" "d")
          (match_operand:SI 2 "register_operand" "d")))]
  ""
  "cmp\t%0,%1,%2"
  [(set_attr "type" "arith")
   (set_attr "mode" "SI")
   (set_attr "length" "4")])

(define_insn "unsigned_compare"
  [(set (match_operand:CCU 0 "register_operand" "=d")
        (compare:CCU
          (match_operand:SI 1 "register_operand" "d")
          (match_operand:SI 2 "register_operand" "d")))]
  ""
  "cmpu\t%0,%1,%2"
  [(set_attr "type" "arith")
   (set_attr "mode" "SI")
   (set_attr "length" "4")])

and branches;

(define_insn "signed_condbranch"
  [(set (pc)
   (if_then_else (match_operator: 0 "signed_cmp_op"
  [(match_operand:CC 1 "register_operand" "d")
   (const_int 0)])
  (label_ref (match_operand 2))
  (pc)))]
  ""
  "b%C0i%?\t%z1,%2"
  [(set_attr "type" "branch")
   (set_attr "mode" "none")
   (set_attr "length" "4")]
)

(define_insn "unsigned_condbranch"
  [(set (pc)
  (if_then_else (match_operator: 0 "unsigned_cmp_op"
  [(match_operand:CCU 1 "register_operand" "d")
   (const_int 0)])
  (label_ref (match_operand 2))
  (pc)))]
  ""
  "b%C0i%?\t%z1,%2"
  [(set_attr "type" "branch")
   (set_attr "mode" "none")
   (set_attr "length" "4")]
)

thanks,
David


Re: Graphite TODO tasks

2013-01-17 Thread Shakthi Kannan
Hi,

--- On Thu, Jan 17, 2013 at 5:53 PM, Richard Biener
 wrote:
| It's ISL 0.11.1 actually.  0.18.0 is the current CLooG version.
\--

Thanks. I downloaded cloog-0.18.0, compiled and installed the same using:

  $ cd cloog-0.18.0
  $ mkdir build
  $ cd build
  $ ../configure
  $ make
  $ sudo make install

It installed in /usr/local. 'cloog' is present in /usr/local/bin, and
the following in /usr/local/lib:

libcloog-isl.a
libcloog-isl.la
libcloog-isl.so
libcloog-isl.so.4
libcloog-isl.so.4.0.0
libisl.a
libisl.la
libisl.so
libisl.so.10
libisl.so.10.1.1
libisl.so.10.1.1-gdb.py
pkgconfig

I checked out gcc trunk from svn, and used the following to compile:

  $ cd trunk
  $ mkdir build
  $ cd build
  $ ../configure --with-cloog=/usr/local --with-isl=/usr/local

configure fails with

  checking for version 0.10 of ISL... no
  checking for version 0.11 of ISL... no

The relevant output in config.log is:

=== config.log ===

configure:5838: checking for version 0.10 of ISL
configure:5857: gcc -o conftest -g -O2 -I/usr/local/include
-L/usr/local/lib conftest.c  -lisl >&5
configure:5857: $? = 0
configure:5857: ./conftest
configure:5857: $? = 1
configure: program exited with status 1
configure: failed program was:
| /* confdefs.h */
| #define PACKAGE_NAME ""
| #define PACKAGE_TARNAME ""
| #define PACKAGE_VERSION ""
| #define PACKAGE_STRING ""
| #define PACKAGE_BUGREPORT ""
| #define PACKAGE_URL ""
| #define LT_OBJDIR ".libs/"
| /* end confdefs.h.  */
| #include 
|#include 
| int
| main ()
| {
| if (strncmp (isl_version (), "isl-0.10", strlen ("isl-0.10")) != 0)
|  return 1;
|
|   ;
|   return 0;
| }
configure:5866: result: no
configure:5887: checking for version 0.11 of ISL
configure:5906: gcc -o conftest -g -O2 -I/usr/local/include
-L/usr/local/lib conftest.c  -lisl >&5
configure:5906: $? = 0
configure:5906: ./conftest
configure:5906: $? = 1
configure: program exited with status 1
configure: failed program was:
| /* confdefs.h */
| #define PACKAGE_NAME ""
| #define PACKAGE_TARNAME ""
| #define PACKAGE_VERSION ""
| #define PACKAGE_STRING ""
| #define PACKAGE_BUGREPORT ""
| #define PACKAGE_URL ""
| #define LT_OBJDIR ".libs/"
| /* end confdefs.h.  */
| #include 
|#include 
| int
| main ()
| {
| if (strncmp (isl_version (), "isl-0.11", strlen ("isl-0.11")) != 0)
|  return 1;
|
|   ;
|   return 0;
| }
configure:5915: result: no
configure:5950: error: Unable to find a usable ISL.  See config.log for details.

=== END ===
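
The probe in the log above accepts the library only if the string returned by isl_version () starts with the expected prefix. A minimal sketch of that comparison, without linking libisl (version_has_prefix is a made-up helper, and the version strings passed to it are hypothetical inputs, since the log does not show what isl_version () actually returned):

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* True if VERSION begins with PREFIX, as in the configure probe. */
bool version_has_prefix(const char *version, const char *prefix)
{
    assert(version != NULL && prefix != NULL);
    return strncmp(version, prefix, strlen(prefix)) == 0;
}
```

For example, version_has_prefix("isl-0.11.1", "isl-0.11") is true, while a string that does not begin with "isl-0.10" or "isl-0.11" fails both probes.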

What could I be missing? What is the recommended approach to build gcc with isl?

Appreciate any help in this regard,

Thanks!

SK

-- 
Shakthi Kannan
http://www.shakthimaan.com