question about section 10.12

2013-01-26 Thread Kenneth Zadeck

The definition of vec_duplicate in section 10.12 seems too restrictive.

I have seen examples where the "small vector" is really a scalar. Should
the doc be "small vector or scalar"?


kenny


Re: question about section 10.12

2013-01-26 Thread Marc Glisse

On Sat, 26 Jan 2013, Kenneth Zadeck wrote:


> The definition of vec_duplicate in section 10.12 seems too restrictive.
>
> I have seen examples where the "small vector" is really a scalar. Should the
> doc be "small vector or scalar"?


That might even be the majority of cases. Note that the same applies to
vec_concat. I took the "or scalar" as implicit when I read this chapter
last year, but making it explicit sounds like a good idea.


--
Marc Glisse


About new project

2013-01-26 Thread Hongtao Yu

Hi All,

How can I set up a new project under GCC and make it open-sourced? Thanks!

Cheers,
Hongtao


gcc-4.7-20130126 is now available

2013-01-26 Thread gccadmin
Snapshot gcc-4.7-20130126 is now available on
  ftp://gcc.gnu.org/pub/gcc/snapshots/4.7-20130126/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 4.7 SVN branch
with the following options: svn://gcc.gnu.org/svn/gcc/branches/gcc-4_7-branch 
revision 195488

You'll find:

 gcc-4.7-20130126.tar.bz2 Complete GCC

  MD5=5038377f8304fa21b6cfa28ff00cb688
  SHA1=0346818c2a504ab673ef5c3edbf6c35801e7553f

Diffs from 4.7-20130119 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-4.7
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.


Initial Stack Padding?

2013-01-26 Thread Matt Davis
This question is similar to my last; however, I think it provides a
bit more clarity as to how I am obtaining offsets from the frame
pointer.  I have an RTL pass that runs after the expand pass.  I
keep track of a list of stack-allocated variables.  For each VAR_DECL,
I obtain the DECL_RTL rtx.  Since these variables are local, the RTL
expression reflects an offset from the stack frame pointer.  For
instance, the variable 'matt':

(mem/f/c:DI (plus:DI (reg/f:DI 20 frame)
(const_int -8 [0xfff8])) [0 matt+0 S8 A64])

I interpret this as being -8 bytes from the frame pointer while the
function in which 'matt' is in scope is executing.  Since 'matt' is a
pointer, the stack grows downward (x86), and this is a 64-bit
machine, the contents of 'matt' end at the frame pointer and span the 8
bytes below it, down to where the first byte of 'matt' begins.  This
holds in some cases, but if I rewrite the source and add a few more
variables, there seem to be a few words of padding between the frame
pointer and the data of the first variable.  If I add a 4-byte integer
to this function, 'matt' is still described in RTL as above, but instead
of really being at -8 it is actually at -32.  Where do the 24 bytes of
padding between the frame pointer and the last byte of 'matt' come
from?  Further, how can I find this initial padding offset at compile
time?  I originally thought that the offset in the rtx, as above,
would reflect this stack realignment, but it appears not to.

-Matt


Re: question about section 10.12

2013-01-26 Thread Hans-Peter Nilsson
> From: Kenneth Zadeck 
> Date: Sat, 26 Jan 2013 16:19:40 +0100

> The definition of vec_duplicate in section 10.12 seems too restrictive.
> 
> I have seen examples where the "small vector" is really a scalar. Should
> the doc be "small vector or scalar"?

Yes.  This patch has been sitting in a tree of mine forever;
it's just that the tree is a read-only-access-tree and every
time I've re-discovered it, I've been distracted before moving
it to a write-access-tree and committing as obvious...  And
right now when being reminded, I don't have access to one for
another 36h.  Maybe it's a curse. :)  (N.B. if you prefer your
own wording I don't mind; just offering a supporting
observation.)

* doc/rtl.texi (vec_duplicate): Mention that a scalar
can be a valid operand.

Index: doc/rtl.texi
===
--- doc/rtl.texi(revision 195491)
+++ doc/rtl.texi(working copy)
@@ -2634,7 +2634,8 @@ the two inputs.
 
 @findex vec_duplicate
 @item (vec_duplicate:@var{m} @var{vec})
-This operation converts a small vector into a larger one by duplicating the
+This operation converts a scalar into a vector or a small vector
+into a larger one by duplicating the
 input values.  The output vector mode must have the same submodes as the
 input vector mode, and the number of output parts must be an integer multiple
 of the number of input parts.

brgds, H-P


Re: Imprecise data flow analysis leads to code bloat

2013-01-26 Thread Georg-Johann Lay

Richard Biener wrote:

On Thu, Jan 17, 2013 at 6:04 PM, Georg-Johann Lay  wrote:

Richard Biener wrote:

On Thu, Jan 17, 2013 at 12:20 PM, Georg-Johann Lay wrote:

Hi, suppose the following C code:


static __inline__ __attribute__((__always_inline__))
_Fract rbits (const int i)
{
  _Fract f;
  __builtin_memcpy (&f, &i, sizeof (_Fract));
  return f;
}

_Fract func (void)
{
#if B == 1
  return rbits (0x1234);
#elif B == 2
  return 0.14222r;
#endif
}


Type-punning idioms like the one in rbits above are very common in libgcc, for
example in fixed-bit.c.

In this example, both compilation variants are equivalent (provided int and
_Fract are 16 bits wide).  The problem with the B=1 variant is that it is
inefficient:

Variant B=1 needs 2 instructions.

B == 1 shows that fold-const.c native_interpret/encode_expr lack
support for FIXED_POINT_TYPE.


Variant B=2 needs 11 instructions, 9 of them are not needed at all.

I confused B=1 and B=2.  The inefficient case with 11 instructions is B=1, of
course.

Would a patch like the one below be acceptable at the current stage?


I'd be fine with it (not sure about test coverage).  But please also add
native_encode_fixed.



For the patch, please follow up to

http://gcc.gnu.org/ml/gcc-patches/2013-01/msg01053.html

Johann




It only adds native_interpret and is pretty much like the int case.

The difference is that it rejects the input if the sizes don't match exactly.


Hmm, yeah.  I'm not sure why the _interpret routines chose to ignore
tail padding ... was there any special correctness reason you did it
differently than the int variant?


A new function
is used for low-level construction of a const_fixed_from_double_int.  Isn't
there a better way?  I'm surprised that such functionality doesn't already exist...


Good question - there are probably more places that could make use of
this.


A quick test shows that it works reasonably on AVR.


The problem goes as follows:

The memcpy is represented as a VIEW_CONVERT_EXPR<_Fract> and expanded to memory
moves through the frame:

(insn 5 4 6 (set (reg:HI 45)
(const_int 4660 [0x1234])) bloat.c:5 -1
 (nil))

(insn 6 5 7 (set (mem/c:HI (reg/f:HI 37 virtual-stack-vars) [2 S2 A8])
(reg:HI 45)) bloat.c:5 -1
 (nil))

(insn 7 6 8 (set (reg:HQ 46)
(mem/c:HQ (reg/f:HI 37 virtual-stack-vars) [2 S2 A8])) bloat.c:12 -1
 (nil))

(insn 8 7 9 (set (reg:HQ 43 [  ])
(reg:HQ 46)) bloat.c:12 -1
 (nil))


Is there a specific reason why this is not expanded as subreg like this?


  (set (reg:HQ 46)
   (subreg:HQ (reg:HI 45) 0))

Probably your target does not allow this?  Not sure, but then a testcase like

This (movhq) is supported, and likewise moves in the other fixed-point modes:

QQ, UQQ, HQ, UHQ, HA, UHA, SQ, USQ, SA, USA, DQ, UDQ, DA, UDA, TA, UTA.



static __inline__ __attribute__((__always_inline__))
_Fract rbits (const int i)
{
  _Fract f;
  __builtin_memcpy (&f, &i, sizeof (_Fract));
  return f;
}

_Fract func (int i)
{
  return rbits (i);
}

would be more interesting, as your testcase boils down to a constant
folding issue (see above).

This case is optimized fine and the generated code is nice.


I see - so it's really confused about the constant-ness.  Yeah, I can imagine
that.

Thanks,
Richard.


The insns are analyzed in .dfinit:

;;  regs ever live   24[r24] 25[r25] 29[r29]

24/25 are the return register, 28/29 the frame pointer:


(insn 6 5 7 2 (set (mem/c:HI (plus:HI (reg/f:HI 28 r28)
(const_int 1 [0x1])) [2 S2 A8])
(reg:HI 45)) bloat.c:5 82 {*movhi}


Is there a reason why R28 is not marked as live?


The memory accesses are optimized out in .fwprop2 so that no frame pointer is
needed any more.

However, in the subsequent passes, R29 is still reported in "regs ever live",
which is not correct.

The "regs ever live" sets don't change until .ira, where R28 springs to life again:

;;  regs ever live   24[r24] 25[r25] 28[r28] 29[r29]

And in .reload:

;;  regular block artificial uses 28 [r28] 32 [__SP_L__]
;;  eh block artificial uses 28 [r28] 32 [__SP_L__] 34 [argL]
;;  entry block defs 8 [r8] 9 [r9] 10 [r10] 11 [r11] 12 [r12] 13 [r13] 14 [r14] 15 [r15] 16 [r16] 17 [r17] 18 [r18] 19 [r19] 20 [r20] 21 [r21] 22 [r22] 23 [r23] 24 [r24] 25 [r25] 28 [r28] 32 [__SP_L__]
;;  exit block uses  24 [r24] 25 [r25] 28 [r28] 32 [__SP_L__]
;;  regs ever live   24[r24] 25[r25] 28[r28] 29[r29]

The outcome is that the frame pointer is set up needlessly, which is very
costly on AVR.


The compiler is trunk from 2013-01-16 configured for avr:

   --target=avr --enable-languages=c,c++ --disable-nls --with-dwarf2

The example is compiled with

   avr-gcc bloat.c -S -dp -save-temps -Os -da -fdump-tree-optimized -DB=1


In which pass does the misoptimization happen?

Is this worth a PR? (I.e. is there a chance that anybody cares?)

Well, AVR folks may be interested, or people who care about fixed-point support.

Besides the possible optimization in fold-const.c, data flow produces strange
r