Re: RFC: Extend x86-64 psABI for 256bit AVX register

2008-06-06 Thread Jan Hubicka
> 
> ymm0 and xmm0 are the same register. xmm0 is the lower 128bit
> of ymm0. I am not sure if we need separate XMM registers from
> YMM registers.


Yes, I know that xmm0 is the lower part of ymm0.  I still think we ought to
be able to support varargs prologues that save the ymm registers only when
ymm values are passed, the same way we touch SSE registers only when SSE
values are passed, as indicated by the EAX hint.
This way we will be able to support e.g. a printf that has a YMM printing %
construct but doesn't need YMM-enabled hardware when it is not used.

This is why I think extending EAX to contain the number of XMM values to
save and, in addition, the number of YMM values to save is sane.  Then old
non-YMM-aware varargs prologues will crash when YMM values are passed,
but all other combinations will work.
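
A minimal sketch of the kind of hint discussed here, assuming a purely
hypothetical encoding: the low byte of EAX carries the number of vector
registers to save as XMM, and the next byte carries the number to save as
full YMM. The helper names are made up for illustration and are not part of
any psABI draft.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: bits 0-7  = vector registers to save as XMM,
                        bits 8-15 = vector registers to save as full YMM. */
static uint32_t pack_vararg_hint(unsigned xmm_count, unsigned ymm_count)
{
    /* The psABI passes vector arguments in %xmm0-%xmm7 only. */
    assert(xmm_count <= 8 && ymm_count <= 8);
    return (uint32_t)xmm_count | ((uint32_t)ymm_count << 8);
}

static unsigned hint_xmm_count(uint32_t eax) { return eax & 0xffu; }
static unsigned hint_ymm_count(uint32_t eax) { return (eax >> 8) & 0xffu; }
```

With such an encoding a caller passing no YMM values produces the same low
byte as today, which is the property the paragraph above relies on.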
> 
> >
> > I personally don't have much preferences over 1. or 2.. 1. seems
> > relatively easy to implement too, or is packaging two 128bit values to
> > single 256bit difficult in va_arg expansion?
> >
> 
> Access to 256bit register as lower and upper 128bits needs 2
> instructions. For store
> 
> vmovaps   %xmm7, -143(%rax)
> vextractf128 $1, %ymm7, -15(%rax)
> 
> For load
> 
> vmovaps  -143(%rax),%xmm7
> vinsertf128 $1, -15(%rax),%ymm7,%ymm7
> 
> If we go beyond 256bit, we need more instructions to access
> the full register. For 512bit, it will be split into lower 128bit,
> middle 128bit and upper 256bit. 1024bit will have 4 parts.
> 
> For #2, only one instruction will be needed for 256bit and
> beyond.

Yes, but we will still save half of stack space.  Well, I don't have
much preferences here.  If it seems saner to simply save whole thing
saving lower part twice, I am fine with that.

Honza
> 
> Thanks.
> 
> 
> -- 
> H.J.


Re: [lto] Streaming out language-specific DECL/TYPEs

2008-06-06 Thread Richard Guenther
On Fri, Jun 6, 2008 at 1:44 AM, Daniel Berlin <[EMAIL PROTECTED]> wrote:
> On Thu, Jun 5, 2008 at 5:57 AM, Jan Hubicka <[EMAIL PROTECTED]> wrote:
>>> Jan Hubicka wrote:
>>>
>>> >Sure if it works, we should be lowering the types during gimplification
>>> >so we don't need to store all this in memory...
>>> >But C++ FE still use its local data later in stuff like thunks, but we
>>> >will need to cgraphize them anyway.
>>>
>>> I agree.  The only use of language-specific DECLs and TYPEs after
>>> gimplification should be for generating debug information.  And if
>>> that's already been done, then you shouldn't need it at all.
>>
>> For LTO with debug info we will probably need some frontend neutral
>> debug info representation in longer run, since optimization modifying
>> the data types and such will need to compensate.
>>
>> We can translate stuff to in-memory dwarf and update it but that would
>> limit amount of debug info format we will want to support probably.
> DWARF is not exactly memory or space efficient, sadly.
> Then again,  what most other compilers have done is bite the bullet
> and define their own "debug info" data, then transform that to dwarf2
> at the very end.
> I'm not sure we want to do that either :(

What we can do is identify the frontend-specific information that is used
at debug-generation time and simply move it to the middle-end.
OTOH I don't see why debug information for the modified types needs to
be frontend-specific at all...  so I'd be pragmatic and emit debug
information early for all decls and types and for changed types just
re-emit them with middle-end only information.

Richard.


Shared library without dependence on libgcc_s.so

2008-06-06 Thread Arne Steinarson
Hello,

While putting together a Linux installer for an app (meant to work on various
Linux versions), I ran into problems with libgcc_s.so (if the distribution is
not based on gcc 4.x, the app won't start).

I wanted to remove dynamic linking to any C++ library (that is, any outside
of the installer).

The situation is this one in terms of files:

  - MyApplication // Executable file by gcc 4.3, in installer
  - SharedLib1.so // Shared lib, comes with installer
  - SharedLib2.so // Same
  - ...

Using the flag -static-libgcc and making sure that the linker sees libstdc++.a
and libgcc.a _before_ any shared library version of the same, the dependency
on libstdc++.so and libgcc_s.so is cleared. But... only for the executable!
The shared libraries themselves still have a dependency on libgcc_s.so:

  $ ldd libwx_gtk2ud_fwb_core-2.9.so.0 | grep gcc
  libgcc_s.so.1 => /lib/libgcc_s.so.1 (0xb6ee8000)

Is it possible to clear this dependence for the shared libraries (making them
link back into the application / static lib for this need)?

If not, it seems I have to drop the idea of using C++ shared libs in a binary
to be used on different distributions.

A page with background info on the subject:
http://alexda.wordpress.com/2007/12/04/c-linking-libstdc-statically/

Good pointers appreciated.

Regards
// ATS.


Re: Shared library without dependence on libgcc_s.so

2008-06-06 Thread Paweł Sikora
6/6/2008, "Arne Steinarson" <[EMAIL PROTECTED]> napisał/a:

>The shared libraries themselves still
>have a dependency on libgcc_s.so:
>
>  $ ldd libwx_gtk2ud_fwb_core-2.9.so.0 | grep gcc
>  libgcc_s.so.1 => /lib/libgcc_s.so.1 (0xb6ee8000)

you can use the -nodefaultlibs and manually add what you want.
e.g. you can link a static stlport with static gcc stuff by:

(...) -nodefaultlibs -lstlport -lpthread -lgcc_eh -lgcc -lsupc++ -lc

and get e.g.:

$ ldd libExample.so
linux-vdso.so.1 =>  (0x7fff775fe000)
libdl.so.2 => /lib64/libdl.so.2 (0x2b3933a4e000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x2b3933c53000)
libm.so.6 => /lib64/libm.so.6 (0x2b3933e6e000)
libc.so.6 => /lib64/libc.so.6 (0x2b39340ed000)
/lib64/ld-linux-x86-64.so.2 (0x4000)

as you can see there're only deps to the core system libraries
and all these things related to the stl/gcc are linked in statically.

ps).
i'm using the stlport because libstdc++.a can't be linked
statically into shared libs -> http://gcc.gnu.org/PR28811


Re: RFC: Extend x86-64 psABI for 256bit AVX register

2008-06-06 Thread H.J. Lu
On Fri, Jun 06, 2008 at 10:28:34AM +0200, Jan Hubicka wrote:
> > 
> > ymm0 and xmm0 are the same register. xmm0 is the lower 128bit
> > of ymm0. I am not sure if we need separate XMM registers from
> > YMM registers.
> 
> 
> Yes, I know that xmm0 is lower part of ymm0.  I still think we ought to
> be able to support varargs that do save ymm0 registers only when ymm
> values are passed same way as we touch SSE only when SSE values are
> passed via EAX hint.

Which register do you propose for hint? The current psABI uses RAX
for XMM registers. We can't change it to AL and AH for YMM without
breaking backward compatibility.

> This way we will be able to support e.g. printf that has YMM printing %
> construct but don't need YMM enabled hardware when those are not used.
> 
> This is why I think extending EAX to contain information about amount of
> XMM values to save and in addition YMM values to save is sane.  Then old
> non-YMM aware varargs prologues will crash when YMM values are passed,
> but all other combinations will work.

I don't think it is necessary since -mavx will enable AVX code
generation for all SSE code. Unless the function uses only integers,
it will crash on non-YMM aware hardware.  That is, if even one SSE
register is used, which is hinted in RAX, the varargs prologue will
use AVX instructions to save it. We don't need another hint for AVX
instructions.

> > 
> > >
> > > I personally don't have much preferences over 1. or 2.. 1. seems
> > > relatively easy to implement too, or is packaging two 128bit values to
> > > single 256bit difficult in va_arg expansion?
> > >
> > 
> > Access to 256bit register as lower and upper 128bits needs 2
> > instructions. For store
> > 
> > vmovaps   %xmm7, -143(%rax)
> > vextractf128 $1, %ymm7, -15(%rax)
> > 
> > For load
> > 
> > vmovaps  -143(%rax),%xmm7
> > vinsertf128 $1, -15(%rax),%ymm7,%ymm7
> > 
> > If we go beyond 256bit, we need more instructions to access
> > the full register. For 512bit, it will be split into lower 128bit,
> > middle 128bit and upper 256bit. 1024bit will have 4 parts.
> > 
> > For #2, only one instruction will be needed for 256bit and
> > beyond.
> 
> Yes, but we will still save half of stack space.  Well, I don't have
> much preferences here.  If it seems saner to simply save whole thing
> saving lower part twice, I am fine with that.

I was told that it wasn't very easy to get decent performance with
split access. I extended my proposal to include a 16bit bitmask to
indicate which YMM registers should be saved. If the bit is 0,
we should only save the lower 128bit in the original register
save area. Otherwise, we should only save the same whole YMM register.
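
A rough sketch of how a prologue could act on such a 16bit bitmask. The enum
and function names are invented for illustration; only the rule (bit set:
whole YMM register, bit clear: lower 128bit in the original area) comes from
the paragraph above.

```c
#include <assert.h>
#include <stdint.h>

enum save_kind { SAVE_XMM_LOW128, SAVE_YMM_FULL };

/* Bit n set:   save all 256 bits of register n in the extended array.
   Bit n clear: save only the low 128 bits in the original save area. */
static enum save_kind save_kind_for(uint16_t ymm_mask, unsigned reg)
{
    assert(reg < 16);
    return ((ymm_mask >> reg) & 1u) ? SAVE_YMM_FULL : SAVE_XMM_LOW128;
}

/* Total bytes the prologue would store for registers 0..count-1. */
static unsigned bytes_saved(uint16_t ymm_mask, unsigned count)
{
    unsigned total = 0;
    assert(count <= 16);
    for (unsigned reg = 0; reg < count; reg++)
        total += (save_kind_for(ymm_mask, reg) == SAVE_YMM_FULL) ? 32u : 16u;
    return total;
}
```

For example, a mask of 0x0005 over four registers stores 32 + 16 + 32 + 16
bytes, and no register is written twice.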


H.J.

x86-64 psABI defines

typedef struct
{
  unsigned int gp_offset;
  unsigned int fp_offset;
  void *overflow_arg_area;
  void *reg_save_area;
} va_list[1];

for variable argument list. "va_list" is used to access variable argument
list:

void
bar (const char *format, va_list ap)
{
  if (va_arg (ap, int) != 0)
    abort ();
}

void
foo (char *fmt, ...)
{
  va_list ap;
  va_start (ap, fmt);
  bar (fmt, ap);
  va_end (ap);
}
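
To make the role of the va_list fields concrete, here is a small model of
what va_arg (ap, int) does under the current psABI: take the value from the
general-purpose part of the register save area while gp_offset is below 48
(6 registers of 8 bytes), otherwise from the overflow area on the stack. The
struct is a local stand-in named sketch_va_list so it does not clash with
<stdarg.h>; it is an illustration, not compiler output.

```c
#include <assert.h>
#include <string.h>

/* Local stand-in for one psABI va_list element. */
struct sketch_va_list {
    unsigned int gp_offset;    /* byte offset of the next unused GP slot     */
    unsigned int fp_offset;    /* byte offset of the next unused vector slot */
    void *overflow_arg_area;   /* next stack-passed argument                 */
    void *reg_save_area;       /* start of the register save area            */
};

/* Model of va_arg (ap, int). */
static int sketch_va_arg_int(struct sketch_va_list *ap)
{
    int value;
    assert(ap != NULL);
    if (ap->gp_offset < 48) {
        /* Still inside the six saved GP registers. */
        memcpy(&value, (char *)ap->reg_save_area + ap->gp_offset, sizeof value);
        ap->gp_offset += 8;
    } else {
        /* Registers exhausted: read from the stack, 8 bytes per slot. */
        memcpy(&value, ap->overflow_arg_area, sizeof value);
        ap->overflow_arg_area = (char *)ap->overflow_arg_area + 8;
    }
    return value;
}
```

Because bar only sees these four fields, any extension has to keep their
meaning intact for code built by older compilers.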

foo and bar may be compiled with different compilers. We have to keep
the current layout for va_list so that we can mix va_list codes compiled
with AVX and non-AVX compilers. We need to extend the variable argument
handling in the x86-64 psABI to support passing __m256/__m256d/__m256i
on the variable argument list. We propose 2 ways to extend the register
save area to add 256bit AVX registers support:

1. Extend the register save area to put upper 128bit at the end.
  Pros:
    Aligned access.
    Save stack space if 256bit registers are used.
  Cons:
    Split access. Require more split access beyond 256bit.

2. Extend the register save area to put full 256bit YMMs at the end.
The first DWORD after the register save area has the offset of the
extended array for YMM registers from the start of the register save
area. The next DWORD has the element size of the extended array.  The
next WORD encodes which YMM registers should be saved.  Unaligned access
will be used.

                Offset  Register
The original         0  %rdi
register save        8  %rsi
area                16  %rdx
                    24  %rcx
                    32  %r8
                    40  %r9
                    48  %xmm0
                    64  %xmm1
                   ...
                   288  %xmm15
Hints              304  320  offset from offset 0
                   308  32   size of element
                   312  bitmask for used YMM registers
                   314  Unused
Extended array     320  %ymm0
for YMM            352  %ymm1
registers          ...
                   800  %ymm15

  Pros:
    No split access.
    Easily extendable beyond 256bit.
    Limited unaligned access penalty if stack is aligned at 32byte.
  Cons:
    May require store both the lower 128bit and full 256bit register
    content. We may avoid saving the lower 128bit if correct t
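
The slot offsets in the layout for option 2 follow a simple pattern, which
the sketch below writes out. The constant and function names are invented
for illustration; the numbers are the ones given in the proposal (XMM slots
of 16 bytes from offset 48, hints at 304/308/312, YMM slots of 32 bytes
from offset 320).

```c
#include <assert.h>

enum {
    XMM_AREA_OFFSET = 48,   /* %xmm0 in the original save area          */
    XMM_SLOT_SIZE   = 16,
    HINT_YMM_OFFSET = 304,  /* DWORD: offset of the YMM array (320)     */
    HINT_YMM_ELEM   = 308,  /* DWORD: element size of the array (32)    */
    HINT_YMM_MASK   = 312,  /* WORD: bitmask of used YMM registers      */
    YMM_AREA_OFFSET = 320,  /* %ymm0 in the extended array              */
    YMM_SLOT_SIZE   = 32
};

static unsigned xmm_slot_offset(unsigned reg)
{
    assert(reg < 16);
    return XMM_AREA_OFFSET + reg * XMM_SLOT_SIZE;
}

static unsigned ymm_slot_offset(unsigned reg)
{
    assert(reg < 16);
    return YMM_AREA_OFFSET + reg * YMM_SLOT_SIZE;
}

/* Size of the whole extended save area, up to the end of %ymm15. */
static unsigned extended_save_area_size(void)
{
    return ymm_slot_offset(15) + YMM_SLOT_SIZE;
}
```

A reader that honors the two hint DWORDs would use their stored values
instead of the fixed 320 and 32, which is what would let the array grow
beyond 256bit.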

Re: RFC: Extend x86-64 psABI for 256bit AVX register

2008-06-06 Thread H.J. Lu
On Fri, Jun 06, 2008 at 06:50:26AM -0700, H.J. Lu wrote:
> On Fri, Jun 06, 2008 at 10:28:34AM +0200, Jan Hubicka wrote:
> > > 
> > > > ymm0 and xmm0 are the same register. xmm0 is the lower 128bit
> > > > of ymm0. I am not sure if we need separate XMM registers from
> > > YMM registers.
> > 
> > 
> > Yes, I know that xmm0 is lower part of ymm0.  I still think we ought to
> > be able to support varargs that do save ymm0 registers only when ymm
> > values are passed same way as we touch SSE only when SSE values are
> > passed via EAX hint.
> 
> Which register do you propose for hint? The current psABI uses RAX
> for XMM registers. We can't change it to AL and AH for YMM without
> breaking backward compatibility.
> 
> > This way we will be able to support e.g. printf that has YMM printing %
> > construct but don't need YMM enabled hardware when those are not used.
> > 
> > This is why I think extending EAX to contain information about amount of
> > XMM values to save and in addition YMM values to save is sane.  Then old
> > non-YMM aware varargs prologues will crash when YMM values are passed,
> > but all other combinations will work.
> 
> I don't think it is necessary since -mavx will enable AVX code
> generation for all SSE codes. Unless the function only uses integer,
> it will crash on non-YMM aware hardware.  That is if there is one
> SSE register is used, which is hinted in RAX, varargs prologue will
> use AVX instructions to save it. We don't need another hint for AVX
> instructions.
> 
> > > 
> > > >
> > > > I personally don't have much preferences over 1. or 2.. 1. seems
> > > > relatively easy to implement too, or is packaging two 128bit values to
> > > > single 256bit difficult in va_arg expansion?
> > > >
> > > 
> > > Access to 256bit register as lower and upper 128bits needs 2
> > > instructions. For store
> > > 
> > > vmovaps   %xmm7, -143(%rax)
> > > vextractf128 $1, %ymm7, -15(%rax)
> > > 
> > > For load
> > > 
> > > vmovaps  -143(%rax),%xmm7
> > > > vinsertf128 $1, -15(%rax),%ymm7,%ymm7
> > > 
> > > If we go beyond 256bit, we need more instructions to access
> > > the full register. For 512bit, it will be split into lower 128bit,
> > > middle 128bit and upper 256bit. 1024bit will have 4 parts.
> > > 
> > > For #2, only one instruction will be needed for 256bit and
> > > beyond.
> > 
> > Yes, but we will still save half of stack space.  Well, I don't have
> > much preferences here.  If it seems saner to simply save whole thing
> > saving lower part twice, I am fine with that.
> 
> I was told that it wasn't very easy to get decent performance with
> split access. I extended my proposal to include a 16bit bitmask to
> indicate which YMM registers should be saved. If the bit is 0,
> we should only save the lower 128bit in the original register
> save area. Otherwise, we should only save the same whole YMM register.
> 

On second thought: how useful is such a bitmask? Do we really
need it? Is it acceptable to save the lower 128bit twice?

Thanks.


H.J.


Re: RFC: Extend x86-64 psABI for 256bit AVX register

2008-06-06 Thread Richard Guenther
On Fri, Jun 6, 2008 at 4:28 PM, H.J. Lu <[EMAIL PROTECTED]> wrote:
> On Fri, Jun 06, 2008 at 06:50:26AM -0700, H.J. Lu wrote:
>> On Fri, Jun 06, 2008 at 10:28:34AM +0200, Jan Hubicka wrote:
>> > >
>> > > ymm0 and xmm0 are the same register. xmm0 is the lower 128bit
>> > > of ymm0. I am not sure if we need separate XMM registers from
>> > > YMM registers.
>> >
>> >
>> > Yes, I know that xmm0 is lower part of ymm0.  I still think we ought to
>> > be able to support varargs that do save ymm0 registers only when ymm
>> > values are passed same way as we touch SSE only when SSE values are
>> > passed via EAX hint.
>>
>> Which register do you propose for hint? The current psABI uses RAX
>> for XMM registers. We can't change it to AL and AH for YMM without
>> breaking backward compatibility.
>>
>> > This way we will be able to support e.g. printf that has YMM printing %
>> > construct but don't need YMM enabled hardware when those are not used.
>> >
>> > This is why I think extending EAX to contain information about amount of
>> > XMM values to save and in addition YMM values to save is sane.  Then old
>> > non-YMM aware varargs prologues will crash when YMM values are passed,
>> > but all other combinations will work.
>>
>> I don't think it is necessary since -mavx will enable AVX code
>> generation for all SSE codes. Unless the function only uses integer,
>> it will crash on non-YMM aware hardware.  That is if there is one
>> SSE register is used, which is hinted in RAX, varargs prologue will
>> use AVX instructions to save it. We don't need another hint for AVX
>> instructions.
>>
>> > >
>> > > >
>> > > > I personally don't have much preferences over 1. or 2.. 1. seems
>> > > > relatively easy to implement too, or is packaging two 128bit values to
>> > > > single 256bit difficult in va_arg expansion?
>> > > >
>> > >
>> > > Access to 256bit register as lower and upper 128bits needs 2
>> > > instructions. For store
>> > >
>> > > vmovaps   %xmm7, -143(%rax)
>> > > vextractf128 $1, %ymm7, -15(%rax)
>> > >
>> > > For load
>> > >
>> > > vmovaps  -143(%rax),%xmm7
>> > > vinsertf128 $1, -15(%rax),%ymm7,%ymm7
>> > >
>> > > If we go beyond 256bit, we need more instructions to access
>> > > the full register. For 512bit, it will be split into lower 128bit,
>> > > middle 128bit and upper 256bit. 1024bit will have 4 parts.
>> > >
>> > > For #2, only one instruction will be needed for 256bit and
>> > > beyond.
>> >
>> > Yes, but we will still save half of stack space.  Well, I don't have
>> > much preferences here.  If it seems saner to simply save whole thing
>> > saving lower part twice, I am fine with that.
>>
>> I was told that it wasn't very easy to get decent performance with
>> split access. I extended my proposal to include a 16bit bitmask to
>> indicate which YMM registers should be saved. If the bit is 0,
>> we should only save the lower 128bit in the original register
>> save area. Otherwise, we should only save the same whole YMM register.
>>
>
> My second thought. How useful is such a bitmask? Do we really
> need it? Is that acceptable to save the lower 128bit twice?

Why do we need to save the lower 128bit at all if a ymm reg is passed?
Can't we assume "type-correctness"?

Richard.


Re: RFC: Extend x86-64 psABI for 256bit AVX register

2008-06-06 Thread H.J. Lu
On Fri, Jun 6, 2008 at 7:31 AM, Richard Guenther
<[EMAIL PROTECTED]> wrote:
> On Fri, Jun 6, 2008 at 4:28 PM, H.J. Lu <[EMAIL PROTECTED]> wrote:
>> On Fri, Jun 06, 2008 at 06:50:26AM -0700, H.J. Lu wrote:
>>> On Fri, Jun 06, 2008 at 10:28:34AM +0200, Jan Hubicka wrote:
>>> > >
>>> > > ymm0 and xmm0 are the same register. xmm0 is the lower 128bit
>>> > > of ymm0. I am not sure if we need separate XMM registers from
>>> > > YMM registers.
>>> >
>>> >
>>> > Yes, I know that xmm0 is lower part of ymm0.  I still think we ought to
>>> > be able to support varargs that do save ymm0 registers only when ymm
>>> > values are passed same way as we touch SSE only when SSE values are
>>> > passed via EAX hint.
>>>
>>> Which register do you propose for hint? The current psABI uses RAX
>>> for XMM registers. We can't change it to AL and AH for YMM without
>>> breaking backward compatibility.
>>>
>>> > This way we will be able to support e.g. printf that has YMM printing %
>>> > construct but don't need YMM enabled hardware when those are not used.
>>> >
>>> > This is why I think extending EAX to contain information about amount of
>>> > XMM values to save and in addition YMM values to save is sane.  Then old
>>> > non-YMM aware varargs prologues will crash when YMM values are passed,
>>> > but all other combinations will work.
>>>
>>> I don't think it is necessary since -mavx will enable AVX code
>>> generation for all SSE codes. Unless the function only uses integer,
>>> it will crash on non-YMM aware hardware.  That is if there is one
>>> SSE register is used, which is hinted in RAX, varargs prologue will
>>> use AVX instructions to save it. We don't need another hint for AVX
>>> instructions.
>>>
>>> > >
>>> > > >
>>> > > > I personally don't have much preferences over 1. or 2.. 1. seems
>>> > > > relatively easy to implement too, or is packaging two 128bit values to
>>> > > > single 256bit difficult in va_arg expansion?
>>> > > >
>>> > >
>>> > > Access to 256bit register as lower and upper 128bits needs 2
>>> > > instructions. For store
>>> > >
>>> > > vmovaps   %xmm7, -143(%rax)
>>> > > vextractf128 $1, %ymm7, -15(%rax)
>>> > >
>>> > > For load
>>> > >
>>> > > vmovaps  -143(%rax),%xmm7
>>> > > vinsertf128 $1, -15(%rax),%ymm7,%ymm7
>>> > >
>>> > > If we go beyond 256bit, we need more instructions to access
>>> > > the full register. For 512bit, it will be split into lower 128bit,
>>> > > middle 128bit and upper 256bit. 1024bit will have 4 parts.
>>> > >
>>> > > For #2, only one instruction will be needed for 256bit and
>>> > > beyond.
>>> >
>>> > Yes, but we will still save half of stack space.  Well, I don't have
>>> > much preferences here.  If it seems saner to simply save whole thing
>>> > saving lower part twice, I am fine with that.
>>>
>>> I was told that it wasn't very easy to get decent performance with
>>> split access. I extended my proposal to include a 16bit bitmask to
>>> indicate which YMM registers should be saved. If the bit is 0,
>>> we should only save the lower 128bit in the original register
>>> save area. Otherwise, we should only save the same whole YMM register.
>>>
>>
>> My second thought. How useful is such a bitmask? Do we really
>> need it? Is that acceptable to save the lower 128bit twice?
>
> Why do we need to save the lower 128bit at all if a ymm reg is passed?
> Can't we assume "type-correctness"?

Say a double is passed in YMM0/XMM0, we should save it in XMM0 area.
Do we also need to save the whole 256bit YMM0? If we save both XMM0 and
YMM0, we are free to use any location to load the saved register content.
Either one will be correct.


-- 
H.J.


Re: RFC: Extend x86-64 psABI for 256bit AVX register

2008-06-06 Thread Richard Guenther
On Fri, Jun 6, 2008 at 4:40 PM, H.J. Lu <[EMAIL PROTECTED]> wrote:
> On Fri, Jun 6, 2008 at 7:31 AM, Richard Guenther
> <[EMAIL PROTECTED]> wrote:
>> On Fri, Jun 6, 2008 at 4:28 PM, H.J. Lu <[EMAIL PROTECTED]> wrote:
>>> On Fri, Jun 06, 2008 at 06:50:26AM -0700, H.J. Lu wrote:
>>>> On Fri, Jun 06, 2008 at 10:28:34AM +0200, Jan Hubicka wrote:
>>>> > >
>>>> > > ymm0 and xmm0 are the same register. xmm0 is the lower 128bit
>>>> > > of ymm0. I am not sure if we need separate XMM registers from
>>>> > > YMM registers.
>>>> >
>>>> >
>>>> > Yes, I know that xmm0 is lower part of ymm0.  I still think we ought to
>>>> > be able to support varargs that do save ymm0 registers only when ymm
>>>> > values are passed same way as we touch SSE only when SSE values are
>>>> > passed via EAX hint.
>>>>
>>>> Which register do you propose for hint? The current psABI uses RAX
>>>> for XMM registers. We can't change it to AL and AH for YMM without
>>>> breaking backward compatibility.
>>>>
>>>> > This way we will be able to support e.g. printf that has YMM printing %
>>>> > construct but don't need YMM enabled hardware when those are not used.
>>>> >
>>>> > This is why I think extending EAX to contain information about amount of
>>>> > XMM values to save and in addition YMM values to save is sane.  Then old
>>>> > non-YMM aware varargs prologues will crash when YMM values are passed,
>>>> > but all other combinations will work.
>>>>
>>>> I don't think it is necessary since -mavx will enable AVX code
>>>> generation for all SSE codes. Unless the function only uses integer,
>>>> it will crash on non-YMM aware hardware.  That is if there is one
>>>> SSE register is used, which is hinted in RAX, varargs prologue will
>>>> use AVX instructions to save it. We don't need another hint for AVX
>>>> instructions.
>>>>
>>>> > >
>>>> > > >
>>>> > > > I personally don't have much preferences over 1. or 2.. 1. seems
>>>> > > > relatively easy to implement too, or is packaging two 128bit values to
>>>> > > > single 256bit difficult in va_arg expansion?
>>>> > > >
>>>> > >
>>>> > > Access to 256bit register as lower and upper 128bits needs 2
>>>> > > instructions. For store
>>>> > >
>>>> > > vmovaps   %xmm7, -143(%rax)
>>>> > > vextractf128 $1, %ymm7, -15(%rax)
>>>> > >
>>>> > > For load
>>>> > >
>>>> > > vmovaps  -143(%rax),%xmm7
>>>> > > vinsertf128 $1, -15(%rax),%ymm7,%ymm7
>>>> > >
>>>> > > If we go beyond 256bit, we need more instructions to access
>>>> > > the full register. For 512bit, it will be split into lower 128bit,
>>>> > > middle 128bit and upper 256bit. 1024bit will have 4 parts.
>>>> > >
>>>> > > For #2, only one instruction will be needed for 256bit and
>>>> > > beyond.
>>>> >
>>>> > Yes, but we will still save half of stack space.  Well, I don't have
>>>> > much preferences here.  If it seems saner to simply save whole thing
>>>> > saving lower part twice, I am fine with that.
>>>>
>>>> I was told that it wasn't very easy to get decent performance with
>>>> split access. I extended my proposal to include a 16bit bitmask to
>>>> indicate which YMM registers should be saved. If the bit is 0,
>>>> we should only save the lower 128bit in the original register
>>>> save area. Otherwise, we should only save the same whole YMM register.
>>>>
>>>
>>> My second thought. How useful is such a bitmask? Do we really
>>> need it? Is that acceptable to save the lower 128bit twice?
>>
>> Why do we need to save the lower 128bit at all if a ymm reg is passed?
>> Can't we assume "type-correctness"?
>
> Say a double is passed in YMM0/XMM0, we should save it in XMM0 area.
> Do we also need to save the whole 256bit YMM0? If we save both XMM0 and
> YMM0, we are free to use any location to load the saved register content.
> Either one will be correct.

What is the benefit here?  (What would the contents of the upper 128bit
be - apart from "undefined")

I suppose you can load into xmm0 and then "extend" to ymm0?

Richard.


Re: RFC: Extend x86-64 psABI for 256bit AVX register

2008-06-06 Thread Jakub Jelinek
On Thu, Jun 05, 2008 at 07:31:12AM -0700, H.J. Lu wrote:
> 1. Extend the register save area to put upper 128bit at the end.
>   Pros:
>     Aligned access.
>     Save stack space if 256bit registers are used.
>   Cons:
>     Split access. Require more split access beyond 256bit.
> 
> 2. Extend the register save area to put full 256bit YMMs at the end.
> The first DWORD after the register save area has the offset of
> the extended array for YMM registers. The next DWORD has the
> element size of the extended array. Unaligned access will be used.
>   Pros:
>     No split access.
>     Easily extendable beyond 256bit.
>     Limited unaligned access penalty if stack is aligned at 32byte.
>   Cons:
>     May require store both the lower 128bit and full 256bit register
>     content. We may avoid saving the lower 128bit if correct type
>     is required when accessing variable argument list, similar to int
>     vs. double.
>     Waste 272 byte on stack when 256bit registers are used.
>     Unaligned load and store.

Or:

3. Pass unnamed __m256 arguments both in YMM registers and on the
stack or just on the stack.  How often do you think people pass
vectors to varargs functions?  I think I haven't seen that yet except
in gcc testcases.  The x86_64 float varargs setup prologue is already
quite slow now, do we want to make it even slower for something
very rarely used?  Although we have the tree-stdarg optimization pass,
which is able to optimize the varargs prologue setup code in some cases,
it can't help for e.g. printf etc., as printf just does va_start,
passes the va_list to another function and does va_end,
so it must account for any possibility.  Named __m256 arguments would
still be passed in YMM registers only...
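
A sketch of what fetching an unnamed 256bit argument from the stack could
look like under option 3. It assumes the argument is stored at a 32-byte
aligned stack slot, which the thread does not specify; vec256 is a plain
byte array standing in for __m256 so the example needs no AVX support, and
the function name is made up.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef struct { unsigned char bytes[32]; } vec256;  /* stand-in for __m256 */

/* Round the overflow pointer up to a 32-byte boundary (assumed alignment),
   copy one 256-bit value, and return the pointer advanced past it. */
static unsigned char *fetch_vec256_from_stack(unsigned char *overflow_arg_area,
                                              vec256 *out)
{
    uintptr_t addr = (uintptr_t)overflow_arg_area;
    unsigned char *aligned = overflow_arg_area + ((32u - addr % 32u) % 32u);
    assert(out != NULL);
    memcpy(out->bytes, aligned, sizeof out->bytes);
    return aligned + sizeof out->bytes;
}
```

With this scheme the varargs prologue needs no extra register stores at all,
which is the point of the proposal; the cost moves to the caller, which has
to write the value to the stack.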

Jakub


Re: Re: Shared library without dependence on libgcc_s.so

2008-06-06 Thread Arne Steinarson
>>The shared libraries themselves still
>>have a dependency on libgcc_s.so:
>>
>>  $ ldd libwx_gtk2ud_fwb_core-2.9.so.0 | grep gcc
>>  libgcc_s.so.1 => /lib/libgcc_s.so.1 (0xb6ee8000)
>
>you can use the -nodefaultlibs and manually add what you want.
>e.g. you can link a static stlport with static gcc stuff by:
>
>(...) -nodefaultlibs -lstlport -lpthread -lgcc_eh -lgcc -lsupc++ -lc

I tried inserting this into my make command (see below). I still get the
libgcc_s.so dependency (after rebuilding the whole project).

We really have this situation:

  EXE links to (lib_wx_base.so and lib_wx_net.so)
  lib_wx_net.so links to lib_wx_base.so

  lib_wx_base.so does _not_ depend on libgcc_s.so
  lib_wx_net.so does depend on libgcc_s.so

For some reason, when linking one shared library to another one, it
seems GCC throws in this dependency (contrary to the compiler
options). It doesn't understand that both of the shared libraries are
built against static libgcc.

Maybe I'll have to build the app 100% static...

Regards
// ATS.

Linker command:
g++ -shared -fPIC -o
/usr/src/wxSVN/wxWidgets/build-fwb-debug/lib/libwx_baseud_fwb_net-2.9.so.0.0.0
 netdll_fs_inet.o netdll_ftp.o netdll_http.o netdll_protocol.o
netdll_sckaddr.o netdll_sckfile.o netdll_sckipc.o netdll_sckstrm.o
netdll_socket.o netdll_url.o netdll_gsocket.o
-L/usr/src/wxSVN/wxWidgets/build-fwb-debug/lib -nodefaultlibs
-static-libgcc -lpthread -lgcc_eh -lgcc -lsupc++ -lc
-L/usr/src/wxSVN/wxWidgets/build-fwb-debug/lib
-Wl,-soname,libwx_baseud_fwb_net-2.9.so.0   -lz -ldl -lm
-lwxregexud_fwb-2.9  -pthread
-Wl,--version-script,/usr/src/wxSVN/wxWidgets/build-fwb-debug/version-script
-lz -ldl -lm  -lwx_baseud_fwb-2.9


gcc without binutils

2008-06-06 Thread Alexandros Tzannes

Hi all,
I was wondering if there is a configuration parameter for gcc that would
prevent it from using an assembler and linker. I have a port of gcc 4.0.2
for an experimental architecture and I have my own assembler and linker.
Currently I am compiling gcc with binutils compiled for MIPS but renamed to
make the scripts think they are what is needed. I only ever use gcc to
produce .s files, so if there is a way to configure and build gcc without
having an assembler and linker it would make my build process cleaner.

By the way I tried using --with-as and --with-ld but I get a make error
saying "*** This configuration requires the GNU assembler". If there is 
a way to achieve that without making too many changes in the configure 
and build scripts it would be preferable.


Thank you in advance,
Alex Tzannes



ln -r and cherry picking.

2008-06-06 Thread Kenneth Zadeck

I want to point out that the current implementation of lto is not
compatible with "ln -r", and will need to be modified to support
"cherry picking" the function bodies.

In the current implementation, each lto section (such as what holds
a function body or the streamed information from an ipa pass)
references an index that is unique to that .o file.  This index allows
the encoding of the function body or the ipa information to use small
integers to reference the global types and declarations.

"ln -r" will not work in the current system for two reasons:

 1) The ipa passes currently just read the ipa sections until they
 hit their "end of file marker".  Assuming that we create these
 sections with attributes to tell "ln -r" to concatenate them (I have
 no idea how to do this but I assume it is easy), the ipa pass's
 stream readers will need to be modified to restart after the end of
 file marker and do this until the number of bytes in the section is
 exhausted.  This should only require wrapping them in an extra loop.

 2) LTO sections need to be able to find "their index" of decls and
 types.  By "their index" I mean the index that each section used to
 reference the decls and types when the section was generated.

 There is currently no identifying fingerprint in a .o file to match
 an index with the sections that need it.  Note that the indexes for
 different .o files are (obviously) different.  We will need to add a
 fingerprint for each .o file so that when sections from different .o
 files are merged, we can match a section with the proper index.  I
 currently do not know how to generate a fingerprint.  I assume some
 string with the name of the machine, the process id and the time is
 what we want, but the right answer could also be something like the
 md5 checksum of some large part of the .o file.   
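
The "extra loop" from point 1 could look roughly like the sketch below. The
byte value used as end-of-stream marker and both function names are
assumptions made for the example; the real ipa stream format is not shown
in this message.

```c
#include <assert.h>
#include <stddef.h>

#define END_MARKER 0x00  /* hypothetical end-of-stream marker byte */

/* Read one producer's stream: consume bytes up to and including the end
   marker (or to the end of the buffer) and return how many were consumed. */
static size_t read_one_stream(const unsigned char *data, size_t len)
{
    size_t pos = 0;
    while (pos < len && data[pos] != END_MARKER)
        pos++;                      /* a real reader would decode a record */
    return pos < len ? pos + 1 : pos;
}

/* The extra loop: restart the reader after each end marker until the whole
   (possibly concatenated) section has been consumed. */
static unsigned count_streams(const unsigned char *section, size_t len)
{
    unsigned streams = 0;
    size_t pos = 0;
    assert(section != NULL || len == 0);
    while (pos < len) {
        pos += read_one_stream(section + pos, len - pos);
        streams++;
    }
    return streams;
}
```

A section from a single .o file yields one stream, while a section that was
concatenated from three objects yields three, without any change to the
inner reader.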


(2) is also an issue for the cherry picker to deal with, because the
picker is not only going to have to cherry pick but it is going to
have to regenerate all of the indexes based on how the decls and types
are merged.

Note that having this index external to the section that holds a
function body is the only way for the cherry picking to work without
having the function bodies modified.  If explicit references to the
decls and types were in the function bodies, you would have to rewrite
the function bodies to make them point to the proper merged type or
decl.
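
The benefit of keeping the index outside the function body can be shown with
a toy model. All type and function names here are invented for the example;
the point is only that merging rewrites the per-object index table while the
body's small integers stay untouched.

```c
#include <assert.h>
#include <stddef.h>

/* A function body refers to decls/types only through small local indexes. */
struct body { const unsigned *refs; size_t nrefs; };

/* Per-object index: local index -> id of the (merged) decl or type. */
struct index_table { unsigned *global_id; size_t size; };

static unsigned resolve(const struct index_table *idx, unsigned local)
{
    assert(local < idx->size);
    return idx->global_id[local];
}

/* Merging rewrites only the table; the body bytes are never modified. */
static void remap_index(struct index_table *idx,
                        const unsigned *old_to_merged, size_t n_old)
{
    for (size_t i = 0; i < idx->size; i++) {
        assert(idx->global_id[i] < n_old);
        idx->global_id[i] = old_to_merged[idx->global_id[i]];
    }
}
```

If the body held the decl ids directly, remap_index would have to walk and
rewrite every function body instead of one table per object file.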

I do not want to tackle either of these tasks until the streaming lto
branch is merged into the lto branch, but this is certainly something
that will need to be done.

Kenny


Re: ln -r and cherry picking.

2008-06-06 Thread Arnaud Charlet
> I want to point out that the current implementation of lto is not
> compatible with "ln -r", and will need to be modified to support
> "cherry picking" the function bodies.

I assume you mean "ld -r", right ?

Arno


Re: ln -r and cherry picking.

2008-06-06 Thread Kenneth Zadeck

Arnaud Charlet wrote:
>> I want to point out that the current implementation of lto is not
>> compatible with "ln -r", and will need to be modified to support
>> "cherry picking" the function bodies.
>
> I assume you mean "ld -r", right ?
>
> Arno


Yes, of course. Dennis Ritchie's curse: two-letter commands.


GCC 4.3.1 Status Report (2008-06-06)

2008-06-06 Thread Jakub Jelinek
Status
======

The GCC 4.3 branch is now again open for commits under normal release
branch rules.

GCC 4.3.1 has been tagged, tarballs and diffs are on gcc.gnu.org and
so far partly on ftp.gnu.org.  The announcement will go out after the
weekend to let mirrors sync it up.

We got quite a lot of new regressions reported since the last report,
so it would be good to fix some up to get back to nicer stats.

Quality Data
============

Priority          #   Change from Last Report
--------        ---   -----------------------
P1                0   -  1
P2              101   +  1
P3               15   + 12
--------        ---   -----------------------
Total           116   + 12

Previous Report
===============

http://gcc.gnu.org/ml/gcc/2008-05/msg00212.html

The next report for the 4.3 branch will be sent by Mark.


Re: ln -r and cherry picking.

2008-06-06 Thread Cary Coutant
>  2) LTO sections need to be able to find "their index" of decls and
>  types.  By "their index" I mean the index that each section used to
>  reference the decls and types when the section was generated.

Can't you just put an ELF symbol (can be an unnamed local -- could
even just be a section symbol) on the index section, then add a
pointer in the IR section with a relocation to that symbol? This is
basically how DWARF .debug_info sections point to the abbrev table in
the .debug_abbrev sections.
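
A loose C analogue of this idea, offered only as a sketch: when a static
object is initialized with the address of another static object, the
compiler typically emits a relocation for that pointer in the object
file, and the linker fixes it up.  The struct layout and all names here
are hypothetical; real LTO sections would be emitted by the compiler, not
written as C data.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical decl/type index for one translation unit.  */
struct lto_index {
  size_t n_decls;
  const char *const *decl_names;
};

/* Hypothetical IR section header.  The address stored in `index' is not
   known at compile time, so the object file carries a relocation for it
   that refers to `this_file_index'.  */
struct ir_section {
  const struct lto_index *index;
};

static const char *const names[] = { "foo", "bar" };
static const struct lto_index this_file_index = { 2, names };
static const struct ir_section this_file_ir = { &this_file_index };

/* Follow the relocated pointer to look up a decl name by slot.  */
const char *
decl_name (const struct ir_section *ir, size_t slot)
{
  assert (slot < ir->index->n_decls);
  return ir->index->decl_names[slot];
}
```

Because the link between the two objects is a relocation rather than a
value computed by the producer, it stays correct when the linker moves or
concatenates the sections.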

-cary


Re: ln -r and cherry picking.

2008-06-06 Thread Kenneth Zadeck

Cary Coutant wrote:
>>  2) LTO sections need to be able to find "their index" of decls and
>>  types.  By "their index" I mean the index that each section used to
>>  reference the decls and types when the section was generated.
>
> Can't you just put an ELF symbol (can be an unnamed local -- could
> even just be a section symbol) on the index section, then add a
> pointer in the IR section with a relocation to that symbol? This is
> basically how DWARF .debug_info sections point to the abbrev table in
> the .debug_abbrev sections.
>
> -cary

I think that one of the goals here is to not make that too dependent on 
elf.  For instance, we are in the process of getting rid of all of the 
dwarf.  After maddox does that, our only dependence on elf will be as a 
container to hold all of the sections.  

Given that gcc is not always an elf compiler, it really is a lot easier
to invent your own wheel for something like this rather than using elf's
wheel for the first target and then having to figure out how to make
someone else's wheel fit for the rest of the targets.


kenny


Re: ln -r and cherry picking.

2008-06-06 Thread Cary Coutant
> I think that one of the goals here is to not make that too dependent on elf.
>  For instance, we are in the process of getting rid of all of the dwarf.
>  After maddox does that, our only dependence on elf will be as a container
> to hold all of the sections.
> Given that gcc is not always an elf compiler, it really is a lot easier to
> invent your own wheel for something like this rather than using elf's wheel
> for the first target and then having to figure out how to make someone
> else's wheel fit for the rest of the targets.

This is basic functionality that *every* object file format supports.
I don't think using a symbol and a relocation is going to tie you down
to ELF -- no more so than the idea of using sections to store your
data in.

I also think it's simpler and more deterministic than using a hash.

-cary


Re: How to reserve an Elf e_machine value

2008-06-06 Thread Stephen Andieta
Yep, my request to [EMAIL PROTECTED] just bounced, so I will live with a random 
number.

Thanks,

Stephen

----- Original Message -----
From: Michael Meissner <[EMAIL PROTECTED]>
To: Stephen Andieta <[EMAIL PROTECTED]>
Cc: gcc@gcc.gnu.org
Sent: Wednesday, June 4, 2008 7:44:30 AM
Subject: Re: How to reserve an Elf e_machine value

On Tue, Jun 03, 2008 at 08:46:44AM -0700, Stephen Andieta wrote:
> 
> I am working on a compiler kit for an in-house processor that uses Elf as
> object file format. Since this compiler will be released to external
> customers, I need to reserve an 'official' e_machine value for this
> processor. Somehow I am unable to find out how to reserve such a value. How
> should I do this?
>Thanks,   Stephen.

This is a binutils problem, not a GCC one.

The problem is the company that assigns the official numbers (SCO) is rapidly
spinning out of control, and may not be responsive any more.  When I registered
EM_MEP in 2003, the address used was [EMAIL PROTECTED]

If you can't get an official number, there is this comment in elf/common.h:

/* If it is necessary to assign new unofficial EM_* values, please pick large
   random numbers (0x8523, 0xa7f2, etc.) to minimize the chances of collision
   with official or non-GNU unofficial values.

   NOTE: Do not just increment the most recent number by one.
   Somebody else somewhere will do exactly the same thing, and you
   will have a collision.  Instead, pick a random number.

   Normally, each entity or maintainer responsible for a machine with an
   unofficial e_machine number should eventually ask [EMAIL PROTECTED] for
   an officially blessed number to be added to the list above.*/
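
To make the advice above concrete, here is a small sketch of how a tool
might define an unofficial number and check it against an ELF header.  The
macro name EM_MYCPU_UNOFFICIAL is hypothetical, and the value 0xa7f2 is
simply one of the examples from the comment, not a number anyone should
actually reuse.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical unofficial machine number, for illustration only.
   It is not a registered value.  */
#define EM_MYCPU_UNOFFICIAL 0xa7f2u

/* Read e_machine from the first bytes of an ELF file image.
   Returns -1 if HDR is too short or is not an ELF header.  */
long
elf_machine (const unsigned char *hdr, size_t len)
{
  assert (hdr != NULL);
  if (len < 20 || hdr[0] != 0x7f || hdr[1] != 'E'
      || hdr[2] != 'L' || hdr[3] != 'F')
    return -1;
  /* e_machine is the 16-bit field at offset 18 for both ELF32 and ELF64;
     e_ident[5] (EI_DATA) is 1 for little-endian, 2 for big-endian.  */
  if (hdr[5] == 1)
    return (long) (hdr[18] | (hdr[19] << 8));
  if (hdr[5] == 2)
    return (long) ((hdr[18] << 8) | hdr[19]);
  return -1;
}
```

A loader or objdump-like tool would compare the result with
EM_MYCPU_UNOFFICIAL and reject files built for another machine.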

-- 
Michael Meissner, AMD
90 Central Street, MS 83-29, Boxborough, MA, 01719, USA
[EMAIL PROTECTED]


Re: How to reserve an Elf e_machine value

2008-06-06 Thread Cary Coutant
Let me second H.J.'s suggestion to post your request at

http://groups.google.com/group/generic-abi

In the absence of any SCO presence, that group now serves as the
closest thing we have to a standards forum for ELF and the gABI.

-cary


gcc-4.4-20080606 is now available

2008-06-06 Thread gccadmin
Snapshot gcc-4.4-20080606 is now available on
  ftp://gcc.gnu.org/pub/gcc/snapshots/4.4-20080606/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 4.4 SVN branch
with the following options: svn://gcc.gnu.org/svn/gcc/trunk revision 136509

You'll find:

gcc-4.4-20080606.tar.bz2  Complete GCC (includes all of below)

gcc-core-4.4-20080606.tar.bz2 C front end and core compiler

gcc-ada-4.4-20080606.tar.bz2  Ada front end and runtime

gcc-fortran-4.4-20080606.tar.bz2  Fortran front end and runtime

gcc-g++-4.4-20080606.tar.bz2  C++ front end and runtime

gcc-java-4.4-20080606.tar.bz2 Java front end and runtime

gcc-objc-4.4-20080606.tar.bz2 Objective-C front end and runtime

gcc-testsuite-4.4-20080606.tar.bz2    The GCC testsuite

Diffs from 4.4-20080530 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-4.4
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.


Re: How to reserve an Elf e_machine value

2008-06-06 Thread Joseph S. Myers
On Fri, 6 Jun 2008, Stephen Andieta wrote:

> Yep, my request to [EMAIL PROTECTED] just bounced, so I will live 
> with a random number.

caldera.com doesn't have an MX record whereas sco.com does, so maybe it's
a problem with that old domain.  Try Dave Prosser <[EMAIL PROTECTED]>
directly - he allocated the last e_machine value we registered in Jan 2007,
and was still at SCO (posting to the WG14 reflector from that address) as
of Feb 2008.

Unfortunately the public table at 
 is very out 
of date - it goes up to 110 and the last value we got was 165 - so someone 
else can't just take over where SCO left off without SCO internal 
information, or guessing and leaving a large gap and hoping there isn't 
too much duplication.

-- 
Joseph S. Myers
[EMAIL PROTECTED]


Re: No warning for unreachable code

2008-06-06 Thread Segher Boessenkool

> I'm suggesting that there's no big difference between unsigned char and
> unsigned int (and unsigned long...) in this case, and, therefore
> compiler's behaviour should be consistent.


But there is a difference.

When "x" is an unsigned int, the expression "x < 0" is equivalent to
    (unsigned int) x < (unsigned int) 0
which can never be true, whatever "x" is, and results in
    warning: comparison of unsigned expression < 0 is always false

When "x" is an unsigned char, however, it reads
    (int) x < (int) 0
which cannot be true only because of the particular values "x" can take,
and the warning is
    warning: comparison is always false due to limited range of data type

Sounds sane to me, and both warning messages are clear.
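
For illustration only, the two cases can be written out in C.  The
function names are invented for this sketch, and it assumes int is wider
than char, as on all common targets.

```c
#include <assert.h>

/* With unsigned int, 0 is converted to unsigned int, so the comparison
   is done in an unsigned type and can never be true.  */
int
uint_below_zero (unsigned int x)
{
  return x < 0;
}

/* With unsigned char, x is promoted to int.  The comparison is signed;
   it is false only because a promoted unsigned char is never negative.  */
int
uchar_below_zero (unsigned char x)
{
  return x < 0;
}

/* The same value compared in a signed and in an unsigned type.  */
int
as_int_is_negative (int v)
{
  return v < 0;                     /* signed compare */
}

int
as_uint_is_negative (int v)
{
  return (unsigned int) v < 0;      /* unsigned compare: always 0 */
}
```

Both of the first two functions always return 0, but for the different
reasons given above, which is why GCC words the two warnings differently.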


> And in this case I personally
> would prefer a warning in both cases with -Wall.


Then please file a PR in bugzilla so your request won't get lost.

It might be that the warning isn't in -Wall because it has too many
false positives.  This is just a guess though.

> And this wasn't a purely theoretical observation. There is a driver in the
> Linux kernel, that does exactly this: assigns function return code to an
> unsigned int variable, and then checks for "<0"... And the compiler
> happily throws the check away. Yes, I will submit a patch, but if there is
> no strong reason against, maybe it would make sense to warn in these
> cases.


I can try to shift the blame and say the kernel should be built with
-Wextra .  But then people will shout at me again, so I won't.
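
A minimal reconstruction of the driver pattern described above, with a
made-up helper standing in for the real function (the actual kernel code
is not shown in this thread):

```c
#include <assert.h>

/* Hypothetical helper that reports failure with a negative code.  */
int
read_status (int fail)
{
  return fail ? -5 : 0;
}

/* Buggy: the negative code is converted to a large unsigned value, so
   the "< 0" test is always false and the compiler may delete it.  */
int
check_buggy (int fail)
{
  unsigned int ret = read_status (fail);
  if (ret < 0)
    return 1;   /* never reached */
  return 0;
}

/* Fixed: keep the return code in a signed variable.  */
int
check_fixed (int fail)
{
  int ret = read_status (fail);
  if (ret < 0)
    return 1;   /* error detected */
  return 0;
}
```

check_buggy reports success even when read_status fails; check_fixed
catches the error.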


Segher



How to write pattern for addition with carry operation

2008-06-06 Thread Mohamed Shafi
Hello all,

The 16-bit target that I am porting to GCC 4.1.2 doesn't have any
instructions for 32-bit operations. But for addition and subtraction
there are
    addc
    subc
instructions that also take the carry bit into account. Presently I have
patterns for SImode addition and subtraction such that the template will
have

add %0, %1\naddc %N0, %N1
sub %0, %1\nsubc %N0, %N1

Will it be possible for me to write separate patterns for the
instructions add and addc?
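
For reference, the semantics that separate add and addc patterns would
have to model can be sketched in C.  This is not machine-description
code, and the function names are invented; it only shows the carry
flowing from the low-half add into the high-half addc.

```c
#include <assert.h>
#include <stdint.h>

/* One 16-bit add step: returns the low 16 bits of A + B + CARRY_IN and
   stores the carry out of bit 15 in *CARRY_OUT.  "add" is this step with
   CARRY_IN = 0; "addc" is the same step fed with the previous carry.  */
uint16_t
add16_carry (uint16_t a, uint16_t b, unsigned carry_in, unsigned *carry_out)
{
  assert (carry_in <= 1);
  uint32_t sum = (uint32_t) a + b + carry_in;
  *carry_out = (sum >> 16) & 1;
  return (uint16_t) sum;
}

/* 32-bit addition built from the two 16-bit steps.  */
uint32_t
add32_via_halves (uint32_t x, uint32_t y)
{
  unsigned carry;
  /* add: low halves, no incoming carry.  */
  uint16_t lo = add16_carry ((uint16_t) x, (uint16_t) y, 0, &carry);
  /* addc: high halves plus the carry produced by the add.  */
  uint16_t hi = add16_carry ((uint16_t) (x >> 16), (uint16_t) (y >> 16),
                             carry, &carry);
  return ((uint32_t) hi << 16) | lo;
}
```

The dependency of the second step on the carry from the first is what
any split into two patterns has to preserve, so that nothing which
clobbers the carry is scheduled between them.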

Regards,
Shafi