Re: LTO inlining of transactional builtins

2012-06-26 Thread Richard Guenther
On Mon, Jun 25, 2012 at 4:32 PM, Richard Henderson  wrote:
> On 2012-06-22 06:08, Richard Guenther wrote:
>> Do I understand correctly that inlining the builtin at expansion time is not
>> good because the implementation detail may depend on how libitm was
>> configured?
>
> More or less, yes.
>
> We want tm in gcc to be useful to researchers experimenting in the area.
> That means not building stuff into the main compiler beyond the ABI.
> The ABI is just a function call.
>
> However, in some implementations (including our own) the ABI call is a
> small wrapper that calls a function in a dispatch table.  Avoiding the
> call-to-call at least in static linking situations is more or less
> exactly what LTO does best.
>
> Unfortunately, we have a real phase ordering problem here.  LTO is also
> what gives us the visibility into the program itself and allows us to
> clone functions for tm based on what's in the transaction.

In theory we should be able to do multiple "LTO" passes.  So we could do
 a.c        a.o
 ...   ->         -> WPA -> LTRANS and TM lowering -> WPA -> LTRANS and RTL expand
 x.c        x.o

Thus, after a first wave of WPA and LTRANS on non-lowered TM, we can,
after the TM lowering in the first LTRANS phase, write out LTO bytecode again
and restart WPA from that.  It might get tricky with respect to how we drive
the compile via the linker plugin, but in theory GCC itself should not care
whether the LTO objects that we feed into the WPA stage come from early optimizations
or from LTRANS optimizations (well, you have to cut off at a suitable place
before RTL expansion, of course).

So - I suppose enhancing the infrastructure for such multiple runs through
WPA / LTRANS would be a nice thing to have anyways and would probably
solve your issue, too.

Richard.
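Henderson's "call-to-call" can be made concrete with a minimal sketch. Everything in it is hypothetical: `struct tm_dispatch`, `tm_current`, `tm_read_u32` and `tm_write_u32` are illustrative names, not libitm's actual ABI symbols or internals. It shows an ABI entry point that is only a thin wrapper around an indirect call through a dispatch table chosen at run time; inlining the wrapper into its callers (which LTO makes possible across translation units) removes one of the two calls, while the indirect call remains.

```c
#include <stdint.h>

/* Hypothetical dispatch table: one function pointer per ABI operation.
   These names are illustrative only, not libitm's real internals. */
struct tm_dispatch {
    uint32_t (*read_u32)(const uint32_t *addr);
    void (*write_u32)(uint32_t *addr, uint32_t val);
};

/* A trivial "serial" back end that performs plain loads and stores. */
static uint32_t serial_read_u32(const uint32_t *addr) { return *addr; }
static void serial_write_u32(uint32_t *addr, uint32_t val) { *addr = val; }

static const struct tm_dispatch serial_dispatch = {
    serial_read_u32,
    serial_write_u32,
};

/* A real runtime would select a back end at start-up; here it is fixed. */
const struct tm_dispatch *tm_current = &serial_dispatch;

/* Hypothetical ABI entry points: thin wrappers around the table.
   A call from user code is call-to-call (wrapper, then indirect call);
   once the wrapper is inlined, only the indirect call is left. */
uint32_t tm_read_u32(const uint32_t *addr)
{
    return tm_current->read_u32(addr);
}

void tm_write_u32(uint32_t *addr, uint32_t val)
{
    tm_current->write_u32(addr, val);
}
```

Because the table entry is only known at run time, even a static LTO link cannot turn the indirect call into a direct one in this sketch; what it can remove is the wrapper layer.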



Re: LTO inlining of transactional builtins

2012-06-26 Thread Jan Hubicka
> On Mon, Jun 25, 2012 at 4:32 PM, Richard Henderson  wrote:
> > On 2012-06-22 06:08, Richard Guenther wrote:
> >> Do I understand correctly that inlining the builtin at expansion time is not
> >> good because the implementation detail may depend on how libitm was
> >> configured?
> >
> > More or less, yes.
> >
> > We want tm in gcc to be useful to researchers experimenting in the area.
> > That means not building stuff into the main compiler beyond the ABI.
> > The ABI is just a function call.
> >
> > However, in some implementations (including our own) the ABI call is a
> > small wrapper that calls a function in a dispatch table.  Avoiding the
> > call-to-call at least in static linking situations is more or less
> > exactly what LTO does best.
> >
> > Unfortunately, we have a real phase ordering problem here.  LTO is also
> > what gives us the visibility into the program itself and allows us to
> > clone functions for tm based on what's in the transaction.
> 
> In theory we should be able to do multiple "LTO" passes.  So we could do
>  a.c        a.o
>  ...   ->         -> WPA -> LTRANS and TM lowering -> WPA -> LTRANS and RTL expand
>  x.c        x.o
> 
> Thus, after a first wave of WPA and LTRANS on non-lowered TM, we can,
> after the TM lowering in the first LTRANS phase, write out LTO bytecode again
> and restart WPA from that.  It might get tricky with respect to how we drive
> the compile via the linker plugin, but in theory GCC itself should not care
> whether the LTO objects that we feed into the WPA stage come from early optimizations
> or from LTRANS optimizations (well, you have to cut off at a suitable place
> before RTL expansion, of course).
> 
> So - I suppose enhancing the infrastructure for such multiple runs through
> WPA / LTRANS would be a nice thing to have anyways and would probably
> solve your issue, too.

I would say that double streaming would be more expensive than making the WPA
stage load the relevant function bodies and modify them, if the TM pass really
cannot be split into a proper IPA pass (at the moment I do not see why it could not be).

Honza


Re: LTO inlining of transactional builtins

2012-06-26 Thread Richard Guenther
On Tue, Jun 26, 2012 at 10:29 AM, Jan Hubicka  wrote:
> I would say that double streaming would be more expensive than making the WPA
> stage load the relevant function bodies and modify them, if the TM pass really
> cannot be split into a proper IPA pass (at the moment I do not see why it could not be).

I'm not sure TM people care about double streaming cost ;)  As far as I can
see, TM people want the non-lowered form to go through at least loop optimizations,
so I don't see how even a proper IPA pass would help here.  As for cherry-picking
function bodies at a late stage, yes, that would be another missing feature to
consider.  But it would at least need the full WPA callgraph to be available,
or, in the case of static libraries, the linker plugin would need to feed even unused
archive parts to WPA ... thus it would require some fake symtab entries we'd
need to feed to the linker plugin.  That would be quite a lot of special-case code.

Richard.



Re: LTO inlining of transactional builtins

2012-06-26 Thread Jan Hubicka
> 
> I'm not sure TM people care about double streaming cost ;)  As far as I can
> see, TM people want the non-lowered form to go through at least loop optimizations,
> so I don't see how even a proper IPA pass would help here.  As for cherry-picking

:) Yep, this is kind of similar to what we may want to do for data structure
changes etc.  It seems that loading selected functions into the WPA stage and
modifying them will end up cheaper than double streaming, because this usually
affects just a small portion of the program.  But I am not sure.  Double
streaming is implementable, too.

Honza

> function bodies at a late stage, yes, that would be another missing feature to
> consider.  But it would at least need the full WPA callgraph to be available,
> or, in the case of static libraries, the linker plugin would need to feed even unused
> archive parts to WPA ... thus it would require some fake symtab entries we'd
> need to feed to the linker plugin.  That would be quite a lot of special-case code.
> 
> Richard.
> 
> > Honza
> >>
> >> Richard.
> >>
> >> >
> >>
> >> > r~


Build results for x86_64-apple-darwin11.4.0

2012-06-26 Thread Josh Reese
Not sure if this is desired since 11.3.0 is already on the site but:

13:03@legolas:.+gcc4.7.0/objdir$ ../srcdir/config.guess 
x86_64-apple-darwin11.4.0

13:07@legolas:.+4.7.0/local/bin$ ./gcc -v
Using built-in specs.
COLLECT_GCC=./gcc
COLLECT_LTO_WRAPPER=/Users/jreese/Documents/school/edinburgh/project/builds/gcc-4.7.0/local/bin/../libexec/gcc/x86_64-apple-darwin11.4.0/4.7.0/lto-wrapper
Target: x86_64-apple-darwin11.4.0
Configured with: ../srcdir/configure --prefix=/Users/jreese/Documents/school/edinburgh/project/compressed/gcc-4.7.0/local
Thread model: posix
gcc version 4.7.0 (GCC)

Cheers,
Josh



Re: ARM: gcc generates two identical strd instructions to store 8 bytes

2012-06-26 Thread Nathanaël Prémillieu

On 26/06/2012 00:16, Michael Hope wrote:
> On 26 June 2012 00:48, Nathanaël Prémillieu  wrote:
>>
>> Hi all,
>>
>> I am using the gcc ARM cross-compiler (gcc version 4.6.3 (Ubuntu/Linaro
>> 4.6.3-1ubuntu5)). Compiling the test.c code (in attachment) with:
>>
>> 'arm-linux-gnueabi-gcc -S test.c'
>>
>> I obtain the test.s assembly code (in attachment). At lines 56 and 57 of
>> test.s there are two identical strd instructions:
>>
>> 56  strd  r2, [r7]
>> 57  strd  r2, [r7]
>>
>> I have checked the semantics of the ARM strd instruction and I have not seen
>> any side effect of this instruction that could explain why gcc needs to put
>> this instruction twice in a row. To me, one is sufficient to store the
>> 8-byte variable into memory.
>>
>> Is there an explanation?
>
> Hi Nathanaël.  Your question is more appropriate for the gcc-help
> list.  This list is about the development of GCC itself.

I do not ask for help, I just want to highlight what seems to me a
strange behavior.

> You've built with optimisation turned off, so GCC has generated correct
> but inefficient code.  The double store could be a side effect of
> expanding the 64-bit multiply into the component 32-bit multiplies or
> the conditional.  Try building at -O or higher.

At higher optimization levels, the stores are eliminated.

> -- Michael
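Michael's remark about expanding a 64-bit multiply into 32-bit multiplies can be illustrated with a sketch of that decomposition. It is a generic illustration, not the contents of the attached test.c (which is not reproduced here) and not GCC's actual expansion code; `mul64_via_32` is a hypothetical name. On a 32-bit target, the low 64 bits of a 64 x 64 product come from a full 32 x 32 -> 64 product of the low halves plus the low 32 bits of the two cross products; the high x high term only affects bits 64 and above and is dropped.

```c
#include <stdint.h>

/* Hypothetical helper: low 64 bits of a * b, built only from 32-bit
   pieces, the way a 32-bit target has to compute a 64-bit multiply. */
uint64_t mul64_via_32(uint64_t a, uint64_t b)
{
    uint32_t a_lo = (uint32_t)a, a_hi = (uint32_t)(a >> 32);
    uint32_t b_lo = (uint32_t)b, b_hi = (uint32_t)(b >> 32);

    /* Full 32x32->64 product of the low halves. */
    uint64_t lo = (uint64_t)a_lo * b_lo;

    /* The cross terms land at bit 32; only their low 32 bits survive
       modulo 2^64, so wrapping 32-bit arithmetic is enough. */
    uint32_t cross = a_lo * b_hi + a_hi * b_lo;

    /* a_hi * b_hi would land at bit 64 and is discarded entirely. */
    return lo + ((uint64_t)cross << 32);
}
```

Whether an expansion of this kind is what actually produced the two identical strd instructions is not established in the thread; Michael offers it only as a possibility.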





GNU MPFR 3.1.1 Release Candidate

2012-06-26 Thread Vincent Lefevre
The release of GNU MPFR 3.1.1 ("canard à l'orange" patch level 1)
is imminent. Please help to make this release as good as possible
by downloading and testing this release candidate:

http://www.mpfr.org/mpfr-3.1.1/mpfr-3.1.1-rc1.tar.xz
http://www.mpfr.org/mpfr-3.1.1/mpfr-3.1.1-rc1.tar.bz2
http://www.mpfr.org/mpfr-3.1.1/mpfr-3.1.1-rc1.tar.gz
http://www.mpfr.org/mpfr-3.1.1/mpfr-3.1.1-rc1.zip

The MD5 checksums:
a1a0abb6f2611a9cc388261b12304ecb  mpfr-3.1.1-rc1.tar.bz2
15399799f44d30e53d4700a4df5e8b5f  mpfr-3.1.1-rc1.tar.gz
edb139ab0b51160de54b611b33f3be57  mpfr-3.1.1-rc1.tar.xz
7b005bc1877db361f29224f42f1c07b8  mpfr-3.1.1-rc1.zip

The SHA1 checksums:
59a005cb515e42a604d791c4dcdc1865b5951ccb  mpfr-3.1.1-rc1.tar.bz2
a93d0764b6f71dcf2259e6d26cc8cb8f715fd545  mpfr-3.1.1-rc1.tar.gz
46ec98c1a3ed352336036e35517032140ee421db  mpfr-3.1.1-rc1.tar.xz
9385d07e18f69cade3d0ca1157cd1f1d67fd92fe  mpfr-3.1.1-rc1.zip

The signatures:
http://www.mpfr.org/mpfr-3.1.1/mpfr-3.1.1-rc1.tar.xz.asc
http://www.mpfr.org/mpfr-3.1.1/mpfr-3.1.1-rc1.tar.bz2.asc
http://www.mpfr.org/mpfr-3.1.1/mpfr-3.1.1-rc1.tar.gz.asc
http://www.mpfr.org/mpfr-3.1.1/mpfr-3.1.1-rc1.zip.asc

Changes from version 3.1.0 to version 3.1.1:
- Bug fixes (see  or ChangeLog file).

Please send success and failure reports with "./config.guess" output
to .

If no problems are found, GNU MPFR 3.1.1 should be released
around 2012-07-03.

Regards,

-- 
Vincent Lefèvre  - Web: 
100% accessible validated (X)HTML - Blog: 
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)


Re: ARM: gcc generates two identical strd instructions to store 8 bytes

2012-06-26 Thread Ian Lance Taylor
Nathanaël Prémillieu  writes:

> I do not ask for help, I just want to highlight what seems to me a
> strange behavior.

The mailing list gcc@gcc.gnu.org is for discussion of the development of
GCC itself.  Discussion of GCC behaviour, including questions about
optimizations and possible bugs, is best conducted on gcc-help.

Thanks for your consideration.

Ian


Re: [x86-64 psABI] RFC: Extend x86-64 psABI to support x32

2012-06-26 Thread H.J. Lu
On Tue, Jun 26, 2012 at 12:36 PM, Mark Butler  wrote:
> On Monday, May 14, 2012 11:31:11 AM UTC-6, H.J. wrote:
>>
>> Support for the x32 psABI:
>>
>> http://sites.google.com/site/x32abi/
>>
>> is added in Linux kernel 3.4-rc1.  X32 uses the ILP32 model for x86-64
>> instruction set with size of long and pointers == 4 bytes.  X32 is
>> already supported in GCC 4.7.0 and binutils 2.22...Here is a
>> patch to extend x86-64 psABI for x32.  Any comments?
>>
>
> May I ask why the decision was made to use ILP32 instead of L64P32?   The
> latter would seem to avoid lots of porting problems in particular.  And if
> porting difficulties are the major complaint about x32, is it really too
> late to switch?  Thanks - mdb

x32 is designed to replace ia32 where long is 32-bit, not x86-64.


-- 
H.J.


Re: [x86-64 psABI] RFC: Extend x86-64 psABI to support x32

2012-06-26 Thread H. Peter Anvin
On 06/26/2012 12:47 PM, H.J. Lu wrote:
>>
>> May I ask why the decision was made to use ILP32 instead of L64P32?   The
>> latter would seem to avoid lots of porting problems in particular.  And if
>> porting difficulties are the major complaint about x32, is it really too
>> late to switch?  Thanks - mdb
> 
> x32 is designed to replace ia32 where long is 32-bit, not x86-64.
> 

It's worth noting that there are *no* Linux platforms that are not ILP32
or LP64, so adding a third memory model is likely to cause even more
problems...

-hpa
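The data models discussed in this thread differ only in the sizes of `int`, `long` and pointers. A minimal sketch that names the model a translation unit is compiled for, using `sizeof` alone; `data_model` is a hypothetical helper, and "L64P32" is the model Mark asked about, which, as noted above, no Linux platform uses.

```c
#include <stddef.h>

/* Hypothetical helper: name the data model from the sizes of int, long
   and pointers. */
const char *data_model(void)
{
    size_t i = sizeof(int), l = sizeof(long), p = sizeof(void *);

    if (i == 4 && l == 4 && p == 4)
        return "ILP32";   /* e.g. ia32 and x32 */
    if (i == 4 && l == 8 && p == 8)
        return "LP64";    /* e.g. x86-64 */
    if (i == 4 && l == 8 && p == 4)
        return "L64P32";  /* the alternative proposed for x32 */
    return "other";
}
```

Built with GCC's `-m32`, `-mx32` and `-m64` options on an x86-64 toolchain that has all three multilibs, this should report ILP32, ILP32 and LP64 respectively, which is why x32 code sees the same `long` as ia32 code.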



Re: [x86-64 psABI] RFC: Extend x86-64 psABI to support x32

2012-06-26 Thread H.J. Lu
On Tue, Jun 26, 2012 at 2:11 PM, Mark Butler  wrote:
>
>> x32 is designed to replace ia32 where long is 32-bit, not x86-64.
>>
> I understand, but wouldn't L64P32 be much better in the long run? In terms
> of compatibility with LP64, and an LP64 kernel in particular?  The structure
> layouts of any structure that did not contain pointers would be identical,
> for example.  struct timeval, struct timespec, struct stat, and on and on...

Linux/x32 uses the same layout for struct timeval, struct timespec and struct stat
as Linux/x86-64.  That is orthogonal to L64 vs. L32.

-- 
H.J.
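H.J.'s point that structure layout is independent of the size of `long` can be illustrated with two hypothetical structures shaped like struct timeval. They are not the real libc or kernel definitions; the sketch only shows that a layout declared with `long` changes with the data model, whereas one declared with fixed-width 64-bit types is the same under ILP32 and LP64, which is one way an ILP32 ABI can share layouts with an LP64 one.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical structures shaped like struct timeval; NOT the real
   <sys/time.h> definitions. */

/* Size follows sizeof(long): 8 bytes under ILP32, 16 under LP64. */
struct tv_long {
    long tv_sec;
    long tv_usec;
};

/* Fixed-width fields: 16 bytes with tv_usec at offset 8 under both
   ILP32 and LP64, so the layout no longer depends on the data model. */
struct tv_fixed {
    int64_t tv_sec;
    int64_t tv_usec;
};

_Static_assert(sizeof(struct tv_fixed) == 16, "16 bytes in any data model");
_Static_assert(offsetof(struct tv_fixed, tv_usec) == 8, "tv_usec at offset 8");
```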