BImode is treated as a normal byte-wide mode and causes a bug.

2010-12-10 Thread Bingfeng Mei
Hi,
I am investigating a bug in our target port. It is
due to the following optimization done by the combine pass.

(zero_extend:SI (reg:BI 120))

is transformed to 

(and:SI (subreg:SI (reg:BI 120) 0)
(const_int 255 [0xff]))

in expand_compound_operation (combine.c), where BImode is 
just treated as a byte-wide mode.

In machmode.def, BImode is defined as FRACTIONAL_INT_MODE (BI, 1, 1).
But the precision field is not used at all here. 

Even after I hacked the code to bypass that transformation,

(subreg:QI (zero_extend:SI (reg:BI 120)) 0)

is still transformed to 
(subreg:QI (reg:BI 120) 0)

in simplify_subreg. This is wrong because the higher bits
of the paradoxical subreg are undefined here, not zero.

Grepping for GET_MODE_PRECISION returns few results. It seems
that many RTX optimization functions don't consider
FRACTIONAL_INT_MODE at all. If that is the case, we should
document that limitation (or maybe I missed it).
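
To illustrate the kind of guard I mean, here is a rough sketch only (the
macro names are the usual ones from machmode.h; the exact placement
inside expand_compound_operation is indicative, not a tested patch):

  /* Sketch: don't rewrite (zero_extend:SI (reg:BI ...)) into
     (and:SI (subreg:SI (reg:BI ...) 0) (const_int 255)) when the inner
     mode has fewer meaningful bits than its storage unit, as BImode
     does (FRACTIONAL_INT_MODE (BI, 1, 1)).  The subreg bits above the
     precision are undefined, not zero, so masking with 255 would keep
     garbage bits.  */
  enum machine_mode inner_mode = GET_MODE (XEXP (x, 0));
  if (GET_MODE_PRECISION (inner_mode) < GET_MODE_BITSIZE (inner_mode))
    return x;  /* keep the original zero_extend */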

We need to zero_extend BImode to model the behaviour of moving the
lowest bit of a predicate register into a general register.

Cheers,
Bingfeng



GCC building: Still libquadmath-related failures on bare-iron targets?

2010-12-10 Thread Tobias Burnus

Hello,

Given that there has been quite some libquadmath-related configure work: 
are there still build problems due to link tests when one cross-builds for 
bare-iron targets? Or not? (Cf. PR 46520)


If so, I would start to tackle them next.

(As a workaround, one can now use --disable-libquadmath; however, I would 
still prefer to fix it so that it works out of the box.)


Tobias


Re: "ld -r" on mixed IR/non-IR objects (

2010-12-10 Thread H.J. Lu
On Thu, Dec 9, 2010 at 8:55 PM, H.J. Lu  wrote:
> On Thu, Dec 9, 2010 at 6:29 PM, H.J. Lu  wrote:
>> On Wed, Dec 8, 2010 at 9:36 AM, H.J. Lu  wrote:
>>> On Wed, Dec 8, 2010 at 5:54 AM, H.J. Lu  wrote:
 On Wed, Dec 8, 2010 at 1:19 AM, Andi Kleen  wrote:
>> On 12/07/2010 04:20 PM, Andi Kleen wrote:
>>>
>>> The only problem left is mixing of lto and non lto objects. this right
>>> now is not handled. IMHO still the best way to handle it is to use
>>> slim lto and then simply separate link the "left overs" after deleting
>>> the LTO objects. This can be actually done with objcopy (with some
>>> limitations), doesn't even need linker support.
>>>
>>
>> Quite possibly a better way to deal with that is to provide a mechanism
>> for encapsulating arbitrary binary code objects inside the LTO IR.
>
> Then you would need to teach your assembler and everything

 The magic section is generated by linker directly. No changes to
 assembler is required.

> else that may generate ELF objects to generate this magic object. But why
> not just ELF directly? that is what it is after all.

 My proposal isn't specific to ELF.

>
> To be honest I don't really see the point of all this complexity you
> guys are proposing just to save fat LTO. Fat LTO is always a bad idea
> because it's slow and  does lots of redundant work. If LTO is to become
> a more wide spread mode it has to go simply because of the poor
> performance.
>
> With slim LTO passthrough is  very straight-forward: simple pass
> through every section that is not LTO and generate code for the LTO
> sections. No new magic sections needed at all.
>

 My proposal works on both fat and slim LTO objects.  The idea is
 you can use "ld -r" on any combination of inputs and its output
 still works as before "ld -r".

>>>
>>> Here is the revised proposal.
>>>
>>
>> The initial implementation of my proposal is available on hjl/lto-mixed
>> branch at
>>
>> http://git.kernel.org/?p=devel/binutils/hjl/x86.git;a=summary
>>
>> Simple case works.  More cleanups are needed.  Feedbacks
>> are welcome.
>>
>
> I checked in patches to remove temporary files.
>
>

More fixes are checked in.  I will try Linux kernel next.


-- 
H.J.


What loop optimizations could increase the code size significantly?

2010-12-10 Thread Fang, Changpeng
Hi,

 I am looking for ways to reduce the code size. What loop optimizations could 
increase the code size significantly?
The optimizations I know of are: unswitching, vectorization, prefetching and unrolling.
We should not perform these optimizations if the loop only rolls a few 
iterations.

In addition, what loop optimizations could generate pre- and/or post loops?
For example, vectorization and unrolling.
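
For concreteness, here is a hand-written sketch (made-up example, not
compiler output) of why unrolling both grows the code and needs a post
loop; vectorization behaves similarly, and may also add a pre loop to
reach an aligned starting address:

  /* Original loop.  */
  void add_arrays (int *a, const int *b, const int *c, int n)
  {
    int i;
    for (i = 0; i < n; i++)
      a[i] = b[i] + c[i];
  }

  /* Unrolled by 4: the body is duplicated four times, and a post
     (remainder) loop handles the iterations left over when n is not
     a multiple of 4.  */
  void add_arrays_unrolled (int *a, const int *b, const int *c, int n)
  {
    int i;
    for (i = 0; i + 3 < n; i += 4)
      {
        a[i]     = b[i]     + c[i];
        a[i + 1] = b[i + 1] + c[i + 1];
        a[i + 2] = b[i + 2] + c[i + 2];
        a[i + 3] = b[i + 3] + c[i + 3];
      }
    for (; i < n; i++)   /* post loop */
      a[i] = b[i] + c[i];
  }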

Thanks,

Changpeng


Re: GCC 4.5.2 Release Candidate available from gcc.gnu.org

2010-12-10 Thread Ramana Radhakrishnan
On Wed, 2010-12-08 at 14:42 +0100, Richard Guenther wrote:
> A release candidate for GCC 4.5.2 is available from
> 
>  ftp://gcc.gnu.org/pub/gcc/snapshots/4.5.2-RC-20101208
> 
> and shortly its mirrors.  It has been generated from SVN revision 167585.
> 
> I have so far bootstrapped and tested the release candidate on
> x86_64-linux, bootstraps and tests on
> {i686,ia64,ppc,ppc64,s390,s390x}-linux are running.
> 
> Please test it and report any issues to bugzilla.

I have successfully bootstrapped the release candidate for
arm-linux-gnueabi with the following parameters:

--with-cpu=cortex-a9 --with-fpu=vfpv3-d16 --with-float=softfp 

Tests are still running. 

Ramana




Tree checking failure in jc1

2010-12-10 Thread Dave Korn

Hi lists,

  I found a couple of new FAILs in my latest libjava testrun:

> FAIL: newarray_overflow -O3 compilation from source
> FAIL: newarray_overflow -O3 -findirect-dispatch compilation from source

  These turn out to be tree checking failures:

> In file included from :3:0:
> newarray_overflow.java:20:0: internal compiler error: tree check: expected 
> class
>  'type', have 'declaration' (function_decl) in put_decl_node, at 
> java/lang.c:405

... happening ...

> /* Append to decl_buf a printable name for NODE.
>    Depending on VERBOSITY, more information about NODE
>    is printed. Read the comments of decl_printable_name in
>    langhooks.h for more.  */
> 
> static void
> put_decl_node (tree node, int verbosity)
> {
>   int was_pointer = 0;
>   if (TREE_CODE (node) == POINTER_TYPE)
>     {
>       node = TREE_TYPE (node);
>       was_pointer = 1;
>     }
>   if (DECL_P (node) && DECL_NAME (node) != NULL_TREE)
>     {
>       if (TREE_CODE (node) == FUNCTION_DECL)
>         {
>           if (verbosity == 0 && DECL_NAME (node))
>             /* We have been instructed to just print the bare name
>                of the function.  */
>             {
>               put_decl_node (DECL_NAME (node), 0);
>               return;
>             }
> 
>           /* We want to print the type the DECL belongs to. We don't do
>              that when we handle constructors. */
>           if (! DECL_CONSTRUCTOR_P (node)
>               && ! DECL_ARTIFICIAL (node) && DECL_CONTEXT (node)
>               /* We want to print qualified DECL names only
>                  if verbosity is higher than 1.  */
>               && verbosity >= 1)
>             {
>               put_decl_node (TYPE_NAME (DECL_CONTEXT (node)),
>                              verbosity);
                     ... here: ^^^^^^^^^

  The decl pointed to by 'node' is a function_decl for a builtin:

  chain >
QI
size 
unit size 
align 8 symtab 0 alias set -1 canonical type 0x7fe52ee0
arg-types 
chain >>
pointer_to_this >
addressable public external built-in QI file  line 0 col 0
align 8 built-in BUILT_IN_NORMAL:BUILT_IN_PREFETCH context  chain >

and the DECL_CONTEXT turns out to be another function, one present in the
source of the testcase:

  chain >
QI
size 
unit size 
align 8 symtab 0 alias set -1 canonical type 0x7ff648c0
arg-types >
pointer_to_this >
addressable public decl_2 QI file newarray_overflow.java line 20 col 0
align 8 context  initial 
result 
ignored VOID file newarray_overflow.java line 0 col 0
align 1 context >
struct-function 0x7ff98df8 chain >

... which is why the TYPE_NAME macro complains.

  Is it expected for a builtin to appear as if it were a nested function like
this?  If so, would it make sense to do something like replace this:

  put_decl_node (TYPE_NAME (DECL_CONTEXT (node)),
                 verbosity);

with:

  put_decl_node (TREE_CODE (DECL_CONTEXT (node)) == FUNCTION_DECL
                 ? DECL_CONTEXT (node)
                 : TYPE_NAME (DECL_CONTEXT (node)),
                 verbosity);

so we just treat the builtin as another layer of scope?

cheers,
  DaveK



Re: What loop optimizations could increase the code size significantly?

2010-12-10 Thread Gan
Software pipelining (a.k.a. SMS) generates prologue and epilogue code.
In addition, loop versioning duplicates the loop body, which would also
increase code size. But I guess you don't want to turn on SWP, right?
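
As a rough hand-written illustration (not actual compiler output), loop
versioning keeps two copies of the body behind a runtime check, which is
where the size growth comes from:

  /* Versioned on a possible overlap between a and b: the fast copy can
     be optimized (e.g. vectorized) assuming no alias, while the
     fallback copy stays conservative.  Both copies end up in the
     binary.  */
  void scale (int *a, const int *b, int n)
  {
    int i;
    if (a + n <= b || b + n <= a)   /* runtime no-overlap test */
      for (i = 0; i < n; i++)       /* fast version */
        a[i] = 2 * b[i];
    else
      for (i = 0; i < n; i++)       /* conservative fallback */
        a[i] = 2 * b[i];
  }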

Gan


On Fri, Dec 10, 2010 at 1:40 PM, Fang, Changpeng  wrote:
> Hi,
>
>  I am looking for ways to reduce the code size. What loop optimizations could 
> increase the code size significantly?
> The optimizations I know of are: unswitching, vectorization, prefetching and unrolling.
> We should not perform these optimizations if the loop only rolls a few 
> iterations.
>
> In addition, what loop optimizations could generate pre- and/or post loops?
> For example, vectorization and unrolling.
>
> Thanks,
>
> Changpeng
>



-- 
Best Regards

Gan


Re: "ld -r" on mixed IR/non-IR objects (

2010-12-10 Thread H.J. Lu
On Fri, Dec 10, 2010 at 7:13 AM, H.J. Lu  wrote:
> On Thu, Dec 9, 2010 at 8:55 PM, H.J. Lu  wrote:
>> On Thu, Dec 9, 2010 at 6:29 PM, H.J. Lu  wrote:
>>> On Wed, Dec 8, 2010 at 9:36 AM, H.J. Lu  wrote:
 On Wed, Dec 8, 2010 at 5:54 AM, H.J. Lu  wrote:
> On Wed, Dec 8, 2010 at 1:19 AM, Andi Kleen  wrote:
>>> On 12/07/2010 04:20 PM, Andi Kleen wrote:

 The only problem left is mixing of lto and non lto objects. this right
 now is not handled. IMHO still the best way to handle it is to use
 slim lto and then simply separate link the "left overs" after deleting
 the LTO objects. This can be actually done with objcopy (with some
 limitations), doesn't even need linker support.

>>>
>>> Quite possibly a better way to deal with that is to provide a mechanism
>>> for encapsulating arbitrary binary code objects inside the LTO IR.
>>
>> Then you would need to teach your assembler and everything
>
> The magic section is generated by linker directly. No changes to
> assembler is required.
>
>> else that may generate ELF objects to generate this magic object. But why
>> not just ELF directly? that is what it is after all.
>
> My proposal isn't specific to ELF.
>
>>
>> To be honest I don't really see the point of all this complexity you
>> guys are proposing just to save fat LTO. Fat LTO is always a bad idea
>> because it's slow and  does lots of redundant work. If LTO is to become
>> a more wide spread mode it has to go simply because of the poor
>> performance.
>>
>> With slim LTO passthrough is  very straight-forward: simple pass
>> through every section that is not LTO and generate code for the LTO
>> sections. No new magic sections needed at all.
>>
>
> My proposal works on both fat and slim LTO objects.  The idea is
> you can use "ld -r" on any combination of inputs and its output
> still works as before "ld -r".
>

 Here is the revised proposal.

>>>
>>> The initial implementation of my proposal is available on hjl/lto-mixed
>>> branch at
>>>
>>> http://git.kernel.org/?p=devel/binutils/hjl/x86.git;a=summary
>>>
>>> Simple case works.  More cleanups are needed.  Feedbacks
>>> are welcome.
>>>
>>
>> I checked in patches to remove temporary files.
>>
>>
>
> More fixes are checked in.  I will try Linux kernel next.
>

I checked in new fixes. "ld -r" works on Linux kernel build.
But the final kernel link failed due to unrelated errors.


-- 
H.J.


Re: "ld -r" on mixed IR/non-IR objects (

2010-12-10 Thread H.J. Lu
On Fri, Dec 10, 2010 at 4:39 PM, H.J. Lu  wrote:
> On Fri, Dec 10, 2010 at 7:13 AM, H.J. Lu  wrote:
>> On Thu, Dec 9, 2010 at 8:55 PM, H.J. Lu  wrote:
>>> On Thu, Dec 9, 2010 at 6:29 PM, H.J. Lu  wrote:
 On Wed, Dec 8, 2010 at 9:36 AM, H.J. Lu  wrote:
> On Wed, Dec 8, 2010 at 5:54 AM, H.J. Lu  wrote:
>> On Wed, Dec 8, 2010 at 1:19 AM, Andi Kleen  wrote:
 On 12/07/2010 04:20 PM, Andi Kleen wrote:
>
> The only problem left is mixing of lto and non lto objects. this right
> now is not handled. IMHO still the best way to handle it is to use
> slim lto and then simply separate link the "left overs" after deleting
> the LTO objects. This can be actually done with objcopy (with some
> limitations), doesn't even need linker support.
>

 Quite possibly a better way to deal with that is to provide a mechanism
 for encapsulating arbitrary binary code objects inside the LTO IR.
>>>
>>> Then you would need to teach your assembler and everything
>>
>> The magic section is generated by linker directly. No changes to
>> assembler is required.
>>
>>> else that may generate ELF objects to generate this magic object. But 
>>> why
>>> not just ELF directly? that is what it is after all.
>>
>> My proposal isn't specific to ELF.
>>
>>>
>>> To be honest I don't really see the point of all this complexity you
>>> guys are proposing just to save fat LTO. Fat LTO is always a bad idea
>>> because it's slow and  does lots of redundant work. If LTO is to become
>>> a more wide spread mode it has to go simply because of the poor
>>> performance.
>>>
>>> With slim LTO passthrough is  very straight-forward: simple pass
>>> through every section that is not LTO and generate code for the LTO
>>> sections. No new magic sections needed at all.
>>>
>>
>> My proposal works on both fat and slim LTO objects.  The idea is
>> you can use "ld -r" on any combination of inputs and its output
>> still works as before "ld -r".
>>
>
> Here is the revised proposal.
>

 The initial implementation of my proposal is available on hjl/lto-mixed
 branch at

 http://git.kernel.org/?p=devel/binutils/hjl/x86.git;a=summary

 Simple case works.  More cleanups are needed.  Feedbacks
 are welcome.

>>>
>>> I checked in patches to remove temporary files.
>>>
>>>
>>
>> More fixes are checked in.  I will try Linux kernel next.
>>
>
> I checked in new fixes. "ld -r" works on Linux kernel build.
> But the final kernel link failed due to unrelated errors.
>

LTO work in the BFD linker is done. I will submit a patch in the next
few days that enables transparent LTO support in the BFD linker.
No GCC changes are required.

-- 
H.J.