BImode is treated as a normal byte-wide mode and causes a bug.
Hi,

I am investigating a bug in our target port. It is caused by the following optimization done by the combine pass: (zero_extend:SI (reg:BI 120)) is transformed to (and:SI (subreg:SI (reg:BI 120) 0) (const_int 255 [0xff])) in expand_compound_operation (combine.c), where BImode is simply treated as a byte-wide mode. In machmode.def, BImode is defined as FRACTIONAL_INT_MODE (BI, 1, 1), but the precision field is not used at all here.

Even after I hacked the code to bypass that transformation, (subreg:QI (zero_extend:SI (reg:BI 120)) 0) is still transformed to (subreg:QI (reg:BI 120) 0) in simplify_subreg. This is wrong because the higher bits of the paradoxical subreg are undefined here, not zero.

Grepping for GET_MODE_PRECISION returns few results. It seems that many RTX optimization functions don't consider FRACTIONAL_INT_MODE at all. If that is the case, we should document the limitation (or maybe I missed it). We need to zero_extend BImode to model the behaviour of moving the lowest bit of a predicate register into a general register.

Cheers,
Bingfeng
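To make the precision point concrete, here is a small standalone illustration (not from the original report; the variable names are made up) of the difference between masking by BImode's storage size and masking by its one-bit precision:

  #include <stdio.h>

  /* Standalone illustration only: a BImode predicate value occupies a
     byte, but only its lowest bit is defined.  Masking by the mode's
     storage size (0xff), as the combine transformation does, keeps the
     undefined upper bits; masking by its precision (0x1) is what the
     zero_extend actually requires.  */
  int
  main (void)
  {
    unsigned char pred_byte = 0xAB;               /* upper 7 bits are garbage */
    unsigned int by_bitsize = pred_byte & 0xff;   /* what the AND rewrite keeps */
    unsigned int by_precision = pred_byte & 0x01; /* what zero_extend:SI of BImode means */
    printf ("mask by size: 0x%x, mask by precision: 0x%x\n",
            by_bitsize, by_precision);
    return 0;
  }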
GCC building: Still libquadmath-related failures on bare-iron targets?
Hello,

Given that there has been quite some libquadmath-related configure work: are there still build problems due to link tests if one cross-builds for bare-iron targets, or not? (Cf. PR 46520.) If so, I would start to tackle them next. (As a workaround, one can now use --disable-libquadmath; however, I would still prefer to fix it so that it works out of the box.)

Tobias
Re: "ld -r" on mixed IR/non-IR objects (
On Thu, Dec 9, 2010 at 8:55 PM, H.J. Lu wrote:
> On Thu, Dec 9, 2010 at 6:29 PM, H.J. Lu wrote:
>> On Wed, Dec 8, 2010 at 9:36 AM, H.J. Lu wrote:
>>> On Wed, Dec 8, 2010 at 5:54 AM, H.J. Lu wrote:
On Wed, Dec 8, 2010 at 1:19 AM, Andi Kleen wrote:
>> On 12/07/2010 04:20 PM, Andi Kleen wrote:
>>>
>>> The only problem left is mixing of lto and non lto objects. this right
>>> now is not handled. IMHO still the best way to handle it is to use
>>> slim lto and then simply separate link the "left overs" after deleting
>>> the LTO objects. This can be actually done with objcopy (with some
>>> limitations), doesn't even need linker support.
>>>
>>
>> Quite possibly a better way to deal with that is to provide a mechanism
>> for encapsulating arbitrary binary code objects inside the LTO IR.
>
> Then you would need to teach your assembler and everything

The magic section is generated by linker directly. No changes to
assembler is required.

> else that may generate ELF objects to generate this magic object. But why
> not just ELF directly? that is what it is after all.

My proposal isn't specific to ELF.

>
> To be honest I don't really see the point of all this complexity you
> guys are proposing just to save fat LTO. Fat LTO is always a bad idea
> because it's slow and does lots of redundant work. If LTO is to become
> a more wide spread mode it has to go simply because of the poor
> performance.
>
> With slim LTO passthrough is very straight-forward: simple pass
> through every section that is not LTO and generate code for the LTO
> sections. No new magic sections needed at all.
>

My proposal works on both fat and slim LTO objects. The idea is
you can use "ld -r" on any combination of inputs and its output
still works as before "ld -r".

>>>
>>> Here is the revised proposal.
>>>
>>
>> The initial implementation of my proposal is available on hjl/lto-mixed
>> branch at
>>
>> http://git.kernel.org/?p=devel/binutils/hjl/x86.git;a=summary
>>
>> Simple case works. More cleanups are needed. Feedbacks
>> are welcome.
>>
>
> I checked in patches to remove temporary files.
>
>

More fixes are checked in. I will try Linux kernel next.

--
H.J.
What loop optimizations could increase the code size significantly?
Hi,

I am looking for ways to reduce code size. What loop optimizations could increase the code size significantly? The optimizations I know of are unswitching, vectorization, prefetching, and unrolling. We should not perform these optimizations if the loop only rolls a few iterations.

In addition, what loop optimizations could generate pre- and/or post-loops? For example, vectorization and unrolling?

Thanks,

Changpeng
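To illustrate the pre-/post-loop question (this example is mine, not from the original mail): a simple loop like the one below is typically split by the vectorizer into a peeled prologue for alignment, a vectorized body, and a scalar epilogue for the leftover iterations, so the loop body is emitted several times and code size grows accordingly.

  /* Illustration only: when vectorized (e.g. at -O3), this loop is
     commonly emitted as a peeled prologue, a vector body, and a scalar
     epilogue for the remaining iterations.  */
  void
  saxpy (float *restrict y, const float *restrict x, float a, int n)
  {
    for (int i = 0; i < n; i++)
      y[i] = a * x[i] + y[i];
  }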
Re: GCC 4.5.2 Release Candidate available from gcc.gnu.org
On Wed, 2010-12-08 at 14:42 +0100, Richard Guenther wrote:
> A release candidate for GCC 4.5.2 is available from
>
> ftp://gcc.gnu.org/pub/gcc/snapshots/4.5.2-RC-20101208
>
> and shortly its mirrors. It has been generated from SVN revision 167585.
>
> I have so far bootstrapped and tested the release candidate on
> x86_64-linux, bootstraps and tests on
> {i686,ia64,ppc,ppc64,s390,s390x}-linux are running.
>
> Please test it and report any issues to bugzilla.

I have successfully bootstrapped the release candidate for arm-linux-gnueabi
with the following parameters:

--with-cpu=cortex-a9 --with-fpu=vfpv3-d16 --with-float=softfp

Tests are still running.

Ramana
Tree checking failure in jc1
Hi lists,

I found a couple of new FAILs in my latest libjava testrun:

> FAIL: newarray_overflow -O3 compilation from source
> FAIL: newarray_overflow -O3 -findirect-dispatch compilation from source

These turn out to be tree checking failures:

> In file included from :3:0:
> newarray_overflow.java:20:0: internal compiler error: tree check: expected class
> 'type', have 'declaration' (function_decl) in put_decl_node, at java/lang.c:405

... happening ...

> /* Append to decl_buf a printable name for NODE.
>    Depending on VERBOSITY, more information about NODE
>    is printed. Read the comments of decl_printable_name in
>    langhooks.h for more. */
>
> static void
> put_decl_node (tree node, int verbosity)
> {
>   int was_pointer = 0;
>   if (TREE_CODE (node) == POINTER_TYPE)
>     {
>       node = TREE_TYPE (node);
>       was_pointer = 1;
>     }
>   if (DECL_P (node) && DECL_NAME (node) != NULL_TREE)
>     {
>       if (TREE_CODE (node) == FUNCTION_DECL)
>         {
>           if (verbosity == 0 && DECL_NAME (node))
>             /* We have been instructed to just print the bare name
>                of the function. */
>             {
>               put_decl_node (DECL_NAME (node), 0);
>               return;
>             }
>
>           /* We want to print the type the DECL belongs to. We don't do
>              that when we handle constructors. */
>           if (! DECL_CONSTRUCTOR_P (node)
>               && ! DECL_ARTIFICIAL (node) && DECL_CONTEXT (node)
>               /* We want to print qualified DECL names only
>                  if verbosity is higher than 1. */
>               && verbosity >= 1)
>             {
>               put_decl_node (TYPE_NAME (DECL_CONTEXT (node)),
>                              verbosity);

... here: ^^

The decl pointed to by 'node' is a function_decl for a builtin:

 chain > QI size unit size align 8 symtab 0 alias set -1 canonical type 0x7fe52ee0
 arg-types chain >> pointer_to_this > addressable public external built-in QI
 file line 0 col 0 align 8 built-in BUILT_IN_NORMAL:BUILT_IN_PREFETCH context chain >

and the DECL_CONTEXT turns out to be another function, one present in the source
of the testcase:

 chain > QI size unit size align 8 symtab 0 alias set -1 canonical type 0x7ff648c0
 arg-types > pointer_to_this > addressable public decl_2 QI file newarray_overflow.java
 line 20 col 0 align 8 context initial result ignored VOID file newarray_overflow.java
 line 0 col 0 align 1 context > struct-function 0x7ff98df8 chain >

... which is why the TYPE_NAME macro complains.

Is it expected for a builtin to appear as if it were a nested function like this?
If so, would it make sense to do something like replace this:

  put_decl_node (TYPE_NAME (DECL_CONTEXT (node)), verbosity);

with:

  put_decl_node (TREE_CODE (DECL_CONTEXT (node)) == FUNCTION_DECL
                 ? DECL_CONTEXT (node)
                 : TYPE_NAME (DECL_CONTEXT (node)),
                 verbosity);

so we just treat the builtin as another layer of scope?

cheers,
  DaveK
Re: What loop optimizations could increase the code size significantly?
Software pipelining (a.k.a. SMS) generates prologue and epilogue code. In
addition, loop versioning duplicates the loop body, which would also increase
code size. But I guess you don't want to turn on SWP, right?

Gan

On Fri, Dec 10, 2010 at 1:40 PM, Fang, Changpeng wrote:
> Hi,
>
> I am looking for ways to reduce code size. What loop optimizations could
> increase the code size significantly?
> The optimizations I know of are unswitching, vectorization, prefetching, and
> unrolling. We should not perform these optimizations if the loop only rolls
> a few iterations.
>
> In addition, what loop optimizations could generate pre- and/or post-loops?
> For example, vectorization and unrolling?
>
> Thanks,
>
> Changpeng
>

--
Best Regards
Gan
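A sketch of what loop versioning conceptually produces (my own illustration, not from the mail): a runtime check guarding two copies of the same loop, which is why versioning roughly doubles the size of the loop body.

  #include <stdint.h>

  /* Illustrative sketch only: the compiler emits a runtime alias test
     (written out by hand here) plus two copies of the loop -- one that
     can be optimized aggressively because the buffers are known not to
     overlap, and one that preserves the original semantics.  Both
     copies end up in the binary.  */
  void
  scale (float *a, const float *b, int n)
  {
    if ((uintptr_t) (a + n) <= (uintptr_t) b
        || (uintptr_t) (b + n) <= (uintptr_t) a)
      for (int i = 0; i < n; i++)   /* "fast" version, e.g. vectorized */
        a[i] = 2.0f * b[i];
    else
      for (int i = 0; i < n; i++)   /* fallback version */
        a[i] = 2.0f * b[i];
  }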
Re: "ld -r" on mixed IR/non-IR objects (
On Fri, Dec 10, 2010 at 7:13 AM, H.J. Lu wrote:
> On Thu, Dec 9, 2010 at 8:55 PM, H.J. Lu wrote:
>> On Thu, Dec 9, 2010 at 6:29 PM, H.J. Lu wrote:
>>> On Wed, Dec 8, 2010 at 9:36 AM, H.J. Lu wrote:
On Wed, Dec 8, 2010 at 5:54 AM, H.J. Lu wrote:
> On Wed, Dec 8, 2010 at 1:19 AM, Andi Kleen wrote:
>>> On 12/07/2010 04:20 PM, Andi Kleen wrote:

The only problem left is mixing of lto and non lto objects. this right
now is not handled. IMHO still the best way to handle it is to use
slim lto and then simply separate link the "left overs" after deleting
the LTO objects. This can be actually done with objcopy (with some
limitations), doesn't even need linker support.

>>>
>>> Quite possibly a better way to deal with that is to provide a mechanism
>>> for encapsulating arbitrary binary code objects inside the LTO IR.
>>
>> Then you would need to teach your assembler and everything
>
> The magic section is generated by linker directly. No changes to
> assembler is required.
>
>> else that may generate ELF objects to generate this magic object. But why
>> not just ELF directly? that is what it is after all.
>
> My proposal isn't specific to ELF.
>
>>
>> To be honest I don't really see the point of all this complexity you
>> guys are proposing just to save fat LTO. Fat LTO is always a bad idea
>> because it's slow and does lots of redundant work. If LTO is to become
>> a more wide spread mode it has to go simply because of the poor
>> performance.
>>
>> With slim LTO passthrough is very straight-forward: simple pass
>> through every section that is not LTO and generate code for the LTO
>> sections. No new magic sections needed at all.
>>
>
> My proposal works on both fat and slim LTO objects. The idea is
> you can use "ld -r" on any combination of inputs and its output
> still works as before "ld -r".
> Here is the revised proposal.

>>>
>>> The initial implementation of my proposal is available on hjl/lto-mixed
>>> branch at
>>>
>>> http://git.kernel.org/?p=devel/binutils/hjl/x86.git;a=summary
>>>
>>> Simple case works. More cleanups are needed. Feedbacks
>>> are welcome.
>>>
>>
>> I checked in patches to remove temporary files.
>>
>>
>
> More fixes are checked in. I will try Linux kernel next.
>

I checked in new fixes. "ld -r" works on Linux kernel build.
But the final kernel link failed due to unrelated errors.

--
H.J.
Re: "ld -r" on mixed IR/non-IR objects (
On Fri, Dec 10, 2010 at 4:39 PM, H.J. Lu wrote:
> On Fri, Dec 10, 2010 at 7:13 AM, H.J. Lu wrote:
>> On Thu, Dec 9, 2010 at 8:55 PM, H.J. Lu wrote:
>>> On Thu, Dec 9, 2010 at 6:29 PM, H.J. Lu wrote:
On Wed, Dec 8, 2010 at 9:36 AM, H.J. Lu wrote:
> On Wed, Dec 8, 2010 at 5:54 AM, H.J. Lu wrote:
>> On Wed, Dec 8, 2010 at 1:19 AM, Andi Kleen wrote:
On 12/07/2010 04:20 PM, Andi Kleen wrote:
>
> The only problem left is mixing of lto and non lto objects. this right
> now is not handled. IMHO still the best way to handle it is to use
> slim lto and then simply separate link the "left overs" after deleting
> the LTO objects. This can be actually done with objcopy (with some
> limitations), doesn't even need linker support.
>

Quite possibly a better way to deal with that is to provide a mechanism
for encapsulating arbitrary binary code objects inside the LTO IR.

>>>
>>> Then you would need to teach your assembler and everything
>>
>> The magic section is generated by linker directly. No changes to
>> assembler is required.
>>
>>> else that may generate ELF objects to generate this magic object. But
>>> why
>>> not just ELF directly? that is what it is after all.
>>
>> My proposal isn't specific to ELF.
>>
>>>
>>> To be honest I don't really see the point of all this complexity you
>>> guys are proposing just to save fat LTO. Fat LTO is always a bad idea
>>> because it's slow and does lots of redundant work. If LTO is to become
>>> a more wide spread mode it has to go simply because of the poor
>>> performance.
>>>
>>> With slim LTO passthrough is very straight-forward: simple pass
>>> through every section that is not LTO and generate code for the LTO
>>> sections. No new magic sections needed at all.
>>>
>>
>> My proposal works on both fat and slim LTO objects. The idea is
>> you can use "ld -r" on any combination of inputs and its output
>> still works as before "ld -r".
>>
>
> Here is the revised proposal.
> The initial implementation of my proposal is available on hjl/lto-mixed
> branch at http://git.kernel.org/?p=devel/binutils/hjl/x86.git;a=summary
> Simple case works. More cleanups are needed. Feedbacks are welcome.

>>>
>>> I checked in patches to remove temporary files.
>>>
>>>
>>
>> More fixes are checked in. I will try Linux kernel next.
>>
>
> I checked in new fixes. "ld -r" works on Linux kernel build.
> But the final kernel link failed due to unrelated errors.
>

LTO work in BFD linker is done. I will submit a patch in the next
few days, which enables transparent LTO support in BFD linker. No
GCC changes are required.

--
H.J.