df_insn_refs_record's handling of global_regs[]
I have a bug I'm trying to investigate where, starting in gcc-4.2.x, the loop invariant pass considers a computation involving a global register variable to be invariant across a call. The basic structure of the code is:

register unsigned long regvar asm ("foo");

func(arg)
{
	for (...) {
		call();
		*arg = expression(regvar);
	}
}

The code is built with "-ffixed-foo". The specific case is 64-bit SPARC, and the register being used is "%g5", so global_regs[%g5], call_used_regs[%g5] and fixed_regs[%g5] will all be true.

loop-invariant.c decides that expression(regvar) is invariant and can be moved outside of the loop, which is an illegal transformation because calls clobber global register variables.

When I noticed this I said to myself, "Hmmm, that's peculiar..." I then checked what changed in the loop invariant pass between gcc-4.1.x and gcc-4.2.x, and mainly it was changed to use the dataflow layer. So I started looking at how the dataflow layer handles global register variables wrt. calls. When a CALL is encountered, df_insn_refs_record() only marks them as used:

for (i = 0; i < FIRST_PSEUDO_REGISTER; i++)
  if (global_regs[i])
    df_uses_record (...

I initially thought this code should be using "df_defs_record (...", but the next few lines take care of that issue by iterating over df_invalidated_by_call, which is initialized with the contents of regs_invalidated_by_call. I verified that the global register in use is included in the invalidation list by making some df_dump() calls in a debugging session.

So the dataflow problem should show the loop-invariant pass that the call inside the loop potentially changes 'regvar', and thus that expressions involving 'regvar' are not invariant. But for some reason that isn't happening. I'm wondering if there is a bad interaction with how df-scan.c cooks up fake insn "locations" for things like df_invalidated_by_call.
In such cases, DF_REF_LOC will be &regno_reg_rtx[], and loop-invariant.c will: 1) record the DF_REF_LOCs, via record_uses(), in the ->pos member of its "struct use" data structure; and 2) later write directly to *u->pos in move_invariant_reg(). But perhaps we shouldn't even get that far with such fake location expressions. I'm trying to debug this further, but if anyone sees anything obvious or has some suggestions for debugging this, let me know. Thanks.
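To make the miscompilation concrete, here is a small, hedged C model of the bug (the register variable is replaced by an ordinary global, and call(), expression(), loop_correct() and loop_hoisted() are invented stand-ins, not gcc code): hoisting expression(regvar) above a loop containing a call is wrong whenever the call may change regvar.

```c
#include <assert.h>

/* Stand-in for the %g5 global register variable: an ordinary global
   that a call may change as a side effect.  */
static unsigned long regvar;

static void call(void) { regvar += 1; }   /* the call "clobbers" regvar */

static unsigned long expression(unsigned long r) { return r * 2; }

/* Correct code: re-reads regvar after every call.  */
static unsigned long loop_correct(int n)
{
    unsigned long last = 0;
    regvar = 0;
    for (int i = 0; i < n; i++) {
        call();                      /* may change regvar */
        last = expression(regvar);   /* must be re-evaluated each time */
    }
    return last;
}

/* What the bad transformation does: expression(regvar) hoisted above
   the loop, so the clobbers performed by call() are ignored.  */
static unsigned long loop_hoisted(int n)
{
    unsigned long last = 0;
    regvar = 0;
    unsigned long inv = expression(regvar);  /* WRONG: hoisted */
    for (int i = 0; i < n; i++) {
        call();
        last = inv;                  /* stale value */
    }
    return last;
}
```

The two loops diverge as soon as call() runs once, which is exactly why the dataflow problem must treat the call as a (potential) definition of the global register.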
gcc/doc/md.texi buglet?
md.texi of mainline as of now states at line 4451ff:

@cindex @[EMAIL PROTECTED] instruction pattern
@item @[EMAIL PROTECTED]
Similar to @[EMAIL PROTECTED] but for conditional addition. Conditionally
move operand 2 or (operand 2 + operand 3) into operand 0 according to the
comparison in operand 1. If the comparison is true, operand 2 is moved into
operand 0, otherwise (operand 2 + operand 3) is moved.

Now, isn't it the other way round? That is, if the condition is true, operand 2 + operand 3 is stored, and operand 2 otherwise? That's at least how I read the code in ifcvt.c, and that's what my experiments with that pattern resulted in.

Tom
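For what it's worth, the behaviour Tom describes can be stated as a one-line C model (addcc_observed is an invented name, not a gcc entity, and this encodes his observation rather than the current md.texi wording): the addend is applied when the comparison is true.

```c
/* Semantics as observed from ifcvt.c: operand 0 receives
   operand 2 + operand 3 when the comparison (operand 1) is TRUE,
   and plain operand 2 otherwise -- the opposite of what md.texi
   currently claims.  */
static long addcc_observed(int cond_true, long op2, long op3)
{
    return cond_true ? op2 + op3 : op2;
}
```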
Machine dependent Tree optimization?
Hello, I am working on porting GCC 4.2.1 to our VLIW processor. Our No. 1 priority is code size. I noticed the following code generation.

Source code:

if (a == 0x1ff)
	c = a + b;
return c;

After tree copy propagation:

foo (a, b, c)
{
  :
  if (a_2 == 511) goto ; else goto ;

  :;
  c_5 = b_4 + 511;

  # c_1 = PHI ;
  :;
  return c_1;
}

It generates the following assembly code for our processor:

tstieqw p0, r0, #0x1ff	// Compare r0 with 0x1ff and write result to a predicate p0.
addwi r2, r1, #0x1ff	// Predicated add
sbl [link]
: movw r8, r2

In our processor, "p0. addwi r2, r1, #0x1ff" is a long instruction (64-bit). Ideally, I don't want this copy propagation if the immediate is out of a certain range. Then it would generate the following code:

tstieqw p0, r0, #0x1ff	// Compare r0 with 0x1ff and write result to predicate p0.
addw r2, r1, r0		// Predicated add (32-bit instruction)
sbl [link]
: movw r8, r2

This is going to save us four bytes. Of course, for processors without long/short instructions, this copy propagation is beneficial for performance because it reduces an unnecessary dependency. Therefore, whether to apply this copy propagation is machine dependent to some degree.

What I do now is to add some checks in tree-ssa-copy.c and tree-ssa-dom.c for our target. But this is not very clean. My question is whether there is a better way to implement such machine-dependent tree-level optimizations (like the hooks at the RTL level). I believe there are other processors that have a similar problem. What is the common solution?

Thanks,
Bingfeng Mei
Broadcom UK
Problem with too many virtual operands ( tree-ssa-operands.c:484)
Hi, In the attached testcase, due to an ivopts modification, the compiler crashes in tree-ssa-operands.c while rewriting the uses, because the number of virtual operands of the modified stmt is much greater than the thresholds controlled by OP_SIZE_{1,2,3} in tree-ssa-operands.c. I went through http://gcc.gnu.org/ml/gcc-patches/2006-12/msg01269.html and it seemed to me that these values (OP_SIZE etc.) have been set rather experimentally. I am wondering if these values should be increased. I did increase these values and the attached testcase compiled fine. I examined the resulting ivopts dump and the modifications seem valid to me. I have attached two dumps (before ivopts, i.e. 105t.cunroll, and after ivopts, i.e. 107t.ivopts). Note that the post-ivopts dump was generated only after changing OP_SIZE_3 to 700 (a randomly high value). For a quick check of the dumps, note that

D.1281_6 = Hoopster_ptr_17->Magic;

gets changed to

D.1281_6 = MEM[index: ivtmp.840_5];

I am wondering if increasing OP_SIZE_{1,2,3} is the way to go. I am partly not convinced because it means that the problem could hit again with nastier code.

TIA, Pranav

testcase-min.i Description: Binary data
testcase-min.i.105t.cunroll Description: Binary data
testcase-min.i.107t.ivopts Description: Binary data
Re: Machine dependent Tree optimization?
"Bingfeng Mei" <[EMAIL PROTECTED]> writes: > Of couse, for processors without long/short instructions, this copy > propagation is benefiical for performance by reducing unnecessary > dependency. Therefore, whether to apply this copy propagation is machine > dependent to some degree. > > What I do now is to add some check in tree-ssa-copy.c and tree-ssa-dom.c > for our target. But this is not very clean. My question is whether there > is better way to implement such machine-dependent tree-level > optimization (like hooks in RTL level). I believe there are other > processors that have the similar problem. What is common solution? This should normally be done at the RTL level by making long constants more expensive in RTX_COSTS. With luck that will let gcse pick this up. Ian
RE: Machine dependent Tree optimization?
Thanks. Do you mean the TARGET_RTX_COSTS hook? Actually, I have already made long ints more expensive in our TARGET_RTX_COSTS function. It does have an effect on other optimizations (e.g., the combine pass), but it doesn't work for the example mentioned previously.

if (INTVAL (x) >= 0 && INTVAL (x) <= 255)
  {
    *total = 1;
    return true;
  }
*total = 4;
return true;

Bingfeng Mei

-Original Message-
From: Ian Lance Taylor [mailto:[EMAIL PROTECTED]
Sent: 16 October 2007 15:32
To: Bingfeng Mei
Cc: gcc@gcc.gnu.org
Subject: Re: Machine dependent Tree optimization?

"Bingfeng Mei" <[EMAIL PROTECTED]> writes:
> Of couse, for processors without long/short instructions, this copy
> propagation is benefiical for performance by reducing unnecessary
> dependency. Therefore, whether to apply this copy propagation is machine
> dependent to some degree.
>
> What I do now is to add some check in tree-ssa-copy.c and tree-ssa-dom.c
> for our target. But this is not very clean. My question is whether there
> is better way to implement such machine-dependent tree-level
> optimization (like hooks in RTL level). I believe there are other
> processors that have the similar problem. What is common solution?

This should normally be done at the RTL level by making long constants more expensive in RTX_COSTS. With luck that will let gcse pick this up.

Ian
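The range test Bingfeng quotes can be factored into a tiny standalone helper to make the cost model explicit (imm_cost is an illustrative name, not a gcc hook; the 1/4 costs and the [0, 255] short-immediate range are taken from his TARGET_RTX_COSTS fragment above):

```c
/* Cost of materializing an immediate, mirroring the posted
   TARGET_RTX_COSTS fragment: values in [0, 255] fit the short 32-bit
   encoding, everything else needs the long 64-bit form.  */
static int imm_cost(long val)
{
    if (val >= 0 && val <= 255)
        return 1;   /* short instruction */
    return 4;       /* long instruction */
}
```

With costs like these, 0x1ff (511) falls outside the cheap range, which is exactly the case where propagating the constant into the predicated add costs four extra bytes.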
Library not loaded
Hi all, I am having a problem starting my application, which builds successfully. I am using boost to serialize/deserialize data. I have linked the boost library and my project builds successfully, but I cannot run it. Running the project (build & go in Xcode) I receive this error:

dyld: Library not loaded: stage/lib/libboost_serialization-1_34_1.dylib
Referenced from: /Users/dtsachov/dev/evdp/temp/TestSerialization/build/Debug/TestSerialization
Reason: image not found

Note that this is not a problem with boost; it is some problem with linking to the library at runtime. I had the same problem with my 2 projects, one of which uses the other: one is a command line tool and the other is a library. If the library is a dynamic library I am able to build the project but unable to run it, getting the same error. When I made my library static (not dynamic) I can run the application. So I suspect this is some problem of locating the library at runtime. Does anybody have an idea? Thank you in advance.

-- View this message in context: http://www.nabble.com/Library-not-loaded-tf4634681.html#a13235148 Sent from the gcc - Dev mailing list archive at Nabble.com.
double gimplification in C++ FE
Hi Jason. Hi folks. I'm in the process of converting the C++ FE to tuples. In doing so I have noticed that the C++ FE will frequently gimplify bits of a tree, and then expect gimplify_expr() to gimplify the rest. This seems redundant, as gimplify_expr() more often than not will gimplify the entire tree structure, without regard to what parts the C++ FE already gimplified. For example, while gimplifying a TRY_BLOCK in C++, we do:

genericize_try_block (tree *stmt_p)
{
  tree body = TRY_STMTS (*stmt_p);
  tree cleanup = TRY_HANDLERS (*stmt_p);

  gimplify_stmt (&body);   /* <-- BOO HISS */
  ...
  *stmt_p = build2 (TRY_CATCH_EXPR, void_type_node, body, cleanup);
}

Then, in gimplify.c:gimplify_expr():

case TRY_FINALLY_EXPR:
case TRY_CATCH_EXPR:
  gimplify_to_stmt_list (&TREE_OPERAND (*expr_p, 0));
  gimplify_to_stmt_list (&TREE_OPERAND (*expr_p, 1));
  ret = GS_ALL_DONE;
  break;

This behavior is common throughout C++. The C++ FE calls gimplify_stmt on bits of trees, but then gimplify_expr() has to gimplify them again. It seems to me that a better approach would be to pass the tree structure along as generic as we can (without calls to gimplify_stmt in the C++ FE), and then let gimplify_expr do its job. The reason I propose this is that with tuples, gimplify_expr() accepts trees, not tuples. So if we call gimplify_stmt in the C++ FE, we end up with tuples which cannot be passed to gimplify_expr. So it's better to leave things as generic trees, and let the gimplifier proper do its magic. Am I overlooking something? Can I proceed with this approach? Thanks. Aldy
Re: double gimplification in C++ FE
Aldy Hernandez wrote: I'm in the process of converting the C++ FE to tuples. In doing so I have noticed that the C++ FE will frequently gimplify bits of a tree, and then expect gimplify_expr() to gimplify the rest. This seems redundant, as gimplify_expr() more often than not will gimplify the entire tree structure, without regard to what parts the C++ FE already gimplified. Yes, the gimplifier often makes several passes over the same trees to get them completely lowered. cp_gimplify_expr is a subroutine of the gimplifier. This behavior is common throughout C++. The C++ FE calls gimplify_stmt on bits of trees, but then gimplify_expr() has to gimplify again. It seems to me that a better approach would be to pass the tree structure as generic as we can (without calls to gimplify_stmt in the C++ FE), and then let gimplify_expr do its job. Sure. Another alternative would be to leave the calls to gimplify_stmt (or probably change them to gimplify_to_stmt_list) and return GS_ALL_DONE from cp_gimplify_expr. Jason
Bad unwinder data for __kernel_sigtramp_rt64 in PPC 64 vDSO corrupts Condition Register
The symptom is that if you segfault and then throw an exception in the segfault handler, call-saved fields in the Condition Register are corrupted. The reason is that the unwinder data for CR in the vDSO is wrong. The line that affects the CR is here in arch/powerpc/kernel/vdso64/sigtramp.S:

rsave (70, 38*RSIZE)	/* cr */

This restores a 64-bit register from offset 38 in the sigcontext register save area to DWARF column 70. This much is correct... Unfortunately, gcc saves and restores only the *least significant* 32-bit half of CR on the stack. As this is a big-endian machine, the result is that when unwinding __kernel_sigtramp_rt64 the correctly saved CR is written to the upper half of the word on the stack, not the lower half, and the saved CR is overwritten with zeroes.

It is not immediately clear to me how to fix this: I think you would need to find a DWARF expression that copies a halfword value. Test case attached. Tested on kernel 2.6.18-8.1.10.el5.

Andrew.

#include <stdio.h>
#include <signal.h>

class SegfaultException { };

void catch_segv (int)
{
  throw new SegfaultException;
}

void segfault (int *p)
{
  fprintf (stderr, "%n", *p);
}

int main (int argc, char **argv)
{
  unsigned long cr, cr2;
  __asm__ __volatile__ ("mtcrf 8, %0" : : "r" (0x2000) : "cr4");

  struct sigaction sa;
  sa.sa_handler = catch_segv;
  sigemptyset (&sa.sa_mask);
  sa.sa_flags = SA_NODEFER;
  sigaction (SIGSEGV, &sa, NULL);

  __asm__ __volatile__ ("mfcr %0" : "=r" (cr));
  fprintf (stderr, "cr = 0x%x\n", cr & 0xfff000);

  try {
    segfault (NULL);
  } catch (SegfaultException *a) {
  }

  __asm__ __volatile__ ("mfcr %0" : "=r" (cr));
  fprintf (stderr, "cr = 0x%x\n", cr & 0xfff000);
  return 0;
}
Re: double gimplification in C++ FE
> Yes, the gimplifier often makes several passes over the same trees to get > them completely lowered. cp_gimplify_expr is a subroutine of the > gimplifier. Good, I just wanted to make sure I wasn't off my rocker or anything. > Sure. Another alternative would be to leave the calls to gimplify_stmt (or > probably change them to gimplify_to_stmt_list) and return GS_ALL_DONE from > cp_gimplify_expr. Yes, in a few places it definitely seems better to completely gimplify the given statement and return GS_ALL_DONE. Will do so when it's easier. Heads up. Thanks. Aldy
Re: Plans for Linux ELF "i686+" ABI ? Like SPARC V8+ ?
On Tue, Oct 16, 2007 at 12:53:13AM +0200, Andi Kleen wrote:
> > Actually no. In 32-bit mode, double is aligned on a 4 byte boundary, not an 8
> > byte boundary, unless you use -malign-double, which breaks the ABI. This has
> > been a 'feature' of the original AT&T 386 System V ABI that Linux uses for
> > 32-bit x86 processors. With the SCO mess, it may be hard to ever change that
> > ABI
>
> My gcc doesn't agree with you (I actually checked before posting)
>
> ~> cat t.c
>
> int main(void)
> {
> double x;
> printf("%d\n", __alignof__(x));
> return 0;
> }
> ~> gcc -m32 -o t t.c
> t.c: In function ‘main’:
> t.c:5: warning: incompatible implicit declaration of built-in function ‘printf’
> ~> ./t
> 8
> ~>

Doubles that are scalar variables are aligned on a 64-bit boundary, but doubles that are within structures are only aligned to a 32-bit boundary, which comes from the published i386 ABI from System V. Here is the code in question from gcc/config/i386/i386.h:

/* The published ABIs say that doubles should be aligned on word
   boundaries, so lower the alignment for structure fields unless
   -malign-double is set.  */

/* ??? Blah -- this macro is used directly by libobjc.  Since it
   supports no vector modes, cut out the complexity and fall back
   on BIGGEST_FIELD_ALIGNMENT.  */
#ifdef IN_TARGET_LIBS
#ifdef __x86_64__
#define BIGGEST_FIELD_ALIGNMENT 128
#else
#define BIGGEST_FIELD_ALIGNMENT 32
#endif
#else
#define ADJUST_FIELD_ALIGN(FIELD, COMPUTED) \
  x86_field_alignment (FIELD, COMPUTED)
#endif

And this is where we recompute the alignment in i386.c:

int
x86_field_alignment (tree field, int computed)
{
  enum machine_mode mode;
  tree type = TREE_TYPE (field);

  if (TARGET_64BIT || TARGET_ALIGN_DOUBLE)
    return computed;
  mode = TYPE_MODE (TREE_CODE (type) == ARRAY_TYPE
		    ? get_inner_array_type (type) : type);
  if (mode == DFmode || mode == DCmode
      || GET_MODE_CLASS (mode) == MODE_INT
      || GET_MODE_CLASS (mode) == MODE_COMPLEX_INT)
    return MIN (32, computed);
  return computed;
}

-- Michael Meissner, AMD 90 Central Street, MS 83-29, Boxborough, MA, 01719, USA [EMAIL PROTECTED]
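The scalar-versus-field distinction can be probed with offsetof. The struct and helper below are invented for illustration; the exact offset of the double is ABI-dependent (4 under the i386 psABI rule above, 8 under x86-64), so only the weaker, ABI-independent properties are asserted.

```c
#include <stddef.h>

/* A double following a char: the i386 psABI places it at offset 4
   (32-bit field alignment), while x86-64 places it at offset 8.  */
struct pair { char c; double d; };

static size_t double_field_offset(void)
{
    return offsetof(struct pair, d);
}
```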
Re: Bad unwinder data for __kernel_sigtramp_rt64 in PPC 64 vDSO corrupts Condition Register
On Tue, Oct 16, 2007 at 06:02:13PM +0100, Andrew Haley wrote:
> The reason is that the unwinder data for CR in the vDSO is wrong. The
> line that affects the CR is here in

According to __builtin_init_dwarf_reg_size_table on ppc64-linux, r0..r31, fp0..fp31, mq, lr, ctr, ap, vrsave, vscr, spe_acc, spefcsr, sfp are 64-bit, v0..v31 128-bit and cr0..cr7, xer 32-bit. So both kernel and gcc/config/rs6000/linux-unwind.h are wrong.

> arch/powerpc/kernel/vdso64/sigtramp.S:
>
> rsave (70, 38*RSIZE)	/* cr */

This should just be changed to

/* Size of CR regs in DWARF unwind info.  */
#define CRSIZE 4
...
rsave (70, 38*RSIZE + (RSIZE - CRSIZE))	/* cr */

and similarly linux-unwind.h should do:

fs->regs.reg[R_CR2].loc.offset = (long) &regs->ccr - new_cfa;
/* CR? regs are just 32-bit and PPC is big-endian.  */
fs->regs.reg[R_CR2].loc.offset += sizeof (long) - 4;

Jakub
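Jakub's offset adjustment can be sanity-checked in isolation. The sketch below is plain C with no real unwinder state (save_cr/restore_cr and the byte buffer are invented); it models a big-endian 64-bit save slot, where the 32-bit CR value lives in the high-addressed half, so the access offset must be bumped by RSIZE - CRSIZE (i.e. sizeof (long) - 4 on ppc64).

```c
#include <string.h>
#include <stdint.h>

enum { RSIZE = 8, CRSIZE = 4 };   /* sizes from the proposed fix */

/* Store a 32-bit CR value into an 8-byte save slot at the big-endian
   position: the HIGH-addressed 4 bytes.  */
static void save_cr(unsigned char slot[RSIZE], uint32_t cr)
{
    memcpy(slot + (RSIZE - CRSIZE), &cr, CRSIZE);
}

/* Read it back from the same adjusted offset; reading from offset 0
   (as the unadjusted unwind data effectively does) would see the
   zeroed padding half instead.  */
static uint32_t restore_cr(const unsigned char slot[RSIZE])
{
    uint32_t cr;
    memcpy(&cr, slot + (RSIZE - CRSIZE), CRSIZE);
    return cr;
}
```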
Re: Bad unwinder data for __kernel_sigtramp_rt64 in PPC 64 vDSO corrupts Condition Register
Jakub Jelinek writes:
> On Tue, Oct 16, 2007 at 06:02:13PM +0100, Andrew Haley wrote:
> > The reason is that the unwinder data for CR in the vDSO is wrong. The
> > line that affects the CR is here in
>
> According to __builtin_init_dwarf_reg_size_table on ppc64-linux
> r0..r31, fp0..fp31, mq, lr, ctr, ap, vrsave, vscr, spe_acc, spefcsr, sfp
> are 64-bit, v0..v31 128-bit and cr0..cr7, xer 32-bit.
> So both kernel and gcc/config/rs6000/linux-unwind.h are wrong.
>
> > arch/powerpc/kernel/vdso64/sigtramp.S:
> >
> > rsave (70, 38*RSIZE)	/* cr */
>
> This should just be changed to
> /* Size of CR regs in DWARF unwind info.  */
> #define CRSIZE 4
> ...
> rsave (70, 38*RSIZE + (RSIZE - CRSIZE))	/* cr */
>
> and similarly linux-unwind.h should do:
>
> fs->regs.reg[R_CR2].loc.offset = (long) &regs->ccr - new_cfa;
> /* CR? regs are just 32-bit and PPC is big-endian.  */
> fs->regs.reg[R_CR2].loc.offset += sizeof (long) - 4;

Won't this generate an alignment fault?

Andrew.
Re: Library not loaded
On Tue, 2007-10-16 07:47:51 -0700, Denis Tkachov <[EMAIL PROTECTED]> wrote:
> I am having problem starting my application that is successfully built. I am
> using boost to serialize/deserialize data. I have link boost library and my
> project is built successfully, but I cannot run it.

This is most probably not a problem with GCC itself, but with using it (and the linker), so your question would be better handled at <[EMAIL PROTECTED]>.

> Running the project (build&go in xcode) I receive this error:
>
> dyld: Library not loaded: stage/lib/libboost_serialization-1_34_1.dylib
> Referenced from:
> /Users/dtsachov/dev/evdp/temp/TestSerialization/build/Debug/TestSerialization
> Reason: image not found
>
> Note - that is not a problem with boost, that is some problem with linking
> to the library in runtime. I had the same problem with my 2 projects, one
> uses another - one is a command line tool and another is a library. If the
> library is dynamic library I am able to build the project but unable to run
> it, getting the same error. When I made my library static as library (not
> dynamic) I can run the application.

Some additional hints would be useful, like:

* Which compiler and linker are used? For which platform? MacOS I guess?
* What does the final linker call look like?
* What does your dynamic linker configuration look like? Specifically: is your libboost_serialization-1_34_1 in the library search path?

> So I suggest this is some problem of locating the library in runtime.

Most probably.

> Does anybody have an idea ?

You need to tell the dynamic linker where to actually find the libraries. "stage/lib/..." looks like a relative path, but relative to what location?

MfG, JBG

-- Jan-Benedict Glaw [EMAIL PROTECTED] +49-172-7608481
Signature of: Friends are relatives you make for yourself.

signature.asc Description: Digital signature
Re: Bad unwinder data for __kernel_sigtramp_rt64 in PPC 64 vDSO corrupts Condition Register
On Tue, Oct 16, 2007 at 07:22:31PM +0100, Andrew Haley wrote:
> > and similarly linux-unwind.h should do:
> >
> > fs->regs.reg[R_CR2].loc.offset = (long) &regs->ccr - new_cfa;
> > /* CR? regs are just 32-bit and PPC is big-endian.  */
> > fs->regs.reg[R_CR2].loc.offset += sizeof (long) - 4;
>
> Won't this generate an alignment fault?

Why? The reg size is 32-bit, so it should only be read/written as a 32-bit value. E.g. a brief look at _Unwind_RaiseException shows:

lwz 12,5656(1)
...
mtcrf 32,12	#,
ld 15,5368(1)	#,
ld 16,5376(1)	#,
mtcrf 16,12	#,
ld 17,5384(1)	#,
ld 18,5392(1)	#,
mtcrf 8,12	#,

so it shouldn't have any problems with 4 byte alignment (rather than 8 byte alignment).

Jakub
Re: Bad unwinder data for __kernel_sigtramp_rt64 in PPC 64 vDSO corrupts Condition Register
Jakub Jelinek writes:
> On Tue, Oct 16, 2007 at 07:22:31PM +0100, Andrew Haley wrote:
> > > and similarly linux-unwind.h should do:
> > >
> > > fs->regs.reg[R_CR2].loc.offset = (long) &regs->ccr - new_cfa;
> > > /* CR? regs are just 32-bit and PPC is big-endian.  */
> > > fs->regs.reg[R_CR2].loc.offset += sizeof (long) - 4;
> >
> > Won't this generate an alignment fault?
>
> Why? The reg size is 32-bit, so it only should be read/written
> as 32-bit value.
> E.g. a brief look at _Unwind_RaiseException shows:
> lwz 12,5656(1)
> ...
> mtcrf 32,12	#,
> ld 15,5368(1)	#,
> ld 16,5376(1)	#,
> mtcrf 16,12	#,
> ld 17,5384(1)	#,
> ld 18,5392(1)	#,
> mtcrf 8,12	#,
> so it shouldn't have any problems with 4 byte alignment (rather than 8 byte
> alignment).

Yes, I think it's OK. I was thinking about uw_install_context_1, but that should be fine, as it uses memcpy.

Andrew.
How do I upgrade gcc/g++ on a Mac
Hi, We have 2 different Macs, both running the 10.4 OS but with different builds of the same versions of gcc/g++/gdb. With the older set of tools we see gdb failures debugging some C++ applications (faulty line table information). Debugging the same program with the later builds works fine. We've tried the automatic update service on the Mac but it doesn't update either g++ or gdb. Is there a way to get the latest gcc/g++/gdb on a Mac? We're not interested in hearing that we should upgrade to Leopard (we plan to :-) because we're more interested in what we can tell our customers than anything else. We provide a gdb-based C/C++ debugger in NetBeans 6.0 and this Mac bug makes our debugger fail. If we could tell customers how to upgrade their compilers then that would resolve the issue. Thanks, Gordon
Re: How do I upgrade gcc/g++ on a Mac
Gordon Prieur wrote: We have 2 different Macs, both running 10.4 OS but different builds of the same versions of gcc/g++/gdb. With the older set of tools we see gdb failures debugging some C++ applications (faulty line table information). Debugging the same program on with the later builds, works fine. We've tried the automatic update service on the Mac but it doesn't update either g++ or gdb. Is there a way to get the latest gcc/g++/gdb on a Mac? We're not interested in hearing that we should upgrade to Leopard (we plan to:-) because we're more interested in what we can tell our customers than how anything else. We provide a gdb-based C/C++ debugger in NetBeans 6.0 and this Mac bug makes our debugger fail. If we could tell customers how to upgrade their compilers then that would resolve the issue. This is kind of off-topic on this list, but anyway. I'd update Xcode, which provides the tools you're mentioning. You'll get it from Apple. http://developer.apple.com/tools/download/ Maybe you have to register, don't know. Actual release for 10.4.10 is 2.4.1. Andreas
Re: Library not loaded
Denis Tkachov wrote: Hi all I am having problem starting my application that is successfully built. I am using boost to serialize/deserialize data. I have link boost library and my project is built successfully, but I cannot run it. Running the project (build&go in xcode) I receive this error: dyld: Library not loaded: stage/lib/libboost_serialization-1_34_1.dylib Referenced from: /Users/dtsachov/dev/evdp/temp/TestSerialization/build/Debug/TestSerialization Reason: image not found Note - that is not a problem with boost, that is some problem with linking to the library in runtime. I had the same problem with my 2 projects, one uses another - one is a command line tool and another is a library. If the library is dynamic library I am able to build the project but unable to run it, getting the same error. When I made my library static as library (not dynamic) I can run the application. So I suggest this is some problem of locating the library in runtime. Does anybody have an idea ?

You might set your DYLD_LIBRARY_PATH to point to your built lib. With tcsh:

setenv DYLD_LIBRARY_PATH /path-to-your-lib:$DYLD_LIBRARY_PATH

given that you already have a DYLD_LIBRARY_PATH set; otherwise leave out the trailing ':$DYLD_LIBRARY_PATH'.

Andreas
RE: Plans for Linux ELF "i686+" ABI ? Like SPARC V8+ ?
On 15 October 2007 23:53, Andi Kleen wrote: > int main(void) > { > double x; > printf("%d\n", __alignof__(x)); > return 0; > } > ~> gcc -m32 -o t t.c > t.c: In function ‘main’: > t.c:5: warning: incompatible implicit declaration of built-in function > ‘printf’ I find a call to an unprototyped stdargs function in the middle of a discussion about the niceties of ABIs amusingly ironic! :-) cheers, DaveK -- Can't think of a witty .sigline today
Re: Bad unwinder data for __kernel_sigtramp_rt64 in PPC 64 vDSO corrupts Condition Register
On Tue, Oct 16, 2007 at 08:21:55PM +0200, Jakub Jelinek wrote: > On Tue, Oct 16, 2007 at 06:02:13PM +0100, Andrew Haley wrote: > > The reason is that the unwinder data for CR in the vDSO is wrong. The > > line that affects the CR is here in My fault. > According to __builtin_init_dwarf_reg_size_table on ppc64-linux > r0..r31, fp0..fp31, mq, lr, ctr, ap, vrsave, vscr, spe_acc, spefcsr, sfp > are 64-bit, v0..v31 128-bit and cr0..cr7, xer 32-bit. > So both kernel and gcc/config/rs6000/linux-unwind.h are wrong. > > > arch/powerpc/kernel/vdso64/sigtramp.S: > > > > rsave (70, 38*RSIZE) /* cr */ > > This should just be changed to > /* Size of CR regs in DWARF unwind info. */ > #define CRSIZE4 > ... > rsave (70, 38*RSIZE + (RSIZE - CRSIZE)) /* cr */ > > and similarly linux-unwind.h should do: > > fs->regs.reg[R_CR2].loc.offset = (long) ®s->ccr - new_cfa; > /* CR? regs are just 32-bit and PPC is big-endian. */ > fs->regs.reg[R_CR2].loc.offset += sizeof (long) - 4; This looks good to me. I don't think we can change the unwinder to use a different size for cr as that would break unwinding through normal stack frames that save cr. -- Alan Modra Australia Development Lab, IBM
Re: Bad unwinder data for __kernel_sigtramp_rt64 in PPC 64 vDSO corrupts Condition Register
On Wed, 2007-10-17 at 12:58 +0930, Alan Modra wrote:
> On Tue, Oct 16, 2007 at 08:21:55PM +0200, Jakub Jelinek wrote:
> > On Tue, Oct 16, 2007 at 06:02:13PM +0100, Andrew Haley wrote:
> > > The reason is that the unwinder data for CR in the vDSO is wrong. The
> > > line that affects the CR is here in
>
> My fault.
>
> > According to __builtin_init_dwarf_reg_size_table on ppc64-linux
> > r0..r31, fp0..fp31, mq, lr, ctr, ap, vrsave, vscr, spe_acc, spefcsr, sfp
> > are 64-bit, v0..v31 128-bit and cr0..cr7, xer 32-bit.
> > So both kernel and gcc/config/rs6000/linux-unwind.h are wrong.
> >
> > > arch/powerpc/kernel/vdso64/sigtramp.S:
> > >
> > > rsave (70, 38*RSIZE)	/* cr */
> >
> > This should just be changed to
> > /* Size of CR regs in DWARF unwind info.  */
> > #define CRSIZE 4
> > ...
> > rsave (70, 38*RSIZE + (RSIZE - CRSIZE))	/* cr */
> >
> > and similarly linux-unwind.h should do:
> >
> > fs->regs.reg[R_CR2].loc.offset = (long) &regs->ccr - new_cfa;
> > /* CR? regs are just 32-bit and PPC is big-endian.  */
> > fs->regs.reg[R_CR2].loc.offset += sizeof (long) - 4;
>
> This looks good to me. I don't think we can change the unwinder to
> use a different size for cr as that would break unwinding through
> normal stack frames that save cr.

So the kernel fix would look like that, right? If you are OK with it, I'll submit it tomorrow.

Index: linux-work/arch/powerpc/kernel/vdso64/sigtramp.S
===
--- linux-work.orig/arch/powerpc/kernel/vdso64/sigtramp.S	2007-10-17 13:32:49.0 +1000
+++ linux-work/arch/powerpc/kernel/vdso64/sigtramp.S	2007-10-17 13:34:18.0 +1000
@@ -134,13 +134,16 @@ V_FUNCTION_END(__kernel_sigtramp_rt64)
 9:
 	/* This is where the pt_regs pointer can be found on the stack.  */
-#define PTREGS 128+168+56
+#define PTREGS 128+168+56
 
 	/* Size of regs.  */
-#define RSIZE 8
+#define RSIZE 8
+
+/* Size of CR reg in DWARF unwind info.  */
+#define CRSIZE 4
 
 	/* This is the offset of the VMX reg pointer.  */
-#define VREGS 48*RSIZE+33*8
+#define VREGS 48*RSIZE+33*8
 
 	/* Describe where general purpose regs are saved.  */
 #define EH_FRAME_GEN \
@@ -178,7 +181,7 @@ V_FUNCTION_END(__kernel_sigtramp_rt64)
 	rsave (31, 31*RSIZE); \
 	rsave (67, 32*RSIZE);	/* ap, used as temp for nip */ \
 	rsave (65, 36*RSIZE);	/* lr */ \
-	rsave (70, 38*RSIZE)	/* cr */
+	rsave (70, 38*RSIZE + (RSIZE - CRSIZE))	/* cr */
 
 	/* Describe where the FP regs are saved.  */
 #define EH_FRAME_FP \
Re: df_insn_refs_record's handling of global_regs[]
From: David Miller <[EMAIL PROTECTED]> Date: Tue, 16 Oct 2007 03:12:23 -0700 (PDT)

> I have a bug I'm trying to investigate where, starting in gcc-4.2.x,
> the loop invariant pass considers a computation involving a global
> register variable as invariant across a call. The basic structure
> of the code is:

Here is the most simplified test case I could come up with; compile it with "-m64 -Os" on sparc. expression(regval) is moved to before the loop by the loop-invariant pass.

register unsigned long regval asm("g5");

extern void cond_resched(void);

unsigned int var;

void *expression(unsigned long regval)
{
	void *ret;

	__asm__("" : "=r" (ret) : "0" (&var));
	return ret + regval;
}

void func(void **pp)
{
	int i;

	for (i = 0; i < 56; i++) {
		cond_resched();
		*pp = expression(regval);
	}
}
Re: df_insn_refs_record's handling of global_regs[]
On 10/16/07, David Miller <[EMAIL PROTECTED]> wrote:
> From: David Miller <[EMAIL PROTECTED]>
> Date: Tue, 16 Oct 2007 03:12:23 -0700 (PDT)
>
> > I have a bug I'm trying to investigate where, starting in gcc-4.2.x,
> > the loop invariant pass considers a computation involving a global
> > register variable as invariant across a call. The basic structure
> > of the code is:
>
> Here is the most simplified test case I could come up with,
> compile it with "-m64 -Os" on sparc. expression(regval) is
> moved to before the loop by loop-invariant
>
> register unsigned long regval asm("g5");
>
> extern void cond_resched(void);
>
> unsigned int var;
>
> void *expression(unsigned long regval)
> {
> 	void *ret;
>
> 	__asm__("" : "=r" (ret) : "0" (&var));
> 	return ret + regval;
> }
>
> void func(void **pp)
> {
> 	int i;
>
> 	for (i = 0; i < 56; i++) {
> 		cond_resched();
> 		*pp = expression(regval);
> 	}
> }

loop-invariant.c uses ud-chains. So if there's something wrong with the chains, it could go nuts. Can you send me the rtl dump of the loop2_invariant pass?

-- #pragma ident "Seongbae Park, compiler, http://seongbae.blogspot.com";
Re: df_insn_refs_record's handling of global_regs[]
From: "Seongbae Park (박성배, 朴成培)" <[EMAIL PROTECTED]> Date: Tue, 16 Oct 2007 21:53:37 -0700

Annyoung haseyo, Park-sanseng-nim,

> loop-invariant.cc uses ud-chain.
> So if there's something wrong with the chain,
> it could go nuts.
> Can you send me the rtl dump of loop2_invariant pass ?

I have found the problem, and yes, it has to do with the ud chains.

Because global registers are only marked via df_invalidated_by_call, they get the DF_REF_MAY_CLOBBER flag.

This flag causes the dataflow problem solver to not add the global register definitions to the generator set. Specifically I am talking about the code in df_rd_bb_local_compute_process_def(), which says:

if (!(DF_REF_FLAGS (def)
      & (DF_REF_MUST_CLOBBER | DF_REF_MAY_CLOBBER)))
  bitmap_set_bit (bb_info->gen, DF_REF_ID (def));

Global registers don't get clobbered by calls; they are potentially set as a side effect of the call. And they are set to valid values we might actually depend upon as inputs later.

I tried a potential fix, which is to change df_insn_refs_record() so that it handles global registers like this instead:

if (global_regs[i])
  df_ref_record (dflow, regno_reg_rtx[i], &regno_reg_rtx[i],
		 bb, insn, DF_REF_REG_DEF, 0, true);

and this made the illegal loop-invariant transformation no longer occur in my test case.
Re: df_insn_refs_record's handling of global_regs[]
On 10/16/07, David Miller <[EMAIL PROTECTED]> wrote:
> From: "Seongbae Park (박성배, 朴成培)" <[EMAIL PROTECTED]>
> Date: Tue, 16 Oct 2007 21:53:37 -0700
>
> Annyoung haseyo, Park-sanseng-nim,

:)

> > loop-invariant.c uses ud-chains, so if there's something wrong with
> > the chains, it could go nuts.
> > Can you send me the rtl dump of the loop2_invariant pass?
>
> I have found the problem, and yes it has to do with the ud chains.
>
> Because global registers are only marked via df_invalidated_by_call,
> they get the DF_REF_MAY_CLOBBER flag.
>
> This flag causes the dataflow problem solver to not add the global
> register definitions to the generator set. Specifically I am talking
> about the code in df_rd_bb_local_compute_process_def(), which says:
>
>     if (!(DF_REF_FLAGS (def)
>           & (DF_REF_MUST_CLOBBER | DF_REF_MAY_CLOBBER)))
>       bitmap_set_bit (bb_info->gen, DF_REF_ID (def));
>
> Global registers don't get clobbered by calls; they are potentially
> set as a side effect of calling them. And they are set to valid
> values we might actually depend upon as inputs later.
>
> I tried a potential fix, which is to change df_insn_refs_record()
> so that it handles global registers like this instead:
>
>     if (global_regs[i])
>       df_ref_record (dflow, regno_reg_rtx[i], &regno_reg_rtx[i],
>                      bb, insn, DF_REF_REG_DEF, 0, true);
>
> and this made the illegal loop-invariant transformation no longer
> occur in my test case.

Did you replace the DF_REF_REG_USE with DEF?
If so, that's not correct. We need to add DEF as well as USE:

diff -r fd0f94fbe89d gcc/df-scan.c
--- a/gcc/df-scan.c	Wed Oct 10 03:32:43 2007 +0000
+++ b/gcc/df-scan.c	Tue Oct 16 22:52:44 2007 -0700
@@ -3109,8 +3109,13 @@ df_get_call_refs (struct df_collection_r
      so they are recorded as used.  */
   for (i = 0; i < FIRST_PSEUDO_REGISTER; i++)
     if (global_regs[i])
-      df_ref_record (collection_rec, regno_reg_rtx[i],
-                     NULL, bb, insn, DF_REF_REG_USE, flags);
+      {
+        df_ref_record (collection_rec, regno_reg_rtx[i],
+                       NULL, bb, insn, DF_REF_REG_USE, flags);
+        df_ref_record (collection_rec, regno_reg_rtx[i],
+                       NULL, bb, insn, DF_REF_REG_DEF, flags);
+      }
+
   is_sibling_call = SIBLING_CALL_P (insn);
   EXECUTE_IF_SET_IN_BITMAP (df_invalidated_by_call, 0, ui, bi)

Then we'll need to change the df_invalidated_by_call loop
not to add global_regs[] again (with MAY_CLOBBER bits).

--
#pragma ident "Seongbae Park, compiler, http://seongbae.blogspot.com"
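[Editor's note: a minimal sketch of why the call must be recorded as a DEF, not just a USE. The helper name below is hypothetical; GCC's real decision lives in loop-invariant.c and walks the ud-chains built from these records:]

```python
# Hypothetical distillation of the loop-invariant check: an expression
# using a register may be hoisted only if the loop body contains no
# definition of that register that could reach the use.

def is_invariant(use_reg, defs_in_loop):
    """True if no def of `use_reg` occurs inside the loop."""
    return use_reg not in defs_in_loop

# Before the fix: the call records only a USE of the global register,
# so no def of "g5" is visible inside the loop and the hoist looks legal.
assert is_invariant("g5", defs_in_loop=set()) is True    # the illegal hoist

# After the fix: the call also records a DEF of every global register,
# so expression(regval) is correctly treated as variant.
assert is_invariant("g5", defs_in_loop={"g5"}) is False  # stays in the loop
```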
Re: df_insn_refs_record's handling of global_regs[]
From: "Seongbae Park (박성배, 朴成培)" <[EMAIL PROTECTED]>
Date: Tue, 16 Oct 2007 22:56:49 -0700

> We need to add DEF as well as USE:
>
> diff -r fd0f94fbe89d gcc/df-scan.c
> --- a/gcc/df-scan.c	Wed Oct 10 03:32:43 2007 +0000
> +++ b/gcc/df-scan.c	Tue Oct 16 22:52:44 2007 -0700
> @@ -3109,8 +3109,13 @@ df_get_call_refs (struct df_collection_r
>       so they are recorded as used.  */
>    for (i = 0; i < FIRST_PSEUDO_REGISTER; i++)
>      if (global_regs[i])
> -      df_ref_record (collection_rec, regno_reg_rtx[i],
> -                     NULL, bb, insn, DF_REF_REG_USE, flags);
> +      {
> +        df_ref_record (collection_rec, regno_reg_rtx[i],
> +                       NULL, bb, insn, DF_REF_REG_USE, flags);
> +        df_ref_record (collection_rec, regno_reg_rtx[i],
> +                       NULL, bb, insn, DF_REF_REG_DEF, flags);
> +      }
> +
>
>    is_sibling_call = SIBLING_CALL_P (insn);
>    EXECUTE_IF_SET_IN_BITMAP (df_invalidated_by_call, 0, ui, bi)
>
> Then, we'll need to change the df_invalidated_by_call loop
> not to add global_regs[] again (with MAY_CLOBBER bits).

Indeed. I will do some regression testing of the following patch
against gcc-4.2.x:

--- ./gcc/df-scan.c.ORIG	2007-10-16 02:07:46 -0700
+++ ./gcc/df-scan.c	2007-10-16 23:00:32 -0700
@@ -1584,12 +1584,19 @@ df_insn_refs_record (struct dataflow *df
      so they are recorded as used.  */
   for (i = 0; i < FIRST_PSEUDO_REGISTER; i++)
     if (global_regs[i])
-      df_uses_record (dflow, &regno_reg_rtx[i],
-		      DF_REF_REG_USE, bb, insn,
-		      0);
+      {
+	df_uses_record (dflow, &regno_reg_rtx[i],
+			DF_REF_REG_USE, bb, insn, 0);
+	df_ref_record (dflow, regno_reg_rtx[i], &regno_reg_rtx[i],
+		       bb, insn, DF_REF_REG_DEF, 0, true);
+      }
+
   EXECUTE_IF_SET_IN_BITMAP (df_invalidated_by_call, 0, ui, bi)
-    df_ref_record (dflow, regno_reg_rtx[ui], &regno_reg_rtx[ui], bb,
-		   insn, DF_REF_REG_DEF, DF_REF_MAY_CLOBBER, false);
+    {
+      if (!global_regs[ui])
+	df_ref_record (dflow, regno_reg_rtx[ui], &regno_reg_rtx[ui], bb,
+		       insn, DF_REF_REG_DEF, DF_REF_MAY_CLOBBER, false);
+    }
 }
Re: Plans for Linux ELF "i686+" ABI ? Like SPARC V8+ ?
On 10/14/07, Darryl L. Miles <[EMAIL PROTECTED]> wrote:
>
> Hello,
>
> On SPARC there is an ABI, V8+, which allows the linking (and mixing)
> of V8 ABI code but makes use of features of 64-bit UltraSPARC CPUs
> (features that were not available in the older 32-bit-only CPUs).
> Admittedly, looking at the way this works, it could be said that Sun
> had a certain amount of forward thinking when they developed their
> 32-bit ABI (this is not true of the 32-bit Intel IA32 ABIs that exist).

Sun didn't have much forward thinking with their 32-bit ABI.
It's just that their 32-bit ISA was relatively amenable to 64-bit
extension with 32-bit ABI compatibility, which can not be said for IA32.

> Are there any plans for a new Intel IA32 ABI that is designed solely
> to run on 64-bit-capable Intel IA32 CPUs (those that support EM64T)?
> Userspace would have 32-bit memory addressing, but access to more
> registers, better function call conventions, etc...
>
> This would be aimed at replacing the existing i386/i686 ABIs on the
> desktop and would not be looking to maintain backward compatibility
> directly.
>
> My own anecdotal evidence is that I've been using an x86_64
> distribution (with dual 64-bit and 32-bit runtime support) for a few
> years now and have found performance to be lacking in my two
> largest-footprint applications (my browser and my development IDE,
> totaling 5 GB of footprint between them). I recently converted both
> of these applications from their 64-bit versions to 32-bit (they are
> the only 32-bit applications running) and the overall interactive
> performance has improved considerably, possibly due to the reduced
> memory footprint alone; a 4.5 GB-footprint 64-bit application is
> equivalent to a 2 GB-footprint 32-bit application in these
> application domains.
>
> Maybe someone knows of a white paper on the implications and benefits
> a move in this direction would bring. Maybe just using the existing
> 64-bit ABI with 32-bit void pointers (and longs) is as good a
> specification as any.
>
> RFCs,
>
> Darryl

A more appropriate comparison is probably against MIPS, with its two
32-bit ABIs (O32 and N32). Essentially you're asking for an N32
equivalent. My bet is that most people simply don't care enough about
the performance differential.

--
#pragma ident "Seongbae Park, compiler, http://seongbae.blogspot.com"