Re: Configure test hangs on powerpc64-linux
On Sat, Nov 25, 2006 at 09:36:00PM -0500, Daniel Jacobowitz wrote: > It never returns from ppc64_elf_gc_mark_hook (spins looking things up > in a hash table, I don't have the matching source handy). I expect > this is fixed in later binutils 2006-01-17 > Is there some way we can avoid an infinite loop in configure for this > case? Not without making _start look like a proper ppc64 function descriptor, ie. defined in .opd. But then the test would pass, and you want it to fail on 2.16.1 otherwise you'll get the hang later when building libstdc++. Ideally configure ought to be able to test for a hang in ld. If that's too hard, then I suppose you could fail this test for powerpc64 based on ld version. -- Alan Modra IBM OzLabs - Linux Technology Centre
Re: Implicit altivec vs. linux kernel build
On Mon, Feb 28, 2005 at 10:00:42AM +1100, Benjamin Herrenschmidt wrote:
> Oh, and there are gcc version that will refuse -mcpu=power4 -maltivec so
> I can't even use -mcpu=power4 for the whole kernel and -maltivec just
> for the file containing the raid6 code

I guess what you mean here is that the -maltivec option is ignored when gcc is given "-mcpu=power4 -maltivec". That's true. It's a result of handling -mcpu in OVERRIDE_OPTIONS. This means the -mcpu option is processed after other options, so you effectively have your command line permuted.

A long time ago I made a fix for a similar problem with -msoft-float, and noted that it should be extended to other options. See
http://gcc.gnu.org/ml/gcc-patches/2004-03/msg00688.html
That is still the best bandaid fix, IMO. The proper fix is to rewrite option processing to only use the override hook for things that really do need to override other options.

--
Alan Modra
IBM OzLabs - Linux Technology Centre
Re: PowerPC 64 x 32 bits performance
On Fri, Mar 04, 2005 at 04:59:25PM -0600, Edmar Wienskoski wrote:
> and notice a considerable number of load instructions in the 64 bits one.
>
> Does anyone have an insight on why this is happening ?

These are most likely 64-bit address constant loads. On ppc32, a 32-bit address constant can be calculated in at most 2 instructions. A 64-bit address takes up to 5 instructions to calculate in-line, or a TOC memory load.

--
Alan Modra
IBM OzLabs - Linux Technology Centre
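The instruction counts above can be illustrated with a small model. This is only a sketch: the helper names build32 and build64 are made up for illustration, registers are modelled as plain integers, and each C statement stands in for one instruction of the usual lis/ori (32-bit) and lis/ori/sldi/oris/ori (64-bit) sequences. It does not model the TOC-load alternative.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model, not compiler output: each statement mirrors one
   PowerPC instruction operating on a register held in a C integer.  */

/* ppc32: two instructions build any 32-bit address constant.  */
static uint32_t build32(uint32_t addr)
{
    uint32_t r = (addr >> 16) << 16;   /* lis  r,addr@h   */
    r |= addr & 0xffff;                /* ori  r,r,addr@l */
    return r;
}

/* ppc64: up to five instructions when the constant is built in-line.  */
static uint64_t build64(uint64_t addr)
{
    uint64_t h = (addr >> 48) & 0xffff;
    uint64_t r;

    /* lis sign-extends its 16-bit immediate on a 64-bit register; the
       xor/subtract pair below is a portable way to sign-extend.  */
    r = ((h ^ 0x8000) - 0x8000) << 16;     /* lis  r,addr@highest  */
    r |= (addr >> 32) & 0xffff;            /* ori  r,r,addr@higher */
    r <<= 32;                              /* sldi r,r,32          */
    r |= ((addr >> 16) & 0xffff) << 16;    /* oris r,r,addr@h      */
    r |= addr & 0xffff;                    /* ori  r,r,addr@l      */

    assert(r == addr);                     /* the sequence is exact */
    return r;
}
```

The sldi shifts the sign-extension bits out of the register, which is why the first two steps may leave junk in the upper half without affecting the result.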
Re: GCC 4.1: Buildable on GHz machines only?
On Wed, May 04, 2005 at 09:29:44AM -0700, Joe Buck wrote:
> So the basic issue is this: why the hell does the linker need so much
> memory?

- long symbol names and lots of symbols
- lots of sections
- optimizations that edit section contents, requiring the contents to be kept in memory. eg. string merging, relaxation.

--
Alan Modra
IBM OzLabs - Linux Technology Centre
Re: What is wrong with Bugzilla? [Was: Re: GCC and Floating-Point]
On Tue, May 31, 2005 at 09:53:05PM +0200, Vincent Lefevre wrote:
> On 2005-05-31 11:39:39 -0700, Mike Stump wrote:
> > On May 31, 2005, at 10:25 AM, Vincent Lefevre wrote:
> > >Well, there is no extended precision with GCC under Linux/PPC.
> >
> > Hum, I do wonder about even that; why do:
> >
> > 2004-02-07 Alan Modra <[EMAIL PROTECTED]>
> >
> > * config/rs6000/t-linux64 (LIB2FUNCS_EXTRA): Add darwin-ldouble.c.
> >
> > powerpc64-*-linux*)
>
> Hmm... this is powerpc64.

Yes. powerpc64-linux uses IBM extended precision long doubles.

> Under the 32-bit version, there's no extended precision.

No. powerpc-linux has 128-bit IEEE soft-float long double.

$ cat > fadd.c <<\EOF
long double fadd (long double a, long double b)
{
  return a + b;
}
EOF
$ gcc -m32 -mlong-double-128 -c fadd.c
$ nm fadd.o
T fadd
U _q_add

Now all you need is a library that supplies _q_add and similar. Let's see, glibc is a likely place..

./sysdeps/powerpc/soft-fp/Makefile:powerpc-quad-routines := q_add q_cmp q_cmpe q_div q_dtoq q_feq q_fge \
./sysdeps/powerpc/soft-fp/Versions:_q_add; _q_cmp; _q_cmpe; _q_div; _q_dtoq; _q_feq; _q_fge; _q_fgt;
./sysdeps/powerpc/soft-fp/q_add.c:long double _q_add(const long double a, const long double b)

Then of course, you need to convince glibc to build them for you.

--
Alan Modra
IBM OzLabs - Linux Technology Centre
Re: RFH: libgcc_s.so being unnecessarily linked for mipsel-linux crosscompiler...
On Thu, Jul 28, 2005 at 08:36:11PM -0700, David Daney wrote: > I can detect this special case in _bfd_mips_elf_add_symbol_hook() and > cause it to be ignored, thus solving the problem. > > Does this seem like a reasonable course of action? Yes. That does cost a strcmp on every symbol in every input file though. It's somewhat better to do strcmp once on every global symbol in adjust_dynamic_symbol. See elf32-ppc.c handling of _SDA_BASE_. Even better would be a hash lookup in something run before adjust_dynamic_symbol, say, always_size_sections. I didn't think of that when I added the ppc code. > I am not sure how that dynamic symbol got into the shared objects in the > first place. I suppose if the proper solution was to not put it there > in the first place, I could fix that It would be good if new shared objects don't have this bug. Forcing the sym local should cure it. > and rebuild the world. But that > would be much more work. > > David Daney. -- Alan Modra IBM OzLabs - Linux Technology Centre
Re: [PATCH]: Proof-of-concept for dynamic format checking
On Wed, Aug 17, 2005 at 11:07:42PM -0400, Kaveh R. Ghazi wrote:
> Yeah, BFD can only do that because it forces the %A %B specifiers be
> in the front.

No, it's worse than that. %A and %B can appear anywhere in the format string, but consume their args first. eg.

  _bfd_default_error_handler ("section %d is called %A", sec, 1);

--
Alan Modra
IBM OzLabs - Linux Technology Centre
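The "consume their args first" behaviour can be sketched with a toy two-pass formatter. This is not the BFD implementation: toy_error_handler and append are hypothetical names, the %A and %B arguments are plain strings standing in for the section and bfd pointers the real handler takes, and only %d, %s and %% are otherwise understood.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

#define MAX_EARLY 8   /* arbitrary limit for this sketch */

/* Append S to OUT at offset LEN, truncating so a NUL always fits.  */
static size_t append(char *out, size_t len, size_t outsz, const char *s)
{
    size_t n = strlen(s);
    size_t room = outsz - 1 - len;

    if (n > room)
        n = room;
    memcpy(out + len, s, n);
    return len + n;
}

/* Toy model: %A/%B take their arguments first, in order of appearance;
   the remaining specifiers then take theirs in the normal order.  */
static int toy_error_handler(char *out, size_t outsz, const char *fmt, ...)
{
    const char *early[MAX_EARLY];
    size_t n_early = 0, next_early = 0, len = 0;
    char tmp[32];
    const char *p;
    va_list ap;

    assert(outsz > 0);
    va_start(ap, fmt);

    /* Pass 1: fetch one argument for every %A or %B.  */
    for (p = fmt; *p; p++)
        if (p[0] == '%' && p[1] != '\0') {
            if (p[1] == 'A' || p[1] == 'B') {
                assert(n_early < MAX_EARLY);
                early[n_early++] = va_arg(ap, const char *);
            }
            p++;   /* skip the conversion character (also handles %%) */
        }

    /* Pass 2: format, fetching the remaining arguments as we go.  */
    for (p = fmt; *p; p++) {
        if (p[0] != '%' || p[1] == '\0') {
            tmp[0] = *p;
            tmp[1] = '\0';
            len = append(out, len, outsz, tmp);
            continue;
        }
        p++;
        switch (*p) {
        case 'A':
        case 'B':
            len = append(out, len, outsz, early[next_early++]);
            break;
        case 'd':
            snprintf(tmp, sizeof tmp, "%d", va_arg(ap, int));
            len = append(out, len, outsz, tmp);
            break;
        case 's':
            len = append(out, len, outsz, va_arg(ap, const char *));
            break;
        default:   /* "%%" and anything unsupported: emit literally */
            tmp[0] = *p;
            tmp[1] = '\0';
            len = append(out, len, outsz, tmp);
            break;
        }
    }
    out[len] = '\0';
    va_end(ap);
    return (int) len;
}
```

With this model, a call shaped like the example above, with the section name passed before the integer even though %d comes first in the string, fills the buffer with the integer followed by the name. That mismatch between argument order and format order is what makes a printf-style format checker awkward to apply.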
Re: [PATCH]: Proof-of-concept for dynamic format checking
On Thu, Aug 18, 2005 at 08:46:04AM -0400, Kaveh R. Ghazi wrote: > I don't know how wedded to this style the bfd folks are Not at all. In fact I don't like it, even though I wrote the code. It would be great if _bfd_default_error_handler used the natural arg positions for %A and %B. I couldn't think of a way to do that without incorporating a whole lot of knowledge about printf into the bfd function. -- Alan Modra IBM OzLabs - Linux Technology Centre
Re: [PATCH]: Proof-of-concept for dynamic format checking
On Thu, Aug 18, 2005 at 10:35:22AM -0400, Kaveh R. Ghazi wrote:
> > > I don't know how wedded to this style the bfd folks are
> >
> > Not at all. In fact I don't like it, even though I wrote the code.
> > It would be great if _bfd_default_error_handler used the natural arg
> > positions for %A and %B. I couldn't think of a way to do that without
> > incorporating a whole lot of knowledge about printf into the bfd
> > function.
>
> Right, in GCC we ended up doing that except we only implemented the
> bits of printf commonly used. So for example we don't implement all
> of the specifiers (floating point) or modifiers (%h) or flags. In fact
> the fortran front end has a format that only has %d %i %c and %s from
> printf, (plus two custom specifiers.) No flags or even length
> modifiers!
>
> It's likely that bfd doesn't use a big chunk of printf that you could
> leave out as well. (I haven't actually audited bfd though).

$ sed -n -e 's,[^%]*\(%[0-9\.# hlL+-]*.\)[^%]*,\1,gp' < bfd/po/bfd.pot | sed -e 's,%,\
%,g' | sort | uniq

%
%"
%-7ld
%.2x
%.8lx
%02X
%02x
%04lx
%08lx
%08x
%4d
%4lx
%4x
%A
%B
%X
%d
%i
%ld
%lu
%lx
%p
%s
%u
%x

(The '%"' is line wrapping in bfd.pot. I may have missed a few format specifiers because of that. And the '%' is really '%%'.)

> Another option is to require positional specifiers for out of order
> arguments. E.g.

Ick.

> So I favor rewriting _bfd_default_error_handler to do the safer thing
> which is to use natural arg positions. Then create a format check
> with only the stuff you need, not the whole printf style.

I'm not motivated to do that myself. :) There aren't that many places that don't have %A or %B first in the format string.

$ sed -n -e 's,[^%]*\(%[0-9\.# hlL+-]*.\)[^%]*,\1,gp' < bfd/po/bfd.pot | grep '[^%AB]%[AB]'
%B%lx%A
%B%d%B%d
%s%d%d%B
%B%d%B%"
%B%d%B%d
%B%x%A
%B%s%s%A
%s%B
%s%B
%s%B
%s%B
%%%d%s%B%s%B
%s%B%s%B
%s%s%B%B
%s%B%A%B
%s%B%B
%s%B%A%B
%s%B%B%A
%B%lx%lx%lx%A
%u%s%B%u%B
%s%lu%B%lu%B
%B%s%s%B
%B%s%A
%X%s%A%B%A
%B%lx%A
%B%s%s%lx%A%lx

It's a great pity that vfprintf doesn't return its va_list arg. If it did, you could chop the format string into pieces and have vprintf process the normal parts, consuming args as it goes.

--
Alan Modra
IBM OzLabs - Linux Technology Centre
Re: [mainline/tree-profiling] CFG transparend finish_eh_generation
Jan, the following change of yours is responsible for PR21460. Can you remember why you wanted to look for NOTE_INSN_BASIC_BLOCK? I propose using this instead: for (fn_begin = get_insns (); ; fn_begin = NEXT_INSN (fn_begin)) if (NOTE_P (fn_begin) && NOTE_LINE_NUMBER (fn_begin) == NOTE_INSN_FUNCTION_BEG) break; insert_insn_on_edge (seq, single_succ_edge (BLOCK_FOR_INSN (fn_begin))); On Mon, Feb 23, 2004 at 12:00:41AM +0100, Jan Hubicka wrote: > 2004-02-22 Jan Hubicka <[EMAIL PROTECTED]> > * basic-block.h (make_eh_edge, break_superblocks): Declare. > * cfgbuild.c (make_eh_edge): Make global. > * cfglayout.c (break_superblocks): Likewise; fix memory leak. > * except.c (build_post_landing_pads, connect_post_landing_pads, > dw2_build_landing_pads, sjlj_emit_function_enter, > sjlj_emit_function_exit, sjlj_emit_dispatch_table, > sjlj_build_landing_pads): Update CFG. > (finish_eh_generation): Do not rebuild the CFG. [snip] > *** sjlj_emit_function_enter (rtx dispatch_l > *** 2107,2115 > > for (fn_begin = get_insns (); ; fn_begin = NEXT_INSN (fn_begin)) > if (GET_CODE (fn_begin) == NOTE > ! && NOTE_LINE_NUMBER (fn_begin) == NOTE_INSN_FUNCTION_BEG) > break; > ! emit_insn_after (seq, fn_begin); > } > > /* Call back from expand_function_end to know where we should put > --- 2146,2164 > > for (fn_begin = get_insns (); ; fn_begin = NEXT_INSN (fn_begin)) > if (GET_CODE (fn_begin) == NOTE > ! && (NOTE_LINE_NUMBER (fn_begin) == NOTE_INSN_FUNCTION_BEG > ! || NOTE_LINE_NUMBER (fn_begin) == NOTE_INSN_BASIC_BLOCK)) > break; > ! if (NOTE_LINE_NUMBER (fn_begin) == NOTE_INSN_FUNCTION_BEG) > ! insert_insn_on_edge (seq, ENTRY_BLOCK_PTR->succ); > ! else > ! { > ! for (; ; fn_begin = NEXT_INSN (fn_begin)) > ! if (GET_CODE (fn_begin) == NOTE > ! && NOTE_LINE_NUMBER (fn_begin) == NOTE_INSN_FUNCTION_BEG) > ! break; > ! emit_insn_after (seq, fn_begin); > ! } > } > > /* Call back from expand_function_end to know where we should put -- Alan Modra IBM OzLabs - Linux Technology Centre
Re: [mainline/tree-profiling] CFG transparend finish_eh_generation
On Fri, Sep 02, 2005 at 06:28:08PM +0930, Alan Modra wrote: > I propose using this instead: > > for (fn_begin = get_insns (); ; fn_begin = NEXT_INSN (fn_begin)) > if (NOTE_P (fn_begin) > && NOTE_LINE_NUMBER (fn_begin) == NOTE_INSN_FUNCTION_BEG) > break; > > insert_insn_on_edge (seq, single_succ_edge (BLOCK_FOR_INSN (fn_begin))); Oops, no, that doesn't work. I'll need to dig a bit to see what's happening. -- Alan Modra IBM OzLabs - Linux Technology Centre
Re: var_args for rs6000 backend
On Wed, Sep 07, 2005 at 11:14:28AM +0800, Yao Qi qi wrote:
> I just have to concentrate on ABI_V4 if I work on gcc development on
> powerpc-linux, am I right ?

Yes, and take care not to break code for the other ABIs. :-) Incidentally, powerpc64-linux is ABI_AIX.

--
Alan Modra
IBM OzLabs - Linux Technology Centre
[PowerPC] PR23774 stack backchain broken saga
perand" "r")) + (trap_if (const_int 2) (const_int 0))] + "" + "mr %0,%1") + (define_expand "restore_stack_block" - [(use (match_operand 0 "register_operand" "")) - (set (match_dup 2) (match_dup 3)) - (set (match_dup 0) (match_operand 1 "register_operand" "")) - (set (match_dup 3) (match_dup 2))] + [(set (match_dup 2) (match_dup 3)) + (set (match_dup 4) (match_dup 2)) + (parallel [(set (match_operand 0 "register_operand" "") + (match_operand 1 "register_operand" "")) + (trap_if (const_int 2) (const_int 0))])] "" " { operands[2] = gen_reg_rtx (Pmode); - operands[3] = gen_rtx_MEM (Pmode, operands[0]); + operands[3] = gen_frame_mem (Pmode, operands[0]); + /* We don't want the backchain write to be recognized as non-trapping, + so don't use gen_frame_mem here. */ + operands[4] = gen_rtx_MEM (Pmode, operands[1]); }") (define_expand "save_stack_nonlocal" - [(match_operand 0 "memory_operand" "") - (match_operand 1 "register_operand" "")] + [(set (match_dup 3) (match_dup 4)) + (set (match_operand 0 "memory_operand" "") (match_dup 3)) + (set (match_dup 2) (match_operand 1 "register_operand" ""))] "" " { - rtx temp = gen_reg_rtx (Pmode); int units_per_word = (TARGET_32BIT) ? 4 : 8; /* Copy the backchain to the first word, sp to the second. 
*/ - emit_move_insn (temp, gen_rtx_MEM (Pmode, operands[1])); - emit_move_insn (adjust_address_nv (operands[0], Pmode, 0), temp); - emit_move_insn (adjust_address_nv (operands[0], Pmode, units_per_word), - operands[1]); - DONE; + operands[0] = adjust_address_nv (operands[0], Pmode, 0); + operands[2] = adjust_address_nv (operands[0], Pmode, units_per_word); + operands[3] = gen_reg_rtx (Pmode); + operands[4] = gen_frame_mem (Pmode, operands[1]); }") (define_expand "restore_stack_nonlocal" - [(match_operand 0 "register_operand" "") - (match_operand 1 "memory_operand" "")] + [(set (match_dup 2) (match_operand 1 "memory_operand" "")) + (set (match_dup 3) (match_dup 4)) + (set (match_dup 5) (match_dup 2)) + (parallel [(set (match_operand 0 "register_operand" "") + (match_dup 3)) + (trap_if (const_int 2) (const_int 0))])] "" " { - rtx temp = gen_reg_rtx (Pmode); - int units_per_word = (TARGET_32BIT) ? 4 : 8; + int units_per_word = TARGET_32BIT ? 4 : 8; - /* Restore the backchain from the first word, sp from the second. */ - emit_move_insn (temp, - adjust_address_nv (operands[1], Pmode, 0)); - emit_move_insn (operands[0], - adjust_address_nv (operands[1], Pmode, units_per_word)); - emit_move_insn (gen_rtx_MEM (Pmode, operands[0]), temp); - DONE; + operands[2] = gen_reg_rtx (Pmode); + operands[3] = gen_reg_rtx (Pmode); + operands[1] = adjust_address_nv (operands[1], Pmode, 0); + operands[4] = adjust_address_nv (operands[1], Pmode, units_per_word); + /* We don't want the backchain write to be recognized as non-trapping, + so don't use gen_frame_mem here. */ + operands[5] = gen_rtx_MEM (Pmode, operands[3]); }") ;; TOC register handling. -- Alan Modra IBM OzLabs - Linux Technology Centre
Re: [PowerPC] PR23774 stack backchain broken saga
On Fri, Sep 09, 2005 at 05:03:48PM -0700, Richard Henderson wrote: > On Sat, Sep 10, 2005 at 01:00:04AM +0930, Alan Modra wrote: > > 2) Next, I defined parallels to keep things together. Like the > > following, with another for DImode. > > This seems most reasonable to me. > > > This works, but doesn't give ideal power4/5 insn grouping, with (I > > think) one too many nops being emitted. > > Who cares? This is after a longjmp isn't it? Also stack deallocation when finished with alloca memory. For some reason 4.0/4.1 doesn't combine this deallocation with stack adjustment in the epilogue, a regression from 3.4. -- Alan Modra IBM OzLabs - Linux Technology Centre
Re: [PowerPC] PR23774 stack backchain broken saga
On Tue, Sep 13, 2005 at 11:28:07AM +0200, Segher Boessenkool wrote: > Especially as the ABI states that the write of the backlink > and the stack pointer update _have_ to be done in one insn. That's on allocation. Deallocation isn't so critical. You just need to ensure the backchain is written before updating sp. Hmm, on powerpc64-linux you could even alloc up to 288 bytes without an atomic update, since 288 bytes below the current sp is available for use. -- Alan Modra IBM OzLabs - Linux Technology Centre
Re: overcoming info build failures
On Thu, Nov 24, 2005 at 11:56:32AM +1100, Ben Elliston wrote: > I tracked this build problem of mine down. I expect others will > experience it, too, hence this posting. If you're building from > clean, you won't have this problem. > > Mark Mitchell's @file documentation change adds a @set directive to > gcc-vers.texi in the build directory, but that file only depends on > DEV-PHASE and BASE-VER, so it will never be correctly rebuilt using > the new make rule. Just deleting it will remedy the problem. Oh, yes, there is a similar problem when building binutils. I found I needed to delete ${srcdir}/gas/doc/asconfig.texi and ${srcdir}/ld/configdoc.texi. "make clean" doesn't delete these files for you. -- Alan Modra IBM OzLabs - Linux Technology Centre
LEGITIMIZE_RELOAD_ADDRESS vs address_reloaded
Hi Ulrich, In http://gcc.gnu.org/ml/gcc-patches/2004-07/msg01557.html, you changed the return value of find_reloads_address to be tristate, in the process modifying the meaning of a win from LEGITIMIZE_RELOAD_ADDRESS. Prior to your change, a win meant that LEGITIMIZE_RELOAD_ADDRESS had guaranteed that the address would match one of the extra memory constraints if it didn't match some other constraint. After this change, a win meant that the address as a whole might need further reloads. This has caused me a little trouble, because I can't find a way of telling reload to leave the address alone. Perhaps I ought to explain the problem I'm trying to solve. In pr24997, we see reload trying to fix an altivec address that looks like ((rb+ri)+const)&-16 When this whole expression is put into a register, we get an ICE because no ppc insn matches a complex expression like this. I figured I could help reload a little by stripping off the AND and returning (rb+const)+ri from LEGITIMIZE_RELOAD_ADDRESS, requesting a reload into a base reg for (rb+const). (*) After this had been reloaded, the address would be rb2+ri, which is a valid indexed address. This works in so far as the ICE is cured and GCC generates valid code. However, we don't use an indexed address. Instead, we get an indirect address due to find_reloads not matching constraints for the insn, and further reloading rb2+ri into another base reg. Now, I don't really fault your change to the constraint matching because it certainly seemed fragile before, and there is no way to distinguish between alternates. I don't advocate changing things back the way they were, because targets may have changed their LEGITIMIZE_RELOAD_ADDRESS according to the new semantics. What I'd like from you or other reload experts is an indication of the right way to fix this problem. ;-) I can see the following options: a) Before matching constraints in find_reloads, substitute dummy regs for any reloads that have been identified. 
I'm not sure how much work is involved in doing this, or whether it is even possible. It sounds like this would be the best solution technically, as then the output of LEGITIMIZE_RELOAD_ADDRESS is properly checked. b) Modify LEGITIMIZE_RELOAD_ADDRESS to return a constraint letter that the address is guaranteed to match after reloading. A bit of mechanical work changing all targets. c) Modify the ppc 'Z' constraint to match the indexed address reload generates. This would rely on the pattern we generate in LEGITIMIZE_RELOAD_ADDRESS never being generated elsewhere. d) Hacks like the patch below, that effectively perform the reload substitution with a dummy reg. I fear this isn't proper, even though it seems to work.. (*) This is exactly what code in find_reloads_address does on encountering an invalid indexed address. The trouble is that its transformation isn't valid until the reloads are done, and we check constraints before doing the substitutions. :-( -- Alan Modra IBM OzLabs - Linux Technology Centre Index: gcc/config/rs6000/rs6000.c === --- gcc/config/rs6000/rs6000.c (revision 107416) +++ gcc/config/rs6000/rs6000.c (working copy) @@ -3354,19 +3360,68 @@ rs6000_legitimize_reload_address (rtx x, /* Reload an offset address wrapped by an AND that represents the masking of the lower bits. Strip the outer AND and let reload - convert the offset address into an indirect address. */ + convert the offset address into an indirect address. Do the + same for indexed addresses with an offset. 
*/ if (TARGET_ALTIVEC && ALTIVEC_VECTOR_MODE (mode) && GET_CODE (x) == AND - && GET_CODE (XEXP (x, 0)) == PLUS - && GET_CODE (XEXP (XEXP (x, 0), 0)) == REG - && GET_CODE (XEXP (XEXP (x, 0), 1)) == CONST_INT && GET_CODE (XEXP (x, 1)) == CONST_INT && INTVAL (XEXP (x, 1)) == -16) { - x = XEXP (x, 0); - *win = 1; - return x; + rtx ad = XEXP (x, 0); + if (GET_CODE (ad) == PLUS + && GET_CODE (XEXP (ad, 1)) == CONST_INT) + { + if (GET_CODE (XEXP (ad, 0)) == REG) + { + x = ad; + *win = 1; + return x; + } + else if (GET_CODE (XEXP (ad, 0)) == PLUS + && GET_CODE (XEXP (XEXP (ad, 0), 0)) == REG + && GET_CODE (XEXP (XEXP (ad, 0), 1)) == REG) + { +#if 1 + rtx rb = XEXP (XEXP (ad, 0), 0); + rtx ri = XEXP (XEXP (ad, 0), 1); + rtx c = XEXP (ad, 1); + + /* Use an indexed address as in the original instruction, +but reload rb+c part. Generate the final form of the +address here, so that we match Z constraint. Use r1 +
Re: LEGITIMIZE_RELOAD_ADDRESS vs address_reloaded
On Fri, Nov 25, 2005 at 07:20:52PM +0100, Ulrich Weigand wrote: > > c) Modify the ppc 'Z' constraint to match the indexed address reload > > generates. This would rely on the pattern we generate in > > LEGITIMIZE_RELOAD_ADDRESS never being generated elsewhere. [snip] > > Overall, I'd tend to prefer something along the lines of (c), in > particular as it would also catch the cases where > LEGITIMIZE_RELOAD_ADDRESS isn't actually involved, as you note: Thanks. I went ahead and implemented this, and yes, the testcase in pr24997 has better code in other places too. -- Alan Modra IBM OzLabs - Linux Technology Centre
Re: Installing libgcj consumes huge amounts of memory
On Sun, Dec 04, 2005 at 12:35:31AM +0100, Gerald Pfeifer wrote: > spawns a recursive make (GNU make 3.80) that consumes some 450MB of memory > and triggers a system load of 12+, basically rendering the machine dead > for about a minute. > > On a different machine with only 512MB + 1GB swap, this time running > FreeBSD 5.3, I cannot install GCC any longer. I noticed something similar on a Linux machine with 512M + 1G swap when remaking libjava after editing some files. Thrashing for around 15 minutes before finally proceeding. -- Alan Modra IBM OzLabs - Linux Technology Centre
Re: Installing libgcj consumes huge amounts of memory
On Sun, Dec 04, 2005 at 11:45:21AM +, Andrew Haley wrote: > Alan Modra writes: > > On Sun, Dec 04, 2005 at 12:35:31AM +0100, Gerald Pfeifer wrote: > > > spawns a recursive make (GNU make 3.80) that consumes some 450MB of > memory > > > and triggers a system load of 12+, basically rendering the machine dead > > > for about a minute. > > > > > > On a different machine with only 512MB + 1GB swap, this time running > > > FreeBSD 5.3, I cannot install GCC any longer. > > > > I noticed something similar on a Linux machine with 512M + 1G swap when > > remaking libjava after editing some files. Thrashing for around 15 > > minutes before finally proceeding. > > This might be make or the disastrously slow split-for-gcj shell It was "make". I forget the exact figures, but I saw approx the same mem usage as Gerald. -- Alan Modra IBM OzLabs - Linux Technology Centre
Re: Silly GIT related question
On Wed, Jan 15, 2020 at 03:11:13AM +, Gary Oblock wrote: > If you just do a clone and don't checkout a branch, is this equivalent > the top of the trunk in the old scheme? Yes. More details in "git help clone". -- Alan Modra Australia Development Lab, IBM
Re: Git ChangeLog policy for GCC Testsuite inquiry
On Fri, Feb 07, 2020 at 10:08:25AM +, Jonathan Wakely wrote: > With Git you can't really have unwanted local commits present in a > tree if you use a sensible workflow, so if you tested in a tree that > was at commit 1234abcd and you push from another machine that is at > the same commit, you know there are no unintended differences. Maybe I don't have a sensible workflow, but often with lots of tiddly little binutils patches I don't bother with branches for everything. (I do use branches for larger work.) I also like to test my patches. I'll test individually on a few relevant targets but do a test over a large number of targets (162 currently) for a bunch of patches. Some of those patches tested might not be ready for commit upstream (lacking comments, changelogs, even lacking that vital self review), so I'll "git rebase -i" to put the ones that are ready first, then "git push origin :master" just to push up to the relevant commit. That works quite well for me. -- Alan Modra Australia Development Lab, IBM
set_src_cost lying comment
set_src_cost says it is supposed to

/* Return the cost of moving X into a register, relative to the cost
   of a register move.  SPEED_P is true if optimizing for speed rather
   than size.  */

Now, set_src_cost of a register move (set (reg1) (reg2)), is zero. Why? Well, set_src_cost is used just on the right hand side of a SET, so the cost is that of (reg2), which is zero according to rtlanal.c rtx_cost. targetm.rtx_costs doesn't get a chance to modify this.

Now consider (set (reg1) (ior (reg2) (reg3))), for which set_src_cost on rs6000 currently returns COSTS_N_INSNS(1). It seems to me that this also ought to return zero, if the set_src_cost comment is to be believed. I'd claim the right hand side of this expression costs the same as a register move. A register move machine insn "mr reg1,reg2" is encoded as "or reg1,reg2,reg2" on rs6000!

Continuing in the same vein, an AND is no more expensive than an IOR, and similarly for other ALU operations. So they all ought to cost zero?? But this is ridiculous since set_src_cost is used in many places as the cost of an entire insn, eg. synth_mult compares the cost of implementing a multiply as a series of adds and shifts against the cost of a multiply. If all those adds and shifts are costed at zero, then synth_mult can't do its job.

So what should that comment say?

--
Alan Modra
Australia Development Lab, IBM
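The synth_mult point can be made concrete with a toy cost comparison. This is a sketch only: struct toy_costs, shift_add_cost and prefer_shift_add are hypothetical names, the cost figures are invented, and the decomposition (one shifted term per set bit of the constant) is far simpler than what synth_mult really does. Only the COSTS_N_INSNS scaling mirrors GCC's rtl.h.

```c
#include <assert.h>

/* Same scaling GCC uses: costs are expressed in quarter-insn units.  */
#define COSTS_N_INSNS(N) ((N) * 4)

/* Hypothetical per-operation costs, not taken from any real backend.  */
struct toy_costs {
    int add;     /* reg + reg     */
    int shift;   /* reg << const  */
    int mult;    /* reg * const   */
};

/* Cost of computing x * c as a sum of shifted copies of x, one term
   per set bit of c.  Bit 0 needs no shift and the first term needs
   no add.  */
static int shift_add_cost(unsigned int c, const struct toy_costs *costs)
{
    int shifts = 0, adds = -1;
    unsigned int bit;

    assert(c != 0);
    for (bit = 0; c != 0; bit++, c >>= 1)
        if (c & 1u) {
            adds++;
            if (bit > 0)
                shifts++;
        }
    return shifts * costs->shift + adds * costs->add;
}

/* Nonzero when the shift/add sequence looks cheaper than a multiply.  */
static int prefer_shift_add(unsigned int c, const struct toy_costs *costs)
{
    return shift_add_cost(c, costs) < costs->mult;
}
```

With adds and shifts at COSTS_N_INSNS(1) and a multiply at COSTS_N_INSNS(4), multiplying by 10 favours the shift/add sequence while multiplying by 255 favours the multiply. With adds and shifts costed at zero, every constant favours the shift/add sequence, however long it is, which is the failure mode described above.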
Re: set_src_cost lying comment
On Tue, Jun 23, 2015 at 11:05:45PM -0600, Jeff Law wrote:
> I certainly agree that the cost of a move, logicals and arithmetic is
> essentially the same at the chip level for many processors. But a copy has
> other properties that make it "cheaper" -- namely we can often propagate it
> away or arrange for the source & dest of the copy to have the same hard
> register which achieves the same effect.
>
> So one could argue that a copy should have cost 0 as it has a reasonable
> chance of just going away, while logicals, alu operations on the appropriate
> chips should have a cost of 1.

That's an interesting point, and perhaps true for rtl expansion. I'm not so sure it is correct for later rtl passes where you'd like to discourage register moves..

Case in point: The rs6000 backend happens to use zero for the cost of setting registers to simple constants. That might be an accident, but when I fixed this by making (set (reg) (const_int)) cost one insn as it actually does for a range of constants, I found some call sequences regressed. A call like foo(0,0) is better as

  (set (reg r3) (const_int 0))    li 3,0
  (set (reg r4) (const_int 0))    li 4,0
  (call ...)                      bl foo

rather than

  (set (reg r3) (const_int 0))    li 3,0
  (set (reg r4) (reg r3))         mr 4,3
  (call ...)                      bl foo

CSE will say the second sequence is cheaper if loading a constant is more expensive than a copy. In reality the second sequence is less preferable since you have a register dependency.

A similar problem happens with foo(x+1,x+1) which currently emits

  (set (reg r3) (plus (reg x) (const_int 1)))
  (set (reg r4) (reg r3))

for the arg setup insns. On modern processors it would be better as

  (set (reg r3) (plus (reg x) (const_int 1)))
  (set (reg r4) (plus (reg x) (const_int 1)))

So in these examples we'd really like register moves to cost one insn. Hmm, at least, moves from hard regs ought to cost something.

--
Alan Modra
Australia Development Lab, IBM
rtx_cost of insns
On Thu, Jun 25, 2015 at 01:28:39PM +0100, Richard Earnshaw wrote: > Perhaps the best thing to do is to use the OUTER code to spot the > specific case where you've got a SET and return non-zero in that case. That's exactly the path I've been following. It's not as easy as it sounds.. First, some backends call rtx_cost from their targetm.rtx_costs. ix86_rtx_costs for instance has this case PLUS: ... if (val == 2 || val == 4 || val == 8) { *total = cost->lea; *total += rtx_cost (XEXP (XEXP (x, 0), 1), outer_code, opno, speed); *total += rtx_cost (XEXP (XEXP (XEXP (x, 0), 0), 0), outer_code, opno, speed); *total += rtx_cost (XEXP (x, 1), outer_code, opno, speed); return true; } which, when using a non-zero register move cost, results in Successfully matched this instruction: (set (reg:DI 198 [ D.74663 ]) (plus:DI (plus:DI (reg/v/f:DI 172 [ use_entry ]) (reg:DI 196 [ D.74662 ])) (const_int -32 [0xffe0]))) rejecting combination of insns 179 and 180 original costs 6 + 4 = 10 replacement cost 15 So here the x86 backend is calculating the cost of an lea, plus the cost of (reg:DI 196), plus the cost of (reg/v/f:DI 172), plus the cost of (const_int -32). outer_code is SET. That means we add two register moves, increasing the overall cost from 7 to 15. The second problem I've hit is that fwprop.c:should_replace_address has this: /* If the addresses have equivalent cost, prefer the new address if it has the highest `set_src_cost'. That has the potential of eliminating the most insns without additional costs, and it is the same that cse.c used to do. */ if (gain == 0) gain = (set_src_cost (new_rtx, VOIDmode, speed) - set_src_cost (old_rtx, VOIDmode, speed)); return (gain > 0); If register moves have the same cost as adding a small constant to a register, then this code no longer replaces a pseudo with its value as an offset from a base. 
I think this particular problem can be fixed quite simply by "return gain >= 0;", but really, this code, like the x86 code, is expecting the cost of a register move to be zero. You'll notice that these example problems are not trying to cost a whole instruction. In both cases they want the cost of just a piece of an instruction, but rtx_cost is called in a way that is indistinguishable from other code that calls rtx_cost on whole register move instructions. The real difficulty is in separating out the whole insn cases from the partial insn cases. Note that we already have insn_rtx_cost, and it returns a minimum cost for a SET, so register move insns get a cost of 1 insn. However, despite insn_rtx_cost starting life in combine.c, even combine doesn't use it in all whole insn cases. :-( -- Alan Modra Australia Development Lab, IBM
Re: rtx_cost of insns
On Mon, Jun 29, 2015 at 09:34:40AM -0500, Segher Boessenkool wrote: > On Mon, Jun 29, 2015 at 05:16:39PM +0930, Alan Modra wrote: > > Note that we already have insn_rtx_cost, and it returns a minimum cost > > for a SET, so register move insns get a cost of 1 insn. However, > > despite insn_rtx_cost starting life in combine.c, even combine doesn't > > use it in all whole insn cases. :-( > > In what cases does it not? Practically all of the occurrences of set_src_cost in combine.c can be called on whole insns. By "whole insn" I mean of course the right hand side of a set, or a single set inside a parallel. I'm not saying that this causes trouble, since I haven't seen a register move there (but I haven't looked very hard either). -- Alan Modra Australia Development Lab, IBM
Re: configure.{in -> ac} rename (commit 35eafcc71b) broke in-tree binutils building of gcc
On Tue, Jul 14, 2015 at 10:13:06AM +0100, Jan Beulich wrote: > Alan, gcc maintainers, > > I was quite surprised for my gcc 4.9.3 build (using binutils 2.25 instead > of 2.24 as I had in use with 4.9.2) to fail in rather obscure ways. Quite > a bit of digging resulted in me finding that gcc/configure.ac looks for > configure.in in a number of binutils subtrees. I haven't used combined tree builds of binutils+gcc for a very long time, so this issue wasn't on my radar at all, sorry. > Globally replacing > configure.in by configure.[ai][cn] appears to address this, but I'm not > sure whether that would be an acceptable change Certainly sounds reasonable. > (there doesn't seem > to be a fix for this in gcc trunk either, which I originally expected I could > simply backport). The configure.in->configure.ac rename happened over a year ago so I guess this shows that not too many people use combined binutils+gcc builds nowadays. I've always found combined binutils+gcc builds not worth the bother compared to simply building and installing binutils first, as Jim suggests. -- Alan Modra Australia Development Lab, IBM
Re: CFI directives and dynamic stack alignment
On Mon, Aug 03, 2015 at 02:48:09PM -0700, Steve Ellcey wrote:
> When I generate code to dynamically align the stack my code looks like
> this:
>
> fn2:
>         .frame  $fp,32,$31      # vars= 0, regs= 2/0, args= 16, gp= 8
>         .mask   0xc000,-4
>         .fmask  0x,0
>         .set    noreorder
>         .set    nomacro
>         lui     $2,%hi(null)
>         li      $3,-16          # 0xfff0
>         lw      $2,%lo(null)($2)
>         and     $sp,$sp,$3
>         addiu   $sp,$sp,-32
>         .cfi_def_cfa_offset 32
>         sw      $fp,24($sp)
>         .cfi_offset 30, -8
>         move    $fp,$sp
>         .cfi_def_cfa_register 30
>         sw      $31,28($sp)
>         .cfi_offset 31, -4
>         jal     abort
>         sb      $0,0($2)
>
> The 'and' instruction is where the stack gets aligned and if I remove that
> one instruction, everything works. I think I need to put out some new CFI
> psuedo-ops to handle this but I am not sure what they should be. I am just
> not very familiar with the CFI directives.

I don't speak mips assembly very well, but it looks to me that you have more than just CFI problems. How do you restore sp on return from the function, assuming sp wasn't 16-byte aligned to begin with? Past that "and $sp,$sp,$3" you don't have any means of calculating the original value of sp! (Which of course is why you also can't find a way of representing the frame address.)

-- Alan Modra Australia Development Lab, IBM
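The point about the lost stack pointer can be illustrated outside of assembly. What follows is a minimal sketch in C, not code from the thread: align_down and alignment_loses_information are hypothetical helpers that mimic the effect of "and $sp,$sp,-16" on a plain integer. Two different incoming values map to the same aligned result, so the original value cannot be recomputed from the aligned one and has to be saved somewhere before the masking step.

```c
#include <stdint.h>

/* Hypothetical helper: round an address down to a power-of-two boundary,
   the same operation "and $sp,$sp,-16" performs when align is 16.  */
uintptr_t
align_down (uintptr_t sp, uintptr_t align)
{
  return sp & ~(align - 1);
}

/* Returns 1 when two distinct inputs collapse to the same aligned value,
   i.e. when the masking step has thrown away the low bits.  */
int
alignment_loses_information (uintptr_t sp_a, uintptr_t sp_b, uintptr_t align)
{
  return sp_a != sp_b && align_down (sp_a, align) == align_down (sp_b, align);
}
```

For instance, 0x7fff1238 and 0x7fff1230 both align down to 0x7fff1230 with a 16-byte boundary, which is why an unwinder needs the pre-alignment value kept in a register or stack slot that the CFI can describe.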
Re: CFI directives and dynamic stack alignment
On Mon, Aug 17, 2015 at 10:38:22AM -0700, Steve Ellcey wrote: > On Tue, 2015-08-11 at 10:05 +0930, Alan Modra wrote: > > > > The 'and' instruction is where the stack gets aligned and if I remove that > > > one instruction, everything works. I think I need to put out some new CFI > > > psuedo-ops to handle this but I am not sure what they should be. I am > > > just > > > not very familiar with the CFI directives. > > > > I don't speak mips assembly very well, but it looks to me that you > > have more than just CFI problems. How do you restore sp on return > > from the function, assuming sp wasn't 16-byte aligned to begin with? > > Past that "and $sp,$sp,$3" you don't have any means of calculating > > the original value of sp! (Which of course is why you also can't find > > a way of representing the frame address.) > > I have code in expand_prologue that copies the incoming stack pointer to > a temporary hard register and then I have code to the entry_block to > copy that register into a virtual register. In the exit block that > virtual register is copied back to a temporary hard register and > expand_epilogue copies it back to $sp to restore the stack pointer. OK, then you need to emit a .cfi directive to say the frame top is given by the temp hard reg sometime after that assignment and before sp is aligned in the prologue, and another .cfi directive when copying to the pseudo. It's a while since I looked at the CFI code in gcc, but arranging this might be as simple as setting RTX_FRAME_RELATED_P on the insns involved. If -fasynchronous-unwind-tables, then you'll also need to track the frame in the epilogue. > This function (fn2) ends with a call to abort, which is noreturn, so the > optimizer sees that the epilogue is dead code and GCC determines that > there is no need to save the old stack pointer since it will never get > restored. I guess I need to tell GCC to save the stack pointer in > expand_prologue even if it never sees a use for it. 
I guess I need to > make the temporary register where I save $sp volatile or do something > else so that the assignment (and its associated .cfi) is not deleted by > the optimizer. Ah, I see. Yes, the temp and pseudo are not really dead if they are needed for unwinding. -- Alan Modra Australia Development Lab, IBM
Re: Adding static-PIE support to binutils
On Tue, Aug 18, 2015 at 08:58:43PM -0400, Rich Felker wrote: > I've updated the patch to cover the changes needed for all the > elf??-*.c target files (lots of code duplication already there), skip > the clearing of command_line.interpreter, and based it on current git > master with your output_type changes. This is OK to commit with a suitable ChangeLog. I think a separate ld option is best too, because historically -static and its aliases -Bstatic, -dn, -non_shared really are about what type of libraries are accepted rather than choosing linker output type. -- Alan Modra Australia Development Lab, IBM
Re: ppc eabi float arguments
On Tue, Sep 22, 2015 at 07:39:43PM +0200, Bernhard Schommer wrote: > Does anyone know the reason why the gcc passes the argument as single float? That's how the first powerpc gcc implementation behaved. Once gcc compiled code is out in the field, you need to ask everyone to recompile their code in order to fix an ABI problem. That may be more disruptive than just leaving gcc incompatible with other compilers. -- Alan Modra Australia Development Lab, IBM
Re: ppc eabi float arguments
On Wed, Sep 23, 2015 at 07:09:43PM -0400, Michael Meissner wrote: > On Tue, Sep 22, 2015 at 01:43:55PM -0400, David Edelsohn wrote: > > On Tue, Sep 22, 2015 at 1:39 PM, Bernhard Schommer > > wrote: > > > Hi, > > > > > > if been working with the windriver Diab c compiler for 32bit ppc for and > > > encountered an incompatibly with the eabi version of the gcc 4.83. When > > > calling functions with more than 8 float arguments the gcc stores the 9th > > > float argument (and so on) as a float where as the diab compiler stores > > > the > > > argument as a double using 8 byte. > > > > > > I checked the EABI document and it seems to support the way the diab > > > compiler passes the arguments: > > > > > > "Arguments not otherwise handled above [i.e. not passed in registers] > > > are passed in the parameter words of the caller's stack frame. > > > [...] > > > float, long long (where implemented), and double arguments are > > > considered to have 8-byte size and alignment, *with float arguments > > > converted to double representation*. " > > > > > > Does anyone know the reason why the gcc passes the argument as single > > > float? > > > > Hi, Bernhard > > > > First, are you certain that you have the final version of the 32 bit > > PPC eABI? There were a few versions in circulation. > > > > Mike may remember the history of this. > > Well I worked on it around 1980 or so. I don't remember the details (nor do I > have the original manuals I was working from). From this distance, it sure > looks like a bug, but I'm not sure whether it should be fixed or > grand-fathered > in (and updating the stdargs.h support, if this is the offical calling > sequence). I recall this question coming up before, and we decided to leave gcc as is so that new ppc32 gcc code stayed compatible with old ppc32 gcc code. Also, even if we were starting with a clean slate, we might want to pass floats without promoting to double: Stack frames are potentially smaller. 
Against that is the fact that we promote to double when calling an unprototyped function, so you'll run into trouble trying to define a function with more than eight float args if writing K&R code. Old programmers tend to know about such issues, and don't use float function parameters in K&R code. :) Incidentally, there are other rather more nasty parameter passing problems with ppc32, ones I would have liked to fix. For instance, "complex double" is passed in 4 gprs. -- Alan Modra Australia Development Lab, IBM
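For anyone wanting to reproduce the case under discussion, here is a minimal sketch in C of a prototyped function with nine float parameters. The names sum9 and call_sum9 are made up for illustration and do not come from the thread. Going by the report above, the ninth argument is the one that ends up on the stack on ppc32: as a 4-byte float with gcc, as an 8-byte double with Diab. The C itself is portable; the difference would only show up in the code generated for that target.

```c
/* Hypothetical test function: nine float parameters, prototyped.
   Going by the report in this thread, the first eight are passed in
   registers on ppc32 and the ninth (i) is passed on the stack.  */
float
sum9 (float a, float b, float c, float d, float e,
      float f, float g, float h, float i)
{
  return a + b + c + d + e + f + g + h + i;
}

/* Caller with the prototype in scope, so the C language does not
   require float->double promotion for any of the arguments.  */
float
call_sum9 (void)
{
  return sum9 (1.0f, 2.0f, 3.0f, 4.0f, 5.0f, 6.0f, 7.0f, 8.0f, 9.0f);
}
```

Comparing the assembly each compiler emits for call_sum9 should show how the ninth argument is stored, assuming both compilers are available for the same target.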
Re: September 2015 GNU Toolchain Update
On Fri, Sep 25, 2015 at 01:33:34PM +0100, Nick Clifton wrote: > * The new PowerPC64 specific linker command line option > --no-save-restore-funcs tells the linker not to provide the > out-of-line register save and restore functions used by -Os compiled > code. The default is to provide any such referenced function for > a normal final link, but not do so for a relocatable link. Actually, --save-restore-funcs and --no-save-restore-funcs have been around since 2014-02. The recent new PowerPC64 option is --tls-get-addr-optimize, a complement to --no-tls-get-addr-optimize. -- Alan Modra Australia Development Lab, IBM
Re: Some aliasing questions
On Fri, Apr 08, 2016 at 01:41:05PM -0700, Richard Henderson wrote: > On 04/08/2016 11:10 AM, Bill Schmidt wrote: > > The first is an issue with TOC-relative addresses on PowerPC. These are > > symbolic addresses that are to be loaded from a fixed slot in the table > > of contents, as addressed by the TOC pointer (r2). In the RTL phases > > prior to register allocation, these are described in an UNSPEC that > > looks like this for an example store: > > > > (set (mem/c:DI (unspec:DI [ > >(symbol_ref:DI ("*.LANCHOR0") [flags 0x182]) > >(reg:DI 2 2) > > ] UNSPEC_TOCREL) [1 svul+0 S8 A128]) > > (reg:DI 178)) > > > > The UNSPEC helps keep track of the r2 reference until this is split into > > two or more insns depending on the memory model. > > > That's why Alpha uses LO_SUM for pre-reload tracking of such things. > > Even though that's a bit of a liberty, since there's no HIGH to go along with > the LO_SUM. But at least it allows the middle-end to continue to find the > symbol. I wish I'd been made aware of the problem with alias analysis when I invented this scheme for -mcmodel=medium code.. Back in gcc-4.3 days, when small-model code was the only option, we used to generate mem (plus ((reg 2) (const (minus ((symbol_ref) (symbol_ref toc_base)) for a toc mem reference, which accurately reflects the addressing. The problem is that when splitting this to a high/lo_sum you lose the r2 reference in the lo_sum, and that allows r2 to die prematurely, breaking an important linker code editing optimisation. Hmm. Maybe if we rewrote the mem to mem (plus ((symbol_ref toc_base) (const (minus ((symbol_ref) (reg 2)) It might look odd, but is no lie. r2 is equal to toc_base. Or perhaps we could lie a little and simply omit the plus and toc_base reference? Either way, when we split to set (reg tmp) (high (const (minus ((symbol_ref) (reg 2) .. 
mem (lo_sum (reg tmp) (const (minus ((symbol_ref) (reg 2) both high and lo_sum reference r2 and the linker could happily replace rtmp in the lo_sum insn with r2 when the high address is known to be zero. Bill, do you have test cases for the alias problem? Is this something that needs fixing for gcc-6? -- Alan Modra Australia Development Lab, IBM
Re: Preventing preemption of 'protected' symbols in GNU ld 2.26 [aka should we revert the fix for 65248]
On Mon, Apr 18, 2016 at 11:01:48AM +0200, Richard Biener wrote: > To summarize: there is currently no testcase for a wrong-code issue > because there is no wrong-code issue. That depends entirely on how far you are willing to bend the ELF gABI. Any testcase that takes the address of a protected visibility variable defined in a shared library now can get the wrong answer, since you can argue that any address outside the shared library is wrong according to the gABI. I expect you can also write a testcase using a const protected var in a shared library that ought to segfault on writing to the var from code within the shared library, that now merrily writes to a .dynbss copy. -- Alan Modra Australia Development Lab, IBM
Re: Preventing preemption of 'protected' symbols in GNU ld 2.26 [aka should we revert the fix for 65248]
On Mon, Apr 18, 2016 at 07:59:50AM -0700, H.J. Lu wrote: > On Mon, Apr 18, 2016 at 7:49 AM, Alan Modra wrote: > > On Mon, Apr 18, 2016 at 11:01:48AM +0200, Richard Biener wrote: > >> To summarize: there is currently no testcase for a wrong-code issue > >> because there is no wrong-code issue. I've added a testcase at https://sourceware.org/bugzilla/show_bug.cgi?id=19965#c3 that shows the address problem (&x != x) with older gcc *or* older glibc, and shows the program behaviour problem with current binutils+gcc+glibc. -- Alan Modra Australia Development Lab, IBM
Re: Preventing preemption of 'protected' symbols in GNU ld 2.26 [aka should we revert the fix for 65248]
On Tue, Apr 19, 2016 at 10:20:23AM +0200, Richard Biener wrote: > On Tue, Apr 19, 2016 at 7:08 AM, Alan Modra wrote: > > On Mon, Apr 18, 2016 at 07:59:50AM -0700, H.J. Lu wrote: > >> On Mon, Apr 18, 2016 at 7:49 AM, Alan Modra wrote: > >> > On Mon, Apr 18, 2016 at 11:01:48AM +0200, Richard Biener wrote: > >> >> To summarize: there is currently no testcase for a wrong-code issue > >> >> because there is no wrong-code issue. > > > > I've added a testcase at > > https://sourceware.org/bugzilla/show_bug.cgi?id=19965#c3 > > that shows the address problem (&x != x) with older gcc *or* older > > glibc, and shows the program behaviour problem with current > > binutils+gcc+glibc. > > Thanks. > > So with all this it sounds that current protected visibility is just broken > and we should forgo with it, making it equal to default visibility? Well, using protected visibility variables makes no sense in executables. They really are only useful in shared libraries, but have been of limited use on architectures like x86 for a long time due to non-PIC executable copying shared library variables into .dynbss. The concepts of copying variables into .dynbss, and protected visibility, are fundamentally incompatible. HJ's changes addressed the program level semantic issues, but in the process lost the main reason to use protected visibility variables, which is to tell a compiler that a global variable cannot be preempted (and therefore can use faster code for access, typically pc or GOT pointer relative rather than GOT indirect.) So IMO, "of limited use" has now become "not much use at all" on x86_64 and other architectures that have blindly followed suit. > At least I couldn't decipher a solution that solves all of the issues > with protected visibility apart from trying to error at link-time > (or runtime?) for the cases that are tricky (impossible?) to solve. I described the problem and solutions in https://sourceware.org/ml/binutils/2016-03/msg00431.html. 
A followup by Cary pointed out that one of the solutions, emitting text dynamic relocations, won't work on some architectures (of which x86_64 is one). > glibc uses "protected visibility" via its using of local aliases, correct? Yes, glibc defines a hidden visibility symbol for internal use, with an exported alias. > But it doesn't use anything like that for data symbols? I believe it does. See occurrences of libc_hidden_data_def. -- Alan Modra Australia Development Lab, IBM
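As a reminder of what is being discussed, this is roughly how a protected visibility variable is declared with GCC. It is a sketch only: lib_counter and the two accessor functions are invented for illustration. The behaviour the thread argues about needs this code built into a shared library and linked with a non-PIC executable that also references lib_counter, which a single file cannot show.

```c
/* Hypothetical shared-library source.  Protected visibility: the symbol
   is exported, but references from inside the defining library are
   supposed to bind to this definition and not be preempted.  */
__attribute__ ((visibility ("protected"))) int lib_counter = 42;

/* Access from inside the library.  The compiler may use pc-relative or
   GOT-pointer-relative addressing here instead of going via the GOT.  */
int
lib_read_counter (void)
{
  return lib_counter;
}

/* The address as seen from inside the library.  The address problem in
   this thread is about this value differing from &lib_counter as seen
   by the executable when a .dynbss copy exists.  */
int *
lib_counter_address (void)
{
  return &lib_counter;
}
```

Built as part of an ordinary executable these functions behave like any other global access; the incompatibility described above only arises once a copy relocation is involved.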
Re: Preventing preemption of 'protected' symbols in GNU ld 2.26 [aka should we revert the fix for 65248]
On Mon, Apr 25, 2016 at 11:35:46AM -0600, Jeff Law wrote: > No, we revert to the gcc-4.9 behavior WRT protected visibility and ensure > that we're getting a proper diagnostic from the linker. > > That direction is consistent with the intent of protected visibility, fixes > the problem with preemption of protected symbols and gives us a diagnostic > for the case that can't be reasonably handled. I agree that this is the correct solution. Unfortunately there is a complication. PIE + shared lib using protected visibility worked fine with gcc-4.9, but since then code generated by gcc for PIEs on x86_64 has been optimized to rely on the horrible old hack of .dynbss and copy relocations. That means you'll have regressions from 4.9 if just reverting the protected visibility change.. The PIE optimization will need reverting too, and I imagine you'll see some resistance to that idea due to the fact that it delivers quite a nice performance improvement for PIEs. -- Alan Modra Australia Development Lab, IBM
GNU indirect functions vs. symbol visibility
Discussion started here: https://gcc.gnu.org/ml/gcc-patches/2016-08/msg01678.html On Wed, Aug 24, 2016 at 08:51:16PM +0300, Alexander Monakov wrote: > On Wed, 24 Aug 2016, Alan Modra wrote: > > Given a hidden visibility function declaration, the toolchain can say > > that the function is local to the module. This generally means that a > > call to the function can be direct, ie. doesn't need to go via the PLT > > even in a shared library. However, ifunc breaks this promise. GNU > > indirect functions may resolve non-locally, and are implemented by > > always using a PLT call. > > > > This causes trouble for targets like ppc32 where the -msecure-plt PIC > > PLT call stub needs a valid GOT pointer. Any call that potentially > > might be to an ifunc therefore requires a GOT pointer, and can't be a > > sibling call (because the GOT pointer on ppc32 is a caller saved reg). > > The same issue exists on 32-bit x86: PLT calls require that %ebx holds the > address of GOT (and the sibcall issue arises as well). I've just confirmed > using a simple testcase that the scenario you describe leads to a runtime > error > on i386, and even LD_BIND_NOW=1 doesn't help, as it doesn't trigger early > resolution of ifuncs. I'm happy to see that ppc32 isn't alone. ;-) > > So unless we require that all ifuncs are declared as ifunc, > > (note, that would be impossible with today's GCC because the ifunc attribute > requires designating the resolver, and the resolver cannot be extern -- so > ultimately you cannot declare an extern-ifunc symbol) > > > it seems that ppc32 can't assume extern or weak functions are local. > > It doesn't seem nice to penalize all normal calls due to this issue. I whole-heartedly agree. > I think a > solution without this cost is possible: have ld synthesize a forwarder > function > when it sees a non-plt call to an ifunc symbol. The forwarder can push the GOT > register, load the GOT address, call the ifunc symbol, pop the GOT register > and > return. 
Does this sound right? I'd considered this idea too. It should work, but isn't ideal. The resulting code will be slower than if the ifuncs were simply not declared hidden. The idea also isn't quite as simple to implement as it might seem, since frame unwinding must work through any such stub, and gdb probably would need to know about them too. I prefer to simply make ld error on seeing calls to ifuncs where it detects that such a stub would be needed. ppc32 GNU ld should do that reliably as of git commit 888a7fc3. glibc people: As the main user of ifuncs, how do you feel about not declaring functions hidden that are implemented in glibc by ifuncs? It's fine to make them hidden via a version script, or even define them as hidden (which requires just the rs6000_elf_encode_section_info part of my gcc patch to make ppc32 behave). -- Alan Modra Australia Development Lab, IBM
Re: GNU indirect functions vs. symbol visibility
On Thu, Aug 25, 2016 at 01:36:53PM +0200, Florian Weimer wrote: > * Alan Modra: > > > glibc people: As the main user of ifuncs, how do you feel about not > > declaring functions hidden that are implemented in glibc by ifuncs? > > We have run into this before, I think: > > <https://sourceware.org/ml/libc-alpha/2016-07/msg00089.html> Yes, this is exactly the same problem, a hidden visibility prototype with an ifunc definition. Don't add the visibility attribute to the prototype and the problem will no longer occur. Also note that adding hidden visibility to a prototype that has an ifunc definition in glibc gives no benefit on targets that can handle this situation. The difficulty of course is that where glibc does not provide an ifunc implementation you *do* want the hidden visibility attribute, and whether or not ifuncs are used varies from target to target. > > It's fine to make them hidden via a version script, or even define > > them as hidden (which requires just the rs6000_elf_encode_section_info > > part of my gcc patch to make ppc32 behave). > > If it doesn't work, we'd certainly prefer an early diagnostic. Right. https://sourceware.org/bugzilla/show_bug.cgi?id=20515 opened. -- Alan Modra Australia Development Lab, IBM
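For reference, a GNU indirect function is declared roughly as follows. This is a minimal sketch assuming GCC on a GNU/Linux target with ifunc support; add_one, add_one_generic and resolve_add_one are invented names, not glibc code. Note that the add_one declaration carries no hidden visibility attribute, in line with the advice above.

```c
/* One possible implementation.  A real resolver would typically choose
   between several, based on hardware capabilities.  */
static int
add_one_generic (int x)
{
  return x + 1;
}

/* The resolver is run by the dynamic loader when it processes
   relocations, and returns the implementation to use.  It has to be
   defined in the same translation unit as the ifunc declaration.  */
static int (*resolve_add_one (void)) (int)
{
  return add_one_generic;
}

/* The ifunc itself, left with default visibility.  As described in the
   thread, calls to it are implemented via the PLT.  */
int add_one (int) __attribute__ ((ifunc ("resolve_add_one")));
```

A caller simply writes add_one (41); whether that call needs a valid GOT pointer is the target-specific detail the thread is about.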
[RFC] PR61300 K&R incoming args
g on the value of ACCUMULATE_OUTGOING_ARGS, REG_PARM_STACK_SPACE, and OUTGOING_REG_PARM_STACK_SPACE. */ -- Alan Modra Australia Development Lab, IBM
Re: [RFC] PR61300 K&R incoming args
On Fri, May 30, 2014 at 11:27:52AM -0600, Jeff Law wrote: > On 05/26/14 01:38, Alan Modra wrote: > >PR61300 shows a need to differentiate between incoming and outgoing > >REG_PARM_STACK_SPACE for the PowerPC64 ELFv2 ABI, due to code like > >function.c:assign_parm_is_stack_parm determining that a stack home > >is available for incoming args if REG_PARM_STACK_SPACE is non-zero. > > > >Background: The ELFv2 ABI requires a parameter save area only when > >stack is actually used to pass parameters, and since varargs are > >passed on the stack, unprototyped calls must pass both on the stack > >and in registers. OK, easy you say, !prototype_p(fun) means a > >parameter save area is needed. However, a prototype might not be in > >scope when compiling an old K&R style C function body, but this does > >*not* mean a parameter save area has necesasrily been allocated. A > >caller may well have a prototype in scope at the point of the call. > Ugh. This reminds me a lot of the braindamage we had to deal with > in the original PA abi's handling of FP values. > > In the general case, how can any function ever be sure as to whether > or not its prototype was in scope at a call site? Yea, we can know > for things with restricted scope, but if it's externally visible, I > don't see how we're going to know the calling context with absolute > certainty. > > What am I missing here? When compiling the function body you don't need to know whether a prototype was in scope at the call site. You just need to know the rules. :) For functions with variable argument lists, you'll always have a parameter save area. For other functions, whether or not you have a parameter save area just depends on the number of arguments and their types (ie. whether you run out of registers for parameter passing), and you have that whether or not the function is prototyped. A simple example might help clear up any confusion. Given void fun1(int a, int b, double c); void fun2(int a, ...); ... 
fun1 (1, 2, 3.0); fun2 (1, 2, 3.0); A call to fun1 with a prototype in scope won't allocate a parameter save area, and will pass the first arg in r3, the second in r4, and the third in f1. A call to fun2 with a prototype in scope will allocate a parameter save area of 64 bytes (the minimum size of a parameter save area), and will pass the first arg in r3, the second in the second slot of the parameter save area, and the third in the third slot of the parameter save area. Now the first eight slots/double-words of the parameter save area are passed in r3 thru r10, so this means the second arg is actually passed in r4 and the third in r5, not the stack! A call to fun1 or fun2 without a prototype in scope will allocate a parameter save area, and pass the first arg in r3, the second in r4, and the third in both f1 and r5. When compiling fun1 body, the first arg is known to be in r3, the second in r4, and the third in f1, and we don't use the parameter save area for storing incoming args to a stack slot. (At least, after PR61300 is fixed..) It doesn't matter if the parameter save area was allocated or not, we just don't use it. When compiling fun2 body, the first arg is known to be in r3, the second in r4 and the third in r5. Since the function has a variable argument list, registers r4 thru r10 are saved to the parameter save area stack, and we set up our va_list pointer to the second double-word of the parameter save area stack. Of course, code optimisation might lead to removing the saves and using the args in their incoming regs, but this is conceptually what happens. -- Alan Modra Australia Development Lab, IBM
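The two declarations in the example can be fleshed out into something compilable. This is a sketch, not code from the PR: the bodies and the names sum_fixed and sum_variadic are invented, and the register assignments in the comments simply restate the description above for ELFv2.

```c
#include <stdarg.h>

/* Same shape as fun1: fixed argument list.  Per the description above,
   a prototyped call passes a in r3, b in r4 and c in f1.  */
double
sum_fixed (int a, int b, double c)
{
  return a + b + c;
}

/* Same shape as fun2: variable argument list.  Per the description
   above, the callee saves the incoming registers to the parameter save
   area and va_arg walks that area from the second doubleword on.  */
double
sum_variadic (int a, ...)
{
  va_list ap;
  int b;
  double c;

  va_start (ap, a);
  b = va_arg (ap, int);      /* second argument, an int */
  c = va_arg (ap, double);   /* third argument, a double */
  va_end (ap);
  return a + b + c;
}
```

Both sum_fixed (1, 2, 3.0) and sum_variadic (1, 2, 3.0) compute the same value; only the way the callee finds its arguments differs.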
Re: [RFC] PR61300 K&R incoming args
On Fri, May 30, 2014 at 09:22:30PM +0200, Florian Weimer wrote: > On 05/26/2014 09:38 AM, Alan Modra wrote: > > >Background: The ELFv2 ABI requires a parameter save area only when > >stack is actually used to pass parameters, and since varargs are > >passed on the stack, unprototyped calls must pass both on the stack > >and in registers. OK, easy you say, !prototype_p(fun) means a > >parameter save area is needed. However, a prototype might not be in > >scope when compiling an old K&R style C function body, but this does > >*not* mean a parameter save area has necesasrily been allocated. > > It's fine to change ABI when compiling an old-style function > definition for which a prototype exists (relative to the > non-prototype case). It happens on i386, too. That might be so, but when compiling the function body you must assume the worst case, whatever that might be, at the call site. For K&R code, our error was to assume the call was unprototyped (which paradoxically is the best case) when compiling the function body. -- Alan Modra Australia Development Lab, IBM
Re: [RFC] PR61300 K&R incoming args
On Mon, Jun 02, 2014 at 12:00:41PM +0200, Florian Weimer wrote: > On 05/31/2014 08:56 AM, Alan Modra wrote: > > >>It's fine to change ABI when compiling an old-style function > >>definition for which a prototype exists (relative to the > >>non-prototype case). It happens on i386, too. > > > >That might be so, but when compiling the function body you must assume > >the worst case, whatever that might be, at the call site. For K&R > >code, our error was to assume the call was unprototyped (which > >paradoxically is the best case) when compiling the function body. > > Is this really a supported use case? Of course! We still have K&R code lying around, as evidenced by the PR. > I think I remember tracking > down a bug which was related to a lack of float -> double promotion > because the call was prototyped, and the old-style function > definition wasn't. This would have been on, ugh, SPARC. I think > this happened only in certain cases (float arguments, probably). Yes, there are some limitations on parameter types that may be used with unprototyped functions. > Does this trigger more often on ppc64 ELFv2, to the extend it > becomes a quality-of-implementation issue? I'm pretty sure the > standards do not require a particular behavior in such cases. The PR isn't about the sort of parameter mismatch that you seem to be thinking about. The code in question is perfectly legal old-style K&R where there is no float/double or int/long/void * trouble. -- Alan Modra Australia Development Lab, IBM
Re: [RFC] PR61300 K&R incoming args
On Thu, Jun 05, 2014 at 01:19:19PM -0600, Jeff Law wrote: > And so the problem you're trying to solve is that when compiling the > callee. You incorrectly assumed that if there was not a prototype > for the callee's definition that the caller had set up the save area > and that you could flush arguments to it. That's not true in the > case where the caller had a prototype for the callee in-scope (and > the callee was not a varargs function). > > Right? Just want to make sure I understand the problem. Exactly correct. -- Alan Modra Australia Development Lab, IBM
Re: Reload generate invalid instruction on ppc64
On Mon, Aug 04, 2014 at 05:54:04PM -0700, Carrot Wei wrote: > Another problem is in the definition of insn pattern "*movdi_internal64". > > (define_insn "*movdi_internal64" > [(set (match_operand:DI 0 "nonimmediate_operand" > "=Y,r,r,r,r,r,?m,?*d,?*d,r,*h,*h,r,?*wg,r,?*wm") > (match_operand:DI 1 "input_operand" > "r,Y,r,I,L,nF,d,m,d,*h,r,0,*wg,r,*wm,r"))] > "TARGET_POWERPC64 >&& (gpc_reg_operand (operands[0], DImode) >|| gpc_reg_operand (operands[1], DImode))" > > The predicates of this insn pattern allow the moving of an integer to > VSX register, but there is no constraint allow this case. Can this > cause problem in reload? Probably, just as you found with fprs. The underlying issue is that the operand predicates don't match the operand constraints. What's more, you can't make them match without breaking up the insn, or adding a whole lot of extra alternatives. -- Alan Modra Australia Development Lab, IBM
Re: LTO and version scripts
On Mon, Jul 07, 2014 at 11:04:17AM +0200, Richard Biener wrote: > On Mon, Jun 30, 2014 at 2:35 PM, Ulrich Drepper wrote: > > Using LTO to create a DSO works fine (i.e., it performs the expected > > optimizations) for symbols which are marked with visibility > > attributes. It does not work, though, when the symbol is not > > restricted in its visibility in the source file but instead is > > prevented from being exported from the DSO by a version script (ld > > --version-script=FILE). > > > > Is this known? I only found general problems related to linker > > scripts although version script parameters do not cause any other > > failures. > > Yes, I've run into this as well. IMHO the issue is that the linker(s) > do not process the linker script "properly" when handing off > the resolution data to the linker plugin. So it's a linker bug AFAIU. What version linker? In particular, do you have the fix for PR12975? -- Alan Modra Australia Development Lab, IBM
Re: LTO and version scripts
On Tue, Aug 05, 2014 at 08:18:06PM -0400, Ulrich Drepper wrote: > On Tue, Aug 5, 2014 at 12:57 AM, Alan Modra wrote: > > What version linker? In particular, do you have the fix for PR12975? > > The Fedora 19 version. I think it hasn't changed since then which > means it is 2.23.88.0.1-13 (from the RPM version number). No idea > whether that fix is included and unfortunately won't have time to try > before the weekend.

Both Fedora 19 and 20 have the patch needed for this to work. Hmm, I suppose the other thing necessary is a gcc that implements LDPT_GET_SYMBOLS_V2. You may be lacking that. Here's what I see with mainline gcc and ld.

cat > ltoshare.c <<\EOF
int cond (void)
{
  return 0;
}

extern void something (void);

int main (void)
{
  if (cond ())
    something ();
  return 0;
}
EOF
cat > ltoshare.ver <<\EOF
{ global: main; local: *; };
EOF
~/build/gcc-current/gcc/xgcc -B ~/build/gcc-current/gcc/ -B ld/tmpdir/ld -O2 -fPIC -flto -c ltoshare.c
~/build/gcc-current/gcc/xgcc -B ~/build/gcc-current/gcc/ -B ld/tmpdir/ld -shared -flto -o ltoshare.so ltoshare.o
nm -D ltoshare.so | grep something
                 U something
~/build/gcc-current/gcc/xgcc -B ~/build/gcc-current/gcc/ -B ld/tmpdir/ld -shared -flto -o ltoshare.so ltoshare.o -Wl,--version-script=ltoshare.ver
nm -D ltoshare.so | grep something

-- Alan Modra Australia Development Lab, IBM
Re: Failure to dlopen libgomp due to static TLS data
On Thu, Feb 12, 2015 at 12:07:24PM -0500, Rich Felker wrote: > On Thu, Feb 12, 2015 at 08:56:26AM -0800, H.J. Lu wrote: > > On Thu, Feb 12, 2015 at 8:11 AM, Jakub Jelinek wrote: > > > On Thu, Feb 12, 2015 at 11:09:59AM -0500, Rich Felker wrote: > > >> On Thu, Feb 12, 2015 at 04:18:57PM +0100, Ulrich Weigand wrote: > > >> > Hello, > > >> > > > >> > we're running into a problem related to use of initial-exec access to > > >> > TLS variables in dynamically-loaded libraries. Now, in general, this > > >> > is actually not supported. However, there seems to an "inofficial" > > >> > extension that allows selected system libraries to use small amounts > > >> > of static TLS space to allow critical variables to be defined to use > > >> > the initial-exec model even in dynamically-loaded libraries. > > >> > > >> This usage is supposed to be deprecated. Why isn't libgomp using > > >> TLSDESC/gnu2 model? > > > > > > Because it is significantly slower. > > > > And TLSDESC/gnu2 model isn't implemented for x32. > > There are no tests for TLSDESC/gnu2 model in glibc. > > I have no ideas if it works in glibc master on x86-32 or > > x86-64 today. > > Then fixing this should be a priority, IMO. Broken libraries using IE > model "for performance" are a problem that's not going to go away > until TLSDESC gets properly adopted. I posted support for TLSDESC on powerpc back in 2009 (search for powerpc _tls_get_addr call optimization). The patch wasn't reviewed, and I didn't push it because my benchmark tests didn't show much of a gain. Quite possibly I wasn't using the right benchmark. -- Alan Modra Australia Development Lab, IBM
Re: Failure to dlopen libgomp due to static TLS data
On Thu, Feb 12, 2015 at 06:55:30PM -0500, Rich Felker wrote: > On Fri, Feb 13, 2015 at 10:12:11AM +1030, Alan Modra wrote: > > I posted support for TLSDESC on powerpc back in 2009 (search for > > powerpc _tls_get_addr call optimization). The patch wasn't reviewed, > > and I didn't push it because my benchmark tests didn't show a much of > > a gain. Quite possibly I wasn't using the right benchmark. > > Were you measuring static-allocated TLSDESC vs non-TLSDESC GD model? > That's the case where there should be a "big" difference, though I'm > still somewhat skeptical of the benefits in real-world usage cases. I can't remember, sorry, it was too long ago. -- Alan Modra Australia Development Lab, IBM
--disable-shared bootstrap dies building libcc1
On both x86_64-linux and powerpc64-linux, a --disable-shared bootstrap dies with linker errors when building libcc1.so. You can't build a shared library using objects from the static libstdc++ (or any other library built without -fpic/-fPIC). OK, so there is a workaround, specify --disable-plugin too, but shouldn't this be automatic if --disable-shared is given? -- Alan Modra Australia Development Lab, IBM
Re: Undefined Local Symbol on PowerPC
On Wed, Apr 15, 2015 at 04:10:33PM -0500, Joel Sherrill wrote: > Based on the grep, the .4byte directives are referencing a bogus symbol. > > Does this look like a GCC bug? Yes, unless you have some horrible asm there referencing the symbol. -- Alan Modra Australia Development Lab, IBM
[RFC] Combine related fail of gcc.target/powerpc/ti_math1.c
FAIL: gcc.target/powerpc/ti_math1.c scan-assembler-times adde 1 is seen on powerpc64le-linux since somewhere between revision 218587 and 218616. See https://gcc.gnu.org/ml/gcc-testresults/2014-12/msg01287.html and https://gcc.gnu.org/ml/gcc-testresults/2014-12/msg01325.html

A regression hunt fingers one of Segher's 2014-12-10 patches to the rs6000 backend, git commit 0f1bedb4 or svn rev 218595. Segher might argue that generated code is better after this commit, and I'd agree that his change is a good one in general, but even so it would be nice to generate the ideal code. Curiously, the ideal code is generated at -O1, but we regress at -O2..

        before                  after                   ideal (-O1)
add_128:                add_128:                add_128:
        ld 10,0(3)              ld 9,0(3)               ld 9,0(3)
        ld 11,8(3)              ld 10,8(3)              ld 10,8(3)
        addc 8,4,10             addc 3,4,9              addc 3,4,9
        adde 9,5,11             addze 5,5               adde 4,5,10
        mr 3,8                  add 4,5,10              blr
        mr 4,9                  blr
        blr

I went looking into where the addze appeared, and found combine.

Trying 18, 9 -> 24:
Failed to match this instruction:
(set (reg:DI 4 4 [+8 ])
    (plus:DI (plus:DI (reg:DI 5 5 [ val+8 ])
            (reg:DI 76 ca))
        (reg:DI 169 [+8 ])))
Successfully matched this instruction:
(set (reg:DI 167 [ D.2366+8 ])
    (plus:DI (reg:DI 5 5 [ val+8 ])
        (reg:DI 76 ca)))
Successfully matched this instruction:
(set (reg:DI 4 4 [+8 ])
    (plus:DI (reg:DI 167 [ D.2366+8 ])
        (reg:DI 169 [+8 ])))
allowing combination of insns 18, 9 and 24
original costs 4 + 8 + 4 = 16
replacement costs 4 + 4 = 8

Here are the three insns involved, sans source line numbers and notes.

(insn 18 17 4 2 (set (reg:DI 165 [ val+8 ])
        (reg:DI 5 5 [ val+8 ])) {*movdi_internal64})
...
(insn 9 8 23 2 (parallel [
            (set (reg:DI 167 [ D.2366+8 ])
                (plus:DI (plus:DI (reg:DI 165 [ val+8 ])
                        (reg:DI 169 [+8 ]))
                    (reg:DI 76 ca)))
            (clobber (reg:DI 76 ca))
        ]) {*adddi3_carry_in_internal})
...
(insn 24 23 15 2 (set (reg:DI 4 4 [+8 ])
        (reg:DI 167 [ D.2366+8 ])) {*movdi_internal64})

So, a move copying an argument register to a pseudo, one insn from the body of the function, and a move copying a pseudo to a result register. The thought I had was: Is it really combine's business to look at copies from/to ABI mandated hard registers? Isn't removing the copies something that register allocation can do better? If so, then combine is doing unnecessary work.

As a quick hack, I tried the following.

Index: gcc/combine.c
===
--- gcc/combine.c (revision 223431)
+++ gcc/combine.c (working copy)
@@ -1281,6 +1281,16 @@ combine_instructions (rtx_insn *f, unsigned int nr
           if (!NONDEBUG_INSN_P (insn))
             continue;
 
+          if (this_basic_block == EXIT_BLOCK_PTR_FOR_FN (cfun)->prev_bb)
+            {
+              rtx set = single_set (insn);
+              if (set
+                  && REG_P (SET_DEST (set))
+                  && HARD_REGISTER_P (SET_DEST (set))
+                  && REG_P (SET_SRC (set)))
+                continue;
+            }
+
           while (last_combined_insn
                  && last_combined_insn->deleted ())
             last_combined_insn = PREV_INSN (last_combined_insn);

This cures the powerpc64le testcase failure, but Segher said on irc I was risking breaking x86 and other targets. Perhaps that was trying to push me to fix the underlying combine problem. :) In any case, I didn't believe him, and tested the patch on powerpc64le-linux and x86_64-linux. There were no regressions with --languages=all,go, and an objdump -d comparison of gcc/*.o against virgin source shows no unexpected changes. powerpc64le-linux actually shows no changes at all apart from combine.o, while x86_64-linux shows some changes in register allocation and cmove arg swapping with inversion of the condition. There were no extra instructions.

So, is this worth pursuing in order to speed up combine? I'd be inclined to patch create_log_links instead for a proper patch.
Incidentally the underlying problem in combine (well the first one I spotted, there might be more), is that if (flag_expensive_optimizations) { /* Pass pc_rtx so no substitutions are done, just simplifications. */ "simplifies" this i2src (plus:DI (plus:DI (reg:DI 165 [ val+8 ]) (reg:DI 169 [+8 ])) (reg:DI 76 ca)) to this (plus:DI (plus:DI (reg:DI 76 ca) (reg:DI 165 [ val+8 ])) (reg:DI 169 [+8 ])) and the latter has the ca register in the wrong place. So a split is tried and you get addze. I'm working on this. The reordering happens inside simplify_plus_minus. -- Alan Modra Australia Development Lab, IBM
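For readers following the thread, the difference between the adde and addze sequences can be sketched in portable C. This is only an illustrative model under stated assumptions, not GCC code and not the ti_math1.c testcase itself: the u128 struct and the helper names add_128_adde and add_128_addze are made up here, and the carry is computed by hand rather than via the ca register.

```c
#include <stdint.h>

/* Hypothetical two-limb stand-in for a 128-bit integer; not a GCC type.  */
typedef struct { uint64_t lo, hi; } u128;

/* Models the ideal sequence: addc for the low limb, one adde for the high.  */
static u128 add_128_adde (u128 mem, u128 val)
{
  u128 r;
  r.lo = val.lo + mem.lo;          /* addc: low limbs, carry out goes to CA */
  uint64_t ca = r.lo < val.lo;     /* the carry bit, 0 or 1 */
  r.hi = val.hi + mem.hi + ca;     /* adde: a + b + CA in a single insn */
  return r;
}

/* Models the regressed sequence: addze followed by a separate add.  */
static u128 add_128_addze (u128 mem, u128 val)
{
  u128 r;
  r.lo = val.lo + mem.lo;          /* addc */
  uint64_t ca = r.lo < val.lo;
  uint64_t t = val.hi + ca;        /* addze: a + CA */
  r.hi = t + mem.hi;               /* add: the extra insn */
  return r;
}
```

Both helpers return the same value for every input; the second simply needs one more operation for the high limb, which is the regression the scan-assembler test is catching.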
Re: [RFC] Combine related fail of gcc.target/powerpc/ti_math1.c
On Thu, May 21, 2015 at 07:39:16AM -0500, Segher Boessenkool wrote: > On Thu, May 21, 2015 at 08:06:04PM +0930, Alan Modra wrote: > > FAIL: gcc.target/powerpc/ti_math1.c scan-assembler-times adde 1 > > It doesn't trigger on big-endian; what is different? Register dependencies. One of the arguments is in r4,r5, the return value in r3,r4. We calculate the low 64 bits first, which goes to r4 on big-endian, overlapping the argument. > > Trying 18, 9 -> 24: > > Failed to match this instruction: > > (set (reg:DI 4 4 [+8 ]) > > (plus:DI (plus:DI (reg:DI 5 5 [ val+8 ]) > > (reg:DI 76 ca)) > > (reg:DI 169 [+8 ]))) > > For some reason it has the CA reg not last. simplify-rtx.c:simplify_plus_minus_op_data_cmp > I think we should add to > the canonicalisation rules so that fixed regs sort after other regs. > That requires a lot of testing. What if you have two hard regs as above? Which of reg 5 and reg 76 sorts first? If they are sorted by register number, then ca appears in the wrong place. Reverse sorting hard regs might work for this pattern on powerpc, but that seems an odd choice. And if you say hard regs ought to keep their original order in rtl like the above, then it is no more difficult to keep all regs in their original order > > original costs 4 + 8 + 4 = 16 > > replacement costs 4 + 4 = 8 > > Still need to fix the costs as well (but they work as-is; well enough > that is). Yes, I noticed that too. > Are these copies guaranteed to (still) be in this basic block, > after the passes before combine? Did those passes do anything to > prevent moving it? I'm asking because it would be good to use the > same conditions in that case. Something I need to investigate. As I said, the patch was just a quick hack. -- Alan Modra Australia Development Lab, IBM
Re: [RFC] Combine related fail of gcc.target/powerpc/ti_math1.c
On Thu, May 21, 2015 at 01:44:31PM -0500, Segher Boessenkool wrote: > Let's wait for Alan's patch that makes combine not reorder things > unnecessarily, that should take care of it all as far as I see. Patch here https://gcc.gnu.org/ml/gcc-patches/2015-05/msg02055.html It doesn't do anything fancy, just stops gratuitous register reordering. If simplification or canonicalization occurs, then registers may well be reordered. -- Alan Modra Australia Development Lab, IBM
Better info for combine results in worse code generated
opportunity to try this three insn combination, because we've already reduced down to two insns. Does anyone have any clues as to how I might fix this? I'm not keen on adding an insn_and_split to rs6000.md to recognize the 6 -> 8 combination, because one of the aims of the wider patch I was working on was to remove patterns like rotlsi3_64, ashlsi3_64, lshrsi3_64 and ashrsi3_64. Adding patterns in order to remove others doesn't sound like much of a win. -- Alan Modra Australia Development Lab, IBM
Re: Better info for combine results in worse code generated
On Thu, May 28, 2015 at 10:47:53AM -0400, David Edelsohn wrote: > This seems like a problem with the cost model. Rc instructions are > more expensive and should be represented as such in rtx_costs. The record instructions do have a higher cost (8 vs. 4 for normal insns). If the cost is increased I don't think you'll see them generated at all, which would fix my testcase but probably regress others. -- Alan Modra Australia Development Lab, IBM
Re: Better info for combine results in worse code generated
REGNO (x) { - *result = rsp->last_set_sign_bit_copies; + int signbits = rsp->last_set_sign_bit_copies; + signbits -= (GET_MODE_PRECISION (rsp->last_set_mode) + - GET_MODE_PRECISION (mode)); + if (signbits <= 0) + signbits = 1; + *result = signbits; return NULL; } @@ -12716,9 +12723,26 @@ record_value_for_reg (rtx reg, rtx_insn *insn, rtx if (GET_MODE_CLASS (mode) == MODE_INT && HWI_COMPUTABLE_MODE_P (mode)) mode = nonzero_bits_mode; - rsp->last_set_nonzero_bits = nonzero_bits (value, mode); - rsp->last_set_sign_bit_copies - = num_sign_bit_copies (value, GET_MODE (reg)); + unsigned HOST_WIDE_INT nonzero = nonzero_bits (value, mode); +#if defined (WORD_REGISTER_OPERATIONS) && defined (EXTEND_OP) + /* Some operations might be known to zero extend to a wider mode. */ + if (GET_MODE_PRECISION (GET_MODE (reg)) < BITS_PER_WORD + && EXTEND_OP (value) == ZERO_EXTEND) + nonzero &= GET_MODE_MASK (GET_MODE (reg)); +#endif + rsp->last_set_nonzero_bits = nonzero; + unsigned int signbits = num_sign_bit_copies (value, GET_MODE (reg)); +#if defined (WORD_REGISTER_OPERATIONS) && defined (EXTEND_OP) + /* Some operations might be known to sign extend to a wider mode. */ + if (GET_MODE_PRECISION (GET_MODE (reg)) < BITS_PER_WORD + && GET_MODE_CLASS (GET_MODE (reg)) == MODE_INT + && EXTEND_OP (value) == SIGN_EXTEND) + { + rsp->last_set_mode = word_mode; + signbits += BITS_PER_WORD - GET_MODE_PRECISION (GET_MODE (reg)); + } +#endif + rsp->last_set_sign_bit_copies = signbits; } } Index: config/rs6000/rs6000.h === --- config/rs6000/rs6000.h (revision 223725) +++ config/rs6000/rs6000.h (working copy) @@ -2043,6 +2043,23 @@ do { \ on the full register even if a narrower mode is specified. */ #define WORD_REGISTER_OPERATIONS +/* Describe how rtl operations on registers behave on this target when + operating on less than the entire register. */ +#define EXTEND_OP(OP) \ + (GET_MODE (OP) != SImode \ + || !TARGET_POWERPC64\ + ? 
UNKNOWN \ + : (GET_CODE (OP) == AND \ + || GET_CODE (OP) == ZERO_EXTEND \ + || GET_CODE (OP) == ASHIFT \ + || GET_CODE (OP) == ROTATE \ + || GET_CODE (OP) == LSHIFTRT)\ + ? ZERO_EXTEND \ + : (GET_CODE (OP) == SIGN_EXTEND \ + || GET_CODE (OP) == ASHIFTRT)\ + ? SIGN_EXTEND \ + : UNKNOWN) + /* Define if loading in MODE, an integral mode narrower than BITS_PER_WORD will either zero-extend or sign-extend. The value of this macro should be the code that says which one of the two operations is implicitly -- Alan Modra Australia Development Lab, IBM
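The sign-bit bookkeeping in the combine.c hunk above can be illustrated with a small standalone sketch. It assumes nothing about GCC internals beyond the arithmetic visible in the patch: sign_bit_copies and copies_in_narrower_mode are hypothetical helpers written for this note, not functions from combine.c or rtlanal.c.

```c
#include <stdint.h>

/* Count how many of the top bits of the WIDTH-bit value V equal its
   sign bit, including the sign bit itself.  WIDTH is 1..64.  */
static unsigned sign_bit_copies (uint64_t v, unsigned width)
{
  unsigned sign = (v >> (width - 1)) & 1;
  unsigned n = 1;
  while (n < width && ((v >> (width - 1 - n)) & 1) == sign)
    n++;
  return n;
}

/* Mirrors the adjustment in the patch: a count recorded for a mode of
   WIDE_PREC bits is reduced by the precision difference when queried
   in a mode of NARROW_PREC bits, and never drops below 1.  */
static int copies_in_narrower_mode (int copies, int wide_prec,
                                    int narrow_prec)
{
  int n = copies - (wide_prec - narrow_prec);
  return n <= 0 ? 1 : n;
}
```

For example, a value sign extended from 8 bits has 57 sign-bit copies when viewed as 64 bits, and the adjustment yields 25 when the same value is viewed as 32 bits, which agrees with counting directly on the low word.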
Re: Better info for combine results in worse code generated
On Fri, May 29, 2015 at 07:58:38AM -0500, Segher Boessenkool wrote: > On Fri, May 29, 2015 at 12:41:20PM +0930, Alan Modra wrote: > > +/* Describe how rtl operations on registers behave on this target when > > + operating on less than the entire register. */ > > +#define EXTEND_OP(OP) \ > > + (GET_MODE (OP) != SImode \ > > + || !TARGET_POWERPC64\ > > + ? UNKNOWN \ > > + : (GET_CODE (OP) == AND \ > > + || GET_CODE (OP) == ZERO_EXTEND \ > > + || GET_CODE (OP) == ASHIFT \ > > + || GET_CODE (OP) == ROTATE \ > > + || GET_CODE (OP) == LSHIFTRT)\ > > + ? ZERO_EXTEND \ > > + : (GET_CODE (OP) == SIGN_EXTEND \ > > + || GET_CODE (OP) == ASHIFTRT)\ > > + ? SIGN_EXTEND \ > > + : UNKNOWN) > > I think this is too simplistic though. For example, AND with -7 is not > zero-extended (rlwinm rD,rA,0,31,28 sets the high 32 bits of rD to the low > 32 bits of rA). We take some pains in rs6000.md to ensure that the wrap-around case for rlwinm does not occur for TARGET_POWERPC64. You'll find that an SImode AND with any value is in fact zero extending. -- Alan Modra Australia Development Lab, IBM
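The wrap-around rlwinm case quoted above can be modelled in plain C. This is a sketch of the instruction's behaviour on a 64-bit implementation as described in the quoted text (the rotated low word appears in both halves, then a mask is applied), with IBM bit numbering where bit 0 is the most significant bit. The names mask64 and rlwinm_model are invented for this note.

```c
#include <stdint.h>

/* Ones from bit MB through bit ME in IBM numbering (bit 0 is the MSB).
   When MB > ME the mask wraps around through bit 63 and bit 0.  */
static uint64_t mask64 (unsigned mb, unsigned me)
{
  uint64_t from_mb = ~UINT64_C (0) >> mb;        /* bits mb..63 */
  uint64_t to_me = ~UINT64_C (0) << (63 - me);   /* bits 0..me */
  return mb <= me ? (from_mb & to_me) : (from_mb | to_me);
}

/* Model of rlwinm rD,rS,SH,MB,ME on a 64-bit register: rotate the low
   word left by SH (0..31), replicate it into both halves, then AND
   with the mask built from MB+32 and ME+32.  */
static uint64_t rlwinm_model (uint64_t rs, unsigned sh, unsigned mb,
                              unsigned me)
{
  uint32_t lo = (uint32_t) rs;
  uint32_t rot = sh ? (lo << sh) | (lo >> (32 - sh)) : lo;
  uint64_t r = ((uint64_t) rot << 32) | rot;
  return r & mask64 (mb + 32, me + 32);
}
```

With MB=31 and ME=28 the mask wraps and equals -7 as a 64-bit value, so the high word of the result is a copy of the low word of the source rather than zero. A non-wrapping mask such as MB=16, ME=31 clears the high word.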
Re: Better info for combine results in worse code generated
On Fri, May 29, 2015 at 10:00:04AM -0500, Segher Boessenkool wrote: > On Fri, May 29, 2015 at 11:20:08PM +0930, Alan Modra wrote: > > On Fri, May 29, 2015 at 07:58:38AM -0500, Segher Boessenkool wrote: > > > On Fri, May 29, 2015 at 12:41:20PM +0930, Alan Modra wrote: > > > > +/* Describe how rtl operations on registers behave on this target when > > > > + operating on less than the entire register. */ > > > > +#define EXTEND_OP(OP) \ > > > > + (GET_MODE (OP) != SImode \ > > > > + || !TARGET_POWERPC64\ > > > > + ? UNKNOWN \ > > > > + : (GET_CODE (OP) == AND \ > > > > + || GET_CODE (OP) == ZERO_EXTEND \ > > > > + || GET_CODE (OP) == ASHIFT \ > > > > + || GET_CODE (OP) == ROTATE \ > > > > + || GET_CODE (OP) == LSHIFTRT)\ > > > > + ? ZERO_EXTEND \ > > > > + : (GET_CODE (OP) == SIGN_EXTEND \ > > > > + || GET_CODE (OP) == ASHIFTRT)\ > > > > + ? SIGN_EXTEND \ > > > > + : UNKNOWN) > > > > > > I think this is too simplistic though. For example, AND with -7 is not > > > zero-extended (rlwinm rD,rA,0,31,28 sets the high 32 bits of rD to the low > > > 32 bits of rA). > > > > We take some pains in rs6000.md to ensure that the wrap-around case > > for rlwinm does not occur for TARGET_POWERPC64. > > I consider that a bug; it pessimises code. At the time I added the checks for wrap-around, I recall that gcc generated wrong code without the fix. > > You'll find that an > > SImode AND with any value is in fact zero extending. > > int f(int x) { return x & 0xc000; } > > is a counter-example with current trunk (it does a rldicr). Huh, that does look like you've destroyed my claim about SImode AND. -- Alan Modra Australia Development Lab, IBM
Re: Better info for combine results in worse code generated
On Sat, May 30, 2015 at 08:02:20AM -0500, Segher Boessenkool wrote: > On Sat, May 30, 2015 at 10:47:27AM +0930, Alan Modra wrote: > > Huh, that does look like you've destroyed my claim about SImode AND. > > Carefully worded :-) Yes, I wrote it meaning as in refuted an argument, but it also fits the culprit who broke the AND patterns. :-) Unifying andsi_mask with anddi_mask, and the fact that constraints for const_int see VOIDmode rather than the operand mode is why we get rldicr rather than rlwinm. Easily fixed by separating the si/di patterns, and with a little more work I may even be able to keep them together. There are some other problems too. In and3 expander I think you want the following since and64_2_operand covers the extra double-rotate cases, not all DImode. - if ((mode == DImode && !and64_2_operand (operands[2], mode)) - || (mode != DImode && !and_operand (operands[2], mode))) + if (!and_operand (operands[2], mode) + && (mode != DImode || !and64_2_operand (operands[2], mode))) In and3_imm_mask_dot and and3_imm_mask_dot2. Typo? - && any_mask_operand (operands[2], mode)" + && !any_mask_operand (operands[2], mode)" And that calls into question the !logical_const_operand in the insn predicates for and3_mask_dot and and3_mask_dot2. Certain masks satisfy both any_mask_operand and logical_const_operand.. After fixing the typo, neither the andi./andis. patterns nor the rlwinm./rldic[rl]. patterns will be enabled for those masks. Seems to me we should omit !logical_const_operand from those insn predicates. > I don't think it is a good idea to optimise code based on assumptions > of what SImode SETs will do to the dest seen as DImode, without making > those assumptions explicit in the RTL. I agree. Do you intend to get rid of WORD_REGISTER_OPERATIONS, POINTERS_EXTEND_UNSIGNED, PUSH_ROUNDING, SHORT_IMMEDIATES_SIGN_EXTEND, and LOAD_EXTEND_OP? ;-) -- Alan Modra Australia Development Lab, IBM
Re: Better info for combine results in worse code generated
On Mon, Jun 01, 2015 at 08:39:05AM -0500, Segher Boessenkool wrote: > On Mon, Jun 01, 2015 at 11:33:18AM +0930, Alan Modra wrote: > > Unifying andsi_mask with anddi_mask, and the fact that constraints for > > const_int see VOIDmode rather than the operand mode is why we get > > rldicr rather than rlwinm. Easily fixed by separating the si/di > > patterns, and with a little more work I may even be able to keep them > > together. > > Maybe just swapping T to be before S will do what you want, already? Nope. Index: gcc/config/rs6000/predicates.md === --- gcc/config/rs6000/predicates.md (revision 223878) +++ gcc/config/rs6000/predicates.md (working copy) @@ -764,7 +764,11 @@ if (TARGET_POWERPC64) { - /* Fail if the mask is not 32-bit. */ + /* Fail if the mask is not 32-bit. Note: If constraints are +implemented using mask_operand then they will never fail this +test. const_ints are VOIDmode, which is what is seen here +when called from a constraint. When called as a predicate, +the match_operand mode is seen. */ if (mode == DImode && (c & ~(unsigned HOST_WIDE_INT) 0x) != 0) return 0; The above, part of a patch I was writing to fix these problems, is why putting T before S doesn't work. eg. "T" matches 0x8000, which is good for SImode where you're really masking with 0x8000, but using rlwinm for the same constant in DImode would of course mask off the top 32 bits. > > In and3 expander I think you want the following since > > and64_2_operand covers the extra double-rotate cases, not all DImode. > > > > - if ((mode == DImode && !and64_2_operand (operands[2], mode)) > > - || (mode != DImode && !and_operand (operands[2], mode))) > > + if (!and_operand (operands[2], mode) > > + && (mode != DImode || !and64_2_operand (operands[2], > > mode))) > > and64_2_operand includes all of and_operand. I agree it is a mess. but and64_2_operand doesn't include all of and_operand! > > In and3_imm_mask_dot and and3_imm_mask_dot2. Typo? 
> > - && any_mask_operand (operands[2], mode)" > > + && !any_mask_operand (operands[2], mode)" > > Thinko; that whole line should just be removed. We prefer e.g. "rlwinm" > over "andi.", but "andi." over "rlwinm.". I'll do a patch. OK, I have a patch too.. > > get rid of WORD_REGISTER_OPERATIONS, > > rs6000 should not define it. What e.g. does it mean for mullw? Or, > worse, mulhw? Pretty much anything with "w" in its name is problematic. In many places WORD_REGISTER_OPERATIONS is used, it is saying "don't trust the high bits". At the moment we definitely do need it defined! > and gets rid of 2rld completely. That's a good idea. If we want it still, and I think we do, just implement the two rldicl/r in the expander. -- Alan Modra Australia Development Lab, IBM
Re: Better info for combine results in worse code generated
On Tue, Jun 02, 2015 at 11:28:09AM -0500, Segher Boessenkool wrote: > On Tue, Jun 02, 2015 at 08:49:37AM +0930, Alan Modra wrote: > > but and64_2_operand doesn't include all of and_operand! > > Maybe I'm slow today, but I don't see it? Do you have an example? I need to get new glasses. That's the best excuse I can come up with at short notice. :) mask64_2_operand, used by and64_2_operand, does indeed cover all of mask_operand and mask64_operand. Even so, the predicate deserves to die. > > > > get rid of WORD_REGISTER_OPERATIONS, > > > > > > rs6000 should not define it. What e.g. does it mean for mullw? Or, > > > worse, mulhw? Pretty much anything with "w" in its name is problematic. > > > > In many places WORD_REGISTER_OPERATIONS is used, it is saying "don't > > trust the high bits". At the moment we definitely do need it defined! > > I don't see that either; do you have a pointer for me? The first occurrence in combine.c looks like such a place to me. Also the first one in rtlanal.c:nonzero_bits1. -- Alan Modra Australia Development Lab, IBM
Re: RS6000 emitting sign extention for unsigned type
On Tue, Jan 15, 2019 at 04:48:27PM +0530, kamlesh kumar wrote: > Hi all, > > Analysed it further and find out that > function ' rs6000_promote_function_mode ' (rs6000.c) needs modifcation. > """ > static machine_mode > rs6000_promote_function_mode (const_tree type ATTRIBUTE_UNUSED, > machine_mode mode, > int *punsignedp ATTRIBUTE_UNUSED, > const_tree, int) > { > PROMOTE_MODE (mode, *punsignedp, type); > return mode; > } > """ > Here, This function is promoting the mode but > it is not even touching 'punsignedp' and it is always initialized to zero > by default. > So in all cases 'punsignedp' remain zero even if it is for unsigned type. > which cause the sign extension to happen even for unsigned type. > > is there any way to set 'punsignedp' appropriately here. No. The call to promote_function_mode in emit_library_call_value_1 does not pass type info (because it isn't available for libcalls). -- Alan Modra Australia Development Lab, IBM
Re: Question regarding constraint usage within inline asm
On Mon, Feb 18, 2019 at 01:13:31PM -0600, Peter Bergner wrote: > I have a question about constraint usage in inline asm when we have > an early clobber output operand. The test case is from PR89313 and > looks like the code below (I'm using "r3" for the reg on ppc, but > you could also use "rax" on x86_64, etc.). > > long input; > long > bug (void) > { > register long output asm ("r3"); > asm ("blah %0, %1, %2" : "=&r" (output) : "r" (input), "0" (input)); > return output; > } > > I know an input operand can have a matching constraint associated with > an early clobber operand, as there seems to be code that explicitly > mentions this scenario. In this case, the user has to manually ensure > that the input operand is not clobbered by the early clobber operand. > In the case that the input operand uses an "r" constraint, we just > ensure that the early clobber operand and the input operand are assigned > different registers. My question is, what about the case above where > we have the same variable being used for two different inputs with > constraints that seem to be incompatible? Without the asm("r3") gcc will provide your "blah" instruction with one register for %0 and %2, and another register for %1. Both registers will be initialised with the value of "input". > Clearly, we cannot assign > a register to the "input" variable that is both the same and different > to the register that is assigned to "output". No, you certainly can do that. I think you have found a bug in lra. -- Alan Modra Australia Development Lab, IBM
Re: Question regarding constraint usage within inline asm
On Wed, Feb 20, 2019 at 10:08:07AM -0600, Peter Bergner wrote: > On 2/19/19 9:09 PM, Alan Modra wrote: > > On Mon, Feb 18, 2019 at 01:13:31PM -0600, Peter Bergner wrote: > >> long input; > >> long > >> bug (void) > >> { > >> register long output asm ("r3"); > >> asm ("blah %0, %1, %2" : "=&r" (output) : "r" (input), "0" (input)); > >> return output; > >> } > >> > >> I know an input operand can have a matching constraint associated with > >> an early clobber operand, as there seems to be code that explicitly > >> mentions this scenario. In this case, the user has to manually ensure > >> that the input operand is not clobbered by the early clobber operand. > >> In the case that the input operand uses an "r" constraint, we just > >> ensure that the early clobber operand and the input operand are assigned > >> different registers. My question is, what about the case above where > >> we have the same variable being used for two different inputs with > >> constraints that seem to be incompatible? > > > > Without the asm("r3") gcc will provide your "blah" instruction with > > one register for %0 and %2, and another register for %1. Both > > registers will be initialised with the value of "input". > > That's not what I'm seeing. I see one pseudo (123) used for the output > operand and one pseudo (121) used for both input operands. Like so: I meant by the time you get to assembly. blah 3, 9, 3 > That said, talking with Segher and Uli offline, they both think the > inline asm usage in the test case should be legal Good, it seems we are in agreement. Incidentally, the single pseudo for the inputs happens even for testcases like long input; long bug (void) { register long output /* asm ("r3") */; asm ("blah %0, %1, %2" : "=r" (output) : "wi" (input), "0" (input)); return output; } -- Alan Modra Australia Development Lab, IBM
Re: Question regarding constraint usage within inline asm
I forgot to say, gcc-6, gcc-7 and gcc-8 handle your original testcase with the register asm just fine. -- Alan Modra Australia Development Lab, IBM
Re: Question regarding constraint usage within inline asm
On Wed, Feb 20, 2019 at 08:57:52PM -0600, Peter Bergner wrote: > On 2/20/19 4:19 PM, Alan Modra wrote: > > I forgot to say, gcc-6, gcc-7 and gcc-8 handle your original testcase > > with the register asm just fine. > > Yes, because they don't have my IRA and LRA patches that exposed this > problem. I would say they were buggy for not complaining and silently > spilling a hard register in the case where we used asm reg("..."). I don't follow your reasoning. It seems to me that giving some variable a register asm doesn't mean that the value of that variable can't appear in some other register. An obvious example is when passing that variable to a function. So why shouldn't a hard reg be reloaded in order to satisfy incompatible constraints? -- Alan Modra Australia Development Lab, IBM
Re: [PowerPC 64]r12 is not updated to GEP when control transferred from virtual thunk function .
On Thu, May 16, 2019 at 05:52:42PM -0500, Segher Boessenkool wrote: > Hi Umesh, > > On Thu, May 16, 2019 at 06:12:48PM +0530, Umesh Kalappa wrote: > > We are very new to Power abi and we are thinking to handle this case > > in loader like go through the relocations like R_PPC64_REL24 and > > found symbol has the localentry ,then compute the delta (GEP - LEP ) > > and patch the caller address like (sym.value - delta). > > I wonder if you have found a bug in the compiler after all. Most things > are supposed to work without the linker/loader having to do special > things; e.g. using the global entry point should always work, using the > local entry point is just an optimisation. That isn't true for direct calls. If using the global entry point, the linker must provide stub code to load up r12 with the global entry address and modify the nop after the bl. The linker must also adjust calls using the local entry point; the call instruction (and relocation) specify the function symbol not the function plus local entry offset. So I don't think there is any compiler bug here, just a broken kernel module loader. Incidentally, if thunks are broken then it's very likely local function calls are broken too. -- Alan Modra Australia Development Lab, IBM
Re: [PowerPC 64]r12 is not updated to GEP when control transferred from virtual thunk function .
On Mon, May 20, 2019 at 02:55:33AM -0500, Segher Boessenkool wrote: > On Mon, May 20, 2019 at 04:19:54PM +0930, Alan Modra wrote: > > On Thu, May 16, 2019 at 05:52:42PM -0500, Segher Boessenkool wrote: > > > Hi Umesh, > > > > > > On Thu, May 16, 2019 at 06:12:48PM +0530, Umesh Kalappa wrote: > > > > We are very new to Power abi and we are thinking to handle this case > > > > in loader like go through the relocations like R_PPC64_REL24 and > > > > found symbol has the localentry ,then compute the delta (GEP - LEP ) > > > > and patch the caller address like (sym.value - delta). > > > > > > I wonder if you have found a bug in the compiler after all. Most things > > > are supposed to work without the linker/loader having to do special > > > things; e.g. using the global entry point should always work, using the > > > local entry point is just an optimisation. > > > > That isn't true for direct calls. If using the global entry point, > > the linker must provide stub code to load up r12 with the global entry > > address and modify the nop after the bl. The linker must also adjust > > calls using the local entry point; the call instruction (and > > relocation) specify the function symbol not the function plus local > > entry offset. > > > > So I don't think there is any compiler bug here, just a broken kernel > > module loader. Incidentally, if thunks are broken then it's very > > likely local function calls are broken too. > > The ABI says > > "When a linker causes control to transfer to a global entry point, it > must insert a glue code sequence that loads r12 with the global > entry-point address. Code at the global entry point can assume that > register r12 points to the GEP." > > But in the testcase the jump *already* was to the global entry point: So? We never add the local entry offset to the call assembly. Compile this to assembly (without -fPIC) and note "bl print" in main. 
#include <stdio.h>

void __attribute__ ((noclone, noinline))
print (const char *str)
{
  puts (str);
}

int
main ()
{
  print ("Hello");
  return 0;
}

Now if the thunk code produced a branch to a local label that *wasn't* a function symbol, I'd agree that gcc was wrong. -- Alan Modra Australia Development Lab, IBM
Re: [PowerPC 64]r12 is not updated to GEP when control transferred from virtual thunk function .
On Mon, May 20, 2019 at 03:39:50AM -0500, Segher Boessenkool wrote: > But it means it needs to make a stub for every global entry point that > is used? Mostly. Calls via function pointer don't (*), nor do you need stubs when generating inline PLT calls. I'll note that use of the global entry point for direct calls is closely associated with needing a PLT entry, and the stubs we're talking about here are similar to the code other architectures put in their .plt section. *) The exception is when a non-PIC executable initialises a function pointer in read-only memory to a function defined outside the executable. This case requires a special stub in the executable to serve as the address of the function. -- Alan Modra Australia Development Lab, IBM
Re: POWER PC-relative addressing and new text relocations
On Mon, Sep 23, 2019 at 09:42:52AM +0200, Florian Weimer wrote: > At Cauldron, the question came up whether the dynamic loader needs to > be taught about the new relocations for PC-relative addressing. > > I think they would only matter if we supported PC-relative addressing > *and* text relocations. Is that really necessary? > > These text relocations would not work reliably anyway because the > maximum displacement is not large enough. For example, with the > current process layout, it's impossible to reach shared objects from > the main program and vice versa. And some systems might want to add > additional randomization, so that shared objects are not mapped closed > together anymore. We've been discussing this inside IBM too. The conclusion is that only one of the new relocs makes any possible sense as a dynamic reloc, R_PPC64_TPREL34, and that one only if you allow -ftls-model=local-exec when building shared libraries and accept that DF_STATIC_TLS shared libraries that can't be dlopen'd are OK. See https://sourceware.org/ml/binutils/2019-09/msg00164.html, which doesn't allow even R_PPC64_TPREL34. I haven't put this patch on the binutils 2.33 branch. -- Alan Modra Australia Development Lab, IBM
Re: POWER PC-relative addressing and new text relocations
On Mon, Sep 23, 2019 at 10:37:29AM +0200, Florian Weimer wrote: > * Alan Modra: > > > On Mon, Sep 23, 2019 at 09:42:52AM +0200, Florian Weimer wrote: > >> At Cauldron, the question came up whether the dynamic loader needs to > >> be taught about the new relocations for PC-relative addressing. > >> > >> I think they would only matter if we supported PC-relative addressing > >> *and* text relocations. Is that really necessary? > >> > >> These text relocations would not work reliably anyway because the > >> maximum displacement is not large enough. For example, with the > >> current process layout, it's impossible to reach shared objects from > >> the main program and vice versa. And some systems might want to add > >> additional randomization, so that shared objects are not mapped closed > >> together anymore. > > > > We've been discussing this inside IBM too. The conclusion is that > > only one of the new relocs makes any possible sense as a dynamic > > reloc, R_PPC64_TPREL34, and that one only if you allow > > -ftls-model=local-exec when building shared libraries and accept that > > DF_STATIC_TLS shared libraries that can't be dlopen'd are OK. > > Is this still a text relocation? Yes. I should have mentioned that too. > The displacement relative to the > thread pointer is (usually) small, so I can see how this could work > reliable. > > What's the restriction on dlopen? Wouldn't it be the same as regular > initial-exec TLS memory, which also uses static TLS, but without a > text relocation and an additional indirection to load the TLS offset > from a place where a regular relocation has put it? I thought you can't dlopen libraries with static TLS, except when the amount of TLS storage needed fits within a certain limit, but it's a while since I looked at glibc code in this area so things may have changed. -- Alan Modra Australia Development Lab, IBM
Re: POWER PC-relative addressing and new text relocations
On Mon, Sep 23, 2019 at 11:14:12AM +0200, Florian Weimer wrote: > * Alan Modra: > > > On Mon, Sep 23, 2019 at 10:37:29AM +0200, Florian Weimer wrote: > >> * Alan Modra: > >> > >> > On Mon, Sep 23, 2019 at 09:42:52AM +0200, Florian Weimer wrote: > >> > We've been discussing this inside IBM too. The conclusion is that > >> > only one of the new relocs makes any possible sense as a dynamic > >> > reloc, R_PPC64_TPREL34, and that one only if you allow > >> > -ftls-model=local-exec when building shared libraries and accept that > >> > DF_STATIC_TLS shared libraries that can't be dlopen'd are OK. > >> > >> Is this still a text relocation? > > > > Yes. I should have mentioned that too. > > Yuck. Is this *really* necessary? The idea was to allow lusers to do the same as they can on other architectures, to minimise the number of bug reports saying "but I can do this on x86". Hmm, I just checked. $ gcc -shared -fPIC -ftls-model=local-exec -o thread.so ~/src/tmp/thread.c /usr/bin/ld: /tmp/ccoXMrxD.o: relocation R_X86_64_TPOFF32 against symbol `p' can not be used when making a shared object; recompile with -fPIC So I'm not fussed if we drop the idea of supporting R_PPC64_TPREL34 as a dynamic reloc. -- Alan Modra Australia Development Lab, IBM
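For readers following along, here is a minimal sketch of the kind of source that triggers the link error quoted above. The real thread.c is not shown in the thread, so this file is a hypothetical stand-in; the tls_model attribute is used as the per-variable equivalent of -ftls-model=local-exec, and the function name bump_p is made up for illustration.

```c
/* Hypothetical stand-in for thread.c; the original source is not shown. */
__thread int p __attribute__((tls_model("local-exec")));

/* Under local-exec the access is "thread pointer + link-time constant",
   which is why it only suits code whose TLS offset is fixed at link time. */
int bump_p(void)
{
    return ++p;
}
```

Built into an executable this links fine; putting it into a shared object is the case the x86-64 linker rejects in the transcript above.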
Re: RFC: Extending --with-advance-toolchain to aarch64
On Wed, Oct 09, 2019 at 10:29:48PM +, Steve Ellcey wrote: > I have a question about building a toolchain that uses (at run time) a > dynamic linker and system libraries and headers that are in a non-standard > place. I had scripts a long time ago to build a complete toolchain including glibc that could be installed in a non-standard location and co-exist with other system libraries. I worked around.. > Inconsistency detected by ld.so: get-dynamic-info.h: 147: > elf_get_dynamic_info: > Assertion `info[DT_RPATH] == NULL' failed! ..this by patching glibc. -- Alan Modra Australia Development Lab, IBM
PowerPC -many
Since we've been talking about obsoleting cpu support, how about getting rid of -many in ASM_CPU_SPEC for gcc-8? It's a horrible hack of mine to work around gcc -mcpu option handling bugs which I think have been fixed, and to silence complaints from gas about asm() written for multiple cpus (with presumably run-time selection of which block of code gets executed depending on cpu). It used to be just a linux hack, but I see David uses it in aix61.h and aix71.h too? -- Alan Modra Australia Development Lab, IBM
Re: PowerPC -many
On Tue, Feb 14, 2017 at 06:38:40PM -0600, Segher Boessenkool wrote: > On Wed, Feb 15, 2017 at 10:36:02AM +1030, Alan Modra wrote: > > Since we've been talking about obsoleting cpu support, how about > > getting rid of -many in ASM_CPU_SPEC for gcc-8? > > Sure, but that doesn't need advance warning to the users, does it? Probably not. > Things worked before and stay working, nothing user-visible? Except for bad user asm() that ought to be true. Oh, and gcc bugs like emitting power9 insns when -mcpu=power8. You'd have some chance that the assembler would complain rather than getting sigill at run-time. -- Alan Modra Australia Development Lab, IBM
Re: Optimization breaks inline asm code w/ptrs
On Sun, Aug 13, 2017 at 03:35:15AM -0700, David Wohlferd wrote: > Using "m"(*pStr) as an (unused) input parameter has no effect. Use "m" (*(const void *)pStr) and ignore the warning, or use "m" (*(const struct {char a; char x[];} *) pStr). The issue is one of letting gcc know what memory is accessed by the asm, if you don't want to use a "memory" clobber. And there are very good reasons to avoid clobbering all memory. "m"(*pStr) ought to work IMO, but apparently just tells gcc you are only interested in the first character. Of course that is exactly what *pStr is, but in this context it would be nicer if it meant the entire array. -- Alan Modra Australia Development Lab, IBM
Re: Optimization breaks inline asm code w/ptrs
On Sun, Aug 13, 2017 at 10:25:14PM +0930, Alan Modra wrote: > On Sun, Aug 13, 2017 at 03:35:15AM -0700, David Wohlferd wrote: > > Using "m"(*pStr) as an (unused) input parameter has no effect. > > Use "m" (*(const void *)pStr) and ignore the warning, or use > "m" (*(const struct {char a; char x[];} *) pStr). or even better "m" (*(const char (*)[]) pStr). > The issue is one of letting gcc know what memory is accessed by the > asm, if you don't want to use a "memory" clobber. And there are very > good reasons to avoid clobbering all memory. > > "m"(*pStr) ought to work IMO, but apparently just tells gcc you are > only interested in the first character. Of course that is exactly > what *pStr is, but in this context it would be nicer if it meant the > entire array. I take that back. The relatively simple cast to differentiate a pointer to a char from a pointer to an indeterminate length char array makes it quite unnecessary for "m"(*pStr) to be treated as an array reference. I've opened https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81845 to track the lack of documentation. -- Alan Modra Australia Development Lab, IBM
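A minimal sketch of the cast suggested above, assuming GCC on x86 (other targets fall back to strlen). The function name asm_strlen and the use of scasb here are illustrative assumptions, not code from the thread; the point is only the "m" input, which tells gcc the asm reads the whole string rather than just its first character.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: string length via "repne scasb".  */
static size_t asm_strlen(const char *p)
{
#if defined(__x86_64__) || defined(__i386__)
    size_t count;
    const char *s = p;
    __asm__ ("repne scasb"
             : "=c" (count), "+D" (s)
             /* Array-of-unknown-length cast: the asm depends on all of
                the string, with no "memory" clobber needed.  */
             : "m" (*(const char (*)[]) p), "0" ((size_t) -1), "a" (0));
    /* The count register started at -1 and dropped once per byte
       scanned, terminator included, so it now holds -(len + 2).  */
    return ~count - 1;
#else
    return strlen(p);
#endif
}
```

With a plain "m" (*p) input instead, gcc would be free to sink or drop stores to bytes of the string after the first one.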
Re: Optimization breaks inline asm code w/ptrs
On Tue, Aug 15, 2017 at 03:09:15PM +0800, Liu Hao wrote: > On 2017/8/14 20:41, Alan Modra wrote: > >On Sun, Aug 13, 2017 at 10:25:14PM +0930, Alan Modra wrote: > >>On Sun, Aug 13, 2017 at 03:35:15AM -0700, David Wohlferd wrote: > >>>Using "m"(*pStr) as an (unused) input parameter has no effect. > >> > >>Use "m" (*(const void *)pStr) and ignore the warning, or use > >>"m" (*(const struct {char a; char x[];} *) pStr). > > > >or even better "m" (*(const char (*)[]) pStr). > > > > This should work in the sense that GCC now thinks bytes adjacent to `pStr` > are subject to modification by the asm statement. > > But I just tried GCC 7.2 and it seems that even if such a "+m" constraint is > the only output parameter of an asm statement and there is no `volatile` or > the "memory" clobber, GCC optimizer will not optimize the asm statement > away, which is the case if a plain `"+m"(*pStr)` is used. I wasn't advocating a "+m" constraint in this case. Obviously it's wrong to say scasb modifies memory. That aside though, I'm mainly interested in gcc-8 and see "+m"(*p) preventing dead code removal, even when all outputs of the asm are unused (including of course the array pointed at by p). Probably a bug. -- Alan Modra Australia Development Lab, IBM
Re: Optimization breaks inline asm code w/ptrs
On Thu, Aug 17, 2017 at 04:27:12PM +0200, Michael Matz wrote: > Hi, > > On Mon, 14 Aug 2017, Alan Modra wrote: > > > I've opened https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81845 to track > > the lack of documentation. > > You mean like in this paragraph discussing memory clobbers and uses in > extended asms that we have since 2004? : The paragraph you show below (from gcc-4 sources) disappeared with git commit 3aabc45f2. We currently have this: -- Flushing registers to memory has performance implications and may be an issue for time-sensitive code. You can use a trick to avoid this if the size of the memory being accessed is known at compile time. For example, if accessing ten bytes of a string, use a memory input like: @code{@{"m"( (@{ struct @{ char x[10]; @} *p = (void *)ptr ; *p; @}) )@}}. -- So, no example even of the simplest "m" (*y) type memory input. This lack was part of the reason I submitted https://gcc.gnu.org/ml/gcc-patches/2017-03/msg01562.html which died in the review process, mostly due to the example being rather large, and partly I fear, due to not being x86. I didn't push the patch for a number of reasons. Then later realized that the constraints I was using for arrays, while they work for OpenBLAS, were not strict enough. "m" (*y) for an array y only makes the asm depend on y[0]. I have a couple of documentation patches prepared, and have been poking around in the source to verify that what I'm proposing for indeterminate length arrays, "m" (*(const T (*)[]) ptr) and "=m" (*(T (*)[]) ptr) is reasonable. One obvious problem is that the cast expression isn't a proper lvalue, but I'm encouraged to find comments in the source complaining that such things need to be tolerated in asm. :) > > > If your assembler instructions access memory in an unpredictable > fashion, add `memory' to the list of clobbered registers. 
This will > cause GCC to not keep memory values cached in registers across the > assembler instruction and not optimize stores or loads to that memory. > You will also want to add the `volatile' keyword if the memory affected > is not listed in the inputs or outputs of the `asm', as the `memory' > clobber does not count as a side-effect of the `asm'. If you know how > large the accessed memory is, you can add it as input or output but if > this is not known, you should add `memory'. As an example, if you > access ten bytes of a string, you can use a memory input like: > > {"m"( ({ struct { char x[10]; } *p = (void *)ptr ; *p; }) )}. > > Note that in the following example the memory input is necessary, > otherwise GCC might optimize the store to `x' away: > int foo () > { >int x = 42; >int *y = &x; > int result; >asm ("magic stuff accessing an 'int' pointed to by '%1'" > "=&d" (r) : "a" (y), "m" (*y)); >return result; > } > > > > Ciao, > Michael. -- Alan Modra Australia Development Lab, IBM
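The output-side form proposed above, "=m" (*(T (*)[]) ptr), can be sketched the same way. This assumes GCC on x86 and falls back to memset elsewhere; asm_fill is a made-up name and "rep stosb" is just a convenient instruction that writes a run of bytes.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: fill n bytes at dst with value.  */
static void asm_fill(char *dst, int value, size_t n)
{
#if defined(__x86_64__) || defined(__i386__)
    char *d = dst;
    size_t cnt = n;
    __asm__ ("rep stosb"
             : "+D" (d), "+c" (cnt),
               /* Tells gcc the asm writes an array of unknown length
                  starting at dst, without clobbering all memory.  */
               "=m" (*(char (*)[]) dst)
             : "a" (value));
#else
    memset(dst, value, n);
#endif
}
```

Because the array is listed as an output, later loads from the buffer are ordered after the asm even though there is no volatile and no "memory" clobber.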
Re: strlen optimizations based on whether stpcpy is declared?
On Mon, Oct 02, 2017 at 09:11:53AM +0200, Jakub Jelinek wrote: > On Sun, Oct 01, 2017 at 03:52:39PM -0600, Martin Sebor wrote: > > While debugging some of my tests I noticed unexpected differences > > between the results depending on whether or not the stpcpy function > > is declared. It turns out that the differences are caused by > > the handle_builtin_strcpy function in tree-ssa-strlen.c testing > > for stpcpy having been declared: > > > > if (srclen == NULL_TREE) > > switch (bcode) > > { > > case BUILT_IN_STRCPY: > > case BUILT_IN_STRCPY_CHK: > > case BUILT_IN_STRCPY_CHKP: > > case BUILT_IN_STRCPY_CHK_CHKP: > > if (lhs != NULL_TREE || !builtin_decl_implicit_p (BUILT_IN_STPCPY)) > > return; > > > > and taking different paths depending on whether or not the test > > succeeds. > > > > As far as can see, the tests have been there since the pass was > > added, but I don't understand from the comments in the file what > > their purpose is or why optimization decisions involving one set > > of functions (I think strcpy and strcat at a minimum) are based > > on whether another function has been declared or not. > > > > Can you explain what they're for? > > The reason is that stpcpy is not a standard C function, so in non-POSIX > environments one could have stpcpy with completely unrelated prototype > used for something else. In such case we don't want to introduce stpcpy > into a TU that didn't have such a call. So, we use the existence of > a matching prototype as a sign that stpcpy can be synthetized. Why is the test for stpcpy being declared done for the strcpy cases rather than the stpcpy cases? -- Alan Modra Australia Development Lab, IBM
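To make the transformation under discussion concrete, here is a small sketch of the two equivalent forms. The function names are made up; the first is what a program might contain, the second is roughly what the strlen pass may synthesize when a matching stpcpy prototype is visible.

```c
#define _POSIX_C_SOURCE 200809L  /* stpcpy is POSIX, not ISO C */
#include <string.h>

/* As written in the source: copy, then scan again for the end.  */
static char *copy_then_find_end(char *dst, const char *src)
{
    strcpy(dst, src);
    return dst + strlen(dst);
}

/* What it can become: one pass, stpcpy returns the end pointer.  */
static char *copy_with_stpcpy(char *dst, const char *src)
{
    return stpcpy(dst, src);
}
```

This is why the check sits on the strcpy cases: it is a strcpy call that would be rewritten into a call to stpcpy, a function the translation unit might otherwise never have named.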
Re: PowerPC shrink-wrap support 0 of 3
On Thu, Sep 22, 2011 at 12:58:51AM +0930, Alan Modra wrote: > I spent a little time today looking at why shrink wrap is failing to > help on PowerPC, and it turns out that the optimization simply doesn't > trigger that often due to prologue clobbered regs. PowerPC uses r0 as > a temp in the prologue to save LR to the stack, and unfortunately r0 > seems to often be live across the candidate edge chosen for > shrink-wrapping, ie. where the prologue will be inserted. I suppose > it's no surprise that r0 is often live; rs6000.h:REG_ALLOC_ORDER makes > r0 the first gpr to be used. > > As a quick hack, I'm going to try a different REG_ALLOC_ORDER but I > suspect the real fix will require register renaming in the prologue. Hi Bernd, Rearranging the rs6000 register allocation order did in fact help a lot as far as making more opportunities available for shrink-wrap. So did your http://gcc.gnu.org/ml/gcc-patches/2011-03/msg01499.html patch. The two together worked so well that gcc won't bootstrap now.. The problem is that shrink wrapping followed by basic block reordering breaks dwarf unwind info, triggering "internal compiler error: in maybe_record_trace_start at dwarf2cfi.c:2243". From your emails on the list, I gather you've seen this yourself. The bootstrap breakage happens on libmudflap/mf-hooks1.c, compiling __wrap_malloc. Eliding some detail, this function starts off as void *__wrap_malloc (size_t c) { if (__mf_starting_p) return __real_malloc (c); The "if" is bb2, the sibling call bb3, and shrink wrap rather nicely puts the prologue for the rest of the function in bb4. A great example of shrink wrap doing as it should, if you ignore the fact that optimizing for startup isn't so clever. However, bb-reorder inverts the "if" and moves the sibling call past other blocks in the function. That's wrong, because the dwarf unwind info for the prologue is not applicable for the sibling call block: The prologue hasn't been executed for that block. 
(The unwinder sequentially executes all unwind opcodes from the start of the function to find the unwind state at any instruction address.) Exactly the same sort of problem is generated by your "unconverted_simple_returns" code. What should I do here? bb-reorder could be disabled for these blocks, but that won't help unconverted_simple_returns. I'm willing to spend some time fixing this, but don't want to start if you already have partial or full solutions. Another thing I'd like to work on is stopping ifcvt transformations from killing shrink wrap opportunities. We have one in CPU2006 povray Ray_In_Bound that ought to give 5% (figure from shrink wrap by hand), but currently only gets shrink wrapping there with -fno-if-conversion. -- Alan Modra Australia Development Lab, IBM
Re: Need help resolving PR target/50906
On Mon, Oct 31, 2011 at 10:58:03AM -0500, Moffett, Kyle D wrote: > I have not yet been able to figure out if it's a libgcc issue or an > actual compiler issue. It is a gcc bug. I've added a comment to the PR. -- Alan Modra Australia Development Lab, IBM
Re: Shrink wrapping issues
On Sat, Nov 05, 2011 at 10:50:44AM +0100, Jakub Jelinek wrote: > From quick look, f1 isn't shrink-wrapped probably because of the set > of bb's that need prologue/epilogue around it doesn't end in a return, > but in a tail call. Can't we just add a prologue before the bar call > and throw the epilogue away (normally the epilogue in a function that > ends only in a tail call is just emitted after the barrier and > optimized away I think, we could do the same?). http://gcc.gnu.org/ml/gcc-patches/2011-11/msg00046.html ought to cure this particular problem. With that patch, similar code on powerpc-linux does result in shrink wrapping. > And f2 is something that IMHO with especially AVX/AVX2 code happens very > often, the prologue is expensive as it realigns the stack. The reason > for that is that until reload we don't know whether something won't be > spilled on the stack and we need/want 32-byte aligned stack slots > for that spilling. Huh? thread_prologue_and_epilogue is after reload. So your backend ought to be able to figure out whether an aligned stack is needed. -- Alan Modra Australia Development Lab, IBM
powerpc compare_and_swap fails
I'm seeing a lot of testsuite failures on powerpc-linux, some of which are locking related. For example:

WARNING: Program timed out.
FAIL: libgomp.c/atomic-10.c execution test

This one fails in f3() here:

#pragma omp atomic
  z4 *= 3;

z4 is an unsigned char, so we hit the QImode case in rs6000_expand_atomic_compare_and_swap. operands[3] is modified. The rather horrible piece of code below corresponds with z4 *= 3; At 1c60 you can see operands[3], oldval, being shifted. At 1c90 and 1c94, the newly loaded value from the z4 word is shifted to the low byte position and masked. Then in 1c98 this is compared against oldval. The comparison never succeeds, because r9 has the value 0x00yy0000 (the shift happens to be 16 for z4) while r8 has 0x000000yy.

    1c34: 57 c6 1e f8  rlwinm  r6,r30,3,27,28
    1c38: 38 a0 00 ff  li      r5,255
    1c3c: 89 39 13 d1  lbz     r9,5073(r25)
    1c40: 68 c6 00 18  xori    r6,r6,24
    1c44: 57 de 00 3a  rlwinm  r30,r30,0,0,29
    1c48: 7c a5 30 30  slw     r5,r5,r6
    1c4c: 48 00 00 08  b       1c54
    1c50: 7d 09 43 78  mr      r9,r8
    1c54: 7c 00 04 ac  sync
    1c58: 55 2a 08 3c  rlwinm  r10,r9,1,0,30
    1c5c: 7d 4a 4a 14  add     r10,r10,r9
    1c60: 7d 29 30 30  slw     r9,r9,r6
    1c64: 55 4a 06 3e  clrlwi  r10,r10,24
    1c68: 7d 4a 30 30  slw     r10,r10,r6
    1c6c: 7d 00 f0 28  lwarx   r8,0,r30
    1c70: 7d 07 28 38  and     r7,r8,r5
    1c74: 7f 87 48 00  cmpw    cr7,r7,r9
    1c78: 7d 07 28 78  andc    r7,r8,r5
    1c7c: 7c e7 53 78  or      r7,r7,r10
    1c80: 40 9e 00 0c  bne-    cr7,1c8c
    1c84: 7c e0 f1 2d  stwcx.  r7,0,r30
    1c88: 40 a2 ff e4  bne-    1c6c
    1c8c: 4c 00 01 2c  isync
    1c90: 7d 08 34 30  srw     r8,r8,r6
    1c94: 55 08 06 3e  clrlwi  r8,r8,24
    1c98: 7f 89 40 00  cmpw    cr7,r9,r8
    1c9c: 40 9e ff b4  bne+    cr7,1c50

I suspect the fix to this problem doesn't belong in rs6000.c, but the following does seem to cure this failure.
Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c (revision 181400)
+++ gcc/config/rs6000/rs6000.c (working copy)
@@ -17334,10 +17366,13 @@ rs6000_expand_atomic_compare_and_swap (r
   mask = shift = NULL_RTX;
   if (mode == QImode || mode == HImode)
     {
+      rtx orig = oldval;
+
       mem = rs6000_adjust_atomic_subword (mem, &shift, &mask);

       /* Shift and mask OLDVAL into position with the word. */
-      oldval = convert_modes (SImode, mode, oldval, 1);
+      oldval = gen_reg_rtx (SImode);
+      convert_move (oldval, orig, 1);
       oldval = expand_simple_binop (SImode, ASHIFT, oldval, shift,
                                     oldval, 1, OPTAB_LIB_WIDEN);

-- 
Alan Modra
Australia Development Lab, IBM
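The underlying technique, a byte-sized compare-and-swap built from a word-sized one, can be sketched in C as below. This is not the rs6000 expander; subword_cas and its byte_index parameter (counted from the least significant byte of the word value) are assumptions for illustration, and the __atomic builtins are GCC/Clang extensions. The detail that matters for the bug above is that the shifted copy of the expected value lives in its own temporary, so the caller's unshifted value is still intact when it is compared later.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: CAS one byte inside an aligned 32-bit word.  */
static bool subword_cas(uint32_t *word, unsigned byte_index,
                        uint8_t *expected, uint8_t desired)
{
    unsigned shift = byte_index * 8;
    uint32_t mask = (uint32_t) 0xff << shift;
    /* Shifted copies go in fresh temporaries; *expected is untouched.  */
    uint32_t want_old = (uint32_t) *expected << shift;
    uint32_t want_new = (uint32_t) desired << shift;
    uint32_t cur = __atomic_load_n(word, __ATOMIC_RELAXED);

    for (;;) {
        if ((cur & mask) != want_old) {
            /* Report the byte actually seen, in unshifted form.  */
            *expected = (uint8_t) ((cur & mask) >> shift);
            return false;
        }
        uint32_t repl = (cur & ~mask) | want_new;
        if (__atomic_compare_exchange_n(word, &cur, repl, false,
                                        __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST))
            return true;
        /* Another byte of the word changed under us; cur was reloaded.  */
    }
}
```

Comparing a shifted value against an unshifted one, as in the disassembly above, makes the outer retry loop spin forever whenever the byte is not in the low position.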
Re: [LTO merge][0/15] Description of the final 15 patches
On Mon, Sep 28, 2009 at 10:46:29PM -0400, DJ Delorie wrote: > > > gets from the linker. Since the linker plugin is a shared > > object, and it uses libiberty functions, it needs to use a > > shared libiberty. > > Why can't they just link a static libiberty? This comment from opcodes/configure.in is relevant # When building a shared libopcodes, link against the pic version of libiberty # so that apps that use libopcodes won't need libiberty just to satisfy any # libopcodes references. # We can't do that if a pic libiberty is unavailable since including non-pic # code would insert text relocations into libopcodes. -- Alan Modra Australia Development Lab, IBM
Re: Gprof can account for less than 1/3 of execution time?!?!
On Sun, Feb 21, 2010 at 12:27:04PM -0600, Jon Turner wrote: > The program in question has been compiled with -pg for all > source code files. Linked statically too? If not, the missing time is probably spent in libc.so or other shared libraries. -- Alan Modra Australia Development Lab, IBM
Re: Should -Wjump-misses-init be in -Wall?
On Mon, Jun 22, 2009 at 09:45:52PM -0400, Robert Dewar wrote: > Joe Buck wrote: >> I think that this should be the standard: a warning belongs in -Wall if >> it tends to expose bugs. If it doesn't, then it's just somebody's idea >> of proper coding style but with no evidence in support of its correctness. >> >> A -Wall warning should expose bugs, and should be easy to silence in >> correct code. > > To understand what you are saying, we need to know what bug means, since > it can have two meanings: > > 1. An actual error, that could show up right now in certain circumstances > > 2. An error resulting in undefined behavior in the standard, but > for the current version of gcc, it cannot actually cause any real > misbehavior, but some future version of gcc might take advantage > of this error status and do something weird. > > For me it is enough if warnings expose case 2 situations, even if > they find few if any case 1 situations. I agree, but I think this warning should be in -Wc++-compat, not -Wall or even -Wextra. Why? I'd argue the warning is useless for C code, unless you care about C++ style. There were five places in binutils that triggered -Wjump-misses-init warnings. Not one of them was a real bug, even using Robert's case 2 definition. I believe the same is true of the three places in gcc where the warning triggered. So far, no one has generated a C testcase having undefined behaviour where -Wjump-misses-init warns but -Wuninitialized (already in -Wall) doesn't, when optimizing. If such a testcase is found, I'm guessing it probably should be filed as a -Wuninitialized bug. In C, an auto variable initialization is just an assignment. (I'm of course aware that arrays can be initialized and their size set, structs and unions initialized, but by and large, in C, an initialization is simply an assignment.) So, why single out the initial assignment? 
If skipping it deserves a warning then skipping other assignments deserves a warning too, which would be ridiculous. -- Alan Modra Australia Development Lab, IBM
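A minimal sketch of the kind of C code at issue; both functions are made-up examples. The first jumps over an initialized declaration, which is what -Wjump-misses-init reports and which C++ rejects outright. The second does the same thing with a separate assignment and draws no warning, which is the inconsistency argued above.

```c
/* goto skips "int scaled = ...": valid C, flagged by -Wjump-misses-init,
   ill-formed in C++.  The variable is never read on the skipped path.  */
static int scale_or_fail(int x)
{
    if (x < 0)
        goto out;
    int scaled = x * 2;
    return scaled;
out:
    return -1;
}

/* Same control flow with declaration and assignment split: no warning.  */
static int scale_or_fail_split(int x)
{
    int scaled;
    if (x < 0)
        goto out;
    scaled = x * 2;
    return scaled;
out:
    return -1;
}
```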
Re: powerpc-eabi-gcc no implicit FPU usage
On Thu, May 20, 2010 at 09:40:47AM -0700, Mark Mitchell wrote: > It is of course a feature much > less valuable on a workstation/server class operating system than on the > VxWorks/RTEMS class of RTOS systems. Even on servers this option may be quite valuable. I recall seeing figures that showed using fp regs for something like structure copies could cost thousands of cpu cycles. Why? With lazy fpu save and restore, the first use of the fpu in a given time slice takes an interrupt. So if your task is only using the fpu occasionally it is a severe misoptimization to choose to use fp regs rather than gp regs. -- Alan Modra Australia Development Lab, IBM
PowerPC64, optimization too aggressive?
On Tue, Jun 08, 2010 at 10:27:03PM +0930, Alan Modra wrote:
> PowerPC64 gcc support for a larger TOC via -mcmodel option.
[snip]

I'm having second thoughts about the optimization I added to PowerPC64 gcc with the patch hunk below. Its effect is to use a more efficient TOC/GOT pointer relative address calculation on references known to be local, rather than loading an address out of the TOC/GOT. ie.
	addis rx,2,s...@toc@ha
	addi ry,rx,s...@toc@l
instead of
	addis rx,2,s...@got@ha
	ld ry,s...@got@l(rx)
This saves a word in the TOC/GOT and is a little faster too.

However, there is a problem: If people build PowerPC64 shared libraries without -fpic/-fPIC then gcc will emit code that requires text relocs to properly support ELF shared library semantics, and I don't intend to change ld and ld.so to do that.

It may be better to not do this optimization in gcc at all, especially since we can do the same transformation in ld. This would mean PowerPC64 gcc would lose -mcmodel=medium, retaining -mcmodel=small and -mcmodel=large. If there are no dissenting opinions I'll prepare a gcc patch to do that.

> + || (TARGET_CMODEL == CMODEL_MEDIUM
> + && GET_CODE (operands[1]) == SYMBOL_REF
> + && !CONSTANT_POOL_ADDRESS_P (operands[1])
> + && SYMBOL_REF_LOCAL_P (operands[1])
> + && offsettable_ok_by_alignment (SYMBOL_REF_DECL (operands[1]

-- 
Alan Modra
Australia Development Lab, IBM
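For anyone unfamiliar with the @ha/@l split used by the addis/addi pair above, here is a worked sketch of the arithmetic. The helper names are made up. Because addi and ld sign-extend their 16-bit immediate, the high half has to be "adjusted" upward by one whenever the low half has its top bit set.

```c
#include <stdint.h>

/* Low 16 bits of the offset, as the sign-extended immediate addi sees.  */
static int32_t toc_lo(int32_t off)
{
    int32_t lo = off & 0xffff;
    return lo >= 0x8000 ? lo - 0x10000 : lo;
}

/* High-adjusted half: whatever is left once the signed low part is
   taken out, in units of 65536.  This is what addis adds (shifted).  */
static int32_t toc_ha(int32_t off)
{
    return (int32_t) (((int64_t) off - toc_lo(off)) / 65536);
}

/* What the two instructions compute together.  */
static int64_t toc_combine(int32_t ha, int32_t lo)
{
    return (int64_t) ha * 65536 + lo;
}
```

For example, an offset of 0x12348000 splits into ha = 0x1235 and lo = -32768, and 0x12350000 - 0x8000 gives back 0x12348000. Two instructions therefore reach roughly +/-2GB around the TOC pointer, which is the range the medium model relies on.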
Re: %pc relative addressing of string literals/const data
On Tue, Oct 05, 2010 at 11:40:11PM +0200, Joakim Tjernlund wrote:
> yes, but this could be a new PIC mode that uses a new better
> PIC mode for everything. Especially one that doesn't require each function
> to calculate the GOT address in the function prologue(why is that so?)

The ppc32 ABI is old, much like x86. cf. x86 -O2 -fPIC (without hidden pragma).

foo:
	call __i686.get_pc_thunk.cx
	addl $_GLOBAL_OFFSET_TABLE_, %ecx
	pushl %ebp
	movl %esp, %ebp
	popl %ebp
	movl y...@got(%ecx), %eax
	movl x...@got(%ecx), %edx
	movl (%eax), %eax
	addl (%edx), %eax
	ret
[snip]
__i686.get_pc_thunk.cx:
	movl (%esp), %ecx
	ret

The new ppc64 -mcmodel=medium support does give you pic access to locals.

-fPIC -O2 without hidden
.LC0:
	.tc x[TC],x	<-- compiler managed GOT entries
.LC1:
	.tc y[TC],y
[snip]
.L.foo:
	addis 11,2,@toc@ha
	addis 9,2,@toc@ha
	ld 11,@toc@l(11)
	ld 9,@toc@l(9)
	lwz 3,0(11)
	lwz 0,0(9)
	add 3,3,0
	extsw 3,3
	blr

-fPIC -O2 with hidden pragma
.L.foo:
	addis 11,2,x...@toc@ha
	addis 9,2,y...@toc@ha
	lwz 3,x...@toc@l(11)	<-- TOC/GOT pointer relative
	lwz 0,y...@toc@l(9)
	add 3,3,0
	extsw 3,3
	blr

x...@toc is equivalent to @GOTOFF on other processors.

-- 
Alan Modra
Australia Development Lab, IBM
Re: %pc relative addressing of string literals/const data
On Sun, Oct 10, 2010 at 11:20:06AM +0200, Joakim Tjernlund wrote: > Now I have had a closer look at this and it looks much like -fpic > on ppc32, you still use the GOT/TOC to load the address where the data is. No, with ppc64 -mcmodel=medium you use the GOT/TOC pointer plus an offset to address local data. > I was looking for true %pc relative addressing of data. I guess this is really > hard on PowerPC? Yes, PowerPC lacks pc-relative instructions. > I am not sure this is all it takes to make -fpic to work with -mrelocatable, > any ideas? You might be lucky. With -mrelocatable, .got2 only contains addresses. No other constants. So a simple run-time loader can relocate the entire .got2 section, plus those locations specified in .fixup. You'll have to make sure gcc does the same for .got, and your run-time loader will need to be modified to handle .got (watch out for the .got header!). -- Alan Modra Australia Development Lab, IBM
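The "simple run-time loader" step described above amounts to adding the load offset to every word of an address-only table. A minimal sketch, assuming the caller already knows the bounds of the table; the function name and parameters are hypothetical, and locating .got2, .fixup, or the words after the .got header is left to the caller.

```c
#include <stdint.h>

/* Hypothetical helper: relocate a table that holds nothing but addresses.
   start/end delimit the entries; load_offset is (run address - link address). */
static void relocate_address_table(uintptr_t *start, uintptr_t *end,
                                   uintptr_t load_offset)
{
    for (uintptr_t *p = start; p < end; p++)
        *p += load_offset;
}
```

This only works because the table contains addresses and no other constants, which is the property of .got2 noted above; any header words in .got would need to be skipped before calling it.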
Re: %pc relative addressing of string literals/const data
On Wed, Oct 27, 2010 at 12:53:00AM +0100, Dave Korn wrote: > On 26/10/2010 23:37, Joakim Tjernlund wrote: > > > Everything went dead quiet the minute I stated to send patches, what did > > I do wrong? > > Nothing, you just ran into the lack-of-manpower problem. Sorry! And I > can't even help, I'm not a ppc maintainer. I also cannot approve gcc patches. -- Alan Modra Australia Development Lab, IBM
Re: RFC: Add zlib source to src CVS resposity
On Mon, Nov 01, 2010 at 05:13:44PM +, Nick Clifton wrote: > * We have to make sure that zlib will build on all of the > hosts that we care about. Should the situation arise > where the zlib does not build on a particular host, and > the zlib maintainers are not interested in making it > build there, then it will be down to us to fix it. Or > else abandon compression support on that host. This would mean we need to keep machinery to conditionally compile in compressed debug support, removal of said support being HJ's stated reason for importing zlib. I'm against importing zlib into binutils, and I think we should keep support of compressed debug sections conditional, to avoid potential bootstrap problems or circular dependencies. -- Alan Modra Australia Development Lab, IBM
Re: PATCH: 2 stage BFD linker for LTO plugin
On Mon, Dec 06, 2010 at 09:57:14AM -0800, H.J. Lu wrote: > Personally, I think 2 stage linking is one way to fix this issue. Ian has stated that he thinks this is a really bad idea. I haven't approved the patch because I value Ian's opinion, and can see why he thinks it is the wrong way to go. On the other hand, BFD is full of bad ideas.. I'm not strongly opposed to your patch myself. HJ, you showed that link times for gcc did not regress too much with your 2 stage lto link patch. It would be more interesting to see the results for a large C++ project, mozilla for example. -- Alan Modra Australia Development Lab, IBM
Re: PATCH: Support mixing .init_array.* and .ctors.* input sections
On Tue, Dec 14, 2010 at 09:55:42AM -0800, H.J. Lu wrote: > bfd/ > > 2010-12-14 H.J. Lu > > * elf.c (_bfd_elf_new_section_hook): Special handling for > .init_array/.fini_array output sections. > > ld/ > > 2010-12-13 H.J. Lu > > * Makefile.am (GENSCRIPTS): Add @enable_initfini_ar...@. > > * NEWS: Mention SORT_BY_INIT_PRIORITY. > > * configure.in: Add AC_CANONICAL_BUILD. > Add --enable-initfini-array. > > * genscripts.sh (ENABLE_INITFINI_ARRAY): New. > > * ld.h (sort_type): Add by_init_priority. > > * ld.texinfo: Document SORT_BY_INIT_PRIORITY. > > * ldgram.y (SORT_BY_INIT_PRIORITY): New. > (wildcard_spec): Handle SORT_BY_INIT_PRIORITY. > > * ldlang.c (get_init_priority): New. > (compare_section): Use get_init_priority for by_init_priority. > > * ldlex.l (SORT_BY_INIT_PRIORITY): New. > > * scripttempl/elf.sc: Support ENABLE_INITFINI_ARRAY. > > * Makefile.in: Regenerated. > * aclocal.m4: Regenerated. > * config.in: Likewise. > * configure: Likewise. > > ld/testsuite/ > > 2010-12-13 H.J. Lu > > * ld-elf/elf.exp (array_tests): Add init-mixed. > (array_tests_static): Likewise. > Also delete tmpdir/init-mixed. > > * ld-elf/init-mixed.c: New. > * ld-elf/init-mixed.out: Likewise. OK. Except > +static long int unsigned long > +get_init_priority (const char *name) > +{ > + char *end; > + long int init_priority; unsigned long > + > + /* GCC uses the following section names for the init_priority > + attribute with numerical values 101 and 65535 inclusive: > + > + 1: .init_array./.fini_array.: Where is the > + decimal numerical value of the init_priority attribute. > + 2: .ctors./.ctors.: Where is 65535 minus the > + decimal numerical value of the init_priority attribute. > + */ I would like to see this comment expanded. Specify what the init_priority values mean, ie. a lower value means a higher priority. Also specify the order of execution in .init_array and .fini_array. 
>From memory .init_array is forward, .fini_array reverse, and just to make things interesting .ctors/.dtors goes the other way, .ctors reverse and .dtors forward. > + if (strncmp (name, ".init_array.", 12) == 0 > + || strncmp (name, ".fini_array.", 12) == 0) > +{ > + init_priority = strtoul (name + 12, &end, 10); > + return *end ? 0 : init_priority; > +} > + else if (strncmp (name, ".ctors.", 7) == 0 > +|| strncmp (name, ".dtors.", 7) == 0) > +{ > + init_priority = strtoul (name + 7, &end, 10); > + return *end ? 0 : 65535 - init_priority; > +} > + > + return 0; > +} > + > /* Compare sections ASEC and BSEC according to SORT. */ > > static int > compare_section (sort_type sort, asection *asec, asection *bsec) > { >int ret; > + long int ainit_priority, binit_priority; unsigned long > @@ -247,19 +274,16 @@ CTOR=".ctors${CONSTRUCTING-0} : > linker won't look for a file to match a > wildcard. The wildcard also means that it > doesn't matter which directory crtbegin.o > - is in. */ > + is in. > > -KEEP (*crtbegin.o(.ctors)) > -KEEP (*crtbegin?.o(.ctors)) > - > -/* We don't want to include the .ctor section from > + We don't want to include the .ctor section from > the crtend.o file until after the sorted ctors. > The .ctor section from the crtend file contains the > end of ctors marker and it must be last */ > > -KEEP (*(EXCLUDE_FILE (*crtend.o *crtend?.o $OTHER_EXCLUDE_FILES) .ctors)) > -KEEP (*(SORT(.ctors.*))) > -KEEP (*(.ctors)) > +KEEP (*crtbegin.o(.ctors)) > +KEEP (*crtbegin?.o(.ctors)) > +${CTORS} > ${CONSTRUCTING+${CTOR_END}} >}" > DTOR=".dtors${CONSTRUCTING-0} : > @@ -267,9 +291,7 @@ DTOR=".dtors${CONSTRUCTING-0} : > ${CONSTRUCTING+${DTOR_START}} > KEEP (*crtbegin.o(.dtors)) > KEEP (*crtbegin?.o(.dtors)) > -KEEP (*(EXCLUDE_FILE (*crtend.o *crtend?.o $OTHER_EXCLUDE_FILES) .dtors)) > -KEEP (*(SORT(.dtors.*))) > -KEEP (*(.dtors)) > +${DTORS} > ${CONSTRUCTING+${DTOR_END}} >}" No need to make any changes to .ctors or .dtors. 
If .init_array and .fini_array match input .ctors or .dtors sections, then any later match will simply be ignored. -- Alan Modra Australia Development Lab, IBM
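As a small illustration of "a lower value means a higher priority", here is a sketch using GCC's constructor attribute, which takes the same kind of priority number. It assumes a GCC-compatible compiler on a target that supports init priorities; the function and variable names are made up.

```c
/* Record the order in which the constructors run.  */
static int order[3];
static int count;

/* Deliberately declared out of order; the priority decides, not the
   position in the file.  */
__attribute__((constructor(200))) static void init_middle(void)
{
    order[count++] = 200;
}

__attribute__((constructor(101))) static void init_first(void)
{
    order[count++] = 101;
}

__attribute__((constructor(300))) static void init_last(void)
{
    order[count++] = 300;
}
```

By the time main starts, order holds 101, 200, 300: the lowest number ran first.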
Re: R_PPC_REL24 overflow
On Wed, Mar 29, 2006 at 01:53:31PM -0500, James Lemke wrote: > The generated asm makes the reference as: > bl [EMAIL PROTECTED] # 141 *call_value_nonlocal_sysv/1 [length = 4] > > And for this, gas generates: > R_PPC_REL24 __pthread_mutex_lock Nope, you're looking at the wrong asm. unlock vs. lock (and underscore difference too). As rth said, you need R_PPC_PLTREL24. My guess is that you have some buggy asm somewhere lacking @plt on the call. -- Alan Modra IBM OzLabs - Linux Technology Centre
Re: Qemu and GCC-3.4 on powerpc
On Tue, Mar 28, 2006 at 12:00:47PM +0200, Gabriel Paubert wrote: > On Tue, Mar 28, 2006 at 12:56:13AM +0200, Dieter Schuster wrote: > > If I try to compile qemu with GCC 3.4 without the patch I get the following > > error: > > > > qemu-0.8.0/linux-user/elfload.c: In function `load_elf_binary': > > qemu-0.8.0/cpu-all.h:253: error: inconsistent operand constraints in an > > `asm' > > qemu-0.8.0/cpu-all.h:253: error: inconsistent operand constraints in an > > `asm' > > Weird. CC'ed to gcc list despite the fact that the 3.4 branch > is definitely closed. I've not found anything remotely similar > from bugzilla. > > > > > But if I copy the function stl_le_p to a seperate file, the function > > will compile with GCC 3.4. Check preprocessor output. My guess is that you have some unexpected substitution. -- Alan Modra IBM OzLabs - Linux Technology Centre
Re: Legitimacy of replacing divide-by-power-of-2 with right shifts.
On Thu, Apr 20, 2006 at 04:52:14PM +0100, Dave Korn wrote: > Yet it would seem to me at first glance that, since dividing unsigned by an > exact power-of-2 can be optimised to a right shift, and since we can deduce You might like to build yourself a new compiler. :) 2006-04-19 Alan Modra <[EMAIL PROTECTED]> PR rtl-optimization/26026 * fold-const.c (fold_binary): Optimize div and mod where the divisor is a known power of two shifted left a variable amount. -- Alan Modra IBM OzLabs - Linux Technology Centre
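A minimal sketch of the identity behind that ChangeLog entry, for unsigned operands: dividing by a known power of two shifted left by a variable amount is a right shift, and the remainder is a mask. The function names and the constant 4 are illustrative choices; n is assumed to be at most 29 so that 4u << n does not wrap to zero.

```c
#include <stdint.h>

/* Divisor is 4 << n, i.e. 2^(n+2).  Requires n <= 29.  */
static uint32_t div_pow2(uint32_t x, unsigned n)  { return x / (4u << n); }
static uint32_t div_shift(uint32_t x, unsigned n) { return x >> (n + 2); }

static uint32_t mod_pow2(uint32_t x, unsigned n)  { return x % (4u << n); }
static uint32_t mod_mask(uint32_t x, unsigned n)  { return x & ((4u << n) - 1); }
```

For instance, with x = 1000 and n = 3 the divisor is 32, so both division forms give 31 and both remainder forms give 8.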