linking time error with -ffast-math -O0
Hi,

The following test fails at link time if compiled with -ffast-math and -O0, but it compiles successfully with -ffast-math and -O2. There is also no problem if -lm is added.

$ cat t.c
#include <stdio.h>
#include <stdlib.h>

float foo(float x)
{
  float y = 0;
  while (x > 0.0001) {
    y += x*x*x*x*x*x*x*x*x*x*x*x*x;
    x = x/2;
  }
  return y;
}

int main (int argc, char *argv[])
{
  float y = atoi(argv[1]);
  printf("%f\n", foo(y));
  return 0;
}

$ gcc -ffast-math -O0 t.c
/tmp/cccA1sUB.o: In function `foo':
t.c:(.text+0x2c): undefined reference to `powf'
collect2: error: ld returned 1 exit status
$ gcc -ffast-math -O2 t.c
$ ./a.out 5
1220852096.00

With -ffast-math, the FE replaced x*x*...*x with __builtin_powf. Later, at -O2, this call is replaced back with multiplications in the sincos pass. The failure at -O0 happens because the sincos pass does not run at -O0.

I think we must avoid doing this optimization in the FE and turn off -ffast-math if -O0 is used. What is your opinion?

Thanks,
Vladimir
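As a rough illustration of what "replaced back into multiplications" means, here is a minimal C sketch that evaluates x raised to a constant non-negative integer power by square-and-multiply. The function name powi_by_squaring is hypothetical, and GCC's own expansion in the sincos pass may choose a different multiplication chain; the point is only that x^13 can be computed with 5 multiplications this way (versus 12 for the chain written in foo) and without any call to powf, so nothing from libm is needed.

```c
#include <stdio.h>

/* Hypothetical helper: compute x^n for a non-negative integer n using
   square-and-multiply.  If mults is non-NULL, it receives the number of
   floating-point multiplications performed. */
float powi_by_squaring (float x, unsigned n, unsigned *mults)
{
  float result = 1.0f;
  int have_result = 0;      /* avoids a useless 1.0f * x */
  unsigned count = 0;

  while (n > 0)
    {
      if (n & 1u)
        {
          /* This bit of the exponent is set: fold the current power in. */
          if (have_result)
            {
              result *= x;
              count++;
            }
          else
            {
              result = x;
              have_result = 1;
            }
        }
      n >>= 1;
      if (n > 0)
        {
          /* Square for the next bit only if more bits remain. */
          x *= x;
          count++;
        }
    }

  if (mults)
    *mults = count;
  return result;
}
```

For n = 13 (binary 1101) the loop squares three times (x^2, x^4, x^8) and multiplies twice (x * x^4 * x^8).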
Re: linking time error with -ffast-math -O0
On Fri, Aug 26, 2011 at 2:52 PM, Vladimir Yakovlev wrote:
> [...]
> FE with -ffast-math replaced x*x*...*x with __builtin_powf. Later with
> -O2 this call is replaced back into multiplications in sincos phase.
> The stability with -O0 is because sincos phase doesn't work on -O0.
>
> I think we must avoid doing this optimization in FE and turn off
> -ffast-math if -O0 is used. Your opinion.

No, I think we should avoid most of the builtin-related folding at -O0.

Can you open a bug report on gcc.gnu.org/bugzilla please?

Thanks,
Richard.
What goes into function_rodata_section
While implementing the TARGET_ASM_FUNCTION_RODATA_SECTION hook, I wonder what will go into these sections, and I am a bit unsure, as the internals manual doesn't say a word about that. From the only two uses of that hook in final.c, I would conclude that it's only used for jump tables generated by switch/case statements.

The default returns readonly_data_section, which is not correct for AVR; the avr BE overrides this by switching to the appropriate section in ASM_OUTPUT_ADDR_VEC_ELT and ASM_OUTPUT_BEFORE_CASE_LABEL. ASM_OUTPUT_BEFORE_CASE_LABEL alone won't do, because the switch to function_rodata_section takes place after that hook.

So I'd like to make sure that the function_rodata_section hook is only used for jump tables and not for other stuff, so that I can clean up the avr BE at that point. From what I found it's not used for vtables, but for all the linkonce stuff I couldn't find a comprehensive explanation.

Thanks for any hints.

Johann
Re: What goes into function_rodata_section
Georg-Johann Lay writes:
> Implementing TARGET_ASM_FUNCTION_RODATA_SECTION hook I wonder that will go
> into these sections [...]
>
> So I'd like to reassure me that the function_rodata_hook is only used for
> jump tables and not for
> other stuff so that I can clean up the avr BE at that point. [...]

TARGET_ASM_FUNCTION_RODATA_SECTION is in principle used for any read-only data associated only with a specific function. For most targets this is chosen so that the rodata is grouped with the function, so that if the function is discarded the associated rodata is also discarded. That grouping is what the linkonce stuff is about.

In practice you are correct that the only read-only data associated with a specific function is the jump table (for targets for which JUMP_TABLES_IN_TEXT_SECTION is false). In principle the section could be used for targets with function-specific constant pools, but targets which use those tend to put the constant pool in the text section anyhow.

TARGET_ASM_FUNCTION_RODATA_SECTION is not used for vtables. That would not make sense, as vtables are not function-specific.

Ian
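One hedged way to see what ends up in function_rodata_section on a given target is to compile a small file like the one below with something like `gcc -O2 -S -ffunction-sections -fdata-sections` and read the `.section` directives in the generated assembly. The function names are made up for illustration, and whether GCC actually emits a jump table, and where, depends on the target and the GCC version. On targets where JUMP_TABLES_IN_TEXT_SECTION is false, a table for dispatch() would be expected in the section returned by the hook. A switch that only returns constants, like days_in_month(), may instead be rewritten by switch conversion into a constant lookup array (typically named CSWTCH.N) that is emitted as ordinary read-only data.

```c
/* Hypothetical example: each case runs different code, so at -O2 GCC
   will usually dispatch through a jump table rather than a compare chain. */
int dispatch (int op, int a, int b)
{
  switch (op)
    {
    case 0: return a + b;
    case 1: return a - b;
    case 2: return a * b;
    case 3: return b != 0 ? a / b : 0;
    case 4: return a & b;
    case 5: return a | b;
    case 6: return a ^ b;
    default: return 0;
    }
}

/* Hypothetical example: every case just returns a constant, which makes
   it a candidate for switch conversion into a constant lookup array. */
int days_in_month (int month)
{
  switch (month)
    {
    case 1: case 3: case 5: case 7: case 8: case 10: case 12:
      return 31;
    case 4: case 6: case 9: case 11:
      return 30;
    case 2:
      return 28;            /* non-leap year */
    default:
      return 0;
    }
}
```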
Re: Just what are rtx costs?
Peter Bigot wrote:
> On Sun, Aug 21, 2011 at 12:01 PM, Georg-Johann Lay wrote:
>> Richard Sandiford schrieb:
>>> Georg-Johann Lay writes:
>>> Richard Sandiford schrieb:
> I've been working on some patches to make insn_rtx_cost take account
> of the cost of SET_DESTs as well as SET_SRCs. But I'm slowly
> beginning to realise that I don't understand what rtx costs are
> supposed to represent.
>
> AIUI the rules have historically been:
>
> 1) Registers have zero cost.
>
> 2) Constants have a cost relative to that of registers. By
> extension, constants have zero cost if they are as cheap as a
> register.
>
> 3) With an outer code of SET, actual operations have the cost of the
> associated instruction. E.g. the cost of a PLUS is the cost of an
> addition instruction.
>
> 4) With other outer codes, actual operations have the cost of the
> combined instruction, if available, or the cost of a separate
> instruction otherwise. E.g. the cost of a NEG inside an AND might
> be zero on targets that support BIC-like instructions, and
> COSTS_N_INSNS (1) on most others.
>
> [...]
>
> But that hardly seems clean either. Perhaps we should instead make
> the SET_SRC always include the cost of the SET, even for registers,
> constants and the like. Thoughts?
IMO a clean approach would be to query the costs of a whole insn (resp. it's pattern) rather than the cost of an RTX. COSTS_N_INSNS already indicates that the costs are compared to *insn* costs i.e. cost of the whole pattern (modulo clobbers).
>>> The problem is that we sometimes want the cost of something that cannot
>>> be done using a single instruction. E.g. some CONST_INTs take several
>>> instructions to create on MIPS. In this case the costs are really
>>> measuring the cost of an emit_move_insn sequence, not a single insn.
>>>
>>> I suppose we could use emit_move_insn to create a temporary sequence and
>>> sum the cost of each individual instruction. But that's potentially
>>> expensive.
>> No, that complexity is not needed. For (set (reg) (const_int)) the BE can
>> just return the cost of the expanded sequence because it knows how it will
>> be expanded and how much it will cost. There's no need to really expand
>> the sequence.
>>
>> That's the way, e.g. AVR backend works: Shifts/mul/div must be expanded
>> because the hardware does not support them natively. The rtx_cost for
>> such an expression (which are always interpreted as RHS of a (set (reg)
>> ...)) are the sum over the costs of all insns the expander will produce.
>
> One of my problems with this approach is that the logic that's put into an
> expander definition preparation statement (or, in the case of AVR, the
> function invoked by the insn output statement) gets replicated abstractly in
> rtx_costs: both places have long switch statements on operand mode and const
> shift value to determine the instructions that get emitted (in the former)
> or how many of them there are (in the latter). How likely is it the two are
> kept consistent over the years?

Yes, it's hard and uncomfortable to have the information duplicated in several places. But recursing into the RTXes won't solve that problem, because in general the costs cannot be separated into a sum.

In the case where an expression is lowered from tree to RTL and the expander has to weigh several approaches against each other (like whether to shift or to multiply), it's not even possible to get the costs from insn attributes, because no such insns exist yet. Getting costs from attributes (which avoids keeping information in several places) works fine from expand to reload. But for the costs of insn sequences that are yet to be expanded, I don't know how to avoid that duplication.

> I'm working on the (not yet pushed upstream) back-end for the TI MSP430,
> which has some historical relationship to AVR from about a decade ago, and
> the answer to that question is "not very likely". I've changed the msp430
> back-end so that instead of putting all that logic in the output statement
> for the insn, it goes into a preparation statement for a standard expander.
> This way the individual insns that result in (say) a constant shift of 8
> bits using xor and bswap are available for the optimizer and register
> allocator to improve.

I don't know if that works well for AVR. For shifts it's a bit similar to MSP430, because AVR can only shift by 1. However, I observed in many situations that splitting too early leads to bad code, mainly when there are SUBREGs all over the place, which register allocation does not handle optimally, i.e. you will see moves all over the place. Some patterns are split late because splitting them depends on register allocation or on an RTL peephole (whether a register is available to clobber).

> This works pretty well, but still leaves me with problems when it comes to
> computing RTX costs, because there seems
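To make the "return the cost of the expanded sequence" idea concrete, here is a minimal sketch for a hypothetical MIPS-like 32-bit target, where (set (reg) (const_int VAL)) expands either to one instruction (addiu, lui or ori) or to a lui/ori pair. The function name is hypothetical, and COSTS_N_INSNS is redefined locally with the same scaling as GCC's rtl.h only so that the sketch compiles on its own. It also shows the duplication Peter describes: this case analysis has to be kept in sync by hand with whatever the move expander really emits.

```c
#include <stdint.h>

/* Same scaling as GCC's rtl.h; redefined here so the sketch is standalone. */
#define COSTS_N_INSNS(N) ((N) * 4)

/* Hypothetical cost hook fragment: cost of (set (reg) (const_int val)) on a
   MIPS-like target, counted as the insns the move expander would emit. */
int hypothetical_const_set_cost (int32_t val)
{
  uint32_t u = (uint32_t) val;

  if (val >= -32768 && val <= 32767)   /* addiu reg, $zero, simm16 */
    return COSTS_N_INSNS (1);
  if ((u & 0xffffu) == 0)              /* lui reg, hi16 */
    return COSTS_N_INSNS (1);
  if (u <= 0xffffu)                    /* ori reg, $zero, uimm16 */
    return COSTS_N_INSNS (1);
  return COSTS_N_INSNS (2);            /* lui reg, hi16; ori reg, reg, lo16 */
}
```

No sequence is actually generated; the cost comes purely from knowing how the value would be expanded, which is the trade-off discussed above.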
Re: What goes into function_rodata_section
Ian Lance Taylor wrote:
> Georg-Johann Lay writes:
>
>> Implementing TARGET_ASM_FUNCTION_RODATA_SECTION hook I wonder that will go
>> into these sections [...]
>
> TARGET_ASM_FUNCTION_RODATA_SECTION is in principle used for any read-only
> data associated only with a specific function. For most targets this is

Constant lookup tables from tree-switch-conversion don't go there; there is no mechanism to put these in specific sections. (I had a bit of trouble with that when implementing a named address space for the Harvard-architecture AVR, where .rodata is part of RAM and RAM is very scarce.) Actually, putting constant switch tables into function_rodata_section instead of readonly_data_section would break the avr BE.

> chosen so that the rodata is grouped with the function, so that if the
> function is discarded the associated rodata is also discarded. That
> grouping is what the linkonce stuff is about.

Is there a description of that concept somewhere? I didn't find one on the web, in the GCC wiki or in the GCC sources.
> In practice you are correct that the only read-only data associated with a
> specific function is the jump table (for targets for which
> JUMP_TABLES_IN_TEXT_SECTION is false). In principle the section could be
> used for targets with function-specific constant pools, but targets which
> use those tend to put the constant pool in the text section anyhow.

JUMP_TABLES_IN_TEXT_SECTION is false and there are no constant pools for that target. Still, I'm wondering about the condition

  flag_function_sections && flag_data_sections

in varasm.c:default_function_rodata_section, where I'd have expected just

  flag_function_sections

since this is data bound to a specific function, not user-defined data. One might not want -fdata-sections, in order to take advantage of constant merging, yet still want to get rid of unused switch tables just by means of -ffunction-sections.

> TARGET_ASM_FUNCTION_RODATA_SECTION is not used for vtables. That would not
> make sense, as vtables are not function-specific.
>
> Ian

Johann
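The section-name mapping under discussion can be modeled in a few lines. This is a simplified, hypothetical sketch, not the code in varasm.c: it assumes that a function placed in .text.NAME gets its function-specific read-only data in .rodata.NAME only when both -ffunction-sections and -fdata-sections are in effect, and that everything else falls back to the generic .rodata section. The linkonce/COMDAT cases are left out.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical model of where function-specific rodata (e.g. a jump table)
   lands, given the section of the function that owns it.  buf receives the
   derived name when one is built. */
const char *model_function_rodata_section (const char *text_section_name,
                                           int function_sections,
                                           int data_sections,
                                           char *buf, size_t bufsize)
{
  /* Per-function mapping only when BOTH flags are set, mirroring the
     flag_function_sections && flag_data_sections test in the thread. */
  if (function_sections && data_sections
      && strncmp (text_section_name, ".text.", 6) == 0)
    {
      /* ".text.foo" -> ".rodata.foo": keep the ".foo" suffix. */
      int n = snprintf (buf, bufsize, ".rodata%s", text_section_name + 5);
      if (n >= 0 && (size_t) n < bufsize)
        return buf;
    }

  /* Otherwise: the shared read-only data section. */
  return ".rodata";
}
```

Under this model, -ffunction-sections alone leaves a function's jump table in the shared .rodata section, so the linker cannot discard it together with an unused function, which is the behaviour being questioned above.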
Re: What goes into function_rodata_section
Georg-Johann Lay writes:
>> chosen so that the rodata is grouped with the function, so that if the
>> function is discarded the associated rodata is also discarded. That
>> grouping is what the linkonce stuff is about.
>
> Is there a description of that concept somewhere? Didn't find it on the web or
> in GCC wiki or in gcc sources.

For the general concept, look for SHT_GROUP. I don't know if its use in gcc is documented anywhere.

>> In practice you are correct that the only read-only data associated with a
>> specific function is the jump table (for targets for which
>> JUMP_TABLES_IN_TEXT_SECTION is false). In principle the section could be
>> used for targets with function-specific constant pools, but targets which
>> use those tend to put the constant pool in the text section anyhow.
>
> JUMP_TABLES_IN_TEXT_SECTION is false and there are no constant pools for that
> target. Still I'm wondering about the condition
>
> flag_function_sections && flag_data_sections
>
> in varasm.c:default_function_rodata_section where I'd just expected
>
> flag_function_sections
>
> as it's data, but bound to specific function and not user-defined. So
> -fdata-sections might not be desired to take advantage of constant merging;
> yet
> you'd like to get rid of unused switch tables just by means of
> -ffunction-sections.

I don't know why default_function_rodata_section is testing flag_data_sections. I agree that that looks odd.

Ian
Re: [fedora-arm] ARM summit at Plumbers 2011
russell, good to hear from you. can i recommend, that although this is a really wide set of cross-posting on a discussion that underpins pretty much everything (except gnu/hurd and minix) because it's linux kernel, that, just as steve kindly advised, we keep this to e.g. cross-dis...@lists.linaro.org? i'll be doing that from now on [after this] perhaps including arm-netbooks as well, but will be taking off all the distros. so - folks, let's be clear: please move this discussion to cross-dis...@lists.linaro.org, and, if it's worthwhile discussing in person, please do contact steve, so he can keep the slot open at the Plumbers 2011 summit. On Fri, Aug 26, 2011 at 5:35 PM, Russell King - ARM Linux wrote: > On Fri, Aug 26, 2011 at 11:11:41AM -0500, Bill Gatliff wrote: >> As such refactoring consolidated larger and larger chunks of kernel >> code, new designs would gravitate towards those consolidated >> implementations because they would be the dominant references. > > Don't bet on it. That's not how it works (unfortunately.) > > Just look at the many serial port inventions dreamt up by SoC designers - > everyone is different from each other. Now consider: why didn't they use > a well established standard 16550A or later design? *sigh* because they wanted to save power. or pins. or... just be bloody-minded. > This "need to be different" is so heavily embedded in the mindset of the > hardware people that I doubt "providing consolidated implementations" > will make the blind bit of difference. i think... russell... after they are told, repeatedly, "no, you can't have that pile of junk in the mainline linux kernel, Get With The Programme", you'd think that, cumulatively if they end up having to maintain a 6mb patch full of such shit, they _might_ get with the programme? and if they don't, well who honestly cares? if they don't, it's not *your* problem, is it? 
_they_ pay their employees to continue to maintain a pile of junk, instead of sponging off of _your_ time (and linus's, and everyone else's in the Free Software Community).

> I doubt that hardware people
> coming up with these abominations even care one bit about what's in
> the kernel.

then don't f**g make it _your_ problem, or anyone else's, upstream!! :) this is the core of the proposal that i have been advocating: if it's "selfish", i.e. as bill and many many others clearly agree with "if the bang-per-buck ratio is on the low side" then keep it *out* of the mainline linux kernel... ... and that really is the end of the matter. the sensible people that i've been talking to about this are truly puzzled as to why the principles of "cooperation and collaboration" behind free software are just being... completely ignored, in something as vital as The Linux Kernel, and they feel that it's really blindingly obvious that the "bang-per-buck" ratio of patches to mainline linux kernel needs to go up. so the core of the proposal that is the proposed "selfish-vs-cooperation patch policy" is quite simple: if the patch has _some_ evidence of collaboration, cooperation, refactoring, sharing - *anything* that increases the bang-per-buck ratio with respect to the core fundamental principles of Free Software - it goes to the next phase [which is technical evaluation etc. etc.]. otherwise, it's absolutely out, regardless of its technical correctness, and that's the end of it. the linux kernel mainline source tree should *not* be a dumping-ground for a bunch of selfish self-centred pathological profit-mongering corporations whose employees end up apologising in sheer embarrassment as they submit time-pressured absolutely shit non-cooperative and impossible-to-maintain code. you're not the only one, russell, who is pissed off at having to tidy up SoC vendors' patches.
there's another ARM-Linux guy (i forget his name) who specialises in samsung: two years ago he said that he was getting fed up with receiving yet another pile of rushed junk... and that's *just* him, specialising in samsung ARM SoCs! we're just stunned that you, the recipient of _multiple_ SoC vendors' piles of shite, have tolerated this for so long! anyway - i've endeavoured to put together some examples, in case that's not clear: i admit it's quite hard to create clear examples, and would greatly appreciate help doing so. i've had some very much appreciated help from one of the openwrt developers (thanks!) clarifying by creating another example that's similar to one which wasn't clear. http://lkcl.net/linux/linux-selfish.vs.cooperation.html this should be _fun_, guys. it shouldn't be a chore. if you're not enjoying it, and not being paid, tell the people who are clearly taking the piss to f*** off! but - i also would like to underscore this with another idea: "lead by example" (which is why i've kept the large cross-distro list) we - the free software community - are seeing tons of nice lovely android tablets, tons of nice lovely expensive bits of big iron and/or x86 l
gcc-4.6-20110826 is now available
Snapshot gcc-4.6-20110826 is now available on
  ftp://gcc.gnu.org/pub/gcc/snapshots/4.6-20110826/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 4.6 SVN branch
with the following options: svn://gcc.gnu.org/svn/gcc/branches/gcc-4_6-branch revision 178125

You'll find:

 gcc-4.6-20110826.tar.bz2   Complete GCC

  MD5=edd15e1a9597628079a1e5c954a82e86
  SHA1=32b44caae8c20f6f1b8fc334d381cdc0cf9942a9

Diffs from 4.6-20110819 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-4.6
link is updated and a message is sent to the gcc list. Please do not use
a snapshot before it has been announced that way.
Inline Expansion Problem
Hello,

I am having the compiler insert a call to a function that is defined in another object file. However, during inline expansion via expand_call_inline(), the following assertion fails in tree-inline.c (lines 3775-3776):

  cg_edge = cgraph_edge (id->dst_node, stmt);
  gcc_checking_assert (cg_edge);

cg_edge comes back as NULL. There is only one callee and there are no indirect calls; the function that contains the inserted call is main().

Is there something I forgot to do after inserting the GIMPLE call statement? This works fine without optimization.

-Matt