Link-time error with -ffast-math -O0

2011-08-26 Thread Vladimir Yakovlev
Hi,

The following test fails at link time when compiled with -ffast-math -O0,
but compiles successfully with -ffast-math -O2. There is also no problem
if -lm is added.

$ cat t.c
#include <stdio.h>
#include <stdlib.h>

float foo(float x)
{
  float y = 0;
  while (x > 0.0001) {
    y += x*x*x*x*x*x*x*x*x*x*x*x*x;
    x = x/2;
  }
  return y;
}

int main (int argc, char *argv[])
{
  float y = atoi(argv[1]);
  printf("%f\n", foo(y));
  return 0;
}


$ gcc  -ffast-math -O0   t.c
/tmp/cccA1sUB.o: In function `foo':
t.c:(.text+0x2c): undefined reference to `powf'
collect2: error: ld returned 1 exit status
$ gcc  -ffast-math -O2   t.c
$ ./a.out 5
1220852096.00


With -ffast-math, the FE replaces x*x*...*x with __builtin_powf. With
-O2, this call is later expanded back into multiplications in the sincos
pass. The failure at -O0 happens because the sincos pass does not run at -O0.

I think we must avoid doing this optimization in the FE and turn off
-ffast-math when -O0 is used. What is your opinion?

Thanks,
Vladimir


Re: Link-time error with -ffast-math -O0

2011-08-26 Thread Richard Guenther
On Fri, Aug 26, 2011 at 2:52 PM, Vladimir Yakovlev  wrote:
> I think we must avoid doing this optimization in the FE and turn off
> -ffast-math when -O0 is used. What is your opinion?

No, I think we should avoid most of the builtin-related folding at -O0.

Can you open a bug report on gcc.gnu.org/bugzilla please?

Thanks,
Richard.



What goes into function_rodata_section

2011-08-26 Thread Georg-Johann Lay
Implementing the TARGET_ASM_FUNCTION_RODATA_SECTION hook, I wonder what will go
into these sections, and I am a bit unsure, as the internals documentation doesn't
say a word about that.  From the only two uses of that hook in final.c I would
conclude that it's only used for jump tables generated by switch/case statements.

The default returns readonly_data_section, which is not correct for AVR; the avr
BE overrides this by switching to the appropriate section in
ASM_OUTPUT_ADDR_VEC_ELT and ASM_OUTPUT_BEFORE_CASE_LABEL.
ASM_OUTPUT_BEFORE_CASE_LABEL alone won't do, because the switch to
function_rodata_section takes place after that hook.

So I'd like to reassure myself that the function_rodata_section hook is only
used for jump tables and not for other stuff, so that I can clean up the avr BE
at that point.  From what I found it's not used for vtables, but for all the
linkonce stuff I couldn't find a comprehensive explanation.

Thanks for any hints.

Johann


Re: What goes into function_rodata_section

2011-08-26 Thread Ian Lance Taylor
TARGET_ASM_FUNCTION_RODATA_SECTION is in principle used for any
read-only data associated only with a specific function.  For most
targets this is chosen so that the rodata is grouped with the function,
so that if the function is discarded the associated rodata is also
discarded.  That grouping is what the linkonce stuff is about.

In practice you are correct that the only read-only data associated with
a specific function is the jump table (for targets for which
JUMP_TABLES_IN_TEXT_SECTION is false).  In principle the section could
be used for targets with function-specific constant pools, but targets
which use those tend to put the constant pool in the text section
anyhow.

TARGET_ASM_FUNCTION_RODATA_SECTION is not used for vtables.  That would
not make sense, as vtables are not function-specific.

Ian


Re: Just what are rtx costs?

2011-08-26 Thread Georg-Johann Lay
Peter Bigot wrote:
> On Sun, Aug 21, 2011 at 12:01 PM, Georg-Johann Lay  wrote:
>> Richard Sandiford schrieb:
>>> Georg-Johann Lay  writes:
>>> 
>>>> Richard Sandiford schrieb:
>>>> 
>>>>> I've been working on some patches to make insn_rtx_cost take account
>>>>> of the cost of SET_DESTs as well as SET_SRCs.  But I'm slowly
>>>>> beginning to realise that I don't understand what rtx costs are
>>>>> supposed to represent.
>>>>> 
>>>>> AIUI the rules have historically been:
>>>>> 
>>>>> 1) Registers have zero cost.
>>>>> 
>>>>> 2) Constants have a cost relative to that of registers.  By
>>>>> extension, constants have zero cost if they are as cheap as a
>>>>> register.
>>>>> 
>>>>> 3) With an outer code of SET, actual operations have the cost of the
>>>>> associated instruction.  E.g. the cost of a PLUS is the cost of an
>>>>> addition instruction.
>>>>> 
>>>>> 4) With other outer codes, actual operations have the cost of the
>>>>> combined instruction, if available, or the cost of a separate
>>>>> instruction otherwise.  E.g. the cost of a NEG inside an AND might
>>>>> be zero on targets that support BIC-like instructions, and
>>>>> COSTS_N_INSNS (1) on most others.
>>>>> 
>>>>> [...]
>>>>> 
>>>>> But that hardly seems clean either.  Perhaps we should instead make
>>>>> the SET_SRC always include the cost of the SET, even for registers,
>>>>> constants and the like.  Thoughts?
>>>> IMO a clean approach would be to query the costs of a whole insn
>>>> (or rather its pattern) rather than the cost of an RTX.  COSTS_N_INSNS
>>>> already indicates that the costs are compared to *insn* costs, i.e.
>>>> the cost of the whole pattern (modulo clobbers).
>>> The problem is that we sometimes want the cost of something that cannot 
>>> be done using a single instruction.  E.g. some CONST_INTs take several 
>>> instructions to create on MIPS.  In this case the costs are really 
>>> measuring the cost of an emit_move_insn sequence, not a single insn.
>>> 
>>> I suppose we could use emit_move_insn to create a temporary sequence and
>>> sum the cost of each individual instruction.  But that's potentially 
>>> expensive.
>> No, that complexity is not needed.  For (set (reg) (const_int)) the BE can
>> just return the cost of the expanded sequence because it knows how it will
>> be expanded and how much it will cost.  There's no need to really expand
>> the sequence.
>> 
>> That's how the AVR backend works, for example: shifts/mul/div must be expanded
>> because the hardware does not support them natively.  The rtx_cost for
>> such an expression (which is always interpreted as the RHS of a (set (reg)
>> ...)) is the sum of the costs of all insns the expander will produce.
> 
> One of my problems with this approach is that the logic that's put into an
> expander definition preparation statement (or, in the case of AVR, the
> function invoked by the insn output statement) gets replicated abstractly in
> rtx_costs: both places have long switch statements on operand mode and const
> shift value to determine the instructions that get emitted (in the former)
> or how many of them there are (in the latter).  How likely is it the two are
> kept consistent over the years?

Yes, it's hard and uncomfortable to have the information duplicated in several
places.  But recursing into the RTXes won't solve that problem, because in
general the costs cannot be separated into a sum.  Where an expression is lowered
from tree to RTL and the expander has to weigh several approaches against each
other (e.g. whether to shift or to multiply), it's not even possible to get the
costs from insn attributes, because no such insns exist yet.  Getting costs from
attributes, which avoids keeping information in several places, works fine from
expand to reload.  But for the costs of insn sequences that are yet to be
expanded, I don't know how to avoid that duplication.

> I'm working on the (not yet pushed upstream) back-end for the TI MSP430,
> which has some historical relationship to AVR from about a decade ago, and
> the answer to that question is "not very likely". I've changed the msp430
> back-end so that instead of putting all that logic in the output statement
> for the insn, it goes into a preparation statement for a standard expander.
> This way the individual insns that result in (say) a constant shift of 8
> bits using xor and bswap are available for the optimizer and register
> allocator to improve.

I don't know whether that works well for AVR.  For shifts it's a bit similar to
MSP430, because it can only shift by 1.  However, I have observed in many
situations that splitting too early leads to bad code, mainly when there are
SUBREGs all over the place, which register allocation does not handle optimally,
i.e. you will see moves all over the place.  Some patterns are split late because
splitting depends on register allocation or on an RTL peephole (whether a clobber
is available).

> This works pretty well, but still leaves me with problems when it comes to
> computing RTX costs, because there seems

Re: What goes into function_rodata_section

2011-08-26 Thread Georg-Johann Lay
Ian Lance Taylor wrote:
> TARGET_ASM_FUNCTION_RODATA_SECTION is in principle used for any read-only
> data associated only with a specific function.  For most targets this is

Constant lookup tables from tree-switch-conversion don't go there; there is no
mechanism to put them into specific sections.  (I had a bit of trouble with that
when implementing a named address space for the Harvard-architecture AVR, where
.rodata is part of RAM and RAM is very scarce.)  Actually, putting constant switch
tables into function_rodata_section instead of readonly_data_section would break
the avr BE.

> chosen so that the rodata is grouped with the function, so that if the
> function is discarded the associated rodata is also discarded.  That
> grouping is what the linkonce stuff is about.

Is there a description of that concept somewhere? I didn't find one on the web,
in the GCC wiki, or in the GCC sources.

> In practice you are correct that the only read-only data associated with a
> specific function is the jump table (for targets for which 
> JUMP_TABLES_IN_TEXT_SECTION is false).  In principle the section could be
> used for targets with function-specific constant pools, but targets which
> use those tend to put the constant pool in the text section anyhow.

JUMP_TABLES_IN_TEXT_SECTION is false and there are no constant pools for that
target.  Still, I'm wondering about the condition

flag_function_sections && flag_data_sections

in varasm.c:default_function_rodata_section, where I'd have expected just

flag_function_sections

since, although it's data, it is bound to a specific function and not
user-defined.  A user might not want -fdata-sections, in order to take advantage
of constant merging, yet still want unused switch tables to be removed by
-ffunction-sections alone.

> TARGET_ASM_FUNCTION_RODATA_SECTION is not used for vtables.  That would not
> make sense, as vtables are not function-specific.
> 
> Ian

Johann



Re: What goes into function_rodata_section

2011-08-26 Thread Ian Lance Taylor
Georg-Johann Lay  writes:

>> chosen so that the rodata is grouped with the function, so that if the
>> function is discarded the associated rodata is also discarded.  That
>> grouping is what the linkonce stuff is about.
>
> Is there a description of that concept somewhere? I didn't find one on the web,
> in the GCC wiki, or in the GCC sources.

For the general concept, look for SHT_GROUP.  I don't know if its use in
gcc is documented anywhere.


>> In practice you are correct that the only read-only data associated with a
>> specific function is the jump table (for targets for which 
>> JUMP_TABLES_IN_TEXT_SECTION is false).  In principle the section could be
>> used for targets with function-specific constant pools, but targets which
>> use those tend to put the constant pool in the text section anyhow.
>
> JUMP_TABLES_IN_TEXT_SECTION is false and there are no constant pools for that
> target.  Still, I'm wondering about the condition
>
> flag_function_sections && flag_data_sections
>
> in varasm.c:default_function_rodata_section, where I'd have expected just
>
> flag_function_sections
>
> since, although it's data, it is bound to a specific function and not
> user-defined.  A user might not want -fdata-sections, in order to take
> advantage of constant merging, yet still want unused switch tables to be
> removed by -ffunction-sections alone.

I don't know why default_function_rodata_section is testing
flag_data_sections.  I agree that that looks odd.

Ian


Re: [fedora-arm] ARM summit at Plumbers 2011

2011-08-26 Thread Luke Kenneth Casson Leighton
russell, good to hear from you.

can i recommend, that although this is a really wide set of
cross-posting on a discussion that underpins pretty much everything
(except gnu/hurd and minix) because it's linux kernel, that, just as
steve kindly advised, we keep this to e.g.
cross-dis...@lists.linaro.org?  i'll be doing that from now on [after
this] perhaps including arm-netbooks as well, but will be taking off
all the distros.

so - folks, let's be clear: please move this discussion to
cross-dis...@lists.linaro.org, and, if it's worthwhile discussing in
person, please do contact steve, so he can keep the slot open at the
Plumbers 2011 summit.

On Fri, Aug 26, 2011 at 5:35 PM, Russell King - ARM Linux
 wrote:
> On Fri, Aug 26, 2011 at 11:11:41AM -0500, Bill Gatliff wrote:
>> As such refactoring consolidated larger and larger chunks of kernel
>> code, new designs would gravitate towards those consolidated
>> implementations because they would be the dominant references.
>
> Don't bet on it.  That's not how it works (unfortunately.)
>
> Just look at the many serial port inventions dreamt up by SoC designers -
> everyone is different from each other.  Now consider: why didn't they use
> a well established standard 16550A or later design?

 *sigh* because they wanted to save power.  or pins.  or... just be
bloody-minded.

> This "need to be different" is so heavily embedded in the mindset of the
> hardware people that I doubt "providing consolidated implementations"
> will make the blind bit of difference.

 i think... russell... after they are told, repeatedly, "no, you can't
have that pile of junk in the mainline linux kernel, Get With The
Programme", you'd think that, cumulatively if they end up having to
maintain a 6mb patch full of such shit, they _might_ get with the
programme?

 and if they don't, well who honestly cares?  if they don't, it's
not *your* problem, is it?  _they_ pay their employees to continue to
maintain a pile of junk, instead of sponging off of _your_ time (and
linus's, and everyone else's in the Free Software Community).


>  I doubt that hardware people
> coming up with these abominations even care one bit about what's in
> the kernel.

 then don't f**g make it _your_ problem, or anyone else's, upstream!! :)

 this is the core of the proposal that i have been advocating: if it's
"selfish", i.e. as bill and many many others clearly agree with "if
the bang-per-buck ratio is on the low side" then keep it *out* of the
mainline linux kernel...

 ... and that really is the end of the matter.

the sensible people that i've been talking to about this are truly
puzzled as to why the principles of "cooperation and collaboration"
behind free software are just being... completely ignored, in
something as vital as The Linux Kernel, and they feel that it's really
blindingly obvious that the "bang-per-buck" ratio of patches to the
mainline linux kernel needs to go up.

 so the core of the proposal that is the proposed
"selfish-vs-cooperation patch policy" is quite simple: if the patch
has _some_ evidence of collaboration, cooperation, refactoring,
sharing - *anything* that increases the bang-per-buck ratio with
respect to the core fundamental principles of Free Software - it goes
to the next phase [which is technical evaluation etc. etc.].
otherwise, it's absolutely out, regardless of its technical
correctness, and that's the end of it.

 the linux kernel mainline source tree should *not* be a
dumping-ground for a bunch of selfish self-centred pathological
profit-mongering corporations whose employees end up apologising in
sheer embarrassment as they submit time-pressured absolutely shit
non-cooperative and impossible-to-maintain code.

 you're not the only one, russell, who is pissed off at having to tidy
up SoC vendors' patches.  there's another ARM-Linux guy, i forget his
name, who specialises in samsung: two years ago he said that he was
getting fed up with receiving yet another pile of rushed junk... and
that's *just* him, specialising in samsung ARM SoCs!

we're just stunned that you, the recipient of _multiple_ SoC vendors'
piles of shite, have tolerated this for so long!

anyway - i've endeavoured to put together some examples, in case
that's not clear: i admit it's quite hard to create clear examples,
and would greatly appreciate help doing so.  i've had some very much
appreciated help from one of the openwrt developers (thanks!)
clarifying by creating another example that's similar to one which
wasn't clear.

   http://lkcl.net/linux/linux-selfish.vs.cooperation.html

this should be _fun_, guys.  it shouldn't be a chore.  if you're not
enjoying it, and not being paid, tell the people who are clearly
taking the piss to f*** off!

 but - i also would like to underscore this with another idea: "lead
by example" (which is why i've kept the large cross-distro list)  we -
the free software community - are seeing tons of nice lovely android
tablets, tons of nice lovely expensive bits of big iron and/or x86
l

gcc-4.6-20110826 is now available

2011-08-26 Thread gccadmin
Snapshot gcc-4.6-20110826 is now available on
  ftp://gcc.gnu.org/pub/gcc/snapshots/4.6-20110826/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 4.6 SVN branch
with the following options: svn://gcc.gnu.org/svn/gcc/branches/gcc-4_6-branch 
revision 178125

You'll find:

 gcc-4.6-20110826.tar.bz2 Complete GCC

  MD5=edd15e1a9597628079a1e5c954a82e86
  SHA1=32b44caae8c20f6f1b8fc334d381cdc0cf9942a9

Diffs from 4.6-20110819 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-4.6
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.


Inline Expansion Problem

2011-08-26 Thread Matt Davis
Hello,
I am having the compiler insert a call to a function which is defined inside
another object file.  However, during inline expansion via expand_call_inline(),
the following assertion fails in tree-inline.c:
>> 3775: cg_edge = cgraph_edge (id->dst_node, stmt);
>> 3776: gcc_checking_assert (cg_edge);

cg_edge comes back NULL.  There is only one callee, there are no indirect
calls, and the function that contains the inserted call is main().  Is there
something I forgot to do after inserting the gimple call statement?  This works
fine without optimization.

-Matt