Re: gmake-4.0 and multiple jobs (-j X) testing

2014-08-25 Thread Uros Bizjak
On Thu, Aug 21, 2014 at 8:18 AM, Uros Bizjak  wrote:

> It looks like gmake-4.0 terminates the gcc test run immediately after one
> of the jobs fails. Does anybody else see this behavior? Do I need to
> update the gmake invocation or is "gmake -j 4 -k check" from the toplevel
> build directory still OK?

Please disregard this message. Something was wrong with my dejagnu installation.

Uros.


Re: Enable EBX for x86 in 32bits PIC code

2014-08-25 Thread Vladimir Makarov

On 2014-08-22 8:21 AM, Ilya Enkovich wrote:

> Hi,
>
> At Cauldron 2014 we had a couple of talks about relaxing ebx usage in
> 32-bit PIC mode.  It was decided that the best approach would be to not fix
> the ebx register, use a pseudo register for the GOT base address and let the
> allocator do the rest.  This should be similar to how clang and icc work
> with the GOT base address.  I've been working for some time on such a patch
> and now want to share my results.
>
> The idea of the patch was very simple and included a few things:
>   1.  Set PIC_OFFSET_TABLE_REGNUM to INVALID_REGNUM to specify that we do
>       not have any hard reg fixed for PIC.
>   2.  Initialize pic_offset_table_rtx with a new pseudo register at the
>       beginning of function expand.
>   3.  Change the ABI so that there is a possible implicit PIC argument for
>       calls; pic_offset_table_rtx is used as the arg value if such an
>       implicit arg exists.
>
> This approach worked well on small tests, but when trying to run some
> benchmarks we faced a problem with reload of address constants.  The problem
> is that when we try to rematerialize an address constant or some constant
> memory reference, we have to use pic_offset_table_rtx.  It means we insert
> new usages of a pseudo register and the allocator cannot handle it
> correctly.  The same problem also applies to float and vector constants.
>
> Rematerialization is not the only case causing new pic_offset_table_rtx
> usage.  Another case is a split of some instructions using a constant but
> not having proper constraints.  E.g. the pushtf pattern allows a push of a
> constant, but it has to be replaced with a push of memory in the reload
> pass, causing an additional usage of pic_offset_table_rtx.
>
> There are two ways to fix it.  The first one is to support modifications of
> the pseudo register's live range during reload and correctly allocate hard
> regs for its new usages (currently we have some hard reg allocated for a new
> usage of the pseudo reg but it may contain the value of some other pseudo
> reg; thus we reveal the problem at runtime only).



I believe there is already code to deal with this situation.  It is the
code for risky transformations (please check the flag
lra_risky_transformation_p).  If this flag is set, the next LRA assign
subpass runs and checks the correctness of assignments (e.g. it checks
for two different pseudos that have intersecting live ranges and the
same assigned hard reg; if such a dangerous situation is found, it is
fixed).
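
As a rough illustration of the kind of check described above, here is a toy model in C.  It is not LRA's code: struct toy_pseudo, ranges_intersect and find_conflict are hypothetical names, and live ranges are simplified to a single closed interval of instruction indices.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model (assumption): one pseudo register with a single live range
   [start, end] over instruction indices and its assigned hard register. */
struct toy_pseudo {
    int start;     /* first instruction where the value is live */
    int end;       /* last instruction where the value is live */
    int hard_reg;  /* assigned hard register number, or -1 if none */
};

/* Two closed intervals intersect when each starts before the other ends. */
bool ranges_intersect(const struct toy_pseudo *a, const struct toy_pseudo *b)
{
    return a->start <= b->end && b->start <= a->end;
}

/* Return the index of the first pseudo that shares a hard register with an
   earlier pseudo while their live ranges intersect, or -1 if there is none. */
int find_conflict(const struct toy_pseudo *p, size_t n)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < i; j++)
            if (p[i].hard_reg >= 0
                && p[i].hard_reg == p[j].hard_reg
                && ranges_intersect(&p[i], &p[j]))
                return (int)i;
    return -1;
}

/* Small usage example: pseudo 2 reuses hard reg 3 while pseudo 0 is live. */
void find_conflict_demo(void)
{
    struct toy_pseudo regs[] = { {0, 6, 3}, {2, 7, 1}, {5, 9, 3} };
    assert(find_conflict(regs, 3) == 2);
}
```

In this toy, "fixing" a conflict would mean giving the reported pseudo a different hard register or spilling it; how LRA actually does that is not shown here.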



> The second way is to avoid all cases when new usages of pic_offset_table_rtx
> appear in reload.  That is the way I chose because it appeared simpler to me
> and would allow me to get some performance data faster.  Also, having
> rematerialization of address and float constants in PIC mode would mean we
> have higher register pressure, thus having them on the stack should be even
> more efficient.  To achieve it I had to cut off reg equivs to all exprs
> using symbol references and all constants living in memory.  I also had to
> avoid instructions requiring a split in reload causing a load of a constant
> from memory (*push[txd]f).
>
> The resulting compiler successfully passes make check and compiles the EEMBC
> and SPEC2000 benchmarks.  I am not confident I covered all cases, and there
> may still be some templates causing a split in reload with new
> pic_offset_table_rtx usages.  I think support of reload with a pseudo PIC
> register would be a better and more general solution, but I don't know how
> difficult it is to implement.  Any ideas on resolving this reload issue?



Please see what I mentioned above.  Maybe it can fix the degradation.
Rematerialization is important for performance and switching it off
completely is not wise.




> I collected some performance numbers for the EEMBC and SPEC2000 benchmarks.
> Here are the patch results for the -Ofast optlevel with LTO, collected on an
> Avoton server:
> AUTOmark +1,9%
> TELECOMmark +4,0%
> DENmark +10,0%
> SPEC2000 -0,5%
>
> There are few degradations on the EEMBC benchmarks, but on SPEC2000 the
> situation is different and we see more performance losses.  Some of them are
> caused by disabled rematerialization of address constants.  In some cases
> relaxed ebx causes more spills/fills in places where the GOT is frequently
> used.  There are also some minor fixes required in the patch to allow a more
> efficient function prolog (avoid unnecessary GOT register initialization and
> allow its initialization without ebx usage).  I suppose some performance
> problems may be resolved, but a good fix for reload should go first.




Ilya, the optimization you are trying to implement is important in many
cases and should be included in gcc in some way.  If the degradations
can be solved in the way I mentioned above, we could introduce a
machine-dependent flag.




Re: Enable EBX for x86 in 32bits PIC code

2014-08-25 Thread Jeff Law

On 08/22/14 06:21, Ilya Enkovich wrote:


> This approach worked well on small tests, but when trying to run some
> benchmarks we faced a problem with reload of address constants.  The
> problem is that when we try to rematerialize an address constant or some
> constant memory reference, we have to use pic_offset_table_rtx.  It
> means we insert new usages of a pseudo register and the allocator cannot
> handle it correctly.  The same problem also applies to float and vector
> constants.

Isn't this typically handled with secondary reloads?   It's not an exact 
match, but if you look at the PA port, you can see cases where we need 
to have %r1 available when we rematerialize certain constants.  Several 
ports have secondary reloads that you may be able to refer back to.  LRA 
may handle things differently, so first check LRA's paths.






> Rematerialization is not the only case causing new
> pic_offset_table_rtx usage.  Another case is a split of some
> instructions using a constant but not having proper constraints.  E.g.
> the pushtf pattern allows a push of a constant, but it has to be replaced
> with a push of memory in the reload pass, causing an additional usage of
> pic_offset_table_rtx.

Yup.  I think those would be handled the same way.


Jeff


RE: selective linking of floating point support for *printf / *scanf

2014-08-25 Thread Thomas Preud'homme
> From: Joern Rennecke [mailto:joern.renne...@embecosm.com]
> Sent: Thursday, August 14, 2014 4:52 PM
> 
> So my idea is to make the compiler emit special calls when there are no
> floating point arguments.  A library that provides the floating point
> enabled *printf/*scanf precedes libc in link order.
> Libc contains the integer-only implementations of *scanf/*printf, in two
> parts: entry points with the special function name, which in the same
> object file also contain a reference to the ordinary function name, and
> another object file with the ordinary symbol and the integer-only
> implementation.
> Thus, if any application translation unit has pulled in a floating-point
> enabled implementation, this is the one that'll be used.  Otherwise, the
> integer-only one will be used.
> Use of special sections and alphasorting of these in the linker script
> ensures that the integer-only entry points appear in the right place at
> the start of the chosen implementation.
> If vfprintf is used

What happens when a program contains both a printf call and an
__int_printf call?

If the undefined printf is resolved first then everything is fine, but if
it's the other way around and __int_printf is resolved first, the printf
implementation that gets pulled in will not have float support and all the
calls that need float support will fail. Did I miss something?

> 
> I've implemented this for AVR with these commits:
> https://github.com/embecosm/avr-gcc/commit/3b3bfe33fe29b6d29d8fb96e5d57ee025adf7af0
> https://github.com/embecosm/avr-libc/commit/c55eba74838635613c8b80d86a85ed605a79d337
> https://github.com/embecosm/avr-binutils-gdb/commit/72b3a1ea3659577198838a7149c6882a079da403
> 
> Although it could use some more testing, and thought on how best to
> introduce the change so as to avoid getting broken toolchains when
> components are out-of-sync.

I haven't done extensive testing yet, but it seems you missed the case of
variadic functions. Consider the example in attachment: two calls to
__int_printf will be generated, yet my_printf could be called with the
format string "%f\n".
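
To illustrate the variadic case in general terms, here is a minimal sketch.  It is not the attached auto-float-io-failure_1.c, whose contents are not reproduced here; my_format is a hypothetical wrapper, and the sketch assumes a hosted libc with floating-point formatting support.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical variadic wrapper.  At the vsnprintf call the compiler sees
   only a format pointer and a va_list, so it has no floating-point argument
   to look at when deciding which implementation the call needs. */
int my_format(char *buf, size_t size, const char *fmt, ...)
{
    va_list ap;
    va_start(ap, fmt);
    int n = vsnprintf(buf, size, fmt, ap);
    va_end(ap);
    return n;
}

/* Usage example: the wrapper is nevertheless asked to format a double. */
void my_format_demo(void)
{
    char buf[32];
    int n = my_format(buf, sizeof buf, "%.2f\n", 1.5);
    assert(n == 5);
    assert(strcmp(buf, "1.50\n") == 0);
}
```

With an integer-only implementation behind the wrapper, the "%.2f" conversion above would not be handled, even though no call site inside my_format passes a floating-point value.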

As to the patch, it seems to me the macro should be per library, not per
backend. Otherwise two backends supporting newlib would each need to write
the same code.

I'll also try to think about how to support the new scheme for printf with
float in newlib. As I said, it relies on printf calling _printf_float, which
is a weak symbol. The previous scheme could pull in 2 implementations of
printf and consume more space. The problem is that compiling newlib with
automatic selection would detect some cases where float might be needed
(variadic functions) and define _printf_float accordingly.
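
For readers unfamiliar with weak symbols, the general pattern can be sketched as follows.  This is not newlib's code: toy_format and toy_format_float are hypothetical names, and the sketch uses a weak default definition (GCC/Clang `__attribute__((weak))`) that a strong definition in another object file would replace, which is a simplification of the scheme described above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical float hook.  This weak default is used only when no strong
   definition of toy_format_float is linked in from another object file. */
__attribute__((weak)) int toy_format_float(char *buf, size_t size, double value)
{
    (void)value; /* the stub cannot format floating-point values */
    return snprintf(buf, size, "<no float support>");
}

/* The integer path is always available; the float path goes via the hook. */
int toy_format(char *buf, size_t size, char conv, long ival, double fval)
{
    if (conv == 'd')
        return snprintf(buf, size, "%ld", ival);
    if (conv == 'f')
        return toy_format_float(buf, size, fval);
    return -1; /* conversion not supported by this toy */
}

/* Usage example: with only the weak stub linked, floats are not formatted. */
void toy_format_demo(void)
{
    char buf[32];
    toy_format(buf, sizeof buf, 'f', 0, 1.5);
    assert(strcmp(buf, "<no float support>") == 0);
}
```

Linking an object file that provides a strong toy_format_float would switch the 'f' path to the full implementation without touching toy_format itself.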

Best regards,

Thomas


auto-float-io-failure_1.c
Description: Binary data