[PATCH GCC/pr56124] Don't prefer memory if the source of load operation has side effect

2013-03-25 Thread Bin Cheng
Hi,
As reported in PR56124, IRA causes a redundant reload by preferring to keep
in memory a pseudo that is the target of a load. Generally this is good,
except when the source of the load has a side effect.
This patch fixes the issue by checking whether the source of the load has
side effects.
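
As a hedged illustration (not part of the patch, and the function name is
made up), the sort of load source that side_effects_p flags is one whose
address auto-increments, for example:

/* Illustration only: the load feeding 'sum' goes through a post-incremented
   pointer, so on targets with auto-increment addressing the RTL source of
   the load carries a side effect (the pointer update).  The patch keeps IRA
   from lowering mem_cost, i.e. preferring memory, for such loads.  */
int sum_bytes (const char *p, int n)
{
  int sum = 0;
  while (n-- > 0)
    sum += *p++;   /* auto-increment load on targets like ARM/Thumb */
  return sum;
}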

I tested the patch on x86/thumb2. Is it OK? Thanks.

2013-03-25  Bin Cheng  

PR target/56124
* ira-costs.c (scan_one_insn): Check whether the source rtx of
loading has side effects.

Index: gcc/ira-costs.c
===
--- gcc/ira-costs.c (revision 197029)
+++ gcc/ira-costs.c (working copy)
@@ -1293,10 +1293,13 @@ scan_one_insn (rtx insn)
  a memory requiring special instructions to load it, decreasing
  mem_cost might result in it being loaded using the specialized
  instruction into a register, then stored into stack and loaded
- again from the stack.  See PR52208.  */
+ again from the stack.  See PR52208.
+ 
+ Don't do this if SET_SRC (set) has side effect.  See PR56124.  */
   if (set != 0 && REG_P (SET_DEST (set)) && MEM_P (SET_SRC (set))
   && (note = find_reg_note (insn, REG_EQUIV, NULL_RTX)) != NULL_RTX
-  && ((MEM_P (XEXP (note, 0)))
+  && ((MEM_P (XEXP (note, 0))
+  && !side_effects_p (SET_SRC (set)))
  || (CONSTANT_P (XEXP (note, 0))
  && targetm.legitimate_constant_p (GET_MODE (SET_DEST (set)),
XEXP (note, 0))


[PATCH GCC]Relax the probability condition in CE pass when optimizing for code size

2013-03-25 Thread Bin Cheng
Hi,
The CE (if-conversion) pass has been adapted to take the probabilities of the
then/else branches into account, so the transformation is now done only when
it is profitable. The problem is that this change affects both performance
and size, causing size regressions in many cases (especially in C libraries
such as Newlib).
So this patch relaxes the probability condition when we are optimizing for
size.

Below is an example from Newlib:

unsigned int strlen (const char *);
void * realloc (void * __r, unsigned int __size) ;
void * memcpy (void *, const void *, unsigned int);
int argz_add(char **argz , unsigned int *argz_len , const char *str)
{
  int len_to_add = 0;
  unsigned int last = *argz_len;

  if (str == ((void *)0))
return 0;

  len_to_add = strlen(str) + 1;
  *argz_len += len_to_add;

  if(!(*argz = (char *)realloc(*argz, *argz_len)))
return 12;

  memcpy(*argz + last, str, len_to_add);
  return 0;
}

The generated assembly for -Os/cortex-m0 looks like:

argz_add:
push{r0, r1, r2, r4, r5, r6, r7, lr}
mov r6, r0
mov r7, r1
mov r4, r2
ldr r5, [r1]
beq .L3
mov r0, r2
bl  strlen
add r0, r0, #1
add r1, r0, r5
str r0, [sp, #4]
str r1, [r7]
ldr r0, [r6]
bl  realloc
mov r3, #12
str r0, [r6]
cmp r0, #0
beq .L2
add r0, r0, r5
mov r1, r4
ldr r2, [sp, #4]
bl  memcpy
mov r3, #0
b   .L2
.L3:
mov r3, r2
.L2:
mov r0, r3

With this patch, the branch/mov instructions around .L3 can be if-converted.

While working on this I observed that passes before combine might interfere
with the CE pass, so the relaxation is only enabled in ce2/ce3, which run
after the combine pass.

It is tested on x86/thumb2, both normally and with -Os. Is it OK for trunk?
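
For reference, a minimal, hedged C sketch of the relaxed cost gate (a
simplification of cheap_bb_rtx_cost_p with the patch applied; bb_cost stands
for the summed insn_rtx_cost of the block, and REG_BR_PROB_BASE is 10000):

#include <stdbool.h>

#define REG_BR_PROB_BASE 10000

static bool
cheap_bb_cost_sketch (int bb_cost, int max_cost, int scale,
                      bool optimize_for_speed, bool after_combine)
{
  /* Fudge factor: make speculation look a little more profitable.  */
  scale += REG_BR_PROB_BASE / 8;

  /* The change: with -Os, in the if-conversion passes that run after
     combine, neutralize the probability scaling so more blocks qualify.  */
  if (!optimize_for_speed && after_combine)
    scale = REG_BR_PROB_BASE;

  max_cost *= scale;
  return bb_cost * REG_BR_PROB_BASE <= max_cost;
}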


2013-03-25  Bin Cheng  

* ifcvt.c (ifcvt_after_combine): New static variable.
(cheap_bb_rtx_cost_p): Set scale to REG_BR_PROB_BASE when optimizing
for size.
(rest_of_handle_if_conversion, rest_of_handle_if_after_combine):
Clear/set the variable ifcvt_after_combine.

Index: gcc/ifcvt.c
===
--- gcc/ifcvt.c (revision 197029)
+++ gcc/ifcvt.c (working copy)
@@ -67,6 +67,9 @@
 
 #define NULL_BLOCK ((basic_block) NULL)
 
+/* TRUE if after combine pass.  */
+static bool ifcvt_after_combine;
+
 /* # of IF-THEN or IF-THEN-ELSE blocks we looked at  */
 static int num_possible_if_blocks;
 
@@ -144,8 +147,14 @@ cheap_bb_rtx_cost_p (const_basic_block bb, int sca
   /* Our branch probability/scaling factors are just estimates and don't
  account for cases where we can get speculation for free and other
  secondary benefits.  So we fudge the scale factor to make speculating
- appear a little more profitable.  */
+ appear a little more profitable when optimizing for performance.  */
   scale += REG_BR_PROB_BASE / 8;
+
+  /* Set the scale to REG_BR_PROB_BASE to be more aggressive when
+     optimizing for size and after combine pass.  */
+  if (!optimize_function_for_speed_p (cfun) && ifcvt_after_combine)
+    scale = REG_BR_PROB_BASE;
+
   max_cost *= scale;
 
   while (1)
@@ -4445,6 +4454,7 @@ gate_handle_if_conversion (void)
 static unsigned int
 rest_of_handle_if_conversion (void)
 {
+  ifcvt_after_combine = false;
   if (flag_if_conversion)
 {
   if (dump_file)
@@ -4494,6 +4504,7 @@ gate_handle_if_after_combine (void)
 static unsigned int
 rest_of_handle_if_after_combine (void)
 {
+  ifcvt_after_combine = true;
   if_convert ();
   return 0;
 }


TYPO - http://gcc.gnu.org/gcc-4.8/changes.html

2013-03-25 Thread John Franklin
"cpmpilation"

probably meant "compilation"


[Patch, wwwdocs, committed] was: Re: TYPO - http://gcc.gnu.org/gcc-4.8/changes.html

2013-03-25 Thread Tobias Burnus

John Franklin wrote:

"cpmpilation"
probably meant "compilation"


Thanks for the report. I have fixed it with the attached patch.

Tobias

Index: gcc-4.8/changes.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-4.8/changes.html,v
retrieving revision 1.111
diff -u -r1.111 changes.html
--- gcc-4.8/changes.html	23 Mar 2013 00:54:43 -	1.111
+++ gcc-4.8/changes.html	25 Mar 2013 08:35:07 -
@@ -675,7 +675,7 @@
 RX
   
 This target will now issue a warning message whenever multiple fast
-interrupt handlers are found in the same cpmpilation unit.  This feature can
+interrupt handlers are found in the same compilation unit.  This feature can
 be turned off by the new -mno-warn-multiple-fast-interrupts
 command-line option.
   


Compiler speed (vanilla vs. LTO, PGO and LTO+PGO)

2013-03-25 Thread Markus Trippelsdorf
On 2013.03.25 at 08:06 +0100, Markus Trippelsdorf wrote:
> On 2013.03.24 at 20:53 +0100, gcc_mailingl...@abwesend.de wrote:
> > 
> > is it useful to compile gcc 4.8.0 with the lto option?
> 
> If you want a (slightly) faster compiler then yes.
> Simply add "--with-build-config=bootstrap-lto" to your configuration.
> You can combine this with profile feedback: "make profiledbootstrap".

To qualify "(slightly) faster" in the statement above, I built gcc with
four different configurations on my AMD64 4-core machine (vanilla, LTO
only, PGO only, LTO+PGO). Then I measured how much time it takes to
build the Linux kernel and Firefox. Here are the results:

Firefox:
vanilla:  5143.27s user 267.27s system 346% cpu 26:02.03 total
PGO:  4590.37s user 270.21s system 344% cpu 23:28.89 total
LTO:  5056.11s user 268.04s system 348% cpu 25:28.73 total
LTO+PGO:  4598.79s user 269.01s system 347% cpu 23:22.13 total

kernel (measured three times):
vanilla:  382.34s user 23.74s system 334% cpu 2:01.41 total 382.08s user 24.05s 
system 333% cpu 2:01.93 total 385.20s user 23.63s system 330% cpu 2:03.73 total
PGO:  341.18s user 23.25s system 323% cpu 1:52.71 total 341.72s user 23.66s 
system 323% cpu 1:52.93 total 340.32s user 23.42s system 326% cpu 1:51.38 total
LTO:  381.23s user 23.55s system 328% cpu 2:03.05 total 380.41s user 24.35s 
system 328% cpu 2:03.24 total 379.47s user 23.98s system 331% cpu 2:01.82 total
LTO+PGO:  347.12s user 25.11s system 317% cpu 1:57.34 total 344.38s user 24.05s 
system 326% cpu 1:52.99 total 344.74s user 24.61s system 323% cpu 1:54.03 total

To summarize: 
 * GCC build with PGO is ~10% faster than a vanilla bootstrapped compiler.
 * GCC build with LTO only is only ~2% faster when building Firefox. The
   kernel build time difference is in the noise.
 * A LTO+PGO build is almost exactly as fast as a pure PGO build.

So it appears, contrary to the advice given above, that it is not useful
to build gcc 4.8.0 with the lto option at the moment.

-- 
Markus


Re: Compiler speed (vanilla vs. LTO, PGO and LTO+PGO)

2013-03-25 Thread Andi Kleen
Markus Trippelsdorf  writes:
>
> So it appears, contrary to the advice given above, that it is not useful
> to build gcc 4.8.0 with the lto option at the moment.

Did you build firefox/kernel with debug info on/off?

Often debug info on changes the compiler performance significantly, as it
generates a lot more IO.

-Andi

-- 
a...@linux.intel.com -- Speaking for myself only


Re: Compiler speed (vanilla vs. LTO, PGO and LTO+PGO)

2013-03-25 Thread Richard Biener
On Mon, Mar 25, 2013 at 1:56 PM, Markus Trippelsdorf
 wrote:
> On 2013.03.25 at 08:06 +0100, Markus Trippelsdorf wrote:
>> On 2013.03.24 at 20:53 +0100, gcc_mailingl...@abwesend.de wrote:
>> >
>> > is it useful to compile gcc 4.8.0 with the lto option?
>>
>> If you want a (slightly) faster compiler then yes.
>> Simply add "--with-build-config=bootstrap-lto" to your configuration.
>> You can combine this with profile feedback: "make profiledbootstrap".
>
> To qualify "(slightly) faster" in the statement above, I build gcc with
> four different configurations on my AMD64 4-core machine (vanilla, LTO
> only, PGO only, LTO+PGO). Then I measured how much time it takes to
> build the Linux kernel and Firefox. Here are the results:
>
> Firefox:
> vanilla:  5143.27s user 267.27s system 346% cpu 26:02.03 total
> PGO:  4590.37s user 270.21s system 344% cpu 23:28.89 total
> LTO:  5056.11s user 268.04s system 348% cpu 25:28.73 total
> LTO+PGO:  4598.79s user 269.01s system 347% cpu 23:22.13 total
>
> kernel (measured three times):
> vanilla:  382.34s user 23.74s system 334% cpu 2:01.41 total 382.08s user 
> 24.05s system 333% cpu 2:01.93 total 385.20s user 23.63s system 330% cpu 
> 2:03.73 total
> PGO:  341.18s user 23.25s system 323% cpu 1:52.71 total 341.72s user 
> 23.66s system 323% cpu 1:52.93 total 340.32s user 23.42s system 326% cpu 
> 1:51.38 total
> LTO:  381.23s user 23.55s system 328% cpu 2:03.05 total 380.41s user 
> 24.35s system 328% cpu 2:03.24 total 379.47s user 23.98s system 331% cpu 
> 2:01.82 total
> LTO+PGO:  347.12s user 25.11s system 317% cpu 1:57.34 total 344.38s user 
> 24.05s system 326% cpu 1:52.99 total 344.74s user 24.61s system 323% cpu 
> 1:54.03 total
>
> To summarize:
>  * GCC build with PGO is ~10% faster than a vanilla bootstrapped compiler.
>  * GCC build with LTO only is only ~2% faster when building Firefox. The
>kernel build time difference is in the noise.
>  * A LTO+PGO build is almost exactly as fast as a pure PGO build.
>
> So it appears, contrary to the advice given above, that it is not useful
> to build gcc 4.8.0 with the lto option at the moment.

Probably Honza did a too good job in making sure optimizations LTO does
can be done without LTO as well by fixing up GCC sources ;)

Did you compare binary sizes of the compiler itself (w/o debuginfo)?

Richard.

> --
> Markus


Re: Compiler speed (vanilla vs. LTO, PGO and LTO+PGO)

2013-03-25 Thread Markus Trippelsdorf
On 2013.03.25 at 06:07 -0700, Andi Kleen wrote:
> Markus Trippelsdorf  writes:
> >
> > So it appears, contrary to the advice given above, that it is not useful
> > to build gcc 4.8.0 with the lto option at the moment.
> 
> Did you build firefox/kernel with debug info on/off?
> 
> Often debug info on changes the compiler performance significantly, as it
> generates a lot more IO.

Debug info was turned off in all cases (kernel, Firefox, gcc).

-- 
Markus


Re: Compiler speed (vanilla vs. LTO, PGO and LTO+PGO)

2013-03-25 Thread Markus Trippelsdorf
On 2013.03.25 at 14:11 +0100, Richard Biener wrote:
> On Mon, Mar 25, 2013 at 1:56 PM, Markus Trippelsdorf
>  wrote:
> > On 2013.03.25 at 08:06 +0100, Markus Trippelsdorf wrote:
> >> On 2013.03.24 at 20:53 +0100, gcc_mailingl...@abwesend.de wrote:
> >> >
> >> > is it useful to compile gcc 4.8.0 with the lto option?
> >>
> >> If you want a (slightly) faster compiler then yes.
> >> Simply add "--with-build-config=bootstrap-lto" to your configuration.
> >> You can combine this with profile feedback: "make profiledbootstrap".
> >
> > To qualify "(slightly) faster" in the statement above, I build gcc with
> > four different configurations on my AMD64 4-core machine (vanilla, LTO
> > only, PGO only, LTO+PGO). Then I measured how much time it takes to
> > build the Linux kernel and Firefox. Here are the results:
> >
> > Firefox:
> > vanilla:  5143.27s user 267.27s system 346% cpu 26:02.03 total
> > PGO:  4590.37s user 270.21s system 344% cpu 23:28.89 total
> > LTO:  5056.11s user 268.04s system 348% cpu 25:28.73 total
> > LTO+PGO:  4598.79s user 269.01s system 347% cpu 23:22.13 total
> >
> > kernel (measured three times):
> > vanilla:  382.34s user 23.74s system 334% cpu 2:01.41 total 382.08s user 
> > 24.05s system 333% cpu 2:01.93 total 385.20s user 23.63s system 330% cpu 
> > 2:03.73 total
> > PGO:  341.18s user 23.25s system 323% cpu 1:52.71 total 341.72s user 
> > 23.66s system 323% cpu 1:52.93 total 340.32s user 23.42s system 326% cpu 
> > 1:51.38 total
> > LTO:  381.23s user 23.55s system 328% cpu 2:03.05 total 380.41s user 
> > 24.35s system 328% cpu 2:03.24 total 379.47s user 23.98s system 331% cpu 
> > 2:01.82 total
> > LTO+PGO:  347.12s user 25.11s system 317% cpu 1:57.34 total 344.38s user 
> > 24.05s system 326% cpu 1:52.99 total 344.74s user 24.61s system 323% cpu 
> > 1:54.03 total
> >
> > To summarize:
> >  * GCC build with PGO is ~10% faster than a vanilla bootstrapped compiler.
> >  * GCC build with LTO only is only ~2% faster when building Firefox. The
> >kernel build time difference is in the noise.
> >  * A LTO+PGO build is almost exactly as fast as a pure PGO build.
> >
> > So it appears, contrary to the advice given above, that it is not useful
> > to build gcc 4.8.0 with the lto option at the moment.
> 
> Probably Honza did a too good job in making sure optimizations LTO does
> can be done without LTO as well by fixing up GCC sources ;)
> 
> Did you compare binary sizes of the compiler itself (w/o debuginfo)?

Vanilla:
-rwxr-xr-x 1 markus markus 16219976 Mar 25 09:28 cc1
-rwxr-xr-x 1 markus markus 17762824 Mar 25 09:28 cc1plus
-rwxr-xr-x 1 markus markus 15354320 Mar 25 09:28 lto1
-rwxr-xr-x 4 markus markus 664920 Mar 25 09:28 c++
-rwxr-xr-x 1 markus markus 663496 Mar 25 09:28 cpp
-rwxr-xr-x 4 markus markus 664920 Mar 25 09:28 g++
-rwxr-xr-x 3 markus markus 662464 Mar 25 09:28 gcc

PGO:
-rwxr-xr-x 1 markus markus 14778600 Mar 25 09:14 cc1
-rwxr-xr-x 1 markus markus 16106120 Mar 25 09:14 cc1plus
-rwxr-xr-x 1 markus markus 14054448 Mar 25 09:14 lto1
-rwxr-xr-x 4 markus markus 579744 Mar 25 09:14 c++
-rwxr-xr-x 1 markus markus 575600 Mar 25 09:14 cpp
-rwxr-xr-x 4 markus markus 579744 Mar 25 09:14 g++
-rwxr-xr-x 3 markus markus 575560 Mar 25 09:14 gcc

LTO:
-rwxr-xr-x 1 markus markus 17147688 Mar 25 08:58 cc1
-rwxr-xr-x 1 markus markus 18728200 Mar 25 08:58 cc1plus
-rwxr-xr-x 1 markus markus 16227224 Mar 25 08:58 lto1
-rwxr-xr-x 4 markus markus 567968 Mar 25 08:58 c++
-rwxr-xr-x 1 markus markus 568224 Mar 25 08:58 cpp
-rwxr-xr-x 4 markus markus 567968 Mar 25 08:58 g++
-rwxr-xr-x 3 markus markus 563728 Mar 25 08:58 gcc

LTO+PGO:
-rwxr-xr-x 1 root root 16319480 Mar 22 13:02 cc1
-rwxr-xr-x 1 root root 17616608 Mar 22 13:02 cc1plus
-rwxr-xr-x 1 root root 15445824 Mar 22 13:02 lto1
-rwxr-xr-x 4 root root 492344 Mar 22 13:02 c++
-rwxr-xr-x 1 root root 492320 Mar 22 13:02 cpp
-rwxr-xr-x 4 root root 492344 Mar 22 13:02 g++
-rwxr-xr-x 3 root root 492232 Mar 22 13:02 gcc

-- 
Markus


Re: Compiler speed (vanilla vs. LTO, PGO and LTO+PGO)

2013-03-25 Thread Richard Biener
On Mon, Mar 25, 2013 at 2:24 PM, Markus Trippelsdorf
 wrote:
> On 2013.03.25 at 14:11 +0100, Richard Biener wrote:
>> On Mon, Mar 25, 2013 at 1:56 PM, Markus Trippelsdorf
>>  wrote:
>> > On 2013.03.25 at 08:06 +0100, Markus Trippelsdorf wrote:
>> >> On 2013.03.24 at 20:53 +0100, gcc_mailingl...@abwesend.de wrote:
>> >> >
>> >> > is it useful to compile gcc 4.8.0 with the lto option?
>> >>
>> >> If you want a (slightly) faster compiler then yes.
>> >> Simply add "--with-build-config=bootstrap-lto" to your configuration.
>> >> You can combine this with profile feedback: "make profiledbootstrap".
>> >
>> > To qualify "(slightly) faster" in the statement above, I build gcc with
>> > four different configurations on my AMD64 4-core machine (vanilla, LTO
>> > only, PGO only, LTO+PGO). Then I measured how much time it takes to
>> > build the Linux kernel and Firefox. Here are the results:
>> >
>> > Firefox:
>> > vanilla:  5143.27s user 267.27s system 346% cpu 26:02.03 total
>> > PGO:  4590.37s user 270.21s system 344% cpu 23:28.89 total
>> > LTO:  5056.11s user 268.04s system 348% cpu 25:28.73 total
>> > LTO+PGO:  4598.79s user 269.01s system 347% cpu 23:22.13 total
>> >
>> > kernel (measured three times):
>> > vanilla:  382.34s user 23.74s system 334% cpu 2:01.41 total 382.08s user 
>> > 24.05s system 333% cpu 2:01.93 total 385.20s user 23.63s system 330% cpu 
>> > 2:03.73 total
>> > PGO:  341.18s user 23.25s system 323% cpu 1:52.71 total 341.72s user 
>> > 23.66s system 323% cpu 1:52.93 total 340.32s user 23.42s system 326% cpu 
>> > 1:51.38 total
>> > LTO:  381.23s user 23.55s system 328% cpu 2:03.05 total 380.41s user 
>> > 24.35s system 328% cpu 2:03.24 total 379.47s user 23.98s system 331% cpu 
>> > 2:01.82 total
>> > LTO+PGO:  347.12s user 25.11s system 317% cpu 1:57.34 total 344.38s user 
>> > 24.05s system 326% cpu 1:52.99 total 344.74s user 24.61s system 323% cpu 
>> > 1:54.03 total
>> >
>> > To summarize:
>> >  * GCC build with PGO is ~10% faster than a vanilla bootstrapped compiler.
>> >  * GCC build with LTO only is only ~2% faster when building Firefox. The
>> >kernel build time difference is in the noise.
>> >  * A LTO+PGO build is almost exactly as fast as a pure PGO build.
>> >
>> > So it appears, contrary to the advice given above, that it is not useful
>> > to build gcc 4.8.0 with the lto option at the moment.
>>
>> Probably Honza did a too good job in making sure optimizations LTO does
>> can be done without LTO as well by fixing up GCC sources ;)
>>
>> Did you compare binary sizes of the compiler itself (w/o debuginfo)?
>
> Vanilla:
> -rwxr-xr-x 1 markus markus 16219976 Mar 25 09:28 cc1
> -rwxr-xr-x 1 markus markus 17762824 Mar 25 09:28 cc1plus
> -rwxr-xr-x 1 markus markus 15354320 Mar 25 09:28 lto1
> -rwxr-xr-x 4 markus markus 664920 Mar 25 09:28 c++
> -rwxr-xr-x 1 markus markus 663496 Mar 25 09:28 cpp
> -rwxr-xr-x 4 markus markus 664920 Mar 25 09:28 g++
> -rwxr-xr-x 3 markus markus 662464 Mar 25 09:28 gcc
>
> PGO:
> -rwxr-xr-x 1 markus markus 14778600 Mar 25 09:14 cc1
> -rwxr-xr-x 1 markus markus 16106120 Mar 25 09:14 cc1plus
> -rwxr-xr-x 1 markus markus 14054448 Mar 25 09:14 lto1
> -rwxr-xr-x 4 markus markus 579744 Mar 25 09:14 c++
> -rwxr-xr-x 1 markus markus 575600 Mar 25 09:14 cpp
> -rwxr-xr-x 4 markus markus 579744 Mar 25 09:14 g++
> -rwxr-xr-x 3 markus markus 575560 Mar 25 09:14 gcc
>
> LTO:
> -rwxr-xr-x 1 markus markus 17147688 Mar 25 08:58 cc1
> -rwxr-xr-x 1 markus markus 18728200 Mar 25 08:58 cc1plus
> -rwxr-xr-x 1 markus markus 16227224 Mar 25 08:58 lto1
> -rwxr-xr-x 4 markus markus 567968 Mar 25 08:58 c++
> -rwxr-xr-x 1 markus markus 568224 Mar 25 08:58 cpp
> -rwxr-xr-x 4 markus markus 567968 Mar 25 08:58 g++
> -rwxr-xr-x 3 markus markus 563728 Mar 25 08:58 gcc
>
> LTO+PGO:
> -rwxr-xr-x 1 root root 16319480 Mar 22 13:02 cc1
> -rwxr-xr-x 1 root root 17616608 Mar 22 13:02 cc1plus
> -rwxr-xr-x 1 root root 15445824 Mar 22 13:02 lto1
> -rwxr-xr-x 4 root root 492344 Mar 22 13:02 c++
> -rwxr-xr-x 1 root root 492320 Mar 22 13:02 cpp
> -rwxr-xr-x 4 root root 492344 Mar 22 13:02 g++
> -rwxr-xr-x 3 root root 492232 Mar 22 13:02 gcc

Hmm, does the default --enable-plugin (GCC plugin support) which results
in -rdynamic being used maybe prevent some of the useful LTO optimizations
(mainly due to cost constraints)?  That is, is a LTO + PGO build with
--disable-plugin any different?

Richard.

> --
> Markus


Re: GCC 4.8.0 does not compile for DJGPP

2013-03-25 Thread David Edelsohn
On Mon, Mar 25, 2013 at 12:29 AM, Andris Pavenis  wrote:

>> Forgot to say that I also had to apply this patch
>>
>> --- ../gcc-4.8.0/libbacktrace/alloc.c2013-01-14 19:17:30.0
>> +0100
>> +++ ../gcc-4.80/libbacktrace/alloc.c2013-03-24 18:07:11.995891959
>> +0100
>> @@ -34,6 +34,7 @@
>>
>>   #include 
>>   #include 
>> +#include <sys/types.h>
>>
>>   #include "backtrace.h"
>>   #include "internal.h"
>>
>
> This fix is required for current stable version (2.03) of DJGPP only. I only
> built for
> development version 2.0.4 (really recent once from CVS) which does not need
> this fix.
>
> Native build for DJGPP v2.03 fails due to DJGPP own problems and was left
> out
> for that reason and because of DJGPP v2.03 is already too old.

I believe that Ian is asking what *specific* declaration is missing
that prevents DJGPP from building and is supplied by including that
header file.

Thanks, David


Inquiry about GCC Summer Of Code project idea.

2013-03-25 Thread Fotis Koutoulakis
Greetings,

I am writing this email with regard to a potential project idea that's
hosted on the GCC wiki about porting the go programming language GCC
(gccgo) frontend to the GNU/HURD operating system (information found
here-> http://gcc.gnu.org/wiki/SummerOfCode and here->
http://www.gnu.org/software/hurd/open_issues/gccgo.html).

My specific queries would be:

- This particular idea seems to be eligible for this year's Google
Summer Of Code. Further research on the GCC wiki shows that this
particular idea has never been implemented in the past - or assigned.
However, I would like someone else to assert my assumption that this
is eligible for this year's GSOC.

- What would be the specific educational and knowledge background that
the student who wishes to implement this particular idea should have?
I can see mentions of good POSIX API knowledge, go language knowledge
and HURD knowledge here, but I would like to know if there would be
more requirements regarding this specific project idea that are not
immediately obvious.

- What would be a skill level estimate for someone wishing to try this
project in an attempt to get his feet wet in compiler engineering?

I really appreciate any information you could provide.

--
Fotis 'NlightNFotis' Koutoulakis

- "Non semper aestas erit; venit hiems."


Re: Inquiry about GCC Summer Of Code project idea.

2013-03-25 Thread Ian Lance Taylor
On Mon, Mar 25, 2013 at 7:42 AM, Fotis Koutoulakis
 wrote:
>
> I am writing this email with regard to a potential project idea that's
> hosted on the GCC wiki about porting the go programming language GCC
> (gccgo) frontend to the GNU/HURD operating system (information found
> here-> http://gcc.gnu.org/wiki/SummerOfCode and here->
> http://www.gnu.org/software/hurd/open_issues/gccgo.html).
>
> My specific queries would be:
>
> - This particular idea seems to be eligible for this year's Google
> Summer Of Code. Further research on the GCC wiki shows that this
> particular idea has never been implemented in the past - or assigned.
> However, I would like someone else to assert my assumption that this
> is eligible for this year's GSOC.

Yes, it is eligible.

(This is of course no guarantee that this particular project will be
selected.  It depends on the other proposals we receive.)


> - What would be the specific educational and knowledge background that
> the student who wishes to implement this particular idea should have?
> I can see mentions of good POSIX API knowledge, go language knowledge
> and HURD knowledge here, but I would like to know if there would be
> more requirements regarding this specific project idea that are not
> immediately obvious.

I think you're pretty much right.  I think the most important part
coming in would be a clear understanding of the HURD, its system call
interface, and its object file format.  You would have to be able to
dig into the Go library, to understand how it implements the system call
layer.


> - What would be a skill level estimate for someone wishing to try this
> project in an attempt to get his feet wet in compiler engineering?

Unfortunately it's hard for me to judge.  The most important skill
would be the ability to dig into some large code bases and understand
how to change them.

Ian


Re: Compiler speed (vanilla vs. LTO, PGO and LTO+PGO)

2013-03-25 Thread Markus Trippelsdorf
On 2013.03.25 at 15:17 +0100, Richard Biener wrote:
> On Mon, Mar 25, 2013 at 2:24 PM, Markus Trippelsdorf
>  wrote:
> > On 2013.03.25 at 14:11 +0100, Richard Biener wrote:
> >> On Mon, Mar 25, 2013 at 1:56 PM, Markus Trippelsdorf
> >>  wrote:
> >> > On 2013.03.25 at 08:06 +0100, Markus Trippelsdorf wrote:
> >> >> On 2013.03.24 at 20:53 +0100, gcc_mailingl...@abwesend.de wrote:
> >> >> >
> >> >> > is it useful to compile gcc 4.8.0 with the lto option?
> >> >>
> >> >> If you want a (slightly) faster compiler then yes.
> >> >> Simply add "--with-build-config=bootstrap-lto" to your configuration.
> >> >> You can combine this with profile feedback: "make profiledbootstrap".
> >> >
> >> > To qualify "(slightly) faster" in the statement above, I build gcc with
> >> > four different configurations on my AMD64 4-core machine (vanilla, LTO
> >> > only, PGO only, LTO+PGO). Then I measured how much time it takes to
> >> > build the Linux kernel and Firefox. Here are the results:
> >> >
> >> > Firefox:
> >> > vanilla:  5143.27s user 267.27s system 346% cpu 26:02.03 total
> >> > PGO:  4590.37s user 270.21s system 344% cpu 23:28.89 total
> >> > LTO:  5056.11s user 268.04s system 348% cpu 25:28.73 total
> >> > LTO+PGO:  4598.79s user 269.01s system 347% cpu 23:22.13 total
> >> >
> >> > kernel (measured three times):
> >> > vanilla:  382.34s user 23.74s system 334% cpu 2:01.41 total 382.08s user 
> >> > 24.05s system 333% cpu 2:01.93 total 385.20s user 23.63s system 330% cpu 
> >> > 2:03.73 total
> >> > PGO:  341.18s user 23.25s system 323% cpu 1:52.71 total 341.72s user 
> >> > 23.66s system 323% cpu 1:52.93 total 340.32s user 23.42s system 326% cpu 
> >> > 1:51.38 total
> >> > LTO:  381.23s user 23.55s system 328% cpu 2:03.05 total 380.41s user 
> >> > 24.35s system 328% cpu 2:03.24 total 379.47s user 23.98s system 331% cpu 
> >> > 2:01.82 total
> >> > LTO+PGO:  347.12s user 25.11s system 317% cpu 1:57.34 total 344.38s user 
> >> > 24.05s system 326% cpu 1:52.99 total 344.74s user 24.61s system 323% cpu 
> >> > 1:54.03 total
> >> >
> >> > To summarize:
> >> >  * GCC build with PGO is ~10% faster than a vanilla bootstrapped 
> >> > compiler.
> >> >  * GCC build with LTO only is only ~2% faster when building Firefox. The
> >> >kernel build time difference is in the noise.
> >> >  * A LTO+PGO build is almost exactly as fast as a pure PGO build.
> >> >
> >> > So it appears, contrary to the advice given above, that it is not useful
> >> > to build gcc 4.8.0 with the lto option at the moment.
> >>
> >> Probably Honza did a too good job in making sure optimizations LTO does
> >> can be done without LTO as well by fixing up GCC sources ;)
> >>
> >> Did you compare binary sizes of the compiler itself (w/o debuginfo)?
> >
> > Vanilla:
> > -rwxr-xr-x 1 markus markus 16219976 Mar 25 09:28 cc1
> > -rwxr-xr-x 1 markus markus 17762824 Mar 25 09:28 cc1plus
> > -rwxr-xr-x 1 markus markus 15354320 Mar 25 09:28 lto1
> > -rwxr-xr-x 4 markus markus 664920 Mar 25 09:28 c++
> > -rwxr-xr-x 1 markus markus 663496 Mar 25 09:28 cpp
> > -rwxr-xr-x 4 markus markus 664920 Mar 25 09:28 g++
> > -rwxr-xr-x 3 markus markus 662464 Mar 25 09:28 gcc
> >
> > PGO:
> > -rwxr-xr-x 1 markus markus 14778600 Mar 25 09:14 cc1
> > -rwxr-xr-x 1 markus markus 16106120 Mar 25 09:14 cc1plus
> > -rwxr-xr-x 1 markus markus 14054448 Mar 25 09:14 lto1
> > -rwxr-xr-x 4 markus markus 579744 Mar 25 09:14 c++
> > -rwxr-xr-x 1 markus markus 575600 Mar 25 09:14 cpp
> > -rwxr-xr-x 4 markus markus 579744 Mar 25 09:14 g++
> > -rwxr-xr-x 3 markus markus 575560 Mar 25 09:14 gcc
> >
> > LTO:
> > -rwxr-xr-x 1 markus markus 17147688 Mar 25 08:58 cc1
> > -rwxr-xr-x 1 markus markus 18728200 Mar 25 08:58 cc1plus
> > -rwxr-xr-x 1 markus markus 16227224 Mar 25 08:58 lto1
> > -rwxr-xr-x 4 markus markus 567968 Mar 25 08:58 c++
> > -rwxr-xr-x 1 markus markus 568224 Mar 25 08:58 cpp
> > -rwxr-xr-x 4 markus markus 567968 Mar 25 08:58 g++
> > -rwxr-xr-x 3 markus markus 563728 Mar 25 08:58 gcc
> >
> > LTO+PGO:
> > -rwxr-xr-x 1 root root 16319480 Mar 22 13:02 cc1
> > -rwxr-xr-x 1 root root 17616608 Mar 22 13:02 cc1plus
> > -rwxr-xr-x 1 root root 15445824 Mar 22 13:02 lto1
> > -rwxr-xr-x 4 root root 492344 Mar 22 13:02 c++
> > -rwxr-xr-x 1 root root 492320 Mar 22 13:02 cpp
> > -rwxr-xr-x 4 root root 492344 Mar 22 13:02 g++
> > -rwxr-xr-x 3 root root 492232 Mar 22 13:02 gcc
> 
> Hmm, does the default --enable-plugin (GCC plugin support) which results
> in -rdynamic being used maybe prevent some of the useful LTO optimizations
> (mainly due to cost constraints)?  That is, is a LTO + PGO build with
> --disable-plugin any different?

Yes, the binary size is 8-10% smaller. Unfortunately there are no performance
improvements.

LTO+PGO-disable-plugin:
-rwxr-xr-x 1 markus markus 15025568 Mar 25 15:49 cc1
-rwxr-xr-x 1 markus markus 16198584 Mar 25 15:49 cc1plus
-rwxr-xr-x 1 markus markus 13907328 Mar 25 15:49 lto1
-rwxr-xr-x 4 markus markus 492360 Mar 25 15:49 c++
-rwxr-xr-x 1 markus marku

Re: Inquiry about GCC Summer Of Code project idea.

2013-03-25 Thread Samuel Thibault
Ian Lance Taylor, on Mon 25 Mar 2013 08:22:15 -0700, wrote:
> > - What would be a skill level estimate for someone wishing to try this
> > project in an attempt to get his feet wet in compiler engineering?
> 
> Unfortunately it's hard for me to judge.  The most important skill
> would be the ability to dig into some large code bases and understand
> how to change them.

Agreed.  I don't think there would be much about compiler engineering
actually, but rather about runtime and system calls.

Samuel


Re: GCC 4.8.0 does not compile for DJGPP

2013-03-25 Thread Fabrizio Gennari

On 25/03/2013 00:00, Ian Lance Taylor wrote:

On Sun, Mar 24, 2013 at 10:51 AM, Fabrizio Gennari
 wrote:

On 24/03/2013 18:48, Fabrizio Gennari wrote:


On 23/03/2013 18:07, DJ Delorie wrote:

The DJGPP build of gcc 4.8.0 was just uploaded, it might have some
patches that haven't been committed upstream yet.

Thank you DJ. I downloaded beta/v2gnu/gcc480s.zip from a mirror, and that
compiles. And, indeed, the file gcc/config/i386/djgpp.h is different from
the one in the official gcc-4.8.0.tar.bz2, meaning that some DJGPP patches
are not present upstream.

Forgot to say that I also had to apply this patch

--- ../gcc-4.8.0/libbacktrace/alloc.c2013-01-14 19:17:30.0 +0100
+++ ../gcc-4.80/libbacktrace/alloc.c2013-03-24 18:07:11.995891959 +0100
@@ -34,6 +34,7 @@

  #include 
  #include 
+#include <sys/types.h>

  #include "backtrace.h"
  #include "internal.h"


What failed without that patch?

Ian
libtool: compile: /home/fabrizio/dev/djgpp/cross/gcc2/./gcc/xgcc 
-B/home/fabrizio/dev/djgpp/cross/gcc2/./gcc/ 
-B/home/fabrizio/dev/djgpp/i586-pc-msdosdjgpp/bin/ 
-B/home/fabrizio/dev/djgpp/i586-pc-msdosdjgpp/lib/ -isystem 
/home/fabrizio/dev/djgpp/i586-pc-msdosdjgpp/include -isystem 
/home/fabrizio/dev/djgpp/i586-pc-msdosdjgpp/sys-include -DHAVE_CONFIG_H 
-I. -I../../../gcc-4.80/libbacktrace -I 
../../../gcc-4.80/libbacktrace/../include -I 
../../../gcc-4.80/libbacktrace/../libgcc -I ../libgcc -funwind-tables 
-frandom-seed=alloc.lo -W -Wall -Wwrite-strings -Wstrict-prototypes 
-Wmissing-prototypes -Wold-style-definition -Wmissing-format-attribute 
-Wcast-qual -Werror -fPIC -g -O2 -c 
../../../gcc-4.80/libbacktrace/alloc.c -o alloc.o

-fPIC ignored (not supported for DJGPP)
In file included from ../../../gcc-4.80/libbacktrace/alloc.c:39:0:
../../../gcc-4.80/libbacktrace/internal.h:141:11: error: unknown type 
name ‘off_t’

off_t offset, size_t size,
^
make[3]: *** [alloc.lo] Errore 1
make[3]: uscita dalla directory 
"/home/fabrizio/dev/djgpp/cross/gcc2/i586-pc-msdosdjgpp/libbacktrace"


internal.h (included by libbacktrace/alloc.c) uses off_t, which is not 
declared unless sys/types.h is included
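
(A hedged, minimal illustration of the failure mode, with a made-up function
name: off_t lives in <sys/types.h>, so any header that uses it in a prototype
must see that header, directly or indirectly.)

#include <stddef.h>      /* size_t */
#include <sys/types.h>   /* off_t: without this, the prototype below fails
                            with "unknown type name 'off_t'" on hosts whose
                            other headers don't drag it in */

extern int read_chunk_at (int descriptor, off_t offset, size_t size);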


BTW, I am using beta/v2/djcrx204.zip for basic DJGPP headers and 
libraries, so no version 2.03 involved


Fabrizio


Re: GCC 4.8.0 does not compile for DJGPP

2013-03-25 Thread Ian Lance Taylor
On Mon, Mar 25, 2013 at 11:02 AM, Fabrizio Gennari
 wrote:
> Il 25/03/2013 00:00, Ian Lance Taylor ha scritto:
>
>> On Sun, Mar 24, 2013 at 10:51 AM, Fabrizio Gennari
>>  wrote:
>>>
>>> Il 24/03/2013 18:48, Fabrizio Gennari ha scritto:
>>>
 Il 23/03/2013 18:07, DJ Delorie ha scritto:
>
> The DJGPP build of gcc 4.8.0 was just uploaded, it might have some
> patches that haven't been committed upstream yet.

 Thank you DJ. I downloaded beta/v2gnu/gcc480s.zip from a mirror, and
 that
 compiles. And, indeed, the file gcc/config/i386/djgpp.h is different
 from
 the one in the official gcc-4.8.0.tar.bz2, meaning that some DJGPP
 patches
 are not present upstream.
>>>
>>> Forgot to say that I also had to apply this patch
>>>
>>> --- ../gcc-4.8.0/libbacktrace/alloc.c2013-01-14 19:17:30.0
>>> +0100
>>> +++ ../gcc-4.80/libbacktrace/alloc.c2013-03-24 18:07:11.995891959
>>> +0100
>>> @@ -34,6 +34,7 @@
>>>
>>>   #include 
>>>   #include 
>>> +#include <sys/types.h>
>>>
>>>   #include "backtrace.h"
>>>   #include "internal.h"
>>
>>
>> What failed without that patch?
>>
>> Ian
>
> libtool: compile: /home/fabrizio/dev/djgpp/cross/gcc2/./gcc/xgcc
> -B/home/fabrizio/dev/djgpp/cross/gcc2/./gcc/
> -B/home/fabrizio/dev/djgpp/i586-pc-msdosdjgpp/bin/
> -B/home/fabrizio/dev/djgpp/i586-pc-msdosdjgpp/lib/ -isystem
> /home/fabrizio/dev/djgpp/i586-pc-msdosdjgpp/include -isystem
> /home/fabrizio/dev/djgpp/i586-pc-msdosdjgpp/sys-include -DHAVE_CONFIG_H -I.
> -I../../../gcc-4.80/libbacktrace -I
> ../../../gcc-4.80/libbacktrace/../include -I
> ../../../gcc-4.80/libbacktrace/../libgcc -I ../libgcc -funwind-tables
> -frandom-seed=alloc.lo -W -Wall -Wwrite-strings -Wstrict-prototypes
> -Wmissing-prototypes -Wold-style-definition -Wmissing-format-attribute
> -Wcast-qual -Werror -fPIC -g -O2 -c ../../../gcc-4.80/libbacktrace/alloc.c
> -o alloc.o
> -fPIC ignored (not supported for DJGPP)
> In file included from ../../../gcc-4.80/libbacktrace/alloc.c:39:0:
> ../../../gcc-4.80/libbacktrace/internal.h:141:11: error: unknown type name
> ‘off_t’
> off_t offset, size_t size,
> ^
> make[3]: *** [alloc.lo] Errore 1
> make[3]: uscita dalla directory
> "/home/fabrizio/dev/djgpp/cross/gcc2/i586-pc-msdosdjgpp/libbacktrace"
>
> internal.h (included by libbacktrace/alloc.c) uses off_t, which is not
> declared unless sys/types.h is included

Thanks.

I committed the following patch to mainline and 4.8 branch.

Bootstrapped and tested on x86_64-unknown-linux-gnu.

Ian


2013-03-25  Ian Lance Taylor  

* alloc.c: #include <sys/types.h>.
* mmap.c: Likewise.


foo.patch
Description: Binary data


Debugging C++ Function Calls

2013-03-25 Thread Lawrence Crowl
On 3/25/13, Tom Tromey  wrote:
>> "Lawrence" == Lawrence Crowl  writes:
>
> Lawrence> My model is that I should be able to cut and paste an expression
> Lawrence> from the source to the debugger and have it work.  I concede that
> Lawrence> C++ function overload resolution is a hard problem.  However, gdb
> Lawrence> has a slightly easier task in that it won't be doing instantiation
> Lawrence> (as that expression has already instantiated everything it needs)
> Lawrence> and so it need only pick among what exists.
>
> Yeah, what isn't clear to me is that even this can be done in a
> behavior-preserving way, at least short of having full source available
> and the entire compiler in the debugger.
>
> I'd be very pleased to be wrong, but my current understanding is that
> one can play arbitrary games with SFINAE to come up with code that
> defeats any less complete solution.

Hm.  I haven't thought about this deeply, but I think SFINAE may
not be less of an issue because it serves to remove candidates
from potential instantiation, and gdb won't be instantiating.
The critical distinction is that I'm not trying to call arbitrary
expressions (which would have a SFINAE problem) but call expressions
that already appear in the source.

I agree that the best long-term solution is an integrated compiler,
interpreter, and debugger.  That's not likely to happen soon.  :-)

>
> Sergio is going to look at this area again.  So if you know differently,
> it would be great to have your input.
>
> I can dig up the current ("pending" -- but really unreviewed for a few
> years for the above reasons) gdb patch if you are interested.  I believe
> it worked by applying overload-resolution-like rules to templates
> (though it has been a while).

I don't know anything about gdb internals, so it may not be helpful
for me to look at it.

-- 
Lawrence Crowl


Re: Debugging C++ Function Calls

2013-03-25 Thread Tom Tromey
> "Lawrence" == Lawrence Crowl  writes:

Lawrence> Hm.  I haven't thought about this deeply, but I think SFINAE may
Lawrence> not be less of an issue because it serves to remove candidates
Lawrence> from potential instantiation, and gdb won't be instantiating.
Lawrence> The critical distinction is that I'm not trying to call arbitrary
Lawrence> expressions (which would have a SFINAE problem) but call expressions
Lawrence> that already appear in the source.

Thanks.
I will think about it.

Lawrence> I agree that the best long-term solution is an integrated compiler,
Lawrence> interpreter, and debugger.  That's not likely to happen soon.  :-)

Sergio is re-opening our look into reusing GCC.
Keith Seitz wrote a GCC plugin to try to let us farm out
expression-parsing to the compiler.  This has various issues, some
because gdb allows various C++ extensions that are useful when
debugging; and also g++ was too slow.
Even if g++ can't be used we at least hope this time to identify some of
the things that make it slow and file a few bug reports...

Lawrence> I don't know anything about gdb internals, so it may not be helpful
Lawrence> for me to look at it.

Sure, but maybe for a critique of the approach.  But only if you are
interested.

Tom


Re: Debugging C++ Function Calls

2013-03-25 Thread Lawrence Crowl
On 3/25/13, Tom Tromey  wrote:
>> "Lawrence" == Lawrence Crowl  writes:
>
> Lawrence> Hm.  I haven't thought about this deeply, but I think SFINAE may
> Lawrence> not be less of an issue because it serves to remove candidates
> Lawrence> from potential instantiation, and gdb won't be instantiating.
> Lawrence> The critical distinction is that I'm not trying to call arbitrary
> Lawrence> expressions (which would have a SFINAE problem) but call
> expressions
> Lawrence> that already appear in the source.
>
> Thanks.
> I will think about it.
>
> Lawrence> I agree that the best long-term solution is an integrated
> compiler,
> Lawrence> interpreter, and debugger.  That's not likely to happen soon.
> :-)
>
> Sergio is re-opening our look into reusing GCC.
> Keith Seitz wrote a GCC plugin to try to let us farm out
> expression-parsing to the compiler.  This has various issues, some
> because gdb allows various C++ extensions that are useful when
> debugging; and also g++ was too slow.
> Even if g++ can't be used we at least hope this time to identify some of
> the things that make it slow and file a few bug reports...
>
> Lawrence> I don't know anything about gdb internals, so it may not be
> helpful
> Lawrence> for me to look at it.
>
> Sure, but maybe for a critique of the approach.  But only if you are
> interested.

Sure, send it.

-- 
Lawrence Crowl


Re: Debugging C++ Function Calls

2013-03-25 Thread Tom Tromey
> "Lawrence" == Lawrence Crowl  writes:

Tom> Sure, but maybe for a critique of the approach.  But only if you are
Tom> interested.

Lawrence> Sure, send it.

I think the intro text of this message provides the best summary of the
approach:

http://sourceware.org/ml/gdb-patches/2010-07/msg00284.html

Tom


Re: GCC 4.8.0 does not compile for DJGPP

2013-03-25 Thread Andris Pavenis

On 03/25/2013 08:02 PM, Fabrizio Gennari wrote:

On 25/03/2013 00:00, Ian Lance Taylor wrote:


What failed without that patch?


In file included from ../../../gcc-4.80/libbacktrace/alloc.c:39:0:
../../../gcc-4.80/libbacktrace/internal.h:141:11: error: unknown type name 
‘off_t’
off_t offset, size_t size,
^
make[3]: *** [alloc.lo] Errore 1
make[3]: uscita dalla directory 
"/home/fabrizio/dev/djgpp/cross/gcc2/i586-pc-msdosdjgpp/libbacktrace"

internal.h (included by libbacktrace/alloc.c) uses off_t, which is not declared 
unless sys/types.h
is included

BTW, I am using beta/v2/djcrx204.zip for basic DJGPP headers and libraries, so 
no version 2.03
involved



That's also an ancient version. I have a newer CVS version built at:

http://ap1.pp.fi/djgpp/djdev/djgpp/20130306/

(2013-Mar-06 CVS version, built in Linux using cross-compiler).

With the version from beta/v2/djcrx204.zip one would run into the next
problem when building libstdc++-v3: both time.h and xmintrin86.h get
included, and time.h contains an incompatible _rdtsc(). This was fixed in a
later version by using gcc's own _rdtsc with gcc-4.8+.

Andris




rfc: another switch optimization idea

2013-03-25 Thread Dinar Temirbulatov
Hi,
We noticed some performance gains if we avoid jump tables for some
simple switch statements. Here is the idea: check whether the switch
statement can be expanded with conditional instructions. In that case
jump tables should be avoided, since some branch instructions can be
eliminated in later passes (replaced by conditional execution).

   For example:
   switch (i)
   {
     case 1: sum += 1;
     case 2: sum += 3;
     case 3: sum += 5;
     case 4: sum += 10;
   }

Using jump tables the following code will be generated (ARM assembly):

   ldrcc pc, [pc, r0, lsl #2]
   b .L5
   .L0:
        .word L1
        .word L2
        .word L3
        .word L4

   .L1:
        add r3, #1
   .L2:
        add r3, #4
   .L3:
        add r3, #5
   .L4:
        add r3, #10
   .L5

Although this code has constant complexity, it can be improved by
conditional execution to avoid the implicit branching:

   cmp r0,1
   addeq r3, #1
   cmp r0,2
   addeq r3, #4
   cmp r0,3
   addeq r3, #5
   cmp r0,4
   addeq r3, #10

Although the assembly above requires more instructions to be
executed, it doesn't disrupt the CPU pipeline (since no branching is
performed).

The original version of the patch was developed by Alexey Kravets. I
measured some performance improvements/regressions using the SPEC 2000 int
benchmark on Samsung's Exynos 5250. Here are the results:

before:
                         Base      Base       Base      Peak      Peak       Peak
   Benchmarks    Ref Time  Run Time   Ratio    Ref Time  Run Time   Ratio
   164.gzip          1400        287       487*     1400       288       485*
   175.vpr           1400        376       373*     1400       374       374*
   176.gcc           1100        121       912*     1100       118       933*
   181.mcf           1800        242       743*     1800       251       718*
   186.crafty        1000        159       628*     1000       165       608*
   197.parser        1800       347       518*     1800       329       547*
   252.eon           1300       960       135*     1300       960       135*
   253.perlbmk       1800      214       842*     1800       212       848*
   254.gap           1100       138       797*     1100       136       806*
   255.vortex        1900       253       750*     1900       255       744*
   256.bzip2         1500       237       632*     1500       230       653*
   300.twolf                                 X                             X
   SPECint_base2000                       561
   SPECint2000                                                          563

After:
   164.gzip          1400        286       490*     1400       288       486*
   175.vpr           1400        213       656*     1400       215       650*
   176.gcc           1100        119       923*     1100       118       933*
   181.mcf           1800        247       730*     1800       251       717*
   186.crafty        1000        145       688*     1000       150       664*
   197.parser        1800        296       608*     1800       275       654*
   252.eon                                   X                             X
   253.perlbmk       1800        206       872*     1800       211       853*
   254.gap           1100        133       825*     1100       131       838*
   255.vortex        1900        241       789*     1900       239       797*
   256.bzip2         1500        235       638*     1500       226       663*
   300.twolf                                 X                             X

The error in 252.eon was due to an incorrect setup. Also, "if (count >
3*PARAM_VALUE (PARAM_SWITCH_JUMP_TABLES_BB_OPS_LIMIT))" does not look
correct, and it is probably better to move this code to an earlier
stage, just before gimple expansion, and record the preferred expansion
(jump table or not) for every switch statement, to avoid dealing with
the RTL altogether.

                     thanks, Dinar.


switch.patch
Description: Binary data


Re: rfc: another switch optimization idea

2013-03-25 Thread Ondřej Bílka
On Tue, Mar 26, 2013 at 01:23:58AM +0400, Dinar Temirbulatov wrote:
> Hi,
> We noticed some performance gains if we are not using jump over some
> simple switch statements. Here is the idea: Check whether the switch
> statement can be expanded with conditional instructions. In that case
> jump tables should be avoided since some branch instructions can be
> eliminated in further passes (replaced by conditional execution).
> 
>    For example:
>    switch (i)
>    {
>      case 1: sum += 1;
>      case 2: sum += 3;
>      case 3: sum += 5;
>      case 4: sum += 10;
>    }
> 
> Using jump tables the following code will be generated (ARM assembly):
> 
>    ldrcc pc, [pc, r0, lsl #2]
>    b .L5
>    .L0:
>         .word L1
>         .word L2
>         .word L3
>         .word L4
> 
>    .L1:
>         add r3, #1
>    .L2:
>         add r3, #4
>    .L3:
>         add r3, #5
>    .L4:
>         add r3, #10
>    .L5
> 
> Although this code has a constant complexity it can be improved by the
> conditional execution to avoid implicit branching:
> 
>    cmp r0,1
>    addeq r3, #1
>    cmp r0,2
>    addeq r3, #4
>    cmp r0,3
>    addeq r3, #5
>    cmp r0,4
>    addeq r3, #10
> 
> Although the assembly below requires more assembly instructions to be
> executed, it doesn't violate the CPU pipeline (since no branching is
> performed).
>
How simple are other expansions? You can rewrite this as
a = {1, 4, 5, 10}
sum += a[i]

and in similar cases it is possible to set a[i] = 0 to simulate a nop.

In this example a second optimization is also possible: the jump table is
not necessary and the address can be computed directly.
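
A minimal C sketch of that table-lookup rewrite (illustrative names; it
assumes the cases are independent rather than falling through as in the
quoted source, in which case the table would hold the cumulative sums):

/* Index 0, or anything out of range, maps to 0 and so acts as the nop.  */
static const int increments[5] = { 0, 1, 4, 5, 10 };

int
add_increment (int sum, unsigned int i)
{
  if (i <= 4)
    sum += increments[i];
  return sum;
}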

> The original version of patch for was developed by Alexey Kravets. I
> measured some performance improvements/regressions using spec 2000 int
> benchmark on Samsumg's exynos 5250. Here is the result:
> 
> before:
>                            Base      Base      Base      Peak
> Peak      Peak
>    Benchmarks    Ref Time  Run Time   Ratio    Ref Time  Run Time   Ratio
>                         
>   
>    164.gzip          1400        287       487*     1400       288       485*
>    175.vpr           1400        376       373*     1400       374       374*
>    176.gcc           1100        121       912*     1100       118       933*
>    181.mcf           1800        242       743*     1800       251       718*
>    186.crafty        1000        159       628*     1000       165       608*
>    197.parser        1800       347       518*     1800       329       547*
>    252.eon           1300       960       135*     1300       960       135*
>    253.perlbmk       1800      214       842*     1800       212       848*
>    254.gap           1100       138       797*     1100       136       806*
>    255.vortex        1900       253       750*     1900       255       744*
>    256.bzip2         1500       237       632*     1500       230       653*
>    300.twolf                                 X                             X
>    SPECint_base2000                       561
>    SPECint2000                                                          563
> 
> After:
>    164.gzip          1400   286       490    *     1400   288       486    *
>    175.vpr           1400   213       656    *     1400   215       650    *
>    176.gcc           1100   119       923    *     1100   118       933    *
>    181.mcf          1800   247       730    *     1800   251       717    *
>    186.crafty        1000   145       688    *     1000   150       664    *
>    197.parser       1800   296       608    *     1800   275       654    *
>    252.eon                                   X                             X
>    253.perlbmk     1800   206       872    *     1800   211       853    *
>    254.gap           1100   133       825    *     1100   131       838    *
>    255.vortex        1900   241       789    *     1900   239       797    *
>    256.bzip2         1500   235       638    *     1500   226       663    *
>    300.twolf                                 X                             X
> 
> The error in 252.eon was due to incorrect setup. Also "if (count >
> 3*PARAM_VALUE (PARAM_SWITCH_JUMP_TABLES_BB_OPS_LIMIT))" does not look
> correct, and probably it is better to move this code in the earlier
> stage just before the gimple expand and keep preference expand state
> (jump-tables or not) for every switch statement to avoid dealing with
> the RTL altogether.
> 
>                      thanks, Dinar.