Re: %fs and %gs segments on x86/x86-64

2015-07-04 Thread Armin Rigo
Hi Richard,

On 3 July 2015 at 10:29, Richard Biener  wrote:
> It's nice to have the ability to test address-space issues on a
> commonly available target at least (not sure if adding runtime
> testcases is easy though).

It should be easy to add testcases that run only on CPUs with the
"fsgsbase" feature, using __builtin_ia32_wrgsbase64().  Either that,
or we have to rely on the Linux-specific system call
arch_prctl(ARCH_SET_GS).  Which option is preferred?  Of course
we can also try both depending on what is available.
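A sketch of "try both", assuming x86-64 Linux. The CPUID fsgsbase bit alone does not make wrgsbase usable from user space. The kernel must also enable the instruction, and it reports that through the HWCAP2_FSGSBASE bit of AT_HWCAP2. Without that, the instruction raises SIGILL. The names pick_gs_method and set_gs_base are hypothetical helpers, not existing GCC testsuite functions.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>

enum gs_method { GS_NONE, GS_WRGSBASE, GS_ARCH_PRCTL };

#if defined(__x86_64__) && defined(__linux__)
#include <asm/prctl.h>    /* ARCH_SET_GS */
#include <sys/auxv.h>     /* getauxval, AT_HWCAP2 */
#include <sys/syscall.h>  /* SYS_arch_prctl */
#include <unistd.h>       /* syscall() */

#ifndef AT_HWCAP2
#define AT_HWCAP2 26             /* value from <elf.h> */
#endif
#ifndef HWCAP2_FSGSBASE
#define HWCAP2_FSGSBASE (1 << 1) /* value from <asm/hwcap2.h> */
#endif

/* Prefer wrgsbase only when the kernel says user space may use it. */
enum gs_method pick_gs_method(void)
{
    if (getauxval(AT_HWCAP2) & HWCAP2_FSGSBASE)
        return GS_WRGSBASE;
    return GS_ARCH_PRCTL;
}

/* Compiled for fsgsbase without requiring -mfsgsbase globally. */
__attribute__((target("fsgsbase")))
static void set_gs_wrgsbase(uint64_t base)
{
    __builtin_ia32_wrgsbase64(base);
}

/* Returns 0 if the %gs base was set, -1 otherwise. */
int set_gs_base(uint64_t base)
{
    switch (pick_gs_method()) {
    case GS_WRGSBASE:
        set_gs_wrgsbase(base);
        return 0;
    case GS_ARCH_PRCTL:
        return syscall(SYS_arch_prctl, ARCH_SET_GS, (unsigned long)base) == 0 ? 0 : -1;
    default:
        return -1;
    }
}
#else
/* Other targets: neither method exists, so a test would be skipped. */
enum gs_method pick_gs_method(void) { return GS_NONE; }
int set_gs_base(uint64_t base) { (void)base; return -1; }
#endif
```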

Once %gs can be set, the test case can be as simple as setting it to
the address of some 4096-byte array and checking that various ways to
access small %gs-based addresses really access the array instead of
segfaulting.

>> * One case in which this patched gcc miscompiles code is found in the
>> attached bug1.c/bug1.s.
>
> Hmm, without being able to dive into it with a debugger it's hard to tell ;)
> You might want to open a bugreport in bugzilla for this at least.

Ok, I will.  For reference, I'm not sure why you are not able to dive
into it with a debugger: the gcc patch and the test were included as
attachments...

>> * The extra byte needed for the "%gs:" prefix is not explicitly
>> accounted for.  Is it only by chance that I did not observe gcc
>> underestimating how large the code it writes is, and then e.g. use
>> jump instructions that would be rejected by the assembler?
>
> Yes, I think you are just lucky here.

Note that I suspect gcc makes overestimates elsewhere that end up
compensating; otherwise, pure luck would likely have run out before the
end of the hundreds of MBs of C code.  But I agree it is still a bug.  I
will look into that more.
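The failure mode in question can be illustrated with a little arithmetic. This is a toy model, not GCC's branch-shortening code. It assumes a run of `mov 0x10(%rax),%eax` loads, encoded as 8b 40 10 (three bytes). Each becomes four bytes once the 0x65 `%gs:` segment-override prefix is added. If only three bytes are counted per load, a forward short branch over 42 of them appears to skip 126 bytes, which is in range. It actually skips 168 bytes, which is not. The function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* x86 short branches (EB cb for jmp, 70-7F cb for jcc) carry a signed
   8-bit displacement, counted from the end of the branch instruction. */
bool fits_rel8(long displacement)
{
    return displacement >= -128 && displacement <= 127;
}

/* Bytes a forward branch must skip over n instructions that are each
   insn_len bytes as estimated, plus uncounted_prefix bytes per
   instruction that the estimate left out (1 for a "%gs:" override). */
long skipped_bytes(long n, long insn_len, long uncounted_prefix)
{
    return n * (insn_len + uncounted_prefix);
}
```

Here `fits_rel8(skipped_bytes(42, 3, 0))` is true while `fits_rel8(skipped_bytes(42, 3, 1))` is false. Any mismatch of this kind shows up only once enough prefixed instructions sit between a branch and its target.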


See you soon,

Armin.


Live on Exit renaming.

2015-07-04 Thread Ajit Kumar Agarwal
All:

Design and Analysis of Profile-Based Optimization in Compaq's
 Compilation Tools for Alpha; Journal of Instruction-Level
 Parallelism 3 (2000) 1-25

The existing tracer pass in GCC (which performs the tail duplication
needed for superblock formation) is implemented based on the above paper.

Another optimization of interest in the above paper is the following.

Live on Exit Renamer:

This optimization tries to remove a constraint that forces the compiler
to create long dependent chains of operations in unrolled loops.

Consider the following example:

while (a[i] != key)
  i = i + 1;
return i;

Fig(1)

Unrolled loop:

1. while (a[i] != key)
   {
2.   i = i + 1;
3.   if (a[i] == key) goto E;
4.   i = i + 1;
5.   if (a[i] == key) goto E;
6.   i = i + 1;
7. }
8. E: return i;

Fig(2)
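To make the figures concrete, here are Fig(1) (with the loop increment from the corrected version later in this thread) and Fig(2) as compilable C. This sketch makes an assumption the figures leave implicit: key is guaranteed to occur in a[] at or after the starting index. Otherwise both loops, like the pseudocode, would run off the end of the array. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Fig(1): linear search; assumes key occurs at or after index i. */
size_t search_plain(const int *a, size_t i, int key)
{
    while (a[i] != key)
        i = i + 1;
    return i;
}

/* Fig(2): the same loop unrolled three times.  Every copy updates the
   single variable i, and each exit test reads the value produced by the
   statement before it, so the three additions form one dependent chain. */
size_t search_unrolled3(const int *a, size_t i, int key)
{
    while (a[i] != key) {
        i = i + 1;
        if (a[i] == key) goto E;
        i = i + 1;
        if (a[i] == key) goto E;
        i = i + 1;
    }
E:
    return i;
}
```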

After the live-on-exit renaming transformation:

while (a[i] != key)
{
  i1 = i + 1;
  if (a[i1] == key) goto E1;
  i2 = i + 2;
  if (a[i2] == key) goto E2;
  i3 = i + 3;
  i = i3;
}
E:  return i;
E1: i = i1; goto E;
E2: i = i2; goto E;


Fig(3)

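In Fig(2), line 4 cannot be moved above line 3 because i is live on exit
at the branch on line 3: incrementing it early would change the value
returned at E. Each copy of the loop body therefore has to wait for the
previous one. The transformation in Fig(3) removes these live-on-exit
constraints. Each copy computes its own name (i1, i2, i3) directly from i,
and i is restored only on the exit paths E1 and E2. The unrolled
iterations no longer overlap, so they can be scheduled independently and
the register allocator has more freedom in assigning registers.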
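The renamed loop can be written the same way. Below is a sketch in C, under the same assumption that key occurs at or after the starting index, and with a hypothetical function name. It updates i from i3 at the bottom of the loop so that the next trip starts three elements further on.

```c
#include <assert.h>
#include <stddef.h>

/* Live-on-exit renaming of the three-way unrolled search.  i1, i2 and
   i3 are each computed directly from i, so they do not depend on one
   another.  The live-out variable i is repaired only on the exit edges
   E1 and E2, plus one update per trip at the bottom of the loop. */
size_t search_renamed(const int *a, size_t i, int key)
{
    size_t i1, i2, i3;
    while (a[i] != key) {
        i1 = i + 1;
        if (a[i1] == key) goto E1;
        i2 = i + 2;
        if (a[i2] == key) goto E2;
        i3 = i + 3;
        i = i3;
    }
E:
    return i;
E1:
    i = i1;
    goto E;
E2:
    i = i2;
    goto E;
}
```

On the array {5, 3, 8, 1, 9, 2, 7, 4, 6, 0}, `search_renamed(a, 0, 6)` returns 8, the index a plain linear search finds. The renaming changes only which computations depend on which, not the result.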

I am not sure why this optimization is not implemented in GCC.
If there is no specific reason, I would like to implement it.

Thanks & Regards
Ajit




RE: Live on Exit renaming.

2015-07-04 Thread Ajit Kumar Agarwal
Sorry for the typo.

Below is the corrected Fig(1):

while (a[i] != key)
  i = i + 1;
return i;

Fig(1)

Thanks & Regards
Ajit
-Original Message-
From: gcc-ow...@gcc.gnu.org [mailto:gcc-ow...@gcc.gnu.org] On Behalf Of Ajit Kumar Agarwal
Sent: Saturday, July 04, 2015 7:15 PM
To: l...@redhat.com; Richard Biener; gcc@gcc.gnu.org
Cc: Vinod Kathail; Shail Aditya Gupta; Vidhumouli Hunsigida; Nagaraju Mekala
Subject: Live on Exit renaming.
