[ARM] Implementing doloop pattern

2010-12-30 Thread Roman Zhuykov

Hello!

The main idea of the work described below was to estimate the speedup we
can gain from SMS (swing modulo scheduling) on ARM.  SMS depends on the
doloop_end pattern, and there is no appropriate instruction on ARM.  We
decided to create a "fake" doloop_end pattern on ARM using a pair of
"subs" and "bne" assembler instructions.  In the implementation we used
ideas from the machine description files of other architectures, e.g.
spu, which expands the doloop_end pattern only when SMS is enabled.  The
patch is attached.
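
For readers unfamiliar with the pattern, here is a rough C model of what the
"fake" doloop_end is meant to do.  It is only a sketch: the function names are
made up for illustration and are not part of GCC.  The first function follows
the RTL in the patch (branch if the counter is not 1, then decrement); the
second follows the emitted "subs" + "bne" pair (decrement, then branch if the
result is not zero).  The two should agree for any counter value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the doloop_end_internal RTL: the branch is taken
   when the counter is not 1 before the decrement, and the counter is
   decremented in either case.  */
static bool doloop_end_model(uint32_t *counter)
{
    bool taken = (*counter != 1);
    *counter -= 1;
    return taken;
}

/* Hypothetical model of "subs rN, rN, #1; bne label": subs decrements and
   sets the Z flag, bne branches when Z is clear (result not zero).  */
static bool subs_bne_model(uint32_t *counter)
{
    *counter -= 1;
    return *counter != 0;
}

/* Number of times a loop body runs when the counter starts at n (n >= 1)
   and the given model is used as the back-edge test.  */
static uint32_t trip_count(uint32_t n, bool (*step)(uint32_t *))
{
    uint32_t trips = 0;
    uint32_t counter = n;
    do {
        trips++;
    } while (step(&counter));
    return trips;
}

/* Check that both models make the same decision and leave the same
   counter value behind.  */
static void check_models_agree(uint32_t n)
{
    uint32_t a = n, b = n;
    bool taken_a = doloop_end_model(&a);
    bool taken_b = subs_bne_model(&b);
    assert(taken_a == taken_b);
    assert(a == b);
}
```

With a starting count of n, the loop body runs exactly n times under either
model, which is the property the doloop conversion relies on.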


This patch allows any register to be used for the doloop pattern.
It was tested on the trunk snapshot from 30 Aug 2010.  It works fine on
several small examples, but gives an ICE on the sqlite-amalgamation-3.6.1
source:

sqlite3.c: In function 'sqlite3WhereBegin':
sqlite3.c:76683:1: internal compiler error: in patch_jump_insn, at 
cfgrtl.c:1020


The ICE happens in the ira pass, when cleanup_cfg is called at the end of ira.

The "bad" instruction looks like
(jump_insn 3601 628 4065 76 (parallel [
(set (pc)
(if_then_else (ne (mem/c:SI (plus:SI (reg/f:SI 13 sp)
(const_int 36 [0x24])) [105 %sfp+-916 
S4 A32])

(const_int 1 [0x1]))
(label_ref 3600)
(pc)))
(set (mem/c:SI (plus:SI (reg/f:SI 13 sp)
(const_int 36 [0x24])) [105 %sfp+-916 S4 A32])
(plus:SI (mem/c:SI (plus:SI (reg/f:SI 13 sp)
(const_int 36 [0x24])) [105 %sfp+-916 S4 A32])
(const_int -1 [0x])))
]) sqlite3.c:75235 328 {doloop_end_internal}
 (expr_list:REG_BR_PROB (const_int 9100 [0x238c])
(nil))
 -> 3600)

So, the problem seems to be with ira.  Memory is used instead of a
register to store the doloop counter.  We tried to fix this by explicitly
specifying a hard register (r5) for the doloop pattern.  The fixed version
seems to work, but this doesn't look like a proper fix.  On the trunk
snapshot from 17 Dec 2010 the ICE described above has disappeared, but
that is probably just a coincidence, and it will show up anyway on some
other test case.


The r5-fix shows the following results (compare "-O2 -fno-auto-inc-dec 
-fmodulo-sched" vs "-O2 -fno-auto-inc-dec").

Aburto benchmarks: heapsort and matmult - 3% speedup. nsieve - 7% slowdown.
Other aburto tests, sqlite tests and libevas rasterization library 
(expedite testsuite) show around zero results.


A motivating example shows about 23% speedup:

char scal (int n, char *a, char *b)
{
  int i;
  char s = 0;
  for (i = 0; i < n; i++)
s += a[i] * b[i];
  return s;
}

We have analyzed the SMS results and conclude that when SMS successfully
builds a schedule for the loop we usually gain a speedup, and when SMS
fails we often see a slowdown caused by the do-loop conversion.
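
To make "do-loop conversion" concrete, here is a hand-written sketch of the
motivating example next to a count-down form of the same loop.  The second
function is only an assumption about the general shape of a converted loop
(a counter that is decremented and tested against zero); it is not the code
GCC actually generates, and the names scal_doloop and check_scal_versions are
invented for this illustration.

```c
#include <assert.h>

/* The loop from the message above. */
char scal(int n, char *a, char *b)
{
    int i;
    char s = 0;
    for (i = 0; i < n; i++)
        s += a[i] * b[i];
    return s;
}

/* Hypothetical count-down form: the trip count is computed once, and the
   back edge is a single "decrement and branch if not zero" test, which is
   what a subs/bne pair can implement.  */
char scal_doloop(int n, char *a, char *b)
{
    int i = 0;
    int count = n;
    char s = 0;
    if (count <= 0)
        return s;               /* zero-trip guard in front of the loop */
    do {
        s += a[i] * b[i];
        i++;
    } while (--count != 0);
    return s;
}

/* Compare both versions on a small made-up input. */
void check_scal_versions(void)
{
    char a[] = {1, 2, 3};
    char b[] = {4, 5, 6};
    assert(scal(3, a, b) == scal_doloop(3, a, b));
    assert(scal(0, a, b) == scal_doloop(0, a, b));
}
```

The extra zero-trip guard and the separate counter are the overhead that
remains when SMS does not manage to schedule the loop.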


The questions are:
How to properly fix the ICE described?
Do you think this approach (after the fixes) can make its way into trunk?

Happy holidays!
--
Roman Zhuykov

diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 9d7310b..ab0373d 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -10699,6 +10699,49 @@
   "
 )
 
+(define_expand "doloop_end"
+   [(use (match_operand 0 "arm_general_register_operand" ""))  ; loop pseudo
+(use (match_operand 1 "" ""))  ; iterations; zero if unknown
+(use (match_operand 2 "" ""))  ; max iterations
+(use (match_operand 3 "" ""))  ; loop level
+(use (match_operand 4 "" ""))] ; label
+   ""
+   "
+ {
+   if (optimize > 0 && flag_modulo_sched)
+   {
+ /* Only use this on innermost loops. */
+ if (INTVAL (operands[3]) > 1)
+   FAIL;
+ if (GET_MODE (operands[0]) != SImode)
+   FAIL;
+ emit_jump_insn (gen_doloop_end_internal(operands[0], operands[4]));
+ DONE;
+   }else
+ FAIL;
+ }")
+
+(define_insn "doloop_end_internal"
+  [(set (pc) (if_then_else
+  (ne (match_operand:SI 0 "arm_general_register_operand" "")
+  (const_int 1))
+  (label_ref (match_operand 1 "" ""))
+  (pc)))
+  (set (match_dup 0)
+   (plus:SI (match_dup 0)
+   (const_int -1)))]
+  "TARGET_32BIT && optimize > 0 && flag_modulo_sched"
+  "*
+  if (arm_ccfsm_state == 1 || arm_ccfsm_state == 2)
+{
+  arm_ccfsm_state += 2;
+}
+  return \"subs\\t%0, %0, #1\;bne\\t%l1\";
+  "
+  [(set_attr "length" "8")
+   (set_attr "type" "branch")]
+)
+
 ;; Load the load/store multiple patterns
 (include "ldmstm.md")
 ;; Load the FPA co-processor patterns


Re: [ARM] Implementing doloop pattern

2010-12-30 Thread Ulrich Weigand
Roman Zhuykov wrote:

> Memory is used instead of a register to store doloop counter.

Yes, this can happen, and your doloop insn pattern *must* be
able to handle this.  This is usually done via a splitter
(and possibly an additional scratch register allocated via
an extra insn operand).  See various other doloop implementations
for examples, like s390 or rs6000.

(The reason why the register allocator and/or reload cannot fix
this is: the doloop counter is an *output* as well as an input
to the insn, therefore it would require an output reload to fix;
however, the doloop insn is also a *jump* pattern, and those
must never have output reloads, since reload has no place to
put them.)

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU Toolchain for Linux on System z and Cell BE
  ulrich.weig...@de.ibm.com


Re: [ARM] Implementing doloop pattern

2010-12-30 Thread Revital1 Eres
Hello,

The attached patch is my latest attempt to model doloop for ARM.
I followed Chung-Lin Tang's suggestion and used subs+jump, similar to your
patch.
On Cortex-A8 I see a gain of 29% on the autocor benchmark (telecom suite)
with SMS using the following flags: -fmodulo-sched-allow-regmoves
-funsafe-loop-optimizations -fmodulo-sched   -fno-auto-inc-dec
-fdump-rtl-sms -mthumb  -mcpu=cortex-a8 -O3 (compared to using only
-mthumb  -mcpu=cortex-a8 -O3).

I have not fully tested the patch and it's not in the proper format of
submission yet.

Thanks,
Revital

(See attached file: patch_arm_doloop.txt)



From:   Roman Zhuykov 
To: gcc@gcc.gnu.org
Cc: d...@ispras.ru
Date:   30/12/2010 04:04 PM
Subject:[ARM] Implementing doloop pattern
Sent by:gcc-ow...@gcc.gnu.org



[attachment "sms-doloop-any-reg.diff" deleted by Revital1 Eres/Haifa/IBM]

Index: modulo-sched.c
===
--- modulo-sched.c  (revision 167637)
+++ modulo-sched.c  (working copy)
@@ -1021,7 +1021,8 @@ sms_schedule (void)
 if (CALL_P (insn)
 || BARRIER_P (insn)
 || (NONDEBUG_INSN_P (insn) && !JUMP_P (insn)
-&& !single_set (insn) && GET_CODE (PATTERN (insn)) != USE)
+&& !single_set (insn) && GET_CODE (PATTERN (insn)) != USE
+&& !reg_mentioned_p (count_reg, insn))
 || (FIND_REG_INC_NOTE (insn, NULL_RTX) != 0)
 || (INSN_P (insn) && (set = single_set (insn))
 && GET_CODE (SET_DEST (set)) == SUBREG))
Index: loop-doloop.c
===
--- loop-doloop.c   (revision 167637)
+++ loop-doloop.c   (working copy)
@@ -96,7 +96,15 @@ doloop_condition_get (rtx doloop_pat)
  2)  (set (reg) (plus (reg) (const_int -1))
  (set (pc) (if_then_else (reg != 0)
 (label_ref (label))
-  

Re: RFC: Add 32bit x86-64 support to binutils

2010-12-30 Thread Joseph S. Myers
On Thu, 30 Dec 2010, H.J. Lu wrote:

> Hi,
> 
> This patch adds 32bit x86-64 support to binutils. Support in compiler,
> library and OS is required to use it.  It can be used to implement the
> new 32bit OS for x86-64.  Any comments?

Do you have a public psABI document?  I think the psABI at the ELF level 
needs to come before the binutils bits, at the function call level needs 
to come before the GCC bits, etc.

You appear (judging by the support for Linux targets in the binutils 
patch) to envisage Linux support for this ABI.  How do you plan to avoid 
the problems that have plagued the MIPS n32 syscall ABI, which seems like 
a similar case?

(If you could arrange for the syscall ABI always to be the same as the 
existing 64-bit ABI, rather than needing to handle three different syscall 
ABIs in the kernel, that might be one solution, but it could have its own 
complexities in ensuring that none of the types whose layout forms part of 
the kernel/userspace interface have layout differing between n32 and the 
existing ABI; without any action, structures would tend to get layout 
similar to that of the existing 32-bit ABI, though quite possibly not the 
same depending on alignment peculiarities - I'm guessing that the new ABI 
will use natural alignment - while long long arguments would tend to be 
passed in a single register, resulting in the complicated hybrid syscall 
ABI present on MIPS.  If you do have an all-new syscall ABI rather than 
sharing the existing 64-bit one, I imagine it would need to follow the 
cut-down set of syscalls for new ports, so involving the issue of how to 
build glibc for that set of syscalls discussed three months ago in the 
Tilera context.)

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: RFC: Add 32bit x86-64 support to binutils

2010-12-30 Thread H.J. Lu
On Thu, Dec 30, 2010 at 10:42 AM, Joseph S. Myers
 wrote:
> On Thu, 30 Dec 2010, H.J. Lu wrote:
>
>> Hi,
>>
>> This patch adds 32bit x86-64 support to binutils. Support in compiler,
>> library and OS is required to use it.  It can be used to implement the
>> new 32bit OS for x86-64.  Any comments?
>
> Do you have a public psABI document?  I think the psABI at the ELF level
> needs to come before the binutils bits, at the function call level needs
> to come before the GCC bits, etc.

The psABI is the same as x86-64 psABI, except for 32bit ELF instead of
64bit.

> You appear (judging by the support for Linux targets in the binutils
> patch) to envisage Linux support for this ABI.  How do you plan to avoid

I enabled it for Linux so that I can run ILP32 binutils tests on Linux/x86-64.

> the problems that have plagued the MIPS n32 syscall ABI, which seems like
> a similar case?

Can you describe MIPS n32 problems?

> (If you could arrange for the syscall ABI always to be the same as the
> existing 64-bit ABI, rather than needing to handle three different syscall
> ABIs in the kernel, that might be one solution, but it could have its own
> complexities in ensuring that none of the types whose layout forms part of
> the kernel/userspace interface have layout differing between n32 and the
> existing ABI; without any action, structures would tend to get layout
> similar to that of the existing 32-bit ABI, though quite possibly not the
> same depending on alignment peculiarities - I'm guessing that the new ABI
> will use natural alignment - while long long arguments would tend to be
> passed in a single register, resulting in the complicated hybrid syscall
> ABI present on MIPS.  If you do have an all-new syscall ABI rather than
> sharing the existing 64-bit one, I imagine it would need to follow the
> cut-down set of syscalls for new ports, so involving the issue of how to
> build glibc for that set of syscalls discussed three months ago in the
> Tilera context.)
>

You are right.  Adding ILP32 support to the Linux kernel may be tricky.
We did some experiments using the IA32 syscall interface for ILP32:

[...@gnu-18 simple]$ make ilp32
make LDFLAGS="-m elf32_x86_64" CFLAGS="-g -O2 -D__i386__ -mn32"
make[1]: Entering directory `/export/gnu/import/svn/psABI/x86-64/ilp32/simple'
/export/build/gnu/gcc-ilp32/build-x86_64-linux/gcc/xgcc
-B/export/build/gnu/gcc-ilp32/build-x86_64-linux/gcc/ -g -O2
-D__i386__ -mn32 -c -D__ASSEMBLY__ start.S
/export/build/gnu/gcc-ilp32/build-x86_64-linux/gcc/xgcc
-B/export/build/gnu/gcc-ilp32/build-x86_64-linux/gcc/ -g -O2
-D__i386__ -mn32   -c -o simple.o simple.c
/export/build/gnu/gcc-ilp32/build-x86_64-linux/gcc/xgcc
-B/export/build/gnu/gcc-ilp32/build-x86_64-linux/gcc/ -g -O2
-D__i386__ -mn32 -c -D__ASSEMBLY__ syscall.S
ld -m elf32_x86_64 -o simple start.o simple.o syscall.o
./simple This is a test.
This is a test.
Hello world
make[1]: Leaving directory `/export/gnu/import/svn/psABI/x86-64/ilp32/simple'
[...@gnu-18 simple]$ file simple
simple: ELF 32-bit LSB executable, x86-64, version 1 (SYSV),
statically linked, not stripped
[...@gnu-18 simple]$

I also have a patch for GDB:

[...@gnu-18 simple]$ ./gdb simple
GNU gdb (GDB) 7.2.50.20101229-cvs
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later 
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu".
For bug reporting instructions, please see:
...
Reading symbols from
/export/gnu/import/svn/psABI/x86-64/ilp32/simple/simple...done.
(gdb) b main
Breakpoint 1 at 0x4000f0: file simple.c, line 25.
(gdb) r
Starting program: /export/gnu/import/svn/psABI/x86-64/ilp32/simple/simple

Breakpoint 1, main (argc=1, argv=0xd4f4) at simple.c:25
25  {
(gdb) info reg
rax            0x0      0
rbx            0x0      0
rcx            0x0      0
rdx            0x0      0
rsi            0xd4f4   4294956276
rdi            0x1      1
rbp            0x0      0x0
rsp            0xd4e8   0xd4e8
r8             0x0      0
r9             0x0      0
r10            0x0      0
r11            0x200    512
r12            0x0      0
r13            0x0      0
r14            0x0      0
r15            0x0      0
rip            0x4000f0 0x4000f0
eflags         0x286    [ PF SF IF ]
cs             0x33     51
ss             0x2b     43
ds             0x2b     43
es             0x2b     43
fs             0x0      0
---Type <return> to continue, or q <return> to quit---
gs             0x0      0
(gdb) p $rsp
$1 = (void *) 0xd4e8
(gdb) p $sp
$2 = (void *) 0xd4e8
(gdb) p $pc
$3 = (void (*)()) 0x4000f0 
(gdb)

int syscall(int number, ...);

is implemented with

pushq %rbp
pushq %rbx
mov %edi, %eax  /* Syscall number -> rax.  */
mov %esi, %ebx

Re: RFC: Add 32bit x86-64 support to binutils

2010-12-30 Thread H. Peter Anvin
On 12/30/2010 10:59 AM, H.J. Lu wrote:
> 
>> (If you could arrange for the syscall ABI always to be the same as the
>> existing 64-bit ABI, rather than needing to handle three different syscall
>> ABIs in the kernel, that might be one solution, but it could have its own
>> complexities in ensuring that none of the types whose layout forms part of
>> the kernel/userspace interface have layout differing between n32 and the
>> existing ABI; without any action, structures would tend to get layout
>> similar to that of the existing 32-bit ABI, though quite possibly not the
>> same depending on alignment peculiarities - I'm guessing that the new ABI
>> will use natural alignment - while long long arguments would tend to be
>> passed in a single register, resulting in the complicated hybrid syscall
>> ABI present on MIPS.  If you do have an all-new syscall ABI rather than
>> sharing the existing 64-bit one, I imagine it would need to follow the
>> cut-down set of syscalls for new ports, so involving the issue of how to
>> build glibc for that set of syscalls discussed three months ago in the
>> Tilera context.)
>>
> 
> You are right.  Add ILP32 support to Linux kernel may be tricky.
> We did some experiment to use IA32 syscall interface for ILP32:
> 

The current plan is to simply use the 32-bit kernel ABI more or less
unmodified, although probably with a different entry point using syscall
rather than int 0x80 for performance.  In order for the ABI to map 1:1,
there need to be a few concessions:

a) 64-bit arguments will need to be split in user space.
b) The Linux kernel  exported __u64 type will need to be declared
   __attribute__((aligned(4))).  This will only affect a handful of
   structures in practice since implicit padding is frowned upon.

(a) could also be fixed by a different syscall dispatch table, it's not
the hard part of this.  We definitely want to avoid adding a different
memory ABI; that's the part that hurts.
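
As an illustration of concession (b), the following sketch shows how lowering
the alignment of a 64-bit type to 4 changes structure layout under GCC or
Clang.  The type and struct names are hypothetical and chosen only for this
example; they are not kernel definitions.  On an x86-64 host the naturally
aligned struct is expected to place the 64-bit member at offset 8, while the
aligned(4) variant places it at offset 4, as the 32-bit kernel ABI does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 64-bit type whose alignment is lowered to 4 bytes.  When
   used in a typedef, GCC's aligned attribute may decrease alignment.  */
typedef uint64_t __attribute__((aligned(4))) u64_align4;

/* Layout with the host's natural alignment for 64-bit integers. */
struct natural_layout {
    uint32_t flags;
    uint64_t offset;
};

/* Layout matching a 32-bit ABI where 64-bit integers are 4-byte aligned. */
struct ia32_style_layout {
    uint32_t flags;
    u64_align4 offset;
};

static size_t natural_member_offset(void)
{
    return offsetof(struct natural_layout, offset);
}

static size_t ia32_style_member_offset(void)
{
    return offsetof(struct ia32_style_layout, offset);
}

/* The aligned(4) variant has no padding after the 32-bit member. */
static void check_ia32_style_layout(void)
{
    assert(ia32_style_member_offset() == 4);
    assert(sizeof(struct ia32_style_layout) == 12);
}
```

A structure without such implicit padding has the same layout either way,
which is why only a handful of structures would be affected.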

-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.



Re: RFC: Add 32bit x86-64 support to binutils

2010-12-30 Thread Joseph S. Myers
On Thu, 30 Dec 2010, H.J. Lu wrote:

> On Thu, Dec 30, 2010 at 10:42 AM, Joseph S. Myers
>  wrote:
> > On Thu, 30 Dec 2010, H.J. Lu wrote:
> >
> >> Hi,
> >>
> >> This patch adds 32bit x86-64 support to binutils. Support in compiler,
> >> library and OS is required to use it.  It can be used to implement the
> >> new 32bit OS for x86-64.  Any comments?
> >
> > Do you have a public psABI document?  I think the psABI at the ELF level
> > needs to come before the binutils bits, at the function call level needs
> > to come before the GCC bits, etc.
> 
> The psABI is the same as x86-64 psABI, except for 32bit ELF instead of
> 64bit.

I don't think that's an adequate description.  If the "ILP32" name is 
accurate then it's certainly wrong at the C level since some type sizes 
are different, with effects in turn on such things as the description of 
initial stack layout storing argv.  At the ELF level are you saying that 
each relocation applies to a relocatable field of the same width as for 
64-bit (but with the width of the addend being restricted, of course)?  Do 
any relocations applying to word64 fields need 32-bit variants applying to 
word32 for the ILP32 ABI?

The right thing to do would be to go through the ABI sources and prepare a
patch adding a description of the ILP32 ABI at each place where any
change is needed.

> > the problems that have plagued the MIPS n32 syscall ABI, which seems like
> > a similar case?
> 
> Can you describe MIPS n32 problems?

Syscalls sometimes need three different versions in the kernel; sometimes 
the wrong version gets put in the n32 syscall table.  Special syscall 
wrappers are often needed in glibc; although for most purposes the glibc 
port is a 32-bit one, including having separate functions for 32-bit and 
64-bit off_t, syscalls tend to need to be called in the 64-bit way (long 
long values as single arguments, in particular).

> You are right.  Add ILP32 support to Linux kernel may be tricky.
> We did some experiment to use IA32 syscall interface for ILP32:

That seems likely to run into the structure layout issues I mentioned 
(long long only being 4-byte aligned for IA32, but 8-byte aligned for 
x86-64 and I presume for the new ABI).

> That means we pass 64bit integers to kernel the same way as ia32.

That has its own problems.  If the C prototype of a system call, at the 
userspace level, has a 64-bit integer argument, it's desirable that the 
argument is passed to the kernel in the same way as it is passed to the C 
function implementing the syscall.  This allows glibc's automatically 
generated syscall wrappers to work.  (Note how fanotify_mark is defined to 
use i:is for 32-bit, i:s for 64-bit - and i:s for MIPS n32 
because a 64-bit argument is passed in a single register for n32.  
Recently added syscalls have had care taken to position 64-bit arguments 
so that this sort of thing does work for 32-bit architectures, rather than 
leaving an unused register for alignment as can arise with bad positioning 
of the 64-bit arguments.)  MIPS n32 still needs special wrappers in 
various cases where generic Linux needs C wrappers (posix_fadvise, 
fallocate, posix_fallocate, for example) but avoids them for syscalls.list 
cases.

If the C ABI for 64-bit integers passes them in single registers (as is 
the natural adaptation of the 64-bit ABI), then such integers should also 
be passed to the kernel in single registers; otherwise you need custom 
wrappers for each affected syscall.  But then you certainly can't use the 
IA32 syscall ABI
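
To illustrate why custom wrappers are needed when the two conventions differ,
here is a minimal sketch of what a wrapper has to do if the C ABI passes a
64-bit integer in one register but the syscall ABI expects two 32-bit
halves.  The struct and function names are invented for this example and do
not correspond to glibc or kernel code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical pair of 32-bit argument slots, as a 32-bit syscall ABI
   would receive a 64-bit value.  */
struct u64_halves {
    uint32_t lo;
    uint32_t hi;
};

/* What a user-space wrapper would do before entering the kernel. */
static struct u64_halves split_u64(uint64_t value)
{
    struct u64_halves h;
    h.lo = (uint32_t)(value & 0xffffffffu);
    h.hi = (uint32_t)(value >> 32);
    return h;
}

/* What the kernel side would do to recover the original value. */
static uint64_t join_u64(uint32_t lo, uint32_t hi)
{
    return ((uint64_t)hi << 32) | lo;
}

/* Splitting and joining must round-trip for any value. */
static void check_round_trip(uint64_t value)
{
    struct u64_halves h = split_u64(value);
    assert(join_u64(h.lo, h.hi) == value);
}
```

An automatically generated wrapper that simply forwards its C arguments
cannot perform this split, which is the point being made above.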

-- 
Joseph S. Myers
jos...@codesourcery.com

Re: RFC: Add 32bit x86-64 support to binutils

2010-12-30 Thread David Daney

On 12/30/2010 10:59 AM, H.J. Lu wrote:
> On Thu, Dec 30, 2010 at 10:42 AM, Joseph S. Myers
>  wrote:
>> On Thu, 30 Dec 2010, H.J. Lu wrote:
>>
>>> Hi,
>>>
>>> This patch adds 32bit x86-64 support to binutils. Support in compiler,
>>> library and OS is required to use it.  It can be used to implement the
>>> new 32bit OS for x86-64.  Any comments?
>>
>> Do you have a public psABI document?  I think the psABI at the ELF level
>> needs to come before the binutils bits, at the function call level needs
>> to come before the GCC bits, etc.
>
> The psABI is the same as x86-64 psABI, except for 32bit ELF instead of
> 64bit.
>
>> You appear (judging by the support for Linux targets in the binutils
>> patch) to envisage Linux support for this ABI.  How do you plan to avoid
>
> I enabled it for Linux so that I can run ILP32 binutils tests on Linux/x86-64.
>
>> the problems that have plagued the MIPS n32 syscall ABI, which seems like
>> a similar case?
>
> Can you describe MIPS n32 problems?


I can.  As Joseph indicated, any syscall that passes data in memory 
(ioctl, {set,get}sockopt, etc) potentially must have a translation done 
between kernel and user ABIs.


Currently this is done in kernel/compat.c fs/compat_binfmt_elf.c and 
fs/compat_ioctl.c as well as a bunch of architecture specific ad hoc 
code.  Look at the change history for those files to see that there is 
an unending flow of bugs being fixed due to this ABI mismatch.


Even today there are many obscure ioctls that don't work on MIPS n32. 
Most of the code works most of the time, but then someone tries 
something new, and BAM! ABI mismatch hits anew.


My suggestion:  Since people already spend a great deal of effort 
maintaining the existing i386 compatible Linux syscall infrastructure, 
make your new 32-bit x86-64 Linux syscall ABI identical to the existing 
i386 syscall ABI.  This means that the psABI must use the same size and 
alignment rules for in-memory structures as the i386 does.


David Daney


Re: RFC: Add 32bit x86-64 support to binutils

2010-12-30 Thread H.J. Lu
On Thu, Dec 30, 2010 at 11:30 AM, Joseph S. Myers
 wrote:
> On Thu, 30 Dec 2010, H.J. Lu wrote:
>
>> On Thu, Dec 30, 2010 at 10:42 AM, Joseph S. Myers
>>  wrote:
>> > On Thu, 30 Dec 2010, H.J. Lu wrote:
>> >
>> >> Hi,
>> >>
>> >> This patch adds 32bit x86-64 support to binutils. Support in compiler,
>> >> library and OS is required to use it.  It can be used to implement the
>> >> new 32bit OS for x86-64.  Any comments?
>> >
>> > Do you have a public psABI document?  I think the psABI at the ELF level
>> > needs to come before the binutils bits, at the function call level needs
>> > to come before the GCC bits, etc.
>>
>> The psABI is the same as x86-64 psABI, except for 32bit ELF instead of
>> 64bit.
>
> I don't think that's an adequate description.  If the "ILP32" name is
> accurate then it's certainly wrong at the C level since some type sizes
> are different, with effects in turn on such things as the description of
> initial stack layout storing argv.  At the ELF level are you saying that
> each relocation applies to a relocatable field of the same width as for
> 64-bit (but with the width of the addend being restricted, of course)?  Do
> any relocations applying to word64 fields need 32-bit variants applying to
> word32 for the ILP32 ABI?

ILP32 uses small model in x86-64 psABI.

> The right thing to do would be to go through the ABI sources and prepare a
> patch adding a description of the ILP32 ABI at least place where any
> change is needed.

That is on my TODO list. I am planning to publish an ILP32 psABI.

>> > the problems that have plagued the MIPS n32 syscall ABI, which seems like
>> > a similar case?
>>
>> Can you describe MIPS n32 problems?
>
> Syscalls sometimes need three different versions in the kernel; sometimes
> the wrong version gets put in the n32 syscall table.  Special syscall
> wrappers are often needed in glibc; although for most purposes the glibc
> port is a 32-bit one, including having separate functions for 32-bit and
> 64-bit off_t, syscalls tend to need to be called in the 64-bit way (long
> long values as single arguments, in particular).
>
>> You are right.  Add ILP32 support to Linux kernel may be tricky.
>> We did some experiment to use IA32 syscall interface for ILP32:
>
> That seems likely to run into the structure layout issues I mentioned
> (long long only being 4-byte aligned for IA32, but 8-byte aligned for
> x86-64 and I presume for the new ABI).
>
>> That means we pass 64bit integers to kernel the same way as ia32.
>
> That has its own problems.  If the C prototype of a system call, at the
> userspace level, has a 64-bit integer argument, it's desirable that the
> argument is passed to the kernel in the same way as it is passed to the C
> function implementing the syscall.  This allows glibc's automatically
> generated syscall wrappers to work.  (Note how fanotify_mark is defined to
> use i:is for 32-bit, i:s for 64-bit - and i:s for MIPS n32
> because a 64-bit argument is passed in a single register for n32.
> Recently added syscalls have had care taken to position 64-bit arguments
> so that this sort of thing does work for 32-bit architectures, rather than
> leaving an unused register for alignment as can arise with bad positioning
> of the 64-bit arguments.)  MIPS n32 still needs special wrappers in
> various cases where generic Linux needs C wrappers (posix_fadvise,
> fallocate, posix_fallocate, for example) but avoids them for syscalls.list
> cases.
>
> If the C ABI for 64-bit integers passes them in single registers (as is
> the natural adaptation of the 64-bit ABI), then such integers should also
> be passed to the kernel in single registers; otherwise you need custom
> wrappers for each affected syscall.  But then you certainly can't use the
> IA32 syscall ABI
>

We will address those issues if we want to support ILP32 on Linux.

-- 
H.J.


Re: RFC: Add 32bit x86-64 support to binutils

2010-12-30 Thread Richard Guenther
On Thu, Dec 30, 2010 at 8:30 PM, Joseph S. Myers
 wrote:
> On Thu, 30 Dec 2010, H.J. Lu wrote:
>
>> On Thu, Dec 30, 2010 at 10:42 AM, Joseph S. Myers
>>  wrote:
>> > On Thu, 30 Dec 2010, H.J. Lu wrote:
>> >
>> >> Hi,
>> >>
>> >> This patch adds 32bit x86-64 support to binutils. Support in compiler,
>> >> library and OS is required to use it.  It can be used to implement the
>> >> new 32bit OS for x86-64.  Any comments?
>> >
>> > Do you have a public psABI document?  I think the psABI at the ELF level
>> > needs to come before the binutils bits, at the function call level needs
>> > to come before the GCC bits, etc.
>>
>> The psABI is the same as x86-64 psABI, except for 32bit ELF instead of
>> 64bit.
>
> I don't think that's an adequate description.  If the "ILP32" name is
> accurate then it's certainly wrong at the C level since some type sizes
> are different, with effects in turn on such things as the description of
> initial stack layout storing argv.  At the ELF level are you saying that
> each relocation applies to a relocatable field of the same width as for
> 64-bit (but with the width of the addend being restricted, of course)?  Do
> any relocations applying to word64 fields need 32-bit variants applying to
> word32 for the ILP32 ABI?
>
> The right thing to do would be to go through the ABI sources and prepare a
> patch adding a description of the ILP32 ABI at least place where any
> change is needed.
>
>> > the problems that have plagued the MIPS n32 syscall ABI, which seems like
>> > a similar case?
>>
>> Can you describe MIPS n32 problems?
>
> Syscalls sometimes need three different versions in the kernel; sometimes
> the wrong version gets put in the n32 syscall table.  Special syscall
> wrappers are often needed in glibc; although for most purposes the glibc
> port is a 32-bit one, including having separate functions for 32-bit and
> 64-bit off_t, syscalls tend to need to be called in the 64-bit way (long
> long values as single arguments, in particular).

Would be nice if LFS would be mandatory on the new ABI, thus
off_t being 64bits.

Richard.


Re: RFC: Add 32bit x86-64 support to binutils

2010-12-30 Thread Jakub Jelinek
On Thu, Dec 30, 2010 at 08:53:32PM +0100, Richard Guenther wrote:
> > Syscalls sometimes need three different versions in the kernel; sometimes
> > the wrong version gets put in the n32 syscall table.  Special syscall
> > wrappers are often needed in glibc; although for most purposes the glibc
> > port is a 32-bit one, including having separate functions for 32-bit and
> > 64-bit off_t, syscalls tend to need to be called in the 64-bit way (long
> > long values as single arguments, in particular).
> 
> Would be nice if LFS would be mandatory on the new ABI, thus
> off_t being 64bits.

And avoid ambiguous cases that x86-64 ABI has, e.g. whether
caller or callee is responsible for sign/zero extension of arguments, to
avoid the need to sign/zero extend twice, etc.

Jakub


Re: RFC: Add 32bit x86-64 support to binutils

2010-12-30 Thread H.J. Lu
On Thu, Dec 30, 2010 at 11:40 AM, H.J. Lu  wrote:
> On Thu, Dec 30, 2010 at 11:30 AM, Joseph S. Myers
>  wrote:
>> On Thu, 30 Dec 2010, H.J. Lu wrote:
>>
>>> On Thu, Dec 30, 2010 at 10:42 AM, Joseph S. Myers
>>>  wrote:
>>> > On Thu, 30 Dec 2010, H.J. Lu wrote:
>>> >
>>> >> Hi,
>>> >>
>>> >> This patch adds 32bit x86-64 support to binutils. Support in compiler,
>>> >> library and OS is required to use it.  It can be used to implement the
>>> >> new 32bit OS for x86-64.  Any comments?
>>> >
>>> > Do you have a public psABI document?  I think the psABI at the ELF level
>>> > needs to come before the binutils bits, at the function call level needs
>>> > to come before the GCC bits, etc.
>>>
>>> The psABI is the same as x86-64 psABI, except for 32bit ELF instead of
>>> 64bit.
>>
>> I don't think that's an adequate description.  If the "ILP32" name is
>> accurate then it's certainly wrong at the C level since some type sizes
>> are different, with effects in turn on such things as the description of
>> initial stack layout storing argv.  At the ELF level are you saying that
>> each relocation applies to a relocatable field of the same width as for
>> 64-bit (but with the width of the addend being restricted, of course)?  Do
>> any relocations applying to word64 fields need 32-bit variants applying to
>> word32 for the ILP32 ABI?
>
> ILP32 uses small model in x86-64 psABI.
>

Here is the ILP32 psABI:

http://www.kernel.org/pub/linux/devel/binutils/ilp32/


-- 
H.J.


Re: RFC: Add 32bit x86-64 support to binutils

2010-12-30 Thread H. Peter Anvin
On 12/30/2010 11:53 AM, Richard Guenther wrote:
> 
> Would be nice if LFS would be mandatory on the new ABI, thus
> off_t being 64bits.
> 

Yes, although that's a higher-order thing.

-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.



Re: RFC: Add 32bit x86-64 support to binutils

2010-12-30 Thread H. Peter Anvin
On 12/30/2010 11:34 AM, David Daney wrote:
> 
> My suggestion:  Since people already spend a great deal of effort
> maintaining the existing i386 compatible Linux syscall infrastructure,
> make your new 32-bit x86-64 Linux syscall ABI identical to the existing
> i386 syscall ABI.  This means that the psABI must use the same size and
> alignment rules for in-memory structures as the i386 does.
> 

No, it doesn't.  It just means it need to do so *for the types used by
the kernel*.  The kernel uses types like __u64, which would indeed have
to be declared aligned(4).
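
A minimal sketch of what such a declaration could look like, assuming
GCC's attribute syntax.  The typedef and struct names are hypothetical
and do not come from the kernel sources.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical name: a 64-bit integer that keeps the i386 rule of
   4-byte alignment inside structures. */
typedef uint64_t u64_align4 __attribute__((aligned(4)));

/* Hypothetical record shared between user space and the kernel. */
struct shared_record {
    uint32_t   id;
    u64_align4 offset;   /* placed at byte 4, with no padding after id */
};

/* With the reduced alignment the layout matches the i386 one. */
static_assert(offsetof(struct shared_record, offset) == 4,
              "offset member should follow id directly");
static_assert(sizeof(struct shared_record) == 12,
              "no tail padding expected");
```

With a plain uint64_t member the x86-64 rules would put the field at
byte 8 and make the structure 16 bytes long.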

-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.



Re: RFC: Add 32bit x86-64 support to binutils

2010-12-30 Thread H.J. Lu
On Thu, Dec 30, 2010 at 12:02 PM, H.J. Lu  wrote:
>
> Here is the ILP32 psABI:
>
> http://www.kernel.org/pub/linux/devel/binutils/ilp32/
>

I put my x86-64 psABI changes at:

http://git.kernel.org/?p=devel/binutils/hjl/x86-64-psabi.git;a=summary

Please send me patches to improve the ILP32 psABI.

Thanks.


-- 
H.J.


Re: RFC: Add 32bit x86-64 support to binutils

2010-12-30 Thread David Daney

On 12/30/2010 12:12 PM, H. Peter Anvin wrote:
> On 12/30/2010 11:34 AM, David Daney wrote:
>>
>> My suggestion:  Since people already spend a great deal of effort
>> maintaining the existing i386 compatible Linux syscall infrastructure,
>> make your new 32-bit x86-64 Linux syscall ABI identical to the existing
>> i386 syscall ABI.  This means that the psABI must use the same size and
>> alignment rules for in-memory structures as the i386 does.
>>
>
> No, it doesn't.  It just means it need to do so *for the types used by
> the kernel*.  The kernel uses types like __u64, which would indeed have
> to be declared aligned(4).



Some legacy interfaces don't use fixed width types.  There almost 
certainly are some ioctls that don't use your fancy __u64.


Then there are things like ppoll() that take a pointer to:

   struct timespec {
       long    tv_sec;         /* seconds */
       long    tv_nsec;        /* nanoseconds */
   };

There are no fields in there that are controlled by __u64 either. 
Admittedly this case might not differ between the two 32-bit ABIs, but 
it shows that __u64/__u32 are not universally used in the Linux syscall 
ABIs.


If you are happy with potential memory layout differences between the 
two 32-bit ABIs, then don't specify that they are the same.  But don't 
claim that use of __u64/__u32 covers all cases.


David Daney


Re: RFC: Add 32bit x86-64 support to binutils

2010-12-30 Thread H.J. Lu
On Thu, Dec 30, 2010 at 12:27 PM, David Daney  wrote:
> On 12/30/2010 12:12 PM, H. Peter Anvin wrote:
>>
>> On 12/30/2010 11:34 AM, David Daney wrote:
>>>
>>> My suggestion:  Since people already spend a great deal of effort
>>> maintaining the existing i386 compatible Linux syscall infrastructure,
>>> make your new 32-bit x86-64 Linux syscall ABI identical to the existing
>>> i386 syscall ABI.  This means that the psABI must use the same size and
>>> alignment rules for in-memory structures as the i386 does.
>>>
>>
>> No, it doesn't.  It just means it need to do so *for the types used by
>> the kernel*.  The kernel uses types like __u64, which would indeed have
>> to be declared aligned(4).
>>
>
> Some legacy interfaces don't use fixed width types.  There almost certainly
> are some ioctls that don't use your fancy __u64.
>
> Then there are things like ppoll() that take a pointer to:
>
>           struct timespec {
>               long    tv_sec;         /* seconds */
>               long    tv_nsec;        /* nanoseconds */
>           };
>
> There are no fields in there that are controlled by __u64 either. Admittedly
> this case might not differ between the two 32-bit ABIs, but it shows that
> __u64/__u32 are not universally used in the Linux syscall ABIs.
>
> If you are happy with potential memory layout differences between the two
> 32-bit ABIs, then don't specify that they are the same.  But don't claim
> that use of __u64/__u32 covers all cases.

We can put a syscall wrapper to translate it.
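
A rough sketch of such a translation step, under the assumption that
the 32-bit side passes two 32-bit fields and the native side wants
64-bit ones.  Both struct names and the function are hypothetical;
this is not the kernel's compat code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout as seen by a 32-bit caller. */
struct timespec32 {
    int32_t tv_sec;
    int32_t tv_nsec;
};

/* Hypothetical layout used by the native 64-bit implementation. */
struct timespec64 {
    int64_t tv_sec;
    int64_t tv_nsec;
};

/* Widen a 32-bit timespec.  Returns 0 on success, -1 if the
   nanosecond field is out of range. */
static int timespec32_to_64(const struct timespec32 *in,
                            struct timespec64 *out)
{
    assert(in != NULL && out != NULL);
    if (in->tv_nsec < 0 || in->tv_nsec >= 1000000000)
        return -1;
    out->tv_sec = in->tv_sec;    /* sign-extended by the conversion */
    out->tv_nsec = in->tv_nsec;
    return 0;
}
```

Every structure whose layout differs between the ABIs needs a step
like this on the way in, and usually another on the way out.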


-- 
H.J.


Re: RFC: Add 32bit x86-64 support to binutils

2010-12-30 Thread Joseph S. Myers
On Thu, 30 Dec 2010, Richard Guenther wrote:

> Would be nice if LFS would be mandatory on the new ABI, thus
> off_t being 64bits.

That's certainly abstractly better (and something BSDs do better than 
GNU/Linux).  I expect you'd run into a few complications actually making a 
32-bit glibc port like that (lots of the generic 32-bit code will want to 
build separate 32-bit and 64-bit versions as functions; maybe it will be 
easy to build things with those just ending up as duplicates, but making 
them aliases, as they generally are for 64-bit, could be harder).
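
As an illustration of the alias approach, here is a minimal sketch
using GCC's alias attribute on an ELF target.  The function names are
hypothetical stand-ins, not glibc symbols, and the body is a
placeholder for a real seek.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 64-bit-offset entry point: returns the new position. */
int64_t demo_seek64(int64_t pos, int64_t delta)
{
    assert(pos >= 0);
    return pos + delta;
}

/* When off_t is already 64 bits wide, the unsuffixed name does not
   need its own body; it can simply be a second name for the same code. */
int64_t demo_seek(int64_t pos, int64_t delta)
    __attribute__((alias("demo_seek64")));
```

Building two real functions instead only requires dropping the alias
and compiling the body twice, which is the "duplicates" case above.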

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: RFC: Add 32bit x86-64 support to binutils

2010-12-30 Thread David Daney

On 12/30/2010 12:28 PM, H.J. Lu wrote:
> On Thu, Dec 30, 2010 at 12:27 PM, David Daney  wrote:
>> On 12/30/2010 12:12 PM, H. Peter Anvin wrote:
>>>
>>> On 12/30/2010 11:34 AM, David Daney wrote:
>>>>
>>>> My suggestion:  Since people already spend a great deal of effort
>>>> maintaining the existing i386 compatible Linux syscall infrastructure,
>>>> make your new 32-bit x86-64 Linux syscall ABI identical to the existing
>>>> i386 syscall ABI.  This means that the psABI must use the same size and
>>>> alignment rules for in-memory structures as the i386 does.
>>>>
>>>
>>> No, it doesn't.  It just means it need to do so *for the types used by
>>> the kernel*.  The kernel uses types like __u64, which would indeed have
>>> to be declared aligned(4).
>>>
>>
>> Some legacy interfaces don't use fixed width types.  There almost certainly
>> are some ioctls that don't use your fancy __u64.
>>
>> Then there are things like ppoll() that take a pointer to:
>>
>>           struct timespec {
>>               long    tv_sec;         /* seconds */
>>               long    tv_nsec;        /* nanoseconds */
>>           };
>>
>> There are no fields in there that are controlled by __u64 either. Admittedly
>> this case might not differ between the two 32-bit ABIs, but it shows that
>> __u64/__u32 are not universally used in the Linux syscall ABIs.
>>
>> If you are happy with potential memory layout differences between the two
>> 32-bit ABIs, then don't specify that they are the same.  But don't claim
>> that use of __u64/__u32 covers all cases.
>
> We can put a syscall wrapper to translate it.
>



Of course you can.

But since you are starting with a blank slate, you should be asking
yourself why you would want to.


What is your objective here?  Is it:

1) Fastest time to a relatively bug free useful system?

or

2) Purity of ABI design?


What would the performance penalty be for identical structure layout 
between the two 32-bit ABIs?


Really I don't care one way or the other.  The necessity of syscall 
wrappers is actually probably beneficial to me.  It will create a 
greater future employment demand for people with the necessary skills to 
write them.


David Daney


Re: RFC: Add 32bit x86-64 support to binutils

2010-12-30 Thread Robert Millan
Hi folks,

I had this unsubmitted patch in my local filesystem.  It makes Linux
detect ELF32 AMD64 binaries and sets a flag to restrict them to
32-bit address space.

It's not rocket science but can save you some work in case you
haven't implemented this already.
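
For readers who want to try the detection outside the kernel, here is
a small user-space sketch of the same test the patch's elf_check_arch
performs.  It assumes the glibc <elf.h> header; the function name is
hypothetical.

```c
#include <assert.h>
#include <elf.h>
#include <string.h>

/* Return 1 if the header describes a 32-bit ELF file for x86-64,
   0 otherwise.  e_ident and e_machine sit at the same offsets in the
   32-bit and 64-bit ELF headers, so reading them through Elf32_Ehdr
   is fine for this check. */
static int is_elf32_x86_64(const Elf32_Ehdr *hdr)
{
    assert(hdr != NULL);
    if (memcmp(hdr->e_ident, ELFMAG, SELFMAG) != 0)
        return 0;
    return hdr->e_machine == EM_X86_64 &&
           hdr->e_ident[EI_CLASS] == ELFCLASS32;
}
```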

Best regards

-- 
Robert Millan
diff -Nur linux-2.6-2.6.26-libre2.old/arch/x86/kernel/sys_x86_64.c linux-2.6-2.6.26-libre2/arch/x86/kernel/sys_x86_64.c
--- linux-2.6-2.6.26-libre2.old/arch/x86/kernel/sys_x86_64.c  2008-07-13 23:51:29.0 +0200
+++ linux-2.6-2.6.26-libre2/arch/x86/kernel/sys_x86_64.c  2009-05-29 22:57:41.0 +0200
@@ -48,7 +48,7 @@
 static void find_start_end(unsigned long flags, unsigned long *begin,
   unsigned long *end)
 {
-   if (!test_thread_flag(TIF_IA32) && (flags & MAP_32BIT)) {
+   if ((!test_thread_flag(TIF_IA32) && (flags & MAP_32BIT)) || test_thread_flag(TIF_AMD32)) {
unsigned long new_begin;
/* This is usually used needed to map code in small
   model, so it needs to be in the first 31bit. Limit
@@ -94,7 +94,7 @@
(!vma || addr + len <= vma->vm_start))
return addr;
}
-   if (((flags & MAP_32BIT) || test_thread_flag(TIF_IA32))
+   if (((flags & MAP_32BIT) || test_thread_flag(TIF_IA32) || test_thread_flag(TIF_AMD32))
&& len <= mm->cached_hole_size) {
mm->cached_hole_size = 0;
mm->free_area_cache = begin;
@@ -150,8 +150,8 @@
if (flags & MAP_FIXED)
return addr;
 
-   /* for MAP_32BIT mappings we force the legact mmap base */
-   if (!test_thread_flag(TIF_IA32) && (flags & MAP_32BIT))
+   /* for MAP_32BIT mappings we force the legacy mmap base */
+   if ((!test_thread_flag(TIF_IA32) && (flags & MAP_32BIT)) || test_thread_flag(TIF_AMD32))
goto bottomup;
 
/* requesting a specific address */
@@ -232,5 +232,7 @@
up_read(&uts_sem);
if (personality(current->personality) == PER_LINUX32) 
err |= copy_to_user(&name->machine, "i686", 5); 
+   else if (test_thread_flag(TIF_AMD32))
+   err |= copy_to_user(&name->machine, "amd32", 6);
return err ? -EFAULT : 0;
 }
diff -Nur linux-2.6-2.6.26-libre2.old/arch/x86/mm/mmap.c linux-2.6-2.6.26-libre2/arch/x86/mm/mmap.c
--- linux-2.6-2.6.26-libre2.old/arch/x86/mm/mmap.c  2008-07-13 23:51:29.0 +0200
+++ linux-2.6-2.6.26-libre2/arch/x86/mm/mmap.c  2009-05-26 14:30:53.0 +0200
@@ -53,6 +53,15 @@
return 0;
 }
 
+static int mmap_is_32bit(void)
+{
+   if (mmap_is_ia32 ())
+   return 1;
+   if (test_thread_flag(TIF_AMD32))
+   return 1;
+   return 0;
+}
+
 static int mmap_is_legacy(void)
 {
if (current->personality & ADDR_COMPAT_LAYOUT)
@@ -73,7 +82,7 @@
* 28 bits of randomness in 64bit mmaps, 40 address space bits
*/
if (current->flags & PF_RANDOMIZE) {
-   if (mmap_is_ia32())
+   if (mmap_is_32bit())
rnd = (long)get_random_int() % (1<<8);
else
rnd = (long)(get_random_int() % (1<<28));
diff -Nur linux-2.6-2.6.26-libre2.old/fs/binfmt_elf_amd32.c linux-2.6-2.6.26-libre2/fs/binfmt_elf_amd32.c
--- linux-2.6-2.6.26-libre2.old/fs/binfmt_elf_amd32.c   1970-01-01 01:00:00.0 +0100
+++ linux-2.6-2.6.26-libre2/fs/binfmt_elf_amd32.c   2009-05-26 14:26:24.0 +0200
@@ -0,0 +1,46 @@
+/*
+ *  Support for loading AMD32 binaries
+ *  Copyright (C) 2009  Robert Millan
+ *
+ *  This program is free software: you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation, either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include 
+
+#undef ELF_CLASS
+#define ELF_CLASS   ELFCLASS32
+
+#undef elfhdr
+#define elfhdr elf32_hdr
+#undef elf_phdr
+#define elf_phdr   elf32_phdr
+#undef elf_note
+#define elf_note   elf32_note
+#undef elf_addr_t
+#define elf_addr_t Elf32_Addr
+
+#undef ELF_PLATFORM
+#define ELF_PLATFORM   ("amd32")
+
+#undef elf_check_arch
+#define elf_check_arch(x)   ((x)->e_machine == EM_X86_64 && (x)->e_ident[EI_CLASS] == ELFCLASS32)
+
+#undef SET_PERSONALITY
+#define SET_PERSONALITY(ex, ibcs2) do { set_personality_64bit(); 
set_thread_flag(TIF_AMD32); current->personality |= force_personality32;

Re: RFC: Add 32bit x86-64 support to binutils

2010-12-30 Thread H. Peter Anvin
I believe it covers all cases *relevant for this particular situation* (unlike, 
say, MIPS) and that any deviation is a bug which can and should be fixed.

"David Daney"  wrote:

>On 12/30/2010 12:12 PM, H. Peter Anvin wrote:
>> On 12/30/2010 11:34 AM, David Daney wrote:
>>>
>>> My suggestion:  Since people already spend a great deal of effort
>>> maintaining the existing i386 compatible Linux syscall infrastructure,
>>> make your new 32-bit x86-64 Linux syscall ABI identical to the existing
>>> i386 syscall ABI.  This means that the psABI must use the same size and
>>> alignment rules for in-memory structures as the i386 does.
>>>
>>
>> No, it doesn't.  It just means it need to do so *for the types used by
>> the kernel*.  The kernel uses types like __u64, which would indeed have
>> to be declared aligned(4).
>>
>
>Some legacy interfaces don't use fixed width types.  There almost
>certainly are some ioctls that don't use your fancy __u64.
>
>Then there are things like ppoll() that take a pointer to:
>
>    struct timespec {
>        long    tv_sec;         /* seconds */
>        long    tv_nsec;        /* nanoseconds */
>    };
>
>There are no fields in there that are controlled by __u64 either.
>Admittedly this case might not differ between the two 32-bit ABIs, but
>it shows that __u64/__u32 are not universally used in the Linux syscall
>ABIs.
>
>If you are happy with potential memory layout differences between the
>two 32-bit ABIs, then don't specify that they are the same.  But don't
>claim that use of __u64/__u32 covers all cases.
>
>David Daney

-- 
Sent from my mobile phone.  Please pardon any lack of formatting.


Re: RFC: Add 32bit x86-64 support to binutils

2010-12-30 Thread H. Peter Anvin
We do have a slightly more extensive patch already implemented.

"Robert Millan"  wrote:

>Hi folks,
>
>I had this unsubmitted patch in my local filesystem.  It makes Linux
>detect ELF32 AMD64 binaries and sets a flag to restrict them to
>32-bit address space.
>
>It's not rocket science but can save you some work in case you
>haven't implemented this already.
>
>Best regards
>
>-- 
>Robert Millan

-- 
Sent from my mobile phone.  Please pardon any lack of formatting.


Re: RFC: Add 32bit x86-64 support to binutils

2010-12-30 Thread H.J. Lu
I also have a patch for gcc 4.4 which works on simple codes.


H.J.
On Thu, Dec 30, 2010 at 1:31 PM, H. Peter Anvin  wrote:
> We do have a slightly more extensive patch already implemented.
>
> "Robert Millan"  wrote:
>
>>Hi folks,
>>
>>I had this unsubmitted patch in my local filesystem.  It makes Linux
>>detect ELF32 AMD64 binaries and sets a flag to restrict them to
>>32-bit address space.
>>
>>It's not rocket science but can save you some work in case you
>>haven't implemented this already.
>>
>>Best regards
>>
>>--
>>Robert Millan
>
> --
> Sent from my mobile phone.  Please pardon any lack of formatting.
>



-- 
H.J.


Re: RFC: Add 32bit x86-64 support to binutils

2010-12-30 Thread H. Peter Anvin
On 12/30/2010 12:39 PM, David Daney wrote:
> 
> Really I don't care one way or the other.  The necessity of syscall
> wrappers is actually probably beneficial to me.  It will create a
> greater future employment demand for people with the necessary skills to
> write them.
> 

Or perhaps automatic generation will actually get implemented.  I wrote
an automatic syscall wrapper generator for klibc; one of the best design
decisions I made for that project.
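
The klibc generator itself is not shown in this thread.  As a loose
illustration of the general idea (wrappers produced from one
declarative table instead of being written by hand), here is a sketch
using a C macro table.  Every name and number in it is made up, and
demo_syscall0 merely stands in for a real kernel entry.

```c
#include <assert.h>

/* Hypothetical table: wrapper name and syscall number. */
#define SYSCALL_TABLE(X) \
    X(demo_getpid, 1)    \
    X(demo_getuid, 2)

/* Stand-in for the real trap into the kernel; it just returns a value
   derived from the number so the wrappers can be told apart. */
static long demo_syscall0(long nr)
{
    assert(nr > 0);
    return 1000 + nr;
}

/* Expand every table row into a zero-argument wrapper function. */
#define DEFINE_WRAPPER(name, nr) \
    static long name(void) { return demo_syscall0(nr); }
SYSCALL_TABLE(DEFINE_WRAPPER)
#undef DEFINE_WRAPPER
```

Adding a syscall then means adding one row to the table; argument
translation for a new ABI could be generated from the same table.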

-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.



Re: RFC: Add 32bit x86-64 support to binutils

2010-12-30 Thread H. Peter Anvin
On 12/30/2010 11:57 AM, Jakub Jelinek wrote:
>>
>> Would be nice if LFS would be mandatory on the new ABI, thus
>> off_t being 64bits.
> 
> And avoid ambiguous cases that x86-64 ABI has, e.g. whether
> caller or callee is responsible for sign/zero extension of arguments, to
> avoid the need to sign/zero extend twice, etc.
> 

Ehwhat?  x86-64 is completely unambiguous on that point; the i386 one is
not.

-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.



Re: RFC: Add 32bit x86-64 support to binutils

2010-12-30 Thread Robert Millan
2010/12/30 H.J. Lu :
> I also have a patch for gcc 4.4 which works on simple codes.
>
> H.J.
> On Thu, Dec 30, 2010 at 1:31 PM, H. Peter Anvin  wrote:
>> We do have a slightly more extensive patch already implemented.

Could you make those patches available somewhere?  It'd be
interesting to play with them.

Btw, I recommend against 8-byte longs.  In the tests I did in
2009, I recall glibc source was extremely unhappy due to
sizeof(long)==sizeof(void *) assumptions.
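
A small sketch of the kind of assumption meant here, and of the
portable alternative.  The helper names are hypothetical; the code is
not taken from glibc.

```c
#include <assert.h>
#include <stdint.h>

/* uintptr_t is the type meant for carrying a pointer in an integer. */
static_assert(sizeof(uintptr_t) >= sizeof(void *),
              "uintptr_t must be able to hold a pointer");

/* Round-trip a pointer through an integer without assuming anything
   about the width of long. */
static uintptr_t ptr_to_word(const void *p)
{
    return (uintptr_t)p;
}

static const void *word_to_ptr(uintptr_t w)
{
    return (const void *)w;
}

/* Returns 1 when code that stores pointers in a long would happen to
   work on this target, 0 when long and pointers differ in size. */
static int long_matches_pointer(void)
{
    return sizeof(long) == sizeof(void *);
}
```

An ABI with 8-byte longs and 4-byte pointers would make
long_matches_pointer() return 0, which is the case that upset glibc.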

-- 
Robert Millan


Re: RFC: Add 32bit x86-64 support to binutils

2010-12-30 Thread Robert Millan
2010/12/30 Richard Guenther :
> Would be nice if LFS would be mandatory on the new ABI, thus
> off_t being 64bits.

Please do also consider time_t.

-- 
Robert Millan


Re: RFC: Add 32bit x86-64 support to binutils

2010-12-30 Thread H.J. Lu
On Thu, Dec 30, 2010 at 2:18 PM, Robert Millan  wrote:
> 2010/12/30 H.J. Lu :
>> I also have a patch for gcc 4.4 which works on simple codes.
>>
>> H.J.
>> On Thu, Dec 30, 2010 at 1:31 PM, H. Peter Anvin  wrote:
>>> We do have a slightly more extensive patch already implemented.
>
> Could you make those patches available somewhere?  It'd be
> interesting to play with them.
>
> Btw, I recommend against 8-byte longs.  In the tests I did in
> 2009, I recall glibc source was extremely unhappy due to
> sizeof(long)==sizeof(void *) assumptions.
>

ILP32 psABI specifies 4byte for long.

-- 
H.J.


Re: RFC: Add 32bit x86-64 support to binutils

2010-12-30 Thread H. Peter Anvin
On 12/30/2010 02:18 PM, Robert Millan wrote:
> 2010/12/30 H.J. Lu :
>> I also have a patch for gcc 4.4 which works on simple codes.
>>
>> H.J.
>> On Thu, Dec 30, 2010 at 1:31 PM, H. Peter Anvin  wrote:
>>> We do have a slightly more extensive patch already implemented.
> 
> Could you make those patches available somewhere?  It'd be
> interesting to play with them.
> 
> Btw, I recommend against 8-byte longs.  In the tests I did in
> 2009, I recall glibc source was extremely unhappy due to
> sizeof(long)==sizeof(void *) assumptions.
> 

Yes, it's ILP32.

-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.



Re: RFC: Add 32bit x86-64 support to binutils

2010-12-30 Thread H. Peter Anvin
On 12/30/2010 02:21 PM, Robert Millan wrote:
> 2010/12/30 Richard Guenther :
>> Would be nice if LFS would be mandatory on the new ABI, thus
>> off_t being 64bits.
> 
> Please do also consider time_t.
> 

Changing the kernel-facing time_t might completely wreck the reuse of
the i386 kernel ABI; I'm not sure.

-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.



gcc-4.5-20101230 is now available

2010-12-30 Thread gccadmin
Snapshot gcc-4.5-20101230 is now available on
  ftp://gcc.gnu.org/pub/gcc/snapshots/4.5-20101230/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 4.5 SVN branch
with the following options: svn://gcc.gnu.org/svn/gcc/branches/gcc-4_5-branch 
revision 168358

You'll find:

 gcc-4.5-20101230.tar.bz2 Complete GCC (includes all of below)

  MD5=695b2beb33f5458f831cd1394b931c3b
  SHA1=77c6b2489f2e7bfc27e10550992d63176b1ac005

 gcc-core-4.5-20101230.tar.bz2    C front end and core compiler

  MD5=1234442a665ee2900acb2adac2e611ab
  SHA1=0cc319f345f73923ef454af6e1d0b2847a54dbff

 gcc-ada-4.5-20101230.tar.bz2 Ada front end and runtime

  MD5=c09fd0ab1f67bdb3da14914362415d47
  SHA1=2549d07cbdef80a61746c0c9055c75f43578f997

 gcc-fortran-4.5-20101230.tar.bz2 Fortran front end and runtime

  MD5=1010baa67eb6f0550d27e0dcb7681b94
  SHA1=d8617384165fa793217e9fd604a815c11480c133

 gcc-g++-4.5-20101230.tar.bz2 C++ front end and runtime

  MD5=84e10dc9f1b6e5c1c8370976d774f2fb
  SHA1=f3012f1fee3c5ab8b6ba100dba1b757c6c00eac0

 gcc-go-4.5-20101230.tar.bz2  Go front end and runtime

  MD5=3ca2b4a9c62c4c574940c2bf6c48d3f3
  SHA1=89af4744b91d4ddb7533a9831945c03e305ea432

 gcc-java-4.5-20101230.tar.bz2    Java front end and runtime

  MD5=319f44b8d108e46464306e9d84d68fae
  SHA1=a94866fc1ad2fabb482cbf7e897e165a0f5b4137

 gcc-objc-4.5-20101230.tar.bz2    Objective-C front end and runtime

  MD5=f60ad6067101d21c8bbcb284bd3e292b
  SHA1=2bd9afbf293b2d5e3da0eb4dbef886b5e898322a

 gcc-testsuite-4.5-20101230.tar.bz2   The GCC testsuite

  MD5=66c4c3ab4fedbccbd4a2167c0695f41c
  SHA1=d0b6264ee00ed9c2e6d9cc25201e09df219f685a

Diffs from 4.5-20101223 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-4.5
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.


Re: RFC: Add 32bit x86-64 support to binutils

2010-12-30 Thread Joseph S. Myers
On Thu, 30 Dec 2010, H. Peter Anvin wrote:

> On 12/30/2010 02:21 PM, Robert Millan wrote:
> > 2010/12/30 Richard Guenther :
> >> Would be nice if LFS would be mandatory on the new ABI, thus
> >> off_t being 64bits.
> > 
> > Please do also consider time_t.
> > 
> 
> Changing the kernel-facing time_t might completely wreck the reuse of
> the i386 kernel ABI; I'm not sure.

Before changing time_t for a new ILP32 ABI, you probably want to work out 
what is required - on both the libc and kernel sides - to change it for 
existing 32-bit ABIs (whether providing a new ABI variant like 
_FILE_OFFSET_BITS does, or changing it unconditionally and using symbol 
versioning for compatibility with old binaries built for 32-bit time_t).  
Having done that you then have whatever new syscalls may be needed to work 
with 64-bit time_t on IA32, and can make the new ILP32 ABI not use the old 
32-bit time_t syscalls if desired.
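
To show what is at stake with the old 32-bit interfaces, here is a
minimal sketch of the range check any 64-bit to 32-bit time conversion
would need.  The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if a 64-bit seconds count is representable in a signed
   32-bit time value, 0 otherwise. */
static int fits_time32(int64_t seconds)
{
    return seconds >= INT32_MIN && seconds <= INT32_MAX;
}

/* Narrow a 64-bit time for a 32-bit time_t interface.
   Returns 0 on success, -1 if the value would overflow. */
static int narrow_time(int64_t seconds, int32_t *out)
{
    assert(out != NULL);
    if (!fits_time32(seconds))
        return -1;
    *out = (int32_t)seconds;
    return 0;
}
```

The largest value that passes is 2147483647 seconds after the epoch,
which falls in January 2038.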

Of course making LFS (or 64-bit time_t) mandatory doesn't help with those 
interfaces that hardcode types such as "int" or "long" - you'll still have 
code that uses fseek rather than fseeko, for example.  If you follow the 
GNU principles of avoiding arbitrary (or at least inappropriate) limits, 
there are quite a lot of libc interfaces that can be problematic in 
extreme cases (large files, strings over 2GB (e.g. regoff_t - glibc bug 
5945 - which is probably one of the easier cases), etc.).  It would be 
good to fix these things, both on the GNU principles and for general 
robustness (there are probably various security holes related to these 
issues - integer overflow issues are always tricky to avoid in C, but bad 
choice of types in APIs certainly doesn't help), but it's quite tricky 
(lots of core ISO C interfaces are involved) and really needs to be kept 
separate from the introduction of new ABIs at the level of x86_64 ILP32.

-- 
Joseph S. Myers
jos...@codesourcery.com


cloog(-parma) 0.16 and ppl 0.11 in infrastructure?

2010-12-30 Thread Jack Howarth
Sebastian,
    It appears that the official tarballs are now posted at
http://www.cloog.org/ for cloog and cloog-parma 0.16. Do you plan on
placing those both in the infrastructure directory at gcc.gnu.org's ftp
site? If so, the newer ppl 0.11 tarball should be added as well. If those
files are updated, we should be set to switch gcc trunk to require
ppl >= 0.11, cloog >= 0.16 and the default cloog backend from legacy
cloog-ppl to cloog-isl.
  Jack



Re: cloog(-parma) 0.16 and ppl 0.11 in infrastructure?

2010-12-30 Thread Ryan Hill
On Thu, 30 Dec 2010 18:40:35 -0500
Jack Howarth  wrote:

> Sebastian,
> It appears that the official tarballs are now posted at
> http://www.cloog.org/ for cloog and cloog-parma 0.16. Do you plan on
> placing those both in the infrastructure directory at gcc.gnu.org's ftp
> site? If so, the newer ppl 0.11 tarball should be added as well. If those
> files are updated, we should be set to switch gcc trunk to require
> ppl >= 0.11, cloog >= 0.16 and the default cloog backend from legacy
> cloog-ppl to cloog-isl.

And could someone please document these dependencies in the release notes and
the installation docs?


-- 
fonts, gcc-porting,  it makes no sense how it makes no sense
toolchain, wxwidgets   but i'll take it free anytime
@ gentoo.org                EFFD 380E 047A 4B51 D2BD C64F 8AA8 8346 F9A4 0662


Re: RFC: Add 32bit x86-64 support to binutils

2010-12-30 Thread H.J. Lu
On Thu, Dec 30, 2010 at 3:00 PM, Joseph S. Myers
 wrote:
> On Thu, 30 Dec 2010, H. Peter Anvin wrote:
>
>> On 12/30/2010 02:21 PM, Robert Millan wrote:
>> > 2010/12/30 Richard Guenther :
>> >> Would be nice if LFS would be mandatory on the new ABI, thus
>> >> off_t being 64bits.
>> >
>> > Please do also consider time_t.
>> >
>>
>> Changing the kernel-facing time_t might completely wreck the reuse of
>> the i386 kernel ABI; I'm not sure.
>
> Before changing time_t for a new ILP32 ABI, you probably want to work out
> what is required - on both the libc and kernel sides - to change it for
> existing 32-bit ABIs (whether providing a new ABI variant like
> _FILE_OFFSET_BITS does, or changing it unconditionally and using symbol
> versioning for compatibility with old binaries built for 32-bit time_t).
> Having done that you then have whatever new syscalls may be needed to work
> with 64-bit time_t on IA32, and can make the new ILP32 ABI not use the old
> 32-bit time_t syscalls if desired.
>
> Of course making LFS (or 64-bit time_t) mandatory doesn't help with those
> interfaces that hardcode types such as "int" or "long" - you'll still have
> code that uses fseek rather than fseeko, for example.  If you follow the
> GNU principles of avoiding arbitrary (or at least inappropriate) limits,
> there are quite a lot of libc interfaces that can be problematic in
> extreme cases (large files, strings over 2GB (e.g. regoff_t - glibc bug
> 5945 - which is probably one of the easier cases), etc.).  It would be
> good to fix these things, both on the GNU principles and for general
> robustness (there are probably various security holes related to these
> issues - integer overflow issues are always tricky to avoid in C, but bad
> choice of types in APIs certainly doesn't help), but it's quite tricky
> (lots of core ISO C interfaces are involved) and really needs to be kept
> separate from the introduction of new ABIs at the level of x86_64 ILP32.
>

I am checking in ILP32 binutils so that people can play with it.

-- 
H.J.


Re: RFC: Add 32bit x86-64 support to binutils

2010-12-30 Thread H.J. Lu
On Thu, Dec 30, 2010 at 4:25 PM, H.J. Lu  wrote:
> On Thu, Dec 30, 2010 at 3:00 PM, Joseph S. Myers
>  wrote:
>> On Thu, 30 Dec 2010, H. Peter Anvin wrote:
>>
>>> On 12/30/2010 02:21 PM, Robert Millan wrote:
>>> > 2010/12/30 Richard Guenther :
>>> >> Would be nice if LFS would be mandatory on the new ABI, thus
>>> >> off_t being 64bits.
>>> >
>>> > Please do also consider time_t.
>>> >
>>>
>>> Changing the kernel-facing time_t might completely wreck the reuse of
>>> the i386 kernel ABI; I'm not sure.
>>
>> Before changing time_t for a new ILP32 ABI, you probably want to work out
>> what is required - on both the libc and kernel sides - to change it for
>> existing 32-bit ABIs (whether providing a new ABI variant like
>> _FILE_OFFSET_BITS does, or changing it unconditionally and using symbol
>> versioning for compatibility with old binaries built for 32-bit time_t).
>> Having done that you then have whatever new syscalls may be needed to work
>> with 64-bit time_t on IA32, and can make the new ILP32 ABI not use the old
>> 32-bit time_t syscalls if desired.
>>
>> Of course making LFS (or 64-bit time_t) mandatory doesn't help with those
>> interfaces that hardcode types such as "int" or "long" - you'll still have
>> code that uses fseek rather than fseeko, for example.  If you follow the
>> GNU principles of avoiding arbitrary (or at least inappropriate) limits,
>> there are quite a lot of libc interfaces that can be problematic in
>> extreme cases (large files, strings over 2GB (e.g. regoff_t - glibc bug
>> 5945 - which is probably one of the easier cases), etc.).  It would be
>> good to fix these things, both on the GNU principles and for general
>> robustness (there are probably various security holes related to these
>> issues - integer overflow issues are always tricky to avoid in C, but bad
>> choice of types in APIs certainly doesn't help), but it's quite tricky
>> (lots of core ISO C interfaces are involved) and really needs to be kept
>> separate from the introduction of new ABIs at the level of x86_64 ILP32.
>>
>
> I am checking in ILP32 binutils so that people can play with it.
>

I checked in this patch to avoid using ELF32 relocations on ELF64
inputs and vice versa.


-- 
H.J.
---
2010-12-30  H.J. Lu  

* elf64-x86-64.c (elf_x86_64_relocs_compatible): New.
(elf_backend_relocs_compatible): Defined to
elf_x86_64_relocs_compatible.

diff --git a/bfd/elf64-x86-64.c b/bfd/elf64-x86-64.c
index a50dccc..3dd16ba 100644
--- a/bfd/elf64-x86-64.c
+++ b/bfd/elf64-x86-64.c
@@ -4496,6 +4496,17 @@ elf_x86_64_hash_symbol (struct elf_link_hash_entry *h)
   return _bfd_elf_hash_symbol (h);
 }

+/* Return TRUE iff relocations for INPUT are compatible with OUTPUT. */
+
+static bfd_boolean
+elf_x86_64_relocs_compatible (const bfd_target *input,
+ const bfd_target *output)
+{
+  return ((xvec_get_elf_backend_data (input)->s->elfclass
+  == xvec_get_elf_backend_data (output)->s->elfclass)
+ && _bfd_elf_relocs_compatible (input, output));
+}
+
 static const struct bfd_elf_special_section
   elf_x86_64_special_sections[]=
 {
@@ -4536,7 +4547,7 @@ static const struct bfd_elf_special_section
   elf_x86_64_reloc_name_lookup

 #define elf_backend_adjust_dynamic_symbol   elf_x86_64_adjust_dynamic_symbol
-#define elf_backend_relocs_compatible  _bfd_elf_relocs_compatible
+#define elf_backend_relocs_compatible  elf_x86_64_relocs_compatible
 #define elf_backend_check_relocs   elf_x86_64_check_relocs
 #define elf_backend_copy_indirect_symbolelf_x86_64_copy_indirect_symbol
 #define elf_backend_create_dynamic_sections elf_x86_64_create_dynamic_sections


Re: RFC: Add 32bit x86-64 support to binutils

2010-12-30 Thread H. Peter Anvin
On 12/30/2010 01:08 PM, Robert Millan wrote:
> Hi folks,
> 
> I had this unsubmitted patch in my local filesystem.  It makes Linux
> detect ELF32 AMD64 binaries and sets a flag to restrict them to
> 32-bit address space.
> 
> It's not rocket science but can save you some work in case you
> haven't implemented this already.
> 

I have pushed my old kernel patches to a git tree at:

git://git.kernel.org//pub/scm/linux/kernel/git/hpa/linux-2.6-ilp32.git

They are currently based on 2.6.31 since that was the released version
when I first did this work; they are not intended to be mergeable but
rather as a prototype.

Note that we have no intention of supporting this ABI for the kernel
itself.  The kernel will be a normal x86-64 kernel.

-hpa



Re: RFC: Add 32bit x86-64 support to binutils

2010-12-30 Thread H.J. Lu
On Thu, Dec 30, 2010 at 4:42 PM, H. Peter Anvin  wrote:
> On 12/30/2010 01:08 PM, Robert Millan wrote:
>> Hi folks,
>>
>> I had this unsubmitted patch in my local filesystem.  It makes Linux
>> detect ELF32 AMD64 binaries and sets a flag to restrict them to
>> 32-bit address space.
>>
>> It's not rocket science but can save you some work in case you
>> haven't implemented this already.
>>
>
> I have pushed my old kernel patches to a git tree at:
>
> git://git.kernel.org//pub/scm/linux/kernel/git/hpa/linux-2.6-ilp32.git
>
> They are currently based on 2.6.31 since that was the released version
> when I first did this work; they are not intended to be mergeable but
> rather as a prototype.
>
> Note that we have no intention of supporting this ABI for the kernel
> itself.  The kernel will be a normal x86-64 kernel.
>

Here is the updated ILP32 patch for 2.6.35.


-- 
H.J.
diff --git a/arch/x86/ia32/ia32_signal.c b/arch/x86/ia32/ia32_signal.c
index 588a7aa..ae915c8 100644
--- a/arch/x86/ia32/ia32_signal.c
+++ b/arch/x86/ia32/ia32_signal.c
@@ -42,6 +42,14 @@
 
 void signal_fault(struct pt_regs *regs, void __user *frame, char *where);
 
+/* We use the standard 64-bit versions of these for the ILP32 variants */
+int
+restore_sigcontext(struct pt_regs *regs, struct sigcontext __user *sc,
+  unsigned long *pax);
+int
+setup_sigcontext(struct sigcontext __user *sc, void __user *fpstate,
+struct pt_regs *regs, unsigned long mask);
+
 int copy_siginfo_to_user32(compat_siginfo_t __user *to, siginfo_t *from)
 {
int err = 0;
@@ -565,3 +573,118 @@ int ia32_setup_rt_frame(int sig, struct k_sigaction *ka, siginfo_t *info,
 
return 0;
 }
+
+int ia32_setup_ilp32_frame(int sig, struct k_sigaction *ka, siginfo_t *info,
+  compat_sigset_t *set, struct pt_regs *regs)
+{
+   struct rt_sigframe_ilp32 __user *frame;
+   void __user *restorer;
+   int err = 0;
+   void __user *fpstate = NULL;
+
+   /* __copy_to_user optimizes that into a single 8 byte store */
+   static const struct {
+   u8 movl;
+   u32 val;
+   u16 int80;
+   u8  pad;
+   } __attribute__((packed)) code = {
+   0xb8,
+   __NR_ia32_ilp32_sigreturn,
+   0x80cd,
+   0,
+   };
+
+   frame = get_sigframe(ka, regs, sizeof(*frame), &fpstate);
+
+   if (!access_ok(VERIFY_WRITE, frame, sizeof(*frame)))
+   return -EFAULT;
+
+   put_user_try {
+   put_user_ex(sig, &frame->sig);
+   put_user_ex(ptr_to_compat(&frame->info), &frame->pinfo);
+   put_user_ex(ptr_to_compat(&frame->uc), &frame->puc);
+   err |= copy_siginfo_to_user32(&frame->info, info);
+
+   /* Create the ucontext.  */
+   if (cpu_has_xsave)
+   put_user_ex(UC_FP_XSTATE, &frame->uc.uc_flags);
+   else
+   put_user_ex(0, &frame->uc.uc_flags);
+   put_user_ex(0, &frame->uc.uc_link);
+   put_user_ex(current->sas_ss_sp, &frame->uc.uc_stack.ss_sp);
+   put_user_ex(sas_ss_flags(regs->sp),
+   &frame->uc.uc_stack.ss_flags);
+   put_user_ex(current->sas_ss_size, &frame->uc.uc_stack.ss_size);
+   err |= setup_sigcontext(&frame->uc.uc_mcontext, fpstate,
+   regs, set->sig[0]);
+   err |= __copy_to_user(&frame->uc.uc_sigmask, set, sizeof(*set));
+
+   if (ka->sa.sa_flags & SA_RESTORER)
+   restorer = ka->sa.sa_restorer;
+   else
+   restorer = VDSO32_SYMBOL(current->mm->context.vdso,
+rt_sigreturn);
+   put_user_ex(ptr_to_compat(restorer), &frame->pretcode);
+
+   /*
+* Not actually used anymore, but left because some gdb
+* versions need it.
+*/
+   put_user_ex(*((u64 *)&code), (u64 *)frame->retcode);
+   } put_user_catch(err);
+
+   if (err)
+   return -EFAULT;
+
+   /* Set up registers for signal handler */
+   regs->sp = (unsigned long) frame;
+   regs->ip = (unsigned long) ka->sa.sa_handler;
+
+   /* We use the 64-bit ILP32 calling convention here... */
+   regs->di = sig;
+   regs->si = (unsigned long) &frame->info;
+   regs->dx = (unsigned long) &frame->uc;
+
+   loadsegment(ds, __USER_DS);
+   loadsegment(es, __USER_DS);
+
+   regs->cs = __USER_CS;
+   regs->ss = __USER_DS;
+
+   return 0;
+}
+
+asmlinkage long sys32_ilp32_sigreturn(struct pt_regs *regs)
+{
+   struct rt_sigframe_ilp32 __user *frame;
+   sigset_t set;
+   unsigned long ax;
+   struct pt_regs tregs;
+
+   frame = (struct rt_sigframe_ilp32 __user *)(regs->sp - 4);
+
+   if (!access_ok(VERIFY_READ,
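
A note on the "code" struct in ia32_setup_ilp32_frame above: the comment about a single 8 byte store only holds if the packed layout adds up to exactly 8 bytes. The following is a minimal user-space sketch of that layout, not kernel code. The names retcode_stub, stub and stub_byte, and the syscall number HYPOTHETICAL_NR_SIGRETURN, are assumptions made for illustration; the real __NR_ia32_ilp32_sigreturn value is not shown in this message. It also assumes a GCC-compatible compiler for __attribute__((packed)).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Placeholder only: the real __NR_ia32_ilp32_sigreturn comes from the
   patch and is not visible in this excerpt. */
#define HYPOTHETICAL_NR_SIGRETURN 0x1234u

/* Same field order as the "code" struct in the patch. */
struct retcode_stub {
    uint8_t  movl;   /* 0xb8: opcode of "movl $imm32, %eax" */
    uint32_t val;    /* imm32 operand: the syscall number */
    uint16_t int80;  /* 0x80cd: bytes cd 80 on little-endian, "int $0x80" */
    uint8_t  pad;    /* rounds the stub up to 8 bytes */
} __attribute__((packed));

const struct retcode_stub stub = {
    0xb8,
    HYPOTHETICAL_NR_SIGRETURN,
    0x80cd,
    0,
};

/* Return byte I of the stub as it would appear in memory. */
unsigned char stub_byte(size_t i)
{
    unsigned char bytes[sizeof stub];

    assert(i < sizeof stub);
    memcpy(bytes, &stub, sizeof stub);
    return bytes[i];
}
```

Without the packed attribute, the compiler would normally pad "val" to a 4-byte boundary and the struct would no longer fit in one 64-bit word.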

CFP related to compilers: SMART 2011 (co-located with CGO 2011)

2010-12-30 Thread Grigori Fursin
Apologies if you receive multiple copies of this call.


  CALL FOR PAPERS

  5th Workshop on
 Statistical and Machine learning approaches
 
   to ARchitecture and compilaTion
(SMART 2011)

   http://cTuning.org/workshop-smart2011

 April 2nd or 3rd, 2011, Chamonix, France

   (co-located with CGO 2011 Conference)


The rapid rate of architectural change and the large diversity of architecture
features have made it increasingly difficult for compiler writers to keep pace
with microprocessor evolution. This problem has been compounded by the
introduction of multicores. Thus, compiler writers have an intractably complex
problem to solve. A similar situation arises in processor design where new
approaches are needed to help computer architects make the best use of new
underlying technologies and to design systems well adapted to future application
domains.

Recent studies have shown the great potential of statistical machine learning
and search strategies for compilation and machine design. The purpose of this
workshop is to help consolidate and advance the state of the art in this
emerging area of research. The workshop is a forum for the presentation of
recent developments in compiler techniques and machine design methodologies
based on space exploration and statistical machine learning approaches with the
objective of improving performance, parallelism, scalability, and adaptability.

Topics of interest include (but are not limited to):

Machine Learning, Statistical Approaches, or Search applied to

* Empirical Automatic Performance Tuning
* Iterative Feedback-Directed Compilation
* Self-tuning Programs, Libraries and Language Extensions
* Dynamic Optimization/Split Compilation/Adaptive Execution
* Speculative and Adaptive Parallelization
* Low-power Optimizations
* Adaptive Virtualization
* Performance Modeling and Portability
* Adaptive Processor and System Architecture
* Architecture Simulation and Design Space Exploration
* Collective Optimization
* Self-tuning Computing Systems
* Other Topics relevant to Intelligent and Adaptive Compilers/Architectures/OS 

  Important Dates 

* Deadline for paper submission: February 7, 2011
* Decision notification: March 7, 2011
* Deadline for camera-ready papers: March 25, 2011
* Workshop: April 2 or 3, 2011 (half-day)

 Paper Submission Guidelines 
 
Submitted papers should be original and not published or submitted for
publication elsewhere. Papers should use the LNCS format and should be 15
pages maximum. Manuscript preparation guidelines can be found at the LNCS
website (http://www.springer.com/computer/lncs, go to "For Authors" and then
"Information for LNCS Authors"). Papers must be submitted in PDF format using
the workshop submission website:
http://www.easychair.org/conferences/?conf=smart2011

In addition to normal technical papers, please consider submitting a "position
paper" (2 to 15 pages). For example, a position paper could include your
thoughts on compiler evolution, future infrastructure technology needs, use of
adaptive techniques for the Cloud, etc.

An informal collection of the papers to be presented will be distributed at the
workshop. All accepted papers will appear on the workshop website.

Program Chair:
 Francois Bodin, CAPS Entreprise, France

Organizers:
 Grigori Fursin, Exascale Computing Research Center, France
 John Cavazos, University of Delaware, USA

Program Committee:
 Denis Barthou, University of Versailles, France
 Marcelo Cintra, University of Edinburgh, UK
 Rudolf Eigenmann, Purdue University, USA
 Robert Hundt, Google Inc, USA
 Engin Ipek, Microsoft Research, USA
 Allen D. Malony, University of Oregon, USA
 Bilha Mendelson, IBM Haifa, Israel
 Michael O'Boyle, University of Edinburgh, UK
 Markus Puschel, ETH Zurich, Switzerland
 Lawrence Rauchwerger, Texas A&M University, USA
 Xipeng Shen, College of William & Mary, USA
 Christina Silvano, Politecnico di Milano, Italy 
 Bronis R. de Supinski, LLNL, USA
 Chengyong Wu, ICT, China
 Qing Yi, University of Texas at San Antonio, USA 

Steering Committee:
 Francois Bodin, CAPS Entreprise, France
 John Cavazos, University of Delaware, USA
 Lieven Eeckhout, Ghent University, Belgium
 Grigori Fursin, Exascale Computing Research Center, France
 Michael O'Boyle, University of Edinburgh, UK
 David Padua, UIUC, USA
 Olivier Temam, INRIA, France
 Richard Vuduc, Georgia Tech, USA 
 David Whalley, Florida State University, USA

*
Dr. Grigori Fursin
http://unidapt.org/people/gfursin
*



Behavior change of driver on multiple input assembly files

2010-12-30 Thread Jie Zhang
I just found a behavior change in the driver when it is given multiple input
assembly files. Previously (before r164357), for the command line


gcc -o t t1.s t2.s

the driver called the assembler twice, once for t1.s and once for t2.s.
After r164357, the driver calls the assembler only once for both t1.s and
t2.s. If t1.s and t2.s then define the same symbol, the assembler reports an
error, like:


t2.s: Assembler messages:
t2.s:1: Error: symbol `.L1' is already defined

I read the discussion on the mailing list that starts with the patch email for
r164357.[1] It seems that this behavior change was not the intention of
that patch. And I think the previous behavior is more useful than the
current behavior. So it would be good to restore the previous behavior, wouldn't it?


For a minimal fix, I propose changing the combinable fields of the assembly
languages in default_compilers[] to 0. See the attached patch
"gcc-not-combine-assembly-inputs.diff". I don't know why the combinable
fields were set to 1 when the --combine option was introduced. There is no
explanation for that in the patch email.[2] Does anyone still remember?
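
To make the effect of the combinable field concrete, here is a toy model in C. It is only a sketch under simplifying assumptions, not the driver's actual logic: struct lang_entry, lookup_entry and count_invocations are hypothetical names, and the model simply lets all inputs that map to the same combinable entry share one tool invocation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a driver table entry;
   this is not the real "struct compiler" from gcc.c. */
struct lang_entry {
    const char *suffix;  /* file suffix, e.g. ".s" */
    int combinable;      /* nonzero: several inputs may share one invocation */
};

/* Return the table entry matching the suffix of NAME, or NULL. */
const struct lang_entry *
lookup_entry(const struct lang_entry *table, size_t n_entries,
             const char *name)
{
    const char *dot = strrchr(name, '.');

    if (dot == NULL)
        return NULL;
    for (size_t i = 0; i < n_entries; i++)
        if (strcmp(table[i].suffix, dot) == 0)
            return &table[i];
    return NULL;
}

/* Count tool invocations in this model: inputs that map to the same
   combinable entry share one invocation, every other input gets its own. */
size_t
count_invocations(const struct lang_entry *table, size_t n_entries,
                  const char *const *files, size_t n_files)
{
    size_t count = 0;

    assert(files != NULL || n_files == 0);
    for (size_t i = 0; i < n_files; i++) {
        const struct lang_entry *e = lookup_entry(table, n_entries, files[i]);
        int seen = 0;

        if (e == NULL)
            continue;  /* unknown suffix: ignored in this model */
        if (e->combinable)
            /* Was an earlier input already batched under this entry? */
            for (size_t j = 0; j < i && !seen; j++)
                seen = lookup_entry(table, n_entries, files[j]) == e;
        if (!seen)
            count++;
    }
    return count;
}
```

In this model, t1.s and t2.s produce one assembler invocation when the ".s" entry has combinable set to 1 and two invocations when it is 0, which corresponds to the behavior before r164357 that the minimal fix aims to restore.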


For an aggressive fix, how about removing the combinable field from
"struct compiler"? If we change the combinable fields of the assembly languages
in default_compilers[] to 0, only ".go" and "@cpp-output" still set combinable
to 1. I don't see any reason for a difference between "@cpp-output" and
".i". So if we can set combinable to 0 for ".go", we have 0 for all
compilers in default_compilers[], and thus we can remove that field. Is
there a reason to set it to 1 for ".go"?


I also attached the aggressive patch "gcc-remove-combinable-field.diff". 
Neither patch has been tested. Which way should we go?


[1] http://gcc.gnu.org/ml/gcc-patches/2010-09/msg01322.html
[2] http://gcc.gnu.org/ml/gcc-patches/2004-03/msg01880.html


Regards,
--
Jie Zhang



	* gcc.c (default_compilers[]): Set combinable field to 0
	for all assembly languages.

Index: gcc.c
===
--- gcc.c	(revision 168362)
+++ gcc.c	(working copy)
@@ -935,11 +935,11 @@ static const struct compiler default_com
   {".i", "@cpp-output", 0, 0, 0},
   {"@cpp-output",
"%{!M:%{!MM:%{!E:cc1 -fpreprocessed %i %(cc1_options) %{!fsyntax-only:%(invoke_as)", 0, 1, 0},
-  {".s", "@assembler", 0, 1, 0},
+  {".s", "@assembler", 0, 0, 0},
   {"@assembler",
-   "%{!M:%{!MM:%{!E:%{!S:as %(asm_debug) %(asm_options) %i %A ", 0, 1, 0},
-  {".sx", "@assembler-with-cpp", 0, 1, 0},
-  {".S", "@assembler-with-cpp", 0, 1, 0},
+   "%{!M:%{!MM:%{!E:%{!S:as %(asm_debug) %(asm_options) %i %A ", 0, 0, 0},
+  {".sx", "@assembler-with-cpp", 0, 0, 0},
+  {".S", "@assembler-with-cpp", 0, 0, 0},
   {"@assembler-with-cpp",
 #ifdef AS_NEEDS_DASH_FOR_PIPED_INPUT
"%(trad_capable_cpp) -lang-asm %(cpp_options) -fno-directives-only\
@@ -952,7 +952,7 @@ static const struct compiler default_com
   %{!M:%{!MM:%{!E:%{!S:-o %|.s |\n\
as %(asm_debug) %(asm_options) %m.s %A "
 #endif
-   , 0, 1, 0},
+   , 0, 0, 0},
 
 #include "specs.h"
   /* Mark end of table.  */
Index: gcc.c
===
--- gcc.c	(revision 168362)
+++ gcc.c	(working copy)
@@ -847,8 +847,6 @@ struct compiler
   const char *cpp_spec; /* If non-NULL, substitute this spec
    for `%C', rather than the usual
    cpp_spec.  */
-  const int combinable;  /* If nonzero, compiler can deal with
-multiple source files at once (IMA).  */
   const int needs_preprocessing; /* If nonzero, source files need to
 be run through a preprocessor.  */
 };
@@ -876,29 +874,29 @@ static const struct compiler default_com
  were not present when we built the driver, we will hit these copies
  and be given a more meaningful error than "file not used since
  linking is not done".  */
-  {".m",  "#Objective-C", 0, 0, 0}, {".mi",  "#Objective-C", 0, 0, 0},
-  {".mm", "#Objective-C++", 0, 0, 0}, {".M", "#Objective-C++", 0, 0, 0},
-  {".mii", "#Objective-C++", 0, 0, 0},
-  {".cc", "#C++", 0, 0, 0}, {".cxx", "#C++", 0, 0, 0},
-  {".cpp", "#C++", 0, 0, 0}, {".cp", "#C++", 0, 0, 0},
-  {".c++", "#C++", 0, 0, 0}, {".C", "#C++", 0, 0, 0},
-  {".CPP", "#C++", 0, 0, 0}, {".ii", "#C++", 0, 0, 0},
-  {".ads", "#Ada", 0, 0, 0}, {".adb", "#Ada", 0, 0, 0},
-  {".f", "#Fortran", 0, 0, 0}, {".F", "#Fortran", 0, 0, 0},
-  {".for", "#Fortran", 0, 0, 0}, {".FOR", "#Fortran", 0, 0, 0},
-  {".ftn", "#Fortran", 0, 0, 0}, {".FTN", "#Fortran", 0, 0, 0},
-  {".fpp", "#Fortran", 0, 0, 0}, {".FPP", "#Fortran", 0, 0, 0},
-  {".f90", "#Fortran", 0, 0, 0}, {".F90", "#Fortran", 0, 0, 0},
-  {".f95", "#Fortran", 0, 0, 0}, {".F95", "#Fortran", 0, 0, 0},
-  {".f03", "#Fortran", 0, 0, 0}, {".F03", "#Fortran", 0, 0, 0},
-  {".f08", "#Fortran", 0, 0, 0}, {".F08", "#Fortran", 0, 0, 0},
-  {".r", "#Ratfor", 0, 0, 0},
-  {".p", "#Pascal", 0, 0, 0}, {".pas", "#Pascal", 0, 0, 0},
-  {".java", "#Java", 0, 0, 0}, {".class", "#Java", 0, 0, 0},
-