Re: fatal error: gnu/stubs-32.h: No such file

2013-07-24 Thread Andrew Haley
On 07/24/2013 01:48 AM, David Starner wrote:
> I'd like to mention that I too was bit by this one on Debian. I don't
> have a 32-bit development environment installed; why would I? I'm
> building primarily for myself, and if I did have to target a 32-bit
> environment, I'd likely have to mess with more stuff then just the
> compiler.

No, you probably wouldn't.  Just use -m32 and you'd be fine.

> If you can't find a way to detect this error, I can't
> imagine many people would have a problem with turning off multilibs on
> x86-64; it's something of a minority setup.

I don't think it is, really.

Andrew.



Re: fatal error: gnu/stubs-32.h: No such file

2013-07-24 Thread Florian Weimer

On 07/24/2013 10:17 AM, Andrew Haley wrote:

On 07/24/2013 01:48 AM, David Starner wrote:

I'd like to mention that I too was bit by this one on Debian. I don't
have a 32-bit development environment installed; why would I? I'm
building primarily for myself, and if I did have to target a 32-bit
environment, I'd likely have to mess with more stuff then just the
compiler.


No, you probably wouldn't.  Just use -m32 and you'd be fine.


No, that doesn't work.  The glibc development environment on 
Debian/amd64 does not contain the 32-bit header files, and that's where 
the error message comes from.


I don't think that's easy to change because of the way dpkg handles file 
conflicts (even if the files are identical) and how true multi-arch 
support is implemented in Debian.


--
Florian Weimer / Red Hat Product Security Team


Re: fatal error: gnu/stubs-32.h: No such file

2013-07-24 Thread Andrew Haley
On 07/24/2013 09:35 AM, Florian Weimer wrote:
> On 07/24/2013 10:17 AM, Andrew Haley wrote:
>> On 07/24/2013 01:48 AM, David Starner wrote:
>>> I'd like to mention that I too was bit by this one on Debian. I don't
>>> have a 32-bit development environment installed; why would I? I'm
>>> building primarily for myself, and if I did have to target a 32-bit
>>> environment, I'd likely have to mess with more stuff then just the
>>> compiler.
>>
>> No, you probably wouldn't.  Just use -m32 and you'd be fine.
> 
> No, that doesn't work.  The glibc development environment on 
> Debian/amd64 does not contain the 32-bit header files, and that's where 
> the error message comes from.

Well, of course.  It's a prerequisite for building GCC.  I presume that
Debian has the same abilities as Fedora, where if you want to build GCC
you just type

  yum-builddep gcc

and Fedora installs all the build reqs for GCC.

> I don't think that's easy to change because of the way dpkg handles file 
> conflicts (even if the files are identical) and how true multi-arch 
> support is implemented in Debian.

But hold on: if I just wanted to compile C programs I'd use the system's
C compiler.  Anyone building GCC for themself has a reason for doing so.

Andrew.



Re: [x86-64 psABI] RFC: Extend x86-64 PLT entry to support MPX

2013-07-24 Thread Florian Weimer

On 07/23/2013 09:49 PM, H.J. Lu wrote:

  2. Extend the current 16-byte PLT entry:

   ff 25 32 8b 21 00jmpq   *name@GOTPCREL(%rip)
   68 00 00 00 00   pushq  $index
   e9 00 00 00 00   jmpq   PLT0

 which clear bound registers, to 32-byte to add BND prefix to branch
 instructions.


Would it be possible to use a different instruction sequence that stays 
in the 16 byte limit?  Or restrict MPX support to BIND_NOW relocations?


--
Florian Weimer / Red Hat Product Security Team


Re: fatal error: gnu/stubs-32.h: No such file

2013-07-24 Thread Florian Weimer

On 07/24/2013 10:39 AM, Andrew Haley wrote:


Well, of course.  It's a prerequisite for building GCC.  I presume that
Debian has the same abilities as Fedora, where if you want to build GCC
you just type

   yum-builddep gcc

and Fedora installs all the build reqs for GCC.


Yes, "apt-get build-dep gcc" should work, but will install quite a bit 
of other stuff that's not needed for building the more popular front ends.



I don't think that's easy to change because of the way dpkg handles file
conflicts (even if the files are identical) and how true multi-arch
support is implemented in Debian.


But hold on: if I just wanted to compile C programs I'd use the system's
C compiler.  Anyone building GCC for themself has a reason for doing so.


I suspect a fairly common exercise is to check if the trunk still has 
the bug you're about to report.


--
Florian Weimer / Red Hat Product Security Team


Re: fatal error: gnu/stubs-32.h: No such file

2013-07-24 Thread Andrew Haley
On 07/24/2013 10:04 AM, Florian Weimer wrote:
> On 07/24/2013 10:39 AM, Andrew Haley wrote:
> 
>> Well, of course.  It's a prerequisite for building GCC.  I presume that
>> Debian has the same abilities as Fedora, where if you want to build GCC
>> you just type
>>
>>yum-builddep gcc
>>
>> and Fedora installs all the build reqs for GCC.
> 
> Yes, "apt-get build-dep gcc" should work, but will install quite a bit 
> of other stuff that's not needed for building the more popular front ends.

That is not of any significant consequence.

>>> I don't think that's easy to change because of the way dpkg handles file
>>> conflicts (even if the files are identical) and how true multi-arch
>>> support is implemented in Debian.
>>
>> But hold on: if I just wanted to compile C programs I'd use the system's
>> C compiler.  Anyone building GCC for themself has a reason for doing so.
> 
> I suspect a fairly common exercise is to check if the trunk still has 
> the bug you're about to report.

Right, so it should be built the right way.

Andrew.



Re: fatal error: gnu/stubs-32.h: No such file

2013-07-24 Thread David Starner
On Wed, Jul 24, 2013 at 1:17 AM, Andrew Haley  wrote:
> On 07/24/2013 01:48 AM, David Starner wrote:
>> I'd like to mention that I too was bit by this one on Debian. I don't
>> have a 32-bit development environment installed; why would I? I'm
>> building primarily for myself, and if I did have to target a 32-bit
>> environment, I'd likely have to mess with more stuff then just the
>> compiler.
>
> No, you probably wouldn't.  Just use -m32 and you'd be fine.

That's assuming that the hypothetical 32-bit x86 system I was
targeting was running GNU libc6 2.17 (as well as whatever libraries I
need, with version numbers apropos of Debian Unstable.) Conceivable,
but not something I'd bet on. I've got 3 ARM (Android) computers
around, and 3 AMD-64 computers, and I can't imagine why I'd need an
x86 computer. There is one x86 program I run (zsnes), but if Debian
stopped carrying it, it probably wouldn't be worth the time to compile
it myself.

>> If you can't find a way to detect this error, I can't
>> imagine many people would have a problem with turning off multilibs on
>> x86-64; it's something of a minority setup.
>
> I don't think it is, really.

Really? Because my impression is that on Unix, the primary use of the
C compiler has always been to compile programs for the system the
compiler is running on. And x86-32 is a slow, largely obsolete chip;
it's certainly useful to emulate, but I suspect any developer who
needs to build for it knows that up-front and is prepared to deal for
it in the same way that someone who needs an ARM or MIPS compiler is.

> Anyone building GCC for themself has a reason for doing so.

At the current time, Debian's version of GNAT is built from older
sources then the rest of GCC; if I want a 4.8 version of GNAT, I have
to build it myself.

> Right, so it should be built the right way.

The right way? If I don't want to build support for obsolete systems I
don't use, I'm building it the wrong way? If I were building
ia64-linux-gnu, I wouldn't have to enable support for x86-linux-gnu,
but because I'm building amd64-linux-gnu, if I don't, I'm building it
the wrong way?

I don't see this resistance to making it work with real systems and
real workloads. This feature is not useful to many of us, and fails
the GCC build in the middle. That's not really acceptable.

-- 
Kie ekzistas vivo, ekzistas espero.


Re: fatal error: gnu/stubs-32.h: No such file

2013-07-24 Thread Andrew Haley
On 07/24/2013 11:32 AM, David Starner wrote:
> On Wed, Jul 24, 2013 at 1:17 AM, Andrew Haley  wrote:
>> On 07/24/2013 01:48 AM, David Starner wrote:
>>> I'd like to mention that I too was bit by this one on Debian. I don't
>>> have a 32-bit development environment installed; why would I? I'm
>>> building primarily for myself, and if I did have to target a 32-bit
>>> environment, I'd likely have to mess with more stuff then just the
>>> compiler.
>>
>> No, you probably wouldn't.  Just use -m32 and you'd be fine.
> 
> That's assuming that the hypothetical 32-bit x86 system I was
> targeting was running GNU libc6 2.17 (as well as whatever libraries
> I need, with version numbers apropos of Debian Unstable.)
> Conceivable, but not something I'd bet on. I've got 3 ARM (Android)
> computers around, and 3 AMD-64 computers, and I can't imagine why
> I'd need an x86 computer. There is one x86 program I run (zsnes),
> but if Debian stopped carrying it, it probably wouldn't be worth the
> time to compile it myself.

No, I'm assuming you want to build and run an x86 program on your own
system.  -32 is not in any sense a cross-compiler.

>>> If you can't find a way to detect this error, I can't
>>> imagine many people would have a problem with turning off multilibs on
>>> x86-64; it's something of a minority setup.
>>
>> I don't think it is, really.
> 
> Really?

Really.

> Because my impression is that on Unix, the primary use of the
> C compiler has always been to compile programs for the system the
> compiler is running on.

So, use the system's C compiler.

> And x86-32 is a slow, largely obsolete chip; it's certainly useful
> to emulate, but I suspect any developer who needs to build for it
> knows that up-front and is prepared to deal for it in the same way
> that someone who needs an ARM or MIPS compiler is.
> 
>> Anyone building GCC for themself has a reason for doing so.
> 
> At the current time, Debian's version of GNAT is built from older
> sources then the rest of GCC; if I want a 4.8 version of GNAT, I have
> to build it myself.

So you have a specialized use.  Fair enough; you can disable multilib.
I would just install GCC's build dependencies and build with the
defaults.

>> Right, so it should be built the right way.
> 
> The right way?

The default way, then.

> If I don't want to build support for obsolete systems I don't use,
> I'm building it the wrong way? If I were building ia64-linux-gnu, I
> wouldn't have to enable support for x86-linux-gnu, but because I'm
> building amd64-linux-gnu, if I don't, I'm building it the wrong way?
> 
> I don't see this resistance to making it work with real systems and
> real workloads. This feature is not useful to many of us, and fails
> the GCC build in the middle. That's not really acceptable.

There is no resistance whatsoever to making it work with real systems
and real workloads.  The problem is that you don't know that people
running on 64-bit hardware often choose to compile -32 and run -32
locally.

Andrew.




Re: fatal error: gnu/stubs-32.h: No such file

2013-07-24 Thread David Starner
On Wed, Jul 24, 2013 at 4:14 AM, Andrew Haley  wrote:
> I would just install GCC's build dependencies and build with the
> defaults.

I'm glad you have infinite hard-drive space. I rather wish fewer
developers did, as well as those infinitely fast computers they seem
to have; perhaps they would have more empathy with my day-to-day
computing needs.

> There is no resistance whatsoever to making it work with real systems
> and real workloads.

Yes, there is. Both in this thread, and on bugzilla, people with real
systems, in my case a well-stocked development system, have said they
started compiling gcc and after hours the compile has failed without
even an explanation, and people have shrugged, and said what do you
want us to do about it? If a feature causes failure on real systems
like that, then disabling it by default, even if it's used by a
significant minority of users, should be considered. Yes, it would be
better to leave multilibs on and give people building without 32-bit
libraries a proper error message up front, but leaving it as is is not
making it work with real systems; it's causing real people to pull
their hair out or give up on trying to build GCC.

-- 
Kie ekzistas vivo, ekzistas espero.


Re: fatal error: gnu/stubs-32.h: No such file

2013-07-24 Thread Gabriel Dos Reis
> There is no resistance whatsoever to making it work with real systems
> and real workloads.

It does not sound or look like that way.

>  The problem is that you don't know that people
> running on 64-bit hardware often choose to compile -32 and run -32
> locally.

But we know people are running into this issue and reporting it.

-- Gaby


Re: atomic support for LEON3 platform

2013-07-24 Thread Deng Hengyi
Hi Eric,

Thank you for your interesting on this feature. 

Best Regards
WeiY
在 2013-7-24,上午1:07,Eric Botcazou  写道:

>> ok, because i am not familiar with compiler implementation. So if you can
>> give me some references i will appreciate you very much. And by the way is
>> there any plan to support this feature in the mainline?
> 
> OK, let's go ahead and implement the feature.  We first need the binutils 
> side, 
> because a 'cas' is currently rejected by the assembler:
> 
> eric@hermes:~/leon-elf> gcc/xgcc -Bgcc -c cas.adb -mcpu=leon3 
> /tmp/ccOuqOpo.s: Assembler messages:
> /tmp/ccOuqOpo.s:24: Error: Architecture mismatch on "cas".
> /tmp/ccOuqOpo.s:24:  (Requires v9|v9a|v9b; requested architecture is v8.)
> /tmp/ccOuqOpo.s:47: Error: Architecture mismatch on "cas".
> /tmp/ccOuqOpo.s:47:  (Requires v9|v9a|v9b; requested architecture is v8.)
> 
> David, how do you want to handle this on the binutils side?  The compiler 
> currently passes -Av8 for LEON3 and I don't think that we want to pass -Av9 
> instead, so we would need something in between.
> 
> -- 
> Eric Botcazou



Re: fatal error: gnu/stubs-32.h: No such file

2013-07-24 Thread Andrew Haley
On 07/24/2013 01:26 PM, David Starner wrote:
> On Wed, Jul 24, 2013 at 4:14 AM, Andrew Haley  wrote:
>> I would just install GCC's build dependencies and build with the
>> defaults.
> 
> I'm glad you have infinite hard-drive space. I rather wish fewer
> developers did, as well as those infinitely fast computers they seem
> to have; perhaps they would have more empathy with my day-to-day
> computing needs.
> 
>> There is no resistance whatsoever to making it work with real systems
>> and real workloads.
> 
> Yes, there is. Both in this thread, and on bugzilla, people with real
> systems, in my case a well-stocked development system, have said they
> started compiling gcc and after hours the compile has failed without
> even an explanation, and people have shrugged, and said what do you
> want us to do about it?

There should be a better diagnostic.

Andrew.



Re: fatal error: gnu/stubs-32.h: No such file

2013-07-24 Thread Andrew Haley
On 07/24/2013 01:36 PM, Gabriel Dos Reis wrote:
>> There is no resistance whatsoever to making it work with real systems
>> and real workloads.
> 
> It does not sound or look like that way.
> 
>>  The problem is that you don't know that people
>> running on 64-bit hardware often choose to compile -32 and run -32
>> locally.
> 
> But we know people are running into this issue and reporting it.

Yes.  But that on its own is not sufficient to change the default.

Andrew.




OpenMP canonical loop form

2013-07-24 Thread Thomas Schwinge
Hi!

OpenMP defines a canonical loop form (in OpenMP 4: »2.6 Canonical Loop
Form«, in OpenMP 3.1 as part of »2.5.1 Loop Construct«) that says that
the loop index variable »must not be modified during the execution of the
for-loop other than in incr-expr«.  The following code, which violates
this when modifying i in the loop body, thus isn't a conforming program,
and GCC may then exhibit unspecified behavior.  Instead of accepting it
silently, I wonder if it makes sense to have GCC detect this violation
and warn about the unspecified behavior, or even turn it into a hard
error?

#include 
#include 

int
main(void)
{
#pragma omp parallel
#pragma omp for
  for (int i = 0; i < 20; i += 2)
{
  printf("%d: #%d\n", omp_get_thread_num(), i);

  /* Violation of canonical loop form.  */
  --i;
}

  return 0;
}

2: #8
2: #9
0: #0
0: #1
0: #2
0: #3
3: #10
3: #11
1: #4
1: #5
1: #6
1: #7
6: #16
6: #17
4: #12
4: #13
5: #14
5: #15
7: #18
7: #19


Grüße,
 Thomas


pgpJYlK_I5DFR.pgp
Description: PGP signature


whether DIE of a "static const int" member has attribute "DW_AT_const_value"

2013-07-24 Thread hex
Hi all,

I find a strange things: Whether DIE(Debug Information Entry) of a
"static const int" member in a class has the attribute
"DW_AT_const_value" depends on whether there is a virtual function
defined in the class. Is it a expected behavior for GCC? And the
attribute "DW_AT_const_value" matters because GDB could print the
member's address if it does not have the attribute, otherwise GDB
could not.

My test case is as following:
// symbol.cpp
class Test{
private:
const static int hack;
public:
virtual void get(){}
};

const int Test::hack = 3;

int main()
{
Test t;
return 0;
}

Then I run the following steps:
$ g++ -g -o symbol symbol.cpp
$ readelf -wi symbol
The DIE for Test::hack is as follwing:
<2><46>: Abbrev Number: 4 (DW_TAG_member)
   <47>   DW_AT_name: (indirect string, offset: 0x3a): hack
   <4b>   DW_AT_decl_file   : 1
   <4c>   DW_AT_decl_line   : 8
   <4d>   DW_AT_MIPS_linkage_name: (indirect string, offset: 0x3f):
_ZN4Test4hackE
   <51>   DW_AT_type: <0xc3>
   <55>   DW_AT_external: 1
   <56>   DW_AT_accessibility: 3   (private)
   <57>   DW_AT_declaration : 1
   <58>   DW_AT_const_value : 3

If I delete the function get() from the class, the class turns into:
class Test{
private:
const static int hack;
};

const int Test::hack = 3;

int main()
{
Test t;
return 0;
}

Then the DIE for Test::hack does not have the attribute
"DW_AT_const_value". Is it exptected?  Thank you!


Re: [x86-64 psABI]: Extend x86-64 psABI to support AVX-512

2013-07-24 Thread Richard Biener
"H.J. Lu"  wrote:

>Hi,
>
>Here is a patch to extend x86-64 psABI to support AVX-512:

Afaik avx 512 doubles the amount of xmm registers. Can we get them callee saved 
please?

Thanks,
Richard.

>http://software.intel.com/sites/default/files/319433-015.pdf
>
>
>--
>H.J.




Re: [x86-64 psABI] RFC: Extend x86-64 PLT entry to support MPX

2013-07-24 Thread H.J. Lu
On Wed, Jul 24, 2013 at 1:43 AM, Florian Weimer  wrote:
> On 07/23/2013 09:49 PM, H.J. Lu wrote:
>>
>>   2. Extend the current 16-byte PLT entry:
>>
>>ff 25 32 8b 21 00jmpq   *name@GOTPCREL(%rip)
>>68 00 00 00 00   pushq  $index
>>e9 00 00 00 00   jmpq   PLT0
>>
>>  which clear bound registers, to 32-byte to add BND prefix to branch
>>  instructions.
>
>
> Would it be possible to use a different instruction sequence that stays in
> the 16 byte limit?  Or restrict MPX support to BIND_NOW relocations?
>

It isn't possible to use different insns in PLT to add BND prefix.
The issue isn't about relocation.  The issue is external calls
are routed via PLT entry, which clears bound registers.  That
is why we need to use a different PLT entry to preserve bound
registers.


--
H.J.


Re: [x86-64 psABI]: Extend x86-64 psABI to support AVX-512

2013-07-24 Thread H.J. Lu
On Wed, Jul 24, 2013 at 8:23 AM, Richard Biener
 wrote:
> "H.J. Lu"  wrote:
>
>>Hi,
>>
>>Here is a patch to extend x86-64 psABI to support AVX-512:
>
> Afaik avx 512 doubles the amount of xmm registers. Can we get them callee 
> saved please?
>

Make them callee saved means we need to change ld.so to
preserve them and we need to change unwind library to
support them.  It is certainly doable.

--
H.J.


Re: [x86-64 psABI]: Extend x86-64 psABI to support AVX-512

2013-07-24 Thread Joseph S. Myers
On Wed, 24 Jul 2013, H.J. Lu wrote:

> > Afaik avx 512 doubles the amount of xmm registers. Can we get them 
> > callee saved please?
> 
> Make them callee saved means we need to change ld.so to
> preserve them and we need to change unwind library to
> support them.  It is certainly doable.

And setjmp/longjmp (with consequent versioning implications if there isn't 
enough space in jmp_buf).  Avoiding the need for such library changes in 
order to use new instruction set features is why it's usual to make new 
registers (or new bits of existing registers) call-clobbered.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: fatal error: gnu/stubs-32.h: No such file

2013-07-24 Thread Gabriel Dos Reis
On Wed, Jul 24, 2013 at 8:44 AM, Andrew Haley  wrote:
> On 07/24/2013 01:36 PM, Gabriel Dos Reis wrote:
>>> There is no resistance whatsoever to making it work with real systems
>>> and real workloads.
>>
>> It does not sound or look like that way.
>>
>>>  The problem is that you don't know that people
>>> running on 64-bit hardware often choose to compile -32 and run -32
>>> locally.
>>
>> But we know people are running into this issue and reporting it.
>
> Yes.  But that on its own is not sufficient to change the default.

I suspect this might just provide evidence for a claim previously denied.

-- Gaby


Re: fatal error: gnu/stubs-32.h: No such file

2013-07-24 Thread Andrew Haley
On 07/24/2013 04:38 PM, Gabriel Dos Reis wrote:
> On Wed, Jul 24, 2013 at 8:44 AM, Andrew Haley  wrote:
>> On 07/24/2013 01:36 PM, Gabriel Dos Reis wrote:
 There is no resistance whatsoever to making it work with real systems
 and real workloads.
>>>
>>> It does not sound or look like that way.
>>>
  The problem is that you don't know that people
 running on 64-bit hardware often choose to compile -32 and run -32
 locally.
>>>
>>> But we know people are running into this issue and reporting it.
>>
>> Yes.  But that on its own is not sufficient to change the default.
> 
> I suspect this might just provide evidence for a claim previously denied.

Not at all: we're just disagreeing about what a real system with
a real workload looks like.  It's a stupid thing to say anyway, because
who is to say their system is more real than mine or yours?

Andrew.




Intel® Memory Protection Extensions support in the GCC

2013-07-24 Thread Zamyatin, Igor
Hi All!

This is to let you know that enabling of Intel® MPX technology (see details in 
http://download-software.intel.com/sites/default/files/319433-015.pdf) in GCC 
has been started. (Corresponding changes in binutils are here - 
http://sourceware.org/ml/binutils/2013-07/msg00233.html)

Currently compiler changes for Intel® MPX has been put in the branch 
svn://gcc.gnu.org/svn/gcc/branches/mpx (will soon be reflected in svn.html). 
Ilya Enkovich (in cc) will be the main person maintaining this branch and 
submitting changes into the trunk.

Some implementation details could be found on wiki

Thanks,
Igor 



Re: [x86-64 psABI] RFC: Extend x86-64 PLT entry to support MPX

2013-07-24 Thread Ian Lance Taylor
On Tue, Jul 23, 2013 at 12:49 PM, H.J. Lu  wrote:
>
> http://software.intel.com/sites/default/files/319433-015.pdf
>
> introduces 4 bound registers, which will be used for parameter passing
> in x86-64.  Bound registers are cleared by branch instructions.  Branch
> instructions with BND prefix will keep bound register contents.

I took a very quick look at the doc.  Why shouldn't we run the kernel
with BNDPRESERVE = 1, to avoid this behaviour of clearing the bound
registers on branch instructions?  That would let us avoid these
issues.


> I prefer the note section solution.  Any suggestions, comments?

I concur, but why not use the ELF attributes support rather than a new
note section?

Ian


Re: [x86-64 psABI]: Extend x86-64 psABI to support AVX-512

2013-07-24 Thread Richard Biener
"H.J. Lu"  wrote:

>On Wed, Jul 24, 2013 at 8:23 AM, Richard Biener
> wrote:
>> "H.J. Lu"  wrote:
>>
>>>Hi,
>>>
>>>Here is a patch to extend x86-64 psABI to support AVX-512:
>>
>> Afaik avx 512 doubles the amount of xmm registers. Can we get them
>callee saved please?
>>
>
>Make them callee saved means we need to change ld.so to
>preserve them and we need to change unwind library to
>support them.  It is certainly doable.

IMHO it was a mistake to not have any callee saved xmm register in the original 
abi - we should fix this at this opportunity. Loops with function calls are not 
that uncommon.

Richard.

>--
>H.J.




Re: [x86-64 psABI]: Extend x86-64 psABI to support AVX-512

2013-07-24 Thread H.J. Lu
On Wed, Jul 24, 2013 at 10:36 AM, Richard Biener
 wrote:
> "H.J. Lu"  wrote:
>
>>On Wed, Jul 24, 2013 at 8:23 AM, Richard Biener
>> wrote:
>>> "H.J. Lu"  wrote:
>>>
Hi,

Here is a patch to extend x86-64 psABI to support AVX-512:
>>>
>>> Afaik avx 512 doubles the amount of xmm registers. Can we get them
>>callee saved please?
>>>
>>
>>Make them callee saved means we need to change ld.so to
>>preserve them and we need to change unwind library to
>>support them.  It is certainly doable.
>
> IMHO it was a mistake to not have any callee saved xmm register in the 
> original abi - we should fix this at this opportunity. Loops with function 
> calls are not that uncommon.
>

Are there any other Linux targets with callee saved vector registers?


--
H.J.


Re: [x86-64 psABI]: Extend x86-64 psABI to support AVX-512

2013-07-24 Thread Peter Bergner
On Wed, 2013-07-24 at 10:42 -0700, H.J. Lu wrote:
> Are there any other Linux targets with callee saved vector registers?

Yes, on POWER.  From our ABI:

  On processors with the VMX feature.
v0-v1 Volatile scratch registers
v2-v13 Volatile vector parameters registers
v14-v19 Volatile scratch registers
v20-v31 Non-volatile registers

I'll note that the new VSX register state we recently added with power7
were made volatile, but then we already had these non-volatile altivec
regs to use.

Peteer




Re: [x86-64 psABI]: Extend x86-64 psABI to support AVX-512

2013-07-24 Thread Ondřej Bílka
On Wed, Jul 24, 2013 at 07:36:31PM +0200, Richard Biener wrote:
> "H.J. Lu"  wrote:
> 
> >On Wed, Jul 24, 2013 at 8:23 AM, Richard Biener
> > wrote:
> >> "H.J. Lu"  wrote:
> >>
> >>>Hi,
> >>>
> >>>Here is a patch to extend x86-64 psABI to support AVX-512:
> >>
> >> Afaik avx 512 doubles the amount of xmm registers. Can we get them
> >callee saved please?
> >>
> >
> >Make them callee saved means we need to change ld.so to
> >preserve them and we need to change unwind library to
> >support them.  It is certainly doable.
> 
> IMHO it was a mistake to not have any callee saved xmm register in the 
> original abi - we should fix this at this opportunity. Loops with function 
> calls are not that uncommon.
>
I also noticed this problem and best solution that I came upon is analogue to
__attribute__((fastcall))

This would make possible for libraries to add versioned symbols that use
attribute and migrate to saner calling convention.

> Richard.
> 
> >--
> >H.J.
> 



Re: Help forging a function_decl

2013-07-24 Thread Rodolfo Guilherme Wottrich
Problem solved.

The trouble was that the blocks of my statement list weren't correctly
chained, so when lower_gimple_bind executed in pass_lower_cf, it
accessed an uninitialized memory area, thus sometimes reading the flag
as true and sometimes as false. Now everything runs smoothly.

Again, thank you for the patience. I'm kind of new to gcc, so I was
basically learning how to do it.

Kindest regards,

---
Rodolfo Guilherme Wottrich
Universidade Estadual de Campinas - Unicamp


2013/7/23 Rodolfo Guilherme Wottrich :
> Later I found out that cgraph_mark_needed_node was already being
> called in cgraph_finalize_function, and that should really keep my
> function from being removed. But when the function
> cgraph_remove_unreachable_nodes executes, it is marked as unreachable
> just because at this point it is not marked as needed anymore.
> The piece of code which is doing that is
> function_and_variable_visibility, in ipa.c. Oddly, the condition for
> doing so is having DECL_EXTERNAL set for my fndecl, which I just said,
> was my former mistake.
> I really don't know why this erratic behavior is happening, I am
> explicitly setting DECL_EXTERNAL to false.
>
> Cheers,
>
> ---
> Rodolfo Guilherme Wottrich
> Universidade Estadual de Campinas - Unicamp
>
>
> 2013/7/23 Rodolfo Guilherme Wottrich :
>> Hello,
>>
>> 2013/7/23 Martin Jambor :
>>> Hi,
>>>
>>> But you do call cgraph_add_new_function on it as well, right?  If not,
>>> how is its symbol table node (also called and serving as the call
>>> graph node) created?
>>
>> I call finish_function for my decl. Inside that function, there's this
>> one call to cgraph_add_new_function, but the condition it is inside is
>> bypassed:
>>
>> /* ??? Objc emits functions after finalizing the compilation unit.
>>   This should be cleaned up later and this conditional removed.  */
>>   if (cgraph_global_info_ready)
>>   {
>> cgraph_add_new_function (fndecl, false);
>> return;
>>   }
>>
>> Right after that, there's a call to cgraph_finalize_function, which I
>> guess works the way it should, creating its cgraph node. I took a look
>> at the comment in cgraph_add_new_function and it states that "this
>> function is intended to be used by middle end and allows insertion of
>> new function at arbitrary point of compilation.  The function can be
>> either in high, low or SSA form GIMPLE." As I'm doing it in parsing
>> time and there was no gimplification passes yet, I guess it is ok not
>> to use it at this point.
>>
>>> (And BTW, you are hacking on trunk, right?  Older versions can be
>>> quite a bit different here.)
>>
>> I understand that, but I intend to use Dragonegg to obtain LLVM IR
>> after the gcc front-end has done its work, so I'm hacking version
>> 4.6.4, which is said to work better with Dragonegg.
>>
>>> What do you mean by "50% of the time?"  That you get different results
>>> even when you do not change your compiler?  That should not happen and
>>> means you invoke undefined behavior, most likely depending on some
>>> uninitialized stuff (assuming your HW is OK) so you are probably not
>>> clearing some allocated structure or something.  (Do you know why
>>> DECL_EXTERNAL was set?  That looks weird).
>>
>> Yeah, exactly: different results even not changing the compiler.
>> About DECL_EXTERNAL: it was my fault. At first I was trying to
>> reproduce the creation of a function_decl as in
>> create_omp_child_function, in omp-low.c, and I eventually forgot that
>> I set that attribute when playing with the possibilities.
>>
>>> Anyway, my best guess is that your function is removed by
>>> symtab_remove_unreachable_nodes in ipa.c.  (And now I also see that
>>> analyze_functions in cgraphunit.c is also doing its own unreachable
>>> node removal, but hopefully runs early enough this should not be your
>>> problem.)  If your function is static and is not called or referenced
>>> from anywhere else, gcc will try to get rid of it.
>>
>> I traced the execution path in gdb, and it happens that the function
>> is really being removed in cgraph_remove_unreachable_nodes in ipa.c
>> (that's basically the same function you pointed out, just another gcc
>> version). I don't know why that happens anyway, neither why it happens
>> intermittently when not debugging and never when debugging. My
>> function is not called or referenced yet, but neither is another
>> function in my source code which I put just in order to test this
>> issue, and yet it is not removed like my forged one. So it is in
>> respect to being or not static. A doubt: I tested setting TREE_STATIC
>> for my decl, but does that mean my function is static? The
>> documentation on the TREE_STATIC macro is: "In a FUNCTION_DECL,
>> nonzero if function has been defined".
>>
>>> Try setting DECL_PRESERVE_P of your decl (or calling
>>> cgraph_mark_force_output_node on its call graph node which is cleaner
>>> but should be equivalent for debugging, I suppose).  If that helps,
>>> this is most likely your problem.
>>

Re: [x86-64 psABI]: Extend x86-64 psABI to support AVX-512

2013-07-24 Thread Richard Henderson
On 07/24/2013 05:23 AM, Richard Biener wrote:
> "H.J. Lu"  wrote:
> 
>> Hi,
>>
>> Here is a patch to extend x86-64 psABI to support AVX-512:
> 
> Afaik avx 512 doubles the amount of xmm registers. Can we get them callee 
> saved please?

Having them callee saved pre-supposes that one knows the width of the register.

There's room in the instruction set for avx1024.  Does anyone believe that is
not going to appear in the next few years?


r~



Re: [x86-64 psABI]: Extend x86-64 psABI to support AVX-512

2013-07-24 Thread Ondřej Bílka
On Wed, Jul 24, 2013 at 08:25:14AM -1000, Richard Henderson wrote:
> On 07/24/2013 05:23 AM, Richard Biener wrote:
> > "H.J. Lu"  wrote:
> > 
> >> Hi,
> >>
> >> Here is a patch to extend x86-64 psABI to support AVX-512:
> > 
> > Afaik avx 512 doubles the amount of xmm registers. Can we get them callee 
> > saved please?
> 
> Having them callee saved pre-supposes that one knows the width of the 
> register.
> 
> There's room in the instruction set for avx1024.  Does anyone believe that is
> not going to appear in the next few years?
> 
It would be mistake for intel to focus on avx1024. You hit diminishing
returns and only few workloads would utilize loading 128 bytes at once.
Problem with vectorization is that it becomes memory bound so you will
not got much because performance is dominated by cache throughput.

You would get bigger speedup from more effective pipelining, more
fusion...


Re: [x86-64 psABI] RFC: Extend x86-64 PLT entry to support MPX

2013-07-24 Thread H.J. Lu
On Wed, Jul 24, 2013 at 9:45 AM, Ian Lance Taylor  wrote:
> On Tue, Jul 23, 2013 at 12:49 PM, H.J. Lu  wrote:
>>
>> http://software.intel.com/sites/default/files/319433-015.pdf
>>
>> introduces 4 bound registers, which will be used for parameter passing
>> in x86-64.  Bound registers are cleared by branch instructions.  Branch
>> instructions with BND prefix will keep bound register contents.
>
> I took a very quick look at the doc.  Why shouldn't we run the kernel
> with BNDPRESERVE = 1, to avoid this behaviour of clearing the bound
> registers on branch instructions?  That would let us avoid these
> issues.

This doesn't work in case of legacy callees which return pointers.
The bound registers will be incorrect since they are set in the
last MPX function.  MPX callers will get wrong bounds on
pointers returned by legacy callees

>
>> I prefer the note section solution.  Any suggestions, comments?
>
> I concur, but why not use the ELF attributes support rather than a new
> note section?
>

The issues are

1. ELF attributes target static linker.  There is no support in
shared library nor executables.  We may need it to make run-time
decision based on MPX feature to select legacy or MPX share
library.
2. ELF attribute lookup isn't very fast at run-time.


--
H.J.


Re: [x86-64 psABI] RFC: Extend x86-64 PLT entry to support MPX

2013-07-24 Thread Ian Lance Taylor
On Wed, Jul 24, 2013 at 11:53 AM, H.J. Lu  wrote:
> On Wed, Jul 24, 2013 at 9:45 AM, Ian Lance Taylor  wrote:
>> On Tue, Jul 23, 2013 at 12:49 PM, H.J. Lu  wrote:
>>>
>>> http://software.intel.com/sites/default/files/319433-015.pdf
>>>
>>> introduces 4 bound registers, which will be used for parameter passing
>>> in x86-64.  Bound registers are cleared by branch instructions.  Branch
>>> instructions with BND prefix will keep bound register contents.
>>
>> I took a very quick look at the doc.  Why shouldn't we run the kernel
>> with BNDPRESERVE = 1, to avoid this behaviour of clearing the bound
>> registers on branch instructions?  That would let us avoid these
>> issues.
>
> This doesn't work in case of legacy callees which return pointers.
> The bound registers will be incorrect since they are set in the
> last MPX function.  MPX callers will get wrong bounds on
> pointers returned by legacy callees

As far as I can see the compiler needs to know the pair of bound
registers associated with a pointer anyhow.  So if the compiler calls
some function and gets a pointer, it needs to know the bound registers
that go with that pointer.  Are you suggesting that not only are bound
registers passed as parameters to functions, they are also implicitly
returned by functions?

Ian


Re: [x86-64 psABI] RFC: Extend x86-64 PLT entry to support MPX

2013-07-24 Thread H.J. Lu
On Wed, Jul 24, 2013 at 11:59 AM, Ian Lance Taylor  wrote:
> On Wed, Jul 24, 2013 at 11:53 AM, H.J. Lu  wrote:
>> On Wed, Jul 24, 2013 at 9:45 AM, Ian Lance Taylor  wrote:
>>> On Tue, Jul 23, 2013 at 12:49 PM, H.J. Lu  wrote:

 http://software.intel.com/sites/default/files/319433-015.pdf

 introduces 4 bound registers, which will be used for parameter passing
 in x86-64.  Bound registers are cleared by branch instructions.  Branch
 instructions with BND prefix will keep bound register contents.
>>>
>>> I took a very quick look at the doc.  Why shouldn't we run the kernel
>>> with BNDPRESERVE = 1, to avoid this behaviour of clearing the bound
>>> registers on branch instructions?  That would let us avoid these
>>> issues.
>>
>> This doesn't work in case of legacy callees which return pointers.
>> The bound registers will be incorrect since they are set in the
>> last MPX function.  MPX callers will get wrong bounds on
>> pointers returned by legacy callees
>
> As far as I can see the compiler needs to know the pair of bound
> registers associated with a pointer anyhow.  So if the compiler calls
> some function and gets a pointer, it needs to know the bound registers
> that go with that pointer.  Are you suggesting that not only are bound
> registers passed as parameters to functions, they are also implicitly
> returned by functions?
>

Yes, when pointer is returned in register.


--
H.J.


Re: [x86-64 psABI]: Extend x86-64 psABI to support AVX-512

2013-07-24 Thread H.J. Lu
On Wed, Jul 24, 2013 at 10:55 AM, Peter Bergner  wrote:
> On Wed, 2013-07-24 at 10:42 -0700, H.J. Lu wrote:
>> Are there any other Linux targets with callee saved vector registers?
>
> Yes, on POWER.  From our ABI:
>
>   On processors with the VMX feature.
> v0-v1 Volatile scratch registers
> v2-v13 Volatile vector parameters registers
> v14-v19 Volatile scratch registers
> v20-v31 Non-volatile registers
>
> I'll note that the new VSX register state we recently added with power7
> were made volatile, but then we already had these non-volatile altivec
> regs to use.

How do you save/restore those vector registers for
exception? Unwinder in libgcc uses _Unwind_Word
to save and restore registers in DWARF unwind frame.
It doesn't support anything wider than _Unwind_Word,
which is usually smaller than vector register.

--
H.J.


GNU Tools Cauldron 2013 - Presentation videos

2013-07-24 Thread Diego Novillo

I have uploaded all the videos we recorded at the Cauldron to
the workshop page (http://gcc.gnu.org/wiki/cauldron2013).

The videos are also available at the YouTube playlist:
http://www.youtube.com/playlist?list=PLsgS8fWwKJZhrjVEN7tsQyj2nLb5z0n70

If you think your talk was recorded but you do not see the video,
please let me know and I'll fix it in the playlist.

If you presented a talk and do not see your slides in
http://gcc.gnu.org/wiki/cauldron2013, please fix the link
yourself or let me know and I'll add them to the table (if you
can fix the links yourself, you'll be doing me a big favour).


Diego.


Re: fatal error: gnu/stubs-32.h: No such file

2013-07-24 Thread David Starner
On Wed, Jul 24, 2013 at 8:50 AM, Andrew Haley  wrote:
> Not at all: we're just disagreeing about what a real system with
> a real workload looks like.

No, we aren't. We're disagreeing about whether it's acceptable to
enable a feature by default that breaks the compiler build half way
through with an obscure error message. Real systems need features that
aren't enabled by default sometimes.

> It's a stupid thing to say anyway, because
> who is to say their system is more real than mine or yours?

By that logic, you've already said that any system needing GNAT is
less real then others, because it's not enabled by default.

-- 
Kie ekzistas vivo, ekzistas espero.


Re: [x86-64 psABI] RFC: Extend x86-64 PLT entry to support MPX

2013-07-24 Thread Roland McGrath
I've read through the MPX spec once, but most of it is still not very
clear to me.  So please correct any misconceptions.  (HJ, if you answer
any or all of these questions in your usual style with just, "It's not a
problem," I will find you and I will kill you.  Explain!)

Will an MPX-using binary require an MPX-supporting dynamic linker to run
correctly?

* An old dynamic linker won't clobber %bndN directly, so that's not a
  problem.

* Does having the bounds registers set have any effect on regular/legacy
  code, or only when bndc[lun] instructions are used?  

  If it doesn't affect normal instructions, then I don't entirely
  understand why it would matter to clear %bnd* when entering or leaving
  legacy code.  Is it solely for the case of legacy code returning a
  pointer value, so that the new code would expect the new ABI wherein
  %bnd0 has been set to correspond to the pointer returned in %rax?

* What's the effect of entering the dynamic linker via "bnd jmp"
  (i.e. new MPX-using binary with new PLT, old dynamic linker)?  The old
  dynamic linker will leave %bndN et al exactly as they are, until its
  first unadorned branching instruction implicitly clears them.  So the
  only problem would be if the work _dl_runtime_{resolve,profile} does
  before its first branch/call were affected by the %bndN state.

If there are indeed any problems with this scenario, then you need a
plan to make new binaries require a new dynamic linker (and fail
gracefully in the absence of one, and have packaging systems grok the
dependency, etc.)

In a related vein, what's the effect of entering some legacy code via
"bnd jmp" (i.e. new binary using PLT call into legacy DSO)?  

* If the state of %bndN et al does not affect legacy code directly, then
  it's not a problem.  The legacy code will eventually use an unadorned
  branch instruction, and that will implicitly clear %bnd*.  (Even if
  it's a leaf function that's entirely branch-free, its return will
  count as such an unadorned branch instruction.)

* If that's not the case, then a PLT entry that jumps to legacy code
  will need to clear the %bndN state.  I see one straightforward
  approach, at the cost of a double-bounce (i.e. turning the normal
  double-bounce into a triple-bounce) when going from MPX code to legacy
  code.  Each PLT entry can be:

bnd jmp *foo@GOTPCREL(%rip)
pushq $N
bnd jmp .Lplt0
.balign 16
jmp *foo@GOTPCREL+8(%rip)
.balign 32

  and now each of those gets two (adjacent) GOT slots rather than just
  one.  When the dynamic linker resolves "foo" and sees that it's in a
  legacy DSO, it sets the foo GOT slot to point to .plt+(N*32 + 16) and
  the foo+1 GOT slot to point to the real target (resolution of "foo").
  After fixup, entering that PLT entry will do "bnd jmp" to the second
  half of the entry, which does (unadorned) "jmp" to the real target,
  implicitly clearing %bndN state.

Those are the background questions to help me understand better.
Now, to your specific questions.

I can't tell if you are proposing that a single object might contain
both 16-byte and 32-byte PLT slots next to each other in the same .plt
section.  That seems like a bad idea.  I can think of two things off
hand that expect PLT entries to be of uniform size, and there may well
be more.

* The foo@plt pseudo-symbols that e.g. objdump will display are based on
  the BFD backend knowing the size of PLT entries.  Arguably this ought
  to look at sh_entsize of .plt instead of using baked-in knowledge, but
  it doesn't.

* The linker-generated CFI for .plt is a single FDE for the whole
  section, using a DWARF expression covering all normal PLT entries
  together based on them having uniform size and contents.  (You could
  of course make the linker generate per-entry CFI, or partition the PLT
  into short and long entries and have the CFI treat the two partitions
  appropriately differently.  But that seems like a complication best
  avoided.)

Now, assuming we are talking about a uniform PLT in each object, there
is the question of whether to use a new PLT layout everywhere, or only
when linking an object with some input files that use MPX.  

* My initial reaction was to say that we should just change it
  unconditionally to keep things simple: use new linker, get new format,
  end of story.  Simplicity is good.

* But, doubling the size of PLT entries means more i-cache pressure.  If
  cache lines are 64 bytes, then today you fit four entries into a cache
  line.  Assuming PLT entries are more used than unused, this is a good
  thing.  Reducing that to two entries per cache line means twice as
  many i-cache misses if you hit a given PLT frequently (with even
  distribution of which entries you actually use--at any rate, it's
  "more" even if it's not "twice as many").  Perhaps this is enough cost
  in real-world situations to be worried about.  I really don't know.

* As I mentioned before, there are things floating ar

Re: [x86-64 psABI] RFC: Extend x86-64 PLT entry to support MPX

2013-07-24 Thread Ian Lance Taylor
On Wed, Jul 24, 2013 at 4:36 PM, Roland McGrath  wrote:
>
> Will an MPX-using binary require an MPX-supporting dynamic linker to run
> correctly?
>
> * An old dynamic linker won't clobber %bndN directly, so that's not a
>   problem.

These are my answers and likely incorrect.

It will clobber the registers indirectly, though, as soon as it
executes a branching instruction.  The effect will be that calls from
bnd-checked code to bnd-checked code through the dynamic linker will
not succeed.

I have not yet seen the changes this will require to the ABI, but I'm
making the natural assumptions: the first four pointer arguments to a
function will be associated with a pair of bound registers, and
similarly for a returned pointer.  I don't know what the proposal is
for struct parameters and return values.


> * Does having the bounds registers set have any effect on regular/legacy
>   code, or only when bndc[lun] instructions are used?

As far as I can tell, only when the bndXX instructions are used,
though I'd be happy to hear otherwise.


>   If it doesn't affect normal instructions, then I don't entirely
>   understand why it would matter to clear %bnd* when entering or leaving
>   legacy code.  Is it solely for the case of legacy code returning a
>   pointer value, so that the new code would expect the new ABI wherein
>   %bnd0 has been set to correspond to the pointer returned in %rax?

There is no problem with clearing the bnd registers when calling in or
out of legacy code.  The issue is avoiding clearing the pointers when
calling from bnd-enabled code to bnd-enabled code.


> * What's the effect of entering the dynamic linker via "bnd jmp"
>   (i.e. new MPX-using binary with new PLT, old dynamic linker)?  The old
>   dynamic linker will leave %bndN et al exactly as they are, until its
>   first unadorned branching instruction implicitly clears them.  So the
>   only problem would be if the work _dl_runtime_{resolve,profile} does
>   before its first branch/call were affected by the %bndN state.

"It's not a problem."

> In a related vein, what's the effect of entering some legacy code via
> "bnd jmp" (i.e. new binary using PLT call into legacy DSO)?
>
> * If the state of %bndN et al does not affect legacy code directly, then
>   it's not a problem.  The legacy code will eventually use an unadorned
>   branch instruction, and that will implicitly clear %bnd*.  (Even if
>   it's a leaf function that's entirely branch-free, its return will
>   count as such an unadorned branch instruction.)

Yes.

> * If that's not the case, 

It is the case.

> I can't tell if you are proposing that a single object might contain
> both 16-byte and 32-byte PLT slots next to each other in the same .plt
> section.  That seems like a bad idea.  I can think of two things off
> hand that expect PLT entries to be of uniform size, and there may well
> be more.
>
> * The foo@plt pseudo-symbols that e.g. objdump will display are based on
>   the BFD backend knowing the size of PLT entries.  Arguably this ought
>   to look at sh_entsize of .plt instead of using baked-in knowledge, but
>   it doesn't.

This seems fixable.  Of course, we could also keep the PLT the same
length by changing it.  The current PLT entries are

jmpq *GOT(sym)
pushq offset
jmpq plt0

The linker or dynamic linker initializes *GOT(sym) to point to the
second instruction in this sequence.  So we can keep the PLT at 16
bytes by simply changing it to jump somewhere else.

bnd jmpq *GOT(sym)
.skip 9

We have the linker or dynamic linker fill in *GOT(sym) to point to the
second PLT table.  When the dynamic linker is involved, we use another
DT tag to point to the second PLT.  The offsets are consistent: there
is one entry in each PLT table, so the dynamic linker can compute the
right value.  Then in the second PLT we have the sequence

pushq offset
bnd jmpq plt0

That gives the dynamic linker the offset that it needs to update
*GOT(sym) to point to the runtime symbol value.  So we get slightly
worse instruction cache handling the first time a function is called,
but after that we are the same as before.  And PLT entries are the
same size as always so everything is simpler.

The special DT tag will tell the dynamic linker to apply the special
processing.  No attribute is needed to change behaviour.  The issue
then is: a program linked in this way will not work with an old
dynamic linker, because the old dynamic linker will not initialize
GOT(sym) to the right value.  That is a problem for any scheme, so I
think that is OK.  But if that is a concern, we could actually handle
by generating two PLTs.  One conventional PLT, and another as I just
outlined.  The linker branches to the new PLT, and initializes
GOT(sym) to point to the old PLT.  The dynamic linker spots this
because it recognizes the new DT tags, and cunningly rewrites the GOT
to point to the new PLT.  Cost is an extra jump the first time a
function is called when using the old 

Re: [x86-64 psABI]: Extend x86-64 psABI to support AVX-512

2013-07-24 Thread Jakub Jelinek
On Wed, Jul 24, 2013 at 07:36:31PM +0200, Richard Biener wrote:
> >Make them callee saved means we need to change ld.so to
> >preserve them and we need to change unwind library to
> >support them.  It is certainly doable.
> 
> IMHO it was a mistake to not have any callee saved xmm register in the
> original abi - we should fix this at this opportunity.  Loops with
> function calls are not that uncommon.

I've raised that earlier already.  One issue with that beyond having to
teach unwinders about this (dynamic linker if you mean only for the lazy PLT
resolving is only a matter of whether the dynamic linker itself has been
built with a compiler that would clobber those registers anywhere) is that
as history shows, the vector registers keep growing over time.
So if we reserve now either 8 or all 16 zmm16 to zmm31 registers as call
saved, do we save them as 512 bit registers, or say 1024 bit already?
If just 512 bit, then when next time the vector registers grow in size (will
they?), would we have just low parts of the 1024 bits registers call saved
and upper half call clobbered (I guess that is the case for M$Win 64-bit ABI
now, just with 128 bit vs. more).

But yeah, it would be nice to have some call saved ones.

Jakub