Re: doloop insn generated incorrectly (wrong mode)

2018-10-11 Thread Segher Boessenkool
Hi!

On Wed, Oct 10, 2018 at 08:55:12PM -0400, Paul Koning wrote:

[ snip ]

> Note that this isn't permitted by the .md file -- the mode is wrong (QI not 
> HI).

Other targets use an expander and check the mode explicitly in there.  See
rs6000 or sh for example.

> It's not obvious to me how that relates to the wrong insn generated by the 
> compiler, but clearly it can't be good to have something that won't match 
> when it's time to generate the output code.  It seems to me that the doloop 
> converter should be checking the doloop_end pattern to make sure the mode 
> matches, and not apply it unless it does.  
> 
> Why is this happening, and how can I fix it (short of removing the doloop_end 
> pattern)?  I see a comment in loop-doloop.c about handling a FAIL of the 
> pattern -- am I going to have to check that the wrong mode is being passed in 
> and FAIL if so?

That is exactly what other targets do.  Maybe you can improve doloop so
this isn't necessary anymore?  :-)


Segher


Re: TLSDESC clobber ABI stability/futureproofness?

2018-10-11 Thread Szabolcs Nagy
On 11/10/18 04:53, Alexandre Oliva wrote:
> On Oct 10, 2018, Rich Felker  wrote:
>> For aarch64 at least, according to discussions I had with Szabolcs
>> Nagy, there is an intent that any new extensions to the aarch64
>> register file be treated as clobbered by tlsdesc functions, rather
>> than preserved.
> 
> That's unfortunate.  I'm not sure I understand the reasoning behind this
> intent.  Maybe we should discuss it further?
> 

sve registers overlap with the existing float registers,
so float register access in the tlsdesc entry clobbers the
sve state it does not know about.

so new (sve-using) code is suddenly not compatible with
existing tlsdesc entry points in the libc.

i think extensions should not cause such an abi break.
we could mark binaries so they fail to load on an old
system instead of failing randomly at runtime, but
requiring new libc for a new system is suboptimal
(you cannot deploy stable linux distros then).

if we update the libc then the tlsdesc entry has to
save/restore all sve regs, which can be a huge amount of
state (causing excessive stack usage), but more importantly
the process suddenly becomes "sve enabled" even if it
otherwise does not use sve at all (the linux kernel keeps
track of which processes use sve instructions; ones that
don't can enter the kernel more quickly, as the sve state
does not have to be saved).

i don't see a good solution for this, but in practice
it's unlikely that user code would need tls access and
sve together much, so it seems reasonable to just add
sve registers to tlsdesc call clobber list and do the
same for future extensions too (tlsdesc call will not
be worse than a normal indirect call).

(in principle it's possible for the tlsdesc entry to avoid
using any float regs, but in practice that requires
hackery in the libc: marking every affected translation
unit with -mgeneral-regs-only or similar)


Re: doloop insn generated incorrectly (wrong mode)

2018-10-11 Thread Paul Koning



> On Oct 11, 2018, at 3:11 AM, Segher Boessenkool  
> wrote:
> 
> Hi!
> 
> On Wed, Oct 10, 2018 at 08:55:12PM -0400, Paul Koning wrote:
> 
> [ snip ]
> 
>> ...
>> Why is this happening, and how can I fix it (short of removing the 
>> doloop_end pattern)?  I see a comment in loop-doloop.c about handling a FAIL 
>> of the pattern -- am I going to have to check that the wrong mode is being 
>> passed in and FAIL if so?
> 
> That is exactly what other targets do.  Maybe you can improve doloop so
> this isn't necessary anymore?  :-)

Maybe.  For now I'll take the expand and check approach.  And propose a doc 
patch to explain that this is needed, because it's an unusual situation. 
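
For the record, the shape I have in mind is roughly the sketch below
(assuming, purely for illustration, a port whose decrement-and-branch
insn only handles HImode counters; "doloop_end_insn" is a placeholder
name, not an existing pattern):

(define_expand "doloop_end"
  [(use (match_operand 0))   ; loop counter register
   (use (match_operand 1))]  ; label at the top of the loop
  ""
{
  /* The doloop pass may offer a counter in any integer mode;
     FAIL so it falls back to an ordinary loop when the mode is
     not the one the hardware insn handles.  */
  if (GET_MODE (operands[0]) != HImode)
    FAIL;
  emit_jump_insn (gen_doloop_end_insn (operands[0], operands[1]));
  DONE;
})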

paul

RE: TechNet Asia-Pacific - AFCEA 2018 Attendee list ?

2018-10-11 Thread Peter Chase
Hello,

Hope you are doing good. I am not sure whether you got a chance to read my
previous mail (mentioned below).

Please let me know if you are interested in acquiring the complete list of
attendee contacts.

Hope to hear from you soon.

Best Wishes,

Peter

From: Peter Chase [mailto:peter.ch...@conference-delegates.com]
Sent: 10 October 2018 16:52
To: 'gcc@gcc.gnu.org'
Subject: TechNet Asia-Pacific - AFCEA 2018 Attendee list ?

Hi,

Hope you are having a great day!

I am glad to know that you are exhibiting at TechNet Asia-Pacific - AFCEA
2018. I wanted to let you know that we have the current attendee list of
3,870 contacts for TechNet Asia-Pacific - AFCEA 2018.

We would provide the contact information including: Company name, URL/web
address, Contacts name, Job titles, Mailing address, phone number and direct
email address, etc.

Project details:

Total Contacts - 3,870
List Delivery - XLS/CSV
Package options - Basic & Premium

Let me know if you are interested and I shall get back to you with the
pricing and other information.

Best Regards,

Peter Chase

If you don't wish to receive our newsletters, reply back with "Opt Out" in
the subject line.



Re: TLSDESC clobber ABI stability/futureproofness?

2018-10-11 Thread Rich Felker
On Thu, Oct 11, 2018 at 12:53:04AM -0300, Alexandre Oliva wrote:
> On Oct 10, 2018, Rich Felker  wrote:
> 
> > It's recently come up in musl libc development that the tlsdesc asm
> > functions, at least for some archs, are potentially not future-proof,
> > in that, for a given fixed version of the asm in the dynamic linker,
> > it seems possible for a future ISA level and compiler supporting that
> > ISA level to produce, in the C functions called in the dynamic
> > fallback case, instructions which clobber registers that are normally
> > call-clobbered but must be preserved under the tlsdesc ABI. This
> > does not risk breakage when an existing valid build of libc/ldso is
> > used on new hardware and new applications that provide new registers,
> > but it does risk breakage if an existing source version of libc/ldso
> > is built with a compiler supporting new extensions, which is difficult
> > to preclude and not something we want to try to preclude.
> 
> I understand the concern.  I considered it back when I designed TLSDesc,
> and my reasoning was that if the implementation of the fallback dynamic
> TLS descriptor allocator could possibly use some register, the library
> implementation should know about it as well, and work to preserve it.  I
> realize this might not be the case for an old library built by a new
> compiler for a newer version of the library's target.  Other pieces of
> the library may fail as well, if registers unknown to it are available
> and used by the compiler (setjmp/longjmp, *context, dynamic PLT
> resolution come to mind), besides the usual difficulties building old
> code with newer tools, so I figured it wasn't worth sacrificing the
> performance of the normal TLSDesc case to make this aspect of register
> set extensions easier.
> 
> There might be another particularly risky case, namely, that the memory
> allocator used by TLS descriptors be overridden by code that uses more
> registers than the library knows to preserve.  Memory allocation within
> the dynamic loader, including lazy TLS Descriptor relocation resolution,
> is a context in which we should probably use internal, non-overridable
> memory allocators, if we don't already.  This would reduce the present
> risky case to the one in the paragraph above.

This is indeed the big risk for glibc right now (with lazy,
non-fail-safe allocation of dynamic TLS), but not for musl, where the
only code that runs in the fallback path uses signal masking, atomics,
and memcpy/memset to acquire and install the already-allocated TLS. We
have precise control of what code is executing at the source level,
but only partial control at the binary level, short of making strong
assumptions about the compiler and restricting what parts of the ISA
it may use, which we don't want to do.

I had considered just pregenerating (via gcc -S at minimum ISA level)
and committing flattened asm for this code path, but that does not
work well for ARM where we have to support runtime-switchable
implementations of the atomic primitives, and the implementation is
provided by the kernel in some cases, and is therefore outside our control.
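
(Concretely, the idea would have been something along these lines, run
once against the baseline ISA and the resulting .s committed; the file
names and the -march placeholder here are purely illustrative:

gcc -S -O2 -march=<baseline> -o tlsdesc_fallback.s tlsdesc_fallback.c

so that a later compiler could not introduce instructions from newer
extensions into that path.)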

> > For aarch64 at least, according to discussions I had with Szabolcs
> > Nagy, there is an intent that any new extensions to the aarch64
> > register file be treated as clobbered by tlsdesc functions, rather
> > than preserved.
> 
> That's unfortunate.  I'm not sure I understand the reasoning behind this
> intent.  Maybe we should discuss it further?

Aside from what Szabolcs Nagy already wrote in his reply, the
explanation I'd heard for why it's not a significant burden/cost is
that it's unlikely for vector-heavy code to be using TLS where the TLS
address load can't be hoisted out of the blocks where the
call-clobbered vector regs are in use. Generally, if such hoisting is
performed, the main/only advantage of avoiding clobbers is for
registers which may contain incoming arguments.
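
As a hedged illustration (hypothetical code, not taken from any real
workload): in something like the following, the compiler can compute
the TLS address once, before the loop, so the tlsdesc call happens
while no call-clobbered vector registers hold live values:

__thread float tls_sum;

void accumulate(const float *a, const float *b, float *out, int n)
{
    float *p = &tls_sum;      /* TLS address load, hoistable above the loop */
    for (int i = 0; i < n; i++) {
        out[i] = a[i] * b[i]; /* vectorizable work using SIMD registers */
        *p += out[i];
    }
}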

> > In the x86 spec, the closest I can find are the phrasing:
> 
> > "being able to assume no registers are clobbered by the call"
> 
> > and the comment in the pseudo-C:
> 
> > /* Preserve any call-clobbered registers not preserved because of
> >the above across the call below.  */
> 
> > Source: https://www.fsfla.org/~lxoliva/writeups/TLS/RFC-TLSDESC-x86.txt
> 
> > What is the policy for i386 and x86_64?
> 
> I don't know that my proposal ever became authoritative policy, but even
> if it is, I guess I have to agree it is underspecified and the reasoning
> above could be added.
> 
> > Are normally call-clobbered registers from new register file
> > extensions intended to be preserved by the tlsdesc functions, or
> > clobberable by them?
> 
> My thinking has always been that they should be preserved, which doesn't
> necessarily mean they have to be saved and restored.  Only if the
> implementation of tlsdesc could possibly modify them should it arrange
> for their entry-point values to be restored before returning.

avoidance of lea after 5 operations?

2018-10-11 Thread Jason A. Donenfeld
Hey GCConauts,

I've noticed some strange behavior in gcc > 5.

unsigned int hmmm5(unsigned int a, unsigned int b, unsigned int c)
{
    return a + (b << c << c << c << c << c);
}

This function compiles how you'd expect:

mov ecx, edx
sal esi, cl
sal esi, cl
sal esi, cl
sal esi, cl
sal esi, cl
lea eax, [rsi+rdi]
ret

However, when performing more than 5 shifts, gcc switches to using add
instead of lea, which then generates an extra mov instruction:

unsigned int hmmm6(unsigned int a, unsigned int b, unsigned int c)
{
    return a + (b << c << c << c << c << c << c);
}

Producing:

mov ecx, edx
mov eax, esi
sal eax, cl
sal eax, cl
sal eax, cl
sal eax, cl
sal eax, cl
sal eax, cl
add eax, edi
ret
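
For comparison, the lea-based form I would have expected (hand-written
here, so purely illustrative) avoids the extra mov:

mov ecx, edx
sal esi, cl
sal esi, cl
sal esi, cl
sal esi, cl
sal esi, cl
sal esi, cl
lea eax, [rsi+rdi]
ret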

Thinking this might be a side effect of avoid_lea_for_addr, I tried
setting '-mtune-ctrl=^avoid_lea_for_addr', but to no avail.

I also couldn't find anything in various documentation and instruction
tables regarding the latencies or scheduling of these instructions that
would imply switching to add after 5 operations is somehow better.

I realize this is probably a fairly trivial matter, but I am very
curious if somebody knows which heuristic gcc is applying here, and
why exactly. It's not something done by any other compiler I could
find, and it only started happening with gcc 6.

Regards,
Jason


Re: avoidance of lea after 5 operations?

2018-10-11 Thread Alexander Monakov
On Thu, 11 Oct 2018, Jason A. Donenfeld wrote:
> 
> I realize this is probably a fairly trivial matter, but I am very
> curious if somebody knows which heuristic gcc is applying here, and
> why exactly. It's not something done by any other compiler I could
> find, and it only started happening with gcc 6.

It's a change in register allocation: gcc selects eax instead of esi
for the shifts. It doesn't appear to be obviously intentional; it could
be a bug or bad luck.

Alexander


gcc-7-20181011 is now available

2018-10-11 Thread gccadmin
Snapshot gcc-7-20181011 is now available on
  ftp://gcc.gnu.org/pub/gcc/snapshots/7-20181011/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 7 SVN branch
with the following options: svn://gcc.gnu.org/svn/gcc/branches/gcc-7-branch 
revision 265067

You'll find:

 gcc-7-20181011.tar.xz    Complete GCC

  SHA256=920e44d84857076dfce86c093210df011b639b53ba51b53f7ab223037f7167f5
  SHA1=973c7c0cfd2a7a0e478a7894ccbd99806d444553

Diffs from 7-20181004 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-7
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.


Re: TLSDESC clobber ABI stability/futureproofness?

2018-10-11 Thread Alexandre Oliva
On Oct 11, 2018, Rich Felker  wrote:

> This is indeed the big risk for glibc right now (with lazy,
> non-fail-safe allocation of dynamic TLS)

Yeah, dynamic TLS was a can of worms in that regard even before lazy TLS
relocations.

> that it's unlikely for vector-heavy code to be using TLS where the TLS
> address load can't be hoisted out of the blocks where the
> call-clobbered vector regs are in use. Generally, if such hoisting is
> performed, the main/only advantage of avoiding clobbers is for
> registers which may contain incoming arguments.

I see.  Well, the more registers are preserved, the better for the ideal
fast path, but even if some are not, you're still better off than
explicitly calling tls_get_addr...

> unless there is some future-proof approach to
> save-all/restore-all that works on all archs with TLSDESC

Please don't single-out TLSDESC as if the problem affected it alone.
Lazy relocation with traditional PLT entries for functions is also
supposed to save and restore all registers, and the same issues arise,
except they're a lot more common.  The only difference is that failures
to preserve registers are less visible, because most of the time you're
resolving them to functions that abide by the normal ABI, but once
specialized calling conventions kick in, the very same issues arise.
TLS descriptors are just one case of such specialized calling
conventions.  Indeed, one of the reasons that made me decide this
arrangement was acceptable was precisely because the problem already
existed with preexisting lazy PLT resolution.

-- 
Alexandre Oliva, freedom fighter   https://FSFLA.org/blogs/lxo
Be the change, be Free! FSF Latin America board member
GNU Toolchain Engineer        Free Software Evangelist
Hay que enGNUrecerse, pero sin perder la terGNUra jamás - GNUChe


Re: TLSDESC clobber ABI stability/futureproofness?

2018-10-11 Thread Rich Felker
On Thu, Oct 11, 2018 at 08:18:37PM -0300, Alexandre Oliva wrote:
> On Oct 11, 2018, Rich Felker  wrote:
> 
> > This is indeed the big risk for glibc right now (with lazy,
> > non-fail-safe allocation of dynamic TLS)
> 
> Yeah, dynamic TLS was a can of worms in that regard even before lazy TLS
> relocations.
> 
> > that it's unlikely for vector-heavy code to be using TLS where the TLS
> > address load can't be hoisted out of the blocks where the
> > call-clobbered vector regs are in use. Generally, if such hoisting is
> > performed, the main/only advantage of avoiding clobbers is for
> > registers which may contain incoming arguments.
> 
> I see.  Well, the more registers are preserved, the better for the ideal
> fast path, but even if some are not, you're still better off than
> explicitly calling tls_get_addr...

Also, it seems gcc is failing to do this hoisting right on x86_64
right now, regardless of which TLS model is used. Bug report should be
coming soon.

> > unless there is some future-proof approach to
> > save-all/restore-all that works on all archs with TLSDESC
> 
> Please don't single-out TLSDESC as if the problem affected it alone.
> Lazy relocation with traditional PLT entries for functions is also
> supposed to save and restore all registers, and the same issues arise,

Right, but lazy relocations are a "feature" you can easily just omit,
and we do. However, the only way to omit this path from TLSDESC is
installing the new TLS in all live threads at dlopen time. That's
actually not a bad idea -- it drops the compare/branch from the
dynamic tlsdesc code path, and likewise in __tls_get_addr, making both
forms of dynamic TLS (possibly considerably) faster. I'm just
concerned about whether it can be done without making thread
creation/exit significantly slower.

> except they're a lot more common.  The only difference is that failures
> to preserve registers are less visible, because most of the time you're
> resolving them to functions that abide by the normal ABI, but once
> specialized calling conventions kick in, the very same issues arise.
> TLS descriptors are just one case of such specialized calling
> conventions.  Indeed, one of the reasons that made me decide this
> arrangement was acceptable was precisely because the problem already
> existed with preexisting lazy PLT resolution.

I see. From that perspective, it's less of a constraint than ones
that already existed elsewhere. Unfortunately, from our perspective
in musl those greater constraints don't exist, and the one imposed by
TLSDESC is the only one of its kind.

Rich