Re: doloop insn generated incorrectly (wrong mode)
Hi!

On Wed, Oct 10, 2018 at 08:55:12PM -0400, Paul Koning wrote:

[ snip ]

> Note that this isn't permitted by the .md file -- the mode is wrong (QI
> not HI).

Other targets use an expander and check the mode explicitly in there.
See rs6000 or sh for example.

> It's not obvious to me how that relates to the wrong insn generated by
> the compiler, but clearly it can't be good to have something that won't
> match when it's time to generate the output code. It seems to me that
> the doloop convertor should be checking the doloop_end pattern to make
> sure the mode matches, and not apply it unless it does.
>
> Why is this happening, and how can I fix it (short of removing the
> doloop_end pattern)? I see a comment in loop-doloop.c about handling a
> FAIL of the pattern -- am I going to have to check that the wrong mode
> is being passed in and FAIL if so?

That is exactly what other targets do. Maybe you can improve doloop so
this isn't necessary anymore? :-)
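Roughly, the expander idiom looks like this. A sketch only: the mode
(HImode) is illustrative, and gen_doloop_end_insn stands in for the
target's actual matching define_insn, so this is not any particular
target's real pattern.

(define_expand "doloop_end"
  [(use (match_operand 0))   ; loop counter register
   (use (match_operand 1))]  ; label at the top of the loop
  ""
{
  /* The doloop pass offers a counter in whatever mode the loop uses;
     FAIL for any mode the hardware insn cannot handle, and the pass
     falls back to an ordinary loop for this case.  */
  if (GET_MODE (operands[0]) != HImode)
    FAIL;

  emit_jump_insn (gen_doloop_end_insn (operands[0], operands[1]));
  DONE;
})


Segher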
Re: TLSDESC clobber ABI stability/futureproofness?
On 11/10/18 04:53, Alexandre Oliva wrote:
> On Oct 10, 2018, Rich Felker wrote:
>> For aarch64 at least, according to discussions I had with Szabolcs
>> Nagy, there is an intent that any new extensions to the aarch64
>> register file be treated as clobbered by tlsdesc functions, rather
>> than preserved.
>
> That's unfortunate. I'm not sure I understand the reasoning behind this
> intent. Maybe we should discuss it further?

SVE registers overlap with the existing float registers, so float
register access clobbers them. New code would thus suddenly not be
compatible with the existing tlsdesc entry points in the libc. I think
extensions should not cause such an ABI break. We could mark binaries so
they fail to load on an old system instead of failing randomly at
runtime, but requiring a new libc for a new system is suboptimal (you
cannot deploy stable Linux distros then).

If we update the libc, then the tlsdesc entry has to save/restore all
SVE regs, which can be huge state (causing excessive stack usage); but
more importantly, the process suddenly becomes "sve enabled" even if it
otherwise does not use SVE at all (the Linux kernel keeps track of which
processes use SVE instructions; ones that don't can enter the kernel
more quickly, as the SVE state does not have to be saved).

I don't see a good solution for this, but in practice it's unlikely that
user code would need TLS access and SVE together much, so it seems
reasonable to just add the SVE registers to the tlsdesc call clobber
list, and to do the same for future extensions too (a tlsdesc call will
then be no worse than a normal indirect call).

(In principle it's possible for the tlsdesc entry to avoid using any
float regs, but in practice that requires hackery in the libc: marking
every affected translation unit with -mgeneral-regs-only or similar.)
Re: doloop insn generated incorrectly (wrong mode)
> On Oct 11, 2018, at 3:11 AM, Segher Boessenkool wrote:
>
> Hi!
>
> On Wed, Oct 10, 2018 at 08:55:12PM -0400, Paul Koning wrote:
>
> [ snip ]
>
>> ...
>> Why is this happening, and how can I fix it (short of removing the
>> doloop_end pattern)? I see a comment in loop-doloop.c about handling a
>> FAIL of the pattern -- am I going to have to check that the wrong mode
>> is being passed in and FAIL if so?
>
> That is exactly what other targets do. Maybe you can improve doloop so
> this isn't necessary anymore? :-)

Maybe. For now I'll take the expand-and-check approach, and propose a
doc patch to explain that this is needed, because it's an unusual
situation.

	paul
RE: TechNet Asia-Pacific - AFCEA 2018 Attendee list ?
Hello,

Hope you are doing well. I am not sure whether you got a chance to read
my previous mail (mentioned below). Please let me know if you are
interested in acquiring the complete list of attendee contacts.

Hope to hear from you soon.

Best Wishes,
Peter

From: Peter Chase [mailto:peter.ch...@conference-delegates.com]
Sent: 10 October 2018 16:52
To: 'gcc@gcc.gnu.org'
Subject: TechNet Asia-Pacific - AFCEA 2018 Attendee list ?

Hi,

Hope you are having a great day!

I am glad to know that you are exhibiting at TechNet Asia-Pacific -
AFCEA 2018. I wanted to let you know that we have the current attendee
list of 3,870 contacts for TechNet Asia-Pacific - AFCEA 2018.

We would provide the contact information including: company name,
URL/web address, contact name, job title, mailing address, phone number
and direct email address, etc.

Project details:
Total Contacts - 3,870
List Delivery - XLS/CSV
Package options - Basic & Premium

Let me know if you are interested and I shall get back to you with the
pricing and other information.

Best Regards,
Peter Chase

If you don't wish to receive our newsletters, reply back with "Opt Out"
in the subject line.
Re: TLSDESC clobber ABI stability/futureproofness?
On Thu, Oct 11, 2018 at 12:53:04AM -0300, Alexandre Oliva wrote:
> On Oct 10, 2018, Rich Felker wrote:
>
> > It's recently come up in musl libc development that the tlsdesc asm
> > functions, at least for some archs, are potentially not future-proof,
> > in that, for a given fixed version of the asm in the dynamic linker,
> > it seems possible for a future ISA level, and a compiler supporting
> > that ISA level, to produce, in the C functions called in the dynamic
> > fallback case, instructions which clobber registers that are normally
> > call-clobbered, but which are non-clobbered in the tlsdesc ABI. This
> > does not risk breakage when an existing valid build of libc/ldso is
> > used on new hardware and new applications that use new registers, but
> > it does risk breakage if an existing source version of libc/ldso is
> > built with a compiler supporting new extensions, which is difficult
> > to preclude and not something we want to try to preclude.
>
> I understand the concern. I considered it back when I designed TLSDesc,
> and my reasoning was that if the implementation of the fallback dynamic
> TLS descriptor allocator could possibly use some register, the library
> implementation should know about it as well, and work to preserve it. I
> realize this might not be the case for an old library built by a new
> compiler for a newer version of the library's target. Other pieces of
> the library may fail as well, if registers unknown to it are available
> and used by the compiler (setjmp/longjmp, *context, dynamic PLT
> resolution come to mind), besides the usual difficulties building old
> code with newer tools, so I figured it wasn't worth sacrificing the
> performance of the normal TLSDesc case to make this aspect of register
> set extensions easier.
>
> There might be another particularly risky case, namely, that the memory
> allocator used by TLS descriptors be overridden by code that uses more
> registers than the library knows to preserve. Memory allocation within
> the dynamic loader, including lazy TLS Descriptor relocation resolution,
> is a context in which we should probably use internal, non-overridable
> memory allocators, if we don't already. This would reduce the present
> risky case to the one in the paragraph above.

This is indeed the big risk for glibc right now (with lazy,
non-fail-safe allocation of dynamic TLS), but not for musl, where the
only code that runs in the fallback path uses signal masking, atomics,
and memcpy/memset to acquire and install the already-allocated TLS. We
have precise control of what code is executing at the source level; but
without strong assumptions about the compiler and the ability to
restrict what parts of the ISA it uses (assumptions we don't want to
make), we have only partial control at the binary level.

I had considered just pregenerating (via gcc -S at the minimum ISA
level) and committing flattened asm for this code path, but that does
not work well for ARM, where we have to support runtime-switchable
implementations of the atomic primitives, and the implementation is
provided by the kernel in some cases, thereby outside our control.

> > For aarch64 at least, according to discussions I had with Szabolcs
> > Nagy, there is an intent that any new extensions to the aarch64
> > register file be treated as clobbered by tlsdesc functions, rather
> > than preserved.
>
> That's unfortunate. I'm not sure I understand the reasoning behind this
> intent. Maybe we should discuss it further?

Aside from what Szabolcs Nagy already wrote in his reply, the
explanation I'd heard for why this is not a significant burden/cost is
that it's unlikely for vector-heavy code to be using TLS where the TLS
address load can't be hoisted out of the blocks where the call-clobbered
vector regs are in use. Generally, if such hoisting is performed, the
main/only advantage of avoiding clobbers is for registers which may
contain incoming arguments.
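To make that concrete, here's a contrived sketch (hypothetical code, not
from either libc), assuming dynamic TLS via TLSDESC: the thread-local's
address is computed once, before any vector state is live, so a
descriptor call that clobbers vector registers costs nothing inside the
hot loop.

#include <stddef.h>

static __thread float tls_sum;          /* illustrative thread-local */

float accumulate(const float *v, size_t n)
{
    /* The TLS address load (and hence the tlsdesc call) can be hoisted
       here, before the vectorizable loop, while no vector registers
       hold live values.  */
    float *p = &tls_sum;
    float s = 0.0f;
    for (size_t i = 0; i < n; i++)      /* vector-heavy region */
        s += v[i];
    *p = s;
    return s;
}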
> > In the x86 spec, the closest I can find are the phrasing:
> >
> > "being able to assume no registers are clobbered by the call"
> >
> > and the comment in the pseudo-C:
> >
> > /* Preserve any call-clobbered registers not preserved because of
> >    the above across the call below. */
> >
> > Source: https://www.fsfla.org/~lxoliva/writeups/TLS/RFC-TLSDESC-x86.txt
> >
> > What is the policy for i386 and x86_64?
>
> I don't know that my proposal ever became authoritative policy, but even
> if it is, I guess I have to agree it is underspecified and the reasoning
> above could be added.
>
> > Are normally call-clobbered registers from new register file
> > extensions intended to be preserved by the tlsdesc functions, or
> > clobberable by them?
>
> My thinking has always been that they should be preserved, which doesn't
> necessarily mean they have to be saved and restored. Only if the
> implementation of tlsdesc could possibly modify them should it arrange
> for their entry-point values to be restored before returning.
avoidance of lea after 5 operations?
Hey GCConauts,

I've noticed some strange behavior in gcc > 5.

unsigned int hmmm5(unsigned int a, unsigned int b, unsigned int c)
{
	return a + (b << c << c << c << c << c);
}

This function compiles how you'd expect:

	mov	ecx, edx
	sal	esi, cl
	sal	esi, cl
	sal	esi, cl
	sal	esi, cl
	sal	esi, cl
	lea	eax, [rsi+rdi]
	ret

However, when performing more than 5 shifts, gcc switches to using add
instead of lea, which then generates an extra mov instruction:

unsigned int hmmm6(unsigned int a, unsigned int b, unsigned int c)
{
	return a + (b << c << c << c << c << c << c);
}

Producing:

	mov	ecx, edx
	mov	eax, esi
	sal	eax, cl
	sal	eax, cl
	sal	eax, cl
	sal	eax, cl
	sal	eax, cl
	sal	eax, cl
	add	eax, edi
	ret

Thinking this might be a side effect of avoid_lea_for_addr, I tried
setting '-mtune-ctrl=^avoid_lea_for_addr', but to no avail. I also
couldn't find anything in various documentation and instruction tables
regarding the latencies or scheduling of these instructions that would
imply switching to add after 5 operations is somehow better.

I realize this is probably a fairly trivial matter, but I am very
curious if somebody knows which heuristic gcc is applying here, and why
exactly. It's not something done by any other compiler I could find, and
it only started happening with gcc 6.

Regards,
Jason
Re: avoidance of lea after 5 operations?
On Thu, 11 Oct 2018, Jason A. Donenfeld wrote:
> I realize this is probably a fairly trivial matter, but I am very
> curious if somebody knows which heuristic gcc is applying here, and
> why exactly. It's not something done by any other compiler I could
> find, and it only started happening with gcc 6.

It's a change in register allocation: gcc selects eax instead of esi for
the shifts. Doesn't appear to be obviously intentional; could be a bug
or bad luck.
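For concreteness, had the allocator kept the shifts in esi as in the
five-shift case, the tail could have stayed lea-based. Hand-written,
hypothetical output (not produced by any actual gcc version):

	mov	ecx, edx
	sal	esi, cl
	sal	esi, cl
	sal	esi, cl
	sal	esi, cl
	sal	esi, cl
	sal	esi, cl
	lea	eax, [rsi+rdi]
	ret

Alexander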
gcc-7-20181011 is now available
Snapshot gcc-7-20181011 is now available on
  ftp://gcc.gnu.org/pub/gcc/snapshots/7-20181011/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 7 SVN branch
with the following options: svn://gcc.gnu.org/svn/gcc/branches/gcc-7-branch revision 265067

You'll find:

 gcc-7-20181011.tar.xz               Complete GCC

  SHA256=920e44d84857076dfce86c093210df011b639b53ba51b53f7ab223037f7167f5
  SHA1=973c7c0cfd2a7a0e478a7894ccbd99806d444553

Diffs from 7-20181004 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-7
link is updated and a message is sent to the gcc list. Please do not use
a snapshot before it has been announced that way.
Re: TLSDESC clobber ABI stability/futureproofness?
On Oct 11, 2018, Rich Felker wrote:

> This is indeed the big risk for glibc right now (with lazy,
> non-fail-safe allocation of dynamic TLS)

Yeah, dynamic TLS was a can of worms in that regard even before lazy TLS
relocations.

> that it's unlikely for vector-heavy code to be using TLS where the TLS
> address load can't be hoisted out of the blocks where the
> call-clobbered vector regs are in use. Generally, if such hoisting is
> performed, the main/only advantage of avoiding clobbers is for
> registers which may contain incoming arguments.

I see. Well, the more registers are preserved, the better for the ideal
fast path, but even if some are not, you're still better off than
explicitly calling tls_get_addr...

> unless there is some future-proof approach to
> save-all/restore-all that works on all archs with TLSDESC

Please don't single out TLSDESC as if the problem affected it alone.
Lazy relocation with traditional PLT entries for functions is also
supposed to save and restore all registers, and the same issues arise,
except they're a lot more common. The only difference is that failures
to preserve registers are less visible, because most of the time you're
resolving them to functions that abide by the normal ABI; but once
specialized calling conventions kick in, the very same issues arise.
TLS descriptors are just one case of such specialized calling
conventions. Indeed, one of the reasons that made me decide this
arrangement was acceptable was precisely that the problem already
existed with preexisting lazy PLT resolution.

-- 
Alexandre Oliva, freedom fighter   https://FSFLA.org/blogs/lxo
Be the change, be Free!            FSF Latin America board member
GNU Toolchain Engineer             Free Software Evangelist
Hay que enGNUrecerse, pero sin perder la terGNUra jamás - GNUChe
Re: TLSDESC clobber ABI stability/futureproofness?
On Thu, Oct 11, 2018 at 08:18:37PM -0300, Alexandre Oliva wrote:
> On Oct 11, 2018, Rich Felker wrote:
>
> > This is indeed the big risk for glibc right now (with lazy,
> > non-fail-safe allocation of dynamic TLS)
>
> Yeah, dynamic TLS was a can of worms in that regard even before lazy TLS
> relocations.
>
> > that it's unlikely for vector-heavy code to be using TLS where the TLS
> > address load can't be hoisted out of the blocks where the
> > call-clobbered vector regs are in use. Generally, if such hoisting is
> > performed, the main/only advantage of avoiding clobbers is for
> > registers which may contain incoming arguments.
>
> I see. Well, the more registers are preserved, the better for the ideal
> fast path, but even if some are not, you're still better off than
> explicitly calling tls_get_addr...

Also, it seems gcc is failing to do this hoisting right on x86_64 right
now, regardless of which TLS model is used. A bug report should be
coming soon.

> > unless there is some future-proof approach to
> > save-all/restore-all that works on all archs with TLSDESC
>
> Please don't single out TLSDESC as if the problem affected it alone.
> Lazy relocation with traditional PLT entries for functions is also
> supposed to save and restore all registers, and the same issues arise,

Right, but lazy relocation is a "feature" you can easily just omit, and
we do. However, the only way to omit this path from TLSDESC is
installing the new TLS to all live threads at dlopen time.

That's actually not a bad idea: it drops the compare/branch from the
dynamic tlsdesc code path, and likewise in __tls_get_addr, making both
forms of dynamic TLS (possibly considerably) faster. I'm just concerned
about whether it can be done without making thread creation/exit
significantly slower. (A rough sketch of what I mean appears at the end
of this message.)

> except they're a lot more common. The only difference is that failures
> to preserve registers are less visible, because most of the time you're
> resolving them to functions that abide by the normal ABI; but once
> specialized calling conventions kick in, the very same issues arise.
> TLS descriptors are just one case of such specialized calling
> conventions. Indeed, one of the reasons that made me decide this
> arrangement was acceptable was precisely that the problem already
> existed with preexisting lazy PLT resolution.

I see. From that perspective, it's less of a constraint than constraints
that already existed elsewhere. Unfortunately, from our perspective in
musl those greater constraints don't exist, and the one imposed by
TLSDESC is the unique one of its kind.
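As promised above, a rough sketch of the dlopen-time installation idea.
This is hypothetical code, not actual musl internals: the thread list,
dtv layout, and locking are illustrative, alignment handling is glossed
over, and each thread's dtv is assumed to have been enlarged already.

#include <stdlib.h>
#include <string.h>

struct tls_module { size_t id, size; const void *image; };
struct thread { struct thread *next; void **dtv; };

/* Called from dlopen, with thread creation/exit blocked, to install the
   new module's TLS into every live thread.  Failure here fails dlopen,
   so neither the tlsdesc fast path nor __tls_get_addr ever needs a
   fallback branch.  */
static int install_tls_everywhere(struct thread *threads,
                                  const struct tls_module *m)
{
    for (struct thread *t = threads; t; t = t->next) {
        void *p = calloc(1, m->size);
        if (!p)
            return -1;                  /* report failure from dlopen */
        memcpy(p, m->image, m->size);   /* simplified: image covers size */
        t->dtv[m->id] = p;              /* dtv assumed pre-enlarged */
    }
    return 0;
}

Rich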