[RFC] memcpy/strcpy on uninitialized memory optimization and custom entry point
Hi,

This is another comment on the builtin/header discussion. For alignment and size bounds there is no reason to use a builtin, since gcc should optimize a header version that tests if (__builtin_constant_p(n < 42) && n < 42) just as well. There is, however, information that a library can get only with gcc's help: while speculative loads are easy, a library cannot do speculative writes.

One performance issue is that copying a single byte with a library routine is expensive, mainly because that case sits at the tail of unpredicted branches, as the bigger size ranges are more probable. An extreme case is the following benchmark, where copying one byte with the avx2 memcpy is slower than copying 700 bytes: the benchmark made the loop likely and well predicted, while a one-byte copy is quite unlikely.

http://kam.mff.cuni.cz/~ondra/benchmark_string/haswell/memcpy_profile/results_rand/result.html

Now the proposal. In reality, most of the time rounding the memcpy size up to a multiple of 8 would improve performance and would not change the semantics of the program, as the extra bytes are not accessed by the application. The same holds for strcpy, which could just copy 8-byte blocks instead of spending time finding the correct size. However, that would be hard to do. This also applies to the vectorizer in general, which with freshly allocated memory could afford a simpler path that does a few extra writes.

So instead there could be a gcc optimization that detects this by checking that the destination is freshly allocated memory, so that writing beyond the copied range would not change the fact that those bytes are uninitialized. For example, here you could do only one 8-byte load/store instead of three.

    #include <string.h>
    char foo[8];
    char *fo()
    {
      char bar[8];
      memcpy(bar, foo, 7);
      return strdup(bar);
    }

When the exact size isn't known, gcc would call a function, say memcpy_j/strcpy_j, that is allowed to write extra bytes up to the next 8-byte boundary. I plan to use these to speed up strdup.

So, comments? Could it be generalized more? Or is it too much work without much benefit?
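To make the proposal concrete, here is a minimal sketch of what a memcpy_j-style entry point might look like. The name memcpy_j comes from the message above, but the body, the helper round_up8, and the contract are assumptions for illustration only: the caller (in the proposal, gcc) must guarantee that both buffers are valid up to the size rounded up to a multiple of 8 and that the extra destination bytes hold nothing the program depends on.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: round n up to the next multiple of 8. */
static size_t round_up8(size_t n)
{
    return (n + 7) & ~(size_t)7;
}

/* Hypothetical sketch of the proposed entry point, not an existing
   libc function.  Copies n bytes but is allowed to copy up to
   round_up8(n) bytes, so it never needs a byte-granular tail.
   Assumed contract: dst and src are both valid for round_up8(n)
   bytes and do not overlap. */
static void *memcpy_j(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    size_t padded = round_up8(n);

    assert(dst != NULL && src != NULL);

    for (size_t i = 0; i < padded; i += 8) {
        uint64_t word;
        /* Fixed-size 8-byte copies; compilers typically lower each
           of these to a single load or store. */
        memcpy(&word, s + i, sizeof word);
        memcpy(d + i, &word, sizeof word);
    }
    return dst;
}
```

With the example above, memcpy_j(bar, foo, 7) would run a single 8-byte iteration. The sketch only shows the relaxed contract; a real implementation would presumably use wider vector moves.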
Re: [RFC] Formation of vector function name
On Mon, 15 Jun 2015, Andrew Pinski wrote:

> > results in asm redirection for log to __log_finite and final vector
> > function name becomes _ZGVbN2v___log_finite.
> >
> > With point of view from C Library side, it reflects in addition of asm
> > redirections _ZGVbN2v___log_finite = _ZGVbN2v_log in the headers.
> >
> > Maybe the cleaner way is to base vector name on original name of
> > declaration, not asm declaration name (use DECL_NAME instead of
> > DECL_ASSEMBLER_NAME)?
>
> I don't think this would be useful really because if you have a
> function say logl where you have two options of long double, you want
> to support both you would name one logl and the other logl128 and then
> using DECL_ASSEMBLER_NAME to form the SIMD name would be useful to use
> the one with logl128 in it rather than logl.

The point is that the vector versions may not be in one-to-one correspondence with the scalar versions - you might have several different scalar versions depending on compiler options, all of which correspond to a single vector version.

-- 
Joseph S. Myers
jos...@codesourcery.com
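For readers unfamiliar with the names under discussion, the following sketch shows how the two candidate vector names arise. It assumes the x86 vector ABI layout _ZGV<isa><mask><vlen><params>_<name> and covers only the simple case quoted above; the function simd_clone_name is a hypothetical helper, not GCC's actual code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: build a vector function name from a base name.
   isa:    ISA letter, e.g. 'b' as in the quoted _ZGVbN2v_ names
   mask:   'N' for an unmasked variant
   vlen:   number of lanes
   params: one letter per parameter, e.g. "v" for one vector argument
   base:   the scalar name the vector name is derived from */
static int simd_clone_name(char *out, size_t outsz, char isa, char mask,
                           unsigned vlen, const char *params,
                           const char *base)
{
    assert(out != NULL && params != NULL && base != NULL);
    return snprintf(out, outsz, "_ZGV%c%c%u%s_%s",
                    isa, mask, vlen, params, base);
}
```

Passing "log" (the DECL_NAME) as base yields _ZGVbN2v_log, while passing "__log_finite" (the DECL_ASSEMBLER_NAME after the asm redirection) yields _ZGVbN2v___log_finite, which is the difference being debated.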
gcc-5-20150616 is now available
Snapshot gcc-5-20150616 is now available on
  ftp://gcc.gnu.org/pub/gcc/snapshots/5-20150616/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 5 SVN branch
with the following options: svn://gcc.gnu.org/svn/gcc/branches/gcc-5-branch revision 224539

You'll find:

 gcc-5-20150616.tar.bz2               Complete GCC

  MD5=1612373698646b35ac4a8ceb8fd9422d
  SHA1=4c651a8e51b78485e05506b8e7d700333e0c3174

Diffs from 5-20150609 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-5
link is updated and a message is sent to the gcc list. Please do not use
a snapshot before it has been announced that way.
Question about DRAP register and reserving hard registers
I have a question about the DRAP register (used for dynamic stack alignment) and about reserving/using hard registers in general. I am trying to understand where, if a DRAP register is allocated, GCC is told not to use it during general register allocation. There must be some code somewhere for this but I cannot find it.

I am trying to implement dynamic stack alignment on MIPS, and because there is so much code for the x86 dynamic stack alignment I am trying to incorporate bits of it as I understand what I need, instead of just turning it all on at once and getting completely lost.

Right now I am using register 16 on MIPS to access incoming arguments in a function that needs dynamic alignment, so it is my DRAP register, if my understanding of the x86 code and its use of a DRAP register is correct. I copy the stack pointer into reg 16 before I align the stack pointer (during expand_prologue).

So far the only way I have found to stop the register allocator from also using reg 16, and thus messing up its value, is to set fixed_regs[16]. But I don't see the x86 doing this for its DRAP register and I was wondering how it is handled there. I think setting fixed_regs[16] is why C++ tests with exception handling are not working for me: because the register is thought to be fixed, it is not getting saved and restored in code that uses throw and catch.

Steve Ellcey
sell...@imgtec.com