[RFC] memcpy/strcpy on uninitialized memory optimization and custom entry point

2015-06-16 Thread Ondřej Bílka
Hi,

This is another comment in the builtin/header discussion. For
alignment/size bounds there is no reason to use a builtin, as gcc should
also optimize the header version with
if (__builtin_constant_p(n<42) && n < 42)

However, there is information that you can get only with gcc's help: while
speculative loads are easy, a library cannot do speculative writes.
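
To make the header-side check concrete, here is a minimal sketch of how such a test could dispatch inside an inline wrapper. The names memcpy_hdr and copy_small are hypothetical (they are not glibc interfaces), and the bound 42 is simply the one used in the expression above.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical inline path for short copies; the name is illustrative. */
static inline void *copy_small(void *dst, const void *src, size_t n)
{
  unsigned char *d = dst;
  const unsigned char *s = src;
  for (size_t i = 0; i < n; i++)
    d[i] = s[i];
  return dst;
}

/* Hypothetical header-level wrapper, not a glibc interface. */
static inline void *memcpy_hdr(void *dst, const void *src, size_t n)
{
  /* Taken only when gcc can prove n < 42 at compile time; in every other
     case the test is false and the call goes to the library routine.  */
  if (__builtin_constant_p(n < 42) && n < 42)
    return copy_small(dst, src, n);
  return memcpy(dst, src, n);
}
```

Whether the short path is actually selected depends on what the compiler can prove after inlining, so the result is the same either way and only the code path differs.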

One performance issue is that copying a single byte with a library
routine is expensive, mainly because that case sits at the tail of
poorly predicted branches, since bigger sizes are more probable. An
extreme example is my following benchmark, where copying one byte with the
avx2 memcpy is slower than copying 700 bytes, because the benchmark made
the loop likely and well predicted while one byte is quite unlikely:
http://kam.mff.cuni.cz/~ondra/benchmark_string/haswell/memcpy_profile/results_rand/result.html

Now the proposal. In reality, most of the time increasing the memcpy size
to a multiple of 8 would improve performance and would not change the
semantics of the program, as these bytes are not accessed by the
application. The same holds for strcpy, which would just copy 8-byte
blocks instead of spending time finding the correct size. However, that
would be hard to do.

This also applies to the vectorizer in general, which with freshly
allocated memory could afford a simpler path that does a few extra writes.

So instead there could be a gcc optimization that detects this by checking
that the memory was just allocated, so that writing beyond the allocated
boundary wouldn't change the fact that it is uninitialized.

For example, here you could do only one 8-byte load/store instead of three.

#include <string.h>
char foo[8];
char *fo()
{
  char bar[8];
  memcpy(bar, foo, 7);
  return strdup(bar);
}
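
As an illustration only, this is roughly what the function would look like if the widening were done by hand rather than by the compiler. The name fo_wide is hypothetical; the sketch relies on foo being a full 8 bytes and bar being a fresh local, so copying the eighth byte cannot change anything the program could legitimately observe.

```c
#define _POSIX_C_SOURCE 200809L  /* for strdup */
#include <string.h>

char foo[8];

/* Hand-widened variant of fo(); the name fo_wide is hypothetical. */
char *fo_wide(void)
{
  char bar[8];
  /* bar is a fresh local and foo is 8 bytes long, so copying the eighth
     byte as well is safe and lets the copy be a single 8-byte move.  */
  memcpy(bar, foo, 8);
  return strdup(bar);
}
```

The proposal is for gcc to prove this safety condition itself, so the source could keep the 7-byte memcpy.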

When the exact size isn't known, it would call a function, say
memcpy_j/strcpy_j, that could write extra bytes up to the next 8-byte
boundary. I plan to use these to speed up strdup.
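
A rough sketch of what such entry points might look like follows. Only the names memcpy_j and strcpy_j come from this message; the bodies are my assumption, not an existing glibc implementation. Both assume the caller guarantees that source and destination are readable and writable up to the next multiple of 8 bytes, which is exactly the condition the compiler would have to prove.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed semantics: copies n bytes and may also copy the bytes up to the
   next multiple of 8. Both buffers must have that much room.  */
void *memcpy_j(void *dst, const void *src, size_t n)
{
  size_t padded = (n + 7) & ~(size_t)7;  /* round n up to a multiple of 8 */
  return memcpy(dst, src, padded);
}

/* Assumed semantics: copies whole 8-byte blocks up to and including the
   block that holds the terminating NUL, so bytes after the NUL within
   that block are written as well.  */
char *strcpy_j(char *dst, const char *src)
{
  char *d = dst;
  for (;;) {
    uint64_t block;
    memcpy(&block, src, 8);     /* 8-byte load, safe for any alignment */
    memcpy(d, &block, 8);       /* store the whole block */
    if (memchr(&block, 0, 8))   /* terminator was inside this block */
      return dst;
    src += 8;
    d += 8;
  }
}
```

A tuned version would presumably replace the memchr call with a word-at-a-time zero-byte test; the sketch keeps it for readability.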

So, comments? Could it be generalized more? Or is it too much work
without much benefit?


Re: [RFC] Formation of vector function name

2015-06-16 Thread Joseph Myers
On Mon, 15 Jun 2015, Andrew Pinski wrote:

> > results in asm redirection for log to __log_finite and final vector
> > function name becomes _ZGVbN2v___log_finite.
> >
> > From the C library's point of view, this means adding asm
> > redirections _ZGVbN2v___log_finite = _ZGVbN2v_log to the headers.
> >
> > Maybe the cleaner way is to base the vector name on the original name
> > of the declaration, not the asm declaration name (use DECL_NAME instead
> > of DECL_ASSEMBLER_NAME)?
> 
> 
> I don't think this would really be useful. If you have a function, say
> logl, with two long double options and you want to support both, you
> would name one logl and the other logl128; then using
> DECL_ASSEMBLER_NAME to form the SIMD name would be useful, so as to use
> the one with logl128 in it rather than logl.

The point is that the vector versions may not be in one-to-one 
correspondence with the scalar versions - you might have several different 
scalar versions depending on compiler options, all of which correspond to 
a single vector version.

-- 
Joseph S. Myers
jos...@codesourcery.com


gcc-5-20150616 is now available

2015-06-16 Thread gccadmin
Snapshot gcc-5-20150616 is now available on
  ftp://gcc.gnu.org/pub/gcc/snapshots/5-20150616/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 5 SVN branch
with the following options: svn://gcc.gnu.org/svn/gcc/branches/gcc-5-branch 
revision 224539

You'll find:

 gcc-5-20150616.tar.bz2   Complete GCC

  MD5=1612373698646b35ac4a8ceb8fd9422d
  SHA1=4c651a8e51b78485e05506b8e7d700333e0c3174

Diffs from 5-20150609 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-5
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.


Question about DRAP register and reserving hard registers

2015-06-16 Thread Steve Ellcey

I have a question about the DRAP register (used for dynamic stack alignment)
and about reserving/using hard registers in general.  I am trying to understand
where, if a drap register is allocated, GCC is told not to use it during
general register allocation.  There must be some code somewhere for this
but I cannot find it.

I am trying to implement dynamic stack alignment on MIPS, and because there
is so much code for the x86 dynamic stack alignment I am trying to
incorporate bits of it as I understand what I need, instead of just turning
it all on at once and getting completely lost.

Right now I am using register 16 on MIPS to access incoming arguments
in a function that needs dynamic alignment, so it is my drap register if
my understanding of the x86 code and its use of a DRAP register is correct.
I copy the stack pointer into reg 16 before I align the stack pointer
(during expand_prologue).  So far the only way I have found to stop the
register allocator from also using reg 16 and thus messing up its value is to
set fixed_regs[16].  But I don't see the x86 doing this for its DRAP register
and I was wondering how it is handled there.

I think setting fixed_regs[16] is why C++ tests with exception handling are
not working for me because this register is not getting set and restored
(since it is thought to be fixed) during code that uses throw and catch.

Steve Ellcey
sell...@imgtec.com