Re: k-byte memset/memcpy/strlen builtins

2017-01-12 Thread Robin Dapp
> Yes, for memset with larger element we could add an optab plus
> internal function combination and use that when the target wants.  Or
> always use such IFN and fall back to loopy expansion.

So, adding additional patterns in tree-loop-distribution.c (and mapping
them to dedicated optabs) is fine? Or does the yes refer to the
"else"/"or" part of my question (how would the backend recognize the
patterns then)?

> I'd say a multibyte memchr might make sense, but strlen specifically?
> Not sure.

ok, memchr would also work for the snippet I have in mind.
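
To make the discussion concrete, here is a minimal sketch of what a 2-byte
memchr and a 2-byte strlen look like as plain loops. The names memchr16 and
strlen16 are hypothetical (they are neither GCC builtins nor libc functions),
and a fixed-width 2-byte encoding with a 0 terminator is assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: return a pointer to the first 16-bit element equal
   to `needle` among the first `n` elements of `s`, or NULL if none matches. */
static const uint16_t *memchr16(const uint16_t *s, uint16_t needle, size_t n)
{
    assert(s != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (s[i] == needle)
            return &s[i];
    return NULL;
}

/* Hypothetical helper: count 16-bit elements before the 0 terminator.
   Assumes a fixed-width 2-byte encoding, as in the snippet discussed. */
static size_t strlen16(const uint16_t *s)
{
    assert(s != NULL);
    size_t len = 0;
    while (s[len] != 0)
        len++;
    return len;
}
```

Both are search loops over 2-byte elements; the strlen variant is the memchr
variant with a fixed needle of 0 and no explicit bound.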

Regards
 Robin



Re: input address reload issue

2017-01-12 Thread Aurelien Buhrig
On 09/01/2017 19:35, Jeff Law wrote:
> On 01/09/2017 07:02 AM, Aurelien Buhrig wrote:
>> On 06/01/2017 17:06, Jeff Law wrote:
>>> On 01/06/2017 03:20 AM, Aurelien Buhrig wrote:

>> So the insn:
>> (set (reg:QI 0 r0) (mem:QI (plus:SI (reg:SI 2 r2) (const_int 1))))
>>
>> is transformed into:
>> (set (reg:SI 8 a0) (reg:SI 2 r2))
>> (set (reg:SI 8 a0) (const_int 1))
>> (set (reg:SI 8 a0) (plus:SI (reg:SI 8 a0) (reg:SI 8 a0)))
>> (set (reg:QI 0 r0) (mem:QI (reg:SI 8 a0)))
>>
>> This "basic" transformation requires two reload regs, but only one is
>> given/used/possible from the rl structure (in emit_reload_insns).
>>
>> So where does the issue come from? The need for 2 reload regs, the
>> transformation which is too "basic" and could be optimized to use only
>> one reload reg, or any wrong/missing reload target hook?
> Sounds like you need secondary or intermediate reloads.
>>>> Do you suggest the secondary reload must implement a scratch reg & md
>>>> pattern to implement this reload?
>>> Perhaps.  I don't know enough about your architecture to be 100% sure
>>> about how all the pieces interact with each other -- reload, and
>>> secondary reloads in particular are a complex area.  I'm largely going
>>> on your comment that you need 2 reload registers.
>>>
>>> Presumably you don't have an instruction for
>>> (set (reg) (plus (reg) (const_int)))
>>>
>>> Thus you need two scratch reload regs IIUC.  One to hold r2 another to
>>> hold (const_int 1).  So you'd want to generate
>>>
>>> (set (areg1) (reg r2))
>>> (set (areg2) (const_int 1))
>>> (set (areg1) (plus (areg1) (areg2)))
>>> (set (r0) (mem (areg1)))
>>>
>>> Or something along those lines.  If you're going to stick with reload,
>>> you'll likely want to dig into find_reloads_address and its children
>>> to see what reloads it generates and why (debug_reload can be helpful
>>> here).
>>
>> Thank you very much Jeff for your help. The best mapping for my target
>> would be:
>> (set areg1 r2) (set r0 (mem:QI (areg1 + 1))), and only 1 scratch would be
>> needed.
> So if you've got reg+d addressing modes, then I'd start by looking at
> your GO_IF_LEGITIMATE_ADDRESS, REG_OK_FOR_*_P, etc.  Reload ought to
> be able to reload r2 into an address register without defining
> secondary reloads.  From what you've described, it's comparable to any
> target that has split address/data registers.
TARGET_LEGITIMATE_ADDRESS_P and REGNO_OK_FOR_BASE_P are OK. After trying
to debug emit_input_reload, I decided to give LRA a try (and remove cc0).
This fixes my reload issue!

Thanks for your help and advice.
Aurélien


Re: k-byte memset/memcpy/strlen builtins

2017-01-12 Thread Richard Biener
On Thu, Jan 12, 2017 at 9:26 AM, Robin Dapp wrote:
>> Yes, for memset with larger element we could add an optab plus
>> internal function combination and use that when the target wants.  Or
>> always use such IFN and fall back to loopy expansion.
>
> So, adding additional patterns in tree-loop-distribution.c (and mapping
> them to dedicated optabs) is fine? Or does the yes refer to the
> "else"/"or" part of my question (how would the backend recognize the
> patterns then)?

Yes, enhancing tree-loop-distribution.c with extra patterns is fine (hey, there
were supposed to be patterns for all of lapack & friends ... ;))

The question is only whether loop-distribution should always create the IFN
or just if the backend has an optab so expansion is trivial (rather than
needing to re-build a loop doing the operation).
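
For illustration only, this is the kind of store loop in question, written
out by hand. The name memset16 is hypothetical; it stands for whatever a
2-byte memset pattern would be called. A value such as 0x1234 cannot be
expressed as a byte-wise memset, so without an optab the expansion would have
to be a loop of roughly this shape.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 2-byte memset: store `value` into `n` consecutive 16-bit
   elements starting at `dst`.  This is the plain "loopy" form. */
static void memset16(uint16_t *dst, uint16_t value, size_t n)
{
    assert(dst != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        dst[i] = value;
}
```

A target with a suitable instruction could replace the whole loop with one
pattern; other targets would keep the loop as above.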

Richard.

>> I'd say a multibyte memchr might make sense, but strlen specifically?
>> Not sure.
>
> ok, memchr would also work for the snippet I have in mind.
>
> Regards
>  Robin
>


TLS run-time requirements on x86, etc.

2017-01-12 Thread Joel Sherrill

Hi

I am looking at the RTEMS x86 TLS support. When -fPIC is
specified, gcc generates calls to ___tls_get_addr(). But
when it is not specified, there are no external calls.
To make sure we are doing the right thing, I have a
few questions:

+ What is expected for "get TLS" when __tls_get_addr()
is not called?

+ Can we force all RTEMS i386 gcc configurations to make the
function call to __tls_get_addr()? We have an implementation
of it for ARM so it would be the same. So this would
be a simple solution for us.

+ Is there any reference material to show what is expected
for each architecture? I think the MIPS generates an illegal
instruction and you end up doing TLS in there. It would be
easier if we could configure gcc to make subroutine calls
to __tls_get_addr() instead generically?

Anything else about TLS and run-time requirements we
should know?

--
Joel Sherrill, Ph.D.             Director of Research & Development
joel.sherr...@oarcorp.com        On-Line Applications Research
Ask me about RTEMS: a free RTOS  Huntsville AL 35806
Support Available                (256) 722-9985


Re: TLS run-time requirements on x86, etc.

2017-01-12 Thread Jakub Jelinek
On Thu, Jan 12, 2017 at 11:22:58AM -0600, Joel Sherrill wrote:
> I am looking at the RTEMS x86 TLS support. When -fPIC is
> specified, gcc generates calls to ___tls_get_addr(). But
> when it is not specified, there are no external calls.
> To make sure we are doing the right thing, I have a
> few questions:
> 
> + What is expected for "get TLS" when __tls_get_addr()
> is not called?
> 
> + Can we force all RTEMS i386 gcc configurations to make the
> function call to __tls_get_addr()? We have an implementation
> of it for ARM so it would be the same. So this would
> be a simple solution for us.

Please read:
https://www.akkadia.org/drepper/tls.pdf
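
As a small experiment to go with that document, the sketch below declares a
thread-local variable with GCC's __thread extension. The names counter and
bump are made up for the example. The document describes several TLS access
models; which one the compiler picks depends on options such as -fPIC.

```c
#include <assert.h>
#include <limits.h>

/* One copy of `counter` exists per thread (GCC's __thread extension). */
__thread int counter;

/* Hypothetical helper: increment the calling thread's copy of `counter`
   and return the new value. */
int bump(void)
{
    assert(counter < INT_MAX);  /* avoid signed overflow in this toy example */
    counter = counter + 1;
    return counter;
}
```

Compiling this to assembly with and without -fPIC (for example with
gcc -m32 -O2 -S) should show the difference: the -fPIC output is expected to
call ___tls_get_addr, while the non-PIC output typically addresses the
variable relative to the thread pointer in %gs with no external call.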

Jakub


Re: k-byte memset/memcpy/strlen builtins

2017-01-12 Thread Martin Sebor

On 01/11/2017 09:16 AM, Robin Dapp wrote:
> Hi,
>
> When examining the performance of some test cases on s390 I realized
> that we could do better for constructs like 2-byte memcpys or
> 2-byte/4-byte memsets. Due to some s390-specific architectural
> properties, we could be faster by e.g. avoiding excessive unrolling and
> using dedicated memory instructions (or similar).

There are at least two enhancement requests in Bugzilla to improve
memcmp when one or more of the arguments are constant: bugs 12086
and 78257 (the former for constant small lengths and the latter
for constant byte arrays).  It seems that one or both of these might
also benefit from some of your ideas and/or vice versa.

Martin

> For 1-byte memset/memcpy the builtin functions provide a straightforward
> way to achieve this. At first sight it seemed possible to extend
> tree-loop-distribution.c to include the additional variants we need.
> However, multibyte memsets/memcpys are not covered by the C standard and
> I'm therefore unsure if such an approach is preferable or if there are
> more idiomatic ways or places where to add the functionality.
>
> The same question goes for 2-byte strlen. I didn't see a recognition
> pattern for strlen (apart from optimizations due to known string length
> in tree-ssa-strlen.c). Would it make sense to include strlen recognition
> and subsequently handling for 2-byte strlen? The situation might of
> course be more complicated than memset because of encodings etc. My
> snippet in question used a fixed-length encoding of 2 bytes, however.
>
> Another simple idea to tackle this would be a peephole optimization but
> I'm not sure if this is really feasible for something like memset.
> Wouldn't the peephole have to be recursive then?
>
> Regards
>  Robin


Re: Throwing exceptions from a .so linked with -static-lib* ?

2017-01-12 Thread Yuri Gribov
On Thu, Jan 12, 2017 at 5:06 AM, Paul Smith wrote:
> TL;DR:
> I have an issue where if I have a .so linked with -static-lib* making
> all STL symbols private, and if I throw an exception out of that .so to
> be caught by the caller, then I get a SIGABRT from a gcc_assert() down
> in the guts of the signal handling:
>
> #0  0x7773a428 in raise () from /lib/x86_64-linux-gnu/libc.so.6
> #1  0x7773c02a in abort () from /lib/x86_64-linux-gnu/libc.so.6
> #2  0x0040e938 in _Unwind_SetGR (context=, 
> index=, val=) at 
> /usr/src/cc/gcc-6.2.0/libgcc/unwind-dw2.c:271
> 271   gcc_assert (index < (int) sizeof(dwarf_reg_size_table));
>
> Should it be possible to do this successfully or am I doomed to failure?
> More details and a test case below.
>
>
> More detail:
>
> I'm trying to distribute a shared library built with the latest version
> of C++ (well, GCC 6.2 with C++14) on GNU/Linux.  I compile it with an
> older sysroot, taken from RHEL 6.3 (glibc 2.12) so it will run on older
> systems.
>
> My .so is written in C++ and programs that link it will also be written
> in C++ although they may be compiled and linked with potentially much
> older versions of GCC (like, 4.9 etc.)  I'm not worried about programs
> compiled with clang or whatever at this point.
>
> Because I want to use new C++ but want users to be able to use my .so on
> older systems, I link with -static-libgcc -static-libstdc++.  Because I
> don't want to worry about security issues etc. in system libraries, I
> don't link anything else statically.
>
> I also use a linker script to force all symbols (even libstdc++ symbols)
> to be private to my shared library except the ones I want to publish.
>
> Using "nm | grep ' [A-TV-Z] '" I can see that no other symbols besides
> mine are public.
>
> However, if my library throws an exception which I expect to be handled
> by the program linking my library, then I get a SIGABRT, as above; the
> full backtrace is:
>
> #0  0x7773a428 in raise () from /lib/x86_64-linux-gnu/libc.so.6
> #1  0x7773c02a in abort () from /lib/x86_64-linux-gnu/libc.so.6
> #2  0x0040e938 in _Unwind_SetGR (context=, 
> index=, val=) at 
> /usr/src/cc/gcc-6.2.0/libgcc/unwind-dw2.c:271

That seems to be
  gcc_assert (index < (int) sizeof(dwarf_reg_size_table));
which _probably_ means that the unwinder is using a broken unwind table.

Note that the documentation for -static-libgcc explicitly mentions that
   There are several situations in which an application should use the
   shared libgcc instead of the static version.  The most common of
   these is when the application wishes to throw and catch exceptions
   across different shared libraries.  In that case, each of the
   libraries as well as the application itself should use the shared
   libgcc.
Removing -static-libgcc fixes the problem with your repro case.

My (most probably wrong) guess is that the unwind tables for the main
executable and the library get registered in two different instances of
libgcc (one in libgcc_s.so loaded by myprog and one statically linked
into mylib.so), which prevents normal exception propagation.

Also this thread may be relevant:
https://gcc.gnu.org/ml/gcc-help/2009-12/msg00186.html

> #3  0x004012a2 in __gxx_personality_v0 ()
> #4  0x77feb903 in _Unwind_RaiseException_Phase2 
> (exc=exc@entry=0x43b890, context=context@entry=0x7fffe330) at 
> /usr/src/cc/gcc-6.2.0/libgcc/unwind.inc:62
> #5  0x77febf8a in _Unwind_RaiseException (exc=0x43b890) at 
> /usr/src/cc/gcc-6.2.0/libgcc/unwind.inc:131
> #6  0x77fde84b in __cxa_throw () from 
> /home/psmith/src/static-eh/libmylib.so
> #7  0x77fddecb in MyLib::create () at mylib.cpp:2
> #8  0x00400da4 in main () at myprog.cpp:2
>
> I should note that if I use the GCC 5.4 that comes standard on my OS
> rather than my locally-built version I get identical behavior and
> backtrace (except not as much debuggability of course).  So I don't
> think it's an incorrect build.
>
> If I don't use -static-libstdc++ with my .so then it doesn't fail.  Also
> if I don't use a linker script to hide all the C++ symbols it doesn't
> fail (but of course anyone who links with my shared library will use my
> copy of the STL).
>
>
> Here's a repro case (this shows the problem on my Ubuntu GNOME 16.04
> GNU/Linux system with GCC 5.4 and binutils 2.26.1):
>
> ~$ cat mylib.h
> class MyLib { public: static void create(); };
>
> ~$ cat mylib.cpp
> #include "mylib.h"
> void MyLib::create() { throw 42; }
>
> ~$ cat myprog.cpp
> #include "mylib.h"
> int main() { try { MyLib::create(); } catch (...) { return 0; } return 1; }
>
> ~$ cat ver.map
> { global: _ZN5MyLib6createEv; local: *; };
>
> ~$ g++ -I. -g -fPIC -static-libgcc -static-libstdc++ \
> -Wl,--version-script=ver.map -Wl,-soname=libmylib.so \
> -shared -o libmylib.so mylib.cpp
>
> ~$ g++ -I. -g -fPIC  -L. -Wl,-rpath="\$ORIGIN" -o myprog myprog.cpp \
> -lmylib
>
> ~$ ./myprog
> Aborted (

gcc-6-20170112 is now available

2017-01-12 Thread gccadmin
Snapshot gcc-6-20170112 is now available on
  ftp://gcc.gnu.org/pub/gcc/snapshots/6-20170112/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 6 SVN branch
with the following options: svn://gcc.gnu.org/svn/gcc/branches/gcc-6-branch 
revision 244386

You'll find:

 gcc-6-20170112.tar.bz2   Complete GCC

  MD5=61f50d36739a1c324e73c975ae4bf733
  SHA1=2d5729fd2f5aa63e2a3cbd736beff1bc59453132

Diffs from 6-20170105 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-6
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.