Re: LTO remapping/deduction of machine modes of types/decls
On Tue, 10 Jan 2017, Alexander Monakov wrote:
> On Tue, 10 Jan 2017, Richard Biener wrote:
> > In general I think they should match.  But without seeing concrete
> > examples of where they do not I can't comment on whether such exceptions
> > make sense.  For example, if you adjust a DECL's alignment and then
> > re-layout it, I'd expect you might get a non-BLKmode mode for an
> > aggregate in some circumstances -- but then decl and type are not 1:1
> > compatible (due to different alignment), but this case is clearly desired
> > as requiring type copies for the sake of alignment would be wasteful.
>
> Thanks; Vlad will follow up with (I believe) a different kind of mismatch
> originating in the C++ front-end.
>
> > > For our original goal, I think we'll switch to the other solution I've
> > > outlined in the opening mail, i.e. propagating mode tables at WPA stage
> > > and keeping enough information to know if the section comes from the
> > > host or native compiler.
> >
> > So add a hack on top of the hack?  Ugh.  So why exactly doesn't it
> > already work?  It looks like decls and types have their modes
> > "fixed" with the per-file mode table at WPA time.  So what is missing
> > is to "fix" modes in the per-function sections that are not touched
> > by WPA?
>
> WPA re-streams packed function bodies as-is, so anything referred to
> from within just the body won't be subject to mode remapping; I think
> only modes of toplevel declarations and functions' arguments will be
> remapped.  And I believe it wouldn't be acceptable to unpack/remap/repack
> function bodies at WPA stage (it's contrary to the LTO scalability goal).

Yes indeed.  But this means the mode-maps have to be per function
section (with possibly a way to "share" them?).  Or we need a way
to annotate function sections with "no need to re-map", as the
native nvptx sections don't need remapping and the others all use
the same map?

Richard.
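To make the alignment example above concrete, a minimal C sketch
(target-dependent behavior; the BLKmode/DImode outcome assumes a 64-bit
strict-alignment target, and is only what one might expect, per the
caveats above):

  /* The type has alignment 1, so an 8-byte aggregate is laid out in
     BLKmode; a decl of that type re-laid-out with 8-byte alignment may
     be given the scalar DImode, so the decl's and the type's modes no
     longer match.  */
  struct bytes { char c[8]; };                   /* type: BLKmode      */
  struct bytes b __attribute__ ((aligned (8)));  /* decl: maybe DImode */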
Re: LTO remapping/deduction of machine modes of types/decls
On Wed, 11 Jan 2017, Richard Biener wrote:
> > WPA re-streams packed function bodies as-is, so anything referred to
> > from within just the body won't be subject to mode remapping; I think
> > only modes of toplevel declarations and functions' arguments will be
> > remapped.  And I believe it wouldn't be acceptable to unpack/remap/repack
> > function bodies at WPA stage (it's contrary to the LTO scalability goal).
>
> Yes indeed.  But this means the mode-maps have to be per function
> section (with possibly a way to "share" them?).  Or we need a way
> to annotate function sections with "no need to re-map" as the
> native nvptx sections don't need remapping and the others all use
> the same map?

Right, the latter: we know that sections coming from the native compiler
already have the right modes and thus need no remapping, and the sections
coming from the host compiler all need remapping (and will use the same
mapping).  Prefixes of per-function section names already carry the
distinction (".gnu.lto_foo" vs. ".gnu.offload_lto_foo").

Alexander
k-byte memset/memcpy/strlen builtins
Hi,

When examining the performance of some test cases on s390 I realized
that we could do better for constructs like 2-byte memcpys or
2-byte/4-byte memsets.  Due to some s390-specific architectural
properties, we could be faster by e.g. avoiding excessive unrolling and
using dedicated memory instructions (or similar).

For 1-byte memset/memcpy the builtin functions provide a straightforward
way to achieve this.  At first sight it seemed possible to extend
tree-loop-distribution.c to include the additional variants we need.
However, multibyte memsets/memcpys are not covered by the C standard and
I'm therefore unsure whether such an approach is preferable or whether
there are more idiomatic ways or places to add the functionality.

The same question goes for 2-byte strlen.  I didn't see a recognition
pattern for strlen (apart from optimizations due to known string length
in tree-ssa-strlen.c).  Would it make sense to include strlen recognition
and subsequently handling for 2-byte strlen?  The situation might of
course be more complicated than memset because of encodings etc.  My
snippet in question used a fixed-length encoding of 2 bytes, however.

Another simple idea to tackle this would be a peephole optimization, but
I'm not sure that is really feasible for something like memset.
Wouldn't the peephole have to be recursive then?

Regards
 Robin
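For concreteness, a minimal C sketch of the two loop shapes in question:
a 2-byte memset and a strlen over a fixed-width 2-byte encoding.  The
helper names are hypothetical; this is only meant to pin down the
constructs a recognizer would have to match.

  #include <stddef.h>
  #include <stdint.h>

  /* 2-byte memset: fill n 16-bit elements with value v.  A recognizer
     could replace this loop with one wide "setmem"-style operation.  */
  void memset16 (uint16_t *p, uint16_t v, size_t n)
  {
    for (size_t i = 0; i < n; i++)
      p[i] = v;
  }

  /* 2-byte strlen over a fixed-width encoding: count elements up to
     the first zero element.  */
  size_t strlen16 (const uint16_t *s)
  {
    size_t n = 0;
    while (s[n] != 0)
      n++;
    return n;
  }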
Re: Help with integrating my test program into dejagnu
On Jan 10, 2017, at 9:13 PM, Daniel Santos wrote:
> I've gotten rid of the Makefile and everything is run now from msabi.exp.
> I've also gotten rid of the header file, now that I know how to define a
> "_noinfo" fn pointer, so it's down to just 4 files: msabi.exp, gen.cc,
> msabi.c and do_test.S.

Sounds better.

> After running using DG_TORTURE_OPTIONS,

But why?  I think you missed what you're testing.  You aren't creating or
looking for bugs in the optimizer.  Your test case isn't for an optimizer;
therefore, you should not torture the poor test case.  I think what you
are testing is argument passing.  That typically is the decision about
what bits are where, and that is an optimization-irrelevant thing to test.

> it became clear that the resulting program was just too big, so I've
> modified the generator so that the test can be done in little bits.

A sum of little bits is likely always more costly than just one large
thing.  I don't think there is an economy to be had there, other than the
ability to say test case 15 fails, and you want a person to be able to
drill into test case 15 by itself without the others around.  With a
well-structured large test case, it should be clear how each subpart can
be separated out and run as a single small test case.  For example:

  test1() { ... }

  main() {
    test1();
    test2();
    [ ... ]
  }

Here, we see that we can remove 99% of the test case and run just a
single case: a normal, manual edit leaving just one line, and then the
transitive closure of the one test routine.  I think if you time it,
you'll discover that you can fit in more cases this way than if you break
them up; also, if you torture, you can't fit in as many cases in the time
given.  This is at the heart of why I don't think you want to torture.

> Otherwise, the build eats 6GiB+ and takes forever on the final set of
> flags.

So, one review point will be: is the added testing time at all useful in
general?  That's an open review point.  The compiler can easily be
damaged with random edits, but we have fairly good coverage that will
catch most of it.  We typically don't spend time in the test suite
methodically catching every single thing that can go wrong, just the
things that usually do go wrong based upon reported bugs.  What is the
added time in seconds to test, and on what type of machine?

> And now for 50 questions. :)  Am I using DG_TORTURE_OPTIONS correctly

I want to say no.  See above.  No one should ever use it unless they have
a very specific, well-thought-out reason.  I've not heard the reason in
this case.

> or should such a test only exist under gcc.torture?

gcc.torture is for a very narrow and specific type of bug.  There are
test cases that people who work on the optimizer add, which go through
the optimizer, to ensure that the bug they just fixed doesn't reappear.
So, the first question: are you working on the optimizer?  If not, then
it would likely be inappropriate.

> I'm not sure if I'm managing this correctly, as I'm calling pass/fail
> $subdir after each iteration of the test (should this only be called
> once?).

No; if you did it, you would call it once per iteration, and you would
mix the torture flags into the pass/fail line.

  pass "$file.c $torture_option"

would be typical; in your code, it would be $generator_args.

> Also, being that the generator is C++, I've added HOSTCXX and
> HOSTCXXFLAGS to site.exp, I hope that's OK.

Hum.  I worry about a knock-on effect of some sort.  Generally I don't
like adding anything to site.exp unless needed.  In this case, I think
it'd be fine.
It is the simplest and most direct way to do it.

> Finally, would you please look at my runtest_msabi procedure to make
> sure that I'm doing the build correctly?  I'm using "remote_exec build"
> for most of it and I'm not 100% certain if that is the correct way to
> do it.

Yeah, close enough to likely not worry about it too much.  If you wanted
to improve it, the next step would be to remove the isnative part and
finish the code for cross builds, and just after that finish the code for
canadian cross builds.  A canadian cross is one in which the build machine
and the host machine are different.  With the isnative, you can get the
details of host/build and target machine completely wrong and pay no
price for it.  Once you remove it, you then have to understand which code
works for which system and ensure it works.  A cross build, loosely, is
one in which the target machine and the host machine are different.  The
reason why I suggested isnative is that then you don't have to worry
about it, and you can punt the finishing to a cross or canadian-cross
person.  For them, it is rather trivial to clean up the test case to get
it to work in a cross environment.  Without testing, it is easy enough to
get wrong.  Also, for them, testing it is then trivial.  If you can find
someone that can test in a cross environment and report back if it works
Re: k-byte memset/memcpy/strlen builtins
On January 11, 2017 5:16:43 PM GMT+01:00, Robin Dapp wrote:
> Hi,
>
> When examining the performance of some test cases on s390 I realized
> that we could do better for constructs like 2-byte memcpys or
> 2-byte/4-byte memsets.  Due to some s390-specific architectural
> properties, we could be faster by e.g. avoiding excessive unrolling and
> using dedicated memory instructions (or similar).

Not sure why you mention memcpy; how does that depend on 'element size'?

> For 1-byte memset/memcpy the builtin functions provide a
> straightforward way to achieve this.  At first sight it seemed possible
> to extend tree-loop-distribution.c to include the additional variants
> we need.  However, multibyte memsets/memcpys are not covered by the C
> standard and I'm therefore unsure whether such an approach is
> preferable or whether there are more idiomatic ways or places to add
> the functionality.

Yes, for memset with a larger element we could add an optab plus internal
function combination and use that when the target wants it.  Or always
use such an IFN and fall back to loopy expansion.

> The same question goes for 2-byte strlen.  I didn't see a recognition
> pattern for strlen (apart from optimizations due to known string length
> in tree-ssa-strlen.c).  Would it make sense to include strlen
> recognition and subsequently handling for 2-byte strlen?  The situation
> might of course be more complicated than memset because of encodings
> etc.  My snippet in question used a fixed-length encoding of 2 bytes,
> however.

I'd say a multibyte memchr might make sense, but strlen specifically?
Not sure.  Likewise multibyte memcmp.

Richard.

> Another simple idea to tackle this would be a peephole optimization,
> but I'm not sure that is really feasible for something like memset.
> Wouldn't the peephole have to be recursive then?
>
> Regards
>  Robin
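A C-level model of the "IFN with loopy fallback" idea above (a sketch
only; setmem16 is a hypothetical helper, not a GCC internal function):
when the 16-bit pattern consists of two equal bytes the operation
degenerates to plain memset, and otherwise a generic element loop
remains, which a target could instead expand to a dedicated instruction.

  #include <stddef.h>
  #include <stdint.h>
  #include <string.h>

  static void setmem16 (uint16_t *p, uint16_t v, size_t n)
  {
    if ((v >> 8) == (v & 0xff))
      memset (p, v & 0xff, n * sizeof *p);  /* equal bytes: reuse memset */
    else
      for (size_t i = 0; i < n; i++)        /* generic "loopy" expansion */
        p[i] = v;
  }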
Re: k-byte memset/memcpy/strlen builtins
On Wed, 2017-01-11 at 17:16 +0100, Robin Dapp wrote:
> Hi,

Hi Robin,

I thought I'd share some of what I've run into while doing similar things
for the rs6000 target.

First off, be aware that glibc does some macro expansion things to try to
handle 1/2/3-byte string operations in some cases.

Secondly, the way I approached this was to use the patterns defined in
optabs.def for these things:

  OPTAB_D (cmpmem_optab, "cmpmem$a")
  OPTAB_D (cmpstr_optab, "cmpstr$a")
  OPTAB_D (cmpstrn_optab, "cmpstrn$a")
  OPTAB_D (movmem_optab, "movmem$a")
  OPTAB_D (setmem_optab, "setmem$a")
  OPTAB_D (strlen_optab, "strlen$a")

If you define movmemsi, that should get used by expand_builtin_memcpy for
any memcpy call that it sees.

The constraints I was able to find when implementing cmpmemsi for memcmp
were:
 * don't compare past the given length (obviously)
 * don't read past the given length
 * except it's ok to do so if you can prove via alignment or runtime
   check that you are not going to cause a pagefault.  Not crossing a 4k
   boundary seems to be generally viewed as acceptable (see the sketch
   below).

I would recommend looking at preprocessed code to make sure no funny
business is happening, and then look at your .md files.  It looks to me
like s390 has got both movmem and strlen patterns there already.

If I understand correctly, you are wanting to do multi-byte characters.
It seems to me you need to follow the path Richard Biener suggests and
make optab expansions that handle wider chars, and then perhaps map
wcslen et al. to them?

Aaron

> For 1-byte memset/memcpy the builtin functions provide a
> straightforward way to achieve this.  At first sight it seemed possible
> to extend tree-loop-distribution.c to include the additional variants
> we need.  However, multibyte memsets/memcpys are not covered by the C
> standard and I'm therefore unsure whether such an approach is
> preferable or whether there are more idiomatic ways or places to add
> the functionality.
>
> The same question goes for 2-byte strlen.  I didn't see a recognition
> pattern for strlen (apart from optimizations due to known string length
> in tree-ssa-strlen.c).  Would it make sense to include strlen
> recognition and subsequently handling for 2-byte strlen?  The situation
> might of course be more complicated than memset because of encodings
> etc.  My snippet in question used a fixed-length encoding of 2 bytes,
> however.
>
> Another simple idea to tackle this would be a peephole optimization,
> but I'm not sure that is really feasible for something like memset.
> Wouldn't the peephole have to be recursive then?
>
> Regards
>  Robin

--
Aaron Sawdey, Ph.D.  acsaw...@linux.vnet.ibm.com
050-2/C113  (507) 253-7520  home: 507/263-0782
IBM Linux Technology Center - PPC Toolchain
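As a hedged illustration of the page-boundary rule in the constraints
above (plain C, not code from either backend): an 8-byte load at p cannot
fault when it does not cross a 4 KiB page boundary, even if fewer than 8
bytes of the buffer remain.

  #include <stdint.h>

  /* True when loading 8 bytes starting at p stays within one 4 KiB
     page, so the over-read cannot cause a pagefault.  */
  static int load8_is_page_safe (const void *p)
  {
    return ((uintptr_t) p & 0xfff) <= 0x1000 - 8;
  }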
Re: Worse code after bbro?
On Thu, Jan 05, 2017 at 07:39:21PM +0100, Jan Hubicka wrote:
> In fact cfglayout was invented to implement bb-reorder originally :)

So, hrm, are there any passes we *want* to do in non-cfglayout mode?

Segher
Re: Help with integrating my test program into dejagnu
A test [istarget x86_64-*-gnu] is wrong; i?86-* -m64 should always be
handled exactly the same as x86_64-* -m64.

You need to work out which ABIs (-m32, -mx32, -m64) this testing is
meaningful for.  Then, allow both x86_64- and i?86- targets, together
with an appropriate effective-target test ("lp64" to allow just -m64,
for example).

--
Joseph S. Myers
jos...@codesourcery.com
Re: Help with integrating my test program into dejagnu
On 01/11/2017 12:25 PM, Joseph Myers wrote:
> A test [istarget x86_64-*-gnu] is wrong; i?86-* -m64 should always be
> handled exactly the same as x86_64-* -m64.
>
> You need to work out which ABIs (-m32, -mx32, -m64) this testing is
> meaningful for.  Then, allow both x86_64- and i?86- targets, together
> with an appropriate effective-target test ("lp64" to allow just -m64,
> for example).

Thank you for your help!  (testsuite/target-supports.exp is quite a
library!)

So this test is 64-bit only and makes heavy use of gcc extensions (mostly
the ms_abi attribute).  Its aim is to test prologue & epilogue creation
for 64-bit ms_abi functions that call sysv_abi functions (these are the
ones that result in the massive SSE clobbers).  I do not believe msvc
supports sysv_abi functions, so the test can't be built there.  As such,
I'm only intending to run this when gcc is being tested on 64-bit x86
platforms.  The resulting program must be executed on the target as well,
so I was using [isnative].  Would this then be the correct test?

  if { (![istarget x86_64-*] && ![istarget i?86-*])
       || ![is-effective-target lp64]
       || ![isnative] } then {
      unsupported "$subdir"
      return
  }

Thanks,
Daniel
Throwing exceptions from a .so linked with -static-lib* ?
TL;DR: I have an issue where if I have a .so linked with -static-lib*,
making all STL symbols private, and I throw an exception out of that .so
to be caught by the caller, then I get a SIGABRT from a gcc_assert() down
in the guts of the exception unwinding:

  #0  0x7773a428 in raise () from /lib/x86_64-linux-gnu/libc.so.6
  #1  0x7773c02a in abort () from /lib/x86_64-linux-gnu/libc.so.6
  #2  0x0040e938 in _Unwind_SetGR (context=, index=, val=)
      at /usr/src/cc/gcc-6.2.0/libgcc/unwind-dw2.c:271
  271       gcc_assert (index < (int) sizeof(dwarf_reg_size_table));

Should it be possible to do this successfully, or am I doomed to failure?
More details and a test case below.

More detail: I'm trying to distribute a shared library built with the
latest version of C++ (well, GCC 6.2 with C++14) on GNU/Linux.  I compile
it with an older sysroot, taken from RHEL 6.3 (glibc 2.12), so it will
run on older systems.  My .so is written in C++, and programs that link
it will also be written in C++, although they may be compiled and linked
with potentially much older versions of GCC (like 4.9 etc.).  I'm not
worried about programs compiled with clang or whatever at this point.

Because I want to use new C++ but want users to be able to use my .so on
older systems, I link with -static-libgcc -static-libstdc++.  Because I
don't want to worry about security issues etc. in system libraries, I
don't link anything else statically.  I also use a linker script to force
all symbols (even libstdc++ symbols) to be private to my shared library
except the ones I want to publish.  Using "nm | grep ' [A-TV-Z] '" I can
see that no other symbols besides mine are public.

However, if my library throws an exception which I expect to be handled
by the program linking my library, then I get a SIGABRT, as above; the
full backtrace is:

  #0  0x7773a428 in raise () from /lib/x86_64-linux-gnu/libc.so.6
  #1  0x7773c02a in abort () from /lib/x86_64-linux-gnu/libc.so.6
  #2  0x0040e938 in _Unwind_SetGR (context=, index=, val=)
      at /usr/src/cc/gcc-6.2.0/libgcc/unwind-dw2.c:271
  #3  0x004012a2 in __gxx_personality_v0 ()
  #4  0x77feb903 in _Unwind_RaiseException_Phase2 (exc=exc@entry=0x43b890,
      context=context@entry=0x7fffe330)
      at /usr/src/cc/gcc-6.2.0/libgcc/unwind.inc:62
  #5  0x77febf8a in _Unwind_RaiseException (exc=0x43b890)
      at /usr/src/cc/gcc-6.2.0/libgcc/unwind.inc:131
  #6  0x77fde84b in __cxa_throw () from /home/psmith/src/static-eh/libmylib.so
  #7  0x77fddecb in MyLib::create () at mylib.cpp:2
  #8  0x00400da4 in main () at myprog.cpp:2

I should note that if I use the GCC 5.4 that comes standard on my OS,
rather than my locally-built version, I get identical behavior and
backtrace (except not as much debuggability, of course).  So I don't
think it's an incorrect build.

If I don't use -static-libstdc++ with my .so, then it doesn't fail.  Also,
if I don't use a linker script to hide all the C++ symbols, it doesn't
fail (but of course anyone who links with my shared library will use my
copy of the STL).

Here's a repro case (this shows the problem on my Ubuntu GNOME 16.04
GNU/Linux system with GCC 5.4 and binutils 2.26.1):

  ~$ cat mylib.h
  class MyLib {
    public:
      static void create();
  };

  ~$ cat mylib.cpp
  #include "mylib.h"
  void MyLib::create() { throw 42; }

  ~$ cat myprog.cpp
  #include "mylib.h"
  int main() {
    try { MyLib::create(); }
    catch (...) { return 0; }
    return 1;
  }

  ~$ cat ver.map
  { global: _ZN5MyLib6createEv; local: *; };

  ~$ g++ -I. -g -fPIC -static-libgcc -static-libstdc++ \
        -Wl,--version-script=ver.map -Wl,-soname=libmylib.so \
        -shared -o libmylib.so mylib.cpp
  ~$ g++ -I. -g -fPIC -L. -Wl,-rpath="\$ORIGIN" -o myprog myprog.cpp \
        -lmylib
  ~$ ./myprog
  Aborted (core dumped)

Now if I rebuild without the --version-script argument or without
-static-libstdc++, I get success as expected:

  ~$ ./myprog
  ~$ echo $?
  0