Re: Loading plugins with arm-none-eabi-gcc

2020-07-22 Thread Andrew Pinski via Gcc
On Tue, Jul 21, 2020 at 11:25 PM Shuai Wang via Gcc  wrote:
>
> Hello,
>
> I am currently trying to migrate a gcc plugin that has been well developed
> for x86 code to ARM platform (for arm-none-eabi-gcc).
>
> Currently I did the following steps:
>
> 1. write a hello world program t.c
>
> 2. compile with the following commands:
>
> ➜  arm-none-eabi-gcc -v
>  ..
>  gcc version 9.3.1 20200408 (release) (GNU Arm Embedded Toolchain
> 9-2020-q2-update)
>
> ➜  arm-none-eabi-gcc -S -mcpu=cortex-m3 -mthumb -fdump-tree-all t.c
>
> It works fine, and can smoothly print out all gimple code at different
> stages.
>
> 3. Load my plugin (the plugin is compiled by x64 gcc version 10.0):
>
> ➜  file instrument_san_cov.so
> instrument_san_cov.so: ELF 64-bit LSB shared object, x86-64, version 1
> (SYSV), dynamically linked, with debug_info, not stripped
> ➜  file arm-none-eabi-gcc
> arm-none-eabi-gcc: ELF 64-bit LSB executable, x86-64, version 1 (SYSV),
> dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux
> 2.6.24, BuildID[sha1]=fbadd6adc8607f595caeccae919f3bab9df2d7a6, stripped
>
> ➜  arm-none-eabi-gcc -fplugin=./instrument_cov.so -S -mcpu=cortex-m3
> -mthumb -fdump-tree-all t.c
> cc1: error: cannot load plugin ./instrument_cov.so
>./instrument_cov.so: undefined symbol:
> _Z20build_string_literaliPKcP9tree_nodem
>
> ➜  c++filt -n _Z20build_string_literaliPKcP9tree_nodem
> build_string_literal(int, char const*, tree_node*, unsigned long)
>
>
> It seems that somehow a function named `build_string_literal` cannot be
> found. Why is that? I have no idea how to proceed on this matter and cannot
> find any relevant documentation. Any suggestion would be appreciated. Thank you!

Did you compile your plugin with the headers from the GCC that you are
using to load the plugin into?
If not, then it won't work.  Note that build_string_literal changed
between GCC 9 and GCC 10 in the source, and the GCC plugin ABI is not
stable between releases at all.

Thanks,
Andrew

>
> Best,
> Shuai


Re: Three issues

2020-07-22 Thread David Malcolm via Gcc
On Tue, 2020-07-21 at 22:49 +, Gary Oblock via Gcc wrote:
> Some background:
> 
> This is in the dreaded structure reorganization optimization that I'm
> working on. It's running at LTRANS time with '-flto-partition=one'.
> 
> My issues in order of importance are:
> 
> 1) In gimple-ssa.h, the equal method for ssa_name_hasher
> has a segfault because the "var" field of "a" is (nil).
> 
> struct ssa_name_hasher : ggc_ptr_hash<tree_node>
> {
>   /* Hash a tree in a uid_decl_map.  */
> 
>   static hashval_t
>   hash (tree item)
>   {
> return item->ssa_name.var->decl_minimal.uid;
>   }
> 
>   /* Return true if the DECL_UID in both trees are equal.  */
> 
>   static bool
>   equal (tree a, tree b)
>   {
>   return (a->ssa_name.var->decl_minimal.uid == b->ssa_name.var->decl_minimal.uid);
>   }
> };

I notice that tree.h has:

/* Returns the variable being referenced.  This can be NULL_TREE for
   temporaries not associated with any user variable.
   Once released, this is the only field that can be relied upon.  */
#define SSA_NAME_VAR(NODE)  \
  (SSA_NAME_CHECK (NODE)->ssa_name.var == NULL_TREE \
   || TREE_CODE ((NODE)->ssa_name.var) == IDENTIFIER_NODE   \
   ? NULL_TREE : (NODE)->ssa_name.var)

So presumably that ssa_name_hasher is making an implicit assumption
that such temporaries aren't present in the hash_table; maybe they are
for yours?

Is this a hash_table that you're populating yourself?

With the caveat that I'm sleep-deprived, another way this could happen
is if "a" is not an SSA_NAME but is in fact some other kind of tree;
you could try replacing
  a->ssa_name.var
with
  SSA_NAME_CHECK (a)->ssa_name.var
(and similarly for b)

But the first explanation seems more likely.


> 
[...snip qn 2...]


> 3) For my bug in (1) I got so distraught that I ran valgrind which
> in my experience is an act of desperation for compilers.
> 
> None of the errors it spotted are associated with my optimization
> (although it oh so cleverly pointed out the segfault) however it
> showed the following:
> 
> ==18572== Invalid read of size 8
> ==18572==at 0x1079DC1: execute_one_pass(opt_pass*)
> (passes.c:2550)

What is line 2550 of passes.c in your working copy?

> ==18572==by 0x107ABD3: execute_ipa_pass_list(opt_pass*)
> (passes.c:2929)
> ==18572==by 0xAC0E52: symbol_table::compile() (cgraphunit.c:2786)
> ==18572==by 0x9915A9: lto_main() (lto.c:653)
> ==18572==by 0x11EE4A0: compile_file() (toplev.c:458)
> ==18572==by 0x11F1888: do_compile() (toplev.c:2302)
> ==18572==by 0x11F1BA3: toplev::main(int, char**) (toplev.c:2441)
> ==18572==by 0x23C021E: main (main.c:39)
> ==18572==  Address 0x5842880 is 16 bytes before a block of size 88
> alloc'd
> ==18572==at 0x4C3017F: operator new(unsigned long) (in
> /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
> ==18572==by 0x21E00B7: make_pass_ipa_prototype(gcc::context*)
> (ipa-prototype.c:329)

You say above that none of the errors are associated with your
optimization, but presumably this is your new pass, right?  Can you
post the code somewhere?

> ==18572==by 0x106E987:
> gcc::pass_manager::pass_manager(gcc::context*) (pass-
> instances.def:178)
> ==18572==by 0x11EFCE8: general_init(char const*, bool)
> (toplev.c:1250)
> ==18572==by 0x11F1A86: toplev::main(int, char**) (toplev.c:2391)
> ==18572==by 0x23C021E: main (main.c:39)
> ==18572==
> 
> Are these known issues with lto or is this a valgrind issue?

Hope this is helpful
Dave



Re: Loading plugins with arm-none-eabi-gcc

2020-07-22 Thread Shuai Wang via Gcc
Hey Andrew,

Thanks a lot for getting back to me. No, I am not. Let me clarify the
context here:

1. On my Ubuntu (x86-64 version), I use x86 gcc (version 10.0) to
compile this plugin, and test this plugin on various programs' GIMPLE code
during its compilation with x86 gcc (version 10.0).

2. Then, I switched to use arm-none-eabi-gcc to load this plugin, and
encountered the above issue.

3. Since I am doing cross-platform compilation (on Ubuntu x86), I expect
that I should NOT compile my plugin (a typical .so shared library) into
an ARM library, right? Otherwise it could not be loaded and executed on
x86 Ubuntu, right?

4. Then it seems to me that the proper way is still to compile an x86
plugin, and then somehow use arm-none-eabi-gcc to load the plugin during
cross-architecture compilation?

Best,
Shuai



On Wed, Jul 22, 2020 at 3:20 PM Andrew Pinski  wrote:

> [... full quote of the earlier exchange trimmed ...]


Re: New x86-64 micro-architecture levels

2020-07-22 Thread Jan Beulich
On 21.07.2020 20:04, Florian Weimer wrote:
> * Premachandra Mallappa:
> 
>> [AMD Public Use]
>>
>> Hi Florian,
>>
>>> I'm including a proposal for the levels below.  I use single letters for
>>> them, but I expect that the concrete implementation of this proposal will
>>> use names like “x86-100”, “x86-101”, like in the glibc patch referenced
>>> above.  (But we can discuss other approaches.)
>>
>> Personally I am not a big fan of this, for 2 reasons 
>> 1. uses just x86 in name on x86_64 as well
> 
> That's deliberate, so that we can use the same x86-* names for 32-bit
> library selection (once we define matching micro-architecture levels
> there).

While indeed I did understand it to be deliberate, in the light of
64-bit only ISA extensions (like AMX, and I suspect we're going to
see more) I nevertheless think Premachandra has a point here.

Jan


Re: Loading plugins with arm-none-eabi-gcc

2020-07-22 Thread Andrew Pinski via Gcc
On Wed, Jul 22, 2020 at 12:45 AM Shuai Wang  wrote:
>
> Hey Andrew,
>
> Thanks a lot for getting back to me. No I am not. Let me clarify the context 
> here:
>
> 1. On my Ubuntu (x86-64 version), I use x86 gcc (version 10.0) to compile 
> this plugin, and test this plugin on various programs' GIMPLE code during its 
> compilation with x86 gcc (version 10.0).
>
> 2. Then, I switched to use arm-none-eabi-gcc to load this plugin, and 
> encountered the above issue.

Right because you did not recompile the plugin to use the headers of
arm-none-eabi-gcc compiler.  You need to recompile the plugin for that
compiler using the native GCC you compiled the compiler with; that is
you might need to recompile the compiler too.
There is no stable plugin API/ABI here and that is what you are running into.

Thanks,
Andrew

>
> [... remainder of quoted messages trimmed ...]


Re: New x86-64 micro-architecture levels

2020-07-22 Thread Florian Weimer via Gcc
* Dongsheng Song:

> I fully agree these names (100/101, A/B/C/D) are not very intuitive, I
> recommend using isa tags by year (e.g. x64_2010, x64_2014) like the
> python's platform tags (e.g. manylinux2010, manylinux2014).

I started out with a year number, but that was before there was Level A.
Unfortunately, too many new CPUs fall only under Level A because they do
not even have AVX.  This even applies to some new server CPU designs
released this year.

I'm concerned that putting a year into the level name suggests that
everything mainstream released after that year supports that level, and
that's not true.  I think manylinux is different, and it actually works
out there.  No one is building a new GNU/Linux distribution based on
glibc 2.12 today, for example.  The same cannot be said for x86 CPUs.

If you think my worry is unfounded, then a year-based approach sounds
compelling.

Thanks,
Florian



Re: New x86-64 micro-architecture levels

2020-07-22 Thread Richard Biener via Gcc
On Wed, Jul 22, 2020 at 10:58 AM Florian Weimer via Gcc  wrote:
>
> [... quoted message trimmed ...]
> If you think my worry is unfounded, then a year-based approach sounds
> compelling.

I think the main question is whether those levels are supposed to be
an implementation detail hidden from most software developers or
whether people are expected to make conscious decisions between
-march=x86-100 and -march=x86-101.  Implementation detail
for system integrators, that is.

If it's not merely an implementation detail, then names without
any chance of giving false hints (x86-2014 - oh, it will
run fine on the CPU I bought in 2015; or x86-avx2 - ah, of
course I want avx2) are better.  But this also means the feature
should come with extensive documentation on how it is
supposed to be used.  For example, we might suggest that ISVs
provide binaries for all architecture levels or use IFUNCs
or other runtime CPU selection capabilities.  It's also necessary
to provide an (extensive?) list of SKUs that fall into the
respective categories (probably up to CPU vendors to amend those).
Since this is a feature crossing multiple projects - at least
glibc and GCC - sharing the source of said documentation
would be important.

So for the bike-shedding I indeed think x86-10{0,1,2,3}
or x86-{A,B,C,..}, eventually duplicating as x86_64- as
suggested by Jan is better than x86-2014 or x86-avx2.

Richard.

> Thanks,
> Florian
>


Re: Three issues

2020-07-22 Thread Richard Biener via Gcc
On Wed, Jul 22, 2020 at 12:51 AM Gary Oblock via Gcc  wrote:
>
> Some background:
>
> This is in the dreaded structure reorganization optimization that I'm
> working on. It's running at LTRANS time with '-flto-partition=one'.
>
> My issues in order of importance are:
>
> 1) In gimple-ssa.h, the equal method for ssa_name_hasher
> has a segfault because the "var" field of "a" is (nil).
>
> struct ssa_name_hasher : ggc_ptr_hash<tree_node>
> {
>   /* Hash a tree in a uid_decl_map.  */
>
>   static hashval_t
>   hash (tree item)
>   {
> return item->ssa_name.var->decl_minimal.uid;
>   }
>
>   /* Return true if the DECL_UID in both trees are equal.  */
>
>   static bool
>   equal (tree a, tree b)
>   {
>   return (a->ssa_name.var->decl_minimal.uid == 
> b->ssa_name.var->decl_minimal.uid);
>   }
> };
>
> The parameter "a" is associated with "*entry" on the 2nd to last
> line shown (it's trimmed off after that.) This from hash-table.h:
>
> template <typename Descriptor, bool Lazy,
>   template <typename Type> class Allocator>
> typename hash_table<Descriptor, Lazy, Allocator>::value_type &
> hash_table<Descriptor, Lazy, Allocator>
> ::find_with_hash (const compare_type &comparable, hashval_t hash)
> {
>   m_searches++;
>   size_t size = m_size;
>   hashval_t index = hash_table_mod1 (hash, m_size_prime_index);
>
>   if (Lazy && m_entries == NULL)
> m_entries = alloc_entries (size);
>
> #if CHECKING_P
>   if (m_sanitize_eq_and_hash)
> verify (comparable, hash);
> #endif
>
>   value_type *entry = &m_entries[index];
>   if (is_empty (*entry)
>   || (!is_deleted (*entry) && Descriptor::equal (*entry, comparable)))
> return *entry;
>   .
>   .
>
> Is there any way this could happen other than by a memory corruption
> of some kind? This is a show stopper for me and I really need some help on
> this issue.
>
> 2) I tried to dump out all the gimple in the following way at the very
> beginning of my program:
>
> void
> print_program ( FILE *file, int leading_space )
> {
>   struct cgraph_node *node;
>   fprintf ( file, "%*sProgram:\n", leading_space, "");
>
>   // Print Global Decls
>   //
>   varpool_node *var;
>   FOR_EACH_VARIABLE ( var)
>   {
> tree decl = var->decl;
> fprintf ( file, "%*s", leading_space, "");
> print_generic_decl ( file, decl, (dump_flags_t)0);
> fprintf ( file, "\n");
>   }
>
>   FOR_EACH_FUNCTION_WITH_GIMPLE_BODY ( node)
>   {
> struct function *func = DECL_STRUCT_FUNCTION ( node->decl);
> dump_function_header ( file, func->decl, (dump_flags_t)0);
> dump_function_to_file ( func->decl, file, (dump_flags_t)0);
>   }
> }
>
> When I run this the first two (out of three) functions print
> just fine. However, for the third, func->decl is (nil) and
> it segfaults.
>
> Now the really odd thing is that this works perfectly at the
> end or middle of my optimization.
>
> What gives?
>
> 3) For my bug in (1) I got so distraught that I ran valgrind which
> in my experience is an act of desperation for compilers.
>
> None of the errors it spotted are associated with my optimization
> (although it oh so cleverly pointed out the segfault) however it
> showed the following:
>
> ==18572== Invalid read of size 8
> ==18572==at 0x1079DC1: execute_one_pass(opt_pass*) (passes.c:2550)
> ==18572==by 0x107ABD3: execute_ipa_pass_list(opt_pass*) (passes.c:2929)
> ==18572==by 0xAC0E52: symbol_table::compile() (cgraphunit.c:2786)
> ==18572==by 0x9915A9: lto_main() (lto.c:653)
> ==18572==by 0x11EE4A0: compile_file() (toplev.c:458)
> ==18572==by 0x11F1888: do_compile() (toplev.c:2302)
> ==18572==by 0x11F1BA3: toplev::main(int, char**) (toplev.c:2441)
> ==18572==by 0x23C021E: main (main.c:39)
> ==18572==  Address 0x5842880 is 16 bytes before a block of size 88 alloc'd
> ==18572==at 0x4C3017F: operator new(unsigned long) (in 
> /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
> ==18572==by 0x21E00B7: make_pass_ipa_prototype(gcc::context*) 
> (ipa-prototype.c:329)
> ==18572==by 0x106E987: gcc::pass_manager::pass_manager(gcc::context*) 
> (pass-instances.def:178)
> ==18572==by 0x11EFCE8: general_init(char const*, bool) (toplev.c:1250)
> ==18572==by 0x11F1A86: toplev::main(int, char**) (toplev.c:2391)
> ==18572==by 0x23C021E: main (main.c:39)
> ==18572==
>
> Are these known issues with lto or is this a valgrind issue?

It smells like you are modifying IL via APIs that rely on cfun set to the
function you are modifying.  Note such API dependence might be not
obvious so it's advisable to do

 push_cfun (function to modify);
... modify IL of function ...
 pop_cfun ();

note push/pop_cfun can be expensive so try to glob function modifications.
That said, the underlying issue is likely garbage collector related - try
building with --enable-valgrind-annotations which makes valgrind a bit more
GCC GC aware.

Richard.

> Thanks,
>
> Gary
>
>

Re: New x86-64 micro-architecture levels

2020-07-22 Thread Florian Weimer via Gcc
* Richard Biener:

> On Wed, Jul 22, 2020 at 10:58 AM Florian Weimer via Gcc  
> wrote:
>> [... quoted message trimmed ...]
>
> I think the main question is whether those levels are supposed to be
> an implementation detail hidden from most software developer or
> if people are expected to make concious decisions between
> -march=x86-100 and -march=x86-101.  Implementation detail
> for system integrators, that is.

Anyone who wants to optimize their software for something more current
than what was available in 2003 has to think about this in some form.

With these levels, I hope to provide a pre-packaged set of choices, with
a consistent user interface, in the sense that -march= options and file
system locations match.  Programmers will definitely encounter these
strings, and they need to know what they mean for their users.  We need
to provide them with the required information so that they can make
decisions based on their knowledge of their user base.  But the ultimate
decision really has to be a programmer choice.

I'm not sure if GCC documentation or glibc documentation would be the
right place for this.  An online resource that can be linked to directly
seems more appropriate.

Apart from that, there is the more limited audience of general purpose
distribution builders.  I expect they will pick one of these levels to
build all the distribution binaries, unless they want to be stuck in
2003.  But as long as they do not choose the highest level defined,
programmers might still want to provide optimized library builds for
run-time selection, and then they need the same guidance as before.

> If it's not merely an implementation detail then names without
> any chance of providing false hints (x86-2014 - oh, it will
> run fine on the CPU I bought in 2015; or, x86-avx2 - ah, of
> course I want avx2) is better.  But this also means this feature
> should come with extensive documentation on how it is
> supposed to be used.  For example we might suggest ISVs
> provide binaries for all architecture levels or use IFUNCs
> or other runtime CPU selection capabilities.

I think we should document the mechanism as best as we can, and provide
intended use cases.  We shouldn't go as far as to tell programmers what
library versions they must build, except that they should always include
a fallback version if no optimized library can be selected.

Describing the interactions with IFUNCs also makes sense.

But I think we should not go overboard with this.  Historically, we have
not done such a great job of documenting toolchain features, I know, and
we should do better now.  I will try to write something helpful, but it
should still match the relative importance of this feature.

> It's also required to provide a (extensive?) list of SKUs that fall
> into the respective categories (probably up to CPU vendors to amend
> those).

I'm afraid SKUs are not very useful in this context.
Virtualization can disable features (e.g., some cloud providers
advertise they use certain SKUs, but some features are not available to
guests), and firmware updates have done so as well.  I think the only
way is to document our selection criteria, and encourage CPU vendors to
enhance their SKU browsers so that you can search by the (lack of)
support for certain CPU features.

The selection criteria I suggested should not be affected by firmware
and microcode updates at least (I took that into consideration), but
it's just not possible to achieve virtualization and kernel version
independence, given that some features based on which we want to make
library selections demand kernel and hypervisor support.

> Since this is a feature crossing multiple projects - at least
> glibc and GCC - sharing the source of said documentation
> would be important.

Technically, the GCC web site would work for me.  It's not a wiki.  It's
not CVS.  We can update it outside of release cycle.  We are not forced
to use the GFDL with Invariant Sections.  It doesn't end up in our
product documentation, where it would be confusing if it discusses
unsupported

Re: New x86-64 micro-architecture levels

2020-07-22 Thread Florian Weimer via Gcc
* Jan Beulich:

> On 21.07.2020 20:04, Florian Weimer wrote:
>> * Premachandra Mallappa:
>> 
>>> [AMD Public Use]
>>>
>>> Hi Floarian,
>>>
>>>> I'm including a proposal for the levels below.  I use single letters for
>>>> them, but I expect that the concrete implementation of this proposal will
>>>> use names like “x86-100”, “x86-101”, like in the glibc patch referenced
>>>> above.  (But we can discuss other approaches.)
>>>
>>> Personally I am not a big fan of this, for 2 reasons 
>>> 1. uses just x86 in name on x86_64 as well
>> 
>> That's deliberate, so that we can use the same x86-* names for 32-bit
>> library selection (once we define matching micro-architecture levels
>> there).
>
> While indeed I did understand it to be deliberate, in the light of
> 64-bit only ISA extensions (like AMX, and I suspect we're going to
> see more) I nevertheless think Premachandra has a point here.

Let me explain how I ended up there.  Maybe I'm wrong.

Previously, I observed that it is difficult to set LD_PRELOAD and
LD_LIBRARY_PATH on combined x86-64/i386 systems, so that the right
libraries are loaded for both variants, and users aren't confused by
dynamic linker warning messages.  On some systems, it is possible to use
dynamic string tokens ($LIB), but not all.

Eventually, it will be possible to add and restrict glibc-hwcaps
subdirectories by setting an environment variable.  The original patch
series only contains ld.so command line options because I wanted to
avoid a discussion about the precise mechanism for setting the
environment variable (current glibc has two approaches).  But the desire
to provide this functionality is there: for adding additional
glibc-hwcaps subdirectories to be searched first, and for restricting
selection to a subset of the built-in (automatically-selected)
subdirectories.

I was worried that we would run into the same problem as with
LD_PRELOAD, where x86-64 and i386 binaries may have different
requirements.  I wanted to minimize the conflict by sharing the names
(eventually, once we have 32-bit variants).

But thinking about this again, I'm not sure if my worry is warranted.
The main selection criteria is still the library load path, and that is
already provided by some different means (e.g. $LIB).  Within the
library path, there is the glibc-hwcaps subdirectory, but since it is
nested under a specific library path subdirectory (determined by the
architecture), adding subdirectories to be searched which do not exist
on the file system, or suppressing directories which would not be
searched in the first place, is not a problem.  The situation is
completely benign and would not warrant any error message from the
dynamic loader.

If this analysis is correct, there is no reason to share the
subdirectory names between x86-64 and i386 binaries, and we can put “64”
somewhere in the x86-64 strings.

The remaining issue is the - vs _ issue.  I think GCC currently uses
“x86-64” in places that are not part of identifiers or target triplets.
Richard mentioned “x86_64-” as a potential choice.  Would it be too
awkward to have ”-march=x86_64-…”?

Thanks,
Florian



Re: Loading plugins with arm-none-eabi-gcc

2020-07-22 Thread Shuai Wang via Gcc
Dear Andrew,

Thanks a lot. Let me make sure I understand the entire picture here. So
basically on my Ubuntu 18.04 x86 machine, I use:

1. gcc (version 10.0; x86) to compile arm-none-eabi-gcc.

2. And also use gcc (version 10.0; x86) to compile the plugin; I tested a
number of x86 applications and the plugin works fine.

3. Right now I want to use arm-none-eabi-gcc to load the plugin and do some
instrumentation on the GIMPLE code of a program, which is going to be
compiled into an ARM binary code.

So your point is that this won't work, am I right? You are expecting to:

1. gcc (version 10.0; x86) to compile arm-none-eabi-gcc.

2. And also use arm-none-eabi-gcc to compile the plugin

3. Use arm-none-eabi-gcc to load the plugin and do some instrumentation on
the GIMPLE code of a program, which is going to be compiled into an ARM
binary code.

Am I right? Then my question is: what binary format do I need to compile
the plugin into at step 2, x86 or ARM?

Best,
Shuai



On Wed, Jul 22, 2020 at 4:20 PM Andrew Pinski  wrote:

> On Wed, Jul 22, 2020 at 12:45 AM Shuai Wang 
> wrote:
> >
> > Hey Andrew,
> >
> > Thanks a lot for getting back to me. No I am not. Let me clarify the
> context here:
> >
> > 1. On my Ubuntu (x86-64 version), I use x86 gcc (version 10.0) to
> compile this plugin, and test this plugin on various programs' GIMPLE code
> during its compilation with x86 gcc (version 10.0).
> >
> > 2. Then, I switched to use arm-none-eabi-gcc to load this plugin, and
> encountered the above issue.
>
> Right because you did not recompile the plugin to use the headers of
> arm-none-eabi-gcc compiler.  You need to recompile the plugin for that
> compiler using the native GCC you compiled the compiler with; that is
> you might need to recompile the compiler too.
> There is no stable plugin API/ABI here and that is what you are running
> into.
>
> Thanks,
> Andrew
>
> >
> > 3. Since I am doing a cross-platform compilation (on Ubuntu x86), I am
> anticipating to NOT directly compile my plugin (as a typical .so shared
> library) into an ARM library, right? Otherwise it cannot be loaded and
> executed on x86 Ubuntu, right?
> >
> > 4. Then it seems to me that the proper way is still to compile an x86
> > plugin, and then somehow have arm-none-eabi-gcc load the plugin during
> > cross-architecture compilation?
> >
> > Best,
> > Shuai
> >
> >
> >
> > On Wed, Jul 22, 2020 at 3:20 PM Andrew Pinski  wrote:
> >>
> >> On Tue, Jul 21, 2020 at 11:25 PM Shuai Wang via Gcc 
> wrote:
> >> >
> >> > Hello,
> >> >
> >> > I am currently trying to migrate a gcc plugin that has been well
> developed
> >> > for x86 code to ARM platform (for arm-none-eabi-gcc).
> >> >
> >> > Currently I did the following steps:
> >> >
> >> > 1. write a hello world program t.c
> >> >
> >> > 2. compile with the following commands:
> >> >
> >> > ➜  arm-none-eabi-gcc -v
> >> >  ..
> >> >  gcc version 9.3.1 20200408 (release) (GNU Arm Embedded
> Toolchain
> >> > 9-2020-q2-update)
> >> >
> >> > ➜  arm-none-eabi-gcc -S -mcpu=cortex-m3 -mthumb -fdump-tree-all
> t.c
> >> >
> >> > It works fine, and can smoothly print out all gimple code at different
> >> > stages.
> >> >
> >> > 3. Load my plugin (the plugin is compiled by x64 gcc version 10.0):
> >> >
> >> > ➜  file instrument_san_cov.so
> >> > instrument_san_cov.so: ELF 64-bit LSB shared object, x86-64, version 1
> >> > (SYSV), dynamically linked, with debug_info, not stripped
> >> > ➜  file arm-none-eabi-gcc
> >> > arm-none-eabi-gcc: ELF 64-bit LSB executable, x86-64, version 1
> (SYSV),
> >> > dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for
> GNU/Linux
> >> > 2.6.24, BuildID[sha1]=fbadd6adc8607f595caeccae919f3bab9df2d7a6,
> stripped
> >> >
> >> > ➜  arm-none-eabi-gcc -fplugin=./instrument_cov.so -S -mcpu=cortex-m3
> >> > -mthumb -fdump-tree-all t.c
> >> > cc1: error: cannot load plugin ./instrument_cov.so
> >> >./instrument_cov.so: undefined symbol:
> >> > _Z20build_string_literaliPKcP9tree_nodem
> >> >
> >> > ➜  c++filt -n _Z20build_string_literaliPKcP9tree_nodem
> >> > build_string_literal(int, char const*, tree_node*, unsigned long)
> >> >
> >> >
> >> > It seems that somewhat a function named `build_string_literal` cannot
> be
> >> > found. Why is that? I have no idea how to proceed on this matter and
> cannot
> >> > find some proper documents. Any suggestion would be appreciated.
> Thank you!
> >>
> >> Did you compile your plugin with the headers from the GCC that you are
> >> using to load the plugin into?
> >> If not, then it won't work.  Note build_string_literal changed between
> >> GCC 9 and GCC 10 in the source and GCC plugin ABI is not stable
> >> between releases at all.
> >>
> >> Thanks,
> >> Andrew
> >>
> >> >
> >> > Best,
> >> > Shuai
>


Re: New x86-64 micro-architecture levels

2020-07-22 Thread Jan Beulich
On 22.07.2020 12:34, Florian Weimer wrote:
> The remaining issue is the - vs _ issue.  I think GCC currently uses
> “x86-64” in places that are not part of identifiers or target triplets.
> Richard mentioned “x86_64-” as a potential choice.  Would it be too
> awkward to have ”-march=x86_64-…”?

Personally I'm advocating for avoiding underscores whenever dashes
can also be used, and whenever they're not needed to distinguish
themselves from dashes (like in target triplets). But this doesn't
make their use "awkward" here of course - it's just my personal
view on it. And maybe, even though the mail was sent _to_ just me, it
was really me you meant to ask ...

Jan


Re: New x86-64 micro-architecture levels

2020-07-22 Thread Richard Biener via Gcc
On Wed, Jul 22, 2020 at 12:16 PM Florian Weimer  wrote:
>
> * Richard Biener:
>
> > On Wed, Jul 22, 2020 at 10:58 AM Florian Weimer via Gcc  
> > wrote:
> >>
> >> * Dongsheng Song:
> >>
> >> > I fully agree these names (100/101, A/B/C/D) are not very intuitive, I
> >> > recommend using isa tags by year (e.g. x64_2010, x64_2014) like the
> >> > python's platform tags (e.g. manylinux2010, manylinux2014).
> >>
> >> I started out with a year number, but that was before there was Level A.
> >> Too many new CPUs only fall under level A unfortunately because they do
> >> not even have AVX.  This even applies to some new server CPU designs
> >> released this year.
> >>
> >> I'm concerned that putting a year into the level name suggests that
> >> everything main-stream released after that year supports that level, and
> >> that's not true.  I think for manylinux, it's different, and it actually
> >> works out there.  No one is building a new GNU/Linux distribution that
> >> is based on glibc 2.12 today, for example.  But not so much for x86
> >> CPUs.
> >>
> >> If you think my worry is unfounded, then a year-based approach sounds
> >> compelling.
> >
> > I think the main question is whether those levels are supposed to be
> > an implementation detail hidden from most software developer or
> > if people are expected to make conscious decisions between
> > -march=x86-100 and -march=x86-101.  Implementation detail
> > for system integrators, that is.
>
> Anyone who wants to optimize their software for something that's more
> current than what was available in 2003 has to think about this in some
> form.
>
> With these levels, I hope to provide a pre-packaged set of choices, with
> a consistent user interface, in the sense that -march= options and file
> system locations match.  Programmers will definitely encounter these
> strings, and they need to know what they mean for their users.  We need
> to provide them with the required information so that they can make
> decisions based on their knowledge of their user base.  But the ultimate
> decision really has to be a programmer choice.
>
> I'm not sure if GCC documentation or glibc documentation would be the
> right place for this.  An online resource that can be linked to directly
> seems more appropriate.
>
> Apart from that, there is the more limited audience of general purpose
> distribution builders.  I expect they will pick one of these levels to
> build all the distribution binaries, unless they want to be stuck in
> 2003.  But as long they do not choose the highest level defined,
> programmers might still want to provide optimized library builds for
> run-time selection, and then they need the same guidance as before.
>
> > If it's not merely an implementation detail then names without
> > any chance of providing false hints (x86-2014 - oh, it will
> > run fine on the CPU I bought in 2015; or, x86-avx2 - ah, of
> > course I want avx2) is better.  But this also means this feature
> > should come with extensive documentation on how it is
> > supposed to be used.  For example we might suggest ISVs
> > provide binaries for all architecture levels or use IFUNCs
> > or other runtime CPU selection capabilities.
>
> I think we should document the mechanism as best as we can, and provide
> intended use cases.  We shouldn't go as far as to tell programmers what
> library versions they must build, except that they should always include
> a fallback version if no optimized library can be selected.
>
> Describing the interactions with IFUNCs also makes sense.
>
> But I think we should not go overboard with this.  Historically, we've
> done not such a great job with documenting toolchain features, I know,
> and we should do better now.  I will try to write something helpful, but
> it should still match the relative importance of this feature.
>
> > It's also required to provide a (extensive?) list of SKUs that fall
> > into the respective categories (probably up to CPU vendors to amend
> > those).
>
> I'm afraid SKUs are not very useful in this context.
> Virtualization can disable features (e.g., some cloud providers
> advertise they use certain SKUs, but some features are not available to
> guests), and firmware updates have done so as well.  I think the only
> way is to document our selection criteria, and encourage CPU vendors to
> enhance their SKU browsers so that you can search by the (lack of)
> support for certain CPU features.
>
> The selection criteria I suggested should not be affected by firmware
> and microcode updates at least (I took that into consideration), but
> it's just not possible to achieve virtualization and kernel version
> independence, given that some features based on which we want to make
> library selections demand kernel and hypervisor support.
>
> > Since this is a feature crossing multiple projects - at least
> > glibc and GCC - sharing the source of said documentation
> > would be important.
>
> Technically, the GCC web site would work for me.
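The runtime-selection mechanism mentioned above can be made concrete with a small example. This is a hedged sketch using GCC's real `ifunc` function attribute and the `__builtin_cpu_supports` builtin on x86-64 GNU/Linux; the two implementations are placeholders (in a real build the fast one would be compiled with, e.g., -mavx2):

```c
/* Minimal GNU IFUNC sketch (GCC on x86-64 GNU/Linux).  */

/* Placeholder implementations; identical here, different in practice.  */
static int add_generic (int a, int b) { return a + b; }
static int add_avx2 (int a, int b) { return a + b; }

/* The resolver runs once, at load time, and returns the implementation
   to bind the "add" symbol to, based on runtime CPU detection.  */
static int (*resolve_add (void)) (int, int)
{
  __builtin_cpu_init ();   /* required before __builtin_cpu_supports
                              in an ifunc resolver */
  return __builtin_cpu_supports ("avx2") ? add_avx2 : add_generic;
}

/* Every call to add() dispatches through the resolver's choice.  */
int add (int a, int b) __attribute__ ((ifunc ("resolve_add")));
```

The same dispatch could of course also be done with a plain function pointer; IFUNC just moves the decision to the dynamic loader so call sites pay no extra indirection in the source.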

Re: New x86-64 micro-architecture levels

2020-07-22 Thread H.J. Lu via Gcc
On Wed, Jul 22, 2020 at 6:50 AM Richard Biener via Libc-alpha
 wrote:
> [...snip...]

RE: New x86-64 micro-architecture levels

2020-07-22 Thread Mallappa, Premachandra
[AMD Public Use]


> That's deliberate, so that we can use the same x86-* names for 32-bit library 
> selection (once we define matching micro-architecture levels there).

Understood.

> If numbers are out, what should we use instead?
> x86-sse4, x86-avx2, x86-avx512?  Would that work?

Yes please; we have to choose something, and the above would be more descriptive.

> Let's merge Level B into level C then?

I would vote for this.

>> Also we would also like to have dynamic loader support for "zen" / 
>> "zen2" as a version of "Level D" and takes preference over Level D, 
>> which may have super-optimized libraries from AMD or other vendors.

> *That* shouldn't be too hard to implement if we can nail down the selection 
> criteria.  Let's call this Zen-specific Level C x86-zen-avx2 for the sake of 
> exposition.

Some way of specifying a superset of "Level C", one that "C" will capture fully.

Zen/Zen2 takes precedence over Level C (but not Level D), and falls back to
"Level C" or "x86-avx2" but not "x86-avx".

I think it is better to run an x86-zen build on an x86-avx2 or x86-avx system
than to fall back to the base x86_64 config.

> With the levels I proposed, these aspects are covered.  But if we start to 
> create vendor-specific forks in the feature progression, things get 
> complicated.
I am not strictly proposing that OS vendors should create/maintain this (it
would be nice if they did), but rather support for loading from a cached,
system-wide-configured directory. This directory may/will contain a subset of
system libs.

> Do you think we need to figure this out in this iteration?  If yes, then I 
> really need a semi-formal description of the selection criteria for this 
> x86-zen-avx2 directory, so that I can pass it along with my psABI proposal.

Preference level (decreasing order) (I can only speak for AMD, others please 
pitch in)
- system wide config to override (in this case x86-zen)
- x86-avx2
- x86-sse4 (or avx, based on how we name and merge Level B)
- default x86_64
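The preference order above amounts to a first-match search over a list of library subdirectories. A toy sketch of that selection logic (directory names illustrative only; this is not how the dynamic loader is actually implemented):

```c
#include <stddef.h>
#include <string.h>

/* Return the first library subdirectory, in decreasing preference
   order, that this CPU supports; fall back to the baseline.  */
static const char *
pick_subdir (const char *const *supported, size_t n_supported,
             const char *override)
{
  const char *order[3];
  size_t n = 0;
  if (override)
    order[n++] = override;           /* e.g. "x86-zen" from system config */
  order[n++] = "x86-avx2";
  order[n++] = "x86-sse4";

  for (size_t i = 0; i < n; i++)
    for (size_t j = 0; j < n_supported; j++)
      if (strcmp (order[i], supported[j]) == 0)
        return order[i];
  return "x86_64";                   /* baseline fallback */
}
```

On a Zen machine with an override configured, the vendor directory wins; without one, the search degrades gracefully through the generic levels.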


Re: Three issues

2020-07-22 Thread Gary Oblock via Gcc
David,

Note, for the first explanation, this is the hash table for the default defs
and not some private pass-specific table, so I'm not directly touching it in
any way. However, that
However, that
doesn't mean some other common or not so common function I'm invoking has
the side effect of doing this in some pathological way (you point this out by 
asking
if they are my temporaries.) I do create temporaries but I certainly make no 
attempt
to add them to the default defs. In fact, I went so far as to instrument the 
code that
adds them to see if I was doing this but I found nothing. This likely means I'm 
causing
a subtle memory corruption or something I'm doing has bad side effects.

The second option (not an explanation) has me diddling some fairly important
stuff that I don't know all that much about, so I prefer to find and fix the
root cause.

Thanks,

Gary




From: David Malcolm 
Sent: Wednesday, July 22, 2020 12:31 AM
To: Gary Oblock ; gcc@gcc.gnu.org 
Subject: Re: Three issues

[EXTERNAL EMAIL NOTICE: This email originated from an external sender. Please 
be mindful of safe email handling and proprietary information protection 
practices.]


On Tue, 2020-07-21 at 22:49 +, Gary Oblock via Gcc wrote:
> Some background:
>
> This is in the dreaded structure reorganization optimization that I'm
> working on. It's running at LTRANS time with '-flto-partition=one'.
>
> My issues in order of importance are:
>
> 1) In gimple-ssa.h, the equal method for ssa_name_hasher
> has a segfault because the "var" field of "a" is (nil).
>
> struct ssa_name_hasher : ggc_ptr_hash <tree_node>
> {
>   /* Hash a tree in a uid_decl_map.  */
>
>   static hashval_t
>   hash (tree item)
>   {
> return item->ssa_name.var->decl_minimal.uid;
>   }
>
>   /* Return true if the DECL_UID in both trees are equal.  */
>
>   static bool
>   equal (tree a, tree b)
>   {
>   return (a->ssa_name.var->decl_minimal.uid ==
>   b->ssa_name.var->decl_minimal.uid);
>   }
> };

I notice that tree.h has:

/* Returns the variable being referenced.  This can be NULL_TREE for
   temporaries not associated with any user variable.
   Once released, this is the only field that can be relied upon.  */
#define SSA_NAME_VAR(NODE)  \
  (SSA_NAME_CHECK (NODE)->ssa_name.var == NULL_TREE \
   || TREE_CODE ((NODE)->ssa_name.var) == IDENTIFIER_NODE   \
   ? NULL_TREE : (NODE)->ssa_name.var)

So presumably that ssa_name_hasher is making an implicit assumption
that such temporaries aren't present in the hash_table; maybe they are
for yours?

Is this a hash_table that you're populating yourself?

With the caveat that I'm sleep-deprived, another way this could happen
is if "a" is not an SSA_NAME but is in fact some other kind of tree;
you could try replacing
  a->ssa_name.var
with
  SSA_NAME_CHECK (a)->ssa_name.var
(and similarly for b)

But the first explanation seems more likely.
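To make the guard in that macro concrete outside GCC, here is a tiny self-contained sketch; the structs are hypothetical stand-ins, not GCC's real tree types:

```c
/* Hypothetical stand-ins for GCC's decl/ssa_name fields.  */
struct var_node { unsigned uid; };
struct ssa_name_node { struct var_node *var; unsigned version; };

/* A guarded hash in the spirit of SSA_NAME_VAR: a temporary may carry
   a null var, so fall back to the SSA version number instead of
   dereferencing var->uid the way the unguarded hasher above does.  */
static unsigned
guarded_hash (const struct ssa_name_node *n)
{
  return n->var ? n->var->uid : n->version;
}
```

A hasher written this way would tolerate temporaries in the table, whereas the real `ssa_name_hasher` assumes they never appear there.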


>
[...snip qn 2...]


> 3) For my bug in (1) I got so distraught that I ran valgrind which
> in my experience is an act of desperation for compilers.
>
> None of the errors it spotted are associated with my optimization
> (although it oh so cleverly pointed out the segfault) however it
> showed the following:
>
> ==18572== Invalid read of size 8
> ==18572==at 0x1079DC1: execute_one_pass(opt_pass*)
> (passes.c:2550)

What is line 2550 of passes.c in your working copy?

> ==18572==by 0x107ABD3: execute_ipa_pass_list(opt_pass*)
> (passes.c:2929)
> ==18572==by 0xAC0E52: symbol_table::compile() (cgraphunit.c:2786)
> ==18572==by 0x9915A9: lto_main() (lto.c:653)
> ==18572==by 0x11EE4A0: compile_file() (toplev.c:458)
> ==18572==by 0x11F1888: do_compile() (toplev.c:2302)
> ==18572==by 0x11F1BA3: toplev::main(int, char**) (toplev.c:2441)
> ==18572==by 0x23C021E: main (main.c:39)
> ==18572==  Address 0x5842880 is 16 bytes before a block of size 88
> alloc'd
> ==18572==at 0x4C3017F: operator new(unsigned long) (in
> /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
> ==18572==by 0x21E00B7: make_pass_ipa_prototype(gcc::context*)
> (ipa-prototype.c:329)

You say above that none of the errors are associated with your
optimization, but presumably this is your new pass, right?  Can you
post the code somewhere?

> ==18572==by 0x106E987:
> gcc::pass_manager::pass_manager(gcc::context*) (pass-
> instances.def:178)
> ==18572==by 0x11EFCE8: general_init(char const*, bool)
> (toplev.c:1250)
> ==18572==by 0x11F1A86: toplev::main(int, char**) (toplev.c:2391)
> ==18572==by 0x23C021E: main (main.c:39)
> ==18572==
>
> Are these known issues with lto or is this a valgrind issue?

Hope this is helpful
Dave



Re: Three issues

2020-07-22 Thread Gary Oblock via Gcc
Richard,

I was really hopeful about your suggestions but I went over my code and
anything that modified anything had a push_cfun and pop_cfun associated with it.

Also, enabling the extra annotations didn't make a difference.

I'm thinking a wolf fence test that scans for malformed default_def hash table
entries is my only recourse at this point.

Thanks,

Gary

From: Richard Biener 
Sent: Wednesday, July 22, 2020 2:32 AM
To: Gary Oblock 
Cc: gcc@gcc.gnu.org 
Subject: Re: Three issues


On Wed, Jul 22, 2020 at 12:51 AM Gary Oblock via Gcc  wrote:
>
> Some background:
>
> This is in the dreaded structure reorganization optimization that I'm
> working on. It's running at LTRANS time with '-flto-partition=one'.
>
> My issues in order of importance are:
>
> 1) In gimple-ssa.h, the equal method for ssa_name_hasher
> has a segfault because the "var" field of "a" is (nil).
>
> struct ssa_name_hasher : ggc_ptr_hash <tree_node>
> {
>   /* Hash a tree in a uid_decl_map.  */
>
>   static hashval_t
>   hash (tree item)
>   {
> return item->ssa_name.var->decl_minimal.uid;
>   }
>
>   /* Return true if the DECL_UID in both trees are equal.  */
>
>   static bool
>   equal (tree a, tree b)
>   {
>   return (a->ssa_name.var->decl_minimal.uid == 
> b->ssa_name.var->decl_minimal.uid);
>   }
> };
>
> The parameter "a" is associated with "*entry" on the 2nd to last
> line shown (it's trimmed off after that.) This from hash-table.h:
>
> template <typename Descriptor, bool Lazy,
>   template <typename Type> class Allocator>
> typename hash_table <Descriptor, Lazy, Allocator>::value_type &
> hash_table <Descriptor, Lazy, Allocator>
> ::find_with_hash (const compare_type &comparable, hashval_t hash)
> {
>   m_searches++;
>   size_t size = m_size;
>   hashval_t index = hash_table_mod1 (hash, m_size_prime_index);
>
>   if (Lazy && m_entries == NULL)
> m_entries = alloc_entries (size);
>
> #if CHECKING_P
>   if (m_sanitize_eq_and_hash)
> verify (comparable, hash);
> #endif
>
>   value_type *entry = &m_entries[index];
>   if (is_empty (*entry)
>   || (!is_deleted (*entry) && Descriptor::equal (*entry, comparable)))
> return *entry;
>   .
>   .
>
> Is there any way this could happen other than by a memory corruption
> of some kind? This is a show stopper for me and I really need some help on
> this issue.
>
> 2) I tried to dump out all the gimple in the following way at the very
> beginning of my program:
>
> void
> print_program ( FILE *file, int leading_space )
> {
>   struct cgraph_node *node;
>   fprintf ( file, "%*sProgram:\n", leading_space, "");
>
>   // Print Global Decls
>   //
>   varpool_node *var;
>   FOR_EACH_VARIABLE ( var)
>   {
> tree decl = var->decl;
> fprintf ( file, "%*s", leading_space, "");
> print_generic_decl ( file, decl, (dump_flags_t)0);
> fprintf ( file, "\n");
>   }
>
>   FOR_EACH_FUNCTION_WITH_GIMPLE_BODY ( node)
>   {
> struct function *func = DECL_STRUCT_FUNCTION ( node->decl);
> dump_function_header ( file, func->decl, (dump_flags_t)0);
> dump_function_to_file ( func->decl, file, (dump_flags_t)0);
>   }
> }
>
> When I run this the first two (out of three) functions print
> just fine. However, for the third, func->decl is (nil) and
> it segfaults.
>
> Now the really odd thing is that this works perfectly at the
> end or middle of my optimization.
>
> What gives?
>
> 3) For my bug in (1) I got so distraught that I ran valgrind which
> in my experience is an act of desperation for compilers.
>
> None of the errors it spotted are associated with my optimization
> (although it oh so cleverly pointed out the segfault) however it
> showed the following:
>
> ==18572== Invalid read of size 8
> ==18572==at 0x1079DC1: execute_one_pass(opt_pass*) (passes.c:2550)
> ==18572==by 0x107ABD3: execute_ipa_pass_list(opt_pass*) (passes.c:2929)
> ==18572==by 0xAC0E52: symbol_table::compile() (cgraphunit.c:2786)
> ==18572==by 0x9915A9: lto_main() (lto.c:653)
> ==18572==by 0x11EE4A0: compile_file() (toplev.c:458)
> ==18572==by 0x11F1888: do_compile() (toplev.c:2302)
> ==18572==by 0x11F1BA3: toplev::main(int, char**) (toplev.c:2441)
> ==18572==by 0x23C021E: main (main.c:39)
> ==18572==  Address 0x5842880 is 16 bytes before a block of size 88 alloc'd
> ==18572==at 0x4C3017F: operator new(unsigned long) (in 
> /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
> ==18572==by 0x21E00B7: make_pass_ipa_prototype(gcc::context*) 
> (ipa-prototype.c:329)
> ==18572==by 0x106E987: gcc::pass_manager::pass_manager(gcc::context*) 
> (pass-instances.def:178)
> ==18572==by 0x11EFCE8: general_init(char const*, bool) (toplev.c:1250)
> ==18572==by 0x11F1A86: toplev::main(int, char**) (toplev.c:2391)
> ==18572==by 0x23C021E: main (main.c:39)
> ==18572==
>
> Are these known issues with lto or is this a valgrind issue?

It smells like you are modifying IL via APIs that rel

Re: Three issues

2020-07-22 Thread Gary Oblock via Gcc
Richard,

My wolf fence failed to detect an issue at the end of my pass
so I'm now hunting for a problem I caused in a following pass.

Your thoughts?

Gary

- Wolf Fence Follows -
int
wf_func ( tree *slot, tree *dummy)
{
  tree t_val = *slot;
  gcc_assert( t_val->ssa_name.var);
  return 0;
}

void
wolf_fence (
Info *info // Pass level global info (might not use it)
  )
{
  struct cgraph_node *node;
  fprintf( stderr,
  "Wolf Fence: Find wolf via gcc_assert(t_val->ssa_name.var)\n");
  FOR_EACH_FUNCTION_WITH_GIMPLE_BODY ( node)
{
  struct function *func = DECL_STRUCT_FUNCTION ( node->decl);
  push_cfun ( func);
  DEFAULT_DEFS ( func)->traverse_noresize < tree *, wf_func> ( NULL);
  pop_cfun ();
}
  fprintf( stderr, "Wolf Fence: Didn't find wolf!\n");
}
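The "wolf fence" strategy above generalizes beyond GCC: sweep the whole data set with an invariant check after each stage, and the first stage whose output trips the check is where the corruption was introduced. A toy sketch of that pattern (generic C, unrelated to GCC's actual types):

```c
#include <stddef.h>

typedef void (*stage_fn) (int *data, size_t n);

/* Run each stage, then check the invariant over all of the data.
   Returns the index of the first guilty stage, or -1 if the invariant
   held throughout.  */
static int
wolf_fence_run (stage_fn *stages, size_t n_stages,
                int *data, size_t n,
                int (*invariant) (const int *, size_t))
{
  for (size_t s = 0; s < n_stages; s++)
    {
      stages[s] (data, n);
      if (!invariant (data, n))
        return (int) s;              /* the wolf is in stage s */
    }
  return -1;
}

/* Example: stage 0 is harmless, stage 1 clobbers an element.  */
static void ok_stage (int *d, size_t n)
{ for (size_t i = 0; i < n; i++) d[i]++; }
static void bad_stage (int *d, size_t n)
{ (void) n; d[0] = -1; }
static int all_nonneg (const int *d, size_t n)
{ for (size_t i = 0; i < n; i++) if (d[i] < 0) return 0; return 1; }
```

Moving the fence check to finer-grained points (per statement instead of per pass) narrows the suspect region further, which is exactly what the DEFAULT_DEFS traversal above does at pass granularity.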

From: Richard Biener 
Sent: Wednesday, July 22, 2020 2:32 AM
To: Gary Oblock 
Cc: gcc@gcc.gnu.org 
Subject: Re: Three issues


On Wed, Jul 22, 2020 at 12:51 AM Gary Oblock via Gcc  wrote:
> [...snip...]

Re: Three issues

2020-07-22 Thread Richard Biener via Gcc
On Thu, Jul 23, 2020 at 5:32 AM Gary Oblock  wrote:
>
> Richard,
>
> My wolf fence failed to detect an issue at the end of my pass
> so I'm now hunting for a problem I caused in a following pass.
>
> Your thoughts?

Sorry - I'd look at the IL after your pass for obvious mistakes.
All default defs need to have a VAR_DECL associated as
SSA_NAME_VAR.

> Gary
>
> - Wolf Fence Follows -
> int
> wf_func ( tree *slot, tree *dummy)
> {
>   tree t_val = *slot;
>   gcc_assert( t_val->ssa_name.var);
>   return 0;
> }
>
> void
> wolf_fence (
> Info *info // Pass level gobal info (might not use it)
>   )
> {
>   struct cgraph_node *node;
>   fprintf( stderr,
>   "Wolf Fence: Find wolf via gcc_assert(t_val->ssa_name.var)\n");
>   FOR_EACH_FUNCTION_WITH_GIMPLE_BODY ( node)
> {
>   struct function *func = DECL_STRUCT_FUNCTION ( node->decl);
>   push_cfun ( func);
>   DEFAULT_DEFS ( func)->traverse_noresize < tree *, wf_func> ( NULL);
>   pop_cfun ();
> }
>   fprintf( stderr, "Wolf Fence: Didn't find wolf!\n");
> }
> 
> From: Richard Biener 
> Sent: Wednesday, July 22, 2020 2:32 AM
> To: Gary Oblock 
> Cc: gcc@gcc.gnu.org 
> Subject: Re: Three issues
>
> [EXTERNAL EMAIL NOTICE: This email originated from an external sender. Please 
> be mindful of safe email handling and proprietary information protection 
> practices.]
>
>
> On Wed, Jul 22, 2020 at 12:51 AM Gary Oblock via Gcc  wrote:
> >
> > Some background:
> >
> > This is in the dreaded structure reorganization optimization that I'm
> > working on. It's running at LTRANS time with '-flto-partition=one'.
> >
> > My issues in order of importance are:
> >
> > 1) In gimple-ssa.h, the equal method for ssa_name_hasher
> > has a segfault because the "var" field of "a" is (nil).
> >
> > struct ssa_name_hasher : ggc_ptr_hash
> > {
> >   /* Hash a tree in a uid_decl_map.  */
> >
> >   static hashval_t
> >   hash (tree item)
> >   {
> > return item->ssa_name.var->decl_minimal.uid;
> >   }
> >
> >   /* Return true if the DECL_UID in both trees are equal.  */
> >
> >   static bool
> >   equal (tree a, tree b)
> >   {
> >   return (a->ssa_name.var->decl_minimal.uid == 
> > b->ssa_name.var->decl_minimal.uid);
> >   }
> > };
> >
> > The parameter "a" is associated with "*entry" on the 2nd to last
> > line shown (it's trimmed off after that.) This from hash-table.h:
> >
> > template<typename Descriptor, bool Lazy,
> >  template<typename Type> class Allocator>
> > typename hash_table<Descriptor, Lazy, Allocator>::value_type &
> > hash_table<Descriptor, Lazy, Allocator>
> > ::find_with_hash (const compare_type &comparable, hashval_t hash)
> > {
> >   m_searches++;
> >   size_t size = m_size;
> >   hashval_t index = hash_table_mod1 (hash, m_size_prime_index);
> >
> >   if (Lazy && m_entries == NULL)
> > m_entries = alloc_entries (size);
> >
> > #if CHECKING_P
> >   if (m_sanitize_eq_and_hash)
> > verify (comparable, hash);
> > #endif
> >
> >   value_type *entry = &m_entries[index];
> >   if (is_empty (*entry)
> >   || (!is_deleted (*entry) && Descriptor::equal (*entry, comparable)))
> > return *entry;
> >   .
> >   .
> >
> > Is there any way this could happen other than by a memory corruption
> > of some kind? This is a show stopper for me and I really need some help on
> > this issue.
> >
> > 2) I tried to dump out all the gimple in the following way at the very
> > beginning of my program:
> >
> > void
> > print_program ( FILE *file, int leading_space )
> > {
> >   struct cgraph_node *node;
> >   fprintf ( file, "%*sProgram:\n", leading_space, "");
> >
> >   // Print Global Decls
> >   //
> >   varpool_node *var;
> >   FOR_EACH_VARIABLE ( var)
> >   {
> > tree decl = var->decl;
> > fprintf ( file, "%*s", leading_space, "");
> > print_generic_decl ( file, decl, (dump_flags_t)0);
> > fprintf ( file, "\n");
> >   }
> >
> >   FOR_EACH_FUNCTION_WITH_GIMPLE_BODY ( node)
> >   {
> > struct function *func = DECL_STRUCT_FUNCTION ( node->decl);
> > dump_function_header ( file, func->decl, (dump_flags_t)0);
> > dump_function_to_file ( func->decl, file, (dump_flags_t)0);
> >   }
> > }
> >
> > When I run this the first two (out of three) functions print
> > just fine. However, for the third, func->decl is (nil) and
> > it segfaults.
> >
> > Now the really odd thing is that this works perfectly at the
> > end or middle of my optimization.
> >
> > What gives?
> >
> > 3) For my bug in (1) I got so distraught that I ran valgrind which
> > in my experience is an act of desperation for compilers.
> >
> > None of the errors it spotted are associated with my optimization
> > (although it oh so cleverly pointed out the segfault) however it
> > showed the following:
> >
> > ==18572== Invalid read of size 8
> > ==18572==at 0x1079DC1: execute_one_pass(opt_pass*) (passes.c:2550)
> > ==18572==by 0x107ABD3: execute_ipa_pass_list(opt_pass*) (passes.c:2929)
> > ==18572==by 0xAC0E52: symbol_table::compile() (cgraphunit.c:2786)
> > ==18572==by 0x9915A9: lto_main() (lto.c:653)
> > ==18572==b

Non-inlined functions and mixed architectures

2020-07-22 Thread Allan Sandfeld Jensen
A problem that I keep running into is functions defined in headers but used in 
source files that are compiled with different CPU feature flags (for runtime 
CPU feature selection).

We know to make sure the functions are inlinable and their address never 
taken, but of course in debug builds they are still not inlined. Every so 
often the functions get compiled using some of the optional CPU instructions, 
and if the linker selects the optimized versions, those instructions can then 
leak into translation units compiled with different CPU flags where the 
instructions aren't supposed to be used. This happens even in unoptimized 
debug builds, as the extended instruction selection doesn't count as an 
optimization.

So far the main workaround for gcc has been to mark the functions as 
always_inline.

I have been wondering if you couldn't use the same technique you used to fix 
similar mixed-arch problems in LTO builds, and tag shared functions with 
their arch so they don't get merged by the linker?

I know the whole thing could technically be seen as an ODR violation, but it 
would still be great if it were something GCC could just handle out of the 
box.

Alternatively, a compile-time option to mark non-inlined inline functions as 
weak, or to not generate them at all when compiling certain files, would also 
work.

Best regards
'Allan