Re: [GSOC] LTO dump tool project

2018-06-04 Thread Martin Liška
On 06/01/2018 08:59 PM, Hrishikesh Kulkarni wrote:
> Hi,
> I have pushed the changes to github
> (https://github.com/hrisearch/gcc). Added a command line option for
> specific dumps of variables and functions used in IL e.g.
> -fdump-lto-list=foo will dump:
> Call Graph:
> 
> foo/1 (foo)
>   Type: function
>  visibility: default

Hi.

Thanks for the next step. I've got some comments about it:

- -fdump-lto-list=foo is wrong option name, I would use -fdump-lto-symbol
  or something similar.

- for -fdump-lto-list I would really prefer to use a format similar to nm:
  print a header with column description and then one line for a symbol

- think about mangling/demangling of C++ symbols, you can take a look at
nm it also has --demangle, --no-demangle

- please learn & try to use an autoformat for your editor in order to
  fulfill GNU coding style. Following checker will help you:

$ ./contrib/check_GNU_style.py /tmp/p
=== ERROR type #1: dot, space, space, end of comment (6 error(s)) ===
gcc/lto/lto-dump.c:38:17:/*Dump everything*/
gcc/lto/lto-dump.c:44:41:/*Dump variables and functions used in IL*/
gcc/lto/lto-dump.c:73:50:/*Dump specific variables and functions used in IL*/
gcc/lto/lto.c:3364:19:  /*Dump everything*/
gcc/lto/lto.c:3368:43:  /*Dump variables and functions used in IL*/
gcc/lto/lto.c:3372:52:  /*Dump specific variables and functions used in IL*/

=== ERROR type #2: lines should not exceed 80 characters (11 error(s)) ===
gcc/lto/lto-dump.c:51:80:static const char * const symtab_type_names[] 
= {"symbol", "function", "variable"};
gcc/lto/lto-dump.c:56:80:fprintf (stderr, "\n%s (%s)", 
cnode->dump_asm_name (), cnode->name ());
gcc/lto/lto-dump.c:57:80:fprintf (stderr, "\n  Type: %s", 
symtab_type_names[cnode->type]);
gcc/lto/lto-dump.c:66:80:fprintf (stderr, "\n%s (%s)", 
vnode->dump_asm_name (), vnode->name ());
gcc/lto/lto-dump.c:67:80:fprintf (stderr, "\n  Type: %s", 
symtab_type_names[vnode->type]);
gcc/lto/lto-dump.c:80:80:static const char * const symtab_type_names[] 
= {"symbol", "function", "variable"};
gcc/lto/lto-dump.c:87:80:fprintf (stderr, "\n%s (%s)", 
cnode->dump_asm_name (), cnode->name ());
gcc/lto/lto-dump.c:88:80:fprintf (stderr, "\n  Type: 
%s", symtab_type_names[cnode->type]);
gcc/lto/lto-dump.c:99:80:fprintf (stderr, "\n%s (%s)", 
vnode->dump_asm_name (), vnode->name ());
gcc/lto/lto-dump.c:100:80:fprintf (stderr, "\n  Type: 
%s", symtab_type_names[vnode->type]);
gcc/lto/Make-lang.in:25:80:LTO_OBJS = lto/lto-lang.o lto/lto.o lto/lto-object.o 
attribs.o lto/lto-partition.o lto/lto-symtab.o lto/lto-dump.o

=== ERROR type #3: there should be exactly one space between function name and 
parenthesis (15 error(s)) ===
gcc/lto/lto-dump.c:39:9:void dump()
gcc/lto/lto-dump.c:41:8:fprintf(stderr, "\nHello World!\n");
gcc/lto/lto-dump.c:45:14:void dump_list()
gcc/lto/lto-dump.c:74:15:void dump_list2()
gcc/lto/lto-dump.c:85:13:   if (!strcmp(flag_lto_dump_list2, 
cnode->name()))
gcc/lto/lto-dump.c:97:16:   if (!strcmp(flag_lto_dump_list2, vnode->name()))
gcc/lto/lto-dump.h:23:9:void dump();
gcc/lto/lto-dump.h:24:14:void dump_list();
gcc/lto/lto-dump.h:25:15:void dump_list2();
gcc/lto/lang.opt:67:7:LTO Var(flag_lto_dump)
gcc/lto/lang.opt:71:7:LTO Var(flag_lto_dump_list)
gcc/lto/lang.opt:75:36:LTO Driver RejectNegative Joined Var(flag_lto_dump_list2)
gcc/lto/lto.c:3366:8:dump();
gcc/lto/lto.c:3370:13:dump_list();
gcc/lto/lto.c:3374:14:dump_list2();

=== ERROR type #4: there should be no space before a left square bracket (4 
error(s)) ===
gcc/lto/lto-dump.c:59:19:   visibility_types [DECL_VISIBILITY 
(cnode->decl)]);
gcc/lto/lto-dump.c:69:19:   visibility_types [DECL_VISIBILITY 
(vnode->decl)]);
gcc/lto/lto-dump.c:90:20:   visibility_types 
[DECL_VISIBILITY (cnode->decl)]);
gcc/lto/lto-dump.c:102:20:  visibility_types 
[DECL_VISIBILITY (vnode->decl)]);

=== ERROR type #5: trailing whitespace (5 error(s)) ===
gcc/lto/lto-dump.c:50:0:█
gcc/lto/lto-dump.c:79:0:█
gcc/lto/lto-dump.c:92:2:}█
gcc/lto/lto-dump.c:98:3:{█
gcc/lto/lto-dump.c:105:1:}█

And please try to avoid adding blank lines / remove blank lines in files which 
you don't modify.
Examples: cgraph.c, varpool.c. Note that the checker is not 100% sure, but will 
help you.
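
For reference, a minimal, self-contained example of the conventions the
checker enforces (illustrative only, not taken from the patch under review):
comments ending with a period and two spaces, a space before the call
parenthesis, no space before '[', and lines kept under 80 columns.

#include <stdio.h>

static const char *const symtab_type_names[]
  = { "symbol", "function", "variable" };

/* Dump one entry in GNU style.  */
static void
dump_entry (int type)
{
  printf ("  Type: %s\n", symtab_type_names[type]);
}

int
main (void)
{
  dump_entry (1);
  return 0;
}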

> 
> Regards,
> Hrishikesh
> 
> On Tue, May 29, 2018 at 11:13 PM, Martin Liška  wrote:
>> On 05/29/2018 07:38 PM, Martin Liška wrote:
>>> $ nm main.o
>>>  T main
>>>  T mystring
>>>  C pole
>>
>> Or we can be inspired by readelf:
>>
>> $ readelf -s a.out
>> [snip]
>> Symbol table '.symtab' contains 74 entries:
>>   Num:    Value  Size Type    Bind   Vis      Ndx Name
>>    66: 00601250     0 NOTYPE  GLOBAL DEFAULT   24 _end
>> 

Re: Solaris issues

2018-06-04 Thread pa...@free.fr
Hi

That should be 'adding to valgrind' which I mentioned in the following 
paragraph.

A+
Paul





 Original message 
From: Jonathan Wakely  
Date:2018/06/03  23:34  (GMT+01:00) 
To: Paul Floyd  
Cc: gcc@gcc.gnu.org 
Subject: Re: Solaris issues 

On 2 June 2018 at 11:29, Paul Floyd  wrote:
> Secondly I’ve been doing some work on adding support for C++14 and C++17 
> sized/aligned new and delete operators.

Aren't they already supported? I thought Solaris had aligned_alloc
and/or posix_memalign?


in/out operands and auto-inc-dec

2018-06-04 Thread Paul Koning
The internals manual in its description of the "matching constraint" says that 
it works for cases where the in and out operands are somewhat different, such 
as *p++ vs. *p.  Obviously that is meant to cover post_inc side effects.

The curious thing is that auto-inc-dec.c specifically avoids doing this: if it 
finds what looks like a suitable candidate for auto-inc or auto-dec 
optimization but that operand occurs more than once in the insn, it doesn't 
make the change.  The result is code that's both larger and slower for machines 
that have post_inc etc. addressing modes.  The gccint documentation suggests 
that it was the intent to optimize this case, so I wonder why it is avoided.

paul



Re: in/out operands and auto-inc-dec

2018-06-04 Thread Jeff Law
On 06/04/2018 07:31 AM, Paul Koning wrote:
> The internals manual in its description of the "matching constraint" says 
> that it works for cases where the in and out operands are somewhat different, 
> such as *p++ vs. *p.  Obviously that is meant to cover post_inc side effects.
> 
> The curious thing is that auto-inc-dec.c specifically avoids doing this: if 
> it finds what looks like a suitable candidate for auto-inc or auto-dec 
> optimization but that operand occurs more than once in the insn, it doesn't 
> make the change.  The result is code that's both larger and slower for 
> machines that have post_inc etc. addressing modes.  The gccint documentation 
> suggests that it was the intent to optimize this case, so I wonder why it is 
> avoided.
I wouldn't be terribly surprised if the old flow.c based auto-inc
discovery handled this, but the newer auto-inc-dec.c doesn't.  The docs
were probably written prior to the conversion.

jeff


Re: in/out operands and auto-inc-dec

2018-06-04 Thread Paul Koning



> On Jun 4, 2018, at 9:51 AM, Jeff Law  wrote:
> 
> On 06/04/2018 07:31 AM, Paul Koning wrote:
>> The internals manual in its description of the "matching constraint" says 
>> that it works for cases where the in and out operands are somewhat 
>> different, such as *p++ vs. *p.  Obviously that is meant to cover post_inc 
>> side effects.
>> 
>> The curious thing is that auto-inc-dec.c specifically avoids doing this: if 
>> it finds what looks like a suitable candidate for auto-inc or auto-dec 
>> optimization but that operand occurs more than once in the insn, it doesn't 
>> make the change.  The result is code that's both larger and slower for 
>> machines that have post_inc etc. addressing modes.  The gccint documentation 
>> suggests that it was the intent to optimize this case, so I wonder why it is 
>> avoided.
> I wouldn't be terribly surprised if the old flow.c based auto-inc
> discovery handled this, but the newer auto-inc-dec.c doesn't.  The docs
> were probably written prior to the conversion.

That fits, because there is a reference to "the flow pass of the compiler" when 
these constructs are introduced in section 14.16.

So is this an omission, or is there a reason why that optimization was removed? 
 

paul




Re: in/out operands and auto-inc-dec

2018-06-04 Thread Jeff Law
On 06/04/2018 08:06 AM, Paul Koning wrote:
> 
> 
>> On Jun 4, 2018, at 9:51 AM, Jeff Law  wrote:
>>
>> On 06/04/2018 07:31 AM, Paul Koning wrote:
>>> The internals manual in its description of the "matching constraint" says 
>>> that it works for cases where the in and out operands are somewhat 
>>> different, such as *p++ vs. *p.  Obviously that is meant to cover post_inc 
>>> side effects.
>>>
>>> The curious thing is that auto-inc-dec.c specifically avoids doing this: if 
>>> it finds what looks like a suitable candidate for auto-inc or auto-dec 
>>> optimization but that operand occurs more than once in the insn, it doesn't 
>>> make the change.  The result is code that's both larger and slower for 
>>> machines that have post_inc etc. addressing modes.  The gccint 
>>> documentation suggests that it was the intent to optimize this case, so I 
>>> wonder why it is avoided.
>> I wouldn't be terribly surprised if the old flow.c based auto-inc
>> discovery handled this, but the newer auto-inc-dec.c doesn't.  The docs
>> were probably written prior to the conversion.
> 
> That fits, because there is a reference to "the flow pass of the compiler" 
> when these constructs are introduced in section 14.16.
> 
> So is this an omission, or is there a reason why that optimization was 
> removed?  
I would guess omission, probably on the assumption it wasn't terribly
important and there wasn't really a good way to test it.  There aren't
many targets that use auto-inc getting a lot of attention these days,
and those that do can't have multiple memory operands.

Jeff


Re: Solaris issues

2018-06-04 Thread Rainer Orth
Hi Paul,

> I’ve been having 2 issues with GCC head on Solaris.  Firstly, the build is
> currently broken:
>
> gmake[3]: Leaving directory `/export/home/paulf/scratch/gcc/build'
> Comparing stages 2 and 3
> warning: gcc/cc1obj-checksum.o differs
> Bootstrap comparison failure!
> i386-pc-solaris2.11/amd64/libgcc/avx_resms64_s.o differs
> i386-pc-solaris2.11/amd64/libgcc/sse_resms64_s.o differs
> i386-pc-solaris2.11/amd64/libgcc/sse_resms64.o differs
> [more snipped]
>
> Is there any workaround for this?

this is PR target/85994.  For the moment, you can just touch compare and
resume the build.  Trivial patch forthcoming...

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: Solaris issues

2018-06-04 Thread Rainer Orth
Hi Jonathan,

> On 2 June 2018 at 11:29, Paul Floyd  wrote:
>> Secondly I’ve been doing some work on adding support for C++14 and C++17
>> sized/aligned new and delete operators.
>
> Aren't they already supported? I thought Solaris had aligned_alloc
> and/or posix_memalign?

posix_memalign had already been added in Solaris 11.0, while
aligned_alloc followed in Solaris 11.4.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: in/out operands and auto-inc-dec

2018-06-04 Thread Paul Koning



> On Jun 4, 2018, at 10:09 AM, Jeff Law  wrote:
> 
> On 06/04/2018 08:06 AM, Paul Koning wrote:
>> 
>> 
>>> On Jun 4, 2018, at 9:51 AM, Jeff Law  wrote:
>>> 
>>> On 06/04/2018 07:31 AM, Paul Koning wrote:
 The internals manual in its description of the "matching constraint" says 
 that it works for cases where the in and out operands are somewhat 
 different, such as *p++ vs. *p.  Obviously that is meant to cover post_inc 
 side effects.
 
 The curious thing is that auto-inc-dec.c specifically avoids doing this: 
 if it finds what looks like a suitable candidate for auto-inc or auto-dec 
 optimization but that operand occurs more than once in the insn, it 
 doesn't make the change.  The result is code that's both larger and slower 
 for machines that have post_inc etc. addressing modes.  The gccint 
 documentation suggests that it was the intent to optimize this case, so I 
 wonder why it is avoided.
>>> I wouldn't be terribly surprised if the old flow.c based auto-inc
>>> discovery handled this, but the newer auto-inc-dec.c doesn't.  The docs
>>> were probably written prior to the conversion.
>> 
>> That fits, because there is a reference to "the flow pass of the compiler" 
>> when these constructs are introduced in section 14.16.
>> 
>> So is this an omission, or is there a reason why that optimization was 
>> removed?  
> I would guess omission, probably on the assumption it wasn't terribly
> important and there wasn't really a good way to test it.  There aren't
> many targets that use auto-inc getting a lot of attention these days,
> and those that do can't have multiple memory operands.

By "multiple memory operands" do you mean both source and dest in memory?  Ok, 
but I didn't mean that specifically.  The issue is on an instruction with a 
read/modify/write destination operand, like two-operand add.  If the 
destination looks like a candidate for post-inc, it's skipped because it shows 
up twice in the RTL -- since that uses three operand notation.

For example:

for (int i = 0; i < n; i++)
  *p++ += i;

produces (on pdp11):
add $02, r0
add r1, -02(r0)

rather than simply "add r1, (r0)+".  But if I change the += to =, the expected 
optimization does take place.
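
A self-contained sketch of the two variants for reference (hypothetical
function names; compile at -O2 for a post-increment target): only the plain
store currently gets the post_inc addressing mode.

void
store (int *p, int v, int n)
{
  for (int i = 0; i < n; i++)
    *p++ = v;       /* expected to use "(r0)+" style addressing */
}

void
accumulate (int *p, int v, int n)
{
  for (int i = 0; i < n; i++)
    *p++ += v;      /* currently expanded as add $02,r0 / add r1,-02(r0) */
}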

paul



Re: Project Ranger

2018-06-04 Thread Richard Biener
On Fri, Jun 1, 2018 at 10:38 PM Andrew MacLeod  wrote:
>
> On 06/01/2018 05:48 AM, Richard Biener wrote:
>
> On Wed, May 30, 2018 at 1:53 AM Andrew MacLeod  wrote:

bah, gmail now completely mangles quoting :/

>
> This allows queries for a range on an edge, on entry to a block, as an
> operand on an specific statement, or to calculate the range of the
> result of a statement.  There are no prerequisites to use it, simply
> create a path ranger and start using the API.   There is even an
> available function which can be lightly called and doesn't require
> knowledge of the ranger:
>
> So what the wiki pages do not show is how contextual information
> is handled (and how that's done w/o dominators as the wiki page
> claims).  That is, path_range_on_edge suggests that range information
> provided by the (all) controlling stmts of that edge are used when
> determining the range for 'name'.
>
> the wiki doesn't talk about it, but the document itself goes into great 
> detail of how the dynamic process works.
>
> That's probably true for the 'stmt' variant as well.
>
> This leads me to guess that either the cache has to be
> O(number of SSA uses) size or a simple VRP pass implemented
> using Ranger querying ranges of all SSA ops for each stmt
> will require O(number of SSA uses) times analysis?
>
> The current cache is not particularly efficient, its size is  #ssa_name * 
> #BB, assuming you actually go and look at every basic block, which many 
> passes do not.  It also only puts a range in there when it has to.  I'm not
> sure why it really matters unless we find a pathological case that kills 
> compile time which cannot be addressed.  This would all fall under 
> performance analysis and tweaking.  It's the end performance that matters.
>
> When a  range request is made for NAME in a block, it requests a range on 
> each incoming edge, and then unions those ranges, and caches it. This is the 
> range-on-entry cache.   The next time a range request for  NAME is made for 
> on-entry to that block, it simply picks it up from the cache.
>
> The requests for range on an edge uses the gori_map summary to decide whether 
> that block creates a range for NAME or not.
> If the block does not generate a range, it simply requests the range-on-entry 
> to that block.
> If it does generate a range, then it examines the statements defining the 
> condition, gets ranges for those,  and calculates the range that would be 
> generated on that edge.  It also requests ranges for any of those names in 
> the defining chain which originate outside this block.
>
> So it queries all the names which feed other names and so on. The queries end 
> when a statement is found that doesn't generate a useful range. Sometimes it 
> has to go hunt quite a way, but usually they are nearby.  And caching the 
> range-on-entry values prevents the lookups from doing the same work over and 
> over.
>
> Once a range has been calculated, we don't spend much more time calculating 
> it again.   Any other ssa-name which uses that name also doesn't need to 
> recalculate it.  For the most part, we walk the def chains for any given 
> ssa-name and its feeding names once and future requests pretty much come from 
> the cache.
>
> So yes, there is also a lot of recursion in here at the moment: calls for 
> ranges spawning other calls before they can be resolved.  I suspect sometimes 
> this call chain is deep. I don't know, we haven't performed any analysis on 
> it yet.
>
> One of the advantages of a DOM-style or SSA-with-ASSERT_EXPR
> pass is that the number of evaluations and the size of the cache
> isn't that way "unbound".  On the WIKI page you mention edge
> info isn't cached yet - whatever that means ;)
>
> So I wonder what's the speed for doing
>
> The on-entry caches prevent it from being "unbound", and the intersection
> of incoming edge calculations performs the equivalent merging of ranges that
> DOM-based processing gives you.
>
> if (x > 10)
>A
> else
>B
> C
>
> block A has x [11, MAX]
> block B has x [MIN, 10]
> block C, at the bottom of a diamond, the 2 incoming edges union those 2 
> ranges back to [MIN, MAX] and we know nothing about the range of X again.  
> Accomplished the same thing dominators would tell you.
>
> The edge info that "isnt cached yet", (and may not ever be), is simply the 
> 'static' info calculated by evaluating the few statements at the bottom of 
> the block feeding the condition.  Any values coming into the block are
> cached.  In this case it's simply looking at x > 10 and calculating the range
> on whichever edge.  One of the original design elements called for this to
> be cached, but in practice I don't think it needs to be.  But we haven't
> analyzed the performance yet, so it can only get quicker :-)
>
>
>
>
>   FOR_EACH_BB_FN (bb)
>  FOR_EACH_STMT (stmt)
>FOR_EACH_SSA_USE (stmt)
>path_range_on_stmt (..., SSA-USE, stmt);
>
> assuming that it takes into account contr

Re: Project Ranger

2018-06-04 Thread Richard Biener
On Mon, Jun 4, 2018 at 5:17 PM Richard Biener  wrote:
> On Fri, Jun 1, 2018 at 10:38 PM Andrew MacLeod  wrote:
> >
> > On 06/01/2018 05:48 AM, Richard Biener wrote:
> >
> > On Wed, May 30, 2018 at 1:53 AM Andrew MacLeod  wrote:
>
> bah, gmail now completely mangles quoting :/

Ah no, you sent a multipart mail and it showed the html variant.  Replying in
plain-text mode messes things up then.  And your mail likely got rejected
by the list anyway.

> >
> > This allows queries for a range on an edge, on entry to a block, as an
> > operand on an specific statement, or to calculate the range of the
> > result of a statement.  There are no prerequisites to use it, simply
> > create a path ranger and start using the API.   There is even an
> > available function which can be lightly called and doesn't require
> > knowledge of the ranger:
> >
> > So what the wiki pages do not show is how contextual information
> > is handled (and how that's done w/o dominators as the wiki page
> > claims).  That is, path_range_on_edge suggests that range information
> > provided by the (all) controlling stmts of that edge are used when
> > determining the range for 'name'.
> >
> > the wiki doesn't talk about it, but the document itself goes into great 
> > detail of how the dynamic process works.
> >
> > That's probably true for the 'stmt' variant as well.
> >
> > This leads me to guess that either the cache has to be
> > O(number of SSA uses) size or a simple VRP pass implemented
> > using Ranger querying ranges of all SSA ops for each stmt
> > will require O(number of SSA uses) times analysis?
> >
> > The current cache is not particularly efficient, its size is  #ssa_name * 
> > #BB, assuming you actually go and look at every basic block, which many 
> > passes do not.  It also only puts a range in there when it has to.  I'm
> > not sure why it really matters unless we find a pathological case that 
> > kills compile time which cannot be addressed.  This would all fall under 
> > performance analysis and tweaking.  Its the end performance that matters.  It's the end performance that matters.
> >
> > When a  range request is made for NAME in a block, it requests a range on 
> > each incoming edge, and then unions those ranges, and caches it. This is 
> > the range-on-entry cache.   The next time a range request for  NAME is made 
> > for on-entry to that block, it simply picks it up from the cache.
> >
> > The requests for range on an edge uses the gori_map summary to decide 
> > whether that block creates a range for NAME or not.
> > If the block does not generate a range, it simply requests the 
> > range-on-entry to that block.
> > If it does generate a range, then it examines the statements defining the 
> > condition, gets ranges for those,  and calculates the range that would be 
> > generated on that edge.  It also requests ranges for any of those names in 
> > the defining chain which originate outside this block.
> >
> > So it queries all the names which feed other names and so on. The queries 
> > end when a statement is found that doesn't generate a useful range. 
> > Sometimes it has to go hunt quite a way, but usually they are nearby.  And 
> > caching the range-on-entry values prevents the lookups from doing the same 
> > work over and over.
> >
> > Once a range has been calculated, we don't spend much more time calculating 
> > recalculate it.  For the most part, we walk the def chains for any given 
> > recalculate it.  For the most part, we walk the def chains for any given 
> > ssa-name and its feeding names once and future requests pretty much come 
> > from the cache.
> >
> > So yes, there is also a lot of recursion in here at the moment: calls for 
> > ranges spawning other calls before they can be resolved.  I suspect 
> > sometimes this call chain is deep. I don't know, we haven't performed any 
> > analysis on it yet.
> >
> > One of the advantages of a DOM-style or SSA-with-ASSERT_EXPR
> > pass is that the number of evaluations and the size of the cache
> > isn't that way "unbound".  On the WIKI page you mention edge
> > info isn't cached yet - whatever that means ;)
> >
> > So I wonder what's the speed for doing
> >
> > The on-entry caches prevent it from being "unbound", and the intersection
> > of incoming edge calculations performs the equivalent merging of ranges that
> > DOM-based processing gives you.
> >
> > if (x > 10)
> >A
> > else
> >B
> > C
> >
> > block A has x [11, MAX]
> > block B has x [MIN, 10]
> > block C, at the bottom of a diamond, the 2 incoming edges union those 2 
> > ranges back to [MIN, MAX] and we know nothing about the range of X again.  
> > Accomplished the same thing dominators would tell you.
> >
> > The edge info that "isnt cached yet", (and may not ever be), is simply the 
> > 'static' info calculated by evaluating the few statements at the bottom of 
> > the block feeding the condition.  Any values coming into the block are
> > cached.  In this case it's simply looking at x > 10 and calculating the 

For which gcc release is going to be foreseen the support for the Coroutines TS extension?

2018-06-04 Thread Marco Ippolito
Hi all,

clang and VS2017 already support the Coroutines TS extensions.
For which gcc release is going to be foreseen the support for the
Coroutines TS extension?

Looking forward to your kind feedback about this extremely important aspect
of gcc.

Marco


Re: For which gcc release is going to be foreseen the support for the Coroutines TS extension?

2018-06-04 Thread Jonathan Wakely
On 4 June 2018 at 18:32, Marco Ippolito wrote:
> Hi all,
>
> clang and VS2017 already support the Coroutines TS extensions.
> For which gcc release is going to be foreseen the support for the
> Coroutines TS extension?

This has been discussed recently, search the mailing list.

It will be supported after somebody implements it.


code-gen options for disabling multi-operand AArch64 and ARM instructions

2018-06-04 Thread Laszlo Ersek
Hi!

Apologies if this isn't the right place for asking. For the problem
statement, I'll simply steal Ard's writeup [1]:

> KVM on ARM refuses to decode load/store instructions used to perform
> I/O to emulated devices, and instead relies on the exception syndrome
> information to describe the operand register, access size, etc. This
> is only possible for instructions that have a single input/output
> register (as opposed to ones that increment the offset register, or
> load/store pair instructions, etc). Otherwise, QEMU crashes with the
> following error
>
>   error: kvm run failed Function not implemented
>   [...]
>   QEMU: Terminated
>
> and KVM produces a warning such as the following in the kernel log
>
>   kvm [17646]: load/store instruction decoding not implemented
>
> GCC with LTO enabled will emit such instructions for Mmio[Read|Write]
> invocations performed in a loop, so we need to disable LTO [...]
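
For concreteness, a hedged, self-contained sketch of the pattern involved
(hypothetical names, not the actual EDK2 accessors): a trivial MMIO write
helper that, once inlined into a loop under LTO, may be combined into a
writeback or load/store-pair instruction that KVM cannot decode from the
exception syndrome alone.

#include <stdint.h>

static inline void
mmio_write32 (volatile uint32_t *addr, uint32_t val)
{
  *addr = val;
}

void
clear_region (volatile uint32_t *base, unsigned int count)
{
  for (unsigned int i = 0; i < count; i++)
    mmio_write32 (base + i, 0);   /* may become e.g. "str wzr, [x0], #4" */
}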

We have a Red Hat Bugzilla about the (very likely) same issue [2].

Earlier, we had to work around the same on AArch64 too [3].

Would it be possible to introduce a dedicated -mXXX option, for ARM and
AArch64, that disabled the generation of such multi-operand
instructions?

I note there are several similar instructions (for other architectures):
* -mno-multiple (ppc)
* -mno-fused-madd (ia64)
* -mno-mmx and a lot of friends (x86)

Obviously, if the feature request is deemed justified, we should provide
the exact family of instructions to disable. I'll leave that to others
on the CC list with more ARM/AArch64 expertise; I just wanted to get
this thread started. (Sorry if the option is already being requested
elsewhere; I admit I didn't search the GCC bugzilla.)

Thanks!
Laszlo

[1] https://lists.01.org/pipermail/edk2-devel/2018-June/025476.html
[2] https://bugzilla.redhat.com/show_bug.cgi?id=1576593
[3] https://github.com/tianocore/edk2/commit/2efbf710e27a


Re: [GSOC] LTO dump tool project

2018-06-04 Thread Hrishikesh Kulkarni
Hi,

-fdump-lto-list will dump all the symbol list
-fdump-lto-list -demangle will dump all the list with symbol names demangled
-fdump-lto-symbol=foo will dump details of foo

The output (demangled) will be in tabular form like nm:

Symbol Table
Name        Type        Visibility
printf      function    default
k           variable    default
main        function    default
bar         function    default
foo         function    default

I have tried to format the changes according to the GNU coding style and
added the required methods in symtab_node.

Please find the diff file attached.

Regards,
Hrishikesh

On Mon, Jun 4, 2018 at 2:06 PM, Martin Liška  wrote:
> On 06/01/2018 08:59 PM, Hrishikesh Kulkarni wrote:
>> Hi,
>> I have pushed the changes to github
>> (https://github.com/hrisearch/gcc). Added a command line option for
>> specific dumps of variables and functions used in IL e.g.
>> -fdump-lto-list=foo will dump:
>> Call Graph:
>>
>> foo/1 (foo)
>>   Type: function
>>  visibility: default
>
> Hi.
>
> Thanks for the next step. I've got some comments about it:
>
> - -fdump-lto-list=foo is wrong option name, I would use -fdump-lto-symbol
>   or something similar.
>
> - for -fdump-lto-list I would really prefer to use a format similar to nm:
>   print a header with column description and then one line for a symbol
>
> - think about mangling/demangling of C++ symbols, you can take a look at
> nm it also has --demangle, --no-demangle
>
> - please learn & try to use an autoformat for your editor in order to
>   fulfill GNU coding style. Following checker will help you:
>
> $ ./contrib/check_GNU_style.py /tmp/p
> === ERROR type #1: dot, space, space, end of comment (6 error(s)) ===
> gcc/lto/lto-dump.c:38:17:/*Dump everything*/
> gcc/lto/lto-dump.c:44:41:/*Dump variables and functions used in IL*/
> gcc/lto/lto-dump.c:73:50:/*Dump specific variables and functions used in IL*/
> gcc/lto/lto.c:3364:19:  /*Dump everything*/
> gcc/lto/lto.c:3368:43:  /*Dump variables and functions used in IL*/
> gcc/lto/lto.c:3372:52:  /*Dump specific variables and functions used in IL*/
>
> === ERROR type #2: lines should not exceed 80 characters (11 error(s)) ===
> gcc/lto/lto-dump.c:51:80:static const char * const 
> symtab_type_names[] = {"symbol", "function", "variable"};
> gcc/lto/lto-dump.c:56:80:fprintf (stderr, "\n%s (%s)", 
> cnode->dump_asm_name (), cnode->name ());
> gcc/lto/lto-dump.c:57:80:fprintf (stderr, "\n  Type: %s", 
> symtab_type_names[cnode->type]);
> gcc/lto/lto-dump.c:66:80:fprintf (stderr, "\n%s (%s)", 
> vnode->dump_asm_name (), vnode->name ());
> gcc/lto/lto-dump.c:67:80:fprintf (stderr, "\n  Type: %s", 
> symtab_type_names[vnode->type]);
> gcc/lto/lto-dump.c:80:80:static const char * const 
> symtab_type_names[] = {"symbol", "function", "variable"};
> gcc/lto/lto-dump.c:87:80:fprintf (stderr, "\n%s 
> (%s)", cnode->dump_asm_name (), cnode->name ());
> gcc/lto/lto-dump.c:88:80:fprintf (stderr, "\n  Type: 
> %s", symtab_type_names[cnode->type]);
> gcc/lto/lto-dump.c:99:80:fprintf (stderr, "\n%s 
> (%s)", vnode->dump_asm_name (), vnode->name ());
> gcc/lto/lto-dump.c:100:80:fprintf (stderr, "\n  Type: 
> %s", symtab_type_names[vnode->type]);
> gcc/lto/Make-lang.in:25:80:LTO_OBJS = lto/lto-lang.o lto/lto.o 
> lto/lto-object.o attribs.o lto/lto-partition.o lto/lto-symtab.o lto/lto-dump.o
>
> === ERROR type #3: there should be exactly one space between function name 
> and parenthesis (15 error(s)) ===
> gcc/lto/lto-dump.c:39:9:void dump()
> gcc/lto/lto-dump.c:41:8:fprintf(stderr, "\nHello World!\n");
> gcc/lto/lto-dump.c:45:14:void dump_list()
> gcc/lto/lto-dump.c:74:15:void dump_list2()
> gcc/lto/lto-dump.c:85:13:   if (!strcmp(flag_lto_dump_list2, 
> cnode->name()))
> gcc/lto/lto-dump.c:97:16:   if (!strcmp(flag_lto_dump_list2, 
> vnode->name()))
> gcc/lto/lto-dump.h:23:9:void dump();
> gcc/lto/lto-dump.h:24:14:void dump_list();
> gcc/lto/lto-dump.h:25:15:void dump_list2();
> gcc/lto/lang.opt:67:7:LTO Var(flag_lto_dump)
> gcc/lto/lang.opt:71:7:LTO Var(flag_lto_dump_list)
> gcc/lto/lang.opt:75:36:LTO Driver RejectNegative Joined 
> Var(flag_lto_dump_list2)
> gcc/lto/lto.c:3366:8:dump();
> gcc/lto/lto.c:3370:13:dump_list();
> gcc/lto/lto.c:3374:14:dump_list2();
>
> === ERROR type #4: there should be no space before a left square bracket (4 
> error(s)) ===
> gcc/lto/lto-dump.c:59:19:   visibility_types [DECL_VISIBILITY 
> (cnode->decl)]);
> gcc/lto/lto-dump.c:69:19:   visibility_types [DECL_VISIBILITY 
> (vnode->decl)]);
> gcc/lto/lto-dump.c:90:20:   visibility_types 
> [DECL_VISIBILITY (cnode->decl)]);
> gcc/l

Martin Liska appointed GCOV co-maintainer

2018-06-04 Thread David Edelsohn
I am pleased to announce that the GCC Steering Committee has
appointed Martin Liska as GCOV co-maintainer.

Please join me in congratulating Martin on his new role.
Martin, please update your listing in the MAINTAINERS file.

Happy hacking!
David



Re: Solaris issues

2018-06-04 Thread Paul Floyd



> On 4 Jun 2018, at 16:50, Rainer Orth  wrote:
> 
> Hi Jonathan,
> 
>> On 2 June 2018 at 11:29, Paul Floyd  wrote:
>>> Secondly I’ve been doing some work on adding support for C++14 and C++17
>>> sized/aligned new and delete operators.
>> 
>> Aren't they already supported? I thought Solaris had aligned_alloc
>> and/or posix_memalign?
> 
> posix_memalign had already been added in Solaris 11.0, while
> aligned_alloc followed in Solaris 11.4.
> 

That’s nothing to do with my problem.

I was only talking about adding support for aligned new/delete to Valgrind
(here is the bugzilla entry I created, if you are interested:
https://bugs.kde.org/show_bug.cgi?id=388787)

On Solaris (I’m using 11.3) Valgrind is having problems because of the large 
number of .rodata sections. I haven’t seen this problem on Linux.

Here again are the details:


Currently Valgrind can’t cope with the .rodata sections that get generated (see
bugzilla https://bugs.kde.org/show_bug.cgi?id=390871)

Here’s an extract from the report

--18142-- WARNING: Serious error when reading debug info
--18142-- When reading debug info from 
/export/home/paulf/tools/gcc/lib/libstdc++.so.6.0.25:
--18142-- Can't make sense of .rodata section mapping

The cause appears to be hundreds of Elf32_Shdr (also Elf64_Shdr) with names 
.rodata and/or .rodata..  Sample:
Section Headers:
 [Nr] Name              Type      Addr     Off    Size   ES Flg Lk Inf Al
 [15] .rodata._ZNKSt10l PROGBITS  00093a88 093a88 10     01 AMS  0   0  1
 [16] .rodata._ZNKSt12_ PROGBITS  00093a98 093a98 08     01 AMS  0   0  1
 [17] .rodata._ZNKSt12_ PROGBITS  00093aa0 093aa0 07     01 AMS  0   0  1
 [18] .rodata           PROGBITS  00093ac0 093ac0 003504 00   A  0   0 32
 [19] .rodata._ZTSNSt12 PROGBITS  00096fe0 096fe0 2c     00   A  0   0 32
 [20] .rodata._ZTSNSt12 PROGBITS  00097020 097020 2b     00   A  0   0 32
 [21] .rodata._ZNSt6chr PROGBITS  0009704b 09704b 01     00   A  0   0  1
 [22] .rodata._ZNSt6chr PROGBITS  0009704c 09704c 01     00   A  0   0  1
 [23] .rodata._ZNKSt9ba PROGBITS  0009704d 09704d 0f     01 AMS  0   0  1
 [24] .rodata._ZNKSt16b PROGBITS  0009705c 09705c 16     01 AMS  0   0  1
 [25] .rodata._ZNKSt20b PROGBITS  00097072 097072 1a     01 AMS  0   0  1
 [26] .rodata._ZNKSt8ba PROGBITS  0009708c 09708c 0e     01 AMS  0   0  1
 [27] .rodata._ZNKSt10b PROGBITS  0009709a 09709a 10     01 AMS  0   0  1
 [28] .rodata           PROGBITS  000970ac 0970ac 0002b4 01 AMS  0   0  4
 [29] .rodata._ZNKSt9ex PROGBITS  00097360 097360 0f     01 AMS  0   0  1
and there are 639 more .rodata* Sections in that one file.

Inspection shows that the .rodata* are adjacent after considering alignment.  
Therefore a workaround might be for the debuginfo reader to aggregate them all 
into a single internal .rodata section.  A simple proposed patch is:
https://bugsfiles.kde.org/attachment.cgi?id=110638
The above patch does fix the problem.
Any idea why so many sections are getting generated? (The test case is 
trivially small)


A+
Paul

Re: Martin Liska appointed GCOV co-maintainer

2018-06-04 Thread Martin Liška

On 06/04/2018 08:22 PM, David Edelsohn wrote:

I am pleased to announce that the GCC Steering Committee has
appointed Martin Liska as GCOV co-maintainer.


Hello.

I'm pleased to become co-maintainer of the GCOV infrastructure.
Thank you for the trust.



Please join me in congratulating Martin on his new role.
Martin, please update your listing in the MAINTAINERS file.


I've just installed the patch.

Thanks,
Martin



Happy hacking!
David





Re: [GSOC] LTO dump tool project

2018-06-04 Thread Martin Liška

On 06/04/2018 08:13 PM, Hrishikesh Kulkarni wrote:

Hi,

-fdump-lto-list will dump all the symbol list


I see extra new lines in the output:

$ lto1 -fdump-lto-list main.o
[..snip..]
Symbol Table
Name            Type        Visibility

fwrite/15       function    default

foo/14          function    default

mystring/12     variable    default

pole/11         variable    default

main/13         function    default


-fdump-lto-list -demangle will dump all the list with symbol names demangled


Good for now. Note that the non-demangled version prints function names with
the order suffix (/$number).
I would not print that.


-fdump-lto-symbol=foo will dump details of foo


I would really prefer to use symtab_node::debug for now. It presents all
details about a symbol, unlike the current implementation, which effectively
does: '-fdump-lto-list | grep foo'



The output (demangled) will be in tabular form like nm:

Symbol Table
Name        Type        Visibility
printf      function    default
k           variable    default
main        function    default
bar         function    default
foo         function    default

I have tried to format the changes according to the GNU coding style and
added the required methods in symtab_node.


It's nice that you came up with new symtab_node methods. It's much better.
About the GNU coding style, I still see a few issues:

=== ERROR type #3: there should be no space before a left square bracket (1 
error(s)) ===
gcc/symtab.c:818:26:  return visibility_types [DECL_VISIBILITY (decl)];

=== ERROR type #4: trailing whitespace (6 error(s)) ===
gcc/lto/lto-dump.c:40:4:void█
gcc/lto/lto-dump.c:45:0:█
gcc/lto/lto-dump.c:56:52:   fprintf (stderr, 
"\n%20s",(flag_lto_dump_demangle)█
gcc/lto/lto-dump.c:73:53:   fprintf (stderr, 
"\n%20s",(flag_lto_dump_demangle)█
gcc/lto/lto-dump.c:78:2:}█
gcc/lto/lto-dump.c:79:1:}█

Martin



Please find the diff file attached.

Regards,
Hrishikesh

On Mon, Jun 4, 2018 at 2:06 PM, Martin Liška  wrote:

On 06/01/2018 08:59 PM, Hrishikesh Kulkarni wrote:

Hi,
I have pushed the changes to github
(https://github.com/hrisearch/gcc). Added a command line option for
specific dumps of variables and functions used in IL e.g.
-fdump-lto-list=foo will dump:
Call Graph:

foo/1 (foo)
   Type: function
  visibility: default


Hi.

Thanks for the next step. I've got some comments about it:

- -fdump-lto-list=foo is wrong option name, I would use -fdump-lto-symbol
   or something similar.

- for -fdump-lto-list I would really prefer to use a format similar to nm:
   print a header with column description and then one line for a symbol

- think about mangling/demangling of C++ symbols, you can take a look at
nm it also has --demangle, --no-demangle

- please learn & try to use an autoformat for your editor in order to
   fulfill GNU coding style. Following checker will help you:

$ ./contrib/check_GNU_style.py /tmp/p
=== ERROR type #1: dot, space, space, end of comment (6 error(s)) ===
gcc/lto/lto-dump.c:38:17:/*Dump everything*/
gcc/lto/lto-dump.c:44:41:/*Dump variables and functions used in IL*/
gcc/lto/lto-dump.c:73:50:/*Dump specific variables and functions used in IL*/
gcc/lto/lto.c:3364:19:  /*Dump everything*/
gcc/lto/lto.c:3368:43:  /*Dump variables and functions used in IL*/
gcc/lto/lto.c:3372:52:  /*Dump specific variables and functions used in IL*/

=== ERROR type #2: lines should not exceed 80 characters (11 error(s)) ===
gcc/lto/lto-dump.c:51:80:static const char * const symtab_type_names[] = {"symbol", 
"function", "variable"};
gcc/lto/lto-dump.c:56:80:fprintf (stderr, "\n%s (%s)", 
cnode->dump_asm_name (), cnode->name ());
gcc/lto/lto-dump.c:57:80:fprintf (stderr, "\n  Type: %s", 
symtab_type_names[cnode->type]);
gcc/lto/lto-dump.c:66:80:fprintf (stderr, "\n%s (%s)", 
vnode->dump_asm_name (), vnode->name ());
gcc/lto/lto-dump.c:67:80:fprintf (stderr, "\n  Type: %s", 
symtab_type_names[vnode->type]);
gcc/lto/lto-dump.c:80:80:static const char * const symtab_type_names[] = {"symbol", 
"function", "variable"};
gcc/lto/lto-dump.c:87:80:fprintf (stderr, "\n%s (%s)", 
cnode->dump_asm_name (), cnode->name ());
gcc/lto/lto-dump.c:88:80:fprintf (stderr, "\n  Type: %s", 
symtab_type_names[cnode->type]);
gcc/lto/lto-dump.c:99:80:fprintf (stderr, "\n%s (%s)", 
vnode->dump_asm_name (), vnode->name ());
gcc/lto/lto-dump.c:100:80:fprintf (stderr, "\n  Type: %s", 
symtab_type_names[vnode->type]);
gcc/lto/Make-lang.in:25:80:LTO_OBJS = lto/lto-lang.o lto/lto.o lto/lto-object.o 
at

Re: Solaris issues

2018-06-04 Thread paulf
- Original Message -
> Hi Paul,
> 

[Build issue on Solaris]

> 
> this is PR target/85994.  For the moment, you can just touch compare
> and
> resume the build.  Trivial patch forthcoming...

Hi Rainer

This worked a treat, thanks.

A+
Paul


__builtin_isnormal question

2018-06-04 Thread Steve Ellcey
Is there a bug in __builtin_isnormal or am I just confused as to what it
means?  There doesn't seem to be any actual definition/documentation for
the function.  __builtin_isnormal(0.0) is returning false.  That seems
wrong to me, 0.0 is a normal (as opposed to a denormalized) number isn't
it?  Or is zero special?

Steve Ellcey
sell...@cavium.com

#include <stdio.h>
#include <float.h>
#include <math.h>
int main()
{
double x;
x = 0.0;
printf("%e %e %e\n", x, DBL_MIN, DBL_MAX);
printf("normal is %s\n", __builtin_isnormal(x) ? "TRUE" : "FALSE");
x = 1.0;
printf("%e %e %e\n", x, DBL_MIN, DBL_MAX);
printf("normal is %s\n", __builtin_isnormal(x) ? "TRUE" : "FALSE");
return 0;
}

% gcc x.c -o x
% ./x
0.00e+00 2.225074e-308 1.797693e+308
normal is FALSE
1.00e+00 2.225074e-308 1.797693e+308
normal is TRUE


Re: __builtin_isnormal question

2018-06-04 Thread Andrew Pinski
On Mon, Jun 4, 2018 at 1:44 PM Steve Ellcey  wrote:
>
> Is there a bug in __builtin_isnormal or am I just confused as to what it
> means?  There doesn't seem to be any actual definition/documentation for
> the function.  __builtin_isnormal(0.0) is returning false.  That seems
> wrong to me, 0.0 is a normal (as opposed to a denormalized) number isn't
> it?  Or is zero special?

I think it is just the builtin version of the standard C/C++ function
isnormal which has the following definition:
Determines if the given floating point number arg is normal, i.e. is
neither zero, subnormal, infinite, nor NaN. The macro returns an
integral value.
--- CUT ---
Which means zero is special :).


Thanks,
Andrew

>
> Steve Ellcey
> sell...@cavium.com
>
> #include <stdio.h>
> #include <float.h>
> #include <math.h>
> int main()
> {
> double x;
> x = 0.0;
> printf("%e %e %e\n", x, DBL_MIN, DBL_MAX);
> printf("normal is %s\n", __builtin_isnormal(x) ? "TRUE" : "FALSE");
> x = 1.0;
> printf("%e %e %e\n", x, DBL_MIN, DBL_MAX);
> printf("normal is %s\n", __builtin_isnormal(x) ? "TRUE" : "FALSE");
> return 0;
> }
>
> % gcc x.c -o x
> % ./x
> 0.00e+00 2.225074e-308 1.797693e+308
> normal is FALSE
> 1.00e+00 2.225074e-308 1.797693e+308
> normal is TRUE


aliasing between internal zero-length-arrays and other members

2018-06-04 Thread Martin Sebor

GCC silently (without -Wpedantic) accepts declarations of zero
length arrays that are followed by other members in the same
struct, such as in:

  struct A { char a, b[0], c; };

Is it intended that accesses to elements of such arrays that
alias other members be well-defined?

In my tests, GCC assumes that neither read nor write accesses
to elements of internal zero-length arrays alias other members,
so assuming those aren't bugs I wonder if the documentation
should be updated to make that clear and a warning added for
such declarations (and perhaps also accesses).

For example, the test in the following function is eliminated,
implying that GCC assumes that the access to p->b does not modify
p->c, even though with i set to 0 it would:

  void f (struct A *p, int i)
  {
int x = p->c;
p->b[i] = 1;
if (x != p->c)
  __builtin_abort ();
  }
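
A small driver for the function above, for concreteness (hypothetical
values): with i == 0 the store to p->b[0] lands on p->c, so at -O0 this
would be expected to abort, while at -O2 the eliminated test lets it
return normally.

  int main (void)
  {
    static struct A a;
    a.c = 5;
    f (&a, 0);
    return 0;
  }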

Martin


RE: RISC-V ELF multilibs

2018-06-04 Thread Palmer Dabbelt

On Thu, 31 May 2018 07:23:22 PDT (-0700), matthew.fort...@mips.com wrote:

Palmer Dabbelt  writes:

On Tue, 29 May 2018 11:02:58 PDT (-0700), Jim Wilson wrote:
> On 05/26/2018 06:04 AM, Sebastian Huber wrote:
>> Why is the default multilib and a variant identical?
>
> This is supposed to be a single multilib, with two names.  We use
> MULTILIB_REUSE to map the two names to a single multilib.
>
> rohan:1030$ ./xgcc -B./ -march=rv64imafdc -mabi=lp64d --print-libgcc
> ./rv64imafdc/lp64d/libgcc.a
> rohan:1031$ ./xgcc -B./ -march=rv64gc -mabi=lp64d --print-libgcc
> ./rv64imafdc/lp64d/libgcc.a
> rohan:1032$ ./xgcc -B./ --print-libgcc
> ./libgcc.a
> rohan:1033$
>
> So this is working right when the -march option is given, but not
when
> no -march is given.  I'd suggest a bug report so I can track this, if
> you haven't already filed one.

IIRC this is actually a limit of the GCC build system: there needs to be some
default multilib, and it has to be unprefixed.  I wanted to keep the library
paths orthogonal (ie, not bake in a default that rv64gc/lp64d lives at /lib),
so I chose to just build a redundant multilib.

It'd be great to get rid of this, but I'm afraid it's way past my level of
understanding as to how all this works.


I do actually have a solution for this but it is not submitted upstream.
MIPS has basically the same set of problems that RISC-V does in this area
and in an ideal world there would be no 'fallback' multilib such that if
you use compiler options that map to a library variant that does not
exist then the linker just fails to find any libraries at all rather than
using the default multilib.

I can share the raw patch for this and try to give you some idea about how
it works. I am struggling to find time to do much open source support at
the moment so may not be able to do all the due diligence to get it
committed. Would you be willing to take a look and do some of the work to
get it in tree?


If you have a rough patch that'd be great, it'll be at least a starting point.  
Last time I poked around I wasn't ever sure where to look.


Thanks!


Re: __builtin_isnormal question

2018-06-04 Thread Liu Hao

On 2018/6/5 4:44, Steve Ellcey wrote:

Is there a bug in __builtin_isnormal or am I just confused as to what it
means?  There doesn't seem to be any actual definition/documentation for
the function.  __builtin_isnormal(0.0) is returning false.  That seems
wrong to me, 0.0 is a normal (as opposed to a denormalized) number isn't
it?  Or is zero special?



It is documented here (this block of text is copied from GCC 7 manual):

```
6.59 Other Built-in Functions Provided by GCC
GCC provides built-in versions of the ISO C99 floating-point comparison 
macros ... . In the same fashion, GCC provides fpclassify, isfinite, 
isinf_sign, **isnormal** and signbit built-ins used with __builtin_ 
prefixed. The isinf and isnan built-in functions appear both with and 
without the __builtin_ prefix.

```

ISO C says:
```
Description
2 The isnormal macro determines whether its argument value is normal 
(neither zero, subnormal, infinite, nor NaN). First, an argument 
represented in a format wider than its semantic type is converted to its 
semantic type. Then determination is based on the type of the argument.

```

`isnormal(x)` is roughly equivalent to `fpclassify(x) == FP_NORMAL`. 
When `x` is `0.0`, `fpclassify(x)` yields `FP_ZERO`, so the result is 
`false`.
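
A minimal, self-contained demonstration of that equivalence (plain ISO C,
nothing GCC-specific assumed):

#include <math.h>
#include <stdio.h>

int main (void)
{
  double values[] = { 0.0, 1.0 };
  for (int i = 0; i < 2; i++)
    printf ("%g: fpclassify == FP_NORMAL -> %d, isnormal -> %d\n",
            values[i], fpclassify (values[i]) == FP_NORMAL,
            isnormal (values[i]) != 0);
  return 0;
}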



--
Best regards,
LH_Mouse



Re: code-gen options for disabling multi-operand AArch64 and ARM instructions

2018-06-04 Thread Ard Biesheuvel
On 4 June 2018 at 20:10, Laszlo Ersek  wrote:
> Hi!
>
> Apologies if this isn't the right place for asking. For the problem
> statement, I'll simply steal Ard's writeup [1]:
>
>> KVM on ARM refuses to decode load/store instructions used to perform
>> I/O to emulated devices, and instead relies on the exception syndrome
>> information to describe the operand register, access size, etc. This
>> is only possible for instructions that have a single input/output
>> register (as opposed to ones that increment the offset register, or
>> load/store pair instructions, etc). Otherwise, QEMU crashes with the
>> following error
>>
>>   error: kvm run failed Function not implemented
>>   [...]
>>   QEMU: Terminated
>>
>> and KVM produces a warning such as the following in the kernel log
>>
>>   kvm [17646]: load/store instruction decoding not implemented
>>
>> GCC with LTO enabled will emit such instructions for Mmio[Read|Write]
>> invocations performed in a loop, so we need to disable LTO [...]
>
> We have a Red Hat Bugzilla about the (very likely) same issue [2].
>
> Earlier, we had to work around the same on AArch64 too [3].
>
> Would it be possible to introduce a dedicated -mXXX option, for ARM and
> AArch64, that disabled the generation of such multi-operand
> instructions?
>
> I note there are several similar instructions (for other architectures):
> * -mno-multiple (ppc)
> * -mno-fused-madd (ia64)
> * -mno-mmx and a lot of friends (x86)
>
> Obviously, if the feature request is deemed justified, we should provide
> the exact family of instructions to disable. I'll leave that to others
> on the CC list with more ARM/AArch64 expertise; I just wanted to get
> this thread started. (Sorry if the option is already being requested
> elsewhere; I admit I didn't search the GCC bugzilla.)
>

I am not convinced that tweaking GCC code generation is the correct
approach here, to be honest.

The issue only occurs when load/store instructions trap into KVM,
which (correct me if I am wrong) mostly only occurs when emulating
MMIO. The case I have been looking into (UEFI) uses MMIO accessors
correctly, but due to the way they are implemented (in C), LTO code
generation may result in load/store instructions with multiple outputs
to be used.

So first of all, I would like to understand the magnitude of the
problem. If all cases we can identify involve performing MMIO using C
memory references, I think we should fix the code rather than the
compiler.