PROPOSAL: Extend inline asm syntax with size spec

2018-10-07 Thread Borislav Petkov
Hi people,

this is an attempt to see whether gcc's inline asm heuristic when
estimating inline asm statements' cost for better inlining can be
improved.

AFAIU, the problematic arises when one ends up using a lot of inline
asm statements in the kernel but due to the inline asm cost estimation
heuristic which counts lines, I think, for example like in this here
macro:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/x86/include/asm/cpufeature.h#n162

the resulting code ends up not inlining the functions themselves which
use this macro. I.e., you see a CALL  instead of its body
getting inlined directly.

Even though it should be because the actual instructions are only a
couple in most cases and all those other directives end up in another
section anyway.

The issue is explained below in the forwarded mail in a larger detail
too.

Now, Richard suggested doing something like:

 1) inline asm ("...")
 2) asm ("..." : : : : )
 3) asm ("...") __attribute__((asm_size()));

with which user can tell gcc what the size of that inline asm statement
is and thus allow for more precise cost estimation and in the end better
inlining.

And FWIW 3) looks pretty straight-forward to me because attributes are
pretty common anyways.

But I'm sure there are other options and I'm sure people will have
better/different ideas so feel free to chime in.

Thx.

On Wed, Oct 03, 2018 at 02:30:50PM -0700, Nadav Amit wrote:
> This patch-set deals with an interesting yet stupid problem: kernel code
> that does not get inlined despite its simplicity. There are several
> causes for this behavior: "cold" attribute on __init, different function
> optimization levels; conditional constant computations based on
> __builtin_constant_p(); and finally large inline assembly blocks.
> 
> This patch-set deals with the inline assembly problem. I separated these
> patches from the others (that were sent in the RFC) for easier
> inclusion. I also separated the removal of unnecessary new-lines which
> would be sent separately.
> 
> The problem with inline assembly is that inline assembly is often used
> by the kernel for things that are other than code - for example,
> assembly directives and data. GCC however is oblivious to the content of
> the blocks and assumes their cost in space and time is proportional to
> the number of the perceived assembly "instruction", according to the
> number of newlines and semicolons. Alternatives, paravirt and other
> mechanisms are affected, causing code not to be inlined, and degrading
> compilation quality in general.
> 
> The solution that this patch-set carries for this problem is to create
> an assembly macro, and then call it from the inline assembly block.  As
> a result, the compiler sees a single "instruction" and assigns the more
> appropriate cost to the code.
> 
> To avoid uglification of the code, as many noted, the macros are first
> precompiled into an assembly file, which is later assembled together
> with the C files. This also enables to avoid duplicate implementation
> that was set before for the asm and C code. This can be seen in the
> exception table changes.
> 
> Overall this patch-set slightly increases the kernel size (my build was
> done using my Ubuntu 18.04 config + localyesconfig for the record):
> 
>text  data bss dec hex filename
> 18140829 10224724 2957312 31322865 1ddf2f1 ./vmlinux before
> 18163608 10227348 2957312 31348268 1de562c ./vmlinux after (+0.1%)
> 
> The number of static functions in the image is reduced by 379, but
> actually inlining is even better, which does not always shows in these
> numbers: a function may be inlined causing the calling function not to
> be inlined.
> 
> I ran some limited number of benchmarks, and in general the performance
> impact is not very notable. You can still see >10 cycles shaved off some
> syscalls that manipulate page-tables (e.g., mprotect()), in which
> paravirt caused many functions not to be inlined. In addition this
> patch-set can prevent issues such as [1], and improves code readability
> and maintainability.
> 
> Update: Rasmus recently caused me (inadvertently) to become paranoid
> about the dependencies. To clarify: if any of the headers changes, any c
> file which uses macros that are included in macros.S would be fine as
> long as it includes the header as well (as it should). Adding an
> assertion to check this is done might become slightly ugly, and nobody
> else is concerned about it. Another minor issue is that changes of
> macros.S would not trigger a global rebuild, but that is pretty similar
> to changes of the Makefile that do not trigger a rebuild.
> 
> [1] https://patchwork.kernel.org/patch/10450037/
> 
> v8->v9: * Restoring the '-pipe' parameter (Rasmus)
>   * Adding Kees's tested-by tag (Kees)
> 
> v7->v8:   * Add acks (Masahiro, Max)
>   * Rebase on 4.19 (Ingo)
> 
> v6->v7: * Fix context switch tracking (Ingo)
>   * Fix xtensa build error (Ingo)
> 

PATCH for Re: Broken mirror site

2018-10-07 Thread Gerald Pfeifer
On Tue, 2 Oct 2018, David McCallum wrote:
> At this time, your Canadian mirror site at http://gcc.parentingamerica.com/
> is broken. There is only a page with the text "It works!"... which appears
> to be untrue.

Thanks for letting us know, David!

James, heads up!

For now I committed the patch below.

Gerald

Index: mirrors.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/mirrors.html,v
retrieving revision 1.254
diff -u -r1.254 mirrors.html
--- mirrors.html30 Sep 2018 14:38:46 -  1.254
+++ mirrors.html7 Oct 2018 12:40:08 -
@@ -19,7 +19,6 @@
 
-Canada: http://gcc.parentingamerica.com";>http://gcc.parentingamerica.com, 
thanks to James Miller (jmiller at parentingamerica.com).
 France (no snapshots): ftp://ftp.lip6.fr/pub/gcc/";>ftp.lip6.fr, thanks to ftpmaint at 
lip6.fr
 France, Brittany: ftp://ftp.irisa.fr/pub/mirrors/gcc.gnu.org/gcc/";>ftp.irisa.fr, thanks 
to ftpmaint at irisa.fr
 France, Versailles: ftp://ftp.uvsq.fr/pub/gcc/";>ftp.uvsq.fr, 
thanks to ftpmaint at uvsq.fr


Re: PROPOSAL: Extend inline asm syntax with size spec

2018-10-07 Thread Borislav Petkov
On Sun, Oct 07, 2018 at 08:22:28AM -0500, Segher Boessenkool wrote:
> GCC already estimates the *size* of inline asm, and this is required
> *for correctness*.

I didn't say it didn't - but the heuristic could use improving.

> So I guess the real issue is that the inline asm size estimate for x86
> isn't very good (since it has to be pessimistic, and x86 insns can be
> huge)?

Well, the size thing could be just a "parameter" or "hint" of sorts, to
tell gcc to inline the function X which is inlining the asm statement
into the function Y which is calling function X. If you look at the
patchset, it is moving everything to asm macros where gcc is apparently
able to do better inlining.

> >  3) asm ("...") __attribute__((asm_size()));
> 
> Eww.

Why?

> More precise *size* estimates, yes.  And if the user lies he should not
> be surprised to get assembler errors, etc.

Yes.

Another option would be if gcc parses the inline asm directly and
does a more precise size estimation. Which is a lot more involved and
complicated solution so I guess we wanna look at the simpler ones first.

:-)

> I don't like 2) either.  But 1) looks interesting, depends what its
> semantics would be?  "Don't count this insn's size for inlining decisions",
> maybe?

Or simply "this asm statement has a size of 1" to mean, inline it
everywhere. Which has the same caveats as above.

> Another option is to just force inlining for those few functions where
> GCC currently makes an inlining decision you don't like.  Or are there
> more than a few?

I'm afraid they're more than a few and this should work automatically,
if possible.

Thx.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


[RESEND] PROPOSAL: Extend inline asm syntax with size spec

2018-10-07 Thread Nadav Amit
Second try after being blocked by gcc mailing list:

at 9:09 AM, Nadav Amit mailto:na...@vmware.com>> wrote:

at 2:18 AM, Borislav Petkov mailto:b...@alien8.de>> wrote:

Hi people,

this is an attempt to see whether gcc's inline asm heuristic when
estimating inline asm statements' cost for better inlining can be
improved.

AFAIU, the problematic arises when one ends up using a lot of inline
asm statements in the kernel but due to the inline asm cost estimation
heuristic which counts lines, I think, for example like in this here
macro:

https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgit.kernel.org%2Fpub%2Fscm%2Flinux%2Fkernel%2Fgit%2Ftorvalds%2Flinux.git%2Ftree%2Farch%2Fx86%2Finclude%2Fasm%2Fcpufeature.h%23n162&data=02%7C01%7Cnamit%40vmware.com%7C6db1258c65ea45bbe4ea08d62c35ceec%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C1%7C0%7C636745007006838299&sdata=iehl%2Fb8h%2BZE%2Frqb4qjac19WekSgOObc9%2BM1Jto1VgF4%3D&reserved=0

the resulting code ends up not inlining the functions themselves which
use this macro. I.e., you see a CALL  instead of its body
getting inlined directly.

Even though it should be because the actual instructions are only a
couple in most cases and all those other directives end up in another
section anyway.

The issue is explained below in the forwarded mail in a larger detail
too.

Now, Richard suggested doing something like:

1) inline asm ("...")
2) asm ("..." : : : : )
3) asm ("...") __attribute__((asm_size()));

with which user can tell gcc what the size of that inline asm statement
is and thus allow for more precise cost estimation and in the end better
inlining.

And FWIW 3) looks pretty straight-forward to me because attributes are
pretty common anyways.

But I'm sure there are other options and I'm sure people will have
better/different ideas so feel free to chime in.

Thanks for taking care of it. I would like to mention a second issue, since
you may want to resolve both with a single solution: not inlining
conditional __builtin_constant_p(), in which there are two code-paths - one
for constants and one for variables.

Consider for example the Linux kernel ilog2 macro, which has a condition
based on __builtin_constant_p() (
https://elixir.bootlin.com/linux/v4.19-rc7/source/include/linux/log2.h#L160
). The compiler mistakenly considers the “heavy” code-path that is supposed
to be evaluated only in compilation time to evaluate the code size. This
causes the kernel to consider functions such as kmalloc() as “big”.
kmalloc() is marked with always_inline attribute, so instead the calling
functions, such as kzalloc() are not inlined.

When I thought about hacking gcc to solve this issue, I considered an
intrinsic that would override the cost of a given statement. This solution
is not too nice, but may solve both issues.

In addition, note that AFAIU the impact of a wrong cost of code estimation
can also impact loop and other optimizations.

Regards,
Nadav




Re: PROPOSAL: Extend inline asm syntax with size spec

2018-10-07 Thread Richard Biener
On October 7, 2018 6:09:30 PM GMT+02:00, Nadav Amit  wrote:
>at 2:18 AM, Borislav Petkov  wrote:
>
>> Hi people,
>> 
>> this is an attempt to see whether gcc's inline asm heuristic when
>> estimating inline asm statements' cost for better inlining can be
>> improved.
>> 
>> AFAIU, the problematic arises when one ends up using a lot of inline
>> asm statements in the kernel but due to the inline asm cost
>estimation
>> heuristic which counts lines, I think, for example like in this here
>> macro:
>> 
>>
>https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgit.kernel.org%2Fpub%2Fscm%2Flinux%2Fkernel%2Fgit%2Ftorvalds%2Flinux.git%2Ftree%2Farch%2Fx86%2Finclude%2Fasm%2Fcpufeature.h%23n162&data=02%7C01%7Cnamit%40vmware.com%7C6db1258c65ea45bbe4ea08d62c35ceec%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C1%7C0%7C636745007006838299&sdata=iehl%2Fb8h%2BZE%2Frqb4qjac19WekSgOObc9%2BM1Jto1VgF4%3D&reserved=0
>> 
>> the resulting code ends up not inlining the functions themselves
>which
>> use this macro. I.e., you see a CALL  instead of its body
>> getting inlined directly.
>> 
>> Even though it should be because the actual instructions are only a
>> couple in most cases and all those other directives end up in another
>> section anyway.
>> 
>> The issue is explained below in the forwarded mail in a larger detail
>> too.
>> 
>> Now, Richard suggested doing something like:
>> 
>> 1) inline asm ("...")
>> 2) asm ("..." : : : : )
>> 3) asm ("...") __attribute__((asm_size()));
>> 
>> with which user can tell gcc what the size of that inline asm
>statement
>> is and thus allow for more precise cost estimation and in the end
>better
>> inlining.
>> 
>> And FWIW 3) looks pretty straight-forward to me because attributes
>are
>> pretty common anyways.
>> 
>> But I'm sure there are other options and I'm sure people will have
>> better/different ideas so feel free to chime in.
>
>Thanks for taking care of it. I would like to mention a second issue,
>since
>you may want to resolve both with a single solution: not inlining
>conditional __builtin_constant_p(), in which there are two code-paths -
>one
>for constants and one for variables.
>
>Consider for example the Linux kernel ilog2 macro, which has a
>condition
>based on __builtin_constant_p() (
>https://elixir.bootlin.com/linux/v4.19-rc7/source/include/linux/log2.h#L160
>). The compiler mistakenly considers the “heavy” code-path that is
>supposed
>to be evaluated only in compilation time to evaluate the code size.

But this is a misconception about __builtin_constant_p. It doesn't guard sth 
like 'constexpr' regions. If you try to use it with those semantics you'll fail 
(appearantly you do). 

Of course IPA CP code size estimates when seeing a constant fed to bcp might be 
not optimal, that's another issue of course. 

Richard. 

>This
>causes the kernel to consider functions such as kmalloc() as “big”.
>kmalloc() is marked with always_inline attribute, so instead the
>calling
>functions, such as kzalloc() are not inlined.
>
>When I thought about hacking gcc to solve this issue, I considered an
>intrinsic that would override the cost of a given statement. This
>solution
>is not too nice, but may solve both issues.
>
>In addition, note that AFAIU the impact of a wrong cost of code
>estimation
>can also impact loop and other optimizations.
>
>Regards,
>Nadav



Re: PROPOSAL: Extend inline asm syntax with size spec

2018-10-07 Thread Jeff Law
On 10/7/18 1:06 PM, Nadav Amit wrote:
> at 9:46 AM, Richard Biener  wrote:
> 
>> On October 7, 2018 6:09:30 PM GMT+02:00, Nadav Amit  wrote:
>>> at 2:18 AM, Borislav Petkov  wrote:
>>>
 Hi people,

 this is an attempt to see whether gcc's inline asm heuristic when
 estimating inline asm statements' cost for better inlining can be
 improved.

 AFAIU, the problematic arises when one ends up using a lot of inline
 asm statements in the kernel but due to the inline asm cost
>>> estimation
 heuristic which counts lines, I think, for example like in this here
 macro:
>>> https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgit.kernel.org%2Fpub%2Fscm%2Flinux%2Fkernel%2Fgit%2Ftorvalds%2Flinux.git%2Ftree%2Farch%2Fx86%2Finclude%2Fasm%2Fcpufeature.h%23n162&data=02%7C01%7Cnamit%40vmware.com%7C860403cecb874db64b7e08d62c746f46%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C1%7C0%7C636745275975505381&sdata=Nd0636K9Z1IsUs1RWSRAhVuVboLxlBCB4peiAMfmQzQ%3D&reserved=0
 the resulting code ends up not inlining the functions themselves
>>> which
 use this macro. I.e., you see a CALL  instead of its body
 getting inlined directly.

 Even though it should be because the actual instructions are only a
 couple in most cases and all those other directives end up in another
 section anyway.

 The issue is explained below in the forwarded mail in a larger detail
 too.

 Now, Richard suggested doing something like:

 1) inline asm ("...")
 2) asm ("..." : : : : )
 3) asm ("...") __attribute__((asm_size()));

 with which user can tell gcc what the size of that inline asm
>>> statement
 is and thus allow for more precise cost estimation and in the end
>>> better
 inlining.

 And FWIW 3) looks pretty straight-forward to me because attributes
>>> are
 pretty common anyways.

 But I'm sure there are other options and I'm sure people will have
 better/different ideas so feel free to chime in.
>>>
>>> Thanks for taking care of it. I would like to mention a second issue,
>>> since
>>> you may want to resolve both with a single solution: not inlining
>>> conditional __builtin_constant_p(), in which there are two code-paths -
>>> one
>>> for constants and one for variables.
>>>
>>> Consider for example the Linux kernel ilog2 macro, which has a
>>> condition
>>> based on __builtin_constant_p() (
>>> https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Felixir.bootlin.com%2Flinux%2Fv4.19-rc7%2Fsource%2Finclude%2Flinux%2Flog2.h%23L160&data=02%7C01%7Cnamit%40vmware.com%7C860403cecb874db64b7e08d62c746f46%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C1%7C0%7C636745275975515386&sdata=Hk39Za9%2FxcFyK0sGENB24d6QySjsDGzF%2FwqjnUEMiGk%3D&reserved=0
>>> ). The compiler mistakenly considers the “heavy” code-path that is
>>> supposed
>>> to be evaluated only in compilation time to evaluate the code size.
>>
>> But this is a misconception about __builtin_constant_p. It doesn't guard sth 
>> like 'constexpr' regions. If you try to use it with those semantics you'll 
>> fail (appearantly you do). 
>>
>> Of course IPA CP code size estimates when seeing a constant fed to bcp might 
>> be not optimal, that's another issue of course. 
> 
> I understand that this is might not be the right way to implement macros
> such as ilog2() and test_bit(), but this code is around for some time.
That doesn't make it right -- and there's been numerous bogus bugs
reported against ilog2 because the authors of ilog2 haven't had a clear
understanding of the semantics of builtin_constant_p.


Jeff


gcc-9-20181007 is now available

2018-10-07 Thread gccadmin
Snapshot gcc-9-20181007 is now available on
  ftp://gcc.gnu.org/pub/gcc/snapshots/9-20181007/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 9 SVN branch
with the following options: svn://gcc.gnu.org/svn/gcc/trunk revision 264906

You'll find:

 gcc-9-20181007.tar.xzComplete GCC

  SHA256=a8c3588b5c09d66d884d18d5a8749cb087b9ff5c2adfede062abf455729d1123
  SHA1=1148b942ec4205689459c041b93b6e77f88b22f2

Diffs from 9-20180930 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-9
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.