List user branches on git web

2020-07-10 Thread Martin Liška

Hey.

Sometimes it's handy to send somebody a link to a user branch. However, the
current URL is rather long:
https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;a=heads;h=refs/users/marxin/heads

Would it be possible to make some alias? Something like:
https://gcc.gnu.org/git-branches/marxin

Thanks,
Martin


Effect of nested unions and structures in GCC?

2020-07-10 Thread Unidef Defshrizzle
What would happen?


Re: List user branches on git web

2020-07-10 Thread Martin Liška

On 7/10/20 10:19 AM, Martin Liška wrote:

current URL is long as:
https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;a=heads;h=refs/users/marxin/heads


Apparently that page does not list user branches.

So we can at least redirect to a user branch log:
https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;a=log;h=refs/users/marxin/heads/slp-function-v4

So how about a redirection to that log from
https://gcc.gnu.org/git-branches/marxin/heads/slp-function-v4
?
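
To make the proposed mapping concrete, here is a minimal sketch in C of how the short path could be expanded into the long gitweb URL. The helper name expand_user_branch is hypothetical, and the real redirect would presumably live in the web server configuration rather than in C; this only illustrates the string rewrite.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: expand a short path such as
   "marxin/heads/slp-function-v4" into the gitweb log URL quoted above.
   Returns 0 on success, -1 if the path does not have the form
   "<user>/<ref>" or if the output buffer is too small.  */
static int
expand_user_branch (const char *short_path, char *out, size_t out_size)
{
  assert (short_path != NULL && out != NULL);

  /* Require a non-empty user name followed by a non-empty ref.  */
  const char *slash = strchr (short_path, '/');
  if (slash == NULL || slash == short_path || slash[1] == '\0')
    return -1;

  int n = snprintf (out, out_size,
                    "https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;a=log;"
                    "h=refs/users/%s", short_path);
  return (n < 0 || (size_t) n >= out_size) ? -1 : 0;
}
```

Calling it with "marxin/heads/slp-function-v4" fills the buffer with the log URL shown above; a bare "marxin" is rejected because there is no branch to point at.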

Martin


Re: Effect of nested unions and structures in GCC?

2020-07-10 Thread Jonathan Wakely via Gcc
On Fri, 10 Jul 2020 at 09:25, Unidef Defshrizzle  wrote:
>
> What would happen?

This doesn't seem like a question about GCC development, so is
off-topic on this mailing list.

The C language says what should happen, and GCC follows that.
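
As a small illustration of what the language guarantees for nested unions and structures, here is a sketch with made-up type and function names (struct packet, halves_cover_word). It relies only on two rules: all members of a union start at the same address, and reading a union member other than the one last written reinterprets the stored bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A union nested in a struct, with another struct nested in the union.
   The names are hypothetical, for illustration only.  */
struct packet
{
  uint8_t tag;
  union
  {
    uint32_t word;
    struct
    {
      uint16_t lo;
      uint16_t hi;
    } halves;
  } payload;
};

/* All members of a union share the same starting offset.  */
_Static_assert (offsetof (struct packet, payload.word)
                == offsetof (struct packet, payload.halves),
                "union members must overlap");

/* Write the 32-bit member, read it back through the nested struct.
   Which half ends up in 'lo' depends on the byte order of the target,
   so both orders are accepted.  Returns 1 if the halves reassemble
   to VALUE, 0 otherwise.  */
static int
halves_cover_word (uint32_t value)
{
  struct packet p = { .tag = 1, .payload.word = value };
  uint32_t lo = p.payload.halves.lo;
  uint32_t hi = p.payload.halves.hi;
  uint32_t little = lo | (hi << 16);
  uint32_t big = hi | (lo << 16);
  return little == value || big == value;
}
```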


Re: Question about indirect functions and PGO

2020-07-10 Thread Erick Ochoa
Forgot to mention that these functions take a function pointer as a 
parameter and as a result, the specialized functions are able to replace 
the indirect function call with a direct function call.
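
For readers following along, here is a minimal sketch of the kind of code being described. The function names are hypothetical, and apply_n_twice is a hand-written stand-in for what a specialized clone would look like; whether GCC actually creates such a clone depends on its heuristics and on the options used.

```c
#include <assert.h>

static int twice (int x) { return 2 * x; }
static int negate (int x) { return -x; }

/* Generic version: 'op' is only known to the caller, so the call
   inside the loop is an indirect call.  */
static int
apply_n (int (*op) (int), int x, int n)
{
  for (int i = 0; i < n; i++)
    x = op (x);
  return x;
}

/* Hand-written equivalent of a clone specialized for op == twice:
   the function pointer parameter is gone and the indirect call has
   become a direct call.  */
static int
apply_n_twice (int x, int n)
{
  for (int i = 0; i < n; i++)
    x = twice (x);
  return x;
}
```

A call such as apply_n (twice, 3, 4) passes a constant function pointer, which is the situation where a specialized clone like apply_n_twice becomes possible.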


On 10/07/2020 13:17, Erick Ochoa wrote:

Hello,

I'm working on an optimization and I encountered this interesting 
behaviour. There are a couple of functions that are specialized when the 
program is not compiled with PGO (-fprofile-generate and -fprofile-use).


However, when the program is compiled with PGO the compiler does not 
specialize the function calls.


I am printing the program just after materializing all clones.

I am running this version of GCC:
Author: GCC Administrator 
Date:   Fri Jul 10 00:16:28 2020 +

     Daily bump.

I can imagine that the profiling information was used to determine that 
specializing these functions is a bad tradeoff between binary size and 
speed. But I do not know this for sure. How can I find out why these 
functions were not specialized? (I.e. is there a threshold that wasn't 
met, and if so, where is it located and what's its value?)


Thanks!


Question about indirect functions and PGO

2020-07-10 Thread Erick Ochoa

Hello,

I'm working on an optimization and I encountered this interesting 
behaviour. There are a couple of functions that are specialized when the 
program is not compiled with PGO (-fprofile-generate and -fprofile-use).


However, when the program is compiled with PGO the compiler does not 
specialize the function calls.


I am printing the program just after materializing all clones.

I am running this version of GCC:
Author: GCC Administrator 
Date:   Fri Jul 10 00:16:28 2020 +

Daily bump.

I can imagine that the profiling information was used to determine that 
specializing these functions is a bad tradeoff between binary size and 
speed. But I do not know this for sure. How can I find out why these 
functions were not specialized? (I.e. is there a threshold that wasn't 
met, and if so, where is it located and what's its value?)


Thanks!


SRA argument after materialization

2020-07-10 Thread Erick Ochoa

Hello,

Is there a way to determine just how an argument is affected by SRA 
after SRA has occurred? I have the following functions:


source:

_Bool
returnLastField (struct arc anArc)
{
  return anArc.c;
}

_Bool
returnNextField (struct arc anArc)
{
  _Bool *ptr = &(anArc.a);
  ptr = ptr + 1; // accesses anArc.b
  return *ptr;
}


I normally determine field accesses by looking at both COMPONENT_REF and 
MEM_REFs...


Without SRA:

returnLastField (struct arc anArc)
{
   [count: 1]:
  _2 = anArc.c;
  return _2;

}


returnNextField (struct arc anArc)
{
   [count: 1]:
  _2 = MEM[(_Bool *)&anArc + 1B]; // accesses anArc.b
  return _2;

}
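
To spell out how the byte offset in the MEM_REF relates to a field, here is a small sketch. The definition of struct arc is an assumption (three _Bool fields a, b, c; the message does not show it), and field_at_offset and read_bool_at are hypothetical helpers. The read goes through a char pointer plus offsetof, which stays inside the struct object rather than stepping past the end of the single field a.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed layout; the original message does not show the definition.  */
struct arc
{
  _Bool a;
  _Bool b;
  _Bool c;
};

/* Return the name of the field that starts at byte OFFSET,
   or NULL if no field starts there.  */
static const char *
field_at_offset (size_t offset)
{
  if (offset == offsetof (struct arc, a))
    return "a";
  if (offset == offsetof (struct arc, b))
    return "b";
  if (offset == offsetof (struct arc, c))
    return "c";
  return NULL;
}

/* Read the _Bool field that starts at byte OFFSET within *P.
   OFFSET is expected to be the offset of one of the fields.  */
static _Bool
read_bool_at (const struct arc *p, size_t offset)
{
  assert (field_at_offset (offset) != NULL);
  return *(const _Bool *) ((const char *) p + offset);
}
```

With this layout, the "+ 1B" in the dump above corresponds to offsetof (struct arc, b) on targets where _Bool occupies one byte.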


However, SRA obfuscates this.

With SRA:

returnLastField.isra (_Bool ISRA.0)
{
  struct arc anArc;
  struct arc anArc;

   [count: 1]:

   [count: 1]:
  _1 = ISRA.0_2(D);
  return _1;

}

returnNextField.isra (_Bool ISRA.1)
{
  struct arc anArc;
  struct arc anArc;

   [count: 1]:

   [count: 1]:
  _1 = ISRA.1_2(D);
  return _1;

}


How would I be able to determine the effects of ISRA on the struct argument?

Thanks!


Announce: GNU MPFR 4.1.0 is released

2020-07-10 Thread Vincent Lefevre
GNU MPFR 4.1.0 ("épinards à la crème"), a C library for
multiple-precision floating-point computations with correct rounding,
is now available for download from the MPFR web site:

  https://www.mpfr.org/mpfr-4.1.0/

from InriaForge:

  https://gforge.inria.fr/projects/mpfr/

and from the GNU FTP site:

  https://ftp.gnu.org/gnu/mpfr/

Thanks very much to those who sent us bug reports and/or tested the
release candidate.

The SHA1 digests:
877d35a8a81a4d2d9446252e9b4ae944754d8ceb  mpfr-4.1.0.tar.bz2
154a34083cb3a114ed9e687afafea38aa38c8737  mpfr-4.1.0.tar.gz
159c3a58705662bfde4dc93f2617f3660855ead6  mpfr-4.1.0.tar.xz
a48d9e5866a1549ee298357cac7e488ff0dc  mpfr-4.1.0.zip

The SHA256 digests:
feced2d430dd5a97805fa289fed3fc8ff2b094c02d05287fd6133e7f1f0ec926  mpfr-4.1.0.tar.bz2
3127fe813218f3a1f0adf4e8899de23df33b4cf4b4b3831a5314f78e65ffa2d6  mpfr-4.1.0.tar.gz
0c98a3f1732ff6ca4ea690552079da9c597872d30e96ec28414ee23c95558a7f  mpfr-4.1.0.tar.xz
7cc03bcb6db6e7b0f1f8aa5c9c704155b74ba69af139e9b1e859b905618cf88b  mpfr-4.1.0.zip

The signatures:
https://www.mpfr.org/mpfr-4.1.0/mpfr-4.1.0.tar.xz.asc
https://www.mpfr.org/mpfr-4.1.0/mpfr-4.1.0.tar.bz2.asc
https://www.mpfr.org/mpfr-4.1.0/mpfr-4.1.0.tar.gz.asc
https://www.mpfr.org/mpfr-4.1.0/mpfr-4.1.0.zip.asc

Each tarball is signed by Vincent Lefèvre. This can be verified using
the DSA key ID 980C197698C3739D; this key can be retrieved with:

  gpg --recv-keys 980C197698C3739D

or by downloading it from .
The key fingerprint is:

  07F3 DBBE CC1A 3960 5078  094D 980C 1976 98C3 739D

The signatures can be verified with: gpg --verify 
You should check that the key fingerprint matches.

Changes from versions 4.0.* to version 4.1.0:
- The "épinards à la crème" release.
- Binary compatible with MPFR 4.0.*, though some minor changes in the
  behavior of the formatted output functions may be visible, regarded
  as underspecified behavior or bug fixes (see below).
- New --enable-formally-proven-code configure option, to use (when available)
  formally proven code.
- Improved __GMP_CC and __GMP_CFLAGS retrieval (in particular for MS Windows).
- Option -pedantic is now always removed from __GMP_CFLAGS (see INSTALL).
- Changed __float128 to the type _Float128 specified in ISO/IEC TS 18661.
  __float128 is used as a fallback if _Float128 is not supported.
- New function mpfr_get_str_ndigits about conversion to a string of digits.
- New function mpfr_dot for the dot product (incomplete, experimental).
- New functions mpfr_get_decimal128 and mpfr_set_decimal128 (available only
  when MPFR has been built with decimal float support).
- New function mpfr_cmpabs_ui.
- New function mpfr_total_order_p for the IEEE 754 totalOrder predicate.
- The mpfr_out_str function now accepts bases from -2 to -36, in order to
  follow mpfr_get_str and GMP's mpf_out_str functions (these cases gave an
  assertion failure, as with other invalid bases).
- Shared caches: cleanup; really detect lock failures (abort in this case).
- The behavior of the formatted output functions (mpfr_printf, etc.) with
  an empty precision field has improved: trailing zeros are kept in a way
  similar to the formatted output functions from C.
- Improved mpfr_add and mpfr_sub when all operands have a precision equal to
  twice the number of bits per word, e.g., 128 bits on a 64-bit platform.
- Optimized the tuning parameters for various architectures.
- Improved test coverage to 98.6% of code for x86_64.
- Bug fixes.
- MPFR manual: corrected/completed the mpfr_get_str description in order to
  follow the historical behavior and GMP's mpf_get_str function.
- New: optional "make check-exported-symbols", mainly for the MPFR developers
  and binary distributions, to check that MPFR does not define symbols with a
  GMP reserved prefix (experimental).
- Mini-gmp support: replaced --enable-mini-gmp configure option by
  --with-mini-gmp (still experimental, read doc/mini-gmp).
- A GCC bug on Sparc (present at least in old GCC 4.5.3 and 5.5.0 versions),
  which made several tests fail when TLS was enabled, is now avoided in the
  tests. The MPFR library itself was not affected and normal code using the
  MPFR library should not be affected either. Users and distributions that
  disabled TLS just because of the test failures can safely re-enable it.

You can send success and failure reports to , and give
us the canonical system name (by running the "./config.guess" script),
the processor and the compiler version, in order to complete the
"Platforms Known to Support MPFR" section of the MPFR 4.1.0 web page.

Regards,

-- 
Vincent Lefèvre  - Web: 
100% accessible validated (X)HTML - Blog: 
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)


New x86-64 micro-architecture levels

2020-07-10 Thread Florian Weimer via Gcc
Most Linux distributions still compile against the original x86-64
baseline that was based on the AMD K8 (minus the 3DNow! parts, for Intel
EM64T compatibility).

There has been an attempt to use the existing AT_PLATFORM-based loading
mechanism in the glibc dynamic linker to enable a selection of optimized
libraries.  But the general selection mechanism in glibc is problematic:

  hwcaps subdirectory selection in the dynamic loader
  

We also have the problem that the glibc version of "haswell" is distinct
from GCC's -march=haswell (and presumably other compilers):

  Definition of "haswell" platform is inconsistent with GCC 
  

And that the selection criteria are not what people expect:

  Epyc and other current AMD CPUs do not select the "haswell" platform
  subdirectory
  

Since the hwcaps-based selection does not work well regardless of
architecture (even in cases the kernel provides glibc with data), I
worked on a new mechanism that does not have the problems associated
with the old mechanism:

  [PATCH 00/30] RFC: elf: glibc-hwcaps support
  

(Don't be concerned that these patches have not been reviewed; we are
busy preparing the glibc 2.32 release, and these changes do not alter
the glibc ABI itself, so they do not have immediate priority.  I'm
fairly confident that a version of these changes will make it into glibc
2.33, and I hope to backport them into Fedora 33, Fedora 32, and Red Hat
Enterprise Linux 8.4.  Debian as well, but I have never done anything
like it there, so I don't know if the patches will be accepted.)

Out of the box, this should work fairly well for IBM POWER and Z, where
there is a clear progression of silicon versions (at least on paper
—virtualization may blur the picture somewhat).

However, for x86, we do not have such a clear progression of
micro-architecture versions.  This is not just as a result of the
AMD/Intel competition, but also due to ongoing product differentiation
within one chip vendor.  I think we need these levels broadly for the
following reasons:

* Selecting on individual CPU features (similar to the old hwcaps
  mechanism) in glibc has scalability issues, particularly for
  LD_LIBRARY_PATH processing.

* Developers need guidance about useful targets for optimization.  I
  think there is value in limiting the choices, in the sense that “if
  you are able to test three builds in total, these are the things you
  should build”.

* glibc and the compilers should align in their definition of the
  levels, so that developers can use an -march= option to build for a
  particular level that is recognized by glibc.  This is why I think the
  description of the levels should go into the psABI supplement.

* A preference order for these levels avoids falling back to the K8
  baseline if the platform progresses to a new version due to
  glibc/kernel/hypervisor/hardware upgrades.

I'm including a proposal for the levels below.  I use single letters for
them, but I expect that the concrete implementation of this proposal
will use names like “x86-100”, “x86-101”, like in the glibc patch
referenced above.  (But we can discuss other approaches.)

I looked at various machines in the Red Hat labs and talked to Intel and
AMD engineers about this, but this concrete proposal is based on my own
analysis of the situation.  I excluded CPU features related to
cryptography and cache management, including hardware transactional
memory, and CPU timing.  I assume that we will see some of these
features being disabled by the firmware or the kernel over time.  That
would eliminate entire levels from selection, which is not desirable.
For cryptographic code, I expect that localized selection of an
optimized implementation works because such code tends to be isolated
blocks, running for dozens of cycles each time, not something that gets
scattered all over the place by the compiler.

We previously discussed not emitting VZEROUPPER at later levels, but I
don't think this is beneficial because the ABI does not have
callee-saved vector registers, so it can only be useful with local
functions (or whatever LTO considers local), where there is no ABI
impact anyway.

I did not include FSGSBASE because the FS base is already available at
%fs:0.  Changing the FS base in userspace breaks too much, so the main
benefit is the tighter encoding of rdfsbase, which seems very slim.

Not covered in this are tuning decisions.  I think we can benefit from
some variance in this area between implementations; it should not affect
correctness.  32-bit support is also a separate matter.

* Level A

CMPXCHG16B, LAHF/SAHF, POPCNT, SSE3, SSE4.1, SSE4.2, SSSE3

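This is one step above the K8 baseline and corresponds to a mainline CPU
model ca. 2008 to 2011.  It is also implemented by recent-ish
generations of Intel Atom server CPUs (although I haven't tested the
latest version).  A 32-bit variant would have to list many additional
CPU features here.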

Re: Future debug options: -f* or -g*?

2020-07-10 Thread Nathan Sidwell

On 7/9/20 3:28 PM, Fangrui Song via Gcc wrote:

Fix email addresses:)



IMHO the -f ones are misnamed.
-fFOO -> affect generated code (non-target-specific) or language feature
-gFOO -> affect debug info
-mFOO -> machine-specific option

The -fdump options are misnamed, btw. I remember Jeff Law pointed that out after 
Mark Mitchell added the first one.  I'm not sure why we didn't rename it right 
then.  I'll bet there are other -foptions that don't match my comfortable world 
view :)


nathan


On 2020-07-09, Fangrui Song wrote:

Both GCC and Clang have implemented many debugging options under -f and
-g. Whether an option goes under -f or -g appears to be a pretty arbitrary decision.

An incomplete list of the debug options GCC supports is documented at
https://gcc.gnu.org/onlinedocs/gcc/Debugging-Options.html

I think these options belong to 3 categories:

(a) -f* & doesn't imply -g2: -fdebug-types-section, -feliminate-unused-debug-types,
    -fdebug-default-version=5 (this exists in clang purely because -gdwarf-5
    implies -g2 & there is a need to not imply -g2)
(b) -g* & implies -g2: -gsplit-dwarf (I want to move this one to (c), see
    http://lists.llvm.org/pipermail/cfe-dev/2020-July/066202.html ),
    -gdwarf-5, -ggdb, -gstabs
(c) -g* but does not imply -g2: -ggnu-pubnames, -gcolumn-info, -gstrict-dwarf,
    -gz, ...
    (the list appears to be much longer than (b))

((b) isn't very good due to its non-orthogonality. The interaction with -g0, -g1
 and -g3 can be non-obvious sometimes.)

Cary Coutant kindly shared with me his understanding of debug
options (attached at the end). Many established options probably can't
be fixed now. Some are still fixable (-gsplit-dwarf).

This post is mainly about future debug options. Shall we figure out a
convention for future debug options?

Personally I'd prefer (c) but I won't object to (a).
I'd avoid (b).


In retrospect, I regret not naming the option -fsplit-dwarf, which
clearly would not have implied -g, and would have fit in with a few
other dwarf-related -f options. (I don't know whether Richard's
objection to it is because there is already -gsplit-dwarf, or if he
would have objected to it as an alternative-universe spelling.)

At the time, I thought it was fairly common for all/most -g options
(except -g0) to imply -g. Perhaps that wasn't true then or is no
longer true now. If the rest of the community is OK with changing
-gsplit-dwarf to not imply -g, and no one has said it would cause them
any hardship, I'm OK with your proposed change.

I did design it so that you could get the equivalent by simply writing
"-gsplit-dwarf -g0" at the front of the compiler options (e.g., via an
environment variable), so that a subsequent -g would not only turn on
debug but would also enable split-dwarf. We used that fairly regularly
at Google.

Regarding how the build system can discover whether or not split dwarf
is in effect, without parsing all the options presented to gcc, and
without looking for the side effects (the .dwo files), we dodged that
in the Google build system by having a higher-level build flag,
--fission, which would tell the build system to pass -gsplit-dwarf to
gcc AND look for the .dwo files produced on the side. We simply
disallowed having the user pass -gsplit-dwarf directly to the
compiler.

Feel free to share this.



--
Nathan Sidwell


Re: New x86-64 micro-architecture levels

2020-07-10 Thread Joseph Myers
On Fri, 10 Jul 2020, Florian Weimer via Gcc wrote:

> * Level A
> 
> CMPXCHG16B, LAHF/SAHF, POPCNT, SSE3, SSE4.1, SSE4.2, SSSE3
> 
> This is one step above the K8 baseline and corresponds to a mainline CPU
> model ca. 2008 to 2011.  It is also implemented by recent-ish
> generations of Intel Atom server CPUs (although I haven't tested the
> latest version).  A 32-bit variant would have to list many additional
> CPU features here.

FWIW, this is also the limit of what can be run under QEMU emulation, as 
QEMU lacks support for AVX and newer instruction set features.

On the other hand, virtual machines seem liable to report something closer 
to the K8 baseline to the guest OS, missing the level A features, even 
when the underlying hardware supports everything in level B or level C.
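
For anyone who wants to see what a given machine or guest actually reports, here is a minimal sketch that checks the level A feature list quoted above via CPUID. The function names are hypothetical; the bit positions are the documented CPUID ones (leaf 1 ECX for SSE3, SSSE3, CMPXCHG16B, SSE4.1, SSE4.2 and POPCNT; leaf 0x80000001 ECX bit 0 for LAHF/SAHF in 64-bit mode). It assumes a GCC-compatible compiler providing <cpuid.h> and reports "unknown" elsewhere.

```c
#include <assert.h>

#if defined (__x86_64__) && defined (__GNUC__)
#include <cpuid.h>
#endif

/* Feature bits in ECX of CPUID leaf 1, plus the LAHF/SAHF bit in
   ECX of extended leaf 0x80000001.  */
enum
{
  ECX_SSE3 = 1u << 0,
  ECX_SSSE3 = 1u << 9,
  ECX_CMPXCHG16B = 1u << 13,
  ECX_SSE4_1 = 1u << 19,
  ECX_SSE4_2 = 1u << 20,
  ECX_POPCNT = 1u << 23,
  EXT_ECX_LAHF_SAHF = 1u << 0
};

/* Decide level A membership from raw CPUID register values.
   Returns 1 if every listed feature bit is set, 0 otherwise.  */
static int
level_a_from_cpuid_bits (unsigned int leaf1_ecx, unsigned int ext_ecx)
{
  const unsigned int need = ECX_SSE3 | ECX_SSSE3 | ECX_CMPXCHG16B
                            | ECX_SSE4_1 | ECX_SSE4_2 | ECX_POPCNT;
  return (leaf1_ecx & need) == need && (ext_ecx & EXT_ECX_LAHF_SAHF) != 0;
}

/* Query the running CPU.  Returns 1 or 0 as above, or -1 if the
   answer cannot be determined (not x86-64, or CPUID leaf missing).  */
static int
cpu_reports_level_a (void)
{
#if defined (__x86_64__) && defined (__GNUC__)
  unsigned int eax, ebx, ecx, edx, ext_ecx;

  if (!__get_cpuid (1, &eax, &ebx, &ecx, &edx))
    return -1;
  if (!__get_cpuid (0x80000001, &eax, &ebx, &ext_ecx, &edx))
    return -1;
  return level_a_from_cpuid_bits (ecx, ext_ecx);
#else
  return -1;
#endif
}
```

Note that this only shows what CPUID reports to the program; a hypervisor can mask bits, so a guest may see fewer features than the hardware supports.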

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: New x86-64 micro-architecture levels

2020-07-10 Thread H.J. Lu via Gcc
On Fri, Jul 10, 2020 at 10:30 AM Florian Weimer  wrote:
>
> Most Linux distributions still compile against the original x86-64
> baseline that was based on the AMD K8 (minus the 3DNow! parts, for Intel
> EM64T compatibility).
>
> There has been an attempt to use the existing AT_PLATFORM-based loading
> mechanism in the glibc dynamic linker to enable a selection of optimized
> libraries.  But the general selection mechanism in glibc is problematic:
>
>   hwcaps subdirectory selection in the dynamic loader
>   
>
> We also have the problem that the glibc version of "haswell" is distinct
> from GCC's -march=haswell (and presumably other compilers):
>
>   Definition of "haswell" platform is inconsistent with GCC
>   
>
> And that the selection criteria are not what people expect:
>
>   Epyc and other current AMD CPUs do not select the "haswell" platform
>   subdirectory
>   
>
> Since the hwcaps-based selection does not work well regardless of
> architecture (even in cases the kernel provides glibc with data), I
> worked on a new mechanism that does not have the problems associated
> with the old mechanism:
>
>   [PATCH 00/30] RFC: elf: glibc-hwcaps support
>   
>
> (Don't be concerned that these patches have not been reviewed; we are
> busy preparing the glibc 2.32 release, and these changes do not alter
> the glibc ABI itself, so they do not have immediate priority.  I'm
> fairly confident that a version of these changes will make it into glibc
> 2.33, and I hope to backport them into Fedora 33, Fedora 32, and Red Hat
> Enterprise Linux 8.4.  Debian as well, but I have never done anything
> like it there, so I don't know if the patches will be accepted.)
>
> Out of the box, this should work fairly well for IBM POWER and Z, where
> there is a clear progression of silicon versions (at least on paper
> —virtualization may blur the picture somewhat).
>
> However, for x86, we do not have such a clear progression of
> micro-architecture versions.  This is not just as a result of the
> AMD/Intel competition, but also due to ongoing product differentiation
> within one chip vendor.  I think we need these levels broadly for the
> following reasons:
>
> * Selecting on individual CPU features (similar to the old hwcaps
>   mechanism) in glibc has scalability issues, particularly for
>   LD_LIBRARY_PATH processing.
>
> * Developers need guidance about useful targets for optimization.  I
>   think there is value in limiting the choices, in the sense that “if
>   you are able to test three builds in total, these are the things you
>   should build”.
>
> * glibc and the compilers should align in their definition of the
>   levels, so that developers can use an -march= option to build for a
>   particular level that is recognized by glibc.  This is why I think the
>   description of the levels should go into the psABI supplement.
>
> * A preference order for these levels avoids falling back to the K8
>   baseline if the platform progresses to a new version due to
>   glibc/kernel/hypervisor/hardware upgrades.
>
> I'm including a proposal for the levels below.  I use single letters for
> them, but I expect that the concrete implementation of this proposal
> will use names like “x86-100”, “x86-101”, like in the glibc patch
> referenced above.  (But we can discuss other approaches.)
>
> I looked at various machines in the Red Hat labs and talked to Intel and
> AMD engineers about this, but this concrete proposal is based on my own
> analysis of the situation.  I excluded CPU features related to
> cryptography and cache management, including hardware transactional
> memory, and CPU timing.  I assume that we will see some of these
> features being disabled by the firmware or the kernel over time.  That
> would eliminate entire levels from selection, which is not desirable.
> For cryptographic code, I expect that localized selection of an
> optimized implementation works because such code tends to be isolated
> blocks, running for dozens of cycles each time, not something that gets
> scattered all over the place by the compiler.
>
> We previously discussed not emitting VZEROUPPER at later levels, but I
> don't think this is beneficial because the ABI does not have
> callee-saved vector registers, so it can only be useful with local
> functions (or whatever LTO considers local), where there is no ABI
> impact anyway.
>
> I did not include FSGSBASE because the FS base is already available at
> %fs:0.  Changing the FS base in userspace breaks too much, so the main
> benefit is the tighter encoding of rdfsbase, which seems very slim.
>
> Not covered in this are tuning decisions.  I think we can benefit from
> some variance in this area between implementations; it should not affect
> correctness.

gcc-9-20200710 is now available

2020-07-10 Thread GCC Administrator via Gcc
Snapshot gcc-9-20200710 is now available on
  https://gcc.gnu.org/pub/gcc/snapshots/9-20200710/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 9 git branch
with the following options: git://gcc.gnu.org/git/gcc.git branch releases/gcc-9 
revision 00672d956d0e2ac42c34ae17bbcf9b61c5efa2a5

You'll find:

 gcc-9-20200710.tar.xz               Complete GCC

  SHA256=10066601f61efc3732d7f463d930c9c5ae80ebc37a05aa6e27520f84c7183717
  SHA1=927e7e037b8f0976ac42405088783718cb931055

Diffs from 9-20200703 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-9
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.


Accessing result data of target options without Mask or Var properties

2020-07-10 Thread The Other via Gcc
Hi,
How would I access the result data of target options that don't have Mask
or Var properties? For example, how would I access the result ISA string in
the -march option for the RISC-V target?

Here is the relevant option code inside the .opt file:
march=
Target Report RejectNegative Joined
-march= Generate code for given RISC-V ISA (e.g. RV64IM).  ISA strings must be
lower-case.

I assume that this information is stored somewhere, as I read somewhere
online that GCC generates different assembly for different ISA strings,
though I suppose this could be referring to previous versions of GCC. I am
attempting to access this information for use in a target hook.

Thanks,
Theo