Re: gcc: -ftest-coverage and -auxbase
On 6/18/19 11:51 PM, david.tay...@dell.com wrote:
>> From: Martin Liška
>> Sent: Tuesday, June 18, 2019 11:20 AM
>>
>> .gcno files are created during compilation and contain info about a source
>> file.  These files will be created by a cross compiler, so that's fine.
>>
>> During a run of a program a .gcda file is created.  It contains information
>> about the number of executions of edges.  These files are dumped during
>> at_exit by an instrumented application, and the content is stored to disk
>> (.gcda extension).
>>
>> So what difficulties do you have with that please?
>>
>> Martin
>
> Not sure I understand the question.
>
> Conceptually I don't have any problems with the compiler
> creating .gcno files at compile time and the program creating
> .gcda files at run-time.
>
> As far as the .gcda files go, exit is never called -- it's an embedded
> operating system.  The kernel does not call exit.  Application
> specific glue code will need to be written.  This is to be expected.
> And is completely reasonable.

Yep, then call __gcov_dump at a place where you want to finish instrumentation:
https://gcc.gnu.org/onlinedocs/gcc/Gcov-and-Optimization.html

>
> As far as the .gcno files go -- currently, while doing over 10,000
> compiles GCC wants to write all the .gcno files to the same file
> name in the same NFS mounted directory.  This is simultaneously
> not useful and very very slow.

Please take a look at the attached patch, which will allow you to do:

./gcc/xgcc -Bgcc /tmp/main.c --coverage -fprofile-note-dir=/tmp/

$ ls -l /tmp/main.gcno
-rw-r--r-- 1 marxin users 228 Jun 19 09:18 /tmp/main.gcno

Is the suggested patch working for you?
Martin

>
> Down the road I'm going to want to make additional changes --
> for example, putting the instrumentation data into a section
> specified on the command line rather than .data.
>
> Right now I'm concerned about the .gcno files.  I want to be
> able to specify the pathname or the base of the pathname
> on the command line.  I don't really care whether it is called
> -auxbase or something else.  I was thinking '-auxbase' as that
> is the name currently passed to the sub-processes.  I do not
> ultimately care what the name is...
>
> Additionally, if we do this I want it to be done in a manner
> that when contributed back is likely to be accepted.

diff --git a/gcc/common.opt b/gcc/common.opt
index a1544d06824..d382e70317d 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -2096,6 +2096,10 @@ Common Joined RejectNegative Var(profile_data_prefix)
 Set the top-level directory for storing the profile data.
 The default is 'pwd'.
 
+fprofile-note-dir=
+Common Joined RejectNegative Var(profile_note_prefix)
+Set the top-level directory for storing the profile note file.
+
 fprofile-correction
 Common Report Var(flag_profile_correction)
 Enable correction of flow inconsistent profile data input.
diff --git a/gcc/coverage.c b/gcc/coverage.c
index 1ffefd5f482..ea7b258d9dd 100644
--- a/gcc/coverage.c
+++ b/gcc/coverage.c
@@ -1204,6 +1204,12 @@ coverage_init (const char *filename)
   int len = strlen (filename);
   int prefix_len = 0;
 
+#if HAVE_DOS_BASED_FILE_SYSTEM
+  const char *separator = "\\";
+#else
+  const char *separator = "/";
+#endif
+
   /* Since coverage_init is invoked very early, before the pass manager,
      we need to set up the dumping explicitly.  This is similar to
      the handling in finish_optimization_passes.  */
@@ -1217,11 +1223,6 @@ coverage_init (const char *filename)
      of filename in order to prevent file path clashing.  */
   if (profile_data_prefix)
     {
-#if HAVE_DOS_BASED_FILE_SYSTEM
-      const char *separator = "\\";
-#else
-      const char *separator = "/";
-#endif
       filename = concat (getpwd (), separator, filename, NULL);
       filename = mangle_path (filename);
       len = strlen (filename);
@@ -1259,6 +1260,9 @@ coverage_init (const char *filename)
   memcpy (bbg_file_name, filename, len);
   strcpy (bbg_file_name + len, GCOV_NOTE_SUFFIX);
 
+  if (profile_note_prefix)
+    bbg_file_name = concat (profile_note_prefix, separator, bbg_file_name, NULL);
+
   if (!gcov_open (bbg_file_name, -1))
     {
       error ("cannot open %s", bbg_file_name);
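
To make the __gcov_dump suggestion concrete for the embedded case described
above, the application-specific glue can be a single hook.  A minimal sketch
follows; the hook name and where the kernel calls it are made up for
illustration, while __gcov_dump and __gcov_reset are the interfaces from the
documentation linked above:

/* Sketch of application glue for a kernel that never calls exit().
   my_shutdown_hook() is hypothetical; __gcov_dump() and __gcov_reset()
   are the routines documented in the gcov manual.  */

void __gcov_dump (void);   /* flush counters to the .gcda files */
void __gcov_reset (void);  /* zero counters, e.g. to start a new interval */

void
my_shutdown_hook (void)
{
  /* Write out all .gcda data collected so far.  */
  __gcov_dump ();

  /* Optionally keep running and collect a fresh profile afterwards.  */
  __gcov_reset ();
}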
[RFC] zstd as a compression algorithm for LTO
Hi.

I've written a patch draft that replaces zlib with the zstd compression
algorithm ([1]) in LTO.  I'm also sending statistics that were collected for
a couple of quite big C++ source files.  Observations I made:

- LTO stream compression takes 3-4% of LGEN compile time
- zstd at the default compression level (3) generated slightly smaller LTO
  elf files
- zstd compression is 4-8x faster
- decompression time is quite negligible, but for a bigger project (godot)
  I see a reduction from 1.37 to 0.53 seconds
- the ZSTD API is much simpler to use

Suggestions based on the observations:
- I would suggest making zstd optional (--enable-zstd); one would then
  use #include <zstd.h> + -lzstd
- I like the default level, as we mainly want to speed up LTO compilation
- we can provide an option to control the algorithm
  (-flto-compression-algorithm), similarly to -flto-compression-level
- we can discuss possible compression of the LTO bytecode that is distributed
  between the WPA stage and the individual LTRANS phases.

Thoughts?
Thanks,
Martin

[1] https://github.com/facebook/zstd

From 4939e90b2a8051128b7b2b0214a5fad5183f3bca Mon Sep 17 00:00:00 2001
From: Martin Liska
Date: Wed, 19 Jun 2019 09:40:35 +0200
Subject: [PATCH] Replace zlib with zstd.

---
 gcc/Makefile.in    |   2 +-
 gcc/common.opt     |   2 +-
 gcc/lto-compress.c | 161 -
 gcc/timevar.def    |   4 +-
 4 files changed, 33 insertions(+), 136 deletions(-)

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index d9e0885b96b..8aedcccb717 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -373,7 +373,7 @@ OUTPUT_OPTION = -o $@
 # This is where we get zlib from.  zlibdir is -L../zlib and zlibinc is
 # -I../zlib, unless we were configured with --with-system-zlib, in which
 # case both are empty.
-ZLIB = @zlibdir@ -lz
+ZLIB = @zlibdir@ -lzstd -lz
 ZLIBINC = @zlibinc@
 
 # How to find GMP
diff --git a/gcc/common.opt b/gcc/common.opt
index a1544d06824..f15e21914f3 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -1888,7 +1888,7 @@ Specify the algorithm to partition symbols and vars at linktime.
 ; The initial value of -1 comes from Z_DEFAULT_COMPRESSION in zlib.h.
 flto-compression-level=
-Common Joined RejectNegative UInteger Var(flag_lto_compression_level) Init(-1) IntegerRange(0, 9)
+Common Joined RejectNegative UInteger Var(flag_lto_compression_level) Init(-1) IntegerRange(0, )
 -flto-compression-level=	Use zlib compression level for IL.
 
 flto-odr-type-merging
diff --git a/gcc/lto-compress.c b/gcc/lto-compress.c
index 3287178f257..b24f30f956e 100644
--- a/gcc/lto-compress.c
+++ b/gcc/lto-compress.c
@@ -27,13 +27,9 @@ along with GCC; see the file COPYING3.  If not see
 #include "gimple.h"
 #include "cgraph.h"
 #include "lto-streamer.h"
-/* zlib.h includes other system headers.  Those headers may test feature
-   test macros.  config.h may define feature test macros.  For this reason,
-   zlib.h needs to be included after, rather than before, config.h and
-   system.h.  */
-#include <zlib.h>
 #include "lto-compress.h"
 #include "timevar.h"
+#include <zstd.h>
 
 /* Compression stream structure, holds the flush callback and opaque token,
    the buffered data, and a note of whether compressing or uncompressing.  */
@@ -48,45 +44,23 @@ struct lto_compression_stream
   bool is_compression;
 };
 
-/* Overall compression constants for zlib.  */
-
-static const size_t Z_BUFFER_LENGTH = 4096;
 static const size_t MIN_STREAM_ALLOCATION = 1024;
 
-/* For zlib, allocate SIZE count of ITEMS and return the address, OPAQUE
-   is unused.  */
-
-static void *
-lto_zalloc (void *opaque, unsigned items, unsigned size)
-{
-  gcc_assert (opaque == Z_NULL);
-  return xmalloc (items * size);
-}
-
-/* For zlib, free memory at ADDRESS, OPAQUE is unused.  */
-
-static void
-lto_zfree (void *opaque, void *address)
-{
-  gcc_assert (opaque == Z_NULL);
-  free (address);
-}
-
-/* Return a zlib compression level that zlib will not reject.  Normalizes
+/* Return a zstd compression level that zstd will not reject.  Normalizes
    the compression level from the command line flag, clamping non-default
    values to the appropriate end of their valid range.  */
 
 static int
-lto_normalized_zlib_level (void)
+lto_normalized_zstd_level (void)
 {
   int level = flag_lto_compression_level;
 
-  if (level != Z_DEFAULT_COMPRESSION)
+  if (level != ZSTD_CLEVEL_DEFAULT)
     {
-      if (level < Z_NO_COMPRESSION)
-	level = Z_NO_COMPRESSION;
-      else if (level > Z_BEST_COMPRESSION)
-	level = Z_BEST_COMPRESSION;
+      if (level < 1)
+	level = 1;
+      else if (level > ZSTD_maxCLevel ())
+	level = ZSTD_maxCLevel ();
     }
 
   return level;
@@ -169,57 +143,19 @@ void
 lto_end_compression (struct lto_compression_stream *stream)
 {
   unsigned char *cursor = (unsigned char *) stream->buffer;
-  size_t remaining = stream->bytes;
-  const size_t outbuf_length = Z_BUFFER_LENGTH;
-  unsigned char *outbuf = (unsigned char *) xmalloc (outbuf_length);
-  z_stream out_stream;
-  size_t compressed_bytes = 0;
-
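
To illustrate the "ZSTD API is much simpler to use" point, the one-shot entry
points really are just a bound, a compress and a decompress call.  A small
self-contained sketch follows; error handling and buffer management are
simplified, and only documented zstd one-shot functions are used:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <zstd.h>

int
main (void)
{
  const char *src = "LTO bytecode would go here; any byte buffer works.";
  size_t src_size = strlen (src) + 1;

  /* Worst-case size of the compressed output.  */
  size_t bound = ZSTD_compressBound (src_size);
  void *packed = malloc (bound);

  /* One call compresses the whole buffer; level 3 is ZSTD_CLEVEL_DEFAULT.  */
  size_t packed_size = ZSTD_compress (packed, bound, src, src_size, 3);
  if (ZSTD_isError (packed_size))
    {
      fprintf (stderr, "compression failed: %s\n",
	       ZSTD_getErrorName (packed_size));
      return 1;
    }

  /* One call decompresses it again.  */
  char *unpacked = malloc (src_size);
  size_t unpacked_size = ZSTD_decompress (unpacked, src_size,
					  packed, packed_size);
  if (ZSTD_isError (unpacked_size))
    return 1;

  printf ("%zu -> %zu -> %zu bytes\n", src_size, packed_size, unpacked_size);
  free (packed);
  free (unpacked);
  return 0;
}

It links with -lzstd, matching the --enable-zstd / #include <zstd.h>
suggestion above.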
Re: Avoid stack references in inline assembly
On Tue, Jun 18, 2019 at 12:19:51PM +0200, Florian Weimer wrote:
> For example, on POWER, the condition register is used to indicate
> errors.  Instead of using that directly, we need to store that in a
> register, via mfcr:

Hrm, that example shows that my suggestion in
https://gcc.gnu.org/ml/gcc/2019-06/msg00170.html needs to be extended a
bit, for multiple return values.

We usually do extra return values for builtins via an extra pointer;
here, that cannot come as the last argument though, since the call args
are varargs.  So I suggested a generic builtin

  retval = __builtin_syscall (STYLE, syscall_nr, arg0, arg1, ...);

but that should for PowerPC syscalls be

  retval = __builtin_syscall (STYLE, &err, syscall_nr, arg0, arg1, ...);

Generically it is just

  retval = __builtin_syscall (STYLE, args...);

(or without retval, if the target decides this style call has no retval?)

> <__GI___getdents64>:
>    0:	addis   r2,r12,0
> 			0: R_PPC64_REL16_HA	.TOC.
>    4:	addi    r2,r2,0
> 			4: R_PPC64_REL16_LO	.TOC.+0x4
>    8:	li      r0,202
>    c:	sc
>   10:	mfcr    r0

You also get this mfcr if the error code isn't used currently, even.  And
it uses mfcr always it seems, not mfocrf, or directly use the bit like in
a conditional branch insn or an isel or whatnot.

> Ideally, the mfcr, andis, beqlr instructions would just be a bclr
> instruction.

Yup.


Segher
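
For concreteness, here is how a libc wrapper might use the proposed builtin
on powerpc64.  Nothing below exists today: __builtin_syscall is only the
suggestion from this thread, and SYSCALL_STYLE_SC and the wrapper name are
invented for illustration; only the syscall number 202 comes from the dump
above:

/* Hypothetical use of the *proposed* __builtin_syscall; nothing here is
   an existing GCC interface.  SYSCALL_STYLE_SC stands in for whatever
   STYLE value would select the plain "sc" calling convention.  */

#define __NR_getdents64 202   /* powerpc64 syscall number, as in the dump */

long
my_getdents64 (int fd, void *buf, unsigned long count)
{
  long err;   /* would receive the CR0.SO error indication */
  long ret = __builtin_syscall (SYSCALL_STYLE_SC, &err,
				__NR_getdents64, fd, buf, count);

  /* The compiler would be free to test the condition register bit
     directly instead of materialising it via mfcr.  */
  if (err)
    return -ret;   /* or set errno, as a real libc wrapper would */
  return ret;
}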
Confusion with labels as values
Hi,

Recently I wanted to take and print the address of a label.  When
compiling with -O2, I noticed that the address equals the function body
start address if the label is not used as a goto target.

Here is an example:

#include <stdio.h>
int main(void) {
	printf("main: %p\n", main);
	printf("label1: %p\n", &&label1);
label1:
	puts("---");
	return 0;
}

compile with:
$ gcc -O2 -o example1 example1.c

or more specifically:
$ gcc -O1 -fschedule-insns2 -o example1 example1.c

Output:
main: 0x562ed396216e
label1: 0x562ed396216e
---

(or compile with -S to see that the label is moved to the start of the
function)

That is not completely surprising because labels as values are not
really valid outside of the originating function [1].

However, when I assign the two addresses to automatic variables (which
should be okay) and compare them, they are different (despite having the
same value; the subtraction result is 0).  Passing them to an external
function yields equality again (if the function is not inlined).

#include <stdio.h>
void compare(size_t x, size_t y) {
	printf("x == y : %d\n", x == y);
}
int main(void) {
	size_t m = (size_t)main;
	size_t l = (size_t)&&label1;
	printf("m: %p\n", m);
	printf("l: %p\n", l);
	printf("m == l : %d\n", m == l);
	printf("m - l :% d\n", m - l);
	compare(m, l);
label1:
	puts("---");
	return 0;
}

Output:
m: 0x559a775cd16e
l: 0x559a775cd16e
m - l : 0
m == l : 0
x == y : 1
---

The reason for this behavior probably lies in constant
folding/propagation.  I'm not sure whether this is technically a bug
(Labels as Values / Computed Gotos are not Standard C anyway).  But this
is at least confusing.  Maybe the label should not be moved in the first
place?

Regards,
Flo

[1] https://gcc.gnu.org/onlinedocs/gcc/Labels-as-Values.html

// Compile with:
// $ gcc -O2 -o example2 example2.c
// or
// $ gcc -O1 -fschedule-insns2 -o example2 example2.c
#include <stdio.h>
void compare(size_t x, size_t y) {
	printf("x == y : %d\n", x == y);
}
int main(void) {
	size_t m = (size_t)main;
	size_t l = (size_t)&&label1;
	printf("m: %p\n", m);
	printf("l: %p\n", l);
	printf("m == l : %d\n", m == l);
	printf("m - l :% d\n", m - l);
	compare(m, l);
label1:
	puts("---");
	return 0;
}

// Compile with:
// $ gcc -O2 -o example1 example1.c
// or
// $ gcc -O1 -fschedule-insns2 -o example1 example1.c
#include <stdio.h>
int main(void) {
	printf("main: %p\n", main);
	printf("label1: %p\n", &&label1);
label1:
	puts("---");
	return 0;
}
Re: Avoid stack references in inline assembly
* Segher Boessenkool:

>> <__GI___getdents64>:
>>    0:	addis   r2,r12,0
>> 			0: R_PPC64_REL16_HA	.TOC.
>>    4:	addi    r2,r2,0
>> 			4: R_PPC64_REL16_LO	.TOC.+0x4
>>    8:	li      r0,202
>>    c:	sc
>>   10:	mfcr    r0
>
> You also get this mfcr if the error code isn't used currently, even.  And
> it uses mfcr always it seems, not mfocrf, or directly use the bit like in
> a conditional branch insn or an isel or whatnot.

It's hard-coded in the glibc inline assembler fragment for system calls.

I don't think the x/y constraints work in extended asm, despite being
documented in the manual.  If they did, we could output the condition
register to a variable, say err, and then write

  err & (1 << 28)

for the error check.  GCC could then figure out that this actually
checks the condition register and use it directly.

Thanks,
Florian
Re: Expanding roundeven (Was: Re: About GSOC.)
Hello.
I have made the following changes to inspect inlining of roundeven with the
following test code:

double plusone (double d)
{
  return __builtin_roundeven (d) + 1;
}

Running the program using -O2 foo.c gave an internal compiler error, which I
believe is because of the gcc_unreachable () at:

    if (TARGET_SSE4_1)
      emit_insn (gen_sse4_1_round2 (operands[0], operands[1],
				    GEN_INT (ROUND_ | ROUND_NO_EXC)));

I think the following condition matches the criterion?:

> I think the code will be much clearer if it explicitly says
> ROUND_ROUNDEVEN | ROUND_NO_EXC,

    else if (TARGET_64BIT || (mode != DFmode))
      {
	if (ROUND_ == ROUND_FLOOR)
	  ix86_expand_floorceil (operands[0], operands[1], true);
	else if (ROUND_ == ROUND_CEIL)
	  ix86_expand_floorceil (operands[0], operands[1], false);
	else if (ROUND_ == ROUND_TRUNC)
	  ix86_expand_trunc (operands[0], operands[1]);
	else
	  gcc_unreachable ();
      }

in:

    (define_expand "2"

But with -mavx, it generated the vroundsd insn.  Does it mean ROUNDEVEN
should have a condition in the else if?  But the comments above the
ix86_expand* functions in i386-expand.c say that those are for SSE2
sequences.

Thanks,
--Tejas

On Mon, 17 Jun 2019 at 22:45, Joseph Myers wrote:
>
> On Mon, 17 Jun 2019, Tejas Joshi wrote:
>
> > > existing ROUND_NO_EXC definition in GCC.  A new definition will need
> > > adding alongside ROUND_FLOOR, ROUND_CEIL and ROUND_TRUNC to correspond to
> > > rounding to nearest with ties to even, evaluating to 0.)
> >
> > So (ROUND_ROUNDEVEN 0x0) be declared for rounding towards nearest
> > even for rounding mode argument?  But reference says that RC field
> > should end up as 00B for rounding ties to even?  I am also much
>
> I think the code will be much clearer if it explicitly says
> ROUND_ROUNDEVEN | ROUND_NO_EXC, than if it hardcodes implicit knowledge
> that 0 is the value used for rounding to nearest with ties to even.
>
> --
> Joseph S. Myers
> jos...@codesourcery.com

diff --git a/gcc/config/i386/i386-builtin.def b/gcc/config/i386/i386-builtin.def
index e547dda80f1..40536d7929c 100644
--- a/gcc/config/i386/i386-builtin.def
+++ b/gcc/config/i386/i386-builtin.def
@@ -906,6 +906,7 @@ BDESC (OPTION_MASK_ISA_SSE4_1, 0, CODE_FOR_sse4_1_roundpd, "__builtin_ia32_floor
 BDESC (OPTION_MASK_ISA_SSE4_1, 0, CODE_FOR_sse4_1_roundpd, "__builtin_ia32_ceilpd", IX86_BUILTIN_CEILPD, (enum rtx_code) ROUND_CEIL, (int) V2DF_FTYPE_V2DF_ROUND)
 BDESC (OPTION_MASK_ISA_SSE4_1, 0, CODE_FOR_sse4_1_roundpd, "__builtin_ia32_truncpd", IX86_BUILTIN_TRUNCPD, (enum rtx_code) ROUND_TRUNC, (int) V2DF_FTYPE_V2DF_ROUND)
 BDESC (OPTION_MASK_ISA_SSE4_1, 0, CODE_FOR_sse4_1_roundpd, "__builtin_ia32_rintpd", IX86_BUILTIN_RINTPD, (enum rtx_code) ROUND_MXCSR, (int) V2DF_FTYPE_V2DF_ROUND)
+BDESC (OPTION_MASK_ISA_SSE4_1, 0, CODE_FOR_sse4_1_roundpd, "__builtin_ia32_roundevenpd", IX86_BUILTIN_ROUNDEVENPD, (enum rtx_code) ROUND_ROUNDEVEN, (int) V2DF_FTYPE_V2DF_ROUND)
 BDESC (OPTION_MASK_ISA_SSE4_1, 0, CODE_FOR_sse4_1_roundpd_vec_pack_sfix, "__builtin_ia32_floorpd_vec_pack_sfix", IX86_BUILTIN_FLOORPD_VEC_PACK_SFIX, (enum rtx_code) ROUND_FLOOR, (int) V4SI_FTYPE_V2DF_V2DF_ROUND)
 BDESC (OPTION_MASK_ISA_SSE4_1, 0, CODE_FOR_sse4_1_roundpd_vec_pack_sfix, "__builtin_ia32_ceilpd_vec_pack_sfix", IX86_BUILTIN_CEILPD_VEC_PACK_SFIX, (enum rtx_code) ROUND_CEIL, (int) V4SI_FTYPE_V2DF_V2DF_ROUND)
@@ -917,6 +918,7 @@ BDESC (OPTION_MASK_ISA_SSE4_1, 0, CODE_FOR_sse4_1_roundps, "__builtin_ia32_floor
 BDESC (OPTION_MASK_ISA_SSE4_1, 0, CODE_FOR_sse4_1_roundps, "__builtin_ia32_ceilps", IX86_BUILTIN_CEILPS, (enum rtx_code) ROUND_CEIL, (int) V4SF_FTYPE_V4SF_ROUND)
 BDESC (OPTION_MASK_ISA_SSE4_1, 0, CODE_FOR_sse4_1_roundps, "__builtin_ia32_truncps", IX86_BUILTIN_TRUNCPS, (enum rtx_code) ROUND_TRUNC, (int) V4SF_FTYPE_V4SF_ROUND)
 BDESC (OPTION_MASK_ISA_SSE4_1, 0, CODE_FOR_sse4_1_roundps, "__builtin_ia32_rintps", IX86_BUILTIN_RINTPS, (enum rtx_code) ROUND_MXCSR, (int) V4SF_FTYPE_V4SF_ROUND)
+BDESC (OPTION_MASK_ISA_SSE4_1, 0, CODE_FOR_sse4_1_roundps, "__builtin_ia32_roundevenps", IX86_BUILTIN_ROUNDEVENPS, (enum rtx_code) ROUND_ROUNDEVEN, (int) V4SF_FTYPE_V4SF_ROUND)
 BDESC (OPTION_MASK_ISA_SSE4_1, 0, CODE_FOR_sse4_1_roundps_sfix, "__builtin_ia32_floorps_sfix", IX86_BUILTIN_FLOORPS_SFIX, (enum rtx_code) ROUND_FLOOR, (int) V4SI_FTYPE_V4SF_ROUND)
 BDESC (OPTION_MASK_ISA_SSE4_1, 0, CODE_FOR_sse4_1_roundps_sfix, "__builtin_ia32_ceilps_sfix", IX86_BUILTIN_CEILPS_SFIX, (enum rtx_code) ROUND_CEIL, (int) V4SI_FTYPE_V4SF_ROUND)
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index 01213ccb82c..ce58b5e5c28 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -2468,6 +2468,7 @@ enum ix86_stack_slot
   SLOT_CW_TRUNC,
   SLOT_CW_FLOOR,
   SLOT_CW_CEIL,
+  SLOT_CW_ROUNDEVEN,
   SLOT_STV_TEMP,
   MAX_386_STACK_LOCALS
 };
@@ -2479,6 +2480,7 @@ enum ix86_entity
   I387_TRUNC,
   I387_FLOOR,
   I387_CEIL,
+  I387_ROUNDEVEN,
   MAX_386_ENTITIES
 };
Re: Confusion with labels as values
On 6/19/19 7:04 AM, Florian Rommel wrote:
> Hi,
>
> Recently I wanted to take and print the address of a label.  When
> compiling with -O2, I noticed that the address equals the function body
> start address if the label is not used as a goto target.
>
> Here is an example:
>
> #include <stdio.h>
> int main(void) {
> 	printf("main: %p\n", main);
> 	printf("label1: %p\n", &&label1);
> label1:
> 	puts("---");
> 	return 0;
> }
>
> compile with:
> $ gcc -O2 -o example1 example1.c
>
> or more specifically:
> $ gcc -O1 -fschedule-insns2 -o example1 example1.c
>
> Output:
> main: 0x562ed396216e
> label1: 0x562ed396216e
> ---
>
> (or compile with -S to see that the label is moved to the start of the
> function)
>
> That is not completely surprising because labels as values are not
> really valid outside of the originating function [1].
>
> However, when I assign the two addresses to automatic variables (which
> should be okay) and compare them, they are different (despite having the
> same value; the subtraction result is 0).  Passing them to an external
> function yields equality again (if the function is not inlined).
>
> #include <stdio.h>
> void compare(size_t x, size_t y) {
> 	printf("x == y : %d\n", x == y);
> }
> int main(void) {
> 	size_t m = (size_t)main;
> 	size_t l = (size_t)&&label1;
> 	printf("m: %p\n", m);
> 	printf("l: %p\n", l);
> 	printf("m == l : %d\n", m == l);
> 	printf("m - l :% d\n", m - l);
> 	compare(m, l);
> label1:
> 	puts("---");
> 	return 0;
> }
>
> Output:
> m: 0x559a775cd16e
> l: 0x559a775cd16e
> m - l : 0
> m == l : 0
> x == y : 1
> ---
>
> The reason for this behavior probably lies in constant
> folding/propagation.  I'm not sure whether this is technically a bug
> (Labels as Values / Computed Gotos are not Standard C anyway).  But this
> is at least confusing.  Maybe the label should not be moved in the first
> place?

A label used as a value, but which is not a jump target will have an
indeterminate value -- it'll end up somewhere in its containing
function, that's all we guarantee in that case.

jeff
Re: [RFC] zstd as a compression algorithm for LTO
On 6/19/19 3:21 AM, Martin Liška wrote:
> Hi.
>
> I've written a patch draft that replaces zlib with the zstd compression
> algorithm ([1]) in LTO.  I'm also sending statistics that were collected
> for a couple of quite big C++ source files.  Observations I made:
>
> - LTO stream compression takes 3-4% of LGEN compile time
> - zstd at the default compression level (3) generated slightly smaller
>   LTO elf files
> - zstd compression is 4-8x faster
> - decompression time is quite negligible, but for a bigger project
>   (godot) I see a reduction from 1.37 to 0.53 seconds
> - the ZSTD API is much simpler to use
>
> Suggestions based on the observations:
> - I would suggest making zstd optional (--enable-zstd); one would then
>   use #include <zstd.h> + -lzstd
> - I like the default level, as we mainly want to speed up LTO compilation
> - we can provide an option to control the algorithm
>   (-flto-compression-algorithm), similarly to -flto-compression-level
> - we can discuss possible compression of the LTO bytecode that is
>   distributed between the WPA stage and the individual LTRANS phases.

Presumably the reason we're not being more aggressive about switching is
the build/run time dependency on zstd?  I wonder if we could default to
zstd and fallback to zlib when zstd isn't available?

jeff
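
One way to get that fallback is to let configure pick the backend and keep
both code paths behind a macro.  A rough sketch follows; HAVE_ZSTD is a
hypothetical configure-provided define (not an existing one) and
lto_compress_block is an invented helper name, while the zstd and zlib calls
themselves are the libraries' real one-shot entry points:

/* Hypothetical build-time selection of the LTO compression backend.
   HAVE_ZSTD would be defined when --enable-zstd (or autodetection)
   finds a usable libzstd; otherwise the zlib path is kept.  */
#ifdef HAVE_ZSTD
#include <zstd.h>

static size_t
lto_compress_block (void *dst, size_t dst_cap,
		    const void *src, size_t src_size, int level)
{
  return ZSTD_compress (dst, dst_cap, src, src_size, level);
}
#else
#include <zlib.h>

static size_t
lto_compress_block (void *dst, size_t dst_cap,
		    const void *src, size_t src_size, int level)
{
  uLongf out_len = dst_cap;
  /* compress2 is zlib's one-shot entry point.  */
  if (compress2 ((Bytef *) dst, &out_len,
		 (const Bytef *) src, src_size, level) != Z_OK)
    return 0;
  return out_len;
}
#endif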
Re: Confusion with labels as values
On Wed, Jun 19, 2019 at 09:39:01AM -0600, Jeff Law wrote:
> A label used as a value, but which is not a jump target will have an
> indeterminate value -- it'll end up somewhere in its containing
> function, that's all we guarantee in that case.

In gimple it was fine and expected, and expand *did* make a code_label,
it was just immediately optimised away:

===
;; Generating RTL for gimple basic block 3
;; label1:
(code_label/s 14 13 15 2 ("label1") [0 uses])
(note 15 14 0 NOTE_INSN_BASIC_BLOCK)
===

and then we get

===
Merging block 3 into block 2...
Merged blocks 2 and 3.
Merged 2 and 3 without moving.
===

leaving

===
(note/s 14 13 16 2 ("label1") NOTE_INSN_DELETED_LABEL 2)
===

Do we want this to work as expected?


Segher
RE: gcc: -ftest-coverage and -auxbase
> From: Martin Liška
> Sent: Wednesday, June 19, 2019 3:19 AM
>
> On 6/18/19 11:51 PM, david.tay...@dell.com wrote:
> >> From: Martin Liška
> >> Sent: Tuesday, June 18, 2019 11:20 AM
> >>
> >> .gcno files are created during compilation and contain info about a
> >> source file.  These files will be created by a cross compiler, so
> >> that's fine.
> >>
> >> During a run of a program a .gcda file is created.  It contains
> >> information about the number of executions of edges.  These files are
> >> dumped during at_exit by an instrumented application, and the content
> >> is stored to disk (.gcda extension).
> >>
> >> So what difficulties do you have with that please?
> >>
> >> Martin
> >
> > Not sure I understand the question.
> >
> > Conceptually I don't have any problems with the compiler creating
> > .gcno files at compile time and the program creating .gcda files at
> > run-time.
> >
> > As far as the .gcda files go, exit is never called -- it's an embedded
> > operating system.  The kernel does not call exit.  Application
> > specific glue code will need to be written.  This is to be expected.
> > And is completely reasonable.
>
> Yep, then call __gcov_dump at a place where you want to finish
> instrumentation:
> https://gcc.gnu.org/onlinedocs/gcc/Gcov-and-Optimization.html
>
> >
> > As far as the .gcno files go -- currently, while doing over 10,000
> > compiles GCC wants to write all the .gcno files to the same file name
> > in the same NFS mounted directory.  This is simultaneously not useful
> > and very very slow.
>
> Please take a look at the attached patch, that will allow you to do:
> ./gcc/xgcc -Bgcc /tmp/main.c --coverage -fprofile-note-dir=/tmp/
>
> $ ls -l /tmp/main.gcno
> -rw-r--r-- 1 marxin users 228 Jun 19 09:18 /tmp/main.gcno
>
> Is the suggested patch working for you?
> Martin
>

Thanks for the patch.  Standalone it is not sufficient.  Combined with
the other two changes that have been discussed --

. allowing auxbase to be set and
. changing the specs to only set auxbase if it isn't already set

I think it might well solve the problem.  [Although if auxbase is
allowed to be set, I'm not convinced that this patch is necessary.  If
auxbase cannot be set, then this patch alone is insufficient.]

We do all of our compiles from the top of the workspace.  There are a
dozen different deliverables being built simultaneously.  There are over
3600 *.c files spread across over 200 directories.  Each deliverable is
built by compiling and linking over 1000 of the *.c files.  Overall,
over 16,000 compiles occur.  They are all done from the top directory of
the workspace.  And each deliverable has its own set of compilation
defines.  So, foo.o linked into one deliverable may well be different
from the foo.o linked into a different one.  Further, the build tree
structure mimics the source tree structure.  So, you might well have two
foo.o's in different directories...

Our compilation lines combined with the current specs result in GCC
trying to create over 16,000 GCNO files all named -.gcno and all living
in the top level directory.

> > Down the road I'm going to want to make additional changes -- for
> > example, putting the instrumentation data into a section specified on
> > the command line rather than .data.
> >
> > Right now I'm concerned about the .gcno files.  I want to be able to
> > specify the pathname or the base of the pathname on the command line.
> > I don't really care whether it is called -auxbase or something else.
> > I was thinking '-auxbase' as that is the name currently passed to the
> > sub-processes.  I do not ultimately care what the name is...
> >
> > Additionally, if we do this I want it to be done in a manner that when
> > contributed back is likely to be accepted.
Re: Avoid stack references in inline assembly
On Wed, Jun 19, 2019 at 03:09:08PM +0200, Florian Weimer wrote:
> * Segher Boessenkool:
>
> >> <__GI___getdents64>:
> >>    0:	addis   r2,r12,0
> >> 			0: R_PPC64_REL16_HA	.TOC.
> >>    4:	addi    r2,r2,0
> >> 			4: R_PPC64_REL16_LO	.TOC.+0x4
> >>    8:	li      r0,202
> >>    c:	sc
> >>   10:	mfcr    r0
> >
> > You also get this mfcr if the error code isn't used currently, even.  And
> > it uses mfcr always it seems, not mfocrf, or directly use the bit like in
> > a conditional branch insn or an isel or whatnot.
>
> It's hard-coded in the glibc inline assembler fragment for system calls.

Yup.

> I don't think the x/y constraints work in extended asm, despite being
> documented in the manual.

You cannot make a CCmode variable in C.

> If they did, we could output the condition
> register to a variable, say err, and then write
>
>   err & (1 << 28)
>
> for the error check.

We could implement flag output constraints for the powerpc port, the
"=@" stuff.

> GCC could then figure out that this actually
> checks the condition register and use it directly.

Thing is, the system call sets one CR bit, while the way GCC models this
is with CR fields, four bits each -- and the bit that is used (the SO
bit) isn't even modelled in GCC.


Segher
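
As a point of reference for what "=@" flag-output constraints buy the user,
this is roughly how the existing x86 support looks; the x86 constraint string
below is real, while a powerpc spelling for a CR bit (or the SO bit) is
exactly what would have to be invented:

/* x86 flag-output operands already work today: "=@ccc" binds the carry
   flag to a C variable, and GCC can branch on the flag directly instead
   of first materialising it into a general register.  */
int
borrow_out (unsigned a, unsigned b)
{
  unsigned res;
  int carry;
  __asm__ ("subl %3, %0"
	   : "=r" (res), "=@ccc" (carry)
	   : "0" (a), "r" (b));
  return carry;	/* 1 if the subtraction borrowed, i.e. a < b */
}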
Re: Confusion with labels as values
On 6/19/19 11:09 AM, Segher Boessenkool wrote:
> On Wed, Jun 19, 2019 at 09:39:01AM -0600, Jeff Law wrote:
>> A label used as a value, but which is not a jump target will have an
>> indeterminate value -- it'll end up somewhere in its containing
>> function, that's all we guarantee in that case.
>
> In gimple it was fine and expected, and expand *did* make a code_label,
> it was just immediately optimised away:

Yea, because it wasn't used as a jump target.  That's why it gets turned
into a NOTE_INSN_DELETED_LABEL rather than just deleted.

Jeff
Re: [RFC] zstd as a compression algorithm for LTO
On June 19, 2019 6:03:21 PM GMT+02:00, Jeff Law wrote:
> On 6/19/19 3:21 AM, Martin Liška wrote:
> > Hi.
> >
> > I've written a patch draft that replaces zlib with the zstd compression
> > algorithm ([1]) in LTO.  I'm also sending statistics that were collected
> > for a couple of quite big C++ source files.  Observations I made:
> >
> > - LTO stream compression takes 3-4% of LGEN compile time
> > - zstd at the default compression level (3) generated slightly smaller
> >   LTO elf files
> > - zstd compression is 4-8x faster
> > - decompression time is quite negligible, but for a bigger project
> >   (godot) I see a reduction from 1.37 to 0.53 seconds
> > - the ZSTD API is much simpler to use
> >
> > Suggestions based on the observations:
> > - I would suggest making zstd optional (--enable-zstd); one would then
> >   use #include <zstd.h> + -lzstd
> > - I like the default level, as we mainly want to speed up LTO compilation
> > - we can provide an option to control the algorithm
> >   (-flto-compression-algorithm), similarly to -flto-compression-level
> > - we can discuss possible compression of the LTO bytecode that is
> >   distributed between the WPA stage and the individual LTRANS phases.
>
> Presumably the reason we're not being more aggressive about switching is
> the build/run time dependency on zstd?  I wonder if we could default to
> zstd and fallback to zlib when zstd isn't available?

Is zstd too big to include into the repository?  But yes, we can properly
encode the compression format in the LTO section header and use dlopen to
'find' the default to use.

Richard.

> jeff
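
For what "encode the compression format in the LTO section header" might
mean in practice, here is a purely illustrative sketch; the struct, enum and
field names below are invented for this example and are not taken from
lto-streamer.h:

#include <stdint.h>

/* Illustrative only: a per-section tag saying how the payload was
   compressed, so the reader can pick the matching decompressor (and
   dlopen the corresponding library only when it is actually needed).  */

enum lto_compression_kind
{
  LTO_COMPRESSION_NONE,
  LTO_COMPRESSION_ZLIB,
  LTO_COMPRESSION_ZSTD
};

struct lto_section_header_sketch
{
  int16_t major_version;
  int16_t minor_version;
  unsigned char compression;	/* one of lto_compression_kind */
};

/* The reader dispatches on the tag instead of assuming zlib.  */
static const char *
decompressor_for (const struct lto_section_header_sketch *hdr)
{
  switch (hdr->compression)
    {
    case LTO_COMPRESSION_ZSTD: return "libzstd";
    case LTO_COMPRESSION_ZLIB: return "zlib";
    default: return "none";
    }
}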
C++17 Support and Website
Hi

I was double checking the C++17 support in GCC for someone and the text
at this URL states the support is experimental and gives the impression
that the support is incomplete.  The table of language features now has
them all implemented.

Is this text still accurate?

https://gcc.gnu.org/projects/cxx-status.html#cxx17

Thanks.

--joel
Re: C++17 Support and Website
On Wed, 19 Jun 2019 at 20:05, Joel Sherrill wrote:
>
> Hi
>
> I was double checking the C++17 support in GCC for someone and the text
> at this URL states the support is experimental and gives the impression
> that the support is incomplete.  The table of language features now has
> them all implemented.
>
> Is this text still accurate?

No, see the 9.1.0 announcement:
https://gcc.gnu.org/ml/gcc-announce/2019/msg1.html

> https://gcc.gnu.org/projects/cxx-status.html#cxx17

We should fix that, thanks!
Re: C++17 Support and Website
On Wed, Jun 19, 2019 at 2:07 PM Jonathan Wakely wrote:
> On Wed, 19 Jun 2019 at 20:05, Joel Sherrill wrote:
> >
> > Hi
> >
> > I was double checking the C++17 support in GCC for someone and the
> > text at this URL states the support is experimental and gives the
> > impression that the support is incomplete.  The table of language
> > features now has them all implemented.
> >
> > Is this text still accurate?
>
> No, see the 9.1.0 announcement:
> https://gcc.gnu.org/ml/gcc-announce/2019/msg1.html
>
> > https://gcc.gnu.org/projects/cxx-status.html#cxx17
>
> We should fix that, thanks!

Thanks for the quick reply.

If it is any consolation, LLVM's status page appears to similarly suffer
from not getting the introductory text updated.
Re: [RFC] zstd as a compression algorithm for LTO
On Wed, Jun 19, 2019 at 11:55 AM Richard Biener wrote:
>
> On June 19, 2019 6:03:21 PM GMT+02:00, Jeff Law wrote:
> > On 6/19/19 3:21 AM, Martin Liška wrote:
> >> Hi.
> >>
> >> I've written a patch draft that replaces zlib with the zstd compression
> >> algorithm ([1]) in LTO.  I'm also sending statistics that were collected
> >> for a couple of quite big C++ source files.  Observations I made:
> >>
> >> - LTO stream compression takes 3-4% of LGEN compile time
> >> - zstd at the default compression level (3) generated slightly smaller
> >>   LTO elf files
> >> - zstd compression is 4-8x faster
> >> - decompression time is quite negligible, but for a bigger project
> >>   (godot) I see a reduction from 1.37 to 0.53 seconds
> >> - the ZSTD API is much simpler to use
> >>
> >> Suggestions based on the observations:
> >> - I would suggest making zstd optional (--enable-zstd); one would then
> >>   use #include <zstd.h> + -lzstd
> >> - I like the default level, as we mainly want to speed up LTO compilation
> >> - we can provide an option to control the algorithm
> >>   (-flto-compression-algorithm), similarly to -flto-compression-level
> >> - we can discuss possible compression of the LTO bytecode that is
> >>   distributed between the WPA stage and the individual LTRANS phases.
> >
> > Presumably the reason we're not being more aggressive about switching is
> > the build/run time dependency on zstd?  I wonder if we could default to
> > zstd and fallback to zlib when zstd isn't available?
>
> Is zstd too big to include into the repository?  But yes, we can properly
> encode the compression format in the LTO section header and use dlopen to
> 'find' the default to use.

At least allow it to be built as part of the normal build like GMP,
etc. are done.
And include it in downloading using contrib/download_prerequisites
like the libraries are done.

Thanks,
Andrew Pinski

> Richard.
>
> > jeff
Re: [RFC] zstd as a compression algorithm for LTO
>
> At least allow it to be built as part of the normal build like GMP,
> etc. are done.
> And include it in downloading using contrib/download_prerequisites
> like the libraries are done.

Annoying detail is that zstd builds with cmake, not autotools.

Honza

> Thanks,
> Andrew Pinski
>
> > Richard.
> >
> > > jeff
Re: [RFC] zstd as a compression algorithm for LTO
On Wed, Jun 19, 2019 at 12:29 PM Jan Hubicka wrote:
> >
> > At least allow it to be built as part of the normal build like GMP,
> > etc. are done.
> > And include it in downloading using contrib/download_prerequisites
> > like the libraries are done.
>
> Annoying detail is that zstd builds with cmake, not autotools.

That makes doing Canadian crosses interesting and 1000x harder than it
should be :).

Thanks,
Andrew Pinski

> Honza
> >
> > Thanks,
> > Andrew Pinski
> >
> > > Richard.
> > >
> > > > jeff
Re: On-Demand range technology [1/5] - Executive Summary
Hi Andrew,

Thanks for working on this.

The "Enable elimination of zext/sext with VRP" patch had to be reverted
(https://gcc.gnu.org/ml/gcc-patches/2014-09/msg00672.html) due to the
need for value ranges in PROMOTED_MODE precision for at least one test
case on alpha.  Playing with the ranger suggests that it is not possible
to get value ranges in PROMOTED_MODE precision on demand.  Or is there
any way we can use the on-demand ranger here?

Thanks,
Kugan

On Thu, 23 May 2019 at 11:28, Andrew MacLeod wrote:
>
> Now that stage 1 has reopened, I'd like to reopen a discussion about the
> technology and experiences we have from the Ranger project I brought up
> last year.  https://gcc.gnu.org/ml/gcc/2018-05/msg00288.html .  (The
> original wiki pages are now out of date, and I will work on updating
> them soon.)
>
> The Ranger is designed to evaluate ranges on-demand rather than through
> a top-down approach.  This means you can ask for a range from anywhere,
> and it walks back thru the IL satisfying any preconditions and doing the
> required calculations.  It utilizes a cache to avoid re-doing work.  If
> ranges are processed in a forward dominator order, it's not much
> different than what we do today.  Due to its nature, the order you
> process things in has minimal impact on the overall time…  You can do it
> in reverse dominator order and get similar times.
>
> It requires no outside preconditions (such as dominators) to work, and
> has a very simple API…  Simply query the range of an ssa_name at any
> point in the IL and all the details are taken care of.
>
> We have spent much of the past 6 months refining the prototype (branch
> "ssa-range") and adjusting it to share as much code with VRP as
> possible.  They are currently using a common code base for extracting
> ranges from statements, as well as simplifying statements.
>
> The Ranger deals with just ranges.  The other aspects of VRP are
> intended to be follow on work that integrates tightly with it, but are
> also independent and would be available for other passes to use.  These
> include:
> - Equivalency tracking
> - Relational processing
> - Bitmask tracking
>
> We have implemented a VRP pass that duplicates the functionality of EVRP
> (other than the bits mentioned above), as well as converted a few other
> passes to use the interface.  I do not anticipate those missing bits
> having a significant impact on the results.
>
> The prototype branch is quite stable and can successfully build and test
> an entire Fedora distribution (9174 packages).  There is an issue with
> switches I will discuss later whereby the constant range of a switch
> edge is not readily available and is exponentially expensive to
> calculate.  We have a design to address that problem, and in the common
> case we are about 20% faster than EVRP is.
>
> When utilized in passes which only require ranges for a small number of
> ssa-names we see significant improvements.  The sprintf warning pass for
> instance allows us to remove the calculations of dominators and the
> resulting forced walk order.  We see a 95% speedup (yes, 1/20th of the
> overall time!).  This is primarily due to no additional overhead and
> only calculating the few things that are actually needed.  The walloca
> and wrestrict passes are a similar model, but as they have not been
> converted to use EVRP ranges yet, we don't see similar speedups there.
>
> That is the executive summary.  I will go into more details of each
> major thing mentioned in follow on notes so that comments and
> discussions can focus on one thing at a time.
>
> We think this approach is very solid and has many significant benefits
> to GCC.  We'd like to address any concerns you may have, and work towards
> finding a way to integrate this model with the code base during this
> stage 1.
>
> Comments and feedback always welcome!
> Thanks
> Andrew
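
To make the "query the range of an ssa_name at any point in the IL" model
above concrete, here is a rough sketch of what a client pass might do with
such an interface.  The oracle class and method names are placeholders, not
the actual ssa-range branch API, and as the exchange above suggests, the
answer comes back in the precision of the SSA name's type, so a
PROMOTED_MODE-precision result would need an extra widening step on top:

/* Hypothetical on-demand client; "range_oracle" and "range_of_expr" are
   placeholder names for whatever the ssa-range branch exposes.  */

static bool
arg_fits_in_buffer (range_oracle &oracle, gimple *call, tree arg,
		    unsigned HOST_WIDE_INT buf_size)
{
  value_range r;

  /* A single query at the point of the call: the oracle walks the IL
     backwards on demand, satisfies its own preconditions (no dominator
     computation needed), and caches intermediate results.  */
  if (!oracle.range_of_expr (r, arg, call))
    return false;

  /* The result is expressed in TREE_TYPE (arg)'s precision; ranges in
     PROMOTED_MODE precision would require widening this result.  */
  return wi::leu_p (r.upper_bound (), buf_size);
}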