Re: gcc: -ftest-coverage and -auxbase
On 6/18/19 11:51 PM, david.tay...@dell.com wrote:
>> From: Martin Liška
>> Sent: Tuesday, June 18, 2019 11:20 AM
>>
>> .gcno files are created during compilation and contain info about a source
>> file.  These files will be created by a cross compiler, so that's fine.
>>
>> During a run of a program a .gcda file is created.  It contains information
>> about the number of executions of edges.  These files are dumped during
>> at_exit by an instrumented application, and the content is stored to disk
>> (.gcda extension).
>>
>> So what difficulties do you have with that please?
>>
>> Martin
>
> Not sure I understand the question.
>
> Conceptually I don't have any problems with the compiler
> creating .gcno files at compile time and the program creating
> .gcda files at run-time.
>
> As far as the .gcda files go, exit is never called -- it's an embedded
> operating system.  The kernel does not call exit.  Application
> specific glue code will need to be written.  This is to be expected.
> And is completely reasonable.

Yep, then call __gcov_dump at a place where you want to finish instrumentation:
https://gcc.gnu.org/onlinedocs/gcc/Gcov-and-Optimization.html

>
> As far as the .gcno files go -- currently, while doing over 10,000
> compiles GCC wants to write all the .gcno files to the same file
> name in the same NFS mounted directory.  This is simultaneously
> not useful and very very slow.

Please take a look at the attached patch, which will allow you to do:

./gcc/xgcc -Bgcc /tmp/main.c --coverage -fprofile-note-dir=/tmp/

$ ls -l /tmp/main.gcno
-rw-r--r-- 1 marxin users 228 Jun 19 09:18 /tmp/main.gcno

Is the suggested patch working for you?
Martin

>
> Down the road I'm going to want to make additional changes --
> for example, putting the instrumentation data into a section
> specified on the command line rather than .data.
>
> Right now I'm concerned about the .gcno files.  I want to be
> able to specify the pathname or the base of the pathname
> on the command line.  I don't really care whether it is called
> -auxbase or something else.  I was thinking '-auxbase' as that
> is the name currently passed to the sub-processes.  I do not
> ultimately care what the name is...
>
> Additionally, if we do this I want it to be done in a manner
> that when contributed back is likely to be accepted.

diff --git a/gcc/common.opt b/gcc/common.opt
index a1544d06824..d382e70317d 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -2096,6 +2096,10 @@ Common Joined RejectNegative Var(profile_data_prefix)
 Set the top-level directory for storing the profile data.
 The default is 'pwd'.
 
+fprofile-note-dir=
+Common Joined RejectNegative Var(profile_note_prefix)
+Set the top-level directory for storing the profile note file.
+
 fprofile-correction
 Common Report Var(flag_profile_correction)
 Enable correction of flow inconsistent profile data input.
diff --git a/gcc/coverage.c b/gcc/coverage.c
index 1ffefd5f482..ea7b258d9dd 100644
--- a/gcc/coverage.c
+++ b/gcc/coverage.c
@@ -1204,6 +1204,12 @@ coverage_init (const char *filename)
   int len = strlen (filename);
   int prefix_len = 0;
 
+#if HAVE_DOS_BASED_FILE_SYSTEM
+  const char *separator = "\\";
+#else
+  const char *separator = "/";
+#endif
+
   /* Since coverage_init is invoked very early, before the pass manager,
      we need to set up the dumping explicitly.  This is similar to
      the handling in finish_optimization_passes.  */
@@ -1217,11 +1223,6 @@ coverage_init (const char *filename)
      of filename in order to prevent file path clashing.  */
   if (profile_data_prefix)
     {
-#if HAVE_DOS_BASED_FILE_SYSTEM
-      const char *separator = "\\";
-#else
-      const char *separator = "/";
-#endif
       filename = concat (getpwd (), separator, filename, NULL);
       filename = mangle_path (filename);
       len = strlen (filename);
@@ -1259,6 +1260,9 @@ coverage_init (const char *filename)
   memcpy (bbg_file_name, filename, len);
   strcpy (bbg_file_name + len, GCOV_NOTE_SUFFIX);
 
+  if (profile_note_prefix)
+    bbg_file_name = concat (profile_note_prefix, separator, bbg_file_name, NULL);
+
   if (!gcov_open (bbg_file_name, -1))
     {
       error ("cannot open %s", bbg_file_name);
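
To make the __gcov_dump suggestion concrete for the embedded case described
above, the application-specific glue can be a single hook.  A minimal sketch
follows; the hook name and where the kernel calls it are made up for
illustration, while __gcov_dump and __gcov_reset are the interfaces from the
documentation linked above:

/* Sketch of application glue for a kernel that never calls exit().
   my_shutdown_hook() is hypothetical; __gcov_dump() and __gcov_reset()
   are the routines documented in the gcov manual.  */

void __gcov_dump (void);   /* flush counters to the .gcda files */
void __gcov_reset (void);  /* zero counters, e.g. to start a new interval */

void
my_shutdown_hook (void)
{
  /* Write out all .gcda data collected so far.  */
  __gcov_dump ();

  /* Optionally keep running and collect a fresh profile afterwards.  */
  __gcov_reset ();
}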
[RFC] zstd as a compression algorithm for LTO
Hi.

I've written a patch draft that replaces zlib with the zstd compression
algorithm ([1]) in LTO.  I'm also sending statistics that were collected for
a couple of quite big C++ source files.  Observations I made:

- LTO stream compression takes 3-4% of LGEN compile time
- zstd at the default compression level (3) generated slightly smaller LTO
  elf files
- zstd compression is 4-8x faster
- decompression time is quite negligible, but for a bigger project (godot)
  I see a reduction from 1.37 to 0.53 seconds
- the ZSTD API is much simpler to use

Suggestions based on the observations:
- I would suggest making zstd optional (--enable-zstd); one would then
  use #include <zstd.h> + -lzstd
- I like the default level, as we mainly want to speed up LTO compilation
- we can provide an option to control the algorithm
  (-flto-compression-algorithm), similarly to -flto-compression-level
- we can discuss possible compression of the LTO bytecode that is distributed
  between the WPA stage and the individual LTRANS phases.

Thoughts?
Thanks,
Martin

[1] https://github.com/facebook/zstd

From 4939e90b2a8051128b7b2b0214a5fad5183f3bca Mon Sep 17 00:00:00 2001
From: Martin Liska
Date: Wed, 19 Jun 2019 09:40:35 +0200
Subject: [PATCH] Replace zlib with zstd.

---
 gcc/Makefile.in    |   2 +-
 gcc/common.opt     |   2 +-
 gcc/lto-compress.c | 161 -
 gcc/timevar.def    |   4 +-
 4 files changed, 33 insertions(+), 136 deletions(-)

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index d9e0885b96b..8aedcccb717 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -373,7 +373,7 @@ OUTPUT_OPTION = -o $@
 # This is where we get zlib from.  zlibdir is -L../zlib and zlibinc is
 # -I../zlib, unless we were configured with --with-system-zlib, in which
 # case both are empty.
-ZLIB = @zlibdir@ -lz
+ZLIB = @zlibdir@ -lzstd -lz
 ZLIBINC = @zlibinc@
 
 # How to find GMP
diff --git a/gcc/common.opt b/gcc/common.opt
index a1544d06824..f15e21914f3 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -1888,7 +1888,7 @@ Specify the algorithm to partition symbols and vars at linktime.
 ; The initial value of -1 comes from Z_DEFAULT_COMPRESSION in zlib.h.
 flto-compression-level=
-Common Joined RejectNegative UInteger Var(flag_lto_compression_level) Init(-1) IntegerRange(0, 9)
+Common Joined RejectNegative UInteger Var(flag_lto_compression_level) Init(-1) IntegerRange(0, )
 -flto-compression-level=	Use zlib compression level for IL.
 
 flto-odr-type-merging
diff --git a/gcc/lto-compress.c b/gcc/lto-compress.c
index 3287178f257..b24f30f956e 100644
--- a/gcc/lto-compress.c
+++ b/gcc/lto-compress.c
@@ -27,13 +27,9 @@ along with GCC; see the file COPYING3.  If not see
 #include "gimple.h"
 #include "cgraph.h"
 #include "lto-streamer.h"
-/* zlib.h includes other system headers.  Those headers may test feature
-   test macros.  config.h may define feature test macros.  For this reason,
-   zlib.h needs to be included after, rather than before, config.h and
-   system.h.  */
-#include <zlib.h>
 #include "lto-compress.h"
 #include "timevar.h"
+#include <zstd.h>
 
 /* Compression stream structure, holds the flush callback and opaque token,
    the buffered data, and a note of whether compressing or uncompressing.  */
@@ -48,45 +44,23 @@ struct lto_compression_stream
   bool is_compression;
 };
 
-/* Overall compression constants for zlib.  */
-
-static const size_t Z_BUFFER_LENGTH = 4096;
 static const size_t MIN_STREAM_ALLOCATION = 1024;
 
-/* For zlib, allocate SIZE count of ITEMS and return the address, OPAQUE
-   is unused.  */
-
-static void *
-lto_zalloc (void *opaque, unsigned items, unsigned size)
-{
-  gcc_assert (opaque == Z_NULL);
-  return xmalloc (items * size);
-}
-
-/* For zlib, free memory at ADDRESS, OPAQUE is unused.  */
-
-static void
-lto_zfree (void *opaque, void *address)
-{
-  gcc_assert (opaque == Z_NULL);
-  free (address);
-}
-
-/* Return a zlib compression level that zlib will not reject.  Normalizes
+/* Return a zstd compression level that zstd will not reject.  Normalizes
    the compression level from the command line flag, clamping non-default
    values to the appropriate end of their valid range.  */
 
 static int
-lto_normalized_zlib_level (void)
+lto_normalized_zstd_level (void)
 {
   int level = flag_lto_compression_level;
 
-  if (level != Z_DEFAULT_COMPRESSION)
+  if (level != ZSTD_CLEVEL_DEFAULT)
     {
-      if (level < Z_NO_COMPRESSION)
-	level = Z_NO_COMPRESSION;
-      else if (level > Z_BEST_COMPRESSION)
-	level = Z_BEST_COMPRESSION;
+      if (level < 1)
+	level = 1;
+      else if (level > ZSTD_maxCLevel ())
+	level = ZSTD_maxCLevel ();
     }
 
   return level;
@@ -169,57 +143,19 @@ void
 lto_end_compression (struct lto_compression_stream *stream)
 {
   unsigned char *cursor = (unsigned char *) stream->buffer;
-  size_t remaining = stream->bytes;
-  const size_t outbuf_length = Z_BUFFER_LENGTH;
-  unsigned char *outbuf = (unsigned char *) xmalloc (outbuf_length);
-  z_stream out_stream;
-  size_t compressed_bytes = 0;
-
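
To illustrate the "ZSTD API is much simpler to use" point, the one-shot entry
points really are just a bound, a compress and a decompress call.  A small
self-contained sketch follows; error handling and buffer management are
simplified, and only documented zstd one-shot functions are used:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <zstd.h>

int
main (void)
{
  const char *src = "LTO bytecode would go here; any byte buffer works.";
  size_t src_size = strlen (src) + 1;

  /* Worst-case size of the compressed output.  */
  size_t bound = ZSTD_compressBound (src_size);
  void *packed = malloc (bound);

  /* One call compresses the whole buffer; level 3 is ZSTD_CLEVEL_DEFAULT.  */
  size_t packed_size = ZSTD_compress (packed, bound, src, src_size, 3);
  if (ZSTD_isError (packed_size))
    {
      fprintf (stderr, "compression failed: %s\n",
	       ZSTD_getErrorName (packed_size));
      return 1;
    }

  /* One call decompresses it again.  */
  char *unpacked = malloc (src_size);
  size_t unpacked_size = ZSTD_decompress (unpacked, src_size,
					  packed, packed_size);
  if (ZSTD_isError (unpacked_size))
    return 1;

  printf ("%zu -> %zu -> %zu bytes\n", src_size, packed_size, unpacked_size);
  free (packed);
  free (unpacked);
  return 0;
}

It links with -lzstd, matching the --enable-zstd / #include <zstd.h>
suggestion above.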
Re: Avoid stack references in inline assembly
On Tue, Jun 18, 2019 at 12:19:51PM +0200, Florian Weimer wrote:
> For example, on POWER, the condition register is used to indicate
> errors.  Instead of using that directly, we need to store that in a
> register, via mfcr:

Hrm, that example shows that my suggestion in
https://gcc.gnu.org/ml/gcc/2019-06/msg00170.html needs to be extended a
bit, for multiple return values.

We usually do extra return values for builtins via an extra pointer;
here, that cannot come as the last argument though, since the call args
are varargs.  So I suggested a generic builtin

  retval = __builtin_syscall (STYLE, syscall_nr, arg0, arg1, ...);

but that should for PowerPC syscalls be

  retval = __builtin_syscall (STYLE, &err, syscall_nr, arg0, arg1, ...);

Generically it is just

  retval = __builtin_syscall (STYLE, args...);

(or without retval, if the target decides this style call has no retval?)

> <__GI___getdents64>:
>    0:	addis   r2,r12,0
> 			0: R_PPC64_REL16_HA	.TOC.
>    4:	addi    r2,r2,0
> 			4: R_PPC64_REL16_LO	.TOC.+0x4
>    8:	li      r0,202
>    c:	sc
>   10:	mfcr    r0

You also get this mfcr if the error code isn't used currently, even.  And
it uses mfcr always it seems, not mfocrf, or directly use the bit like in
a conditional branch insn or an isel or whatnot.

> Ideally, the mfcr, andis, beqlr instructions would just be a bclr
> instruction.

Yup.


Segher
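
For concreteness, here is how a libc wrapper might use the proposed builtin
on powerpc64.  Nothing below exists today: __builtin_syscall is only the
suggestion from this thread, and SYSCALL_STYLE_SC and the wrapper name are
invented for illustration; only the syscall number 202 comes from the dump
above:

/* Hypothetical use of the *proposed* __builtin_syscall; nothing here is
   an existing GCC interface.  SYSCALL_STYLE_SC stands in for whatever
   STYLE value would select the plain "sc" calling convention.  */

#define __NR_getdents64 202   /* powerpc64 syscall number, as in the dump */

long
my_getdents64 (int fd, void *buf, unsigned long count)
{
  long err;   /* would receive the CR0.SO error indication */
  long ret = __builtin_syscall (SYSCALL_STYLE_SC, &err,
				__NR_getdents64, fd, buf, count);

  /* The compiler would be free to test the condition register bit
     directly instead of materialising it via mfcr.  */
  if (err)
    return -ret;   /* or set errno, as a real libc wrapper would */
  return ret;
}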
Confusion with labels as values
Hi,

Recently I wanted to take and print the address of a label.  When
compiling with -O2, I noticed that the address equals the function body
start address if the label is not used as a goto target.

Here is an example:

#include <stdio.h>
int main(void) {
	printf("main: %p\n", main);
	printf("label1: %p\n", &&label1);
label1:
	puts("---");
	return 0;
}

compile with:
$ gcc -O2 -o example1 example1.c

or more specifically:
$ gcc -O1 -fschedule-insns2 -o example1 example1.c

Output:
main: 0x562ed396216e
label1: 0x562ed396216e
---

(or compile with -S to see that the label is moved to the start of the
function)

That is not completely surprising because labels as values are not
really valid outside of the originating function [1].

However, when I assign the two addresses to automatic variables (which
should be okay) and compare them, they are different (despite having the
same value; the subtraction result is 0).  Passing them to an external
function yields equality again (if the function is not inlined).

#include <stdio.h>
void compare(size_t x, size_t y) {
	printf("x == y : %d\n", x == y);
}
int main(void) {
	size_t m = (size_t)main;
	size_t l = (size_t)&&label1;
	printf("m: %p\n", m);
	printf("l: %p\n", l);
	printf("m == l : %d\n", m == l);
	printf("m - l :% d\n", m - l);
	compare(m, l);
label1:
	puts("---");
	return 0;
}

Output:
m: 0x559a775cd16e
l: 0x559a775cd16e
m - l : 0
m == l : 0
x == y : 1
---

The reason for this behavior probably lies in constant
folding/propagation.  I'm not sure whether this is technically a bug
(Labels as Values / Computed Gotos are not Standard C anyway).  But this
is at least confusing.  Maybe the label should not be moved in the first
place?

Regards,
Flo

[1] https://gcc.gnu.org/onlinedocs/gcc/Labels-as-Values.html

// Compile with:
// $ gcc -O2 -o example2 example2.c
// or
// $ gcc -O1 -fschedule-insns2 -o example2 example2.c
#include <stdio.h>
void compare(size_t x, size_t y) {
	printf("x == y : %d\n", x == y);
}
int main(void) {
	size_t m = (size_t)main;
	size_t l = (size_t)&&label1;
	printf("m: %p\n", m);
	printf("l: %p\n", l);
	printf("m == l : %d\n", m == l);
	printf("m - l :% d\n", m - l);
	compare(m, l);
label1:
	puts("---");
	return 0;
}

// Compile with:
// $ gcc -O2 -o example1 example1.c
// or
// $ gcc -O1 -fschedule-insns2 -o example1 example1.c
#include <stdio.h>
int main(void) {
	printf("main: %p\n", main);
	printf("label1: %p\n", &&label1);
label1:
	puts("---");
	return 0;
}
Re: Avoid stack references in inline assembly
* Segher Boessenkool:

>> <__GI___getdents64>:
>>    0:	addis   r2,r12,0
>> 			0: R_PPC64_REL16_HA	.TOC.
>>    4:	addi    r2,r2,0
>> 			4: R_PPC64_REL16_LO	.TOC.+0x4
>>    8:	li      r0,202
>>    c:	sc
>>   10:	mfcr    r0
>
> You also get this mfcr if the error code isn't used currently, even.  And
> it uses mfcr always it seems, not mfocrf, or directly use the bit like in
> a conditional branch insn or an isel or whatnot.

It's hard-coded in the glibc inline assembler fragment for system calls.

I don't think the x/y constraints work in extended asm, despite being
documented in the manual.  If they did, we could output the condition
register to a variable, say err, and then write

  err & (1 << 28)

for the error check.  GCC could then figure out that this actually
checks the condition register and use it directly.

Thanks,
Florian
Re: Expanding roundeven (Was: Re: About GSOC.)
Hello.
I have made the following changes to inspect inlining of roundeven with the
following test code:

double plusone (double d)
{
  return __builtin_roundeven (d) + 1;
}

Running the program using -O2 foo.c gave an internal compiler error, which I
believe is because of the gcc_unreachable () at:

    if (TARGET_SSE4_1)
      emit_insn (gen_sse4_1_round2 (operands[0], operands[1],
				    GEN_INT (ROUND_ | ROUND_NO_EXC)));

I think the following condition matches the criterion?:

> I think the code will be much clearer if it explicitly says
> ROUND_ROUNDEVEN | ROUND_NO_EXC,

    else if (TARGET_64BIT || (mode != DFmode))
      {
	if (ROUND_ == ROUND_FLOOR)
	  ix86_expand_floorceil (operands[0], operands[1], true);
	else if (ROUND_ == ROUND_CEIL)
	  ix86_expand_floorceil (operands[0], operands[1], false);
	else if (ROUND_ == ROUND_TRUNC)
	  ix86_expand_trunc (operands[0], operands[1]);
	else
	  gcc_unreachable ();
      }

in:

    (define_expand "2"

But with -mavx, it generated the vroundsd insn.  Does it mean ROUNDEVEN
should have a condition in the else if?  But the comments above the
ix86_expand* functions in i386-expand.c say that those are for SSE2
sequences.

Thanks,
--Tejas

On Mon, 17 Jun 2019 at 22:45, Joseph Myers wrote:
>
> On Mon, 17 Jun 2019, Tejas Joshi wrote:
>
> > > existing ROUND_NO_EXC definition in GCC.  A new definition will need
> > > adding alongside ROUND_FLOOR, ROUND_CEIL and ROUND_TRUNC to correspond to
> > > rounding to nearest with ties to even, evaluating to 0.)
> >
> > So (ROUND_ROUNDEVEN 0x0) be declared for rounding towards nearest
> > even for rounding mode argument?  But reference says that RC field
> > should end up as 00B for rounding ties to even?  I am also much
>
> I think the code will be much clearer if it explicitly says
> ROUND_ROUNDEVEN | ROUND_NO_EXC, than if it hardcodes implicit knowledge
> that 0 is the value used for rounding to nearest with ties to even.
>
> --
> Joseph S. Myers
> jos...@codesourcery.com

diff --git a/gcc/config/i386/i386-builtin.def b/gcc/config/i386/i386-builtin.def
index e547dda80f1..40536d7929c 100644
--- a/gcc/config/i386/i386-builtin.def
+++ b/gcc/config/i386/i386-builtin.def
@@ -906,6 +906,7 @@ BDESC (OPTION_MASK_ISA_SSE4_1, 0, CODE_FOR_sse4_1_roundpd, "__builtin_ia32_floor
 BDESC (OPTION_MASK_ISA_SSE4_1, 0, CODE_FOR_sse4_1_roundpd, "__builtin_ia32_ceilpd", IX86_BUILTIN_CEILPD, (enum rtx_code) ROUND_CEIL, (int) V2DF_FTYPE_V2DF_ROUND)
 BDESC (OPTION_MASK_ISA_SSE4_1, 0, CODE_FOR_sse4_1_roundpd, "__builtin_ia32_truncpd", IX86_BUILTIN_TRUNCPD, (enum rtx_code) ROUND_TRUNC, (int) V2DF_FTYPE_V2DF_ROUND)
 BDESC (OPTION_MASK_ISA_SSE4_1, 0, CODE_FOR_sse4_1_roundpd, "__builtin_ia32_rintpd", IX86_BUILTIN_RINTPD, (enum rtx_code) ROUND_MXCSR, (int) V2DF_FTYPE_V2DF_ROUND)
+BDESC (OPTION_MASK_ISA_SSE4_1, 0, CODE_FOR_sse4_1_roundpd, "__builtin_ia32_roundevenpd", IX86_BUILTIN_ROUNDEVENPD, (enum rtx_code) ROUND_ROUNDEVEN, (int) V2DF_FTYPE_V2DF_ROUND)
 BDESC (OPTION_MASK_ISA_SSE4_1, 0, CODE_FOR_sse4_1_roundpd_vec_pack_sfix, "__builtin_ia32_floorpd_vec_pack_sfix", IX86_BUILTIN_FLOORPD_VEC_PACK_SFIX, (enum rtx_code) ROUND_FLOOR, (int) V4SI_FTYPE_V2DF_V2DF_ROUND)
 BDESC (OPTION_MASK_ISA_SSE4_1, 0, CODE_FOR_sse4_1_roundpd_vec_pack_sfix, "__builtin_ia32_ceilpd_vec_pack_sfix", IX86_BUILTIN_CEILPD_VEC_PACK_SFIX, (enum rtx_code) ROUND_CEIL, (int) V4SI_FTYPE_V2DF_V2DF_ROUND)
@@ -917,6 +918,7 @@ BDESC (OPTION_MASK_ISA_SSE4_1, 0, CODE_FOR_sse4_1_roundps, "__builtin_ia32_floor
 BDESC (OPTION_MASK_ISA_SSE4_1, 0, CODE_FOR_sse4_1_roundps, "__builtin_ia32_ceilps", IX86_BUILTIN_CEILPS, (enum rtx_code) ROUND_CEIL, (int) V4SF_FTYPE_V4SF_ROUND)
 BDESC (OPTION_MASK_ISA_SSE4_1, 0, CODE_FOR_sse4_1_roundps, "__builtin_ia32_truncps", IX86_BUILTIN_TRUNCPS, (enum rtx_code) ROUND_TRUNC, (int) V4SF_FTYPE_V4SF_ROUND)
 BDESC (OPTION_MASK_ISA_SSE4_1, 0, CODE_FOR_sse4_1_roundps, "__builtin_ia32_rintps", IX86_BUILTIN_RINTPS, (enum rtx_code) ROUND_MXCSR, (int) V4SF_FTYPE_V4SF_ROUND)
+BDESC (OPTION_MASK_ISA_SSE4_1, 0, CODE_FOR_sse4_1_roundps, "__builtin_ia32_roundevenps", IX86_BUILTIN_ROUNDEVENPS, (enum rtx_code) ROUND_ROUNDEVEN, (int) V4SF_FTYPE_V4SF_ROUND)
 BDESC (OPTION_MASK_ISA_SSE4_1, 0, CODE_FOR_sse4_1_roundps_sfix, "__builtin_ia32_floorps_sfix", IX86_BUILTIN_FLOORPS_SFIX, (enum rtx_code) ROUND_FLOOR, (int) V4SI_FTYPE_V4SF_ROUND)
 BDESC (OPTION_MASK_ISA_SSE4_1, 0, CODE_FOR_sse4_1_roundps_sfix, "__builtin_ia32_ceilps_sfix", IX86_BUILTIN_CEILPS_SFIX, (enum rtx_code) ROUND_CEIL, (int) V4SI_FTYPE_V4SF_ROUND)
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index 01213ccb82c..ce58b5e5c28 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -2468,6 +2468,7 @@ enum ix86_stack_slot
   SLOT_CW_TRUNC,
   SLOT_CW_FLOOR,
   SLOT_CW_CEIL,
+  SLOT_CW_ROUNDEVEN,
   SLOT_STV_TEMP,
   MAX_386_STACK_LOCALS
 };
@@ -2479,6 +2480,7 @@ enum ix86_entity
   I387_TRUNC,
   I387_FLOOR,
   I387_CEIL,
+  I387_ROUNDEVEN,
   MAX_386_ENTITIES
 };
Re: Confusion with labels as values
On 6/19/19 7:04 AM, Florian Rommel wrote:
> Hi,
>
> Recently I wanted to take and print the address of a label.  When
> compiling with -O2, I noticed that the address equals the function body
> start address if the label is not used as a goto target.
>
> Here is an example:
>
> #include <stdio.h>
> int main(void) {
> 	printf("main: %p\n", main);
> 	printf("label1: %p\n", &&label1);
> label1:
> 	puts("---");
> 	return 0;
> }
>
> compile with:
> $ gcc -O2 -o example1 example1.c
>
> or more specifically:
> $ gcc -O1 -fschedule-insns2 -o example1 example1.c
>
> Output:
> main: 0x562ed396216e
> label1: 0x562ed396216e
> ---
>
> (or compile with -S to see that the label is moved to the start of the
> function)
>
> That is not completely surprising because labels as values are not
> really valid outside of the originating function [1].
>
> However, when I assign the two addresses to automatic variables (which
> should be okay) and compare them, they are different (despite having the
> same value; the subtraction result is 0).  Passing them to an external
> function yields equality again (if the function is not inlined).
>
> #include <stdio.h>
> void compare(size_t x, size_t y) {
> 	printf("x == y : %d\n", x == y);
> }
> int main(void) {
> 	size_t m = (size_t)main;
> 	size_t l = (size_t)&&label1;
> 	printf("m: %p\n", m);
> 	printf("l: %p\n", l);
> 	printf("m == l : %d\n", m == l);
> 	printf("m - l :% d\n", m - l);
> 	compare(m, l);
> label1:
> 	puts("---");
> 	return 0;
> }
>
> Output:
> m: 0x559a775cd16e
> l: 0x559a775cd16e
> m - l : 0
> m == l : 0
> x == y : 1
> ---
>
> The reason for this behavior probably lies in constant
> folding/propagation.  I'm not sure whether this is technically a bug
> (Labels as Values / Computed Gotos are not Standard C anyway).  But this
> is at least confusing.  Maybe the label should not be moved in the first
> place?

A label used as a value, but which is not a jump target will have an
indeterminate value -- it'll end up somewhere in its containing
function, that's all we guarantee in that case.

jeff
Re: [RFC] zstd as a compression algorithm for LTO
On 6/19/19 3:21 AM, Martin Liška wrote:
> Hi.
>
> I've written a patch draft that replaces zlib with the zstd compression
> algorithm ([1]) in LTO.  I'm also sending statistics that were collected
> for a couple of quite big C++ source files.  Observations I made:
>
> - LTO stream compression takes 3-4% of LGEN compile time
> - zstd at the default compression level (3) generated slightly smaller
>   LTO elf files
> - zstd compression is 4-8x faster
> - decompression time is quite negligible, but for a bigger project
>   (godot) I see a reduction from 1.37 to 0.53 seconds
> - the ZSTD API is much simpler to use
>
> Suggestions based on the observations:
> - I would suggest making zstd optional (--enable-zstd); one would then
>   use #include <zstd.h> + -lzstd
> - I like the default level, as we mainly want to speed up LTO compilation
> - we can provide an option to control the algorithm
>   (-flto-compression-algorithm), similarly to -flto-compression-level
> - we can discuss possible compression of the LTO bytecode that is
>   distributed between the WPA stage and the individual LTRANS phases.

Presumably the reason we're not being more aggressive about switching is
the build/run time dependency on zstd?  I wonder if we could default to
zstd and fallback to zlib when zstd isn't available?

jeff
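
One way to get that fallback is to let configure pick the backend and keep
both code paths behind a macro.  A rough sketch follows; HAVE_ZSTD is a
hypothetical configure-provided define (not an existing one) and
lto_compress_block is an invented helper name, while the zstd and zlib calls
themselves are the libraries' real one-shot entry points:

/* Hypothetical build-time selection of the LTO compression backend.
   HAVE_ZSTD would be defined when --enable-zstd (or autodetection)
   finds a usable libzstd; otherwise the zlib path is kept.  */
#ifdef HAVE_ZSTD
#include <zstd.h>

static size_t
lto_compress_block (void *dst, size_t dst_cap,
		    const void *src, size_t src_size, int level)
{
  return ZSTD_compress (dst, dst_cap, src, src_size, level);
}
#else
#include <zlib.h>

static size_t
lto_compress_block (void *dst, size_t dst_cap,
		    const void *src, size_t src_size, int level)
{
  uLongf out_len = dst_cap;
  /* compress2 is zlib's one-shot entry point.  */
  if (compress2 ((Bytef *) dst, &out_len,
		 (const Bytef *) src, src_size, level) != Z_OK)
    return 0;
  return out_len;
}
#endif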
Re: Confusion with labels as values
On Wed, Jun 19, 2019 at 09:39:01AM -0600, Jeff Law wrote:
> A label used as a value, but which is not a jump target will have an
> indeterminate value -- it'll end up somewhere in its containing
> function, that's all we guarantee in that case.

In gimple it was fine and expected, and expand *did* make a code_label,
it was just immediately optimised away:

===
;; Generating RTL for gimple basic block 3
;; label1:
(code_label/s 14 13 15 2 ("label1") [0 uses])
(note 15 14 0 NOTE_INSN_BASIC_BLOCK)
===

and then we get

===
Merging block 3 into block 2...
Merged blocks 2 and 3.
Merged 2 and 3 without moving.
===

leaving

===
(note/s 14 13 16 2 ("label1") NOTE_INSN_DELETED_LABEL 2)
===

Do we want this to work as expected?


Segher
RE: gcc: -ftest-coverage and -auxbase
> From: Martin Liška
> Sent: Wednesday, June 19, 2019 3:19 AM
>
> On 6/18/19 11:51 PM, david.tay...@dell.com wrote:
> >> From: Martin Liška
> >> Sent: Tuesday, June 18, 2019 11:20 AM
> >>
> >> .gcno files are created during compilation and contain info about a
> >> source file.  These files will be created by a cross compiler, so
> >> that's fine.
> >>
> >> During a run of a program a .gcda file is created.  It contains
> >> information about the number of executions of edges.  These files are
> >> dumped during at_exit by an instrumented application, and the content
> >> is stored to disk (.gcda extension).
> >>
> >> So what difficulties do you have with that please?
> >>
> >> Martin
> >
> > Not sure I understand the question.
> >
> > Conceptually I don't have any problems with the compiler creating
> > .gcno files at compile time and the program creating .gcda files at
> > run-time.
> >
> > As far as the .gcda files go, exit is never called -- it's an embedded
> > operating system.  The kernel does not call exit.  Application
> > specific glue code will need to be written.  This is to be expected.
> > And is completely reasonable.
>
> Yep, then call __gcov_dump at a place where you want to finish
> instrumentation:
> https://gcc.gnu.org/onlinedocs/gcc/Gcov-and-Optimization.html
>
> >
> > As far as the .gcno files go -- currently, while doing over 10,000
> > compiles GCC wants to write all the .gcno files to the same file name
> > in the same NFS mounted directory.  This is simultaneously not useful
> > and very very slow.
>
> Please take a look at the attached patch, that will allow you to do:
> ./gcc/xgcc -Bgcc /tmp/main.c --coverage -fprofile-note-dir=/tmp/
>
> $ ls -l /tmp/main.gcno
> -rw-r--r-- 1 marxin users 228 Jun 19 09:18 /tmp/main.gcno
>
> Is the suggested patch working for you?
> Martin
>

Thanks for the patch.  Standalone it is not sufficient.  Combined with
the other two changes that have been discussed --

. allowing auxbase to be set and
. changing the specs to only set auxbase if it isn't already set

I think it might well solve the problem.  [Although if auxbase is
allowed to be set, I'm not convinced that this patch is necessary.  If
auxbase cannot be set, then this patch alone is insufficient.]

We do all of our compiles from the top of the workspace.  There are a
dozen different deliverables being built simultaneously.  There are over
3600 *.c files spread across over 200 directories.  Each deliverable is
built by compiling and linking over 1000 of the *.c files.  Overall,
over 16,000 compiles occur.  They are all done from the top directory of
the workspace.  And each deliverable has its own set of compilation
defines.  So, foo.o linked into one deliverable may well be different
from the foo.o linked into a different one.  Further, the build tree
structure mimics the source tree structure.  So, you might well have two
foo.o's in different directories...

Our compilation lines combined with the current specs result in GCC
trying to create over 16,000 GCNO files all named -.gcno and all living
in the top level directory.

> > Down the road I'm going to want to make additional changes -- for
> > example, putting the instrumentation data into a section specified on
> > the command line rather than .data.
> >
> > Right now I'm concerned about the .gcno files.  I want to be able to
> > specify the pathname or the base of the pathname on the command line.
> > I don't really care whether it is called -auxbase or something else.
> > I was thinking '-auxbase' as that is the name currently passed to the
> > sub-processes.  I do not ultimately care what the name is...
> >
> > Additionally, if we do this I want it to be done in a manner that when
> > contributed back is likely to be accepted.
Re: Avoid stack references in inline assembly
On Wed, Jun 19, 2019 at 03:09:08PM +0200, Florian Weimer wrote:
> * Segher Boessenkool:
>
> >> <__GI___getdents64>:
> >>    0:	addis   r2,r12,0
> >> 			0: R_PPC64_REL16_HA	.TOC.
> >>    4:	addi    r2,r2,0
> >> 			4: R_PPC64_REL16_LO	.TOC.+0x4
> >>    8:	li      r0,202
> >>    c:	sc
> >>   10:	mfcr    r0
> >
> > You also get this mfcr if the error code isn't used currently, even.  And
> > it uses mfcr always it seems, not mfocrf, or directly use the bit like in
> > a conditional branch insn or an isel or whatnot.
>
> It's hard-coded in the glibc inline assembler fragment for system calls.

Yup.

> I don't think the x/y constraints work in extended asm, despite being
> documented in the manual.

You cannot make a CCmode variable in C.

> If they did, we could output the condition
> register to a variable, say err, and then write
>
>   err & (1 << 28)
>
> for the error check.

We could implement flag output constraints for the powerpc port, the
"=@" stuff.

> GCC could then figure out that this actually
> checks the condition register and use it directly.

Thing is, the system call sets one CR bit, while the way GCC models this
is with CR fields, four bits each -- and the bit that is used (the SO
bit) isn't even modelled in GCC.


Segher
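
As a point of reference for what "=@" flag-output constraints buy the user,
this is roughly how the existing x86 support looks; the x86 constraint string
below is real, while a powerpc spelling for a CR bit (or the SO bit) is
exactly what would have to be invented:

/* x86 flag-output operands already work today: "=@ccc" binds the carry
   flag to a C variable, and GCC can branch on the flag directly instead
   of first materialising it into a general register.  */
int
borrow_out (unsigned a, unsigned b)
{
  unsigned res;
  int carry;
  __asm__ ("subl %3, %0"
	   : "=r" (res), "=@ccc" (carry)
	   : "0" (a), "r" (b));
  return carry;	/* 1 if the subtraction borrowed, i.e. a < b */
}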
Re: Confusion with labels as values
On 6/19/19 11:09 AM, Segher Boessenkool wrote:
> On Wed, Jun 19, 2019 at 09:39:01AM -0600, Jeff Law wrote:
>> A label used as a value, but which is not a jump target will have an
>> indeterminate value -- it'll end up somewhere in its containing
>> function, that's all we guarantee in that case.
>
> In gimple it was fine and expected, and expand *did* make a code_label,
> it was just immediately optimised away:

Yea, because it wasn't used as a jump target.  That's why it gets turned
into a NOTE_INSN_DELETED_LABEL rather than just deleted.

Jeff
Re: [RFC] zstd as a compression algorithm for LTO
On June 19, 2019 6:03:21 PM GMT+02:00, Jeff Law wrote:
> On 6/19/19 3:21 AM, Martin Liška wrote:
> > Hi.
> >
> > I've written a patch draft that replaces zlib with the zstd compression
> > algorithm ([1]) in LTO.  I'm also sending statistics that were collected
> > for a couple of quite big C++ source files.  Observations I made:
> >
> > - LTO stream compression takes 3-4% of LGEN compile time
> > - zstd at the default compression level (3) generated slightly smaller
> >   LTO elf files
> > - zstd compression is 4-8x faster
> > - decompression time is quite negligible, but for a bigger project
> >   (godot) I see a reduction from 1.37 to 0.53 seconds
> > - the ZSTD API is much simpler to use
> >
> > Suggestions based on the observations:
> > - I would suggest making zstd optional (--enable-zstd); one would then
> >   use #include <zstd.h> + -lzstd
> > - I like the default level, as we mainly want to speed up LTO compilation
> > - we can provide an option to control the algorithm
> >   (-flto-compression-algorithm), similarly to -flto-compression-level
> > - we can discuss possible compression of the LTO bytecode that is
> >   distributed between the WPA stage and the individual LTRANS phases.
>
> Presumably the reason we're not being more aggressive about switching is
> the build/run time dependency on zstd?  I wonder if we could default to
> zstd and fallback to zlib when zstd isn't available?

Is zstd too big to include into the repository?  But yes, we can properly
encode the compression format in the LTO section header and use dlopen to
'find' the default to use.

Richard.

> jeff
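
For what "encode the compression format in the LTO section header" might
mean in practice, here is a purely illustrative sketch; the struct, enum and
field names below are invented for this example and are not taken from
lto-streamer.h:

#include <stdint.h>

/* Illustrative only: a per-section tag saying how the payload was
   compressed, so the reader can pick the matching decompressor (and
   dlopen the corresponding library only when it is actually needed).  */

enum lto_compression_kind
{
  LTO_COMPRESSION_NONE,
  LTO_COMPRESSION_ZLIB,
  LTO_COMPRESSION_ZSTD
};

struct lto_section_header_sketch
{
  int16_t major_version;
  int16_t minor_version;
  unsigned char compression;	/* one of lto_compression_kind */
};

/* The reader dispatches on the tag instead of assuming zlib.  */
static const char *
decompressor_for (const struct lto_section_header_sketch *hdr)
{
  switch (hdr->compression)
    {
    case LTO_COMPRESSION_ZSTD: return "libzstd";
    case LTO_COMPRESSION_ZLIB: return "zlib";
    default: return "none";
    }
}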
C++17 Support and Website
Hi

I was double checking the C++17 support in GCC for someone and the text
at this URL states the support is experimental and gives the impression
that the support is incomplete.  The table of language features now has
them all implemented.

Is this text still accurate?

https://gcc.gnu.org/projects/cxx-status.html#cxx17

Thanks.

--joel
Re: C++17 Support and Website
On Wed, 19 Jun 2019 at 20:05, Joel Sherrill wrote:
>
> Hi
>
> I was double checking the C++17 support in GCC for someone and the text
> at this URL states the support is experimental and gives the impression
> that the support is incomplete.  The table of language features now has
> them all implemented.
>
> Is this text still accurate?

No, see the 9.1.0 announcement:
https://gcc.gnu.org/ml/gcc-announce/2019/msg1.html

> https://gcc.gnu.org/projects/cxx-status.html#cxx17

We should fix that, thanks!
Re: C++17 Support and Website
On Wed, Jun 19, 2019 at 2:07 PM Jonathan Wakely wrote:
> On Wed, 19 Jun 2019 at 20:05, Joel Sherrill wrote:
> >
> > Hi
> >
> > I was double checking the C++17 support in GCC for someone and the
> > text at this URL states the support is experimental and gives the
> > impression that the support is incomplete.  The table of language
> > features now has them all implemented.
> >
> > Is this text still accurate?
>
> No, see the 9.1.0 announcement:
> https://gcc.gnu.org/ml/gcc-announce/2019/msg1.html
>
> > https://gcc.gnu.org/projects/cxx-status.html#cxx17
>
> We should fix that, thanks!

Thanks for the quick reply.

If it is any consolation, LLVM's status page appears to similarly suffer
from not getting the introductory text updated.
Re: [RFC] zstd as a compression algorithm for LTO
On Wed, Jun 19, 2019 at 11:55 AM Richard Biener wrote:
>
> On June 19, 2019 6:03:21 PM GMT+02:00, Jeff Law wrote:
> > On 6/19/19 3:21 AM, Martin Liška wrote:
> >> Hi.
> >>
> >> I've written a patch draft that replaces zlib with the zstd compression
> >> algorithm ([1]) in LTO.  I'm also sending statistics that were collected
> >> for a couple of quite big C++ source files.  Observations I made:
> >>
> >> - LTO stream compression takes 3-4% of LGEN compile time
> >> - zstd at the default compression level (3) generated slightly smaller
> >>   LTO elf files
> >> - zstd compression is 4-8x faster
> >> - decompression time is quite negligible, but for a bigger project
> >>   (godot) I see a reduction from 1.37 to 0.53 seconds
> >> - the ZSTD API is much simpler to use
> >>
> >> Suggestions based on the observations:
> >> - I would suggest making zstd optional (--enable-zstd); one would then
> >>   use #include <zstd.h> + -lzstd
> >> - I like the default level, as we mainly want to speed up LTO compilation
> >> - we can provide an option to control the algorithm
> >>   (-flto-compression-algorithm), similarly to -flto-compression-level
> >> - we can discuss possible compression of the LTO bytecode that is
> >>   distributed between the WPA stage and the individual LTRANS phases.
> >
> > Presumably the reason we're not being more aggressive about switching is
> > the build/run time dependency on zstd?  I wonder if we could default to
> > zstd and fallback to zlib when zstd isn't available?
>
> Is zstd too big to include into the repository?  But yes, we can properly
> encode the compression format in the LTO section header and use dlopen to
> 'find' the default to use.

At least allow it to be built as part of the normal build like GMP,
etc. are done.
And include it in downloading using contrib/download_prerequisites
like the libraries are done.

Thanks,
Andrew Pinski

> Richard.
>
> > jeff
Re: [RFC] zstd as a compression algorithm for LTO
>
> At least allow it to be built as part of the normal build like GMP,
> etc. are done.
> And include it in downloading using contrib/download_prerequisites
> like the libraries are done.

Annoying detail is that zstd builds with cmake, not autotools.

Honza

> Thanks,
> Andrew Pinski
>
> > Richard.
> >
> > > jeff
Re: [RFC] zstd as a compression algorithm for LTO
On Wed, Jun 19, 2019 at 12:29 PM Jan Hubicka wrote:
> >
> > At least allow it to be built as part of the normal build like GMP,
> > etc. are done.
> > And include it in downloading using contrib/download_prerequisites
> > like the libraries are done.
>
> Annoying detail is that zstd builds with cmake, not autotools.

That makes doing Canadian crosses interesting and 1000x harder than it
should be :).

Thanks,
Andrew Pinski

> Honza
> >
> > Thanks,
> > Andrew Pinski
> >
> > > Richard.
> > >
> > > > jeff
Re: On-Demand range technology [1/5] - Executive Summary
Hi Andrew,

Thanks for working on this.

The "Enable elimination of zext/sext with VRP" patch had to be reverted
(https://gcc.gnu.org/ml/gcc-patches/2014-09/msg00672.html) due to the
need for value ranges in PROMOTED_MODE precision for at least one test
case on alpha.  Playing with the ranger suggests that it is not possible
to get value ranges in PROMOTED_MODE precision on demand.  Or is there
any way we can use the on-demand ranger here?

Thanks,
Kugan

On Thu, 23 May 2019 at 11:28, Andrew MacLeod wrote:
>
> Now that stage 1 has reopened, I'd like to reopen a discussion about the
> technology and experiences we have from the Ranger project I brought up
> last year.  https://gcc.gnu.org/ml/gcc/2018-05/msg00288.html .  (The
> original wiki pages are now out of date, and I will work on updating
> them soon.)
>
> The Ranger is designed to evaluate ranges on-demand rather than through
> a top-down approach.  This means you can ask for a range from anywhere,
> and it walks back thru the IL satisfying any preconditions and doing the
> required calculations.  It utilizes a cache to avoid re-doing work.  If
> ranges are processed in a forward dominator order, it's not much
> different than what we do today.  Due to its nature, the order you
> process things in has minimal impact on the overall time…  You can do it
> in reverse dominator order and get similar times.
>
> It requires no outside preconditions (such as dominators) to work, and
> has a very simple API…  Simply query the range of an ssa_name at any
> point in the IL and all the details are taken care of.
>
> We have spent much of the past 6 months refining the prototype (branch
> "ssa-range") and adjusting it to share as much code with VRP as
> possible.  They are currently using a common code base for extracting
> ranges from statements, as well as simplifying statements.
>
> The Ranger deals with just ranges.  The other aspects of VRP are
> intended to be follow on work that integrates tightly with it, but are
> also independent and would be available for other passes to use.  These
> include:
> - Equivalency tracking
> - Relational processing
> - Bitmask tracking
>
> We have implemented a VRP pass that duplicates the functionality of EVRP
> (other than the bits mentioned above), as well as converted a few other
> passes to use the interface.  I do not anticipate those missing bits
> having a significant impact on the results.
>
> The prototype branch is quite stable and can successfully build and test
> an entire Fedora distribution (9174 packages).  There is an issue with
> switches I will discuss later whereby the constant range of a switch
> edge is not readily available and is exponentially expensive to
> calculate.  We have a design to address that problem, and in the common
> case we are about 20% faster than EVRP is.
>
> When utilized in passes which only require ranges for a small number of
> ssa-names we see significant improvements.  The sprintf warning pass for
> instance allows us to remove the calculations of dominators and the
> resulting forced walk order.  We see a 95% speedup (yes, 1/20th of the
> overall time!).  This is primarily due to no additional overhead and
> only calculating the few things that are actually needed.  The walloca
> and wrestrict passes are a similar model, but as they have not been
> converted to use EVRP ranges yet, we don't see similar speedups there.
>
> That is the executive summary.  I will go into more details of each
> major thing mentioned in follow on notes so that comments and
> discussions can focus on one thing at a time.
>
> We think this approach is very solid and has many significant benefits
> to GCC.  We'd like to address any concerns you may have, and work towards
> finding a way to integrate this model with the code base during this
> stage 1.
>
> Comments and feedback always welcome!
> Thanks
> Andrew
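
To make the "query the range of an ssa_name at any point in the IL" model
above concrete, here is a rough sketch of what a client pass might do with
such an interface.  The oracle class and method names are placeholders, not
the actual ssa-range branch API, and as the exchange above suggests, the
answer comes back in the precision of the SSA name's type, so a
PROMOTED_MODE-precision result would need an extra widening step on top:

/* Hypothetical on-demand client; "range_oracle" and "range_of_expr" are
   placeholder names for whatever the ssa-range branch exposes.  */

static bool
arg_fits_in_buffer (range_oracle &oracle, gimple *call, tree arg,
		    unsigned HOST_WIDE_INT buf_size)
{
  value_range r;

  /* A single query at the point of the call: the oracle walks the IL
     backwards on demand, satisfies its own preconditions (no dominator
     computation needed), and caches intermediate results.  */
  if (!oracle.range_of_expr (r, arg, call))
    return false;

  /* The result is expressed in TREE_TYPE (arg)'s precision; ranges in
     PROMOTED_MODE precision would require widening this result.  */
  return wi::leu_p (r.upper_bound (), buf_size);
}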