[Bug tree-optimization/28364] New: poor optimization choices when iterating over a std::string (probably not c++-specific)
Consider the following test program: #include bool has_bad_chars(std::string const & path) { for (std::string::const_iterator c = path.begin(); c != path.end(); c++) { unsigned char x = static_cast(*c); if (x <= 0x1f || x == 0x5c || x == 0x7f) return true; } return false; } At -O2, GCC 4.1 chooses to duplicate the entire body of the loop for no good reason; the code is not rendered more straight-line by this, and in fact it executes strictly more instructions even for a single-character string. I'll attach an assembly file showing what it did (Z13has_bad_charsRKSs) and what it should have done (_Z14has_bad_chars2RKSs). The bad transformation is done by the .t45.ch pass, which acronym does not mean anything to me. -- Summary: poor optimization choices when iterating over a std::string (probably not c++-specific) Product: gcc Version: 4.1.2 Status: UNCONFIRMED Keywords: missed-optimization Severity: normal Priority: P3 Component: tree-optimization AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: zackw at panix dot com GCC build triplet: i486-linux-gnu GCC host triplet: i486-linux-gnu GCC target triplet: i486-linux-gnu http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28364
[Bug tree-optimization/28364] poor optimization choices when iterating over a std::string (probably not c++-specific)
--- Comment #1 from zackw at panix dot com 2006-07-12 22:33 --- Created an attachment (id=11874) --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=11874&action=view) assembly output (bad on top, hand-corrected below) -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28364
[Bug tree-optimization/28364] poor optimization choices when iterating over a std::string (probably not c++-specific)
--- Comment #3 from zackw at panix dot com 2006-07-12 22:42 --- I should mention that the exact command line flags were -O2 -fomit-frame-pointer -march=pentium4, and that I hand-tweaked the label numbers for ease of reading. Also, -fno-tree-ch does suppress this bad optimization, but in exchange we get mildly worse code from the loop optimizer proper - it uses [reg+reg] indexing and a 0..n count instead of [reg] indexing and a base..limit count. The code is pretty short so I'll just paste it here (meaningless labels removed): _Z17has_bad_chars_newRKSs: pushl %ebx movl8(%esp), %eax movl(%eax), %eax xorl%ecx, %ecx movl-12(%eax), %ebx .L2: cmpl%ecx, %ebx je .L10 movzbl (%ecx,%eax), %edx cmpb$31, %dl jbe .L4 cmpb$92, %dl je .L4 addl$1, %ecx cmpb$127, %dl jne .L2 .L4: movl$1, %eax popl%ebx .p2align 4,,2 ret .L10: xorl%eax, %eax popl%ebx .p2align 4,,2 ret Looking at the code, I see that the entire purpose of tree-ch is to duplicate loop bodies in this fashion, and the justification given is that it "increases effectiveness of code motion and reduces the need for loop preconditioning", which I take to cover the above degradation in addressing mode choice. I'm not an optimizer expert, but surely there is a way to get the best of both worlds here...? -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28364
[Bug tree-optimization/28364] poor optimization choices when iterating over a std::string (probably not c++-specific)
--- Comment #4 from zackw at panix dot com 2006-07-12 22:48 --- I remembered that I had a build of 4.2 from last week lying around. It generates even worse code - still with the duplication of most of the loop, plus a bunch of unnecessary register fiddling and bad addressing mode choice. -- zackw at panix dot com changed: What|Removed |Added Known to fail||4.1.2 4.2.0 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28364
[Bug tree-optimization/28364] poor optimization choices when iterating over a std::string (probably not c++-specific)
--- Comment #7 from zackw at panix dot com 2006-07-12 23:19 --- Created an attachment (id=11875) --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=11875&action=view) C test case (with interesting implications) I've found a plain C test case. In the process, I've found that the way libstdc++ is coded interacts interestingly with the optimizer. In the attached file, has_bad_chars_bad is a literal translation to C of the code seen by the optimizers after inlining for the original C++ test case. Yes, libstdc++ does the moral equivalent of ((struct rep*)path)[-1].len. This function compiles to the same bad code as my original test case. has_bad_chars_good, on the other hand, is how I naively thought worked on the first read-through. That one compiles to code which looks optimal to me. I suspect some optimizer or other is not smart enough to see through this particular construct ... it would be good to make it do so, since we want libstdc++ to generate good code. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28364
[Bug tree-optimization/28364] poor optimization choices when iterating over a std::string (probably not c++-specific)
--- Comment #8 from zackw at panix dot com 2006-07-12 23:21 --- Zdenek: I don't see how you can say that what we get is optimal code "unless optimizing for size". The code generated *will* be slower than the alternative. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28364
[Bug tree-optimization/28364] poor optimization choices when iterating over a std::string (probably not c++-specific)
--- Comment #10 from zackw at panix dot com 2006-07-12 23:33 --- I-cache. Also, more iterations before the branch predictors figure out what's going on. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28364
[Bug tree-optimization/28364] poor optimization choices when iterating over a std::string (probably not c++-specific)
--- Comment #12 from zackw at panix dot com 2006-07-13 03:09 --- Subject: Re: poor optimization choices when iterating over a std::string (probably not c++-specific) > > I-cache. > this only matters if this increase in code size will make the code go > out of instruction cache. The real program that this is taken from is a large C++ application which is guaranteed to go out of cache - it's got slightly less than four megabytes of .text - the actual goal is to make sure all of its inner inner inner loops do stay in cache. And this is one of 'em. > > Also, more iterations before the branch predictors figure out what's > > going on. > But also possibly more consistent behavior with respect to branch > prediction, in case the loop is often exited in the first iteration. Again, in real life I know a priori that the function is rarely, if ever, called with a zero-length string. - I went through the tree dumps for my week-old 4.2.0 for the test program with a fine comb. They are quite instructive. If tree-ch were doing what it says on the label -- copying the loop header -- everything would be fine; dom1 cleans up the two copies of the header later. However, ch isn't just copying the loop header; it is also copying the *entire body of the loop*, which nothing can fix. I call that a clear bug. zw -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28364
[Bug tree-optimization/28364] poor optimization choices when iterating over a std::string (probably not c++-specific)
--- Comment #14 from zackw at panix dot com 2006-07-13 03:40 --- Subject: Re: poor optimization choices when iterating over a std::string (probably not c++-specific) It's a validation routine, yes, which is run over every pathname the program is working on, and there can be hundreds or thousands of those. And why the heck shouldn't I be able to use std::string in inner loops? I sure don't want to be using bare char*... -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28364
[Bug tree-optimization/28364] poor optimization choices when iterating over a std::string (probably not c++-specific)
--- Comment #17 from zackw at panix dot com 2006-07-13 04:23 --- Subject: Re: poor optimization choices when iterating over a std::string (probably not c++-specific) > One more comment, a loop with an early exit is whole issue and that is the > reason why all of the code in the loop is considered the header. There are a > couple of loop headers in this case, one for each exit which makes it harder > to > deal with in general. I didn't know that, and it is not obvious from the optimizer dumps. Thanks for explaining. > What you did not mention is which how would this loop exit normally, via the > return 1 or all the way through the loop. There is no enough information from > GCC's point of view to figure that out without profiling (for this case). GCC > is assuming that the loop exits in the first if statement which seems > reasoniable. Maybe you should try with profiling information and see what GCC > does for this testcase. Feedback-directed optimization is only good for making compilers look better on benchmarks. It's useless in real life. I can, in fact, get good code out of gcc 4.1 by beating it over the head with __builtin_expect, but I don't think I should have to do that. I think my suggested version is better code no matter whether or not the loop exits early. 4.2 still makes what I consider to be bad addressing mode choices after that change, but Zdenek did say he would look at that. It also puts the "return 1" exit block in the middle of the loop in spite of being told that all three conditions leading to that are unlikely. struct rep { unsigned long len; unsigned long alloc; unsigned long dummy; }; struct data { char * ptr; }; struct string { struct rep R; struct data D; }; int has_bad_chars(struct data *path) { char *c; for (c = path->ptr; __builtin_expect(c < path->ptr + ((struct rep *)path)[-1].len, 1); c++) { unsigned char x = (unsigned char)(*c); if (__builtin_expect(x <= 0x1f || x == 0x5c || x == 0x7f, 0)) return 1; } return 0; } -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28364
[Bug tree-optimization/28364] poor optimization choices when iterating over a std::string (probably not c++-specific)
--- Comment #20 from zackw at panix dot com 2006-07-13 08:25 --- Subject: Re: poor optimization choices when iterating over a std::string (probably not c++-specific) > > However, ch isn't just copying the loop header; it is also > > copying the *entire body of the loop*, which nothing can fix. I call > > that a clear bug. > > how do you define a loop header? I was under the impression it was just the one basic block called out in the .ch dump, e.g. ;; Loop 1 ;; header 6, latch 5 ;; depth 1, level 1, outer 0 -- basic block 6 happens to contain just the code from the syntactic loop condition. Andrew informs me that this is wrong, and that in this case the header is the entire loop, but I will come back at that with 'ch should never be duplicating the entire loop; if the header is the entire loop, it should do something more sensible, like duplicate just the first basic block or something.' -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28364
[Bug tree-optimization/28364] poor optimization choices when iterating over a std::string (probably not c++-specific)
--- Comment #21 from zackw at panix dot com 2006-07-13 08:28 --- Subject: Re: poor optimization choices when iterating over a std::string (probably not c++-specific) > on your real program, how much performance do you gain by hand-rewriting > the assembler to look the way you like? Just to make sure there really > is a problem. I'm a little annoyed by the suggestion that this wouldn't be a real problem if I couldn't measure a performance difference. Depending on workload, other activity on the same machine, and phase of moon, this loop is between .1% and 1% of runtime, and my tweaks make it go about a third faster. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28364
[Bug libstdc++/31464] New: Extension request: publicly visible forward-declaration headers for and all STL containers
It would be nice if there were publicly visible headers analogous to for and all the STL headers. To be more concrete: for every STL header there should be a that contains *only* forward declarations of the types with complete declarations in . The existing is an example of exactly what I have in mind, and should be promoted to . The value of these headers would be primarily in writing application headers; it is common to need to include or or whatever solely to be able to refer to the types it defines in function prototypes. By providing forward-declaration headers, the application headers would not need to drag in the full text of the STL headers. Application source files could then include the complete-declaration headers only for the types they actually used. -- Summary: Extension request: publicly visible forward-declaration headers for and all STL containers Product: gcc Version: unknown Status: UNCONFIRMED Severity: enhancement Priority: P3 Component: libstdc++ AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: zackw at panix dot com http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31464
[Bug fortran/31519] spurious ICE messages when module does not compile
--- Comment #6 from zackw at panix dot com 2007-04-15 21:22 --- Bugs where the compiler proper crashes when run under the driver, but not when run directly, can often be reproduced by varying the amount of space taken up by environment variables, e.g. x= while :; do x="x$x" X=$x ./f951 arguments... || break done -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31519
[Bug preprocessor/11242] [mingw32] #include takes my memory directory instead of the standard memory header file
--- Comment #6 from zackw at panix dot com 2007-04-21 02:56 --- I am inclined to think that this is an operating system bug and should be worked around in the mingw32 libraries, not in cpplib. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=11242
[Bug driver/30972] Call to _access has invalid parameter when linked with msvcrt (for vista)
--- Comment #5 from zackw at panix dot com 2007-09-06 17:35 --- Nah, probably no one will ever get round to it and it's not that important. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30972
[Bug c/37591] New: suppress "signed and unsigned" warnings when signed value known to be positive
In a case such as this, GCC ought to be able to prove that the signed variable is positive and therefore suppress "signed and unsigned" warnings. I see this in both C and C++. #define MAX(a,b) ((a) > (b) ? (a) : (b)) #define MIN(a,b) ((a) < (b) ? (a) : (b)) unsigned int constrain(unsigned int index, unsigned int offset, unsigned int limit) { int adj = index - offset; adj = MAX(adj, 0); return MIN(adj, limit); /* { dg-bogus "signed and unsigned" } */ } -- Summary: suppress "signed and unsigned" warnings when signed value known to be positive Product: gcc Version: 4.3.2 Status: UNCONFIRMED Keywords: diagnostic Severity: enhancement Priority: P3 Component: c AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: zackw at panix dot com http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37591
[Bug c/37591] suppress "signed and unsigned" warnings when signed value known to be positive
--- Comment #2 from zackw at panix dot com 2008-09-19 21:28 --- I'd be fine with it being like uninitialized value warnings. The false positives here are *really* annoying. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37591
[Bug c/37591] suppress "signed and unsigned" warnings when signed value known to be positive
--- Comment #5 from zackw at panix dot com 2008-09-22 15:46 --- I'm not monitoring consensus of developers anymore, but I think we *should* either move these warnings to the middle end or do some CCP/VRP in the front ends. The -Wuninitialized warnings are a lot less trouble than the signed/unsigned warnings, right now. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37591
[Bug tree-optimization/57230] New: tree-ssa-strlen incorrectly optimizes a strlen to 0
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57230 Bug ID: 57230 Summary: tree-ssa-strlen incorrectly optimizes a strlen to 0 Product: gcc Version: 4.8.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: zackw at panix dot com GCC 4.7 and 4.8 mis-optimize this test case: int main(void) { char pstring[] = " hello world"; pstring[0] = (char) (__builtin_strlen(pstring) - 1); __builtin_printf("%zd\n", __builtin_strlen(pstring)); return 0; } The value written to pstring[0] is (char)11, which is nonzero, so both calls to __builtin_strlen should return 12. However, tree-ssa-strlen replaces the second call with a constant 0. Here are tree dumps right before (-fdump-tree-phiopt2) ... main () { char pstring[13]; long unsigned int _3; unsigned char _4; unsigned char _5; char _6; long unsigned int _8; : pstring = " hello world"; _3 = __builtin_strlen (&pstring); _4 = (unsigned char) _3; _5 = _4 + 255; _6 = (char) _5; pstring[0] = _6; _8 = __builtin_strlen (&pstring); __builtin_printf ("%zd\n", _8); pstring ={v} {CLOBBER}; return 0; } ... and right after (-fdump-tree-strlen): main () { char pstring[13]; long unsigned int _3; unsigned char _4; unsigned char _5; char _6; long unsigned int _8; : pstring = " hello world"; _3 = __builtin_strlen (&pstring); _4 = (unsigned char) _3; _5 = _4 + 255; _6 = (char) _5; pstring[0] = _6; _8 = 0; __builtin_printf ("%zd\n", _8); pstring ={v} {CLOBBER}; return 0; } This is both a missed and an incorrect optimization; it should have replaced the *first* call to __builtin_strlen with 12, and ideally also been able to figure out that the second call was unaffected by the write to pstring[0]. However, only the replacement of the second call with 0 is a correctness issue. Simplified from Debian #707118 <http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=707118> (which I'm about to reopen). Observed with both 4.7.3 and 4.8.0.
[Bug tree-optimization/57230] [4.7/4.8/4.9 Regression] tree-ssa-strlen incorrectly optimizes a strlen to 0
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57230 --- Comment #5 from Zack Weinberg --- It might be good to include stores to nonzero offsets in the test case, and stores to bytes that used to be internal NULs, something like int main(void) { char s[] = "abc\0def"; s[1] = (char) strlen(s); if (strlen(s) != 3) abort(); s[2] = (char) strlen(s); if (strlen(s) != 3) abort(); s[3] = (char) strlen(s); if (strlen(s) != 7) abort(); return 0; } I dunno how much of this is worth trying to optimize, but I'll also mention that if the values at a certain offset before and after a store are not known for certain but are both known to be nonzero, that store doesn't have to invalidate a cached string length.
[Bug tree-optimization/57230] [4.7/4.8/4.9 Regression] tree-ssa-strlen incorrectly optimizes a strlen to 0
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57230 --- Comment #8 from Zack Weinberg --- (In reply to Jakub Jelinek from comment #6) > We have just one strlen pass instance, and even if we optimize the first > strlen > there, having strlen pass duplicate constant propagation functionality just > to handle this weird testcase (storing strlen sizes inside of characters is > nothing common) doesn't make sense. I think I gave the wrong impression there; I was trying to exhibit a more comprehensive *correctness* test for this transformation, using strlen() as a convenient-to-hand function known in context to return a nonzero value. Lemme try again. extern __SIZE_TYPE__ strlen(const char *); extern int rand(void); extern void abort(void) __attribute__((noreturn)); #define assert(x) do { if (!(x)) abort(); } while (0) /* Returns a value that might or might not be zero. */ static inline char maybe_zero() { return rand() & 0x10; } /* Returns an unknown but definitely-nonzero value. */ static inline char nonzero() { return (rand() & 0x10) | 0x01; } int main(void) { char s[] = "abc\0def"; /* These operations do not change the length of the string. */ s[0] = nonzero(); assert(strlen(s) == 3); s[1] = nonzero(); assert(strlen(s) == 3); s[2] = nonzero(); assert(strlen(s) == 3); s[3] = '\0'; assert(strlen(s) == 3); s[4] = '\0'; assert(strlen(s) == 3); s[4] = 'd';assert(strlen(s) == 3); /* These operations change the length of the string, to a value (in principle) computable at compile time. */ s[3] = nonzero(); assert(strlen(s) == 7); s[5] = '\0'; assert(strlen(s) == 5); s[2] = '\0'; assert(strlen(s) == 2); s[2] = 'c';assert(strlen(s) == 5); s[5] = nonzero(); assert(strlen(s) == 7); /* This operation requires runtime recalculation of the length of the string. */ s[4] = maybe_zero(); assert(strlen(s) == (s[4] ? 7 : 4)); return 0; } Of all of that, the most important thing IMHO (given the original bug report) is to verify that writing '\0' at a nonzero *offset* within a string does not cause a subsequent strlen() to report the length of the string as zero.
[Bug c++/41233] New: Templated conversion operator produces symbol name that won't demangle
This program class Source; class Dest; struct converter { converter(Source *s) {} template operator D*(); }; Dest* f(Source* s) { return converter(s); } when compiled by several different versions of g++ (I tried 4.3.4 and 4.4.1; the person who asked me to reduce the test case found the problem on OSX with some version of Apple-modified gcc 4.2), produces this symbol name for the conversion operator: _ZN9convertercvPT_I4DestEEv c++filt from binutils 2.19 will not demangle this symbol. I have not been able to figure out for certain whether this is an incorrect mangling, a demangler bug, or both. These variations demangle to close approximations of the right thing: _ZN9convertercvT_IN4DestEEEv -> converter::operator Dest() _ZN9convertercvT_IPN4DestEEEv -> converter::operator Dest*() but _ZN9convertercvPT_IN4DestEEEv does not demangle, either. -- Summary: Templated conversion operator produces symbol name that won't demangle Product: gcc Version: 4.4.1 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c++ AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: zackw at panix dot com http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41233
[Bug c/47623] New: false *negative* uninitialized warning
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47623 Summary: false *negative* uninitialized warning Product: gcc Version: 4.5.2 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c AssignedTo: unassig...@gcc.gnu.org ReportedBy: za...@panix.com Consider int foo(int x) { int a; if (x) return a; return 0; } This function should trigger the "‘a’ may be used uninitialized" warning when compiled with -O -Wall, but it doesn't; there are no diagnostics at all. The assembly output (x86-64) is just foo: movl$0, %eax ret Looking at optimization dumps, this is what we have right before CCP1: foo (int x) { int a; : if (x_2(D) != 0) goto ; else goto ; : a_4 = a_3(D); goto ; : a_5 = 0; : # a_1 = PHI return a_1; } but right *after* CCP1, we have instead : # a_1 = PHI return 0; -- I presume that CCP notices that one of the inputs to the PHI is undefined and ignores that possibility under the "well, if that happened, there would be undefined behavior, so we can assume that can't happen" principle. CDDCE1 then erases the remainder of the code and we're left with foo (int x) { : return 0; } which has no uninitialized variable accesses, of course. I'm not sure exactly where uninitialized warnings happen these days, but clearly it's after this point. As far as a fix, I recommend either that any pass that can make use of the "that variable is uninitialized so we're going to ignore the possibility of that control path" principle should emit -Wuninitialized warnings, or (my personal preference) we scrap the notion of doing -Wuninitialized from inside the optimizers, and switch to something predictable, like Java's definite assignment rules.
[Bug c++/47723] internal compiler: Segmentation fault - gcc 4.5.2 Arch Linux
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47723 --- Comment #1 from Zack Weinberg 2011-02-13 23:27:10 UTC --- > I have run the build a second time and it makes it past the point > where it segfaulted, This means your problem is almost certainly *not* a bug in GCC, but rather a hardware fault: most commonly a bad RAM module. Please run a memory tester against your computer (a good one is at http://www.memtest.org/ ) -- you may have to run it for several hours for definitive results.
[Bug tree-optimization/57315] New: LTO and/or vectorizer performance regression on salsa20 core, 4.7->4.8
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57315 Bug ID: 57315 Summary: LTO and/or vectorizer performance regression on salsa20 core, 4.7->4.8 Product: gcc Version: 4.8.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: zackw at panix dot com I'm seeing a significant performance regression from 4.7 to 4.8 (targeting x86-64) on the "salsa20" core function (this is a stream cipher). Repro instructions: $ git clone git://github.com/zackw/rngstats.git # ... $ make -s cipher-test CC=gcc-4.7 && ./cipher-test >&/dev/null && ./cipher-test KAT: aes128... ok KAT: aes256... ok KAT: arc4... ok KAT: isaac64... ok KAT: salsa20_128... ok KAT: salsa20_256... ok TIME: aes128... 2000 keys, 3.47834s -> 574.987 keys/s TIME: aes256... 2000 keys, 3.62452s -> 551.797 keys/s TIME:arc4... 2000 keys, 2.21746s -> 901.933 keys/s TIME: isaac64... 2000 keys, 2.03467s -> 982.962 keys/s TIME: salsa20_128... 2000 keys, 2.31960s -> 862.217 keys/s TIME: salsa20_256... 2000 keys, 2.31932s -> 862.320 keys/s $ make -s clean cipher-test CC=gcc-4.8 && ./cipher-test >&/dev/null && ./cipher-test KAT: aes128... ok KAT: aes256... ok KAT: arc4... ok KAT: isaac64... ok KAT: salsa20_128... ok KAT: salsa20_256... ok TIME: aes128... 2000 keys, 2.49224s -> 802.491 keys/s TIME: aes256... 2000 keys, 3.62372s -> 551.919 keys/s TIME:arc4... 2000 keys, 2.22794s -> 897.689 keys/s TIME: isaac64... 2000 keys, 2.05087s -> 975.194 keys/s TIME: salsa20_128... 2000 keys, 3.53085s -> 566.436 keys/s TIME: salsa20_256... 2000 keys, 2.53003s -> 790.505 keys/s The regression shows in the last two TIME: lines for each build. The relevant code is probably in ciphers/salsa20.c, or else in worker.c. Note that there are other programs in this repository, and they require unusual libraries to build. I recommend you do not attempt a "make all", and if you get errors, try commenting out the CFLAGS.mpi and LIBS.mpi lines in the Makefile.
[Bug tree-optimization/57315] LTO and/or vectorizer performance regression on salsa20 core, 4.7->4.8
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57315 --- Comment #2 from Zack Weinberg --- Created attachment 30210 --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=30210&action=edit self-contained test case Here's a self-contained test case. $ gcc-4.7 -std=c99 -O2 -march=native salsa20-regr.c && ./a.out 875.178 keys/s $ gcc-4.8 -std=c99 -O2 -march=native salsa20-regr.c && ./a.out 808.869 keys/s $ gcc-4.7 -std=c99 -O3 -march=native salsa20-regr.c && ./a.out 867.879 keys/s $ gcc-4.8 -std=c99 -O3 -march=native salsa20-regr.c && ./a.out 800.794 keys/s $ gcc-4.7 -std=c99 -O3 -fwhole-program -march=native salsa20-regr.c && ./a.out 606.605 keys/s $ gcc-4.8 -std=c99 -O3 -fwhole-program -march=native salsa20-regr.c && ./a.out 571.935 keys/s These numbers are stable to within about 1 key/s. So there's a 6-8% regression from 4.7 to 4.8 regardless of optimization level, but also -O3 and -O3 -fwhole-program are inferior to -O2 for this program, with both compilers. (-O2 -fwhole-program is within noise of just -O2 for both.) With 4.8, -march=native on my computer expands to -march=corei7-avx -mcx16 -msahf -mno-movbe -maes -mpclmul -mpopcnt -mno-abm -mno-lwp -mno-fma -mno-fma4 -mno-xop -mno-bmi -mno-bmi2 -mno-tbm -mavx -mno-avx2 -msse4.2 -msse4.1 -mno-lzcnt -mno-rtm -mno-hle -mno-rdrnd -mno-f16c -mno-fsgsbase -mno-rdseed -mno-prfchw -mno-adx -mfxsr -mxsave -mxsaveopt --param l1-cache-size=0 --param l1-cache-line-size=0 --param l2-cache-size=256 -mtune=corei7-avx
[Bug c++/50436] New: Crash or hang on invalid template code
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50436 Bug #: 50436 Summary: Crash or hang on invalid template code Classification: Unclassified Product: gcc Version: 4.6.1 Status: UNCONFIRMED Keywords: ice-on-invalid-code Severity: normal Priority: P3 Component: c++ AssignedTo: unassig...@gcc.gnu.org ReportedBy: za...@panix.com This (invalid) test case causes gcc 4.6.1 to crash (segmentation fault). Variations cause it to go into an infinite loop instead. I tripped over the infinite loop while delta-minimizing a much, much larger test case for an unrelated problem. template struct VI {}; template struct IP { static const bool r = IP::r; }; template struct V { VI::r> vi; }; struct X; struct Y { V v; };
[Bug c++/50436] Crash or hang on invalid template code
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50436 --- Comment #1 from Zack Weinberg 2011-09-16 19:13:13 UTC --- Here's a variant that hangs. template struct VI {}; template struct IP { static const bool r = IP::r; }; template struct V { static const bool r = IP::r; VI vi; }; struct X; struct Y { V v; }
[Bug c++/50442] New: Constructing T from implicit conversion to T& ambiguous in C++0x mode, not C++98
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50442 Bug #: 50442 Summary: Constructing T from implicit conversion to T& ambiguous in C++0x mode, not C++98 Classification: Unclassified Product: gcc Version: 4.6.1 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c++ AssignedTo: unassig...@gcc.gnu.org ReportedBy: za...@panix.com This test case ... template struct MoveRef { operator T& () {} }; template MoveRef Move(T&) {} struct Thing {}; Thing foo(const Thing* p) { return Thing(Move(*p)); } ... generates these diagnostics from g++ 4.6.1 in c++0x/gnu++0x mode, but not in c++98/gnu++98 mode: t.cc: In function ‘Thing foo(const Thing*)’: t.cc:4:50: error: call of overloaded ‘Thing(MoveRef)’ is ambiguous t.cc:4:50: note: candidates are: t.cc:3:8: note: constexpr Thing::Thing(const Thing&) t.cc:3:8: note: constexpr Thing::Thing(Thing&&) This is a regression from g++ 4.5.x and earlier.
[Bug c++/50595] New: template overload resolution insufficiently sensitive to name dependency?
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50595 Bug #: 50595 Summary: template overload resolution insufficiently sensitive to name dependency? Classification: Unclassified Product: gcc Version: 4.6.1 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c++ AssignedTo: unassig...@gcc.gnu.org ReportedBy: za...@panix.com Created attachment 25401 --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=25401 test case Per http://stackoverflow.com/questions/7630806 the expected output of the attached test case should be f(char): 1 f(int): T(1) f(int): t f(char): 1 f(char): T(1) f(char): t since the second and third calls to f() in g() have arguments that are template-dependent, so resolution of *those* function calls (but not the first call) should be deferred until the point of instantiation (h()), at which time both f(int) and f(char) are visible. What actually happens with g++ 4.6 is f(char): 1 f(char): T(1) f(char): t f(char): 1 f(char): T(1) f(char): t I'm not 100% convinced by the argument above, but I am convinced enough to see what y'all think.
[Bug driver/30972] Call to _access has invalid parameter when linked with msvcrt (for vista)
--- Comment #3 from zackw at panix dot com 2007-05-10 08:36 --- What the heck are we doing calling access() in the first place? We should just go ahead and try to execute .../cc1.exe for all relevant paths. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30972
[Bug libstdc++/31906] "-Xcompiler" is inserted after "-Xlinker" when building libstdc++
--- Comment #3 from zackw at panix dot com 2007-05-12 07:34 --- I believe Mike means to suggest you use -Wl,yada,yada,yada instead of -Xlinker yada -Xlinker yada -Xlinker yada. (I also suspect this will help.) -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31906
[Bug c++/17422] Duplicate work, missed optimizations for registration of static destructors
--- Comment #4 from zackw at panix dot com 2007-06-06 00:41 --- Subject: Re: Duplicate work, missed optimizations for registration of static destructors It's better, but build_cleanup is still being called twice in the non-__cxa_atexit case, and I can't tell whether there is any extremely-short-lived garbage still in the __cxa_atexit case. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17422
[Bug middle-end/32441] [4.3 Regression] ICE in expand_expr_real_1, at expr.c:7109
--- Comment #9 from zackw at panix dot com 2007-07-06 17:59 --- Ian Taylor and I found a simpler patch that does not use langhooks, and also has the virtue of working. ;-) Committed as rev 126424. -- zackw at panix dot com changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution||FIXED http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32441
[Bug target/32659] New: abort during reload on (subreg:PSI (reg:DI ...) 4)
The attached test case produces this RTL sequence after lreg, when compiled at -O2 -mcpu=m32cm: (insn 12 11 13 2 ba-bug.ii:24 (set (reg:DI 26 [ p.2 ]) (mem/s:DI (reg/f:PSI 31) [3 S8 A8])) 176 {movdi_splittable} (expr_list:R EG_DEAD (reg/f:PSI 31) (nil))) (insn 13 12 14 2 ba-bug.ii:24 (set (reg:PSI 25 [ p$second ]) (subreg:PSI (reg:DI 26 [ p.2 ]) 4)) 172 {movpsi_op} (nil)) [For reference, in -mcpu=m32cm mode, this target uses PSImode for 24-bit pointers, which appear to be being padded to 32 bits in memory.] During processing of insn 13, reload demands a MODE_PARTIAL_INT mode with at least 64 bits. There is no such mode, and the compiler aborts in smallest_mode_for_size. The test case is simplified from libstdc++ bitmap_alloc.cc. It is extremely fragile; any further simplification and the problem RTL sequence will be eliminated by earlier optimizers. -- Summary: abort during reload on (subreg:PSI (reg:DI ...) 4) Product: gcc Version: 4.3.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: zackw at panix dot com GCC target triplet: m32c-unknown-elf http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32659
[Bug target/32659] abort during reload on (subreg:PSI (reg:DI ...) 4)
--- Comment #1 from zackw at panix dot com 2007-07-06 22:12 --- Created an attachment (id=13858) --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=13858&action=view) test case -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32659
[Bug target/32659] abort during reload on (subreg:PSI (reg:DI ...) 4)
--- Comment #2 from zackw at panix dot com 2007-07-07 00:05 --- *** This bug has been marked as a duplicate of 32656 *** -- zackw at panix dot com changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution||DUPLICATE http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32659
[Bug middle-end/32656] [4.3 regression] m32c: ICE in smallest_mode_for_size, at stor-layout.c:220
--- Comment #2 from zackw at panix dot com 2007-07-07 00:05 --- *** Bug 32659 has been marked as a duplicate of this bug. *** -- zackw at panix dot com changed: What|Removed |Added CC||zackw at panix dot com http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32656
[Bug middle-end/32656] [4.3 regression] m32c: ICE in smallest_mode_for_size, at stor-layout.c:220
--- Comment #3 from zackw at panix dot com 2007-07-07 00:06 --- Created an attachment (id=13859) --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=13859&action=view) Minimized test case Here's a minimal test case (from my duplicate report of this, #32659). I analyzed it a bit there - the important thing to know is that reload wants a MODE_PARTIAL_INT with 64 bits; as the target has no such mode, we abort. I think reload wants this in order to deal with a (subreg:PSI (reg:DI)) construct. -- zackw at panix dot com changed: What|Removed |Added Attachment #13856|0 |1 is obsolete|| http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32656
[Bug middle-end/32656] [4.3 regression] m32c: ICE in smallest_mode_for_size, at stor-layout.c:220
--- Comment #4 from zackw at panix dot com 2007-07-07 00:07 --- Adding DJ to cc: list, confirming. -- zackw at panix dot com changed: What|Removed |Added CC||dj at redhat dot com Status|UNCONFIRMED |NEW Ever Confirmed|0 |1 Last reconfirmed|-00-00 00:00:00 |2007-07-07 00:07:50 date|| http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32656
[Bug middle-end/32656] [4.3 regression] m32c: ICE in smallest_mode_for_size, at stor-layout.c:220
--- Comment #5 from zackw at panix dot com 2007-07-07 00:15 --- Created an attachment (id=13860) --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=13860&action=view) Even smaller test case -- zackw at panix dot com changed: What|Removed |Added Attachment #13859|0 |1 is obsolete|| http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32656
[Bug middle-end/32656] [4.3 regression] m32c: ICE in smallest_mode_for_size, at stor-layout.c:220
--- Comment #6 from zackw at panix dot com 2007-07-07 00:20 --- Created an attachment (id=13861) --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=13861&action=view) and smaller still I can't believe I didn't think of these reductions three hours ago... -- zackw at panix dot com changed: What|Removed |Added Attachment #13860|0 |1 is obsolete|| http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32656
[Bug c/88576] -fno-math-errno causes GCC to consider that malloc does not set errno
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88576 Zack Weinberg changed: What|Removed |Added CC||zackw at panix dot com --- Comment #5 from Zack Weinberg --- The C standard doesn't say malloc _will_ set errno on failure, but it also doesn't say it _won't_, and all library functions are allowed to clobber the value of errno unless it is specifically documented that they won't (N1570 7.5 [Errors], para 3, last sentence). In any case, an option named -fno-math-errno has no business affecting the treatment of functions that have nothing to do with mathematics.
[Bug c/88576] -fno-math-errno causes GCC to consider that malloc does not set errno
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88576 Zack Weinberg changed: What|Removed |Added CC||rguenther at suse dot de --- Comment #8 from Zack Weinberg --- (In reply to Aurelien Jarno from comment #7) > (In reply to Zack Weinberg from comment #5) > > The C standard doesn't say malloc _will_ set errno on failure, but it also > > Well at least POSIX says: > Otherwise, it shall return a null pointer and set errno to indicate the > error. Yeah, I wasn't denying that, I was responding to Andrew taking the attitude that this was fine because ISO C proper _didn't_ say that. I dug into the code a little and it looks like this was an intentional re-use of -fno-math-errno to also mean "and neither will malloc", in the patch for PR 42944. I think that's wrong, but perhaps Richard Biener would like to argue otherwise...
[Bug c/88576] -fno-math-errno causes GCC to consider that malloc does not set errno
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88576 --- Comment #9 from Zack Weinberg --- ... whoops, hit send a little too early. AFAICT, the relevant code is call_may_clobber_ref_p_1 in tree-ssa-alias.c; I would say that the uses of flag_errno_math under the cases BUILT_IN_MALLOC, ALIGNED_ALLOC, CALLOC, STRDUP, STRNDUP, POSIX_MEMALIGN, REALLOC in that function are all wrong, and GCC should unconditionally assume errno may be clobbered by those builtins. If this behavior is felt to be valuable, it should get its own -f switch. (The uses of flag_errno_math under BUILT_IN_GAMMA*, LGAMMA*, and REMQUO* are appropriate, though, as those _are_ math functions.)
[Bug middle-end/88576] -fno-math-errno causes GCC to consider that malloc does not set errno
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88576 --- Comment #14 from Zack Weinberg --- I don't see why it would _ever_ make sense for -fno-alloc-errno to default to the setting of -fno-math-errno. The math functions and the memory allocation functions are independent components of the C library. Each toggle's default should be settable independently by the target configuration.
[Bug c/3885] Incorrect "invalid suffix on integer constant" error
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=3885 Zack Weinberg changed: What|Removed |Added CC||zackw at panix dot com --- Comment #14 from Zack Weinberg --- I think we should reopen this bug and treat it as a request to improve the diagnostics. GCC is correct to reject tokens of the form "0x000E-0x0001", but the accumulation of duplicate bug reports suggests that we could do a better job of explaining what is wrong. Proposal: whenever we are about to issue the "invalid suffix on (integer/floating) constant" error, if the first character of the "invalid suffix" is + or -, we say instead something like error: invalid numeric constant "" note: ('+'/'-') after ('e'/'p'/'E'/'P') continues a pp-number note: to perform (addition/subtraction), put a space before the ('+'/'-') Printing the entire pp-number token instead of just the "suffix", mentioning the standardese term "pp-number", and calling the token an "invalid numeric constant" instead of an "invalid integer/floating constant" should clue people in to _why_ this apparently perverse tokenization is happening.
[Bug rtl-optimization/86028] New: Dead stores created by va_start/va_arg are not fully cleaned up
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86028 Bug ID: 86028 Summary: Dead stores created by va_start/va_arg are not fully cleaned up Product: gcc Version: 8.1.0 Status: UNCONFIRMED Severity: minor Priority: P3 Component: rtl-optimization Assignee: unassigned at gcc dot gnu.org Reporter: zackw at panix dot com Target Milestone: --- On any ABI where arguments to a variadic function are passed in the same places that they would be if they were arguments to a non-variadic function, it should be possible to optimize 'foo_wrapper' in the following test program all the way down to a tail-call to 'foo' and nothing else: #include extern int a; extern int b; extern void *c; int __attribute__((noinline)) foo(int x, int y, void *z) { a = x; b = y; c = z; return 0; } int foo_wrapper(int x, int y, ...) { va_list ap; void *z; va_start(ap, y); z = va_arg(ap, void *); va_end(ap); return foo(x, y, z); } ('foo' is included in this test program so that one can easily verify that no argument shuffling is needed.) gcc-8.1 targeting x86-64-linux, x86-32-linux, or aarch64-linux (all of which meet the above ABI requirement) does not manage to do this. It actually does the best job for x86-32, where everything is on the stack: foo_wrapper: pushl 12(%esp) pushl 12(%esp) pushl 12(%esp) callfoo addl$12, %esp ret This is literally duplicating 'foo_wrapper's incoming arguments into a new frame in order to call 'foo'. The instructions are unnecessary, but they are not dead in the formal sense. Perhaps the issue here is just that variadic functions aren't being considered for sibcall optimization? For the targets where arguments are passed in registers, the code generation is worse, e.g. aarch64: foo_wrapper: stp x29, x30, [sp, -64]! add x3, sp, 48 add x4, sp, 64 mov x29, sp stp x4, x4, [sp, 16] str x3, [sp, 32] stp wzr, wzr, [sp, 40] str x2, [sp, 56] bl foo ldp x29, x30, [sp], 64 ret The actual arguments to foo_wrapper are in x0, x1, and x2, and that's also where foo wants them, and they aren't touched at all. All of the computation done here is dead. (I noticed this while messing around with glibc's syscall wrappers, which really do things like this. 'foo_wrapper' has the type signature of 'fcntl', for instance.)
[Bug c/91554] New: if (!__builtin_constant_p (x)) warning_function() works in inline when x is int, not when x is void *
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91554 Bug ID: 91554 Summary: if (!__builtin_constant_p (x)) warning_function() works in inline when x is int, not when x is void * Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: c Assignee: unassigned at gcc dot gnu.org Reporter: zackw at panix dot com Target Milestone: --- This snippet ``` extern void thefun_called_with_nonnull_arg (void) __attribute__((__warning__( "'thefun' called with second argument not NULL"))); extern int real_thefun (void *, void *); static inline int thefun (void *a, void *b) { if (!__builtin_constant_p(b) || b != 0) thefun_called_with_nonnull_arg(); return real_thefun(a, b); } int warning_expected (void *a, void *b) { return thefun(a, b); } int warning_not_expected (void *a) { return thefun(a, 0); } ``` generates warnings from _both_ `warning_expected` and `warning_not_expected`, on all versions of GCC I can conveniently test (see https://godbolt.org/z/V-FHtZ ). If I change the type of `b` to be `int` throughout, or if I convert the static inline to a macro ``` #define thefun(a, b) \ (((!__builtin_constant_p(b) || (b) != 0) \ ? thefun_called_with_nonnull_arg() \ : (void) 0), \ real_thefun(a, b)) ``` then I get a warning only for `warning_expected`, which is the behavior I want. It seems to me that whether or not you can use `__builtin_constant_p` to guard a call to a function declared with attribute warning, should not depend on the type of __builtin_constant_p's argument.
[Bug c/91554] if (!__builtin_constant_p (x)) warning_function() works in inline when x is int, not when x is void *
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91554 --- Comment #2 from Zack Weinberg --- Additional fun detail: ``` static inline int thefun (void *a, void *b) { if (!__builtin_constant_p((__UINTPTR_TYPE__)b) || b != 0) thefun_called_with_nonnull_arg(); return real_thefun(a, b); } ``` still warns for any use of `thefun`, but ``` static inline int thefun (void *a, void *b) { if (!__builtin_constant_p((short)(__UINTPTR_TYPE__)b) || b != 0) thefun_called_with_nonnull_arg(); return real_thefun(a, b); } ``` works as intended! `(int)(__UINTPTR_TYPE__)` also works as intended on targets where __UINTPTR_TYPE__ is bigger than int. https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=gcc/builtins.c;h=f902e246f1f347be4d4dc04e339fa865393039fe#l8462 looks suspicious to me. Note also the STRIP_NOPS shortly above, which might explain why it matters whether the pointer is cast to a different-sized integer type.
[Bug c/91554] if (!__builtin_constant_p (fn_arg)) warning_function() works in inline when fn_arg is int, not when it is void *
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91554 --- Comment #4 from Zack Weinberg --- (In reply to Richard Biener from comment #3) > I guess you want to use > > __builtin_constant_p (b != 0) > > instead. That wouldn't do what I want. The goal is to warn for any argument _other than_ a compile-time null pointer. `!__builtin_constant_p(b) || b != 0` does just that (it might be easier to understand the De Morgan equivalent `!(__builtin_constant_p(b) && b == 0)`. This is in aid of deprecating the second argument to gettimeofday (see https://sourceware.org/git/?p=glibc.git;a=blob;f=NEWS;h=75453bbda865c7d51df39177caef40b16e086dcf#l53 and https://sourceware.org/git/?p=glibc.git;a=blob;f=manual/time.texi;h=cb234bd08fae1841034a2bdccf4e1d246be23034#l557 ). > The docs don't explain what a "constant at compile time" is > so whether for example the address of a global or the address of an > automatic var would be "constant". But I'd say the above incorrectly > disregards the NULL-pointer case. It seems like this code pre-dates tree optimizations, I would suggest removing these lines 8470 || AGGREGATE_TYPE_P (TREE_TYPE (arg)) 8471 || POINTER_TYPE_P (TREE_TYPE (arg)) (and fixing the comment above to match) and seeing if that breaks anything.
[Bug preprocessor/80005] cpp expands __has_include() filename parts
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80005 Zack Weinberg changed: What|Removed |Added Status|UNCONFIRMED |NEW Last reconfirmed||2019-06-12 CC||zackw at panix dot com Ever confirmed|0 |1 --- Comment #1 from Zack Weinberg --- Confirming. This has now bitten an attempt to use __has_include in glibc, see discussion starting at https://sourceware.org/ml/libc-alpha/2019-06/msg00254.html . The proper fix, IMO, would be to apply the same special tokenization rules to the argument of __has_include that are used for the argument of #include. This is not exactly the same as "don't macro-expand the argument." Rather, in #if !__has_include() the character sequence should be processed as a single "h-char-sequence" token, in the language of C99 6.10.2. So there never is a "linux" identifier token to macro-expand (nor is there a "memfd" or "h" identifier). However, if we instead have #define HEADER_TO_LOOK_FOR "special.h" #if __has_include (HEADER_TO_LOOK_FOR) then HEADER_TO_LOOK_FOR should be macro-expanded. Also, h-char-sequences and q-char-sequences are not string literals. For example, if we have #if __has_include("X11\Xlib.h") the \X is _not_ an ill-formed escape sequence.
[Bug c++/90885] GCC should warn about 2^16 and 2^32 and 2^64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90885 --- Comment #13 from Zack Weinberg --- Since examples of this error were observed with base 10, I think the warning should cover 10^i for decimal literal i, too. Relatedly, “note: ^ performs exclusive or, not exponentiation” might be a nice addition to the existing error for ^ with a float for either operand.
[Bug lto/48200] Implement function attribute for symbol versioning (.symver)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=48200 Zack Weinberg changed: What|Removed |Added CC||zackw at panix dot com --- Comment #26 from Zack Weinberg --- /subscribe I've tripped over this problem myself in the context of the new password-hashing library, libxcrypt (see https://github.com/besser82/libxcrypt/issues/24 ) Our symbol-versioning techniques are lifted from glibc (see https://github.com/besser82/libxcrypt/blob/aae4c1baea534d2e4c9dfe2faf42ee0c5f7a6f22/crypt-port.h#L122 ). Function attributes seem like a good replacement to me, although I'd ask that people think about how a library author might write macros that will seamlessly fall back to the older technique.
[Bug tree-optimization/28364] poor optimization choices when iterating over a std::string (probably not c++-specific)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=28364 --- Comment #30 from Zack Weinberg --- It's been a very long time and I don't know exactly what changed, but GCC 7.3 generates essentially the same code for both of the functions in the "C test case" and I would not describe that code as "bad". I can still make it duplicate the entire body of the loop by relatively small tweaks, though. For instance, int has_bad_chars(char *str, __SIZE_TYPE__ len) { for (char *c = str; c < str + len; c++) { unsigned char x = (unsigned char)(*c); if (x <= 0x1f || x == 0x5c || x == 0x7f) return 1; } return 0; } generates significantly worse code (doubling cache footprint for no gain in branch predictability or any other metric) with -O2 than -Os. Also, the code generated for the body of the loop (with both the original test case and the above) is more complicated than it needs to be, but perhaps that should be a new bug report.
[Bug tree-optimization/84673] New: Overcomplicated code generation for a chain of mutually exclusive conditions
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84673 Bug ID: 84673 Summary: Overcomplicated code generation for a chain of mutually exclusive conditions Product: gcc Version: 7.3.1 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: zackw at panix dot com Target Milestone: --- This function int has_bad_chars(unsigned char *str, __SIZE_TYPE__ len) { for (unsigned char *c = str; c < str + len; c++) { unsigned char x = *c; if (__builtin_expect (x <= 0x1f || x == 0x5c || x == 0x7f, 0)) return 1; } return 0; } compiles with GCC 7.3.1 at -Os -march=native on a current-generation x86-64 to has_bad_chars: addq%rdi, %rsi .L2: cmpq%rsi, %rdi jnb .L7 movb(%rdi), %al cmpb$31, %al setbe %cl cmpb$92, %al sete%dl orb %dl, %cl jne .L5 cmpb$127, %al je .L5 incq%rdi jmp .L2 .L7: xorl%eax, %eax ret .L5: movl$1, %eax ret It is six bytes shorter, and also I think more efficient, to generate this instead: has_bad_chars: .LFB0: .cfi_startproc addq%rdi, %rsi .L2: cmpq%rsi, %rdi jnb .L7 movb(%rdi), %al cmpb$31, %al jbe .L5 cmpb$92, %al je .L5 cmpb$127, %al je .L5 incq%rdi jmp .L2 .L7: xorl%eax, %eax ret .L5: movl$1, %eax ret The same thing happens at -O2, but also a chunk of the loop body gets pointlessly duplicated above the loop (it looks like it tried to unroll the loop and got stuck halfway).
[Bug target/84772] New: powerpc-spe: Spurious "is used uninitialized" warning, or possibly incorrect codegen for va_arg(long double)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84772 Bug ID: 84772 Summary: powerpc-spe: Spurious "is used uninitialized" warning, or possibly incorrect codegen for va_arg(long double) Product: gcc Version: 7.3.1 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: zackw at panix dot com Target Milestone: --- The following test case is a mechanical reduction from the implementation of vfprintf in GNU libc: extern void *memset(void *, int, __SIZE_TYPE__); void f(int *a, __builtin_va_list b) { memset(a, 2, sizeof(int)); for (;;) __builtin_va_arg(b, long double); } I get a spurious uninitialized value warning on the __builtin_va_arg line with a powerpc-spe compiler, but not with the exact same build targeting a different architecture -- not even a different powerpc variant: $ powerpc-glibc-linux-gnuspe-gcc -S -O -Wall -Werror vfp_min3.c; echo $? vfp_min3.c: In function ‘f’: vfp_min3.c:6:25: error: ‘va_arg_tmp.3’ is used uninitialized in this function [-Werror=uninitialized] __builtin_va_arg(b, long double); ^~~~ cc1: all warnings being treated as errors 1 $ powerpc-glibc-linux-gnu-gcc -S -O -Wall -Werror vfp_min3.c; echo $? 0 $ powerpc-glibc-linux-gnuspe-gcc --version powerpc-glibc-linux-gnuspe-gcc (GCC) 7.3.1 20180307 [gcc-7-branch revision 258338] Copyright (C) 2017 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. $ powerpc-glibc-linux-gnu-gcc --version powerpc-glibc-linux-gnu-gcc (GCC) 7.3.1 20180307 [gcc-7-branch revision 258338] Copyright (C) 2017 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. This is an extra frustrating spurious-uninitialized bug because it's complaining about a variable that doesn't even exist in the source code -- I guess it's a temporary used inside the expansion of __builtin_va_arg? So it might not even be *spurious*, I suppose, it might be telling me that that expansion is wrong!
[Bug target/84772] powerpc-spe: Spurious "is used uninitialized" warning, or possibly incorrect codegen for va_arg(long double)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84772 --- Comment #2 from Zack Weinberg --- You don't appear to have the exact same build as me. But there's something fishy going on with that, because as far as I can tell, SVN rev 258338 is a *trunk* revision, not a gcc-7-branch revision. Anyway, I'm gonna attach a .stdarg dump from the compiler that does trigger the problem, maybe comparing that to the .stdarg dump you get will help?
[Bug target/84772] powerpc-spe: Spurious "is used uninitialized" warning, or possibly incorrect codegen for va_arg(long double)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84772 --- Comment #3 from Zack Weinberg --- Created attachment 43607 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=43607&action=edit -fdump-tree-stdarg output for test case from problem compiler I do not entirely understand how to read this (what do the _123(D) suffixes on everything mean?) but right after there sure does appear to be a read of the variable va_arg_tmp.3, which was introduced by this pass, and which does not appear to have been previously initialized.
[Bug target/84772] powerpc-spe: Spurious "is used uninitialized" warning, or possibly incorrect codegen for va_arg(long double)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84772 --- Comment #4 from Zack Weinberg --- The last actual _change_ in my GCC build is r258315 | denisc | 2018-03-07 04:13:12 -0500 (Wed, 07 Mar 2018) | 9 lines Backport from mainline 2018-02-07 Georg-Johann Lay PR target/84209 * config/avr/avr.h (GENERAL_REGNO_P, GENERAL_REG_P): New macros. * config/avr/avr.md: Only post-reload split REG-REG moves if either register is GENERAL_REG_P.
[Bug target/84772] powerpc-spe: Spurious "is used uninitialized" warning, or possibly incorrect codegen for va_arg(long double)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84772 --- Comment #7 from Zack Weinberg --- I no longer remember enough about GIMPLE to comment on your actual proposed fix, but I do have a small nitpick on the test case: +va_arg (ap, long double); /* { dg-bogus "may be used uninitialized in this function" } */ I think that needs to be just { dg-bogus "used uninitialized" }, as you might get either the "may be used initialized" or the "is used uninitialized" message.
[Bug libstdc++/56166] New: std::string::clear() can allocate memory despite being marked noexcept
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56166 Bug #: 56166 Summary: std::string::clear() can allocate memory despite being marked noexcept Classification: Unclassified Product: gcc Version: 4.7.2 Status: UNCONFIRMED Severity: normal Priority: P3 Component: libstdc++ AssignedTo: unassig...@gcc.gnu.org ReportedBy: za...@panix.com The attached test case (which is regrettably large, but I haven't found a shorter way to trigger the problem) will call terminate() when compiled with -std=c++11, despite there being no point at which an exception should escape main(). The problem is deep inside std::basic_string: /usr/include/c++/4.7/bits/basic_string.h 796 /** 797* Erases the string, making it empty. 798*/ 799 void 800 clear() _GLIBCXX_NOEXCEPT 801 { _M_mutate(0, this->size(), 0); } _M_mutate can allocate memory even when it's being asked to erase the string (presumably due to internal reference-counting -- if you take 's2' out of the test case the crash goes away), and thus can throw bad_alloc. In C++11 mode, clear() is marked 'noexcept', so we wind up in terminate().
[Bug libstdc++/56166] std::string::clear() can throw despite being marked noexcept
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56166 --- Comment #1 from Zack Weinberg 2013-01-31 18:21:49 UTC --- Created attachment 29320 --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=29320 test case In case you're wondering, this was an attempt to do at least *some* testing of a library's robustness in the face of memory allocation failure.
[Bug libstdc++/56166] std::string::clear() can throw despite being marked noexcept
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56166 --- Comment #4 from Zack Weinberg 2013-02-01 00:48:43 UTC --- Is vstring going to be promoted to std::string in the *near* future? it doesn't seem done to me, eg there is no stringstream for it, and it appears to generate bulkier code. (not scientifically benchmarked) that said, it doesn't fail this test case, but I haven't checked how its clear() works, so I'm not sure it has no bug here.
[Bug middle-end/48580] missed optimization: integer overflow checks
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48580 --- Comment #18 from Zack Weinberg 2013-02-02 21:59:37 UTC --- I find it a little disappointing that what should have been a straightforward additional optimization has gotten totally derailed into bikeshedding of an enormous class of builtins which seem unlikely ever to gain any traction. I just want the code in the original bug report to be optimized. That would be enough.
[Bug c++/70178] New: Loop-invariant memory loads from std::string innards are not hoisted
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70178 Bug ID: 70178 Summary: Loop-invariant memory loads from std::string innards are not hoisted Product: gcc Version: 6.0 Status: UNCONFIRMED Keywords: missed-optimization Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: zackw at panix dot com Target Milestone: --- Consider #include #include #include using std::string; inline unsigned char hexval(unsigned char c) { if (c >= '0' && c <= '9') return c - '0'; else if (c >= 'A' && c <= 'F') return c - 'A' + 10; else if (c >= 'a' && c <= 'f') return c - 'a' + 10; else throw "input character not a hexadecimal digit"; } void hex2ascii_1(const string& in, string& out) { size_t inlen = in.length(); if (inlen % 2 != 0) throw "input length not a multiple of 2"; out.clear(); out.reserve(inlen / 2); for (string::const_iterator p = in.begin(); p != in.end(); p++) { unsigned char c = hexval(*p); p++; c = (c << 4) + hexval(*p); out.push_back(c); } } void hex2ascii_2(const string& in, string& out) { size_t inlen = in.length(); if (inlen % 2 != 0) throw "input length not a multiple of 2"; out.clear(); out.reserve(inlen / 2); std::transform(in.begin(), in.end() - 1, in.begin() + 1, std::back_inserter(out), [](unsigned char a, unsigned char b) { return (hexval(a) << 4) + hexval(b); }); } It seems to me that both of these should be optimizable to the equivalent thing you would write in C, with all the pointers in registers ... void hex2ascii_hypothetical(const string& in, string& out) { size_t inlen = in.length(); if (inlen % 2 != 0) throw "input length not a multiple of 2"; out.clear(); out.reserve(inlen / 2); const unsigned char *p = in._M_data(); const unsigned char *limit = p + in._M_length(); unsigned char *q = out._M_data(); // (check for pointer wrap-around here?) while (p < limit) { *q++ = (hexval(p[0]) << 4) + hexval(p[1]); p += 2; } } Maybe it wouldn't be able to deduce that capacity overflow is impossible by construction, but that's a detail. The important thing is that g++ 5 and 6 cannot hoist the pointer initializations out of the loop as shown. They reload p, limit, and q from memory (that is, from the relevant string objects) on every iteration.
[Bug c++/70178] Loop-invariant memory loads from std::string innards are not hoisted
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70178 --- Comment #2 from Zack Weinberg --- That was my working hypothesis as well. Isn't there some way we can annotate s->data to reassure the compiler that *this* char* doesn't alias? I don't know enough about the guts of std::string to know whether 'restrict' is accurate, but it certainly shouldn't be pointing to anything but string contents.
[Bug c++/70178] Loop-invariant memory loads from std::string innards are not hoisted
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70178 --- Comment #5 from Zack Weinberg --- It seems to me that a pair of extended integer types (signed and unsigned), that are the same size as `char` but don't get the special TBAA treatment for "character types", would be easier to implement and validate than figuring out what "restrict" means when applied to a data member, and probably just as effective in this context. `std::string` doesn't expose bare writable pointers so I *think* this has no compatibility implications. Note that I do *not* think it would be safe for [u]int8_t to be (defined as) those types.
[Bug preprocessor/71102] New: _Pragma("GCC warning ...") should concatenate string literals
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71102 Bug ID: 71102 Summary: _Pragma("GCC warning ...") should concatenate string literals Product: gcc Version: 6.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: preprocessor Assignee: unassigned at gcc dot gnu.org Reporter: zackw at panix dot com Target Milestone: --- Consider #define warn1(msg) _Pragma (#msg) #define warn(msg) warn1 (GCC warning msg) #define lengthy_explanation(name) \ "This is a long explanation about why `" #name "' is deprecated." #define SYMBOL warn(lengthy_explanation(SYMBOL)) 1 int main(void) { return SYMBOL; } With gcc (5 and 6), the warning message printed is test.c:9:13: warning: This is a long explanation about why ` which is obviously not what was wanted. (The column position of the diagnostic is also wrong, but that's secondary.) One could theoretically work around this by not using string literals in lengthy_explanation() but that risks blowing up in one's face if any of the words in the explanation happen to be macros (and also I'm not sure how I would get the quotation marks in there). n.b. clang gets this right: test.c:9:25: warning: This is a long explanation about why `SYMBOL' is deprecated. [-W#pragma-messages]
[Bug c/82134] New: warn_unused_result triggers on empty structs even when they are used
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82134 Bug ID: 82134 Summary: warn_unused_result triggers on empty structs even when they are used Product: gcc Version: 7.2.1 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c Assignee: unassigned at gcc dot gnu.org Reporter: zackw at panix dot com Target Milestone: --- If a function that returns an empty struct is tagged with attribute warn_unused_result, any call will trigger the warning, even if the return value is (nominally) used. For instance struct Empty {}; extern void use_empty(struct Empty); __attribute__((warn_unused_result)) extern struct Empty make_empty(void); void should_warn(void) { make_empty(); } void shouldnt_warn_1(void) { use_empty(make_empty()); } void shouldnt_warn_2(void) { struct Empty e = make_empty(); use_empty(e); } --> test.c: In function ‘should_warn’: test.c:10:5: warning: ignoring return value of ‘make_empty’, declared with attribute warn_unused_result [-Wunused-result] make_empty(); ^~~~ test.c: In function ‘shouldnt_warn_1’: test.c:15:5: warning: ignoring return value of ‘make_empty’, declared with attribute warn_unused_result [-Wunused-result] use_empty(make_empty()); ^~~ test.c: In function ‘shouldnt_warn_2’: test.c:20:22: warning: ignoring return value of ‘make_empty’, declared with attribute warn_unused_result [-Wunused-result] struct Empty e = make_empty(); ^~~~ with both GCC 6 and GCC 7. (From here: https://stackoverflow.com/questions/46096628/gcc-empty-structs-and-wunused-result)
[Bug c/82134] warn_unused_result triggers on empty structs even when they are used
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82134 --- Comment #2 from Zack Weinberg --- The claim in the Stack Overflow post was that this was useful in a scenario involving machine-generated code that couldn't return void for some external reason, but they didn't go into any kind of detail. It has always been my opinion that warnings in general should be made optimization-independent, including early dead code optimizations like this, with _possible_ carefully-defined-and-documented exceptions for `if (0) { /* don't warn about stuff in here */ }` and similar. But I realize that's a long way from where GCC is or has ever been.
[Bug c/82752] Support %OB, %Ob strftime formats
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82752 Zack Weinberg changed: What|Removed |Added CC||zackw at panix dot com --- Comment #1 from Zack Weinberg --- And in strptime, %OB, %Ob, and %Oh should be supported.
[Bug c/83236] New: "Did you mean" suggestions maybe shouldn't offer implementation-private names
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83236 Bug ID: 83236 Summary: "Did you mean" suggestions maybe shouldn't offer implementation-private names Product: gcc Version: 7.2.0 Status: UNCONFIRMED Severity: minor Priority: P3 Component: c Assignee: unassigned at gcc dot gnu.org Reporter: zackw at panix dot com Target Milestone: --- On Linux/glibc, $ cat > test.c < ino_t inode; EOF $ gcc -std=c89 -fsyntax-only test.c test.c:2:1: error: unknown type name ‘ino_t’; did you mean ‘__ino_t’? ino_t inode; ^ __ino_t $ gcc --version gcc (Debian 7.2.0-16) 7.2.0 __ino_t is an implementation detail of glibc's headers, and application programmers shouldn't be encouraged to use it. To first order, I don't think any implementation-namespace names (/^_[_A-Z]/) should be offered as suggestions. To second order, they could be suggested when the unknown symbol is itself in the implementation namespace, or when the suggestion is a _documented_ impl-namespace symbol like __DATE__ or _IOFBF or _Bool.
[Bug c/83236] "Did you mean" suggestions maybe shouldn't offer implementation-private names
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83236 --- Comment #3 from Zack Weinberg --- Maybe name_reserved_for_implementation_p should be a langhook?
[Bug c/83236] "Did you mean" suggestions maybe shouldn't offer implementation-private names
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83236 --- Comment #5 from Zack Weinberg --- I was just thinking that other language front-ends might want to offer spell-checking suggestions with their own rules for which names are/aren't appropriate to suggest in context, but maybe we can worry about that when it happens.
[Bug c/43691] Code segfault when compiled with -Os, -O2, or -O3
--- Comment #4 from zackw at panix dot com 2010-04-08 17:28 --- (In reply to comment #0) > When this testcase, using inline assembly, is compiled with -Os, -O2, or -O3 > it segfaults. -O0 and -O1 allow it to run correctly. > > Moving the inline assembly into a separate file and including it in the > compilation allow the program to run correctly at all -O levels. >From these symptoms, it is practically certain that you have done something wrong with the asm inputs and outputs. I don't have an Alpha compiler to hand, but just from looking at your code, I bet it will work correctly if you rewrite it like so: unsigned long rewritten(const unsigned long b[2]) { unsigned long ofs, output; asm( "cmoveq %0,64,%1# ofs= (b[0] ? ofs : 64);\n" "cmoveq %0,%2,%0# temp = (b[0] ? b[0] : b[1]);\n" "cttz %0,%0 # output = cttz(temp);\n" : "=r" (output), "=r" (ofs) : "r" (b[1]), "0" (b[0]), "1" (0) ); return output + ofs; } (I've assumed that the semantic of "cmoveq a,b,c" is "if (a==0) c=b;") The trick with asm() is to do as little as possible. I assume that the reason the assembly version beats the pure-C version is the cmoveq's, so I stripped the setup code and the addition. This allows me to express the _real_ argument constraints rather than fake ones, which lets me be confident that the optimizers will do what you want. Note that this also means "volatile" is unnecessary. As a general principle, if you find yourself writing an asm() with a big long list of earlyclobber outputs but no inputs, you are doing it wrong. -- zackw at panix dot com changed: What|Removed |Added -------- CC||zackw at panix dot com http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43691
[Bug c/43691] Code segfault when compiled with -Os, -O2, or -O3
-- zackw at panix dot com changed: What|Removed |Added Status|UNCONFIRMED |WAITING http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43691
[Bug driver/68038] New: "Internal compiler error: Killed: program cc1" should read "Virtual memory exhausted"
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68038 Bug ID: 68038 Summary: "Internal compiler error: Killed: program cc1" should read "Virtual memory exhausted" Product: gcc Version: 6.0 Status: UNCONFIRMED Severity: enhancement Priority: P3 Component: driver Assignee: unassigned at gcc dot gnu.org Reporter: zackw at panix dot com Target Milestone: --- 99+% of the time, when the compiler proper dies on signal 9, it's because we're on Linux and the out-of-memory killer has killed the compiler proper because it ate all available RAM. But the error message you get when this happens *sounds* like there's a bug in GCC. I suggest the driver be changed to special-case SIGKILL and print a different message, along the lines of gcc: compilation process (cc1) killed by SIGKILL This almost always means your computer does not have enough RAM to compile this program. Try adding swap space or reducing the number of concurrent compile jobs.
[Bug c++/68265] New: Arbitrary syntactic nonsense silently accepted after 'int (*){}' until the next close brace
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68265 Bug ID: 68265 Summary: Arbitrary syntactic nonsense silently accepted after 'int (*){}' until the next close brace Product: gcc Version: 5.2.1 Status: UNCONFIRMED Keywords: accepts-invalid Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: zackw at panix dot com Target Milestone: --- (From https://stackoverflow.com/questions/33614455/ :) The C++ compiler fails to diagnose ill-formed constructs such as int main() { int (*) {} any amount of syntactic nonsense on multiple lines, with *punctuation* and ++operators++ even... will be silently discarded until the next close brace } With -pedantic -std=c++98 you do get "warning: extended initializer lists only available with -std=c++11 or -std=gnu++11", but with -std=c++11, not a peep. If any one (or more) of the tokens 'int ( * ) { }' are removed, you do get an error. Also, the C compiler does not have the same bug.
[Bug c++/68265] Arbitrary syntactic nonsense silently accepted after 'int (*){}' until the next close brace
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68265 --- Comment #2 from Zack Weinberg --- This problem apparently goes back at least as far as 4.8. Stack Overflow people found a number of variations, please consult https://stackoverflow.com/questions/23033043/is-it-a-new-c11-style-of-comments https://stackoverflow.com/questions/23015482/strange-code-that-compiles-with-g
[Bug lto/61886] [4.8/4.9/5 Regression] LTO breaks fread with _FORTIFY_SOURCE=2
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61886 --- Comment #34 from Zack Weinberg --- > As I tried to explain, it is currently design decision to have one declaration > and one symtam node for one symbol. The one decl rule was introduced by > Codesourcery (Zack) in 2003-4. He updated frontends to avoid copying and > dropped code that dealt with duplicated declarations. Due to lack of sanity > checking some cases remained, like this one (because at that time we did not > really have proper asm name hash). There are couple open PRs since that time > that was never corrected. It's been long enough that I don't recall precisely what my goals were, but I don't think it was my intention to enforce a "one decl per asm name" rule at the time. I was trying to deal with C front-end issues: according to https://gcc.gnu.org/ml/gcc-patches/2004-01/msg00808.html, "at least bug 12267, bug 12336, bug 12373, bug 12391, bug 12560, bug 12934, bug 13129". My recollection is that the fundamental problem was getting more than one TREE_DECL object per *declared name* in -funit-at-a-time mode, which was shiny and new back then! ... Digging through the mailing list archives, maybe I'm remembering wrong. https://gcc.gnu.org/ml/gcc-patches/2004-03/msg01440.html has > The old code took the copying > approach; this was bad because it violated the basic assumption made > elsewhere in the compiler that there is exactly one DECL node for each > assembly-level symbol. Hence all the bugs. I do not remember anymore what parts of the compiler were making those assumptions, but I'm guessing it was probably the debug information generators, or mostly those. Anyhow, I don't think y'all should be taking a decision I made for the C front end under time and peer pressure (people were *really mad* at me for leaving some of those bugs unfixed for several releases in a row) in 2004 as a permanent compiler-wide design constraint ten years later :) (Note also bug 13801, and the discussion of its fix, here: https://gcc.gnu.org/ml/gcc-patches/2004-08/msg00085.html -- we wound up having to back down from the "not only is there only one DECL per assembly level symbol, it only ever accumulates information" rule that I originally wanted, due to C90 conformance problems.)
[Bug lto/62249] Spurious FORTIFY_SOURCE warning with -flto, poll
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62249 --- Comment #4 from Zack Weinberg --- The "delta-minimized self-contained test case" should require only `-O2 -flto` to reproduce the problem. The other test case, however, should require `-D_FORTIFY_SOURCE=2` in addition, and to be compiled against recent glibc. Note also that this does not happen with 4.8 or older.
[Bug preprocessor/61638] New: "warning: multi-line comment" unclear and has false positives
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61638 Bug ID: 61638 Summary: "warning: multi-line comment" unclear and has false positives Product: gcc Version: unknown Status: UNCONFIRMED Keywords: diagnostic Severity: enhancement Priority: P3 Component: preprocessor Assignee: unassigned at gcc dot gnu.org Reporter: zackw at panix dot com The warning for a // comment continued on the next line by a backslash is "warning: multi-line comment", which doesn't make a whole lot of sense if you don't already know that this is a possibility. (Caret diagnostics help, but the caret is in the wrong place: it points at the //, not the \.) It also warns when the construct is harmless, e.g. because the next line has another // comment on it: this naturally happens when someone feels like drawing a tree //A // / \ // B C This peeves people enough to write snarky comments like this (from https://code.monotone.ca/p/monotone/source/tree/h:net.venge.monotone/test/unit/tests/graph.cc#L260 ): // This tests the nasty case described in the giant comment above // get_uncommon_ancestors: // // 9 // |\ . Extraneous dots brought to you by the // 8 \ . Committee to Shut Up the C Preprocessor // /| \. (CSUCPP), and viewers like you and me. /// | | // (... ASCII art continues ...) So I would like to suggest the following four improvements: 1) The warning message should be changed to "warning: backslash-newline continues // comment onto the next line". 2) The caret should be corrected to point to the backslash, not the //. 3) The warning should be suppressed if the next line is blank or contains only another // comment. (It should *not* be suppressed if the next line is blank up to a /*, because that genuinely does change the meaning of the code. 4) "warning: backslash and newline separated by space" should not be suppressed in // comments, because see bug 8270 for a whole bunch of people being very confused why putting a space after the backslash in their ASCII art doesn't make it not a backslash-newline. (It should still be suppressed in /* comments.)
[Bug tree-optimization/62112] New: Optimize out malloc when block is unused or write-only
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62112 Bug ID: 62112 Summary: Optimize out malloc when block is unused or write-only Product: gcc Version: 4.9.1 Status: UNCONFIRMED Keywords: missed-optimization Severity: enhancement Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: zackw at panix dot com This program #include #include int main(void) { size_t n = 1000; float *x = calloc(n,sizeof(float)); float *y = malloc(n*sizeof(float)); if (x && y) memcpy(y,x,sizeof(float)*n); return 0; } can be optimized (in the absence of `-fno-builtin-(memcpy|malloc|calloc)`) to int main(void) { return 0; } because: the memory block pointed to by `y` is write-only, so the `memcpy` and `malloc` can be discarded; after that is done, the memory block pointed to by `x` is unused, so that allocation can be discarded as well. `calloc` is used here to avoid any question of UB due to reading uninitialized memory even within `memcpy`. The optimization should apply to all heap-allocation functions, including especially C++ operator new (as long as the constructor has no side effects outside the just-allocated object). Clang 3.5 does perform this optimization.
[Bug tree-optimization/62112] Optimize out malloc when block is unused or write-only
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62112 --- Comment #2 from Zack Weinberg --- I observe that the `memcpy` does get lowered to inline code. Is it just a phase-ordering problem that we then don't detect the stores as dead?
[Bug c++/50595] template overload resolution insufficiently sensitive to name dependency?
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=50595 Zack Weinberg changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|--- |INVALID --- Comment #2 from Zack Weinberg --- Per http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_defects.html#197 GCC's behavior is correct.
[Bug lto/62249] New: Spurious FORTIFY_SOURCE warning with -flto, poll
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62249 Bug ID: 62249 Summary: Spurious FORTIFY_SOURCE warning with -flto, poll Product: gcc Version: 4.9.1 Status: UNCONFIRMED Severity: normal Priority: P3 Component: lto Assignee: unassigned at gcc dot gnu.org Reporter: zackw at panix dot com Created attachment 33388 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=33388&action=edit test case (compile on recent glibc system) Compile the attached program on a recent-glibc system with -D_FORTIFY_SOURCE=2 -O2 -flto and you get a spurious warning: $ gcc -flto -O2 -D_FORTIFY_SOURCE=2 test.c In function ‘__poll_alias’, inlined from ‘main’ at test.c:17:9: /usr/include/x86_64-linux-gnu/bits/poll2.h:41:2: warning: call to ‘__poll_chk_warn’ declared with attribute warning: poll called with fds buffer too small file nfds entries return __poll_chk (__fds, __nfds, __timeout, __bos (__fds)); $ gcc -fwhole-program -O2 -D_FORTIFY_SOURCE=2 test.c $ Inspection of bits/poll2.h leads me to believe that the glibc folks expect this construct ... __fortify_function int poll (struct pollfd *__fds, nfds_t __nfds, int __timeout) { if (__bos (__fds) != (__SIZE_TYPE__) -1) { if (! __builtin_constant_p (__nfds)) return __poll_chk (__fds, __nfds, __timeout, __bos (__fds)); else if (__bos (__fds) / sizeof (*__fds) < __nfds) return __poll_chk_warn (__fds, __nfds, __timeout, __bos (__fds)); } return __poll_alias (__fds, __nfds, __timeout); } ... to have the dead arm of the inner 'if' eliminated (one of them must be dead, because the condition is a call to __builtin_constant_p) before warnings are issued.
[Bug lto/62249] Spurious FORTIFY_SOURCE warning with -flto, poll
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62249 --- Comment #1 from Zack Weinberg --- Created attachment 33389 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=33389&action=edit Delta-minimized self-contained test case Requires no headers anymore. Delta-minimization revealed something interesting: int __poll_chk (struct pollfd *, unsigned, int, unsigned); int __poll_chk_warn (struct pollfd *, unsigned, int, unsigned) __asm__ ("__poll_chk") __attribute__ ((__warning__ ("poll called with fds buffer too small"))); The __asm__("__poll_chk") annotation is essential; if it's removed the bug goes away. I don't know anything about the guts of LTO, but to me that suggests a bug in its unification of function decls, rather than a bug in dead-code removal.
[Bug lto/62249] Spurious FORTIFY_SOURCE warning with -flto, poll
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62249 --- Comment #2 from Zack Weinberg --- Incidentally, yes, the test case is based on a real program.
[Bug sanitizer/62307] New: -fsanitize=undefined doesn't pay attention to __attribute__((returns_nonnull))
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62307 Bug ID: 62307 Summary: -fsanitize=undefined doesn't pay attention to __attribute__((returns_nonnull)) Product: gcc Version: 4.9.1 Status: UNCONFIRMED Severity: normal Priority: P3 Component: sanitizer Assignee: unassigned at gcc dot gnu.org Reporter: zackw at panix dot com CC: dodji at gcc dot gnu.org, dvyukov at gcc dot gnu.org, jakub at gcc dot gnu.org, kcc at gcc dot gnu.org Consider extern int *__errno_location(void) __attribute__ ((__const__)) __attribute__ ((__returns_nonnull__)); void set_errno(int x) { (*__errno_location()) = x; } (This is a close approximation to how GNU libc defines errno -- there currently isn't a returns_nonnull annotation, but I'm about to file a bug for _that_ ;-) With 4.9.1, -fsanitize=undefined -O2 -fdump-tree-optimized, I get set_errno (int x) { int * _1; : _1 = __errno_location (); if (_1 == 0B) goto ; else goto ; : __builtin___ubsan_handle_type_mismatch (&*.Lubsan_data0, 0); : *_1 = x_3(D); return; } The returns_nonnull annotation should cause the compiler to delete the runtime check. (None of the RTL passes delete it either.) This type of unnecessary check is the single largest source of calls to __builtin___ubsan_handle_type_mismatch in a real program. I see a number of other cases where we could be doing better at eliminating these checks on "this pointer is already known to be non-NULL" grounds, e.g. if (_191 == 0B) goto ; else goto ; : __builtin___ubsan_handle_type_mismatch (&*.Lubsan_data26, 0); : *_191 = _37; if (_191 == 0B) goto ; else goto ; : __builtin___ubsan_handle_type_mismatch (&*.Lubsan_data27, 0); ... so maybe it's a pass-ordering problem? Or rather, the fact that UBSAN_NULL() is an opaque operation until .sanopt, which prevents nearly all of the tree optimizers from doing anything with it?
[Bug sanitizer/62307] -fsanitize=undefined doesn't pay attention to __attribute__((returns_nonnull))
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62307 --- Comment #2 from Zack Weinberg --- (In reply to Marc Glisse from comment #1) > -fsanitize=null seems to imply -fno-delete-null-pointer-checks, so I assume > this is on purpose. It would actually be quite natural for the sanitizer to > insert an extra check after every call to a returns_nonnull function, > checking that the result is indeed !=0. Otherwise yes, sanopt is way too > late for any other optimization to take place. I had a different mental model of how ubsan was supposed to function -- I was expecting it to insert explicit checks before every construct with potentially runtime-undefined behavior (not due to array bounds or pointer lifetimes), but then for the compiler to try as hard as possible to *eliminate* the checks where provably unnecessary. I was hoping, in fact, to be able to use -fsanitize=undefined -fdump-tree-optimized as a poor man's "tell me all the places where the program can't be proven *not* to have runtime-undefined behavior."
[Bug tree-optimization/63271] New: Should commute arithmetic with vector load
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63271 Bug ID: 63271 Summary: Should commute arithmetic with vector load Product: gcc Version: 4.9.1 Status: UNCONFIRMED Keywords: missed-optimization Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: zackw at panix dot com Consider #include __m128i foo(char C) { return _mm_set_epi8( 0,C, 2*C, 3*C, 4*C, 5*C, 6*C, 7*C, 8*C, 9*C, 10*C, 11*C, 12*C, 13*C, 14*C, 15*C); } __m128i bar(char C) { __m128i v = _mm_set_epi8(0, 1, 2, 3, 4, 5, 6, 7, 8, 9,10,11,12,13,14,15); v *= C; return v; } I *believe* these functions compute the same value, and should therefore generate identical code, but with gcc 4.9 foo() generates considerably larger and slower code. The test case is expressed in terms of x86 but I have no reason to believe it isn't a generic missed optimization in the tree-level vectorizer.
[Bug rtl-optimization/48580] New: missed optimization: integer overflow checks
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48580 Summary: missed optimization: integer overflow checks Product: gcc Version: 4.6.0 Status: UNCONFIRMED Severity: enhancement Priority: P3 Component: rtl-optimization AssignedTo: unassig...@gcc.gnu.org ReportedBy: za...@panix.com To the best of my knowledge, this is the only safe way (without -fwrapv) to check whether the product of two signed integers overflowed: bool product_does_not_overflow(signed x, signed y) { unsigned tmp = x * unsigned(y); return signed(tmp) > 0 && tmp / x == unsigned(y); } (I believe C and C++ are the same in this regard but I could be wrong. If there is a better way to write this test I would love to know about it.) g++ 4.6 produces this assembly dump on x86-64: _Z25product_does_not_overflowii: movl%esi, %edx xorl%eax, %eax imull%edi, %edx testl%edx, %edx jle.L2 movl%edx, %eax xorl%edx, %edx divl%edi cmpl%eax, %esi sete%al .L2: rep ret but, if I understand the semantics of IMUL correctly, it could do this instead: _Z25product_does_not_overflowii: xorl%eax, %eax imull%edi, %esi setno%al ret which is a pretty substantial micro-win, particularly in getting rid of a divide.
[Bug rtl-optimization/48580] missed optimization: integer overflow checks
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48580 --- Comment #2 from Zack Weinberg 2011-04-12 20:40:41 UTC --- (In reply to comment #1) > > Two signed integers given that they are known to be positive, anyway. > This may return unexpected results if either or both arguments are > negative or zero. ... > (If the function gets called with one constant operand, you can make it > inline and use __builtin_constant_p to replace a divide with a range check > on the other operand. That's only useful for some cases of overflow > checks, of course.) In the code that this is cut down from, both arguments are known to be strictly positive, but neither is constant. (They're only signed for historical reasons, I think, but it would be a huge amount of work to change that.) > I sort of think GCC should have built-in functions exposing C and C++ > interfaces for: each basic arithmetic operation, defined to wrap; each > basic arithmetic operation, defined to saturate; each basic arithmetic > operation, defined to have undefined overflow; each basic arithmetic > operation, with a separate overflow flag being set; each basic arithmetic > operation, defined to trap on overflow. All of these for both signed and > unsigned and for any desired number of bits (up to the maximum number > supported for arithmetic, so generally 1-64 bits on 32-bit configurations > and 1-128 bits on 64-bit configurations); except for the defined-to-trap > case, all would still have undefined behavior on division by 0. You could > then have optimizations mapping generic C idioms to such built-in > operations where the target supports efficient code for the operations. > But this rather relies on the no-undefined-overflow work being finished > first so that some of the required operations actually exist inside GCC, > before they can easily be exposed to the user. So you see this as more of a tree-level than an RTL-level missed optimization, then? Your plan sounds fine to me, although I might look for a less ambitious but more likely to get done soon approach, personally.
[Bug rtl-optimization/48580] missed optimization: integer overflow checks
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48580 --- Comment #4 from Zack Weinberg 2011-04-12 21:03:01 UTC --- On Tue, Apr 12, 2011 at 1:52 PM, joseph at codesourcery dot com wrote: >> In the code that this is cut down from, both arguments are known to be >> strictly >> positive, but neither is constant. (They're only signed for historical >> reasons, I think, but it would be a huge amount of work to change that.) > > My point in noting the need for the integers to be positive was really > that unless the compiler knows they are positive, the transformation > you're asking for appears to be incorrect - the semantics of your function > are that a product with either term 0 counts as overflowing, but using a > processor overflow flag would report it as not overflowing. Well, if the compiler didn't know that, it could still use the overflow flag plus an extra test for either input operand being zero, couldn't it? The C idiom has to test for a zero result, because e.g. 0x4000_U * 16 wraps to zero. (The original code does in fact check for x or y <= 0 in a place where VRP would notice; I should have said that instead of "known to be strictly positive", sorry for any confusion.)
[Bug rtl-optimization/48580] missed optimization: integer overflow checks
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48580 --- Comment #5 from Zack Weinberg 2011-04-12 21:04:34 UTC --- Addendum: what would *you* describe as the correct C idiom for ensuring that the product of two signed integers was positive and did not overflow the range of a same-sized signed integer, assuming nothing about either multiplicand?
[Bug c/45861] New: Possible missed optimization - array ops vs shift-and-mask
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45861 Summary: Possible missed optimization - array ops vs shift-and-mask Product: gcc Version: 4.5.1 Status: UNCONFIRMED Severity: minor Priority: P3 Component: c AssignedTo: unassig...@gcc.gnu.org ReportedBy: za...@panix.com Created attachment 21938 --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=21938 test case The attached .c file contains two functions, which (unless I screwed up) compute exactly the same (mathematical) function - they take an array of 8 bytes, permute its elements, and stuff them into a 64-bit integer, which is then returned. However, GCC generates very different code for each (on x86-64). There seem to be two missed optimization opportunities here: 1) I don't know *which* of the two code generation possibilities here is better, but it seems like GCC ought to know and ought to generate that code for both functions. 2) Could we be taking advantage of SSEn vector permute instructions here?
[Bug c/44179] New: warn about sizeof(char)
sizeof(char) is defined to be 1, and is therefore almost always unnecessary. It would be nice if the C and C++ front ends could complain about it. -- Summary: warn about sizeof(char) Product: gcc Version: unknown Status: UNCONFIRMED Severity: enhancement Priority: P3 Component: c AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: zackw at panix dot com http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44179
[Bug c/44179] warn about sizeof(char)
--- Comment #2 from zackw at panix dot com 2010-05-17 23:59 --- So it has to be a little smarter than the obvious, so what else is new. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44179