Re: Fw: Problems with compiling autogen with GCC8 or newer versions
On 1/8/21 12:38 PM, Bruce Korb via Gcc wrote: $ bash cc.sh + wrn+=' -Werror=format-overflow' + gcc -DHAVE_CONFIG_H -I. -I.. -I../autoopts -Wno-format-contains-nul -Wall -Werror -Wcast-align -Wmissing-prototypes -Wpointer-arith -Wshadow -Wstrict-prototypes -Wwrite-strings -Wstrict-aliasing=3 -Wextra -Wno-cast-qual -Werror=format-overflow -fno-strict-aliasing -g -O2 -MT gd.o -MD -MP -MF .deps/gd.Tpo -c -o gd.o gd.c $ bash cc.sh/(edited to do pre-processing only, to wit:)/ + gcc -DHAVE_CONFIG_H -I. -I.. -I../autoopts -E -o gd.i gd.c $ gcc --version gcc (SUSE Linux) 7.5.0 Copyright (C) 2017 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. buildPreamble() starts on line 10230. This is the code that must be confusing to GCC. "def_str" points to the second character in the 520 byte buffer. "def_scan" points to a character that we already know we're going to copy into the destination, so the "spn" function doesn't look at it: { char * end = spn_ag_char_map_chars(def_scan + 1, 31); size_t len = end - def_scan; if (len >= 256) goto fail_return; memcpy(def_str, def_scan, len); def_str += len; *def_str = '\0'; def_scan = end; } "spn_ag_char_map_chars" skips over characters that are in the 31st character set. (It's generated.) If there are too many characters in that set, this code fragment will go into failure mode. *IF* this .i file doesn't generate the error, then Thomas' compiler must be yielding something weird in its preprocessing output. It compiles just fine with my 7.5.0 compiler. In my testing GCC 7 doesn't warn for any of the sprintf callss in the translation unit, at least in part because it determine the size of the destination. GCC 8 also doesn't warn but it does determine the size. Here's the output for the relevant directive (from the output of -fdump-tree-printf-return-value in GCC versions prior to 10, or -fdump-tree-strlen in GCC 10 and later). objsize is the size of the destination, or 520 bytes here (this is in contrast to the 255 in the originally reported message). The Result numbers are the minimum and maximum size of the output (between 0 and 255 characters). Computing maximum subobject size for def_str_146: getdefs.c:275: sprintf: objsize = 520, fmtstr = " %s'" Directive 1 at offset 0: " ", length = 2 Result: 2, 2, 2, 2 (2, 2, 2, 2) Directive 2 at offset 2: "%s" Result: 0, 255, 255, 255 (2, 257, 257, 257) Directive 3 at offset 4: "'", length = 1 Result: 1, 1, 1, 1 (3, 258, 258, 258) Directive 4 at offset 5: "", length = 1 Besides provable overflow, it's worth noting that -Wformat-overflow also diagnoses a subset of cases where it can't prove that overflow cannot happen. One common case is: extern char a[8], b[8]; sprintf (a, "a=%s", b); warning: ‘%s’ directive writing up to 7 bytes into a region of size 6 [-Wformat-overflow=] Although b's length isn't known, GCC uses its size to warn about the possible overflow. The solution is to either use precision to constrain the amount of output or in GCC 10 and later to assert that b's length is less than 7. So if in the autogen file def_str is ever less than 258 bytes I'd expect the warning to trigger with a message close to the original. Martin On 1/8/21 11:01 AM, Jeff Law wrote: On 1/8/21 10:39 AM, Bruce Korb via Gcc wrote: Also, GCC's code analysis is wrong. "name_bf" contains *NO MORE* than MAXNAMELEN characters. That is provable. "def_str" points into a buffer of size ((MAXNAMELEN * 2) + 8) and at an offset maximum of MAXNAMELEN+1 (also provable), meaning that at a minimum there are MAXNAMELEN+6 bytes left in the buffer. That objected-to sprintf can add a maximum of MAXNAMELEN + 4 to where "def_str" points. GCC is wrong. It is unable to figure out how far into the buffer "def_str" can point. Can you get a .i file, command line and file a report. It'd be appreciated. jeff
Re: Fw: Problems with compiling autogen with GCC8 or newer versions
Hi Martin, On 1/10/21 11:01 AM, Martin Sebor wrote: On 1/8/21 12:38 PM, Bruce Korb via Gcc wrote: This is the code that must be confusing to GCC. "def_str" points to the second character in the 520 byte buffer. "def_scan" points to a character that we already know we're going to copy into the destination, so the "spn" function doesn't look at it: { char * end = spn_ag_char_map_chars(def_scan + 1, 31); size_t len = end - def_scan; if (len >= 256) goto fail_return; memcpy(def_str, def_scan, len); def_str += len; *def_str = '\0'; def_scan = end; } In the function preamble, "def_str" points to the first character (character "0") of a 520 byte buffer. Before this fragment, "*def_str" is set to an apostrophe and the pointer advanced. After execution passes through this fragment, "def_str" is pointing to a NUL byte that can be as far as 257 bytes into the buffer (character "257"). That leaves 263 more bytes. The "offending" sprintf is: sprintf(def_str, " %s'", name_bf); GCC correctly determines that "name_bf" cannot contain more than 255 bytes. Add 3 bytes of text and a NUL byte and the sprintf will be dropping *AT MOST* 259 characters into the buffer. The buffer is 4 bytes longer than necessary. GCC 8 also doesn't warn but it does determine the size. Here's the output for the relevant directive (from the output of -fdump-tree-printf-return-value in GCC versions prior to 10, or -fdump-tree-strlen in GCC 10 and later). objsize is the size of the destination, or 520 bytes here (this is in contrast to the 255 in the originally reported message). The Result numbers are the minimum and maximum size of the output (between 0 and 255 characters). Computing maximum subobject size for def_str_146: getdefs.c:275: sprintf: objsize = 520, fmtstr = " %s'" Directive 1 at offset 0: " ", length = 2 Result: 2, 2, 2, 2 (2, 2, 2, 2) Directive 2 at offset 2: "%s" Result: 0, 255, 255, 255 (2, 257, 257, 257) Directive 3 at offset 4: "'", length = 1 Result: 1, 1, 1, 1 (3, 258, 258, 258) Directive 4 at offset 5: "", length = 1 Besides provable overflow, it's worth noting that -Wformat-overflow It can /not/ overflow. Those compiler stats are not decipherable by me. also diagnoses a subset of cases where it can't prove that overflow cannot happen. One common case is: extern char a[8], b[8]; sprintf (a, "a=%s", b); My example would have the "a" array sized at 16 bytes and "b" provably not containing more than 7 characters (plus a NUL). There would be no overflow. ... The solution is to either use precision to constrain the amount of output or in GCC 10 and later to assert that b's length is less than 7. See the fragment below to see where the characters in name_bf can */NOT/* be more than 255. There is no need for either a precision constraint or an assertion, based on that code fragment. So if in the autogen file def_str is ever less than 258 bytes/[259 -- NUL byte, too]/ I'd expect the warning to trigger with a message close to the original. It can not be less than 263. For the sake of those not reading the code, here is the fragment that fills in "name_bf[256]": { char * end = spn_ag_char_map_chars(def_scan + 1, 26); size_t len = end - def_scan; if (len >= 256) goto fail_return; memcpy(name_bf, def_scan, len); name_bf[len] = '\0'; def_scan = end; }
gcc-11-20210110 is now available
Snapshot gcc-11-20210110 is now available on https://gcc.gnu.org/pub/gcc/snapshots/11-20210110/ and on various mirrors, see http://gcc.gnu.org/mirrors.html for details. This snapshot has been generated from the GCC 11 git branch with the following options: git://gcc.gnu.org/git/gcc.git branch master revision 872373360dab259d51caa002ff1722ff84746d8b You'll find: gcc-11-20210110.tar.xz Complete GCC SHA256=4599e54829d6bb6cd570ad67c7d7b0798e5704963fbe254263dca5e7664d7e59 SHA1=3f73bbdafc21e76e513685a0dd27a3f28a383b30 Diffs from 11-20210103 are available in the diffs/ subdirectory. When a particular snapshot is ready for public consumption the LATEST-11 link is updated and a message is sent to the gcc list. Please do not use a snapshot before it has been announced that way.
Re: Fw: Problems with compiling autogen with GCC8 or newer versions
On 1/10/21 12:28 PM, Bruce Korb wrote: Hi Martin, On 1/10/21 11:01 AM, Martin Sebor wrote: On 1/8/21 12:38 PM, Bruce Korb via Gcc wrote: This is the code that must be confusing to GCC. "def_str" points to the second character in the 520 byte buffer. "def_scan" points to a character that we already know we're going to copy into the destination, so the "spn" function doesn't look at it: { char * end = spn_ag_char_map_chars(def_scan + 1, 31); size_t len = end - def_scan; if (len >= 256) goto fail_return; memcpy(def_str, def_scan, len); def_str += len; *def_str = '\0'; def_scan = end; } In the function preamble, "def_str" points to the first character (character "0") of a 520 byte buffer. Before this fragment, "*def_str" is set to an apostrophe and the pointer advanced. After execution passes through this fragment, "def_str" is pointing to a NUL byte that can be as far as 257 bytes into the buffer (character "257"). That leaves 263 more bytes. The "offending" sprintf is: sprintf(def_str, " %s'", name_bf); Sure. I was confirming that based on the GCC dump there is no risk of an overflow in the translation unit, and so there is no warning. GCC correctly determines that "name_bf" cannot contain more than 255 bytes. Add 3 bytes of text and a NUL byte and the sprintf will be dropping *AT MOST* 259 characters into the buffer. The buffer is 4 bytes longer than necessary. GCC 8 also doesn't warn but it does determine the size. Here's the output for the relevant directive (from the output of -fdump-tree-printf-return-value in GCC versions prior to 10, or -fdump-tree-strlen in GCC 10 and later). objsize is the size of the destination, or 520 bytes here (this is in contrast to the 255 in the originally reported message). The Result numbers are the minimum and maximum size of the output (between 0 and 255 characters). Computing maximum subobject size for def_str_146: getdefs.c:275: sprintf: objsize = 520, fmtstr = " %s'" Directive 1 at offset 0: " ", length = 2 Result: 2, 2, 2, 2 (2, 2, 2, 2) Directive 2 at offset 2: "%s" Result: 0, 255, 255, 255 (2, 257, 257, 257) Directive 3 at offset 4: "'", length = 1 Result: 1, 1, 1, 1 (3, 258, 258, 258) Directive 4 at offset 5: "", length = 1 Besides provable overflow, it's worth noting that -Wformat-overflow It can /not/ overflow. Those compiler stats are not decipherable by me. They indicate the minimum, likely, maximum, and unlikely maximum number of bytes of output for each directive and the running totals for the call (in parentheses). The relevant lines are these: Directive 2 at offset 2: "%s" Result: 0, 255, 255, 255 (2, 257, 257, 257) The result tells us that the length of the %s argument is between 0 and 255 bytes long. Since objsize (the size of the destination) is 520 there is no buffer overflow. The note in the forwarded message indicates that GCC computes the destination size to be much smaller for some reason: note: 'sprintf' output between 4 and 259 bytes into a destination of size 255 I.e., it thinks it's just 255 bytes. As I explained, such a small size would trigger the warning by design. I can't really think of a reason why GCC would compute a smaller size here (it looks far from trivial). We'd need to see the original poster's translation unit and know the host and the target GCC was configured for. Martin also diagnoses a subset of cases where it can't prove that overflow cannot happen. One common case is: extern char a[8], b[8]; sprintf (a, "a=%s", b); My example would have the "a" array sized at 16 bytes and "b" provably not containing more than 7 characters (plus a NUL). There would be no overflow. ... The solution is to either use precision to constrain the amount of output or in GCC 10 and later to assert that b's length is less than 7. See the fragment below to see where the characters in name_bf can */NOT/* be more than 255. There is no need for either a precision constraint or an assertion, based on that code fragment. So if in the autogen file def_str is ever less than 258 bytes/[259 -- NUL byte, too]/ I'd expect the warning to trigger with a message close to the original. It can not be less than 263. For the sake of those not reading the code, here is the fragment that fills in "name_bf[256]": { char * end = spn_ag_char_map_chars(def_scan + 1, 26); size_t len = end - def_scan; if (len >= 256) goto fail_return; memcpy(name_bf, def_scan, len); name_bf[len] = '\0'; def_scan = end; }
Re: Fw: Problems with compiling autogen with GCC8 or newer versions
Hi, On 1/10/21 3:56 PM, Martin Sebor wrote: Sure. I was confirming that based on the GCC dump there is no risk of an overflow in the translation unit, and so there is no warning. OK. :) I didn't understand the GCC dump. Despite having commit privs, I'm not actually a compiler guru. It can /not/ overflow. Those compiler stats are not decipherable by me. They indicate the minimum, likely, maximum, and unlikely maximum number of bytes of output for each directive and the running totals for the call (in parentheses). The relevant lines are these: Directive 2 at offset 2: "%s" Result: 0, 255, 255, 255 (2, 257, 257, 257) The result tells us that the length of the %s argument is between 0 and 255 bytes long. It should be 1 to 255. 0 is actually impossible, but it would take crazy complicated sleuthing to figure it out, even though the "spn_*" functions should be inlined. Since objsize (the size of the destination) is 520 there is no buffer overflow. The size of the destination is guaranteed to be between 263 and 518 bytes. The "def_str" pointer will always point at least two bytes past the start of the 520 byte buffer. The note in the forwarded message indicates that GCC computes the destination size to be much smaller for some reason: note: 'sprintf' output between 4 and 259 bytes into a destination of size 255 I.e., it thinks it's just 255 bytes. As I explained, such a small size would trigger the warning by design. Yep. If it can accurately figure out the minimum size remaining, that would be completely fine. "If." I can't really think of a reason why GCC would compute a smaller size here (it looks far from trivial). If it can figure out that the minimum size is 263, that'd be great. If it can't, it needs to be quiet. We'd need to see the original poster's translation unit and know the host and the target GCC was configured for. OK. Not anything I can do. Thomas would have to send in his version of "gd.i". Thank you! Regards, Bruce
Re: [RFC] restricting aliasing by standard containers (PR 98465)
On Fri, Jan 8, 2021 at 9:16 PM Martin Sebor wrote: > > On 1/8/21 12:51 AM, Richard Biener wrote: > > On Thu, Jan 7, 2021 at 10:41 PM Martin Sebor wrote: > >> > >> The test case in PR 98465 brings to light a problem we've discussed > >> before (e.g., PR 93971) where a standard container (std::string in > >> this case but the problem applies to any class that owns and manages > >> allocated memory) might trigger warnings for unreachable code. > >> The code is not eliminated due to a missing aliasing constraint: > >> because GCC doesn't know that the member pointer to the memory > >> managed by the container cannot alias other objects, it emits code > >> that can never be executed in a valid program and that's prone to > >> causing false positives. > >> > >> To illustrate, at the moment it's impossible to fold away the assert > >> below because there's no way to determine in the middle end that > >> String::s cannot point to a: > >> > >> extern char array[]; > >> > >> class String { > >> char *s; > >> public: > >>String (const char *p): s (strdup (p)) { } > >>String (const String &str): s (strdup (str.s)) { } > >>~String () { free (s); } > >> > >>void f () { assert (s != array); } > >> }; > >> > >> The constraint is obvious to a human reader (String::s is private > >> and nothing sets it to point to array) but there's no way for GCC > >> to infer it from the code alone (at least not in general): there > >> could be member or friend functions defined in other translation > >> units that violate this assumption. > >> > >> One way to solve the problem is to explicitly declare that > >> String::s, in fact, doesn't point to any such objects and that it > >> only ever points to allocated memory. My idea for doing that is > >> to extend attribute malloc to (or add a new attribute for) pointer > >> variables to imply that the pointer only points to allocated memory. > >> > >> However, besides pointing to allocated memory, std::string can also > >> point to its own internal buffer, so the extended malloc attribute > >> couldn't be used there by itself. I think this could be solved by > >> also either extending the may_alias attribute or adding a new > >> "alias" (or some such) attribute to denote that a pointer variable > >> may point to an object or subobject. > >> > >> Putting the two together, to eliminate the assert, std::string would > >> be annotated like so: > >> > >> class string { > >> char *s __attribute__ ((malloc, may_alias (buf))); > >> char buf[8]; > >> public: > >>string (): s (buf) { } > >>string (const char *p): s (strdup (p)) { } > >>string (const string &str): s (strdup (str.s)) { } > >>~string () { if (s != buf) free (s); } > >> > >>void f () { assert (s != array); } > >> }; > >> > >> The may_alias association with members is relative to the this pointer > >> (i.e., as if by may_alias (this->buf), as opposed to being taken as > >> may_alias (String::buf) and meaning that s might be equal to any other > >> String::s with a different this. To help avoid mistakes, setting s > >> in violation of the constraints would trigger warnings. > >> > >> If this sounds reasonable I'm prepared to prototype it, either for > >> GCC 11 if it's in scope to solve the PR and there's still time, or > >> (I suspect more likely) for GCC 12. > >> > >> Richard, what are your thoughts/concerns? > > > > I'm not sure it's feasible to make use of this attribute. First > > there's the malloc part which has difficult semantics (similar > > to restrict) when generating PTA constraints. We might see > > > > _1 = str.s; > > _2 = str.s; > > > > but are of course required to associate the same allocated > > dummy object with both pointers (as opposed to when we'd > > see two malloc calls). What would possibly work is to > > have the object keyed on the field decl, but then for > > > > _1 = p_to_str_4(D); > > _2 = _1 + offsetof-s; > > _3 = *_2; > > > > we have to somehow conservatively arrive at the same object. > > I don't see how that can work out. > > > > All the same applies to the may_alias part but I guess when the > > malloc part falls apart that's not of much interest. > > > > So I'm concerned about correctness - I'm sure you can hack > > sth together to get some testcases optimized. But I'm not sure > > you can make it correct in all cases (within the current PTA > > framework). > > Thanks for the feedback. > > Absent some source level annotation I can't think of a good way > to avoid these false positives. Do you have any other ideas? > > If not, would you be opposed to introducing these attributes to > suppress warnings (at least at first)? Besides avoiding the false > positives, implementing just that part might also be a good proof > of concept for the aliasing solution (or a confirmation of your > intuition). I don't think "overloading" malloc and alias attributes in the suggested way
Re: Adjust offset of array reference in named address space
On Sat, Jan 9, 2021 at 12:24 AM Tucker Kern via Gcc wrote: > > Hi, > > I'm implementing named addresses spaces for a Harvard architecture machine > to support copying data from instruction memory to data memory. This is > achieved via a special instruction. e.g. think AVR and progmem/__flash. > > However, the instruction memory is narrower than the data memory (12 vs 16 > bits) on this machine. So a single data word is split across 2 instruction > words. When copied from IMEM to DMEM the two parts are combined via SHIFT + > OR patterns. > > This is all working fine for regular variables (i.e. int som_var), but it > falls apart for array references (i.e. some_array[1]). Since the data is > stored across 2 IMEM words, I need to scale the calculated offset of each > array reference by 2. e.g. array[0] is actually stored in imem[0] & imem[1] > and array[1] is stored in imem[2] & imem[3]. So you are saying that a 16bit data word in IMEM is actually two 12bit data words (supposedly only the lower 8 bits used in each) and thus the array contains "padding"? That's not really supported and is also not the scope of named address spaces. I'd suggest you go down the route of providing intrinsics for the transfer of data instead which could resort to target specific builtin functions. > e.g. > static __imem int imem_array[2]; > return imem_array[1]; > > // needs to generate a symbol reference like > &imem_array.869+2 > > Similarly if the array index was a function parameter, I need to scale the > parameter by 2. > __imem int imem_array[2]; > int some_func(int a) > { > // a needs to be scaled by 2 when generating RTL/ASM > return imem_array[a]; > } > > I haven't found any target hooks that would allow me to override the offset > calculation. Originally I thought I could handle it in a splitter but this > approach didn't work for the function parameter example as I ended up > scaling the entire address instead of just the offset. > > I had another thought of using a combo of > TARGET_ADDR_SPACE_LEGITIMIZE_ADDRESS and > TARGET_ADDR_SPACE_LEGITIMATE_ADDRESS_P to scale the offset and mark it as > adjusted but I don't think this combo will work in the end. > > Is there any way to achieve this? > > Thanks, > Tucker