[Bug libstdc++/102259] New: ifstream::read(…, count) fails when count >= 2^31 on darwin
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102259

Bug ID: 102259
Summary: ifstream::read(…, count) fails when count >= 2^31 on darwin
Product: gcc
Version: 11.2.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: libstdc++
Assignee: unassigned at gcc dot gnu.org
Reporter: mimomorin at gmail dot com
Target Milestone: ---

Created attachment 51431
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51431&action=edit
Testcase for ifstream::read(…, count >= 2^31)

I tried to read a large file using `ifstream::read` on Mac, but it fails to read any bytes when count >= 2^31. Note that the system is 64-bit and `std::streamsize` is 8 bytes. Here is a testcase:

    #include <fstream>
    #include <iostream>

    int main()
    {
        std::ifstream is{"2GB.bin", std::ios::binary};  // filesize >= 2^31 bytes
        auto buffer = new char[1LL << 31];
        is.read(buffer, 1LL << 31);
        std::cout << is.good() << " (" << is.gcount() << " bytes)\n";
        // Expected output: "1 (2147483648 bytes)"
        // Actual output (on Mac): "0 (0 bytes)"
    }

My system is macOS 10.15 running on an x86_64 Mac. The testcase fails with Homebrew's GCC (ver. 6, 9, 10, 11) and MacPorts' GCC (ver. 6), but succeeds with LLVM Clang (trunk) and Apple Clang (ver. 12).

`ifstream::read(…, count)` works fine when count < 2^31. So if we split

    is.read(buffer, 1LL << 31);

into

    is.read(buffer, (1LL << 31) - 1);
    is.read(buffer + (1LL << 31) - 1, 1);

then everything works. Additionally, `istringstream::read(…, count >= 2^31)` works fine on both GCC and Clang. I don't think such a simple issue could have gone unnoticed, so maybe I've missed something.
[Bug libstdc++/102259] ifstream::read(…, count) fails when count >= 2^31 on darwin
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102259 --- Comment #2 from Michel Morin --- Whoa, darwin's (and FreeBSD's too?) `read(…, …, nbyte)` fails when nbyte >= 2^31! This is the culprit, I think. I also found the following description in FreeBSD's manpage for read (https://www.unix.com/man-page/FreeBSD/2/read/):

    ERRORS
        [EINVAL]  The value nbytes is greater than INT_MAX.

Given that the testcase works fine when compiled with Clang, libc++ presumably has some workaround for this.
[Bug libstdc++/102259] ifstream::read(…, count) fails when count >= 2^31 on darwin
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102259 --- Comment #4 from Michel Morin --- I googled and found that Rust and Python had the same issue (and fixed it):

[Rust] https://github.com/rust-lang/rust/issues/38590 (PR: https://github.com/ziglang/zig/pull/6333)
[Python] https://bugs.python.org/issue24658 (PR: https://github.com/python/cpython/pull/1705)

These bug reports also say that darwin's `write(…, …, nbyte)` fails when nbyte > INT_MAX, and I confirmed that.

> Maybe they do a loop around the read for sizes >= INT_MAX.

Sounds good to me.
[Bug libstdc++/102259] ifstream::read(…, count) fails when count >= 2^31 on darwin
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102259 --- Comment #5 from Michel Morin --- I put a wrong link for Rust's PR. The correct link is https://github.com/rust-lang/rust/pull/38622 .
[Bug c++/77565] `typdef int Int;` --> did you mean `typeof`?
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77565 --- Comment #3 from Michel Morin --- There is a typo in this PR's Description. Here is a more readable one: when we enable the `typeof` GCC extension (e.g. via `-std=gnu++**` options), we get strange did-you-mean suggestions.

`typdef int Int;`
  -> error: 'typdef' does not name a type; did you mean 'typeof'?
`typedeff int Int;`
  -> error: 'typedeff' does not name a type; did you mean 'typeof'?

Confirmed on GCC 11.2.
[Bug c++/77565] `typdef int Int;` --> did you mean `typeof`?
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77565 --- Comment #4 from Michel Morin --- It seems that the reason is that `cp_keyword_starts_decl_specifier_p` in `cp/parser.c` does not include `RID_TYPEDEF`. Note that `typedef` is a decl-specifier ([dcl.spec] p.1 in the Standard).
[Bug c++/77565] `typdef int Int;` --> did you mean `typeof`?
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77565 --- Comment #5 from Michel Morin --- Confirmed the fix. Will send a patch to the ML.

> I had use -std=c++98

This comment helped me a lot in understanding what's going on. Thanks!
[Bug libstdc++/109891] New: Null pointer special handling in ostream's operator << for C-strings
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109891

Bug ID: 109891
Summary: Null pointer special handling in ostream's operator << for C-strings
Product: gcc
Version: 14.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: libstdc++
Assignee: unassigned at gcc dot gnu.org
Reporter: mimomorin at gmail dot com
Target Milestone: ---

This code

    #include <iostream>

    int main()
    {
        std::cout << (char*)nullptr;
    }

does not cause anything bad (like a SEGV), because libstdc++'s operator<<(ostream&, char const*) has special handling for null pointers:

    template<typename _CharT, typename _Traits>
      inline basic_ostream<_CharT, _Traits>&
      operator<<(basic_ostream<_CharT, _Traits>& __out, const _CharT* __s)
      {
        if (!__s)
          __out.setstate(ios_base::badbit);
        else
          __ostream_insert(...);
        return __out;
      }

Passing a null pointer to this operator is a precondition violation, so the current implementation perfectly conforms to the C++ standard. But why don't we remove this special handling? By doing so, we gain

- better interoperability with tooling (i.e. sanitizers can find the bug easily)
- an unnoticeable performance improvement

and we lose

- deterministic behavior (of poor code) on a particular stdlib

I believe the first point matters more than the last one. It seems that the old special handling `if (s == NULL) s = "(null)";` (https://github.com/gcc-mirror/gcc/blob/6599da0/libio/iostream.cc#L638) was removed in GCC 3.0, but reintroduced (in the current form) in GCC 3.2 in response to https://gcc.gnu.org/bugzilla/show_bug.cgi?id=6518 .
[Bug libstdc++/109891] Null pointer special handling in ostream's operator << for C-strings
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109891 --- Comment #3 from Michel Morin --- From the safety point of view, I agree with you. But, at the same time, I thought that detectable UB (with the help of sanitizers) is more useful than a silent bug. How about `throw`ing, as in std::string's constructor?
[Bug libstdc++/109891] Null pointer special handling in ostream's operator << for C-strings
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109891 --- Comment #6 from Michel Morin --- True. "Detectable" is not correct; it's "maybe-detectable" at most, and the bug is not silent. In the code I checked, the buggy line (`std::cout << NullCharPtr;`) was the last print to std::cout, so I failed to see the side effect. The patchlet using `_GLIBCXX_DEBUG_PEDASSERT` works fine. Actually I would prefer `_GLIBCXX_DEBUG_ASSERT` (because I've been using `_GLIBCXX_DEBUG` but never `_GLIBCXX_DEBUG_PEDANTIC`), but I guess using `_GLIBCXX_DEBUG_PEDASSERT` rather than `_GLIBCXX_DEBUG_ASSERT` here is a deliberate choice.
[Bug libstdc++/109891] Null pointer special handling in ostream's operator << for C-strings
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109891 --- Comment #9 from Michel Morin ---

> (which even mentions the std::string((const char*)nullptr) case):
> https://gcc.gnu.org/onlinedocs/libstdc++/manual/debug_mode_semantics.html

Oh, that's good to know. Understood that PEDASSERT fits better.

> can we add a "pednonnull" attribute or something to produce a -Wnonnull
> warning like the nonnull attribute but w/o affecting code generation as well?

I think such an attribute (like Clang's _Nonnull) would be a nice addition. So I grepped for Nonnull in libc++, but strangely there are __no__ uses of _Nonnull/__nonnull. I only found a few __gnu__::__nonnull__ in __memory_resource/memory_resource.h. In libc++, std::string's constructors have assertions for the nullptr check, but there are no attributes.
[Bug libstdc++/110190] New: regex: incorrect match results on DFA engines
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110190

Bug ID: 110190
Summary: regex: incorrect match results on DFA engines
Product: gcc
Version: 14.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: libstdc++
Assignee: unassigned at gcc dot gnu.org
Reporter: mimomorin at gmail dot com
Target Milestone: ---

libstdc++ produces incorrect matches with the sample code in https://en.cppreference.com/w/cpp/regex/syntax_option_type . (Though the description of the "leftmost longest rule" on that page is not correct, its expected results are fine.) Here is a slightly shorter version:

    #include <iostream>
    #include <regex>
    #include <string>

    int main()
    {
        std::string text = "regexp";
        std::regex re(".*(ex|gexp)", std::regex::extended);
        std::smatch m;
        std::regex_search(text, m, re);
        std::cout << m[0] << '\n';  // => should be "regexp" on DFA engines
    }

This should print "regexp", but libstdc++ prints "regex". (libc++ works fine.)
[Bug libstdc++/102259] ifstream::read(…, count) fails when count >= 2^31 on darwin
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102259 --- Comment #11 from Michel Morin --- Brilliant, I appreciate it! I tested with an 8 GB file and confirmed that this fixes the issue on both Intel and Apple silicon Macs.
[Bug libstdc++/102259] ifstream::read(…, count) fails when count >= 2^31 on darwin
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102259 --- Comment #14 from Michel Morin --- Thanks, the committed version works fine too. Note that `read` fails only when n > INT_MAX (strictly greater, not >=), so we can define _GLIBCXX_MAX_READ_SIZE simply as __INT_MAX__.
[Bug libstdc++/102259] ifstream::read(…, count) fails when count >= 2^31 on darwin
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102259 --- Comment #15 from Michel Morin --- FreeBSD's `read` manpage has been updated recently:

https://github.com/freebsd/freebsd-src/commit/3e95158 [2024-02-10]
read.2: Describe debug.iosize_max_clamp

    … read() … will succeed unless:
    -  The value nbytes is greater than INT_MAX.
    +  The value nbytes is greater than SSIZE_MAX
    +  (or greater than INT_MAX, if the sysctl debug.iosize_max_clamp is non-zero).

Then I checked the source code to find the related changes. It turns out that the manual had not been updated to reflect the code changes for over ten years. The `iosize_max_clamp` setting (defaulting to 1) was added in FreeBSD 10:

https://github.com/freebsd/freebsd-src/commit/526d0bd [2012-02-21]
Fix found places where uio_resid is truncated to int.

The default was changed to 0 in FreeBSD 11:

https://github.com/freebsd/freebsd-src/commit/cd4dd44 [2013-10-15]
By default, allow up to SSIZE_MAX i/o for non-devfs files.

While the default is now "don't clamp to INT_MAX", users can still set `iosize_max_clamp` to 1 through sysctl. So I think applying the fix without conditioning on the FreeBSD version (i.e. the current fix) makes sense!
[Bug libstdc++/118162] New: ofstream::write(…, count) fails when count >= 2^31 on darwin
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118162

Bug ID: 118162
Summary: ofstream::write(…, count) fails when count >= 2^31 on darwin
Product: gcc
Version: 15.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: libstdc++
Assignee: unassigned at gcc dot gnu.org
Reporter: mimomorin at gmail dot com
Target Milestone: ---

Created attachment 59937
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=59937&action=edit
Testcase for ofstream::write(…, count >= 2^31)

This is a companion PR to PR102259. On macOS, `ofstream::write(…, count >= 2^31)` fails without a partial write. Here is the attached testcase:

    #include <fstream>
    #include <iostream>

    int main()
    {
        auto buffer = new char[1LL << 31];
        std::ofstream os{"2GiB.bin", std::ios::binary};
        os.write(buffer, 1LL << 31);
        std::cout << os.good() << " (" << os.tellp() << " bytes)\n";
        // Expected output: "1 (2147483648 bytes)"
        // Actual output on macOS 11 or newer: "0 (-1 bytes)"
    }

Here are the manpages for `write` and `writev` on macOS and FreeBSD:

[macOS manpage] https://keith.github.io/xcode-man-pages/write.2.html

    … write() … will fail and the file pointer will remain unchanged if:
        The value provided for nbyte exceeds INT_MAX.
    … writev() … may also return the following errors:
        The sum of the iov_len values in the iov array overflows a 32-bit integer.

[FreeBSD manpage] https://man.freebsd.org/cgi/man.cgi?query=writev

    … write(), writev() … will fail and the file pointer will remain unchanged:
        The value nbytes is greater than SSIZE_MAX (or greater than INT_MAX,
        if the sysctl debug.iosize_max_clamp is non-zero).
    … writev() … may return one of the following errors:
        The sum of the iov_len values is greater than SSIZE_MAX (or greater
        than INT_MAX, if the sysctl debug.iosize_max_clamp is non-zero).
[Bug libstdc++/102259] ifstream::read(…, count) fails when count >= 2^31 on darwin
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102259 --- Comment #18 from Michel Morin --- I tested on old Mac systems, including the 32-bit version of Mac OS X 10.5, and confirmed that the `read` syscall with count = INT_MAX does not trigger EINVAL. (Additionally, the same applies to the `write` syscall.)
[Bug libstdc++/118162] ofstream::write(…, count) fails when count >= 2^31 on darwin
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118162 --- Comment #1 from Michel Morin --- Strictly speaking, the `writev` syscall with large counts (and hence the attached testcase) succeeds on macOS 10.xx. It seems that the restriction described in the manpage ("… return the following errors … the sum of the iov_len values in the iov array overflows a 32-bit integer") is only implemented on macOS 11 and later. But I think applying the fix without conditioning on the macOS version is beneficial, since the `write` syscall has its restriction (i.e. "… will fail … if the value provided for nbyte exceeds INT_MAX") on all macOS versions.
[Bug libstdc++/102259] ifstream::read(…, count) fails when count >= 2^31 on darwin
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102259 --- Comment #17 from Michel Morin ---

> I thought I saw some docs saying >= INT_MAX fails, but maybe I'm wrong.
> The Rust change uses INT_MAX - 1

The comment in the Rust code says

    On OSX ... by rejecting any read with a size larger than or equal to INT_MAX

But at least on the Mac systems I tested (from 10.15 to 14), the read syscall and istream::read work fine for count = INT_MAX. If you'd like me to test it on an old Mac (e.g. 10.7), please let me know (I can test it after the weekend).