As discussed at http://stackoverflow.com/q/43769773/981959 (and kinda hinted at by http://wg21.link/lwg1200) there's a problem with char_traits<char16_t>::eof() because it returns int_type(-1) which is the same value as u'\uFFFF', a valid UTF-16 code point.
i.e. because all values of int_type are also valid values of char_type we cannot meet the requirement that: "The member eof() shall return an implementation-defined constant that cannot appear as a valid UTF-16 code unit." I've reported this as a defect, suggesting that the wording above needs to change. One consequence is that basic_streambuf<char16_t>::sputc(u'\uFFFF') always returns the same value, whether it succeeds or not. On success it returns to_int_type(u'\uFFFF') and on failure it returns eof(), which is the same value. I think that can be solved with the attached change, which preserves the invariant in [char.traits.require] that eof() returns: "a value e such that X::eq_int_type(e,X::to_int_type(c)) is false for all values c." This can be true if we ensure that to_int_type never returns the eof() value. http://www.unicode.org/faq/private_use.html#nonchar10 suggests doing something like this. It means that when writing u'\uFFFF' to a streambuf we write that character successfully, but return u'\uFFFD' instead; and when reading u'\uFFFF' from a streambuf we return u'\uFFFD' instead. This is asymmetrical, as we can write that character but not read it back. It might be better to refuse to write u'\uFFFF' and write it as the replacement character instead, but I think I prefer to write the right character when possible. It also doesn't require any extra changes. All tests pass with this, does anybody see any problems with this approach?
commit 8ab705e4920e933d3b0e90fd004b93d89aab8619 Author: Jonathan Wakely <jwak...@redhat.com> Date: Fri May 5 16:57:07 2017 +0100 PR libstdc++/80624 satisfy invariant for char_traits<char16_t>::eof() PR libstdc++/80624 * doc/xml/manual/status_cxx2011.xml: Document to_int_type behaviour. * include/bits/char_traits.h (char_traits<char16_t>::to_int_type): Transform eof value to U+FFFD. * testsuite/21_strings/char_traits/requirements/char16_t/eof.cc: New. * testsuite/27_io/basic_streambuf/sgetc/char16_t/80624.cc: New. * testsuite/27_io/basic_streambuf/sputc/char16_t/80624.cc: New. diff --git a/libstdc++-v3/doc/xml/manual/status_cxx2011.xml b/libstdc++-v3/doc/xml/manual/status_cxx2011.xml index 705f2ee..0fa4bc0 100644 --- a/libstdc++-v3/doc/xml/manual/status_cxx2011.xml +++ b/libstdc++-v3/doc/xml/manual/status_cxx2011.xml @@ -2630,6 +2630,10 @@ particular release. <classname>u32streampos</classname> are both synonyms for <classname>fpos<mbstate_t></classname>. The function <function>eof</function> returns <code>int_type(-1)</code>. + <function>char_traits<char16_t>::to_int_type</function> will + transform the "noncharacter" U+FFFF to U+FFFD (REPLACEMENT CHARACTER). + This is done to ensure that <function>to_int_type</function> never + returns the same value as <function>eof</function>, which is U+FFFF. </para> <para> diff --git a/libstdc++-v3/include/bits/char_traits.h b/libstdc++-v3/include/bits/char_traits.h index 75db5b8..f19120b 100644 --- a/libstdc++-v3/include/bits/char_traits.h +++ b/libstdc++-v3/include/bits/char_traits.h @@ -507,7 +507,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION static constexpr int_type to_int_type(const char_type& __c) noexcept - { return int_type(__c); } + { return __c == eof() ? int_type(0xfffd) : int_type(__c); } static constexpr bool eq_int_type(const int_type& __c1, const int_type& __c2) noexcept diff --git a/libstdc++-v3/testsuite/21_strings/char_traits/requirements/char16_t/eof.cc b/libstdc++-v3/testsuite/21_strings/char_traits/requirements/char16_t/eof.cc new file mode 100644 index 0000000..05def7f --- /dev/null +++ b/libstdc++-v3/testsuite/21_strings/char_traits/requirements/char16_t/eof.cc @@ -0,0 +1,31 @@ +// Copyright (C) 2017 Free Software Foundation, Inc. +// +// This file is part of the GNU ISO C++ Library. This library is free +// software; you can redistribute it and/or modify it under the +// terms of the GNU General Public License as published by the +// Free Software Foundation; either version 3, or (at your option) +// any later version. + +// This library is distributed in the hope that it will be useful, +// but WITHOUT ANY WARRANTY; without even the implied warranty of +// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +// GNU General Public License for more details. + +// You should have received a copy of the GNU General Public License along +// with this library; see the file COPYING3. If not see +// <http://www.gnu.org/licenses/>. + +// { dg-do compile { target c++11 } } + +#include <string> + + +constexpr bool not_equal_to_eof(char16_t c) +{ + using T = std::char_traits<char16_t>; + return T::eq_int_type(T::eof(), T::to_int_type(c)) == false; +} + +// Last two code points of the BMP are noncharacters: +static_assert(not_equal_to_eof(u'\uFFFE'), "U+FFFE compares unequal to eof"); +static_assert(not_equal_to_eof(u'\uFFFF'), "U+FFFF compares unequal to eof"); diff --git a/libstdc++-v3/testsuite/27_io/basic_streambuf/sgetc/char16_t/80624.cc b/libstdc++-v3/testsuite/27_io/basic_streambuf/sgetc/char16_t/80624.cc new file mode 100644 index 0000000..c08c227 --- /dev/null +++ b/libstdc++-v3/testsuite/27_io/basic_streambuf/sgetc/char16_t/80624.cc @@ -0,0 +1,56 @@ +// Copyright (C) 2017 Free Software Foundation, Inc. +// +// This file is part of the GNU ISO C++ Library. This library is free +// software; you can redistribute it and/or modify it under the +// terms of the GNU General Public License as published by the +// Free Software Foundation; either version 3, or (at your option) +// any later version. + +// This library is distributed in the hope that it will be useful, +// but WITHOUT ANY WARRANTY; without even the implied warranty of +// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +// GNU General Public License for more details. + +// You should have received a copy of the GNU General Public License along +// with this library; see the file COPYING3. If not see +// <http://www.gnu.org/licenses/>. + +// { dg-do run { target c++11 } } + +#include <streambuf> +#include <testsuite_hooks.h> + +struct streambuf : std::basic_streambuf<char16_t> +{ + basic_streambuf* setbuf(char_type* s, std::streamsize n) override + { + setp(s, s + n); + setg(s, s, s + n); + return this; + } +}; + +void +test01() +{ + using traits = streambuf::traits_type; + + char16_t buf[2] = { streambuf::char_type(-1), streambuf::char_type(-2) }; + streambuf sb; + sb.pubsetbuf(buf, 2); + + streambuf::int_type res; + + res = sb.sbumpc(); + VERIFY( traits::eq_int_type(res, traits::eof()) == false ); + res = sb.sbumpc(); + VERIFY( traits::eq_int_type(res, traits::eof()) == false ); + res = sb.sbumpc(); + VERIFY( traits::eq_int_type(res, traits::eof()) == true ); +} + +int +main() +{ + test01(); +} diff --git a/libstdc++-v3/testsuite/27_io/basic_streambuf/sputc/char16_t/80624.cc b/libstdc++-v3/testsuite/27_io/basic_streambuf/sputc/char16_t/80624.cc new file mode 100644 index 0000000..a2a1917 --- /dev/null +++ b/libstdc++-v3/testsuite/27_io/basic_streambuf/sputc/char16_t/80624.cc @@ -0,0 +1,59 @@ +// Copyright (C) 2017 Free Software Foundation, Inc. +// +// This file is part of the GNU ISO C++ Library. This library is free +// software; you can redistribute it and/or modify it under the +// terms of the GNU General Public License as published by the +// Free Software Foundation; either version 3, or (at your option) +// any later version. + +// This library is distributed in the hope that it will be useful, +// but WITHOUT ANY WARRANTY; without even the implied warranty of +// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +// GNU General Public License for more details. + +// You should have received a copy of the GNU General Public License along +// with this library; see the file COPYING3. If not see +// <http://www.gnu.org/licenses/>. + +// { dg-do run { target c++11 } } + +#include <streambuf> +#include <testsuite_hooks.h> + +struct streambuf : std::basic_streambuf<char16_t> +{ + basic_streambuf* setbuf(char_type* s, std::streamsize n) override + { + setp(s, s + n); + setg(s, s, s + n); + return this; + } +}; + +void +test01() +{ + using traits = streambuf::traits_type; + + char16_t buf[2] = { }; + streambuf sb; + sb.pubsetbuf(buf, 2); + + streambuf::int_type res; + + res = sb.sputc(streambuf::char_type(-1)); + VERIFY( traits::eq_int_type(res, traits::eof()) == false ); + res = sb.sputc(streambuf::char_type(-2)); + VERIFY( traits::eq_int_type(res, traits::eof()) == false ); + res = sb.sputc(streambuf::char_type(1)); + VERIFY( traits::eq_int_type(res, traits::eof()) == true ); + + VERIFY( buf[0] == streambuf::char_type(-1) ); + VERIFY( buf[1] == streambuf::char_type(-2) ); +} + +int +main() +{ + test01(); +}