[PATCH] D26979: Do not hard-code locale data in unit tests: get it from the OS instead
vangyzen updated this revision to Diff 79322. vangyzen added a comment. Handle multibyte thousands_sep; do not reference possibly stale locale data https://reviews.llvm.org/D26979 Files: projects/libcxx/test/std/localization/locale.categories/facet.numpunct/locale.numpunct.byname/grouping.pass.cpp projects/libcxx/test/std/localization/locale.categories/facet.numpunct/locale.numpunct.byname/thousands_sep.pass.cpp Index: projects/libcxx/test/std/localization/locale.categories/facet.numpunct/locale.numpunct.byname/thousands_sep.pass.cpp === --- projects/libcxx/test/std/localization/locale.categories/facet.numpunct/locale.numpunct.byname/thousands_sep.pass.cpp +++ projects/libcxx/test/std/localization/locale.categories/facet.numpunct/locale.numpunct.byname/thousands_sep.pass.cpp @@ -16,10 +16,8 @@ // char_type thousands_sep() const; -// TODO: investigation needed -// XFAIL: linux-gnu - #include +#include #include #include "platform_support.h" // locale name macros @@ -40,29 +38,45 @@ } } { +char expected; +wchar_t wexpected; +assert(std::setlocale(LC_ALL, LOCALE_en_US_UTF_8) != NULL); +const char *ts = std::localeconv()->thousands_sep; +expected = *ts; +int nb = mbtowc(&wexpected, ts, strlen(ts)+1); +assert(nb >= 0); +assert(std::setlocale(LC_ALL, "C") != NULL); std::locale l(LOCALE_en_US_UTF_8); { typedef char C; const std::numpunct& np = std::use_facet >(l); -assert(np.thousands_sep() == ','); +assert(np.thousands_sep() == expected); } { typedef wchar_t C; const std::numpunct& np = std::use_facet >(l); -assert(np.thousands_sep() == L','); +assert(np.thousands_sep() == wexpected); } } { +char expected; +wchar_t wexpected; +assert(std::setlocale(LC_ALL, LOCALE_fr_FR_UTF_8) != NULL); +const char *ts = std::localeconv()->thousands_sep; +expected = *ts; +int nb = mbtowc(&wexpected, ts, strlen(ts)+1); +assert(nb >= 0); +assert(std::setlocale(LC_ALL, "C") != NULL); std::locale l(LOCALE_fr_FR_UTF_8); { typedef char C; const std::numpunct& np = std::use_facet >(l); -assert(np.thousands_sep() == ','); +assert(np.thousands_sep() == expected); } { typedef wchar_t C; const std::numpunct& np = std::use_facet >(l); -assert(np.thousands_sep() == L','); +assert(np.thousands_sep() == wexpected); } } } Index: projects/libcxx/test/std/localization/locale.categories/facet.numpunct/locale.numpunct.byname/grouping.pass.cpp === --- projects/libcxx/test/std/localization/locale.categories/facet.numpunct/locale.numpunct.byname/grouping.pass.cpp +++ projects/libcxx/test/std/localization/locale.categories/facet.numpunct/locale.numpunct.byname/grouping.pass.cpp @@ -16,10 +16,8 @@ // string grouping() const; -// TODO: investigation needed -// XFAIL: linux-gnu - #include +#include #include #include "platform_support.h" // locale name macros @@ -40,29 +38,37 @@ } } { +assert(std::setlocale(LC_NUMERIC, LOCALE_en_US_UTF_8) != NULL); +char *expected = strdup(std::localeconv()->grouping); +assert(std::setlocale(LC_NUMERIC, "C") != NULL); std::locale l(LOCALE_en_US_UTF_8); { typedef char C; const std::numpunct& np = std::use_facet >(l); -assert(np.grouping() == "\3\3"); +assert(np.grouping() == expected); } { typedef wchar_t C; const std::numpunct& np = std::use_facet >(l); -assert(np.grouping() == "\3\3"); +assert(np.grouping() == expected); } +free(expected); } { +assert(std::setlocale(LC_NUMERIC, LOCALE_fr_FR_UTF_8) != NULL); +char *expected = strdup(std::localeconv()->grouping); +assert(std::setlocale(LC_NUMERIC, "C") != NULL); std::locale l(LOCALE_fr_FR_UTF_8); { typedef char C; const std::numpunct& np = std::use_facet >(l); -assert(np.grouping() == "\x7F"); +assert(np.grouping() == expected); } { typedef wchar_t C; const std::numpunct& np = std::use_facet >(l); -assert(np.grouping() == "\x7F"); +assert(np.grouping() == expected); } +free(expected); } } ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D26979: Do not hard-code locale data in unit tests: get it from the OS instead
vangyzen added a comment. I'm glad you mentioned a multibyte thousands_sep, because it //is// multibyte in fr_FR.UTF-8 on FreeBSD 11. Specifically, it's a no-break space (U+00A0). https://reviews.llvm.org/D26979 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D26979: Do not hard-code locale data in unit tests: get it from the OS instead
vangyzen added inline comments. Comment at: projects/libcxx/test/std/localization/locale.categories/facet.numpunct/locale.numpunct.byname/grouping.pass.cpp:42 +assert(std::setlocale(LC_NUMERIC, LOCALE_en_US_UTF_8) != NULL); +char *expected = strdup(std::localeconv()->grouping); +assert(std::setlocale(LC_NUMERIC, "C") != NULL); localeconv()->grouping can become invalid after we reset the locale to "C", so duplicate the string into a local. Comment at: projects/libcxx/test/std/localization/locale.categories/facet.numpunct/locale.numpunct.byname/thousands_sep.pass.cpp:79 const std::numpunct& np = std::use_facet >(l); -assert(np.thousands_sep() == L','); +assert(np.thousands_sep() == wexpected); } This assertion now fails for me. wexpected is 0xa0 (NBSP), which is correct. However, np.thousands_sep() is -62, which is 0xc2, which is the first byte of a UTF-8-encoded NBSP. It looks like the library isn't correctly handling multibyte thousands separators. https://reviews.llvm.org/D26979 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D27167: [libc++] Support multibyte decimal_point and thousands_sep
vangyzen created this revision. vangyzen added a subscriber: cfe-commits. Herald added a subscriber: emaste. Herald added a reviewer: EricWF. numpunct_byname assumed that decimal_point and thousands_sep were ASCII and simply copied the first byte from them. Add support for multibyte strings in these fields. I found this problem on FreeBSD 11, where thousands_sep in fr_FR.UTF-8 is a no-break space (U+00A0). https://reviews.llvm.org/D27167 Files: src/locale.cpp Index: src/locale.cpp === --- src/locale.cpp +++ src/locale.cpp @@ -4281,23 +4281,51 @@ { } +#if defined(__clang__) +#pragma clang diagnostic ignored "-Wmissing-braces" +#endif + void numpunct_byname::__init(const char* nm) { if (strcmp(nm, "C") != 0) { __locale_unique_ptr loc(newlocale(LC_ALL_MASK, nm, 0), freelocale); if (loc == nullptr) -__throw_runtime_error("numpunct_byname::numpunct_byname" -" failed to construct for " + string(nm)); +__throw_runtime_error("numpunct_byname::numpunct_byname" + " failed to construct for " + string(nm)); lconv* lc = __libcpp_localeconv_l(loc.get()); -if (*lc->decimal_point) -__decimal_point_ = *lc->decimal_point; -if (*lc->thousands_sep) -__thousands_sep_ = *lc->thousands_sep; +if (*lc->decimal_point) { +size_t len = strlen(lc->decimal_point); +mbstate_t mbs = {0}; +wchar_t wc; +size_t nb = __libcpp_mbrtowc_l(&wc, lc->decimal_point, len, &mbs, +loc.get()); +if (nb == len) { +__decimal_point_ = wc; +} else { +__throw_runtime_error("numpunct_byname: decimal_point" + " is not a valid multibyte character: " + + string(lc->decimal_point)); +} +} +if (*lc->thousands_sep) { +size_t len = strlen(lc->thousands_sep); +mbstate_t mbs = {0}; +wchar_t wc; +size_t nb = __libcpp_mbrtowc_l(&wc, lc->thousands_sep, len, &mbs, +loc.get()); +if (nb == len) { +__thousands_sep_ = wc; +} else { +__throw_runtime_error("numpunct_byname: thousands_sep" + " is not a valid multibyte character: " + + string(lc->thousands_sep)); +} +} __grouping_ = lc->grouping; -// locallization for truename and falsename is not available +// localization for truename and falsename is not available } } @@ -4861,10 +4889,6 @@ return result; } -#if defined(__clang__) -#pragma clang diagnostic ignored "-Wmissing-braces" -#endif - template <> wstring __time_get_storage::__analyze(char fmt, const ctype& ct) Index: src/locale.cpp === --- src/locale.cpp +++ src/locale.cpp @@ -4281,23 +4281,51 @@ { } +#if defined(__clang__) +#pragma clang diagnostic ignored "-Wmissing-braces" +#endif + void numpunct_byname::__init(const char* nm) { if (strcmp(nm, "C") != 0) { __locale_unique_ptr loc(newlocale(LC_ALL_MASK, nm, 0), freelocale); if (loc == nullptr) -__throw_runtime_error("numpunct_byname::numpunct_byname" -" failed to construct for " + string(nm)); +__throw_runtime_error("numpunct_byname::numpunct_byname" + " failed to construct for " + string(nm)); lconv* lc = __libcpp_localeconv_l(loc.get()); -if (*lc->decimal_point) -__decimal_point_ = *lc->decimal_point; -if (*lc->thousands_sep) -__thousands_sep_ = *lc->thousands_sep; +if (*lc->decimal_point) { +size_t len = strlen(lc->decimal_point); +mbstate_t mbs = {0}; +wchar_t wc; +size_t nb = __libcpp_mbrtowc_l(&wc, lc->decimal_point, len, &mbs, +loc.get()); +if (nb == len) { +__decimal_point_ = wc; +} else { +__throw_runtime_error("numpunct_byname: decimal_point" + " is not a valid multibyte character: " + + string(lc->decimal_point)); +} +} +if (*lc->thousands_sep) { +size_t len = strlen(lc->thousands_sep); +mbstate_t mbs = {0}; +wchar_t wc; +size_t nb = __libcpp_mbrtowc_l(&wc, lc->thousands_sep, len, &mbs, +loc.get()); +if (nb == len) { +__thousands_sep_ = wc; +} else { +__throw_runtime_error("numpunct_byname: thousands_sep
[PATCH] D26979: Do not hard-code locale data in unit tests: get it from the OS instead
vangyzen added inline comments. Comment at: projects/libcxx/test/std/localization/locale.categories/facet.numpunct/locale.numpunct.byname/thousands_sep.pass.cpp:79 const std::numpunct& np = std::use_facet >(l); -assert(np.thousands_sep() == L','); +assert(np.thousands_sep() == wexpected); } vangyzen wrote: > This assertion now fails for me. wexpected is 0xa0 (NBSP), which is correct. > However, np.thousands_sep() is -62, which is 0xc2, which is the first byte > of a UTF-8-encoded NBSP. It looks like the library isn't correctly handling > multibyte thousands separators. D27167 adds support for multibyte strings in decimal_point and thousands_sep. With that change, this unit test passes. https://reviews.llvm.org/D26979 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D27167: [libc++] Support multibyte decimal_point and thousands_sep
vangyzen added a comment. A unit test change in https://reviews.llvm.org/D26979 found this and will continue to test it (on some OSes). https://reviews.llvm.org/D27167 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D26920: [libc++] Add validation to Stage 2 of num_get
vangyzen updated this revision to Diff 79625. vangyzen added a comment. Herald added a subscriber: emaste. Restore support for a sign character preceding a "nan" I accidentally broke +nan and -nan. Add unit test cases for these, and update my change to support them. https://reviews.llvm.org/D26920 Files: include/locale test/std/localization/locale.categories/category.numeric/locale.num.get/facet.num.get.members/get_double.pass.cpp Index: test/std/localization/locale.categories/category.numeric/locale.num.get/facet.num.get.members/get_double.pass.cpp === --- test/std/localization/locale.categories/category.numeric/locale.num.get/facet.num.get.members/get_double.pass.cpp +++ test/std/localization/locale.categories/category.numeric/locale.num.get/facet.num.get.members/get_double.pass.cpp @@ -136,6 +136,30 @@ assert(v == INFINITY); } { +const char str[] = "+Inf"; +hex(ios); +std::ios_base::iostate err = ios.goodbit; +input_iterator iter = +f.get(input_iterator(str), + input_iterator(str+sizeof(str)), + ios, err, v); +assert(iter.base() == str+sizeof(str)-1); +assert(err == ios.goodbit); +assert(v == INFINITY); +} +{ +const char str[] = "+iNF"; +hex(ios); +std::ios_base::iostate err = ios.goodbit; +input_iterator iter = +f.get(input_iterator(str), + input_iterator(str+sizeof(str)), + ios, err, v); +assert(iter.base() == str+sizeof(str)-1); +assert(err == ios.goodbit); +assert(v == INFINITY); +} +{ const char str[] = "-inf"; hex(ios); std::ios_base::iostate err = ios.goodbit; @@ -184,6 +208,30 @@ assert(std::isnan(v)); } { +const char str[] = "+NaN"; +hex(ios); +std::ios_base::iostate err = ios.goodbit; +input_iterator iter = +f.get(input_iterator(str), + input_iterator(str+sizeof(str)), + ios, err, v); +assert(iter.base() == str+sizeof(str)-1); +assert(err == ios.goodbit); +assert(std::isnan(v)); +} +{ +const char str[] = "-nAn"; +hex(ios); +std::ios_base::iostate err = ios.goodbit; +input_iterator iter = +f.get(input_iterator(str), + input_iterator(str+sizeof(str)), + ios, err, v); +assert(iter.base() == str+sizeof(str)-1); +assert(err == ios.goodbit); +assert(std::isnan(v)); +} +{ v = -1; const char str[] = "123_456_78_9;125"; std::ios_base::iostate err = ios.goodbit; @@ -208,6 +256,19 @@ assert(v == 2); } { +// http://cplusplus.github.io/LWG/lwg-active.html#2381 +v = -1; +const char str[] = "0x1a.bp+07p"; +std::ios_base::iostate err = ios.goodbit; +input_iterator iter = +f.get(input_iterator(str), + input_iterator(str+sizeof(str)), + ios, err, v); +assert(err == ios.goodbit); +assert(iter.base() == str+sizeof(str)-2); +assert(v == 0x1a.bp+07); +} +{ v = -1; const char str[] = "1.79779e+309"; // unrepresentable std::ios_base::iostate err = ios.goodbit; Index: include/locale === --- include/locale +++ include/locale @@ -505,17 +505,67 @@ return -1; } if (__x == 'x' || __x == 'X') -__exp = 'P'; +{ +unsigned __sign = (*__a == '-') + (*__a == '+'); +if (__a_end == __a + __sign + 1 && __a[__sign] == '0') +__exp = 'P'; +else +return -1; +} else if ((__x & 0x5F) == __exp) { +if (__exp & 0x80) +return -1; __exp |= 0x80; if (__in_units) { __in_units = false; if (__grouping.size() != 0 && __g_end-__g < __num_get_buf_sz) *__g_end++ = __dc; } } +else if (__f > 9) +{ +if (__x == 'P' || __x == 'p') +return -1; +unsigned __sign = (*__a == '-') + (*__a == '+'); +if (__x == 'I' || __x == 'i') +{ +if (__a_end != __a + __sign)// start of Inf +return -1; +} +else if (__x == 'N' || __x == 'n') +{ +bool __ok = +(__a_end == __a + __sign) // start of NaN +|| +(__a_end == __a + __sign + 2 && // end of NaN +(__a[__sign ] & 0x5F) == 'N' && +(__a[__sign+1] & 0x5F) == 'A') +|| +(__a_end == __a + __sign + 1 && // middle of Inf +(__a[__sign]
[PATCH] D26979: Do not hard-code locale data in unit tests: get it from the OS instead
vangyzen added a comment. @EricWF Do you have time and interest to review this again? https://reviews.llvm.org/D26979 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D26979: Do not hard-code locale data in unit tests: get it from the OS instead
vangyzen added a comment. > @EricWF Do you have time and interest to review this again? I should ask, do you have time and interest to //commit// this? https://reviews.llvm.org/D26979 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D24218: [libc++] Fix support for multibyte thousands_sep and decimal_point in moneypunct_byname and numpunct_byname.
vangyzen added inline comments. Comment at: src/locale.cpp:4339 if (loc == nullptr) __throw_runtime_error("numpunct_byname::numpunct_byname" " failed to construct for " + string(nm)); While you're here, you might change `char` to `wchar_t`. Comment at: src/locale.cpp:4348 __grouping_ = lc->grouping; // locallization for truename and falsename is not available } Maybe fix the spelling of "localization". https://reviews.llvm.org/D24218 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D27167: [libc++] Support multibyte decimal_point and thousands_sep
vangyzen abandoned this revision. vangyzen added a comment. I'm abandoning this revision because https://reviews.llvm.org/D24218 is clearly superior. https://reviews.llvm.org/D27167 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits