[PATCH] D26979: Do not hard-code locale data in unit tests: get it from the OS instead

2016-11-25 Thread Eric van Gyzen via Phabricator via cfe-commits
vangyzen updated this revision to Diff 79322.
vangyzen added a comment.

Handle multibyte thousands_sep; do not reference possibly stale locale data


https://reviews.llvm.org/D26979

Files:
  
projects/libcxx/test/std/localization/locale.categories/facet.numpunct/locale.numpunct.byname/grouping.pass.cpp
  
projects/libcxx/test/std/localization/locale.categories/facet.numpunct/locale.numpunct.byname/thousands_sep.pass.cpp

Index: projects/libcxx/test/std/localization/locale.categories/facet.numpunct/locale.numpunct.byname/thousands_sep.pass.cpp
===
--- projects/libcxx/test/std/localization/locale.categories/facet.numpunct/locale.numpunct.byname/thousands_sep.pass.cpp
+++ projects/libcxx/test/std/localization/locale.categories/facet.numpunct/locale.numpunct.byname/thousands_sep.pass.cpp
@@ -16,10 +16,8 @@
 
 // char_type thousands_sep() const;
 
-// TODO: investigation needed
-// XFAIL: linux-gnu
-
 #include 
+#include 
 #include 
 
 #include "platform_support.h" // locale name macros
@@ -40,29 +38,45 @@
 }
 }
 {
+char expected;
+wchar_t wexpected;
+assert(std::setlocale(LC_ALL, LOCALE_en_US_UTF_8) != NULL);
+const char *ts = std::localeconv()->thousands_sep;
+expected = *ts;
+int nb = mbtowc(&wexpected, ts, strlen(ts)+1);
+assert(nb >= 0);
+assert(std::setlocale(LC_ALL, "C") != NULL);
 std::locale l(LOCALE_en_US_UTF_8);
 {
 typedef char C;
 const std::numpunct& np = std::use_facet >(l);
-assert(np.thousands_sep() == ',');
+assert(np.thousands_sep() == expected);
 }
 {
 typedef wchar_t C;
 const std::numpunct& np = std::use_facet >(l);
-assert(np.thousands_sep() == L',');
+assert(np.thousands_sep() == wexpected);
 }
 }
 {
+char expected;
+wchar_t wexpected;
+assert(std::setlocale(LC_ALL, LOCALE_fr_FR_UTF_8) != NULL);
+const char *ts = std::localeconv()->thousands_sep;
+expected = *ts;
+int nb = mbtowc(&wexpected, ts, strlen(ts)+1);
+assert(nb >= 0);
+assert(std::setlocale(LC_ALL, "C") != NULL);
 std::locale l(LOCALE_fr_FR_UTF_8);
 {
 typedef char C;
 const std::numpunct& np = std::use_facet >(l);
-assert(np.thousands_sep() == ',');
+assert(np.thousands_sep() == expected);
 }
 {
 typedef wchar_t C;
 const std::numpunct& np = std::use_facet >(l);
-assert(np.thousands_sep() == L',');
+assert(np.thousands_sep() == wexpected);
 }
 }
 }
Index: projects/libcxx/test/std/localization/locale.categories/facet.numpunct/locale.numpunct.byname/grouping.pass.cpp
===
--- projects/libcxx/test/std/localization/locale.categories/facet.numpunct/locale.numpunct.byname/grouping.pass.cpp
+++ projects/libcxx/test/std/localization/locale.categories/facet.numpunct/locale.numpunct.byname/grouping.pass.cpp
@@ -16,10 +16,8 @@
 
 // string grouping() const;
 
-// TODO: investigation needed
-// XFAIL: linux-gnu
-
 #include 
+#include 
 #include 
 
 #include "platform_support.h" // locale name macros
@@ -40,29 +38,37 @@
 }
 }
 {
+assert(std::setlocale(LC_NUMERIC, LOCALE_en_US_UTF_8) != NULL);
+char *expected = strdup(std::localeconv()->grouping);
+assert(std::setlocale(LC_NUMERIC, "C") != NULL);
 std::locale l(LOCALE_en_US_UTF_8);
 {
 typedef char C;
 const std::numpunct& np = std::use_facet >(l);
-assert(np.grouping() == "\3\3");
+assert(np.grouping() == expected);
 }
 {
 typedef wchar_t C;
 const std::numpunct& np = std::use_facet >(l);
-assert(np.grouping() == "\3\3");
+assert(np.grouping() == expected);
 }
+free(expected);
 }
 {
+assert(std::setlocale(LC_NUMERIC, LOCALE_fr_FR_UTF_8) != NULL);
+char *expected = strdup(std::localeconv()->grouping);
+assert(std::setlocale(LC_NUMERIC, "C") != NULL);
 std::locale l(LOCALE_fr_FR_UTF_8);
 {
 typedef char C;
 const std::numpunct& np = std::use_facet >(l);
-assert(np.grouping() == "\x7F");
+assert(np.grouping() == expected);
 }
 {
 typedef wchar_t C;
 const std::numpunct& np = std::use_facet >(l);
-assert(np.grouping() == "\x7F");
+assert(np.grouping() == expected);
 }
+free(expected);
 }
 }
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D26979: Do not hard-code locale data in unit tests: get it from the OS instead

2016-11-25 Thread Eric van Gyzen via Phabricator via cfe-commits
vangyzen added a comment.

I'm glad you mentioned a multibyte thousands_sep, because it //is// multibyte 
in fr_FR.UTF-8 on FreeBSD 11.  Specifically, it's a no-break space (U+00A0).


https://reviews.llvm.org/D26979



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D26979: Do not hard-code locale data in unit tests: get it from the OS instead

2016-11-25 Thread Eric van Gyzen via Phabricator via cfe-commits
vangyzen added inline comments.



Comment at: 
projects/libcxx/test/std/localization/locale.categories/facet.numpunct/locale.numpunct.byname/grouping.pass.cpp:42
+assert(std::setlocale(LC_NUMERIC, LOCALE_en_US_UTF_8) != NULL);
+char *expected = strdup(std::localeconv()->grouping);
+assert(std::setlocale(LC_NUMERIC, "C") != NULL);

localeconv()->grouping can become invalid after we reset the locale to "C", so 
duplicate the string into a local.



Comment at: 
projects/libcxx/test/std/localization/locale.categories/facet.numpunct/locale.numpunct.byname/thousands_sep.pass.cpp:79
 const std::numpunct& np = std::use_facet >(l);
-assert(np.thousands_sep() == L',');
+assert(np.thousands_sep() == wexpected);
 }

This assertion now fails for me.  wexpected is 0xa0 (NBSP), which is correct.  
However, np.thousands_sep() is -62, which is 0xc2, which is the first byte of a 
UTF-8-encoded NBSP.  It looks like the library isn't correctly handling 
multibyte thousands separators.


https://reviews.llvm.org/D26979



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D27167: [libc++] Support multibyte decimal_point and thousands_sep

2016-11-28 Thread Eric van Gyzen via Phabricator via cfe-commits
vangyzen created this revision.
vangyzen added a subscriber: cfe-commits.
Herald added a subscriber: emaste.
Herald added a reviewer: EricWF.

numpunct_byname assumed that decimal_point and thousands_sep
were ASCII and simply copied the first byte from them.  Add support
for multibyte strings in these fields.

I found this problem on FreeBSD 11, where thousands_sep in fr_FR.UTF-8
is a no-break space (U+00A0).


https://reviews.llvm.org/D27167

Files:
  src/locale.cpp


Index: src/locale.cpp
===
--- src/locale.cpp
+++ src/locale.cpp
@@ -4281,23 +4281,51 @@
 {
 }
 
+#if defined(__clang__)
+#pragma clang diagnostic ignored "-Wmissing-braces"
+#endif
+
 void
 numpunct_byname::__init(const char* nm)
 {
 if (strcmp(nm, "C") != 0)
 {
 __locale_unique_ptr loc(newlocale(LC_ALL_MASK, nm, 0), freelocale);
 if (loc == nullptr)
-__throw_runtime_error("numpunct_byname::numpunct_byname"
-" failed to construct for " + string(nm));
+__throw_runtime_error("numpunct_byname::numpunct_byname"
+  " failed to construct for " + string(nm));
 
 lconv* lc = __libcpp_localeconv_l(loc.get());
-if (*lc->decimal_point)
-__decimal_point_ = *lc->decimal_point;
-if (*lc->thousands_sep)
-__thousands_sep_ = *lc->thousands_sep;
+if (*lc->decimal_point) {
+size_t len = strlen(lc->decimal_point);
+mbstate_t mbs = {0};
+wchar_t wc;
+size_t nb = __libcpp_mbrtowc_l(&wc, lc->decimal_point, len, &mbs,
+loc.get());
+if (nb == len) {
+__decimal_point_ = wc;
+} else {
+__throw_runtime_error("numpunct_byname: decimal_point"
+  " is not a valid multibyte character: " +
+  string(lc->decimal_point));
+}
+}
+if (*lc->thousands_sep) {
+size_t len = strlen(lc->thousands_sep);
+mbstate_t mbs = {0};
+wchar_t wc;
+size_t nb = __libcpp_mbrtowc_l(&wc, lc->thousands_sep, len, &mbs,
+loc.get());
+if (nb == len) {
+__thousands_sep_ = wc;
+} else {
+__throw_runtime_error("numpunct_byname: thousands_sep"
+  " is not a valid multibyte character: " +
+  string(lc->thousands_sep));
+}
+}
 __grouping_ = lc->grouping;
-// locallization for truename and falsename is not available
+// localization for truename and falsename is not available
 }
 }
 
@@ -4861,10 +4889,6 @@
 return result;
 }
 
-#if defined(__clang__)
-#pragma clang diagnostic ignored "-Wmissing-braces"
-#endif
-
 template <>
 wstring
 __time_get_storage::__analyze(char fmt, const ctype& ct)


Index: src/locale.cpp
===
--- src/locale.cpp
+++ src/locale.cpp
@@ -4281,23 +4281,51 @@
 {
 }
 
+#if defined(__clang__)
+#pragma clang diagnostic ignored "-Wmissing-braces"
+#endif
+
 void
 numpunct_byname::__init(const char* nm)
 {
 if (strcmp(nm, "C") != 0)
 {
 __locale_unique_ptr loc(newlocale(LC_ALL_MASK, nm, 0), freelocale);
 if (loc == nullptr)
-__throw_runtime_error("numpunct_byname::numpunct_byname"
-" failed to construct for " + string(nm));
+__throw_runtime_error("numpunct_byname::numpunct_byname"
+  " failed to construct for " + string(nm));
 
 lconv* lc = __libcpp_localeconv_l(loc.get());
-if (*lc->decimal_point)
-__decimal_point_ = *lc->decimal_point;
-if (*lc->thousands_sep)
-__thousands_sep_ = *lc->thousands_sep;
+if (*lc->decimal_point) {
+size_t len = strlen(lc->decimal_point);
+mbstate_t mbs = {0};
+wchar_t wc;
+size_t nb = __libcpp_mbrtowc_l(&wc, lc->decimal_point, len, &mbs,
+loc.get());
+if (nb == len) {
+__decimal_point_ = wc;
+} else {
+__throw_runtime_error("numpunct_byname: decimal_point"
+  " is not a valid multibyte character: " +
+  string(lc->decimal_point));
+}
+}
+if (*lc->thousands_sep) {
+size_t len = strlen(lc->thousands_sep);
+mbstate_t mbs = {0};
+wchar_t wc;
+size_t nb = __libcpp_mbrtowc_l(&wc, lc->thousands_sep, len, &mbs,
+loc.get());
+if (nb == len) {
+__thousands_sep_ = wc;
+} else {
+__throw_runtime_error("numpunct_byname: thousands_sep

[PATCH] D26979: Do not hard-code locale data in unit tests: get it from the OS instead

2016-11-28 Thread Eric van Gyzen via Phabricator via cfe-commits
vangyzen added inline comments.



Comment at: 
projects/libcxx/test/std/localization/locale.categories/facet.numpunct/locale.numpunct.byname/thousands_sep.pass.cpp:79
 const std::numpunct& np = std::use_facet >(l);
-assert(np.thousands_sep() == L',');
+assert(np.thousands_sep() == wexpected);
 }

vangyzen wrote:
> This assertion now fails for me.  wexpected is 0xa0 (NBSP), which is correct. 
>  However, np.thousands_sep() is -62, which is 0xc2, which is the first byte 
> of a UTF-8-encoded NBSP.  It looks like the library isn't correctly handling 
> multibyte thousands separators.
D27167 adds support for multibyte strings in decimal_point and thousands_sep.  
With that change, this unit test passes.


https://reviews.llvm.org/D26979



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D27167: [libc++] Support multibyte decimal_point and thousands_sep

2016-11-28 Thread Eric van Gyzen via Phabricator via cfe-commits
vangyzen added a comment.

A unit test change in https://reviews.llvm.org/D26979 found this and will 
continue to test it (on some OSes).


https://reviews.llvm.org/D27167



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D26920: [libc++] Add validation to Stage 2 of num_get

2016-11-29 Thread Eric van Gyzen via Phabricator via cfe-commits
vangyzen updated this revision to Diff 79625.
vangyzen added a comment.
Herald added a subscriber: emaste.

Restore support for a sign character preceding a "nan"

I accidentally broke +nan and -nan.  Add unit test cases for these,
and update my change to support them.


https://reviews.llvm.org/D26920

Files:
  include/locale
  
test/std/localization/locale.categories/category.numeric/locale.num.get/facet.num.get.members/get_double.pass.cpp

Index: test/std/localization/locale.categories/category.numeric/locale.num.get/facet.num.get.members/get_double.pass.cpp
===
--- test/std/localization/locale.categories/category.numeric/locale.num.get/facet.num.get.members/get_double.pass.cpp
+++ test/std/localization/locale.categories/category.numeric/locale.num.get/facet.num.get.members/get_double.pass.cpp
@@ -136,6 +136,30 @@
 assert(v == INFINITY);
 }
 {
+const char str[] = "+Inf";
+hex(ios);
+std::ios_base::iostate err = ios.goodbit;
+input_iterator iter =
+f.get(input_iterator(str),
+  input_iterator(str+sizeof(str)),
+  ios, err, v);
+assert(iter.base() == str+sizeof(str)-1);
+assert(err == ios.goodbit);
+assert(v == INFINITY);
+}
+{
+const char str[] = "+iNF";
+hex(ios);
+std::ios_base::iostate err = ios.goodbit;
+input_iterator iter =
+f.get(input_iterator(str),
+  input_iterator(str+sizeof(str)),
+  ios, err, v);
+assert(iter.base() == str+sizeof(str)-1);
+assert(err == ios.goodbit);
+assert(v == INFINITY);
+}
+{
 const char str[] = "-inf";
 hex(ios);
 std::ios_base::iostate err = ios.goodbit;
@@ -184,6 +208,30 @@
 assert(std::isnan(v));
 }
 {
+const char str[] = "+NaN";
+hex(ios);
+std::ios_base::iostate err = ios.goodbit;
+input_iterator iter =
+f.get(input_iterator(str),
+  input_iterator(str+sizeof(str)),
+  ios, err, v);
+assert(iter.base() == str+sizeof(str)-1);
+assert(err == ios.goodbit);
+assert(std::isnan(v));
+}
+{
+const char str[] = "-nAn";
+hex(ios);
+std::ios_base::iostate err = ios.goodbit;
+input_iterator iter =
+f.get(input_iterator(str),
+  input_iterator(str+sizeof(str)),
+  ios, err, v);
+assert(iter.base() == str+sizeof(str)-1);
+assert(err == ios.goodbit);
+assert(std::isnan(v));
+}
+{
 v = -1;
 const char str[] = "123_456_78_9;125";
 std::ios_base::iostate err = ios.goodbit;
@@ -208,6 +256,19 @@
 assert(v == 2);
 }
 {
+// http://cplusplus.github.io/LWG/lwg-active.html#2381
+v = -1;
+const char str[] = "0x1a.bp+07p";
+std::ios_base::iostate err = ios.goodbit;
+input_iterator iter =
+f.get(input_iterator(str),
+  input_iterator(str+sizeof(str)),
+  ios, err, v);
+assert(err == ios.goodbit);
+assert(iter.base() == str+sizeof(str)-2);
+assert(v == 0x1a.bp+07);
+}
+{
 v = -1;
 const char str[] = "1.79779e+309"; // unrepresentable
 std::ios_base::iostate err = ios.goodbit;
Index: include/locale
===
--- include/locale
+++ include/locale
@@ -505,17 +505,67 @@
 return -1;
 }
 if (__x == 'x' || __x == 'X')
-__exp = 'P';
+{
+unsigned __sign = (*__a == '-') + (*__a == '+');
+if (__a_end == __a + __sign + 1 && __a[__sign] == '0')
+__exp = 'P';
+else
+return -1;
+}
 else if ((__x & 0x5F) == __exp)
 {
+if (__exp & 0x80)
+return -1;
 __exp |= 0x80;
 if (__in_units)
 {
 __in_units = false;
 if (__grouping.size() != 0 && __g_end-__g < __num_get_buf_sz)
 *__g_end++ = __dc;
 }
 }
+else if (__f > 9)
+{
+if (__x == 'P' || __x == 'p')
+return -1;
+unsigned __sign = (*__a == '-') + (*__a == '+');
+if (__x == 'I' || __x == 'i')
+{
+if (__a_end != __a + __sign)// start of Inf
+return -1;
+}
+else if (__x == 'N' || __x == 'n')
+{
+bool __ok =
+(__a_end == __a + __sign)   // start of NaN
+||
+(__a_end == __a + __sign + 2 && // end of NaN
+(__a[__sign  ] & 0x5F) == 'N' &&
+(__a[__sign+1] & 0x5F) == 'A')
+||
+(__a_end == __a + __sign + 1 && // middle of Inf
+(__a[__sign]

[PATCH] D26979: Do not hard-code locale data in unit tests: get it from the OS instead

2016-11-29 Thread Eric van Gyzen via Phabricator via cfe-commits
vangyzen added a comment.

@EricWF Do you have time and interest to review this again?


https://reviews.llvm.org/D26979



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D26979: Do not hard-code locale data in unit tests: get it from the OS instead

2016-11-30 Thread Eric van Gyzen via Phabricator via cfe-commits
vangyzen added a comment.

> @EricWF Do you have time and interest to review this again?

I should ask, do you have time and interest to //commit// this?


https://reviews.llvm.org/D26979



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D24218: [libc++] Fix support for multibyte thousands_sep and decimal_point in moneypunct_byname and numpunct_byname.

2016-12-05 Thread Eric van Gyzen via Phabricator via cfe-commits
vangyzen added inline comments.



Comment at: src/locale.cpp:4339
 if (loc == nullptr)
 __throw_runtime_error("numpunct_byname::numpunct_byname"
 " failed to construct for " + string(nm));

While you're here, you might change `char` to `wchar_t`.



Comment at: src/locale.cpp:4348
 __grouping_ = lc->grouping;
 // locallization for truename and falsename is not available
 }

Maybe fix the spelling of "localization".


https://reviews.llvm.org/D24218



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D27167: [libc++] Support multibyte decimal_point and thousands_sep

2016-12-05 Thread Eric van Gyzen via Phabricator via cfe-commits
vangyzen abandoned this revision.
vangyzen added a comment.

I'm abandoning this revision because https://reviews.llvm.org/D24218 is clearly 
superior.


https://reviews.llvm.org/D27167



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits