[Bug libstdc++/45896] [C++0x] Facet time_get not reading dates according to the IEEE 1003 standard.

redi at gcc dot gnu.org via Gcc-bugs Tue, 24 Sep 2024 12:47:36 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=45896


--- Comment #13 from Jonathan Wakely <redi at gcc dot gnu.org> ---
I noticed that time_get::_M_extract_via_format just ignores any O or E
modifier:

              if (__c == 'E' || __c == 'O')
                __c = __ctype.narrow(__format[++__i], 0);

So we treat %Ec exactly like %c and only use the D_T_FMT string, instead of
ERA_D_T_FMT:

                case 'c':
                  // Default time and date representation.
                  const char_type*  __dt[2];
                  __tp._M_date_time_formats(__dt);
                  __beg = _M_extract_via_format(__beg, __end, __io, __tmperr, 
                                                __tm, __dt[0], __state);

We should use __dt[1] when the modifier was 'E'.
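
As a minimal sketch of that selection (not the actual libstdc++ change), the choice could look like the following. The helper name select_date_time_format and its modifier parameter are hypothetical. It assumes, as described above, that _M_date_time_formats stores the D_T_FMT string at index 0 and the ERA_D_T_FMT string at index 1, and that the parser remembers which modifier it skipped instead of discarding it.

```cpp
#include <cassert>

// Hypothetical helper, not libstdc++ code.
// dt[0] is assumed to be the D_T_FMT string and dt[1] the ERA_D_T_FMT
// string, matching the array filled in by _M_date_time_formats.
// modifier is 'E', 'O', or 0 when the conversion had no modifier.
template<typename CharT>
const CharT*
select_date_time_format(const CharT* const* dt, char modifier)
{
  assert(dt && dt[0] && dt[1]);
  // Only the E modifier selects the era-based format for %c;
  // plain %c (and %Oc, which has no alternative form here) use D_T_FMT.
  return modifier == 'E' ? dt[1] : dt[0];
}
```

In _M_extract_via_format this would mean keeping the modifier in a local variable rather than overwriting __c, then passing the selected string to the recursive call in the 'c' case.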

The commit msg for Jakub's patch mentions these other unfixed things:

1) seems %j, %r, %U, %w and %W aren't handled (not sure if all of them
   are already in POSIX 2009 or some are later)
2) I haven't touched the %y/%Y/%C and year handling stuff, that is
   definitely not matching what POSIX 2009 says:
       C       All but the last two digits of the year {2}; leading zeros
               shall be permitted but shall not be required. A leading '+'
               or '−' character shall be permitted before any leading zeros
               but shall not be required.
       y       The last two digits of the year. When format contains neither
               a C conversion specifier nor a Y conversion specifier, values
               in the range [69,99] shall refer to years 1969 to 1999
               inclusive and values in the range [00,68] shall refer to years
               2000 to 2068 inclusive; leading zeros shall be permitted but
               shall not be required. A leading '+' or '−' character shall be
               permitted before any leading zeros but shall not be required.

               Note: It is expected that in a future version of this standard
               the default century inferred from a 2-digit year will change.
               (This would apply to all commands accepting a 2-digit year as
               input.)
       Y       The full year {4}; leading zeros shall be permitted but shall
               not be required. A leading '+' or '−' character shall be
               permitted before any leading zeros but shall not be required.
   I've tried to avoid making changes to _M_extract_num for these as well
   to keep current status quo (the __len == 4 cases).  One thing is what
   to do for things with %C %y and/or %Y in the formats, another thing
   is what to do in the methods that directly perform _M_extract_num
   for year
3) the above question what to do for leading whitespace of any numbers
   being parsed
4) the %p%I issue mentioned above and generally what to do if we
   pass state and have finalizers at the end of parsing
5) _M_extract_via_format is also inconsistent with its callers on handling
   the non-whitespace characters in between format specifiers, the caller
   follows https://eel.is/c++draft/locale.time.get#members-8.6 and does
   case insensitive comparison:
          // TODO real case-insensitive comparison
          else if (__ctype.tolower(*__s) == __ctype.tolower(*__fmt) ||
                   __ctype.toupper(*__s) == __ctype.toupper(*__fmt))
   while _M_extract_via_format only compares exact characters:
              // Verify format and input match, extract and discard.
              if (__format[__i] == *__beg)
                ++__beg;
   (another question is if there is a better way how to do real
   case-insensitive comparison of 2 characters and whether we e.g. need
   to handle the Turkish i/İ and ı/I which have different number of bytes
   in UTF-8)
6) _M_extract_name does something weird for case-sensitivity,
      // NB: Some of the locale data is in the form of all lowercase
      // names, and some is in the form of initially-capitalized
      // names. Look for both.
      if (__beg != __end)
   and
            if (__c == __names[__i1][0]
                || __c == __ctype.toupper(__names[__i1][0]))
   for the first letter while just
        __name[__pos] == *__beg
   on all the following letters.  strptime says:
   In case a text string (such as the name of a day of the week or a month
   name) is to be matched, the comparison is case insensitive.
   so supposedly all the _M_extract_name comparisons should be case
   insensitive.
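
Two of the points in that list can be sketched in a few lines: the POSIX 2009 default century for %y (item 2) and a case-insensitive name match (items 5 and 6). This is only an illustration under stated assumptions, not libstdc++ code. Both helper names are hypothetical, and the comparison simply reuses the per-character tolower/toupper test quoted in item 5, so it does not address the Turkish i/İ question raised there.

```cpp
#include <cassert>
#include <cstddef>
#include <locale>
#include <string>

// Hypothetical helper: map a two-digit %y value to a full year using the
// POSIX 2009 default century. Per the quoted text this applies only when
// the format contains neither %C nor %Y.
inline int posix_default_century_year(int yy)
{
  assert(0 <= yy && yy <= 99);
  // [69,99] -> 1969..1999, [00,68] -> 2000..2068
  return yy >= 69 ? 1900 + yy : 2000 + yy;
}

// Hypothetical helper: true if input begins with name, comparing every
// character (not just the first) case-insensitively via the ctype facet.
template<typename CharT>
bool starts_with_name_icase(const std::basic_string<CharT>& input,
                            const std::basic_string<CharT>& name,
                            const std::locale& loc = std::locale::classic())
{
  if (input.size() < name.size())
    return false;
  const std::ctype<CharT>& ct = std::use_facet<std::ctype<CharT> >(loc);
  for (std::size_t i = 0; i < name.size(); ++i)
    {
      // Same per-character test as the caller quoted in item 5.
      if (ct.tolower(input[i]) != ct.tolower(name[i])
          && ct.toupper(input[i]) != ct.toupper(name[i]))
        return false;
    }
  return true;
}
```

With these, "MONDAY" and "monday" would both match a locale name stored as "Monday", which is what the strptime wording asks for.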
