[PATCH] D106577: [clang] Define __STDC_ISO_10646__

James Y Knight via Phabricator via cfe-commits Tue, 27 Jul 2021 08:02:02 -0700

jyknight added a comment.

BTW, looks like the standard wording came from:
http://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_273.htm

which indeed seems to suggest that the intent was to:

1. ensure that WCHAR_MAX is at least the maximum character actually defined so 
far by the standard (which in past versions was 0xffff, and in current versions 
is 0x10ffff).
2. ensure that each of those characters defined by the standard has the same 
numeric value, when stored in a wchar_t, as the value the standard 
specifies.
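
Read as code, those two points amount to roughly the following checks. This is 
only a sketch: the helper names (`wchar_covers`, `stored_value_matches`, 
`check_sample`) are made up for illustration, and nothing here is taken from 
the patch itself.

```c
#include <assert.h>
#include <stdint.h>  /* WCHAR_MAX */
#include <wchar.h>

/* Point 1: wchar_t can represent every code point up to max_code_point. */
static int wchar_covers(unsigned long max_code_point) {
    return (unsigned long)WCHAR_MAX >= max_code_point;
}

/* Point 2: a character stored in a wchar_t has the same numeric value
   as the code point (short identifier) the standard assigns to it. */
static int stored_value_matches(wchar_t stored, unsigned long code_point) {
    return (unsigned long)stored == code_point;
}

/* Hypothetical helper: assert both points for one sample character. */
static void check_sample(wchar_t stored, unsigned long code_point) {
    assert(wchar_covers(code_point));
    assert(stored_value_matches(stored, code_point));
}
```

For example, `check_sample(L'A', 0x41)` should pass wherever the wide execution 
character set is Unicode-based, while `wchar_covers(0x10ffff)` additionally 
requires a wchar_t wider than 16 bits.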

In D106577#2906542 <https://reviews.llvm.org/D106577#2906542>, @rsmith wrote:

> The "old libc" case is for old versions of glibc that put the macro in 
> `features.h` instead of in `stdc-predef.h`. The macros in `stdc-predef.h` 
> aren't a problem until / unless we start auto-including that header.

The `features.h` header in every version of glibc since `stdc-predef.h` was 
split off has had `#include <stdc-predef.h>` in it. If the redefinition is a 
problem, it's still a problem in current versions.
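
One way to see which value a given toolchain ends up with is to query the 
macro after including an ordinary libc header. A minimal sketch, assuming a 
hosted implementation; `iso_10646_version` and `print_iso_10646` are 
hypothetical names, and a return value of 0 is just this sketch's convention 
for "not defined":

```c
/* Any libc header will do here; per the above, on glibc they reach
   features.h, which in turn includes stdc-predef.h. */
#include <assert.h>
#include <stdio.h>

/* Returns the value of __STDC_ISO_10646__ (a yyyymmL constant),
   or 0 if neither the compiler nor the libc headers defined it. */
static long iso_10646_version(void) {
#ifdef __STDC_ISO_10646__
    return __STDC_ISO_10646__;
#else
    return 0L;
#endif
}

/* Prints whatever value this translation unit observed. */
static void print_iso_10646(void) {
    printf("__STDC_ISO_10646__ = %ld\n", iso_10646_version());
}
```

If the compiler predefines the macro and a libc header then defines it again 
with a different value, this is the translation unit where the redefinition 
diagnostic would show up.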

> In D106577#2905755 <https://reviews.llvm.org/D106577#2905755>, @jyknight 
> wrote:
>
>> In D106577#2904960 <https://reviews.llvm.org/D106577#2904960>, @rsmith wrote:
>>
>>> One benefit we don't get with this approach is providing the right value 
>>> for the macro (without paying the cost of always including 
>>> `stdc-predef.h`).
>>
>> What do you mean by "right value", though? As Aaron pointed out, the value 
>> seems only dependent upon what characters can fit into a wchar_t, which is 
>> independent of what unicode version the libc supports.
>
> I don't see how that follows from the definition in the C standard; it says 
> "every character in the Unicode required set, when stored in an object of 
> type `wchar_t`, has the same value as the short identifier of that 
> character". This doesn't say anything about character or string literals, and 
> for example `mbstowcs` stores characters in objects of type `wchar_t` too (it 
> "stores not more than `n` wide characters into the array pointed to by 
> `pwcs`"), so unless `mbstowcs` does the right thing I don't see how we can 
> claim support for a new Unicode standard version. 
> As far as I can tell, this macro is documenting a property of the complete 
> implementation (compiler plus standard library), and should be set to the 
> minimum of the version supported by the compiler and the version supported by 
> the stdlib. I think it's OK for the compiler to say it supports *any* 
> version, though, because we don't expect future Unicode versions to require 
> any changes on our part. But they may require standard library changes.

But that's exactly it -- there are no library OR compiler changes 
required to remain conformant with this property when a new standard version is 
released. The range of values wchar_t needs to represent won't change. Even 
considering mbstowcs, there's no problem because it will already do the right 
thing, with zero changes, no matter how many new characters are defined within 
that valid range of 0x0-0x10ffff -- assuming that it does store Unicode ordinal 
values into wchar_t in the first place. UTF-8/16/32 encoding and decoding are 
agnostic to which characters have been defined.
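
To illustrate why decoding is agnostic to assignment, here is a minimal UTF-8 
decoder sketch. It is not glibc's `mbstowcs`; the function name `utf8_decode` 
and its interface are assumptions made for this example. Note that it uses 
only bit arithmetic and range checks, with no table of assigned characters.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode one UTF-8 sequence from s (len bytes available) into *out.
   Returns the number of bytes consumed, or 0 on malformed input. */
static size_t utf8_decode(const unsigned char *s, size_t len, uint32_t *out) {
    uint32_t cp, min;
    size_t need, i;

    assert(s != NULL && out != NULL);
    if (len == 0) return 0;

    if (s[0] < 0x80)                { cp = s[0];        need = 0; min = 0; }
    else if ((s[0] & 0xE0) == 0xC0) { cp = s[0] & 0x1F; need = 1; min = 0x80; }
    else if ((s[0] & 0xF0) == 0xE0) { cp = s[0] & 0x0F; need = 2; min = 0x800; }
    else if ((s[0] & 0xF8) == 0xF0) { cp = s[0] & 0x07; need = 3; min = 0x10000; }
    else return 0;                  /* stray continuation or invalid lead byte */

    if (len < need + 1) return 0;   /* truncated sequence */
    for (i = 1; i <= need; i++) {
        if ((s[i] & 0xC0) != 0x80) return 0;
        cp = (cp << 6) | (uint32_t)(s[i] & 0x3F);
    }

    /* Reject overlong forms, surrogates, and values past 0x10ffff.
       Whether the code point is currently assigned is never consulted. */
    if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) return 0;

    *out = cp;
    return need + 1;
}
```

Feeding it the bytes `F1 90 80 80` yields 0x50000, a code point in a plane 
that (to my knowledge) has no assigned characters yet; the result would be 
identical if a future Unicode version assigned one there.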

Of course, the library does need to make certain other changes corresponding to 
a new version, e.g. updating the tables for iswalpha to return true for newly 
defined alphabetical characters, but that functionality seems irrelevant to 
this define.

> If Aaron's checked with WG14 and the intent is for this to only constrain how 
> literals are represented, and not the complete implementation, then I'm 
> entirely fine with us defining the macro ourselves. But that's not the 
> interpretation that several other vendors have taken. If we're confident that 
> the intent is just that this macro lists (effectively) the latest version of 
> the Unicode standard that we've heard of, we should let the various libc 
> vendors that currently define the macro know that they're doing it wrong and 
> the definition belongs in the compiler.

It's surely intended to cover the complete system, since the standard doesn't 
consider "compiler" vs "libc" as separate things; they're both just components 
of the "implementation". But as per the above comments, I don't think that 
changes the conclusion here.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D106577/new/

https://reviews.llvm.org/D106577

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
