Initially, this was a simple patch to make `btowc` and `wctob` match UCRT 
behavior. If we make serious changes to `btowc` and `wctob`, I think we should 
also take a look at the `mb*towc*` and `wc*tomb*` functions provided by mingw-w64.

I am not saying, and I do not think, that we should replace the `mb*towc*` and 
`wc*tomb*` functions for UCRT. What we can do is make sure that the replacements 
we provide match the CRT's behavior (e.g. use lossy conversion and follow its 
strange "C" locale behavior).
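
For illustration, a "C"-locale replacement for `wcrtomb` could look roughly like the sketch below. It assumes, as the [0,255] range discussed in the quoted message suggests, that the CRT's "C" locale maps bytes 0x00-0xFF one-to-one onto U+0000-U+00FF and rejects anything above with `EILSEQ`. The name `c_locale_wcrtomb` is hypothetical, and this is not the mingw-w64 code.

```c
#include <errno.h>   /* errno, EILSEQ */
#include <stddef.h>  /* size_t, NULL */
#include <wchar.h>   /* wchar_t, mbstate_t */

/* Hypothetical "C"-locale replacement for wcrtomb.
 * Assumption: the "C" locale is a single-byte encoding in which byte N maps
 * to code point N for N in [0x00, 0xFF]; nothing above U+00FF fits. */
size_t c_locale_wcrtomb(char *s, wchar_t wc, mbstate_t *ps)
{
    (void) ps; /* the mapping is stateless, so there is no state to update */

    /* With s == NULL, wcrtomb behaves as if converting L'\0' into an
       internal buffer, which takes exactly one byte here. */
    if (s == NULL)
        return 1;

    /* Compare as unsigned long so that a negative wchar_t (on platforms
       where wchar_t is signed) is rejected as well. */
    if ((unsigned long) wc > 0xFF) {
        errno = EILSEQ;
        return (size_t) -1;
    }

    *s = (char) wc;  /* the byte value equals the code point */
    return 1;
}
```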

At this point, it would be easier to implement `btowc` and `wctob` in terms of 
`mbrtowc` and `wcrtomb`, respectively.
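
Implementing `btowc` and `wctob` on top of `mbrtowc` and `wcrtomb` could look roughly like this sketch, which follows the C standard's definitions (both functions deal with a single byte in the initial shift state). The names `sketch_btowc` and `sketch_wctob` are hypothetical, chosen to avoid clashing with the CRT's own symbols.

```c
#include <limits.h>  /* MB_LEN_MAX */
#include <stdio.h>   /* EOF */
#include <string.h>  /* memset */
#include <wchar.h>   /* mbrtowc, wcrtomb, mbstate_t, wint_t, WEOF */

/* Hypothetical btowc on top of mbrtowc: convert a single byte starting
   from the initial conversion state. */
wint_t sketch_btowc(int c)
{
    if (c == EOF)
        return WEOF;

    unsigned char byte = (unsigned char) c;  /* the standard works on (unsigned char)c */
    mbstate_t state;
    memset(&state, 0, sizeof state);         /* initial shift state */

    wchar_t wc;
    size_t r = mbrtowc(&wc, (const char *) &byte, 1, &state);
    /* (size_t)-1: invalid byte; (size_t)-2: the byte only starts a longer sequence */
    if (r == (size_t) -1 || r == (size_t) -2)
        return WEOF;
    return (wint_t) wc;  /* r == 0 means the null byte, stored as L'\0' */
}

/* Hypothetical wctob on top of wcrtomb: succeed only when wc encodes to
   exactly one byte from the initial conversion state. */
int sketch_wctob(wint_t wc)
{
    if (wc == WEOF)
        return EOF;

    char buf[MB_LEN_MAX];
    mbstate_t state;
    memset(&state, 0, sizeof state);

    size_t r = wcrtomb(buf, (wchar_t) wc, &state);
    if (r != 1)
        return EOF;  /* not representable, or needs more than one byte */
    return (unsigned char) buf[0];
}
```

With this approach, any locale-specific quirks (such as the "C" locale mapping) live only in `mbrtowc` and `wcrtomb`, and the single-byte functions inherit them automatically.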

I suggest we continue this discussion in a new thread. I have some other details 
about the CRT's locale support, since I am currently working on code that 
implements POSIX locale functions on top of Win32 and the CRT.

- Kirill Makurin

________________________________
From: LIU Hao
Sent: Saturday, June 14, 2025 8:55 PM
To: Kirill Makurin; mingw-w64-public
Subject: Re: [Mingw-w64-public] Inconsistent behavior of btowc with "C" locale

On 2025-6-8 00:21, Kirill Makurin wrote:
> I guess sticking to the range [0,255] is our best choice.
>
> I attached patches.
>

These mostly look good to me. However, I get errors from the libc++ test suite:

    https://github.com/lhmouse/mingw-w64/actions/runs/15650737822/job/44095645474#step:7:13365

It fails at the following assertion, which can be reproduced by installing the 
mingw-w64 CRT with the first patch applied and compiling this test case with 
`clang++ -static`:

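    #include <cassert>
    #include <locale>
    int main() {
        std::locale l;
        typedef std::ctype_byname<wchar_t> F;
        std::locale ll(l, new F("C"));
        const F& f = std::use_facet<F>(ll);
        // char(-5) is the byte 0xFB; libc++ expects it to widen to U+00FB.
        assert(f.widen(char(-5)) == L'\u00fb');
    }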


And here's the backtrace:

    #0  0x00007ff657205139 in btowc (c=-5) at misc/btowc.c:16
    #1  0x00007ff6571fcd61 in std::__1::__locale::__btowc(int, std::__1::__locale::__locale_t) ()
    #2  0x00007ff6571dda9a in std::__1::ctype_byname<wchar_t>::do_widen(char) const ()
    #3  0x00007ff6571b19ac in std::__1::ctype<wchar_t>::widen[abi:ne200100](char) const (this=0x5b9c40, __c=-5 '\373') at C:/MSYS64/clang64/include/c++/v1/__locale:490
    #4  0x00007ff6571b1884 in main () at test.cc:37


Here we can see that the parameter `c` of type `int` holds the sign-extended 
argument (`char(-5)` arrives as -5, not 251), so I think this check

    if (cp == 0)
      return (unsigned) c <= 0xFF ? c : WEOF;

is too strict. What if we simply truncate `c` to `unsigned char`, just like the 
code below it does:

    if (cp == 0)
      return (unsigned char) c;
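
The two checks can be compared in isolation. As the backtrace shows, `char(-5)` reaches `btowc` as the `int` value -5; converting that to `unsigned` yields a value far above 0xFF, so the range check returns `WEOF`, whereas converting it to `unsigned char` yields 0xFB. The sketch below covers only the `cp == 0` branch and assumes the "C" locale maps bytes 0x00-0xFF one-to-one onto U+0000-U+00FF; `checked_btowc` and `c_locale_btowc` are hypothetical names. It tests for `EOF` before truncating, since the C standard requires `btowc(EOF)` to return `WEOF` and plain truncation would turn `EOF` into U+00FF.

```c
#include <stdio.h>  /* EOF */
#include <wchar.h>  /* wint_t, WEOF */

/* The range check from the patch: a sign-extended byte such as -5 becomes a
   huge unsigned value, so it is rejected with WEOF. */
wint_t checked_btowc(int c)
{
    return (unsigned) c <= 0xFF ? (wint_t) c : WEOF;
}

/* Hypothetical "C"-locale btowc that truncates instead, assuming the identity
   mapping between bytes and U+0000..U+00FF. EOF is handled first because
   the C standard requires btowc(EOF) to return WEOF. */
wint_t c_locale_btowc(int c)
{
    if (c == EOF)
        return WEOF;
    return (unsigned char) c;  /* -5 and 0xFB both become 0xFB */
}
```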



--
Best regards,
LIU Hao

_______________________________________________
Mingw-w64-public mailing list
Mingw-w64-public@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/mingw-w64-public
