On 2025-01-17 00:38, Lasse Collin wrote:
NAME_MAX is a POSIX constant; Windows doesn't define it. For this
reason, assume that NAME_MAX is only about filenames in multibyte
representation. In the UTF-8 code page, filenames can be up to
255 * 3 bytes excluding the terminating null character. (It's not
255 * 4 because four-byte UTF-8 characters consume two UTF-16
code units.)

If I have understood correctly, there is no Windows locale that
supports a code page with longer encodings. For example, a single
UTF-16 code unit may produce four bytes in GB18030 but it cannot
be used as a locale code page.

It seems so.

Although it is possible to pass GB 18030 (code page 54936) to `WideCharToMultiByte()`, Windows in Simplified Chinese uses GBK by default, which is itself an extension of GB/T 2312-1980 and reuses its identifier (code page 936).

While GBK includes many traditional Chinese and Japanese characters, it does not seem to support the characters that GB 18030 encodes in four bytes:


   UCRT64 ~/Desktop/t
   $ cat find_files.c
   #define WIN32_LEAN_AND_MEAN 1
   #include <windows.h>
   #include <stdio.h>

   int
   main(void)
     {
       printf("active code page = %d\n", GetACP());
       WIN32_FIND_DATAA file;
       HANDLE h = FindFirstFileA("*", &file);
       if(h != INVALID_HANDLE_VALUE) {
         do {
           int n = lstrlenA(file.cFileName);
           printf("found '%s': %d byte(s):", file.cFileName, n);
           for(int i = 0; i < n; ++i)
             printf(" %.2hhx", file.cFileName[i]);
           printf("\n");
         }
         while(FindNextFileA(h, &file));
         FindClose(h);
       }
      }

   UCRT64 ~/Desktop/t
   $ touch $'\uFFFF'   # four bytes in GB 18030:  84 31 a4 39

   UCRT64 ~/Desktop/t
   $ touch '测试文件'

   UCRT64 ~/Desktop/t
   $ touch '測試檔案'

   UCRT64 ~/Desktop/t
   $ touch 'テストファイル'

   UCRT64 ~/Desktop/t
   $ ls -l
   total 1
   -rw-r--r-- 1 lh_mouse lh_mouse   0 Jan 18 15:25 ''$'\357\277\277'
   -rw-r--r-- 1 lh_mouse lh_mouse 542 Jan 18 15:23  find_files.c
   -rw-r--r-- 1 lh_mouse lh_mouse   0 Jan 18 15:25  テストファイル
   -rw-r--r-- 1 lh_mouse lh_mouse   0 Jan 18 15:25  测试文件
   -rw-r--r-- 1 lh_mouse lh_mouse   0 Jan 18 15:25  測試檔案

   UCRT64 ~/Desktop/t
   $ gcc find_files.c  -o find_files.exe -Wall -Wextra

   UCRT64 ~/Desktop/t
   $ ./find_files.exe
   active code page = 936
   found '.': 1 byte(s): 2e
   found '..': 2 byte(s): 2e 2e
   found 'find_files.c': 12 byte(s): 66 69 6e 64 5f 66 69 6c 65 73 2e 63
   found 'find_files.exe': 14 byte(s): 66 69 6e 64 5f 66 69 6c 65 73 2e 65 78 65
   found 'テストファイル': 14 byte(s): a5 c6 a5 b9 a5 c8 a5 d5 a5 a1 a5 a4 a5 eb
   found '测试文件': 8 byte(s): b2 e2 ca d4 ce c4 bc fe
   found '測試檔案': 8 byte(s): 9c 79 d4 87 99 6e b0 b8
   found '?': 1 byte(s): 3f





--
Best regards,
LIU Hao


_______________________________________________
Mingw-w64-public mailing list
Mingw-w64-public@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/mingw-w64-public