On Tue, 15 Aug 2023 08:38:11 +0200, Tomas Kalibera <tomas.kalib...@gmail.com> wrote:

> As this was reported to be regression in 4.3, it is entirely possible
> this change came with a regression (though a bit surprising we didn't
> catch it earlier by testing), so it would be a great help if I could
> have the example and debug it.

Sorry, let me try to be more clear.

The Windows filename length limit is 255(?) wide characters. The
WIN32_FIND_DATAA structure contains a 260-byte buffer for the file name
returned by FindFirstFileA()/FindNextFileA(). If a wide character takes
more than one byte to represent in UTF-8, the converted name may
overflow the 260-byte buffer in WIN32_FIND_DATAA even though the name
itself stays below the limit counted in wide characters.

When such an overflow happens, FindNextFile() returns FALSE with
GetLastError() == ERROR_MORE_DATA, which results in R_readdir()
returning NULL and makes list_files() stop before listing the rest of
the directory. This is easier to trigger by accident with Chinese
characters, because they take three UTF-8 bytes per character.

Take the letter ø (\uf8). It takes two bytes to represent in UTF-8.
Create a file whose name consists of this character repeated 140 times.
When you run list.files() on the resulting directory on Windows with a
UTF-8 locale, Windows tries to fit (0xc3 0xb8) times 140, i.e. 280
bytes, into a 260-byte buffer, which doesn't work.

I'm afraid the only way to avoid such a failure is to rewrite R_readdir
using the wide character API and convert the file names on the fly.
(Just like mingw readdir() did in the past?)

stopifnot(.Platform$OS.type == 'windows', l10n_info()$`UTF-8`)
# any character for which nchar(enc2utf8(.), 'bytes') > 1 will do
# any number >260/2 should do
file.create(strrep('\uf8', 140))
list.files()

Does this work? I don't have access to a UTF-8 Windows machine right
now.

-- 
Best regards,
Ivan

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
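
P.S. The byte arithmetic behind the example can be checked on any platform, without a Windows machine. The following is only a sketch: fits_in_find_data() is a hypothetical helper, not part of R, and it assumes that the 260-byte buffer must also hold the terminating NUL byte. It does not call the Windows API and so says nothing about what FindNextFileA() actually returns.

```r
# Hypothetical helper (not part of R): would the UTF-8 form of a file
# name fit into a buffer of buf_size bytes, terminating NUL included?
# The default of 260 is the buffer size quoted in the message above.
fits_in_find_data <- function(name, buf_size = 260L) {
  nchar(enc2utf8(name), type = "bytes") + 1L <= buf_size
}

name <- strrep("\u00f8", 140)

nchar(name, type = "chars")            # 140 characters
nchar(enc2utf8(name), type = "bytes")  # 280 bytes, two per character

fits_in_find_data(name)                    # FALSE: 280 bytes + NUL > 260
fits_in_find_data(strrep("\u00f8", 129))   # TRUE:  258 bytes + NUL <= 260
fits_in_find_data(strrep("\u4e2d", 87))    # FALSE: 3 bytes each, 261 bytes
```

The last line illustrates why three-byte characters hit the limit sooner: under the same assumption, 87 of them are already too many.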