On 8/16/23 13:11, yu gong wrote:
> A little more information on this issue.
> Searching the Microsoft website today, I found the doc "Maximum Path
> Length Limitation - Win32 apps | Microsoft Learn":
> <https://learn.microsoft.com/en-us/windows/win32/fileio/maximum-file-path-limitation?tabs=registry>
> According to the doc, two things are needed to avoid this issue on
> Windows 10 and later:
> 1. Edit the registry or group policy to set
>    [HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\FileSystem]
>    "LongPathsEnabled"=dword:00000001
>
> 2. Opt in via the app manifest (R has already done this).

These settings are for long paths (meaning a full path consisting of
multiple elements separated by backslashes); more about that is also in
[1]. But the problem that Ivan reported (it is not clear whether this is
the same problem as the one reported originally on this thread) is about
the limit for a single file/directory name, that is, for a single element
of a path. Having long paths enabled in the registry wouldn't help with
this. These two limits are not directly related, except for the obvious:
by choosing rather long names for individual files, one usually soon hits
the limit for the full path.

Best
Tomas

[1] https://blog.r-project.org/2023/03/07/path-length-limit-on-windows/index.html

>
> Regards,
> yu
>
> ------------------------------------------------------------------------
> *From:* R-devel <r-devel-boun...@r-project.org> on behalf of Tomas
> Kalibera <tomas.kalib...@gmail.com>
> *Sent:* Wednesday, August 16, 2023 15:42
> *To:* Ivan Krylov <krylov.r...@gmail.com>
> *Cc:* r-devel@r-project.org <r-devel@r-project.org>
> *Subject:* Re: [Rd] R-4.3 version list.files function could not work
> correctly in chinese
>
> On 8/15/23 16:00, Tomas Kalibera wrote:
> >
> > On 8/15/23 09:04, Ivan Krylov wrote:
> >> On Tue, 15 Aug 2023 08:38:11 +0200
> >> Tomas Kalibera <tomas.kalib...@gmail.com> wrote:
> >>
> >>> As this was reported to be regression in 4.3, it is entirely possible
> >>> this change came with a regression (though a bit surprising we didn't
> >>> catch it earlier by testing), so it would be a great help if I could
> >>> have the example and debug it.
> >> Sorry, let me try to be more clear.
> >>
> >> The Windows filename length limit is 255(?) wide characters. The
> >> WIN32_FIND_DATAA structure contains a 260-byte buffer for the filename
> >> to be returned by FindFirstFileA()/FindNextFileA().
> >> If a wide character takes more than one byte to be represented in
> >> UTF-8, it may overflow the 260 byte limit in the WIN32_FIND_DATAA
> >> structure despite being below the 260 wide character limit. When such
> >> an overflow happens, FindNextFile() returns FALSE with
> >> GetLastError() == ERROR_MORE_DATA, which results in R_readdir()
> >> returning NULL and makes list_files() stop before listing the rest of
> >> the directory.
> >>
> >> This is easier to make happen by accident with Chinese characters,
> >> because they take three UTF-8 bytes per character.
> >>
> >> Take the ø (\uf8) letter. It takes two bytes to represent in UTF-8.
> >> Create a file with a name consisting of this symbol repeated 140 times.
> >> When you run list.files() on the resulting directory on Windows with a
> >> UTF-8 locale, Windows tries to fit (0xc3 0xb8) times 140 into a
> >> 260-byte buffer, which doesn't work. I'm afraid the only way to avoid
> >> such a failure is to rewrite R_readdir using the wide character API and
> >> convert the file names on the fly. (Just like mingw readdir() did in
> >> the past?)
> >>
> >> stopifnot(.Platform$OS.type == 'windows', l10n_info()$`UTF-8`)
> >> # any character for which nchar(enc2utf8(.), 'bytes') > 1 will do
> >> # any number >260/2 should do
> >> file.create(strrep('\uf8', 140))
> >> list.files()
> >>
> >> Does this work? I don't have access to a UTF-8 Windows machine right
> >> now.
> >
> > Thanks, yes, I can reproduce the problem. Some Windows functions
> > impose a limit of 260 wide characters, but others a limit of 260
> > bytes, so one can create a file with a name too long to be found by
> > FindNextFileA.
> >
> > In R 4.2, we used readdir() from mingw-w64, which itself used
> > findnext, which however had the same problem: it used a buffer of
> > 260 bytes, and judging from the code of mingw-w64 and the Windows
> > documentation, it should have behaved the same, that is, it should
> > have stopped the search on such a long file name.
> > However, in my use case, R 4.2.3 crashed inside findnext due to a
> > stack overrun. R 4.1.3 worked, but clearly it would require a
> > different use case to overrun this buffer, as it didn't use UTF-8.
> > This suggests that findnext didn't have a check for this and hence
> > caused memory corruption, which can lead to a crash or work by
> > coincidence. That could have been the case for the user reporting
> > this as a regression compared to R 4.2. But it is not a regression;
> > the problem has existed for a long time.
> >
> > So, yes, we'd probably have to use wide variants of
> > FindNext/FindFirst. I'll fix.
>
> Fixed in R-devel (84960). Please let me know if you see any problem with
> the fix.
>
> Thanks,
> Tomas
>
> >
> > Thanks for debugging this,
> > Tomas
> >
>
> ______________________________________________
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
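
For reference, the byte arithmetic behind Ivan's reproducer can be checked on any platform without touching the file system. This is only a sketch: `fits_ansi_find_buffer` is a hypothetical helper, not part of R or of the fix in r84960. It assumes that the 260-byte `cFileName` buffer of WIN32_FIND_DATAA must also hold a terminating NUL and that the name is delivered in UTF-8.

```r
# Hypothetical helper, for illustration only: does a single file name,
# encoded as UTF-8, fit a buffer of buf_bytes bytes (260 is the size
# quoted in the thread) including one terminating NUL byte (assumption)?
fits_ansi_find_buffer <- function(name, buf_bytes = 260L) {
  n_bytes <- nchar(enc2utf8(name), type = "bytes")
  n_bytes + 1L <= buf_bytes
}

name_2byte <- strrep("\u00f8", 140)  # the name from the reproducer
name_3byte <- strrep("\u4e2d", 90)   # a Chinese character, 3 UTF-8 bytes each
name_ascii <- strrep("a", 140)       # same character count, 1 byte each

nchar(name_2byte, type = "chars")            # 140 characters
nchar(enc2utf8(name_2byte), type = "bytes")  # 280 bytes
nchar(enc2utf8(name_3byte), type = "bytes")  # 270 bytes

fits_ansi_find_buffer(name_2byte)  # FALSE
fits_ansi_find_buffer(name_3byte)  # FALSE
fits_ansi_find_buffer(name_ascii)  # TRUE
```

All three names are well under 260 characters, yet the two non-ASCII ones exceed 260 bytes in UTF-8. That gap is what a byte-oriented API such as FindNextFileA can trip over and a wide-character API does not.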