On 2025-02-12 Martin Storsjö wrote: > (One could maybe even consider the struct to be a struct with an > unspecified length of the d_name field? That'd make it clearer to > callers that they'd better not try to allocate it themselves with a > static size.)
This is the intention in POSIX. d_name has an unspecified size and the sizeof operator shouldn't be used on it. The sizeof is mentioned in the "Rationale" section: https://pubs.opengroup.org/onlinepubs/9799919799/basedefs/dirent.h.html > On Thu, 16 Jan 2025, Lasse Collin wrote: > > > I wonder how big problem the ABI breakage in patches 6-7 is. If > > a library has DIR* or "struct dirent" in their public API & ABI, > > things break if such a pointer is passed between two components > > that were built with different dirent code, because each library > > or executable contains its own statically-linked copy. > > I think that practically, the risk for ABI breakage is very low - it > feels like a highly unusual thing to expose in the public API of a > component. It's definitely uncommon but such cases exist. One example is FLTK: https://packages.msys2.org/packages/mingw-w64-ucrt-x86_64-fltk https://mirror.msys2.org/mingw/ucrt64/mingw-w64-ucrt-x86_64-fltk-1.4.1-2-any.pkg.tar.zst From ucrt64/include/FL/platform_types.h: #if defined(_WIN32) && !defined(__MINGW32__) struct dirent {char d_name[1];}; #else # include <dirent.h> #endif ucrt64/include/FL/filename.H declares a few functions like fl_alphasort that take struct dirent** arguments. Since those are pointers to pointers, sizeof(struct dirent) doesn't matter, only the offset of d_name does. The code allocates its own struct dirent copies but it does it correctly so that sizeof(d_name) isn't used. > The main thing I could see happening is that something is exposing a > "struct dirent*" out of e.g. a DLL; this would be an issue if this > struct changes layout, but it mostly doesn't - it only grows, and as > long as the recipient doesn't use static copies of such structs and > only accesses the ones returned by readdir, this change should be > fine. Currently my upcoming new version changes struct dirent so that d_name moves. I guess should preserve the offset of d_name since it's possible. > > Reordering the members in the new DIR might make it fully compatible > > with the old closedir(). It might also be doable to make the old > > readdir() return NULL with zero errno. So if new opendir() was used, > > then old readdir() would falsely show the dir as empty and then > > closedir() would close it properly. Is that better or worse than > > a crash or other weird behavior? Compatibility the other way > > around I didn't think at all yet. None of this is attemped in > > this patchset. > > I think all of this is overkill. Good. :-) > As far as I see it, the primary concern is ease of reading the > implementation code, for verifying correctness. Right, especially since the dirent code is only a POSIX compatibility feature. My upcoming version doesn't have any <tchar.h> usage, and thus it no longer needs mingw-w64-crt/misc/wdirent.c. The #ifdefs in dirent.c are gone too. Overall it shouldn't be more hairy than the version you saw. The version on the list has a bug in the patch 6: Always using CP_ACP is wrong. Sometimes it needs to be CP_OEMCP or even CP_UTF8, and CP_UTF8 needs WC_ERR_INVALID_CHARS instead of WC_NO_BEST_FIT_CHARS. > I'm mostly ok with patches 1-4, the only thing I'd like to clarify is > the discussion about whether POSIX really promises to never touch > errno on success. My understanding is that it only promises that errno won't be explicitly set to 0. Other errno changes on success are fine unless documented differently for a specific function. If standard C or POSIX functions were allowed to set errno = 0, then strtoul() and readdir() likely would do that. Instead, they are documented so that caller must set errno = 0 to detect errors. > I remember there was a discussion earlier about e.g. the use of > WC_NO_BEST_FIT_CHARS to WideCharToMultiByte wrt support for older > versions of Windows - is this fine for us to use, or do we need to > try to only use it when supported? Thanks for the reminder. This needs to be verified. > As for allowing best-fit mapping; I don't think we should start > trying to have users set flag fields in the DIR structure (it's > supposed to be opaque anyway, as this patchset does). For an API like > this, the thing we provide should be reasonable out of the box > without extra configuration. I agree. I don't know if it makes sense but my upcoming version has an extra API, _readdir_8dot3, which tries to fallback to a 8.3 name if one exists before giving up and skipping the file (plus EILSEQ at the end of dir). It could make sense in use cases where the filenames themselves aren't important, and one just wants to read all files. It cannot be the default because programs that copy files or derive new filenames will then produce results like in the 1990s (longfi~1.txt). Maybe it's not likely to see much use in practice. It's not many lines of code, and the code is very simple to delete if it isn't wanted. > Other things to consider; some of the changes start relying on > _tfullpath doing malloc if the first parameter is null. Do older CRTs > also do this, or is this a feature that was introduced at some point? I don't know, but maybe it won't matter because my upcoming code calls GetFullPathNameW instead. That way no temporary buffer needs to be allocated. The new code uses FindFirstFileW instead of _wfindfirst. Direct Win32 API seems faster. Also, _wfindfirst has awkward errno usage which has to be worked around by calling GetLastError, so FindFirstFileW and FindNextFileW feel like a cleanup. The wide char APIs aren't available in Windows 95/98/ME. I hope it doesn't matter. The dirent code is called from mingw-w64-crt/misc/ftw.c which is also a POSIX compatibility thing. Nothing else in mingw-w64 itself seems to use the dirent or ftw code. > All in all; the patchset looks mostly ok I think. In general I'm not > a fan of added complexity (and this particular API doesn't strike me > as the most pressing part of the worstfit vulnerability), but the fix > looks mostly safe and straightforward as well. It definitely isn't a high priority thing. At first I thought it's not much to do, so why not do it... Of course it wasn't that simple, but at that point I didn't want to give up. There were also a few non-best-fit things to improve at the same time (like error handling). > I don't have a lot of practical code using these APIs as test case > though, so I hope you've done some practical sanity testing of it. I had tested these so that I felt confident enough (but CP_ACP bug was still there). The new code I want to test a bit more before sending it. I apologize for not properly communicating about the new developments after I had sent the current patches. -- Lasse Collin _______________________________________________ Mingw-w64-public mailing list Mingw-w64-public@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/mingw-w64-public