On 2025-01-13 Lasse Collin wrote:
> On 2025-01-12 Paul Eggert wrote:
> > Also, the POSIX spec suggests that readdir should return a null
> > pointer right away with errno set, rather than wait for the end of
> > the directory. A subsequent readdir resumes traversal of the
> > directory, even after such an error. Doing it this nicer way should
> > avoid the need for the new label and goto, and it would also let the
> > caller count how many bad entries the directory has.  
> 
> Returning an error immediately makes the code slightly simpler. I
> wonder how many apps continue after any error though.
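Under the POSIX convention Paul describes, a caller has to set errno to 0 before each readdir() call so it can tell the end of the directory apart from an error, and it can keep reading after EOVERFLOW. Below is a minimal sketch of such a caller that uses only standard <dirent.h>. The helper count_entries() is hypothetical and not part of Gnulib.

```c
#include <dirent.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical helper: count the entries of DIR_NAME that readdir()
   returned (*OK) and the entries it reported as EOVERFLOW (*OVERFLOW).
   Returns 0 on success, or -1 if opendir() fails or readdir() fails
   with any error other than EOVERFLOW.  */
static int
count_entries (const char *dir_name, unsigned long *ok,
               unsigned long *overflow)
{
  DIR *dirp = opendir (dir_name);
  if (dirp == NULL)
    return -1;

  *ok = 0;
  *overflow = 0;
  for (;;)
    {
      /* readdir() leaves errno unchanged at the end of the directory,
         so clear it first to distinguish "end" from "error".  */
      errno = 0;
      struct dirent *ent = readdir (dirp);
      if (ent != NULL)
        {
          ++*ok;
          continue;
        }
      if (errno == 0)
        break;                  /* End of the directory.  */
      if (errno == EOVERFLOW)
        {
          ++*overflow;          /* Unrepresentable name: keep reading.  */
          continue;
        }
      closedir (dirp);          /* Any other error: give up.  */
      return -1;
    }
  closedir (dirp);
  return 0;
}
```

For example, count_entries (".", &ok, &overflow) counts the entries of the current directory. If a readdir() implementation reports EOVERFLOW forever at the end of the directory, as the first patch did, this loop never terminates. A defensive caller might therefore cap the number of consecutive errors.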

POSIX allows readdir() to buffer the input, so I guess deferring the
failures until the end of the directory isn't strictly prohibited. The
motivation was to be friendlier to users of programs that don't handle
EOVERFLOW specially or correctly: with deferral, they would still see
all other files. At this point I have no opinion on whether it is a
good or bad idea.

I attached two versions:

  - v2-simple.patch reports EOVERFLOW immediately, as discussed
    earlier.

  - v2-counting.patch counts the number of problematic file names. At
    the end of the directory, readdir() returns NULL with errno set to
    EOVERFLOW once per problematic file name. This way callers can
    count the number of problems.
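The deferred reporting in v2-counting.patch can be modeled without any Windows API. In the sketch below, the names mock_dir and mock_readdir() are hypothetical. A per-entry flag stands in for the result of the lossless WideCharToMultiByte() check. Representable names are returned first, and then each skipped name produces one NULL return with errno set to EOVERFLOW.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a directory stream: NAMES[i] is the i-th
   entry, and REPRESENTABLE[i] says whether it converts losslessly to
   the active code page.  */
struct mock_dir
{
  const char *const *names;
  const bool *representable;
  size_t count;
  size_t pos;                     /* Next entry to examine.  */
  unsigned int problematic_names; /* Skipped names not yet reported.  */
};

/* Mimics the counting behavior: skip unrepresentable names, then
   report one EOVERFLOW per skipped name once the end is reached.  */
static const char *
mock_readdir (struct mock_dir *d)
{
  while (d->pos < d->count)
    {
      size_t i = d->pos++;
      if (d->representable[i])
        return d->names[i];
      ++d->problematic_names;     /* Defer the error.  */
    }

  /* End of the directory: report deferred errors one at a time.  */
  if (d->problematic_names > 0)
    {
      --d->problematic_names;
      errno = EOVERFLOW;
    }
  return NULL;
}
```

After the last EOVERFLOW, the call returns NULL with errno unchanged, which is the normal end-of-directory signal.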

I fixed a few bad bugs that were in the first patch:

  - The return value from GetFullPathNameW was mishandled, which could
    result in a non-working mask string.

  - dirp->status was set to EOVERFLOW, which made readdir() report
    EOVERFLOW endlessly at the end of the directory.

  - "return NULL;" was missing from one error path.
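The GetFullPathNameW() return-value convention is easy to get wrong. On success, it returns the length without the terminating null. When the buffer is too small, it returns the required size including the null. The grow-and-retry loop from the patch can be exercised portably against a stand-in. In the sketch below, mock_full_path() and make_mask() are hypothetical, and the prefix "C:\work\" is a made-up current directory used only to simulate absolutization. Narrow char replaces wchar_t to keep the sketch portable.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for GetFullPathNameW() with the same return-value
   convention: on success, the length copied to BUF *without* the
   terminating null; if BUF is NULL or too small, the required buffer size
   *including* the null.  It "absolutizes" NAME by prepending a fixed,
   made-up current directory.  */
static unsigned long
mock_full_path (const char *name, unsigned long bufsize, char *buf)
{
  static const char cwd[] = "C:\\work\\";
  size_t len = strlen (cwd) + strlen (name);
  if (buf == NULL || bufsize < len + 1)
    return len + 1;
  strcpy (buf, cwd);
  strcat (buf, name);
  return len;
}

/* Same loop shape as the patch: returns a malloc'd "<absolute name>\*"
   mask, or NULL on failure.  */
static char *
make_mask (const char *name)
{
  char *buf = NULL;
  unsigned long len = 0;
  for (;;)
    {
      unsigned long old_len = len;
      len = mock_full_path (name, len, buf);
      if (len == 0)
        {
          free (buf);
          return NULL;
        }
      /* Only a successful call returns less than the size passed in.  */
      if (len < old_len)
        break;
      /* LEN includes the null here; reserve two more chars for "\*".  */
      char *bigger = realloc (buf, len + 2);
      if (bigger == NULL)
        {
          free (buf);
          return NULL;
        }
      buf = bigger;
    }
  char *p = buf + len;          /* Points at the terminating null.  */
  if (p > buf && p[-1] != '\\' && p[-1] != '/')
    *p++ = '\\';
  *p++ = '*';
  *p = '\0';
  return buf;
}
```

The first call always asks for the size, because old_len starts at 0 and no valid result is smaller than that. The loop exits only after a call whose result fits in the buffer it was given.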

Thank you for the feedback! I learned a few things once again. Similar
issues in MinGW-w64's <dirent.h> implementation may be fixed at some
point, and the comments here might help avoid a mistake or two. So even
if it turns out that my patches cannot be merged into Gnulib for now,
this discussion has still been valuable. Thanks!

-- 
Lasse Collin
From ee205ebf07c9546df14f5d514675305de652e2cc Mon Sep 17 00:00:00 2001
From: Lasse Collin <lasse.col...@tukaani.org>
Date: Mon, 13 Jan 2025 17:31:13 +0200
Subject: [PATCH] opendir, readdir, rewinddir: Fix issues on native Windows

(1) With legacy code pages (not UTF-8), certain non-ASCII
    Unicode characters in file names are converted in a lossy
    way ("best-fit mapping"). For example, if the active code page
    is Windows-1252 and a file name contains U+2044 (FRACTION SLASH),
    it becomes ASCII '/'. Thus, malicious file names can result in
    directory traversal.

    Fix by reading the file names as wide chars and converting
    them losslessly to the active code page. If lossless conversion
    isn't possible, skip the name. At the end of the directory,
    return NULL with errno set to EOVERFLOW as many times as
    there were skipped files.

    While EILSEQ could result in a better error message from strerror(),
    POSIX readdir() docs say that EOVERFLOW is the correct choice.
    Some apps like "ls" from GNU coreutils correctly continue reading
    the directory after EOVERFLOW, but many apps don't. Delaying
    the error still lets such apps see all other file names.
    The downside is delayed error detection in apps that will fail
    their task on any error anyway.

    The check for lossless conversion is useful even with the UTF-8
    code page because unpaired surrogates (invalid UTF-16) are
    converted to U+FFFD. It seems likely that file names with
    unpaired surrogates cannot be accessed via multibyte APIs.

    For more information, see the blog post by Orange Tsai and
    Splitline from the DEVCORE Research Team.[*]

(2) With the UTF-8 code page, file names can be longer than MAX_PATH
    bytes. For example, a long name may fit in MAX_PATH when encoded as
    Windows-1252, and then readdir() returns it correctly. If the same
    name exceeds MAX_PATH as UTF-8, readdir() used to fail. Such
    file names can still be opened etc. as long as their wide char
    encoding doesn't exceed MAX_PATH.

(3) If an application is declared as "long path aware" in an application
    manifest, then file names can be longer than MAX_PATH even in the
    native UTF-16 encoding (up to about 32767 wide chars). Fix by
    dynamically allocating the buffer for the absolute file name.

[*] https://devco.re/blog/2025/01/09/worstfit-unveiling-hidden-transformers-in-windows-ansi/
---
 lib/dirent-private.h |  33 +++++++---
 lib/opendir.c        | 141 +++++++++++++++++++++++++++----------------
 lib/readdir.c        |  45 ++++++++++----
 lib/rewinddir.c      |   7 +--
 4 files changed, 150 insertions(+), 76 deletions(-)
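Point (1) is dangerous because best-fit mapping is silent: the converted name looks normal, so the caller cannot tell that it changed. The toy model below illustrates the difference. It assumes a converter that knows only ASCII plus the U+2044 -> '/' mapping described above. The function toy_to_codepage() is hypothetical and does not reproduce the real WideCharToMultiByte() tables. With best fit allowed, "..", U+2044, "x" silently becomes "../x". With best fit disabled, the default char '?' is used and the conversion is reported as lossy, so the name can be skipped.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model of converting a null-terminated UTF-16 name to a single-byte
   code page.  It knows only ASCII plus one best-fit mapping (U+2044
   FRACTION SLASH -> '/'); anything else becomes the default char '?'.
   This is an illustration, not the real WideCharToMultiByte() tables.
   OUTSIZE must be at least 1.  Returns true if no default char was
   needed, i.e. the conversion looks lossless to the caller.  */
static bool
toy_to_codepage (const uint16_t *wname, char *out, size_t outsize,
                 bool allow_best_fit)
{
  bool no_default_char = true;
  size_t i;
  for (i = 0; wname[i] != 0; i++)
    {
      if (i + 1 >= outsize)
        return false;           /* Does not fit; treat as failure.  */
      uint16_t c = wname[i];
      if (c < 0x80)
        out[i] = (char) c;
      else if (c == 0x2044 && allow_best_fit)
        out[i] = '/';           /* Silent best fit: not reported!  */
      else
        {
          out[i] = '?';         /* Default char: reported as lossy.  */
          no_default_char = false;
        }
    }
  out[i] = '\0';
  return no_default_char;
}
```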

diff --git a/lib/dirent-private.h b/lib/dirent-private.h
index 22d847106a..42ddb0d0f1 100644
--- a/lib/dirent-private.h
+++ b/lib/dirent-private.h
@@ -37,10 +37,7 @@ struct gl_directory
 
 # define WIN32_LEAN_AND_MEAN
 # include <windows.h>
-
-/* Don't assume that UNICODE is not defined.  */
-# undef WIN32_FIND_DATA
-# define WIN32_FIND_DATA WIN32_FIND_DATAA
+# include <limits.h>
 
 struct gl_directory
 {
@@ -53,13 +50,35 @@ struct gl_directory
      0 means the entry needs to be filled using FindNextFile.
      A positive value is an error code.  */
   int status;
+  /* Number of file names that cannot be encoded in the active code page.
+     If the end of the directory is otherwise reached successfully, readdir()
+     sets errno to EOVERFLOW as many times as there were problematic files.  */
+  unsigned int problematic_names_count;
   /* Handle, reading the directory, at current position.  */
   HANDLE current;
   /* Found directory entry.  */
-  WIN32_FIND_DATA entry;
-  /* Argument to pass to FindFirstFile.  It consists of the absolutized
+  WIN32_FIND_DATAW entry;
+  /* readdir() returns a pointer to this struct. d_name is entry.cFileName
+     as multibyte string. A file name is at most 255 UTF-16 code units
+     (not code points) followed by the terminating null character. The
+     length as multibyte string depends on the active code page:
+       * In single-byte code pages like Windows-1252 it's 255 bytes
+         plus the terminating null character.
+       * UTF-8: The longest name uses 3-byte UTF-8 characters.
+         4-byte UTF-8 characters consume two UTF-16 code units, thus
+         3-byte chars result in the longest possible UTF-8 file name.
+       * GB18030: Some BMP chars like U+FFFD are four bytes.
+       * MB_LEN_MAX is 5 though. Use that to be certain that we can
+         handle all long names.
+     NOTE: With char-based APIs, file names with unpaired surrogates
+     (invalid UTF-16) are inaccessible even with the UTF-8 code page.  */
+  struct {
+    char d_type;
+    char d_name[255 * MB_LEN_MAX + 1];
+  } dirent_entry;
+  /* Argument to pass to FindFirstFileW.  It consists of the absolutized
      directory name, followed by a directory separator and the wildcards.  */
-  char dir_name_mask[1];
+  wchar_t dir_name_mask[1];
 };
 
 #endif
diff --git a/lib/opendir.c b/lib/opendir.c
index a121e1befa..b566926cc0 100644
--- a/lib/opendir.c
+++ b/lib/opendir.c
@@ -44,16 +44,6 @@
 # include <unistd.h>
 #endif
 
-#if defined _WIN32 && ! defined __CYGWIN__
-/* Don't assume that UNICODE is not defined.  */
-# undef WIN32_FIND_DATA
-# define WIN32_FIND_DATA WIN32_FIND_DATAA
-# undef GetFullPathName
-# define GetFullPathName GetFullPathNameA
-# undef FindFirstFile
-# define FindFirstFile FindFirstFileA
-#endif
-
 DIR *
 opendir (const char *dir_name)
 #undef opendir
@@ -90,10 +80,6 @@ opendir (const char *dir_name)
 
 #else
 
-  char dir_name_mask[MAX_PATH + 1 + 1 + 1];
-  int status;
-  HANDLE current;
-  WIN32_FIND_DATA entry;
   struct gl_directory *dirp;
 
   if (dir_name[0] == '\0')
@@ -102,36 +88,104 @@ opendir (const char *dir_name)
       return NULL;
     }
 
-  /* Make the dir_name absolute, so that we continue reading the same
-     directory if the current directory changed between this opendir()
-     call and a subsequent rewinddir() call.  */
-  if (!GetFullPathName (dir_name, MAX_PATH, dir_name_mask, NULL))
-    {
-      errno = EINVAL;
-      return NULL;
-    }
-
-  /* Append the mask.
-     "*" and "*.*" appear to be equivalent.  */
   {
-    char *p;
+    /* Convert dir_name to UTF-16. */
+    int wdir_name_size = MultiByteToWideChar (CP_ACP, MB_ERR_INVALID_CHARS,
+                                              dir_name, -1, NULL, 0);
+    if (wdir_name_size <= 0)
+      {
+        /* dir_name isn't a valid multibyte string. */
+        errno = EILSEQ;
+        return NULL;
+      }
+
+    wchar_t *wdir_name = malloc (wdir_name_size * sizeof (wchar_t));
+    if (wdir_name == NULL)
+      {
+        errno = ENOMEM;
+        return NULL;
+      }
+
+    if (MultiByteToWideChar (CP_ACP, MB_ERR_INVALID_CHARS,
+                             dir_name, -1, wdir_name, wdir_name_size)
+                            != wdir_name_size)
+      {
+        /* Conversion succeeded earlier so this should never be reached. */
+        free (wdir_name);
+        errno = EILSEQ;
+        return NULL;
+      }
+
+    /* Make the dir_name absolute, so that we continue reading the same
+       directory if the current directory changed between this opendir()
+       call and a subsequent rewinddir() call.
+
+       Long path aware applications need to support pathnames longer than
+       MAX_PATH. Call GetFullPathNameW() in a loop because the required
+       buffer size can change if something renames a file name component
+       between our GetFullPathNameW() calls.  */
+    dirp = NULL;
+    DWORD dir_name_mask_len = 0;
 
-    p = dir_name_mask + strlen (dir_name_mask);
-    if (p > dir_name_mask && !ISSLASH (p[-1]))
-      *p++ = '\\';
-    *p++ = '*';
-    *p = '\0';
+    while (true)
+      {
+        DWORD old_len = dir_name_mask_len;
+        dir_name_mask_len =
+          GetFullPathNameW (wdir_name, dir_name_mask_len,
+                            dirp == NULL ? NULL : dirp->dir_name_mask,
+                            NULL);
+        if (dir_name_mask_len == 0)
+          {
+            free (dirp);
+            free (wdir_name);
+            errno = EINVAL;
+            return NULL;
+          }
+
+        /* When there is enough space, dir_name_mask_len doesn't include
+           the terminating \0. When there isn't enough space, the \0 is
+           counted in dir_name_mask_len.  */
+        if (dir_name_mask_len < old_len)
+          break;
+
+        /* Here dir_name_mask_len includes the terminating \0.
+           We also need space to append the two characters "\*".  */
+        struct gl_directory *new_dirp =
+          realloc (dirp, offsetof (struct gl_directory, dir_name_mask[0])
+                         + (dir_name_mask_len + 1 + 1) * sizeof (wchar_t));
+        if (new_dirp == NULL)
+          {
+            free (dirp);
+            free (wdir_name);
+            errno = ENOMEM;
+            return NULL;
+          }
+
+        dirp = new_dirp;
+      }
+
+    free (wdir_name);
+
+    /* Append the mask.
+       "*" and "*.*" appear to be equivalent.
+       Because wchar_t is UTF-16, casting to int for ISSLASH is OK.  */
+    wchar_t *p = dirp->dir_name_mask + dir_name_mask_len;
+    if (p > dirp->dir_name_mask && !ISSLASH ((int) p[-1]))
+      *p++ = L'\\';
+    *p++ = L'*';
+    *p = L'\0';
   }
 
   /* Start searching the directory.  */
-  status = -1;
-  current = FindFirstFile (dir_name_mask, &entry);
-  if (current == INVALID_HANDLE_VALUE)
+  dirp->status = -1;
+  dirp->problematic_names_count = 0;
+  dirp->current = FindFirstFileW (dirp->dir_name_mask, &dirp->entry);
+  if (dirp->current == INVALID_HANDLE_VALUE)
     {
       switch (GetLastError ())
         {
         case ERROR_FILE_NOT_FOUND:
-          status = -2;
+          dirp->status = -2;
           break;
         case ERROR_PATH_NOT_FOUND:
           errno = ENOENT;
@@ -148,24 +202,7 @@ opendir (const char *dir_name)
         }
     }
 
-  /* Allocate the result.  */
-  dirp =
-    (struct gl_directory *)
-    malloc (offsetof (struct gl_directory, dir_name_mask[0])
-            + strlen (dir_name_mask) + 1);
-  if (dirp == NULL)
-    {
-      if (current != INVALID_HANDLE_VALUE)
-        FindClose (current);
-      errno = ENOMEM;
-      return NULL;
-    }
   dirp->fd_to_close = -1;
-  dirp->status = status;
-  dirp->current = current;
-  if (status == -1)
-    memcpy (&dirp->entry, &entry, sizeof (WIN32_FIND_DATA));
-  strcpy (dirp->dir_name_mask, dir_name_mask);
 
 #endif
 
diff --git a/lib/readdir.c b/lib/readdir.c
index 78225ec486..9961dc718a 100644
--- a/lib/readdir.c
+++ b/lib/readdir.c
@@ -26,10 +26,6 @@
 # include "dirent-private.h"
 #endif
 
-/* Don't assume that UNICODE is not defined.  */
-#undef FindNextFile
-#define FindNextFile FindNextFileA
-
 struct dirent *
 readdir (DIR *dirp)
 #undef readdir
@@ -38,7 +34,6 @@ readdir (DIR *dirp)
   return readdir (dirp->real_dirp);
 #else
   char type;
-  struct dirent *result;
 
   /* There is no need to add code to produce entries for "." and "..".
      According to the POSIX:2008 section "4.12 Pathname Resolution"
@@ -49,20 +44,31 @@ readdir (DIR *dirp)
         for dot and one entry shall be returned for dot-dot; otherwise,
         they shall not be returned."  */
 
+again:
   switch (dirp->status)
     {
     case -2:
       /* End of directory already reached.  */
+      if (dirp->problematic_names_count > 0)
+        {
+          --dirp->problematic_names_count;
+          errno = EOVERFLOW;
+        }
       return NULL;
     case -1:
       break;
     case 0:
-      if (!FindNextFile (dirp->current, &dirp->entry))
+      if (!FindNextFileW (dirp->current, &dirp->entry))
         {
           switch (GetLastError ())
             {
             case ERROR_NO_MORE_FILES:
               dirp->status = -2;
+              if (dirp->problematic_names_count > 0)
+                {
+                  --dirp->problematic_names_count;
+                  errno = EOVERFLOW;
+                }
               return NULL;
             default:
               errno = EIO;
@@ -98,12 +104,27 @@ readdir (DIR *dirp)
   else
     type = DT_UNKNOWN;
 
-  /* Reuse the memory of dirp->entry for the result.  */
-  result =
-    (struct dirent *)
-    ((char *) dirp->entry.cFileName - offsetof (struct dirent, d_name[0]));
-  result->d_type = type;
+  dirp->dirent_entry.d_type = type;
+
+  /* Try to convert to multibyte string.  */
+  BOOL conv_was_lossy = TRUE;
+  int conv_result = WideCharToMultiByte (CP_ACP, WC_NO_BEST_FIT_CHARS,
+                                         dirp->entry.cFileName, -1,
+                                         dirp->dirent_entry.d_name,
+                                         sizeof (dirp->dirent_entry.d_name),
+                                         NULL, &conv_was_lossy);
+  if (conv_result <= 0 || conv_was_lossy)
+    {
+      /* This file name cannot be encoded in the active code page.
+         Try to list the other file names still but remember how
+         many problematic names were seen. By delaying the character
+         set conversion errors until the end of the directory, apps
+         that don't handle EOVERFLOW specially can still see all
+         other file names in the directory.  */
+      ++dirp->problematic_names_count;
+      goto again;
+    }
 
-  return result;
+  return (struct dirent *) &dirp->dirent_entry;
 #endif
 }
diff --git a/lib/rewinddir.c b/lib/rewinddir.c
index 3fd31cdf4a..2046384c77 100644
--- a/lib/rewinddir.c
+++ b/lib/rewinddir.c
@@ -25,10 +25,6 @@
 # include "dirent-private.h"
 #endif
 
-/* Don't assume that UNICODE is not defined.  */
-#undef FindFirstFile
-#define FindFirstFile FindFirstFileA
-
 void
 rewinddir (DIR *dirp)
 #undef rewinddir
@@ -42,7 +38,8 @@ rewinddir (DIR *dirp)
 
   /* Like in opendir().  */
   dirp->status = -1;
-  dirp->current = FindFirstFile (dirp->dir_name_mask, &dirp->entry);
+  dirp->problematic_names_count = 0;
+  dirp->current = FindFirstFileW (dirp->dir_name_mask, &dirp->entry);
   if (dirp->current == INVALID_HANDLE_VALUE)
     {
       switch (GetLastError ())
-- 
2.47.1

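The size comment in dirent-private.h argues that, for UTF-8, 3-byte characters give the longest name, because 4-byte characters consume two UTF-16 code units. The small sketch below checks that arithmetic; the function name utf8_len_from_utf16() is hypothetical. With it, 255 BMP characters such as U+20AC need 765 bytes, while 127 surrogate pairs (254 code units) need only 508. Both fit easily in d_name[255 * MB_LEN_MAX + 1], which is 1276 bytes when MB_LEN_MAX is 5, as the comment states.

```c
#include <stddef.h>
#include <stdint.h>

/* Bytes needed to encode N UTF-16 code units as UTF-8, excluding the
   terminating null.  An unpaired surrogate is counted as U+FFFD
   (3 bytes), which is what a lossy conversion would emit.  */
static size_t
utf8_len_from_utf16 (const uint16_t *s, size_t n)
{
  size_t bytes = 0;
  for (size_t i = 0; i < n; i++)
    {
      uint16_t c = s[i];
      if (c < 0x80)
        bytes += 1;
      else if (c < 0x800)
        bytes += 2;
      else if (c >= 0xD800 && c <= 0xDBFF && i + 1 < n
               && s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF)
        {
          bytes += 4;           /* Surrogate pair: 2 units -> 4 bytes.  */
          i++;
        }
      else
        bytes += 3;             /* Other BMP chars and lone surrogates.  */
    }
  return bytes;
}
```

The worst case is therefore 3 bytes per code unit, not 4.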
From 41890316752dc94e978e70ada637f5047f3d06f0 Mon Sep 17 00:00:00 2001
From: Lasse Collin <lasse.col...@tukaani.org>
Date: Mon, 13 Jan 2025 17:31:46 +0200
Subject: [PATCH] opendir, readdir, rewinddir: Fix issues on native Windows

(1) With legacy code pages (not UTF-8), certain non-ASCII
    Unicode characters in file names are converted in a lossy
    way ("best-fit mapping"). For example, if the active code page
    is Windows-1252 and a file name contains U+2044 (FRACTION SLASH),
    it becomes ASCII '/'. Thus, malicious file names can result in
    directory traversal.

    Fix by reading the file names as wide chars and converting
    them losslessly to the active code page. If lossless conversion
    isn't possible, set errno to EOVERFLOW and return NULL. readdir()
    can then be called again to continue reading the directory.

    While EILSEQ could result in a better error message from strerror(),
    POSIX readdir() docs say that EOVERFLOW is the correct choice.
    Some apps like "ls" from GNU coreutils correctly continue reading
    the directory after EOVERFLOW.

    The check for lossless conversion is useful even with the UTF-8
    code page because unpaired surrogates (invalid UTF-16) are
    converted to U+FFFD. It seems likely that file names with
    unpaired surrogates cannot be accessed via multibyte APIs.

    For more information, see the blog post by Orange Tsai and
    Splitline from the DEVCORE Research Team.[*]

(2) With the UTF-8 code page, file names can be longer than MAX_PATH
    bytes. For example, a long name may fit in MAX_PATH when encoded as
    Windows-1252, and then readdir() returns it correctly. If the same
    name exceeds MAX_PATH as UTF-8, readdir() used to fail. Such
    file names can still be opened etc. as long as their wide char
    encoding doesn't exceed MAX_PATH.

(3) If an application is declared as "long path aware" in an application
    manifest, then file names can be longer than MAX_PATH even in the
    native UTF-16 encoding (up to about 32767 wide chars). Fix by
    dynamically allocating the buffer for the absolute file name.

[*] https://devco.re/blog/2025/01/09/worstfit-unveiling-hidden-transformers-in-windows-ansi/
---
 lib/dirent-private.h |  29 ++++++---
 lib/opendir.c        | 140 +++++++++++++++++++++++++++----------------
 lib/readdir.c        |  29 +++++----
 lib/rewinddir.c      |   6 +-
 4 files changed, 128 insertions(+), 76 deletions(-)

diff --git a/lib/dirent-private.h b/lib/dirent-private.h
index 22d847106a..9e458011bf 100644
--- a/lib/dirent-private.h
+++ b/lib/dirent-private.h
@@ -37,10 +37,7 @@ struct gl_directory
 
 # define WIN32_LEAN_AND_MEAN
 # include <windows.h>
-
-/* Don't assume that UNICODE is not defined.  */
-# undef WIN32_FIND_DATA
-# define WIN32_FIND_DATA WIN32_FIND_DATAA
+# include <limits.h>
 
 struct gl_directory
 {
@@ -56,10 +53,28 @@ struct gl_directory
   /* Handle, reading the directory, at current position.  */
   HANDLE current;
   /* Found directory entry.  */
-  WIN32_FIND_DATA entry;
-  /* Argument to pass to FindFirstFile.  It consists of the absolutized
+  WIN32_FIND_DATAW entry;
+  /* readdir() returns a pointer to this struct. d_name is entry.cFileName
+     as multibyte string. A file name is at most 255 UTF-16 code units
+     (not code points) followed by the terminating null character. The
+     length as multibyte string depends on the active code page:
+       * In single-byte code pages like Windows-1252 it's 255 bytes
+         plus the terminating null character.
+       * UTF-8: The longest name uses 3-byte UTF-8 characters.
+         4-byte UTF-8 characters consume two UTF-16 code units, thus
+         3-byte chars result in the longest possible UTF-8 file name.
+       * GB18030: Some BMP chars like U+FFFD are four bytes.
+       * MB_LEN_MAX is 5 though. Use that to be certain that we can
+         handle all long names.
+     NOTE: With char-based APIs, file names with unpaired surrogates
+     (invalid UTF-16) are inaccessible even with the UTF-8 code page.  */
+  struct {
+    char d_type;
+    char d_name[255 * MB_LEN_MAX + 1];
+  } dirent_entry;
+  /* Argument to pass to FindFirstFileW.  It consists of the absolutized
      directory name, followed by a directory separator and the wildcards.  */
-  char dir_name_mask[1];
+  wchar_t dir_name_mask[1];
 };
 
 #endif
diff --git a/lib/opendir.c b/lib/opendir.c
index a121e1befa..53d70c3068 100644
--- a/lib/opendir.c
+++ b/lib/opendir.c
@@ -44,16 +44,6 @@
 # include <unistd.h>
 #endif
 
-#if defined _WIN32 && ! defined __CYGWIN__
-/* Don't assume that UNICODE is not defined.  */
-# undef WIN32_FIND_DATA
-# define WIN32_FIND_DATA WIN32_FIND_DATAA
-# undef GetFullPathName
-# define GetFullPathName GetFullPathNameA
-# undef FindFirstFile
-# define FindFirstFile FindFirstFileA
-#endif
-
 DIR *
 opendir (const char *dir_name)
 #undef opendir
@@ -90,10 +80,6 @@ opendir (const char *dir_name)
 
 #else
 
-  char dir_name_mask[MAX_PATH + 1 + 1 + 1];
-  int status;
-  HANDLE current;
-  WIN32_FIND_DATA entry;
   struct gl_directory *dirp;
 
   if (dir_name[0] == '\0')
@@ -102,36 +88,103 @@ opendir (const char *dir_name)
       return NULL;
     }
 
-  /* Make the dir_name absolute, so that we continue reading the same
-     directory if the current directory changed between this opendir()
-     call and a subsequent rewinddir() call.  */
-  if (!GetFullPathName (dir_name, MAX_PATH, dir_name_mask, NULL))
-    {
-      errno = EINVAL;
-      return NULL;
-    }
-
-  /* Append the mask.
-     "*" and "*.*" appear to be equivalent.  */
   {
-    char *p;
+    /* Convert dir_name to UTF-16. */
+    int wdir_name_size = MultiByteToWideChar (CP_ACP, MB_ERR_INVALID_CHARS,
+                                              dir_name, -1, NULL, 0);
+    if (wdir_name_size <= 0)
+      {
+        /* dir_name isn't a valid multibyte string. */
+        errno = EILSEQ;
+        return NULL;
+      }
+
+    wchar_t *wdir_name = malloc (wdir_name_size * sizeof (wchar_t));
+    if (wdir_name == NULL)
+      {
+        errno = ENOMEM;
+        return NULL;
+      }
+
+    if (MultiByteToWideChar (CP_ACP, MB_ERR_INVALID_CHARS,
+                             dir_name, -1, wdir_name, wdir_name_size)
+                            != wdir_name_size)
+      {
+        /* Conversion succeeded earlier so this should never be reached. */
+        free (wdir_name);
+        errno = EILSEQ;
+        return NULL;
+      }
+
+    /* Make the dir_name absolute, so that we continue reading the same
+       directory if the current directory changed between this opendir()
+       call and a subsequent rewinddir() call.
+
+       Long path aware applications need to support pathnames longer than
+       MAX_PATH. Call GetFullPathNameW() in a loop because the required
+       buffer size can change if something renames a file name component
+       between our GetFullPathNameW() calls.  */
+    dirp = NULL;
+    DWORD dir_name_mask_len = 0;
 
-    p = dir_name_mask + strlen (dir_name_mask);
-    if (p > dir_name_mask && !ISSLASH (p[-1]))
-      *p++ = '\\';
-    *p++ = '*';
-    *p = '\0';
+    while (true)
+      {
+        DWORD old_len = dir_name_mask_len;
+        dir_name_mask_len =
+          GetFullPathNameW (wdir_name, dir_name_mask_len,
+                            dirp == NULL ? NULL : dirp->dir_name_mask,
+                            NULL);
+        if (dir_name_mask_len == 0)
+          {
+            free (dirp);
+            free (wdir_name);
+            errno = EINVAL;
+            return NULL;
+          }
+
+        /* When there is enough space, dir_name_mask_len doesn't include
+           the terminating \0. When there isn't enough space, the \0 is
+           counted in dir_name_mask_len.  */
+        if (dir_name_mask_len < old_len)
+          break;
+
+        /* Here dir_name_mask_len includes the terminating \0.
+           We also need space to append the two characters "\*".  */
+        struct gl_directory *new_dirp =
+          realloc (dirp, offsetof (struct gl_directory, dir_name_mask[0])
+                         + (dir_name_mask_len + 1 + 1) * sizeof (wchar_t));
+        if (new_dirp == NULL)
+          {
+            free (dirp);
+            free (wdir_name);
+            errno = ENOMEM;
+            return NULL;
+          }
+
+        dirp = new_dirp;
+      }
+
+    free (wdir_name);
+
+    /* Append the mask.
+       "*" and "*.*" appear to be equivalent.
+       Because wchar_t is UTF-16, casting to int for ISSLASH is OK.  */
+    wchar_t *p = dirp->dir_name_mask + dir_name_mask_len;
+    if (p > dirp->dir_name_mask && !ISSLASH ((int) p[-1]))
+      *p++ = L'\\';
+    *p++ = L'*';
+    *p = L'\0';
   }
 
   /* Start searching the directory.  */
-  status = -1;
-  current = FindFirstFile (dir_name_mask, &entry);
-  if (current == INVALID_HANDLE_VALUE)
+  dirp->status = -1;
+  dirp->current = FindFirstFileW (dirp->dir_name_mask, &dirp->entry);
+  if (dirp->current == INVALID_HANDLE_VALUE)
     {
       switch (GetLastError ())
         {
         case ERROR_FILE_NOT_FOUND:
-          status = -2;
+          dirp->status = -2;
           break;
         case ERROR_PATH_NOT_FOUND:
           errno = ENOENT;
@@ -148,24 +201,7 @@ opendir (const char *dir_name)
         }
     }
 
-  /* Allocate the result.  */
-  dirp =
-    (struct gl_directory *)
-    malloc (offsetof (struct gl_directory, dir_name_mask[0])
-            + strlen (dir_name_mask) + 1);
-  if (dirp == NULL)
-    {
-      if (current != INVALID_HANDLE_VALUE)
-        FindClose (current);
-      errno = ENOMEM;
-      return NULL;
-    }
   dirp->fd_to_close = -1;
-  dirp->status = status;
-  dirp->current = current;
-  if (status == -1)
-    memcpy (&dirp->entry, &entry, sizeof (WIN32_FIND_DATA));
-  strcpy (dirp->dir_name_mask, dir_name_mask);
 
 #endif
 
diff --git a/lib/readdir.c b/lib/readdir.c
index 78225ec486..a9a796bdbb 100644
--- a/lib/readdir.c
+++ b/lib/readdir.c
@@ -26,10 +26,6 @@
 # include "dirent-private.h"
 #endif
 
-/* Don't assume that UNICODE is not defined.  */
-#undef FindNextFile
-#define FindNextFile FindNextFileA
-
 struct dirent *
 readdir (DIR *dirp)
 #undef readdir
@@ -38,7 +34,6 @@ readdir (DIR *dirp)
   return readdir (dirp->real_dirp);
 #else
   char type;
-  struct dirent *result;
 
   /* There is no need to add code to produce entries for "." and "..".
      According to the POSIX:2008 section "4.12 Pathname Resolution"
@@ -57,7 +52,7 @@ readdir (DIR *dirp)
     case -1:
       break;
     case 0:
-      if (!FindNextFile (dirp->current, &dirp->entry))
+      if (!FindNextFileW (dirp->current, &dirp->entry))
         {
           switch (GetLastError ())
             {
@@ -98,12 +93,22 @@ readdir (DIR *dirp)
   else
     type = DT_UNKNOWN;
 
-  /* Reuse the memory of dirp->entry for the result.  */
-  result =
-    (struct dirent *)
-    ((char *) dirp->entry.cFileName - offsetof (struct dirent, d_name[0]));
-  result->d_type = type;
+  dirp->dirent_entry.d_type = type;
+
+  /* Try to convert to multibyte string.  */
+  BOOL conv_was_lossy = TRUE;
+  int conv_result = WideCharToMultiByte (CP_ACP, WC_NO_BEST_FIT_CHARS,
+                                         dirp->entry.cFileName, -1,
+                                         dirp->dirent_entry.d_name,
+                                         sizeof (dirp->dirent_entry.d_name),
+                                         NULL, &conv_was_lossy);
+  if (conv_result <= 0 || conv_was_lossy)
+    {
+      /* This file name cannot be encoded in the active code page.  */
+      errno = EOVERFLOW;
+      return NULL;
+    }
 
-  return result;
+  return (struct dirent *) &dirp->dirent_entry;
 #endif
 }
diff --git a/lib/rewinddir.c b/lib/rewinddir.c
index 3fd31cdf4a..45c2a1cd9f 100644
--- a/lib/rewinddir.c
+++ b/lib/rewinddir.c
@@ -25,10 +25,6 @@
 # include "dirent-private.h"
 #endif
 
-/* Don't assume that UNICODE is not defined.  */
-#undef FindFirstFile
-#define FindFirstFile FindFirstFileA
-
 void
 rewinddir (DIR *dirp)
 #undef rewinddir
@@ -42,7 +38,7 @@ rewinddir (DIR *dirp)
 
   /* Like in opendir().  */
   dirp->status = -1;
-  dirp->current = FindFirstFile (dirp->dir_name_mask, &dirp->entry);
+  dirp->current = FindFirstFileW (dirp->dir_name_mask, &dirp->entry);
   if (dirp->current == INVALID_HANDLE_VALUE)
     {
       switch (GetLastError ())
-- 
2.47.1
