Prof Brian Ripley wrote:
On Fri, 26 Jun 2009, Tony Plate wrote:

I find that Sys.glob() doesn't like UNC paths where the initial slashes are backslashes. The help page for Sys.glob() doesn't specifically mention UNC paths, but does say: "File paths in Windows are interpreted with separator \ or /." Is the failure to treat a path beginning with a double-backslash as a UNC network drive path the intended behavior?

Yes. There are general warnings about non-POSIX Windows paths in several of the help files.

The following comments should alert you to possible restrictions:

  The \code{glob} system call is not part of Windows, and we supply an
  emulation.

  File paths in Windows are interpreted with separator \code{\\} or
  \code{/}.  Paths with a drive but relative (such as \code{c:foo\\bar})
  are tricky, but an attempt is made to handle them correctly.

If you want to submit a well-tested patch, it will be considered.

The problem seems to be in the function dos_wglob() in src/gnuwin32/dos_wglob.c. This function treats a backslash as an escape character when it precedes one of the metacharacters []-{}~\. So, an initial double backslash is changed to an initial single backslash. Consequently, the existing code does see network drives when the prefix is 3 or 4 backslashes, but not when it is exactly 2.
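
A minimal sketch of the effect described above, as a simplified stand-alone model rather than the actual code in dos_wglob.c. The names collapse_escapes() and is_quotable() are hypothetical; the only behavior taken from the source is that a backslash acts as an escape when it precedes one of []-{}~\ and is otherwise kept literally.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: true if c is one of the characters that a
 * preceding backslash is treated as escaping, i.e. []-{}~\ */
static int is_quotable(char c)
{
    return c != '\0' && strchr("[]-{}~\\", c) != NULL;
}

/* Simplified model of the quoting pass (an assumption, not the real
 * dos_wglob() loop): copy pat to out, dropping each backslash that
 * escapes a quotable character and keeping every other backslash.
 * Returns the number of characters written, excluding the terminator. */
size_t collapse_escapes(const char *pat, char *out, size_t outsz)
{
    size_t n = 0;

    assert(outsz > 0);
    while (*pat != '\0' && n + 1 < outsz) {
        char c = *pat++;
        if (c == '\\' && is_quotable(*pat))
            c = *pat++;    /* escape consumed; keep the escaped character */
        out[n++] = c;
    }
    out[n] = '\0';
    return n;
}
```

In this model a pattern starting with two backslashes comes out with one, while patterns starting with three or four backslashes both come out with two, which is consistent with the network drive being seen only for the longer prefixes.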

Here's a patch that adds special treatment for a prefix of exactly two backslashes so that Sys.glob() sees a network drive in this case:

/cygdrive/c/Rbuild/R-2.9.1/src/gnuwin32
$ diff -c dos_wglob.c~ dos_wglob.c
*** dos_wglob.c~        Sun Sep 21 16:05:28 2008
--- dos_wglob.c Mon Jun 29 12:09:47 2009
***************
*** 203,208 ****
--- 203,222 ----
       *bufnext++ = BG_SEP;
       patnext += 2;
     }
+     /* Hack to treat UNC network drive specification correctly:
+      * Without this code, '\\' (i.e., literally two backslashes in pattern)
+      * at the beginning of a path is not recognized as a network drive,
+      * because the GLOB_QUOTE loop below changes the two backslashes to one.
+      * So, in the case where there are two but not three backslashes at
+      * the beginning of the path, transfer these to the output.
+      */
+     if (patnext == pattern && bufend - bufnext > 2 &&
+       pattern[0] == BG_SEP2 && pattern[1] == BG_SEP2 &&
+       pattern[2] != BG_SEP2) {
+       *bufnext++ = pattern[0];
+       *bufnext++ = pattern[1];
+       patnext += 2;
+     }
 #endif

     if (flags & GLOB_QUOTE) {

*** end of patch

This changes behavior in just the case where the prefix is exactly two backslashes. With the fix, the behavior is:
> Sys.glob("\\jacona\\home\\tplate")
character(0)
> Sys.glob("\\\\jacona\\home\\tplate")
[1] "\\\\jacona\\home\\tplate"
> Sys.glob("\\\\\\jacona\\home\\tplate")
[1] "\\\\jacona\\home\\tplate"
> Sys.glob("\\\\\\\\jacona\\home\\tplate")
[1] "\\\\jacona\\home\\tplate"

Without the fix, the behavior is:
> Sys.glob("\\jacona\\home\\tplate")
character(0)
> Sys.glob("\\\\jacona\\home\\tplate")
character(0)
> Sys.glob("\\\\\\jacona\\home\\tplate")
[1] "\\\\jacona\\home\\tplate"
> Sys.glob("\\\\\\\\jacona\\home\\tplate")
[1] "\\\\jacona\\home\\tplate"
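
The two transcripts above can be reproduced in a small stand-alone model. This is a sketch under stated assumptions, not the real dos_wglob(): preprocess() and leading_backslashes() are hypothetical names, the unc_guard flag stands in for the presence or absence of the patch, and the quoting pass is reduced to plain character copying.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the pattern pre-processing step.
 * unc_guard != 0 enables the special case added by the patch. */
size_t preprocess(const char *pattern, char *buf, size_t bufsz, int unc_guard)
{
    const char *patnext = pattern;
    size_t n = 0;

    assert(bufsz > 0);

    /* Patch logic: a prefix of exactly two backslashes is copied through
     * unchanged, so the quoting pass below never sees it. */
    if (unc_guard && bufsz > 2 &&
        pattern[0] == '\\' && pattern[1] == '\\' && pattern[2] != '\\') {
        buf[n++] = *patnext++;
        buf[n++] = *patnext++;
    }

    /* Quoting pass: a backslash escapes only one of []-{}~\ */
    while (*patnext != '\0' && n + 1 < bufsz) {
        char c = *patnext++;
        if (c == '\\' && *patnext != '\0' &&
            strchr("[]-{}~\\", *patnext) != NULL)
            c = *patnext++;
        buf[n++] = c;
    }
    buf[n] = '\0';
    return n;
}

/* Number of backslashes at the start of s. */
size_t leading_backslashes(const char *s)
{
    return strspn(s, "\\");
}
```

For prefixes of one, two, three, and four backslashes, the model leaves 1, 1, 2, 2 leading backslashes without the guard and 1, 2, 2, 2 with it. Only the two-backslash case changes, matching the transcripts.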


Here is a corresponding change to the docs:

tpl...@oberon /cygdrive/c/Rbuild/R-2.9.1/src/library/base/man
*** Sys.glob.Rd~        Thu Mar 19 17:05:24 2009
--- Sys.glob.Rd Mon Jun 29 13:52:57 2009
***************
*** 89,94 ****
--- 89,104 ----
   File paths in Windows are interpreted with separator \code{\\} or
   \code{/}.  Paths with a drive but relative (such as \code{c:foo\\bar})
   are tricky, but an attempt is made to handle them correctly.
+   Backslashes in paths are tricky because they can serve dual purposes:
+   meta-function remover and path separator.  As a result, single or
+   double backslashes can serve as path separators.  UNC network drive
+   paths specified with backslashes (such as \code{\\\\foo\\bar}) are
+   treated specially so that the network drive is found when the path
+   begins with two, three, or four backslashes (i.e., paths beginning
+   with \code{\\\\foo\\bar}, \code{\\\\\\foo\\bar}, and
+   \code{\\\\\\\\foo\\bar} all result in the same output).  UNC network
+   drive paths can also be specified with two forward slashes.
+
 #endif
 }
 \value{
***************
*** 117,122 ****
--- 127,138 ----
 \examples{
 \dontrun{
 Sys.glob(file.path(R.home(), "library", "*", "R", "*.rdx"))
+ # different ways of seeing the same network drive
+ Sys.glob("\\\\\\\\foo\\\\bar")
+ Sys.glob("\\\\\\\\foo\\\\\\\\bar")
+ Sys.glob("\\\\\\\\\\\\foo\\\\\\\\bar")
+ Sys.glob("\\\\\\\\\\\\\\\\foo\\\\\\\\bar")
+ Sys.glob("//foo/bar")
 }}
 \keyword{utilities}
 \keyword{file}

*** end of patch

R compiled with this fix passes 'make check-all' (or at least all the differences and warnings printed appear to be minor and unrelated to this change).

I suspect that it is a matter of taste whether or not this "fix" is desirable.

The argument for it is that it helps Sys.glob() recognize standard UNC network drive specifications, which can begin with a double backslash. Paths of this form can be returned by some system calls, e.g., getwd().

The argument against it would be that it is more important that Sys.glob() consistently treats backslashes as an escape mechanism when preceding any of []-{}~\. If the latter argument is more forceful, then this should be documented in the help page for Sys.glob(), and callers of Sys.glob() should be careful not to pass it a path with a UNC double-backslash prefix, e.g., as can be returned by getwd(). (One place where such a path can be passed is tools:::.writePkgIndices.)
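
If the unpatched behavior is kept, one way a caller could avoid the problem is to rewrite the leading double backslash to forward slashes before globbing, since the thread shows that "//foo/bar" and "//foo\\bar" are both matched. The following is only a sketch of that idea: unc_prefix_to_slashes() is a hypothetical helper, not an existing function in R or in dos_wglob.c.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: if path begins with exactly two backslashes
 * (a UNC prefix), rewrite them in place to two forward slashes.
 * The rest of the path is left untouched so that any backslash
 * escapes later in the pattern keep their meaning.
 * Returns 1 if the path was changed, 0 otherwise. */
int unc_prefix_to_slashes(char *path)
{
    assert(path != NULL);
    if (path[0] == '\\' && path[1] == '\\' && path[2] != '\\') {
        path[0] = '/';
        path[1] = '/';
        return 1;
    }
    return 0;
}
```

Only the prefix is touched, which keeps the change as narrow as the patch itself; paths that begin with a drive letter or a single backslash are returned unchanged.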

-- Tony Plate


E.g., on a Windows system where \\foo is a network drive and \\foo\bar exists, I see:

Sys.glob("//foo/bar")
[1] "//foo/bar"
Sys.glob("//foo\\bar")
[1] "//foo\\bar"
Sys.glob("\\\\foo/bar")
character(0)
Sys.glob("\\\\foo\\bar")

(the pattern of behavior seems to be that initial backslashes are not equivalent to forward slashes, but later backslashes are.)

This is not a big deal, but I noticed it because it results in Rcmd check giving a spurious warning when started from a cygwin shell with a working directory that is a network drive specified as a UNC path. This happens because mandir in tools:::.writePkgIndices has the form \\foo/bar/R/packages/mypkg/man, which results in the false warning "there is a 'man' dir but no help pages in this package." A simple workaround was to use a drive-letter mount for the network drive.

sessionInfo()
R version 2.9.1 (2009-06-26)
i386-pc-mingw32

locale:
LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base


-- Tony Plate

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


