On 29/06/2020 10:39 a.m., Johannes Rauh wrote:
Dear R Developers,

I noticed that `basename` and `dirname` always return strings marked as "UTF-8" 
on Windows (tested with R-4.0.0 and R-3.6.3):

p <- "Föö/Bär"
Encoding(p)
[1] "latin1"
Encoding(dirname(p))
[1] "UTF-8"
Encoding(basename(p))
[1] "UTF-8"
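
The encoding that the literal above receives depends on the locale and on how the console or source file is encoded. A locale-independent way to build the same string, sketched here after the idiom used in the `?Encoding` help page, is to spell the non-ASCII bytes as escapes and declare the encoding explicitly. The "UTF-8" results mentioned in the comments are the ones reported above for Windows; other platforms may mark the result differently.

```r
# Build the path from explicit latin1 bytes so that the example does not
# depend on the encoding of the source file or console.
p <- "F\xf6\xf6/B\xe4r"     # \xf6 is o-umlaut, \xe4 is a-umlaut in latin1
Encoding(p) <- "latin1"     # declare the bytes as latin1

Encoding(p)
Encoding(dirname(p))        # reported as "UTF-8" on Windows in this thread
Encoding(basename(p))       # reported as "UTF-8" on Windows in this thread
```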

Is this on purpose?  At least I did not find any relevant comment in the 
documentation of `dirname`/`basename`.

Background: I'm currently struggling with a directory name containing a 
latin1-character.  (I know that this is a bad idea, but I did not create the directory 
and I cannot rename it.)  I now want to pass a latin1-directory name to a function, which 
internally uses `tools::makeLazyLoadDB`.  At that point, internally, `dirname` is called, 
which changes the encoding, and things break.  If I use `debug` to halt the processing 
and "fix" the encoding, things work as expected.

So, if possible, I would prefer that `dirname` and `basename` preserve the 
encoding.
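
As a rough illustration of what "preserving the encoding" could mean at user level, the sketch below wraps a path function and converts the result back to latin1 where the input was declared latin1. The name `with_input_encoding` is hypothetical and not part of base R. This is only a stopgap at the call site under those assumptions; it does not change what code that calls `dirname` internally will see.

```r
# Hypothetical wrapper, not part of base R: apply a path function such as
# dirname() or basename(), then convert the result back to latin1 wherever
# the input was declared latin1 but the output came back marked UTF-8.
with_input_encoding <- function(f, path) {
  out <- f(path)
  idx <- Encoding(path) == "latin1" & Encoding(out) == "UTF-8"
  # iconv() marks converted elements as latin1 (mark = TRUE is the default)
  out[idx] <- iconv(out[idx], from = "UTF-8", to = "latin1")
  out
}

# Same path as in the example above, built from explicit latin1 bytes
p <- "F\xf6\xf6/B\xe4r"
Encoding(p) <- "latin1"

Encoding(with_input_encoding(dirname, p))
Encoding(with_input_encoding(basename, p))
```

Elements whose encoding is not marked as described are returned unchanged, so ASCII-only paths pass straight through.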

Actually, makeLazyLoadDB isn't exported from tools, so strictly speaking you shouldn't be calling it. Or perhaps you have a good reason to call it, and should be asking for it to be exported, or you are calling a published function which calls it: in either case it should probably be fixed to accept UTF-8.

But it doesn't call dirname or basename, so maybe the function that calls it is the one that needs fixing.

In any case, while asking dirname() and basename() to preserve the encoding sounds reasonable, it seems like it would just be covering up a deeper problem.

Duncan Murdoch

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
