On 29/06/2020 10:39 a.m., Johannes Rauh wrote:
Dear R Developers,
I noticed that `basename` and `dirname` always return strings marked as "UTF-8" on Windows,
regardless of the encoding of the input (tested with R-4.0.0 and R-3.6.3):
p <- "Föö/Bär"
Encoding(p)
[1] "latin1"
Encoding(dirname(p))
[1] "UTF-8"
Encoding(basename(p))
[1] "UTF-8"
Is this on purpose? At least I did not find any relevant comment in the
documentation of `dirname`/`basename`.
Background: I'm currently struggling with a directory name containing a
latin1-character. (I know that this is a bad idea, but I did not create the directory
and I cannot rename it.) I now want to pass a latin1-directory name to a function, which
internally uses `tools::makeLazyLoadDB`. At that point, internally, `dirname` is called,
which changes the encoding, and things break. If I use `debug` to halt the processing
and "fix" the encoding, things work as expected.
So, if possible, I would prefer that `dirname` and `basename` preserve the
encoding.
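
For illustration, the manual "fix" described above could be sketched roughly as follows. This is only a workaround under stated assumptions, not a change to `dirname` itself: `restore_encoding` is a hypothetical helper (it does not exist in base R), and the sketch assumes that the result of `dirname` can be converted to UTF-8 and from there back to latin1.

```r
# Hypothetical helper (not part of base R): convert the elements of `result`
# back to latin1 wherever the corresponding element of `original` was
# marked as latin1.
restore_encoding <- function(result, original) {
  to_latin1 <- Encoding(original) == "latin1"
  # enc2utf8() leaves strings already marked as UTF-8 unchanged;
  # iconv() marks its result as "latin1" when to = "latin1".
  result[to_latin1] <- iconv(enc2utf8(result[to_latin1]),
                             from = "UTF-8", to = "latin1")
  result
}

# Build a latin1-marked path independently of the current locale.
p <- iconv("F\u00f6\u00f6/B\u00e4r", from = "UTF-8", to = "latin1")
Encoding(p)

# Apply dirname(), then restore the declared encoding of the input.
d <- restore_encoding(dirname(p), p)
Encoding(d)
```

The same wrapper would apply to `basename`. It only re-marks the result; it does nothing about code further down that cannot handle UTF-8 paths.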
Actually, makeLazyLoadDB isn't exported from tools, so strictly speaking
you shouldn't be calling it. Or perhaps you have a good reason to call
it, and should be asking for it to be exported, or you are calling a
published function which calls it: in either case it should probably be
fixed to accept UTF-8.
But it doesn't call dirname or basename, so maybe the function that
calls it is the one that needs fixing.
In any case, while asking dirname() and basename() to preserve the
encoding sounds reasonable, it seems like it would just be covering up a
deeper problem.
Duncan Murdoch
______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel