Hello, everyone, thank you for your quick and helpful responses and the detailed information.

Sorry for not providing a reproducible example for the (potential) bug in `tools::makeLazyLoadDB`. The main point of my mail was the surprising behaviour of `basename` and `dirname`. Fixing those functions would probably solve my problem (as a workaround, probably hiding some underlying problem, and likely leading to a failure for someone else fighting with encodings).

Concerning my underlying direct problem with `tools::makeLazyLoadDB`, I'm having difficulty making my example reproducible. I'm trying to use a directory with a non-ASCII name for a knitr cache. My R-4.0.0 here behaves differently from my R-3.6.3, but when I filed a bug report with knitr, Yihui could not reproduce this difference (https://github.com/yihui/knitr/issues/1840).

So I'll try R-4.0.2 next, let's see what happens.

Cheers
Johannes

> Sent: Tuesday, 30 June 2020 at 09:25
> From: "Tomas Kalibera" <tomas.kalib...@gmail.com>
> To: "Johannes Rauh" <jar...@web.de>, "r-devel" <r-devel@r-project.org>
> Subject: Re: [Rd] `basename` and `dirname` change the encoding to "UTF-8"
>
> On 6/29/20 4:39 PM, Johannes Rauh wrote:
> > Dear R Developers,
> >
> > I noticed that `basename` and `dirname` always return "UTF-8" on Windows
> > (tested with R-4.0.0 and R-3.6.3):
> >
> >> p <- "Föö/Bär"
> >> Encoding(p)
> > [1] "latin1"
> >> Encoding(dirname(p))
> > [1] "UTF-8"
> >> Encoding(basename(p))
> > [1] "UTF-8"
> >
> > Is this on purpose? At least I did not find any relevant comment in the
> > documentation of `dirname`/`basename`.
> >
> > Background: I'm currently struggling with a directory name containing a
> > latin1-character. (I know that this is a bad idea, but I did not create
> > the directory and I cannot rename it.) I now want to pass a
> > latin1-directory name to a function, which internally uses
> > `tools::makeLazyLoadDB`. At that point, internally, `dirname` is called,
> > which changes the encoding, and things break.
> > If I use `debug` to halt the
> > processing and "fix" the encoding, things work as expected.
> >
> > So, if possible, I would prefer that `dirname` and `basename` preserve the
> > encoding.
>
> Please try to always submit a minimal reproducible example with your
> reports and test with at least the latest released version of R, ideally
> also with R-devel.
>
> As you have not sent a reproducible example, it is hard to tell for
> sure, but most likely as Kevin wrote you have run into a real bug, which
> was however already fixed in 4.0.2 and in R-devel (17833). The lazy
> loading cache did not work with file names in non-native encoding.
>
> That real bug has been uncovered by legitimate and correct changes like
> the ones you report, where file operations started returning non-ASCII
> strings in UTF-8. Historically in R such functions would instead return
> native strings with misrepresented characters, and we were reluctant to
> change that expecting waking bugs in code silently assuming native
> encoding. Still, as people were increasingly running into problems with
> non-representable characters, we did that change in several functions
> anyway, and yes, it started waking up bugs.
>
> With some performance overhead and added complexity, we could be
> returning preferentially results in native encoding, and in UTF-8 only
> when they included non-representable characters. That would increase the
> code complexity, increase performance overhead, but wake up existing
> bugs with smaller probability. Note - some code that relied previously
> on best-fit conversions done by Windows will have been broken anyway. We
> would have to bypass win_iconv/iconv for that (adding more complexity).
> Bugs in code not handling encodings properly would still be triggered
> via non-representable characters. I've recently changed file.path() in
> R-devel to be slightly more conservative again, along these lines.
>
> We can still do it more widely, but it is not high on the priority list.
> The way to fix all of these problems is switching to UTF-8 as native
> encoding on Windows and every day spent on tuning the existing behavior
> postpones that real solution.
>
> Best
> Tomas
>
> >
> > Best regards
> > Johannes
> >
> > ______________________________________________
> > R-devel@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-devel