A little more digging revealed a Unix/Windows discrepancy here. On Unix, saving images and preparing for lazyloading/lazydata are done with LC_ALL=C; on Windows they are done with LC_COLLATE=C. I will change Windows to match.
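To make the difference in scope between the two settings concrete, here is a minimal sketch in Python (not R's own code) of the POSIX rule for resolving locale categories from environment variables: LC_ALL overrides every category, then the category's own variable applies, then LANG. The function names and the fr_FR.UTF-8 user locale are hypothetical, and the final fallback to "C" stands in for what POSIX leaves implementation-defined. How the Windows build actually applies LC_COLLATE is not stated here, so this is only an illustration of which categories each setting touches.

```python
# Locale categories defined by POSIX.
CATEGORIES = ("LC_COLLATE", "LC_CTYPE", "LC_MESSAGES",
              "LC_MONETARY", "LC_NUMERIC", "LC_TIME")


def effective_locale(env, category):
    """Resolve one category using POSIX precedence: LC_ALL, LC_<cat>, LANG."""
    for name in ("LC_ALL", category, "LANG"):
        value = env.get(name)
        if value:  # unset or empty variables are skipped
            return value
    return "C"  # assumption: implementation-defined default taken to be "C"


def effective_locales(env):
    """Return the resolved locale for every category."""
    return {cat: effective_locale(env, cat) for cat in CATEGORIES}


# Hypothetical user environment at package install time.
user_env = {"LANG": "fr_FR.UTF-8"}

all_c = effective_locales({**user_env, "LC_ALL": "C"})
collate_only = effective_locales({**user_env, "LC_COLLATE": "C"})

for cat in CATEGORIES:
    print(f"{cat:12} LC_ALL=C: {all_c[cat]:12} LC_COLLATE=C: {collate_only[cat]}")
```

With LC_ALL=C every category, including character classification (LC_CTYPE), resolves to C, whereas LC_COLLATE=C alone changes only collation and leaves LC_CTYPE following the user's locale.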
Unfortunately how the C locale is implemented is OS-dependent. Strictly it should not allow bytes 0x80 to 0xff but it does on some OSes (including Windows). So the strict consequences of this should be that when using lazy-loading or a saved image

- all names have to be ASCII alphanumeric
- \uxxxx sequences are not allowed except \u007f and lower (they are not
  valid at all in a C locale prior to 2.3.1 so I would not expect to see
  them in a package).
- bytes in character strings are copied byte for byte.

This leaves an inconsistency between packages which use lazy-loading / save image and those which do not. We could resolve that by switching to the C locale when loading R code in packages (or, better, R code that was not a loader stub): I didn't think that would be worthwhile but in fact 5 of the packages listed are small enough not to be lazy-loaded.

The other consequence is that the only way we allow packages to have object names which are not ASCII alphanumeric is to disable lazy loading. One possibility is to allow a package to specify its required locale for loading in the DESCRIPTION file, and make use of that.

I am inclined to do nothing about these issues unless people have an actual need to have packages tailored on a non-English locale.

On Wed, 17 May 2006, Prof Brian Ripley wrote:

> The report on R_help about problems loading package irr (in a UTF-8 locale,
> it seemed) prompted me to look a little deeper. There are quite a few
> packages with Latin-1 chars in their .R files, and a couple in UTF-8.
>
> Apart from non-ASCII chars in comments, this is a problem as the code
> concerned cannot be represented in some locales R runs in (for example
> Japanese on Windows).
> It happens that irr is so small that lazy-loading is
> not used, but when lazy-loading or a saved image is used, the locale in use
> when the package is installed determines how the code is parsed (and may not
> be the same as when the package is used, and indeed it is not uncommon on
> Linux/Unix systems for different users to use different locales).
>
> This means that using non-ASCII chars is not portable, and I've added code to
> R CMD check in R-devel to warn about such usage. In the examples I have
> investigated the usages have been
>
> - messages in a non-English language, typically French.
> - startup messages with people's names.
> - use of characters that I can only guess are intended to be in the
>   WinAnsi encoding, e.g. a copyright symbol.
>
> The only reason I have not made this an error is that people might want to
> produce packages for a known locale, e.g. a student class, but perhaps it
> should be an error for packages submitted to CRAN.
>
> I do not believe there is much we can do about this: messages which are not
> entirely in ASCII cannot be displayed on many R platforms and it seems
> incorrect to allow French messages and not Japanese ones.
>
> The packages currently throwing warnings are
>
> FactoMineR FunCluster JointGLM LoopAnalyst Sciviews ade4 adehabitat ape
> climatol crossdes deal grasper irr lsa mvrpart pastecs sn surveillance
> truncgof

-- 
Brian D. Ripley,                  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
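For package authors who want to find the offending characters themselves, the kind of scan that the quoted message describes can be approximated as follows. This is a rough Python sketch, not the code added to R CMD check: the function name and the sample source are hypothetical, and unlike the check described above it makes no attempt to exempt comments. It simply reports every line of an R source file that contains bytes in the range 0x80 to 0xff.

```python
def find_non_ascii(source: bytes):
    """Return (line_number, offending_bytes) for each line with bytes >= 0x80."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        # Iterating over a bytes object yields integers in 0..255.
        bad = bytes(b for b in line if b >= 0x80)
        if bad:
            hits.append((lineno, bad))
    return hits


# Hypothetical R source with a Latin-1 encoded message (0xe9 is e-acute).
sample = b'message("R\xe9sum\xe9")\nx <- 1\n'

for lineno, bad in find_non_ascii(sample):
    print(f"line {lineno}: non-ASCII bytes {bad!r}")
```

Working on raw bytes rather than decoded text is deliberate: the encoding of the file is exactly what is unknown, so any decoding step would already assume a locale.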