>>>>> "Stéphane" == Stéphane Dray <[EMAIL PROTECTED]>
>>>>>     on Thu, 19 Oct 2006 09:46:49 +0200 writes:

Stéphane> Thanks a lot for this clear answer. So there is no way to preserve our
Stéphane> French cultural exception (accented characters),

I agree that there are many French cultural exceptions ;-) and, as a
Swiss, I hold several of them in high esteem. However, "accented"
characters (with the appropriate meaning of "accented") are not at all
a French exception, rather almost a continental European one {as long
as we are staying in the "Latin" alphabet context}.
If I think of what I know of Europe, the only languages *not* using
some version of "accented" characters are English and (I think)
Dutch/Flemish. Everyone else (? probably I forgot some, and don't know
about others like Gaelic,...) has some kind of accents...

I agree with Stéphane that this is unfortunate for quite a few of us,
and it came as a big surprise to me when I first heard about this from
Brian.

.. aah, life was easy when we western chauvinists could behave as if
the whole relevant part of the world was happy with iso-latin1...

Martin

Stéphane> if we want to be international... I had thought
Stéphane> that adding an encoding parameter to the data
Stéphane> function (e.g. data(mydata, encoding="latin1")),
Stéphane> as in the function 'file', could be a way to
Stéphane> solve the problem. Apparently, the problem is much
Stéphane> more complicated...

Stéphane> Sincerely.

Stéphane> Prof Brian Ripley wrote:
>> Only ASCII letters are portable: those accented characters do not even
>> exist in many of the encodings used for R, e.g. Russian and Japanese
>> on Windows machines.
>>
>> There is no way to associate an encoding with a character string in
>> R. We considered it, but it would have had severe back-compatibility
>> problems and little advantage (you cannot display non-ASCII character
>> strings portably: even if you have a Unicode encoding you still need
>> to select a suitable font).
>>
>> 'B. Ripley' (sic)
>>
>>
>> On Wed, 18 Oct 2006, Stéphane Dray wrote:
>>
>>> Hello,
>>> I have some questions concerning encoding and package distribution. We
>>> develop the ade4 package. Some data sets included in the package
>>> contain accented characters (e.g. é, è...). The data sets have been
>>> saved using latin1 encoding, but some of us use UTF-8 and cannot see
>>> the data sets that contain accented characters.
>>> e.g:
>>>
>>> library(ade4)
>>> data(rankrock)
>>> rankrock
>>>
>>> In this case, the characters are in the row names. Other data sets have
>>> such characters in the data (e.g. levels of factors...). A solution is
>>> to use iconv... this is quite easy for us but perhaps more difficult for
>>> a user who may have no idea of the problem. This problem is quite
>>> marginal for the moment, but some Linux distributions are UTF-8 by
>>> default (e.g. Ubuntu) and I suppose that the problem will become more
>>> and more common in the future.
>>>
>>> So we wonder if there is a proper way to code and save these data sets.
>>> I have found some documents by B. Ripley and this note:
>>>
>>> http://developer.r-project.org/210update.txt
>>>
>>> - Names in data objects (e.g. in .rda files) are problematic. It
>>> is likely that by release time these will be treated as in
>>> Latin-1.
>>>
>>> Unless I am mistaken, I have not found an answer to this problem.
>>>
>>> What are the plans of the R gurus on this question?
>>> Thanks a lot.
>>> Sincerely.
>>>
>>> Please add my address to any answers, as I am not a subscriber to this list.

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
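
The iconv workaround mentioned in the original message can be sketched roughly as follows. This is only an illustration, not code from ade4: the variable names are made up, and the Latin-1 bytes are built by hand so that the example does not depend on the encoding of the file or of the session.

```r
# 0xe9 is "é" in Latin-1. Build the bytes explicitly so that the example
# does not depend on the locale (hypothetical stand-in for a row name
# stored in a latin1-saved data set).
latin1_name <- rawToChar(as.raw(c(0xe9, 0x74, 0xe9)))  # "été" as Latin-1 bytes

# Re-encode the string for use in a UTF-8 session.
utf8_name <- iconv(latin1_name, from = "latin1", to = "UTF-8")

# Inspect the result byte by byte: each "é" now takes two bytes (c3 a9).
charToRaw(utf8_name)
```

For a data set, the same call would presumably be applied to the affected parts, e.g. `rownames(d) <- iconv(rownames(d), "latin1", "UTF-8")` for a hypothetical data frame `d`, and likewise to `levels()` of any factor with accented levels. This only helps users whose session can represent the characters at all, which is the portability limit described in the quoted reply.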