Re: [Rd] SUGGESTION: Force install.packages() to use ASCII encoding when parse():ing code?
Duncan Murdoch writes:

> users of other languages may want to have messages and variable names
> in their native language, and ASCII might not be enough for that.

Allowing for messages in non-ASCII encodings would probably be a good
idea, but I think allowing non-ASCII variable names is dangerous.

-- 
Regards,
Bjørn-Helge Mevik

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] SUGGESTION: Force install.packages() to use ASCII encoding when parse():ing code?
On 12/12/2014, 4:12 AM, Bjørn-Helge Mevik wrote:
> Duncan Murdoch writes:
>
>> users of other languages may want to have messages and variable names
>> in their native language, and ASCII might not be enough for that.
>
> Allowing for messages in non-ASCII encodings would probably be a good
> idea, but I think allowing non-ASCII variable names is dangerous.

Dangerous in what way?

I agree that CRAN probably shouldn't accept packages like that, at least
for exported symbols: packages there should run anywhere. But I suspect
that the majority of R packages are for private use, and will never be
sent to CRAN. Do you know any reason that non-ASCII names would be
dangerous for those?

Duncan Murdoch
Re: [Rd] SUGGESTION: Force install.packages() to use ASCII encoding when parse():ing code?
On Fri, Dec 12, 2014 at 06:01:22AM -0500, Duncan Murdoch wrote:
> On 12/12/2014, 4:12 AM, Bjørn-Helge Mevik wrote:
> > Duncan Murdoch writes:
> >
> >> users of other languages may want to have messages and variable names
> >> in their native language, and ASCII might not be enough for that.
> >
> > Allowing for messages in non-ASCII encodings would probably be a good
> > idea, but I think allowing non-ASCII variable names is dangerous.
>
> Dangerous in what way?
>
> I agree that CRAN probably shouldn't accept packages like that, at least
> for exported symbols: packages there should run anywhere. But I
> suspect that the majority of R packages are for private use, and will
> never be sent to CRAN. Do you know any reason that non-ASCII names
> would be dangerous for those?
>
> Duncan Murdoch

I would perhaps not go as far as calling them dangerous, but non-ASCII
characters in code are a mixed blessing which personally I'd opt to not
have, on balance. Being German I can understand that people may want
umlauted characters in their variable names, but where this catches on,
it's just a matter of time before people get characters into their code that
are different but indistinguishable in the font they use (I've seen this
with \H{o} rather than a \"{o}), and mega-personmonths are wasted puzzling
over tracking down these problems.

While many packages are used in-house at least initially, making a
package is a step towards releasing it, so I'd anticipate that having
an option to support weeding out any potentially troublesome identifiers
has the potential to do some good.

Best regards, Jan
-- 
+- Jan T. Kim ---+
| email: jtt...@gmail.com|
| WWW: http://www.jtkim.dreamhosters.com/ |
*-=< hierarchical systems are for files, not for humans >=-*
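To make Jan's example concrete (an illustration, not code from the thread): ö (U+00F6) and ő (U+0151) can be hard to tell apart on screen, but R treats them as different characters, so identifiers using them name different objects. The helper `find_non_ascii_lines()` is hypothetical, a minimal sketch of the kind of "weed out troublesome identifiers" check Jan suggests; it assumes the source lines are UTF-8 and flags whole lines rather than parsing out identifiers.

```r
# o with diaeresis (U+00F6) vs o with double acute (U+0151):
# easy to confuse on screen, but different characters to R.
o_umlaut       <- "\u00f6"
o_double_acute <- "\u0151"
identical(o_umlaut, o_double_acute)  # FALSE
utf8ToInt(o_umlaut)                  # 246
utf8ToInt(o_double_acute)            # 337

# Hypothetical helper (not part of base R): indices of lines that
# contain any code point above U+007F, i.e. anything outside ASCII.
# Lines that are not valid UTF-8 make utf8ToInt() return NA; any()
# then yields NA and which() drops them, so they are not reported.
find_non_ascii_lines <- function(lines) {
  has_non_ascii <- vapply(lines,
                          function(s) any(utf8ToInt(enc2utf8(s)) > 127L),
                          logical(1), USE.NAMES = FALSE)
  which(has_non_ascii)
}

src <- c("x <- 1",
         "gr\u00f6sse <- 2",   # uses U+00F6
         "y <- x + 1",
         "gr\u0151sse <- 3")   # uses U+0151, looks almost the same
find_non_ascii_lines(src)      # 2 4
```

For real files, base R's tools::showNonASCII() and tools::showNonASCIIfile() already print the lines that contain non-ASCII bytes; the sketch above only shows the underlying idea.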
Re: [Rd] SUGGESTION: Force install.packages() to use ASCII encoding when parse():ing code?
On 12/12/2014, 7:34 AM, Jan Kim wrote:
> On Fri, Dec 12, 2014 at 06:01:22AM -0500, Duncan Murdoch wrote:
>> On 12/12/2014, 4:12 AM, Bjørn-Helge Mevik wrote:
>>> Duncan Murdoch writes:
>>> users of other languages may want to have messages and variable names
>>> in their native language, and ASCII might not be enough for that.
>>>
>>> Allowing for messages in non-ASCII encodings would probably be a good
>>> idea, but I think allowing non-ASCII variable names is dangerous.
>>
>> Dangerous in what way?
>>
>> I agree that CRAN probably shouldn't accept packages like that, at least
>> for exported symbols: packages there should run anywhere. But I
>> suspect that the majority of R packages are for private use, and will
>> never be sent to CRAN. Do you know any reason that non-ASCII names
>> would be dangerous for those?
>>
>> Duncan Murdoch
>
> I would perhaps not go as far as calling them dangerous, but non-ASCII
> characters in code are a mixed blessing which personally I'd opt to not
> have, on balance. Being German I can understand that people may want
> umlauted characters in their variable names, but where this catches on,
> it's just a matter of time before people get characters into their code that
> are different but indistinguishable in the font they use (I've seen this
> with \H{o} rather than a \"{o}), and mega-personmonths are wasted puzzling
> over tracking down these problems.
>
> While many packages are used in-house at least initially, making a
> package is a step towards releasing it, so I'd anticipate that having
> an option to support weeding out any potentially troublesome identifiers
> has the potential to do some good.

That's a good point. I guess I'm thinking of Asian languages where the
transliteration into ASCII loses a lot of information, and (I'm told) is
uncomfortable for native speakers to read.
I think R should be usable in those languages in a way that is
comfortable for them, but they should be warned that doing so limits
portability.

Duncan Murdoch
Re: [Rd] SUGGESTION: Force install.packages() to use ASCII encoding when parse():ing code?
> I would perhaps not go as far as calling them dangerous, but non-ASCII
> characters in code are a mixed blessing which personally I'd opt to not
> have, on balance. Being German I can understand that people may want
> umlauted characters in their variable names, but where this catches on,
> it's just a matter of time before people get characters into their code that
> are different but indistinguishable in the font they use (I've seen this
> with \H{o} rather than a \"{o}), and mega-personmonths are wasted puzzling
> over tracking down these problems.

Related: http://en.wikipedia.org/wiki/IDN_homograph_attack

Hadley

-- 
http://had.co.nz/
Re: [Rd] SUGGESTION: Force install.packages() to use ASCII encoding when parse():ing code?
Duncan Murdoch writes:

> On 12/12/2014, 4:12 AM, Bjørn-Helge Mevik wrote:
>> Duncan Murdoch writes:
>>
>>> users of other languages may want to have messages and variable names
>>> in their native language, and ASCII might not be enough for that.
>>
>> Allowing for messages in non-ASCII encodings would probably be a good
>> idea, but I think allowing non-ASCII variable names is dangerous.
>
> Dangerous in what way?

Perhaps "dangerous" is a little too strong, but it opens up possibilities
for problems with sharing code or running it on other systems. Also,
judging by the many files I've seen (and created myself :) with a mixture
of iso8859-1 and utf8, or with "double-encoded" utf8, it is surprisingly
easy to make encoding mistakes when editing or processing files. And as
Jan Kim wrote, you could get things that look similar but are different.

-- 
Regards,
Bjørn-Helge Mevik
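A short sketch (not from the thread) of how "double-encoded" UTF-8 arises, using only base R's charToRaw(), rawToChar(), iconv(), Encoding() and nchar(): the UTF-8 bytes of "Bjørn" are wrongly read as ISO-8859-1 (latin1) and converted to UTF-8 a second time, so ø (bytes c3 b8) turns into the two characters Ã¸. The repair step assumes you already know exactly which single mistaken conversion happened; it is not a general detector.

```r
name <- "Bj\u00f8rn"          # "Bjorn" with o-slash, stored as UTF-8
utf8_bytes <- charToRaw(name)
utf8_bytes                     # 42 6a c3 b8 72 6e

# The mistake: treat those UTF-8 bytes as latin1 and convert to UTF-8 again.
mangled <- iconv(rawToChar(utf8_bytes), from = "latin1", to = "UTF-8")
charToRaw(mangled)             # 42 6a c3 83 c2 b8 72 6e
nchar(name)                    # 5
nchar(mangled)                 # 6: one character became U+00C3 U+00B8

# Undoing it: convert back to latin1, which restores the original bytes,
# then re-declare those bytes as UTF-8.
repaired <- iconv(mangled, from = "UTF-8", to = "latin1")
Encoding(repaired) <- "UTF-8"
identical(charToRaw(repaired), charToRaw(name))  # TRUE
```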
Re: [Rd] SUGGESTION: Force install.packages() to use ASCII encoding when parse():ing code?
On Fri, Dec 12, 2014 at 12:34 PM, Jan Kim wrote:
> it's just a matter of time before people get characters into their code that
> are different but indistinguishable in the font they use (I've seen this
> with \H{o} rather than a \"{o}), and mega-personmonths are wasted puzzling
> over tracking down these problems.

Then R should ban variable names from having 'l', 'i', '1', '0' and 'O' in them!

Barry
Re: [Rd] SUGGESTION: Force install.packages() to use ASCII encoding when parse():ing code?
On Fri, Dec 12, 2014 at 04:58:52PM +, Barry Rowlingson wrote:
> On Fri, Dec 12, 2014 at 12:34 PM, Jan Kim wrote:
>
> > it's just a matter of time before people get characters into their code that
> > are different but indistinguishable in the font they use (I've seen this
> > with \H{o} rather than a \"{o}), and mega-personmonths are wasted puzzling
> > over tracking down these problems.
>
> Then R should ban variable names from having 'l', 'i', '1', '0' and
> 'O' in them!

Well, I can live with 'i', but if I came across code using variable names
i, \'{\i}, \`{\i} and also \i, \u{\i}, \r{\i}, \d{\i} etc., I'd consider
that dangerous to my sanity (especially if they're all used in the same
piece of code)... ;-)

More seriously, as I (literally) see it, the problems of confusing l / I / 1
or O / 0 etc. are reasonably solvable by using a decent font (e.g. DejaVu,
Source Code Pro), but ensuring distinctness of glyphs in the same way won't
scale to character sets the size of Unicode.

Best regards, Jan
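A small illustration of Jan's last point (a sketch, not from the thread): a Latin "a" and a Cyrillic "а" (U+0430), or a precomposed "í" (U+00ED) and "i" followed by a combining acute accent (U+0301), can render identically in many fonts, yet R compares them as different strings, so identifiers built from them would name different objects. No font choice can separate these reliably. `code_points()` is a hypothetical helper for inspecting what is actually in a string.

```r
# Two pairs that can render identically but are different strings.
latin_a    <- "a"        # U+0061 LATIN SMALL LETTER A
cyrillic_a <- "\u0430"   # U+0430 CYRILLIC SMALL LETTER A
identical(latin_a, cyrillic_a)                     # FALSE

i_acute_precomposed <- "\u00ed"   # U+00ED, one code point
i_acute_combining   <- "i\u0301"  # "i" + U+0301 COMBINING ACUTE ACCENT
identical(i_acute_precomposed, i_acute_combining)  # FALSE
nchar(i_acute_precomposed)                         # 1
nchar(i_acute_combining)                           # 2

# Hypothetical helper: list the code points actually present in a string.
code_points <- function(s) sprintf("U+%04X", utf8ToInt(enc2utf8(s)))
code_points(cyrillic_a)                            # "U+0430"
code_points(i_acute_combining)                     # "U+0069" "U+0301"
```

As far as I know, base R has no Unicode normalization function; folding the precomposed and combining forms together needs a package such as stringi (stri_trans_nfc()), and homoglyphs across scripts are not handled by normalization at all.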