Re: [Rd] SUGGESTION: Force install.packages() to use ASCII encoding when parse():ing code?

2014-12-12 Thread Bjørn-Helge Mevik
Duncan Murdoch  writes:

> users of other languages may want to have messages and variable names
> in their native language, and ASCII might not be enough for that.

Allowing for messages in non-ASCII encodings would probably be a good
idea, but I think allowing non-ASCII variable names is dangerous.

-- 
Regards,
Bjørn-Helge Mevik

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] SUGGESTION: Force install.packages() to use ASCII encoding when parse():ing code?

2014-12-12 Thread Duncan Murdoch
On 12/12/2014, 4:12 AM, Bjørn-Helge Mevik wrote:
> Duncan Murdoch  writes:
> 
>> users of other languages may want to have messages and variable names
>> in their native language, and ASCII might not be enough for that.
> 
> Allowing for messages in non-ASCII encodings would probably be a good
> idea, but I think allowing non-ASCII variable names is dangerous.

Dangerous in what way?

I agree that CRAN probably shouldn't accept packages like that, at least
for exported symbols:  packages there should run anywhere.  But I
suspect that the majority of R packages are for private use, and will
never be sent to CRAN.  Do you know any reason that non-ASCII names
would be dangerous for those?

Duncan Murdoch



Re: [Rd] SUGGESTION: Force install.packages() to use ASCII encoding when parse():ing code?

2014-12-12 Thread Jan Kim
On Fri, Dec 12, 2014 at 06:01:22AM -0500, Duncan Murdoch wrote:
> On 12/12/2014, 4:12 AM, Bjørn-Helge Mevik wrote:
> > Duncan Murdoch  writes:
> > 
> >> users of other languages may want to have messages and variable names
> >> in their native language, and ASCII might not be enough for that.
> > 
> > Allowing for messages in non-ASCII encodings would probably be a good
> > idea, but I think allowing non-ASCII variable names is dangerous.
> 
> Dangerous in what way?
> 
> I agree that CRAN probably shouldn't accept packages like that, at least
> for exported symbols:  packages there should run anywhere.  But I
> suspect that the majority of R packages are for private use, and will
> never be sent to CRAN.  Do you know any reason that non-ASCII names
> would be dangerous for those?
> 
> Duncan Murdoch

I would perhaps not go as far as calling them dangerous, but non-ASCII
characters in code are a mixed blessing which personally I'd opt to not
have, on balance. Being German I can understand that people may want
umlauted characters in their variable names, but where this catches on,
it's just a matter of time that people get characters into their code that
are different but indistinguishable in the font they use (I've seen this
with \H{o} rather than a \"{o}), and mega-personmonths are wasted puzzling
over tracking down these problems.

While many packages are used in-house at least initially, making a
package is a step towards releasing it, so I'd anticipate that having
an option to support weeding out any potentially troublesome identifiers
has the potential to do some good.
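
A minimal sketch of what such a weeding-out check might look like, written in Python purely for illustration. It is not an existing R or CRAN facility; the function name `find_non_ascii` and the sample snippet are hypothetical. It reports every non-ASCII character with its position and Unicode name, which is enough to tell an o with diaeresis from an o with double acute (the \"{o} versus \H{o} case above).

```python
import unicodedata


def find_non_ascii(source: str):
    """Return (line, column, char, Unicode name) for each non-ASCII character."""
    hits = []
    # Split on "\n" only, so unusual line separators are still scanned.
    for line_no, line in enumerate(source.split("\n"), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 127:
                hits.append((line_no, col, ch, unicodedata.name(ch, "UNKNOWN")))
    return hits


# Hypothetical R-like snippet: the two names differ only in the second letter,
# U+00F6 (diaeresis) on line 1 versus U+0151 (double acute) on line 2.
sample = "h\u00f6he <- 1\nh\u0151he <- 2\n"

for hit in find_non_ascii(sample):
    print(hit)
```

Reporting the Unicode name rather than just the glyph is the design choice that matters here: the glyphs may be indistinguishable in a given font, but the names are not.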

Best regards, Jan
-- 
 +- Jan T. Kim ---+
 | email: jtt...@gmail.com|
 | WWW:   http://www.jtkim.dreamhosters.com/  |
 *-=<  hierarchical systems are for files, not for humans  >=-*



Re: [Rd] SUGGESTION: Force install.packages() to use ASCII encoding when parse():ing code?

2014-12-12 Thread Duncan Murdoch
On 12/12/2014, 7:34 AM, Jan Kim wrote:
> On Fri, Dec 12, 2014 at 06:01:22AM -0500, Duncan Murdoch wrote:
>> On 12/12/2014, 4:12 AM, Bjørn-Helge Mevik wrote:
>>> Duncan Murdoch  writes:
>>>
>>>> users of other languages may want to have messages and variable names
>>>> in their native language, and ASCII might not be enough for that.
>>>
>>> Allowing for messages in non-ASCII encodings would probably be a good
>>> idea, but I think allowing non-ASCII variable names is dangerous.
>>
>> Dangerous in what way?
>>
>> I agree that CRAN probably shouldn't accept packages like that, at least
>> for exported symbols:  packages there should run anywhere.  But I
>> suspect that the majority of R packages are for private use, and will
>> never be sent to CRAN.  Do you know any reason that non-ASCII names
>> would be dangerous for those?
>>
>> Duncan Murdoch
> 
> I would perhaps not go as far as calling them dangerous, but non-ASCII
> characters in code are a mixed blessing which personally I'd opt to not
> have, on balance. Being German I can understand that people may want
> umlauted characters in their variable names, but where this catches on,
> it's just a matter of time that people get characters into their code that
> are different but indistinguishable in the font they use (I've seen this
> with \H{o} rather than a \"{o}), and mega-personmonths are wasted puzzling
> over tracking down these problems.
> 
> While many packages are used in-house at least initially, making a
> package is a step towards releasing it, so I'd anticipate that having
> an option to support weeding out any potentially troublesome identifiers
> has the potential to do some good.

That's a good point.  I guess I'm thinking of Asian languages where the
transliteration into ASCII loses a lot of information, and (I'm told) is
uncomfortable for native speakers to read.  I think R should be usable
in those languages in a way that is comfortable for them, but they
should be warned that doing so limits portability.

Duncan Murdoch



Re: [Rd] SUGGESTION: Force install.packages() to use ASCII encoding when parse():ing code?

2014-12-12 Thread Hadley Wickham
> I would perhaps not go as far as calling them dangerous, but non-ASCII
> characters in code are a mixed blessing which personally I'd opt to not
> have, on balance. Being German I can understand that people may want
> umlauted characters in their variable names, but where this catches on,
> it's just a matter of time that people get characters into their code that
> are different but indistinguishable in the font they use (I've seen this
> with \H{o} rather than a \"{o}), and mega-personmonths are wasted puzzling
> over tracking down these problems.

Related: http://en.wikipedia.org/wiki/IDN_homograph_attack
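
To make the homograph idea concrete for identifiers, here is a rough Python sketch (an illustration only, not something R or CRAN is known to do). It uses the first word of each character's Unicode name as a crude stand-in for its script, which is a heuristic assumption; the helper `scripts_used` and both sample names are hypothetical.

```python
import unicodedata


def scripts_used(name: str) -> set:
    """Approximate the scripts in `name` by the first word of each letter's Unicode name."""
    return {
        unicodedata.name(ch, "UNKNOWN").split()[0]
        for ch in name
        if ch.isalpha()
    }


latin_only = "scale"
mixed = "sc\u0430le"  # U+0430 is CYRILLIC SMALL LETTER A, a look-alike of Latin 'a'

print(latin_only == mixed)       # False: different code points
print(scripts_used(latin_only))  # {'LATIN'}
print(len(scripts_used(mixed)))  # 2: LATIN and CYRILLIC
```

A name that mixes scripts is not necessarily wrong, so a check like this could only flag candidates for a human to look at.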

Hadley

-- 
http://had.co.nz/



Re: [Rd] SUGGESTION: Force install.packages() to use ASCII encoding when parse():ing code?

2014-12-12 Thread Bjørn-Helge Mevik
Duncan Murdoch  writes:

> On 12/12/2014, 4:12 AM, Bjørn-Helge Mevik wrote:
>> Duncan Murdoch  writes:
>> 
>>> users of other languages may want to have messages and variable names
>>> in their native language, and ASCII might not be enough for that.
>> 
>> Allowing for messages in non-ASCII encodings would probably be a good
>> idea, but I think allowing non-ASCII variable names is dangerous.
>
> Dangerous in what way?

Perhaps "dangerous" is a little too strong, but it opens up
possibilities for problems with sharing code or running it on other
systems.  Also, judging by the many files I've seen (and created myself
:) with a mixture of iso8859-1 and utf8, or with "double-encoded" utf8,
it is surprisingly easy to make encoding mistakes when editing or
processing files.  And as Jan Kim wrote, you could get things that look
similar but are different.
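
The two encoding mistakes mentioned above are easy to reproduce. The following Python sketch is an illustration under stated assumptions, not a description of how R reads files; the helper names `is_valid_utf8` and `undo_double_encoding` are hypothetical. It shows how UTF-8 text becomes "double-encoded" when it is misread as ISO 8859-1 and saved again, and how ISO 8859-1 bytes fail a strict UTF-8 check.

```python
original = "Bj\u00f8rn"  # contains U+00F8, o with stroke

utf8_bytes = original.encode("utf-8")      # b'Bj\xc3\xb8rn'
latin1_bytes = original.encode("latin-1")  # ISO 8859-1: b'Bj\xf8rn'

# Mistake 1: UTF-8 bytes are read as ISO 8859-1 and written back as UTF-8.
double_encoded = utf8_bytes.decode("latin-1").encode("utf-8")


def is_valid_utf8(data: bytes) -> bool:
    """Return True if `data` decodes as strict UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def undo_double_encoding(data: bytes) -> str:
    """Reverse one round of UTF-8 -> ISO 8859-1 -> UTF-8 misencoding.

    Assumes the data really was double-encoded; other input may raise.
    """
    return data.decode("utf-8").encode("latin-1").decode("utf-8")


print(double_encoded)                        # b'Bj\xc3\x83\xc2\xb8rn'
print(is_valid_utf8(double_encoded))         # True: the damage is silent
print(is_valid_utf8(latin1_bytes))           # False: mistake 2 is detectable
print(undo_double_encoding(double_encoded) == original)  # True
```

Note that the double-encoded bytes are still valid UTF-8, which is why this kind of mistake tends to go unnoticed until someone looks at the rendered text.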

-- 
Regards,
Bjørn-Helge Mevik



Re: [Rd] SUGGESTION: Force install.packages() to use ASCII encoding when parse():ing code?

2014-12-12 Thread Barry Rowlingson
On Fri, Dec 12, 2014 at 12:34 PM, Jan Kim  wrote:

> it's just a matter of time that people get characters into their code that
> are different but indistinguishable in the font they use (I've seen this
> with \H{o} rather than a \"{o}), and mega-personmonths are wasted puzzling
> over tracking down these problems.

 Then R should ban variable names from having 'l', 'i', '1', '0' and
'O' in them!

Barry



Re: [Rd] SUGGESTION: Force install.packages() to use ASCII encoding when parse():ing code?

2014-12-12 Thread Jan Kim
On Fri, Dec 12, 2014 at 04:58:52PM +, Barry Rowlingson wrote:
> On Fri, Dec 12, 2014 at 12:34 PM, Jan Kim  wrote:
> 
> > it's just a matter of time that people get characters into their code that
> > are different but indistinguishable in the font they use (I've seen this
> > with \H{o} rather than a \"{o}), and mega-personmonths are wasted puzzling
> > over tracking down these problems.
> 
>  Then R should ban variable names from having 'l', 'i', '1', '0' and
> 'O' in them!

well -- I can live with 'i', but if I came across code using variable names
i, \'{\i}, \`{\i} and also \i, \u{\i}, \r{\i}, \d{\i} etc. I'd consider that
dangerous to my sanity (especially if they're all used in the same piece of
code)...  ;-)

More seriously, as I (literally) see it, the problems of confusing l / I / 1
or O / 0 etc. are reasonably solvable by using a decent font (e.g. Deja Vu,
Source Code Pro), but ensuring distinctness of glyphs in the same way
won't scale to character sets the size of Unicode.
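
One narrow slice of this problem can be handled mechanically, as the Python sketch below tries to show (illustration only; it makes no claim about whether any R parser normalizes names, and `same_identifier` is a hypothetical helper). Unicode normalization unifies a precomposed accented letter with its base-letter-plus-combining-accent spelling, but it does nothing for genuinely different letters that merely look alike, such as the dotless i.

```python
import unicodedata

precomposed = "\u00ed"  # i with acute as a single code point
decomposed = "i\u0301"  # plain i followed by a combining acute accent
dotless = "\u0131"      # dotless i, a separate letter


def same_identifier(a: str, b: str) -> bool:
    """Compare two names after NFC normalization."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


print(precomposed == decomposed)                 # False: raw code points differ
print(same_identifier(precomposed, decomposed))  # True: NFC unifies them
print(same_identifier("i", dotless))             # False: normalization cannot help
```

So normalization removes the "same letter, two spellings" trap, while the "different letters, same glyph" trap remains a matter for fonts or explicit checks.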

Best regards, Jan

> Barry

-- 
 +- Jan T. Kim ---+
 | email: jtt...@gmail.com|
 | WWW:   http://www.jtkim.dreamhosters.com/  |
 *-=<  hierarchical systems are for files, not for humans  >=-*
