Bert,

Thank you for correcting my inaccuracy. A quick look at the original question might help you understand what I meant:

d<- data.frame(x = c(0, 1))
d<- data.frame(d, y = c(0,1))
names(d)[2]<- "a.-5"
d
 x a.-5
1 0    0
2 1    1
d1<- data.frame(d, y = c(0,1))
d1
 x a..5 y
1 0    0 0
2 1    1 1
d2<- data.frame(d, y = c(0,1), check.names=FALSE)
d2
  x a.-5 y
1 0    0 0
2 1    1 1

With check.names=TRUE (the default), the dash is converted to a period. With check.names=FALSE, the dash is preserved. So the dash is not a problem per se, because data.frame() doesn't throw an error or warning in this case.
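As far as I understand from ?data.frame, the renaming under check.names=TRUE is done by make.names(), so the conversion can be reproduced on its own. A minimal sketch (the variable name nm is only for illustration):

```r
# Column names as in the example above
nm <- c("x", "a.-5", "y")

# make.names() replaces characters that are not allowed in a
# syntactic name (here the dash) with "."
make.names(nm, unique = TRUE)
# [1] "x"    "a..5" "y"

# The same renaming happens inside data.frame() when check.names = TRUE
d <- data.frame(x = c(0, 1), y = c(0, 1))
names(d)[2] <- "a.-5"
names(data.frame(d, y = c(0, 1)))
# [1] "x"    "a..5" "y"

# With check.names = FALSE the names are left untouched
names(data.frame(d, y = c(0, 1), check.names = FALSE))
# [1] "x"    "a.-5" "y"
```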

Then my question is, why is it converted? To avoid problems with other functions? To avoid confusion and mischief as you mentioned because it is the symbol for subtraction? If it can be that problematic, why not just not allow it at all? I guess there are reasons for these behaviors and I am curious to learn more about the logic behind it.

Actually, I find that data.frame() can be confusing. On the one hand, it accepts unquoted strings as column names, as in your first example. On the other hand, it rejects them when they could be ambiguous, as in your second example. I am definitely not experienced enough to judge whether the behavior makes sense or not, but I am curious to know why quoted strings are not required in data.frame(). Requiring them would be consistent, and therefore easier for beginners to understand, I think.

Thank you for your insights,
Ivan



On 24/01/12 16:53, Bert Gunter wrote:
Ivan:

On Tue, Jan 24, 2012 at 6:47 AM, Ivan Calandra
<[email protected]>  wrote:
By "it works anyway", I mean that you can have a dash in a column name,
there is no error or even warning.
I guess that some functions would throw an error or warning, depending on
the requirements, but data.frame() doesn't.
This is false. Please don't guess. Read the Help pages.

data.frame(a = 1:3)  #fine
data.frame(a-3 = 1:3) # Error: unexpected '=' in "data.frame(a-3 ="
The name is **NOT** OK. However,
data.frame("a-3" = 1:3) # fine
   a.3
1   1
2   2
3   3

## A quoted  character string can be used as a column name
## The name will be changed to a legal name unless:

data.frame("a-3" = 1:3,check.names=FALSE)
   a-3
1   1
2   2
3   3
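
Note that the error for the unquoted a-3 above appears to come from R's parser rather than from data.frame() itself, which is why it also shows up with other functions. A small sketch to check this (parses() is a hypothetical helper written for this illustration, not a base R function):

```r
# Hypothetical helper: TRUE if the text parses as R code, FALSE otherwise.
# parse() only checks syntax; the call itself is never evaluated.
parses <- function(txt) {
  tryCatch({
    parse(text = txt)
    TRUE
  }, error = function(e) FALSE)
}

parses('data.frame(a = 1:3)')      # TRUE
parses('data.frame(a-3 = 1:3)')    # FALSE: rejected before data.frame() is called
parses('list(a-3 = 1:3)')          # FALSE: same syntax error with another function
parses('data.frame("a-3" = 1:3)')  # TRUE: a quoted string is allowed as an argument name
parses('data.frame(`a-3` = 1:3)')  # TRUE: so is a backquoted name
```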

However, as is obvious, there is much mischief possible from such
practices, so that they are best avoided.
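
One illustration of the kind of mischief meant here, as a sketch only (the second column a and its values are made up for the example):

```r
# A data frame with a non-syntactic name "a-3" and an ordinary column "a"
d <- data.frame("a-3" = 1:3, a = 10:12, check.names = FALSE)

# Backticks are needed to refer to the non-syntactic column
d$`a-3`
# [1] 1 2 3

# Without backticks the dash is parsed as subtraction: (d$a) - 3
d$a-3
# [1] 7 8 9
```

Both expressions run without error or warning, but they return different things.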

-- Bert


Ivan

On 24/01/12 15:35, David Winsemius wrote:

On Jan 24, 2012, at 4:44 AM, Ivan Calandra wrote:

Hi Mark,

I cannot tell you why (maybe someone else can), but the check.names
argument to data.frame() interprets "a.-5" as an invalid name and converts
it to a valid one. What I don't understand is why it isn't "valid" since it
works anyway.

The dash is not a valid character for column names. What do you mean by
"it works anyway"?

--
Ivan CALANDRA
Université de Bourgogne
UMR CNRS/uB 6282 Biogéosciences
6 Boulevard Gabriel
21000 Dijon, FRANCE
+33(0)3.80.39.63.06
[email protected]

______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


