When you attach() something, R puts a copy of it on the search path, and there it stays. It is not a link, reference, or pointer to the original. Changing the original (the version in the data frame), which is what you did, does not change the attached copy. In essence, you did a type conversion on one copy, but afterwards started looking at the other copy.
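
A minimal sketch of that behaviour, assuming a made-up data frame (birth_demo and its column visit are stand-ins for illustration, not your actual data):

```r
# Hypothetical data frame standing in for 'birth'
birth_demo <- data.frame(visit = c(1L, 1L, 2L, 3L))

attach(birth_demo)   # places a copy of the columns on the search path

# Convert the column inside the data frame only
birth_demo$visit <- as.factor(birth_demo$visit)

attached_is_factor <- is.factor(visit)             # checks the attached copy
frame_is_factor    <- is.factor(birth_demo$visit)  # checks the data frame itself

print(c(attached = attached_is_factor, frame = frame_is_factor))

detach(birth_demo)   # tidy up the search path
```

The attached copy still reports FALSE while the column in the data frame reports TRUE, which matches what you saw.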

See also my interjected comments below.

-Don

At 8:54 AM +0000 11/23/09, Alan Kelly wrote:
Dear list,
I have a data frame (birth) with mixed variables (numeric and alphanumeric). One variable "t1stvisit" was originally coded as numeric with values 1,2, and 3. After attaching the data frame, this is what I see when I use str(t1stvisit)

$ t1stvisit: int  1 1 1 1 1 1 1 1 2 2 ...

This is as expected.
I then convert t1stvisit to a factor and to avoid creating a second copy of this variable independent of the data frame I use:
birth$t1stvisit = as.factor(birth$t1stvisit)
if I check that the conversion has worked:
is.factor(t1stvisit)
[1] FALSE
Now the only object present in the workspace is the data frame "birth" and, as noted, I have not created any new variables. So why does R still treat t1stvisit as numeric?
is.factor(t1stvisit)
[1] FALSE

Yet when I try the following:
 is.factor(birth$t1stvisit)
[1] TRUE
So, there appear to be two versions of "t1stvisit" - the original numeric version and the correct factor version - although ls() only shows "birth" as present in the workspace.

Right.
  find('t1stvisit')
will show you there are two of them, and where in memory they are located.
If you type
   t1stvisit
at the prompt, you always get the first one. The one in the attached dataframe is the second one. Use the
  search()
function to show you the different locations in memory where objects can be found.
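
The lookup order described above can be sketched as follows. This is only an illustration with a made-up data frame (demo and visit_code are hypothetical names); what find() reports for you depends on what is actually in your workspace:

```r
demo <- data.frame(visit_code = c(1L, 2L, 3L))   # hypothetical data, not 'birth'
attach(demo)

path_after_attach <- search()            # "demo" is now one of the locations
where_attached    <- find("visit_code")  # only the attached copy has this name

visit_code <- c(10, 20, 30)              # now also create one in the workspace
where_both <- find("visit_code")         # every location holding that name
first_one  <- visit_code                 # the bare name gives the first one found

print(path_after_attach)
print(where_attached)
print(where_both)
print(first_one)

rm(visit_code)
detach(demo)
```

Once a variable of the same name exists in the workspace, typing the bare name returns that one rather than the attached copy.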

When you did the attach(), did you get a message like:

 attach(tmp)

        The following object(s) are masked _by_ .GlobalEnv :

         x

(yours would have referred to your variables, not the "x" in my example).
That message tells you you have two variables of the same name, stored in two different locations in the search path.

As a general rule, it's just plain confusing to have more than one object of the same name in more than one location. In your situation, I would get rid of the one that's not in the dataframe. But even then, if you change it in the dataframe you'll still need to detach and re-attach the dataframe, so using attach() is probably not the best choice in the long run. Maybe the with() function would meet your needs.
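
A rough sketch of both options, again with a made-up data frame (birth_demo and visit are hypothetical names used only for illustration):

```r
# Hypothetical stand-in for 'birth'
birth_demo <- data.frame(visit = c(1L, 1L, 2L, 3L, NA))
attach(birth_demo)
birth_demo$visit <- as.factor(birth_demo$visit)

# Option 1: with() evaluates the expression against the data frame's
# current columns, so the stale attached copy is never consulted
is_factor_with <- with(birth_demo, is.factor(visit))

# Option 2: keep using attach(), but refresh the attached copy
is_factor_stale <- is.factor(visit)       # still the old integer copy
detach(birth_demo)
attach(birth_demo)
is_factor_reattached <- is.factor(visit)  # now the converted column
detach(birth_demo)

print(c(with = is_factor_with,
        stale = is_factor_stale,
        reattached = is_factor_reattached))
```

With with() there is nothing to detach afterwards, which is one reason it tends to be less error-prone than attach().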

If I type:
 summary(t1stvisit)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's
  1.000   1.000   2.000   1.574   2.000   3.000  29.000
I get the numeric version, but if I try
summary(birth$t1stvisit)
   1    2    3 NA's
 180  169   22   29
I get the factor version.

Frankly I feel that this behaviour is non-intuitive and potentially problematic. Nor have I seen warnings about this in the various text books on R.
Can anyone comment on why this should occur?
Many thanks,
Alan Kelly

Dr. Alan Kelly
Department of Public Health & Primary Care
Trinity College Dublin

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


--
--------------------------------------
Don MacQueen
Environmental Protection Department
Lawrence Livermore National Laboratory
Livermore, CA, USA
925-423-1062
