hadley wickham wrote:
On Tue, Jun 17, 2008 at 9:28 AM, Tom Backer Johnsen <[EMAIL PROTECTED]> wrote:
In a research project we are using a web-based tool for collecting data
from questionnaires. The system generates files that are simple to read as a
data frame in the "long" format, which is simple to convert to the "wide"
format.
Two things might happen: (a) there are two (or more) references to
the same cell, and (b) there are missing values. For example, this data set
has two references to S2/T2 and none to the S2/T1 combination:
> d
   values person time
1       1     S1   T1
2       2     S1   T2
3       3     S1   T3
4       4     S1   T4
5      22     S2   T2
6       6     S2   T2
7       7     S2   T3
8       8     S2   T4
9       9     S3   T1
10     10     S3   T2
11     11     S3   T3
12     12     S3   T4
> reshape(d, idvar="person", v.names=c("values"), timevar="time",
+         direction="wide")
  person values.T1 values.T2 values.T3 values.T4
1     S1         1         2         3         4
5     S2        NA        22         7         8
9     S3         9        10        11        12
The missing cell gets an NA, as expected. But the surprise is in the case
where there are two references to the same cell: the *first* is used
(22 rather than 6).
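
Both situations can be checked for before reshaping. What follows is a minimal base R sketch, assuming the data frame is retyped by hand from the listing above; the names key, dups, counts and missing are only illustrative and do not come from the original post.

```r
# Rebuild the example data from the listing (assumption: typed in by hand,
# not read from the original questionnaire file).
d <- data.frame(
  values = c(1, 2, 3, 4, 22, 6, 7, 8, 9, 10, 11, 12),
  person = rep(c("S1", "S2", "S3"), each = 4),
  time   = c("T1", "T2", "T3", "T4",
             "T2", "T2", "T3", "T4",
             "T1", "T2", "T3", "T4")
)

# (a) Rows whose person/time combination occurs more than once.
# duplicated() alone flags only the later copies, so both directions are combined.
key  <- d[c("person", "time")]
dups <- d[duplicated(key) | duplicated(key, fromLast = TRUE), ]
print(dups)

# (b) person/time combinations that never occur (a count of zero).
counts  <- as.data.frame(table(person = d$person, time = d$time))
missing <- counts[counts$Freq == 0, c("person", "time")]
print(missing)
```

Running a check like this first makes it an explicit decision which of the duplicate values should end up in the wide table.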
You might try using the reshape package instead:
last <- function(x) x[length(x)]
names(d) <- c("value", "person", "time")
cast(d, person ~ time, last)
The first and the last lines are clear, I think, although I will have to
experiment more to understand the call to cast() better. However, what
I do not understand is the purpose of the second line. I can print out
names(d) right after reading the data frame with the read.table function.
If I print names(d) again right after that statement has been executed,
then I see no difference. Even so, it seems to be necessary for the
call to cast() to work. It seems that "names" is not the same as "names".
Something along the lines of a with() or attach(), perhaps?
Tom
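
A possible explanation, offered as a guess rather than a confirmed answer: the listing of d earlier in the thread shows the first column as "values", while the second line of the suggestion renames it to "value", so the two printouts of names(d) would differ by a single letter. cast() appears to look for a column called "value" (the convention for melted data in the reshape package), which would explain why the rename matters. A small base R sketch of the comparison, using a two-row stand-in for the real data frame:

```r
# Stand-in data frame with the column names shown in the listing of d
# (assumption: only two rows, enough to compare the names).
d <- data.frame(values = c(1, 2),
                person = c("S1", "S1"),
                time   = c("T1", "T2"))

before <- names(d)
names(d) <- c("value", "person", "time")
after <- names(d)

print(before)
print(after)

# Element-wise comparison: only the first name has changed.
print(before == after)
```

No with() or attach() is involved; names(d) <- ... simply replaces the column names in place.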
You can find out more at http://had.co.nz/reshape
Hadley
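
For readers without the reshape package, the same "keep the last value" idea can be sketched in base R with tapply(). This is a swapped-in technique, not the cast() call itself, and it assumes the data frame is retyped from the listing in the original post. Note that "last" here means last in row order, so the result depends on how the file was sorted.

```r
# Rebuild the example data (assumption: typed in by hand from the listing).
d <- data.frame(
  values = c(1, 2, 3, 4, 22, 6, 7, 8, 9, 10, 11, 12),
  person = rep(c("S1", "S2", "S3"), each = 4),
  time   = c("T1", "T2", "T3", "T4",
             "T2", "T2", "T3", "T4",
             "T1", "T2", "T3", "T4")
)

# Same helper as in the suggestion above: the last element of a vector.
last <- function(x) x[length(x)]

# One cell per person/time combination; combinations with no data become NA.
wide <- tapply(d$values, list(person = d$person, time = d$time), last)
print(wide)
```

Swapping last for another function (for example mean, or one that stops with an error when length(x) > 1) changes how duplicate cells are resolved.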
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help