Dear Chuck, John, Vikas, and useRs,
Thank you very much for your great suggestions.
I received three replies offering different ways to reshape my original
data.frame (original question at the bottom). There are, however, some
discrepancies in their results (most likely because I didn't explain my
question clearly enough), so I think it is worth discussing them briefly.
I have prepared a slightly simplified version of the data set to
facilitate comparisons. Please see below.
Cheers!
Ahimsa
#=============================================================
# data in its original shape:
indiv <- rep(c("A","B"),c(3,3))
level.1 <- c(7, 5, 1, 2, 5, 3) # comes from <- rpois(6,lambda=3)
covar.1 <- c(26.4, 48.9, 62.7, 135.3, 40.1, 17.4) # comes from <- rlnorm(6,3,1)
level.2 <- c(5, 2, 1, 3, 6, 0) # comes from <- rpois(6,lambda=3)
covar.2 <- c(4.5, 58.6, 6.4, 47.2, 16.9, 59.4) # comes from <- rlnorm(6,3,1)
my.dat <- data.frame(indiv,level.1,covar.1,level.2,covar.2)
# level.1 and level.2 are levels from a common factor and their values
# represent the number of replicates for that combination of factor level
# and value of the covariate (covar.1 for level.1 and covar.2 for level.2
# cases). I want each replicate to be represented as a row, and therefore
# the number of rows in the new data frame should be:
sum(level.1) + sum(level.2)
# [1] 40
# and the number of rows from individual A is:
sum(my.dat[my.dat$indiv=="A",c(2,4)])
# [1] 21
# solution 1: ========================================
long <- reshape(my.dat,
                varying = list(c("level.1","level.2"),
                               c("covar.1","covar.2")),
                timevar = "level", idvar = "case.id",
                v.names = c("ncases","covar"),
                direction = "long")
newdf <- with(long, data.frame(indiv   = rep(indiv,   ncases),
                               level   = rep(level,   ncases),
                               covar   = rep(covar,   ncases),
                               case.id = rep(case.id, ncases)))
summary(newdf)
# we have 40 cases (rows) of which 21 belong to indiv A;
# this provides exactly what I was looking for
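# A possible extension of solution 1 (a sketch, not taken from any of the
# replies): in newdf, case.id is the row number of the original data frame.
# If a counter 1, 2, 3... within each original row is also wanted, as in the
# example table of the original question, base R's sequence() can supply it.
# The column name rep.id is made up for this illustration.

```r
# same simplified data as above
indiv   <- rep(c("A","B"), c(3,3))
level.1 <- c(7, 5, 1, 2, 5, 3)
covar.1 <- c(26.4, 48.9, 62.7, 135.3, 40.1, 17.4)
level.2 <- c(5, 2, 1, 3, 6, 0)
covar.2 <- c(4.5, 58.6, 6.4, 47.2, 16.9, 59.4)
my.dat  <- data.frame(indiv, level.1, covar.1, level.2, covar.2)

# wide -> long: one row per (original row, level) pair
long <- reshape(my.dat,
                varying = list(c("level.1","level.2"),
                               c("covar.1","covar.2")),
                timevar = "level", idvar = "case.id",
                v.names = c("ncases","covar"),
                direction = "long")

# repeat each long row ncases times; rep.id (hypothetical name) counts
# the replicates 1..ncases within each repeated row
newdf <- with(long, data.frame(indiv   = rep(indiv,   ncases),
                               level   = rep(level,   ncases),
                               covar   = rep(covar,   ncases),
                               case.id = rep(case.id, ncases),
                               rep.id  = sequence(ncases)))

nrow(newdf)          # total number of replicates
table(newdf$indiv)   # replicates per individual
```

# Rows with ncases == 0 simply drop out, because rep() and sequence() both
# return nothing for a count of zero.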
# solution 2: ========================================
fact1 <- rep("level.1", length(my.dat[,1]))
fact2 <- rep("level.2", length(my.dat[,1]))
lels <- c(fact1,fact2)
nams <- c("indiv", "case.id", "covar")
set1 <- my.dat[, c(1,2,3)] ; names(set1) <- nams
set2 <- my.dat[,c(1, 4,5)] ; names(set2) <- nams
newdata <- cbind(lels, rbind(set1,set2))
mydata <- rbind(newdata[, c(2,1,4,3)], newdata[,
c(2,1,4,3)])
names(mydata) <- c("indiv", "factor", "covar",
"caseid")
mydata[order(mydata$indiv, mydata$caseid,
mydata$factor),]
summary(mydata)
head(mydata)
# this is not exactly what I meant
# it provides 24 rows, half of them from indiv A
# caseid has inherited the values from level.1 and level.2
# up to newdata the process is correct, but the next step rbinds newdata
# to itself, giving 24 rows in which each row simply appears twice.
# what we actually wanted was to create as many replicates of each case (row)
# as the value of that case's level.x (now renamed as case.id).
# we should do as in the second step of solution 1:
with(newdata, data.frame(lels=rep(lels,case.id),....
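# One way the correction to solution 2 could be written out in full (a sketch
# under my reading of the reply, not the respondent's own code): build newdata
# exactly as above, then repeat every column case.id times with rep().

```r
# same simplified data as above
indiv   <- rep(c("A","B"), c(3,3))
level.1 <- c(7, 5, 1, 2, 5, 3)
covar.1 <- c(26.4, 48.9, 62.7, 135.3, 40.1, 17.4)
level.2 <- c(5, 2, 1, 3, 6, 0)
covar.2 <- c(4.5, 58.6, 6.4, 47.2, 16.9, 59.4)
my.dat  <- data.frame(indiv, level.1, covar.1, level.2, covar.2)

# first part of solution 2, unchanged: stack the two (level, covar) blocks
lels <- rep(c("level.1", "level.2"), each = nrow(my.dat))
nams <- c("indiv", "case.id", "covar")
set1 <- my.dat[, c(1, 2, 3)] ; names(set1) <- nams
set2 <- my.dat[, c(1, 4, 5)] ; names(set2) <- nams
newdata <- cbind(lels, rbind(set1, set2))

# replacement for the duplicating rbind(): one row per replicate,
# using the counts stored in case.id
mydata <- with(newdata, data.frame(indiv  = rep(indiv, case.id),
                                   factor = rep(lels,  case.id),
                                   covar  = rep(covar, case.id)))

nrow(mydata)
table(mydata$indiv)
```

# Because set1 and set2 keep each count next to its own covariate, the
# level/covar pairing is preserved through the expansion.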
# solution 3: ========================================
library(reshape)
melt(my.dat,id=c("indiv","covar.1","covar.2"))->my.dat.1
names(my.dat.1)[4:5]<-c("level","case.id")
melt(my.dat.1,id=c("indiv","level","case.id"))->my.dat.2
summary(my.dat.2)
# this is by far the most elegant solution
# unfortunately it provides similar results to solution 2 and
# it has another issue: it alters the relationship
# between factor (level.1 and level.2) and covar (covar.1 and covar.2)
# solution 1 is therefore the one that does what I need!
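# To make the pairing problem in solution 3 concrete, here is a base-R
# emulation of the two successive melt() steps (an approximation written
# without the reshape package, assuming melt() stacks all non-id columns;
# step1, step2, matched and expanded are made-up names). It also sketches one
# possible repair: keep only the rows whose level and covar suffixes agree,
# then expand by the counts.

```r
# same simplified data as above
indiv   <- rep(c("A","B"), c(3,3))
level.1 <- c(7, 5, 1, 2, 5, 3)
covar.1 <- c(26.4, 48.9, 62.7, 135.3, 40.1, 17.4)
level.2 <- c(5, 2, 1, 3, 6, 0)
covar.2 <- c(4.5, 58.6, 6.4, 47.2, 16.9, 59.4)
my.dat  <- data.frame(indiv, level.1, covar.1, level.2, covar.2)

# emulated first melt: stack level.1/level.2, both covariates carried along
ids1  <- my.dat[c("indiv", "covar.1", "covar.2")]
step1 <- rbind(data.frame(ids1, level = "level.1", case.id = my.dat$level.1),
               data.frame(ids1, level = "level.2", case.id = my.dat$level.2))

# emulated second melt: stack covar.1/covar.2 for every row of step1
ids2  <- step1[c("indiv", "level", "case.id")]
step2 <- rbind(data.frame(ids2, variable = "covar.1", value = step1$covar.1),
               data.frame(ids2, variable = "covar.2", value = step1$covar.2))

# every level is now crossed with both covariates
table(step2$level, step2$variable)

# possible repair: keep matching suffixes only, then one row per replicate
same     <- sub("level.", "", step2$level,    fixed = TRUE) ==
            sub("covar.", "", step2$variable, fixed = TRUE)
matched  <- step2[same, ]
expanded <- matched[rep(seq_len(nrow(matched)), matched$case.id), ]
rownames(expanded) <- NULL
nrow(expanded)
```

# The cross-table shows where the 24 rows come from: 12 correctly paired rows
# plus 12 rows that attach level.1 counts to covar.2 values and vice versa.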
# Thanks a lot for the three stimulating solutions, and apologies for not
# explaining the case more clearly.
# Cheers!
##
Dear all,
I'm having a few problems trying to reshape a data frame. I tried with
reshape{stats} and melt{reshape} but I was missing something. Any help is
very welcome. Please find details below:
#################################
# data in its original shape:
indiv <- rep(c("A","B"),c(10,10))
level.1 <- rpois(20, lambda=3)
covar.1 <- rlnorm(20, 3, 1)
level.2 <- rpois(20, lambda=3)
covar.2 <- rlnorm(20, 3, 1)
my.dat <- data.frame(indiv,level.1,covar.1,level.2,covar.2)
# the values of level.1 and level.2 represent the number of cases for the
# particular combination of indiv*level*covar value
# I would like to do two things:
# 1. reshape to long, reducing my.dat[,2:5] into two columns: "factor"
#    (levels = level.1 & level.2) and the covariate
# 2. create one new row for each case in level.1 and level.2
# the new reshaped data.frame should look like this:
# indiv  factor   covar      case.id
# A      level.1   4.614105  1
# A      level.1   4.614105  2
# A      level.2  31.064405  1
# A      level.2  31.064405  2
# A      level.2  31.064405  3
# A      level.2  31.064405  4
# A      level.1  19.185784  1
# A      level.2  48.455929  1
# A      level.2  48.455929  2
# A      level.2  48.455929  3
# etc...
############################
--
ahimsa campos-arceiz
www.camposarceiz.com
______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help