On Dec 13, 2012, at 9:16 AM, Nathan Miller wrote:
Hi all,
I have played a bit with the "reshape" package and function along with
"melt" and "cast", but I feel I still don't have a good handle on how to
use them efficiently. Below I have included an application of "reshape"
that is rather clunky, and I'm hoping someone can offer advice on how to
use reshape (or melt/cast) more efficiently.
You do realize that the 'reshape' function is _not_ in the reshape
package, right? And also that the reshape package has been superseded
by the reshape2 package?
--
David.
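A quick way to check the point above from the R prompt, as a minimal sketch (the variable name where_is_reshape is just for illustration): the function's enclosing environment is the namespace of the package that defines it.

```r
# reshape() ships with the 'stats' package that is part of base R,
# so it is available without loading the 'reshape' package.
where_is_reshape <- environmentName(environment(stats::reshape))
print(where_is_reshape)
```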
#For this example I am using climate change data available on-line
file <- "http://processtrends.com/Files/RClimate_consol_temp_anom_latest.csv"
clim.data <- read.csv(file, header=TRUE)
library(lubridate)
library(reshape)
#I've been playing with the lubridate package a bit to work with dates, but
#as the climate dataset only uses year and month I have added a "day" to
#each entry in the "yr_mn" column and then used "dym" from lubridate to
#generate the POSIXlt formatted dates in a new column clim.data$date
clim.data$yr_mn<-paste("01", clim.data$yr_mn, sep="")
clim.data$date<-dym(clim.data$yr_mn)
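The same idea (append a day, then parse) can also be sketched in base R without lubridate. The layout of yr_mn in the CSV is not shown in this message, so the "YYYY-MM" strings below are a hypothetical stand-in and the format string would need adjusting to the real column.

```r
# Hypothetical year-month strings; the real yr_mn values may look different.
yr_mn <- c("1880-01", "1880-02", "2012-11")

# Append the first day of the month and parse to Date with an explicit format.
date <- as.Date(paste0(yr_mn, "-01"), format = "%Y-%m-%d")
print(date)
```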
#Now to the reshape. The dataframe is in a wide format. The columns GISS,
#HAD, NOAA, RSS, and UAH are all different sources from which the global
#temperature anomaly has been calculated since 1880 (actually only 1978 for
#RSS and UAH). What I would like to do is plot the temperature anomaly vs
#date and use ggplot to facet by the different data source (GISS, HAD,
#etc.). Thus I need the data in long format with a date column, a
#temperature anomaly column, and a data source column. The code below
#works, but it's really very clunky and I'm sure I am not using these tools
#as efficiently as I can.
#The varying=list(3:7) specifies the columns in the dataframe that
#correspond to the sources (GISS, etc.), though in the resulting reshaped
#dataframe the sources are numbered 1-5, so I have to reassign their names.
#In addition, the original dataframe has additional data columns I do not
#want, so after reshaping I create another dataframe with just the columns
#I need, and then I have to rename them so that I can keep track of what
#everything is. Whew! Not the most elegant of code.
d<-reshape(clim.data, varying=list(3:7),idvar="date",
v.names="anomaly",direction="long")
d$time<-ifelse(d$time==1,"GISS",d$time)
d$time<-ifelse(d$time==2,"HAD",d$time)
d$time<-ifelse(d$time==3,"NOAA",d$time)
d$time<-ifelse(d$time==4,"RSS",d$time)
d$time<-ifelse(d$time==5,"UAH",d$time)
new.data<-data.frame(d$date,d$time,d$anomaly)
names(new.data)<-c("date","source","anomaly")
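One possible tidy-up of the steps above, sketched on a made-up toy data frame rather than the real clim.data (the toy object, its extra column, and all its values are hypothetical). Passing times= and timevar= to base reshape() labels the sources directly, so the ifelse() renaming is not needed, and subsetting to the wanted columns first avoids building a second data frame. The last part does the same with melt() from reshape2 and only runs if that package is installed.

```r
# Toy stand-in for clim.data; values are invented for illustration only.
toy <- data.frame(
  date  = as.Date(c("2012-01-01", "2012-02-01")),
  extra = c(1, 2),              # a column we do not want in the result
  GISS  = c(0.10, 0.20),
  HAD   = c(0.30, 0.40),
  NOAA  = c(0.50, 0.60)
)
src <- c("GISS", "HAD", "NOAA")  # the wide columns to stack

# Base R: keep only the needed columns, then go long.
# times= supplies the source labels, timevar= names the label column.
long <- reshape(toy[, c("date", src)],
                direction = "long",
                varying   = list(src),
                v.names   = "anomaly",
                timevar   = "source",
                times     = src,
                idvar     = "date")
rownames(long) <- NULL
print(long)

# reshape2 alternative: columns not named in id.vars or measure.vars
# (here 'extra') are dropped automatically.
if (requireNamespace("reshape2", quietly = TRUE)) {
  long2 <- reshape2::melt(toy,
                          id.vars       = "date",
                          measure.vars  = src,
                          variable.name = "source",
                          value.name    = "anomaly")
  print(long2)
}
```

Either result has one row per date and source, which is the layout that faceting by source in ggplot2 needs.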
I realize this is a mess, though it works. I think with just some help on
how better to work this example I'll probably get over the learning hump
and actually figure out how to use these data manipulation functions more
cleanly.
Any advice or assistance would be appreciated.
Thanks,
Nate
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
David Winsemius, MD
Alameda, CA, USA