See below.
On Wed, 13 Aug 2014, Neotropical bat risk assessments wrote:
Hi all,
Thanks go out to those who provided helpful suggestions last year with a
similar issue.
I am working with a new data set and trying what I assumed was a simple
aggregation in reshape2, but it is not working. I have a large number of
similar data sets to run, so getting the code correct is important.
I have tried this code line in bold (both plyr and reshape2 are loaded):
ChenaPond <- read.table("C:/Bat papers in prep/Chile/Data & analyses/ChenaPond.txt",
                        header=T,sep="\t",quote="")
I find it most efficient to use the "stringsAsFactors=FALSE" option and
only convert to factor those columns that I know I want to be factors.
In particular, dates and times can be challenging to read in directly as
date/times... I find it most clear to read them in as strings and
convert them using specific conversion statements.
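A minimal sketch of that approach, using a small inline table in place of the real file. The three rows are copied from the data shown further down; the column names and the "UTC" time zone are assumptions for illustration.

```r
# Inline stand-in for the tab-delimited file; a real call would pass the file path.
txt <- "Species\tLocation\tDate\tTime
Myochi\tChena pond\t5/26/09\t18:38
Lascin\tChena pond\t5/26/09\t18:51
Tadbra\tChena pond\t10/16/09\t0:00"

dat <- read.table(text = txt, header = TRUE, sep = "\t", quote = "",
                  stringsAsFactors = FALSE)
str(dat)  # every column arrives as character, none as factor

# Convert only the columns that really should be factors
dat$Species <- factor(dat$Species)

# Build a date-time from the two strings with an explicit format and time zone
dat$Dtm <- as.POSIXct(paste(dat$Date, dat$Time),
                      format = "%m/%d/%y %H:%M", tz = "UTC")
str(dat)
```

Calling str() before and after the conversion makes it obvious which columns changed type.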
dat1<-ChenaPond
*res2<-ddply(dat1,.(Location,Species),summarize, Time=sum(Time))*
*Error in Summary.factor(c(3L, 4L, 5L, 15L, 39L, 45L, 18L, 24L, 25L, 26L, :
sum not meaningful for factors*
Attached is the data. Not sure why it is all factors and when I tried
changing to double precision the times were corrupted. I recall that R does
not do well with time values. Do I need a line using chron as well
beforehand?
R is actually much more specific about time values than, say, Excel. This
may make it appear to be a hassle, but it is actually capable of
considerably more than Excel in regards to dates and times with a minimum
of additional work. The hardest part is understanding how our calendar
and timezones actually work.
chron is certainly an option, but I typically use POSIXt so that is what I
am more familiar with. You can read [1] and decide what you would prefer.
Word to the wise: you will probably get into trouble if you convert POSIXt
types to numeric... chron may be more forgiving.
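As a rough illustration of that warning (the example values and the "UTC" time zone are chosen here, not taken from the file): converting a POSIXct to numeric leaves only a count of seconds, so the class and time zone have to be supplied again to get back.

```r
x <- as.POSIXct("2009-05-26 18:38", tz = "UTC")

# Seconds since 1970-01-01 00:00 UTC; the class and time zone are dropped
n <- as.numeric(x)

# Going back requires stating the origin and the time zone explicitly
y <- as.POSIXct(n, origin = "1970-01-01", tz = "UTC")

# For elapsed time, difftime keeps the units attached to the result
d <- difftime(as.POSIXct("2009-05-26 18:51", tz = "UTC"), x, units = "mins")
print(d)
```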
I even tried for several hours looking at the ReshapeGUI package to see what
I may have been doing incorrectly to no avail.
I am completely baffled why you chose to focus on the reshaping method
rather than following the lead of your error above which pointed to
factors as the problem.
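A small sketch of what that error is pointing at, using a made-up three-element factor rather than the attached data. It also suggests why "changing to double precision" appeared to corrupt the times.

```r
# A time column stored as a factor, as read.table did by default before R 4.0
tm <- factor(c("18:38", "18:51", "19:38"))
str(tm)

# sum() is not defined for factors; capture the error message instead of stopping
res <- tryCatch(sum(tm), error = function(e) conditionMessage(e))
print(res)

# as.numeric() on a factor returns the internal level codes, not the times
codes <- as.numeric(tm)
print(codes)

# Going through character recovers the original strings
strs <- as.character(tm)
print(strs)
```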
1. What I need to do to analyze all the data in another program is to
reformat it so that I have a Species by Time matrix summarized in 5
minute time blocks. The result needs Species as rows and the time
intervals arranged chronologically as columns.
Below is one way to proceed. I suggest you step through it one piece at a
time interspersed with appropriate use of the str() function to clarify
what the data looks like at each step. You probably ought to read
?DateTimeClasses and follow links from there as well.
The following statement is the output of the "dput" function, which is
recommended in [2].
dta <- structure(list(
Species = c("Myochi", "Lascin", "Lascin", "Lascin",
"Tadbra", "Lasvar", "Lasvar", "Lasvar", "Lascin", "Tadbra", "Lascin",
"Lascin", "Lasvar", "Lasvar", "Lasvar", "Lasvar", "Lasvar", "Myochi",
"Myochi", "Lascin", "Lasvar", "Lasvar", "Lasvar", "Myochi", "Lasvar",
"Lascin", "Lascin", "Lascin", "Myochi", "Myochi", "Myochi", "Myochi",
"Lascin", "Lasvar", "Lasvar", "Myochi", "Lasvar", "Lasvar", "Lascin",
"Lasvar", "Lasvar", "Lascin", "Lascin", "Tadbra", "Lascin", "Lascin",
"Lascin", "Lascin", "Lascin", "Lasvar", "Lasvar", "Lascin", "Lasvar",
"Tadbra", "Myochi", "Myochi", "Lasvar", "Myochi", "Myochi", "Myochi",
"Lasvar", "Lasvar", "Tadbra", "Lasvar", "Lasvar", "Lasvar", "Tadbra"
), Location = c("Chena pond", "Chena pond", "Chena pond",
"Chena pond", "Chena pond", "Chena pond", "Chena pond",
"Chena pond", "Chena pond", "Chena pond", "Chena pond",
"Chena pond", "Chena pond", "Chena pond", "Chena pond",
"Chena pond", "Chena pond", "Chena pond", "Chena pond",
"Chena pond", "Chena pond", "Chena pond", "Chena pond",
"Chena pond", "Chena pond", "Chena pond", "Chena pond",
"Chena pond", "Chena pond", "Chena pond", "Chena pond",
"Chena pond", "Chena pond", "Chena pond", "Chena pond",
"Chena pond", "Chena pond", "Chena pond", "Chena pond",
"Chena pond", "Chena pond", "Chena pond", "Chena pond",
"Chena pond", "Chena pond", "Chena pond", "Chena pond",
"Chena pond", "Chena pond", "Chena pond", "Chena pond",
"Chena pond", "Chena pond", "Chena pond", "Chena pond",
"Chena pond", "Chena pond", "Chena pond", "Chena pond",
"Chena pond", "Chena pond", "Chena pond", "Chena pond",
"Chena pond", "Chena pond", "Chena pond", "Chena pond"),
Date = c("5/26/09", "5/26/09", "5/26/09", "5/26/09", "5/26/09",
"5/26/09", "5/26/09", "5/26/09", "5/26/09", "5/26/09", "5/26/09",
"5/26/09", "10/15/09", "10/15/09", "10/15/09", "10/15/09",
"10/15/09", "10/15/09", "10/15/09", "10/15/09", "10/15/09",
"10/15/09", "10/15/09", "10/15/09", "10/15/09", "10/15/09",
"10/15/09", "10/15/09", "10/15/09", "10/15/09", "10/15/09",
"10/15/09", "10/15/09", "10/15/09", "10/15/09", "10/15/09",
"10/15/09", "10/15/09", "10/15/09", "10/15/09", "10/15/09",
"10/15/09", "10/15/09", "10/15/09", "10/15/09", "10/15/09",
"10/15/09", "10/15/09", "10/15/09", "10/15/09", "10/15/09",
"10/15/09", "10/15/09", "10/15/09", "10/15/09", "10/15/09",
"10/15/09", "10/15/09", "10/15/09", "10/15/09", "10/15/09",
"10/15/09", "10/15/09", "10/15/09", "10/15/09", "10/15/09",
"10/16/09"),
Time = c("18:38", "18:51", "19:38", "19:39",
"19:47", "20:12", "20:16", "20:56", "21:19", "21:20", "22:47",
"22:56", "20:51", "20:55", "20:56", "20:57", "20:59", "21:00",
"21:26", "21:29", "21:33", "21:34", "21:35", "21:55", "21:56",
"21:59", "22:00", "22:01", "22:03", "22:08", "22:08", "22:09",
"22:17", "22:23", "22:24", "22:26", "22:30", "22:31", "22:42",
"22:42", "22:44", "22:46", "22:49", "22:49", "22:50", "22:51",
"22:53", "22:54", "22:57", "23:01", "23:06", "23:08", "23:09",
"23:14", "23:30", "23:31", "23:33", "23:35", "23:35", "23:38",
"23:39", "23:44", "23:45", "23:47", "23:52", "23:59", "0:00"
)),
.Names = c("Species", "Location", "Date", "Time")
, class = "data.frame", row.names = c(NA, -67L))
library(lubridate)
library(reshape2)
# set time zone to something that doesn't use daylight savings
# this may not be how your data are actually recorded... look
# up ?timezones ... the short answer is you may need to look
# at the names of some files on your system or in your R install
# directory to find out what labels correspond to your data's timezone.
Sys.setenv( TZ="Etc/GMT+5" )
dta$Dtm <- mdy_hm( paste( dta$Date, dta$Time ) )
floor5 <- function( dtm ) {
  # break up the POSIXct (number of seconds since 1/1/1970 GMT)
  dtmlt <- as.POSIXlt( dtm )
  # floor the minutes and seconds to the next lower 5 minutes
  dtmlt$sec <- 0
  dtmlt$min <- 5 * ( dtmlt$min %/% 5 )
  as.POSIXct( dtmlt )
}
dta$Dtm5 <- floor5( dta$Dtm )
# can be done with table
#table( dta$Dtm5, dta$Species )
# I prefer data frames, so reshape2 helps out
dtat <- dcast( dta, Dtm5~Species, fun.aggregate = length, value.var="Dtm5" )
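The table() route mentioned in the comments can also put Species in rows and the 5-minute blocks in columns. The following is only a sketch in base R on three detections copied from the data above; floor5 is repeated so the block runs on its own, and the "UTC" time zone is an assumption.

```r
# Same flooring helper as above, repeated so this block is self-contained
floor5 <- function(dtm) {
  dtmlt <- as.POSIXlt(dtm)
  dtmlt$sec <- 0
  dtmlt$min <- 5 * (dtmlt$min %/% 5)
  as.POSIXct(dtmlt)
}

# Three detections from the data above (a subset, for illustration only)
sub <- data.frame(
  Species = c("Lascin", "Lascin", "Tadbra"),
  Dtm = as.POSIXct(c("2009-05-26 19:38", "2009-05-26 19:39",
                     "2009-05-26 19:47"), tz = "UTC"),
  stringsAsFactors = FALSE
)
sub$Dtm5 <- floor5(sub$Dtm)

# Species as rows, 5-minute blocks as columns; ISO-style labels sort chronologically
counts <- table(sub$Species, format(sub$Dtm5, "%Y-%m-%d %H:%M"))
print(counts)
```

With reshape2 the analogous layout would come from putting Species on the left-hand side of the dcast formula.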
2. Then I need the matrix converted such that each unique Species will
have proportional abundances of time (0 to 100) so totals for each
species should be the same (or 100%).
I don't like to do all the work for other people. Is dividing some vectors
by their sums something you need help with?
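For what it is worth, the division itself might look like the sketch below. The count matrix is invented for illustration (species in rows, 5-minute blocks in columns) and is not derived from the attached data.

```r
# Hypothetical counts: species in rows, 5-minute blocks in columns
counts <- matrix(c(2, 0, 1, 1,
                   0, 3, 0, 1),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("Lascin", "Lasvar"),
                                 c("19:35", "19:40", "19:45", "19:50")))

# Divide each row by its own total and scale to percent
pct <- 100 * counts / rowSums(counts)

# prop.table with margin = 1 does the same row-wise division
pct2 <- 100 * prop.table(counts, margin = 1)

print(pct)
print(rowSums(pct))  # each species should total 100
```

A species with no detections at all would have a row sum of zero and produce NaN, so such rows would need to be dropped or handled first.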
What do folks suggest?
Plyr, Reshape2 or try tables?
Any of these... depending on your preference. You just need to get a grip
on how R handles time. [1]
Thanks,
Bruce
--
Bruce W. Miller, PhD.
Neotropical bat risk assessments
If we lose the bats, we may lose much of the tropical vegetation and the
lungs of the planet
Using acoustic sampling to map species distributions for >15 years.
Providing Interactive identification keys to the vocal signatures of New
World Bats
For various project details see:
https://sites.google.com/site/batsoundservices/
[1] http://www.r-project.org/doc/Rnews/Rnews_2004-1.pdf starting on page 29
[2] http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example
---------------------------------------------------------------------------
Jeff Newmiller
Research Engineer (Solar/Batteries/Software/Embedded Controllers)