This problem nearly always boils down to using meta knowledge about the file.
Having informal TZ info in the file is very helpful, but PST is not necessarily
a uniquely-defined time zone specification, so you have to draw on information
outside of the file to know that these codes correspond to -
What is the best way to read (from a text file) timestamps from the fall
time change, where there are two 1:15am's? E.g., here is an extract from a
US Geological Survey web site giving data on the river through our county
on 2020-11-01, when we changed from PDT to PST,
https://nwis.waterdata.usgs.
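One way to make those readings unambiguous, sketched here with invented column names (tz_cd is the code column USGS RDB files usually carry, but nothing below is taken from the actual file): treat PDT and PST as fixed offsets via the Etc/GMT zones, so each row maps to a unique instant.

# Sketch (not from the thread): resolve the two 01:15 readings on 2020-11-01
# by honouring the per-row PDT/PST code with fixed-offset zones.
# Column names (datetime, tz_cd, cfs) are assumptions, not the file's own.
x <- data.frame(
  datetime = c("2020-11-01 01:15", "2020-11-01 01:15"),
  tz_cd    = c("PDT", "PST"),
  cfs      = c(120, 118)
)
tz_of <- c(PDT = "Etc/GMT+7", PST = "Etc/GMT+8")  # note: signs are reversed in Etc/ names
secs <- mapply(function(dt, tz) as.numeric(as.POSIXct(dt, tz = tz)),
               x$datetime, tz_of[x$tz_cd])
x$when <- as.POSIXct(secs, origin = "1970-01-01", tz = "Etc/GMT+8")
x$when   # two distinct instants, one hour apart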
On Fri, 3 Sep 2021, Jeff Newmiller wrote:
The fact that your projects are in a single time zone is irrelevant. I am
not sure how you can be so confident in saying it does not matter whether
the data were recorded in PDT or PST, since if it were recorded in PDT
then there would be a day in March
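To see that point in code (my example, not from the thread): lay a regular 5-minute grid over 2020 and count samples per civil day in America/Los_Angeles; the March change-over day comes up short and the November one runs long.

# Regular 5-minute grid for 2020, labelled by local (DST-observing) calendar day
grid <- seq(as.POSIXct("2020-01-01 00:00", tz = "America/Los_Angeles"),
            as.POSIXct("2020-12-31 23:55", tz = "America/Los_Angeles"),
            by = "5 min")
per_day <- table(format(grid, "%Y-%m-%d"))
per_day[per_day != 288]
# 2020-03-08 has 276 samples (a 23-hour day); 2020-11-01 has 300 (a 25-hour day)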
On Fri, 3 Sep 2021, Rich Shepard wrote:
On Thu, 2 Sep 2021, Jeff Newmiller wrote:
Regardless of whether you use the lower-level split function, or the
higher-level aggregate function, or the tidyverse group_by function, the key is
learning how to create the column that is the same for all records
corresponding to the time interval of interest.
If you convert the sampdate to
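To make the grouping-column idea concrete, here is a minimal sketch (my code, not Jeff's), using the sampdate and cfs column names from the sample data later in the thread:

discharge <- data.frame(
  sampdate = c("2020-08-26", "2020-08-26", "2020-08-27"),
  samptime = c("09:30", "09:35", "09:30"),
  cfs      = c(136000, 126000, 131000)
)
discharge$day <- as.Date(discharge$sampdate)   # same value for every record in a day
aggregate(cfs ~ day, data = discharge,
          FUN = function(x) c(mean = mean(x, na.rm = TRUE),
                              sd   = sd(x, na.rm = TRUE)))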
On Thu, 2 Sep 2021, Andrew Simmons wrote:
You could use 'split' to create a list of data frames, and then apply a
function to each to get the means and sds.
cols <- "cfs" # add more as necessary
S <- split(discharge[cols], format(discharge$sampdate, format = "%Y-%m"))
means <- do.call("rbind", lapply(S, colMeans, na.rm = TRUE))
sds <-
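The archived message breaks off here; a plausible continuation for the standard deviations, parallel to the means line above (a guess, not Andrew's actual code):

# sapply() over the columns because base R has colMeans() but no colSds()
sds <- do.call("rbind",
               lapply(S, function(d) sapply(d, sd, na.rm = TRUE)))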
On Thu, 2 Sep 2021, Rich Shepard wrote:
If I correctly understand the output of as.POSIXlt, each date and time
element is separate, so input such as 2016-03-03 12:00 would now be 2016 03
03 12 00 (I've not read how the elements are separated). (The TZ is not
important because all data are either
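A quick illustration (not from the thread) of what as.POSIXlt actually returns: a list of named components rather than a flattened string, so the elements are picked out by name, not "separated":

lt <- as.POSIXlt("2016-03-03 12:00", tz = "UTC")
unlist(unclass(lt)[c("year", "mon", "mday", "hour", "min")])
# year  mon mday hour  min
#  116    2    3   12    0    (year is years since 1900, mon is zero-based)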
On Mon, 30 Aug 2021, Richard O'Keefe wrote:
x <- rnorm(samples.per.day * 365)
length(x)
[1] 105120
Reshape the fake data into a matrix where each row represents one
24-hour period.
m <- matrix(x, ncol=samples.per.day, byrow=TRUE)
Richard,
Now I understand the need to keep the date and time
On Tue, 31 Aug 2021, Jeff Newmiller wrote:
Never use stringsAsFactors on uncleaned data. For one thing, if you give a
factor to as.Date, it tries to make sense of the integer representation,
not the character representation.
Jeff,
Oops! I had changed it in a previous version of the script and
On Wed, 1 Sep 2021, Richard O'Keefe wrote:
You have missed the point. The issue is not the temporal distance, but the
fact that the data you have are NOT the raw instrumental data and are NOT
subject to the limitations of the recording instruments. The data you get
from the USGS is not the raw instrumental data
I wrote:
> > By the time you get the data from the USGS, you are already far past the
> > point
> > where what the instruments can write is important.
Rich Shepard replied:
> The data are important because they show what's happened in that period of
> record. Don't physicians take a medical histor
Never use stringsAsFactors on uncleaned data. For one thing, if you give a
factor to as.Date, it tries to make sense of the integer representation,
not the character representation.
library(dplyr)
dta <- read.csv( text =
"sampdate,samptime,cfs
2020-08-26,09:30,136000
2020-08-26,09:35,126000
2020-
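The sample data is cut off above; a hedged sketch of how the pipeline might continue (the third data row is invented), converting the character sampdate with as.Date and building a full timestamp:

library(dplyr)

dta <- read.csv(text =
"sampdate,samptime,cfs
2020-08-26,09:30,136000
2020-08-26,09:35,126000
2020-08-27,09:30,131000")   # last row invented for illustration

dta <- dta %>%
  mutate(sampdate = as.Date(sampdate),   # character, not factor, so this is safe
         when = as.POSIXct(paste(sampdate, samptime),
                           format = "%Y-%m-%d %H:%M",
                           tz = "Etc/GMT+8"))  # fixed-offset PST; an assumption
str(dta)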
On Sun, 29 Aug 2021, Jeff Newmiller wrote:
The general idea is to create a "grouping" column with repeated values for
each day, and then to use aggregate to compute your combined results. The
dplyr package's group_by/summarise functions can also do this, and there
are also proponents of the data.table package
On Tue, 31 Aug 2021, Richard O'Keefe wrote:
By the time you get the data from the USGS, you are already far past the point
where what the instruments can write is important.
Richard,
The data are important because they show what's happened in that period of
record. Don't physicians take a med
By the time you get the data from the USGS, you are already far past the point
where what the instruments can write is important.
(Obviously an instrument can be sufficiently broken that it cannot
write anything.)
The data for Rogue River that I just downloaded include this comment:
# Data for the
I do not wish to express any opinion on what should be done or how. But...
1. I assume that when data are missing, they are missing -- i.e.
simply not present in the data. So there may be several/many missing rows of
data in succession corresponding to those times,
right? (Apologies for
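One way to make the gaps explicit (my sketch, not something proposed in the thread) is to merge the readings onto a complete 5-minute grid, so the absent slots become NA rows:

obs <- data.frame(   # toy data with a missing 09:35 slot
  when = as.POSIXct(c("2020-08-26 09:30", "2020-08-26 09:40"), tz = "UTC"),
  cfs  = c(136000, 126000)
)
grid <- data.frame(
  when = seq(as.POSIXct("2020-08-26 09:30", tz = "UTC"),
             as.POSIXct("2020-08-26 09:45", tz = "UTC"), by = "5 min")
)
filled <- merge(grid, obs, by = "when", all.x = TRUE)   # absent slots become NA
filled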
On Tue, 31 Aug 2021, Richard O'Keefe wrote:
I made up fake data in order to avoid showing untested code.
It's not part of the process I was recommending.
I expect data recorded every N minutes to use NA when something
is missing, not to simply not be recorded. Well and good, all that
means is that reshaping the data is not a trivial call to
On Mon, 30 Aug 2021, Richard O'Keefe wrote:
Why would you need a package for this?
samples.per.day <- 12*24
That's 12 5-minute intervals per hour and 24 hours per day.
Generate some fake data.
Richard,
The problem is that there are days with fewer than 12 recorded values for
various reasons.
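A quick way (my addition) to see how far each day falls short of the expected 12 * 24 = 288 readings, assuming a Date column named day:

# toy example in which 2020-08-27 is missing most of its rows
obs <- data.frame(day = as.Date(c(rep("2020-08-26", 288), rep("2020-08-27", 100))))
per_day <- table(obs$day)
per_day[per_day < 288]   # days that cannot fill a complete matrix row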
It is not clear to me who Jeff Newmiller's comment about periodicity
is addressed to.
The original poster, for asking for daily summaries?
A summary of what I wrote:
- daily means and standard deviations are a very poor choice for river flow data
- if you insist on doing that anyway, no fancy package is needed
IMO assuming periodicity is a bad practice for this. Missing timestamps happen
too, and there is no reason to build a broken analysis process.
On August 29, 2021 7:09:01 PM PDT, Richard O'Keefe wrote:
>Why would you need a package for this?
>> samples.per.day <- 12*24
>
>That's 12 5-minute intervals per hour and 24 hours per day.
Why would you need a package for this?
> samples.per.day <- 12*24
That's 12 5-minute intervals per hour and 24 hours per day.
Generate some fake data.
> x <- rnorm(samples.per.day * 365)
> length(x)
[1] 105120
Reshape the fake data into a matrix where each row represents one
24-hour period.
> m <- matrix(x, ncol=samples.per.day, byrow=TRUE)
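The message is truncated here; presumably the daily summaries then come straight off the matrix rows, along these lines (a sketch, not Richard's exact code):

> daily.mean <- rowMeans(m)
> daily.sd   <- apply(m, 1, sd)
> length(daily.mean)
[1] 365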
On Sun, 29 Aug 2021, Andrew Simmons wrote:
I would suggest something like:
Thanks, Andrew.
Stay well,
Rich
On Sun, 29 Aug 2021, Rui Barradas wrote:
Hope this helps,
Rui,
Greatly! I'll study it carefully so I fully understand the process.
Many thanks.
Stay well,
Rich
Hello,
I forgot in my previous answer, sorry for the duplicated mails.
The function in my previous mail has a na.rm argument, defaulting to
FALSE, pass na.rm = TRUE to remove the NA's.
agg <- aggregate(cfs ~ date, df1, fun, na.rm = TRUE)
Or simply change the default. I prefer to set na.rm
On Sun, 29 Aug 2021, Jeff Newmiller wrote:
You may find something useful on handling timestamp data here:
https://jdnewmil.github.io/
Jeff,
I'll certainly read those articles.
Many thanks,
Rich
Hello,
You have date and hour in two separate columns, so to compute daily
stats part of the work is already done. (Were they in the same column
you would have to extract the date only.)
# convert to class "Date"
df1$date <- as.Date(df1$date)
# function to compute the stats required
# it's
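The function definition is cut off above; a plausible reconstruction (my guess, not Rui's exact code) that returns both statistics and passes na.rm through:

fun <- function(x, na.rm = FALSE) {
  c(mean = mean(x, na.rm = na.rm), sd = sd(x, na.rm = na.rm))
}
# then, as in the follow-up mail:
# agg <- aggregate(cfs ~ date, df1, fun, na.rm = TRUE)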
Hello,
I would suggest something like:
date <- seq(as.Date("2020-01-01"), as.Date("2020-12-31"), 1)
time <- sprintf("%02d:%02d", rep(0:23, each = 12), seq.int(0, 55, 5))
x <- data.frame(
date = rep(date, each = length(time)),
time = time
)
x$cfs <- stats::rnorm(nrow(x))
cols2aggregate
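The code breaks off at cols2aggregate; a hedged guess at how the aggregation might have continued, treating cols2aggregate as the vector of value columns:

cols2aggregate <- "cfs"   # add more columns as needed
agg <- aggregate(x[cols2aggregate], by = list(date = x$date),
                 FUN = function(v) c(mean = mean(v, na.rm = TRUE),
                                     sd   = sd(v, na.rm = TRUE)))
head(agg)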
You may find something useful on handling timestamp data here:
https://jdnewmil.github.io/
On August 29, 2021 9:23:31 AM PDT, Jeff Newmiller wrote:
>The general idea is to create a "grouping" column with repeated values for
>each day, and then to use aggregate to compute your combined results.
On Sun, 29 Aug 2021, Jeff Newmiller wrote:
The general idea is to create a "grouping" column with repeated values for
each day, and then to use aggregate to compute your combined results. The
dplyr package's group_by/summarise functions can also do this, and there
are also proponents of the data.table package
On Sun, 29 Aug 2021, Eric Berger wrote:
Provide dummy data (e.g. 5-10 lines), say like the contents of a csv file,
and calculate by hand what you'd like to see in the plot. (And describe
what the plot would look like.)
Eric,
Mea culpa! I extracted a set of sample data and forgot to include it
The general idea is to create a "grouping" column with repeated values for each
day, and then to use aggregate to compute your combined results. The dplyr
package's group_by/summarise functions can also do this, and there are also
proponents of the data.table package, which is high performance but
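For completeness, a sketch of the dplyr group_by/summarise variant mentioned here (my code, with column names matching the sample data elsewhere in the thread):

library(dplyr)

discharge <- data.frame(
  sampdate = c("2020-08-26", "2020-08-26", "2020-08-27"),
  cfs      = c(136000, 126000, 131000)
)

discharge %>%
  mutate(day = as.Date(sampdate)) %>%   # the repeated "grouping" column
  group_by(day) %>%
  summarise(mean_cfs = mean(cfs, na.rm = TRUE),
            sd_cfs   = sd(cfs, na.rm = TRUE))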
Hi Rich,
Your request is a bit open-ended but here's a suggestion that might help
get you an answer.
Provide dummy data (e.g. 5-10 lines), say like the contents of a csv file,
and calculate by hand what you'd like to see in the plot. (And describe
what the plot would look like.)
It sounds like what
I have a year's hydraulic data (discharge, stage height, velocity, etc.)
from a USGS monitoring gauge recording values every 5 minutes. The data
files contain 90K-93K lines and plotting all these data would produce a
solid block of color.
What I want are the daily means and standard deviation fro
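Once a daily summary exists (see the aggregate and dplyr sketches earlier in the thread), one way to plot it without the solid block of colour is a line of daily means with standard-deviation bars; the agg data frame below is invented purely for illustration:

agg <- data.frame(   # stand-in for the real daily summary
  day      = as.Date("2020-08-01") + 0:29,
  mean_cfs = 1000 + 50 * sin(seq(0, 3, length.out = 30)),
  sd_cfs   = runif(30, 20, 60)
)
plot(agg$day, agg$mean_cfs, type = "l",
     xlab = "Date", ylab = "Discharge (cfs)")
arrows(agg$day, agg$mean_cfs - agg$sd_cfs,
       agg$day, agg$mean_cfs + agg$sd_cfs,
       length = 0.02, angle = 90, code = 3)   # +/- one SD per day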