This problem nearly always boils down to using meta knowledge about the file. Having informal TZ info in the file is very helpful, but PST is not necessarily a uniquely-defined time zone specification so you have to draw on information outside of the file to know that these codes correspond to -0800 etc. (e.g. CST could be China Standard Time or US Central Standard Time.) Thus, it is tough to make this into a broadly-useful function.
You can also construct the timezone column from knowledge about the location of interest and the monotonicity of the time data. https://jdnewmil.github.io/eci298sp2016/QuickHowtos1.html#handling-time-data ... but the answer to "easy" seems firmly in the eyes of the beholder. On September 5, 2021 10:18:48 AM PDT, Bill Dunlap <williamwdun...@gmail.com> wrote: >What is the best way to read (from a text file) timestamps from the fall >time change, where there are two 1:15am's? E.g., here is an extract from a >US Geological Survey web site giving data on the river through our county >on 2020-11-01, when we changed from PDT to PST, >https://nwis.waterdata.usgs.gov/wa/nwis/uv/?cb_00010=on&cb_00060=on&cb_00065=on&format=rdb&site_no=12200500&period=&begin_date=2020-11-01&end_date=2020-11-05 >. > >The timestamps include the date and time as well as PDT or PST. > >river <- >c("datetime,tz,discharge,height,temp", > "2020-11-01 00:00,PDT,20500,16.44,9.3", > "2020-11-01 00:15,PDT,20500,16.44,9.3", > "2020-11-01 00:30,PDT,20500,16.43,9.3", > "2020-11-01 00:45,PDT,20400,16.40,9.3", > "2020-11-01 01:00,PDT,20400,16.40,9.3", > "2020-11-01 01:00,PST,20200,16.34,9.2", > "2020-11-01 01:15,PDT,20400,16.39,9.3", > "2020-11-01 01:15,PST,20200,16.34,9.2", > "2020-11-01 01:30,PDT,20300,16.37,9.2", > "2020-11-01 01:30,PST,20100,16.31,9.2", > "2020-11-01 01:45,PDT,20300,16.35,9.2", > "2020-11-01 01:45,PST,20100,16.29,9.2", > "2020-11-01 02:00,PST,20100,16.29,9.2", > "2020-11-01 02:15,PST,20000,16.27,9.1", > "2020-11-01 02:30,PST,20000,16.26,9.1" > ) >d <- read.table(text=river, sep=",",header=TRUE) > >The entries are obviously not in time order. > >Is there a simple way to read the timedate and tz columns together? One >way is to use d$tz to construct an offset that can be read with >strptime's "%z". > >> d$POSIXct <- >as.POSIXct(paste(d$datetime,ifelse(d$tz=="PDT","-0700","-0800")), >format="%Y-%m-%d %H:%M %z") >> d > datetime tz discharge height temp POSIXct >1 2020-11-01 00:00 PDT 20500 16.44 9.3 2020-11-01 00:00:00 >2 2020-11-01 00:15 PDT 20500 16.44 9.3 2020-11-01 00:15:00 >3 2020-11-01 00:30 PDT 20500 16.43 9.3 2020-11-01 00:30:00 >4 2020-11-01 00:45 PDT 20400 16.40 9.3 2020-11-01 00:45:00 >5 2020-11-01 01:00 PDT 20400 16.40 9.3 2020-11-01 01:00:00 >6 2020-11-01 01:00 PST 20200 16.34 9.2 2020-11-01 01:00:00 >7 2020-11-01 01:15 PDT 20400 16.39 9.3 2020-11-01 01:15:00 >8 2020-11-01 01:15 PST 20200 16.34 9.2 2020-11-01 01:15:00 >9 2020-11-01 01:30 PDT 20300 16.37 9.2 2020-11-01 01:30:00 >10 2020-11-01 01:30 PST 20100 16.31 9.2 2020-11-01 01:30:00 >11 2020-11-01 01:45 PDT 20300 16.35 9.2 2020-11-01 01:45:00 >12 2020-11-01 01:45 PST 20100 16.29 9.2 2020-11-01 01:45:00 >13 2020-11-01 02:00 PST 20100 16.29 9.2 2020-11-01 02:00:00 >14 2020-11-01 02:15 PST 20000 16.27 9.1 2020-11-01 02:15:00 >15 2020-11-01 02:30 PST 20000 16.26 9.1 2020-11-01 02:30:00 >> with(d[order(d$POSIXct),], plot(temp)) # monotonic temperature > >-Bill > > >On Thu, Sep 2, 2021 at 12:41 PM Jeff Newmiller <jdnew...@dcn.davis.ca.us> >wrote: > >> Regardless of whether you use the lower-level split function, or the >> higher-level aggregate function, or the tidyverse group_by function, the >> key is learning how to create the column that is the same for all records >> corresponding to the time interval of interest. >> >> If you convert the sampdate to POSIXct, the tz IS important, because most >> of us use local timezones that respect daylight savings time, and a naive >> conversion of standard time will run into trouble if R is assuming daylight >> savings time applies. The lubridate package gets around this by always >> assuming UTC and giving you a function to "fix" the timezone after the >> conversion. I prefer to always be specific about timezones, at least by >> using so something like >> >> Sys.setenv( TZ = "Etc/GMT+8" ) >> >> which does not respect daylight savings. >> >> Regarding using character data for identifying the month, in order to have >> clean plots of the data I prefer to use the trunc function but it returns a >> POSIXlt so I convert it to POSIXct: >> >> discharge$sampmonthbegin <- as.POSIXct( trunc( discharge$sampdate, >> units = "months" ) ) >> >> Then any of various ways can be used to aggregate the records by that >> column. >> >> On September 2, 2021 12:10:15 PM PDT, Andrew Simmons <akwsi...@gmail.com> >> wrote: >> >You could use 'split' to create a list of data frames, and then apply a >> >function to each to get the means and sds. >> > >> > >> >cols <- "cfs" # add more as necessary >> >S <- split(discharge[cols], format(discharge$sampdate, format = "%Y-%m")) >> >means <- do.call("rbind", lapply(S, colMeans, na.rm = TRUE)) >> >sds <- do.call("rbind", lapply(S, function(xx) sapply(xx, sd, na.rm = >> >TRUE))) >> > >> >On Thu, Sep 2, 2021 at 3:01 PM Rich Shepard <rshep...@appl-ecosys.com> >> >wrote: >> > >> >> On Thu, 2 Sep 2021, Rich Shepard wrote: >> >> >> >> > If I correctly understand the output of as.POSIXlt each date and time >> >> > element is separate, so input such as 2016-03-03 12:00 would now be >> 2016 >> >> 03 >> >> > 03 12 00 (I've not read how the elements are separated). (The TZ is >> not >> >> > important because all data are either PST or PDT.) >> >> >> >> Using this script: >> >> discharge <- read.csv('../data/water/discharge.dat', header = TRUE, sep >> = >> >> ',', stringsAsFactors = FALSE) >> >> discharge$sampdate <- as.POSIXlt(discharge$sampdate, tz = "", >> >> format = '%Y-%m-%d %H:%M', >> >> optional = 'logical') >> >> discharge$cfs <- as.numeric(discharge$cfs, length = 6) >> >> >> >> I get this result: >> >> > head(discharge) >> >> sampdate cfs >> >> 1 2016-03-03 12:00:00 149000 >> >> 2 2016-03-03 12:10:00 150000 >> >> 3 2016-03-03 12:20:00 151000 >> >> 4 2016-03-03 12:30:00 156000 >> >> 5 2016-03-03 12:40:00 154000 >> >> 6 2016-03-03 12:50:00 150000 >> >> >> >> I'm completely open to suggestions on using this output to calculate >> >> monthly >> >> means and sds. >> >> >> >> If dplyr:summarize() will do so please show me how to modify this >> command: >> >> disc_monthly <- ( discharge >> >> %>% group_by(sampdate) >> >> %>% summarize(exp_value = mean(cfs, na.rm = TRUE)) >> >> because it produces daily means, not monthly means. >> >> >> >> TIA, >> >> >> >> Rich >> >> >> >> ______________________________________________ >> >> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see >> >> https://stat.ethz.ch/mailman/listinfo/r-help >> >> PLEASE do read the posting guide >> >> http://www.R-project.org/posting-guide.html >> >> and provide commented, minimal, self-contained, reproducible code. >> >> >> > >> > [[alternative HTML version deleted]] >> > >> >______________________________________________ >> >R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see >> >https://stat.ethz.ch/mailman/listinfo/r-help >> >PLEASE do read the posting guide >> http://www.R-project.org/posting-guide.html >> >and provide commented, minimal, self-contained, reproducible code. >> >> -- >> Sent from my phone. Please excuse my brevity. >> >> ______________________________________________ >> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide >> http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. >> -- Sent from my phone. Please excuse my brevity. ______________________________________________ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.