Use the col_types argument to specify your column types. [Why would you expect '2009' to be read as a string instead of a number?] It looks like a leading zero causes an otherwise numeric-looking entry to be treated as a string (handy for zip codes in the northeastern US).
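
For illustration, here is a minimal sketch of supplying the types explicitly. The two-row file is made up to mimic the column layout in the message quoted below, and the chosen types are my assumption based on the str() output shown there, not something taken from the real cor-disc.csv.

```r
library(readr)

# Hypothetical sample data: a made-up two-row stand-in for the real file,
# written to a temporary file so the example is self-contained.
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "site_nbr,year,mon,day,hr,min,tz,disc",
  "14171600,2009,10,23,0,0,PDT,8750",
  "14171600,2009,10,23,0,15,PDT,8750"
), tmp)

# Declare every column type up front so nothing is left to guessing.
# The types are assumed from the str() output in the quoted message.
cor_disc <- read_csv(
  tmp,
  col_types = cols(
    site_nbr = col_character(),
    year     = col_integer(),
    mon      = col_integer(),
    day      = col_integer(),
    hr       = col_integer(),
    min      = col_integer(),
    tz       = col_character(),
    disc     = col_integer()
  )
)

str(cor_disc)
```

The compact string form col_types = "ciiiiici" (one letter per column, c for character and i for integer) should give the same result here.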

help(read_csv) says the column type guessing is "not robust", and its algorithm doesn't seem to be documented in the help file:

  col_types  One of NULL, a cols() specification, or a string. See
             vignette("readr") for more details.

             If NULL, all column types will be imputed from guess_max
             rows on the input interspersed throughout the file. This is
             convenient (and fast), but not robust. If the imputation
             fails, you'll need to increase the guess_max or supply the
             correct types yourself.
             ...

-Bill

On Mon, Nov 1, 2021 at 10:16 AM Rich Shepard <rshep...@appl-ecosys.com> wrote:
>
> On Mon, 1 Nov 2021, Kevin Thorpe wrote:
>
> > I do not have a specific answer to your particular problem. All I can say
> > is when a CSV import doesn’t work, it can mean there is something in the
> > CSV file that is unexpected. When read_csv() fails, I will try read.csv()
> > to compare the results.
>
> Kevin,
>
> Interesting that there's no error:
> cor_disc <- read.csv("../data/cor-disc.csv", header = TRUE)
> ...
> 12496 14171600 2010   3  15 16  45 PDT 1060
> 12497 14171600 2010   3  15 17   0 PDT 1060
> 12498 14171600 2010   3  15 17  15 PDT 1050
> 12499 14171600 2010   3  15 17  45 PDT 1050
> [ reached 'max' / getOption("max.print") -- omitted 402856 rows ]
>
> head(cor_disc)
>   site_nbr year mon day hr min  tz disc
> 1 14171600 2009  10  23  0   0 PDT 8750
> 2 14171600 2009  10  23  0  15 PDT 8750
> 3 14171600 2009  10  23  0  30 PDT 8750
> 4 14171600 2009  10  23  0  45 PDT 8750
> 5 14171600 2009  10  23  1   0 PDT 8750
> 6 14171600 2009  10  23  1  15 PDT 8750
>
> str(cor_disc)
> 'data.frame': 415355 obs. of 8 variables:
>  $ site_nbr: chr "14171600" "14171600" "14171600" "14171600" ...
>  $ year    : int 2009 2009 2009 2009 2009 2009 2009 2009 2009 2009 ...
>  $ mon     : int 10 10 10 10 10 10 10 10 10 10 ...
>  $ day     : int 23 23 23 23 23 23 23 23 23 23 ...
>  $ hr      : int 0 0 0 0 1 1 1 1 2 2 ...
>  $ min     : int 0 15 30 45 0 15 30 45 0 15 ...
>  $ tz      : chr "PDT" "PDT" "PDT" "PDT" ...
>  $ disc    : int 8750 8750 8750 8750 8750 8750 8750 8730 8730 8730 ...
>
> So, where might I look to see why tidyverse's read_csv() doesn't produce the
> same results?
>
> Regards,
>
> Rich
>
> ______________________________________________
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.