On Mon, 1 Nov 2021, Bill Dunlap wrote:
Use the col_types argument to specify your column types. [Why would you expect '2009' to be read as a string instead of a number?] It looks like an initial zero causes an otherwise numeric-looking entry to be considered a string (handy for zip codes in the northeastern US).
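A minimal sketch of the difference between guessed and explicit column types, assuming the readr package is installed. The two-row file and its zip and year columns are hypothetical, made up only for illustration; the guessed classes are printed rather than stated here, since guessing may differ between readr versions.

```r
library(readr)

# Hypothetical two-row file, written to a temporary path for illustration.
tmp <- tempfile(fileext = ".csv")
writeLines(c("zip,year", "02134,2009", "01002,2010"), tmp)

# Default guessing: inspect which class readr picks for each column.
guessed <- suppressMessages(read_csv(tmp))
sapply(guessed, class)

# Explicit type for year; columns not named in cols() are still guessed.
typed <- read_csv(tmp, col_types = cols(year = col_character()))
sapply(typed, class)
```

Naming only the columns that matter in cols() keeps the call short when most guesses are acceptable.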
help(read_csv) says the column type guessing is "not robust" and its algorithm doesn't seem to be documented in the help file:
Bill,

That makes sense. I read that in the book and forgot about it. I'll specify the col_types for each column in the read_csv() function. Specifying column types got me much closer:
cor_disc <- read_csv("../data/cor-disc.csv", col_names = TRUE, col_types = c("c","c","c","c","c","c","i"))
cor_disc
# A tibble: 415,263 × 8
   site_nbr  year mon   day   hr    min   tz     disc
   <chr>    <dbl> <chr> <chr> <chr> <chr> <chr> <dbl>
 1 14171600  2009 10    23    00    00    PDT    8750
 2 14171600  2009 10    23    00    15    PDT    8750
 3 14171600  2009 10    23    00    30    PDT    8750
 4 14171600  2009 10    23    00    45    PDT    8750
 5 14171600  2009 10    23    01    00    PDT    8750
 6 14171600  2009 10    23    01    15    PDT    8750
 7 14171600  2009 10    23    01    30    PDT    8750
 8 14171600  2009 10    23    01    45    PDT    8730
 9 14171600  2009 10    23    02    00    PDT    8730
10 14171600  2009 10    23    02    15    PDT    8730
# … with 415,253 more rows

The col_types for year was specified as "c" and for disc as "i", but both were read in as doubles. That's a non-issue for disc (discharge in fps), but year is meant to be a character, as are months, days, etc. Have I still missed something in specifying column types?

Regards,

Rich
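One possible explanation, offered as an assumption rather than something confirmed in this thread: help(read_csv) describes col_types as NULL, a cols() specification, or a single compact string with one letter per column. A character vector such as c("c","c",...) is not one of those forms, and the vector in the call above has seven entries for eight columns. The sketch below uses a hypothetical three-row stand-in for cor-disc.csv, with the column names taken from the printed tibble, and passes one compact string instead.

```r
library(readr)

# Hypothetical stand-in for cor-disc.csv; only the column names come from the thread.
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "site_nbr,year,mon,day,hr,min,tz,disc",
  "14171600,2009,10,23,00,00,PDT,8750",
  "14171600,2009,10,23,00,15,PDT,8750",
  "14171600,2009,10,23,01,45,PDT,8730"
), tmp)

# One compact string, one letter per column: seven "c" followed by one "i".
types <- paste0(strrep("c", 7), "i")

cor_disc <- read_csv(tmp, col_names = TRUE, col_types = types)
sapply(cor_disc, class)
```

Building the string with strrep() makes it easy to check that the number of letters matches the number of columns.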