On Mon, 1 Nov 2021, Bill Dunlap wrote:

Use the col_types argument to specify your column types. [Why would you
expect '2009' to be read as a string instead of a number?] It looks like
an initial zero causes an otherwise numeric-looking entry to be considered
a string (handy for zip codes in the northeastern US).

help(read_csv) says the column type guessing is "not robust" and its
algorithm doesn't seem to be documented in the help file:

Bill,

That makes sense. I read that in the book and forgot about it. I'll specify
the col_types for each column in the read_csv() function.

Specifying column types got me much closer:
cor_disc <- read_csv("../data/cor-disc.csv", col_names = TRUE,
                     col_types = c("c","c","c","c","c","c","i"))

cor_disc
# A tibble: 415,263 × 8
   site_nbr  year mon   day   hr    min   tz     disc
   <chr>    <dbl> <chr> <chr> <chr> <chr> <chr> <dbl>
 1 14171600  2009 10    23    00    00    PDT    8750
 2 14171600  2009 10    23    00    15    PDT    8750
 3 14171600  2009 10    23    00    30    PDT    8750
 4 14171600  2009 10    23    00    45    PDT    8750
 5 14171600  2009 10    23    01    00    PDT    8750
 6 14171600  2009 10    23    01    15    PDT    8750
 7 14171600  2009 10    23    01    30    PDT    8750
 8 14171600  2009 10    23    01    45    PDT    8730
 9 14171600  2009 10    23    02    00    PDT    8730
10 14171600  2009 10    23    02    15    PDT    8730
# … with 415,253 more rows

I specified "c" for year and "i" for disc, but both were read in as
doubles. That's a non-issue for disc (discharge in fps), but year should be
a character, as are months, days, etc.

Have I still missed something in specifying column types?
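
Looking at ?read_csv again, col_types is described as NULL, a cols()
specification, or a single string with one letter per column, and my vector
lists only seven types for eight columns, so that may be the problem. A
minimal sketch of what I think the call should look like, using two made-up
sample rows written to a temporary file (tmp, demo, and demo2 are names
invented for this example; I have not run it against the real file):

```r
library(readr)

# Made-up sample rows with the same eight columns as cor-disc.csv
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "site_nbr,year,mon,day,hr,min,tz,disc",
  "14171600,2009,10,23,00,00,PDT,8750",
  "14171600,2009,10,23,00,15,PDT,8750"
), tmp)

# Compact form: one letter per column in a single string
# (seven "c" for character, then "i" for the integer disc column)
demo <- read_csv(tmp, col_names = TRUE, col_types = "ccccccci")

# Equivalent cols() form: everything character except disc
demo2 <- read_csv(tmp, col_types = cols(.default = col_character(),
                                        disc = col_integer()))

# Show the class of each column
sapply(demo, class)
```

If this is right, the same col_types string should work unchanged in the
call on the real file.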

Regards,

Rich

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
