Like Bert, I can't see an easy approach for datasets that have character rather than numeric data. But here's a simple approach for distinguishing files that have possible character headers but numeric data.

readheader <- function(filename) {
  # Read only the first row, without treating it as a header
  possibleheader <- read.table(filename, nrows=1, sep=",", header=FALSE)
  # Only the first column is checked: a numeric first value means the
  # first row is data, anything else is taken to be a column name
  if(is.numeric(possibleheader[,1])) {
    # no header
    infile <- read.table(filename, sep=",", header=FALSE)
  } else {
    # has header
    infile <- read.table(filename, sep=",", header=TRUE)
  }
  infile
}

#### file noheader.csv ####
1,1,1
2,2,2
3,3,3

#### file hasheader.csv ####
a,b,c
1,1,1
2,2,2
3,3,3

########################

> readheader("noheader.csv")
  V1 V2 V3
1  1  1  1
2  2  2  2
3  3  3  3
> readheader("hasheader.csv")
  a b c
1 1 1 1
2 2 2 2
3 3 3 3

Sarah

On Tue, Aug 13, 2019 at 2:00 PM Christopher W Ryan <cr...@binghamton.edu> wrote:
>
> Alas, we spend so much time and energy on data wrangling . . . .
>
> I'm given a collection of csv files to work with---"found data". They arose
> via saving Excel files to csv format. They all have the same column
> structure, except that some were saved with column names and some were not.
>
> I have a code snippet that I've used before to traverse a directory and
> read into R all the csv files of a certain filename pattern within it, and
> combine them all into a single dataframe:
>
> library(dplyr)
> ## specify the csv files that I will want to access
> files.to.read <- list.files(path = "H:/EH",
>     pattern = "WICLeadLabOrdersDone.+", all.files = FALSE,
>     full.names = TRUE, recursive = FALSE, ignore.case = FALSE,
>     include.dirs = FALSE, no.. = FALSE)
>
> ## function to read csv files back in
> read.csv.files <- function(filename) {
>     bb <- read.csv(filename, colClasses = "character", header = TRUE)
>     bb
> }
>
> ## now read the csv files, as all character
> b <- lapply(files.to.read, read.csv.files)
>
> ddd <- bind_rows(b)
>
> But this assumes that all files have column names in their first row. In
> this case, some don't. Any advice how to handle it so that those with
> column names and those without are read in and combined properly? The only
> thing I've come up with so far is:
>
> ## function to read csv files back in
> ## Unfortunately, some of the csv files are saved with column headers, and
> ## some are saved without them.
> ## This presents a problem when defining the function to read them:
> ## header = TRUE or header = FALSE?
> ## The best solution I can think of as of 13 August 2019 is to use
> ## header = FALSE and skip the first row of every file. This will
> ## sacrifice one record from each csv of about 80 files
> read.csv.files <- function(filename) {
>     bb <- read.csv(filename, colClasses = "character", header = FALSE,
>                    skip = 1)
>     bb
> }
>
> This sacrifices about 80 out of about 1600 records. For my purposes in this
> instance, this may be acceptable, but of course I'd rather not.
>
> Thanks.
>
> --Chris Ryan
>
>         [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

--
Sarah Goslee (she/her)
http://www.numberwright.com
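
The header check above and the read-everything-as-character loop from the original question could perhaps be combined along the following lines. This is only a rough sketch, not something from the thread: the names has_header and read_all are made up for illustration, and it assumes that every file has the same columns in the same order, that the first column holds numeric data, and that at least one file was saved with a header row.

```r
# Hypothetical helper: TRUE if the first row looks like column names.
# Assumes the first column holds numeric data in every file.
has_header <- function(filename) {
  first_row <- read.table(filename, nrows = 1, sep = ",", header = FALSE)
  # A numeric first cell means the first row is data, not a header
  !is.numeric(first_row[[1]])
}

# Hypothetical helper: read all files as character and stack them.
# Assumes identical column order and at least one file with a header.
read_all <- function(filenames) {
  headered <- vapply(filenames, has_header, logical(1))
  if (!any(headered)) {
    stop("no file with a header row to take column names from")
  }
  # Read every file as character, with header set per file
  tables <- Map(function(f, h) {
    read.csv(f, header = h, colClasses = "character")
  }, filenames, headered)
  # Borrow the column names from the first file that has them,
  # so headerless files do not end up as V1, V2, ...
  col_names <- names(tables[[which(headered)[1]]])
  tables <- lapply(tables, setNames, col_names)
  do.call(rbind, unname(tables))
}
```

With the file list from the original post, this would be called as ddd <- read_all(files.to.read). Base rbind is used here only to keep the sketch free of package dependencies; bind_rows from dplyr should work equally well on the renamed list.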