On Wed, Sep 5, 2012 at 5:31 PM, David Reiner <david.rei...@xrtrading.com> wrote:
> I'm trying to use sqldf's function read.csv.sql to read CSV files in which the
> missing values are represented by NA's.
> Plain old read.csv works fine on these files, but they are rather large and
> I'd like to filter using sql-like statements.
> However, even if I specify field.types correctly and nrows=-1, it still turns
> the columns with NA's into chars or 0.
> I'm trying to make this OS independent, so I don't think I can use a filter
> to convert the NA's to NULL's or whatever SQLite would understand.
> I can accept it if everything has to be read in as char and then converted to
> doubles with as.numeric, but I'm looking for speed.
>
> Here is code I thought would read the file (I've attached a small sample.)
> It almost works if there are no NA's in the initial rows, but it still turns
> NA's into 0's instead of NA or something I can change into NA;
> and it returns characters if there are NA's in the initial rows.
> (0 is a possible value so I can't filter out the 0's.)
>
> field.types <- list(V1='char', V2='char', V3='real', V4='int', V5='real',
>                     V6='int', V7='real')
> dtst <- read.csv.sql("./tmp.csv", header=FALSE, field.types=field.types,
>                      nrows=-1)
> str(dtst)
>
> 'data.frame': 32 obs. of 7 variables:
>  $ V1: chr "2012-07-01" "2012-07-01" "2012-07-01" "2012-07-01" ...
>  $ V2: chr "15:50:00" "15:51:00" "15:52:00" "15:53:00" ...
>  $ V3: chr "NA" "NA" "NA" "NA" ...
>  $ V4: int 0 0 0 0 0 0 0 0 0 0 ...
>  $ V5: chr "NA" "NA" "NA" "NA" ...
>  $ V6: int 0 0 0 0 0 0 0 0 0 0 ...
>  $ V7: chr "NA" "NA" "NA" "NA" ...
>
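If the fallback you mention (read as character, convert afterwards) turns out to be acceptable, the conversion step might look roughly like the base-R sketch below. The data frame is made up to mimic your str() output and stands in for whatever read.csv.sql returns, and the helper name na_to_numeric is hypothetical. This only helps for columns that arrive as chr: once an NA has become 0 in an int column it cannot be told apart from a real 0, so those columns would have to come in as character too.

```r
# Made-up stand-in for a read.csv.sql() result in which every column
# arrived as character, so "NA" survives as a literal string.
dtst <- data.frame(
  V1 = c("2012-07-01", "2012-07-01", "2012-07-01"),
  V2 = c("15:50:00", "15:51:00", "15:52:00"),
  V3 = c("NA", "1.5", "0"),
  V4 = c("NA", "0", "7"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: convert the named character columns to
# numeric or integer, treating the string "NA" as a missing value.
na_to_numeric <- function(df, cols) {
  df[cols] <- lapply(df[cols], function(x) {
    type.convert(x, na.strings = "NA", as.is = TRUE)
  })
  df
}

dtst <- na_to_numeric(dtst, c("V3", "V4"))
str(dtst)
```

type.convert picks integer or double per column, so a genuine 0 stays 0 and only the "NA" strings become NA.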
See FAQ #14 on the sqldf home page, and note the part at the end of the answer about csvfix.

--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com