Dear list, I have to read some clinical data a file coming from Filemaker on Macintosh (Ugh ! But it could be worse and come from Excel...). Exporting via Excel is out of question since the file has 467 columns and 121 lines (+ headers), which is out of reach of Excel. So I received an "mer"" files, which is what Filemaker exports as a text file.
It seems to be a semicolon delimited file, with all data item enclosed in double quotes, and header items not delimited. So, first attempt is : In bash : # Convert from apparent Latin-1 to UTF-8 [EMAIL PROTECTED]:/media/EMTEC/MARS-Point2007$ cat export.mer | iconv -f latin1 -t utf8 > export-utf8.mer # Check size (line count) [EMAIL PROTECTED]:/media/EMTEC/MARS-Point2007$ wc export-utf8.mer 122 8632 311252 export-utf8.mer In R : foo<-read.table("export-utf8.mer", header=TRUE, sep=";", dec=",", check.names=TRUE, strip.white=TRUE, encoding="UTF-8") /* Looong wait ! Almost 1 minute on a 3,2 GHz dual core... */ Warning message: In read.table("export-utf8.mer", header = TRUE, sep = ";", dec = ",", : readTableHeader a trouvé une ligne finale incomplète dans 'export-utf8.mer' > dim(foo) [1] 0 468 Ouch ! One column too much, and no lines... Trying to print foo seems to start R in an endless loop. According to emacs (ESS is my favorite interface to R), R tries to print more than 1 million lines (empty) before I give up and kill emacs. Second attempt : separate headers from data : In bash : # Same conversion, discarding the first line [EMAIL PROTECTED]:/media/EMTEC/MARS-Point2007$ cat export.mer | iconv -f latin1 -t utf8 | tail -n +2 > data.utf8 # check size [EMAIL PROTECTED]:/media/EMTEC/MARS-Point2007$ wc data.utf8 121 7581 301942 data.utf8 # Convert header line, and stack headers by replacing the separator (;) # by a newline [EMAIL PROTECTED]:/media/EMTEC/MARS-Point2007$ cat export.mer | iconv -f latin1 -t utf8 | head -n 1 | sed -e "s/;/\n/g" > headers.utf8 # Check size [EMAIL PROTECTED]:/media/EMTEC/MARS-Point2007$ wc headers.utf8 467 1517 9310 headers.utf8 In R : > bar<-read.table("data.utf8", header=FALSE, sep=";", dec=",", strip.white=TRUE, encoding="UTF-8") > dim(bar) [1] 121 467 /* Fine ... */ > hdrs<-read.table("headers.utf8", header=FALSE, sep=";", strip.white=TRUE, encoding="UTF-8") > dim(hdrs) [1] 467 1 /* Also fine. Let's use these names : */ > names(bar)<-make.names(hdrs[,1]) This is quite fast, and the resulting data frame *seems* to be fine and dandy. I have found a pragmatic solution, which (more or less) solves my problem. But I'm frustrated at understanding the root of the problem. Of course, I also tried to read the header line alone as a semicolon separated file of one line : it fails with a warning about an unterminated final line and returns a data frame with 467 columns and 0 line. And, by the way (let's try to be Ripley'd for something else than forgetting to read the FAQ) : > R.Version() $platform [1] "i486-pc-linux-gnu" $arch [1] "i486" $os [1] "linux-gnu" $system [1] "i486, linux-gnu" $status [1] "" $major [1] "2" $minor [1] "6.0" $year [1] "2007" $month [1] "10" $day [1] "03" $`svn rev` [1] "43063" (This is the version packaged for Ubuntu grabbed from the relevant R repository). The R FAQ was unhelpful in this special case. The import/export guide suggests to fiddle with delimiters and quoting, which I did, to no avail. Searching for read.table in the searchable archives of R help gives about 900 références for 2007. An honest effort to read the headers and seemingly relevant messages did not shed any light. My best clue so far is that the data items are enclosed in double quotes, while items of the header line are not (just separated by semicolons) ; I do not know how to coax read.table to read *that*. I would appreciate any hint as to the source of the problem and a possible solution (I try to find an as automated as possible way, because I think I will have to read successive versions of this data file many times...). Sincerely yours, Emmanuel Charpentier ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.