Dear friends,

I stumbled upon a behaviour of read.delim() that I would consider a bug, or at least an inconsistency that should be fixed.
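A minimal, self-contained way to reproduce the test case described below. It writes test.csv to a temporary file and repeats the same read.delim() call. The temporary file and the names csv_file and df are only for illustration. The file contents are assumed to be exactly the two lines of test.csv, including the space after each comma. I have left out the expected printout because it may differ between R versions.

```r
# Hypothetical setup: recreate test.csv in a temporary file.
# Assumption: the file holds exactly the two lines from test.csv,
# including the space after each comma.
csv_file <- tempfile(fileext = ".csv")
writeLines(c('V1, V2, V3, V4',
             '"""", """", 3, ""'),
           csv_file)

# Same call as in the report: "," as separator and a lone double
# quote as the NA string.
df <- read.delim(csv_file, sep = ",", header = TRUE, na.strings = "\"")
print(df)

# Compare how the two """" fields (columns 1 and 2) were read.
str(df[[1]])
str(df[[2]])
```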
Recently we had to work with data that used "", two double quotes, as the symbol to start and end character input. Essentially the data looked like this:

data.csv
========
V1, V2, V3
""data"", 3, """"

The last sequence, """", indicates a missing value.

One obvious way to read in this data is to use gsub(), but that is not the point I want to make. Consider this case we found during testing:

test.csv
========
V1, V2, V3, V4
"""", """", 3, ""

If you read it with

> read.delim("test.csv", sep=",", header=TRUE, na.strings="\"")

you get the following (and a warning):

  V1 V2 V3 V4
1 NA  "  3 NA

I would have expected an error message, or at least the same result for both occurrences of """" in the input file. (The setting na.strings="\"" turned out to work for a colleague with his specific data, although I think it shouldn't.)

My main concern is the different interpretation of the two """" sequences. Real bug? Minor inconsistency? I don't know.

All the best
Detlef

--
'People who say "I have nothing to hide" misunderstand the purpose of surveillance. It was never about privacy. It's about power.' E. Snowden

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel