I’ve been struggling with seemingly ‘corrupt’ data.frames for a few days, and
believe I’ve narrowed the problem down to some odd behaviour from read.table.
I receive a tab delimited file from an external provider where strings are
encoded as ="content". I'm not sure why; perhaps because most users open it in Excel.
My specific issue is that trailing spaces in any of the strings are causing
strange results from read.table:
# No trailing spaces
read.table(text="ID\tValue\n=\"Total\"\t1000\n=\"CJ01\"\t550\n=\"CF02\"\t450",header=FALSE,sep='\t')
      V1    V2
1     ID Value
2 =Total  1000
3  =CJ01   550
4  =CF02   450
# Now with trailing spaces in line 3
read.table(text="ID\tValue\n=\"Total\"\t1000\n=\"CJ01 \"\t550\n=\"CF02\"\t450",header=FALSE,sep='\t')
      V1    V2
1  =CF02   450
2     ID Value
3 =Total  1000
4  =CJ01   550
5  =CF02   450
I solved my specific problem by setting quote='' and extracting the string
content after calling read.table. As my original code had header=TRUE, I was
finding random rows were being used as column names!
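A minimal sketch of that workaround, for anyone hitting the same thing. The sample text, the sub() pattern and the trimws() call are illustrative assumptions about the cleanup step, not the exact code from my script:

```r
# Sample input mimicking the provider's format; the trailing spaces inside
# the string on the third line are deliberate.
txt <- "ID\tValue\n=\"Total\"\t1000\n=\"CJ01  \"\t550\n=\"CF02\"\t450"

# quote = "" turns off quote processing, so every field is read literally,
# including the leading =" and the closing ".
df <- read.table(text = txt, header = TRUE, sep = "\t", quote = "",
                 stringsAsFactors = FALSE)

# Extract the content between =" and the closing " in the string column.
df$ID <- sub('^="(.*)"$', "\\1", df$ID)

# Optionally drop the trailing whitespace that was inside the quotes.
df$ID <- trimws(df$ID)

print(df)
```

With quote processing disabled, the header row is taken from the first line as expected, and the quoting is removed explicitly afterwards rather than left to read.table.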
I'm flagging this as a potential issue with read.table, although I can easily
accept I'm missing something obvious here.
Best,
Michael
R version 3.4.3 (2017-11-30)
Platform: x86_64-apple-darwin15.6.0 (64-bit) / x86_64-pc-linux-gnu (64-bit)
Running under: macOS High Sierra 10.13.2 / Ubuntu 16.04.3 LTS