I’ve been struggling with seemingly ‘corrupt’ data.frames for a few days, and believe I’ve narrowed the problem down to some odd behaviour from read.table.
I receive a tab-delimited file from an external provider in which strings are encoded as ="content". I'm not sure why; perhaps because most users open it in Excel. My specific issue is that trailing spaces in any of the strings cause strange results from read.table:

# No trailing spaces
read.table(text="ID\tValue\n=\"Total\"\t1000\n=\"CJ01\"\t550\n=\"CF02\"\t450",header=FALSE,sep='\t')
      V1    V2
1     ID Value
2 =Total  1000
3  =CJ01   550
4  =CF02   450

# Now with trailing spaces in line 3
read.table(text="ID\tValue\n=\"Total\"\t1000\n=\"CJ01 \"\t550\n=\"CF02\"\t450",header=FALSE,sep='\t')
      V1    V2
1  =CF02   450
2     ID Value
3 =Total  1000
4  =CJ01   550
5  =CF02   450

Note that in the second case the last data row also appears as row 1, ahead of the header line.

I solved my specific problem by setting quote='' and extracting the string content after calling read.table. As my original code had header=TRUE, I was finding that random rows were being used as column names!

I'm flagging this as a potential issue with read.table, although I can easily accept I'm missing something obvious here.

Best,
Michael

R version 3.4.3 (2017-11-30)
Platform: x86_64-apple-darwin15.6.0 (64-bit) / x86_64-pc-linux-gnu (64-bit)
Running under: macOS High Sierra 10.13.2 / Ubuntu 16.04.3 LTS
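
For reference, here is a minimal sketch of the workaround described above. It assumes every string field has exactly the form ="content"; the names txt and df are illustrative only, and the final trimws() step is an optional extra that was not part of my original fix.

```r
# Same test input as above, with a trailing space inside the quotes on line 3
txt <- "ID\tValue\n=\"Total\"\t1000\n=\"CJ01 \"\t550\n=\"CF02\"\t450"

# quote = "" disables quote processing, so each field is read literally
df <- read.table(text = txt, header = TRUE, sep = "\t", quote = "",
                 stringsAsFactors = FALSE)

# Strip the leading =" and the trailing " from the ID column
df$ID <- sub('^="(.*)"$', "\\1", df$ID)

# Optional: drop the stray trailing whitespace as well
df$ID <- trimws(df$ID)

print(df)
```

With quote processing disabled, the header line is used for the column names and the three data rows stay in their original order.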