I’ve been struggling with seemingly ‘corrupt’ data.frames for a few days, and 
believe I’ve narrowed the problem down to some odd behaviour from read.table.

I receive a tab-delimited file from an external provider in which strings are 
encoded as ="content". I'm not sure why; perhaps because most users open it in 
Excel. My specific issue is that trailing spaces in any of the strings cause 
strange results from read.table.

# No trailing spaces
read.table(text="ID\tValue\n=\"Total\"\t1000\n=\"CJ01\"\t550\n=\"CF02\"\t450",header=FALSE,sep='\t')
      V1    V2
1     ID Value
2 =Total  1000
3  =CJ01   550
4  =CF02   450

# Now with trailing spaces in line 3
read.table(text="ID\tValue\n=\"Total\"\t1000\n=\"CJ01   \"\t550\n=\"CF02\"\t450",header=FALSE,sep='\t')
        V1    V2
1    =CF02   450
2       ID Value
3   =Total  1000
4 =CJ01      550
5    =CF02   450

I solved my specific problem by setting quote='', and extracting the string 
content after calling read.table. As my original code had header=TRUE, I was 
finding random rows were being used as column names! 
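
A minimal sketch of that workaround, assuming every string arrives wrapped 
exactly as ="..." and that the trailing spaces are unwanted (the regular 
expression and the trimws() call are illustrative choices, not necessarily the 
exact code used):

```r
# Same sample data as above, with trailing spaces inside the CJ01 string
txt <- "ID\tValue\n=\"Total\"\t1000\n=\"CJ01   \"\t550\n=\"CF02\"\t450"

# quote = "" turns off quote processing, so each field is read literally
df <- read.table(text = txt, header = TRUE, sep = "\t", quote = "",
                 stringsAsFactors = FALSE)

# Strip the leading =" and the trailing " from the ID column
df$ID <- sub('^="(.*)"$', "\\1", df$ID)

# Assumption: the trailing spaces carry no meaning, so drop them
df$ID <- trimws(df$ID)

print(df)
```

With quoting disabled, the header line is read as the column names and the 
rows keep their original order.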

Flagging a potential issue with read.table, although I can easily accept I'm 
missing something obvious here. 

Best,
 Michael

R version 3.4.3 (2017-11-30)
Platform: x86_64-apple-darwin15.6.0 (64-bit)  / x86_64-pc-linux-gnu (64-bit)
Running under: macOS High Sierra 10.13.2 /  Ubuntu 16.04.3 LTS






