On Mar 4, 2011, at 9:50 AM, Asan Ramzan wrote:
Hello R-help
I am working with large data tables that occasionally contain a label marking
a particular time point in an experiment, e.g.:
"Time (min)", "R1 R1", "R2 R1", "R3 R1", "R4 R1"
.909, 1.117, 1.225, 1.048, 1.258
3.942, 1.113, 1.230, 1.049, 1.262
3.976, 1.105, 1.226, 1.051, 1.259
4.009, 1.114, 1.231, 1.053, 1.259
4.042, 1.107, 1.230, 1.048, 1.262
4.076, 1.108, 1.226, 1.045, 1.257
4.109, 1.109, 1.227, 1.047, 1.259
4.142, 1.108, 1.225, 1.052, 1.260
4.176, 1.105, 1.222, 1.046, 1.260
4.209, 1.106, 1.226, 1.050, 1.258
4.242, 1.105, 1.224, 1.047, 1.258
4.276, 1.104, 1.223, 1.048, 1.259
4.309, 1.106, 1.228, 1.050, 1.260
4.342, 1.103, 1.219, 1.049, 1.260
4.376, 1.107, 1.225, 1.052, 1.259
4.409, 1.105, 1.222, 1.047, 1.258
4.442, 1.106, 1.227, 1.048, 1.262
4.476, 1.105, 1.222, 1.049, 1.261
4.509, 1.102, 1.222, 1.047, 1.259
4.555, "Gly sar"
4.555, 1.107, 1.224, 1.048, 1.261
4.576, 1.109, 1.228, 1.053, 1.259
4.609, 1.103, 1.218, 1.046, 1.258
4.642, 1.105, 1.223, 1.048, 1.256
4.676, 1.108, 1.217, 1.048, 1.260
4.709, 1.124, 1.222, 1.047, 1.258
When I try to read in the table, I get:
try<-read.table("200810_01.R",header=T,sep=",")
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
  line 136 did not have 5 elements
Is there any way to tell R to ignore these labels or, better still, to
interpret them as labels for particular time points, so that when I draw a
line graph it is annotated with them?
Option 1:
Prepare your data properly with an editor.
Option 2:
You could read the file with readLines, identify the offending lines
with grep or grepl, then separate the offenders and non-offenders.
lines <- readLines(textConnection('"Time (min)", "R1 R1", "R2 R1", "R3 R1", "R4 R1"
.909, 1.117, 1.225, 1.048, 1.258
3.942, 1.113, 1.230, 1.049, 1.262
3.976, 1.105, 1.226, 1.051, 1.259
4.009, 1.114, 1.231, 1.053, 1.259
4.042, 1.107, 1.230, 1.048, 1.262
4.076, 1.108, 1.226, 1.045, 1.257
4.109, 1.109, 1.227, 1.047, 1.259
4.142, 1.108, 1.225, 1.052, 1.260
4.176, 1.105, 1.222, 1.046, 1.260
4.209, 1.106, 1.226, 1.050, 1.258
4.242, 1.105, 1.224, 1.047, 1.258
4.276, 1.104, 1.223, 1.048, 1.259
4.309, 1.106, 1.228, 1.050, 1.260
4.342, 1.103, 1.219, 1.049, 1.260
4.376, 1.107, 1.225, 1.052, 1.259
4.409, 1.105, 1.222, 1.047, 1.258
4.442, 1.106, 1.227, 1.048, 1.262
4.476, 1.105, 1.222, 1.049, 1.261
4.509, 1.102, 1.222, 1.047, 1.259
4.555, "Gly sar"
4.555, 1.107, 1.224, 1.048, 1.261
4.576, 1.109, 1.228, 1.053, 1.259
4.609, 1.103, 1.218, 1.046, 1.258
4.642, 1.105, 1.223, 1.048, 1.256
4.676, 1.108, 1.217, 1.048, 1.260
4.709, 1.124, 1.222, 1.047, 1.258'))
read.table(textConnection(
lines[ c(TRUE, !grepl("[[:alpha:]]", lines)[-1]) ]),
skip=1)
# skip=1 drops the header line: its quotes and spaces don't work well with R column-naming conventions
V1 V2 V3 V4 V5
1 .909, 1.117, 1.225, 1.048, 1.258
2 3.942, 1.113, 1.230, 1.049, 1.262
3 3.976, 1.105, 1.226, 1.051, 1.259
   [rows 4-22 snipped]
23 4.642, 1.105, 1.223, 1.048, 1.256
24 4.676, 1.108, 1.217, 1.048, 1.260
25 4.709, 1.124, 1.222, 1.047, 1.258
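Note that read.table's default separator is white space, so in the output above the commas stay attached to the values in V1 to V4 and those columns are not read as numbers. A minimal sketch that splits on commas instead, run on a shortened excerpt of the data from the question (the excerpt and the names keep and dat are only for illustration):

```r
# Shortened excerpt of the question's data, including its one label line
lines <- readLines(textConnection('"Time (min)", "R1 R1", "R2 R1", "R3 R1", "R4 R1"
4.476, 1.105, 1.222, 1.049, 1.261
4.509, 1.102, 1.222, 1.047, 1.259
4.555, "Gly sar"
4.555, 1.107, 1.224, 1.048, 1.261
4.576, 1.109, 1.228, 1.053, 1.259'))

# Same filter as above: keep the header, drop any later line containing letters
keep <- c(TRUE, !grepl("[[:alpha:]]", lines)[-1])

# sep = "," splits on the commas so they are not left attached to the values;
# strip.white = TRUE removes the blank that follows each comma
dat <- read.table(textConnection(lines[keep]), header = TRUE, sep = ",",
                  strip.white = TRUE)
str(dat)
```

With the header kept and sep = ",", read.table can use the header for the column names (passed through make.names() unless check.names = FALSE), so skip=1 is no longer needed.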
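Even more compact, since the header line also contains letters and is dropped
by the same filter (the columns are then named V1 to V5 by default):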
read.table(textConnection(
lines[ !grepl("[[:alpha:]]", lines) ] ) )
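An alternative to the letter test, sketched under the same assumptions (shortened excerpt, illustrative variable names): count.fields() reports how many fields each line has, so lines whose count differs from the header's are the likely culprits behind the "did not have 5 elements" error, whether or not a label happens to contain letters.

```r
# Shortened excerpt of the question's data, including its one label line
lines <- readLines(textConnection('"Time (min)", "R1 R1", "R2 R1", "R3 R1", "R4 R1"
4.476, 1.105, 1.222, 1.049, 1.261
4.509, 1.102, 1.222, 1.047, 1.259
4.555, "Gly sar"
4.555, 1.107, 1.224, 1.048, 1.261
4.576, 1.109, 1.228, 1.053, 1.259'))

# Count the comma-separated fields on each line; a quoted string is one field.
# Assumes there are no blank lines: count.fields() ignores them by default,
# which would make the counts stop lining up with 'lines'.
n_fields <- count.fields(textConnection(lines), sep = ",")

# Lines whose count differs from the header's are the ones scan() rejects
bad <- which(n_fields != n_fields[1])
print(lines[bad])

# Read only the well-formed lines, header included
dat <- read.table(textConnection(lines[n_fields == n_fields[1]]),
                  header = TRUE, sep = ",", strip.white = TRUE)
```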
Using the non-negated grepl() expression, lines[grepl("[[:alpha:]]", lines)],
should get you all the "label" lines (plus the header line, which also
contains letters).
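For the second half of the question (annotating a line graph), a minimal sketch along the same lines: pull the time and text out of each label line, read the remaining lines with sep = ",", draw the four series with matplot(), and mark each label with a dashed vertical line and its text in the top margin. It assumes every label line has the form time, "text" as in the question; the excerpt is shortened, and the y-axis title and variable names are placeholders.

```r
# Shortened excerpt of the question's data, including its one label line
lines <- readLines(textConnection('"Time (min)", "R1 R1", "R2 R1", "R3 R1", "R4 R1"
4.476, 1.105, 1.222, 1.049, 1.261
4.509, 1.102, 1.222, 1.047, 1.259
4.555, "Gly sar"
4.555, 1.107, 1.224, 1.048, 1.261
4.576, 1.109, 1.228, 1.053, 1.259'))

# Label lines contain letters; the header does too, so exclude line 1
is_label <- grepl("[[:alpha:]]", lines)
is_label[1] <- FALSE

# Assumes each label line looks like: 4.555, "Gly sar"
label_lines <- lines[is_label]
labels <- data.frame(
  # time = everything before the first comma
  time = as.numeric(sub(",.*$", "", label_lines)),
  # text = drop the time, the comma, and the surrounding quotes
  text = gsub('^[^,]*,[[:space:]]*"|"[[:space:]]*$', "", label_lines),
  stringsAsFactors = FALSE
)

# Numeric data: the header plus all non-label lines
dat <- read.table(textConnection(lines[!is_label]), header = TRUE, sep = ",",
                  strip.white = TRUE)

# One line per reading column, plotted against the time in column 1
matplot(dat[[1]], as.matrix(dat[-1]), type = "l", lty = 1,
        xlab = "Time (min)", ylab = "Reading")  # ylab is a placeholder
# Dashed marker at each labelled time, with the label text above the plot
abline(v = labels$time, lty = 2, col = "grey")
mtext(labels$text, side = 3, line = 0.25, at = labels$time, cex = 0.8)
```

Keeping the labels in their own data frame means the same times and texts can be reused with other plotting functions, e.g. text() inside the plot region instead of mtext() in the margin.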
David Winsemius, MD
Heritage Laboratories
West Hartford, CT
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help