on 12/10/2008 12:50 PM Chris Poliquin wrote:
> Hi,
> 
> I need to read in a series of text files with a time series on each
> row.  The series are of different lengths and I'd like to just use the
> first row as the length and have R ignore extra values in rows that go
> over this length.
> 
> For example:
> 
> 1 0 3 4 5
> 1 3 5 6 8 7 7
> 2 1 1 1 4 7 7 7
> 
> So the 7s would be ignored and I would have a 5x3 matrix.  I tried
> creating a series of colClasses with NULLs for the extra values by using
> max(count.fields(file)) - min(count.fields(file)) but this didn't work
> and would be too time consuming for lots of files.
> 
> fill=T doesn't seem to be working either.  When I use fill=T I get extra
> rows for some reason in the table.  R doesn't seem to just be appending
> NAs to the end of the short rows.
> 
> Any way to accomplish this?
> 
> - Chris

Not sure why you had issues with 'fill = TRUE'.
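One likely cause of the extra rows comes from the Details section of ?read.table. The number of data columns is determined from the first five lines of input, unless 'col.names' is supplied and is longer. With 'fill = TRUE', a longer line further down is then split across rows instead of being padded. Below is a minimal sketch of the workaround used in the ?read.table examples: size 'col.names' from count.fields(). The temporary file and its contents are made up for illustration.

```r
# Example file whose longest line comes after the first five lines
# (a made-up stand-in for one of the real data files)
tf <- tempfile(fileext = ".txt")
writeLines(c("1 0 3 4 5", "1 3 5 6 8", "2 1 1 1 4",
             "3 2 2 2 5", "4 4 4 4 6", "5 5 5 5 7 7 7 7"), tf)

# read.table(tf, fill = TRUE) would size the table from the first
# five lines (5 columns), so the 8-field last line can be split
# across two rows instead of being padded.

# Count the fields on every line and supply enough column names
# for the longest one; shorter lines are then padded with NA.
nfields <- count.fields(tf)
padded <- read.table(tf, fill = TRUE,
                     col.names = paste0("V", seq_len(max(nfields))))
padded
```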

Presuming that you do not know 'a priori' the resultant matrix size, you
could do something like the following.

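Essentially, use read.table() to get the following initial result,
padding the short rows with NA ('fill = TRUE') and converting the 7s
to NA as well ('na.strings = 7'). Here the example data were copied to
the clipboard; substitute your file name: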

DF <- read.table("clipboard", fill = TRUE, na.strings = 7)

> DF
  V1 V2 V3 V4 V5 V6 V7 V8
1  1  0  3  4  5 NA NA NA
2  1  3  5  6  8 NA NA NA
3  2  1  1  1  4 NA NA NA

We can then use complete.cases() on the transposed data frame to get a
logical vector that is TRUE for each column containing no NAs:

> complete.cases(t(DF))
[1]  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE

Thus:

> DF[, complete.cases(t(DF))]
  V1 V2 V3 V4 V5
1  1  0  3  4  5
2  1  3  5  6  8
3  2  1  1  1  4
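The question asked specifically to truncate at the length of the first row and to handle many files. The sketch below uses a different technique for that: it reads the lines as text with readLines() and splits them with strsplit(), rather than calling read.table(). One reason to consider it is that, in the approach above, 'na.strings = 7' would also turn any genuine 7 within the first five columns into NA, and complete.cases() would then drop that column. The sketch assumes whitespace-separated numeric values with one series per line. The helper name read_truncated() and the file locations are hypothetical.

```r
# Hypothetical helper: read one file of whitespace-separated series,
# one series per line, keeping only as many values per line as the
# first line has (extra values dropped, short lines padded with NA).
read_truncated <- function(file) {
  lines <- readLines(file)
  lines <- lines[nzchar(trimws(lines))]            # skip blank lines
  fields <- strsplit(trimws(lines), "[[:space:]]+")
  k <- length(fields[[1]])                         # length of first series
  rows <- lapply(fields, function(f) as.numeric(f[seq_len(k)]))
  do.call(rbind, rows)                             # one matrix row per line
}

# The example data from the question
tf <- tempfile(fileext = ".txt")
writeLines(c("1 0 3 4 5", "1 3 5 6 8 7 7", "2 1 1 1 4 7 7 7"), tf)
m <- read_truncated(tf)
m

# For many files (the directory and pattern are assumptions):
# all_series <- lapply(list.files("data", pattern = "\\.txt$",
#                                 full.names = TRUE), read_truncated)
```

Because this never inspects the values themselves, a genuine 7 within the first row's length is kept. If the first line is not the shortest, later short lines are padded with NA.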


HTH,

Marc Schwartz

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.