On May 18, 2009, at 11:24 AM, Steve Murray wrote:


Dear all,

I have a file which I've converted from NetCDF (.nc) to text (.txt) using ncdump in Unix (as I had problems using the ncdf package to do this). The first few rows (as copied and pasted from the Unix console) of the file appear as follows:

_, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _,


As you can see, there are a lot of NA values before the actual numeric values start further down the dataset. My problem is that I'm having trouble reading this file into R. I think the problem lies with the sep= argument, although I may be wrong. I tried the following command at first, as the data appear to be comma separated:

read.table("test86.txt", skip=43, na.strings="-", header=FALSE, sep=",") -> test86 # skip =43 due to meta-data information being held in the initial rows
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
 line 29 did not have 25 elements

I then tried sep=" ", followed by sep="" but received a similar-type error message (although line 29 doesn't appear to be especially different from the rest).

I subsequently tried using sep="\t" and then sep="\n". Both read the data in without an error message, although the result is formatted as follows:

head(test86)
                                                                       V1
1 _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _,
2 _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _,
3 _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _,
4 _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _,
5 _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _,
6 _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _,


dim(test86)
[1] 179899      1


Instead of one column, I'd expect there to be 720.


I think I'm getting something wrong relating to the sep= argument (or possibly mis-using na.strings?). If anyone has any solutions to this then I'd be very grateful to hear them.

Many thanks for any advice,

Steve


Two problems:

1. Your first line above has one more column/entry than the subsequent lines. If that is correct, you need to use the 'fill = TRUE' argument so that all subsequent rows are filled to have the same number of columns. If the above is due to a copy/paste error, then disregard this.

2. You are using a '-' (hyphen) as your 'na.strings' character, when the data is using a '_' (underscore).
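One way to check the first point before re-reading the file is count.fields(), which reports how many separator-delimited fields each line contains. The following is only a minimal sketch: the three sample lines are made up to mimic the layout described above (first row one entry longer, every row ending in a comma), they are not taken from the actual file.

```r
# Hypothetical sample standing in for the ncdump output: the first row has
# one more entry than the others, and every row ends with a trailing comma.
sample_lines <- c("_, _, _, _,",
                  "_, _, _,",
                  "_, 1.5, _,")

con <- textConnection(sample_lines)
# count.fields() returns one field count per input line for the given separator
n_fields <- count.fields(con, sep = ",")
close(con)

print(n_fields)         # per-line field counts
print(table(n_fields))  # how many lines share each count
```

For the real file, the same call with the file name and skip = 43 in place of the text connection should show whether line 29 (or any other line) differs in length from its neighbours.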

Additionally, I would use 'strip.white = TRUE', to aid in getting rid of extraneous white space around your fields/separators. That will also help with column separations.


Thus (on OSX) with the above data copied to the clipboard:

> read.table(pipe("pbpaste"), na.strings = "_", sep = ",", fill = TRUE, strip.white = TRUE)
   V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15 V16 V17 V18 V19 V20 V21 V22 V23 V24 V25 V26
1  NA NA NA NA NA NA NA NA NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA
2  NA NA NA NA NA NA NA NA NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA
3  NA NA NA NA NA NA NA NA NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA
4  NA NA NA NA NA NA NA NA NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA
5  NA NA NA NA NA NA NA NA NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA
6  NA NA NA NA NA NA NA NA NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA
7  NA NA NA NA NA NA NA NA NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA
8  NA NA NA NA NA NA NA NA NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA
9  NA NA NA NA NA NA NA NA NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA
10 NA NA NA NA NA NA NA NA NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA
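The same arguments can be tried without a clipboard, on any platform, by reading from a text connection. This is a rough sketch under stated assumptions: the sample lines (including the value 1.5) are invented for illustration, and only the file name and skip = 43 in the commented-out call come from the original post.

```r
# Hypothetical sample: first row one entry longer than the others,
# "_" marks missing values, and one made-up numeric value is included.
sample_lines <- c("_, _, _, _,",
                  "_, _, _,",
                  "_, 1.5, _,")

con <- textConnection(sample_lines)
dat <- read.table(con, header = FALSE, sep = ",",
                  na.strings = "_",    # underscore, not hyphen, marks NA
                  fill = TRUE,         # pad short rows with NA
                  strip.white = TRUE)  # drop the blanks after each comma
close(con)

str(dat)

# The equivalent call for the file from the original post (not run here):
# test86 <- read.table("test86.txt", skip = 43, header = FALSE, sep = ",",
#                      na.strings = "_", fill = TRUE, strip.white = TRUE)
```

Because each line ends with a comma, the result may contain a final column consisting only of NA values; that column can be inspected and dropped after reading if it is not wanted.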



HTH,

Marc Schwartz

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
