Thanks so much, Jim! It works without a glitch!
My only problem is that the text files to be parsed are quite big, up to
several thousand rows (my apologies for the incomplete information in
my former post), so loops are not my first choice. I'll take a look at
'lapply' using your code as a model. Thanks again!
Sincerely,
Paolo
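
For reference, here is a rough sketch of what an 'lapply' version might look like. It is not code from the thread: the helper names parse_one and parse_file are made up for illustration, and it assumes the format described in the quoted messages, namely "key value" lines grouped into records that each end with a "//" line.

```r
# Hypothetical helper (not from the thread): turn the lines of ONE record
# into a named character vector with one entry per requested key.
parse_one <- function(rec, keys) {
  parts <- strsplit(rec, "\\s+")
  key <- vapply(parts, function(p) p[1], character(1))
  val <- vapply(parts, function(p) p[2], character(1))
  # match() gives NA for a key that is absent; indexing by NA yields NA
  setNames(val[match(keys, key)], keys)
}

# Hypothetical helper: split the lines into records and parse each one.
parse_file <- function(lines, keys = c("x", "y", "id1", "id2", "z", "w")) {
  is_end <- lines == "//"
  # group id of each non-delimiter line = number of "//" lines seen so far
  grp <- cumsum(is_end)[!is_end]
  records <- split(lines[!is_end], grp)
  # one row per record; the result is a character matrix
  do.call(rbind, lapply(records, parse_one, keys = keys))
}
```

Two caveats about this sketch: lines after the last "//" are treated as one more record, and if a key occurs twice in a record the first occurrence is used.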
jim holtman wrote:
This should do what you want. It uses loops; you can work at
replacing those with 'lapply' and such, but it all depends on whether it is
going to take you more time to rewrite the code than to process a set
of data (you never did say how large the data was). This also "grows"
a data.frame, but you have not indicated how efficient it has to be.
So this could be used as a model.
x <- readLines(textConnection("x x_string
y y_string
id1 id1_string
id2 id2_string
z z_string
w w_string
stuff stuff stuff
stuff stuff stuff
stuff stuff stuff
//
x x_string1
y y_string1
z z_string1
w w_string1
stuff stuff stuff
stuff stuff stuff
stuff stuff stuff
//
x x_string2
y y_string2
id1 id1_string1
id2 id2_string1
z z_string2
w w_string2
stuff stuff stuff
stuff stuff stuff
stuff stuff stuff
//"))
# I assume that each group is delimited by "//"
# initialize data.frame with desired values
# (add z=NA here to capture the 'z' field as well)
.keys <- data.frame(x=NA, y=NA, id1=NA, id2=NA, w=NA)
.out <- .keys  # for the first pass
.save <- NULL
for (i in seq_along(x)){
    if (x[i] == "//"){  # output the current data
        .save <- rbind(.save, .out)
        .out <- .keys  # setup for the next pass
    } else {
        # first token is the field name, second token is its value
        .split <- strsplit(x[i], "\\s+")
        if (.split[[1]][1] %in% names(.out)){
            .out[[.split[[1]][1]]] <- .split[[1]][2]
        }
    }
}
.save
          x         y         id1         id2         w
1  x_string  y_string  id1_string  id2_string  w_string
2 x_string1 y_string1        <NA>        <NA> w_string1
3 x_string2 y_string2 id1_string1 id2_string1 w_string2
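
If the loop and the growing data.frame turn out to be too slow, the same idea can be sketched without an explicit loop: number the records with cumsum() on the "//" lines, then fill a preallocated character matrix in a single indexed assignment. This is only a sketch under the same assumption (every record ends with "//"); parse_records is a made-up name, not something from the code above.

```r
# Hypothetical helper (not from the thread): parse "//"-delimited records
# without an explicit loop and without growing a data.frame.
parse_records <- function(lines, keys = c("x", "y", "id1", "id2", "w")) {
  is_end <- lines == "//"
  # record number of each line: delimiters seen BEFORE this line, plus one
  rec <- cumsum(is_end) - is_end + 1L
  n_rec <- sum(is_end)
  # first whitespace-delimited token is the key, the second is the value
  parts <- strsplit(lines, "\\s+")
  key <- vapply(parts, function(p) p[1], character(1))
  val <- vapply(parts, function(p) p[2], character(1))
  # keep only wanted keys; lines after the last "//" are dropped
  keep <- !is_end & key %in% keys & rec <= n_rec
  out <- matrix(NA_character_, nrow = n_rec, ncol = length(keys),
                dimnames = list(NULL, keys))
  # (row, column) matrix indexing fills every cell in one assignment
  out[cbind(rec[keep], match(key[keep], keys))] <- val[keep]
  as.data.frame(out, stringsAsFactors = FALSE)
}
```

Calling parse_records(x) on the sample data should give the same three rows as the loop, with NA where id1 and id2 are missing; passing keys = c("x", "y", "id1", "id2", "z", "w") would pick up the z field too.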
On Wed, Jul 9, 2008 at 5:33 AM, Paolo Sonego <[EMAIL PROTECTED]> wrote:
Dear R users,
I have a big text file formatted like this:
x x_string
y y_string
id1 id1_string
id2 id2_string
z z_string
w w_string
stuff stuff stuff
stuff stuff stuff
stuff stuff stuff
//
x x_string1
y y_string1
z z_string1
w w_string1
stuff stuff stuff
stuff stuff stuff
stuff stuff stuff
//
x x_string2
y y_string2
id1 id1_string1
id2 id2_string1
z z_string2
w w_string2
stuff stuff stuff
stuff stuff stuff
stuff stuff stuff
//
...
...
I'd like to parse this file and retrieve the x, y, id1, id2, z, w fields and
save them into a matrix object:
x          y          id1          id2          z          w
x_string   y_string   id1_string   id2_string   z_string   w_string
x_string1  y_string1  NA           NA           z_string1  w_string1
x_string2  y_string2  id1_string1  id2_string1  z_string2  w_string2
...
...
The id1 and id2 fields are not always present within a section (the interval
between x and the last stuff), and
I'd like to insert an NA when they are absent (see above) so that
length(x)==length(y)==length(id1)==... .
Without the id1, id2 fields the task is easily solvable importing the text
file with readLines and retrieving the single fields with grep:
input = readLines("file.txt")
x = grep("^x\\s", input, value = TRUE)
id1 = grep("^id1\\s", input, value = TRUE)
...
I'd like to accomplish this task entirely in R (no SQL, no perl script),
possibly without using loops.
Any suggestions are quite welcome!
Regards,
Paolo
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.