Re: [R] Parsing

Paolo Sonego Wed, 09 Jul 2008 08:01:26 -0700

Thanks so much Jim! It works without a glitch!

My only problem is that the text files to be parsed are quite big, up to several thousand rows (my apologies for the incomplete information in my former post), so loops are not my first choice. I'll take a look at 'lapply' using your code as a model. Thanks again!


Sincerely,
Paolo

jim holtman wrote:

This should do what you want: (it uses loops; you can work at
replacing those with 'lapply' and such -- it all depends on if it is
going to take you more time to rewrite the code than to process a set
of data; you never did say how large the data was).  This also "grows"
a data.frame, but you have not indicated how efficient it has to be.
So this could be used as a model.

x <- readLines(textConnection("x      x_string
y      y_string
id1    id1_string
id2    id2_string
z      z_string
w      w_string
stuff  stuff  stuff
stuff  stuff  stuff
stuff  stuff  stuff
//
x      x_string1
y      y_string1
z      z_string1
w      w_string1
stuff  stuff  stuff
stuff  stuff  stuff
stuff  stuff  stuff
//
x      x_string2
y      y_string2
id1    id1_string1
id2    id2_string1
z      z_string2
w      w_string2
stuff  stuff  stuff
stuff  stuff  stuff
stuff  stuff  stuff
//"))

# I assume that each group is delimited by "//"
# initialize data.frame with desired values
.keys <- data.frame(x=NA, y=NA, id1=NA, id2=NA, w=NA)
.out <- .keys  # for the first pass
.save <- NULL
for (i in seq_along(x)){
    if (x[i] == "//"){  # output the current data
        .save <- rbind(.save, .out)
        .out <- .keys    # setup for the next pass
    } else {
        .split <- strsplit(x[i], "\\s+")
        if (.split[[1]][1] %in% names(.out)){
            .out[[.split[[1]][1]]] <- .split[[1]][2]
        }
    }
}

.save

          x         y         id1         id2         w
1  x_string  y_string  id1_string  id2_string  w_string
2 x_string1 y_string1        <NA>        <NA> w_string1
3 x_string2 y_string2 id1_string1 id2_string1 w_string2
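
For larger files, the loop above could be replaced by splitting the lines into records first and then parsing each record with 'lapply', which also avoids growing a data.frame row by row. The following is only a sketch, not part of the original reply: the names keys, rec, records and parse_record are made up for illustration, the sample input is a shortened copy of the example data, and it assumes that every field line starts with its key with no leading whitespace and that the value is the second whitespace-separated token.

```r
# Shortened sample input in the layout used in this thread (assumed layout).
x <- c("x      x_string", "y      y_string", "id1    id1_string",
       "id2    id2_string", "z      z_string", "w      w_string",
       "stuff  stuff  stuff", "//",
       "x      x_string1", "y      y_string1", "z      z_string1",
       "w      w_string1", "stuff  stuff  stuff", "//")

# Fields to extract (hypothetical name; adjust to the fields you need).
keys <- c("x", "y", "id1", "id2", "z", "w")

is_sep <- x == "//"
# Record number of every line: number of separators seen before it, plus one.
rec <- cumsum(is_sep) - is_sep + 1L
# One character vector of lines per record, separators dropped.
records <- split(x[!is_sep], rec[!is_sep])

parse_record <- function(lines) {
  parts <- strsplit(lines, "\\s+")
  k <- vapply(parts, `[`, character(1), 1)  # first token: the key
  v <- vapply(parts, `[`, character(1), 2)  # second token: the value
  # match() gives NA for keys missing from this record, so v[NA] is NA
  v[match(keys, k)]
}

# Bind all records at once instead of calling rbind inside a loop.
out <- do.call(rbind, lapply(records, parse_record))
colnames(out) <- keys
out
```

The result is a character matrix with one row per record; as.data.frame(out) would turn it into a data.frame if that is preferred.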


On Wed, Jul 9, 2008 at 5:33 AM, Paolo Sonego <[EMAIL PROTECTED]> wrote:

Dear R users,

I have a big text file formatted like this:

x      x_string
y      y_string
id1    id1_string
id2    id2_string
z      z_string
w      w_string
stuff  stuff  stuff
stuff  stuff  stuff
stuff  stuff  stuff
//
x      x_string1
y      y_string1
z      z_string1
w      w_string1
stuff  stuff  stuff
stuff  stuff  stuff
stuff  stuff  stuff
//
x      x_string2
y      y_string2
id1    id1_string1
id2    id2_string1
z      z_string2
w      w_string2
stuff  stuff  stuff
stuff  stuff  stuff
stuff  stuff  stuff
//
...
...


I'd like to parse this file and retrieve the x, y, id1, id2, z, w fields and
save them into a matrix object:

x         y         id1         id2         z         w
x_string  y_string  id1_string  id2_string  z_string  w_string
x_string1 y_string1 NA          NA          z_string1 w_string1
x_string2 y_string2 id1_string1 id2_string1 z_string2 w_string2
...
...

The id1 and id2 fields are not always present within a section (the interval
between x and the last stuff line), and
I'd like to insert an NA when they are absent (see above) so that
length(x)==length(y)==length(id1)==... .

Without the id1, id2 fields the task is easily solved by importing the text
file with readLines and retrieving the individual fields with grep:

input = readLines("file.txt")
x = grep("^x\\s", input, value = TRUE)
id1 = grep("^id1\\s", input, value = TRUE)
...

I'd like to accomplish this task entirely in R (no SQL, no perl script),
 possibly without using loops.
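
One loop-free way to keep the grep idea while coping with optional fields might be to number the sections first and then place each grep hit in the row of its section, leaving NA where a field is absent. This is a rough sketch under assumptions: the names keys, rec, n_rec and get_field are invented for illustration, the sample vector stands in for readLines("file.txt"), and every section is assumed to end with a line that is exactly "//".

```r
# Stand-in for readLines("file.txt"): a shortened copy of the example above.
input <- c("x      x_string", "y      y_string", "id1    id1_string",
           "id2    id2_string", "z      z_string", "w      w_string",
           "stuff  stuff  stuff", "//",
           "x      x_string1", "y      y_string1", "z      z_string1",
           "w      w_string1", "stuff  stuff  stuff", "//")

keys <- c("x", "y", "id1", "id2", "z", "w")

is_sep <- input == "//"
# Section number of every line: separators seen before it, plus one.
rec <- cumsum(is_sep) - is_sep + 1L
n_rec <- max(rec[!is_sep])

get_field <- function(key) {
  # Positions (not values) of the lines that start with this key.
  hit <- grep(paste0("^", key, "\\s"), input)
  col <- rep(NA_character_, n_rec)
  # Drop the key and the following whitespace; store by section number.
  col[rec[hit]] <- sub("^\\S+\\s+", "", input[hit])
  col
}

# One column per key; sections lacking a key keep NA in that column.
res <- do.call(cbind, setNames(lapply(keys, get_field), keys))
res
```

Unlike splitting on whitespace and taking the second token, sub() here keeps everything after the key, so values containing spaces stay intact. All columns have the same length by construction, which is the length(x)==length(y)==... property asked for above.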

Any suggestions are quite welcome!

Regards,
Paolo

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

