Hi:

This should get you most of the way there; I'll let you figure out how to
assign the BLOCK and RUN numbers.

tx <- "Subject ID,ExperimentName,2010-04-23,32:34:23,Version 0.4, 640 by
960  pixels, On Device M, M, 3.2.4,
[email protected],592,820,3.35,[email protected]
,332,878,3.66,
[email protected],334,824,3.28,[email protected]
,598,878,3.50,
[email protected],630,856,8.03,[email protected]
,98,846,4.09,
[email protected],636,902,3.28,[email protected]
,570,894,1.26,[email protected],322,842,32.96,
[email protected],606,908,3.32,[email protected]
,624,934,2.58,
[email protected],20,828,5.36,[email protected],580,830,4.36,
[email protected],586,806,0.08,[email protected]
,350,854,3.84,
[email protected],570,894,1.26,[email protected],542,840,4.44,
[email protected],632,844,2.47,[email protected]
,96,930,3.63,
[email protected],306,850,2.58,[email protected]
,304,842,3.34,
[email protected],622,884,3.84,[email protected]
,90,858,3.56,
[email protected],570,894,1.26,[email protected],320,852,4.04,"
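
If the string lives in a file such as example.txt rather than being pasted into the session, one possible way to get it into a single character string is sketched below. The temporary file, its toy contents, the name tx_file, and the assumed ZZ_<id>_<Tx>_<Ty>_<trt>@9z.svg record layout are all made up for illustration; with real data you would pass your own file name to readLines().

```r
# Stand-in for example.txt: write a short, hypothetical string to a temp file
path <- tempfile(fileext = ".txt")
writeLines(c("header,ZZ_1_10_20_TRT@9z.svg", ",5,6,7.5,"), path)

# Read all lines and glue them together with no separator, so that line
# breaks inside the file do not end up inside the pieces produced by strsplit()
tx_file <- paste(readLines(path), collapse = "")
tx_file
```

Collapsing with an empty separator matters here because the records wrap across lines in the file; a stray newline would otherwise survive as an extra element after splitting on ','.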

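# Begin by splitting the text string tx on 'ZZ_'; this produces a list.
# Then use lapply to split each piece again on '@9z.svg', and remove the first
# element (basically the first row of tx above).
# fixed = TRUE treats the separators as literal text rather than as regular
# expressions (the '.' in '@9z.svg' would otherwise match any character).
lst <- strsplit(tx, 'ZZ_', fixed = TRUE)
lst2 <- lapply(lst[[1]], strsplit, '@9z.svg', fixed = TRUE)
lst2[[1]] <- NULL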
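
To see what the two-stage split produces, here is a small sketch on a made-up string. The record layout ZZ_<id>_<Tx>_<Ty>_<trt>@9z.svg,<x>,<y>,<Y>, is an assumption for illustration, and the names toy, parts, recs and rec1 are hypothetical.

```r
# Hypothetical miniature input: a header followed by two records
toy <- "header,ZZ_1_10_20_TRT@9z.svg,5,6,7.5,ZZ_2_30_40_CON@9z.svg,8,9,1.5,"

# Stage 1: split on the literal 'ZZ_'; the first piece is the header
parts <- strsplit(toy, "ZZ_", fixed = TRUE)[[1]]

# Stage 2: drop the header, then split each record on the literal '@9z.svg'
recs <- lapply(parts[-1], strsplit, "@9z.svg", fixed = TRUE)

# Each record is now a pair: the '_'-separated part and the ','-separated part
rec1 <- unlist(recs[[1]])
rec1
```

The second string of each pair starts with a comma, which is why a later split on ',' yields an empty first element that has to be dropped.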

# This is a function that breaks up the two strings, one separated by '_', the
# other by ','.
# If a third string exists in a list component, it is ignored.
stringBreak <- function(svec) {
  svec <- unlist(svec)
  u <- svec[1]
  v <- svec[2]

  us <- unlist(strsplit(u, '_'))
  # since this string starts with a comma, remove the first empty string
  vs <- unlist(strsplit(v, ','))[-1]
  # check for presence of 'BLOCK' string (a fourth element in vs)
  endblock <- if (length(vs) == 4) 1 else 0
  # write elements to a one-line data frame;
  # the image number is the first '_'-separated field
  data.frame(ImgNam = as.numeric(us[1]),
    Tx = as.numeric(us[2]),
    Ty = as.numeric(us[3]),
    Treatment = us[4],
    x = as.numeric(vs[1]),
    y = as.numeric(vs[2]),
    Y = as.numeric(vs[3]),
    endblock = endblock)
}

# Slurp into a data frame:

# Method 1: package plyr
library(plyr)
df0 <- ldply(lst2, stringBreak)

# Method 2: do.call()
df0 <- do.call(rbind, lapply(lst2, stringBreak))

Result:
> ldply(lst2, stringBreak)
   ImgNam  Tx  Ty Treatment   x   y    Y endblock
1     373 462 488       TRT 592 820 3.35        0
2      32 288 436       CON 332 878 3.66        0
3     384 204 433       TRT 334 824 3.28        0
4     365 575 683       TRT 598 878 3.50        0
5       5 480 239       CON 630 856 8.03        0
6      30 423 394       CON  98 846 4.09        0
7      33 596 398       CON 636 902 3.28        0
8     263  64 320       TRT 570 894 1.26        1
9       4  88 403       CON 606 908 3.32        0
10    703 546 434       CON 624 934 2.58        0
11    712 348 543       CON  20 828 5.36        0
12      5  48 239       CON 580 830 4.36        0
13    310 444 623       TRT 586 806 0.08        0
14     30 423 394       CON 350 854 3.84        0
15    340 382 539       TRT 570 894 1.26        1
16    345 230 662       TRT 632 844 2.47        0
17      6 335 309       CON  96 930 3.63        0
18    782 346 746       TRT 306 850 2.58        0
19    334 200 333       TRT 304 842 3.34        0
20    383 506 726       TRT 622 884 3.84        0
21    294 360 448       TRT  90 858 3.56        0
22    334 335 473       TRT 570 894 1.26        1
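
As for the BLOCK and RUN numbers, one possible approach (a sketch, not the only way) is to count the end-of-block flags with cumsum() and then number the rows within each block. The flag vector below is a made-up example standing in for the 0/1 flag column of the result above.

```r
# Hypothetical flag vector: 1 marks the last row of a block
endblock <- c(0, 0, 1, 0, 1, 0, 0)

# A row's block number is 1 plus the number of block ends seen *before* it,
# so shift the flags down by one row before accumulating
BLOCK <- cumsum(c(0, head(endblock, -1))) + 1

# RUN restarts at 1 within each block
RUN <- ave(seq_along(BLOCK), BLOCK, FUN = seq_along)

data.frame(endblock, BLOCK, RUN)
```

Shifting the flags by one row keeps the flagged row itself in the block it closes; the row after it is the first run of the next block.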

HTH,
Dennis


On Sun, Mar 6, 2011 at 7:13 PM, Eric Fail <[email protected]> wrote:

> Dear R-list,
>
> I have a partly comma-separated, partly underscore-separated string that I
> am trying to parse into R.
>
> Furthermore I have a bunch of them, and they are quite long. I have now
> spent most of my Sunday trying to figure this out and thought I would try
> the list to see if someone here would be able to get me started.
>
> My data structure looks like this,
>
> (in an example.txt file)
> Subject ID,ExperimentName,2010-04-23,32:34:23,Version 0.4, 640 by 960
>  pixels, On Device M, M, 3.2.4,[email protected]
> ,592,820,3.35,[email protected]
> ,332,878,3.66,[email protected]
> ,334,824,3.28,[email protected]
> ,598,878,3.50,[email protected]
> ,630,856,8.03,[email protected]
> ,98,846,4.09,[email protected]
> ,636,902,3.28,[email protected],570,894,1.26,[email protected]
> ,322,842,32.96,[email protected]
> ,606,908,3.32,[email protected]
> ,624,934,2.58,[email protected]
> ,20,828,5.36,[email protected]
> ,580,830,4.36,[email protected]
> ,586,806,0.08,[email protected]
> ,350,854,3.84,[email protected],570,894,1.26,[email protected]
> ,542,840,4.44,[email protected]
> ,632,844,2.47,[email protected]
> ,96,930,3.63,[email protected]
> ,306,850,2.58,[email protected]
> ,304,842,3.34,[email protected]
> ,622,884,3.84,[email protected]
> ,90,858,3.56,[email protected],570,894,1.26,[email protected]
> ,320,852,4.04,
> (end of example.txt file)
>
> The above is approximately 5% of the length of a full file, and I have
> about 100 of them. Please note that the strings end with a comma.
>
> I am trying to parse it into something like this
>
> ID         ImgNam BLOCK RUN  Tx  Ty Treatment   x   y    Y
> Subject ID    373     1   1 462 488 TRT       592 820 3.35
> Subject ID     32     1   2 288 436 CON       332 878 3.66
> Subject ID    384     1   3 204 433 TRT       334 824 3.28
> Subject ID    365     1   4 575 683 TRT       598 878  3.5
> Subject ID      5     1   5 480 239 CON       630 856 8.03
> Subject ID     30     1   6 423 394 CON        98 846 4.09
> Subject ID     33     1   7 596 398 CON       636 902 3.28
> Subject ID    263     1   8  64 320 TRT       570 894 1.26
> Subject ID      4     2   1  88 403 CON       606 908 3.32
> Subject ID    703     2   2 546 434 CON       624 934 2.58
> Subject ID    712     2   3 348 543 CON        20 828 5.36
> Subject ID      5     2   4  48 239 CON       580 830 4.36
> Subject ID    310     2   5 444 623 TRT       586 806 0.08
> Subject ID     30     2   6 423 394 CON       350 854 3.84
> Subject ID    340     2   7 382 539 TRT       570 894 1.26
> Subject ID    345     3   1 230 662 TRT       632 844 2.47
> Subject ID      6     3   2 335 309 CON        96 930 3.63
> Subject ID    782     3   3 346 746 TRT       306 850 2.58
> Subject ID    334     3   4 200 333 TRT       304 842 3.34
> Subject ID    383     3   5 506 726 TRT       622 884 3.84
> Subject ID    294     3   6 360 448 TRT        90 858 3.56
> Subject ID    334     3   7 335 473 TRT       570 894 1.26
>
> I could do it in Excel, but it would take me a week, and it would be
> stupid. If someone could please help me get started I would very much
> appreciate it. It would not only benefit me, but my colleagues would see the
> benefit of R and the R-list in particular.
>
> Thanks in advance!
>
> Eric
>
> --
>
> ______________________________________________
> [email protected] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
