Hi: This should get you most of the way there; I'll let you figure out how to assign the BLOCK and RUN numbers.
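As a hint for the BLOCK and RUN part, here is a minimal sketch of one possible approach. It assumes an end-of-block flag like the endblock column produced by the code that follows; the short endblock vector here is made-up toy data, not taken from your file.

```r
# Toy stand-in for df0$endblock (hypothetical values):
# 1 marks the last trial of a block, 0 marks every other trial
endblock <- c(0, 0, 1, 0, 1, 0, 0)

# BLOCK: number of end-of-block flags seen *before* each row, plus one
BLOCK <- cumsum(c(0, head(endblock, -1))) + 1

# RUN: position of each row within its block
RUN <- ave(seq_along(BLOCK), BLOCK, FUN = seq_along)

data.frame(endblock, BLOCK, RUN)
```

The flag vector is shifted by one row before cumsum() so that the flagged row still belongs to the block it closes. With the real data you would pass df0$endblock instead of the toy vector.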

tx <- "Subject ID,ExperimentName,2010-04-23,32:34:23,Version 0.4, 640 by 960 pixels, On Device M, M, 3.2.4, zz_373_462_488_...@9z.svg,592,820,3.35,zz_032_288_436_...@9z.svg ,332,878,3.66, zz_384_204_433_...@9z.svg,334,824,3.28,zz_365_575_683_...@9z.svg ,598,878,3.50, zz_005_480_239_...@9z.svg,630,856,8.03,zz_030_423_394_...@9z.svg ,98,846,4.09, zz_033_596_398_...@9z.svg,636,902,3.28,zz_263_064_320_...@9z.svg ,570,894,1.26,bl...@9z.svg,322,842,32.96, zz_004_088_403_...@9z.svg,606,908,3.32,zz_703_546_434_...@9z.svg ,624,934,2.58, zz_712_348_543_...@9z.svg,20,828,5.36,zz_005_48_239_...@9z.svg,580,830,4.36, zz_310_444_623_...@9z.svg,586,806,0.08,zz_030_423_394_...@9z.svg ,350,854,3.84, zz_340_382_539_...@9z.svg,570,894,1.26,bl...@9z.svg,542,840,4.44, zz_345_230_662_...@9z.svg,632,844,2.47,zz_006_335_309_...@9z.svg ,96,930,3.63, zz_782_346_746_...@9z.svg,306,850,2.58,zz_334_200_333_...@9z.svg ,304,842,3.34, zz_383_506_726_...@9z.svg,622,884,3.84,zz_294_360_448_...@9z.svg ,90,858,3.56, zz_334_335_473_...@9z.svg,570,894,1.26,bl...@9z.svg,320,852,4.04,"

# Begin by splitting up the text string tx by 'zz_'; this produces a list.
# (The pattern is case-sensitive, so it must match the lowercase 'zz_' in tx.)
# Then use lapply to split again by @9z.svg and remove the first element
# (basically the first row of tx above).
lst <- strsplit(tx, 'zz_', fixed = TRUE)
lst2 <- lapply(lst[[1]], strsplit, '@9z.svg', fixed = TRUE)
lst2[[1]] <- NULL

# This is a function that breaks up the two strings, one separated by _,
# the other by ','.
# If a third string exists in a list component, it is ignored.
stringBreak <- function(svec) {
    svec <- unlist(svec)
    u <- svec[1]
    v <- svec[2]
    us <- unlist(strsplit(u, '_'))
    # this string starts (and may end) with a comma, so trim stray
    # whitespace and drop the resulting empty strings
    vs <- trimws(unlist(strsplit(v, ',')))
    vs <- vs[nzchar(vs)]
    # check for presence of 'BLOCK' string (a fourth element after x, y, Y)
    endblock <- if (length(vs) == 4) 1 else 0
    # write elements to a one-line data frame;
    # the image number is the first piece of the file name
    data.frame(ImgNam = as.numeric(us[1]),
               Tx = as.numeric(us[2]),
               Ty = as.numeric(us[3]),
               Treatment = us[4],
               x = as.numeric(vs[1]),
               y = as.numeric(vs[2]),
               Y = as.numeric(vs[3]),
               endblock = endblock)
}

# Slurp into a data frame:
# Method 1: package plyr
library(plyr)
df0 <- ldply(lst2, stringBreak)

# Method 2: do.call()
df0 <- do.call(rbind, lapply(lst2, stringBreak))

Result:

> ldply(lst2, stringBreak)
   ImgNam  Tx  Ty Treatment   x   y    Y endblock
1     373 462 488       TRT 592 820 3.35        0
2      32 288 436       CON 332 878 3.66        0
3     384 204 433       TRT 334 824 3.28        0
4     365 575 683       TRT 598 878 3.50        0
5       5 480 239       CON 630 856 8.03        0
6      30 423 394       CON  98 846 4.09        0
7      33 596 398       CON 636 902 3.28        0
8     263  64 320       TRT 570 894 1.26        1
9       4  88 403       CON 606 908 3.32        0
10    703 546 434       CON 624 934 2.58        0
11    712 348 543       CON  20 828 5.36        0
12      5  48 239       CON 580 830 4.36        0
13    310 444 623       TRT 586 806 0.08        0
14     30 423 394       CON 350 854 3.84        0
15    340 382 539       TRT 570 894 1.26        1
16    345 230 662       TRT 632 844 2.47        0
17      6 335 309       CON  96 930 3.63        0
18    782 346 746       TRT 306 850 2.58        0
19    334 200 333       TRT 304 842 3.34        0
20    383 506 726       TRT 622 884 3.84        0
21    294 360 448       TRT  90 858 3.56        0
22    334 335 473       TRT 570 894 1.26        1

HTH,
Dennis

On Sun, Mar 6, 2011 at 7:13 PM, Eric Fail <eric.f...@gmx.com> wrote:

> Dear R-list,
>
> I have a partly comma separated partly underscore separated string that I
> am trying to parse into R.
>
> Furthermore I have a bunch of them, and they are quite long. I have now
> spent most of my Sunday trying to figure this out and thought I would try
> the list to see if someone here would be able to get me started.
>
> My data structure looks like this,
>
> (in a example.txt file)
> Subject ID,ExperimentName,2010-04-23,32:34:23,Version 0.4, 640 by 960
> pixels, On Device M, M, 3.2.4,zz_373_462_488_...@9z.svg
> ,592,820,3.35,zz_032_288_436_...@9z.svg
> ,332,878,3.66,zz_384_204_433_...@9z.svg
> ,334,824,3.28,zz_365_575_683_...@9z.svg
> ,598,878,3.50,zz_005_480_239_...@9z.svg
> ,630,856,8.03,zz_030_423_394_...@9z.svg
> ,98,846,4.09,zz_033_596_398_...@9z.svg
> ,636,902,3.28,zz_263_064_320_...@9z.svg,570,894,1.26,bl...@9z.svg
> ,322,842,32.96,zz_004_088_403_...@9z.svg
> ,606,908,3.32,zz_703_546_434_...@9z.svg
> ,624,934,2.58,zz_712_348_543_...@9z.svg
> ,20,828,5.36,zz_005_48_239_...@9z.svg
> ,580,830,4.36,zz_310_444_623_...@9z.svg
> ,586,806,0.08,zz_030_423_394_...@9z.svg
> ,350,854,3.84,zz_340_382_539_...@9z.svg,570,894,1.26,bl...@9z.svg
> ,542,840,4.44,zz_345_230_662_...@9z.svg
> ,632,844,2.47,zz_006_335_309_...@9z.svg
> ,96,930,3.63,zz_782_346_746_...@9z.svg
> ,306,850,2.58,zz_334_200_333_...@9z.svg
> ,304,842,3.34,zz_383_506_726_...@9z.svg
> ,622,884,3.84,zz_294_360_448_...@9z.svg
> ,90,858,3.56,zz_334_335_473_...@9z.svg,570,894,1.26,bl...@9z.svg
> ,320,852,4.04,
> (end of example.txt file)
>
> The above is approximate 5% of the length of a full file, and then I got
> about 100 of them. Please note that the strings end with a comma.
>
> I am trying to parse it into something like this
>
> ID         ImgNam BLOCK RUN  Tx  Ty Treatment   x   y    Y
> Subject ID    373     1   1 462 488       TRT 592 820 3.35
> Subject ID     32     1   2 288 436       CON 332 878 3.66
> Subject ID    384     1   3 204 433       TRT 334 824 3.28
> Subject ID    365     1   4 575 683       TRT 598 878 3.5
> Subject ID      5     1   5 480 239       CON 630 856 8.03
> Subject ID     30     1   6 423 394       CON  98 846 4.09
> Subject ID     33     1   7 596 398       CON 636 902 3.28
> Subject ID    263     1   8  64 320       TRT 570 894 1.26
> Subject ID      4     2   1  88 403       CON 606 908 3.32
> Subject ID    703     2   2 546 434       CON 624 934 2.58
> Subject ID    712     2   3 348 543       CON  20 828 5.36
> Subject ID      5     2   4  48 239       CON 580 830 4.36
> Subject ID    310     2   5 444 623       TRT 586 806 0.08
> Subject ID     30     2   6 423 394       CON 350 854 3.84
> Subject ID    340     2   7 382 539       TRT 570 894 1.26
> Subject ID    345     3   1 230 662       TRT 632 844 2.47
> Subject ID      6     3   2 335 309       CON  96 930 3.63
> Subject ID    782     3   3 346 746       TRT 306 850 2.58
> Subject ID    334     3   4 200 333       TRT 304 842 3.34
> Subject ID    383     3   5 506 726       TRT 622 884 3.84
> Subject ID    294     3   6 360 448       TRT  90 858 3.56
> Subject ID    334     3   7 335 473       TRT 570 894 1.26
>
> I could do it in Excel, but it would take me a week--and it would be
> stupid--if someone could please help me get started I would very much
> appreciate it. It would not only benefit me, but my colleagues would see the
> benefit of R and the R-list in particular.
>
> Thanks in advance!
>
> Eric
>
> --
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.