?readLines
?grep
?textConnection

On July 24, 2019 11:54:07 AM PDT, "Morway, Eric via R-help" <r-help@r-project.org> wrote:
>The small reproducible example below works, but is way too slow on the real
>problem. The real problem is attempting to extract ~2920 repeated arrays
>from a 60 Mb file and takes ~80 minutes. I'm wondering how I might
>re-engineer the script to avoid opening and closing the file 2920 times as
>is the case now. That is, is there a way to keep the file open and peel
>out the arrays and stuff them into a list of data.tables, as is done in the
>small reproducible example below, but in a significantly faster way?
>
>wha <- " INITIAL PRESSURE HEAD
> INITIAL TEMPERATURE SET TO 4.000E+00 DEGREES C
> VS2DH - MedSand for TL test
>
> TOTAL ELAPSED TIME = 0.000000E+00 sec
> TIME STEP 0
>
> MOISTURE CONTENT
> Z, IN
> m X OR R DISTANCE, IN m
> 0.500
> 0.075 0.1475
> 0.225 0.1475
> 0.375 0.1475
> 0.525 0.1475
> 0.675 0.1475
>blah
>blah
>blah
> TEMPERATURE, IN DECREES C
> Z, IN
> m X OR R DISTANCE, IN m
> 0.500
> 0.075 1.1475
> 0.225 2.1475
> 0.375 3.1475
> 0.525 4.1475
> 0.675 5.1475
>blah
>blah
>blah
>
> TOTAL ELAPSED TIME = 8.6400E+04 sec
> TIME STEP 0
>
> MOISTURE CONTENT
> Z, IN
> m X OR R DISTANCE, IN m
> 0.500
> 0.075 0.1875
> 0.225 0.1775
> 0.375 0.1575
> 0.525 0.1675
> 0.675 0.1475
>blah
>blah
>blah TEMPERATURE, IN DECREES C
> Z, IN
> m X OR R DISTANCE, IN m
> 0.500
> 0.075 1.1475
> 0.225 2.1475
> 0.375 3.1475
> 0.525 4.1475
> 0.675 5.1475
>blah
>blah
>blah"
>
>example_content <- textConnection(wha)
>
>srchStr1 <- ' MOISTURE CONTENT'
>srchStr2 <- 'TEMPERATURE, IN DECREES C'
>
>lines <- readLines(example_content)
>mc_list <- NULL
>for (i in 1:length(lines)){
>  # Look for start of water content
>  if(grepl(srchStr1, lines[i])){
>    mc_list <- c(mc_list, i)
>  }
>}
>
>tmp_list <- NULL
>for (i in 1:length(lines)){
>  # Look for start of temperature data
>  if(grepl(srchStr2, lines[i])){
>    tmp_list <- c(tmp_list, i)
>  }
>}
>
># Store the water content arrays
>wc <- list()
># Read all the moisture content profiles
>for(i in 1:length(mc_list)){
>  lineNum <- mc_list[i] + 3
>  mct <- read.table(text = wha, skip=lineNum, nrows=5,
>                    col.names=c('depth','wc'))
>  wc[[i]] <- mct
>}
>
># Store the water temperature arrays
>tmp <- list()
># Read all the temperature profiles
>for(i in 1:length(tmp_list)){
>  lineNum <- tmp_list[i] + 3
>  tmpt <- read.table(text = wha, skip=lineNum, nrows=5,
>                     col.names=c('depth','tmp'))
>  tmp[[i]] <- tmpt
>}
>
># quick inspection
>length(wc)
>wc[[1]]
># Looks like what I'm after, but too slow in real world problem
>
>	[[alternative HTML version deleted]]
>
>______________________________________________
>R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>https://stat.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting guide
>http://www.R-project.org/posting-guide.html
>and provide commented, minimal, self-contained, reproducible code.
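
One way the help pages named at the top might fit together, as a rough sketch rather than a tested solution for the real file: read everything once with readLines(), locate all header lines in a single vectorised grep(), and hand only the few data lines of each block to read.table(text = ...). The likely cost in the quoted script is that every read.table(text = wha, skip = ...) call scans the whole text again. The helper name extract_blocks(), its arguments, and the shortened two-row sample data are illustrative assumptions, not from the original post; for the real file, lines would come from something like readLines("your_output_file"), with n_rows set to the true block length.

```r
# Hypothetical helper (not from the original post): pull every block that
# follows a given header out of a character vector of lines.
extract_blocks <- function(lines, header, col_names, offset = 3L, n_rows = 5L) {
  # One vectorised pass finds all header lines (fixed = TRUE: literal match).
  starts <- grep(header, lines, fixed = TRUE)
  lapply(starts, function(s) {
    # Data begin 'offset' lines after the header and run for 'n_rows' lines.
    block <- lines[(s + offset + 1L):(s + offset + n_rows)]
    # Parse only these few lines, not the whole file.
    read.table(text = block, col.names = col_names)
  })
}

# Shortened stand-in for the real file (assumed layout, two data rows per block).
# For a file on disk: lines <- readLines("your_output_file")
lines <- c(
  " TOTAL ELAPSED TIME = 0.000000E+00 sec",
  " MOISTURE CONTENT",
  " Z, IN",
  " m X OR R DISTANCE, IN m",
  " 0.500",
  " 0.075 0.1475",
  " 0.225 0.1475",
  "blah",
  " TEMPERATURE, IN DECREES C",
  " Z, IN",
  " m X OR R DISTANCE, IN m",
  " 0.500",
  " 0.075 1.1475",
  " 0.225 2.1475",
  "blah",
  " TOTAL ELAPSED TIME = 8.6400E+04 sec",
  " MOISTURE CONTENT",
  " Z, IN",
  " m X OR R DISTANCE, IN m",
  " 0.500",
  " 0.075 0.1875",
  " 0.225 0.1775",
  "blah"
)

wc  <- extract_blocks(lines, "MOISTURE CONTENT", c("depth", "wc"), n_rows = 2L)
tmp <- extract_blocks(lines, "TEMPERATURE, IN DECREES C", c("depth", "tmp"), n_rows = 2L)

length(wc)
wc[[1]]
```

If data.tables are wanted, each element could be converted afterwards, e.g. with data.table::as.data.table(); that step is left out here to keep the sketch in base R.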
--
Sent from my phone. Please excuse my brevity.