Hello,
Instead of read.table, use data.table::fread.
It can be an order of magnitude faster, and all you have to do is change
the function name: the arguments used here (text, skip, nrows, col.names)
are the same in both functions.
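A minimal sketch of that swap, assuming the data.table package is installed. The toy string and nrows = 3 are stand-ins for illustration; the question's example uses 5 rows per array.

```r
library(data.table)

# Toy stand-in for one block of the output file
wha <- "MOISTURE CONTENT
Z, IN
m X OR R DISTANCE, IN m
0.500
0.075 0.1475
0.225 0.1475
0.375 0.1475
blah"

lineNum <- 1 + 3  # header is on line 1; skip it plus the next 3 lines

# Same arguments, different function
mct_base  <- read.table(text = wha, skip = lineNum, nrows = 3,
                        col.names = c("depth", "wc"))
mct_fread <- fread(text = wha, skip = lineNum, nrows = 3,
                   col.names = c("depth", "wc"))

mct_fread
```

Note that fread returns a data.table rather than a plain data.frame, which is what the original question wanted to end up with anyway.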
Hope this helps,
Rui Barradas
At 20:18 on 24/07/19, Rui Barradas wrote:
Hello,
This is far from a complete answer.
A quick one: no loops are needed to find the matching lines.
mc_list2 <- grep(srchStr1, lines)
tmp_list2 <- grep(srchStr2, lines)
identical(mc_list, mc_list2) # [1] TRUE
identical(tmp_list, tmp_list2) # [1] TRUE
Another one: don't extend lists or vectors inside loops, reserve memory
beforehand.
wc <- vector("list", length = length(mc_list))
tmp <- vector("list", length = length(tmp_list))
are much better than your
wc <- list()
tmp <- list()
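A small sketch of how the preallocated list gets filled. The values in mc_list and the start_line column are made up for illustration; in the real script the loop body would be the read.table call.

```r
# Hypothetical match positions, standing in for the result of grep()
mc_list <- c(6L, 32L)

# Reserve one slot per match up front
wc <- vector("list", length = length(mc_list))

# Fill the slots by index; seq_along() also copes with zero matches,
# whereas 1:length(x) would iterate over c(1, 0)
for (i in seq_along(mc_list)) {
  wc[[i]] <- data.frame(start_line = mc_list[i] + 4L)
}

length(wc)
```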
Maybe I will find ways to save time with the really slow instructions.
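One possible direction, offered as a sketch rather than a tested fix for the real file: read.table(text = wha, skip = ...) has to start from the top of the text on every call, which is probably where most of the time goes. Since the lines are already in memory after readLines, each array can be parsed from just its own rows. The toy input, n_rows and the helper read_block below are hypothetical names for illustration.

```r
# Small stand-in for the output file, same layout as the question's example
lines <- c(
  "TIME STEP 0",
  "MOISTURE CONTENT",
  "Z, IN",
  "m X OR R DISTANCE, IN m",
  "0.500",
  "0.075 0.1475",
  "0.225 0.1475",
  "0.375 0.1475",
  "blah",
  "TIME STEP 1",
  "MOISTURE CONTENT",
  "Z, IN",
  "m X OR R DISTANCE, IN m",
  "0.500",
  "0.075 0.1875",
  "0.225 0.1775",
  "0.375 0.1575",
  "blah"
)
n_rows <- 3L  # rows per array in this toy input (5 in the question)

# Locate every header once, using a literal (non-regex) match
starts <- grep("MOISTURE CONTENT", lines, fixed = TRUE)

# Parse only the rows belonging to one array
read_block <- function(start, value_name) {
  rows <- lines[(start + 4L):(start + 3L + n_rows)]
  read.table(text = rows, col.names = c("depth", value_name))
}

wc <- lapply(starts, read_block, value_name = "wc")
wc[[2]]
```

lapply returns a list of the right length, so there is nothing to preallocate. The same helper would serve for the temperature arrays by passing the other search string to grep and value_name = "tmp".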
Hope this helps,
Rui Barradas
At 19:54 on 24/07/19, Morway, Eric via R-help wrote:
The small reproducible example below works, but is way too slow on the
real problem. The real problem is attempting to extract ~2920 repeated
arrays from a 60 MB file and takes ~80 minutes. I'm wondering how I might
re-engineer the script to avoid opening and closing the file 2920 times,
as is the case now. That is, is there a way to keep the file open and peel
out the arrays and stuff them into a list of data.tables, as is done in
the small reproducible example below, but in a significantly faster way?
wha <- " INITIAL PRESSURE HEAD
INITIAL TEMPERATURE SET TO 4.000E+00 DEGREES C
VS2DH - MedSand for TL test
TOTAL ELAPSED TIME = 0.000000E+00 sec
TIME STEP 0
MOISTURE CONTENT
Z, IN
m X OR R DISTANCE, IN m
0.500
0.075 0.1475
0.225 0.1475
0.375 0.1475
0.525 0.1475
0.675 0.1475
blah
blah
blah
TEMPERATURE, IN DECREES C
Z, IN
m X OR R DISTANCE, IN m
0.500
0.075 1.1475
0.225 2.1475
0.375 3.1475
0.525 4.1475
0.675 5.1475
blah
blah
blah
TOTAL ELAPSED TIME = 8.6400E+04 sec
TIME STEP 0
MOISTURE CONTENT
Z, IN
m X OR R DISTANCE, IN m
0.500
0.075 0.1875
0.225 0.1775
0.375 0.1575
0.525 0.1675
0.675 0.1475
blah
blah
blah
TEMPERATURE, IN DECREES C
Z, IN
m X OR R DISTANCE, IN m
0.500
0.075 1.1475
0.225 2.1475
0.375 3.1475
0.525 4.1475
0.675 5.1475
blah
blah
blah"
example_content <- textConnection(wha)
srchStr1 <- 'MOISTURE CONTENT'
srchStr2 <- 'TEMPERATURE, IN DECREES C'
lines <- readLines(example_content)
mc_list <- NULL
for (i in 1:length(lines)){
  # Look for start of water content
  if(grepl(srchStr1, lines[i])){
    mc_list <- c(mc_list, i)
  }
}
tmp_list <- NULL
for (i in 1:length(lines)){
  # Look for start of temperature data
  if(grepl(srchStr2, lines[i])){
    tmp_list <- c(tmp_list, i)
  }
}
# Store the water content arrays
wc <- list()
# Read all the moisture content profiles
for(i in 1:length(mc_list)){
  lineNum <- mc_list[i] + 3
  mct <- read.table(text = wha, skip=lineNum, nrows=5,
                    col.names=c('depth','wc'))
  wc[[i]] <- mct
}
# Store the water temperature arrays
tmp <- list()
# Read all the temperature profiles
for(i in 1:length(tmp_list)){
  lineNum <- tmp_list[i] + 3
  tmpt <- read.table(text = wha, skip=lineNum, nrows=5,
                     col.names=c('depth','tmp'))
  tmp[[i]] <- tmpt
}
# quick inspection
length(wc)
wc[[1]]
# Looks like what I'm after, but too slow in real world problem
______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.