Hello,

Instead of read.table use

data.table::fread

It's an order of magnitude faster, and all you have to do is change the function name; all the arguments are the same (in this case).
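As a rough illustration of the swap, the sketch below parses the same three data lines with read.table and with data.table::fread and compares the results. It assumes the data.table package is installed; the names block, base_df and fast_dt are made up for this example, and only the data lines are passed in, rather than the whole text with skip and nrows, to keep the sketch small.

```r
# Three data lines in the same layout as the arrays in the original post.
block <- trimws(c("      0.075     0.1475",
                  "      0.225     0.1475",
                  "      0.375     0.1475"))

# Base R version.
base_df <- read.table(text = block, col.names = c("depth", "wc"))

# fread version; guarded because data.table is an assumption, not base R.
if (requireNamespace("data.table", quietly = TRUE)) {
  fast_dt <- data.table::fread(text = block, sep = " ", header = FALSE,
                               col.names = c("depth", "wc"))
  # Compare the two results column by column.
  print(all.equal(base_df, as.data.frame(fast_dt)))
}
```

Note that fread returns a data.table rather than a plain data.frame, which already fits the "list of data.tables" asked for in the original post.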


Hope this helps,

Rui Barradas

At 20:18 on 24/07/19, Rui Barradas wrote:
Hello,

This is far from a complete answer.

A quick one: no loops.

mc_list2 <- grep(srchStr1, lines)
tmp_list2 <- grep(srchStr2, lines)

identical(mc_list, mc_list2)    # [1] TRUE
identical(tmp_list, tmp_list2)  # [1] TRUE
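To see why the loop is unnecessary, here is a small self-contained sketch on a made-up vector x: grep already returns the indices of all matching elements in one call. The fixed = TRUE argument is an addition of mine, not part of the code above; it treats the pattern as a literal string instead of a regular expression, which is usually a little faster when no regex features are needed.

```r
# Toy stand-in for the lines read from the file (hypothetical content).
x <- c("alpha", "  MOISTURE CONTENT", "beta", "  MOISTURE CONTENT")

# Loop version, as in the original script: grows the result one hit at a time.
hits_loop <- NULL
for (i in seq_along(x)) {
  if (grepl("MOISTURE CONTENT", x[i])) hits_loop <- c(hits_loop, i)
}

# Vectorised version: one call, no loop.
hits_vec <- grep("MOISTURE CONTENT", x, fixed = TRUE)

identical(hits_loop, hits_vec)
```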


Another one: don't extend lists or vectors inside loops; reserve memory beforehand.

wc <- vector("list", length = length(mc_list))
tmp <- vector("list", length = length(tmp_list))


are much better than your

wc <- list()
tmp <- list()
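A minimal sketch of the difference, using a toy computation in place of the real parsing (the names n, squares and squares2 are only for illustration): the list is created at its final length once and then filled by index, or the loop is replaced by lapply, which allocates the result for you.

```r
n <- 5L

# Preallocate a list of the final length, then fill it in place.
squares <- vector("list", length = n)
for (i in seq_len(n)) {
  squares[[i]] <- i^2
}

# Equivalent without an explicit loop: lapply builds the list itself.
squares2 <- lapply(seq_len(n), function(i) i^2)

identical(squares, squares2)
```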


Maybe I will find ways to save time with the really slow instructions.
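One possible direction, offered only as a sketch and not timed on the real file: the slow part is most likely that every read.table(text = wha, skip = ...) call goes through the whole text again. Since the lines are already in memory after readLines, each array can be parsed from just its five lines. The helper read_block and the toy lines vector below are hypothetical; the offset of 4 lines after the header and the 5 rows per array are taken from the example in the original post.

```r
# Toy stand-in for readLines() output, with two moisture-content blocks.
lines <- c(
  "      MOISTURE CONTENT", "   Z, IN",
  "   m                       X OR R DISTANCE, IN m",
  "                 0.500",
  "      0.075     0.1475", "      0.225     0.1475", "      0.375     0.1475",
  "      0.525     0.1475", "      0.675     0.1475", "blah",
  "      MOISTURE CONTENT", "   Z, IN",
  "   m                       X OR R DISTANCE, IN m",
  "                 0.500",
  "      0.075     0.1875", "      0.225     0.1775", "      0.375     0.1575",
  "      0.525     0.1675", "      0.675     0.1475", "blah"
)

# Hypothetical helper: parse the n_rows data lines that start 4 lines
# after the header line found at index `start`.
read_block <- function(lines, start, value_name, n_rows = 5L) {
  idx <- seq(start + 4L, length.out = n_rows)
  read.table(text = lines[idx], col.names = c("depth", value_name))
}

# Locate every block once, then parse only the lines of each block.
starts <- grep("MOISTURE CONTENT", lines, fixed = TRUE)
wc <- lapply(starts, function(s) read_block(lines, s, "wc"))

length(wc)
wc[[2]]
```

The same helper could be reused for the temperature arrays by passing the temperature start indices and "tmp" as value_name.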

Hope this helps,

Rui Barradas


At 19:54 on 24/07/19, Morway, Eric via R-help wrote:
The small reproducible example below works, but is way too slow on the real
problem.  The real problem is attempting to extract ~2920 repeated arrays
from a 60 Mb file and takes ~80 minutes.  I'm wondering how I might
re-engineer the script to avoid opening and closing the file 2920 times as
is the case now.  That is, is there a way to keep the file open and peel
out the arrays and stuff them into a list of data.tables, as is done in the
small reproducible example below, but in a significantly faster way?

wha <- "     INITIAL PRESSURE HEAD
      INITIAL TEMPERATURE SET TO 4.000E+00 DEGREES C
      VS2DH - MedSand for TL test

      TOTAL ELAPSED TIME =  0.000000E+00 sec
      TIME STEP         0

      MOISTURE CONTENT
   Z, IN
   m                       X OR R DISTANCE, IN m
                 0.500
      0.075     0.1475
      0.225     0.1475
      0.375     0.1475
      0.525     0.1475
      0.675     0.1475
blah
blah
blah
      TEMPERATURE, IN DECREES C
   Z, IN
   m                       X OR R DISTANCE, IN m
                 0.500
      0.075     1.1475
      0.225     2.1475
      0.375     3.1475
      0.525     4.1475
      0.675     5.1475
blah
blah
blah

      TOTAL ELAPSED TIME =  8.6400E+04 sec
      TIME STEP         0

      MOISTURE CONTENT
   Z, IN
   m                       X OR R DISTANCE, IN m
                 0.500
      0.075     0.1875
      0.225     0.1775
      0.375     0.1575
      0.525     0.1675
      0.675     0.1475
blah
blah
blah
      TEMPERATURE, IN DECREES C
   Z, IN
   m                       X OR R DISTANCE, IN m
                 0.500
      0.075     1.1475
      0.225     2.1475
      0.375     3.1475
      0.525     4.1475
      0.675     5.1475
blah
blah
blah"

example_content <- textConnection(wha)

srchStr1 <- '     MOISTURE CONTENT'
srchStr2 <- 'TEMPERATURE, IN DECREES C'

lines   <- readLines(example_content)
mc_list <- NULL
for (i in 1:length(lines)){
   # Look for start of water content
   if(grepl(srchStr1, lines[i])){
     mc_list <- c(mc_list, i)
   }
}

tmp_list <- NULL
for (i in 1:length(lines)){
   # Look for start of temperature data
   if(grepl(srchStr2, lines[i])){
     tmp_list <- c(tmp_list, i)
   }
}

# Store the water content arrays
wc <- list()
# Read all the moisture content profiles
for(i in 1:length(mc_list)){
   lineNum <- mc_list[i] + 3
   mct <- read.table(text = wha, skip=lineNum, nrows=5,
                     col.names=c('depth','wc'))
   wc[[i]] <- mct
}

# Store the water temperature arrays
tmp <- list()
# Read all the temperature profiles
for(i in 1:length(tmp_list)){
   lineNum <- tmp_list[i] + 3
   tmpt <- read.table(text = wha, skip=lineNum, nrows=5,
                     col.names=c('depth','tmp'))
   tmp[[i]] <- tmpt
}

# quick inspection
length(wc)
wc[[1]]
# Looks like what I'm after, but too slow in real world problem


______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


