I have a data.table, df, with dimensions 100 by 10^7.

When I do

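    library(data.table)  # df is a data.table

    # Stratified 90/10 split on the outcome column `status`;
    # list = FALSE returns a one-column matrix of row numbers
    trainIndex <-
      caret::createDataPartition(
        df$status,
        p = .9,
        list = FALSE,
        times = 1
      )
    outerTrain <- df[trainIndex]   # rows selected for training
    outerTest  <- df[-trainIndex]  # all remaining rows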

Subsetting the rows of df takes over 20 minutes.
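A scaled-down, self-contained version of the same steps, in the spirit of the posting guide's request for reproducible code, might look like the sketch below. The row and column counts, the generated columns `V1`...`V10`, and the two-level `status` factor are assumptions for illustration only, not the real data. The sketch drops the one-column index matrix to a plain integer vector and uses `system.time()` to time each subset separately, so the slow step can be pinned down.

```r
# Scaled-down reproduction (assumption: `status` is a two-level factor;
# the real table is far larger and its columns are unknown here).
library(data.table)
library(caret)

set.seed(1)
n_rows <- 1e5  # hypothetical row count, much smaller than the real table
n_cols <- 10   # hypothetical number of numeric feature columns

# Random numeric columns V1..V10 plus an outcome factor `status`
df <- as.data.table(matrix(rnorm(n_rows * n_cols), ncol = n_cols))
df[, status := factor(sample(c("yes", "no"), n_rows, replace = TRUE))]

# createDataPartition(list = FALSE) returns a one-column matrix;
# as.vector() turns it into a plain vector of row numbers
trainIndex <- as.vector(
  caret::createDataPartition(df$status, p = .9, list = FALSE, times = 1)
)

# Time each half of the split on its own
print(system.time(outerTrain <- df[trainIndex]))
print(system.time(outerTest  <- df[-trainIndex]))
```

Increasing `n_rows` and `n_cols` step by step toward the real dimensions shows how the subset time grows with table size.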

What is the best way to efficiently subset this?

Thanks!


______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
