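You have 10^7 columns? A data.table stores each column as a separate vector, so subsetting rows means subsetting all 10^7 of those vectors one at a time. That process is bound to be slow.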
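
One possible workaround, as a rough sketch only: if every column is numeric (an assumption, since the post does not say), the values could be held in a plain matrix, where taking rows is a single operation over one block of memory instead of one operation per column. The sizes below are scaled down, and the names m, status, and idx, as well as the use of sample() as a stand-in for caret::createDataPartition(), are illustrative and not taken from the original code.

```r
# Scaled-down stand-in for the 100 x 10^7 table (sizes are assumptions).
set.seed(1)
n_rows <- 100
n_cols <- 1000

# Assumes all predictors are numeric, so they fit in a single matrix.
m <- matrix(rnorm(n_rows * n_cols), nrow = n_rows, ncol = n_cols)

# Outcome kept separately from the predictors.
status <- factor(sample(c("a", "b"), n_rows, replace = TRUE))

# Plain random 90% split; a stand-in for caret::createDataPartition(),
# which would additionally stratify on status.
idx <- sample(n_rows, size = 0.9 * n_rows)

# Row subsetting on a matrix; drop = FALSE keeps the result a matrix.
outerTrain <- m[idx, , drop = FALSE]
outerTest  <- m[-idx, , drop = FALSE]

# The outcome is subset with the same index.
statusTrain <- status[idx]
statusTest  <- status[-idx]
```

If the columns are of mixed types this does not apply as written, and storing the data transposed (10^7 rows by 100 columns) may be the more natural layout.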

On April 13, 2018 5:31:32 PM PDT, Jack Arnestad <jackarnes...@gmail.com> wrote:
>I have a data.table with dimensions 100 by 10^7.
>
>When I do
>
>    trainIndex <-
>      caret::createDataPartition(
>        df$status,
>        p = .9,
>        list = FALSE,
>        times = 1
>      )
>    outerTrain <- df[trainIndex]
>    outerTest  <- df[-trainIndex]
>
>Subsetting the rows of df takes over 20 minutes.
>
>What is the best way to efficiently subset this?
>
>Thanks!
>
>______________________________________________
>R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>https://stat.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting guide
>http://www.R-project.org/posting-guide.html
>and provide commented, minimal, self-contained, reproducible code.

-- 
Sent from my phone. Please excuse my brevity.