Oh, there are ways, but the constraining issue here is moving data (memory bandwidth), and data.table is probably already the fastest mechanism for doing that. If you have a computer with four or more real cores, you can try having each task subset a different group of columns and cbind the results afterward, but it will be hard to accomplish without making extra copies of the data. You are probably already using virtual memory, which is swapped to and from hard disk storage as needed.
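A rough, untuned sketch of the column-block idea, run on a toy table rather than the real 100 by 10^7 one. The names dt, train_rows, n_tasks, col_blocks and pieces are made up for illustration, and the row index is a plain integer vector rather than the matrix that createDataPartition returns with list = FALSE. I have not timed this at full scale, so treat it as something to experiment with, not a known fix.

```r
library(data.table)
library(parallel)

# Toy stand-in for the real table: 100 rows, 1000 columns (assumption).
set.seed(1)
dt <- as.data.table(matrix(rnorm(100 * 1000), nrow = 100))

# Plain integer vector of training rows (hypothetical 90% split).
train_rows <- sort(sample(nrow(dt), 90))

# One contiguous block of column indices per task.
n_tasks <- 4
col_blocks <- split(seq_along(dt), cut(seq_along(dt), n_tasks, labels = FALSE))

# mclapply relies on forking, which is unavailable on Windows,
# so fall back to a single core there.
n_cores <- if (.Platform$OS.type == "windows") 1L else n_tasks

# Each task subsets the rows of its own block of columns.
pieces <- mclapply(
  col_blocks,
  function(cols) dt[train_rows, cols, with = FALSE],
  mc.cores = n_cores
)

# Reassemble the blocks in their original column order.
outer_train <- do.call(cbind, unname(pieces))
```

Note that the pieces have to be sent back from the worker processes and then bound together, so the subset exists in memory more than once along the way. That is the extra copying mentioned above, and it may well cancel out whatever the additional cores gain.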
Working in Spark with a distributed file system like Hadoop might solve some of these problems... but I haven't done real work with such tools.

On April 13, 2018 6:31:32 PM PDT, Jack Arnestad <jackarnes...@gmail.com> wrote:
>Yes unfortunately. The goal of the "outer" is to do feature selection
>before fitting it to a model.
>
>Is there a way it could be parallelized?
>
>Thanks!
>
>On Fri, Apr 13, 2018 at 9:08 PM, Jeff Newmiller
><jdnew...@dcn.davis.ca.us> wrote:
>
>> You have 10^7 columns? That process is bound to be slow.
>>
>> On April 13, 2018 5:31:32 PM PDT, Jack Arnestad
>> <jackarnes...@gmail.com> wrote:
>> >I have a data.table with dimensions 100 by 10^7.
>> >
>> >When I do
>> >
>> >    trainIndex <-
>> >      caret::createDataPartition(
>> >        df$status,
>> >        p = .9,
>> >        list = FALSE,
>> >        times = 1
>> >      )
>> >    outerTrain <- df[trainIndex]
>> >    outerTest <- df[-trainIndex]
>> >
>> >Subsetting the rows of df takes over 20 minutes.
>> >
>> >What is the best way to efficiently subset this?
>> >
>> >Thanks!
>> >
>> >______________________________________________
>> >R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> >https://stat.ethz.ch/mailman/listinfo/r-help
>> >PLEASE do read the posting guide
>> >http://www.R-project.org/posting-guide.html
>> >and provide commented, minimal, self-contained, reproducible code.
>>
>> --
>> Sent from my phone. Please excuse my brevity.

--
Sent from my phone. Please excuse my brevity.