What kind of disk do you use (for example, HDD or SSD)? Hardware differences might be important for this issue.
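In case it helps to separate the disk from the R code, here is a minimal sketch that lists all files once and then asks for the sizes in a single call, timing the two steps separately. The name size.by.pkg is made up for this example (it is not in Tools.CRAN.R), and I have not run it on your setup. It uses file.size(), which as far as I know is a wrapper for file.info(..., extra_cols = FALSE); my assumption is that skipping the extra columns may be cheaper on Windows, but that would need to be measured.

```r
# Hypothetical helper (not from the thread): one recursive listing,
# one vectorised size query, then a sum per top-level directory.
size.by.pkg = function(path = R.home("library")) {
  # Step 1: list every file once, relative to 'path'
  # (e.g. "stats/R/stats.rdb"); time the directory traversal.
  t.list = system.time(
    f <- list.files(path, all.files = TRUE, recursive = TRUE,
                    full.names = FALSE)
  )
  # Step 2: query all sizes in one call; time the stat step.
  t.stat = system.time(
    sz <- file.size(file.path(path, f))
  )
  # The first path component is the package directory.
  pkg = sub("/.*$", "", f)
  list(size = tapply(sz, pkg, sum),
       time.list = t.list, time.stat = t.stat)
}

# Usage (commented out, since it can take a while on a large library):
# res = size.by.pkg()
# head(sort(res$size, decreasing = TRUE))
# res$time.list; res$time.stat
```

If most of the elapsed time ends up in time.stat, the per-file queries are the bottleneck; if it is in time.list, the directory traversal itself is slow, which would point more towards the disk.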
Best,
Jiefei

Leonard Mada via R-help <r-help@r-project.org> wrote on Sun, Sep 26, 2021 at 9:04 PM:

> Dear Bill,
>
> - using the Ms Windows Properties: ~ 15 s;
>   [Windows new start, 1st operation, bulk size]
> - using R / file.info() (2nd operation): still 523.6 s
>   [and R seems mostly unresponsive during this time]
>
> Unfortunately, I do not know how to clear any cache.
> [The cache may play a role only for smaller sizes? But I am rather not
> inclined to run the ~ 10 minutes procedure multiple times.]
>
> Sincerely,
>
> Leonard
>
> On 9/26/2021 5:49 AM, Richard O'Keefe wrote:
> > On a $150 second-hand laptop with 0.9GB of library,
> > and a single-user installation of R so only one place to look
> > LIBRARY=$HOME/R/x86_64-pc-linux-gnu-library/4.0
> > cd $LIBRARY
> > echo "kbytes package"
> > du -sk * | sort -k1n
> >
> > took 150 msec to report the disc space needed for every package.
> >
> > That'
> >
> > On Sun, 26 Sept 2021 at 06:14, Bill Dunlap <williamwdun...@gmail.com> wrote:
> >> On my Windows 10 laptop I see evidence of the operating system caching
> >> information about recently accessed files. This makes it hard to say how
> >> the speed might be improved. Is there a way to clear this cache?
> >>
> >> > system.time(L1 <- size.f.pkg(R.home("library")))
> >>    user  system elapsed
> >>    0.48    2.81   30.42
> >> > system.time(L2 <- size.f.pkg(R.home("library")))
> >>    user  system elapsed
> >>    0.35    1.10    1.43
> >> > identical(L1,L2)
> >> [1] TRUE
> >> > length(L1)
> >> [1] 30
> >> > length(dir(R.home("library"),recursive=TRUE))
> >> [1] 12949
> >>
> >> On Sat, Sep 25, 2021 at 8:12 AM Leonard Mada via R-help <r-help@r-project.org> wrote:
> >>
> >>> Dear List Members,
> >>>
> >>> I tried to compute the file sizes of each installed package and the
> >>> process is terribly slow.
> >>>
> >>> It took ~ 10 minutes for 512 packages / 1.6 GB total size of files.
> >>>
> >>> 1.) Package Sizes
> >>>
> >>> system.time({
> >>>     x = size.pkg(file=NULL);
> >>> })
> >>> # elapsed time: 509 s !!!
> >>> # 512 Packages; 1.64 GB;
> >>> # R 4.1.1 on MS Windows 10
> >>>
> >>> The code for the size.pkg() function is below and the latest version is
> >>> on Github:
> >>>
> >>> https://github.com/discoleo/R/blob/master/Stat/Tools.CRAN.R
> >>>
> >>> Questions:
> >>> Is there a way to get the file size faster?
> >>> It takes long on Windows as well, but of the order of 10-20 s, not 10
> >>> minutes.
> >>> Do I miss something?
> >>>
> >>> 1.b.) Alternative
> >>>
> >>> It came to my mind to read first all file sizes and then use tapply or
> >>> aggregate - but I do not see why it should be faster.
> >>>
> >>> Would it be meaningful to benchmark each individual package?
> >>>
> >>> Although I am not very inclined to wait 10 minutes for each new try out.
> >>>
> >>> 2.) Big Packages
> >>>
> >>> Just as a note: there are a few very large packages (in my list of 512
> >>> packages):
> >>>
> >>> 1  123,566,287  BH
> >>> 2  113,578,391  sf
> >>> 3  112,252,652  rgdal
> >>> 4   81,144,868  magick
> >>> 5   77,791,374  openNLPmodels.en
> >>>
> >>> I suspect that sf & rgdal have a lot of duplicated data structures
> >>> and/or duplicate code and/or duplicated libraries - although I am not an
> >>> expert in the field and did not check the sources.
> >>>
> >>> Sincerely,
> >>>
> >>> Leonard
> >>>
> >>> =======
> >>>
> >>> # Package Size:
> >>> size.f.pkg = function(path=NULL) {
> >>>     if(is.null(path)) path = R.home("library");
> >>>     xd = list.dirs(path = path, full.names = FALSE, recursive = FALSE);
> >>>     size.f = function(p) {
> >>>         p = paste0(path, "/", p);
> >>>         sum(file.info(list.files(path=p, pattern=".",
> >>>             full.names = TRUE, all.files = TRUE, recursive = TRUE))$size);
> >>>     }
> >>>     sapply(xd, size.f);
> >>> }
> >>>
> >>> size.pkg = function(path=NULL, sort=TRUE, file="Packages.Size.csv") {
> >>>     x = size.f.pkg(path=path);
> >>>     x = as.data.frame(x);
> >>>     names(x) = "Size"
> >>>     x$Name = rownames(x);
> >>>     # Order
> >>>     if(sort) {
> >>>         id = order(x$Size, decreasing=TRUE)
> >>>         x = x[id,];
> >>>     }
> >>>     if( ! is.null(file)) {
> >>>         if( ! is.character(file)) {
> >>>             print("Error: Size NOT written to file!");
> >>>         } else write.csv(x, file=file, row.names=FALSE);
> >>>     }
> >>>     return(x);
> >>> }
> >>>
> >>> ______________________________________________
> >>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> >>> https://stat.ethz.ch/mailman/listinfo/r-help
> >>> PLEASE do read the posting guide
> >>> http://www.R-project.org/posting-guide.html
> >>> and provide commented, minimal, self-contained, reproducible code.