Dear Bill,

- using the MS Windows Properties dialog: ~15 s;

[fresh Windows start, 1st operation, total size]

- using R / file.info() (2nd operation): still 523.6 s

[and R seems mostly unresponsive during this time]


Unfortunately, I do not know how to clear any cache.

[Perhaps the cache only plays a role for smaller sizes? But I am not inclined to run the ~10-minute procedure multiple times.]


Sincerely,


Leonard


On 9/26/2021 5:49 AM, Richard O'Keefe wrote:
On a $150 second-hand laptop with 0.9GB of library,
and a single-user installation of R so only one place to look
LIBRARY=$HOME/R/x86_64-pc-linux-gnu-library/4.0
cd $LIBRARY
echo "kbytes package"
du -sk * | sort -k1n

took 150 msec to report the disc space needed for every package.
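For comparison, Richard's `du` pipeline can be driven from R on Linux or macOS. The sketch below uses a hypothetical helper name, `du.pkg()`, and assumes a POSIX `du` is on the PATH (it will not work on a stock Windows install). Note that `du -sk` reports allocated disk space in 1024-byte units, which generally differs somewhat from the sum of file sizes that `file.info()` returns.

```r
# Hypothetical helper: per-package disk usage via the external `du` tool.
# Assumes a POSIX `du` is available (Linux/macOS); not for stock Windows.
du.pkg = function(path = .libPaths()[1]) {
  dirs = list.dirs(path, full.names = TRUE, recursive = FALSE)
  if (length(dirs) == 0)
    return(data.frame(kbytes = numeric(0), package = character(0)))
  # One `du -sk` call for all package directories;
  # shQuote() protects paths containing spaces from the shell
  out = system2("du", c("-sk", shQuote(dirs)), stdout = TRUE)
  # Each output line has the form "<kbytes>\t<path>"
  parts = strsplit(out, "\t", fixed = TRUE)
  res = data.frame(
    kbytes  = as.numeric(vapply(parts, `[`, "", 1)),
    package = basename(vapply(parts, `[`, "", 2)),
    stringsAsFactors = FALSE
  )
  # Ascending order, like `sort -k1n`
  res[order(res$kbytes), , drop = FALSE]
}
```

Because all directories are passed to a single `du` process, the per-file work happens outside R, which is roughly what Richard's shell one-liner does.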

That'

On Sun, 26 Sept 2021 at 06:14, Bill Dunlap <williamwdun...@gmail.com> wrote:
On my Windows 10 laptop I see evidence of the operating system caching
information about recently accessed files.  This makes it hard to say how
the speed might be improved.  Is there a way to clear this cache?

system.time(L1 <- size.f.pkg(R.home("library")))
    user  system elapsed
    0.48    2.81   30.42
system.time(L2 <- size.f.pkg(R.home("library")))
    user  system elapsed
    0.35    1.10    1.43
identical(L1,L2)
[1] TRUE
length(L1)
[1] 30
length(dir(R.home("library"),recursive=TRUE))
[1] 12949

On Sat, Sep 25, 2021 at 8:12 AM Leonard Mada via R-help <
r-help@r-project.org> wrote:

Dear List Members,


I tried to compute the file sizes of each installed package and the
process is terribly slow.

It took ~ 10 minutes for 512 packages / 1.6 GB total size of files.


1.) Package Sizes


system.time({
          x = size.pkg(file=NULL);
})
# elapsed time: 509 s !!!
# 512 Packages; 1.64 GB;
# R 4.1.1 on MS Windows 10


The code for the size.pkg() function is below and the latest version is
on Github:

https://github.com/discoleo/R/blob/master/Stat/Tools.CRAN.R


Questions:
Is there a way to get the file size faster?
Checking the size with Windows itself (folder Properties) also takes a while, but on the order of 10-20 s, not 10 minutes.
Am I missing something?
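One way to probe this question is to time two ways of obtaining sizes for the same file list. As an assumption to verify on your own machine: `file.info()` defaults to `extra_cols = TRUE`, and on Windows the extra columns include `exe`, which may add per-file work; base R's `file.size()` is a convenience wrapper around `file.info(..., extra_cols = FALSE)` and returns only the sizes. `time.size.calls()` is a hypothetical helper name.

```r
# Hypothetical benchmark helper: times two ways of getting file sizes
# for the same set of files. OS file caching strongly affects the results,
# so repeat the run (or swap the order) before drawing conclusions.
time.size.calls = function(dir = R.home("library")) {
  files = list.files(dir, all.files = TRUE, recursive = TRUE,
                     full.names = TRUE, no.. = TRUE)
  # Default call, as used in size.f.pkg(): all columns (extra_cols = TRUE)
  t.full = system.time(s.full <- file.info(files)$size)[["elapsed"]]
  # Size only: file.info(..., extra_cols = FALSE) under the hood
  t.size = system.time(s.size <- file.size(files))[["elapsed"]]
  # Both approaches should report the same sizes
  stopifnot(identical(s.full, s.size))
  c(n.files = length(files), file.info = t.full, file.size = t.size)
}
```

If `file.size()` turns out to be much faster on Windows, replacing `file.info(...)$size` with `file.size(...)` inside `size.f.pkg()` would be the smallest possible change.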


1.b.) Alternative

It occurred to me to first read all file sizes and then use tapply() or
aggregate(), but I do not see why that should be faster.

Would it be meaningful to benchmark each package individually?

However, I am not very inclined to wait 10 minutes for each new attempt.
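The single-pass idea from 1.b can be sketched as follows: list every file under the library once, fetch all sizes in one vectorised call, then sum per package with `tapply()`. Whether this is actually faster than the per-package loop is untested here; the main difference is fewer separate directory walks. `size.lib()` is a hypothetical name, and using `file.size()` instead of `file.info()` is a deliberate swap.

```r
# Hypothetical single-pass variant of size.f.pkg():
# 1) list all files below `path` once (paths relative to `path`),
# 2) fetch all sizes in one vectorised call,
# 3) sum the sizes per top-level directory (= package) with tapply().
size.lib = function(path = R.home("library")) {
  rel = list.files(path, all.files = TRUE, recursive = TRUE, no.. = TRUE)
  # Keep only files inside a sub-directory (ignore stray top-level files)
  rel = rel[grepl("/", rel, fixed = TRUE)]
  if (length(rel) == 0) return(numeric(0))
  sizes = file.size(file.path(path, rel))
  # The package name is the first component of each relative path
  pkg = sub("/.*$", "", rel)
  tot = tapply(sizes, pkg, sum, na.rm = TRUE)
  # c() turns the 1-d array into a named numeric vector
  sort(c(tot), decreasing = TRUE)
}
```

For example, `head(size.lib(), 5)` would list the largest packages, as in section 2 below.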


2.) Big Packages

Just as a note: there are a few very large packages (in my list of 512
packages):

   Size (bytes)  Package
1   123,566,287  BH
2   113,578,391  sf
3   112,252,652  rgdal
4    81,144,868  magick
5    77,791,374  openNLPmodels.en

I suspect that sf and rgdal contain a lot of duplicated data structures,
duplicated code and/or duplicated libraries, although I am not an
expert in the field and have not checked the sources.
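To look into the duplication hypothesis without reading the sources, one could list the largest files inside an installed package and see whether, for example, bundled shared libraries or data files dominate. `largest.files()` is a hypothetical helper; it only reports file sizes and cannot by itself show that anything is duplicated.

```r
# Hypothetical helper: the n largest files within one installed package.
largest.files = function(pkg, n = 10, lib = NULL) {
  # Locate the installed package directory (lib = NULL searches .libPaths())
  dir = find.package(pkg, lib.loc = lib)
  rel = list.files(dir, all.files = TRUE, recursive = TRUE, no.. = TRUE)
  res = data.frame(file = rel, size = file.size(file.path(dir, rel)),
                   stringsAsFactors = FALSE)
  # Largest first
  res = res[order(res$size, decreasing = TRUE), , drop = FALSE]
  rownames(res) = NULL
  head(res, n)
}
```

For example, `largest.files("sf")` and `largest.files("rgdal")` would show whether similar large files (e.g. under `libs/` or bundled data directories) appear in both.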


Sincerely,


Leonard

=======


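# Package Size:
# total size (in bytes) of all files in each package directory of 'path'
size.f.pkg = function(path=NULL) {
      if(is.null(path)) path = R.home("library");
      # package names = top-level sub-directories of the library
      xd = list.dirs(path = path, full.names = FALSE, recursive = FALSE);
      size.f = function(p) {
          p = paste0(path, "/", p);
          # list all files recursively (incl. hidden ones) and sum their sizes
          sum(file.info(list.files(path=p, pattern=".",
              full.names = TRUE, all.files = TRUE, recursive = TRUE))$size);
      }
      # named numeric vector: one total per package
      sapply(xd, size.f);
}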

size.pkg = function(path=NULL, sort=TRUE, file="Packages.Size.csv") {
      x = size.f.pkg(path=path);
      x = as.data.frame(x);
      names(x) = "Size"
      x$Name = rownames(x);
      # Order
      if(sort) {
          id = order(x$Size, decreasing=TRUE)
          x = x[id,];
      }
      if( ! is.null(file)) {
          if( ! is.character(file)) {
              print("Error: Size NOT written to file!");
          } else write.csv(x, file=file, row.names=FALSE);
      }
      return(x);
}

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
