Hi Simon,

The most popular and widely used example of this "in the wild" that I'm
aware of is the stringi package (a dependency of the widely used stringr
package), whose configure script downloads the ICU Data Library (icudt).
See https://github.com/gagolews/stringi/blob/master/configure#L5412

Note that it does have some sort of workaround in place for
non-internet-capable build machines, but the workaround is external (the
build in question fails unless the workaround has already been explicitly
performed).

Best,
~G

On Mon, Sep 26, 2022 at 12:50 PM Simon Urbanek <simon.urba...@r-project.org> wrote:

>
> > On Sep 27, 2022, at 8:25 AM, Iñaki Ucar <iu...@fedoraproject.org> wrote:
> >
> > On Sat, 24 Sept 2022 at 01:55, Simon Urbanek
> > <simon.urba...@r-project.org> wrote:
> >>
> >> Iñaki,
> >>
> >> I fully agree, this is a very common issue, since the vast majority of
> >> server deployments I have encountered don't allow internet access. In
> >> practice this means that such packages are effectively banned.
> >>
> >> I would argue that not even (1) or (2) are really an issue, because in
> >> fact the CRAN policy doesn't impose any absolute limits on size; it only
> >> states that the package should be "of minimum necessary size", which
> >> means it shouldn't waste space. If there is no way to reduce the size
> >> without impacting functionality, it's perfectly fine.
> >
> > "Packages should be of the minimum necessary size" is subject to
> > interpretation. And in practice, there is an issue with e.g. packages
> > that "bundle" big third-party libraries. There are also packages that
> > require downloading precompiled code, JARs... at installation time.
> >
>
> JARs are part of the package, so that's a valid use, no question there,
> that's how Java packages do this already.
>
> Downloading pre-compiled binaries is something that shouldn't be done and
> is a whole can of worms (since those are not sources and it *is* specific
> to the platform, OS, etc.) that is entirely separate, but worth a separate
> discussion. So I still don't see any use cases for actual sources.
> I do see a need for better specification of external dependencies which
> are not part of the package, such that those can be satisfied
> automatically - but that's not the problem you asked about.
>
> >
> >> That said, there are exceptions such as very large datasets (e.g., as
> >> distributed by Bioconductor) which are orders of magnitude larger than
> >> what is sustainable. I agree that it would be nice to have a mechanism
> >> for specifying such sources. So yes, I like the idea, but I'd like to
> >> see more real use cases to justify the effort.
> >
> > "More real use cases" like in "more use cases" or like in "the
> > previous ones are not real ones"? :)
> >
> >> The issue with any online downloads, though, is that there is no
> >> guarantee of availability - which is a real issue for reproducibility.
> >> So one could argue that if such external sources are required then they
> >> should be on a well-defined, independent, permanent storage such as
> >> Zenodo. This could be a matter of policy as opposed to the technical
> >> side above, which would be adding such support to R CMD INSTALL.
> >
> > Not necessarily. If the package declares the additional sources in the
> > DESCRIPTION (probably with hashes), that's a big improvement over the
> > current state of things, in which basically we don't know what the
> > package tries to download, then it may fail, and finally there's no
> > guarantee that it's what the author intended in the first place.
> >
> > But on top of this, R could add a CMD to download those, and then some
> > lookaside storage could be used on CRAN. This is e.g. how RPM
> > packaging works: the spec declares all the sources, they are
> > downloaded once, hashed and stored in a lookaside cache. Then package
> > building doesn't need general Internet connectivity, just access to
> > the cache.
> >
>
> Sure, I fully agree that it would be a good first step, but I'm still
> waiting for examples ;).
>
> Cheers,
> Simon
>
> >
> > Iñaki
> >
> >>
> >> Cheers,
> >> Simon
> >>
> >>
> >>> On Sep 24, 2022, at 3:22 AM, Iñaki Ucar <iu...@fedoraproject.org> wrote:
> >>>
> >>> Hi all,
> >>>
> >>> I'd like to open this debate here, because IMO this is a big issue.
> >>> Many packages do this for various reasons, some more legitimate than
> >>> others, but I think that this shouldn't be allowed, because it
> >>> basically means that installation fails on a machine without Internet
> >>> access (which happens e.g. in Linux distro builders for security
> >>> reasons).
> >>>
> >>> Now, what if connections are suppressed during package load? There are
> >>> basically three use cases out there:
> >>>
> >>> (1) The package requires additional files for the installation (e.g.
> >>> the source code of an external library) that cannot be bundled into
> >>> the package due to CRAN restrictions (size).
> >>> (2) The package requires additional files for using it (e.g.,
> >>> datasets, a JAR...) that cannot be bundled into the package due to
> >>> CRAN restrictions (size).
> >>> (3) Other spurious reasons (e.g. the maintainer decided that package
> >>> load was a good place to check an online service availability, etc.).
> >>>
> >>> Again IMO, (3) shouldn't be allowed in any case; (2) should be a
> >>> separate function that the user actively calls to download the files,
> >>> and those files should be placed into the user dir, and (1) is the
> >>> only legitimate use, but then another mechanism should be provided to
> >>> avoid connections during package load.
> >>>
> >>> My proposal to support (1) would be to add a new field in the
> >>> DESCRIPTION, "Additional_sources", which would be a comma-separated
> >>> list of additional resources to download during R CMD INSTALL.
> >>> Those sources would be downloaded by R CMD INSTALL if not provided via an
> >>> option (to support offline installations), and would be placed in a
> >>> predefined place for the package to find and configure them (via an
> >>> environment variable or in a predefined subdirectory).
> >>>
> >>> This proposal has several advantages. Apart from the obvious one
> >>> (Internet access during package load can be limited without losing
> >>> current functionalities), it gives more visibility to the resources
> >>> that packages are using during the installation phase, and thus makes
> >>> those installations more reproducible and more secure.
> >>>
> >>> Best,
> >>> --
> >>> Iñaki Úcar
> >>>
> >>> ______________________________________________
> >>> R-devel@r-project.org mailing list
> >>> https://stat.ethz.ch/mailman/listinfo/r-devel
> >>>
> >>
> >
> >
> > --
> > Iñaki Úcar
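
For illustration only, here is a rough Python sketch of the "declared
sources plus lookaside cache" idea discussed in the thread. Nothing in it
is an existing R or CRAN facility: the entry syntax "<url> sha256=<digest>"
for the proposed "Additional_sources" field, the function names, and the
cache layout (one file per source, named after the last URL path component)
are all assumptions made up for this example. The point is simply that,
once sources and hashes are declared, an installer can satisfy them from a
local cache and verify them without any network access.

```python
import hashlib
from pathlib import Path
from urllib.parse import urlparse


def parse_additional_sources(field):
    """Split a comma-separated field value into (url, sha256) pairs.

    Assumed (hypothetical) entry syntax: "<url> sha256=<hex digest>".
    """
    entries = []
    for item in field.split(","):
        item = item.strip()
        if not item:
            continue
        url, _, digest = item.partition(" sha256=")
        entries.append((url.strip(), digest.strip().lower()))
    return entries


def resolve_from_cache(url, sha256, cache_dir):
    """Return the cached copy of `url` if its SHA-256 digest matches.

    This only reads from `cache_dir`; it never opens a network connection.
    """
    name = Path(urlparse(url).path).name
    candidate = Path(cache_dir) / name
    if not candidate.is_file():
        raise FileNotFoundError(f"{name} is not in the offline cache {cache_dir}")
    actual = hashlib.sha256(candidate.read_bytes()).hexdigest()
    if actual != sha256:
        raise ValueError(f"hash mismatch for {name}: expected {sha256}, got {actual}")
    return candidate
```

A builder without Internet access would call parse_additional_sources() on
the field value and then resolve_from_cache() for each entry, failing early
and explicitly if a declared source is missing or does not match its hash.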