Ah, that's embarrassing. That's a bug in how/where I handle lack of connectivity, rather than me not handling it at all. I've just pushed a fix to the GitHub repo, and the package now cleanly passes check with no internet connectivity (a much more stringent test).
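
Concretely, the shape of that guard is roughly the following minimal sketch (illustrative only - the helper name and the fall-back-to-NULL behaviour are assumptions, not the actual switchr code):

fetchBiocConfig <- function(url = "http://bioconductor.org/config.yaml") {
    # Catch the connection failure so that a test load (and therefore
    # R CMD INSTALL / R CMD check) succeeds without internet access.
    tryCatch(
        readLines(url, warn = FALSE),
        error = function(e) {
            warning("Could not reach ", url,
                    "; Bioconductor repository info will be unavailable.")
            NULL  # callers treat NULL as "unknown" instead of erroring at load time
        }
    )
}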
Using a canned file is a bit odd, because in the case where there's no connectivity, the package won't work (the canned file would just set the repositories to URLs that R still won't be able to reach).

Anyway, thanks.
~G

On Mon, Sep 26, 2022 at 3:11 PM Simon Urbanek <simon.urba...@r-project.org> wrote:
>
> On 27/09/2022, at 11:02 AM, Gabriel Becker <gabembec...@gmail.com> wrote:
>
> > For the record, the only thing switchr (my package) should be doing internet-wise is hitting the Bioconductor config file (http://bioconductor.org/config.yaml) so that it knows what it needs to know about Bioc repos/versions/etc. (at load time, actually, not install time, but since install does a test load, those are essentially the same).
> >
> > I have fallback behavior for when the file can't be read, so I don't think there should be any actual build/install breakages, but the check does happen.
>
> $ sandbox-exec -n no-network R CMD INSTALL switchr_0.14.5.tar.gz
> [...]
> ** testing if installed package can be loaded from final location
> Error in readLines(con) :
>   cannot open the connection to 'http://bioconductor.org/config.yaml'
> Calls: <Anonymous> ... getBiocDevelVr -> getBiocYaml -> inet_handlers -> readLines
> Execution halted
> ERROR: loading failed
>
> So, yes, it does break. You should recover from the error and use a fall-back file that you ship.
>
> Cheers,
> Simon
>
> > Advice on better practice for the above use case is welcome.
> >
> > ~G
> >
> > On Mon, Sep 26, 2022 at 2:40 PM Simon Urbanek <simon.urba...@r-project.org> wrote:
> > >
> > > On 27/09/2022, at 10:21 AM, Iñaki Ucar <iu...@fedoraproject.org> wrote:
> > >
> > > > On Mon, 26 Sept 2022 at 23:07, Simon Urbanek <simon.urba...@r-project.org> wrote:
> > > > >
> > > > > Iñaki,
> > > > >
> > > > > I'm not sure I understand - system dependencies are an entirely different topic and I would argue a far more important one (very happy to start a discussion about that), but that has nothing to do with declaring downloads. I assumed your question was about large files that packages avoid shipping and download instead, so declaring them would be useful.
> > > >
> > > > Exactly. Maybe there's a misunderstanding, because I didn't talk about system dependencies (although there are packages that try to download things that are declared as system dependencies, as Gabe noted). :)
> > >
> > > Ok, understood. I would like to tackle those as well, but let's start that conversation in a few weeks when I have a lot more time.
> > >
> > > > > And for that, the obvious answer is that they shouldn't do it - if a package needs a file to run, it should include it. So an easy solution is to disallow it.
> > > >
> > > > Then we completely agree. My proposal about declaring additional sources was made because so many packages do this that I expected strong opposition to banning it outright. But if R Core / CRAN is ok with just limiting net access at install time, then that's perfect for me. :)
> > >
> > > Yes, we do agree :). I started looking at your list, and so far those seem to be simply bugs or design deficiencies in the packages (and outright policy violations). I think the only reason they exist is that it doesn't get detected in CRAN incoming; it's certainly not intentional.
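
(Regarding the "recover from the error and use a fall-back file that you ship" suggestion above: the pattern would look roughly like the sketch below. It is illustrative only - the helper name and the inst/extdata location are assumptions, and, per the caveat at the top of this message, a canned config would still point at repositories an offline machine can't reach.)

fetchBiocConfigWithFallback <- function(url = "http://bioconductor.org/config.yaml") {
    tryCatch(
        readLines(url, warn = FALSE),
        error = function(e) {
            # Fall back to a copy assumed to be shipped under inst/extdata/
            # of the package; a message rather than a warning, since being
            # offline is an expected situation.
            fallback <- system.file("extdata", "config.yaml", package = "switchr")
            stopifnot(nzchar(fallback))  # the canned copy must actually be shipped
            message("Could not reach ", url, "; using the copy shipped with the package.")
            readLines(fallback, warn = FALSE)
        }
    )
}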
> > > Cheers,
> > > Simon
> > >
> > > > Iñaki
> > > >
> > > > > But so far all the examples were just (ab)use of downloads for binary dependencies, which is an entirely different issue that needs a different solution (naively, declaring such dependencies, but we know it's not that simple - and download URLs don't help there).
> > > > >
> > > > > Cheers,
> > > > > Simon
> > > > >
> > > > > On 27/09/2022, at 8:25 AM, Iñaki Ucar <iu...@fedoraproject.org> wrote:
> > > > >
> > > > > > On Sat, 24 Sept 2022 at 01:55, Simon Urbanek <simon.urba...@r-project.org> wrote:
> > > > > >
> > > > > > > Iñaki,
> > > > > > >
> > > > > > > I fully agree, this is a very common issue, since the vast majority of server deployments I have encountered don't allow internet access. In practice this means that such packages are effectively banned.
> > > > > > >
> > > > > > > I would argue that not even (1) or (2) are really an issue, because in fact the CRAN policy doesn't impose any absolute limits on size; it only states that the package should be "of minimum necessary size", which means it shouldn't waste space. If there is no way to reduce the size without impacting functionality, it's perfectly fine.
> > > > > >
> > > > > > "Packages should be of the minimum necessary size" is subject to interpretation. And in practice, there is an issue with e.g. packages that "bundle" big third-party libraries. There are also packages that require downloading precompiled code, JARs... at installation time.
> > > > > >
> > > > > > > That said, there are exceptions such as very large datasets (e.g., as distributed by Bioconductor) which are orders of magnitude larger than what is sustainable. I agree that it would be nice to have a mechanism for specifying such sources. So yes, I like the idea, but I'd like to see more real use cases to justify the effort.
> > > > > >
> > > > > > "More real use cases" as in "more use cases", or as in "the previous ones are not real ones"? :)
> > > > > >
> > > > > > > The issue with any online downloads, though, is that there is no guarantee of availability - which is a real issue for reproducibility. So one could argue that if such external sources are required, then they should be on well-defined, independent, permanent storage such as Zenodo. This could be a matter of policy, as opposed to the technical side above, which would be adding such support to R CMD INSTALL.
> > > > > >
> > > > > > Not necessarily. If the package declares the additional sources in the DESCRIPTION (probably with hashes), that's a big improvement over the current state of things, in which basically we don't know what the package tries to download, then it may fail, and finally there's no guarantee that it's what the author intended in the first place.
> > > > > >
> > > > > > But on top of this, R could add a CMD to download those, and then some lookaside storage could be used on CRAN. This is e.g. how RPM packaging works: the spec declares all the sources, they are downloaded once, hashed, and stored in a lookaside cache. Then package building doesn't need general Internet connectivity, just access to the cache.
> > > > > >
> > > > > > Iñaki
> > > > > >
> > > > > > > Cheers,
> > > > > > > Simon
> > > > > > >
> > > > > > > On Sep 24, 2022, at 3:22 AM, Iñaki Ucar <iu...@fedoraproject.org> wrote:
> > > > > > >
> > > > > > > > Hi all,
> > > > > > > >
> > > > > > > > I'd like to open this debate here, because IMO this is a big issue.
> > > > > > > > Many packages do this for various reasons, some more legitimate than others, but I think this shouldn't be allowed, because it basically means that installation fails on a machine without Internet access (which happens e.g. in Linux distro builders, for security reasons).
> > > > > > > >
> > > > > > > > Now, what if the connection is suppressed during package load? There are basically three use cases out there:
> > > > > > > >
> > > > > > > > (1) The package requires additional files for the installation (e.g. the source code of an external library) that cannot be bundled into the package due to CRAN restrictions (size).
> > > > > > > > (2) The package requires additional files for using it (e.g. datasets, a JAR...) that cannot be bundled into the package due to CRAN restrictions (size).
> > > > > > > > (3) Other spurious reasons (e.g. the maintainer decided that package load was a good place to check an online service's availability, etc.).
> > > > > > > >
> > > > > > > > Again IMO, (3) shouldn't be allowed in any case; (2) should be a separate function that the user actively calls to download the files, and those files should be placed into the user dir; and (1) is the only legitimate use, but then some other mechanism should be provided to avoid connections during installation.
> > > > > > > >
> > > > > > > > My proposal to support (1) would be to add a new field to the DESCRIPTION, "Additional_sources", which would be a comma-separated list of additional resources to download during R CMD INSTALL. Those sources would be downloaded by R CMD INSTALL if not provided via an option (to support offline installations), and would be placed in a predefined place for the package to find and configure them (via an environment variable or in a predefined subdirectory).
> > > > > > > >
> > > > > > > > This proposal has several advantages. Apart from the obvious one (Internet access during package load can be limited without losing current functionality), it gives more visibility to the resources that packages use during the installation phase, and thus makes those installations more reproducible and more secure.
> > > > > > > >
> > > > > > > > Best,
> > > > > > > > --
> > > > > > > > Iñaki Úcar
> > > > > >
> > > > > > --
> > > > > > Iñaki Úcar
> > > >
> > > > --
> > > > Iñaki Úcar
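
For reference, the "separate function that the user actively calls" pattern from case (2) might look roughly like the sketch below; the function name, package name, URL, and checksum are hypothetical placeholders, not code from any existing package.

download_extra_data <- function(url = "https://example.org/bigdata.rds",
                                expected_md5 = "0123456789abcdef0123456789abcdef") {
    # Place the file in the per-user data directory (tools::R_user_dir needs R >= 4.0.0).
    dir <- tools::R_user_dir("mypackage", which = "data")
    dir.create(dir, recursive = TRUE, showWarnings = FALSE)
    dest <- file.path(dir, basename(url))
    if (!file.exists(dest)) {
        utils::download.file(url, dest, mode = "wb")
    }
    # Verify the declared checksum so we know we got what the author intended.
    if (!identical(unname(tools::md5sum(dest)), expected_md5)) {
        stop("Checksum mismatch for ", dest, "; refusing to use the file.")
    }
    dest
}

The install-time analogue is essentially what the proposed Additional_sources field would formalize: sources declared up front (ideally with hashes), fetched and verified by R CMD INSTALL, with an option to point at pre-downloaded copies for offline builds.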