>>>>> Viechtbauer, Wolfgang (SP)
>>>>>     on Fri, 8 Jan 2021 13:50:14 +0000 writes:
> Instead of a separate file to store such a list, would it be an idea to
> add versions of the \href{}{} and \url{} markup commands that are
> skipped by the URL checks?
> Best,
> Wolfgang

I think John Nash and you misunderstood -- or perhaps I misunderstood --
the original proposal: my understanding has been that there should be a
"central repository" of URL exceptions that is maintained by volunteers,
and rather *not* that package authors should get ways to skip URL
checking.

Martin

>> -----Original Message-----
>> From: R-devel [mailto:r-devel-boun...@r-project.org] On Behalf Of Spencer Graves
>> Sent: Friday, 08 January, 2021 13:04
>> To: r-devel@r-project.org
>> Subject: Re: [Rd] URL checks
>>
>> I also would be pleased to be allowed to provide "a list of known
>> false-positive/exceptions" to the URL tests. I've been challenged
>> multiple times regarding URLs that worked fine when I checked them. We
>> should not be required to do a partial lobotomy to pass R CMD check ;-)
>>
>> Spencer Graves
>>
>> On 2021-01-07 09:53, Hugo Gruson wrote:
>>>
>>> I encountered the same issue today with https://astrostatistics.psu.edu/.
>>>
>>> This is a trust chain issue, as explained here:
>>> https://whatsmychaincert.com/?astrostatistics.psu.edu.
>>>
>>> I've worked for a couple of years on a project to increase HTTPS
>>> adoption on the web, and we noticed that this type of error is very
>>> common and that website maintainers are often unresponsive to requests
>>> to fix it.
>>>
>>> Therefore, I totally agree with Kirill that a list of known
>>> false positives/exceptions would be a great addition to save time for
>>> both the CRAN team and package developers.
>>>
>>> Hugo
>>>
>>> On 07/01/2021 15:45, Kirill Müller via R-devel wrote:
>>>> One other failure mode: SSL certificates trusted by browsers but not
>>>> installed on the check machine, e.g. the "GEANT Vereniging"
>>>> certificate from https://relational.fit.cvut.cz/ .
>>>>
>>>> K
>>>>
>>>> On 07.01.21 12:14, Kirill Müller via R-devel wrote:
>>>>> Hi
>>>>>
>>>>> The URL checks in R CMD check test all links in the README and
>>>>> vignettes for broken or redirected links. In many cases this improves
>>>>> documentation, but I see problems with this approach, which I have
>>>>> detailed below.
>>>>>
>>>>> I'm writing to this mailing list because I think the change needs to
>>>>> happen in R's check routines. I propose to introduce an "allow-list"
>>>>> for URLs, to reduce the burden on both CRAN and package maintainers.
>>>>>
>>>>> Comments are greatly appreciated.
>>>>>
>>>>> Best regards
>>>>>
>>>>> Kirill
>>>>>
>>>>> # Problems with the detection of broken/redirected URLs
>>>>>
>>>>> ## 301 should often be 307, how to change?
>>>>>
>>>>> Many web sites use a 301 redirection code that probably should be a
>>>>> 307. For example, https://www.oracle.com and https://www.oracle.com/
>>>>> both redirect to https://www.oracle.com/index.html with a 301. I
>>>>> suspect the company still wants oracle.com to be recognized as the
>>>>> primary entry point of their web presence (to reserve the right to
>>>>> move the redirection to a different location later), though I haven't
>>>>> checked with their PR department. If that's true, the redirect
>>>>> probably should be a 307, which should be fixed by their IT
>>>>> department, which I haven't contacted yet either.
>>>>>
>>>>> $ curl -i https://www.oracle.com
>>>>> HTTP/2 301
>>>>> server: AkamaiGHost
>>>>> content-length: 0
>>>>> location: https://www.oracle.com/index.html
>>>>> ...
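[For reference, the same headers can be retrieved from within R with
curlGetHeaders(); a minimal sketch, with the caveat that the status code
and Location header may have changed since the message above was written:

h <- curlGetHeaders("https://www.oracle.com/", redirect = FALSE)  ## do not follow the redirect
attr(h, "status")                                                 ## 301 at the time of writing
grep("^location:", h, ignore.case = TRUE, value = TRUE)           ## the Location header of that response
]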
>>>>> ## User agent detection
>>>>>
>>>>> twitter.com responds with a 400 error to requests without a user
>>>>> agent string hinting at an accepted browser.
>>>>>
>>>>> $ curl -i https://twitter.com/
>>>>> HTTP/2 400
>>>>> ...
>>>>> <body>...<p>Please switch to a supported browser...</p>...</body>
>>>>>
>>>>> $ curl -s -i https://twitter.com/ -A "Mozilla/5.0 (X11; Ubuntu; Linux
>>>>> x86_64; rv:84.0) Gecko/20100101 Firefox/84.0" | head -n 1
>>>>> HTTP/2 200
>>>>>
>>>>> # Impact
>>>>>
>>>>> While the latter problem *could* be fixed by supplying a browser-like
>>>>> user agent string, the former problem is virtually unfixable -- so
>>>>> many web sites should use 307 instead of 301 but don't. The above
>>>>> list is also incomplete -- think of unreliable links, plain HTTP
>>>>> links, and other failure modes...
>>>>>
>>>>> This affects me as a package maintainer: I have the choice either to
>>>>> change the links to incorrect versions or to remove them altogether.
>>>>>
>>>>> I can also choose to explain each broken link to CRAN, but I think
>>>>> that subjects the team to undue burden. Submitting a package with
>>>>> NOTEs also delays the release; for a package that I must release very
>>>>> soon to avoid having it pulled from CRAN, I'd rather not risk that --
>>>>> hence I need to remove the link and put it back later.
>>>>>
>>>>> I'm aware of https://github.com/r-lib/urlchecker, which alleviates
>>>>> the problem but ultimately doesn't solve it.
>>>>>
>>>>> # Proposed solution
>>>>>
>>>>> ## Allow-list
>>>>>
>>>>> A file inst/URL that lists all URLs where failures are allowed --
>>>>> possibly with a list of the HTTP codes accepted for each link.
>>>>>
>>>>> Example:
>>>>>
>>>>> https://oracle.com/ 301
>>>>> https://twitter.com/drob/status/1224851726068527106 400
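[To make the allow-list proposal concrete, a minimal sketch of how a check
routine could consume such an inst/URL file; the two-column format and the
helper names are illustrative assumptions, not an existing R or CRAN
interface:

read_url_allowlist <- function(path = "inst/URL") {
  ## Each non-empty line: <URL> <accepted HTTP status code>
  if (!file.exists(path))
    return(data.frame(url = character(), status = integer()))
  lines <- trimws(readLines(path, warn = FALSE))
  lines <- lines[nzchar(lines)]
  parts <- strsplit(lines, "[[:space:]]+")
  data.frame(url    = vapply(parts, `[`, character(1), 1L),
             status = as.integer(vapply(parts, `[`, character(1), 2L)),
             stringsAsFactors = FALSE)
}

url_failure_allowed <- function(url, status, allow) {
  ## TRUE if this URL/status combination is listed in the allow-list,
  ## i.e. the failure should not be reported as a NOTE
  any(allow$url == url & allow$status == status)
}

## Example, using the entries proposed above:
allow <- read_url_allowlist()
url_failure_allowed("https://twitter.com/drob/status/1224851726068527106", 400L, allow)
]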