[R-pkg-devel] Automated checking defeated by website anti-scraping rules

2025-06-13 Thread Hugh Parsonage
When checking a package on win-devel, I get the NOTE

Found the following (possibly) invalid URLs:
  URL: http://classic.austlii.edu.au/au/legis/cth/consol_act/itaa1997240/s4.10.html
From: man/small_business_tax_offset.Rd
Status: 410
Message: Gone
  URL: http://classic.austlii.edu.au/au/legis/cth/consol_act/mla1986131/
From: man/medicare_levy.Rd
Status: 410
Message: Gone
  URL: https://guides.dss.gov.au/social-security-guide/3/4/1/10
From: man/age_pension_age.Rd
Status: 403
Message: Forbidden
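
(For what it's worth, the same URL checks can apparently be reproduced
locally with the urlchecker package, which saves a win-devel round trip
while iterating. A minimal sketch, assuming the package is installed and
the working directory is the package root:

    # Run CRAN-style URL checks on the package in the current directory
    # (requires the 'urlchecker' package from CRAN).
    urlchecker::url_check(".")
)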

The URLs exist (switching the http:// links to https:// changes nothing)
and are accessible from a browser just fine. They appear to return those
HTTP statuses because the servers choose to block 'automated requests'.
As imbecilic as those rules might be (they can probably be defeated
easily enough), what should the policy be going forward? I can wrap
these URLs in \code{} to get past the checks, but a better solution
might be available at the check stage.
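
To illustrate that \code{} workaround (a sketch only; it assumes the URL
checker follows \url{}/\href{} markup and bare links but skips text inside
\code{}, and it does cost the rendered help page its hyperlink):

    % man/age_pension_age.Rd -- hypothetical excerpt, not the real file.
    % Presumably the reference currently uses \url{...}; wrapping the
    % address in \code{} hides it from the URL checker.
    \references{
      \code{https://guides.dss.gov.au/social-security-guide/3/4/1/10}
    }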

I think it is a good thing that the check fails when a URL really has
died or moved, and that should be preserved. I don't just want to get
past the check.
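
For completeness, one way to tell a genuinely dead link from an anti-bot
block is to compare responses with and without a browser-like User-Agent.
A sketch with the curl package (the 200 outcome assumes the block is
purely header-based; some setups will refuse automated clients either way):

    # Compare HTTP status codes for a plain HEAD-style request and one
    # that sends a browser-like User-Agent (requires the 'curl' package).
    library(curl)
    u <- "https://guides.dss.gov.au/social-security-guide/3/4/1/10"

    plain <- new_handle(nobody = TRUE)
    curl_fetch_memory(u, handle = plain)$status_code       # 403, as in the NOTE

    browserish <- new_handle(nobody = TRUE)
    handle_setheaders(browserish, "User-Agent" = "Mozilla/5.0")
    curl_fetch_memory(u, handle = browserish)$status_code  # 200 if only UA-filtered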


Hugh Parsonage.

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


Re: [R-pkg-devel] Automated checking defeated by website anti-scraping rules

2025-06-13 Thread Iris Simmons
I have a package that throws the same NOTE when checked; the CRAN
maintainers just let it pass every time. I wouldn't worry about it.

