Hi Felix,

On Sat, Mar 20, 2021 at 09:44:07PM -0700, Felix Lechner wrote:
> On Sat, Mar 20, 2021 at 7:27 PM Jelmer Vernooij <jel...@debian.org> wrote:
> >
> > https://qa.debian.org/cgi-bin/watch?pkg=jupyter-core
>
> I saw the traffic on IRC where someone suggested we replace
>
> .*archive/v?([0-9.]*).tar.gz
>
> with
>
> .*archive/.*/v?([0-9.]*).tar.gz
>
> to fix at least 1,500 affected packages. Unfortunately, that may not
> work for jupyter-core, which does not prefix tags with a "v" and for
> which "(.*)" catches the slash (or maybe even slashes).
>
> As a tool without network access, Lintian is not well positioned to
> figure out, in general, whether a URL/regex combination works. Would
> it be okay if Lintian instead issues two new classification tags?
>
> The first would occur once per source. It shows the watch file URL and
> the regular expression for HTML parsing, possibly followed by "debian
> update" (or similar). The second tag would occur once for each of the
> options selected, i.e. multiple times. Armed with that information,
> the Janitor could probe the URL and figure out which parts need
> fixing.

I was hoping that lintian could verify that there is at least something
after "/archive/" in the matching pattern that could match slashes
without relying on the main regex group. That check could be done
without querying GitHub. That said, the code would have to be updated if
GitHub changes its URLs again in the future, and it may be somewhat
tricky to write.
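To make that idea a bit more concrete, here is a rough sketch of the kind
of offline check I have in mind. It is written in Python purely for
illustration (it is not Lintian code, which is Perl), the function names
are made up, and the heuristic is deliberately crude: it only looks at
the text between "archive/" and the first capturing group and asks
whether a literal slash or a ".*"/".+" wildcard appears there. Character
classes, alternations and the like are ignored, so treat it as a starting
point under those assumptions rather than a complete check.

```python
import re

# First "(" that opens a capturing group: not escaped, not "(?:" / "(?=" etc.
_CAPTURE_OPEN = re.compile(r"(?<!\\)\((?!\?)")


def tail_after_archive(pattern):
    """Return the part of the pattern after the last 'archive/', or None."""
    marker = "archive/"
    idx = pattern.rfind(marker)
    if idx == -1:
        return None
    return pattern[idx + len(marker):]


def has_slash_capable_prefix(pattern):
    """Crude check: can something between 'archive/' and the first
    capturing group match a slash?

    Returns None when the pattern does not mention 'archive/' at all.
    """
    tail = tail_after_archive(pattern)
    if tail is None:
        return None
    match = _CAPTURE_OPEN.search(tail)
    prefix = tail if match is None else tail[:match.start()]
    # A literal "/" or a greedy wildcard before the group can absorb
    # extra path components such as "refs/tags/".
    return "/" in prefix or ".*" in prefix or ".+" in prefix


if __name__ == "__main__":
    for pat in (
        ".*archive/v?([0-9.]*).tar.gz",     # old style from the IRC discussion
        ".*archive/.*/v?([0-9.]*).tar.gz",  # suggested replacement
        ".*archive/(.*).tar.gz",            # illustrative: slash only via main group
    ):
        print(pat, "->", has_slash_capable_prefix(pat))
```

With this heuristic the old-style pattern and the "(.*)" variant are both
flagged (the function returns False for them), while the suggested
replacement passes. The "(.*)" pattern in the example is only
illustrative of the shape Felix described, not copied from any watch
file.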

The offer for informational tags is appreciated, but as you say, the
data is already available in UDD, so just providing the pure uscan
contents wouldn't help much.

The alternative is to just let lintian-brush work without a signal from
lintian, and gradually grind through the archive. That'll work too,
though it'll take a few months, and we lose the verification from
lintian after the fix.

Jelmer

-- 
Jelmer Vernooij <jel...@jelmer.uk>
PGP Key: https://www.jelmer.uk/D729A457.asc