Hi Stuart,

On Wed, Jan 08, 2025 at 12:46:58PM +1100, Stuart Prescott wrote:
> > Lets think about some better fine tuning.  "NOT LIKE '%salsa%'" might
> > catch also Vcs URLs that are intentionally somewhere else.  While I'd
> > love to see all packages on Salsa, it might be sensible to start with
> > packages that are unintentionally not in Salsa so
> >
> > udd=> SELECT COUNT(DISTINCT source) FROM sources WHERE release = 'sid' AND
> > (vcs_url IS NULL OR vcs_url like '%alioth%' OR vcs_url like
> > '%git.debian.org%' OR vcs_url like '%svn.debian.org%') ;
> >  count
> > -------
> >   2213
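
The counting query quoted above can be tried without UDD access. What follows is only a minimal sketch: it loads a handful of made-up rows into an in-memory SQLite table that borrows the three column names used in the query (source, release, vcs_url). The package names and URLs are hypothetical, `count_stale` is a helper invented for this example, and SQLite's LIKE is case-insensitive for ASCII whereas PostgreSQL's is not, so counts on real data may differ.

```python
import sqlite3

# Hypothetical sample rows (source, release, vcs_url); not real UDD data.
ROWS = [
    ("pkg-a", "sid", "https://salsa.debian.org/debian/pkg-a.git"),
    ("pkg-b", "sid", None),                                   # no Vcs field at all
    ("pkg-b", "sid", None),                                   # duplicate source row
    ("pkg-c", "sid", "https://alioth.debian.org/anonscm/git/pkg-c.git"),
    ("pkg-d", "sid", "git://git.debian.org/git/pkg-d.git"),
    ("pkg-e", "sid", "svn://svn.debian.org/svn/pkg-e/trunk"),
    ("pkg-f", "sid", "https://github.com/example/pkg-f.git"),
    ("pkg-g", "bookworm", None),                              # wrong release
    # 'dgit.debian.org' contains the substring 'git.debian.org', so it matches.
    ("pkg-h", "sid", "https://git.dgit.debian.org/pkg-h"),
]

# Same filter as the quoted query, only reflowed.
QUERY = """
SELECT COUNT(DISTINCT source) FROM sources
 WHERE release = 'sid'
   AND (vcs_url IS NULL
        OR vcs_url LIKE '%alioth%'
        OR vcs_url LIKE '%git.debian.org%'
        OR vcs_url LIKE '%svn.debian.org%')
"""


def count_stale(rows):
    """Return the number of distinct sources matched by QUERY."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sources (source TEXT, release TEXT, vcs_url TEXT)")
    conn.executemany("INSERT INTO sources VALUES (?, ?, ?)", rows)
    (count,) = conn.execute(QUERY).fetchone()
    conn.close()
    return count


if __name__ == "__main__":
    print(count_stale(ROWS))
```

With these sample rows the sketch counts pkg-b, pkg-c, pkg-d, pkg-e and pkg-h. The last one shows that a plain substring pattern such as '%git.debian.org%' also catches dgit.debian.org URLs.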

For completeness I need to add `OR vcs_url like '%anonscm.debian.org%'`
which bumps the counter to 2947 ...

> > That might make a real challenge to bring that number below 2000 until
> > end of my term.  Any help to approach this is welcome.

... and puts my challenge to bring that number below 2000 nearly out of
reach (unless lots of people join this effort).

> Well, let's look at some of these other d.o URLs.
>
> - Not our alioth: There are 16 vcs URLs in there that have 'alioth'
>   in them but aren't alioth.debian.org; they are git hosted but not
>   on Debian infrastructure (and perhaps not in a place that facilitates
>   collaboration in the way being discussed)

Ahhh, you mean https://evolvis.org/anonscm/git/alioth ?  Thank you for the
hint.  So the updated query is

   SELECT COUNT(DISTINCT source) FROM sources
    WHERE release = 'sid'
      AND (vcs_url IS NULL
           OR vcs_url like '%alioth.debian.org%'
           OR vcs_url like '%git.debian.org%'
           OR vcs_url like '%svn.debian.org%'
           OR vcs_url like '%anonscm.debian.org%') ;

   2930

> - dgit.debian.org: There are 30 in there that are dgit.debian.org.
>   That surprised me, maybe I don't know enough about dgit.

I consider dgit.debian.org a valid Vcs field.  It might be preferable to
have Salsa as the main repository and dgit.d.o just a clone of it, but for
the moment I'm looking for obviously unmaintained Vcs fields or no Vcs
field at all.

> - git.debian.org: There are 146 with git.debian.org - none of these VCS
>   URLs work any more

Yes, that's my point:  Fix things that don't work.

> - svn.debian.org: !4 list svn.d.o but like git.d.o that's dead. svn.d.o
>   doesn't even exist as a hostname any more.

Same here.

> There's 161 packages in sid with old d.o URLs pointing to alioth. There's a
> reasonable chance that a good portion of them are also not maintained.
>
> - 11% of them list their maintainer as Debian QA Group
> - 13% of them have a current O bug (another 1 with an RFA)
> - who knows how many are otherwise abandoned with MIA maintainers or
>   maintainers who have just moved on to other things

Spotting obviously broken Vcs fields (or missing Vcs fields) is one way to
look for unmaintained packages.  This indicator might turn out to be
misleading, but in my experience from Bug of the Day that is a rare
exception.

> There was a recent discussion about what to do with VCSes for orphaned
> packages. Maybe if it doesn't exist on salsa, it's worth creating one in the
> salsa.d.o/debian/ namespace as part of doing the QA upload?
> (gbp import-dscs --debsnap) That would be a good outcome and a good little
> project for someone...

... which I would really welcome, but we need "someone" who volunteers.

> The vast majority of these packages have seen post-alioth uploads but with
> the broken Vcs fields still in place.

Do you have numbers backing up this "vast majority" statement?  In my
experience these uploads were NMUs, not maintainer uploads.  This brings
me back to my argument that the restrictions on what changes are acceptable
in an NMU keep NMUers from looking for such issues.  In most cases where I
salvaged packages, the NMUs were not even pushed to a repository that might
exist on Salsa.  So repositories on Salsa without an upload that fixes the
Vcs fields (I've seen lots of these with changelog entries by Janitor) can
potentially trigger regressions:  The maintainer might simply continue
working from the state of the Git repository and bump the Debian revision
to something higher than the NMU, so the changes of the NMU get lost.

> That's perhaps offering the opposite
> of collaborative development? The question is whether the repo has actually
> moved to salsa but d/control hasn't been updated, or whether the repo has
> just vanished.
> An MBF that the VCS fields are out of date is easy, but
> checking and fixing is likely manual work.

It's definitely manual work.  In most cases you also have to check the
Homepage and the watch file of the project.  My gut feeling is that about
30% of the Homepages of the packages salvaged via Bug of the Day were
broken.

>  year | count
> ------+-------
>  2011 |     1
>  2012 |     4
>  2013 |     3
>  2014 |     4
>  2015 |     1
>  2016 |     1
>  2017 |     2
>  2018 |     2  (salsa.d.o general availability)
>  2019 |     1
>  2020 |    13
>  2021 |    95
>  2022 |    20
>  2023 |     7
>  2024 |     6
>  2025 |     1

Most of those up to 2019 will probably be picked up by Bug of the Day
sooner or later.  Helping hands are always welcome.

> I noticed that some teams have some lintian tags checking this from a team
> policy perspective - doing this more broadly for other teams would help
> provide teams with visibility via lintian.d.o reports.
>
>     lintian-explain-tags -t team/pkg-perl/vcs/no-git \
>         team/pkg-perl/vcs/no-team-url

Nice.

> (I accidentally found 2 python-team packages without Vcs URLs yesterday -
> the repos were on salsa, just not listed in d/control)

Not so nice.  Did you just inject these?  If not, would you mind naming the
packages?

> Over half of these old alioth URLs can be addressed by Teams doing some data
> normalisation and uploads:
>
>  maintainer_name                | count
> --------------------------------+-------
>  Debian Perl Group              |    72
>  Debian Java Maintainers        |    10
>  Debian X Strike Force          |     7
>  Debian XML/SGML Group          |     4
>  Debian Science Maintainers     |     3
>  Debian CLI Applications Team   |     2
>  Debian Ruby Extras Maintainers |     1
>  Debian Javascript Maintainers  |     1
>  Debian Telepathy maintainers   |     1
>  Debian Fonts Task Force        |     1
>  Debian CLI Libraries Team      |     1
>  Debian-IN Team                 |     1
>  Debichem Team                  |     1
>  NeuroDebian Team               |     1
>  The Debian Lua Team            |     1

I even find 13 in the Science team and will try to tackle these (or ask for
removal).
(
  SELECT source, maintainer, vcs_url FROM sources
   WHERE release = 'sid'
     AND vcs_url not like '%salsa%'
     AND maintainer like '%science%' ;
)

> So in terms of where to start... perhaps there's a couple of teams that
> would like to do some data cleansing?

It would be really great if this thread would have this effect.

Thanks a lot for your analysis
    Andreas.

SELECT s.source, date, vcs_url
  FROM sources AS s
  JOIN upload_history AS h ON s.source = h.source AND s.version = h.version
 WHERE release = 'sid'
   AND vcs_url ~ '/(git|svn|alioth).debian.org'
 ORDER BY date DESC;

SELECT DATE_PART('year', date) AS year, COUNT(*)
  FROM sources AS s
  JOIN upload_history AS h ON s.source = h.source AND s.version = h.version
 WHERE release = 'sid'
   AND vcs_url ~ '/(git|svn|alioth).debian.org'
 GROUP BY year
 ORDER BY year ASC;

SELECT maintainer, COUNT(*)
  FROM sources
 WHERE release = 'sid'
   AND vcs_url ~ '/(git|svn|alioth).debian.org'
   AND maintainer ~ '(team|group|lists)'
 GROUP BY maintainer
 ORDER BY count DESC;

-- 
https://fam-tille.de