Re: Package statistics by downloads

2025-05-02 Thread Erik Schulz
> misguided popularity

I would argue a more objective description is that the measurement has
several biases:
- repeat-download bias.
- external-download bias, when using mirrors.
- false-download bias, when malicious actors try to manipulate the
value, for example using many IPs.
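
As a rough illustration of the first bias only, repeat downloads could be
collapsed by counting distinct clients per package instead of raw hits.
This is a minimal sketch, not how any Debian mirror actually works: the
(client_id, package) record shape and the function name are assumptions
made for the example, and it does nothing about mirrors or about an actor
using many IPs.

```python
from collections import defaultdict


def count_unique_downloads(records):
    """Count distinct clients per package.

    `records` is an iterable of (client_id, package) pairs. This shape is
    hypothetical and chosen only for illustration.
    """
    clients_by_package = defaultdict(set)
    for client_id, package in records:
        clients_by_package[package].add(client_id)
    return {pkg: len(clients) for pkg, clients in clients_by_package.items()}


# Made-up sample data: client-a fetches vim twice.
records = [
    ("client-a", "vim"),
    ("client-a", "vim"),  # repeat download, counted once
    ("client-b", "vim"),
    ("client-b", "curl"),
]
unique_counts = count_unique_downloads(records)
print(unique_counts)
```

Any stable client identifier would still be a privacy trade-off, so this
only shows the counting idea, not a recommendation.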

I agree that installation-counting popcon avoids the first two, but
it also suffers from 'willing-participant' bias. I have no idea how
severe this bias is.
So I have to theorize: maybe server installations are heavily
underrepresented; it also doesn't count the privacy-conscious, and
maybe not basic users who don't understand what it does and just say
no. I.e. entire classes of users are missing.

So the download-counting popcon may at least provide some new insights.

> to just download the files many times to increase the popularity

I assume this attack applies to popcon as well? It would be trivial to
push false numbers. I'm not familiar with how it works, but if it just
pushes a list of installed packages, then it is even more trivial to
manipulate the numbers.

The unfortunate conclusion is that we can't rely on these numbers to
track actual popularity, only whether a package is likely being used.
For example, packages with very low numbers may be given lower
priority on mirrors, which can be relevant for long-term archives
(e.g. all packages ever used in Debian for 20 years) that may prefer
to only archive packages that are more than 0.001% likely to be used.
If servers are grossly underrepresented in the sample, the data may be
unreliable for this use case.
And download-counting popcon would at worst include unused packages.
In the worst case, someone fake-downloading every single package would
render this statistic very hard to use.
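
To make the archival threshold concrete: a package would be kept if the
share of sampled systems reporting it exceeds 0.001%, i.e. a fraction of
0.00001. A minimal sketch, assuming per-package counts and a total sample
size are available; the function name, the input shape, and the numbers
are all made up for illustration.

```python
def packages_to_archive(counts, sample_size, threshold_percent=0.001):
    """Return packages whose usage share exceeds threshold_percent.

    `counts` maps package name -> number of sampled systems reporting it.
    Both the mapping and `sample_size` are hypothetical inputs.
    """
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    threshold = threshold_percent / 100.0  # 0.001% -> 0.00001
    return sorted(
        pkg for pkg, n in counts.items() if n / sample_size > threshold
    )


# Made-up numbers: one report out of 200,000 systems is a 0.0005% share.
counts = {"vim": 50_000, "obscure-tool": 1, "never-seen": 0}
kept = packages_to_archive(counts, sample_size=200_000)
print(kept)
```

In this example a single report falls below the cut-off, which is exactly
where an underrepresented class such as servers would distort the result.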


On Fri, May 2, 2025 at 1:28 AM Salvo Tomaselli wrote:
>
> I presume to do some misguided popularity ranking like pypi does, by counting
> the number of downloads.
>
> It works terribly because large organizations that actually download it many
> times will set up internal mirrors, so there is no chance for the value to
> have any meaning.
>
> Also on pypi and similar there's an incentive to just download the files many
> times to increase the popularity (I provide a very nice tool to do that
> without consuming too much bandwidth, on my codeberg).
>
> Plus of course, how would we even aggregate all the download counts from all
> the mirrors?
>
> Best
>
>
> --
> Salvo Tomaselli
>
> "I do not feel obliged to believe that the same God who has endowed us with
> sense, reason and intellect intended us to forgo their use."
> -- Galileo Galilei
>
> https://ltworf.codeberg.page/
>
>



Re: General Resolution: Interpretation of DFSG on Artificial Intelligence (AI) Models

2025-05-02 Thread Julien Puydt
Hi,

On Fri, May 2, 2025, 12:47, Debian Project Secretary - Kurt Roeckx <
secret...@debian.org> wrote:

>
> More information can be found at:
> https://www.debian.org/vote/2025/vote_002


This link leads to a broken page: the first internal link doesn't work so
you need to scroll down manually, and the links to the appendices lead
nowhere.

I hope I'm not the fiftieth reporter, sorry if that's the case!

J.Puydt


Re: Package statistics by downloads

2025-05-02 Thread Otto Kekäläinen
> I'm interested in package popularity. I'm aware of popcon
> (https://popcon.debian.org/), but I'm more interested in actual
> downloads.

I am also interested in usage statistics. I feel it is much more
meaningful to work on packages that I know have a lot of users.

While neither popcon nor download stats are accurate, they still show
trends and relative numbers which can be used to draw useful
conclusions. I would be glad if people could share ideas on what
stats we could collect and publish instead of just pointing out
flaws in various stats.