Re: Seeking a small group to package Apache Arrow (was: Bug#970021: RFP: apache-arrow -- cross-language development platform for in-memory analytics)

2024-04-04 Thread Richard Duivenvoorde

On 3/25/24 7:17 PM, Julian Gilbey wrote:

So this is a plea for anyone looking for something really helpful to
do: it would be great to have a group of developers finally package
this!  There was some initial work done (see the RFP bug report for
details: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=970021),
but that is fairly old now.  As Apache Arrow supports numerous
languages, it may well benefit from having a group of developers with
different areas of expertise to build it.  (Or perhaps it would make
more sense to split the upstream source into a collection of different
Debian source packages for the different supported languages.  I don't
know.)  Unfortunately I don't have the capacity to devote any time to
it myself.

Thanks in advance for anyone who can step forward for this!


As someone from the Debian-GIS community, I would also be very interested in 
this!

The Apache Arrow C++ library is one of the dependencies GDAL/OGR needs to 
read/write (geo)parquet files, a data format with a lot of traction in the geo 
community [0]. Packaging it would make it possible for QGIS to handle those 
files (on Debian).

[0] 
https://cloudnativegeo.org/blog/2023/09/duckdb-the-indispensable-geospatial-tool-you-didnt-know-you-were-missing/

Regards,

Richard Duivenvoorde



Permission to distribute

2024-04-04 Thread John Lee
Hello Debian Team,

I just wondered if I can sell computers that I build with Debian Linux
pre-installed. The computers may also include programs I create. I tried to
find the answer to this question but am still unsure.

If you need more details please let me know. Any information is greatly
appreciated! Thanks!

Sincerely,
John Lee


Re: Permission to distribute

2024-04-04 Thread Pierre-Elliott Bécue
Hi

John Lee  wrote on 04/04/2024 at 10:01:48+0200:
> Hello Debian Team, 
>
> I just wondered if I can sell computers that I build with Debian Linux
> pre-installed. The computers may also include programs I create. I
> tried to find the answer to this question but am still unsure.
>
> If you need more details please let me know. Any information is
> greatly appreciated! Thanks!

Debian is a Free GNU/Linux Distribution. You may sell any system having
Debian installed on it.

The only thing you need to care about is not infringing the license of
the software you'll distribute.

-- 
PEB



Re: Debian openssh option review: considering splitting out GSS-API key exchange

2024-04-04 Thread Florian Lohoff
On Tue, Apr 02, 2024 at 01:30:43PM +0200, Marc Haber wrote:
> On Tue, 2 Apr 2024 01:30:10 +0100, Colin Watson 
> wrote:
> >We carry a patch to restore support for TCP wrappers, which was dropped
> >in OpenSSH 6.7 (October 2014); see
> >https://lists.mindrot.org/pipermail/openssh-unix-dev/2014-April/032497.html
> >and thread.  That wasn't long before the Debian 8 (jessie) freeze, and
> >so I patched it back in "temporarily", but then I dropped the ball on
> >organizing a proper transition. 
> 
> Please don't drop the mechanism that saved my¹ unstable installations
> from being vulnerable to the current xz-based attack. Just having to
> dump an ALL: ALL into /etc/hosts.deny is vastly easier than having to
> maintain a packet filter.
> 
> Greetings
> Marc
> 
> ¹ and probably thousands of others

In the good old days we relied on every network-facing service being
linked against TCP wrappers, so a single line would secure your system
against the network with all its possible intruders. This is how I worked
for decades.

Those times are long gone, and TCP wrappers have lost their reliability
as a security mechanism. This is why people started moving away from
them (which I think is a shame).

I personally moved to nftables, which is nearly as simple once you get
your muscle memory set. If ssh is your only network-facing service,
you could also use Match blocks in /etc/ssh/sshd_config.d/.

So I am okay with removing the libwrap dependency (though not happy).

Flo
-- 
Florian Lohoff f...@zz.de
  Any sufficiently advanced technology is indistinguishable from magic.



Re: Debian openssh option review: considering splitting out GSS-API key exchange

2024-04-04 Thread Stephan Seitz

On Tue, Apr 02, 2024 at 13:30:43 +0200, Marc Haber wrote:

from being vulnerable to the current xz-based attack. Just having to
dump an ALL: ALL into /etc/hosts.deny is vastly easier than having to
maintain a packet filter.


Stupid question, but if you put „ALL: ALL” into hosts.deny, couldn’t you 
stop the ssh daemon instead? ALL: ALL will block your ssh access, so it 
doesn’t matter if the daemon is running or not.


Stephan

--
|If your life was a horse, you'd have to shoot it.|



Re: Debian openssh option review: considering splitting out GSS-API key exchange

2024-04-04 Thread Marc Haber
On Thu, 4 Apr 2024 13:03:50 +0200, Florian Lohoff  wrote:
>I personally moved to nftables which is nearly as simple once you get
>your muscle memory set.

So you have dedicated packet filters on every machine you run, even if
sshd is the only network-facing service?

Greetings
Marc
-- 

Marc Haber |   " Questions are the | Mailadresse im Header
Rhein-Neckar, DE   | Beginning of Wisdom " | 
Nordisch by Nature | Lt. Worf, TNG "Rightful Heir" | Fon: *49 6224 1600402



ufw (was Re: Debian openssh option review: considering splitting out GSS-API key exchange)

2024-04-04 Thread Holger Levsen
On Thu, Apr 04, 2024 at 01:32:11PM +0200, Marc Haber wrote:
> So you have dedicated packet filters on every machine you run, even if
> sshd is the only network-facing service?

on most machines and it was as simple as doing:

apt install ufw
ufw allow ssh
ufw enable

voila, done. rules configured like above end up in /etc/ufw/user.rules and
user6.rules. quite simple, quite nice.


-- 
cheers,
Holger

 ⢀⣴⠾⠻⢶⣦⠀
 ⣾⠁⢠⠒⠀⣿⡁  holger@(debian|reproducible-builds|layer-acht).org
 ⢿⡄⠘⠷⠚⠋⠀  OpenPGP: B8BF54137B09D35CF026FE9D 091AB856069AAA1C
 ⠈⠳⣄

Kinda weird that we’re all gonna experience climate change as a series of
short, apocalyptic videos until eventually it’s your phone that’s recording.
(@shocks)



Re: Seeking a small group to package Apache Arrow (was: Bug#970021: RFP: apache-arrow -- cross-language development platform for in-memory analytics)

2024-04-04 Thread Thomas Goirand

On 3/25/24 19:17, Julian Gilbey wrote:

Hi all,

[NB: sent to d-science, d-python, d-devel and the RFP bug; reply-to
set to d-science and the RFP bug only]

An update on Apache Arrow, and in particular the Python library
PyArrow.  For those who don't know:

   Apache Arrow is a development platform for in-memory analytics. It
   contains a set of technologies that enable big data systems to
   process and move data fast. It specifies a standardized
   language-independent columnar memory format for flat and
   hierarchical data, organized for efficient analytic operations on
   modern hardware.

   The project is developing a multi-language collection of libraries
   for solving systems problems related to in-memory analytical data
   processing. This includes such topics as:

   * Zero-copy shared memory and RPC-based data movement

   * Reading and writing file formats (like CSV, Apache ORC, and Apache
 Parquet)

   * In-memory analytics and query processing

   (from: https://arrow.apache.org/docs/index.html)

Pandas has announced that Pandas 3.x will depend on PyArrow
in a critical way (it will back the "string" datatype), and it is due
to be released imminently.

So this is a plea for anyone looking for something really helpful to
do: it would be great to have a group of developers finally package
this!  There was some initial work done (see the RFP bug report for
details: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=970021),
but that is fairly old now.  As Apache Arrow supports numerous
languages, it may well benefit from having a group of developers with
different areas of expertise to build it.  (Or perhaps it would make
more sense to split the upstream source into a collection of different
Debian source packages for the different supported languages.  I don't
know.)  Unfortunately I don't have the capacity to devote any time to
it myself.

Thanks in advance for anyone who can step forward for this!

Best wishes,

Julian


Hi,

I may not have much available time to help, though I'd love to have 
Arrow in Debian, as Ceph uses it and currently uses an embedded version.


Cheers,

Thomas Goirand (zigo)



Re: Debian openssh option review: considering splitting out GSS-API key exchange

2024-04-04 Thread Russ Allbery
Florian Lohoff  writes:

> Those times are long gone, and TCP wrappers have lost their reliability
> as a security mechanism. This is why people started moving away from
> them (which I think is a shame).

> I personally moved to nftables, which is nearly as simple once you get
> your muscle memory set. If ssh is your only network-facing service,
> you could also use Match blocks in /etc/ssh/sshd_config.d/.

For what it's worth, I have iptables (I know, it's nftables under the hood
now, but I'm still using the iptables syntax because the number of hours
in each day is annoyingly low) on every system I run and I still use TCP
wrappers for ssh restrictions for one host.  That's because I have users
who use various ISPs, and for some of those ISPs, DNS-based restrictions
are less maintenance work than playing whack-a-mole with their
ever-changing IP blocks.

Yes, yes, I know this isn't actually secure, etc., but that's fine, I'm
not using it as a primary security measure.  I'm using it to narrow the
number of hosts on the Internet that can exploit an sshd vulnerability,
and to reduce the amount of annoying automated exploit attempts I get.
(Exactly the kind of thing that helps mildly against situations like the
xz backdoor.)

That said, the point that I could switch over to Match blocks in the sshd
configuration is well-taken, and not wanting to take an hour to rewrite my
rules in a different configuration format is probably not a good enough
reason to keep a dependency in a security-critical, network-exposed
service.  I'm mildly grumbly because it's yet another thing I have to
change just to keep things from breaking, but such is life.

-- 
Russ Allbery (r...@debian.org)  



Re: Debian openssh option review: considering splitting out GSS-API key exchange

2024-04-04 Thread Marc Haber
On Thu, 4 Apr 2024 13:25:04 +0200, Stephan Seitz
 wrote:
>On Tue, Apr 02, 2024 at 13:30:43 +0200, Marc Haber wrote:
>>from being vulnerable to the current xz-based attack. Just having to
>>dump an ALL: ALL into /etc/hosts.deny is vastly easier than having to
>>maintain a packet filter.
>
>Stupid question, but if you put „ALL: ALL” into hosts.deny, couldn’t you 
>stop the ssh daemon instead? ALL: ALL will block your ssh access, so it 
>doesn’t matter if the daemon is running or not.

Of course there are sshd: lines in hosts.allow for "my" networks.

Greetings
Marc
-- 

Marc Haber |   " Questions are the | Mailadresse im Header
Rhein-Neckar, DE   | Beginning of Wisdom " | 
Nordisch by Nature | Lt. Worf, TNG "Rightful Heir" | Fon: *49 6224 1600402



Re: New supply-chain security tool: backseat-signed

2024-04-04 Thread kpcyrd

On 4/3/24 4:21 AM, Adrian Bunk wrote:

On Wed, Apr 03, 2024 at 02:31:11AM +0200, kpcyrd wrote:

...
I figured out a somewhat straight-forward way to check if a given `git
archive` output is cryptographically claimed to be the source input of a
given binary package in either Arch Linux or Debian (or both).


For Debian the proper approach would be to copy Checksums-Sha256 for the
source package to the buildinfo file, and there is nothing where it would
matter whether the tarball was generated from git or otherwise.


I believe this to be the "reproducible source tarball" thing some people
have been asking about.
...


The lack of a reliably reproducible checksum when using "git archive" is
the problem, and git cannot realistically provide that.

Even when called with the same parameters, "git archive" executed in
different environments might produce different archives for the same
commit ID.

It is documented that auto-generated Github tarballs for the same tag
and with the same commit ID downloaded at different times might have
different checksums.


Granted it takes some skill to take snapshots that match what github is 
generating (and there are occasional issues) but generally speaking it 
works quite well. The required command is in the README, and I encourage 
you to give it a try.


If you want something that's explicitly designed for taking reproducible 
VCS snapshots you could also consider the "Nix Archive" format[0], 
however I think more people would be in favor of agreeing on how to 
canonically derive a given git tree into a `.tar.gz` (or at least .tar) 
instead of switching Debian to the .nar file format.


[0]: https://github.com/ebkalderon/libnar

I think regular `git archive` is already pretty good; complaining that 
it may only work in 98% of cases is, I'd say, a luxury problem considering 
the current state of things. The next paragraph is the bigger headache:



This tool highlights the concept of "canonical sources", which is supposed
to give guidance on what to code review.
...


How does it tell the git commit ID the tarball was generated from?

Doing a code review of git sources as a tarball would be stupid,
you really want the git metadata that usually shows when, why and by
whom something was changed.


It doesn't. It works like a one-way function: it can verify that a given 
VCS snapshot is definitely the source code that was ingested into Debian, 
but it can't locate the source code on its own.


I don't know if Debian has this kind of provenance information 
available. To my knowledge, Debian operates on "our maintainers upload 
.tar.xz files into our archive and we take them at face value". That 
does make sense, considering not every software project uses git: some 
may develop their own VCS, and some software projects do not have any VCS 
at all, it's just one person applying patches to a folder on their local 
computer and uploading .tar snapshots to a webserver every other month.


There are some packages that have some kind of system behind them, like 
rust-toml_0.5.11.orig.tar.gz in the Debian Archive can be expected to 
match  (although 
sometimes files get excluded from the tar upload). I'd like to 
explicitly encourage people to point me in the right direction if 
there's any existing effort of mapping debian .orig.tar.gz files to git 
tags (not necessarily bit-for-bit, but at least which commit we expect 
it to come from).



https://github.com/kpcyrd/backseat-signed

The README
...


"This requires some squinting since in Debian the source tarball is
  commonly recompressed so only the inner .tar is compared"

This doesn't sound true.


I've updated the wording and intend to investigate this further. By 
default the relevant command even expects an exact match. For example 
this works:


```
% backseat-signed plumbing debian-tarball-from-sources --sources Sources.xz --name cmatrix cmatrix_2.0.orig.tar.gz
[2024-04-04T18:45:09Z INFO  backseat_signed::plumbing] Loading sources index from "Sources.xz"
[2024-04-04T18:45:10Z INFO  backseat_signed::plumbing] Loading file from "cmatrix_2.0.orig.tar.gz"
[2024-04-04T18:45:10Z INFO  backseat_signed::plumbing] Searching in index...
[2024-04-04T18:45:10Z INFO  backseat_signed::plumbing] File verified successfully
```
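
[Editor's sketch] Conceptually, an exact-match check like the one above amounts to looking up the file's SHA-256 in the `Checksums-Sha256` field of the package's stanza in the `Sources` index. The following is only a rough illustration of that idea, not how backseat-signed is actually implemented; the function names and the demo stanza are made up for the example, and it assumes the index has already been decompressed to text.

```python
import hashlib


def sha256_entries(sources_text: str, package: str) -> dict:
    """Return {filename: (sha256, size)} from the package's Checksums-Sha256 field."""
    entries = {}
    for stanza in sources_text.strip().split("\n\n"):
        lines = stanza.splitlines()
        if f"Package: {package}" not in lines:
            continue
        in_field = False
        for line in lines:
            if line.startswith("Checksums-Sha256:"):
                in_field = True
            elif in_field and line.startswith(" "):
                # Continuation lines look like: " <sha256> <size> <filename>"
                digest, size, name = line.split()
                entries[name] = (digest, int(size))
            else:
                in_field = False
    return entries


def verify(data: bytes, sources_text: str, package: str) -> bool:
    """True if the exact bytes of `data` are listed for `package` in the index."""
    digest = hashlib.sha256(data).hexdigest()
    listed = sha256_entries(sources_text, package).values()
    return any(d == digest and s == len(data) for d, s in listed)


# Hypothetical stanza built on the fly so the hash matches the demo payload.
payload = b"pretend these are the bytes of demo_1.0.orig.tar.gz"
demo_sources = (
    "Package: demo\n"
    "Version: 1.0-1\n"
    "Checksums-Sha256:\n"
    f" {hashlib.sha256(payload).hexdigest()} {len(payload)} demo_1.0.orig.tar.gz\n"
    "Directory: pool/main/d/demo\n"
)
print(verify(payload, demo_sources, "demo"))
```

Because the hash covers the compressed file, any recompression changes the result even when the inner tar is identical.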

But if I repack the .tar.gz into .tar.xz it's going to get rejected:

```
% backseat-signed plumbing debian-tarball-from-sources --sources Sources.xz --name cmatrix cmatrix_2.0.orig.tar.xz
[2024-04-04T18:48:32Z INFO  backseat_signed::plumbing] Loading sources index from "Sources.xz"
[2024-04-04T18:48:33Z INFO  backseat_signed::plumbing] Loading file from "cmatrix_2.0.orig.tar.xz"
[2024-04-04T18:48:33Z INFO  backseat_signed::plumbing] Searching in index...
Error: Could not find source tarball with matching hash in source index
```
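
[Editor's sketch] The rejection above follows directly from hashing the compressed file: the same inner tar wrapped in gzip and in xz yields different outer hashes, while the hash of the decompressed stream stays the same. A small self-contained demonstration in Python, using an in-memory tarball with made-up contents (this says nothing about how backseat-signed itself strips the compression layer):

```python
import gzip
import hashlib
import io
import lzma
import tarfile


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def inner_tar_sha256(blob: bytes) -> str:
    """Hash the decompressed tar stream, ignoring the compression layer."""
    if blob[:2] == b"\x1f\x8b":            # gzip magic bytes
        blob = gzip.decompress(blob)
    elif blob[:6] == b"\xfd7zXZ\x00":      # xz magic bytes
        blob = lzma.decompress(blob)
    # Anything else is assumed to be an uncompressed .tar already.
    return sha256_hex(blob)


def make_demo_tar() -> bytes:
    """Build a tiny tarball in memory (hypothetical file name and content)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        payload = b"int main(void) { return 0; }\n"
        info = tarfile.TarInfo("demo-1.0/main.c")
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


inner = make_demo_tar()
as_gz = gzip.compress(inner, mtime=0)
as_xz = lzma.compress(inner)

print(sha256_hex(as_gz) == sha256_hex(as_xz))              # outer hashes differ
print(inner_tar_sha256(as_gz) == inner_tar_sha256(as_xz))  # inner hashes agree
```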

Being able to disregard the compression layer is still necessary 
however, because Debian (as far as I know) never takes the h

Re: New supply-chain security tool: backseat-signed

2024-04-04 Thread Jeremy Stanley
On 2024-04-04 21:39:51 +0200 (+0200), kpcyrd wrote:
[...]
> I don't know if Debian has this kind of provenance information available. To
> my knowledge, Debian operates on "our maintainers upload .tar.xz files into
> our archive and we take them at face value". That does make sense,
> considering not every software project uses git, some may develop their own
> VCS, some software projects do not have any VCS at all and it's just one
> person applying patches to a folder on their local computer and uploading
> .tar snapshots to a webserver every other month.
[...]

Looking at this with my upstream hat on, there is more information
in a Git repository than is represented in a flat export of its
worktree. Some projects consider the Git metadata context to be part
of the source code, and run source build processes in order to bake
that additional information into our source archives.
-- 
Jeremy Stanley



Re: Debian openssh option review: considering splitting out GSS-API key exchange

2024-04-04 Thread Henrique de Moraes Holschuh
On Tue, Apr 2, 2024, at 07:04, Marco d'Itri wrote:
> On Apr 02, Colin Watson  wrote:
>
>> At the time, denyhosts was popular, but it was removed from Debian
>> several years ago.  I remember that, when I dealt with that on my own
>> systems, fail2ban seemed like the obvious replacement, and my impression
>> is that it's pretty widely used nowadays; it's very pluggable but it
>> normally works by adding firewall rules.  Are there any similar popular
>> systems left that rely on editing /etc/hosts.deny?
> Yes, people. I object to removing TCP wrappers support since the patch 
> is tiny and it supports use cases like DNS-based ACLs which cannot be 
> supported by L3 firewalls.

If libwrap is bringing in complex libs, maybe we could reduce the attack 
surface on libwrap itself?  It would be nice to have a variant that only links 
to the libc and that's it...

And that benefits everything that links to TCP wrappers...

I also like to have the (old-school) standard extra layer of protection that 
libwrap can provide. I'd like to find a way to keep it useful for sshd.

-- 
  Henrique de Moraes Holschuh 



Re: Debian openssh option review: considering splitting out GSS-API key exchange

2024-04-04 Thread Colin Watson
On Thu, Apr 04, 2024 at 06:42:08PM -0300, Henrique de Moraes Holschuh wrote:
> If libwrap is bringing in complex libs, maybe we could reduce the
> attack surface on libwrap itself?  It would be nice to have a variant
> that only links to the libc and that's it...

Yeah, that's https://bugs.debian.org/1068311 which I linked to elsewhere
in this thread.

-- 
Colin Watson (he/him)  [cjwat...@debian.org]



Re: New supply-chain security tool: backseat-signed

2024-04-04 Thread Adrian Bunk
On Thu, Apr 04, 2024 at 09:39:51PM +0200, kpcyrd wrote:
>...
> I've checked both, upstreams github release page and their website[1], but
> couldn't find any mention of .tar.xz, so I think my claim of Debian doing
> the compression is fair.
> 
> [1]: https://www.vim.org/download.php
>...

Perhaps that's a maintainer running "git archive" manually?

Hashes of "git archive" tarballs are anyway not stable,
so whatever a maintainer generates is not worse than what is on Github.

Any proper tooling would have to verify that the contents is equal.
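
[Editor's sketch] One way to read "verify that the contents is equal" is to compare tarballs member by member rather than byte by byte, so that compression, member order and timestamps do not matter. The Python below is a minimal illustration under that assumption; which metadata should count as "contents" (here: path, type, mode, file data or link target) is a policy choice, and the helper names and demo files are invented for the example.

```python
import hashlib
import io
import tarfile


def tar_manifest(blob: bytes) -> dict:
    """Map each member path to (type, mode, content hash or link target)."""
    manifest = {}
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r:*") as tar:
        for member in tar:
            detail = ""
            if member.isfile():
                detail = hashlib.sha256(tar.extractfile(member).read()).hexdigest()
            elif member.issym() or member.islnk():
                detail = member.linkname
            manifest[member.name] = (member.type, member.mode, detail)
    return manifest


def same_contents(a: bytes, b: bytes) -> bool:
    """Compare two tarballs, ignoring compression, member order and mtimes."""
    return tar_manifest(a) == tar_manifest(b)


def build_tar(files, mtime, mode):
    """Helper for the demo: pack (name, bytes) pairs into an in-memory tarball."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode=mode) as tar:
        for name, payload in files:
            info = tarfile.TarInfo(name)
            info.size = len(payload)
            info.mtime = mtime
            tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


files = [("demo-1.0/a.c", b"int a;\n"), ("demo-1.0/b.c", b"int b;\n")]
upstream = build_tar(files, mtime=1700000000, mode="w:gz")
repacked = build_tar(list(reversed(files)), mtime=0, mode="w:xz")
tampered = build_tar([files[0], ("demo-1.0/b.c", b"int b = 1;\n")], mtime=0, mode="w:xz")

print(same_contents(upstream, repacked))  # True
print(same_contents(upstream, tampered))  # False
```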

>...
> Being able to disregard the compression layer is still necessary however,
> because Debian (as far as I know) never takes the hash of the inner .tar
> file but only the compressed one. Because of this you may still need to
> provide `--orig ` if you want to compare with an uncompressed tar.
>...

Right now the preferred form of source in Debian is an upstream-signed 
release tarball, NOT anything from git.

An actual improvement would be to automatically and 100% reliably
verify that a given tarball matches the commit ID and signed git tag
in an upstream git tree.

But for that writing tooling would be the trivial part,
architectural topics like where to store the commit ID
and where to store the git tree would be the harder parts.

Or perhaps stop using tarballs in Debian as sole permitted
form of source.

> cheers,
> kpcyrd

cu
Adrian



Re: New supply-chain security tool: backseat-signed

2024-04-04 Thread James McCoy
On Fri, Apr 05, 2024 at 01:31:25AM +0300, Adrian Bunk wrote:
> On Thu, Apr 04, 2024 at 09:39:51PM +0200, kpcyrd wrote:
> >...
> > I've checked both, upstreams github release page and their website[1], but
> > couldn't find any mention of .tar.xz, so I think my claim of Debian doing
> > the compression is fair.
> > 
> > [1]: https://www.vim.org/download.php
> >...
> 
> Perhaps that's a maintainer running "git archive" manually?

Yes, in whichever way git-deborig(1) is driving git archive.

Cheers,
-- 
James
GPG Key: 4096R/91BF BF4D 6956 BD5D F7B7  2D23 DFE6 91AE 331B A3DB



Re: New supply-chain security tool: backseat-signed

2024-04-04 Thread kpcyrd

On 4/5/24 12:31 AM, Adrian Bunk wrote:

Hashes of "git archive" tarballs are anyway not stable,
so whatever a maintainer generates is not worse than what is on Github.

Any proper tooling would have to verify that the contents is equal.


...
Being able to disregard the compression layer is still necessary however,
because Debian (as far as I know) never takes the hash of the inner .tar
file but only the compressed one. Because of this you may still need to
provide `--orig ` if you want to compare with an uncompressed tar.
...


Right now the preferred form of source in Debian is an upstream-signed
release tarball, NOT anything from git.

An actual improvement would be to automatically and 100% reliably
verify that a given tarball matches the commit ID and signed git tag
in an upstream git tree.


I strongly disagree. I think the upstream signature is overrated.

It's from the old mindset of code signing being the only way of securely 
getting code from upstream. Recent events have shown that, instead of 
bothering upstream for signatures, it's much more important to have 
clarity and transparency about what's in the code that is compiled into 
binaries and executed on our computers than about who we got it from. 
The entire reproducible builds effort is based on the idea of the source 
code in Debian being safe and sound to use.


If upstream refused to sign anything but pre-compiled llvm IR, I'd put 
both the IR and signature in the trash and build from source code.


If upstream wouldn't sign anything but autotools pre-processed archives 
with 25k lines of auto-generated shell scripts I'd put it next to the IR 
and build from the actual source code as well.


If upstream would only sign a tarball with files sorted in the order 
they were returned by their kernel to readdir(), I'd raise the question 
of why we're still having this in 2024 (and possibly suggest using a tar 
with sorted entries).


Although to be honest, if this were really the only problem we were 
having, I'd likely not care anymore and put my time to better use.
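
[Editor's sketch] For readers wondering what "a tar with sorted entries" looks like in practice, here is a small Python illustration: collect the paths, sort them, and normalize the metadata that varies between machines. It is only a sketch of the idea, not a proposal for a canonical format; the choice of which fields to reset (mtime, owner) and the function names are assumptions made for the example.

```python
import io
import os
import tarfile


def normalize(info: tarfile.TarInfo) -> tarfile.TarInfo:
    """Drop metadata that depends on who packed the tree and when."""
    info.mtime = 0
    info.uid = info.gid = 0
    info.uname = info.gname = ""
    return info


def sorted_tar(root: str) -> bytes:
    """Pack everything below `root` in sorted path order instead of readdir() order."""
    paths = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            paths.append(os.path.join(dirpath, name))
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.GNU_FORMAT) as tar:
        for path in sorted(paths):
            arcname = os.path.relpath(path, root)
            # recursive=False: directories are added as entries only, their
            # children come later from the sorted list.
            tar.add(path, arcname=arcname, recursive=False, filter=normalize)
    return buf.getvalue()
```

Two trees with the same files and permissions then produce byte-identical archives regardless of the order in which the files were created on disk.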



Or perhaps stop using tarballs in Debian as sole permitted
form of source.


I'd be fine with that.

cheers,
kpcyrd



Re: New supply-chain security tool: backseat-signed

2024-04-04 Thread Adrian Bunk
On Fri, Apr 05, 2024 at 01:30:51AM +0200, kpcyrd wrote:
> On 4/5/24 12:31 AM, Adrian Bunk wrote:
> > Hashes of "git archive" tarballs are anyway not stable,
> > so whatever a maintainer generates is not worse than what is on Github.
> > 
> > Any proper tooling would have to verify that the contents is equal.
> > 
> > > ...
> > > Being able to disregard the compression layer is still necessary however,
> > > because Debian (as far as I know) never takes the hash of the inner .tar
> > > file but only the compressed one. Because of this you may still need to
> > > provide `--orig ` if you want to compare with an uncompressed tar.
> > > ...
> > 
> > Right now the preferred form of source in Debian is an upstream-signed
> > release tarball, NOT anything from git.
> > 
> > An actual improvement would be to automatically and 100% reliably
> > verify that a given tarball matches the commit ID and signed git tag
> > in an upstream git tree.
> 
> I strongly disagree. I think the upstream signature is overrated.

The best we can realistically verify is that the code is from upstream.

> It's from the old mindset of code signing being the only way of securely
> getting code from upstream. Recent events have shown (instead of bothering
> upstream for signatures) it's much more important to have clarity and
> transparency what's in the code that is compiled into binaries and executed
> on our computers, instead of who we got it from.
>...

We do know that for the backdoored xz packages.

An intentional backdoor by upstream is not something we can 
realistically defend against.

The tiny part of the whole xz backdoor that was only in the tarball 
could instead also have been in git like the rest of the backdoor.

A "supply-chain security tool" that does not bring any improvement in 
this case is just snake oil.

> cheers,
> kpcyrd

cu
Adrian



Bug#1068434: ITP: python-asv-runner -- Core Python benchmark code for ASV

2024-04-04 Thread Yogeswaran Umasankar
Package: wnpp
Severity: wishlist
Owner: Yogeswaran Umasankar 
X-Debbugs-Cc: debian-devel@lists.debian.org, kd8...@gmail.com

* Package name: python-asv-runner
  Version : 0.2.1
  Upstream Contact: Rohit Goswami , Michael Droettboom 

* URL : https://github.com/airspeed-velocity/asv_runner
* License : BSD-3-clause
  Programming Lang: Python
  Description : Core Python benchmark code for ASV

ASV Runner provides essential functionality for benchmarking
 Python packages with ease and efficiency. Planning to maintain
 it under DPT, need a sponsor.