Since I feel this fits the current discussion on the mailing list, let me quickly introduce an idea I have had for a while to improve the copyright review situation.

TLDR: for projects using REUSE, we could generate d/copyright automatically and approve the copyright check in NEW automatically.

- What is REUSE?

The REUSE specification [1] is a specification for making copyright information machine-readable in the source files themselves. It is straightforward to implement: add (e.g.) "SPDX-FileCopyrightText: 2019 Jane Doe <j...@example.com>" and "SPDX-License-Identifier: GPL-3.0-or-later" as comments to your source file's header and you are done. The license identifiers are standardized by SPDX [2] and are similar to what we already use in Debian (see also [7], although it is a bit outdated). The spec is developed by the Free Software Foundation Europe (FSFE) and is already implemented in several projects [3]. They also provide a tool (available as "reuse" in Debian [4]) which can lint a source folder for REUSE completeness and export the license information to an SPDX bill of materials.

- What is an SPDX bill of materials?

It is a machine-readable format that specifies the license of each file in tag/value style, like DEP-5. However, compared to DEP-5 it is much less human-readable: it includes far more metadata and does not contain the license texts. One useful aspect is that it also includes the checksum of each file. I have appended an example of such a document at the end of this mail. The spec is from Software Package Data Exchange (SPDX), a project hosted by the Linux Foundation. The spec is also available as ISO/IEC 5962:2021.

- What does this have to do with Debian?

My idea is to allow SPDX documents in addition to DEP-5. The advantage is that, if supported upstream, REUSE can generate such reports automatically at package build time, so there is no need to write d/copyright manually anymore. It is also much less error-prone, as this can be done every time there is a new source package and does not suffer from human mistakes such as forgetting to check some files during the copyright review. The license identifiers can be parsed to check whether the package belongs in main/contrib or in non-free (except when custom licenses are used).

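To make the identifier check a bit more concrete, here is a minimal sketch in Python. It is only an illustration, not an existing tool: the `ASSUMED_FREE` set, the function names and the "needs-review" result are my assumptions, and a real implementation would need a maintained mapping from SPDX identifiers to archive areas. The sketch is deliberately conservative: every license in an expression must be on the allow-list, and custom licenses ("LicenseRef-...") always fall back to manual review.

```python
import re

# Hypothetical, deliberately tiny allow-list (an assumption for this sketch).
ASSUMED_FREE = {"MIT", "GPL-3.0-or-later", "CC0-1.0"}


def license_ids(expression):
    """Return the license identifiers used in an SPDX license expression."""
    ids = set()
    skip_next = False
    for token in re.split(r"[()\s]+", expression):
        if not token:
            continue
        if skip_next:
            # The token after WITH names an exception, not a license.
            skip_next = False
            continue
        if token == "WITH":
            skip_next = True
        elif token not in ("AND", "OR"):
            ids.add(token)
    return ids


def classify(expression):
    """Return "free" only if every identifier is on the allow-list."""
    ids = license_ids(expression)
    if ids and ids <= ASSUMED_FREE:
        return "free"
    # Custom licenses (LicenseRef-...), NOASSERTION, unknown identifiers
    # and empty expressions all end up here.
    return "needs-review"


if __name__ == "__main__":
    for expr in ("MIT", "(MIT OR GPL-3.0-or-later) AND CC0-1.0",
                 "MIT AND LicenseRef-Custom"):
        print(expr, "->", classify(expr))
```

Treating "OR" the same as "AND" is stricter than necessary, but it keeps the sketch short and errs on the side of a human looking at the package.
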
Packages leveraging REUSE could skip the manual d/copyright check in NEW entirely, even when it is a new source package. Writing a sanity validator would not be hard; one probably already exists.

Note that since the license texts would no longer be part of d/copyright, they have to be provided in another way. REUSE specifies that licenses live in a top-level folder called "LICENSES", so we could simply install that folder alongside the copyright file. We could also depend on the "spdx-licenses" package [5] and symlink all non-custom licenses to reduce duplicate files; however, since a license usually needs to be shipped with any code/binary distribution, this might get a bit complicated.

Another, IMHO less preferable, way would be to write a converter tool from SPDX to DEP-5, but still do auto-approvals. Such a converter tool has been proposed before [6].

- Final thoughts

Besides the quality-of-life improvements, this also has the advantage of using an industry standard, i.e. shared work on tooling. I heard that Fedora is also thinking about implementing this idea.

I've been in contact with one of the people responsible at the FSFE for a while, and they really like this idea and are open to suggestions from our side if we need any particular changes to the tooling. I already have a couple of changes in mind that we would need, in particular with regard to adding copyright information for the debian folder without adding a header to each file, but upstream already has some ideas for that.

Note that I don't want DEP-5 to go away: it is unlikely that every project will follow the REUSE spec, and writing an SPDX document by hand has no significant advantages over DEP-5. Besides, using the file-exclusion function in DEP-5 for uscan is quite useful for ds/dfsg packages (although that could also be moved to an external file).

For now, let me just hear what you think about this idea in general. If someone would be willing to help in this endeavor (e.g. creating dh_reuse, writing a DEP), let me know.

Regards,
Stephan

[1] https://reuse.software/spec/
[2] https://spdx.dev/licenses/
[3] https://api.reuse.software/projects
[4] https://tracker.debian.org/pkg/reuse
[5] https://tracker.debian.org/pkg/spdx-licenses
[6] https://wiki.debian.org/SPDX
[7] https://wiki.debian.org/Proposals/CopyrightFormat#Differences_between_DEP5_and_SPDX

Example for SPDX bill of materials:

"""
SPDXVersion: SPDX-2.1
DataLicense: CC0-1.0
SPDXID: SPDXRef-DOCUMENT
DocumentName: u2
DocumentNamespace: http://spdx.org/spdxdocs/spdx-v2.1-0ed6ddb2-edbd-4664-8b7e-029432c8e421
Creator: Person: Anonymous ()
Creator: Organization: Anonymous ()
Creator: Tool: reuse-0.14.0
Created: 2022-01-26T10:42:59Z
CreatorComment: <text>This document was created automatically using available reuse information consistent with REUSE.</text>
Relationship: SPDXRef-DOCUMENT describes SPDXRef-3c8056cd1f4f60322830f1e79d55ea13

FileName: ./update_copyright_years.py
SPDXID: SPDXRef-3c8056cd1f4f60322830f1e79d55ea13
FileChecksum: SHA1: 65fc75079eb9d85953b39c6fb832e86c7b7e113a
LicenseConcluded: NOASSERTION
LicenseInfoInFile: MIT
FileCopyrightText: <text>SPDX-FileCopyrightText: 2022 Stephan Lachnit <stephanlach...@debian.org></text>
"""
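
For the converter route, a rough sketch of reading the per-file entries of a tag/value document like the one above and printing DEP-5-style stanzas could look as follows. This is an assumption-laden illustration, not the converter proposed in [6]: `parse_files` and `to_dep5` are hypothetical names, only the FileName, LicenseInfoInFile and FileCopyrightText tags are read, several licenses on one file are joined with "and" (which may not be what upstream means), and SPDX identifiers are copied through unchanged although DEP-5 short names differ in places.

```python
def parse_files(spdx_text):
    """Collect one dict per FileName entry of a tag/value SPDX document."""
    files = []
    current = None
    lines = iter(spdx_text.splitlines())
    for line in lines:
        tag, sep, value = line.partition(":")
        if not sep:
            continue
        tag, value = tag.strip(), value.strip()
        # <text> values may span several lines; read up to the closing tag.
        if value.startswith("<text>"):
            while "</text>" not in value:
                value += "\n" + next(lines, "</text>")
            value = value[len("<text>"):value.index("</text>")]
        if tag == "FileName":
            current = {"name": value, "licenses": [], "copyright": []}
            files.append(current)
        elif current is not None and tag == "LicenseInfoInFile":
            current["licenses"].append(value)
        elif current is not None and tag == "FileCopyrightText":
            prefix = "SPDX-FileCopyrightText:"
            for entry in value.splitlines():
                entry = entry.strip()
                if entry.startswith(prefix):
                    entry = entry[len(prefix):].strip()
                if entry:
                    current["copyright"].append(entry)
    return files


def to_dep5(files):
    """Render the parsed entries as DEP-5-style Files stanzas."""
    stanzas = []
    for entry in files:
        name = entry["name"]
        path = name[2:] if name.startswith("./") else name
        stanzas.append(
            "Files: {}\nCopyright: {}\nLicense: {}\n".format(
                path,
                # Continuation lines are indented to line up under the value.
                "\n           ".join(entry["copyright"]),
                # Identifiers are not mapped to DEP-5 short names here.
                " and ".join(entry["licenses"]),
            )
        )
    return "\n".join(stanzas)
```

Calling `to_dep5(parse_files(text))` on the example document would give a single stanza for update_copyright_years.py with the license MIT. A real tool would additionally have to merge files with identical copyright and license into one stanza, emit the header paragraph and pull the license texts from the LICENSES folder.
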