On Mon, Dec 24, 2007 at 06:52:12PM +0100, David Paleino wrote:

> Anyway, I'm seeing that what I'm telling now has already been proposed for
> debian/copyright. The problem is still there though: the chance to see some
> information about the license of not installed packages not being
> connected to the Internet.

This is solvable by packaging a file with the data extracted from the
archive: the information will then end up on the CD. I do that for debtags.

> Well, most of Debian packages have simple licenses (see: GPL, BSD, MIT). And,
> again, the field would be totally optional.

In the other mail I sent to this thread I showed the steps that could be
followed to implement this with apt-xapian-index. The tricky parts are the
first and second steps:

 1. Define what kind of searches you want to allow people to do
 2. Define what kind of information you need to index for those searches

I mentioned that it might not be possible to tackle these steps in a useful
way. To understand why I say this, consider:

 - the variety of licenses we have in the archive
 - that different bits of a package can have different licenses
 - that the copyright file applies to the source package, but the search
   probably happens on binary packages.

I had a look at http://wiki.debian.org/Proposals/CopyrightFormat, and I
strongly endorse that proposal. The 'License:' field proposed there looks
like the best data source for this. However, if more than something like
20% of the packages in the archive end up having 'License: other', in my
experience that field risks becoming useless for searches.

Consider also this scenario: source package foo contains a debian/copyright
file that says "the library is LGPL, the executable tools are GPL, the
examples are WTFPL, the debian packaging is BSD-3"[1]. How should we handle
it? I can think of two cases:

 1. libfoo-dev only shows LGPL, libfoo-bin only shows GPL, libfoo-examples
    only shows WTFPL. In this case, how do you sort the various licenses
    into the binary packages? And also, where did BSD-3 go?
 2. All the binary packages list all the licenses. In this case, when you
    search for WTFPL (or BSD-3) you end up with libfoo-dev, libfoo-bin and
    loads of other false positives among the results.

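To make the two cases concrete, here is a minimal Python sketch of the foo
scenario. Everything in it is made up for illustration: the dictionaries, the
function names and the idea that someone has written down by hand which parts
each binary package ships. It is not how apt-xapian-index or any existing tool
works, and that hand-written mapping is exactly the piece that debian/copyright
does not give us today.

```python
# Hypothetical data restating the scenario: which license covers which part
# of source package "foo". These names are assumptions, not a real format.
COMPONENT_LICENSES = {
    "library": "LGPL",
    "tools": "GPL",
    "examples": "WTFPL",
    "packaging": "BSD-3",
}

# Hypothetical, hand-written mapping from binary package to the parts it
# ships. Note that "packaging" belongs to no binary package.
BINARY_CONTENTS = {
    "libfoo-dev": ["library"],
    "libfoo-bin": ["tools"],
    "libfoo-examples": ["examples"],
}


def per_binary_licenses(contents, component_licenses):
    """Case 1: each binary package lists only the licenses of what it ships."""
    return {
        pkg: {component_licenses[part] for part in parts}
        for pkg, parts in contents.items()
    }


def all_licenses_everywhere(contents, component_licenses):
    """Case 2: every binary package lists every license of the source package."""
    everything = set(component_licenses.values())
    return {pkg: set(everything) for pkg in contents}


def search(index, license_name):
    """Return the sorted names of the packages indexed under license_name."""
    return sorted(pkg for pkg, licenses in index.items() if license_name in licenses)


case1 = per_binary_licenses(BINARY_CONTENTS, COMPONENT_LICENSES)
case2 = all_licenses_everywhere(BINARY_CONTENTS, COMPONENT_LICENSES)

print(search(case1, "WTFPL"))  # only the examples package
print(search(case1, "BSD-3"))  # empty: the packaging license is lost
print(search(case2, "WTFPL"))  # every binary package: false positives
```

With case 1 a search for BSD-3 finds nothing, and with case 2 a search for
WTFPL returns all three packages even though only one of them ships WTFPL
code.
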
I know it's easy to think "'License: GPL' is all I need", and I also know
it's easy to think "it's too much of a mess, it can't be done". What is
hard to think is "let's see what really can be done".

To really attack this problem, we need some statistics about the actual
distribution of licenses across the archive, so we really know what we're
talking about. I suppose that starting to adopt
http://wiki.debian.org/Proposals/CopyrightFormat could be a good way to
make it possible to collect such statistics.

Another rather important thing that can be done at this stage is to provide
use cases for the data, check whether
http://wiki.debian.org/Proposals/CopyrightFormat provides enough
information to support those use cases, and, if something is missing, see
whether it can reasonably be added and how.

Ciao,

Enrico

[1] Once CC-BY-SA 3.0 is out, you can reasonably add "Documentation is
    CC-BY-SA-3", and a libfoo-doc package to the list.

-- 
GPG key: 1024D/797EBFAB 2000-12-05 Enrico Zini <[EMAIL PROTECTED]>