Hello, Thanks for taking the time to do this :) and sorry that it took me time to respond. As the user of Copyright module in debsources here are few comments
On 02/18/2018 05:59 AM, Stuart Prescott wrote: > > In particular: > > * It introduces a `MachineReadableFormatError` which is used for format > errors; I think it's worth distinguishing between an error in the format and > the copyright file not being in the format at all > (`NotMachineReadableError`). > `MachineReadableFormatError` is derived from `ValueError` which I think makes > sense. > I think the module should raise errors inherited by Error https://salsa.debian.org/python-debian-team/python-debian/blob/8ed0ff84e7f41d0ba08ac11dcc0d7c66597f37b1/lib/debian/copyright.py#L46 otherwise there is not much use to it. Hence i suggest that MachineReadableFormatError inherits from that as well. > > By throwing an exception as soon as an error is found, this becomes a bit of > an all-or-nothing approach. Would a better approach be an incremental > validation where as much as is possible is read with an `valid` attribute per > stanza that propagates to the whole `Copyright` instance? Users would then > check that the file is valid after read in rather than using exception > handling. > I prefer the all-or-nothing approach for the strict mode at least. While the incremental could be interesting for some use cases i feel that for the copyright module in debsources it would create an unnecessary overhead and it would be harder to qualify d/copyright files as machine readable or not. When the copyright hook (parses d/copyright files of packages and then adds in the db the license of each file) is active on debsources (the case these days) we parse the whole file and treat the files mentioned in it, so it will be a bit confusing to have files within a package without their license and other files of the same package with a license. License rendering such as https://sources.debian.org/copyright/license/python-pip/9.0.1-2/ would be more confusing. Or when searching a license of file using its sha256 checksum we would need to maybe handle differently files that their packages have a "partial" machine readable copyright format. (https://sources.debian.org/copyright/sha256/?checksum=d77d235e41d54594865151f4751e835c5a82322b0e87ace266567c3391a4b912) I hope what i am describing is clear here :D Cheers, Orestis