On 2014-08-10 10:05, Johannes Schauer wrote: > Quoting Johannes Schauer (2014-08-10 10:02:31) >> > that's very useful and I added this as a test case. The updated patch is >> > attached. > and it would help to attach the right file... > >
Hi, Thanks for working on this and providing a patch. :) I got a few questions to it though based on a very quick review. My comments are interleaved in the patch. > 0001-check-whether-the-dep-5-debian-copyright-wildcards-m.patch > > > From 040fdbf1b96d1b6dcfa55c67c52e72b3a58ab8ce Mon Sep 17 00:00:00 2001 > From: Johannes Schauer <j.scha...@email.de> > Date: Fri, 8 Aug 2014 12:00:39 +0200 > Subject: [PATCH] check whether the dep-5 debian/copyright wildcards match all > files > > [...] > > diff --git a/checks/source-copyright.desc b/checks/source-copyright.desc > index 921e7b9..b912b2a 100644 > --- a/checks/source-copyright.desc > +++ b/checks/source-copyright.desc > [...] > index 6b9a0c5..cfdaef7 100644 > --- a/checks/source-copyright.pm > +++ b/checks/source-copyright.pm > @@ -24,6 +24,9 @@ use strict; > use warnings; > use autodie; > > +use Cwd qw(getcwd); This does not seem to be used anywhere in the diff? > +use File::Find qw(); > + > use List::MoreUtils qw(any); > use Text::Levenshtein qw(distance); > > @@ -252,6 +255,8 @@ sub _parse_dep5 { > } > > my @commas_in_files; > + my %file_coverage = map { $_ => 0 } get_all_files($info); > + my %file_para_coverage = (); Minor nit pick: Consider merging this with the @commas_in_files declaration, a la: my (@commas_in_files, %file_para_coverage); Alternatively, move it up under @commas_in_files and remove the "= ();". Then 5.20 should be able to this optimisation for us. > my $i = 0; > my $current_line = 0; > for my $para (@dep5) { > @@ -297,6 +302,31 @@ sub _parse_dep5 { > @commas_in_files = ($i, $files_fname); > } > > + # only attempt to evaluate globbing if commas could be legal > + if (not @commas_in_files or any { m/,/xsm } $info->sorted_index) > { The "any { m/,/xsm } $info->sorted_index" part is an invariant in the "for my $para (@dep5)" loop. I think we should move it out of the loop, so we don't loop over all files in the index repeatedly just to conclude that none of them have any commas. > + my @wildcards = split /[\n\t ]+/, $files; > + for my $wildcard (@wildcards) { > + my ($regex, $wildcard_error) = > wildcard_to_regex($wildcard); > + if (defined $wildcard_error) { > + tag 'invalid-escape-sequence-in-dep5-copyright', > substr($wildcard_error, 0, 2) . " (paragraph at line $current_line)"; > + next; > + } > + > + my $used = 0; > + $file_para_coverage{$current_line} = 0; > + for my $srcfile (keys %file_coverage) { > + if ($srcfile =~ $regex) { > + $used = 1; > + $file_coverage{$srcfile} = $current_line; > + $file_para_coverage{$current_line} = 1; > + } > + } > + if (not $used) { > + tag 'wildcard-matches-nothing-in-dep5-copyright', > "$wildcard (paragraph at line $current_line)"; > + } > + } > + } > + > my ($found_license, $full_license, $short_license, > @short_licenses) > = parse_license($license,$current_line); > if ($found_license) { > @@ -325,12 +355,21 @@ sub _parse_dep5 { > $current_line; > } > } > - if (@commas_in_files) { > + if (@commas_in_files and not any { m/,/xsm } $info->sorted_index) { The "any {...} $info->sorted_index" here could be optimised away as well if the change above is implemented. > my ($paragraph_no, $field_name) = @commas_in_files; > [...] > } > while ((my $license, $i) = each %required_standalone_licenses) { > @@ -400,6 +439,55 @@ sub get_field { > return; > } > > +sub wildcard_to_regex { > + my ($regex) = @_; > + $regex =~ s,^\./+,,; > + $regex =~ s,//+,/,g; > + my $error; > + eval { > + $regex =~ s{ > + (\*) | > + (\?) | > + ([^*?\\]+) | > + (\\[\\*?]) | > + (.+) > + }{ > + if (defined $1) { > + '.*'; > + } elsif (defined $2) { > + '.' > + } elsif (defined $3) { > + quotemeta($3); > + } elsif (defined $4) { > + $4; Shouldn't this be quotemeta'ed as well? Otherwise "file.png" would become eqv. to "file?png". > + } else { > + $error = $5; > + die; > + } > + }egx; > + }; > + if ($@) { > + return (undef, $error); > + } else { > + return (qr/^(?:$regex)$/, undef); > + } > +} > + > [...] -- To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org