David Malcolm <dmalc...@redhat.com> writes: > This patch adds a new dg-lint subdirectory below contrib, containing > a "dg-lint" script for detecting common mistakes made in our DejaGnu > tests. > > Specifically, DejaGnu's dg.exp's dg-get-options has a regexp for > detecting dg- directives > https://git.savannah.gnu.org/gitweb/?p=dejagnu.git;a=blob;f=lib/dg.exp > here's the current: > > set tmp [grep $prog "{\[ \t\]\+dg-\[-a-z\]\+\[ \t\]\+.*\[ \t\]\+}" line] > > which if I'm reading it right requires a "{", then one or more tab/space > chars, then a "dg-" directive name, then one of more tab/space > characters, then anything (for arguments to the directive), then one of > more tab/space character, then a "}". > > There are numerous places in our testsuite which look like attempts to > use a directive, but which don't match this regexp. > > The script warns about such places, along with a list of misspelled > directives (currently just "dg_options" for "dg-options"), and a warning > if a dg-do appears after a dg-require-* (as per > https://gcc.gnu.org/onlinedocs/gccint/Directives.html > "This directive must appear after any dg-do directive in the test > and before any dg-additional-sources directive." for > dg-require-effective-target. > > dg-lint uses libgdiagnostics to report its results; the patch adds a > new libgdiagnostics.py script below contrib/dg-lint. This uses Python's > ctypes module to expose libgdianostics.so to Python via FFI. Hence > the warnings have colorization, quote the pertinent parts of the tested > file, can have fix-it hints, etc. Here's the output from the tests, run > from the top-level directory: > > $ LD_LIBRARY_PATH=../build/gcc/ ./contrib/dg-lint/dg-lint > contrib/dg-lint/test-*.c > contrib/dg-lint/test-1.c:6:6: warning: misspelled directive: 'dg_final'; did > you mean 'dg-final'? > 6 | /* { dg_final { scan_assembler_times "vmsumudm" 2 } } */ > | ^~~~~~~~ > | dg-final > contrib/dg-lint/test-1.c:15:4: warning: directive 'dg-output-file' appears > not to match dg.exp's regexp > 15 | dg-output-file "m4.out" > | ^~~~~~~~~~~~~~ > contrib/dg-lint/test-1.c:18:4: warning: directive 'dg-output-file' appears > not to match dg.exp's regexp > 18 | dg-output-file "m4.out" } > | ^~~~~~~~~~~~~~ > contrib/dg-lint/test-1.c:21:6: warning: directive 'dg-output-file' appears > not to match dg.exp's regexp > 21 | { dg-output-file "m4.out" > | ^~~~~~~~~~~~~~ > contrib/dg-lint/test-1.c:24:5: warning: directive 'dg-output-file' appears > not to match dg.exp's regexp > 24 | {dg-output-file "m4.out"} > | ^~~~~~~~~~~~~~ > contrib/dg-lint/test-1.c:27:6: warning: directive 'dg-output-file' appears > not to match dg.exp's regexp > 27 | { dg-output-file, "m4.out" } > | ^~~~~~~~~~~~~~ > contrib/dg-lint/test-2.c:4:6: warning: 'dg-do' after > 'dg-require-effective-target' > 4 | /* { dg-do compile } */ > | ^~~~~ > contrib/dg-lint/test-2.c:3:6: note: 'dg-require-effective-target' was here > 3 | /* { dg-require-effective-target c++11 } */ > | ^~~~~~~~~~~~~~~~~~~~~~~~~~~ > > I don't yet have a way to verify these tests (clearly we can't use > DejaGnu for this). > > These Python bindings could be used by other projects, but so far I only > implemented what I needed for dg-lint. > > Running the test on the GCC source tree finds dozens of issues, which > followup patches address. > > Tested with Python 3.8 > > contrib/ChangeLog: > PR testsuite/116163 > * dg-lint/dg-lint: New file. > * dg-lint/libgdiagnostics.py: New file. > * dg-lint/test-1.c: New file. > * dg-lint/test-2.c: New file. > --- > contrib/dg-lint/dg-lint | 210 ++++++++++++++++++++++++ > contrib/dg-lint/libgdiagnostics.py | 248 +++++++++++++++++++++++++++++ > contrib/dg-lint/test-1.c | 28 ++++ > contrib/dg-lint/test-2.c | 8 + > 4 files changed, 494 insertions(+) > create mode 100755 contrib/dg-lint/dg-lint > create mode 100644 contrib/dg-lint/libgdiagnostics.py > create mode 100644 contrib/dg-lint/test-1.c > create mode 100644 contrib/dg-lint/test-2.c > > diff --git a/contrib/dg-lint/dg-lint b/contrib/dg-lint/dg-lint > new file mode 100755 > index 000000000000..20157c304137 > --- /dev/null > +++ b/contrib/dg-lint/dg-lint > @@ -0,0 +1,210 @@ > +#!/usr/bin/env python3 > + > +# Script to detect common mistakes in DejaGnu tests. > + > +# Contributed by David Malcolm <dmalc...@redhat.com> > +# > +# Copyright (C) 2024-2025 Free Software Foundation, Inc. > +# > +# This file is part of GCC. > +# > +# GCC is free software; you can redistribute it and/or modify > +# it under the terms of the GNU General Public License as published by > +# the Free Software Foundation; either version 3, or (at your option) > +# any later version. > +# > +# GCC is distributed in the hope that it will be useful, > +# but WITHOUT ANY WARRANTY; without even the implied warranty of > +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the > +# GNU General Public License for more details. > +# > +# You should have received a copy of the GNU General Public License > +# along with GCC; see the file COPYING. If not, write to > +# the Free Software Foundation, 51 Franklin Street, Fifth Floor, > +# Boston, MA 02110-1301, USA. > + > +import argparse > +import os > +import pathlib > +import re > +import sys > + > +import libgdiagnostics > + > +COMMON_MISSPELLINGS = {('dg_final', 'dg-final')} > +
I don't know if want to consider using difflib here for finding typos (get_close_matches?) and we generate a list of known directives? > +class Directive: > + @staticmethod > + def from_match(path, line_num, line, m): > + return Directive(path, line_num, > + m.group(1), m.span(1), > + m.group(2), m.span(2)) > + > + def __init__(self, path, line_num, > + name, name_span, > + args_str, args_str_span): > + self.path = path > + self.line_num = line_num > + self.name = name > + self.name_span = name_span > + self.args_str = args_str > + self.args_str_span = args_str_span > + > +class FileState: > + def __init__(self): > + # Map from directive name to list of directives > + self.directives = {} > + > + def add_directive(self, d): > + if d.name not in self.directives: > + self.directives[d.name] = [] > + self.directives[d.name].append(d) > + > +class Context: > + def __init__(self): > + self.mgr = libgdiagnostics.Manager() > + self.mgr.set_tool_name('dg-lint') > + self.mgr.add_text_sink() > + > + def check_file(self, f_in): > + file_state = FileState() > + try: > + for line_idx, line in enumerate(f_in): > + self.check_line(file_state, f_in.name, line_idx + 1, line) > + except UnicodeDecodeError: > + return # FIXME > + > + def check_line(self, file_state, path, line_num, line): > + if 0: > + phys_loc = self.get_file_and_line(path, line_num) > + self.report_warning(phys_loc, "line = '%s'", line.rstrip()) > + > + # Look for directives > + # Compare with dg.exp's dg-get-options > + m = re.search(r"{[ \t]+(dg-[-a-z]+)[ \t]+(.*)[ \t]+}", line) > + if m: > + d = Directive.from_match(path, line_num, line, m) > + self.on_directive(file_state, d) > + else: > + # Look for things that look like attempts to use directives, > + # but didn't match the regexp > + m = re.search(r"(dg-[-a-z]+)", line) Hoist the regex up to global scope and use re.compile? (I'm not sure how much it really matters, but it means we have all the regex in a nice place which I tend to prefer). Feel free to disregard. > + if m: > + phys_loc = self.get_file_line_and_range(path, > + line_num, > + m.start() + 1, > + m.end()) > + self.make_warning() \ > + .at(phys_loc) \ > + .emit('directive %qs appears not to match dg.exp\'s > regexp', > + m.group(1)) > + > + # Complain about common misspellings > + for misspelling, correction in COMMON_MISSPELLINGS: As mentioned earlier, really feels like it could be a good use of difflib and we have a known list instead. This means we hopefully cover cases like https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116163#c10 for free too. > + for m in re.finditer(misspelling, line): > + phys_loc = self.get_file_line_and_range(path, > + line_num, > + m.start() + 1, > + m.end()) > + self.make_warning() \ > + .at(phys_loc) \ > + .with_fix_it_replace (phys_loc, correction) \ > + .emit("misspelled directive: %qs;" > + " did you mean %qs?", > + misspelling, correction) > + > + def on_directive(self, file_state, d): > + if 0: > + phys_loc = self.get_file_and_line(d.path, d.line_num) > + self.make_note() \ > + .at(phys_loc) \ > + .emit('directive %qs with arguments %qs', > + d.name, d.args_str) > + > + # Apparently we can't have a dg-do after a dg-require; > + # see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116163#c5 > + if d.name == 'dg-do': There are other constraints like this, e.g. dg-options must always be before dg-add-options, or you clobber it. Does it make sense to handle this as a later check where we go over the full list of directives we saw (rather than as parsing) and then can diagnose bad ordering? (This would also solve the IMO perhaps too much nesting here too.) > + for existing_name in file_state.directives: > + if existing_name.startswith('dg-require'): > + with self.mgr.make_group() as g: > + phys_loc = self.get_file_line_and_range(d.path, > + d.line_num, > + > d.name_span[0] + 1, > + > d.name_span[1]) > + self.make_warning() \ > + .at(phys_loc) \ > + .emit("%qs after %qs", d.name, existing_name) > + other_d = file_state.directives[existing_name][0] > + phys_loc = self.get_file_line_and_range(other_d.path, > + > other_d.line_num, > + > other_d.name_span[0] + 1, > + > other_d.name_span[1]) > + self.make_note() \ > + .at(phys_loc) \ > + .emit("%qs was here", existing_name) > + > + file_state.add_directive(d) > + > [...] Thanks for working on this. sam