Hi Werner,

Werner LEMBERG wrote on Tue, Dec 31, 2019 at 03:34:17PM +0100:
> I think the proper way for testing groff would be to make it run
> with a fuzzer,

Yes, that would no doubt be useful.  We did that for mandoc.  It found
about 50 bugs in the first year and a few dozen more later on.  I
presented the results of more or less the first year at BSDCan 2015:
https://www.openbsd.org/papers/bsdcan15-mandoc.pdf (pages 22-26).

Fuzzing is kind of orthogonal to developing a test suite, though.
Fuzzing tends to find assertion failures, NULL pointer dereferences,
segfaults, and other hangs and crashes.  Those are rarely relevant to
the functionality users need; it's more about hardening the program
against attack.  A test suite, on the other hand, is most useful for
making sure no regressions creep into the functionality: not only
making sure that the program runs to the end, but also that it
produces the right output.  Of course you can also test that it
doesn't crash, but apart from occasional exceptions, that's usually
just a side effect of testing that the output is correct.

On top of that, i would say that fuzzing is somewhat wasted unless
careful manual code review and code cleanup are done first.

> using some very simple and small input files.  If a bug gets
> found, we have a new testcase

Err, no.  Fuzzer-generated testcases are almost always completely
unreadable, and you absolutely do not want a unit test to be
unreadable (see my previous posting).  So once the fuzzer reports a
problem and manual review confirms it is an actual problem, you need
to manually write a carefully constructed, minimal test case making
it easy to understand what the problem is (a rough sketch of what i
mean is in the postscript below), then write and commit a patch
fixing the root cause of the bug.

> (which is automatically stored and used
> by the fuzzer for more tests).  Additionally, such a fuzzer framework
> also shows the covered code, and by injecting specially crafted test
> examples more unused code paths can be activated (and automatically
> tested).
>
> Later on, if there is a good code coverage, the available test samples
> might be analyzed to check whether they are producing correct output.

That seems really useless.  All fuzzer input files i have seen made no
sense whatsoever, and it doesn't matter at all which output is
produced from them.

> Does the Debian project have a fuzzer framework to which groff could
> be added?  Or maybe someone could try whether Google is going to
> accept groff in the 'OSS-Fuzz' project...

We absolutely don't need a framework.  While running the fuzzer is
not completely trivial (Jonathan Gray, who did most of that part of
the work, reported that the number of features in the roff language
and its macro sets is so large that a corpus exercising most of these
features tends to be too large to be fed into the fuzzer for
seeding), running the fuzzer is the smaller part of the task.
Triaging the output is more work than running the fuzzer.
Understanding what in the autogenerated test files causes the crashes
is much more work than the triage.  Fixing the bugs is often more
work than merely understanding why the program crashes.  So fuzzing
is no use unless some people actually spend substantial time fixing
the bugs.

I would expect fuzzing to find hundreds of bugs in groff, and i
expect most bugs to require a few hours of work on average.  So that
is at least a month of full-time work even if it finds surprisingly
few bugs, more likely several months.  And fixing bugs in a codebase
that wasn't hand-audited and cleaned up beforehand is extremely
painful and time-consuming.
If someone has the time, i'm almost certain a month of manual code
review would have *much* bigger benefit and be *much* less painful.

Yours,
  Ingo
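
P.S.
Just to make two of the points above concrete, and purely as
illustration.  Running a fuzzer is, mechanically, not much more than
the following sketch, assuming a troff binary built with AFL++
instrumentation (the paths and the seed content are made up, and
troff rather than the groff driver is probably the better target so
the fuzzer sees the crashes directly):

    mkdir seeds findings
    printf '.ft B\nhello\n.br\nworld\n' > seeds/tiny.roff
    afl-fuzz -i seeds -o findings -- ./troff -Tascii @@

The hard part starts after that, as explained above.  By contrast,
the kind of hand-written, readable regression test i am arguing for
is roughly this, similar in spirit to how the mandoc regression suite
works (file names are placeholders, and the expected output file is
generated once from a known-good version and reviewed by hand):

    # break.roff contains a tiny input exercising exactly one feature.
    groff -Tascii break.roff > break.out 2> break.err
    diff -u break.out.expected break.out   # output must match exactly
    diff -u /dev/null break.err            # and no diagnostics at all

Anyone can read such a test and see immediately which feature broke.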