Hi Werner,

Werner LEMBERG wrote on Tue, Dec 31, 2019 at 03:34:17PM +0100:
> I think the proper way for testing groff would be to make it run
> with a fuzzer,

Yes, that would no doubt be useful.  We did that for mandoc.  It found
about 50 bugs in the first year and a few dozen more later on.  I
presented the results of more or less the first year at BSDCan 2015:
https://www.openbsd.org/papers/bsdcan15-mandoc.pdf (pages 22-26).

Fuzzing is kind of orthogonal to developing a test suite, though.
Fuzzing tends to find assertion failures, NULL pointer dereferences,
segfaults, and other hangs and crashes.  Those are rarely relevant to
the functionality users need; it's more about hardening the program
against attack.  A test suite, on the other hand, is most useful for
making sure no regressions creep into the functionality: not only
making sure that the program runs to the end, but also that it
produces the right output.  Of course you can also test that it
doesn't crash, but apart from occasional exceptions, that's usually
just a side effect of testing that the output is correct.

On top of that, i would say that fuzzing is somewhat wasted unless
careful manual code review and code cleanup are done first.

> using some very simple and small input files.  If a bug gets
> found, we have a new testcase

Err, no.  Fuzzer-generated testcases are almost always completely
unreadable, and you absolutely do not want a unit test to be
unreadable (see my previous posting).  So once the fuzzer reports a
problem and manual review confirms it is an actual problem, you need
to manually write a carefully constructed, minimal test case making
it easy to understand what the problem is (a rough sketch of what i
mean is in the postscript below), then write and commit a patch
fixing the root cause of the bug.

> (which is automatically stored and used
> by the fuzzer for more tests).  Additionally, such a fuzzer framework
> also shows the covered code, and by injecting specially crafted test
> examples more unused code paths can be activated (and automatically
> tested).
>
> Later on, if there is a good code coverage, the available test samples
> might be analyzed to check whether they are producing correct output.

That seems really useless.  All fuzzer input files i have seen made no
sense whatsoever, and it doesn't matter at all which output is
produced from them.

> Does the Debian project have a fuzzer framework to which groff could
> be added?  Or maybe someone could try whether Google is going to
> accept groff in the 'OSS-Fuzz' project...

We absolutely don't need a framework.  While running the fuzzer is
not completely trivial (Jonathan Gray, who did most of that part of
the work, reported that the number of features in the roff language
and its macro sets is so large that a corpus exercising most of these
features tends to be too large to be fed into the fuzzer for
seeding), running the fuzzer is the smaller part of the task.
Triaging the output is more work than running the fuzzer.
Understanding what in the autogenerated test files causes the crashes
is much more work than the triage.  Fixing the bugs is often more
work than merely understanding why the program crashes.  So fuzzing
is no use unless some people actually spend substantial time fixing
the bugs.

I would expect fuzzing to find hundreds of bugs in groff, and i
expect most bugs to require a few hours of work on average.  So that
is at least a month of full-time work even if it finds surprisingly
few bugs, more likely several months.  And fixing bugs in a codebase
that wasn't hand-audited and cleaned up beforehand is extremely
painful and time-consuming.
If someone has the time, i'm almost certain a month of manual code
review would have *much* bigger benefit and be *much* less painful.

Yours,
  Ingo
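
P.S.
Just to make two of the points above concrete, and purely as
illustration.  Running a fuzzer is, mechanically, not much more than
the following sketch, assuming a troff binary built with AFL++
instrumentation (the paths and the seed content are made up, and
troff rather than the groff driver is probably the better target so
the fuzzer sees the crashes directly):

    mkdir seeds findings
    printf '.ft B\nhello\n.br\nworld\n' > seeds/tiny.roff
    afl-fuzz -i seeds -o findings -- ./troff -Tascii @@

The hard part starts after that, as explained above.  By contrast,
the kind of hand-written, readable regression test i am arguing for
is roughly this, similar in spirit to how the mandoc regression suite
works (file names are placeholders, and the expected output file is
generated once from a known-good version and reviewed by hand):

    # break.roff contains a tiny input exercising exactly one feature.
    groff -Tascii break.roff > break.out 2> break.err
    diff -u break.out.expected break.out   # output must match exactly
    diff -u /dev/null break.err            # and no diagnostics at all

Anyone can read such a test and see immediately which feature broke.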