On Sun, Jun 25, 2006 at 11:08:34AM +0100, Colin Watson wrote:
> On Wed, Jun 21, 2006 at 04:54:01PM -0500, Manoj Srivastava wrote:
> > On 20 Jun 2006, Colin Watson verbalised:
> > > It's dual-licensed upstream; I contacted upstream years ago about
> > > this issue (before it became particularly public that Debian had a
> > > problem with the licence) and arranged for the following statement
> > > to be added to the top-level LICENSES file:
> > >
> > >   All files part of groff are licensed under this version of the GPL
> > >   (or licenses which are compatible with the GPL). You are free to
> > >   choose version 2 or any subsequent version of the GPL.
> > >
> > > Unfortunately, for technical reasons (see bug #196762), it is
> > > extremely difficult to upgrade to the new upstream release. If you
> > > like, I can simply include a note in the copyright file with similar
> > > contents to this e-mail, although I don't know if that's good form.
> >
> > Unfortunately, I think that means we have to take the stance
> > that the old version is non-free, but the future version is freed;
> > unless we can get upstream to release the version in Debian with the
> > new license.
>
> I guess in that case I will have to resume efforts to get 1.19 sorted
> out more urgently. I don't want to embark upon the busy-work of
> splitting documentation out into a separate package only to put it back
> in again later ...

So, I've done some more investigation, and I'd like advice from
debian-release.

Background:

For those not familiar with groff, it has historically accepted only
ISO-8859-1 input; internally, it has always been very much hardwired for
single-byte input. The Debian groff package has for a long time
contained a highly complex "multibyte" patch to support EUC-JP (and
later other CJK) input; it works, but is not terribly clean, and for one
thing causes groff's behaviour to depend on the locale rather than
solely the file it's processing and its command-line arguments, which is
wrong for a text processor like groff. It also contains support for an
"ascii8" device which essentially just passes through the encoding of
the source text; this is typographically unsound because, for instance,
you can't do decent hyphenation that way, but we're relying on this for
Czech, Croatian, Hungarian, Polish, Russian, Slovak, and Turkish man
pages at the moment.

Upstream has long stated an intent never to accept this patch, and
instead wants to work on UTF-8 support, with a preprocessor to convert
from other encodings as necessary. This has been the state of play for
several years now.

I've tried to port the Debian multibyte patch forward to groff 1.19 and
later releases on more than one occasion, but it's a very complex and
intrusive patch and I've hit roadblocks that are extremely hard to
surmount. groff 1.19 made a number of internal improvements for the
better (notably Unicode composite glyphs), but the changes conflict in a
big way with the multibyte patch. The authors of that patch haven't
seemed able to help, and upstream is entirely uninterested. I've pretty
much been stuck maintaining a package based on 1.18.1.1, with no way to
jump forward without breaking the now significant number of users
relying on CJK support.
Some people have suggested reviving the old jgroff package for this and
making man use it where appropriate; I'm very much loath to do this,
because it's a non-trivial amount of unrewarding packaging work, and it
results in either bloating the base system with two versions of groff or
requiring all CJK users to know or be told to install jgroff.

On another note, while the GFDL discussion was still bubbling on
debian-private and before it came up publicly as an issue, I noted that
most of groff's documentation was under the GFDL, and was very concerned
about the usability of groff in the event that its documentation had to
be removed; I'd have serious trouble writing any non-trivial groff
documents without the groff documentation! I contacted groff upstream to
ask whether its documentation could be dual-licensed under the GPL.
After some discussion, they agreed, resulting in a note in groff's
LICENSES file that "All files part of groff are licensed under this
version of the GPL". Unfortunately, this note was added after groff
1.18.1.1 was released, and Manoj points out that it's not entirely
obvious that we can take advantage of it.

This prompted me to have another look at the current state of groff
upstream with respect to Unicode support.

Current situation:

Bruno Haible has been working on Unicode support in groff, and CVS groff
is now very close to being able to render CJK text on a par with what
the Debian patch offers, by means of a preprocessor ("preconv") that
converts all non-ASCII text into groff escapes according to an encoding
specified on the command line. There are a number of other internal
improvements in Unicode support too, although input is still
fundamentally single-byte; however the escaping preprocessor makes this
less important than it used to be.

The major missing features in Japanese rendering are handling of
double-width characters and support for kinsoku shori (Japanese
line-breaking rules).
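
For anyone who hasn't looked at the escaping approach, here is a rough
sketch of the idea in Python. It is only an illustration, not preconv's
actual code: the function name to_groff_escapes is made up, and I'm
assuming the simplest possible mapping, where the input is decoded using
the encoding named by the caller and every non-ASCII character becomes a
groff \[uXXXX] Unicode escape.

```python
def to_groff_escapes(raw: bytes, encoding: str) -> str:
    # Hypothetical helper, not part of groff: decode the input using the
    # caller-supplied encoding (the sketch's stand-in for an encoding
    # given on the command line).
    text = raw.decode(encoding)
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            # Plain ASCII passes through untouched.
            out.append(ch)
        else:
            # Everything else becomes a Unicode escape with at least
            # four uppercase hex digits.
            out.append("\\[u%04X]" % cp)
    return "".join(out)


if __name__ == "__main__":
    sample = "日本語".encode("euc_jp")
    print(to_groff_escapes(sample, "euc_jp"))
    # prints \[u65E5]\[u672C]\[u8A9E]
```

The point is that the result is pure ASCII, so the single-byte core of
groff never has to see multibyte input, and the output no longer depends
on the locale.
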
Werner Lemberg, the upstream maintainer, is very clear that both
double-width handling and kinsoku shori should be implemented by means
of adding glyph class infrastructure to groff, so that properties of
ranges of glyphs can be set in groff's font files without using lots of
memory to say that each of several thousand glyphs is double-width. This
is a moderate chunk of work, but it's at least reasonably accessible
from where we are now.

The situation for non-ISO-8859-1 single-byte encodings is essentially
solved. The ascii8 device is superseded by preconv. Implementing
hyphenation for Russian wouldn't do any harm, but it's all a matter of
macro files from here on in.

Proposals:

I'm on holiday away from computers all next week, so I can't
realistically do anything before the base freeze. I therefore have two
proposals, either of which really ought to be signed off by the release
team.

One is to do nothing for now, and make an exception for the groff
licensing bug on the grounds that (a) groff is nearly unusable for
authoring without its documentation and (b) upstream considers the
current versions of those files to be GPL-licensed. I wouldn't make that
request based on (a) alone - we've had the "but it's too useful to be
non-free!" whine many times before, and I don't think it's valid - but
given (b) it seems to be worth considering. I don't think that splitting
off groff's documentation is a good idea, because aside from the small
man-page-formatting-only part of groff that's in groff-base, the rest of
the groff package is really too painful to use without its
documentation: much harder than e.g. make without make-doc. I'm willing
to go to almost any lengths to avoid that option.

The other option is to try to accelerate the implementation of glyph
classes, width handling, and kinsoku shori handling in groff as much as
possible, so that we can update to CVS groff (perhaps with some
additional patches) and not regress CJK support too badly. This would
also require changes in man-db.
Obviously, this would require getting testing from several CJK users to
confirm that the output is still reasonably readable, and it involves an
exception to the base freeze.

How does the release team feel about all of this? I'm sorry to have left
it so late.

Thanks,

-- 
Colin Watson [EMAIL PROTECTED]