On Thu, Sep 26, 2013 at 09:58:04PM +0200, Erwin Waterlander wrote:
> I'm curious to know how man-db determines the encoding of the man
> page. I cannot find that information. Would you like to explain how
> man-db does the encoding detection?

Certainly. man-db contains a table of the typical legacy encodings for
each of a number of known languages (I'm happy to add to those, but
since new translation efforts tend to start with UTF-8 these days, it's
a closed set and I haven't had to extend it since 2008 when I synced up
with Fedora). There is generally only one of these per language.

UTF-8 is a strict enough encoding that for reasonable volumes of text it
is usually possible to distinguish automatically between it and a legacy
encoding, simply by trying to decode as UTF-8 and falling back to the
legacy encoding if that fails. manconv does this job; it is more or less
like iconv except that it can take a priority order of possible input
encodings.

There are cases where this system fails, and in such cases you can store
manual pages in directories with an explicit encoding tag attached (e.g.
"/usr/share/man/man1/<ll>_<CC>.<encoding>"), or put an explicit
Emacs-style coding tag at the top of the file. In practice this is
rarely necessary.

> The reason I work with Federico's man is that I often work on Cygwin
> when I don't have Linux at hand. Cygwin does not have man-db
> available. Soon I get a Russian translation of my program
> (dos2unix), that made this problem actual again for me. Three years
> ago I saw this problem coming. At that time I tested also on Fedora
> 12, which was still using Federico's man. I didn't notice that
> Fedora changed to man-db in the meantime.

Ah, yes. I corresponded at one point with somebody who might be
interested in porting man-db to Cygwin, but it never came to anything.
I would be ecstatic if somebody could help with such a port, as I don't
use Windows myself. I use Gnulib extensively, which deals with a lot of
portability problems, but not everything. The main effort will be in
porting libpipeline to deal with Windows-style process creation and
supervision; after that I expect that it will just be a matter of
various minor fixes for Unix-specific assumptions I've made.

You don't have to come with a complete patch; I'd be willing to accept
incremental changes that make the job easier for the next person, or
even "this general pattern of things you're doing is Unix-specific; you
need to use this pattern instead to be portable to Cygwin".

Cheers,

-- 
Colin Watson                                       [cjwat...@debian.org]
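
As a footnote to the encoding detection described above, the "try UTF-8
first, fall back to the legacy encoding" idea can be sketched in a few
lines of Python. This is only an illustration of the general technique,
not man-db's or manconv's actual code: the function name
decode_with_fallback is hypothetical, and KOI8-R is used purely as an
example of a legacy encoding for Russian text.

```python
def decode_with_fallback(data, encodings=("utf-8", "koi8-r")):
    """Decode bytes by trying each candidate encoding in priority order.

    Hypothetical helper for illustration; returns (text, encoding_used).
    """
    for encoding in encodings:
        try:
            # Strict decoding: UTF-8 rejects most legacy-encoded text
            # because stray high bytes are not valid UTF-8 sequences.
            return data.decode(encoding), encoding
        except UnicodeDecodeError:
            continue  # try the next candidate in the priority list
    raise ValueError("input is not valid in any candidate encoding")


# The same Russian word stored in a legacy encoding and in UTF-8.
legacy_bytes = "Привет".encode("koi8-r")
utf8_bytes = "Привет".encode("utf-8")

legacy_text, legacy_enc = decode_with_fallback(legacy_bytes)
utf8_text, utf8_enc = decode_with_fallback(utf8_bytes)
```

The order of the candidates matters: single-byte legacy encodings such
as KOI8-R accept almost any byte sequence, so they have to come after
UTF-8 in the list or UTF-8 input would be silently misread.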