Package: po4a
Version: 0.73-2
Severity: minor
Tags: patch

   * What led up to the situation?

     Checking for defects with a new version

test-[g|n]roff -mandoc -t -K utf8 -rF0 -rHY=0 -rCHECKSTYLE=10 -ww -z < "man page"

  [Use "grep -e ' $' <file>" to find trailing spaces.]

  ["test-groff" is a script in the repository for "groff"; it is not shipped]
(local copy and "troff" slightly changed by me).

  [The fate of "test-nroff" was decided in groff bug #55941.]

   * What was the outcome of this action?

troff:<stdin>:178: warning: font name 'CW' is deprecated

   * What outcome did you expect instead?

     No output (no warnings).

-.-

  General remarks and further material, if a diff-file exists, are in the
attachments.


-- System Information:
Debian Release: trixie/sid
  APT prefers testing
  APT policy: (500, 'testing')
Architecture: amd64 (x86_64)

Kernel: Linux 6.12.6-amd64 (SMP w/2 CPU threads; PREEMPT)
Locale: LANG=is_IS.iso88591, LC_CTYPE=is_IS.iso88591 (charmap=ISO-8859-1), LANGUAGE not set
Shell: /bin/sh linked to /usr/bin/dash
Init: sysvinit (via /sbin/init)

Versions of packages po4a depends on:
ii  gettext                     0.22.5-3
ii  libpod-parser-perl          1.67-1
ii  libsgmls-perl               1.03ii-38
ii  libsyntax-keyword-try-perl  0.30-1+b1
ii  libyaml-tiny-perl           1.76-1
ii  opensp                      1.5.2-15.1
ii  perl                        5.40.0-8

Versions of packages po4a recommends:
ii  liblocale-gettext-perl     1.07-7+b1
ii  libterm-readkey-perl       2.38-2+b4
ii  libtext-wrapi18n-perl      0.06-10
ii  libunicode-linebreak-perl  0.0.20190101-1+b8

po4a suggests no packages.

-- no debconf information

Input file is po4a-gettextize.1p

  Any program (or person) that produces man pages should check the output
for defects by using (both groff and nroff)

[gn]roff -mandoc -t -ww -b -z -K utf8 <man page>

  The same goes for man pages that are used as an input.

  For a style guide use

  mandoc -T lint

-.-

  So any 'generator' should check its products with the above mentioned
'groff', 'mandoc', and additionally with 'nroff ...'.

  This is just a simple quality control measure.

  To get a better man page, the 'generator' may have to be corrected,
as may the source file and any additional file.

  Common defects:

  Input text line longer than 80 bytes.

  Not removing trailing spaces (in input and output).
  The reason for these trailing spaces should be found and eliminated.

  Not beginning each input sentence on a new line.
Lines should thus be shorter.

  See man-pages(7), item 'semantic newline'.

-.-

The difference between the formatted output of the original and patched file
can be seen with:

  nroff -mandoc <file1> > <out1>
  nroff -mandoc <file2> > <out2>
  diff -u <out1> <out2>

and for groff, using

"printf '%s\n%s\n' '.kern 0' '.ss 12 0' | groff -mandoc -Z - "

instead of 'nroff -mandoc'

  Add the option '-t', if the file contains a table.

  Read the output of 'diff -u' with 'less -R' or similar.

-.-.

  If 'man' (man-db) is used to check the manual for warnings,
the following must be set:

  The option "--warnings=w"

  The environment variable:

export MAN_KEEP_STDERR=yes (or any non-empty value)

  or

  (produce only warnings):

export MANROFFOPT="-ww -b -z"

export MAN_KEEP_STDERR=yes (or any non-empty value)

-.-.

Output from "mandoc -T lint po4a-gettextize.1p": (shortened list)

     13 input text line longer than 80 bytes

-.-.

Output from "test-groff -mandoc -t -ww -z po4a-gettextize.1p": (shortened list)

      1 font name 'CW' is deprecated

-.-.

Change '-' (\-) to '\(en' (en-dash) for a numeric range.
GNU gnulib has recently (2023-06-18) updated its
"build_aux/update-copyright" to recognize "\(en" in man pages.

po4a-gettextize.1p:337:Copyright 2002\-2023 by SPI, inc.

-.-.

Add a (no-break, "\ " or "\~") space between a number and a unit,
as these are not one entity.

200:despite the \fImany\fR synchronization issues. Given the amount of text (2MB of

-.-.

Wrong distance between sentences in the input file.

  Separate the sentences and subordinate clauses; each begins on a new
line.  See man-pages(7) ("Conventions for source file layout") and
"info groff" ("Input Conventions").

  The best procedure is to always start a new sentence on a new line,
at least, if you are typing on a computer.

Remember coding: Only one command ("sentence") on each (logical) line.

E-mail: Easier to quote exactly the relevant lines.

Generally: Easier to edit the sentence.

Patches: Less unaffected text.

Search for two adjacent words is easier, when they belong to the same line,
and the same phrase.

  The amount of space between sentences in the output can then be
controlled with the ".ss" request.

Mark a final abbreviation point as such by suffixing it with "\&".

N.B.

  The number of lines affected can be too large to be in a patch.

73:the classical gettext tools. The main feature of po4a is that it decouples the
78:translations into a po4a\-based workflow. This is only to be done once to salvage
80:the conversion of your project. This tedious process is explained in details in
84:existing translated file (e.g., a previous translation attempt without po4a). If
87:then use \fBmsgmerge\fR to merge all produced PO files. As you wish.
90:be in UTF\-8. If the master document is completely in ASCII, the generated
96:Format of the documentation you want to handle. Use the \fB\-\-help\-format\fR
100:File containing the master document to translate. You can use this option
107:File containing the localized (translated) document. If you provided
115:File where the message catalog should be written. If not given, the message
119:Extra option(s) to pass to the format plugin. See the documentation of each
120:plugin for more information about the valid options and their meanings. For
145:Set the report address for msgid bugs. By default, the created POT files
149:Set the copyright holder in the POT header. The default value is
153:Set the package name for the POT header. The default is "PACKAGE".
156:Set the package version for the POT header. The default is "VERSION".
160:content into a PO file. The content of the master file gives the \fBmsgid\fR while
161:the content of the localized file gives the \fBmsgstr\fR. This process is somewhat
166:original document that was used for translation. Even so, you may need to fiddle
171:strings. This is how desynchronization are detected during the gettextization.
174:(of type 'paragraph'). It is more likely that a new paragraph was added to the
189:\&\fBpo4a\-gettextize\fR will verbosely diagnose any structure desynchronization. When
192:match. Some tricks are given below to salvage the most of the existing
196:the box, building a correct PO file is a matter of seconds. Otherwise, you will
198:gettextization often remains faster than translating everything again. I
200:despite the \fImany\fR synchronization issues. Given the amount of text (2MB of
202:translations would have required several months of work. In addition, this grunt
203:work is the price to pay to get the comfort of po4a. Once converted, the
213:The gettextization stops as soon as a desynchronization is detected. When this
215:structures. \fBpo4a\-gettextize\fR is rather verbose when things go wrong. It
217:of each of them. Moreover, the PO file generated so far is dumped as
224:to the translators. They should be added separately to \fBpo4a\fR as addenda (see
228:if possible. Indeed, if the changes to the original are too intrusive, the old
230:gettextization (see below). Any unmatched translation will be dumped anyway.
233:paragraph of the translation is dumped. The important thing is to get a first PO
237:translated version. This content will be automatically reintroduced afterward,
241:translation that seems justified. Issues in the original document should
242:reported to the author. Fixing them in your translation only fixes them for a
243:part of the community. Plus, it is impossible to do so when using po4a ;) But
247:Sometimes, the paragraph content does match, but not their types. Fixing it is
248:rather format-dependent. In POD and man, it often comes from the fact that one
251:type. Just remove the space and you are fine. It may also be a typo in the tag
259:attached to the wrong original paragraph. It is the sign of an undetected issue
260:earlier in the process. Search for the actual desynchronization point by
265:translation. Duplicated strings are merged in PO files, with two references.
268:files. It is however believed that recent versions of po4a deal properly with
273:the script terminates successfully. You should skim over the PO file, ensuring
274:that the \fBmsgid\fR and \fBmsgstr\fR actually match. It is not necessary to ensure
276:fuzzy translations anyway. You only need to check for obvious matching issues
282:\&\fBmsgstr\fR. As a speaker of French, English, and some German myself, I can do
284:of these languages. I sometimes manage to detect matching issues in non-Latin
286:interrogation marks match?) and other clues, but I prefer when someone else can
290:\&\fBpo4a\-gettextize\fR reported an error, and try again. Once you have a decent PO
297:deprecated). Please check the "CONFIGURATION FILE" Section in \fBpo4a\fR\|(1)
302:that you salvaged through gettextization. This can take quite a long time,
304:the elements of the POT file built from the recent master files. This forces
311:After this first run, the PO files are ready to be reviewed by translators. All
313:their careful review before use. Translators should take each entry to verify
319:production. Some projects find it useful to rely on weblate to coordinate

-.-.

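A list like the one above (sentences ending in the middle of an input line) can be produced with a simple search.
What follows is only a rough sketch using grep.
The function name "find_midline_sentence_ends" is made up for this note and is not part of groff or po4a.
The pattern is a heuristic: it also matches an abbreviation that is followed by a capitalized word, so the hits need a manual look.

```shell
# Hypothetical helper (name invented for illustration).
# Lists, with line numbers, every line in which '.', '!' or '?' is
# followed by one or more spaces and an upper-case letter, that is,
# where a new sentence probably starts in the middle of the line.
find_midline_sentence_ends() {
    grep -n -E '[.!?] +[[:upper:]]' "$@"
}
```

Usage: "find_midline_sentence_ends po4a-gettextize.1p", or pipe the text into the function without arguments.

-.-.
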
Split lines longer than 80 characters into two or more lines.
Appropriate break points are the end of a sentence and a subordinate
clause; after punctuation marks.

N.B.

  The number of lines affected can be too large to be in a patch.

Line 58, length 88
.TH PO4A-GETTEXTIZE.1P 1 2024-08-06 "perl v5.38.2" "User Contributed Perl Documentation"

Line 67, length 116
\&\fBpo4a\-gettextize\fR \fB\-f\fR \fIfmt\fR \fB\-m\fR \fImaster.doc\fR \fB\-l\fR \fIXX.doc\fR \fB\-p\fR \fIXX.po\fR

Line 78, length 81
translations into a po4a\-based workflow. This is only to be done once to salvage

Line 159, length 85
\&\fBpo4a\-gettextize\fR synchronizes the master and localized files to extract their

Line 160, length 82
content into a PO file. The content of the master file gives the \fBmsgid\fR while

Line 161, length 82
the content of the localized file gives the \fBmsgstr\fR. This process is somewhat

Line 189, length 86
\&\fBpo4a\-gettextize\fR will verbosely diagnose any structure desynchronization. When

Line 224, length 81
to the translators. They should be added separately to \fBpo4a\fR as addenda (see

Line 255, length 81
line contains some spaces, or when there is no empty line between the \fB=item\fR

Line 261, length 81
inspecting the file \fIgettextization.failed.po\fR that was produced, and fix the

Line 269, length 84
duplicated strings, so you should report any remaining issue that you may encounter.

Line 272, length 82
Any file produced by \fBpo4a\-gettextize\fR should be manually reviewed, even when

Line 274, length 83
that the \fBmsgid\fR and \fBmsgstr\fR actually match. It is not necessary to ensure

Line 281, length 81
only want to recognize similar elements in each \fBmsgid\fR and its corresponding

Line 282, length 81
\&\fBmsgstr\fR. As a speaker of French, English, and some German myself, I can do

Line 290, length 84
\&\fBpo4a\-gettextize\fR reported an error, and try again. Once you have a decent PO

Line 295, length 83
The easiest way to setup po4a is to write a \fBpo4a.conf\fR configuration file, and

Line 296, length 89
use the integrated \fBpo4a\fR program (\fBpo4a\-updatepo\fR and \fBpo4a\-translate\fR are

Line 312, length 82
entries were marked as fuzzy in the PO file by \fBpo4a\-gettextization\fR, forcing

-.-.

Add a zero (0) in front of a decimal fraction that begins with a period (.)

7:.if t .sp .5v

-.-.

Put a subordinate sentence (after a comma) on a new line.

69:(\fIXX.po\fR is the output, all others are inputs)
79:an existing translation while converting to po4a, not on a regular basis after
83:You must provide both a master file (e.g., the source in English) and an
84:existing translated file (e.g., a previous translation attempt without po4a). If
85:you provide more than one master or translation files, they will be used in
86:sequence, but it may be easier to gettextize each page or chapter separately and
89:If the master document has non-ASCII characters, the new generated PO file will
90:be in UTF\-8. If the master document is completely in ASCII, the generated
108:multiple master files, you may wish to provide multiple localized file by
115:File where the message catalog should be written. If not given, the message
121:example, you could pass '\-o tablecells' to the AsciiDoc parser, while the
132:This can be useful to understand why these files get desynchronized, leading
145:Set the report address for msgid bugs. By default, the created POT files
166:original document that was used for translation. Even so, you may need to fiddle
168:by the original translator, so working on files' copies is advised.
170:Internally, each po4a parser reports the syntactical type of each extracted
172:In the example depicted below, it is very unlikely that the 4th string in
175:original, or that two original paragraphs were merged together in the
190:this happens, you should manually edit the files to add fake paragraphs or
196:the box, building a correct PO file is a matter of seconds. Otherwise, you will
201:original text), restarting the translation without first salvaging the old
202:translations would have required several months of work. In addition, this grunt
203:work is the price to pay to get the comfort of po4a. Once converted, the
207:After a successful gettextization, the produced documents should be manually
208:checked for undetected disparities and silent errors, as explained below.
214:happens, you need to edit the files as much as needed to re-align the files'
216:reports the strings that don't match, their positions in the text, and the type
217:of each of them. Moreover, the PO file generated so far is dumped as
223:Remove all extra content of the translations, such as the section giving credits
227:When editing the files to align their structures, prefer editing the translation
228:if possible. Indeed, if the changes to the original are too intrusive, the old
231:That being said, you still want to edit the original document if it's too hard
232:to get the gettextization to proceed otherwise, even if it means that one
243:part of the community. Plus, it is impossible to do so when using po4a ;) But
247:Sometimes, the paragraph content does match, but not their types. Fixing it is
248:rather format-dependent. In POD and man, it often comes from the fact that one
250:In those formats, such paragraph cannot be wrapped and thus become a different
254:Likewise, two paragraphs may get merged together in POD when the separating
255:line contains some spaces, or when there is no empty line between the \fB=item\fR
258:Sometimes, the desynchronization message seems odd because the translation is
261:inspecting the file \fIgettextization.failed.po\fR that was produced, and fix the
265:translation. Duplicated strings are merged in PO files, with two references.
266:This constitutes a difficulty for the gettextization algorithm, that is a simple
269:duplicated strings, so you should report any remaining issue that you may encounter.
272:Any file produced by \fBpo4a\-gettextize\fR should be manually reviewed, even when
273:the script terminates successfully. You should skim over the PO file, ensuring
275:that the translation is perfectly correct yet, as all entries are marked as
280:Fortunately, this step does not require to master the target languages as you
282:\&\fBmsgstr\fR. As a speaker of French, English, and some German myself, I can do
283:this for all European languages at least, even if I cannot say one word of most
285:languages by looking at string length, phrase structures (does the amount of
286:interrogation marks match?) and other clues, but I prefer when someone else can
289:If you detect a mismatch, edit the original and translation files as if
290:\&\fBpo4a\-gettextize\fR reported an error, and try again. Once you have a decent PO
291:file for your previous translation, backup it until you get po4a working
295:The easiest way to setup po4a is to write a \fBpo4a.conf\fR configuration file, and
300:When \fBpo4a\fR runs for the first time, the current version of the master
306:For example, the first run over the Perl documentation's French translation (5.5
307:MB PO file) took about 48 hours (yes, two days) while the subsequent ones only
311:After this first run, the PO files are ready to be reviewed by translators. All
312:entries were marked as fuzzy in the PO file by \fBpo4a\-gettextization\fR, forcing
314:that the salvaged translation actually match the current original text, update
315:the translation on need, and remove the fuzzy markers.
317:Once enough fuzzy markers are removed, \fBpo4a\fR will start generating the
318:translation files on disk, and you're ready to move your translation workflow to
320:between translators and maintainers, but that's beyond \fBpo4a\fR' scope.
337:Copyright 2002\-2023 by SPI, inc.

-.-.

Output from "test-groff -mandoc -t -K utf8 -rF0 -rHY=0 -rCHECKSTYLE=10 -ww -z ":

troff:<stdin>:178: warning: font name 'CW' is deprecated

-.-.

Additionally (general):

  Abbreviations get a '\&' added after their final full stop (.) to mark them
as such and not as an end of a sentence.

  There is no need to add a '\&' before a full stop (.) if it has a character
before it!

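The long-line and trailing-space defects mentioned in these remarks can also be listed with standard tools when mandoc is not at hand.
This is a minimal sketch, not the script used for this report.
The function names "long_lines" and "trailing_spaces" are invented here, and the limit of 80 bytes is taken from the mandoc message quoted earlier.

```shell
# Hypothetical helpers (names invented for illustration).

# Print "Line N, length M" for every line of file $1 that is longer
# than the limit $2 (default: 80).  LC_ALL=C is set so that awk is
# expected to count bytes rather than multibyte characters.
long_lines() {
    limit=${2:-80}
    LC_ALL=C awk -v max="$limit" \
        'length($0) > max { printf "Line %d, length %d\n", NR, length($0) }' "$1"
}

# Print, with line numbers, every line of file $1 that ends with a space.
trailing_spaces() {
    grep -n -e ' $' "$1"
}
```

Usage: "long_lines po4a-gettextize.1p" and "trailing_spaces po4a-gettextize.1p".
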
--- po4a-gettextize.1p 2025-01-03 21:55:35.753926912 +0000
+++ po4a-gettextize.1p.new 2025-01-03 22:15:57.427914008 +0000
@@ -4,11 +4,12 @@
 .\" Standard preamble:
 .\" ========================================================================
 .de Sp \" Vertical space (when we can't use .PP)
-.if t .sp .5v
+.if t .sp 0.5v
 .if n .sp
 ..
 .de Vb \" Begin verbatim text
-.ft CW
+.ie \\n(.g .ft CR
+.el .ft CW
 .nf
 .ne \\$1
 ..
@@ -197,7 +198,7 @@ the box, building a correct PO file is a
 soon understand why this process has such an ugly name :) Even so,
 gettextization often remains faster than translating everything again. I
 gettextized the French translation of the whole Perl documentation in one day
-despite the \fImany\fR synchronization issues. Given the amount of text (2MB of
+despite the \fImany\fR synchronization issues. Given the amount of text (2\~MB of
 original text), restarting the translation without first salvaging the old
 translations would have required several months of work. In addition, this grunt
 work is the price to pay to get the comfort of po4a. Once converted, the
@@ -334,7 +335,7 @@ between translators and maintainers, but
 .Ve
 .SH "COPYRIGHT AND LICENSE"
 .IX Header "COPYRIGHT AND LICENSE"
-Copyright 2002\-2023 by SPI, inc.
+Copyright 2002\(en2023 by SPI, inc.
 .PP
 This program is free software; you may redistribute it and/or modify it
 under the terms of GPL v2.0 or later (see the COPYING file).
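
The change in the last hunk (a numeric range written with '\(en' instead of '\-') can also be made mechanically.
This is only a sketch, assuming a sed that accepts '-E'.
The function name "fix_year_range" is invented here, and the pattern covers only ranges between two four-digit years.
The .TH line suggests that the file is generated, so a lasting fix probably belongs in the source or the generator rather than in the .1p file.

```shell
# Hypothetical helper (name invented for illustration).
# Rewrites "YYYY\-YYYY" as "YYYY\(enYYYY" in the given files,
# or in standard input when no file is given.
fix_year_range() {
    sed -E 's/([0-9]{4})\\-([0-9]{4})/\1\\(en\2/g' "$@"
}
```

Usage: "fix_year_range po4a-gettextize.1p > po4a-gettextize.1p.new".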