Package: po4a
Version: 0.73-2
Severity: minor
Tags: patch

   * What led up to the situation?

     Checking for defects with a new version

test-[gn]roff -mandoc -t -K utf8 -rF0 -rHY=0 -rCHECKSTYLE=10 -ww -z < "man page"

  [Use "grep -n -e ' $' <file>" to find trailing spaces.]

  ["test-groff" is a script in the groff repository; it is not shipped
(local copy and "troff" slightly changed by me).]

  [The fate of "test-nroff" was decided in groff bug #55941.]

   * What was the outcome of this action?


troff:<stdin>:178: warning: font name 'CW' is deprecated


   * What outcome did you expect instead?

     No output (no warnings).

-.-

  General remarks and further material, if a diff file exists, are in the
attachments.


-- System Information:
Debian Release: trixie/sid
  APT prefers testing
  APT policy: (500, 'testing')
Architecture: amd64 (x86_64)

Kernel: Linux 6.12.6-amd64 (SMP w/2 CPU threads; PREEMPT)
Locale: LANG=is_IS.iso88591, LC_CTYPE=is_IS.iso88591 (charmap=ISO-8859-1), LANGUAGE not set
Shell: /bin/sh linked to /usr/bin/dash
Init: sysvinit (via /sbin/init)

Versions of packages po4a depends on:
ii  gettext                     0.22.5-3
ii  libpod-parser-perl          1.67-1
ii  libsgmls-perl               1.03ii-38
ii  libsyntax-keyword-try-perl  0.30-1+b1
ii  libyaml-tiny-perl           1.76-1
ii  opensp                      1.5.2-15.1
ii  perl                        5.40.0-8

Versions of packages po4a recommends:
ii  liblocale-gettext-perl     1.07-7+b1
ii  libterm-readkey-perl       2.38-2+b4
ii  libtext-wrapi18n-perl      0.06-10
ii  libunicode-linebreak-perl  0.0.20190101-1+b8

po4a suggests no packages.

-- no debconf information
Input file is po4a-gettextize.1p

  Any program (or person) that produces man pages should check its output
for defects by using both groff and nroff:

[gn]roff -mandoc -t -ww -b -z -K utf8 <man page>

  The same goes for man pages that are used as an input.

  For style checks, use

  mandoc -T lint

-.-

  So any 'generator' should check its products with the above-mentioned
'groff' and 'mandoc', and additionally with 'nroff ...'.

  This is just a simple quality control measure.

  To get a better man page, the 'generator' may have to be corrected,
and so may the source file and any additional files.

  Common defects:

  Input text line longer than 80 bytes.

  Not removing trailing spaces (in in- and output).
  The reason for these trailing spaces should be found and eliminated.

  Not beginning each input sentence on a new line
(doing so also keeps the input lines shorter).

  See man-pages(7), item 'semantic newline'.
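  The first two defects can be found mechanically.
A minimal POSIX shell sketch, assuming awk is available; "check_source" is a hypothetical name, and the 80-byte limit is the one "mandoc -T lint" reports:

```sh
# check_source: hypothetical helper; lists over-long lines and trailing
# whitespace in the man page source files given as arguments.
check_source() {
    # LC_ALL=C makes awk count bytes, not characters, matching mandoc's
    # "input text line longer than 80 bytes" message.
    LC_ALL=C awk '
        length($0) > 80 {
            printf "%s:%d: input line is %d bytes long\n", FILENAME, FNR, length($0)
        }
        /[ \t]$/ {
            printf "%s:%d: trailing whitespace\n", FILENAME, FNR
        }
    ' "$@"
}
```

  For example: check_source po4a-gettextize.1p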

-.-

The difference between the formatted output of the original and patched file
can be seen with:

  nroff -mandoc <file1> > <out1>
  nroff -mandoc <file2> > <out2>
  diff -u <out1> <out2>

and for groff, use

"printf '%s\n%s\n' '.kern 0' '.ss 12 0' | groff -mandoc -Z - "

instead of 'nroff -mandoc'.

  Add the option '-t', if the file contains a table.

  Read the output of 'diff -u' with 'less -R' or similar.
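  The steps above can be wrapped in a small shell function.
This is a sketch: "format_page" and "compare_formatted" are hypothetical names, "mktemp -d" is assumed to exist, and '-t' is always passed, on the assumption that running tbl over a file without tables is harmless (tbl copies non-table text through unchanged).

```sh
# format_page MODE FILE: hypothetical helper; prints the formatted page.
#   MODE "nroff": terminal output of "nroff -mandoc -t"
#   MODE "groff": groff intermediate output (-Z); ".kern 0" and ".ss 12 0"
#                 are read first from standard input ("-") to switch off
#                 kerning and extra inter-sentence space
format_page() {
    if [ "$1" = groff ]; then
        printf '%s\n%s\n' '.kern 0' '.ss 12 0' | groff -mandoc -t -Z - "$2"
    else
        nroff -mandoc -t "$2"
    fi
}

# compare_formatted [-g] ORIGINAL PATCHED: unified diff of the formatted
# output of both files; returns diff's exit status (0 = no difference).
compare_formatted() {
    mode=nroff
    if [ "$1" = -g ]; then
        mode=groff
        shift
    fi
    tmp=$(mktemp -d) || return 2
    format_page "$mode" "$1" > "$tmp/out1"
    format_page "$mode" "$2" > "$tmp/out2"
    diff -u "$tmp/out1" "$tmp/out2"
    status=$?
    rm -rf "$tmp"
    return "$status"
}
```

  For example: compare_formatted po4a-gettextize.1p po4a-gettextize.1p.new | less -R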

-.-.

  If 'man' (man-db) is used to check the manual for warnings,
the following must be set:

  The option "--warnings=w"

  The environment variable:

export MAN_KEEP_STDERR=yes (or any non-empty value)

  or (to produce only warnings):

export MANROFFOPT="-ww -b -z"

export MAN_KEEP_STDERR=yes (or any non-empty value)


-.-.

Output from "mandoc -T lint  po4a-gettextize.1p": (shortened list)

     13 input text line longer than 80 bytes

-.-.

Output from "test-groff -mandoc -t -ww -z po4a-gettextize.1p": (shortened list)

      1 font name 'CW' is deprecated

-.-.

Change '-' (\-) to '\(en' (en dash) for a numeric range.
GNU gnulib has recently (2023-06-18) updated its
"build-aux/update-copyright" to recognize "\(en" in man pages.

po4a-gettextize.1p:337:Copyright 2002\-2023 by SPI, inc.
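  The substitution can be done with sed.
A sketch, assuming every range to change is written as digit, "\-", digit; "fix_ranges" is a hypothetical name, and the result still needs a look, since "\-" between digits could also be a real minus sign:

```sh
# fix_ranges: hypothetical helper; rewrites "\-" between two digits as
# the en dash "\(en" in the given files (or standard input).
fix_ranges() {
    sed 's/\([0-9]\)\\-\([0-9]\)/\1\\(en\2/g' "$@"
}
```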

-.-.

Add a (no-break, "\ " or "\~") space between a number and a unit,
as these are not one entity.

200:despite the \fImany\fR synchronization issues. Given the amount of text (2MB of
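  A sed sketch of the same fix, assuming the units of interest are byte units (kB, MB, GB, TB, optionally with "i" as in MiB) written directly after a digit; "add_unit_space" and the unit list are assumptions for this example:

```sh
# add_unit_space: hypothetical helper; inserts a no-break space "\~"
# between a digit and a directly following byte unit such as "MB".
add_unit_space() {
    sed 's/\([0-9]\)\([kKMGT]i\{0,1\}B\)/\1\\~\2/g' "$@"
}
```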

-.-.

Wrong distance between sentences in the input file.

  Separate the sentences and subordinate clauses; each begins on a new
line.  See man-pages(7) ("Conventions for source file layout") and
"info groff" ("Input Conventions").

  The best procedure is to always start a new sentence on a new line,
at least if you are typing on a computer.

Remember coding: Only one command ("sentence") on each (logical) line.

E-mail: Easier to quote exactly the relevant lines.

Generally: Easier to edit the sentence.

Patches: Less unaffected text.

Searching for two adjacent words is easier when they belong to the same line
and the same phrase.

  The amount of space between sentences in the output can then be
controlled with the ".ss" request.

Mark a final abbreviation point as such by suffixing it with "\&".
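  Candidate lines can be listed with grep.
This is a rough heuristic, not an exact rule: it flags a period, question mark, or exclamation mark (optionally followed by ')' or '"'), then spaces, then an upper-case letter or a backslash (as in "\fB").
It misses cases such as ";) But" and also reports abbreviations such as "e.g. X", so each hit needs a look; "find_midline_sentences" is a hypothetical name.

```sh
# find_midline_sentences: hypothetical helper; prints "line:text" for
# input lines where one sentence seems to end and another to begin.
find_midline_sentences() {
    grep -n -E '[.?!][)"]* +[[:upper:]\\]' "$@"
}
```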

N.B.

  The number of lines affected can be too large to be in a patch.

73:the classical gettext tools. The main feature of po4a is that it decouples the
78:translations into a po4a\-based workflow. This is only to be done once to salvage
80:the conversion of your project. This tedious process is explained in details in
84:existing translated file (e.g., a previous translation attempt without po4a). If
87:then use \fBmsgmerge\fR to merge all produced PO files. As you wish.
90:be in UTF\-8. If the master document is completely in ASCII, the generated
96:Format of the documentation you want to handle. Use the \fB\-\-help\-format\fR
100:File containing the master document to translate. You can use this option
107:File containing the localized (translated) document. If you provided
115:File where the message catalog should be written. If not given, the message
119:Extra option(s) to pass to the format plugin. See the documentation of each
120:plugin for more information about the valid options and their meanings. For
145:Set the report address for msgid bugs. By default, the created POT files
149:Set the copyright holder in the POT header. The default value is
153:Set the package name for the POT header. The default is "PACKAGE".
156:Set the package version for the POT header. The default is "VERSION".
160:content into a PO file. The content of the master file gives the \fBmsgid\fR while
161:the content of the localized file gives the \fBmsgstr\fR. This process is somewhat
166:original document that was used for translation. Even so, you may need to fiddle
171:strings. This is how desynchronization are detected during the gettextization.
174:(of type 'paragraph'). It is more likely that a new paragraph was added to the
189:\&\fBpo4a\-gettextize\fR will verbosely diagnose any structure desynchronization. When
192:match. Some tricks are given below to salvage the most of the existing
196:the box, building a correct PO file is a matter of seconds. Otherwise, you will
198:gettextization often remains faster than translating everything again. I
200:despite the \fImany\fR synchronization issues. Given the amount of text (2MB of
202:translations would have required several months of work. In addition, this grunt
203:work is the price to pay to get the comfort of po4a. Once converted, the
213:The gettextization stops as soon as a desynchronization is detected. When this
215:structures. \fBpo4a\-gettextize\fR is rather verbose when things go wrong. It
217:of each of them. Moreover, the PO file generated so far is dumped as
224:to the translators. They should be added separately to \fBpo4a\fR as addenda (see
228:if possible. Indeed, if the changes to the original are too intrusive, the old
230:gettextization (see below). Any unmatched translation will be dumped anyway.
233:paragraph of the translation is dumped. The important thing is to get a first PO
237:translated version. This content will be automatically reintroduced afterward,
241:translation that seems justified. Issues in the original document should
242:reported to the author. Fixing them in your translation only fixes them for a
243:part of the community. Plus, it is impossible to do so when using po4a ;) But
247:Sometimes, the paragraph content does match, but not their types. Fixing it is
248:rather format-dependent. In POD and man, it often comes from the fact that one
251:type. Just remove the space and you are fine. It may also be a typo in the tag
259:attached to the wrong original paragraph. It is the sign of an undetected issue
260:earlier in the process. Search for the actual desynchronization point by
265:translation. Duplicated strings are merged in PO files, with two references.
268:files. It is however believed that recent versions of po4a deal properly with
273:the script terminates successfully. You should skim over the PO file, ensuring
274:that the \fBmsgid\fR and \fBmsgstr\fR actually match. It is not necessary to ensure
276:fuzzy translations anyway. You only need to check for obvious matching issues
282:\&\fBmsgstr\fR. As a speaker of French, English, and some German myself, I can do
284:of these languages. I sometimes manage to detect matching issues in non-Latin
286:interrogation marks match?) and other clues, but I prefer when someone else can
290:\&\fBpo4a\-gettextize\fR reported an error, and try again. Once you have a decent PO
297:deprecated). Please check the "CONFIGURATION FILE" Section in \fBpo4a\fR\|(1)
302:that you salvaged through gettextization. This can take quite a long time,
304:the elements of the POT file built from the recent master files. This forces
311:After this first run, the PO files are ready to be reviewed by translators. All
313:their careful review before use. Translators should take each entry to verify
319:production. Some projects find it useful to rely on weblate to coordinate

-.-.

Split lines longer than 80 characters into two or more lines.
Appropriate break points are the end of a sentence or of a subordinate
clause, i.e., after punctuation marks.

N.B.

  The number of lines affected can be too large to be in a patch.


Line 58, length 88

.TH PO4A-GETTEXTIZE.1P 1 2024-08-06 "perl v5.38.2" "User Contributed Perl Documentation"

Line 67, length 116

\&\fBpo4a\-gettextize\fR \fB\-f\fR \fIfmt\fR \fB\-m\fR \fImaster.doc\fR \fB\-l\fR \fIXX.doc\fR \fB\-p\fR \fIXX.po\fR

Line 78, length 81

translations into a po4a\-based workflow. This is only to be done once to salvage

Line 159, length 85

\&\fBpo4a\-gettextize\fR synchronizes the master and localized files to extract their

Line 160, length 82

content into a PO file. The content of the master file gives the \fBmsgid\fR while

Line 161, length 82

the content of the localized file gives the \fBmsgstr\fR. This process is somewhat

Line 189, length 86

\&\fBpo4a\-gettextize\fR will verbosely diagnose any structure desynchronization. When

Line 224, length 81

to the translators. They should be added separately to \fBpo4a\fR as addenda (see

Line 255, length 81

line contains some spaces, or when there is no empty line between the \fB=item\fR

Line 261, length 81

inspecting the file \fIgettextization.failed.po\fR that was produced, and fix the

Line 269, length 84

duplicated strings, so you should report any remaining issue that you may encounter.

Line 272, length 82

Any file produced by \fBpo4a\-gettextize\fR should be manually reviewed, even when

Line 274, length 83

that the \fBmsgid\fR and \fBmsgstr\fR actually match. It is not necessary to ensure

Line 281, length 81

only want to recognize similar elements in each \fBmsgid\fR and its corresponding

Line 282, length 81

\&\fBmsgstr\fR. As a speaker of French, English, and some German myself, I can do

Line 290, length 84

\&\fBpo4a\-gettextize\fR reported an error, and try again. Once you have a decent PO

Line 295, length 83

The easiest way to setup po4a is to write a \fBpo4a.conf\fR configuration file, and

Line 296, length 89

use the integrated \fBpo4a\fR program (\fBpo4a\-updatepo\fR and \fBpo4a\-translate\fR are

Line 312, length 82

entries were marked as fuzzy in the PO file by \fBpo4a\-gettextization\fR, forcing


-.-.


Add a zero (0) in front of a decimal fraction that begins with a period (.).

7:.if t .sp .5v
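  A sed sketch for this, assuming only request lines (those starting with ".") should be touched and that the fraction follows a blank; "add_leading_zero" is a hypothetical name:

```sh
# add_leading_zero: hypothetical helper; on lines starting with ".",
# turns a blank followed by ".5" into the blank followed by "0.5".
add_leading_zero() {
    # "\10" is back-reference \1 followed by a literal 0 (sed has only \1..\9)
    sed '/^\./s/\([[:space:]]\)\.\([0-9]\)/\10.\2/g' "$@"
}
```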

-.-.

Put a subordinate sentence (after a comma) on a new line.

69:(\fIXX.po\fR is the output, all others are inputs)
79:an existing translation while converting to po4a, not on a regular basis after
83:You must provide both a master file (e.g., the source in English) and an
84:existing translated file (e.g., a previous translation attempt without po4a). If
85:you provide more than one master or translation files, they will be used in
86:sequence, but it may be easier to gettextize each page or chapter separately and
89:If the master document has non-ASCII characters, the new generated PO file will
90:be in UTF\-8. If the master document is completely in ASCII, the generated
108:multiple master files, you may wish to provide multiple localized file by
115:File where the message catalog should be written. If not given, the message
121:example, you could pass '\-o tablecells' to the AsciiDoc parser, while the
132:This can be useful to understand why these files get desynchronized, leading
145:Set the report address for msgid bugs. By default, the created POT files
166:original document that was used for translation. Even so, you may need to fiddle
168:by the original translator, so working on files' copies is advised.
170:Internally, each po4a parser reports the syntactical type of each extracted
172:In the example depicted below, it is very unlikely that the 4th string in
175:original, or that two original paragraphs were merged together in the
190:this happens, you should manually edit the files to add fake paragraphs or
196:the box, building a correct PO file is a matter of seconds. Otherwise, you will
201:original text), restarting the translation without first salvaging the old
202:translations would have required several months of work. In addition, this grunt
203:work is the price to pay to get the comfort of po4a. Once converted, the
207:After a successful gettextization, the produced documents should be manually
208:checked for undetected disparities and silent errors, as explained below.
214:happens, you need to edit the files as much as needed to re-align the files'
216:reports the strings that don't match, their positions in the text, and the type
217:of each of them. Moreover, the PO file generated so far is dumped as
223:Remove all extra content of the translations, such as the section giving credits
227:When editing the files to align their structures, prefer editing the translation
228:if possible. Indeed, if the changes to the original are too intrusive, the old
231:That being said, you still want to edit the original document if it's too hard
232:to get the gettextization to proceed otherwise, even if it means that one
243:part of the community. Plus, it is impossible to do so when using po4a ;) But
247:Sometimes, the paragraph content does match, but not their types. Fixing it is
248:rather format-dependent. In POD and man, it often comes from the fact that one
250:In those formats, such paragraph cannot be wrapped and thus become a different
254:Likewise, two paragraphs may get merged together in POD when the separating
255:line contains some spaces, or when there is no empty line between the \fB=item\fR
258:Sometimes, the desynchronization message seems odd because the translation is
261:inspecting the file \fIgettextization.failed.po\fR that was produced, and fix the
265:translation. Duplicated strings are merged in PO files, with two references.
266:This constitutes a difficulty for the gettextization algorithm, that is a simple
269:duplicated strings, so you should report any remaining issue that you may encounter.
272:Any file produced by \fBpo4a\-gettextize\fR should be manually reviewed, even when
273:the script terminates successfully. You should skim over the PO file, ensuring
275:that the translation is perfectly correct yet, as all entries are marked as
280:Fortunately, this step does not require to master the target languages as you
282:\&\fBmsgstr\fR. As a speaker of French, English, and some German myself, I can do
283:this for all European languages at least, even if I cannot say one word of most
285:languages by looking at string length, phrase structures (does the amount of
286:interrogation marks match?) and other clues, but I prefer when someone else can
289:If you detect a mismatch, edit the original and translation files as if
290:\&\fBpo4a\-gettextize\fR reported an error, and try again. Once you have a decent PO
291:file for your previous translation, backup it until you get po4a working
295:The easiest way to setup po4a is to write a \fBpo4a.conf\fR configuration file, and
300:When \fBpo4a\fR runs for the first time, the current version of the master
306:For example, the first run over the Perl documentation's French translation (5.5
307:MB PO file) took about 48 hours (yes, two days) while the subsequent ones only
311:After this first run, the PO files are ready to be reviewed by translators. All
312:entries were marked as fuzzy in the PO file by \fBpo4a\-gettextization\fR, forcing
314:that the salvaged translation actually match the current original text, update
315:the translation on need, and remove the fuzzy markers.
317:Once enough fuzzy markers are removed, \fBpo4a\fR will start generating the
318:translation files on disk, and you're ready to move your translation workflow to
320:between translators and maintainers, but that's beyond \fBpo4a\fR' scope.
337:Copyright 2002\-2023 by SPI, inc.

-.-.

Output from "test-groff -mandoc -t -K utf8 -rF0 -rHY=0 -rCHECKSTYLE=10 -ww -z":

troff:<stdin>:178: warning: font name 'CW' is deprecated

-.-.

  Additionally (general):

  Abbreviations get a '\&' added after their final full stop (.) to mark them
as such and not as an end of a sentence.

  There is no need to add a '\&' before a full stop (.) if it has a character
before it!
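  The mark matters most at the end of an input line, because groff treats a period there as the end of a sentence.
A heuristic sketch for finding such lines; "find_unmarked_abbrev" and its short list of abbreviations are assumptions, to be extended as needed, and each hit needs a look, since "etc." can legitimately end a sentence:

```sh
# find_unmarked_abbrev: hypothetical helper; prints "line:text" for lines
# that end in a listed abbreviation whose final period lacks "\&".
find_unmarked_abbrev() {
    grep -n -E '(^|[^[:alpha:]])(e\.g|i\.e|etc|cf|vs)\.$' "$@"
}
```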
--- po4a-gettextize.1p  2025-01-03 21:55:35.753926912 +0000
+++ po4a-gettextize.1p.new      2025-01-03 22:15:57.427914008 +0000
@@ -4,11 +4,12 @@
 .\" Standard preamble:
 .\" ========================================================================
 .de Sp \" Vertical space (when we can't use .PP)
-.if t .sp .5v
+.if t .sp 0.5v
 .if n .sp
 ..
 .de Vb \" Begin verbatim text
-.ft CW
+.ie \\n(.g .ft CR
+.el .ft CW
 .nf
 .ne \\$1
 ..
@@ -197,7 +198,7 @@ the box, building a correct PO file is a
 soon understand why this process has such an ugly name :) Even so,
 gettextization often remains faster than translating everything again. I
 gettextized the French translation of the whole Perl documentation in one day
-despite the \fImany\fR synchronization issues. Given the amount of text (2MB of
+despite the \fImany\fR synchronization issues. Given the amount of text (2\~MB of
 original text), restarting the translation without first salvaging the old
 translations would have required several months of work. In addition, this grunt
 work is the price to pay to get the comfort of po4a. Once converted, the
@@ -334,7 +335,7 @@ between translators and maintainers, but
 .Ve
 .SH "COPYRIGHT AND LICENSE"
 .IX Header "COPYRIGHT AND LICENSE"
-Copyright 2002\-2023 by SPI, inc.
+Copyright 2002\(en2023 by SPI, inc.
 .PP
 This program is free software; you may redistribute it and/or modify it
 under the terms of GPL v2.0 or later (see the COPYING file).
