Package: poppler-utils
Version: 25.03.0-4
Severity: minor
Tags: patch

   * What led up to the situation?

     Checking for defects with a new version

test-[g|n]roff -mandoc -t -K utf8 -rF0 -rHY=0 -rCHECKSTYLE=10 -ww -z < "man page"

  [Use 

grep -n -e ' $' -e '\\~$' -e ' \\f.$' -e ' \\"' <file>

  to find (most) trailing spaces.]

  ["test-groff" is a script in the repository for "groff"; it is not shipped]
(local copy and "troff" slightly changed by me).

  [The fate of "test-nroff" was decided in groff bug #55941.]

   * What was the outcome of this action?

an.tmac:<stdin>:9: misuse, warning: .RI is for at least 2 arguments, got 1
        Use macro '.I' for one argument or split argument.


   * What outcome did you expect instead?

     No output (no warnings).

-.-

  General remarks and further material, if a diff file exists, are in the
attachments.


-- System Information:
Debian Release: 13.0
  APT prefers testing
  APT policy: (500, 'testing')
Architecture: amd64 (x86_64)

Kernel: Linux 6.12.27-amd64 (SMP w/2 CPU threads; PREEMPT)
Locale: LANG=is_IS.iso88591, LC_CTYPE=is_IS.iso88591 (charmap=ISO-8859-1), LANGUAGE not set
Shell: /bin/sh linked to /usr/bin/dash
Init: sysvinit (via /sbin/init)

Versions of packages poppler-utils depends on:
ii  libc6          2.41-8
ii  libcairo2      1.18.4-1+b1
ii  libfreetype6   2.13.3+dfsg-1
ii  liblcms2-2     2.16-2
ii  libpoppler147  25.03.0-4
ii  libstdc++6     14.2.0-19

poppler-utils recommends no packages.

poppler-utils suggests no packages.

-- no debconf information
Input file is pdftotext.1

Output from "mandoc -T lint  pdftotext.1": (shortened list)

      1 input text line longer than 80 bytes

Find trailing space with:
grep -n -e ' $' -e ' \\f.$' -e ' \\"' <man page>
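
A small sketch of the grep call above, run on a throw-away sample.
The function name and the sample text are invented for this illustration;
they are not taken from pdftotext.1.

```shell
#!/bin/sh
# Sketch: find (most) trailing spaces in a man page source.
# Assumption: find_trailing_space and the sample lines are made up here.
find_trailing_space() {
    grep -n -e ' $' -e ' \\f.$' -e ' \\"' "$1"
}

sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
printf '%s\n' \
    '.SH NAME' \
    'A line with a trailing space ' \
    'A clean line' \
    'Bold text \fB' \
    '.B foo \" space before a comment' > "$sample"

# Reports the sample lines 2, 4 and 5.
find_trailing_space "$sample"
```

The three patterns catch a space at the end of a line, a space before a
font escape that ends a line, and a space before a roff comment.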

-.-.

Output from
test-nroff -mandoc -t -ww -z pdftotext.1: (shortened list)

      1         Use macro '.I' for one argument or split argument.
      1 .RI is for at least 2 arguments, got 1

Find trailing space with:
grep -n -e ' $' -e ' \\f.$' -e ' \\"' <man page>

-.-.

Change '-' (\-) to '\(en' (en-dash) for a (numeric) range.

GNU gnulib has recently (2023-06-18) updated its
"build_aux/update-copyright" to recognize "\(en" in man pages.

pdftotext.1:148:The pdftotext software and documentation are copyright 1996-2011 Glyph
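
A possible way to make this change mechanically, as a sketch only.
It assumes that just ranges of two four-digit years are wanted; the function
name is made up, and other hyphens still need a manual look.

```shell
#!/bin/sh
# Sketch: turn the hyphen between two four-digit years into \(en.
# Assumption: only YYYY-YYYY ranges are handled; year_range_to_en is a
# name chosen for this example.
year_range_to_en() {
    sed -e 's/\([0-9]\{4\}\)-\([0-9]\{4\}\)/\1\\(en\2/g'
}

printf '%s\n' 'The pdftotext software and documentation are copyright 1996-2011 Glyph' |
    year_range_to_en
```

The result should be compared with the original before it is committed,
as four-digit numbers joined by a hyphen are not always a range.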

-.-.

Wrong distance (not two spaces) between sentences in the input file.

  Separate the sentences and subordinate clauses; each begins on a new
line.  See man-pages(7) ("Conventions for source file layout") and
"info groff" ("Input Conventions").

  The best procedure is to always start a new sentence on a new line,
at least, if you are typing on a computer.

Remember coding: Only one command ("sentence") on each (logical) line.

E-mail: Easier to quote exactly the relevant lines.

Generally: Easier to edit the sentence.

Patches: Less unaffected text.

Searching for two adjacent words is easier when they are on the same line
and in the same phrase.

  The amount of space between sentences in the output can then be
controlled with the ".ss" request.

Mark a final abbreviation point as such by suffixing it with "\&".

Some sentences (etc.) do not begin on a new line.

Split (sometimes) lines after a punctuation mark; before a conjunction.

  Lines with only one (or two) space(s) between sentences could be split,
so that later sentences begin on a new line.

Use

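#!/bin/sh

# Print a roff request line (starting with '.') as it is,
# then break lines after a period that follows a letter.
# Note: '\n' in the replacement is a GNU sed extension.
sed -e '/^\./n' \
-e 's/\([[:alpha:]]\)\.  */\1.\n/g' "$1"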

to split lines after a sentence period.
Check result with the difference between the formatted outputs.
See also the attachment "general.bugs".

56:hyphenation, etc.) and output the text in reading order.
69:0, 90, 180, or 270 degree axes). This is useful for skipping
93:Specifies how much spacing we allow after a word before considering adjacent text to be a new column, measured as a fraction of the font size. Current default is 0.7, old releases had a 0.3 default.
96:Sets the encoding to use for text output. This defaults to "UTF-8".
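
The effect of the substitution on two of the listed lines can be tried with
the sketch below.
It is a variant, not the script above: it uses a literal newline in the
replacement (so it does not rely on '\n'), and it uses 'b' instead of 'n'
to leave request lines alone.
The function name is an assumption.

```shell
#!/bin/sh
# Sketch: split input lines after a sentence-ending period.
# Assumptions: split_sentences is a made-up name; lines beginning with '.'
# are roff requests and are skipped with 'b' (the script above uses 'n').
split_sentences() {
    sed -e '/^\./b' -e 's/\([[:alpha:]]\)\.  */\1.\
/g' "$@"
}

printf '%s\n' \
    'Sets the encoding to use for text output. This defaults to "UTF-8".' \
    'default is 0.7, old releases had a 0.3 default.' |
    split_sentences
```

The first line is split after "output."; the second is left alone, because
the periods in "0.7" and "0.3" follow a digit and the final period has no
space after it.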

-.-.

Split lines longer than 80 characters into two or more lines.
Appropriate break points are the end of a sentence and a subordinate
clause; after punctuation marks.
Add "\:" to split the string for the output, "\<newline>" in the source.

Specifies how much spacing we allow after a word before considering adjacent text to be a new column, measured as a fraction of the font size. Current default is 0.7, old releases had a 0.3 default.

Longest line is number 93 with 198 characters
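
Such lines can be listed with a few lines of awk; a sketch follows.
The function name and the output format are assumptions, and awk may count
characters or bytes depending on the implementation and locale.

```shell
#!/bin/sh
# Sketch: list input lines longer than a limit (default 80)
# and name the longest line.
# Assumption: long_lines and its output format are made up for this example.
long_lines() {
    awk -v max="${2:-80}" '
        length($0) > max {
            printf "%s:%d: %d characters\n", FILENAME, FNR, length($0)
        }
        length($0) > longest { longest = length($0); where = FNR }
        END {
            printf "Longest line is number %d with %d characters\n", where, longest
        }
    ' "$1"
}
```

Usage would be "long_lines pdftotext.1", or "long_lines pdftotext.1 72"
for another limit.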

-.-.

Put a parenthetical sentence or phrase on a separate line,
unless it is part of code.
See man-pages(7), item "semantic newline".

pdftotext.1:48:Specifies the width of crop area in pixels (default is 0)
pdftotext.1:51:Specifies the height of crop area in pixels (default is 0)
pdftotext.1:54:Maintain (as best as possible) the original physical layout of the
pdftotext.1:59:Assume fixed-pitch (or tabular) text, with the specified character
pdftotext.1:60:width (in points).  This forces physical layout mode.
pdftotext.1:105:Don't insert page breaks (form feed characters) between pages.
pdftotext.1:128:recognition.  There is no way (short of OCR) to extract text from

-.-.

Only one space character follows a possible end of sentence
(after punctuation that can end a sentence).
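
A rough way to locate such places, as a sketch.
The pattern is an assumption and not the check used for this report:
it looks for '.', '!' or '?' followed by exactly one space and an uppercase
letter, and skips lines that begin with '.'.
Abbreviations will give false hits.

```shell
#!/bin/sh
# Sketch: list text lines with only one space after a possible
# end of sentence.
# Assumption: single_space_sentences and its pattern are made up here;
# lines starting with '.' (roff requests) are ignored.
single_space_sentences() {
    grep -n -e '^[^.].*[.!?] [[:upper:]]' "$1"
}
```

Usage would be "single_space_sentences pdftotext.1".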

pdftotext.1:69:0, 90, 180, or 270 degree axes). This is useful for skipping
pdftotext.1:93:Specifies how much spacing we allow after a word before considering adjacent text to be a new column, measured as a fraction of the font size. Current default is 0.7, old releases had a 0.3 default.
pdftotext.1:96:Sets the encoding to use for text output. This defaults to "UTF-8".

-.-.

Output from "test-groff  -mandoc -t -K utf8 -rF0 -rHY=0 -rCHECKSTYLE=0 -ww -z ":

an.tmac:<stdin>:9: misuse, warning: .RI is for at least 2 arguments, got 1
        Use macro '.I' for one argument or split argument.

-.-.

Generally:

Split (sometimes) lines after a punctuation mark; before a conjunction.
--- pdftotext.1 2025-05-25 01:43:56.821852929 +0000
+++ pdftotext.1.new     2025-05-25 01:54:10.478713200 +0000
@@ -6,7 +6,7 @@ pdftotext \- Portable Document Format (P
 .SH SYNOPSIS
 .B pdftotext
 [options]
-.RI PDF-file
+.I PDF-file
 .RI [ text-file ]
 .SH DESCRIPTION
 .B Pdftotext
@@ -24,9 +24,9 @@ to
 .IR file.txt .
 If
 .I text-file
-is \'-', the text is sent to stdout.  If
+is \'\-', the text is sent to stdout.  If
 .I PDF-file
-is \'-', it reads the PDF file from stdin.
+is \'\-', it reads the PDF file from stdin.
 .SH OPTIONS
 .TP
 .BI \-f " number"
@@ -145,7 +145,7 @@ Error related to PDF permissions.
 99
 Other error.
 .SH AUTHOR
-The pdftotext software and documentation are copyright 1996-2011 Glyph
+The pdftotext software and documentation are copyright 1996\(en2011 Glyph
 & Cog, LLC.
 .SH "SEE ALSO"
 .BR pdfdetach (1),
  Any program (or person) that produces man pages should check the output
for defects by using (both groff and nroff)

[gn]roff -mandoc -t -ww -b -z -K utf8 <man page>

  To find trailing space use

grep -n -e ' $' -e ' \\f.$' -e ' \\"' <man page>

  The same goes for man pages that are used as an input.

  For a style guide use

  mandoc -T lint
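
  The checks above can be collected in one small script; a sketch follows.
The function name is made up, and skipping a tool that is not installed
(instead of failing) is a choice made for this example.

```shell
#!/bin/sh
# Sketch: run the checks named above on one man page source.
# Assumptions: check_man_page is a made-up name; a missing tool is
# skipped with a note on stderr instead of being treated as an error.
check_man_page() {
    page=$1
    for roff in groff nroff; do
        if command -v "$roff" >/dev/null 2>&1; then
            "$roff" -mandoc -t -ww -b -z -K utf8 "$page"
        else
            echo "skipped: $roff not found" >&2
        fi
    done
    # Trailing spaces.
    grep -n -e ' $' -e ' \\f.$' -e ' \\"' "$page"
    # Style guide.
    if command -v mandoc >/dev/null 2>&1; then
        mandoc -T lint "$page"
    else
        echo "skipped: mandoc not found" >&2
    fi
    return 0
}
```

Usage would be "check_man_page pdftotext.1".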

-.-

  Any "autogenerator" should check its products with the above mentioned
'groff', 'mandoc', and additionally with 'nroff ...'.

  It should also check its input files for too long (> 80) lines.

  This is just a simple quality control measure.

  To get a better man page, the "autogenerator" may have to be corrected,
as may the source file and any additional file.

  Common defects:

  Not removing trailing spaces (in input and output).
  The reason for these trailing spaces should be found and eliminated.

  "git" has a "tool" to point out whitespace
(see for example "git-apply(1)" and "git-config(1)").

  Not beginning each input sentence on a new line.
Doing so would reduce line length and patch size.

  The script "reportbug" uses 'quoted-printable' encoding when a line is
longer than 1024 characters in an 'ascii' file.

  See man-pages(7), item "semantic newline".

-.-

The difference between the formatted output of the original and patched file
can be seen with:

  nroff -mandoc <file1> > <out1>
  nroff -mandoc <file2> > <out2>
  diff -d -u <out1> <out2>

and for groff, using

\"printf '%s\n%s\n' '.kern 0' '.ss 12 0' | groff -mandoc -Z - \"

instead of 'nroff -mandoc'.

  Add the option '-t', if the file contains a table.

  Read the output from 'diff -d -u ...' with 'less -R' or similar.
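
  The three nroff/diff commands above can be wrapped as follows; a sketch.
FORMATTER is a hypothetical variable introduced here to swap the formatter;
it defaults to the 'nroff -mandoc' call from the text.

```shell
#!/bin/sh
# Sketch: compare the formatted output of an original and a patched page.
# Assumption: formatted_diff and FORMATTER are names made up for this
# example; the default formatter is 'nroff -mandoc'.
formatted_diff() {
    out1=$(mktemp) && out2=$(mktemp) || return 2
    ${FORMATTER:-nroff -mandoc} "$1" > "$out1"
    ${FORMATTER:-nroff -mandoc} "$2" > "$out2"
    diff -d -u "$out1" "$out2"
    status=$?
    rm -f "$out1" "$out2"
    return "$status"
}
```

Usage would be "formatted_diff pdftotext.1 pdftotext.1.new | less -R",
with FORMATTER='nroff -mandoc -t' if the file contains a table.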

-.-.

  If 'man' (man-db) is used to check the manual for warnings,
the following must be set:

  The option "--warnings=w"

  The environment variable:

export MAN_KEEP_STDERR=yes (or any non-empty value)

  or

  (produce only warnings):

export MANROFFOPT="-ww -b -z"

export MAN_KEEP_STDERR=yes (or any non-empty value)
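
  The second variant could be wrapped like this; a sketch only.
MAN_CMD is a hypothetical override introduced here; the function name is
made up, and reading a local file with "--local-file" is an assumption
about how the page is given to 'man'.

```shell
#!/bin/sh
# Sketch: let 'man' (man-db) show only the warnings for a local file.
# Assumptions: man_warnings and MAN_CMD are made-up names; MAN_CMD
# defaults to 'man'.
man_warnings() {
    env MAN_KEEP_STDERR=yes MANROFFOPT="-ww -b -z" \
        ${MAN_CMD:-man} --local-file "$1"
}
```

Usage would be "man_warnings ./pdftotext.1".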

-.-
