On Tuesday, 12 October 2021 11:49:23 BST Keith Marshall wrote:
> Ref: https://savannah.gnu.org/bugs/index.php?55107
> 
> On 01/10/2021 01:10, Deri wrote:
> > I did try to help Keith with this previously, but I was mildly "told
> > off" (on list) for sending my help off list. I've learned my lesson.
> 
> Thanks, Deri.
> 
> IIRC, the reason for the "mild telling off" was that, by replying off
> list, you denied us the potential benefit from other list members who
> may have been willing to review the issue, and so contribute to the
> debugging effort.  I am pleased that, on this occasion, you have kept
> this on-list; even if the majority of list members aren't sufficiently
> interested to assist, there may be some who will, and any assistance
> will be gratefully accepted, and very much appreciated.
> 

Hi Keith,

I just assumed the best person for debugging faults in the code would probably 
be you rather than the rest of us. You may receive other "problem pdfs" from 
other members, but the debugging effort is likely to be yours alone.

What I did find useful while debugging the PDF parser in pdfbb/gropdf was the
Ghent PDF Output Suite (which has some very esoteric examples; sorry, it is
144 MB!), see:

http://gwg.org/gos5/

> > I attach a couple of pdfs with which the current code has problems.
> > 
> > Picture.pdf
> > 
> > [derij@pip groff-psbb]$ ./psbb ../../Picture.pdf
> > ../../Picture.pdf: bounding box = (0,0)..(0,0)
> 
> This is caused by the nested /Group dictionary, within the /Page object;
> the current groff-psbb lexer is confused by it, and ends up in the wrong
> state, when it eventually encounters the /MediaBox key.  Adding one more
> rule (for "<<") to the PDF dictionary state scanning model gets us to:
> 
>    $ ./psbb Picture.pdf
>    Picture.pdf: bounding box = (0,0)..(592,842)
> 
> > [derij@pip groff-psbb]$ pdfbb ../../Picture.pdf
> > Processing '../../Picture.pdf'
> > ../../Picture.pdf: CropBox: 162.085,623.346,340.825,716.546  (178.74,93.2)
> 
> The psbb lexer doesn't handle the /CropBox key.  Should it?  Should
> /CropBox override any extant /MediaBox?

If you view Picture.pdf with a PDF viewer you will see a dumbbell shape; this
is in fact the area of the A4 page described by the CropBox, not the complete
A4 page described by the MediaBox. If the MediaBox dimensions were given to
PDFPIC, the included picture would be the wrong shape. Current gropdf honours
the various "boxes" in this order:

ArtBox TrimBox BleedBox CropBox MediaBox

(No idea if this is "correct", but the viewers I have tested definitely
prioritise CropBox over MediaBox; you will have to experiment.)

You would also have to be careful: a MediaBox at the group level could be
overridden by a CropBox at the page level, I assume.
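
As a rough illustration of that ordering, here is a minimal Python sketch. It
is not the gropdf code; the plain dicts stand in for already-parsed page and
parent dictionaries, and the names effective_box and box_size are made up for
this example. The simple merge is also an assumption: the real PDF inheritance
rules are more detailed than this.

```python
# Priority order quoted above; the first box found wins.
BOX_PRIORITY = ("ArtBox", "TrimBox", "BleedBox", "CropBox", "MediaBox")


def effective_box(page, parent=None):
    """Return (name, [llx, lly, urx, ury]) for the highest-priority box.

    `page` and `parent` are plain dicts standing in for parsed PDF
    dictionaries (an assumption for this sketch).  A key on the page
    replaces the same key on the parent; the priority order then decides
    which of the remaining boxes is used.
    """
    merged = dict(parent or {})
    merged.update(page)
    for name in BOX_PRIORITY:
        if name in merged:
            return name, merged[name]
    return None, None


def box_size(box):
    """Width and height of a [llx, lly, urx, ury] box, in points."""
    llx, lly, urx, ury = box
    return urx - llx, ury - lly
```

With a MediaBox of 0,0,595,842 on the parent and the CropBox of Picture.pdf
on the page, effective_box picks the CropBox, and box_size gives roughly
178.74 by 93.2 points, matching the pdfbb output quoted above.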

> > croptest.pdf
> > 
> > [derij@pip groff-psbb]$ ./psbb ../../croptest.pdf
> > psbb:t-psbb (t-psbb.cpp):193: PDF file '../../croptest.pdf' is
> > malformed; no trailer found
> 
> Since croptest.pdf lacks both a trailer dictionary, and a free-standing
> cross reference table, (both are hidden away within a /XRefStm object,
> with a compressed cross reference table), croptest.pdf is _incompatible_
> with applications which do not support this feature of PDF-1.5 (and
> later).  The groff-psbb prototype implementation (currently) does not
> offer this level of PDF-1.5 support; thus, this behaviour is expected.

Gropdf/pdfbb now supports import of these later PDF versions (as does pdfinfo,
which PDFPIC currently uses), so it is important that whatever method is used
to report the image dimensions back to PDFPIC is consistent with what a user
would see when viewing the PDF in a viewer.

> > [derij@pip groff-psbb]$ pdfbb ../../croptest.pdf
> > Processing '../../croptest.pdf'
> > ../../croptest.pdf: MediaBox: 0,0,595,842  (595,842)
> 
> Well, this agrees with the result I've shown above, for Picture.pdf,

Croptest.pdf is an A4 page written as a PDF 1.7 file, but the included image
(three times) is the CropBox from Picture.pdf. So the dimensions reported by
pdfbb are correct: it is an A4 page. That is not, however, because Picture.pdf
is (wrongly) reported as A4 by psbb.

I have attached a new version called croptest-2.pdf, which psbb successfully
reports as A4 (because this time it is written as PDF 1.4). It shows that
groff can embed a PDF 1.7 image (croptest.pdf) which itself contains three PDF
1.5 images (Picture.pdf). I also enclose the troff files which created the two
PDFs; they show that you don't need to use PDFPIC if you are concerned about
using unsafe mode in groff. The only thing which PDFPIC does is calculate the
vertical movement needed after the call to \X'pdf: pdfpic' to continue output
after the image, which is fairly easy to do manually given the information
from pdfinfo.
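
To show the sort of manual calculation I mean, here is a minimal Python
sketch. It assumes the image is scaled uniformly to the requested width; the
function name is made up, and the numbers are the CropBox size of Picture.pdf
as reported by pdfbb above.

```python
def scaled_height_inches(box_width_pt, box_height_pt, target_width_in):
    """Height (in inches) of a box scaled uniformly to target_width_in."""
    if box_width_pt <= 0:
        raise ValueError("box width must be positive")
    return target_width_in * box_height_pt / box_width_pt


# Picture.pdf's CropBox is 178.74 x 93.2 points; place it 2 inches wide.
height = scaled_height_inches(178.74, 93.2, 2.0)
print(f".sp {height:.3f}i")
```

The printed .sp request is the minimum movement needed to clear the image;
any extra spacing is a matter of taste.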

> with groff-psbb modified to properly handle nested dictionaries; some
> further (non-trivial) development effort will be required, to support
> concealment of trailer dictionaries and cross reference tables within
> /XRefStm objects.

There are several options which would address this problem, i.e. the
non-portability of grep and the desirability of avoiding groff unsafe mode.

A) Replace grep with sed/awk (still requires unsafe mode).

B) Use psbb (requires "non-trivial development").

C) Use pdfbb (requires hook in input.cpp to call pdfbb and return results).

D) Convert pdfbb to be a pre-gropdf (i.e. a preprocessor like pre-grohtml)
which would look for .PDFPIC and replace it with the appropriate calls to
\X'pdf: pdfpic' and add vertical space with .sp.

(A) is obviously the easiest and quickest; (C) and (D) are not too much work,
since the parser required is already in use.
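
To give a flavour of (D), here is a very rough Python sketch of the rewriting
step only. Everything in it is an assumption for illustration: the function
name is made up, it understands only the simple form
".PDFPIC -L|-C|-R file [width]" with the width in inches, and the box sizes
are passed in as a dict where a real preprocessor would get them from the PDF
parser. The \X forms it emits are modelled on the ones in the attached troff
files.

```python
import re

# Only the simple form ".PDFPIC -L|-C|-R file [width-in-inches]" is matched.
PDFPIC_RE = re.compile(
    r"^\.PDFPIC\s+(-[LCR])\s+(\S+)(?:\s+(\d+(?:\.\d+)?)i)?\s*$"
)


def expand_pdfpic(lines, sizes):
    """Yield troff lines with simple .PDFPIC requests expanded.

    `sizes` maps a file name to its (width, height) in points; this is a
    hypothetical stand-in for whatever the PDF parser would report.
    """
    for line in lines:
        m = PDFPIC_RE.match(line)
        if m is None or m.group(2) not in sizes:
            yield line  # leave anything we do not understand untouched
            continue
        align, name, width = m.groups()
        width_pt, height_pt = sizes[name]
        # No width given: use the natural width (72 points per inch).
        width_in = float(width) if width else width_pt / 72.0
        height_in = width_in * height_pt / width_pt
        if align == "-L":
            yield f"\\X'pdf: pdfpic {name} -L {width_in:g}i'"
        else:
            # Centred and right-aligned forms also pass a height of 0 and
            # the line length, as in the attached troff files.
            yield f"\\X'pdf: pdfpic {name} {align} {width_in:g}i 0 \\n[.l]z'"
        # Move down past the image so that output continues below it.
        yield f".sp {height_in:.3f}i"
```

Lines which are not recognised are passed through unchanged, so the output
can still be fed to groff as before.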

Cheers 

Deri

Attachment: croptest-2.pdf
Description: Adobe PDF document

.sp 1i
\#.PDFPIC -L untitled.pdf 
.po 1cm
.ll 19cm
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor 
incididunt ut labore et dolore magna aliqua. Aliquam ultrices sagittis orci a 
scelerisque purus semper eget duis. Condimentum id venenatis a condimentum. At 
ultrices mi tempus imperdiet nulla malesuada. Praesent semper feugiat nibh sed 
pulvinar proin. Libero enim sed faucibus turpis in. Tincidunt eget nullam non 
nisi est sit. Vulputate odio ut enim blandit volutpat maecenas volutpat blandit 
aliquam. Imperdiet dui accumsan sit amet nulla. Elit duis tristique 
sollicitudin nibh sit. Aliquam nulla facilisi cras fermentum odio eu feugiat 
pretium. Non arcu risus quis varius. Mi quis hendrerit dolor magna eget. 
Bibendum at varius vel pharetra vel turpis nunc eget. Massa massa ultricies mi 
quis hendrerit dolor magna. Donec ultrices tincidunt arcu non sodales neque.

Facilisis gravida neque convallis a. Nulla facilisi cras fermentum odio eu 
feugiat pretium nibh ipsum. Ultrices sagittis orci a scelerisque purus semper. 
Praesent semper feugiat nibh sed pulvinar. Nisl condimentum id venenatis a 
condimentum vitae sapien pellentesque. Augue eget arcu dictum varius duis at. 
Nisl pretium fusce id velit ut tortor. Risus ultricies tristique nulla aliquet 
enim tortor at auctor. Tempus quam pellentesque nec nam aliquam sem. Ipsum a 
arcu cursus vitae. Sed turpis tincidunt id aliquet risus feugiat. Sit amet 
luctus venenatis lectus magna. Sed pulvinar proin gravida hendrerit.

Neque aliquam vestibulum morbi blandit cursus risus at ultrices. In aliquam sem 
fringilla ut. Quam nulla porttitor massa id neque. Mi sit amet mauris commodo 
quis imperdiet massa tincidunt. Augue lacus viverra vitae congue eu consequat 
ac felis. Lobortis feugiat vivamus at augue eget arcu dictum. Pharetra et 
ultrices neque ornare aenean euismod. Elit at imperdiet dui accumsan sit. 
Cursus turpis massa tincidunt dui ut. Cursus mattis molestie a iaculis at erat 
pellentesque adipiscing. Aliquet sagittis id consectetur purus ut faucibus 
pulvinar elementum integer. Enim blandit volutpat maecenas volutpat blandit 
aliquam etiam erat. Diam vulputate ut pharetra sit amet. In iaculis nunc sed 
augue lacus viverra vitae. Amet commodo nulla facilisi nullam vehicula ipsum a 
arcu cursus. At auctor urna nunc id cursus metus.

Bibendum at varius vel pharetra vel turpis nunc eget lorem. Amet consectetur 
adipiscing elit duis tristique. Nec dui nunc mattis enim ut tellus. Tellus in 
hac habitasse platea dictumst vestibulum rhoncus est. Mauris pharetra et 
ultrices neque ornare aenean. Commodo nulla facilisi nullam vehicula ipsum a 
arcu. Lacus viverra vitae congue eu consequat ac. Viverra aliquet eget sit amet 
tellus. Curabitur gravida arcu ac tortor dignissim convallis aenean. Ac felis 
donec et odio pellentesque. Sodales ut eu sem integer vitae justo eget magna. 
In arcu cursus euismod quis viverra nibh cras pulvinar mattis. Bibendum ut 
tristique et egestas quis. Sit amet risus nullam eget felis.

Malesuada proin libero nunc consequat. Quis blandit turpis cursus in. Neque 
ornare aenean euismod elementum nisi quis. Accumsan lacus vel facilisis 
volutpat est. Non enim praesent elementum facilisis leo vel fringilla est. Quis 
vel eros donec ac odio tempor orci. Nulla pellentesque dignissim enim sit amet 
venenatis urna. Nunc mi ipsum faucibus vitae. Rhoncus dolor purus non enim 
praesent elementum. Risus in hendrerit gravida rutrum quisque non tellus orci. 
Egestas egestas fringilla phasellus faucibus scelerisque eleifend donec 
pretium. Viverra tellus in hac habitasse platea dictumst vestibulum rhoncus. 
Proin libero nunc consequat interdum varius. Suspendisse potenti nullam ac 
tortor vitae. Ultricies leo integer malesuada nunc vel. Nisi scelerisque eu 
ultrices vitae auctor eu augue ut lectus. Nam aliquam sem et tortor consequat 
id porta. Curabitur vitae nunc sed velit dignissim sodales ut eu.

Lectus sit amet est placerat in. Nam at lectus urna duis convallis convallis 
tellus. Tortor aliquam nulla facilisi cras fermentum odio eu feugiat. At urna 
condimentum mattis pellentesque. Viverra justo nec ultrices dui sapien eget. 
Tempor nec feugiat nisl pretium. Ullamcorper malesuada proin libero nunc 
consequat interdum. A pellentesque sit amet porttitor eget. Libero justo 
laoreet sit amet cursus sit. Fermentum posuere urna nec tincidunt praesent 
semper feugiat nibh. Dictum fusce ut placerat orci nulla pellentesque dignissim 
enim. Rhoncus mattis rhoncus urna neque. Iaculis eu non diam phasellus 
vestibulum lorem. Eu turpis egestas pretium aenean. Egestas tellus rutrum 
tellus pellentesque eu tincidunt tortor aliquam. Tellus integer feugiat 
scelerisque varius morbi. Accumsan tortor posuere ac ut consequat semper 
viverra. Id venenatis a condimentum vitae sapien pellentesque.

Hac habitasse platea dictumst vestibulum rhoncus est pellentesque elit. Diam 
sit amet nisl suscipit adipiscing bibendum est ultricies integer. Lorem ipsum 
dolor sit amet consectetur adipiscing elit. Morbi leo urna molestie at 
elementum eu. Dolor sit amet consectetur adipiscing elit duis tristique 
sollicitudin. A diam sollicitudin tempor id eu nisl nunc mi ipsum. Aliquam 
nulla facilisi cras fermentum odio eu feugiat pretium nibh. Non tellus orci ac 
auctor augue mauris. Dignissim convallis aenean et tortor at. Nulla facilisi 
etiam dignissim diam quis enim lobortis. Ut placerat orci nulla pellentesque 
dignissim. Ac orci phasellus egestas tellus rutrum. Mi proin sed libero enim 
sed faucibus turpis. Nulla facilisi cras fermentum odio eu feugiat pretium. 
Scelerisque felis imperdiet proin fermentum leo. Sit amet nisl suscipit 
adipiscing bibendum est ultricies.
.sp |1.2i
.nf
\X'pdf: pdfpic croptest.pdf -L 2i'
.sp 2i
\X'pdf: pdfpic croptest.pdf -C 2i 0 \n[.l]z'
.sp 2i
\X'pdf: pdfpic croptest.pdf -R 2i 0 \n[.l]z'
.sp 1i
\#.PDFPIC -L untitled.pdf 
.po 1cm
.ll 19cm
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor 
incididunt ut labore et dolore magna aliqua. Aliquam ultrices sagittis orci a 
scelerisque purus semper eget duis. Condimentum id venenatis a condimentum. At 
ultrices mi tempus imperdiet nulla malesuada. Praesent semper feugiat nibh sed 
pulvinar proin. Libero enim sed faucibus turpis in. Tincidunt eget nullam non 
nisi est sit. Vulputate odio ut enim blandit volutpat maecenas volutpat blandit 
aliquam. Imperdiet dui accumsan sit amet nulla. Elit duis tristique 
sollicitudin nibh sit. Aliquam nulla facilisi cras fermentum odio eu feugiat 
pretium. Non arcu risus quis varius. Mi quis hendrerit dolor magna eget. 
Bibendum at varius vel pharetra vel turpis nunc eget. Massa massa ultricies mi 
quis hendrerit dolor magna. Donec ultrices tincidunt arcu non sodales neque.

Facilisis gravida neque convallis a. Nulla facilisi cras fermentum odio eu 
feugiat pretium nibh ipsum. Ultrices sagittis orci a scelerisque purus semper. 
Praesent semper feugiat nibh sed pulvinar. Nisl condimentum id venenatis a 
condimentum vitae sapien pellentesque. Augue eget arcu dictum varius duis at. 
Nisl pretium fusce id velit ut tortor. Risus ultricies tristique nulla aliquet 
enim tortor at auctor. Tempus quam pellentesque nec nam aliquam sem. Ipsum a 
arcu cursus vitae. Sed turpis tincidunt id aliquet risus feugiat. Sit amet 
luctus venenatis lectus magna. Sed pulvinar proin gravida hendrerit.

Neque aliquam vestibulum morbi blandit cursus risus at ultrices. In aliquam sem 
fringilla ut. Quam nulla porttitor massa id neque. Mi sit amet mauris commodo 
quis imperdiet massa tincidunt. Augue lacus viverra vitae congue eu consequat 
ac felis. Lobortis feugiat vivamus at augue eget arcu dictum. Pharetra et 
ultrices neque ornare aenean euismod. Elit at imperdiet dui accumsan sit. 
Cursus turpis massa tincidunt dui ut. Cursus mattis molestie a iaculis at erat 
pellentesque adipiscing. Aliquet sagittis id consectetur purus ut faucibus 
pulvinar elementum integer. Enim blandit volutpat maecenas volutpat blandit 
aliquam etiam erat. Diam vulputate ut pharetra sit amet. In iaculis nunc sed 
augue lacus viverra vitae. Amet commodo nulla facilisi nullam vehicula ipsum a 
arcu cursus. At auctor urna nunc id cursus metus.

Bibendum at varius vel pharetra vel turpis nunc eget lorem. Amet consectetur 
adipiscing elit duis tristique. Nec dui nunc mattis enim ut tellus. Tellus in 
hac habitasse platea dictumst vestibulum rhoncus est. Mauris pharetra et 
ultrices neque ornare aenean. Commodo nulla facilisi nullam vehicula ipsum a 
arcu. Lacus viverra vitae congue eu consequat ac. Viverra aliquet eget sit amet 
tellus. Curabitur gravida arcu ac tortor dignissim convallis aenean. Ac felis 
donec et odio pellentesque. Sodales ut eu sem integer vitae justo eget magna. 
In arcu cursus euismod quis viverra nibh cras pulvinar mattis. Bibendum ut 
tristique et egestas quis. Sit amet risus nullam eget felis.

Malesuada proin libero nunc consequat. Quis blandit turpis cursus in. Neque 
ornare aenean euismod elementum nisi quis. Accumsan lacus vel facilisis 
volutpat est. Non enim praesent elementum facilisis leo vel fringilla est. Quis 
vel eros donec ac odio tempor orci. Nulla pellentesque dignissim enim sit amet 
venenatis urna. Nunc mi ipsum faucibus vitae. Rhoncus dolor purus non enim 
praesent elementum. Risus in hendrerit gravida rutrum quisque non tellus orci. 
Egestas egestas fringilla phasellus faucibus scelerisque eleifend donec 
pretium. Viverra tellus in hac habitasse platea dictumst vestibulum rhoncus. 
Proin libero nunc consequat interdum varius. Suspendisse potenti nullam ac 
tortor vitae. Ultricies leo integer malesuada nunc vel. Nisi scelerisque eu 
ultrices vitae auctor eu augue ut lectus. Nam aliquam sem et tortor consequat 
id porta. Curabitur vitae nunc sed velit dignissim sodales ut eu.

Lectus sit amet est placerat in. Nam at lectus urna duis convallis convallis 
tellus. Tortor aliquam nulla facilisi cras fermentum odio eu feugiat. At urna 
condimentum mattis pellentesque. Viverra justo nec ultrices dui sapien eget. 
Tempor nec feugiat nisl pretium. Ullamcorper malesuada proin libero nunc 
consequat interdum. A pellentesque sit amet porttitor eget. Libero justo 
laoreet sit amet cursus sit. Fermentum posuere urna nec tincidunt praesent 
semper feugiat nibh. Dictum fusce ut placerat orci nulla pellentesque dignissim 
enim. Rhoncus mattis rhoncus urna neque. Iaculis eu non diam phasellus 
vestibulum lorem. Eu turpis egestas pretium aenean. Egestas tellus rutrum 
tellus pellentesque eu tincidunt tortor aliquam. Tellus integer feugiat 
scelerisque varius morbi. Accumsan tortor posuere ac ut consequat semper 
viverra. Id venenatis a condimentum vitae sapien pellentesque.

Hac habitasse platea dictumst vestibulum rhoncus est pellentesque elit. Diam 
sit amet nisl suscipit adipiscing bibendum est ultricies integer. Lorem ipsum 
dolor sit amet consectetur adipiscing elit. Morbi leo urna molestie at 
elementum eu. Dolor sit amet consectetur adipiscing elit duis tristique 
sollicitudin. A diam sollicitudin tempor id eu nisl nunc mi ipsum. Aliquam 
nulla facilisi cras fermentum odio eu feugiat pretium nibh. Non tellus orci ac 
auctor augue mauris. Dignissim convallis aenean et tortor at. Nulla facilisi 
etiam dignissim diam quis enim lobortis. Ut placerat orci nulla pellentesque 
dignissim. Ac orci phasellus egestas tellus rutrum. Mi proin sed libero enim 
sed faucibus turpis. Nulla facilisi cras fermentum odio eu feugiat pretium. 
Scelerisque felis imperdiet proin fermentum leo. Sit amet nisl suscipit 
adipiscing bibendum est ultricies.
.sp |1.2i
.nf
\X'pdf: pdfpic Picture.pdf -L'
.sp 2i
\X'pdf: pdfpic Picture.pdf -C 0 0 \n[.l]z'
.sp 2i
\X'pdf: import Picture.pdf 162 623 340 716 3i .4i'
