On Tuesday, 12 October 2021 11:49:23 BST Keith Marshall wrote:
> Ref: https://savannah.gnu.org/bugs/index.php?55107
>
> On 01/10/2021 01:10, Deri wrote:
> > I did try to help Keith with this previously, but I was mildly "told
> > off" (on list) for sending my help off list. I've learned my lesson.
>
> Thanks, Deri.
>
> IIRC, the reason for the "mild telling off" was that, by replying off
> list, you denied us the potential benefit from other list members who
> may have been willing to review the issue, and so contribute to the
> debugging effort. I am pleased that, on this occasion, you have kept
> this on-list; even if the majority of list members aren't sufficiently
> interested to assist, there may be some who will, and any assistance
> will be gratefully accepted, and very much appreciated.
>
Hi Keith,

I just assumed the best person for debugging faults in the code would probably be you rather than the rest of us. You may receive other "problem pdfs" from other members, but the debugging effort is likely to be yours alone. What I did find useful while debugging the pdf parser in pdfbb/gropdf was the Ghent PDF Output Suite (which has some very esoteric examples - sorry, it is 144 MB!), see:-

http://gwg.org/gos5/

> > I attach a couple of pdfs with which the current code has problems.
> >
> > Picture.pdf
> >
> > [derij@pip groff-psbb]$ ./psbb ../../Picture.pdf
> > ../../Picture.pdf: bounding box = (0,0)..(0,0)
>
> This is caused by the nested /Group dictionary, within the /Page object;
> the current groff-psbb lexer is confused by it, and ends up in the wrong
> state, when it eventually encounters the /MediaBox key. Adding one more
> rule (for "<<") to the PDF dictionary state scanning model gets us to:
>
> $ ./psbb Picture.pdf
> Picture.pdf: bounding box = (0,0)..(592,842)
>
> > [derij@pip groff-psbb]$ pdfbb ../../Picture.pdf
> > Processing '../../Picture.pdf'
> > ../../Picture.pdf: CropBox: 162.085,623.346,340.825,716.546 (178.74,93.2)
>
> The psbb lexer doesn't handle the /CropBox key. Should it? Should
> /CropBox override any extant /MediaBox?

If you view Picture.pdf with a pdf viewer you will see a dumbbell shape; this is in fact the area of the A4 page described by the CropBox, not the complete A4 page described by the MediaBox. If the MediaBox dimensions were given to PDFPIC the included picture would be the wrong shape. Current gropdf honours the various "boxes" in this order:-

ArtBox
TrimBox
BleedBox
CropBox
MediaBox

(No idea if this is "correct", but the viewers I have tested definitely prioritise CropBox over MediaBox; you will have to experiment.) You would also have to be careful: a MediaBox at the group level could be overridden by a CropBox at the page level, I assume.
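To make the box priority above concrete, here is a minimal Python sketch. It is not the gropdf code (which is Perl), only an illustration under stated assumptions: the /Page object is assumed to be already parsed into a plain dict, and the names effective_box, box_size and the "inherited" argument are hypothetical, invented for this example.

```python
# Order in which current gropdf is described as honouring the boxes.
BOX_PRIORITY = ("ArtBox", "TrimBox", "BleedBox", "CropBox", "MediaBox")


def effective_box(page, inherited=None):
    """Return (name, [llx, lly, urx, ury]) for the highest-priority box.

    `page` is a dict standing in for an already parsed /Page object;
    `inherited` is a dict of boxes taken from a parent (group) node.
    Both are assumptions made for this sketch.
    """
    inherited = inherited or {}
    for name in BOX_PRIORITY:
        # At equal priority, a box on the page itself wins over an
        # inherited one; a higher-priority box wins wherever it lives.
        for source in (page, inherited):
            if name in source:
                return name, source[name]
    return None


def box_size(box):
    """Width and height of a box, in points, rounded to 2 decimals."""
    llx, lly, urx, ury = box
    return round(urx - llx, 2), round(ury - lly, 2)


# Values quoted in this thread for Picture.pdf.
page = {"CropBox": [162.085, 623.346, 340.825, 716.546]}
group = {"MediaBox": [0, 0, 592, 842]}
name, box = effective_box(page, group)
print(name, box_size(box))
```

With the Picture.pdf values quoted above, the page-level CropBox is chosen over the group-level MediaBox, and the size agrees with the (178.74,93.2) reported by pdfbb.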
> > croptest.pdf
> >
> > [derij@pip groff-psbb]$ ./psbb ../../croptest.pdf
> > psbb:t-psbb (t-psbb.cpp):193: PDF file '../../croptest.pdf' is
> > malformed; no trailer found
>
> Since croptest.pdf lacks both a trailer dictionary, and a free-standing
> cross reference table, (both are hidden away within a /XRefStm object,
> with a compressed cross reference table), croptest.pdf is _incompatible_
> with applications which do not support this feature of PDF-1.5 (and
> later). The groff-psbb prototype implementation (currently) does not
> offer this level of PDF-1.5 support; thus, this behaviour is expected.

Gropdf/pdfbb now supports import of these later pdf versions (as does pdfinfo, which PDFPIC currently uses), so it is important that whatever method is used to report the image dimensions back to PDFPIC is consistent with what a user would see when viewing the pdf in a viewer.

> > [derij@pip groff-psbb]$ pdfbb ../../croptest.pdf
> > Processing '../../croptest.pdf'
> > ../../croptest.pdf: MediaBox: 0,0,595,842 (595,842)
>
> Well, this agrees with the result I've shown above, for Picture.pdf,

Croptest.pdf is an A4 page written as a PDF 1.7 file, but the included image (three times) is the CropBox from Picture.pdf. So the dimensions reported by pdfbb are correct, it's an A4 page, but not because the Picture.pdf is wrongly reported as A4 by psbb.

I have attached a new version called croptest-2.pdf, which psbb successfully reports as A4 (because this time it is written in PDF 1.4) but which shows that groff can embed a PDF 1.7 image (croptest.pdf) which itself contains three PDF 1.5 images (Picture.pdf). I also enclose the troff files which created the two pdfs, which show that you don't need to use PDFPIC if you are concerned about using unsafe mode in groff. The only thing which PDFPIC does is calculate the vertical movement to do after the call to \X'pdf: pdfpic' to continue output after the image, which is fairly easy to do manually given the information from pdfinfo.
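As a rough illustration of doing that vertical movement calculation by hand, here is a small Python sketch. It assumes pdfinfo prints a line of the form "Page size: W x H pts" (the sample text below is made up for the example, not captured output), and the helper names parse_page_size, scaled_height and sp_request are hypothetical.

```python
import re


def parse_page_size(pdfinfo_text):
    """Extract (width, height) in points from pdfinfo-style output.

    Assumes a line such as 'Page size:      595 x 842 pts (A4)'.
    """
    match = re.search(r"^Page size:\s+([\d.]+) x ([\d.]+) pts",
                      pdfinfo_text, re.MULTILINE)
    if match is None:
        raise ValueError("no 'Page size' line found")
    return float(match.group(1)), float(match.group(2))


def scaled_height(box_width, box_height, target_width):
    """Height of the image once scaled to target_width (same units)."""
    return target_width * box_height / box_width


def sp_request(height_pt):
    """A troff .sp request moving down by height_pt points."""
    return f".sp {height_pt:.1f}p"


# Illustrative sample only; an A4 page, as reported for croptest.pdf.
sample = "Pages:          1\nPage size:      595 x 842 pts (A4)\n"
width, height = parse_page_size(sample)
# The image is placed 2i wide, i.e. 144 points.
print(sp_request(scaled_height(width, height, 144.0)))
```

The resulting request would follow the \X'pdf: pdfpic ...' call in place of a hand-chosen .sp value; whether the page size or one of the other boxes is the right input is the open question discussed above.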
> with groff-psbb modified to properly handle nested dictionaries; some
> further (non-trivial) development effort will be required, to support
> concealment of trailer dictionaries and cross reference tables within
> /XRefStm objects.

There are several options which would address this problem, i.e. non-portability of grep and desirability of avoiding groff unsafe mode.

A) Replace grep with sed/awk (still requires unsafe mode).
B) Use psbb (requires "non-trivial development").
C) Use pdfbb (requires hook in input.cpp to call pdfbb and return results).
D) Convert pdfbb to be a pre-gropdf (i.e. a preprocessor like pre-grohtml) which would look for .PDFPIC and replace with the appropriate calls to \X'pdf: pdfpic' and add vertical space with .sp.

(A) is obviously the easiest and quickest, (C) and (D) are not too much work, since the parser required is already in use.

Cheers

Deri
croptest-2.pdf
Description: Adobe PDF document
.sp 1i
\#.PDFPIC -L untitled.pdf
.po 1cm
.ll 19cm
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Aliquam ultrices sagittis orci a scelerisque purus semper eget duis. Condimentum id venenatis a condimentum. At ultrices mi tempus imperdiet nulla malesuada. Praesent semper feugiat nibh sed pulvinar proin. Libero enim sed faucibus turpis in. Tincidunt eget nullam non nisi est sit. Vulputate odio ut enim blandit volutpat maecenas volutpat blandit aliquam. Imperdiet dui accumsan sit amet nulla. Elit duis tristique sollicitudin nibh sit. Aliquam nulla facilisi cras fermentum odio eu feugiat pretium. Non arcu risus quis varius. Mi quis hendrerit dolor magna eget. Bibendum at varius vel pharetra vel turpis nunc eget. Massa massa ultricies mi quis hendrerit dolor magna. Donec ultrices tincidunt arcu non sodales neque. Facilisis gravida neque convallis a. Nulla facilisi cras fermentum odio eu feugiat pretium nibh ipsum. Ultrices sagittis orci a scelerisque purus semper. Praesent semper feugiat nibh sed pulvinar. Nisl condimentum id venenatis a condimentum vitae sapien pellentesque. Augue eget arcu dictum varius duis at. Nisl pretium fusce id velit ut tortor. Risus ultricies tristique nulla aliquet enim tortor at auctor. Tempus quam pellentesque nec nam aliquam sem. Ipsum a arcu cursus vitae. Sed turpis tincidunt id aliquet risus feugiat. Sit amet luctus venenatis lectus magna. Sed pulvinar proin gravida hendrerit. Neque aliquam vestibulum morbi blandit cursus risus at ultrices. In aliquam sem fringilla ut. Quam nulla porttitor massa id neque. Mi sit amet mauris commodo quis imperdiet massa tincidunt. Augue lacus viverra vitae congue eu consequat ac felis. Lobortis feugiat vivamus at augue eget arcu dictum. Pharetra et ultrices neque ornare aenean euismod. Elit at imperdiet dui accumsan sit. Cursus turpis massa tincidunt dui ut. Cursus mattis molestie a iaculis at erat pellentesque adipiscing.
Aliquet sagittis id consectetur purus ut faucibus pulvinar elementum integer. Enim blandit volutpat maecenas volutpat blandit aliquam etiam erat. Diam vulputate ut pharetra sit amet. In iaculis nunc sed augue lacus viverra vitae. Amet commodo nulla facilisi nullam vehicula ipsum a arcu cursus. At auctor urna nunc id cursus metus. Bibendum at varius vel pharetra vel turpis nunc eget lorem. Amet consectetur adipiscing elit duis tristique. Nec dui nunc mattis enim ut tellus. Tellus in hac habitasse platea dictumst vestibulum rhoncus est. Mauris pharetra et ultrices neque ornare aenean. Commodo nulla facilisi nullam vehicula ipsum a arcu. Lacus viverra vitae congue eu consequat ac. Viverra aliquet eget sit amet tellus. Curabitur gravida arcu ac tortor dignissim convallis aenean. Ac felis donec et odio pellentesque. Sodales ut eu sem integer vitae justo eget magna. In arcu cursus euismod quis viverra nibh cras pulvinar mattis. Bibendum ut tristique et egestas quis. Sit amet risus nullam eget felis. Malesuada proin libero nunc consequat. Quis blandit turpis cursus in. Neque ornare aenean euismod elementum nisi quis. Accumsan lacus vel facilisis volutpat est. Non enim praesent elementum facilisis leo vel fringilla est. Quis vel eros donec ac odio tempor orci. Nulla pellentesque dignissim enim sit amet venenatis urna. Nunc mi ipsum faucibus vitae. Rhoncus dolor purus non enim praesent elementum. Risus in hendrerit gravida rutrum quisque non tellus orci. Egestas egestas fringilla phasellus faucibus scelerisque eleifend donec pretium. Viverra tellus in hac habitasse platea dictumst vestibulum rhoncus. Proin libero nunc consequat interdum varius. Suspendisse potenti nullam ac tortor vitae. Ultricies leo integer malesuada nunc vel. Nisi scelerisque eu ultrices vitae auctor eu augue ut lectus. Nam aliquam sem et tortor consequat id porta. Curabitur vitae nunc sed velit dignissim sodales ut eu. Lectus sit amet est placerat in. Nam at lectus urna duis convallis convallis tellus. 
Tortor aliquam nulla facilisi cras fermentum odio eu feugiat. At urna condimentum mattis pellentesque. Viverra justo nec ultrices dui sapien eget. Tempor nec feugiat nisl pretium. Ullamcorper malesuada proin libero nunc consequat interdum. A pellentesque sit amet porttitor eget. Libero justo laoreet sit amet cursus sit. Fermentum posuere urna nec tincidunt praesent semper feugiat nibh. Dictum fusce ut placerat orci nulla pellentesque dignissim enim. Rhoncus mattis rhoncus urna neque. Iaculis eu non diam phasellus vestibulum lorem. Eu turpis egestas pretium aenean. Egestas tellus rutrum tellus pellentesque eu tincidunt tortor aliquam. Tellus integer feugiat scelerisque varius morbi. Accumsan tortor posuere ac ut consequat semper viverra. Id venenatis a condimentum vitae sapien pellentesque. Hac habitasse platea dictumst vestibulum rhoncus est pellentesque elit. Diam sit amet nisl suscipit adipiscing bibendum est ultricies integer. Lorem ipsum dolor sit amet consectetur adipiscing elit. Morbi leo urna molestie at elementum eu. Dolor sit amet consectetur adipiscing elit duis tristique sollicitudin. A diam sollicitudin tempor id eu nisl nunc mi ipsum. Aliquam nulla facilisi cras fermentum odio eu feugiat pretium nibh. Non tellus orci ac auctor augue mauris. Dignissim convallis aenean et tortor at. Nulla facilisi etiam dignissim diam quis enim lobortis. Ut placerat orci nulla pellentesque dignissim. Ac orci phasellus egestas tellus rutrum. Mi proin sed libero enim sed faucibus turpis. Nulla facilisi cras fermentum odio eu feugiat pretium. Scelerisque felis imperdiet proin fermentum leo. Sit amet nisl suscipit adipiscing bibendum est ultricies.
.sp |1.2i
.nf
\X'pdf: pdfpic croptest.pdf -L 2i'
.sp 2i
\X'pdf: pdfpic croptest.pdf -C 2i 0 \n[.l]z'
.sp 2i
\X'pdf: pdfpic croptest.pdf -R 2i 0 \n[.l]z'
.sp 1i
\#.PDFPIC -L untitled.pdf
.po 1cm
.ll 19cm
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Aliquam ultrices sagittis orci a scelerisque purus semper eget duis. Condimentum id venenatis a condimentum. At ultrices mi tempus imperdiet nulla malesuada. Praesent semper feugiat nibh sed pulvinar proin. Libero enim sed faucibus turpis in. Tincidunt eget nullam non nisi est sit. Vulputate odio ut enim blandit volutpat maecenas volutpat blandit aliquam. Imperdiet dui accumsan sit amet nulla. Elit duis tristique sollicitudin nibh sit. Aliquam nulla facilisi cras fermentum odio eu feugiat pretium. Non arcu risus quis varius. Mi quis hendrerit dolor magna eget. Bibendum at varius vel pharetra vel turpis nunc eget. Massa massa ultricies mi quis hendrerit dolor magna. Donec ultrices tincidunt arcu non sodales neque. Facilisis gravida neque convallis a. Nulla facilisi cras fermentum odio eu feugiat pretium nibh ipsum. Ultrices sagittis orci a scelerisque purus semper. Praesent semper feugiat nibh sed pulvinar. Nisl condimentum id venenatis a condimentum vitae sapien pellentesque. Augue eget arcu dictum varius duis at. Nisl pretium fusce id velit ut tortor. Risus ultricies tristique nulla aliquet enim tortor at auctor. Tempus quam pellentesque nec nam aliquam sem. Ipsum a arcu cursus vitae. Sed turpis tincidunt id aliquet risus feugiat. Sit amet luctus venenatis lectus magna. Sed pulvinar proin gravida hendrerit. Neque aliquam vestibulum morbi blandit cursus risus at ultrices. In aliquam sem fringilla ut. Quam nulla porttitor massa id neque. Mi sit amet mauris commodo quis imperdiet massa tincidunt. Augue lacus viverra vitae congue eu consequat ac felis. Lobortis feugiat vivamus at augue eget arcu dictum. Pharetra et ultrices neque ornare aenean euismod. Elit at imperdiet dui accumsan sit. Cursus turpis massa tincidunt dui ut. Cursus mattis molestie a iaculis at erat pellentesque adipiscing.
Aliquet sagittis id consectetur purus ut faucibus pulvinar elementum integer. Enim blandit volutpat maecenas volutpat blandit aliquam etiam erat. Diam vulputate ut pharetra sit amet. In iaculis nunc sed augue lacus viverra vitae. Amet commodo nulla facilisi nullam vehicula ipsum a arcu cursus. At auctor urna nunc id cursus metus. Bibendum at varius vel pharetra vel turpis nunc eget lorem. Amet consectetur adipiscing elit duis tristique. Nec dui nunc mattis enim ut tellus. Tellus in hac habitasse platea dictumst vestibulum rhoncus est. Mauris pharetra et ultrices neque ornare aenean. Commodo nulla facilisi nullam vehicula ipsum a arcu. Lacus viverra vitae congue eu consequat ac. Viverra aliquet eget sit amet tellus. Curabitur gravida arcu ac tortor dignissim convallis aenean. Ac felis donec et odio pellentesque. Sodales ut eu sem integer vitae justo eget magna. In arcu cursus euismod quis viverra nibh cras pulvinar mattis. Bibendum ut tristique et egestas quis. Sit amet risus nullam eget felis. Malesuada proin libero nunc consequat. Quis blandit turpis cursus in. Neque ornare aenean euismod elementum nisi quis. Accumsan lacus vel facilisis volutpat est. Non enim praesent elementum facilisis leo vel fringilla est. Quis vel eros donec ac odio tempor orci. Nulla pellentesque dignissim enim sit amet venenatis urna. Nunc mi ipsum faucibus vitae. Rhoncus dolor purus non enim praesent elementum. Risus in hendrerit gravida rutrum quisque non tellus orci. Egestas egestas fringilla phasellus faucibus scelerisque eleifend donec pretium. Viverra tellus in hac habitasse platea dictumst vestibulum rhoncus. Proin libero nunc consequat interdum varius. Suspendisse potenti nullam ac tortor vitae. Ultricies leo integer malesuada nunc vel. Nisi scelerisque eu ultrices vitae auctor eu augue ut lectus. Nam aliquam sem et tortor consequat id porta. Curabitur vitae nunc sed velit dignissim sodales ut eu. Lectus sit amet est placerat in. Nam at lectus urna duis convallis convallis tellus. 
Tortor aliquam nulla facilisi cras fermentum odio eu feugiat. At urna condimentum mattis pellentesque. Viverra justo nec ultrices dui sapien eget. Tempor nec feugiat nisl pretium. Ullamcorper malesuada proin libero nunc consequat interdum. A pellentesque sit amet porttitor eget. Libero justo laoreet sit amet cursus sit. Fermentum posuere urna nec tincidunt praesent semper feugiat nibh. Dictum fusce ut placerat orci nulla pellentesque dignissim enim. Rhoncus mattis rhoncus urna neque. Iaculis eu non diam phasellus vestibulum lorem. Eu turpis egestas pretium aenean. Egestas tellus rutrum tellus pellentesque eu tincidunt tortor aliquam. Tellus integer feugiat scelerisque varius morbi. Accumsan tortor posuere ac ut consequat semper viverra. Id venenatis a condimentum vitae sapien pellentesque. Hac habitasse platea dictumst vestibulum rhoncus est pellentesque elit. Diam sit amet nisl suscipit adipiscing bibendum est ultricies integer. Lorem ipsum dolor sit amet consectetur adipiscing elit. Morbi leo urna molestie at elementum eu. Dolor sit amet consectetur adipiscing elit duis tristique sollicitudin. A diam sollicitudin tempor id eu nisl nunc mi ipsum. Aliquam nulla facilisi cras fermentum odio eu feugiat pretium nibh. Non tellus orci ac auctor augue mauris. Dignissim convallis aenean et tortor at. Nulla facilisi etiam dignissim diam quis enim lobortis. Ut placerat orci nulla pellentesque dignissim. Ac orci phasellus egestas tellus rutrum. Mi proin sed libero enim sed faucibus turpis. Nulla facilisi cras fermentum odio eu feugiat pretium. Scelerisque felis imperdiet proin fermentum leo. Sit amet nisl suscipit adipiscing bibendum est ultricies.
.sp |1.2i
.nf
\X'pdf: pdfpic Picture.pdf -L'
.sp 2i
\X'pdf: pdfpic Picture.pdf -C 0 0 \n[.l]z'
.sp 2i
\X'pdf: import Picture.pdf 162 623 340 716 3i .4i'