Hi Deri,

Thanks for trying it out.

On 09/10/17 01:21, Deri James wrote:
> Some pdfs I have tried fail with "syntax error".

That's yacc's default behaviour when the sequence of tokens returned
by the lexer doesn't conform to the grammar it was given -- either
the order isn't as expected, or the sequence is incomplete.

> It seems to occur if MediaBox is defined in an ancestor object rather
> than in a /Page object. There are a number of page attributes which
> are inheritable in this way, MediaBox is one of them.

I do know that, thanks; it is a configuration which I did test (albeit
with contrived, hand-crafted test files):

  $ ./psbb *.pdf
  inherited.pdf: bounding box = (0,0)..(612,792)
  minimal.pdf: bounding box = (0,0)..(612,792)
  override.pdf: bounding box = (0,0)..(606,809)

> So in case a MediaBox is superseded by an entry further down the tree
> you still have to continue looking till you get to the object for
> page 1, to make sure.

And this is exactly what my code does!  To be precise, it parses the
trailer dictionary to locate the /Catalog object, whence it follows the
indirect object reference to the top level /Pages object; thence it
follows the chain of first /Kids references, through as many /Pages
objects as it may find, until it reaches the first /Page object.  In
each /Pages object it traverses, it evaluates any /MediaBox
specification it may find; at each lower level, any such specification
overrides any which was evaluated at a higher level.  Thus, when the
/Page object is parsed, the last /MediaBox encountered -- which may be
within the /Page object itself, or in its nearest /Pages ancestor which
specified one -- will prevail.
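
In outline, the override logic amounts to something like the following.
This is only an illustrative sketch, not the actual psbb source: it
assumes the page tree has already been parsed into memory, and the
struct pdf_node type and the first_page_mediabox() function are names
invented here for the purpose.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical in-memory stand-in for a parsed /Pages or /Page object;
 * psbb itself works from the token stream, not from a tree like this.
 */
struct pdf_node {
    bool is_page;                      /* true for /Page, false for /Pages */
    bool has_mediabox;                 /* does this object specify /MediaBox? */
    double mediabox[4];                /* llx, lly, urx, ury */
    const struct pdf_node *first_kid;  /* first /Kids entry; NULL in a /Page */
};

/* Follow the chain of first /Kids references from root down to the
 * first /Page object.  Every /MediaBox met on the way overwrites the
 * one recorded at a higher level, so the deepest specification wins.
 * Returns true, with box filled in, if a /MediaBox is in effect for
 * the first page; false if none was seen, or no /Page was reached.
 */
bool first_page_mediabox(const struct pdf_node *root, double box[4])
{
    const struct pdf_node *node;
    bool found = false;

    assert(root != NULL);
    for (node = root; node != NULL; node = node->first_kid) {
        if (node->has_mediabox) {
            for (int i = 0; i < 4; i++)
                box[i] = node->mediabox[i];
            found = true;
        }
        if (node->is_page)
            return found;
    }
    return false;  /* the chain ended without reaching a /Page */
}
```

Given a /Pages node specifying (0,0)..(612,792) whose first kid is a
/Page specifying (0,0)..(606,809), the function reports the latter; if
the /Page specifies nothing, the inherited box is reported instead.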

Perhaps you could:

  $ make clean
  $ make CFLAGS=-DDEBUGGING

and check your failing PDFs again, so we can see whatever unexpected
token sequence is leading to the "syntax error"; only when we know that
will we have any chance of handling it, before the parser simply gives
up on the offending PDF.

-- 
Regards,
Keith.

Attachment: samples.tar.xz
Description: application/xz
