Hi Zack,

On Sat, May 12, 2007 at 08:00:56PM -0700, Zack Weinberg wrote:
> I have a number of PDFs with DOIs appearing in the text, but that
> Referencer cannot properly scrape out. There is no true metadata in the
> PDF, so it's going for text extraction from the page body. The complete
> BT/ET block containing the DOI is at the end of this message, but the
> key bit is this:
>
>   [(doi:10.1016/)14.5(S)-95.3(0)]TJ
>   6.3307 0 TD
>   0.0983 Tc
>   [(010-0277\(02\)00)-6.3(235-4)]TJ
>   ET
>
> This causes libpoppler to feed this text to BibData::guessDoi():
>
>   doi:10.1016/S 0 0 1 0 - 0 2 7 7 ( 0 2 ) 0 0 2 3 5 - 4\n
>
> "10.1016/S" is what Referencer records as the DOI. The correct DOI is the
> above string with all the spaces taken out, i.e. 10.1016/S0010-0277(02)00235-4 .
>
> Unfortunately, I don't have any concrete suggestion for how guessDoi() could
> do a better job in this case without also screwing up other situations (where
> random text appears immediately after the DOI, separated only by a space).
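One heuristic that might cover the case quoted above, offered only as a sketch: after the first DOI-like token, keep appending whitespace-separated tokens as long as each one is a single character, and stop at the first longer token or at the end of the line. Ordinary trailing text usually begins with a multi-character word, so it would be left alone. The function name guess_doi and the regular expression are assumptions made for illustration; this is not Referencer's actual BibData::guessDoi(), and it would still go wrong if a one-letter word directly followed a DOI.

```python
import re


def guess_doi(text):
    """Hypothetical sketch of a DOI guesser; not Referencer's real code."""
    # Assumed pattern: "10.", a registrant code of 4+ digits, "/", then a suffix.
    match = re.search(r"10\.\d{4,}/\S+", text)
    if match is None:
        return None
    doi = match.group(0)

    # Only look at the remainder of the line the DOI starts on.
    rest = text[match.end():].split("\n", 1)[0]

    # Glue on single-character tokens (the letter-spaced tail of the DOI);
    # stop at the first token longer than one character.
    for token in rest.split():
        if len(token) != 1:
            break
        doi += token
    return doi


if __name__ == "__main__":
    print(guess_doi("doi:10.1016/S 0 0 1 0 - 0 2 7 7 ( 0 2 ) 0 0 2 3 5 - 4\n"))
    print(guess_doi("doi:10.1000/xyz123 Some following text"))
```

On the extracted string from the report this yields the full DOI, while a DOI followed by a normal word is returned unchanged.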
Are you still using Referencer? Can you verify that this issue is still present? If so, I would like to forward it to the author so he can look into it.

Sorry for the late reply,
Michael