Package: referencer
Version: 1.0.2-1
Severity: normal

I have a number of PDFs with DOIs appearing in the text that Referencer cannot properly scrape out. There is no true metadata in the PDF, so it falls back to text extraction from the page body. The complete BT/ET block containing the DOI is at the end of this message, but the key bit is this:

  [(doi:10.1016/)14.5(S)-95.3(0)]TJ
  6.3307 0 TD
  0.0983 Tc
  [(010-0277\(02\)00)-6.3(235-4)]TJ
  ET

This causes libpoppler to feed this text to BibData::guessDoi():

  doi:10.1016/S 0 0 1 0 - 0 2 7 7 ( 0 2 ) 0 0 2 3 5 - 4\n

"10.1016/S" is what Referencer records as the DOI. The correct DOI is the above string with all the spaces taken out, i.e.

  10.1016/S0010-0277(02)00235-4

Unfortunately, I don't have any concrete suggestion for how guessDoi() could do a better job in this case without also screwing up other situations (where random text appears immediately after the DOI, separated only by a space).

-- System Information:
Debian Release: lenny/sid
  APT prefers unstable
  APT policy: (500, 'unstable'), (500, 'testing'), (1, 'experimental')
Architecture: i386 (i686)

Kernel: Linux 2.6.18-4-686 (SMP w/2 CPU cores)
Locale: LANG=en_US, LC_CTYPE=en_US (charmap=UTF-8)
Shell: /bin/sh linked to /bin/bash

Versions of packages referencer depends on:
ii  libart-2.0-2              2.3.19-3      Library of functions for 2D graphi
ii  libatk1.0-0               1.18.0-2      The ATK accessibility toolkit
ii  libbonobo2-0              2.18.0-2      Bonobo CORBA interfaces library
ii  libbonoboui2-0            2.18.0-5      The Bonobo UI library
ii  libboost-regex1.33.1      1.33.1-10     regular expression library for C++
ii  libc6                     2.5-7         GNU C Library: Shared libraries
ii  libcairo2                 1.4.6-1       The Cairo 2D vector graphics libra
ii  libfontconfig1            2.4.2-1.2     generic font configuration library
ii  libgcc1                   1:4.1.2-6     GCC support library
ii  libgconf2-4               2.18.0.1-3    GNOME configuration database syste
ii  libgconfmm-2.6-1c2        2.14.2-1      C++ wrappers for GConf (shared lib
ii  libglade2-0               1:2.6.0-4     library to load .glade files at ru
ii  libglademm-2.4-1c2a       2.6.2-2       C++ wrappers for libglade2 (shared
ii  libglib2.0-0              2.12.12-1     The GLib library of C routines
ii  libglibmm-2.4-1c2a        2.12.7-1      C++ wrapper for the GLib toolkit (
ii  libgnome-keyring0         0.8.1-2       GNOME keyring services library
ii  libgnome-vfsmm-2.6-1c2a   2.14.0-1      C++ wrappers for GnomeVFS (shared
ii  libgnome2-0               2.18.0-4      The GNOME 2 library - runtime file
ii  libgnomecanvas2-0         2.14.0-2      A powerful object-oriented display
ii  libgnomecanvasmm-2.6-1c2a 2.14.0-1      C++ wrappers for libgnomecanvas2 (
ii  libgnomemm-2.6-1c2        2.14.0-1      C++ wrappers for libgnome (shared
ii  libgnomeui-0              2.18.1-2      The GNOME 2 libraries (User Interf
ii  libgnomeuimm-2.6-1c2a     2.14.0-1      C++ wrappers for libgnomeui (share
ii  libgnomevfs2-0            1:2.18.1-2    GNOME Virtual File System (runtime
ii  libgtk2.0-0               2.10.12-1     The GTK+ graphical user interface
ii  libgtkmm-2.4-1c2a         1:2.8.8-1     C++ wrappers for GTK+ 2.4 (shared
ii  libice6                   1:1.0.3-2     X11 Inter-Client Exchange library
ii  liborbit2                 1:2.14.7-0.1  libraries for ORBit2 - a CORBA ORB
ii  libpango1.0-0             1.16.4-1      Layout and rendering of internatio
ii  libpoppler0c2             0.4.5-5.1     PDF rendering library
ii  libpopt0                  1.10-3        lib for parsing cmdline parameters
ii  libsigc++-2.0-0c2a        2.0.17-2      type-safe Signal Framework for C++
ii  libsm6                    1:1.0.2-2     X11 Session Management library
ii  libstdc++6                4.1.2-6       The GNU Standard C++ Library v3
ii  libx11-6                  2:1.0.3-7     X11 client-side library
ii  libxcursor1               1:1.1.8-2     X cursor management library
ii  libxext6                  1:1.0.3-2     X11 miscellaneous extension librar
ii  libxfixes3                1:4.0.3-2     X11 miscellaneous 'fixes' extensio
ii  libxi6                    1:1.0.1-4     X11 Input extension library
ii  libxinerama1              1:1.0.2-1     X11 Xinerama extension library
ii  libxml2                   2.6.28.dfsg-1 GNOME XML library
ii  libxrandr2                2:1.2.1-1     X11 RandR extension library
ii  libxrender1               1:0.9.2-1     X Rendering Extension client libra

referencer recommends no packages.

-- no debconf information

BT
7.9702 0 0 7.9702 340.5542 597.3164 Tm
[(www.elsev)11.4(ier.com/locate/co)8.9(gnit)]TJ
-32.0589 -63.7337 TD
[(0010-0277)15.5(/03/$)-299.5(-)-300.1(see)-293(front)-300.7(matter)]TJ
/F4 1 Tf
13.9915 0 TD
(\001)Tj
/F1 1 Tf
1.1666 0 TD
[(2003)-297.5(Elsevier)-289.8(Science)-293.2(B.V.)-299.7(All)-299.1(rights)-294.9(reserved.)]TJ
-15.1581 -1.2448 TD
[(doi:10.1016/)14.5(S)-95.3(0)]TJ
6.3307 0 TD
0.0983 Tc
[(010-0277\(02\)00)-6.3(235-4)]TJ
ET
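
For illustration only, here is one heuristic that might separate the letter-spaced case from the "random text after the DOI" case. It is a rough sketch in Python, not Referencer's actual C++ code in BibData::guessDoi(). The names guess_doi, DOI_START, SPACED_RUN and MIN_RUN are hypothetical, and the threshold of four characters is an untested assumption. The idea: take the DOI up to the first whitespace as usual, then append what follows only if it is a run of single characters each separated by exactly one space. Ordinary trailing words are longer than one character, so they are left alone.

```python
import re

# Hypothetical sketch, NOT Referencer's BibData::guessDoi().
# Assumption: a DOI starts with "10.", at least four digits, then "/".
DOI_START = re.compile(r"doi:\s*(10\.\d{4,}/\S+)", re.IGNORECASE)

# A run of single non-space characters, each preceded by exactly one space
# and followed by whitespace or the end of the text.
SPACED_RUN = re.compile(r"(?: \S(?=\s|$))+")

# Assumed threshold: shorter runs are treated as ordinary trailing text.
MIN_RUN = 4


def guess_doi(text):
    """Return a DOI guess from extracted page text, or None if none is found."""
    m = DOI_START.search(text)
    if m is None:
        return None
    doi = m.group(1)  # everything up to the first whitespace

    # Look for a letter-spaced continuation directly after the first token.
    run = SPACED_RUN.match(text, m.end())
    if run is not None:
        chars = run.group(0).split()
        if len(chars) >= MIN_RUN:
            doi += "".join(chars)
    return doi


if __name__ == "__main__":
    print(guess_doi("doi:10.1016/S 0 0 1 0 - 0 2 7 7 ( 0 2 ) 0 0 2 3 5 - 4\n"))
    print(guess_doi("doi:10.1000/xyz123 Published by Foo"))
```

This would still go wrong if a DOI were immediately followed by four or more genuine single-character tokens, so it is a trade-off rather than a complete fix.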