Bug #623438: Font size not correct in merged sandvich PDF
https://bugs.launchpad.net/bugs/623438

Treating Comment #1 as "works as intended" (with a character-precision limitation) and Bug #632524 as "broken" (font size and placement have no correlation to the underlying text, plus out-of-bounds, missing, or "dog-piled" text), I'm happy to report the following:

While developing a new Inkscape extension that exports hand-drawn/typed text boxes as hOCR, I came across the same issues reported in Bug #632524. The hOCR file generated by the extension uses neither 'ocr_word' nor 'ocr_cinfo' elements, just plain text inside 'ocr_line' parent elements (each with a unique 'id' and a 'bbox'). I believe hocr2pdf mis-parses such a file because it expects each character to be contained within its own bbox.

As a stop-gap measure, either wrapping each plain-text line in matching <p></p> elements or, alternatively, adding a <br> at the end of each plain-text line resulted in 'proper' text placement. The <title> element needs the same treatment.

Tested with exact-image 0.8.8. I wasn't able to complete the build due to relocation errors, but '/objdir/frontends/hocr2pdf' was usable as-is. The file that needs patching is 'lib/hocr.cc'.
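The stop-gap workaround can be sketched as a small post-processing step applied to the generated hOCR before it is passed to hocr2pdf. This is a minimal Python sketch, not part of hocr2pdf or exact-image: the function name `patch_ocr_lines` is hypothetical, and it assumes each 'ocr_line' element holds only plain text (no nested elements) and carries a single, quoted class value. It places the <br> (or the <p></p> pair) inside the 'ocr_line' element, which is one possible reading of "at the end of each plain-text line". It leaves the <title> element alone, since the exact treatment used there isn't spelled out. A regular expression is only reasonable here because the input is the simple, machine-generated structure described; a real HTML parser would be more robust.

```python
import re

# Matches an element whose class is exactly 'ocr_line' and whose content is
# plain text only (no nested tags). Assumption: the class attribute is quoted
# with ' or " and holds no other class names.
OCR_LINE = re.compile(
    r"(<(?P<tag>\w+)\b[^>]*\bclass=(['\"])ocr_line\3[^>]*>)"  # group 1: opening tag
    r"(?P<text>[^<]*)"                                          # group 4: plain text
    r"(</(?P=tag)>)"                                            # group 5: matching close tag
)


def patch_ocr_lines(hocr: str, mode: str = "br") -> str:
    """Hypothetical helper: apply the <br> or <p></p> workaround to every
    plain-text ocr_line element in an hOCR string."""
    if mode not in ("br", "p"):
        raise ValueError("mode must be 'br' or 'p'")

    def fix(m: re.Match) -> str:
        text = m.group("text")
        if not text.strip():
            return m.group(0)  # leave empty lines untouched
        if mode == "br":
            body = text + "<br>"  # terminate the line explicitly
        else:
            body = "<p>" + text + "</p>"  # wrap the line in a paragraph
        return m.group(1) + body + m.group(5)

    return OCR_LINE.sub(fix, hocr)


if __name__ == "__main__":
    sample = (
        "<div class='ocr_page' title='bbox 0 0 600 800'>\n"
        "<span class='ocr_line' id='line_1' title='bbox 10 10 300 40'>Hello world</span>\n"
        "</div>"
    )
    print(patch_ocr_lines(sample))
    print(patch_ocr_lines(sample, mode="p"))
```

Lines that already contain 'ocrx_word' or other child elements don't match the pattern and pass through unchanged, so the sketch only touches the plain-text case that triggered the mis-placement.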