Treating Comment #1 as "works as intended" (with a character-precision
limitation) and Bug #632524 as "broken" (font size and placement bear no
relation to the underlying text, plus out-of-bounds, missing, or
"dog-piled" text), I'm happy to report the following:

While developing a new Inkscape extension to export hand-drawn/typed
text boxes as hOCR, I came across the same issues reported in Bug
#632524.  The hOCR file generated by the extension uses neither
'ocr_word' nor 'ocr_cinfo' elements, just plain text within 'ocr_line'
parent elements (with corresponding unique 'id' and 'bbox' attributes).
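
For illustration only, here is a minimal Python sketch of the kind of
'ocr_line' element described above.  It is not the extension's actual
code: the function name and sample values are hypothetical, and it
assumes the usual hOCR convention of carrying the bbox inside the
'title' attribute.

```python
from html import escape


def ocr_line(line_id, bbox, text):
    """Build one 'ocr_line' span holding plain text only.

    Hypothetical helper: no 'ocr_word' or 'ocr_cinfo' children are
    emitted, and the bbox is written as 'bbox x0 y0 x1 y1' in 'title'.
    """
    x0, y0, x1, y1 = bbox
    return (
        f"<span class='ocr_line' id='{escape(line_id)}' "
        f"title='bbox {x0} {y0} {x1} {y1}'>{escape(text)}</span>"
    )


if __name__ == "__main__":
    # Sample values are made up for the example.
    print(ocr_line("line_1", (10, 20, 300, 45), "Hello & welcome"))
```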

I believe hocr2pdf was mis-parsing the file, expecting each character
to be contained within its own bbox.  As a stop-gap measure, either
wrapping each plain text line in matching <p></p> elements or,
alternatively, adding a <br> at the end of each plain text line
resulted in 'proper' text placement.  The <title> element also needs to
be treated in this way.
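
A rough Python sketch of the <br> variant of this stop-gap follows.  It
is only an assumption about how one might post-process an hOCR file
before handing it to hocr2pdf: the function name is hypothetical, the
<br> is placed just before the closing tag of each 'ocr_line' span
(the exact position is a guess), and the regular expression only
handles spans that contain plain text with no nested tags.

```python
import re

# Matches an 'ocr_line' span whose content is plain text (no child tags).
_PLAIN_LINE = re.compile(r"(<span\b[^>]*\bocr_line\b[^>]*>)([^<]*)(</span>)")


def add_line_breaks(hocr):
    """Append <br> to the text of every plain-text 'ocr_line' span.

    Hypothetical workaround helper; spans with nested elements are
    left untouched.
    """
    return _PLAIN_LINE.sub(r"\1\2<br>\3", hocr)


if __name__ == "__main__":
    sample = "<span class='ocr_line' id='l1' title='bbox 0 0 10 10'>abc</span>"
    print(add_line_breaks(sample))
```

The <title> element would need the same treatment separately; this
sketch does not touch it.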

Tested with exact-image 0.8.8.  I wasn't able to complete the build
because of relocation errors, but '/objdir/frontends/hocr2pdf' was
usable as-is.  'lib/hocr.cc' is the file in need of patches.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/623438

Title:
  Font size not correct in merged sandvich PDF

To manage notifications about this bug go to:
https://bugs.launchpad.net/cuneiform-linux/+bug/623438/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
