Hi Suzuki,

no, I was wondering because you said:

> I'm not saying "this is ready to use, please use"

I tried it and it seems to be working OK. XML output would be much better, though, since that's what pdftotext's output is. Is there a quick/easy way to achieve the same with XML output?

On Tue, May 8, 2018 at 7:21 PM, suzuki toshiya <[email protected]> wrote:

> Sorry, I would not have sufficient time to work with poppler until the end of
> this month.
> my patch for the poppler-dump was just a proof of concept, and I have not used
> much yet.
> have you experienced something?
>
> obsidian . wrote:
> > Hi Suzuki,
> >
> > have you noticed any problems while using the patched poppler-dump utility?
> >
> > On Tue, May 8, 2018 at 2:25 AM, obsidian . <[email protected]<mailto:[email protected]>> wrote:
> > Thanks Suzuki.
> >
> > I was looking for something more tried, tested and "stable".
> > I'm kind of surprised there's no other way to output char level information.
> >
> > On Sat, May 5, 2018 at 9:41 AM, Adam Reichold <[email protected]<mailto:[email protected]>> wrote:
> > Hello again,
> >
> > so I obviously forgot the attachment... |:-\ Sorry for that.
> >
> > Regards,
> > Adam
> >
> > On 05.05.2018 at 08:16, Adam Reichold wrote:
> >> Hello mpsuzuki,
> >>
> >> attached is a version of your patch with some inline comments.
> >>
> >> Generally speaking, I would say that some well-defined format like JSON
> >> or YAML would be preferable to the ad-hoc encoding?
> >>
> >> Best regards,
> >> Adam
> >>
> >> On 03.05.2018 at 13:50, suzuki toshiya wrote:
> >>> Current poppler-dump (a testing tool of cpp-frontend) has no feature to
> >>> demonstrate per-character bbox feature.
> >>> Attached patch adds the option to demonstrate it (I'm not saying "this is ready
> >>> to use, please use", I want to understand your request and whether existing
> >>> features could cover some part of your requests).
> >>>
> >>> The patched poppler-dump can work like this:
> >>>
> >>> $ cpp/tests/poppler-dump --show-glyph-list test.pdf
> >>> Page 1/1:
> >>> ---
> >>> [Please] @ ( x=72 y=72.624 w=61.32 h=21.6 )
> >>>     [0] @ ( x=72 y=72.624 w=13.344 h=21.6 )
> >>>     [1] @ ( x=85.344 y=72.624 w=6.672 h=21.6 )
> >>>     [2] @ ( x=92.016 y=72.624 w=10.656 h=21.6 )
> >>>     [3] @ ( x=102.672 y=72.624 w=10.656 h=21.6 )
> >>>     [4] @ ( x=113.328 y=72.624 w=9.336 h=21.6 )
> >>>     [5] @ ( x=122.664 y=72.624 w=10.656 h=21.6 )
> >>> [wait...] @ ( x=139.32 y=72.624 w=59.328 h=21.6 )
> >>>     [0] @ ( x=139.32 y=72.624 w=17.328 h=21.6 )
> >>>     [1] @ ( x=156.648 y=72.624 w=10.656 h=21.6 )
> >>>     [2] @ ( x=167.304 y=72.624 w=6.672 h=21.6 )
> >>>     [3] @ ( x=173.976 y=72.624 w=6.672 h=21.6 )
> >>>     [4] @ ( x=180.648 y=72.624 w=6 h=21.6 )
> >>>     [5] @ ( x=186.648 y=72.624 w=6 h=21.6 )
> >>>     [6] @ ( x=192.648 y=72.624 w=6 h=21.6 )
> >>> [If] @ ( x=72 y=112.428 w=7.992 h=10.8 )
> >>>     [0] @ ( x=72 y=112.428 w=3.996 h=10.8 )
> >>>     [1] @ ( x=75.996 y=112.428 w=3.996 h=10.8 )
> >>> [this] @ ( x=82.992 y=112.428 w=17.34 h=10.8 )
> >>>     [0] @ ( x=82.992 y=112.428 w=3.336 h=10.8 )
> >>>     [1] @ ( x=86.328 y=112.428 w=6 h=10.8 )
> >>>     [2] @ ( x=92.328 y=112.428 w=3.336 h=10.8 )
> >>>     [3] @ ( x=95.664 y=112.428 w=4.668 h=10.8 )
> >>> ...
> >>>
> >>> Regards,
> >>> mpsuzuki
> >>>
> >>> suzuki toshiya wrote:
> >>>> Dear obsidian,
> >>>>
> >>>> Too many posts about similar issues :-)
> >>>> I'm not sure whether poppler maintainers are interested in the enhancement of
> >>>> pdftotext,
> >>>> but recently Jeroen and I were working with cpp-frontend to have similar features.
> >>>>
> >>>> in the latest version of poppler,
> >>>> cpp-frontend has a feature to retrieve the list of words with bounding box,
> >>>> and it can retrieve the bounding box for each glyph in the word.
> >>>>
> >>>> --
> >>>>
> >>>> also I proposed a patch to retrieve the font family and point size:
> >>>> https://lists.freedesktop.org/archives/poppler/2018-April/013035.html
> >>>>
> >>>> it might be waiting for the maintainers' review. the discussion and result
> >>>> can be found here:
> >>>> https://github.com/ropensci/pdftools/issues/29
> >>>>
> >>>> --
> >>>>
> >>>>> - style, i.e. none, bold, italic
> >>>> if the document producer has a bold font and used it in the document, such as
> >>>> Helvetica-Bold,
> >>>> it would be found by the family name.
> >>>> but if the document producer has no bold font and lets the word processor
> >>>> software synthesize the emboldened font,
> >>>> it would be difficult for the PDF renderer to recognize it as an emboldened font,
> >>>> because the emboldening is done by showing the same glyph with subtle shifting.
> >>>> Simple PDF renderers would be unable to distinguish "normal font but layered"
> >>>> and "emboldened font".
> >>>>
> >>>> Regards,
> >>>> mpsuzuki
> >>>>
> >>>> obsidian . wrote:
> >>>>> I'm using "pdftotext -bbox file.pdf" to convert a pdf file into html.
> >>>>>
> >>>>> Here's a sample line from the output:
> >>>>> <word xMin="359.852025" yMin="462.548936" xMax="365.689478" yMax="467.681498">foo</word>
> >>>>>
> >>>>> Is there a way to get font information for every word like:
> >>>>> - font family, e.g. Verdana
> >>>>> - style, i.e. none, bold, italic
> >>>>> - size, e.g. font size 9
> >>>>>
> >>>>> I'm using pdftotext version 0.55.0 on Windows.
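
Regarding the XML question at the top of this message: until the tool itself grows an XML mode, one possible stopgap is to post-process the text that `--show-glyph-list` prints. The following is only a rough sketch, not part of the patch. The element and attribute names (`page`, `word`, `char`, `text`, `index`) are made up here and merely imitate the `xMin`/`yMin`/`xMax`/`yMax` style of `pdftotext -bbox`. It also assumes that a bracketed number is a glyph entry only when it is the next expected index for the preceding word, and that a word never has more glyph entries than characters; both are guesses based on the sample output quoted above.

```python
import re
import xml.etree.ElementTree as ET

# One dump entry, as in the sample output: "[label] @ ( x=.. y=.. w=.. h=.. )"
ENTRY = re.compile(
    r"\[(?P<label>.*)\]\s*@\s*\(\s*"
    r"x=(?P<x>\S+)\s+y=(?P<y>\S+)\s+w=(?P<w>\S+)\s+h=(?P<h>\S+)\s*\)"
)


def bbox_attrs(match):
    """Convert x/y/w/h into pdftotext-style xMin/yMin/xMax/yMax strings."""
    x, y, w, h = (float(match.group(key)) for key in "xywh")
    return {
        "xMin": f"{x:.3f}",
        "yMin": f"{y:.3f}",
        "xMax": f"{x + w:.3f}",
        "yMax": f"{y + h:.3f}",
    }


def dump_to_xml(dump_text):
    """Turn glyph-list dump text into an XML string (hypothetical schema)."""
    page = ET.Element("page")
    word = None        # the <word> element currently being filled
    expected = 0       # next glyph index expected for that word
    for line in dump_text.splitlines():
        match = ENTRY.search(line)
        if match is None:
            continue   # skip "Page 1/1:", "---", "..." and similar lines
        label = match.group("label")
        # Assumption: a label is a glyph index only if it continues the
        # 0, 1, 2, ... sequence of the current word.
        is_glyph = (
            word is not None
            and label == str(expected)
            and expected < len(word.get("text"))
        )
        if is_glyph:
            ET.SubElement(word, "char", {"index": label, **bbox_attrs(match)})
            expected += 1
        else:
            word = ET.SubElement(page, "word", {"text": label, **bbox_attrs(match)})
            expected = 0
    return ET.tostring(page, encoding="unicode")
```

To use it, capture the output of `cpp/tests/poppler-dump --show-glyph-list test.pdf` in a string and pass it to `dump_to_xml`. A word that consists of the single character "0" would be ambiguous under this heuristic, so a real XML mode inside poppler-dump would still be the more reliable route.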
_______________________________________________
poppler mailing list
[email protected]
https://lists.freedesktop.org/mailman/listinfo/poppler
