https://bugs.documentfoundation.org/show_bug.cgi?id=151577

V Stuart Foote <[email protected]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEEDINFO
           Keywords|                            |needsDevAdvice
     Ever confirmed|0                           |1
           See Also|                            |https://bugs.documentfounda
                   |                            |tion.org/show_bug.cgi?id=32
                   |                            |249,
                   |                            |https://bugs.documentfounda
                   |                            |tion.org/show_bug.cgi?id=11
                   |                            |8370
           Severity|normal                      |enhancement
             Blocks|                            |99746
                 CC|                            |[email protected],
                   |                            |[email protected],
                   |                            |[email protected],
                   |                            |[email protected],
                   |                            |[email protected]

--- Comment #6 from V Stuart Foote <[email protected]> ---
Sorry, it is a dupe of bug 32249, clear and simple. The filter functions needed
to render PDF text spans back as Paragraph objects would be the same across all
LO modules.

Comment 0 was opened against a Writer-originated ODF document, but the export
filter(s) make no such distinction: PDF has no "paragraph" object keeping text
spans together as sentences, and even words may be broken apart. And this
*enhancement* is not about the LO Hybrid PDF, which attaches the ODF source
document to the PDF so that LO can selectively open that attachment on import,
bypassing the PDF facsimile. That already functions as an export option.

For bug 32249 and bug 118370, Justin L. completed *one* reasonable approach:
working with the sd text box objects extracted (poppler -> cairo) from the PDF
BT/ET spans, and "consolidating" a selection of the generated text boxes into a
single text box object.

An alternative was proposed at
https://bugs.documentfoundation.org/show_bug.cgi?id=32249#c19: a process
taking the extracted strings (still poppler -> cairo based) and reflowing them
into lexically correct full sentences or full paragraph objects, then
assembling those into an ODF-ready object available to style, spell check, etc.
The focus would be less on the layout of the PDF and more on extracting a
lexicographically correct representation of a page.
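
To make the reflow idea concrete, here is a minimal sketch in Python. It is not
LO filter code and not what comment 19 specifies: the Span type, the reflow()
function, and the line_tol / para_gap thresholds are all hypothetical names and
values chosen for illustration. It assumes each extracted fragment carries a
left edge, a baseline, and a font height, groups fragments into lines by
baseline, then joins lines into paragraph strings, splitting on large vertical
gaps.

```python
from dataclasses import dataclass


@dataclass
class Span:
    """One extracted text fragment (hypothetical type, not an LO class)."""
    x: float       # left edge on the page
    y: float       # baseline, increasing down the page
    height: float  # nominal font height
    text: str


def reflow(spans, line_tol=0.3, para_gap=1.5):
    """Group spans into lines, then lines into paragraph strings.

    line_tol and para_gap are assumed thresholds, in multiples of font height.
    """
    ordered = sorted(spans, key=lambda s: (s.y, s.x))

    # Pass 1: spans whose baselines nearly coincide belong to one line.
    lines = []  # each entry: [baseline, height, [spans on that line]]
    for s in ordered:
        if lines and abs(s.y - lines[-1][0]) <= line_tol * s.height:
            lines[-1][2].append(s)
        else:
            lines.append([s.y, s.height, [s]])

    # Pass 2: join lines; a large vertical gap starts a new paragraph.
    paragraphs = []
    current = ""
    prev_y = None
    for y, height, members in lines:
        members = sorted(members, key=lambda m: m.x)
        text = " ".join(m.text.strip() for m in members)
        if prev_y is not None and (y - prev_y) > para_gap * height:
            if current:
                paragraphs.append(current)
            current = ""
        if current.endswith("-"):
            # Naive de-hyphenation: also merges genuine hyphenated compounds.
            current = current[:-1] + text
        elif current:
            current += " " + text
        else:
            current = text
        prev_y = y
    if current:
        paragraphs.append(current)
    return paragraphs
```

A real filter would need more than geometry (columns, rotated text, dictionary
checks before de-hyphenating), but the resulting strings are the kind of unit
that could then be handed on as a paragraph to style or spell check.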

So, this bz issue could be that additional work, more fully scoped here. Or we
could set it back to the dupe it is, since bug 32249 was left open after the
work on bug 118370 but its scope was not expanded to all PDF import filters.

Added the devs with insight for their opinions, but on a coin flip I would set
it again as the dupe it is.


Referenced Bugs:

https://bugs.documentfoundation.org/show_bug.cgi?id=99746
[Bug 99746] [META] PDF import filter in Draw