https://bugs.documentfoundation.org/show_bug.cgi?id=151552
V Stuart Foote <[email protected]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|filters and storage         |Writer
                 CC|                            |[email protected],
                   |                            |[email protected],
                   |                            |[email protected],
                   |                            |[email protected]

--- Comment #7 from V Stuart Foote <[email protected]> ---
(In reply to Eyal Rozenberg from comment #6)

No, please understand how our poppler based PDF import filtering functions.

PDF is not an editable format. We do not Edit PDFs.

A PDF viewer processor will open and parse PDF stream content onto fully described (in postscript) pages, and then manage display of those complete pages.

Even for a document being "round-tripped", LibreOffice's import filter(s), using the external poppler and poppler-utils libraries, extract the content streams from the published presentation and convert each stream into a discrete draw Shape object.

The text runs in the PDF are just one of the content streams. Those discrete text run content streams have no lexical details; they are strictly glyph based snippets of text, with font and character metrics that are then used to create the draw Shape textboxes. The content stream includes a starting position on the published page, and that is used to coarsely position the draw textbox on the LO canvas.

That is why the text runs are not rendered to the LO canvas as "justified" and can exceed the LO canvas margins.

The mishandling of the RTL text was also a manifestation of the fact that the content stream records text runs in the order they are recorded to the postscript page. There are similar issues for complex text recorded to PDF with /ActualText flag support.

PDF Viewers don't need to do more with the content streams--they simply parse them and lay them out as described in the postscript pages. And LibreOffice actually includes a PDF viewer processor--that is the pdfium based ipdf filter used to insert a PDF page as an image.
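To make the behaviour described above concrete, here is a minimal Python sketch. It is not LibreOffice or poppler code: the `TextRun` and `TextBox` types, the fixed per-glyph advance, and the 230pt right margin are all hypothetical, chosen only to show why one-box-per-run import loses justification, can overrun a margin, and leaves RTL text in painted (visual) order.

```python
from dataclasses import dataclass


@dataclass
class TextRun:
    """Hypothetical model of one glyph-based snippet from a content stream."""
    x: float        # start position on the page, in points
    y: float
    glyphs: str     # glyphs in the order they were painted, not logical order
    advance: float  # assumed average glyph advance, in points


@dataclass
class TextBox:
    """Hypothetical stand-in for a draw Shape textbox."""
    x: float
    y: float
    width: float
    text: str


def runs_to_boxes(runs):
    """Create one independent box per run: no paragraphs, no justification."""
    return [TextBox(r.x, r.y, len(r.glyphs) * r.advance, r.glyphs) for r in runs]


def overflows(box, right_margin):
    """True if the box, placed at its recorded start, extends past the margin."""
    return box.x + box.width > right_margin


runs = [
    TextRun(x=72.0, y=700.0, glyphs="A justified line that was", advance=6.0),
    TextRun(x=72.0, y=686.0, glyphs="stretched to fit the column", advance=6.0),
    # Hebrew "shalom" assumed to be painted left to right, i.e. in visual order.
    TextRun(x=400.0, y=672.0, glyphs="\u05dd\u05d5\u05dc\u05e9", advance=6.0),
]
boxes = runs_to_boxes(runs)

for box in boxes:
    print(box.y, repr(box.text), "overflows" if overflows(box, 230.0) else "fits")

# Recovering logical order for the RTL run needs an explicit reversal that
# the glyph snippet itself does not ask for.
logical = boxes[2].text[::-1]
```

Each run becomes its own box sized from its own metrics, so nothing ties the two Latin lines into a justified paragraph, and the RTL run only reads correctly after a step the stream gives no hint about.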
Improving the fidelity of filter imported draw Shapes to the content on the source PDF published page is out of scope for the project. Put another way, it is not justified to expend dev, QA and design resources working on the PDF import filters when we offer exceptional fidelity for PDF content using the pdfium based insert filters. Any "manipulation" of the source PDF (e.g. page extraction, clipping, etc.) to prepare it for insertion is best done external to LibreOffice.

And that is why I make the suggestion that perhaps it would be best just to drop the functional poppler based PDF import filter from core LO deliverables. It could then be packaged more effectively as an extension (where it started in the Oracle OOo era).

And again, LibreOffice is *not* a PDF editor.
