I tested the same example sentence with Ubuntu 18.04 and LibreOffice 6.0.3.2. Here’s the output from pdftotext:
ه اشترى للا خمسة آفا كتاب وَأنَا اشْ ت َ َريْتُهَا ِ من ْ ُ

Here four out of the eight words are intact, so it’s an improvement over 5.4.6 but still leaves a lot to be desired. The last word of the sentence (مِنْهُ) is broken into pieces so that the last full character ه is found on the first line and the other two on the last. Diacritical marks are sometimes placed where they are supposed to be (such as the first and the last three diacritics in the word اشْتَرَيْتُهَا) but sometimes not (the middle of the same word and the last word of the sentence مِنْهُ). This time ى is visible but the first letter of the following word ب is not.

Here’s what MS Word 2007 (12.0.6787.5000, SP3 MSO 12.0.6785.5000) on Windows 8.1 produces when processed by pdftotext:

اشترى بالل خمسة آالف كتاب وأنا اشتريتها منه

So Word 2007 drops all the diacritics and mixes up the order of the letters in the combination ل (U+0644) + ا (U+0627), producing ال instead of لا. Otherwise the output is intact and definitely much better than LO.

I don't have any newer versions of MS Word at my disposal, so I can't test it further.

--
You received this bug notification because you are a member of Desktop Packages, which is subscribed to libreoffice in Ubuntu.
https://bugs.launchpad.net/bugs/1772439

Title: Arabic text gets deformed when creating a PDF in LibreOffice Writer

Status in libreoffice package in Ubuntu: New

Bug description:
Creating a PDF from a document written in the Arabic script deforms the textual content of the document, although it looks fine on the screen. For example, see the attached PDF created with Writer, where the example sentence "اشترى بلال خمسة آلاف كتاب وَأَنَا اشْتَرَيْتُهَا مِنْهُ" looks as it should, but when you view it with any PDF reader, such as evince, copying the text deforms most of the words. Some characters are clearly visible but cannot be selected or searched (such as ى at the end of the first word اشترى).
If I search for the second word بلال, evince tells me there are no matches in the document. The same happens when converting the file with pdftotext, which produces the following output:

اشتر للا مسة لفا كتاب وَأَنَا ْ ه اشت َ َريْتُهَا ِ من ْ ُ

Here only two of the seven words are intact, the rest are garbled in one way or another. If the text is in Latin script, both evince and pdftotext behave as expected, meaning that the textual content is transferred correctly from Writer to the PDF.

Description: Ubuntu 17.10
Release: 17.10

libreoffice-writer:
  Installed: 1:5.4.6-0ubuntu0.17.10.1
  Candidate: 1:5.4.6-0ubuntu0.17.10.1
  Version table:
 *** 1:5.4.6-0ubuntu0.17.10.1 500
        500 http://mr.archive.ubuntu.com/ubuntu artful-updates/main amd64 Packages
        100 /var/lib/dpkg/status
     1:5.4.5-0ubuntu0.17.10.5 500
        500 http://security.ubuntu.com/ubuntu artful-security/main amd64 Packages
     1:5.4.1-0ubuntu1 500
        500 http://mr.archive.ubuntu.com/ubuntu artful/main amd64 Packages

ProblemType: Bug
DistroRelease: Ubuntu 17.10
Package: libreoffice-writer 1:5.4.6-0ubuntu0.17.10.1
ProcVersionSignature: Ubuntu 4.13.0-41.46-generic 4.13.16
Uname: Linux 4.13.0-41-generic x86_64
ApportVersion: 2.20.7-0ubuntu3.8
Architecture: amd64
CurrentDesktop: ubuntu:GNOME
Date: Mon May 21 15:18:41 2018
InstallationDate: Installed on 2017-02-13 (462 days ago)
InstallationMedia: Ubuntu 16.10 "Yakkety Yak" - Release amd64 (20161012.2)
ProcEnviron:
 TERM=xterm-256color
 PATH=(custom, no user)
 XDG_RUNTIME_DIR=<set>
 LANG=fi_FI.UTF-8
 SHELL=/bin/bash
SourcePackage: libreoffice
UpgradeStatus: Upgraded to artful on 2017-11-05 (196 days ago)

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/libreoffice/+bug/1772439/+subscriptions

--
Mailing list: https://launchpad.net/~desktop-packages
Post to     : [email protected]
Unsubscribe : https://launchpad.net/~desktop-packages
More help   : https://help.launchpad.net/ListHelp

