On Sat, 14 Aug 2010 21:18:56 +0100 Albert Astals Cid <[email protected]> wrote:
>On Saturday, 31 July 2010, [email protected] wrote:
>> Sorry for a silence in a while. Checking the source,
>> I found following points.
>>
>> 1) poppler-qt4 page object issue
>> On the other hand, getText() is device specific method,
>> only in TextOutputDev.cc, so changing getText() is
>> easier.
>>
>> 2) TextOutputDev::getText() issue
>> I think, raw-ordered text from MS Office's tricky vertical
>> text can be applicable for text search, but
>> physically-layouted text cannot be applicable for text search.
>
>WoW, that's a huge mail :D

Sorry, my post was so long that it was hard to find my actual
proposal to the poppler maintainers.

>So my understanding is that "proper" CJK searching is a lot
>of work and you advocate for just exposing the raw text to
>the upper layers (users of poppler-qt4) so they can do the
>work if they need it?

Yes. I think exposing the raw text to the upper layers would be
a reasonable starting point for various non-left-to-right
scripts, because it is script-independent.

# As for inserting a space (U+0020) between words,
# I have not yet decided what is best.

I have also written a preliminary patch that modifies
TextPage::findText() in TextOutputDev to support a device
created in rawOrder mode (I can post it here if required).
Now I'm waiting for Cobra's feedback to see whether it works
for his purpose.

Regards,
mpsuzuki
_______________________________________________
poppler mailing list
[email protected]
http://lists.freedesktop.org/mailman/listinfo/poppler
