And what is the primary reading order for any document? That matters not only for semantic analysis but also for text-to-speech and screen readers (i.e., accessibility).
Leonard

On 9/23/11 7:59 AM, "Jonathan Kew" <[email protected]> wrote:

>On 23 Sep 2011, at 12:44, Peter A. Kerzum wrote:
>
>> Actually consistent To-Unicode mapping should be a good compromise, as
>> higher level software can really segment text into regions of different
>> languages based solely on their alphabets and then detect and correct
>> text flow for each particular region
>>
>> This way the example
>>
>> english WERBEH
>>
>> should generally work being decomposed into 2 regions with the latter
>> reversed
>
>But what is the order of those "2 regions"? You cannot tell unless you
>have some higher-level info... the purely visual presentation is
>inherently ambiguous.
>
>JK
>
>_______________________________________________
>poppler mailing list
>[email protected]
>http://lists.freedesktop.org/mailman/listinfo/poppler
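The following is a minimal Python sketch of the script-based segmentation Peter proposes. It assumes the glyphs have already been mapped to Unicode (via ToUnicode) and arrive in visual, left-to-right order. The helper names are hypothetical, and the sketch is a deliberate simplification, not the Unicode Bidirectional Algorithm (UAX #9). Reversing each right-to-left run is mechanical. Choosing the order of the runs is not, so the sketch can only return two candidates, which is Jonathan's point: the visual layout alone does not say which one is right.

```python
import unicodedata


def word_direction(word: str) -> str:
    """Return 'R' if the word's first strong character is right-to-left, else 'L'."""
    for ch in word:
        cls = unicodedata.bidirectional(ch)
        if cls in ("R", "AL"):  # Hebrew (R) and Arabic letters (AL)
            return "R"
        if cls == "L":
            return "L"
    return "L"  # no strong character (digits, punctuation): assume LTR, a simplification


def split_runs(visual_line: str) -> list[tuple[str, list[str]]]:
    """Group visually ordered words into consecutive runs of the same direction."""
    runs: list[tuple[str, list[str]]] = []
    for word in visual_line.split():
        d = word_direction(word)
        if runs and runs[-1][0] == d:
            runs[-1][1].append(word)
        else:
            runs.append((d, [word]))
    return runs


def to_logical_run(direction: str, words: list[str]) -> str:
    """Turn one run from visual order into logical (reading) order."""
    if direction == "R":
        # An RTL run appears right-to-left on the page: reverse the word order
        # and the characters within each word.
        return " ".join(w[::-1] for w in reversed(words))
    return " ".join(words)


def logical_candidates(visual_line: str) -> dict[str, str]:
    """Return both possible logical orders of the runs.

    'ltr' assumes a left-to-right paragraph (runs read in visual order);
    'rtl' assumes a right-to-left paragraph (runs read from the right).
    Nothing in the visual layout itself says which one is correct.
    """
    runs = [to_logical_run(d, ws) for d, ws in split_runs(visual_line)]
    return {"ltr": " ".join(runs), "rtl": " ".join(reversed(runs))}


if __name__ == "__main__":
    # Hebrew "shalom" as its glyphs appear on the page from left to right.
    visual = "english \u05dd\u05d5\u05dc\u05e9"
    print(logical_candidates(visual))
```

The thread's "WERBEH" is an ASCII stand-in for Hebrew glyphs; the sketch needs real Hebrew characters because it classifies words by their Unicode bidirectional class, not by their appearance.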
