Re: [poppler] poppler util pdftohtml

Jonathan Kew Fri, 23 Sep 2011 04:59:30 -0700

On 23 Sep 2011, at 12:44, Peter A. Kerzum wrote:

> Actually, a consistent ToUnicode mapping should be a good compromise, as higher
> level software can segment text into regions of different languages
> based solely on their alphabets, and then detect and correct the text flow
> for each particular region.
> 
> This way the example
> 
>   english WERBEH
> 
> should generally work, being decomposed into 2 regions with the latter reversed.
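The heuristic proposed in the quoted message (split by alphabet, then reverse the right-to-left region) can be sketched as below. This is a minimal illustration under loudly stated assumptions, not anything poppler or pdftohtml does: following the "WERBEH" example, uppercase ASCII letters stand in for right-to-left (e.g. Hebrew) characters, and the function name `naive_logical_order` is hypothetical. Note that it reverses letters inside each RTL region but leaves the regions themselves in visual order.

```python
import re

# Assumption (illustration only): uppercase ASCII letters stand in for
# right-to-left characters; everything else is treated as left-to-right.
# A run may contain several RTL words separated by spaces.
RTL_RUN = re.compile(r"[A-Z]+(?: +[A-Z]+)*")

def naive_logical_order(visual: str) -> str:
    """Hypothetical helper: reverse each RTL run in place, keeping the
    runs in the order they appear on the page (left to right)."""
    return RTL_RUN.sub(lambda m: m.group(0)[::-1], visual)

print(naive_logical_order("english WERBEH"))  # english HEBREW
```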


But what is the order of those "2 regions"? You cannot tell unless you have 
some higher-level info... the purely visual presentation is inherently 
ambiguous.
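The ambiguity can be made concrete with a toy layout model. This is a deliberately simplified sketch, not the Unicode Bidirectional Algorithm (UAX #9): uppercase ASCII stands in for right-to-left letters, runs are whole space-separated words, there are no neutrals or embeddings, and `render` is a hypothetical helper. Two different logical strings, one in a left-to-right paragraph and one in a right-to-left paragraph, produce the same visual line, so the visual output alone cannot say which region comes first.

```python
from itertools import groupby

# Illustrative assumptions only: uppercase ASCII = RTL letters, runs are
# whole space-separated words, no neutral characters or embeddings.

def is_rtl(word: str) -> bool:
    return word.isupper()

def render(logical: str, paragraph_rtl: bool) -> str:
    """Hypothetical helper: lay out a logical-order line visually, left to right."""
    runs = []
    for rtl, words in groupby(logical.split(" "), key=is_rtl):
        text = " ".join(words)
        # An RTL run is displayed mirrored: word order and letters reversed.
        runs.append(text[::-1] if rtl else text)
    if paragraph_rtl:
        # In an RTL paragraph, successive runs are placed right to left.
        runs.reverse()
    return " ".join(runs)

a = render("english HEBREW", paragraph_rtl=False)
b = render("HEBREW english", paragraph_rtl=True)
print(a)       # english WERBEH
print(b)       # english WERBEH
print(a == b)  # True
```

Recovering which reading was intended needs information beyond the glyph positions, such as the paragraph direction.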

JK

_______________________________________________
poppler mailing list
[email protected]
http://lists.freedesktop.org/mailman/listinfo/poppler
