Hi,

On Tue, 4 Jan 2011 12:00:09 +0100
Daniel Garcia <[email protected]> wrote:
>On Wed, Sep 22, 2010 at 02:11:31PM +0200, carlosgc wrote:
>> Excerpts from suzuki toshiya's message of Wed Sep 15 12:16:22 +0200 2010:
>> > Hi,
>>
>> Hi,
>>
>> > Attached patches are the introduction of new API to access raw text.
>> > I wish some maintainer of poppler-glib can review it.
>>
>> Yes, sorry for the delay.
>>
>> > poppler-0.15.0_glib-lib.diff
>> > patch to declare new function and its implementation
>> >
>>
>> I prefer poppler_page_get_raw_text(), rather than
>> poppler_page_get_selected_raw_text(), and always return the text of
>> the whole page. I don't see why you might want the selected text in
>> raw order.
>
>This patch never get applied... I'll write the
>poppler_page_get_raw_text() function. I don't know if suzuki is still
>interested.

I'm sorry for not answering your question sooner; I have been too busy
to write a good explanation.

The reason I wanted to pass a rectangle to restrict the area from which
raw text is extracted is that poppler currently has difficulty
extracting text from material with complex layouts, such as vertically
laid out CJK text or right-to-left scripts like Arabic and Hebrew. To
me, building that capability into poppler itself looks like a very
long-term effort.

Therefore, I wanted to provide an API that can extract raw text from a
specified rectangle. I expected a higher-level application to move a
small window across the page, collect the raw text fragments together
with their positions, and reconstruct the text by itself.

Does this answer your question?

Regards,
mpsuzuki
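
The window-scanning idea described above could be sketched roughly as
follows. This is only an illustration in Python, not poppler code: the
names Rect, sliding_windows, collect_fragments and the extract_raw_text
callback are all hypothetical, and the callback merely stands in for
whatever rectangle-restricted raw-text call the library ends up
offering.

```python
from typing import Callable, Iterator, List, NamedTuple, Tuple


class Rect(NamedTuple):
    """Axis-aligned rectangle in page coordinates (hypothetical helper)."""
    x1: float
    y1: float
    x2: float
    y2: float


def sliding_windows(page_w: float, page_h: float,
                    win_w: float, win_h: float) -> Iterator[Rect]:
    """Yield non-overlapping windows covering the page, row by row."""
    if win_w <= 0 or win_h <= 0:
        raise ValueError("window size must be positive")
    y = 0.0
    while y < page_h:
        x = 0.0
        while x < page_w:
            # Clip the last window in each row/column to the page edge.
            yield Rect(x, y, min(x + win_w, page_w), min(y + win_h, page_h))
            x += win_w
        y += win_h


def collect_fragments(page_w: float, page_h: float,
                      win_w: float, win_h: float,
                      extract_raw_text: Callable[[Rect], str]
                      ) -> List[Tuple[Rect, str]]:
    """Call the (assumed) extractor on each window; keep non-empty results.

    extract_raw_text is a placeholder supplied by the caller, not a real
    poppler function.
    """
    fragments = []
    for rect in sliding_windows(page_w, page_h, win_w, win_h):
        text = extract_raw_text(rect)
        if text:
            fragments.append((rect, text))
    return fragments
```

With the fragments and their rectangles in hand, the application is free
to choose its own reading order. For vertically laid out text, for
example, it might sort the fragments by column from right to left and
then from top to bottom, instead of relying on the order in which the
library returns them.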
