Thanks, I emailed the ocropus list. A new revision is ready to be pulled. I have tested the hocr output and it works fine. ocr_line now follows the standard as described in the 2007 hocr reference mentioned earlier (e.g. the character bboxes are in ocr_cinfo, and the line's text is plain text content of the ocr_line tag).
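A minimal Python sketch of how one line could be serialized under that structure follows. It assumes the hOCR `bbox` and `x_bboxes` title properties and places an empty `ocr_cinfo` span inside `ocr_line` after the text. The function name `ocr_line_html` and the exact nesting are hypothetical, for illustration only. They are not taken from Cuneiform's code, so check any real output against the spec.

```python
from html import escape

# Hypothetical type alias: a box is (x0, y0, x1, y1) in page pixel coordinates.
BBox = tuple[int, int, int, int]


def ocr_line_html(line_id: str, line_bbox: BBox, text: str,
                  char_bboxes: list[BBox]) -> str:
    """Render one text line as an hOCR ocr_line element (illustrative sketch).

    Assumption: the line text is the plain text content of ocr_line, and the
    per-character boxes go into an (empty) ocr_cinfo span via x_bboxes.
    """
    if len(char_bboxes) != len(text):
        raise ValueError("expected exactly one bounding box per character")
    line_title = "bbox " + " ".join(str(v) for v in line_bbox)
    # x_bboxes: all character boxes flattened into one list of integers.
    flat = " ".join(str(v) for box in char_bboxes for v in box)
    return (
        f"<span class='ocr_line' id='{escape(line_id)}' title='{line_title}'>"
        f"{escape(text)}"  # plain text content of the line, HTML-escaped
        f"<span class='ocr_cinfo' title='x_bboxes {flat}'></span>"
        "</span>"
    )


if __name__ == "__main__":
    print(ocr_line_html("line_1", (10, 20, 40, 35), "Hi",
                        [(10, 20, 24, 35), (26, 20, 40, 35)]))
```

Keeping the text as direct content of `ocr_line` means a reader that ignores `ocr_cinfo` still gets the plain line text.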
Julien

________________________________________
From: Yury V. Zaytsev [[email protected]]
Sent: 2 October 2009 10:46
To: julien
Subject: Re: [Cuneiform] Hocr output status and identified improvements.

On Thu, 2009-10-01 at 22:44 +0000, julien wrote:

> My goal is to standardize the hocr output as much as possible.
> From what I have understood it originates from the authors of ocropus.
> The standard referred to from ocropus is:
> http://docs.google.com/View?docid=dfxcv4vc_67g844kf

Did you check the OCRopus mailing lists, by the way? It might happen that they have a newer version of the standard which is not yet on the website...

--
Sincerely yours,
Yury V. Zaytsev

_______________________________________________
Mailing list: https://launchpad.net/~cuneiform
Post to : [email protected]
Unsubscribe : https://launchpad.net/~cuneiform
More help : https://help.launchpad.net/ListHelp

