On Mon, Dec 21, 2009 at 02:00:30PM -0500, [email protected] wrote:
>
> (a) is this the version of cuneiform I should be using? Is there a way to
> use git/svn/... to automatically pull the cuneiform source code as it's
> updated without downloading the whole tarball each time?
There's a bzr repo. It's on launchpad.net, called cuneiform-linux. Bazaar has
built-in support for Launchpad (which is the GitHub of bzr).

> Currently, when scanning a book with the two facing pages, cuneiform puts the
> two page headers at the top followed by the contents of both pages one after
> the other. E.G.

You could cut the image yourself (with netpbm or ImageMagick). I believe
cuneiform can do what you want if you enable 'tables' (I don't know if the
'--tables' argument is in the mainline). You could also output HOCR format,
which tags every letter with its coords from the original image, and split it
up that way, but that sounds way harder than splitting the input image.

> finally, can I append the recognitions of multiple scans to the same file? I
> tried "cuneiform -f rtf -o test.rtf *.tiff" on a handful of consecutively
> numbered image files, but the results continually over-write the previous
> data and I am left with the results from the last file recognized.

There might be an rtf tool that can help you. If you choose a format like HOCR
or text you could just concatenate the output files yourself.

-- 
Ben Jackson AD7GD
<[email protected]>
http://www.ben.com/

_______________________________________________
Mailing list: https://launchpad.net/~cuneiform
Post to     : [email protected]
Unsubscribe : https://launchpad.net/~cuneiform
More help   : https://help.launchpad.net/ListHelp
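A rough sketch of the two workarounds suggested above (splitting the scan, then concatenating per-page output), not a tested recipe. It assumes your cuneiform build accepts "-f text" and is run on one image at a time, and that ImageMagick's convert is installed and supports the "-crop 2x1@" tile syntax. The function names split_pages and ocr_all are made up for illustration.

```shell
# Sketch only: the helper names below are hypothetical.

# split_pages IMAGE
# Cut a two-page scan into equal left and right halves with ImageMagick.
# Writes <name>-0.tiff (left half) and <name>-1.tiff (right half).
split_pages() {
    img=$1
    convert "$img" -crop 2x1@ +repage +adjoin "${img%.*}-%d.tiff"
}

# ocr_all OUTPUT IMAGE...
# Run cuneiform once per image (assumes "-f text" is available) and
# append each page's text to OUTPUT in the order the images were given.
ocr_all() {
    out=$1
    shift
    : > "$out"                      # start from an empty output file
    for img in "$@"; do
        page="${img%.*}.txt"        # per-page result, e.g. scan01.txt
        cuneiform -f text -o "$page" "$img" || return 1
        cat "$page" >> "$out"
    done
}
```

For example, "ocr_all book.txt *.tiff" should leave the combined text in book.txt. The shell expands the glob in sorted order, so zero-padded, consecutively numbered file names keep the pages in sequence.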

