On 18/4/25 13:10, to...@tuxteam.de wrote:
I'm not sure if it is mentioned but just take a picture of each page and ask
a good Large Language Model to give you a table.
After this, I'd double-check each individual number. You'll never know
if they are being made up, otherwise.


I've been doing this for a couple of years now, scanning bank statements etc.

I had previously tried online PDF-to-text services, with varying results. I also tried PDF-to-text programs, which often gave poor results for tabular data, since what you see is not necessarily how it is stored in the PDF. I had to write extensive Python code to clean up the results, and even then manual edits were sometimes necessary.

At the start, what you say about LLMs was true. Now, with the good paid models, not at all.

I use Claude 3.7 Sonnet almost exclusively as it is really good at this, though I might do an occasional QA check.
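
For what it's worth, one kind of QA check could look roughly like this. It is only a sketch and assumes the statement prints a running balance next to each transaction, which not every bank does; the function name check_balances and the sample figures are made up for illustration.

```python
from decimal import Decimal


def check_balances(opening, rows):
    """Return the 1-based indexes of rows whose balance does not add up.

    `rows` is a list of (amount, balance) strings as extracted from the
    statement; this layout is an assumption, not a fixed format.
    """
    bad = []
    previous = Decimal(opening)
    for i, (amount, balance) in enumerate(rows, start=1):
        if previous + Decimal(amount) != Decimal(balance):
            bad.append(i)
        # Continue from the extracted balance, so a misread amount
        # flags only its own row.
        previous = Decimal(balance)
    return bad


# Hypothetical extracted rows; the third balance is deliberately wrong.
rows = [("-54.20", "945.80"), ("2500.00", "3445.80"), ("-12.00", "3443.80")]
print(check_balances("1000.00", rows))
```

Decimal is used rather than float so that amounts like 54.20 compare exactly.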

Another strategy is to use two models and compare the outputs, or to use the same model in two separate sessions.
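
The comparison itself could be sketched along these lines. This assumes each run's table has been saved as CSV text with the same columns; the name diff_tables and the sample data are hypothetical, not anything from a particular tool.

```python
import csv
import io


def diff_tables(csv_a, csv_b):
    """Return (row, column, value_a, value_b) for every cell that differs."""
    rows_a = list(csv.reader(io.StringIO(csv_a)))
    rows_b = list(csv.reader(io.StringIO(csv_b)))
    diffs = []
    # A differing row count is itself a discrepancy worth a look.
    if len(rows_a) != len(rows_b):
        diffs.append(("row count", None, len(rows_a), len(rows_b)))
    for r, (row_a, row_b) in enumerate(zip(rows_a, rows_b), start=1):
        if len(row_a) != len(row_b):
            diffs.append((r, "column count", len(row_a), len(row_b)))
        for c, (a, b) in enumerate(zip(row_a, row_b), start=1):
            if a.strip() != b.strip():
                diffs.append((r, c, a.strip(), b.strip()))
    return diffs


# Hypothetical outputs from two models (or two sessions) for the same page.
run_1 = "date,description,amount\n2025-04-01,Groceries,-54.20\n2025-04-02,Salary,2500.00\n"
run_2 = "date,description,amount\n2025-04-01,Groceries,-54.70\n2025-04-02,Salary,2500.00\n"

for diff in diff_tables(run_1, run_2):
    print(diff)
```

Only the cells where the two runs disagree then need to be checked against the original page. Agreement between two runs does not prove a value is right, but a disagreement shows where to look.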

If you want hallucinations and failure to follow instructions, just use Gemini.
