Re: How can i judge a PDF is a Scanned PDF?

Brian L. Matthews Sat, 23 Nov 2024 09:31:13 -0800

On 11/20/24 3:39 AM, Lachezar Dobrev wrote:

To the original poster 'achilles': there is no reliable way to detect whether a PDF file is the result of a document scan process, or has been crafted. However, Ulf Dittmer's suggestion to look for pages with just a big image per page is (probably) the best option.

I kind of do the opposite: extract the text and, if there's more than a certain amount of it, treat it as a true PDF, not scanned. (Actually, I do what the text extractor does, but bail out when I hit the threshold amount of text. You could also bail out if you see a certain number of pages with no text.)
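For anyone who wants to try this, here is a rough sketch of the idea in Python. It is not my actual code, and it leaves the text extraction itself to whatever PDF library you use: the function only expects an iterable that yields the extracted text of each page in order. The name looks_scanned, the parameters min_chars and max_empty_pages, and their default values are all made up for illustration, so tune them for your own documents.

```python
from typing import Iterable


def looks_scanned(page_texts: Iterable[str],
                  min_chars: int = 200,
                  max_empty_pages: int = 5) -> bool:
    """Guess whether a PDF is scanned, from its per-page extracted text.

    page_texts yields the extracted text of each page, in order. Pass a
    generator so that pages after the decision point are never extracted.
    The thresholds are arbitrary illustrative defaults.
    """
    total_chars = 0
    empty_pages = 0
    for text in page_texts:
        stripped = "".join(text.split())  # drop all whitespace before counting
        if stripped:
            total_chars += len(stripped)
            if total_chars >= min_chars:
                return False  # enough text: treat as a true PDF and stop early
        else:
            empty_pages += 1
            if empty_pages >= max_empty_pages:
                return True  # too many pages with no text: treat as scanned
    # Ran out of pages without reaching the text threshold.
    return True
```

Because the function stops consuming the iterable as soon as either threshold is reached, feeding it a generator that extracts one page at a time gives the "bail out" behaviour: later pages are never processed. Note that a very short document with only a little real text also ends up classified as scanned, so min_chars needs to be low enough for your shortest genuine documents.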


Brian

