branch: externals/scanner
commit 6febfbf0c9e9221ea7ef8aa403a6c3defee5226d
Author: Raffael Stocker <r.stoc...@mnet-mail.de>
Commit: Raffael Stocker <r.stoc...@mnet-mail.de>

update readme and headers in scanner.el
---
 README     | 88 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 Readme.org | 23 +++++++++++++---
 scanner.el | 48 +++-------------------------------
 3 files changed, 111 insertions(+), 48 deletions(-)

diff --git a/README b/README
new file mode 100644
index 0000000000..a867b4e67b
--- /dev/null
+++ b/README
@@ -0,0 +1,88 @@
+Raffael Stocker
+
+
+Table of Contents
+─────────────────
+
+1. Scanner: scan documents and images with Emacs
+2. Configuration
+3. Bugs
+
+
+1 Scanner: scan documents and images with Emacs
+═══════════════════════════════════════════════
+
+ Scan documents and images using `scanimage(1)' from the SANE
+ distribution and `tesseract(1)' for OCR and PDF export. Additionally,
+ `unpaper(1)' can now be used for post-processing the scans obtained
+ from `scanimage' before feeding them into `tesseract'. This is
+ optional, but highly recommended. The source to unpaper is available
+ at <https://github.com/unpaper/unpaper>.
+
+ The scanner package uses two sets of customizations for image mode and
+ document mode, with the former usually configured to use high
+ resolution and an image file format, like JPEG, and the latter to use
+ lower resolution and a document format, like PDF or text. The
+ available file formats are provided by `scanimage(1)' for image mode
+ and `tesseract(1)' for document mode. The scanner package uses
+ `tesseract(1)' to provide optical character recognition (OCR). You
+ can select the language plugins with `scanner-tesseract-languages'.
+ See also the remark about the data directories below.
+
+ In document mode, you can scan one or multiple pages that are then
+ written in a customizable output format, e.g. (searchable) PDF or
+ text, or whatever tesseract provides. You can also customize
+ resolution, intermediate image format, and paper size. The command
+ `scanner-scan-document' starts a document scan. Without a prefix
+ argument, it scans one page. With a non-numeric argument, it asks the
+ user after each scanned page for confirmation to scan another page.
+ With a numeric argument, it scans that many pages. In the latter
+ case, it observes a delay between scans that is customizable using
+ `scanner-scan-delay'.
+
+ The `scanner-scan-image' command performs one scan or multiple scans
+ in image mode. This function tries to guess the file format from the
+ chosen file name or falls back to the configured default, see
+ `scanner-image-format'. The prefix argument works as in document
+ mode.
+
+ The scanning commands are also available in the Tools->Scanner menu.
+
+ For both images and documents, you can customize the scan mode
+ (e.g. "Color" or "Gray") if your scanning device supports it.
+
+ You can pass additional options to the backends using the
+ customization variables `scanner-scanimage-switches' and
+ `scanner-tesseract-switches'. The former variable is helpful for
+ tuning brightness and contrast, for instance.
+
+ Finally, the customization options `scanner-tessdata-dir' and
+ `scanner-tessdata-configdir' must be set to point to tesseract's data
+ directory containing the language definitions (usually something like
+ `/usr/share/tessdata/') and tesseract's configs directory containing
+ the output configurations (usually something like
+ `/usr/share/tessdata/configs/').
+
+
+2 Configuration
+═══════════════
+
+ To use `unpaper', set the customization option `scanner-use-unpaper'
+ to t.
+
+ Scanner defines a keymap that is best bound to some convenient key,
+ for example with `(keymap-global-set "s-s" scanner-map)' or when using
+ `use-package' with `:bind-keymap ("s-s" . scanner-map)' in the
+ use-package specification.
+
+ Most package options are customizations and can but configured in the
+ usual ways.
+
+
+3 Bugs
+══════
+
+ • This package doesn't support document feeders yet.
+ • This package doesn't support authentication.
+ • If a new document scan is started while another is still running,
+ the log will be messed up a bit.
diff --git a/Readme.org b/Readme.org
index 2b750e199d..b3761b530c 100644
--- a/Readme.org
+++ b/Readme.org
@@ -1,8 +1,13 @@
-* Scanner: scan documents and images with Emacs
+#+EXPORT_FILE_NAME: README

- Scan documents and images using =scanimage(1)= from the SANE distribution
- and =tesseract(1)= for OCR and PDF export.
+* Scanner: scan documents and images with Emacs

+ Scan documents and images using =scanimage(1)= from the SANE distribution and
+ =tesseract(1)= for OCR and PDF export. Additionally, =unpaper(1)= can now be
+ used for post-processing the scans obtained from =scanimage= before feeding
+ them into =tesseract=. This is optional, but highly recommended. The source
+ to unpaper is available at https://github.com/unpaper/unpaper.
+
 The scanner package uses two sets of customizations for image mode and
 document mode, with the former usually configured to use high resolution and
 an image file format, like JPEG, and the latter to use lower resolution and
@@ -45,7 +50,19 @@
 output configurations (usually something like =/usr/share/tessdata/configs/=).
+* Configuration
+
+ To use =unpaper=, set the customization option =scanner-use-unpaper= to t.
+
+ Scanner defines a keymap that is best bound to some convenient key, for
+ example with
+ =(keymap-global-set "s-s" scanner-map)=
+ or when using =use-package= with
+ =:bind-keymap ("s-s" . scanner-map)=
+ in the use-package specification.
+ Most package options are customizations and can but configured in the usual
+ ways.
 * Bugs
diff --git a/scanner.el b/scanner.el
index 8664112326..bfbd0f9b9f 100644
--- a/scanner.el
+++ b/scanner.el
@@ -1,6 +1,6 @@
 ;;; scanner.el --- Scan documents and images -*- lexical-binding: t; -*-

-;; Copyright (C) 2020, 2021 Free Software Foundation, Inc
+;; Copyright (C) 2020, 2021, 2025 Free Software Foundation, Inc

 ;; Author: Raffael Stocker <r.stoc...@mnet-mail.de>
 ;; Maintainer: Raffael Stocker <r.stoc...@mnet-mail.de>
@@ -8,7 +8,7 @@
 ;; Version: 0.2
 ;; Package-Requires: ((emacs "25.1") (dash "2.12.0"))
 ;; Keywords: hardware, multimedia
-;; URL: https://gitlab.com/rstocker/scanner.git
+;; URL: https://codeberg.org/rstocker/scanner.git

 ;; This program is free software: you can redistribute it and/or modify
 ;; it under the terms of the GNU General Public License as published by
@@ -26,49 +26,7 @@
 ;;; Commentary:

 ;; Scan documents and images using scanimage(1) from the SANE distribution and
-;; tesseract(1) for OCR and PDF export.
-;;
-;; The scanner package uses two sets of customizations for image mode and
-;; document mode, with the former usually configured to use high resolution
-;; and an image file format, like JPEG, and the latter to use lower resolution
-;; and a document format, like PDF or text. The available file formats are
-;; provided by scanimage(1) for image mode and tesseract(1) for document mode.
-;; The scanner package uses tesseract(1) to provide optical character
-;; recognition (OCR). You can select the language plugins with
-;; ‘scanner-tesseract-languages’. See also the remark about the data
-;; directories below.
-;;
-;; In document mode, you can scan one or multiple pages that are then written
-;; in a customizable output format, e.g. (searchable) PDF or text, or whatever
-;; tesseract provides. You can also customize resolution, intermediate image
-;; format, and paper size. The command ‘scanner-scan-document’ starts a
-;; document scan. Without a prefix argument, it scans one page. With a
-;; non-numeric argument, it asks the user after each scanned page for
-;; confirmation to scan another page. With a numeric argument, it scans that
-;; many pages. In the latter case, it observes a delay between scans that is
-;; customizable using ‘scanner-scan-delay’.
-;;
-;; The ‘scanner-scan-image’ command performs one scan or multiple scans in
-;; image mode. This function tries to guess the file format from the chosen
-;; file name or falls back to the configured default, see
-;; ‘scanner-image-format’. The prefix argument works as in document mode.
-;;
-;; The scanning commands are also available in the Tools->Scanner menu.
-;;
-;; For both images and documents, you can customize the scan mode
-;; (e.g. "Color" or "Gray") if your scanning device supports it.
-;;
-;; You can pass additional options to the backends using the customization
-;; variables ‘scanner-scanimage-switches’ and ‘scanner-tesseract-switches’.
-;; The former variable is helpful for tuning brightness and contrast, for
-;; instance.
-;;
-;; Finally, the customization options ‘scanner-tessdata-dir’ and
-;; ‘scanner-tessdata-configdir’ must be set to point to tesseract's data
-;; directory containing the language definitions (usually something like
-;; /usr/share/tessdata/) and tesseract's configs directory containing the
-;; output configurations (usually something like
-;; /usr/share/tessdata/configs/).
+;; tesseract(1) for OCR and PDF export. Enhance the scan with unpaper(1).

 ;;; Code:
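
For readers of this commit, the configuration described in the new README could look roughly like the following init-file fragment. This is only a sketch and not part of the commit. The key "s-s" and the two tessdata paths are the README's own examples. The value '("eng") and the assumption that `scanner-tesseract-languages' holds a list of language-code strings are illustrative guesses and should be checked against the option's docstring.

```elisp
;; Hypothetical init-file fragment, not taken from the package itself.
;; Paths and the language list are assumptions; adjust them locally.
(use-package scanner
  ;; Bind the package keymap to a prefix key, as the README suggests.
  :bind-keymap ("s-s" . scanner-map)
  :custom
  ;; Post-process scans with unpaper(1) before they reach tesseract(1).
  (scanner-use-unpaper t)
  ;; The README calls these "usually something like" the paths below.
  (scanner-tessdata-dir "/usr/share/tessdata/")
  (scanner-tessdata-configdir "/usr/share/tessdata/configs/")
  ;; Assumed to be a list of tesseract language codes.
  (scanner-tesseract-languages '("eng")))
```

Without use-package, the README's alternative `(keymap-global-set "s-s" scanner-map)' needs Emacs 29 or later and requires the package to be loaded first so that `scanner-map' is defined.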