branch: externals/scanner
commit 4fd44f213fa2f515053f4129061ef6ce35769d59
Author: Raffael Stocker <[email protected]>
Commit: Raffael Stocker <[email protected]>
add documentation of unpaper commands and options
---
scanner.texi | 207 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 204 insertions(+), 3 deletions(-)
diff --git a/scanner.texi b/scanner.texi
index 4c62296e67..94a8480eae 100644
--- a/scanner.texi
+++ b/scanner.texi
@@ -71,7 +71,10 @@ The document was typeset with
@c Insert new nodes with `C-c C-c n'.
@node Overview
@chapter Overview
-@cindex Overview
+@cindex overview
+
+This chapter provides you with the most important information to
+get started using Scanner.
@menu
* Introduction::
@@ -81,7 +84,7 @@ The document was typeset with
@node Introduction
@section Introduction
-@cindex Introduction
+@cindex introduction
If you want to scan a document at high quality with @acronym{OCR,
optical character recognition} and not use one of the available free
@@ -191,6 +194,7 @@ images. These are described below.
@item M-x scanner-scan-document
@itemx C-u M-x scanner-scan-document
@itemx C-u N M-x scanner-scan-document
+@findex scanner-scan-document
Scan a document. When called without a prefix argument, this command
will scan only one page. When called with the default prefix argument
(as @kbd{C-u M-x scanner-scan-document}), it will ask after each scanned
@@ -233,6 +237,7 @@ for a multi-page scan.
@item M-x scanner-scan-image
@itemx C-u M-x scanner-scan-image
@itemx C-u n M-x scanner-scan-image
+@findex scanner-scan-image
Scan an image. When called without a prefix argument, this command
will scan only one image. When called with the default prefix argument
(as @kbd{C-u M-x scanner-scan-image}), it will ask after each scanned
@@ -287,7 +292,7 @@ Scanner menu (@clicksequence{Tools @click{} Scanner}).
@node Configuration Commands
@section Configuration Commands
-@cindex Configuration Commands
+@cindex configuration commands
The following commands help you configure some of the more-often used
options. They only change the options for the running session; if you
@@ -297,6 +302,7 @@ Emacs sessions, use the customization interface.
@table @kbd
@item M-x scanner-set-image-resolution
@item M-x scanner-set-document-resolution
+@findex scanner-set-document-resolution
These commands interactively asks for a resolution (in @acronym{DPI,
dots per inch}) to be used in subsequent image and document scans,
respectively. The corresponding user options is
@@ -310,6 +316,7 @@ and@*
document resolution}.
@item M-x scanner-select-papersize
+@findex scanner-select-papersize
Select a paper size from @code{scanner-paper-sizes} or
@code{:whatever}. See also @code{scanner-doc-papersize}.
@@ -317,6 +324,7 @@ This command is available in the Scanner menu as@*
@clicksequence{Tools @click{} Scanner @click{} Select paper size}.
@item M-x scanner-select-image-size
+@findex scanner-select-image-size
Select an image size. This command interactively reads x and y
dimensions in millimeter from the minibuffer and sets
@code{scanner-image-size} accordingly.
@@ -325,6 +333,7 @@ This command is also available in the Scanner menu as@*
@clicksequence{Tools @click{} Scanner @click{} Select image size}.
@item M-x scanner-select-outputs
+@findex scanner-select-outputs
Select the document outputs. This command reads a list of document
output formats. See also @code{scanner-tesseract-outputs}.
@@ -332,6 +341,7 @@ This command is also available in the Scanner menu as@*
@clicksequence{Tools @click{} Scanner @click{} Select document outputs}.
@item M-x scanner-select-languages
+@findex scanner-select-languages
Select the languages assumed for OCR. This command reads a list of
languages used for OCR. The necessary @command{tesseract} data files
must be available. See @code{scanner-tesseract-languages}.
@@ -341,6 +351,7 @@ This command is also available in the Scanner menu as@*
@item M-x scanner-select-device
@itemx C-u M-x scanner-select-device
+@findex scanner-select-device
Select a device, possibly triggering auto-detection. Normally, manual
device selection is not necessary as @command{scanimage} will
auto-detect. However, if you have multiple devices and want to change
@@ -353,6 +364,133 @@ This command is also available in the Scanner menu as@*
@clicksequence{Tools @click{} Scanner @click{} Select scanning device}
@end table
+The following commands can be found in the ``Scan Enhancement'' submenu
+of the Scanner menu (@clicksequence{Tools @click{} Scanner @click{} Scan
+Enhancement}). They require @command{unpaper} to be installed. Scan
+enhancement allows such post-processing operations as rotation,
+de-noising, and deskewing, among others. It is highly recommended as a
+preparatory step before OCR. The descriptions of the commands below
+give a few hints on the usage of @command{unpaper}. For more details,
+see its man page or website.
+
+@table @kbd
+@item M-x scanner-toggle-use-unpaper
+@findex scanner-toggle-use-unpaper
+Toggle the use of @command{unpaper} for scan enhancement. This command
+changes the option @code{scanner-use-unpaper} during the session. Only
+when this option is non-@code{nil} will @command{unpaper} be used and
+the other items in the ``Scan Enhancement'' menu be available.
+
+This command is also available in the Scanner menu as@*
+@clicksequence{Tools @click{} Scanner @click{} Scan Enhancement @click{}
+Use unpaper for scan enhancement}
+
+The following commands configure some important processing steps; see
+@ref{Configuring unpaper} for all the options.
+
+@item M-x scanner-select-page-layout
+@findex scanner-select-page-layout
+This command interactively asks for the page layout of the pages to be
+scanned. Available options are ``single'', ``double'', and ``none''
+(the default). If you scan a sheet with two pages, for example as with
+a book, you can choose ``double'' here so @command{unpaper} will divide
+the sheet into two output pages. If you use ``single'', it will try to
+identify the actual (single-)page contents on the sheet and stretch
+these to fit the output page size. If you don't want any rearrangement,
+choose ``none''. Note that ``double'' page layout implies a landscape
+orientation. This command sets the option
+@code{scanner-unpaper-page-layout} accordingly. If you want to split up
+an input page into two output pages, you must also use the
+@code{scanner-select-output-pages} command.
+
+This command is also available in the Scanner menu as@*
+@clicksequence{Tools @click{} Scanner @click{} Scan Enhancement @click{}
+Select page layout}
+
+@item M-x scanner-select-input-pages
+@findex scanner-select-input-pages
+This command allows you to select the number of input pages. Available
+options are @code{1} and @code{2}. It sets the option
+@code{scanner-unpaper-input-pages}. If you wanted to combine two
+scanned input pages into one page, for example, to have left and right
+sides on one sheet, you would select two input pages and one output
+page, together with a ``single'' (or ``none'') page layout.
+
+This command is also available in the Scanner menu as@*
+@clicksequence{Tools @click{} Scanner @click{} Scan Enhancement @click{}
+Select number of input pages}
+
+@item M-x scanner-select-output-pages
+@findex scanner-select-output-pages
+This command allows you to select the number of output pages. Available
+options are @code{1} and @code{2}. It sets the option
+@code{scanner-unpaper-output-pages}. If you wanted to split one scanned
+input page into two output pages, for example, to have left and right
+sides from a book on separate pages, you would select one input page and
+two output pages, together with a ``double'' page layout.
+
+This command is also available in the Scanner menu as@*
+@clicksequence{Tools @click{} Scanner @click{} Scan Enhancement @click{}
+Select number of output pages}
+
+@item M-x scanner-select-pre-rotation
+@findex scanner-select-pre-rotation
+This command asks for the rotation to be applied before any further
+processing. Available values are ``clockwise'', ``counter-clockwise'',
+and ``none''. It sets the @code{scanner-unpaper-pre-rotation} option.
+You should use this option if you have a landscape-oriented document
+scanned as portrait. Rotating before further processing is especially
+relevant for scanning double-page documents, as it ensures that the
+document is in the correct orientation before @command{unpaper} tries to
+split pages.
+
+This command is also available in the Scanner menu as@*
+@clicksequence{Tools @click{} Scanner @click{} Scan Enhancement @click{}
+Select page rotation before processing}
+
+@item M-x scanner-select-post-rotation
+@findex scanner-select-post-rotation
+This command asks for the rotation to be applied after all the
+processing. Available values are ``clockwise'', ``counter-clockwise'',
+and ``none''. It sets the @code{scanner-unpaper-post-rotation} option.
+
+This command is also available in the Scanner menu as@*
+@clicksequence{Tools @click{} Scanner @click{} Scan Enhancement @click{}
+Select page rotation after processing}
+
+@item M-x scanner-select-pre-size
+@findex scanner-select-pre-size
+This command interactively asks for the page size to set before further
+processing. The scanned sheets will be scaled to this size. Available
+options are ``a5'', ``a4'', ``a3'', ``a5-landscape'', ``a4-landscape'',
+``a3-landscape'', ``letter'', ``legal'', ``letter-landscape'',
+``legal-landscape'', ``none'', and direct width and height
+specifications as in ``21cm,29.7cm''. See the documentation for
+@command{unpaper} for the understood units. If you choose ``none'', no
+size will be specified in the invocation of @command{unpaper} and it
+will select the size based on the input data.
+
+This command is also available in the Scanner menu as@*
+@clicksequence{Tools @click{} Scanner @click{} Scan Enhancement @click{}
+Select page size before processing}
+
+@item M-x scanner-select-post-size
+@findex scanner-select-post-size
+This command interactively asks for the page size to set after all the
+processing. The processed sheets will be scaled to this size. Available
+options are ``a5'', ``a4'', ``a3'', ``a5-landscape'', ``a4-landscape'',
+``a3-landscape'', ``letter'', ``legal'', ``letter-landscape'',
+``legal-landscape'', ``none'', and direct width and height
+specifications as in ``21cm,29.7cm''. See the documentation for
+@command{unpaper} for the understood units. If you choose ``none'', no
+size will be specified in the invocation of @command{unpaper} and it
+will select the size based on the processed data.
+
+This command is also available in the Scanner menu as@*
+@clicksequence{Tools @click{} Scanner @click{} Scan Enhancement @click{}
+Select page size after processing}
+@end table
+
@node General Options
@section General Options
@@ -531,42 +669,105 @@ are device-dependent.
@cindex configuring unpaper
@defopt scanner-unpaper-program
+This variable contains the path of the @command{unpaper} program.
@end defopt
@defopt scanner-use-unpaper
+If this option is non-@code{nil}, scan enhancement using
+@command{unpaper} is activated. Although using @command{unpaper} is
+highly recommended, its configuration is a bit elaborate and might be
+confusing at first. The default is therefore @code{nil}.
@end defopt
@defopt scanner-unpaper-page-layout
+This option specifies the page layout of the scanned sheets. Allowed
+values are ``single'', ``double'', and ``none'', setting
+@command{unpaper} up for detection of the page extent. Note that
+``double'' implies a landscape orientation. This option corresponds to
+the @option{--layout} option of @command{unpaper}. See its
+documentation for details on the implications of the values. The
+default is ``none''.
@end defopt
@defopt scanner-unpaper-input-pages
+This option selects the number of pages per scanned sheet of input.
+Allowed values are @code{1} and @code{2}. This variable corresponds to
+the @option{--input-pages} option of @command{unpaper}. If set to two
+input pages, @command{unpaper} will pairwise combine input sheets. The
+default is @code{1}.
@end defopt
@defopt scanner-unpaper-output-pages
+This option selects the number of pages per sheet of processed output.
+Allowed values are @code{1} and @code{2}. This variable corresponds to
+the @option{--output-pages} option of @command{unpaper}. If set to two
+output pages, @command{unpaper} will split up every page of processed
+output into two pages. The default is @code{1}.
@end defopt
@defopt scanner-unpaper-pre-rotation
+This option specifies the rotation to be applied before further
+processing. Allowed values are ``clockwise'', ``counter-clockwise'',
+and ``none''.  This variable corresponds to the @option{--pre-rotate}
+option of @command{unpaper}. If you choose ``none'', no rotation is
+specified in the invocation of @command{unpaper}. The default is
+``none''.
@end defopt
@defopt scanner-unpaper-post-rotation
+This option specifies the rotation to be applied after all the
+processing. Allowed values are ``clockwise'', ``counter-clockwise'',
+and ``none''.  This variable corresponds to the @option{--post-rotate}
+option of @command{unpaper}. If you choose ``none'', no rotation is
+specified in the invocation of @command{unpaper}. The default is
+``none''.
@end defopt
@defopt scanner-unpaper-pre-size
+This option specifies the page size to assume before further processing.
+The scanned input will be scaled to this size. Allowed values are
+``a5'', ``a4'', ``a3'', ``a5-landscape'', ``a4-landscape'',
+``a3-landscape'', ``letter'', ``legal'', ``letter-landscape'',
+``legal-landscape'', ``none'', and direct width and height
+specifications as in ``21cm,29.7cm''. This variable corresponds to the
+@option{--size} option of @command{unpaper}. The default is ``a4''.
@end defopt
@defopt scanner-unpaper-post-size
+This option specifies the page size to assume after all the processing.
+The processed output will be scaled to this size. Allowed values are
+``a5'', ``a4'', ``a3'', ``a5-landscape'', ``a4-landscape'',
+``a3-landscape'', ``letter'', ``legal'', ``letter-landscape'',
+``legal-landscape'', ``none'', and direct width and height
+specifications as in ``21cm,29.7cm''. This variable corresponds to the
+@option{--post-size} option of @command{unpaper}. The default is ``a4''.
@end defopt
@defopt scanner-unpaper-border
+This option allows you to force a border of white pixels at the four
+edges of a scanned sheet.  Any list of four integers is allowed, for
+example, @code{(10 10 10 10)} (the default). This is very useful to
+remove black or gray scan artefacts at the edges of a sheet. Even if
+this is not specified, @command{unpaper} will try to detect any such
+artefacts and remove them. However, forcing a border usually leads to
+better results. This variable corresponds to the @option{--border}
+option of @command{unpaper}.
@end defopt
@defopt scanner-unpaper-switches
+Any additional parameters to @command{unpaper} can be specified using
+this option.  Any list of valid @command{unpaper} options, given as
+strings, is allowed.
@end defopt
@node Configuring tesseract
@section Configuring tesseract
@cindex configuring tesseract
+@defopt scanner-tesseract-program
+This option specifies the path of the @command{tesseract} program.
+@end defopt
+
@defopt scanner-tessdata-dir
This option specifies the @file{tessdata} directory. This directory is
supposed to contain the language data files for @command{tesseract}.