branch: externals/scanner
commit b5dbefbc12530523069389814260601fd4881288
Merge: 3de9ddefc7 599cecc8a6
Author: Raffael Stocker <r.stoc...@mnet-mail.de>
Commit: Raffael Stocker <r.stoc...@mnet-mail.de>

    Merge branch 'documentation' into develop/unpaper
---
 .gitignore   |    5 +
 fdl.texi     |  505 ++++++++++++++++++++++++++++
 scanner.texi | 1053 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 1563 insertions(+)

diff --git a/.gitignore b/.gitignore
index 0381e802c3..6a7296b1ef 100644
--- a/.gitignore
+++ b/.gitignore
@@ -12,3 +12,8 @@ TAGS
 *.png
 *.pdf
 *.txt
+*.aux
+*.cp
+*.cps
+*.toc
+*.info
diff --git a/fdl.texi b/fdl.texi
new file mode 100644
index 0000000000..eaf3da0e92
--- /dev/null
+++ b/fdl.texi
@@ -0,0 +1,505 @@
+@c The GNU Free Documentation License.
+@center Version 1.3, 3 November 2008
+
+@c This file is intended to be included within another document,
+@c hence no sectioning command or @node.
+
+@display
+Copyright @copyright{} 2000, 2001, 2002, 2007, 2008 Free Software Foundation, 
Inc.
+@uref{https://fsf.org/}
+
+Everyone is permitted to copy and distribute verbatim copies
+of this license document, but changing it is not allowed.
+@end display
+
+@enumerate 0
+@item
+PREAMBLE
+
+The purpose of this License is to make a manual, textbook, or other
+functional and useful document @dfn{free} in the sense of freedom: to
+assure everyone the effective freedom to copy and redistribute it,
+with or without modifying it, either commercially or noncommercially.
+Secondarily, this License preserves for the author and publisher a way
+to get credit for their work, while not being considered responsible
+for modifications made by others.
+
+This License is a kind of ``copyleft'', which means that derivative
+works of the document must themselves be free in the same sense.  It
+complements the GNU General Public License, which is a copyleft
+license designed for free software.
+
+We have designed this License in order to use it for manuals for free
+software, because free software needs free documentation: a free
+program should come with manuals providing the same freedoms that the
+software does.  But this License is not limited to software manuals;
+it can be used for any textual work, regardless of subject matter or
+whether it is published as a printed book.  We recommend this License
+principally for works whose purpose is instruction or reference.
+
+@item
+APPLICABILITY AND DEFINITIONS
+
+This License applies to any manual or other work, in any medium, that
+contains a notice placed by the copyright holder saying it can be
+distributed under the terms of this License.  Such a notice grants a
+world-wide, royalty-free license, unlimited in duration, to use that
+work under the conditions stated herein.  The ``Document'', below,
+refers to any such manual or work.  Any member of the public is a
+licensee, and is addressed as ``you''.  You accept the license if you
+copy, modify or distribute the work in a way requiring permission
+under copyright law.
+
+A ``Modified Version'' of the Document means any work containing the
+Document or a portion of it, either copied verbatim, or with
+modifications and/or translated into another language.
+
+A ``Secondary Section'' is a named appendix or a front-matter section
+of the Document that deals exclusively with the relationship of the
+publishers or authors of the Document to the Document's overall
+subject (or to related matters) and contains nothing that could fall
+directly within that overall subject.  (Thus, if the Document is in
+part a textbook of mathematics, a Secondary Section may not explain
+any mathematics.)  The relationship could be a matter of historical
+connection with the subject or with related matters, or of legal,
+commercial, philosophical, ethical or political position regarding
+them.
+
+The ``Invariant Sections'' are certain Secondary Sections whose titles
+are designated, as being those of Invariant Sections, in the notice
+that says that the Document is released under this License.  If a
+section does not fit the above definition of Secondary then it is not
+allowed to be designated as Invariant.  The Document may contain zero
+Invariant Sections.  If the Document does not identify any Invariant
+Sections then there are none.
+
+The ``Cover Texts'' are certain short passages of text that are listed,
+as Front-Cover Texts or Back-Cover Texts, in the notice that says that
+the Document is released under this License.  A Front-Cover Text may
+be at most 5 words, and a Back-Cover Text may be at most 25 words.
+
+A ``Transparent'' copy of the Document means a machine-readable copy,
+represented in a format whose specification is available to the
+general public, that is suitable for revising the document
+straightforwardly with generic text editors or (for images composed of
+pixels) generic paint programs or (for drawings) some widely available
+drawing editor, and that is suitable for input to text formatters or
+for automatic translation to a variety of formats suitable for input
+to text formatters.  A copy made in an otherwise Transparent file
+format whose markup, or absence of markup, has been arranged to thwart
+or discourage subsequent modification by readers is not Transparent.
+An image format is not Transparent if used for any substantial amount
+of text.  A copy that is not ``Transparent'' is called ``Opaque''.
+
+Examples of suitable formats for Transparent copies include plain
+ASCII without markup, Texinfo input format, La@TeX{} input
+format, SGML or XML using a publicly available
+DTD, and standard-conforming simple HTML,
+PostScript or PDF designed for human modification.  Examples
+of transparent image formats include PNG, XCF and
+JPG@.  Opaque formats include proprietary formats that can be
+read and edited only by proprietary word processors, SGML or
+XML for which the DTD and/or processing tools are
+not generally available, and the machine-generated HTML,
+PostScript or PDF produced by some word processors for
+output purposes only.
+
+The ``Title Page'' means, for a printed book, the title page itself,
+plus such following pages as are needed to hold, legibly, the material
+this License requires to appear in the title page.  For works in
+formats which do not have any title page as such, ``Title Page'' means
+the text near the most prominent appearance of the work's title,
+preceding the beginning of the body of the text.
+
+The ``publisher'' means any person or entity that distributes copies
+of the Document to the public.
+
+A section ``Entitled XYZ'' means a named subunit of the Document whose
+title either is precisely XYZ or contains XYZ in parentheses following
+text that translates XYZ in another language.  (Here XYZ stands for a
+specific section name mentioned below, such as ``Acknowledgements'',
+``Dedications'', ``Endorsements'', or ``History''.)  To ``Preserve the Title''
+of such a section when you modify the Document means that it remains a
+section ``Entitled XYZ'' according to this definition.
+
+The Document may include Warranty Disclaimers next to the notice which
+states that this License applies to the Document.  These Warranty
+Disclaimers are considered to be included by reference in this
+License, but only as regards disclaiming warranties: any other
+implication that these Warranty Disclaimers may have is void and has
+no effect on the meaning of this License.
+
+@item
+VERBATIM COPYING
+
+You may copy and distribute the Document in any medium, either
+commercially or noncommercially, provided that this License, the
+copyright notices, and the license notice saying this License applies
+to the Document are reproduced in all copies, and that you add no other
+conditions whatsoever to those of this License.  You may not use
+technical measures to obstruct or control the reading or further
+copying of the copies you make or distribute.  However, you may accept
+compensation in exchange for copies.  If you distribute a large enough
+number of copies you must also follow the conditions in section 3.
+
+You may also lend copies, under the same conditions stated above, and
+you may publicly display copies.
+
+@item
+COPYING IN QUANTITY
+
+If you publish printed copies (or copies in media that commonly have
+printed covers) of the Document, numbering more than 100, and the
+Document's license notice requires Cover Texts, you must enclose the
+copies in covers that carry, clearly and legibly, all these Cover
+Texts: Front-Cover Texts on the front cover, and Back-Cover Texts on
+the back cover.  Both covers must also clearly and legibly identify
+you as the publisher of these copies.  The front cover must present
+the full title with all words of the title equally prominent and
+visible.  You may add other material on the covers in addition.
+Copying with changes limited to the covers, as long as they preserve
+the title of the Document and satisfy these conditions, can be treated
+as verbatim copying in other respects.
+
+If the required texts for either cover are too voluminous to fit
+legibly, you should put the first ones listed (as many as fit
+reasonably) on the actual cover, and continue the rest onto adjacent
+pages.
+
+If you publish or distribute Opaque copies of the Document numbering
+more than 100, you must either include a machine-readable Transparent
+copy along with each Opaque copy, or state in or with each Opaque copy
+a computer-network location from which the general network-using
+public has access to download using public-standard network protocols
+a complete Transparent copy of the Document, free of added material.
+If you use the latter option, you must take reasonably prudent steps,
+when you begin distribution of Opaque copies in quantity, to ensure
+that this Transparent copy will remain thus accessible at the stated
+location until at least one year after the last time you distribute an
+Opaque copy (directly or through your agents or retailers) of that
+edition to the public.
+
+It is requested, but not required, that you contact the authors of the
+Document well before redistributing any large number of copies, to give
+them a chance to provide you with an updated version of the Document.
+
+@item
+MODIFICATIONS
+
+You may copy and distribute a Modified Version of the Document under
+the conditions of sections 2 and 3 above, provided that you release
+the Modified Version under precisely this License, with the Modified
+Version filling the role of the Document, thus licensing distribution
+and modification of the Modified Version to whoever possesses a copy
+of it.  In addition, you must do these things in the Modified Version:
+
+@enumerate A
+@item
+Use in the Title Page (and on the covers, if any) a title distinct
+from that of the Document, and from those of previous versions
+(which should, if there were any, be listed in the History section
+of the Document).  You may use the same title as a previous version
+if the original publisher of that version gives permission.
+
+@item
+List on the Title Page, as authors, one or more persons or entities
+responsible for authorship of the modifications in the Modified
+Version, together with at least five of the principal authors of the
+Document (all of its principal authors, if it has fewer than five),
+unless they release you from this requirement.
+
+@item
+State on the Title page the name of the publisher of the
+Modified Version, as the publisher.
+
+@item
+Preserve all the copyright notices of the Document.
+
+@item
+Add an appropriate copyright notice for your modifications
+adjacent to the other copyright notices.
+
+@item
+Include, immediately after the copyright notices, a license notice
+giving the public permission to use the Modified Version under the
+terms of this License, in the form shown in the Addendum below.
+
+@item
+Preserve in that license notice the full lists of Invariant Sections
+and required Cover Texts given in the Document's license notice.
+
+@item
+Include an unaltered copy of this License.
+
+@item
+Preserve the section Entitled ``History'', Preserve its Title, and add
+to it an item stating at least the title, year, new authors, and
+publisher of the Modified Version as given on the Title Page.  If
+there is no section Entitled ``History'' in the Document, create one
+stating the title, year, authors, and publisher of the Document as
+given on its Title Page, then add an item describing the Modified
+Version as stated in the previous sentence.
+
+@item
+Preserve the network location, if any, given in the Document for
+public access to a Transparent copy of the Document, and likewise
+the network locations given in the Document for previous versions
+it was based on.  These may be placed in the ``History'' section.
+You may omit a network location for a work that was published at
+least four years before the Document itself, or if the original
+publisher of the version it refers to gives permission.
+
+@item
+For any section Entitled ``Acknowledgements'' or ``Dedications'', Preserve
+the Title of the section, and preserve in the section all the
+substance and tone of each of the contributor acknowledgements and/or
+dedications given therein.
+
+@item
+Preserve all the Invariant Sections of the Document,
+unaltered in their text and in their titles.  Section numbers
+or the equivalent are not considered part of the section titles.
+
+@item
+Delete any section Entitled ``Endorsements''.  Such a section
+may not be included in the Modified Version.
+
+@item
+Do not retitle any existing section to be Entitled ``Endorsements'' or
+to conflict in title with any Invariant Section.
+
+@item
+Preserve any Warranty Disclaimers.
+@end enumerate
+
+If the Modified Version includes new front-matter sections or
+appendices that qualify as Secondary Sections and contain no material
+copied from the Document, you may at your option designate some or all
+of these sections as invariant.  To do this, add their titles to the
+list of Invariant Sections in the Modified Version's license notice.
+These titles must be distinct from any other section titles.
+
+You may add a section Entitled ``Endorsements'', provided it contains
+nothing but endorsements of your Modified Version by various
+parties---for example, statements of peer review or that the text has
+been approved by an organization as the authoritative definition of a
+standard.
+
+You may add a passage of up to five words as a Front-Cover Text, and a
+passage of up to 25 words as a Back-Cover Text, to the end of the list
+of Cover Texts in the Modified Version.  Only one passage of
+Front-Cover Text and one of Back-Cover Text may be added by (or
+through arrangements made by) any one entity.  If the Document already
+includes a cover text for the same cover, previously added by you or
+by arrangement made by the same entity you are acting on behalf of,
+you may not add another; but you may replace the old one, on explicit
+permission from the previous publisher that added the old one.
+
+The author(s) and publisher(s) of the Document do not by this License
+give permission to use their names for publicity for or to assert or
+imply endorsement of any Modified Version.
+
+@item
+COMBINING DOCUMENTS
+
+You may combine the Document with other documents released under this
+License, under the terms defined in section 4 above for modified
+versions, provided that you include in the combination all of the
+Invariant Sections of all of the original documents, unmodified, and
+list them all as Invariant Sections of your combined work in its
+license notice, and that you preserve all their Warranty Disclaimers.
+
+The combined work need only contain one copy of this License, and
+multiple identical Invariant Sections may be replaced with a single
+copy.  If there are multiple Invariant Sections with the same name but
+different contents, make the title of each such section unique by
+adding at the end of it, in parentheses, the name of the original
+author or publisher of that section if known, or else a unique number.
+Make the same adjustment to the section titles in the list of
+Invariant Sections in the license notice of the combined work.
+
+In the combination, you must combine any sections Entitled ``History''
+in the various original documents, forming one section Entitled
+``History''; likewise combine any sections Entitled ``Acknowledgements'',
+and any sections Entitled ``Dedications''.  You must delete all
+sections Entitled ``Endorsements.''
+
+@item
+COLLECTIONS OF DOCUMENTS
+
+You may make a collection consisting of the Document and other documents
+released under this License, and replace the individual copies of this
+License in the various documents with a single copy that is included in
+the collection, provided that you follow the rules of this License for
+verbatim copying of each of the documents in all other respects.
+
+You may extract a single document from such a collection, and distribute
+it individually under this License, provided you insert a copy of this
+License into the extracted document, and follow this License in all
+other respects regarding verbatim copying of that document.
+
+@item
+AGGREGATION WITH INDEPENDENT WORKS
+
+A compilation of the Document or its derivatives with other separate
+and independent documents or works, in or on a volume of a storage or
+distribution medium, is called an ``aggregate'' if the copyright
+resulting from the compilation is not used to limit the legal rights
+of the compilation's users beyond what the individual works permit.
+When the Document is included in an aggregate, this License does not
+apply to the other works in the aggregate which are not themselves
+derivative works of the Document.
+
+If the Cover Text requirement of section 3 is applicable to these
+copies of the Document, then if the Document is less than one half of
+the entire aggregate, the Document's Cover Texts may be placed on
+covers that bracket the Document within the aggregate, or the
+electronic equivalent of covers if the Document is in electronic form.
+Otherwise they must appear on printed covers that bracket the whole
+aggregate.
+
+@item
+TRANSLATION
+
+Translation is considered a kind of modification, so you may
+distribute translations of the Document under the terms of section 4.
+Replacing Invariant Sections with translations requires special
+permission from their copyright holders, but you may include
+translations of some or all Invariant Sections in addition to the
+original versions of these Invariant Sections.  You may include a
+translation of this License, and all the license notices in the
+Document, and any Warranty Disclaimers, provided that you also include
+the original English version of this License and the original versions
+of those notices and disclaimers.  In case of a disagreement between
+the translation and the original version of this License or a notice
+or disclaimer, the original version will prevail.
+
+If a section in the Document is Entitled ``Acknowledgements'',
+``Dedications'', or ``History'', the requirement (section 4) to Preserve
+its Title (section 1) will typically require changing the actual
+title.
+
+@item
+TERMINATION
+
+You may not copy, modify, sublicense, or distribute the Document
+except as expressly provided under this License.  Any attempt
+otherwise to copy, modify, sublicense, or distribute it is void, and
+will automatically terminate your rights under this License.
+
+However, if you cease all violation of this License, then your license
+from a particular copyright holder is reinstated (a) provisionally,
+unless and until the copyright holder explicitly and finally
+terminates your license, and (b) permanently, if the copyright holder
+fails to notify you of the violation by some reasonable means prior to
+60 days after the cessation.
+
+Moreover, your license from a particular copyright holder is
+reinstated permanently if the copyright holder notifies you of the
+violation by some reasonable means, this is the first time you have
+received notice of violation of this License (for any work) from that
+copyright holder, and you cure the violation prior to 30 days after
+your receipt of the notice.
+
+Termination of your rights under this section does not terminate the
+licenses of parties who have received copies or rights from you under
+this License.  If your rights have been terminated and not permanently
+reinstated, receipt of a copy of some or all of the same material does
+not give you any rights to use it.
+
+@item
+FUTURE REVISIONS OF THIS LICENSE
+
+The Free Software Foundation may publish new, revised versions
+of the GNU Free Documentation License from time to time.  Such new
+versions will be similar in spirit to the present version, but may
+differ in detail to address new problems or concerns.  See
+@uref{https://www.gnu.org/licenses/}.
+
+Each version of the License is given a distinguishing version number.
+If the Document specifies that a particular numbered version of this
+License ``or any later version'' applies to it, you have the option of
+following the terms and conditions either of that specified version or
+of any later version that has been published (not as a draft) by the
+Free Software Foundation.  If the Document does not specify a version
+number of this License, you may choose any version ever published (not
+as a draft) by the Free Software Foundation.  If the Document
+specifies that a proxy can decide which future versions of this
+License can be used, that proxy's public statement of acceptance of a
+version permanently authorizes you to choose that version for the
+Document.
+
+@item
+RELICENSING
+
+``Massive Multiauthor Collaboration Site'' (or ``MMC Site'') means any
+World Wide Web server that publishes copyrightable works and also
+provides prominent facilities for anybody to edit those works.  A
+public wiki that anybody can edit is an example of such a server.  A
+``Massive Multiauthor Collaboration'' (or ``MMC'') contained in the
+site means any set of copyrightable works thus published on the MMC
+site.
+
+``CC-BY-SA'' means the Creative Commons Attribution-Share Alike 3.0
+license published by Creative Commons Corporation, a not-for-profit
+corporation with a principal place of business in San Francisco,
+California, as well as future copyleft versions of that license
+published by that same organization.
+
+``Incorporate'' means to publish or republish a Document, in whole or
+in part, as part of another Document.
+
+An MMC is ``eligible for relicensing'' if it is licensed under this
+License, and if all works that were first published under this License
+somewhere other than this MMC, and subsequently incorporated in whole
+or in part into the MMC, (1) had no cover texts or invariant sections,
+and (2) were thus incorporated prior to November 1, 2008.
+
+The operator of an MMC Site may republish an MMC contained in the site
+under CC-BY-SA on the same site at any time before August 1, 2009,
+provided the MMC is eligible for relicensing.
+
+@end enumerate
+
+@page
+@heading ADDENDUM: How to use this License for your documents
+
+To use this License in a document you have written, include a copy of
+the License in the document and put the following copyright and
+license notices just after the title page:
+
+@smallexample
+@group
+  Copyright (C)  @var{year}  @var{your name}.
+  Permission is granted to copy, distribute and/or modify this document
+  under the terms of the GNU Free Documentation License, Version 1.3
+  or any later version published by the Free Software Foundation;
+  with no Invariant Sections, no Front-Cover Texts, and no Back-Cover
+  Texts.  A copy of the license is included in the section entitled ``GNU
+  Free Documentation License''.
+@end group
+@end smallexample
+
+If you have Invariant Sections, Front-Cover Texts and Back-Cover Texts,
+replace the ``with@dots{}Texts.''@: line with this:
+
+@smallexample
+@group
+    with the Invariant Sections being @var{list their titles}, with
+    the Front-Cover Texts being @var{list}, and with the Back-Cover Texts
+    being @var{list}.
+@end group
+@end smallexample
+
+If you have Invariant Sections without Cover Texts, or some other
+combination of the three, merge those two alternatives to suit the
+situation.
+
+If your document contains nontrivial examples of program code, we
+recommend releasing these examples in parallel under your choice of
+free software license, such as the GNU General Public License,
+to permit their use in free software.
+
+@c Local Variables:
+@c ispell-local-pdict: "ispell-dict"
+@c End:
diff --git a/scanner.texi b/scanner.texi
new file mode 100644
index 0000000000..957e7f809b
--- /dev/null
+++ b/scanner.texi
@@ -0,0 +1,1053 @@
+\input texinfo   @c -*-texinfo-*-
+@c %**start of header
+@setfilename scanner.info
+@c try the time-stamp package for version stuff
+@set VERSION 0.3
+@settitle Scanner Manual @value{VERSION}
+@syncodeindex vr cp
+@syncodeindex fn cp
+@documentencoding UTF-8
+@c %**end of header
+@copying
+This is the @emph{Scanner Manual}, corresponding to version @value{VERSION}.
+
+Copyright @copyright{} 2021 Free Software Foundation, Inc.
+
+@quotation
+Permission is granted to copy, distribute and/or modify this document
+under the terms of the GNU Free Documentation License, Version 1.3
+or any later version published by the Free Software Foundation;
+with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts.
+A copy of the license is included in the section entitled ``GNU
+Free Documentation License''.
+
+A copy of the license is also available from the Free Software
+Foundation Web site at @url{https://www.gnu.org/licenses/fdl.html}.
+
+@end quotation
+
+The document was typeset with
+@uref{http://www.texinfo.org/, GNU Texinfo}.
+
+@end copying
+
+@dircategory Emacs
+@direntry
+* Scanner: (scanner)Document and image scanning in GNU Emacs
+@end direntry
+
+
+@titlepage
+@title Scanner
+@subtitle Document and image scanning in GNU Emacs, version @value{VERSION}
+@author Raffael Stocker
+@page
+@vskip 0pt plus 1filll
+@insertcopying
+@end titlepage
+
+@c Output the table of the contents at the beginning.
+@contents
+
+@ifnottex
+@node Top
+@top Scanner
+
+
+@insertcopying
+@end ifnottex
+
+@c Generate the nodes for this menu with `C-c C-u C-a'.
+@menu
+* Overview::
+* User Options::
+* Improving Scan Quality::
+* News::
+* Reporting Bugs::
+* GNU Free Documentation License::
+* Index::
+@end menu
+
+@c Update all node entries with `C-c C-u C-n'.
+@c Insert new nodes with `C-c C-c n'.
+@node Overview
+@chapter Overview
+@cindex overview
+
+This chapter gives provides you with the most important information to
+get started using Scanner.
+
+@menu
+* Introduction::
+* Basic Setup::
+* Scanning Documents and Images::
+@end menu
+
+@node Introduction
+@section Introduction
+@cindex introduction
+
+If you want to scan a document at high quality with @acronym{OCR,
+optical character recognition} and not use one of the available free
+GUI programs, there are several things you might have to do:
+@itemize
+
+@item
+use a program like @command{scanimage} to obtain an image file from
+your scanner,
+
+@item
+enhance the image quality using a post-processing tool like
+@command{unpaper}, and
+
+@item
+generate a PDF or text file with OCR software like @command{tesseract}.
+@end itemize
+
+Although this is not difficult to do in principle, each of these
+programs requires an elaborate incantation to produce adequate output:
+@itemize
+
+@item
+the scan resolution must be set to something appropriate for later OCR
+(usually 300 to 600 dpi,)
+
+@item
+the page size must be defined,
+
+@item
+perhaps some offsets must be added to page borders,
+
+@item
+the document may have to be rotated,
+
+@item
+some scan artifacts, like shadows, may have to be removed,
+
+@item
+the page may need deskewing,
+
+@item
+the language for OCR must be selected,
+
+@item
+@dots{}
+@end itemize
+
+Luckily, many of these items change rarely or not at all.  Scanner
+uses the customization system of GNU Emacs
+(@pxref{Customization,,,emacs,The GNU Emacs Manual}) to remember the
+necessary settings and takes care of processing using the
+abovementioned programs.
+
+@node Basic Setup
+@section Basic Setup
+@cindex basic setup
+@cindex setup, basic
+@cindex configuration, basic
+@cindex installation
+
+To get started with Scanner, make sure the following programs are
+installed:
+@table @command
+@item scanimage
+Scanimage comes with the sane-backends distribution, see
+@url{http://sane-project.org/}. 
+
+@item tesseract
+Tesseract is used for OCR and PDF generation in document scans.  The
+source is available at @url{https://github.com/tesseract-ocr/tesseract}.
+
+@item unpaper
+Unpaper is used for post-processing the scans obtained from
+@command{scanimage} before feeding them into @command{tesseract}.  This
+is optional, but highly recommended.  The source is available at
+@url{https://github.com/unpaper/unpaper}. 
+@end table
+
+Tesseract is usually provided without the language data files as they
+are very large.  The full set of language files is over 4@dmn{GB}.  Some
+GNU/Linux distributions offer individual language packages; if yours
+does not, you can download the language data files from
+@url{https://github.com/tesseract-ocr/tessdata}.
+
+Make sure the options @code{scanner-scanimage-program},
+@code{scanner-tesseract-program}, and @code{scanner-unpaper-program} are
+set correctly.  Also, the options @code{scanner-tessdata-dir} and
+@code{scanner-tesseract-configdir} must be set correctly so
+@command{tesseract} can find the language data files and output
+configurations. 
+
+Customize the basic options like @code{scanner-doc-papersize},
+@code{scanner-resolution}, @code{scanner-tesseract-languages}, and
+@code{scanner-tesseract-outputs}.  See @ref{User Options} for a detailed
+discussion of all the available options.
+
+
+@node Scanning Documents and Images
+@section Scanning Documents and Images
+@cindex scanning documents and images
+
+The Scanner package provides two commands for scanning documents and
+images.  These are described below.
+
+@table @kbd
+@item M-x scanner-scan-document
+@itemx C-u M-x scanner-scan-document
+@itemx C-u N M-x scanner-scan-document
+@findex scanner-scan-document
+Scan a document.  When called without a prefix argument, this command
+will scan only one page.  When called with the default prefix argument
+(as @kbd{C-u M-x scanner-scan-document}), it will ask after each scanned
+page whether another pages should be scanned.  With a numeric prefix
+argument, it will scan that many pages, waiting a number of seconds
+between each page, as configured in @code{scanner-scan-delay}.
+
+The scan will use the resolution configured in
+@code{scanner-resolution} with the @code{:doc} key.
+
+This command interactively reads a file name that will
+be used as the base name of the output file(s).  The extension of the
+file name is ignored as it is instead specified by the
+@command{tesseract} output formats as configured with the option
+@code{scanner-tesseract-outputs} or the command
+@code{scanner-select-outputs}.
+If the specified file already exists, @code{scanner-scan-document} will
+ask for confirmation to overwrite it.
+
+This command will trigger auto-detection if no device has been
+configured.  If more than one device are available, it will offer ask
+you to select one.
+
+If you configured Scanner to use @command{unpaper}, this command will
+post-process the scans obtained from @command{scanimage} using
+@command{unpaper} before feeding the results to @command{tesseract}.
+See @ref{User Options} to find out how to configure scan and
+post-processing.
+
+The scanning and conversion processes are run asynchronously.  If you
+want to monitor progress, bring up the @code{*Scanner*} buffer which
+collects the outputs of the backend programs.
+
+This command is also available from the Scanner menu as@*
+@clicksequence{Tools @click{} Scanner @click{} Scan a document}@*
+for a single-page scan and@*
+@clicksequence{Tools @click{} Scanner @click{} Scan a multi-page document}@*
+for a multi-page scan.
+
+@item M-x scanner-scan-image
+@itemx C-u M-x scanner-scan-image
+@itemx C-u n M-x scanner-scan-image
+@findex scanner-scan-image
+Scan an image.  When called without a prefix argument, this command
+will scan only one image.  When called with the default prefix argument
+(as @kbd{C-u M-x scanner-scan-image}), it will ask after each scanned
+image whether another image should be scanned.  With a numeric prefix
+argument, it will scan that many images, waiting a number of seconds
+between each image, as configured in @code{scanner-scan-delay}.
+
+The scan will use the resolution configured in
+@code{scanner-resolution} with the @code{:image} key.
+
+This command interactively reads a file name.  The extension of the file
+name specifies the output file format.  If no extension is provided, the
+default image format, as configured in @code{scanner-image-format} will
+be used.  In a multi-image scan, this command will extend the given file
+name base by @var{-number}, where @var{number} is the number of the
+scanned image.  For example, if the file name is @file{image.jpeg}, a
+multi-image scan of @var{n} images will produce the files
+@file{image-1.jpeg}, @file{image-2.jpeg} @dots{} @file{image-n.jpeg}.
+If one of these files already exists, @code{scanner-scan-image} will ask
+for confirmation to overwrite it.
+
+No post-processing with @command{unpaper} or @command{tesseract} is
+done.  See @ref{User Options} to find out how to configure scanning.
+
+This command is also available from the Scanner menu as@*
+@clicksequence{Tools @click{} Scanner @click{} Scan an image}@*
+for a single-page scan and@*
+@clicksequence{Tools @click{} Scanner @click{} Scan multiple images}@*
+for a multi-image scan.
+
+@item M-x scanner-scan-preview
+Make a preview scan.  This command makes a preview scan assuming
+document scan settings.  The resolution of the scan is changed to
+@code{scanner-preview-resolution} and the scan mode is set to ``Gray'',
+otherwise all options stay in effect.  If @code{scanner-use-unpaper} is
+non-@code{nil}, post-processing with unpaper is done as well.  The
+resulting scan is shown in an image window, unless Emacs can't display
+images, in which case a Dired buffer is created showing the files
+generated by the scan.
+
+This command is also available from the Scanner menu as@*
+@clicksequence{Tools @click{} Scanner @click{} Make a preview scan}@*
+@end table
+
+
+@node User Options
+@chapter User Options
+@cindex user options
+
+This chapter lists all the available user options.  All of these
+options can be edited using the customization system of GNU Emacs,
+which is advisable as then basic sanity checks are carried out.  For a
+number of options, interactive commands are available that simplify
+the customization at run time, but don't save the changed values
+between Emacs sessions.  These functions are also available from the
+Scanner menu (@clicksequence{Tools @click{} Scanner}).
+
+@menu
+* Configuration Commands::
+* General Options::
+* Configuring scanimage::
+* Configuring unpaper::
+* Configuring tesseract::
+@end menu
+
+@node Configuration Commands
+@section Configuration Commands
+@cindex configuration commands
+
+The following commands help you configure some of the more-often used
+options.  They only change the options for the running session; if you
+want to permanently set an option, so it will be remembered between
+Emacs sessions, use the customization interface.
+
+@table @kbd
+@item M-x scanner-set-image-resolution
+@item M-x scanner-set-document-resolution
+@findex scanner-set-document-resolution
+@findex scanner-set-image-resolution
+These commands interactively asks for a resolution (in @acronym{DPI,
+dots per inch}) to be used in subsequent image and document scans,
+respectively.  The corresponding user options is
+@code{scanner-resolution}.
+
+These commands are available in the Scanner menu as@*
+@clicksequence{Tools @click{} Scanner @click{} Select image
+resolution}@*
+and@*
+@clicksequence{Tools @click{} Scanner @click{} Select
+document resolution}.
+
+@item M-x scanner-select-papersize
+@findex scanner-select-papersize
+Select a paper size from @code{scanner-paper-sizes} or
+@code{:whatever}.  See also @code{scanner-doc-papersize}.
+
+This command is available in the Scanner menu as@*
+@clicksequence{Tools @click{} Scanner @click{} Select paper size}.
+
+@item M-x scanner-select-image-size
+@findex scanner-select-image-size
+Select an image size.  This command interactively reads x and y
+dimensions in millimeter from the minibuffer and sets
+@code{scanner-image-size} accordingly.
+
+This command is also available in the Scanner menu as@*
+@clicksequence{Tools @click{} Scanner @click{} Select image size}.
+
+@item M-x scanner-select-outputs
+@findex scanner-select-outputs
+Select the document outputs.  This command reads a list of document
+output formats.  See also @code{scanner-tesseract-outputs}.
+
+This command is also available in the Scanner menu as@*
+@clicksequence{Tools @click{} Scanner @click{} Select document outputs}.
+
+@item M-x scanner-select-languages
+@findex scanner-select-languages
+Select the languages assumed for OCR.  This command reads a list of
+languages used for OCR.  The necessary @command{tesseract} data files
+must be available.  See @code{scanner-tesseract-languages}.
+
+This command is also available in the Scanner menu as@*
+@clicksequence{Tools @click{} Scanner @click{} Select OCR languages}.
+
+@item M-x scanner-select-device
+@itemx C-u M-x scanner-select-device
+@findex scanner-select-device
+Select a device, possibly triggering auto-detection.  Normally, manual
+device selection is not necessary as @command{scanimage} will
+auto-detect.  However, if you have multiple devices and want to change
+between them, you can use this command to do so.
+
+When called with a prefix argument, auto-detection is forced even when
+devices have already been detected before.
+
+This command is also available in the Scanner menu as@*
+@clicksequence{Tools @click{} Scanner @click{} Select scanning device}
+
+@item M-x scanner-set-scan-delay
+Set the delay in seconds between scans in multi-page mode.  This
+commands sets the variable @code{scanner-scan-delay}.
+
+This command is also available in the Scanner menu as@*
+@clicksequence{Tools @click{} Scanner @click{} Set delay between scans}
+
+@item M-x scanner-set-brightness
+Set the brightness of the scans.  The available range is
+device-specific.
+
+This command is also available in the Scanner menu as@*
+@clicksequence{Tools @click{} Scanner @click{} Set brightness}
+
+@item M-x scanner-set-contrast
+Set the contrast of the scans.  The available range is
+device-specific.
+
+This command is also available in the Scanner menu as@*
+@clicksequence{Tools @click{} Scanner @click{} Set contrast}
+
+
+@end table
+
+The following commands can be found in the ``Scan Enhancement'' submenu
+of the Scanner menu (@clicksequence{Tools @click{} Scanner @click{} Scan
+Enhancement}).  They require @command{unpaper} to be installed.  Scan
+enhancement allows such post-processing operations as rotation,
+de-noising, and deskewing, among others.  It is highly recommended as a
+preparatory step before OCR.  The descriptions of the commands below
+give a few hints on the usage of @command{unpaper}.  For more details,
+see its man-page or web-site.
+
+@table @kbd
+@item M-x scanner-toggle-use-unpaper
+@findex scanner-toggle-use-unpaper
+Toggle the use of @command{unpaper} for scan enhancement.  This command
+changes the option @code{scanner-use-unpaper} during the session.  Only
+when this option is non-@code{nil} will @command{unpaper} be used and
+the other items in the ``Scan Enhancement'' menu be available.
+
+This command is also available in the Scanner menu as@*
+@clicksequence{Tools @click{} Scanner @click{} Scan Enhancement @click{}
+Use unpaper for scan enhancement}
+
+The following commands configure some important processing steps; see
+@ref{Configuring unpaper} for all the options.
+
+@item M-x scanner-select-page-layout
+@findex scanner-select-page-layout
+This command interactively asks for the page layout of the pages to be
+scanned.  Available options are ``single'', ``double'', and ``none''
+(the default).  If you scan a sheet with two pages, for example as with
+a book, you can choose ``double'' here so @command{unpaper} will divide
+the sheet into two output pages.  If you use ``single'', it will try to
+identify the actual (single-)page contents on the sheet and stretch
+these to fit the output page size.  If you don't want any rearrangement,
+choose ``none''.  Note that ``double'' page layout implies a landscape
+orientation.  This command sets the option
+@code{scanner-unpaper-page-layout} accordingly.  If you want to split up
+an input page into two output pages, you must also use the
+@command{scanner-select-output-pages} command.
+
+This command is also available in the Scanner menu as@*
+@clicksequence{Tools @click{} Scanner @click{} Scan Enhancement @click{}
+Select page layout}
+
+@item M-x scanner-select-input-pages
+@findex scanner-select-input-pages
+This command allows you to select the number of input pages.  Available
+options are @code{1} and @code{2}.  It sets the option
+@code{scanner-unpaper-input-pages}.  If you wanted to combine two
+scanned input pages into one page, for example, to have left and right
+sides on one sheet, you would select two input pages and one output
+page, together with a ``single'' (or ``none'') page layout.
+
+This command is also available in the Scanner menu as@*
+@clicksequence{Tools @click{} Scanner @click{} Scan Enhancement @click{}
+Select number of input pages}
+
+@item M-x scanner-select-output-pages
+@findex scanner-select-output-pages
+This command allows you to select the number of output pages.  Available
+options are @code{1} and @code{2}.  It sets the option
+@code{scanner-unpaper-output-pages}.  If you wanted to split one scanned
+input page into two output pages, for example, to have left and right
+sides from a book on separate pages, you would select one input page and
+two output pages, together with a ``double'' page layout.
+
+This command is also available in the Scanner menu as@*
+@clicksequence{Tools @click{} Scanner @click{} Scan Enhancement @click{}
+Select number of output pages}
+
+@item M-x scanner-select-pre-rotation
+@findex scanner-select-pre-rotation
+This command asks for the rotation to be applied before any further
+processing.  Available values are ``clockwise'', ``counter-clockwise'',
+and ``none''.  It sets the @code{scanner-unpaper-pre-rotation} option.
+You should use this option if you have a landscape-oriented document
+scanned as portrait.  Rotating before further processing is especially
+relevant for scanning double-page documents, as it ensures that the
+document is in the correct orientation before @command{unpaper} tries to
+split pages.
+
+This command is also available in the Scanner menu as@*
+@clicksequence{Tools @click{} Scanner @click{} Scan Enhancement @click{}
+Select page rotation before processing}
+
+@item M-x scanner-select-post-rotation
+@findex scanner-select-post-rotation
+This command asks for the rotation to be applied after all the 
+processing.  Available values are ``clockwise'', ``counter-clockwise'',
+and ``none''.  It sets the @code{scanner-unpaper-post-rotation} option.
+
+This command is also available in the Scanner menu as@*
+@clicksequence{Tools @click{} Scanner @click{} Scan Enhancement @click{}
+Select page rotation after processing}
+
+@item M-x scanner-select-pre-size
+@findex scanner-select-pre-size
+This command interactively asks for the page size to set before further
+processing.  The scanned sheets will be scaled to this size.  Available
+options are ``a5'', ``a4'', ``a3'', ``a5-landscape'', ``a4-landscape'',
+``a3-landscape'', ``letter'', ``legal'', ``letter-landscape'',
+``legal-landscape'', ``none'', and direct width and height
+specifications as in ``21cm,29.7cm''.  See the documentation for
+@command{unpaper} for the understood units.  If you choose ``none'', no
+size will be specified in the invocation of @command{unpaper} and it
+will select the size based on the input data.
+
+This command is also available in the Scanner menu as@*
+@clicksequence{Tools @click{} Scanner @click{} Scan Enhancement @click{}
+Select page size before processing}
+
+@item M-x scanner-select-post-size
+@findex scanner-select-post-size
+This command interactively asks for the page size to set after all the
+processing.  The processed sheets will be scaled to this size.  Available
+options are ``a5'', ``a4'', ``a3'', ``a5-landscape'', ``a4-landscape'',
+``a3-landscape'', ``letter'', ``legal'', ``letter-landscape'',
+``legal-landscape'', ``none'', and direct width and height
+specifications as in ``21cm,29.7cm''.  See the documentation for
+@command{unpaper} for the understood units.  If you choose ``none'', no
+size will be specified in the invocation of @command{unpaper} and it
+will select the size based on the processed data.
+
+This command is also available in the Scanner menu as@*
+@clicksequence{Tools @click{} Scanner @click{} Scan Enhancement @click{}
+Select page size after processing}
+@end table
+
+
+@node General Options
+@section General Options
+@cindex general options
+
+@defopt scanner-resolution
+This option specifies the resolution in DPI used for image and
+document scans as a property list with the keys @code{:image} and
+@code{:doc}, respectively, and integers as values.  The default is:
+@lisp
+(:image 600 :doc 300)
+@end lisp
+The available resolutions depend on your device.
+
+This option can be set per-session with the commands
+@code{scanner-select-image-resolution} and
+@code{scanner-select-document-resolution}.
+@end defopt
+
+@defopt scanner-preview-resolution
+This option specifies the resolution in DPI used in preview scans.  The
+default is 75.
+@end defopt
+
+@defopt scanner-brightness
+This option specifies the brightness setting for scans.  The default is
+20.  This option assumes the @command{scanimage} switch
+@option{--brightness} is available, which is device-specific.
+
+This option can be set with the command @code{scanner-set-brightness}.
+@end defopt
+
+@defopt scanner-contrast
+This option specifies the contrast setting for scans.  The default is
+50.  This option assumes the @command{scanimage} switch
+@option{--contrast} is available, which is device-specific.
+
+This option can be set with the command @code{scanner-set-contrast}.
+@end defopt
+
+@defopt scanner-paper-sizes
+This option holds paper sizes for document scans as a property list with
+the name of the page format as the key (e.g. @code{:a4}) and a list of
+width/height pairs in millimeters as value.  The default is:
+@lisp
+(:a3
+ (297 420)
+ :a4
+ (210 297)
+ :a5
+ (148 210)
+ :a6
+ (105 148)
+ :tabloid
+ (279.4 431.8)
+ :legal
+ (215.9 355.6)
+ :letter
+ (215.9 279.4))
+@end lisp
+@end defopt
+
+@anchor{scanner-doc-papersize}
+@defopt scanner-doc-papersize
+Use this option to select the paper size for the document scans.  The
+value must be one of the keys from @code{scanner-paper-sizes}, or the
+special value @code{:whatever} that lets @command{scanimage} select the
+paper size (usually the available scan area).  The default is
+@code{:a4}.
+
+This option can be set per-session with the command
+@code{scanner-select-papersize}.
+@end defopt
+
+@defopt scanner-image-size
+This option specifies the size used in image scans as a list of width
+and height values in millimeters.  The default is
+@lisp
+(200 250)
+@end lisp
+for an image of 200@dmn{mm} width and 250@dmn{mm} height.  If set to
+nil, the size is determined by @command{scanimage} (usually the available scan
+area.)
+
+This option can be set per-session with the command
+@code{scanner-select-image-size}.
+@end defopt
+
+@defopt scanner-scan-delay
+This option specifies the delay in seconds to wait between pages in a
+multi-page scan.  Set this to something large enough so you can feed
+the next sheet to your scanner before it starts scanning the next
+page.  The default is 3.
+@end defopt
+
+@defopt scanner-reverse-pages
+This option, when set to t, causes Scanner to reverse the order of the
+scanned pages in a document scan.  The default is nil.
+@end defopt
+
+@node Configuring scanimage
+@section Configuring scanimage
+@cindex configuring scanimage
+
+Some of the options @command{scanimage} accepts (and Scanner uses) are
+device-dependent.  To find out which options your scanner hardware
+offers, run @command{scanimage --help} with your scanner plugged in.
+This incantation should print a list of general and device-dependent
+options.
+
+@defopt scanner-scanimage-program
+This option specifies the path of @command{scanimage}.  The default is given by
+@lisp
+(executable-find "scanimage")
+@end lisp
+@end defopt
+
+@defopt scanner-scan-mode
+This option specifies the scan modes for document and image scans.  It
+is a property list with the keys @code{:image} and @code{:doc}, for
+images and documents, respectively, and strings naming the scan modes
+as values.  For example,
+@lisp
+(:image "Color" :doc "Gray")
+@end lisp
+sets ``Color'' mode for image scans and ``Gray'' mode for document
+scans.  The default is to use ``Color'' for both image and document
+scans.
+
+The available scan modes depend on your device.  Usually, ``Lineart'',
+``Gray'', and ``Color'' are available.  For images you probably want
+``Color'', and for good OCR results in document scans, you should
+choose either ``Gray'' or ``Color''.
+@end defopt
+
+@defopt scanner-image-format
+This option sets the default format used by @command{scanimage} for image and
+document scans.  It is a property list similar to
+@code{scanner-scan-mode}.  For example, the default
+@lisp
+(:image "jpeg" :doc "pnm")
+@end lisp
+configures Scanner to use the JPEG format for image scans and the PNM
+format for document scans.  While document scans will always use the
+format specified with this option, you can override the format used in
+image scans with the appropriate file extension, see
+@ref{Scanning Documents and Images}.
+
+The supported formats are documented in the @command{scanimage} manual page.
+For example, version 1.0.31 of @command{scanimage} supports PNM, TIFF, PNG and
+JPEG. 
+
+Note that the document scan format specified with this option is an
+intermediate format, not the document format generated at the end of
+the whole process.  With the PNM format used in the example above, you
+can still have a PDF output, see @ref{scanner-tesseract-outputs}.
+
+If you use @command{unpaper} for post-processing before OCR in document
+scans (@pxref{Configuring unpaper}), the format will silently be forced
+to PNM, as this is required by @command{unpaper}.
+@end defopt
+
+@defopt scanner-device-name
+The device name of the scanner as reported by @command{scanimage}.  The default
+is nil, which prompts Scanner to use @command{scanimage} for automatic
+detection.  The detected device will be stored in this variable and
+used for all subsequent scans, until a new detection is forced either
+by calling @code{scanner-select-device} with a prefix argument, or by
+this device becoming unavailable.
+
+Usually you need not customize this option as auto-detection should
+work just fine.
+@end defopt
+
+@defopt scanner-scanimage-switches
+You may find that additional switches to @command{scanimage} not covered by any
+of the above user options are necessary.  You can use
+@code{scanner-scanimage-switches} for these.  Specify the switches as a
+list of switch/value pairs, such as:
+@lisp
+("--switch1" "value1" "-s" "2")
+@end lisp
+The default is nil.
+@end defopt
+
+
+@node Configuring unpaper
+@section Configuring unpaper
+@cindex configuring unpaper
+
+@defopt scanner-unpaper-program
+This variable contains the path of the @command{unpaper} program.
+@end defopt
+
+@defopt scanner-use-unpaper
+If this option is non-@code{nil}, scan enhancement using
+@command{unpaper} is activated.  Although using @command{unpaper} is
+highly recommended, its configuration is a bit elaborate and might be
+confusing at first.  The default is therefore @code{nil}.
+@end defopt
+
+@defopt scanner-unpaper-page-layout
+This option specifies the page layout of the scanned sheets.  Allowed
+values are ``single'', ``double'', and ``none'', setting
+@command{unpaper} up for detection of the page extent.  Note that
+``double'' implies a landscape orientation.  This option corresponds to
+the @option{--layout} option of @command{unpaper}.  See its
+documentation for details on the implications of the values.  The
+default is ``none''.
+@end defopt
+
+@defopt scanner-unpaper-input-pages
+This option selects the number of pages per scanned sheet of input.
+Allowed values are @code{1} and @code{2}.  This variable corresponds to
+the @option{--input-pages} option of @command{unpaper}.  If set to two
+input pages, @command{unpaper} will pairwise combine input sheets.  The
+default is @code{1}.
+@end defopt
+
+@defopt scanner-unpaper-output-pages
+This option selects the number of pages per sheet of processed output.
+Allowed values are @code{1} and @code{2}.  This variable corresponds to
+the @option{--output-pages} option of @command{unpaper}.  If set to two
+output pages, @command{unpaper} will split up every page of processed
+output into two pages.  The default is @code{1}.
+@end defopt
+
+@defopt scanner-unpaper-pre-rotation
+This option specifies the rotation to be applied before further
+processing.  Allowed values are ``clockwise'', ``counter-clockwise'',
+and ``none''.  This variable corresponds to the @option{--pre-rotation}
+option of @command{unpaper}.  If you choose ``none'', no rotation is
+specified in the invocation of @command{unpaper}.  The default is
+``none.
+@end defopt
+
+@defopt scanner-unpaper-post-rotation
+This option specifies the rotation to be applied after all the
+processing.  Allowed values are ``clockwise'', ``counter-clockwise'',
+and ``none''.  This variable corresponds to the @option{--post-rotation}
+option of @command{unpaper}.  If you choose ``none'', no rotation is
+specified in the invocation of @command{unpaper}.  The default is
+``none.
+@end defopt
+
+@defopt scanner-unpaper-pre-size
+This option specifies the page size to assume before further processing.
+The scanned input will be scaled to this size.  Allowed values are
+``a5'', ``a4'', ``a3'', ``a5-landscape'', ``a4-landscape'',
+``a3-landscape'', ``letter'', ``legal'', ``letter-landscape'',
+``legal-landscape'', ``none'', and direct width and height
+specifications as in ``21cm,29.7cm''.  This variable corresponds to the
+@option{--size} option of @command{unpaper}.  The default is ``a4''.
+@end defopt
+
+@defopt scanner-unpaper-post-size
+This option specifies the page size to assume after all the processing.
+The processed output will be scaled to this size.  Allowed values are
+``a5'', ``a4'', ``a3'', ``a5-landscape'', ``a4-landscape'',
+``a3-landscape'', ``letter'', ``legal'', ``letter-landscape'',
+``legal-landscape'', ``none'', and direct width and height
+specifications as in ``21cm,29.7cm''.  This variable corresponds to the
+@option{--post-size} option of @command{unpaper}.  The default is ``a4''.
+@end defopt
+
+@defopt scanner-unpaper-border
+This option allows you to force a border of white pixels at the four
+edges of a scanned sheet.  Allowed is any list of four integers, for
+example, @code{(10 10 10 10)} (the default).  This is very useful to
+remove black or gray scan artefacts at the edges of a sheet.  Even if
+this is not specified, @command{unpaper} will try to detect any such
+artefacts and remove them.  However, forcing a border usually leads to
+better results.  This variable corresponds to the @option{--border}
+option of @command{unpaper}.
+@end defopt
+
+@defopt scanner-unpaper-switches
+Any additional parameters to @command{unpaper} can be specified using
+this option.  Allowed is any list comprising valid @command{unpaper}
+options as strings.
+@end defopt
+
+@node Configuring tesseract
+@section Configuring tesseract
+@cindex configuring tesseract
+
+@defopt scanner-tesseract-program
+This option specifies the path of the @command{tesseract} program.
+@end defopt
+
+@defopt scanner-tessdata-dir
+This option specifies the @file{tessdata} directory.  This directory is
+supposed to contain the language data files for @command{tesseract}.
+The default is @file{/usr/share/tessdata/}.
+@end defopt
+
+@defopt scanner-tesseract-configdir
+This option specifies the @command{tesseract} @file{configs} directory.
+This directory is supposed to contain the language data files for
+@command{tesseract}.  The default is
+@file{/usr/share/tessdata/configs/}.
+@end defopt
+
+@defopt scanner-tesseract-languages
+This option lists the languages passed to @command{tesseract} as a list
+of strings.  The default is:
+@lisp
+("eng")
+@end lisp
+It is possible to pass more than one language to @command{tesseract},
+which can be useful if you have a multi-language document.  For
+instance,
+@lisp
+("eng" "deu")
+@end lisp
+sets @command{tesseract} up for recognizing english and german language.
+However, for single-language documents, the best results are usually
+obtained when setting only one language.
+
+This option can be set per-session with the command
+@code{scanner-select-languages}.
+@end defopt
+
+@anchor{scanner-tesseract-outputs}
+@defopt scanner-tesseract-outputs
+This option lists the output formats to produce.  The available output
+formats are provided as configuration files in the
+@file{/usr/share/tessdata/configs/} directory.  The default
+@lisp
+("pdf" "txt")
+@end lisp
+causes @command{tesseract} to output both a PDF and a text file.
+
+This option can be set per-session with the command
+@code{scanner-select-outputs}.
+@end defopt
+
+@defopt scanner-tesseract-switches
+You can use this option to specify any additional switches for
+@command{tesseract} not covered by the above options.  Use the same
+format as for @code{scanner-scanimage-switches}.  The default is nil.
+@end defopt
+
+@node Improving Scan Quality
+@chapter Improving Scan Quality
+@cindex improving scan quality
+@cindex scan quality, improving
+@cindex quality, improving
+
+This chapter comprises recommendations for improving the scan or OCR
+quality.  If you know about any additional tips and tricks to improve
+quality, please let the author know about them.
+
+Besides checking the following sections, you might also want to consult
+the documentation for @command{tesseract},
+@url{https://tesseract-ocr.github.io/tessdoc/}, and @command{unpaper},
+@url{https://github.com/unpaper/unpaper/blob/main/doc/basic-concepts.md}
+(basics) and
+@url{https://github.com/unpaper/unpaper/blob/main/doc/image-processing.md}
+(details).
+
+
+@menu
+* Improving General Scan Quality::
+* Improving OCR::
+@end menu
+
+@node Improving General Scan Quality
+@section Improving General Scan Quality
+@cindex improving general scan quality
+
+@table @asis
+@item Image format
+As a lossy format, JPEG is not a good basis for later OCR.  Therefore,
+use PNG, PNM, or TIFF for document scanning.  If you also use
+@command{unpaper}, the image format is forced to PNM, as required by
+this tool.
+
+@item Scan area
+Besides setting the size of the scan area, @command{scanimage} allows
+you to specify offsets to the top and left edges.  The device-specific
+switches @option{-l} and @option{-t}, if available, allow you to specify
+the top-left x and y positions, respectively.  This can be used to get
+rid of some blacked out parts in corners due to the mis-alignment of
+scan area and scanned sheet.
+
+@item Resolution
+For document scans, use at least 300 DPI to achieve acceptable OCR
+results.  A resolution above 600 DPI will not enhance OCR quality any
+further and only leads to larger files.  For most documents, 300 DPI
+should be ok.
+
+@item Brightness and contrast
+Good document quality (and especially good OCR results) require
+sufficient contrast and a good reproduction of the document's background
+color.  If the defaults of your device are inadequate, use the
+brightness and contrast settings of @command{scanimage} to provide
+sensible values.  See @ref{Configuring scanimage} and the options
+@code{scanner-brightness} and @code{scanner-contrast} there.  You may
+want to try a low brightness setting (for example, 20) and a medium
+contrast setting (for example, 50) as a start.
+
+Note that the underlying parameters to @command{scanimage} are
+device-specific.  If the two mentioned options are not supported by your
+device, you may be able to use @code{scanner-scanimage-switches} to
+supply the specific switches to @command{scanimage}.
+
+@item Dark areas and shadows
+If your scan shows dark (black/gray) areas or shadows, for example in
+the fold when scanning a book, use @command{unpaper} to remove them.  If
+it cannot remove these areas automatically, you can manually specify an
+area to be wiped out using the @option{--wipe} switch of
+@command{unpaper}.  If you scan a page with the ``double'' layout and
+want to remove the shadow of a book fold, use the @option{--middle-wipe}
+switch.  You can put these switches into the
+@code{scanner-unpaper-switches} option.  See also the @command{unpaper}
+documentation.
+
+@item Page borders
+If you use @command{unpaper}, it will try to remove dark areas around
+the edges of the page.  If this does not work automatically, use the
+@code{scanner-unpaper-border} option to specify a border (in pixels)
+around the edges of the page that is to be wiped, see @ref{Configuring
+unpaper}.
+
+@end table
+
+@node Improving OCR
+@section Improving OCR
+@cindex improving ocr
+
+@table @asis
+@item Tesseract version
+Use version 4 or higher of @command{tesseract}.  This version includes a
+new OCR engine that delivers better results than the previous one.
+Also, @command{tesseract} is multithreaded starting from version 4 and
+is therefore faster on multi-core machines.
+
+@item Language setting
+Tesseract allows you to use multiple languages.  For single-language
+documents, however, this doesn't seem to be optimal.  It's best to
+choose a single language when possible.
+
+@item Page deskewing
+OCR is quite sensitive to any skew of a page.  Use @command{unpaper} to
+deskew the pages.  See @ref{Configuring unpaper}.
+
+In some cases, @command{unpaper} may not be able to deskew a page
+automatically.  If so, have a look at the deskewing switches of
+@command{unpaper}.  Especially @option{--deskew-scan-step},
+@option{--deskew-scan-deviation}, and @option{--deskew-scan-range} can
+be helpful.  You can put those switches into
+@code{scanner-unpaper-switches}.  See the @command{unpaper}
+documentation for details.
+
+@end table
+
+
+
+
+
+
+@node News
+@chapter News
+@cindex news
+
+This chapter lists the changes made in new releases of Scanner.
+
+@menu
+* Changes in Version 0.3::
+@end menu
+
+@node Changes in Version 0.3
+@section Changes in Version 0.3
+@cindex changes in version 0.3
+
+@itemize
+@item
+@command{unpaper} has been added as a new backend; it allows scan
+enhancement in document scans; see @ref{Configuring unpaper} for
+options.  There is a new ``Scan Enhancement'' sub-menu for this backend.
+@item
+A command for making a preview scan has been added.
+@item
+Menu items and user options for brightness and contrast have been added.
+@item
+A menu item for setting the scan delay has been added.
+@item
+Older @command{scanimage} versions (before 1.0.28) are now supported as well.
+@item
+A new page size @code{:whatever} allows @command{scanimage} to select
+the page size based on the available scan area.
+@item
+Scanner now comes with a manual.
+@end itemize
+
+
+
+@node Reporting Bugs
+@chapter Reporting Bugs
+@cindex reporting bugs
+
+Refer to @uref{https://www.gitlab.com/rstocker/scanner/}
+mention *Scanner* log buffer
+
+
+@node GNU Free Documentation License
+@chapter GNU Free Documentation License
+@c Get fdl.texi from https://www.gnu.org/licenses/fdl.html
+@include fdl.texi
+
+
+@node Index
+@unnumbered Index
+
+@printindex cp
+
+@c combine indices
+
+@bye
+
+@c scanner.texi ends here

Reply via email to