branch: externals/scanner
commit b5dbefbc12530523069389814260601fd4881288
Merge: 3de9ddefc7 599cecc8a6
Author: Raffael Stocker <r.stoc...@mnet-mail.de>
Commit: Raffael Stocker <r.stoc...@mnet-mail.de>
Merge branch 'documentation' into develop/unpaper
---
 .gitignore   |    5 +
 fdl.texi     |  505 ++++++++++++++++++++++++++++
 scanner.texi | 1053 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 1563 insertions(+)

diff --git a/.gitignore b/.gitignore
index 0381e802c3..6a7296b1ef 100644
--- a/.gitignore
+++ b/.gitignore
@@ -12,3 +12,8 @@ TAGS
 *.png
 *.pdf
 *.txt
+*.aux
+*.cp
+*.cps
+*.toc
+*.info
diff --git a/fdl.texi b/fdl.texi
new file mode 100644
index 0000000000..eaf3da0e92
--- /dev/null
+++ b/fdl.texi
@@ -0,0 +1,505 @@
+@c The GNU Free Documentation License.
+@center Version 1.3, 3 November 2008
+
+@c This file is intended to be included within another document,
+@c hence no sectioning command or @node.
+
+@display
+Copyright @copyright{} 2000, 2001, 2002, 2007, 2008 Free Software Foundation, Inc.
+@uref{https://fsf.org/}
+
+Everyone is permitted to copy and distribute verbatim copies
+of this license document, but changing it is not allowed.
+@end display
+
+@enumerate 0
+@item
+PREAMBLE
+
+The purpose of this License is to make a manual, textbook, or other
+functional and useful document @dfn{free} in the sense of freedom: to
+assure everyone the effective freedom to copy and redistribute it,
+with or without modifying it, either commercially or noncommercially.
+Secondarily, this License preserves for the author and publisher a way
+to get credit for their work, while not being considered responsible
+for modifications made by others.
+
+This License is a kind of ``copyleft'', which means that derivative
+works of the document must themselves be free in the same sense. It
+complements the GNU General Public License, which is a copyleft
+license designed for free software.
+
+We have designed this License in order to use it for manuals for free
+software, because free software needs free documentation: a free
+program should come with manuals providing the same freedoms that the
+software does.
But this License is not limited to software manuals; +it can be used for any textual work, regardless of subject matter or +whether it is published as a printed book. We recommend this License +principally for works whose purpose is instruction or reference. + +@item +APPLICABILITY AND DEFINITIONS + +This License applies to any manual or other work, in any medium, that +contains a notice placed by the copyright holder saying it can be +distributed under the terms of this License. Such a notice grants a +world-wide, royalty-free license, unlimited in duration, to use that +work under the conditions stated herein. The ``Document'', below, +refers to any such manual or work. Any member of the public is a +licensee, and is addressed as ``you''. You accept the license if you +copy, modify or distribute the work in a way requiring permission +under copyright law. + +A ``Modified Version'' of the Document means any work containing the +Document or a portion of it, either copied verbatim, or with +modifications and/or translated into another language. + +A ``Secondary Section'' is a named appendix or a front-matter section +of the Document that deals exclusively with the relationship of the +publishers or authors of the Document to the Document's overall +subject (or to related matters) and contains nothing that could fall +directly within that overall subject. (Thus, if the Document is in +part a textbook of mathematics, a Secondary Section may not explain +any mathematics.) The relationship could be a matter of historical +connection with the subject or with related matters, or of legal, +commercial, philosophical, ethical or political position regarding +them. + +The ``Invariant Sections'' are certain Secondary Sections whose titles +are designated, as being those of Invariant Sections, in the notice +that says that the Document is released under this License. 
If a +section does not fit the above definition of Secondary then it is not +allowed to be designated as Invariant. The Document may contain zero +Invariant Sections. If the Document does not identify any Invariant +Sections then there are none. + +The ``Cover Texts'' are certain short passages of text that are listed, +as Front-Cover Texts or Back-Cover Texts, in the notice that says that +the Document is released under this License. A Front-Cover Text may +be at most 5 words, and a Back-Cover Text may be at most 25 words. + +A ``Transparent'' copy of the Document means a machine-readable copy, +represented in a format whose specification is available to the +general public, that is suitable for revising the document +straightforwardly with generic text editors or (for images composed of +pixels) generic paint programs or (for drawings) some widely available +drawing editor, and that is suitable for input to text formatters or +for automatic translation to a variety of formats suitable for input +to text formatters. A copy made in an otherwise Transparent file +format whose markup, or absence of markup, has been arranged to thwart +or discourage subsequent modification by readers is not Transparent. +An image format is not Transparent if used for any substantial amount +of text. A copy that is not ``Transparent'' is called ``Opaque''. + +Examples of suitable formats for Transparent copies include plain +ASCII without markup, Texinfo input format, La@TeX{} input +format, SGML or XML using a publicly available +DTD, and standard-conforming simple HTML, +PostScript or PDF designed for human modification. Examples +of transparent image formats include PNG, XCF and +JPG@. 
Opaque formats include proprietary formats that can be +read and edited only by proprietary word processors, SGML or +XML for which the DTD and/or processing tools are +not generally available, and the machine-generated HTML, +PostScript or PDF produced by some word processors for +output purposes only. + +The ``Title Page'' means, for a printed book, the title page itself, +plus such following pages as are needed to hold, legibly, the material +this License requires to appear in the title page. For works in +formats which do not have any title page as such, ``Title Page'' means +the text near the most prominent appearance of the work's title, +preceding the beginning of the body of the text. + +The ``publisher'' means any person or entity that distributes copies +of the Document to the public. + +A section ``Entitled XYZ'' means a named subunit of the Document whose +title either is precisely XYZ or contains XYZ in parentheses following +text that translates XYZ in another language. (Here XYZ stands for a +specific section name mentioned below, such as ``Acknowledgements'', +``Dedications'', ``Endorsements'', or ``History''.) To ``Preserve the Title'' +of such a section when you modify the Document means that it remains a +section ``Entitled XYZ'' according to this definition. + +The Document may include Warranty Disclaimers next to the notice which +states that this License applies to the Document. These Warranty +Disclaimers are considered to be included by reference in this +License, but only as regards disclaiming warranties: any other +implication that these Warranty Disclaimers may have is void and has +no effect on the meaning of this License. 
+ +@item +VERBATIM COPYING + +You may copy and distribute the Document in any medium, either +commercially or noncommercially, provided that this License, the +copyright notices, and the license notice saying this License applies +to the Document are reproduced in all copies, and that you add no other +conditions whatsoever to those of this License. You may not use +technical measures to obstruct or control the reading or further +copying of the copies you make or distribute. However, you may accept +compensation in exchange for copies. If you distribute a large enough +number of copies you must also follow the conditions in section 3. + +You may also lend copies, under the same conditions stated above, and +you may publicly display copies. + +@item +COPYING IN QUANTITY + +If you publish printed copies (or copies in media that commonly have +printed covers) of the Document, numbering more than 100, and the +Document's license notice requires Cover Texts, you must enclose the +copies in covers that carry, clearly and legibly, all these Cover +Texts: Front-Cover Texts on the front cover, and Back-Cover Texts on +the back cover. Both covers must also clearly and legibly identify +you as the publisher of these copies. The front cover must present +the full title with all words of the title equally prominent and +visible. You may add other material on the covers in addition. +Copying with changes limited to the covers, as long as they preserve +the title of the Document and satisfy these conditions, can be treated +as verbatim copying in other respects. + +If the required texts for either cover are too voluminous to fit +legibly, you should put the first ones listed (as many as fit +reasonably) on the actual cover, and continue the rest onto adjacent +pages. 
+ +If you publish or distribute Opaque copies of the Document numbering +more than 100, you must either include a machine-readable Transparent +copy along with each Opaque copy, or state in or with each Opaque copy +a computer-network location from which the general network-using +public has access to download using public-standard network protocols +a complete Transparent copy of the Document, free of added material. +If you use the latter option, you must take reasonably prudent steps, +when you begin distribution of Opaque copies in quantity, to ensure +that this Transparent copy will remain thus accessible at the stated +location until at least one year after the last time you distribute an +Opaque copy (directly or through your agents or retailers) of that +edition to the public. + +It is requested, but not required, that you contact the authors of the +Document well before redistributing any large number of copies, to give +them a chance to provide you with an updated version of the Document. + +@item +MODIFICATIONS + +You may copy and distribute a Modified Version of the Document under +the conditions of sections 2 and 3 above, provided that you release +the Modified Version under precisely this License, with the Modified +Version filling the role of the Document, thus licensing distribution +and modification of the Modified Version to whoever possesses a copy +of it. In addition, you must do these things in the Modified Version: + +@enumerate A +@item +Use in the Title Page (and on the covers, if any) a title distinct +from that of the Document, and from those of previous versions +(which should, if there were any, be listed in the History section +of the Document). You may use the same title as a previous version +if the original publisher of that version gives permission. 
+ +@item +List on the Title Page, as authors, one or more persons or entities +responsible for authorship of the modifications in the Modified +Version, together with at least five of the principal authors of the +Document (all of its principal authors, if it has fewer than five), +unless they release you from this requirement. + +@item +State on the Title page the name of the publisher of the +Modified Version, as the publisher. + +@item +Preserve all the copyright notices of the Document. + +@item +Add an appropriate copyright notice for your modifications +adjacent to the other copyright notices. + +@item +Include, immediately after the copyright notices, a license notice +giving the public permission to use the Modified Version under the +terms of this License, in the form shown in the Addendum below. + +@item +Preserve in that license notice the full lists of Invariant Sections +and required Cover Texts given in the Document's license notice. + +@item +Include an unaltered copy of this License. + +@item +Preserve the section Entitled ``History'', Preserve its Title, and add +to it an item stating at least the title, year, new authors, and +publisher of the Modified Version as given on the Title Page. If +there is no section Entitled ``History'' in the Document, create one +stating the title, year, authors, and publisher of the Document as +given on its Title Page, then add an item describing the Modified +Version as stated in the previous sentence. + +@item +Preserve the network location, if any, given in the Document for +public access to a Transparent copy of the Document, and likewise +the network locations given in the Document for previous versions +it was based on. These may be placed in the ``History'' section. +You may omit a network location for a work that was published at +least four years before the Document itself, or if the original +publisher of the version it refers to gives permission. 
+ +@item +For any section Entitled ``Acknowledgements'' or ``Dedications'', Preserve +the Title of the section, and preserve in the section all the +substance and tone of each of the contributor acknowledgements and/or +dedications given therein. + +@item +Preserve all the Invariant Sections of the Document, +unaltered in their text and in their titles. Section numbers +or the equivalent are not considered part of the section titles. + +@item +Delete any section Entitled ``Endorsements''. Such a section +may not be included in the Modified Version. + +@item +Do not retitle any existing section to be Entitled ``Endorsements'' or +to conflict in title with any Invariant Section. + +@item +Preserve any Warranty Disclaimers. +@end enumerate + +If the Modified Version includes new front-matter sections or +appendices that qualify as Secondary Sections and contain no material +copied from the Document, you may at your option designate some or all +of these sections as invariant. To do this, add their titles to the +list of Invariant Sections in the Modified Version's license notice. +These titles must be distinct from any other section titles. + +You may add a section Entitled ``Endorsements'', provided it contains +nothing but endorsements of your Modified Version by various +parties---for example, statements of peer review or that the text has +been approved by an organization as the authoritative definition of a +standard. + +You may add a passage of up to five words as a Front-Cover Text, and a +passage of up to 25 words as a Back-Cover Text, to the end of the list +of Cover Texts in the Modified Version. Only one passage of +Front-Cover Text and one of Back-Cover Text may be added by (or +through arrangements made by) any one entity. 
If the Document already +includes a cover text for the same cover, previously added by you or +by arrangement made by the same entity you are acting on behalf of, +you may not add another; but you may replace the old one, on explicit +permission from the previous publisher that added the old one. + +The author(s) and publisher(s) of the Document do not by this License +give permission to use their names for publicity for or to assert or +imply endorsement of any Modified Version. + +@item +COMBINING DOCUMENTS + +You may combine the Document with other documents released under this +License, under the terms defined in section 4 above for modified +versions, provided that you include in the combination all of the +Invariant Sections of all of the original documents, unmodified, and +list them all as Invariant Sections of your combined work in its +license notice, and that you preserve all their Warranty Disclaimers. + +The combined work need only contain one copy of this License, and +multiple identical Invariant Sections may be replaced with a single +copy. If there are multiple Invariant Sections with the same name but +different contents, make the title of each such section unique by +adding at the end of it, in parentheses, the name of the original +author or publisher of that section if known, or else a unique number. +Make the same adjustment to the section titles in the list of +Invariant Sections in the license notice of the combined work. + +In the combination, you must combine any sections Entitled ``History'' +in the various original documents, forming one section Entitled +``History''; likewise combine any sections Entitled ``Acknowledgements'', +and any sections Entitled ``Dedications''. 
You must delete all +sections Entitled ``Endorsements.'' + +@item +COLLECTIONS OF DOCUMENTS + +You may make a collection consisting of the Document and other documents +released under this License, and replace the individual copies of this +License in the various documents with a single copy that is included in +the collection, provided that you follow the rules of this License for +verbatim copying of each of the documents in all other respects. + +You may extract a single document from such a collection, and distribute +it individually under this License, provided you insert a copy of this +License into the extracted document, and follow this License in all +other respects regarding verbatim copying of that document. + +@item +AGGREGATION WITH INDEPENDENT WORKS + +A compilation of the Document or its derivatives with other separate +and independent documents or works, in or on a volume of a storage or +distribution medium, is called an ``aggregate'' if the copyright +resulting from the compilation is not used to limit the legal rights +of the compilation's users beyond what the individual works permit. +When the Document is included in an aggregate, this License does not +apply to the other works in the aggregate which are not themselves +derivative works of the Document. + +If the Cover Text requirement of section 3 is applicable to these +copies of the Document, then if the Document is less than one half of +the entire aggregate, the Document's Cover Texts may be placed on +covers that bracket the Document within the aggregate, or the +electronic equivalent of covers if the Document is in electronic form. +Otherwise they must appear on printed covers that bracket the whole +aggregate. + +@item +TRANSLATION + +Translation is considered a kind of modification, so you may +distribute translations of the Document under the terms of section 4. 
+Replacing Invariant Sections with translations requires special +permission from their copyright holders, but you may include +translations of some or all Invariant Sections in addition to the +original versions of these Invariant Sections. You may include a +translation of this License, and all the license notices in the +Document, and any Warranty Disclaimers, provided that you also include +the original English version of this License and the original versions +of those notices and disclaimers. In case of a disagreement between +the translation and the original version of this License or a notice +or disclaimer, the original version will prevail. + +If a section in the Document is Entitled ``Acknowledgements'', +``Dedications'', or ``History'', the requirement (section 4) to Preserve +its Title (section 1) will typically require changing the actual +title. + +@item +TERMINATION + +You may not copy, modify, sublicense, or distribute the Document +except as expressly provided under this License. Any attempt +otherwise to copy, modify, sublicense, or distribute it is void, and +will automatically terminate your rights under this License. + +However, if you cease all violation of this License, then your license +from a particular copyright holder is reinstated (a) provisionally, +unless and until the copyright holder explicitly and finally +terminates your license, and (b) permanently, if the copyright holder +fails to notify you of the violation by some reasonable means prior to +60 days after the cessation. + +Moreover, your license from a particular copyright holder is +reinstated permanently if the copyright holder notifies you of the +violation by some reasonable means, this is the first time you have +received notice of violation of this License (for any work) from that +copyright holder, and you cure the violation prior to 30 days after +your receipt of the notice. 
+ +Termination of your rights under this section does not terminate the +licenses of parties who have received copies or rights from you under +this License. If your rights have been terminated and not permanently +reinstated, receipt of a copy of some or all of the same material does +not give you any rights to use it. + +@item +FUTURE REVISIONS OF THIS LICENSE + +The Free Software Foundation may publish new, revised versions +of the GNU Free Documentation License from time to time. Such new +versions will be similar in spirit to the present version, but may +differ in detail to address new problems or concerns. See +@uref{https://www.gnu.org/licenses/}. + +Each version of the License is given a distinguishing version number. +If the Document specifies that a particular numbered version of this +License ``or any later version'' applies to it, you have the option of +following the terms and conditions either of that specified version or +of any later version that has been published (not as a draft) by the +Free Software Foundation. If the Document does not specify a version +number of this License, you may choose any version ever published (not +as a draft) by the Free Software Foundation. If the Document +specifies that a proxy can decide which future versions of this +License can be used, that proxy's public statement of acceptance of a +version permanently authorizes you to choose that version for the +Document. + +@item +RELICENSING + +``Massive Multiauthor Collaboration Site'' (or ``MMC Site'') means any +World Wide Web server that publishes copyrightable works and also +provides prominent facilities for anybody to edit those works. A +public wiki that anybody can edit is an example of such a server. A +``Massive Multiauthor Collaboration'' (or ``MMC'') contained in the +site means any set of copyrightable works thus published on the MMC +site. 
+ +``CC-BY-SA'' means the Creative Commons Attribution-Share Alike 3.0 +license published by Creative Commons Corporation, a not-for-profit +corporation with a principal place of business in San Francisco, +California, as well as future copyleft versions of that license +published by that same organization. + +``Incorporate'' means to publish or republish a Document, in whole or +in part, as part of another Document. + +An MMC is ``eligible for relicensing'' if it is licensed under this +License, and if all works that were first published under this License +somewhere other than this MMC, and subsequently incorporated in whole +or in part into the MMC, (1) had no cover texts or invariant sections, +and (2) were thus incorporated prior to November 1, 2008. + +The operator of an MMC Site may republish an MMC contained in the site +under CC-BY-SA on the same site at any time before August 1, 2009, +provided the MMC is eligible for relicensing. + +@end enumerate + +@page +@heading ADDENDUM: How to use this License for your documents + +To use this License in a document you have written, include a copy of +the License in the document and put the following copyright and +license notices just after the title page: + +@smallexample +@group + Copyright (C) @var{year} @var{your name}. + Permission is granted to copy, distribute and/or modify this document + under the terms of the GNU Free Documentation License, Version 1.3 + or any later version published by the Free Software Foundation; + with no Invariant Sections, no Front-Cover Texts, and no Back-Cover + Texts. A copy of the license is included in the section entitled ``GNU + Free Documentation License''. 
+@end group
+@end smallexample
+
+If you have Invariant Sections, Front-Cover Texts and Back-Cover Texts,
+replace the ``with@dots{}Texts.''@: line with this:
+
+@smallexample
+@group
+ with the Invariant Sections being @var{list their titles}, with
+ the Front-Cover Texts being @var{list}, and with the Back-Cover Texts
+ being @var{list}.
+@end group
+@end smallexample
+
+If you have Invariant Sections without Cover Texts, or some other
+combination of the three, merge those two alternatives to suit the
+situation.
+
+If your document contains nontrivial examples of program code, we
+recommend releasing these examples in parallel under your choice of
+free software license, such as the GNU General Public License,
+to permit their use in free software.
+
+@c Local Variables:
+@c ispell-local-pdict: "ispell-dict"
+@c End:
diff --git a/scanner.texi b/scanner.texi
new file mode 100644
index 0000000000..957e7f809b
--- /dev/null
+++ b/scanner.texi
@@ -0,0 +1,1053 @@
+\input texinfo @c -*-texinfo-*-
+@c %**start of header
+@setfilename scanner.info
+@c try the time-stamp package for version stuff
+@set VERSION 0.3
+@settitle Scanner Manual @value{VERSION}
+@syncodeindex vr cp
+@syncodeindex fn cp
+@documentencoding UTF-8
+@c %**end of header
+@copying
+This is the @emph{Scanner Manual}, corresponding to version @value{VERSION}.
+
+Copyright @copyright{} 2021 Free Software Foundation, Inc.
+
+@quotation
+Permission is granted to copy, distribute and/or modify this document
+under the terms of the GNU Free Documentation License, Version 1.3
+or any later version published by the Free Software Foundation;
+with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts.
+A copy of the license is included in the section entitled ``GNU
+Free Documentation License''.
+
+A copy of the license is also available from the Free Software
+Foundation Web site at @url{https://www.gnu.org/licenses/fdl.html}.
+
+@end quotation
+
+The document was typeset with
+@uref{http://www.texinfo.org/, GNU Texinfo}.
+
+@end copying
+
+@dircategory Emacs
+@direntry
+* Scanner: (scanner).           Document and image scanning in GNU Emacs.
+@end direntry
+
+
+@titlepage
+@title Scanner
+@subtitle Document and image scanning in GNU Emacs, version @value{VERSION}
+@author Raffael Stocker
+@page
+@vskip 0pt plus 1filll
+@insertcopying
+@end titlepage
+
+@c Output the table of contents at the beginning.
+@contents
+
+@ifnottex
+@node Top
+@top Scanner
+
+
+@insertcopying
+@end ifnottex
+
+@c Generate the nodes for this menu with `C-c C-u C-a'.
+@menu
+* Overview::
+* User Options::
+* Improving Scan Quality::
+* News::
+* Reporting Bugs::
+* GNU Free Documentation License::
+* Index::
+@end menu
+
+@c Update all node entries with `C-c C-u C-n'.
+@c Insert new nodes with `C-c C-c n'.
+@node Overview
+@chapter Overview
+@cindex overview
+
+This chapter provides the most important information you need to get
+started with Scanner.
+
+@menu
+* Introduction::
+* Basic Setup::
+* Scanning Documents and Images::
+@end menu
+
+@node Introduction
+@section Introduction
+@cindex introduction
+
+If you want to scan a document at high quality with @acronym{OCR,
+optical character recognition} without using one of the available free
+GUI programs, there are several things you may have to do:
+@itemize
+
+@item
+use a program like @command{scanimage} to obtain an image file from
+your scanner,
+
+@item
+enhance the image quality using a post-processing tool like
+@command{unpaper}, and
+
+@item
+generate a PDF or text file with OCR software like @command{tesseract}.
+@end itemize
+
+Although this is not difficult in principle, each of these programs
+requires an elaborate incantation to produce adequate output:
+@itemize
+
+@item
+the scan resolution must be set to a value appropriate for later OCR
+(usually 300 to 600 dpi),
+
+@item
+the page size must be defined,
+
+@item
+perhaps some offsets must be added to page borders,
+
+@item
+the document may have to be rotated,
+
+@item
+some scan artifacts, like shadows, may have to be removed,
+
+@item
+the page may need deskewing,
+
+@item
+the language for OCR must be selected,
+
+@item
+@dots{}
+@end itemize
+
+Luckily, many of these items change rarely or not at all. Scanner
+uses the customization system of GNU Emacs
+(@pxref{Customization,,,emacs,The GNU Emacs Manual}) to remember the
+necessary settings and runs the above-mentioned programs to do the
+processing.
+
+@node Basic Setup
+@section Basic Setup
+@cindex basic setup
+@cindex setup, basic
+@cindex configuration, basic
+@cindex installation
+
+To get started with Scanner, make sure the following programs are
+installed:
+@table @command
+@item scanimage
+Scanimage is part of the sane-backends distribution; see
+@url{http://sane-project.org/}.
+
+@item tesseract
+Tesseract is used for OCR and PDF generation in document scans. The
+source is available at @url{https://github.com/tesseract-ocr/tesseract}.
+
+@item unpaper
+Unpaper is used for post-processing the scans obtained from
+@command{scanimage} before feeding them into @command{tesseract}. This
+is optional, but highly recommended. The source is available at
+@url{https://github.com/unpaper/unpaper}.
+@end table
+
+Tesseract is usually provided without the language data files because
+they are very large; the full set of language files is over 4@dmn{GB}.
+Some GNU/Linux distributions offer individual language packages; if
+yours does not, you can download the language data files from
+@url{https://github.com/tesseract-ocr/tessdata}.
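Before you configure Scanner, it can help to check that the programs listed above are on your search path and that Tesseract's language data is present. The following is a minimal sketch in Python. It is not part of Scanner. It assumes the default program names (`scanimage`, `tesseract`, `unpaper`) and Tesseract's usual `<lang>.traineddata` file naming, for example `eng.traineddata`. The helper names and the tessdata path are hypothetical placeholders.

```python
import os
import shutil

# Assumed default program names; they should match scanner-scanimage-program,
# scanner-tesseract-program and scanner-unpaper-program if you changed those.
REQUIRED_PROGRAMS = ("scanimage", "tesseract")
OPTIONAL_PROGRAMS = ("unpaper",)  # optional, but highly recommended


def missing_programs(programs, which=shutil.which):
    """Return the programs that cannot be found on PATH."""
    return [p for p in programs if which(p) is None]


def missing_languages(tessdata_dir, languages):
    """Return the languages that have no <lang>.traineddata file in tessdata_dir."""
    return [
        lang
        for lang in languages
        if not os.path.isfile(os.path.join(tessdata_dir, lang + ".traineddata"))
    ]


if __name__ == "__main__":
    for prog in missing_programs(REQUIRED_PROGRAMS):
        print("required program not found:", prog)
    for prog in missing_programs(OPTIONAL_PROGRAMS):
        print("optional program not found:", prog)
    # Hypothetical location; use the directory you set in scanner-tessdata-dir.
    tessdata = "/usr/share/tessdata"
    for lang in missing_languages(tessdata, ["eng"]):
        print("no language data for:", lang)
```

The `which` parameter can be replaced, so the program check can be exercised without the tools installed. The tessdata location varies between distributions, so pass the directory you actually use.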
+
+Make sure the options @code{scanner-scanimage-program},
+@code{scanner-tesseract-program}, and @code{scanner-unpaper-program} are
+set correctly. Also, the options @code{scanner-tessdata-dir} and
+@code{scanner-tesseract-configdir} must be set correctly so that
+@command{tesseract} can find the language data files and output
+configurations.
+
+Then customize basic options such as @code{scanner-doc-papersize},
+@code{scanner-resolution}, @code{scanner-tesseract-languages}, and
+@code{scanner-tesseract-outputs}. See @ref{User Options} for a detailed
+discussion of all the available options.
+
+
+@node Scanning Documents and Images
+@section Scanning Documents and Images
+@cindex scanning documents and images
+
+The Scanner package provides three commands: two for scanning documents
+and images, and one for making preview scans. These are described below.
+
+@table @kbd
+@item M-x scanner-scan-document
+@itemx C-u M-x scanner-scan-document
+@itemx C-u @var{n} M-x scanner-scan-document
+@findex scanner-scan-document
+Scan a document. When called without a prefix argument, this command
+will scan only one page. When called with the default prefix argument
+(as @kbd{C-u M-x scanner-scan-document}), it will ask after each scanned
+page whether another page should be scanned. With a numeric prefix
+argument @var{n}, it will scan @var{n} pages, waiting between pages for
+the number of seconds configured in @code{scanner-scan-delay}.
+
+The scan will use the resolution configured in
+@code{scanner-resolution} under the @code{:doc} key.
+
+This command interactively reads a file name that will be used as the
+base name of the output file(s). The extension of the file name is
+ignored; the output file types are determined instead by the
+@command{tesseract} output formats configured with the option
+@code{scanner-tesseract-outputs} or the command
+@code{scanner-select-outputs}.
+If the specified file already exists, @code{scanner-scan-document} will
+ask for confirmation to overwrite it.
+
+This command will trigger auto-detection if no device has been
+configured. If more than one device is available, it will ask you to
+select one.
+
+If you configured Scanner to use @command{unpaper}, this command will
+post-process the scans obtained from @command{scanimage} using
+@command{unpaper} before feeding the results to @command{tesseract}.
+See @ref{User Options} to find out how to configure scanning and
+post-processing.
+
+The scanning and conversion processes run asynchronously. To monitor
+progress, bring up the @code{*Scanner*} buffer, which collects the
+outputs of the backend programs.
+
+This command is also available from the Scanner menu as@*
+@clicksequence{Tools @click{} Scanner @click{} Scan a document}@*
+for a single-page scan and@*
+@clicksequence{Tools @click{} Scanner @click{} Scan a multi-page document}@*
+for a multi-page scan.
+
+@item M-x scanner-scan-image
+@itemx C-u M-x scanner-scan-image
+@itemx C-u @var{n} M-x scanner-scan-image
+@findex scanner-scan-image
+Scan an image. When called without a prefix argument, this command
+will scan only one image. When called with the default prefix argument
+(as @kbd{C-u M-x scanner-scan-image}), it will ask after each scanned
+image whether another image should be scanned. With a numeric prefix
+argument @var{n}, it will scan @var{n} images, waiting between images
+for the number of seconds configured in @code{scanner-scan-delay}.
+
+The scan will use the resolution configured in
+@code{scanner-resolution} under the @code{:image} key.
+
+This command interactively reads a file name. The extension of the file
+name specifies the output file format. If no extension is provided, the
+default image format, as configured in @code{scanner-image-format}, will
+be used. In a multi-image scan, this command will extend the given file
+name base by @samp{-@var{number}}, where @var{number} is the number of
+the scanned image.
+For example, if the file name is @file{image.jpeg}, a multi-image scan
+of @var{n} images produces the files @file{image-1.jpeg},
+@file{image-2.jpeg}, @dots{}, @file{image-@var{n}.jpeg}.
+If one of these files already exists, @code{scanner-scan-image} asks
+for confirmation to overwrite it.
+
+No post-processing with @command{unpaper} or @command{tesseract} is
+done. See @ref{User Options} to find out how to configure scanning.
+
+This command is also available from the Scanner menu as@*
+@clicksequence{Tools @click{} Scanner @click{} Scan an image}@*
+for a single-image scan and@*
+@clicksequence{Tools @click{} Scanner @click{} Scan multiple images}@*
+for a multi-image scan.
+
+@item M-x scanner-scan-preview
+@findex scanner-scan-preview
+Make a preview scan. This command makes a preview scan using the
+document scan settings, except that the resolution is changed to
+@code{scanner-preview-resolution} and the scan mode is set to
+``Gray''; all other options stay in effect. If
+@code{scanner-use-unpaper} is non-@code{nil}, the scan is
+post-processed with @command{unpaper} as well. The resulting scan is
+shown in an image window; if Emacs cannot display images, a Dired
+buffer showing the files generated by the scan is created instead.
+
+This command is also available from the Scanner menu as@*
+@clicksequence{Tools @click{} Scanner @click{} Make a preview scan}.
+@end table
+
+
+@node User Options
+@chapter User Options
+@cindex user options
+
+This chapter lists all the available user options. All of these
+options can be edited with the Emacs customization interface, which is
+advisable because it performs basic sanity checks on the values. For
+a number of options, interactive commands simplify changing them at
+run time; these commands do not save the changed values between Emacs
+sessions. They are also available from the Scanner menu
+(@clicksequence{Tools @click{} Scanner}).
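+
+If you prefer to set options from your init file instead of the
+customization interface, a minimal sketch could look like the lines
+below. It assumes the Scanner package is installed; the values shown
+are defaults or examples taken from the option descriptions in this
+chapter, so adapt them to your device and documents.
+@code{customize-set-variable} applies an option's custom setter
+function, if it has one, just as the customization interface does. In
+Emacs 29 and later, @code{setopt} can be used instead; it additionally
+warns if a value does not match the option's type.
+
+@lisp
+;; Resolution in DPI for image and document scans.
+(customize-set-variable 'scanner-resolution '(:image 600 :doc 300))
+;; Paper size: a key of scanner-paper-sizes, or :whatever.
+(customize-set-variable 'scanner-doc-papersize :a4)
+;; OCR languages; the tesseract data files must be installed.
+(customize-set-variable 'scanner-tesseract-languages '("eng"))
+;; Output formats produced by tesseract.
+(customize-set-variable 'scanner-tesseract-outputs '("pdf" "txt"))
+;; Enable scan enhancement with unpaper (off by default).
+(customize-set-variable 'scanner-use-unpaper t)
+@end lisp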
+
+@menu
+* Configuration Commands::
+* General Options::
+* Configuring scanimage::
+* Configuring unpaper::
+* Configuring tesseract::
+@end menu
+
+@node Configuration Commands
+@section Configuration Commands
+@cindex configuration commands
+
+The following commands help you configure some of the more frequently
+used options. They only change the options for the running session; to
+set an option permanently, so that it is remembered between Emacs
+sessions, use the customization interface.
+
+@table @kbd
+@item M-x scanner-set-image-resolution
+@itemx M-x scanner-set-document-resolution
+@findex scanner-set-image-resolution
+@findex scanner-set-document-resolution
+These commands interactively ask for a resolution (in @acronym{DPI, dots per inch})
+to be used in subsequent image and document scans,
+respectively. The corresponding user option is
+@code{scanner-resolution}.
+
+These commands are available in the Scanner menu as@*
+@clicksequence{Tools @click{} Scanner @click{} Select image
+resolution}@*
+and@*
+@clicksequence{Tools @click{} Scanner @click{} Select
+document resolution}.
+
+@item M-x scanner-select-papersize
+@findex scanner-select-papersize
+Select a paper size from @code{scanner-paper-sizes} or
+@code{:whatever}. See also @code{scanner-doc-papersize}.
+
+This command is also available in the Scanner menu as@*
+@clicksequence{Tools @click{} Scanner @click{} Select paper size}.
+
+@item M-x scanner-select-image-size
+@findex scanner-select-image-size
+Select an image size. This command interactively reads the width and
+height in millimeters from the minibuffer and sets
+@code{scanner-image-size} accordingly.
+
+This command is also available in the Scanner menu as@*
+@clicksequence{Tools @click{} Scanner @click{} Select image size}.
+
+@item M-x scanner-select-outputs
+@findex scanner-select-outputs
+Select the document outputs. This command reads a list of document
+output formats. See also @code{scanner-tesseract-outputs}.
+
+This command is also available in the Scanner menu as@*
+@clicksequence{Tools @click{} Scanner @click{} Select document outputs}.
+
+@item M-x scanner-select-languages
+@findex scanner-select-languages
+Select the languages assumed for OCR. This command reads a list of
+languages used for OCR. The necessary @command{tesseract} data files
+must be available. See @code{scanner-tesseract-languages}.
+
+This command is also available in the Scanner menu as@*
+@clicksequence{Tools @click{} Scanner @click{} Select OCR languages}.
+
+@item M-x scanner-select-device
+@itemx C-u M-x scanner-select-device
+@findex scanner-select-device
+Select a device, possibly triggering auto-detection. Normally, manual
+device selection is not necessary, as Scanner auto-detects devices
+using @command{scanimage}. However, if you have multiple devices and
+want to switch between them, you can use this command to do so.
+
+When called with a prefix argument, this command forces auto-detection
+even if devices have been detected before.
+
+This command is also available in the Scanner menu as@*
+@clicksequence{Tools @click{} Scanner @click{} Select scanning device}.
+
+@item M-x scanner-set-scan-delay
+@findex scanner-set-scan-delay
+Set the delay in seconds between scans in multi-page mode. This
+command sets the option @code{scanner-scan-delay}.
+
+This command is also available in the Scanner menu as@*
+@clicksequence{Tools @click{} Scanner @click{} Set delay between scans}.
+
+@item M-x scanner-set-brightness
+@findex scanner-set-brightness
+Set the brightness of the scans. The available range is
+device-specific. This command sets the option @code{scanner-brightness}.
+
+This command is also available in the Scanner menu as@*
+@clicksequence{Tools @click{} Scanner @click{} Set brightness}.
+
+@item M-x scanner-set-contrast
+@findex scanner-set-contrast
+Set the contrast of the scans. The available range is
+device-specific. This command sets the option @code{scanner-contrast}.
+
+This command is also available in the Scanner menu as@*
+@clicksequence{Tools @click{} Scanner @click{} Set contrast}.
+@end table
+
+The following commands can be found in the ``Scan Enhancement'' submenu
+of the Scanner menu (@clicksequence{Tools @click{} Scanner @click{} Scan
+Enhancement}). They require @command{unpaper} to be installed. Scan
+enhancement provides post-processing operations such as rotation,
+noise removal, and deskewing. It is highly recommended as a
+preparatory step before OCR. The descriptions of the commands below
+give a few hints on the usage of @command{unpaper}; for more details,
+see its manual page or website.
+
+@table @kbd
+@item M-x scanner-toggle-use-unpaper
+@findex scanner-toggle-use-unpaper
+Toggle the use of @command{unpaper} for scan enhancement. This command
+changes the option @code{scanner-use-unpaper} for the current session.
+While this option is @code{nil}, @command{unpaper} is not used and the
+other items in the ``Scan Enhancement'' menu are unavailable.
+
+This command is also available in the Scanner menu as@*
+@clicksequence{Tools @click{} Scanner @click{} Scan Enhancement @click{}
+Use unpaper for scan enhancement}.
+
+The following commands configure some important processing steps; see
+@ref{Configuring unpaper} for all the options.
+
+@item M-x scanner-select-page-layout
+@findex scanner-select-page-layout
+This command interactively asks for the page layout of the pages to be
+scanned. Available options are ``single'', ``double'', and ``none''
+(the default). If you scan a sheet containing two pages side by side,
+as with an open book, choose ``double'' so that @command{unpaper}
+treats the sheet as two pages. If you use ``single'', it tries to
+identify the actual (single-)page contents on the sheet and stretch
+them to fit the output page size. If you do not want any
+rearrangement, choose ``none''. Note that the ``double'' page layout
+implies a landscape orientation.
+This command sets the option
+@code{scanner-unpaper-page-layout} accordingly. To split an input page
+into two output pages, you must also select two output pages with the
+@code{scanner-select-output-pages} command.
+
+This command is also available in the Scanner menu as@*
+@clicksequence{Tools @click{} Scanner @click{} Scan Enhancement @click{}
+Select page layout}.
+
+@item M-x scanner-select-input-pages
+@findex scanner-select-input-pages
+This command selects the number of input pages. Available options are
+@code{1} and @code{2}. It sets the option
+@code{scanner-unpaper-input-pages}. To combine two scanned input pages
+into one output page, for example to have the left and right sides on
+one sheet, select two input pages and one output page, together with a
+``single'' (or ``none'') page layout.
+
+This command is also available in the Scanner menu as@*
+@clicksequence{Tools @click{} Scanner @click{} Scan Enhancement @click{}
+Select number of input pages}.
+
+@item M-x scanner-select-output-pages
+@findex scanner-select-output-pages
+This command selects the number of output pages. Available options are
+@code{1} and @code{2}. It sets the option
+@code{scanner-unpaper-output-pages}. To split one scanned input page
+into two output pages, for example to put the left and right sides of a
+book on separate pages, select one input page and two output pages,
+together with a ``double'' page layout.
+
+This command is also available in the Scanner menu as@*
+@clicksequence{Tools @click{} Scanner @click{} Scan Enhancement @click{}
+Select number of output pages}.
+
+@item M-x scanner-select-pre-rotation
+@findex scanner-select-pre-rotation
+This command asks for the rotation to be applied before any further
+processing. Available values are ``clockwise'', ``counter-clockwise'',
+and ``none''. It sets the @code{scanner-unpaper-pre-rotation} option.
+Use pre-rotation if you scanned a landscape-oriented document in
+portrait orientation. Rotating before further processing is especially
+relevant for double-page documents, as it ensures that the document is
+in the correct orientation before @command{unpaper} tries to split the
+pages.
+
+This command is also available in the Scanner menu as@*
+@clicksequence{Tools @click{} Scanner @click{} Scan Enhancement @click{}
+Select page rotation before processing}.
+
+@item M-x scanner-select-post-rotation
+@findex scanner-select-post-rotation
+This command asks for the rotation to be applied after all the
+processing. Available values are ``clockwise'', ``counter-clockwise'',
+and ``none''. It sets the @code{scanner-unpaper-post-rotation} option.
+
+This command is also available in the Scanner menu as@*
+@clicksequence{Tools @click{} Scanner @click{} Scan Enhancement @click{}
+Select page rotation after processing}.
+
+@item M-x scanner-select-pre-size
+@findex scanner-select-pre-size
+This command interactively asks for the page size to set before further
+processing. The scanned sheets are scaled to this size. Available
+options are ``a5'', ``a4'', ``a3'', ``a5-landscape'', ``a4-landscape'',
+``a3-landscape'', ``letter'', ``legal'', ``letter-landscape'',
+``legal-landscape'', ``none'', and explicit width and height
+specifications such as ``21cm,29.7cm''. See the @command{unpaper}
+documentation for the supported units. If you choose ``none'', no size
+is passed to @command{unpaper}, and it selects the size based on the
+input data.
+
+This command is also available in the Scanner menu as@*
+@clicksequence{Tools @click{} Scanner @click{} Scan Enhancement @click{}
+Select page size before processing}.
+
+@item M-x scanner-select-post-size
+@findex scanner-select-post-size
+This command interactively asks for the page size to set after all the
+processing. The processed sheets are scaled to this size.
+Available options are ``a5'', ``a4'', ``a3'', ``a5-landscape'',
+``a4-landscape'', ``a3-landscape'', ``letter'', ``legal'',
+``letter-landscape'', ``legal-landscape'', ``none'', and explicit width
+and height specifications such as ``21cm,29.7cm''. See the
+@command{unpaper} documentation for the supported units. If you choose
+``none'', no size is passed to @command{unpaper}, and it selects the
+size based on the processed data.
+
+This command is also available in the Scanner menu as@*
+@clicksequence{Tools @click{} Scanner @click{} Scan Enhancement @click{}
+Select page size after processing}.
+@end table
+
+
+@node General Options
+@section General Options
+@cindex general options
+
+@defopt scanner-resolution
+This option specifies the resolution in DPI used for image and
+document scans as a property list with the keys @code{:image} and
+@code{:doc}, respectively, and integers as values. The default is:
+@lisp
+(:image 600 :doc 300)
+@end lisp
+The available resolutions depend on your device.
+
+This option can be set for the current session with the commands
+@code{scanner-set-image-resolution} and
+@code{scanner-set-document-resolution}.
+@end defopt
+
+@defopt scanner-preview-resolution
+This option specifies the resolution in DPI used in preview scans. The
+default is 75.
+@end defopt
+
+@defopt scanner-brightness
+This option specifies the brightness setting for scans. The default is
+20. This option assumes that your device supports the
+@command{scanimage} switch @option{--brightness}, which is
+device-specific.
+
+This option can be set with the command @code{scanner-set-brightness}.
+@end defopt
+
+@defopt scanner-contrast
+This option specifies the contrast setting for scans. The default is
+50. This option assumes that your device supports the
+@command{scanimage} switch @option{--contrast}, which is
+device-specific.
+
+This option can be set with the command @code{scanner-set-contrast}.
+@end defopt
+
+@defopt scanner-paper-sizes
+This option holds the paper sizes for document scans as a property list
+with the name of the paper format as the key (for example, @code{:a4})
+and a list of the width and height in millimeters as the value. The
+default is:
+@lisp
+(:a3
+ (297 420)
+ :a4
+ (210 297)
+ :a5
+ (148 210)
+ :a6
+ (105 148)
+ :tabloid
+ (279.4 431.8)
+ :legal
+ (215.9 355.6)
+ :letter
+ (215.9 279.4))
+@end lisp
+@end defopt
+
+@anchor{scanner-doc-papersize}
+@defopt scanner-doc-papersize
+Use this option to select the paper size for document scans. The
+value must be one of the keys from @code{scanner-paper-sizes}, or the
+special value @code{:whatever}, which lets @command{scanimage} select
+the paper size (usually the available scan area). The default is
+@code{:a4}.
+
+This option can be set for the current session with the command
+@code{scanner-select-papersize}.
+@end defopt
+
+@defopt scanner-image-size
+This option specifies the size used in image scans as a list of width
+and height values in millimeters. The default is
+@lisp
+(200 250)
+@end lisp
+for an image of 200@dmn{mm} width and 250@dmn{mm} height. If set to
+@code{nil}, the size is determined by @command{scanimage} (usually the
+available scan area).
+
+This option can be set for the current session with the command
+@code{scanner-select-image-size}.
+@end defopt
+
+@defopt scanner-scan-delay
+This option specifies the delay in seconds to wait between pages in a
+multi-page scan. Set it to a value large enough that you can feed the
+next sheet to your scanner before it starts scanning the next page.
+The default is 3.
+@end defopt
+
+@defopt scanner-reverse-pages
+This option, when set to @code{t}, causes Scanner to reverse the order
+of the scanned pages in a document scan. The default is @code{nil}.
+@end defopt
+
+@node Configuring scanimage
+@section Configuring scanimage
+@cindex configuring scanimage
+
+Some of the options @command{scanimage} accepts (and Scanner uses) are
+device-dependent.
+To find out which options your scanner hardware
+offers, run @samp{scanimage --help} with your scanner plugged in.
+This command should print a list of general and device-dependent
+options.
+
+@defopt scanner-scanimage-program
+This option specifies the path of @command{scanimage}. The default is given by
+@lisp
+(executable-find "scanimage")
+@end lisp
+@end defopt
+
+@defopt scanner-scan-mode
+This option specifies the scan modes for document and image scans. It
+is a property list with the keys @code{:image} and @code{:doc}, for
+images and documents, respectively, and strings naming the scan modes
+as values. For example,
+@lisp
+(:image "Color" :doc "Gray")
+@end lisp
+sets ``Color'' mode for image scans and ``Gray'' mode for document
+scans. The default is to use ``Color'' for both image and document
+scans.
+
+The available scan modes depend on your device. Usually, ``Lineart'',
+``Gray'', and ``Color'' are available. For images you probably want
+``Color'', and for good OCR results in document scans, you should
+choose either ``Gray'' or ``Color''.
+@end defopt
+
+@defopt scanner-image-format
+This option sets the default format used by @command{scanimage} for
+image and document scans. It is a property list similar to
+@code{scanner-scan-mode}. For example, the default
+@lisp
+(:image "jpeg" :doc "pnm")
+@end lisp
+configures Scanner to use the JPEG format for image scans and the PNM
+format for document scans. While document scans always use the
+format specified with this option, you can override the format used in
+image scans with the appropriate file extension
+(@pxref{Scanning Documents and Images}).
+
+The supported formats are documented in the @command{scanimage} manual
+page. For example, version 1.0.31 of @command{scanimage} supports PNM,
+TIFF, PNG, and JPEG.
+
+Note that the document scan format specified with this option is an
+intermediate format, not the document format generated at the end of
+the whole process.
+With the PNM format used in the example above, you
+can still get PDF output (@pxref{scanner-tesseract-outputs}).
+
+If you use @command{unpaper} for post-processing before OCR in document
+scans (@pxref{Configuring unpaper}), the format is silently forced
+to PNM, as required by @command{unpaper}.
+@end defopt
+
+@defopt scanner-device-name
+This option holds the device name of the scanner as reported by
+@command{scanimage}. The default is @code{nil}, which causes Scanner to
+use @command{scanimage} for automatic detection. The detected device is
+stored in this variable and used for all subsequent scans, until a new
+detection is forced, either by calling @code{scanner-select-device} with
+a prefix argument or by the device becoming unavailable.
+
+Usually you do not need to customize this option, as auto-detection
+should work fine.
+@end defopt
+
+@defopt scanner-scanimage-switches
+Use this option for any additional @command{scanimage} switches that
+are not covered by the user options above. Specify the switches as a
+list of switch/value pairs, such as:
+@lisp
+("--switch1" "value1" "-s" "2")
+@end lisp
+The default is @code{nil}.
+@end defopt
+
+
+@node Configuring unpaper
+@section Configuring unpaper
+@cindex configuring unpaper
+
+@defopt scanner-unpaper-program
+This option specifies the path of the @command{unpaper} program.
+@end defopt
+
+@defopt scanner-use-unpaper
+If this option is non-@code{nil}, scan enhancement using
+@command{unpaper} is activated. Although using @command{unpaper} is
+highly recommended, its configuration is somewhat elaborate and might
+be confusing at first. The default is therefore @code{nil}.
+@end defopt
+
+@defopt scanner-unpaper-page-layout
+This option specifies the page layout of the scanned sheets. Allowed
+values are ``single'', ``double'', and ``none''; they tell
+@command{unpaper} how to detect the extent of the pages.
+Note that
+``double'' implies a landscape orientation. This option corresponds to
+the @option{--layout} option of @command{unpaper}; see its
+documentation for details on the implications of the values. The
+default is ``none''.
+@end defopt
+
+@defopt scanner-unpaper-input-pages
+This option selects the number of pages per scanned sheet of input.
+Allowed values are @code{1} and @code{2}. This option corresponds to
+the @option{--input-pages} option of @command{unpaper}. If set to
+@code{2}, @command{unpaper} combines input sheets pairwise. The
+default is @code{1}.
+@end defopt
+
+@defopt scanner-unpaper-output-pages
+This option selects the number of pages per sheet of processed output.
+Allowed values are @code{1} and @code{2}. This option corresponds to
+the @option{--output-pages} option of @command{unpaper}. If set to
+@code{2}, @command{unpaper} splits every processed sheet into two
+output pages. The default is @code{1}.
+@end defopt
+
+@defopt scanner-unpaper-pre-rotation
+This option specifies the rotation to be applied before further
+processing. Allowed values are ``clockwise'', ``counter-clockwise'',
+and ``none''. This option corresponds to the @option{--pre-rotate}
+option of @command{unpaper}. If you choose ``none'', no rotation is
+specified in the invocation of @command{unpaper}. The default is
+``none''.
+@end defopt
+
+@defopt scanner-unpaper-post-rotation
+This option specifies the rotation to be applied after all the
+processing. Allowed values are ``clockwise'', ``counter-clockwise'',
+and ``none''. This option corresponds to the @option{--post-rotate}
+option of @command{unpaper}. If you choose ``none'', no rotation is
+specified in the invocation of @command{unpaper}. The default is
+``none''.
+@end defopt
+
+@defopt scanner-unpaper-pre-size
+This option specifies the page size to assume before further processing.
+The scanned input is scaled to this size.
+Allowed values are
+``a5'', ``a4'', ``a3'', ``a5-landscape'', ``a4-landscape'',
+``a3-landscape'', ``letter'', ``legal'', ``letter-landscape'',
+``legal-landscape'', ``none'', and explicit width and height
+specifications such as ``21cm,29.7cm''. This option corresponds to the
+@option{--size} option of @command{unpaper}. The default is ``a4''.
+@end defopt
+
+@defopt scanner-unpaper-post-size
+This option specifies the page size to assume after all the processing.
+The processed output is scaled to this size. Allowed values are
+``a5'', ``a4'', ``a3'', ``a5-landscape'', ``a4-landscape'',
+``a3-landscape'', ``letter'', ``legal'', ``letter-landscape'',
+``legal-landscape'', ``none'', and explicit width and height
+specifications such as ``21cm,29.7cm''. This option corresponds to the
+@option{--post-size} option of @command{unpaper}. The default is ``a4''.
+@end defopt
+
+@defopt scanner-unpaper-border
+This option forces a border of white pixels at the four edges of a
+scanned sheet. The value is a list of four integers, for example
+@code{(10 10 10 10)} (the default). This is very useful for removing
+black or gray scan artefacts at the edges of a sheet. Even without
+this option, @command{unpaper} tries to detect and remove such
+artefacts; however, forcing a border usually leads to better results.
+This option corresponds to the @option{--border} option of
+@command{unpaper}.
+@end defopt
+
+@defopt scanner-unpaper-switches
+Use this option to pass any additional parameters to @command{unpaper}.
+The value can be any list of valid @command{unpaper} options given as
+strings.
+@end defopt
+
+@node Configuring tesseract
+@section Configuring tesseract
+@cindex configuring tesseract
+
+@defopt scanner-tesseract-program
+This option specifies the path of the @command{tesseract} program.
+@end defopt
+
+@defopt scanner-tessdata-dir
+This option specifies the @file{tessdata} directory.
+This directory is
+expected to contain the language data files for @command{tesseract}.
+The default is @file{/usr/share/tessdata/}.
+@end defopt
+
+@defopt scanner-tesseract-configdir
+This option specifies the @command{tesseract} @file{configs} directory.
+This directory is expected to contain the output configuration files
+for @command{tesseract}. The default is
+@file{/usr/share/tessdata/configs/}.
+@end defopt
+
+@defopt scanner-tesseract-languages
+This option lists the languages passed to @command{tesseract} as a list
+of strings. The default is:
+@lisp
+("eng")
+@end lisp
+It is possible to pass more than one language to @command{tesseract},
+which can be useful if you have a multi-language document. For
+instance,
+@lisp
+("eng" "deu")
+@end lisp
+sets up @command{tesseract} to recognize English and German text.
+However, for single-language documents, the best results are usually
+obtained when setting only one language.
+
+This option can be set for the current session with the command
+@code{scanner-select-languages}.
+@end defopt
+
+@anchor{scanner-tesseract-outputs}
+@defopt scanner-tesseract-outputs
+This option lists the output formats to produce. The available output
+formats are provided as configuration files in the directory specified
+by @code{scanner-tesseract-configdir} (by default,
+@file{/usr/share/tessdata/configs/}). The default
+@lisp
+("pdf" "txt")
+@end lisp
+causes @command{tesseract} to output both a PDF and a text file.
+
+This option can be set for the current session with the command
+@code{scanner-select-outputs}.
+@end defopt
+
+@defopt scanner-tesseract-switches
+You can use this option to specify any additional switches for
+@command{tesseract} that are not covered by the above options. Use the
+same format as for @code{scanner-scanimage-switches}. The default is
+@code{nil}.
+@end defopt
+
+@node Improving Scan Quality
+@chapter Improving Scan Quality
+@cindex improving scan quality
+@cindex scan quality, improving
+@cindex quality, improving
+
+This chapter contains recommendations for improving the scan or OCR
+quality.
+If you know of any additional tips and tricks for improving
+quality, please let the author know.
+
+Besides checking the following sections, you might also want to consult
+the documentation for @command{tesseract},
+@url{https://tesseract-ocr.github.io/tessdoc/}, and @command{unpaper},
+@url{https://github.com/unpaper/unpaper/blob/main/doc/basic-concepts.md}
+(basics) and
+@url{https://github.com/unpaper/unpaper/blob/main/doc/image-processing.md}
+(details).
+
+
+@menu
+* Improving General Scan Quality::
+* Improving OCR::
+@end menu
+
+@node Improving General Scan Quality
+@section Improving General Scan Quality
+@cindex improving general scan quality
+
+@table @asis
+@item Image format
+As a lossy format, JPEG is not a good basis for later OCR. Therefore,
+use PNG, PNM, or TIFF for document scanning. If you also use
+@command{unpaper}, the image format is forced to PNM, as required by
+this tool.
+
+@item Scan area
+Besides setting the size of the scan area, @command{scanimage} allows
+you to specify offsets from the top and left edges. The device-specific
+switches @option{-l} and @option{-t}, if available, specify the x and y
+positions of the top-left corner of the scan area, respectively. You
+can use them to get rid of blacked-out parts in the corners caused by
+misalignment of the scan area and the scanned sheet.
+
+@item Resolution
+For document scans, use at least 300 DPI to achieve acceptable OCR
+results. A resolution above 600 DPI does not improve OCR quality any
+further and only leads to larger files. For most documents, 300 DPI
+should be sufficient.
+
+@item Brightness and contrast
+Good document quality (and especially good OCR results) requires
+sufficient contrast and a good reproduction of the document's background
+color. If the defaults of your device are inadequate, use the
+brightness and contrast settings of @command{scanimage} to provide
+sensible values.
+See the options @code{scanner-brightness} and @code{scanner-contrast}
+in @ref{General Options}. You may
+want to try a low brightness setting (for example, 20) and a medium
+contrast setting (for example, 50) as a start.
+
+Note that the underlying switches of @command{scanimage} are
+device-specific. If your device does not support the
+@option{--brightness} and @option{--contrast} switches, you may be able
+to use @code{scanner-scanimage-switches} to supply equivalent
+device-specific switches to @command{scanimage}.
+
+@item Dark areas and shadows
+If your scan shows dark (black/gray) areas or shadows, for example in
+the fold when scanning a book, use @command{unpaper} to remove them. If
+it cannot remove these areas automatically, you can manually specify an
+area to be wiped out using the @option{--wipe} switch of
+@command{unpaper}. If you scan a page with the ``double'' layout and
+want to remove the shadow of a book fold, use the @option{--middle-wipe}
+switch. You can put these switches into the
+@code{scanner-unpaper-switches} option. See also the @command{unpaper}
+documentation.
+
+@item Page borders
+If you use @command{unpaper}, it tries to remove dark areas around
+the edges of the page. If this does not work automatically, use the
+@code{scanner-unpaper-border} option to specify a border (in pixels)
+around the edges of the page that is to be wiped
+(@pxref{Configuring unpaper}).
+@end table
+
+@node Improving OCR
+@section Improving OCR
+@cindex improving ocr
+
+@table @asis
+@item Tesseract version
+Use version 4 or later of @command{tesseract}. This version introduced
+a new OCR engine that delivers better results than the previous one.
+Also, @command{tesseract} is multithreaded starting from version 4 and
+is therefore faster on multi-core machines.
+
+@item Language setting
+@command{tesseract} allows you to specify multiple languages. For
+single-language documents, however, this does not seem to give optimal
+results, so it is best to choose a single language when possible.
+
+@item Page deskewing
+OCR is quite sensitive to any skew of a page. Use @command{unpaper} to
+deskew the pages (@pxref{Configuring unpaper}).
+
+In some cases, @command{unpaper} may not be able to deskew a page
+automatically. If so, have a look at the deskewing switches of
+@command{unpaper}. In particular, @option{--deskew-scan-step},
+@option{--deskew-scan-deviation}, and @option{--deskew-scan-range} can
+be helpful. You can put those switches into
+@code{scanner-unpaper-switches}. See the @command{unpaper}
+documentation for details.
+@end table
+
+
+@node News
+@chapter News
+@cindex news
+
+This chapter lists the changes made in new releases of Scanner.
+
+@menu
+* Changes in Version 0.3::
+@end menu
+
+@node Changes in Version 0.3
+@section Changes in Version 0.3
+@cindex changes in version 0.3
+
+@itemize
+@item
+@command{unpaper} has been added as a new backend; it enables scan
+enhancement in document scans (@pxref{Configuring unpaper}). There is
+a new ``Scan Enhancement'' submenu for this backend.
+@item
+A command for making a preview scan has been added.
+@item
+Menu items and user options for brightness and contrast have been added.
+@item
+A menu item for setting the scan delay has been added.
+@item
+Older @command{scanimage} versions (before 1.0.28) are now supported as well.
+@item
+A new page size @code{:whatever} allows @command{scanimage} to select
+the page size based on the available scan area.
+@item
+Scanner now comes with a manual.
+@end itemize
+
+
+@node Reporting Bugs
+@chapter Reporting Bugs
+@cindex reporting bugs
+
+Please report bugs on the Scanner project page at
+@uref{https://www.gitlab.com/rstocker/scanner/}. When reporting a bug,
+include the relevant contents of the @code{*Scanner*} buffer, which
+collects the output of the backend programs.
+
+
+@node GNU Free Documentation License
+@chapter GNU Free Documentation License
+@c Get fdl.texi from https://www.gnu.org/licenses/fdl.html
+@include fdl.texi
+
+
+@node Index
+@unnumbered Index
+
+@printindex cp
+
+@c combine indices
+
+@bye
+
+@c scanner.texi ends here