branch: externals/doc-toc
commit 4c98932c65187b285d29fc1311466c9765c88be3
Author: Daniel Nicolai <dalanico...@gmail.com>
Commit: Daniel Nicolai <dalanico...@gmail.com>

    create first version README
---
 README.org | 86 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 86 insertions(+)

diff --git a/README.org b/README.org
new file mode 100644
index 0000000000..05470ad73e
--- /dev/null
+++ b/README.org
@@ -0,0 +1,86 @@
+* toc-mode
+Create, clean up, add and manage the Table Of Contents (TOC) of pdf and djvu documents with Emacs
+
+* Introduction
+TOC-mode is a package for creating, cleaning, adding and managing the
+Table Of Contents (TOC) of pdf and djvu documents.
+
+* Important
+Currently only files with a text layer are supported. A feature to extract the
+TOC via OCR will probably be added soon. Until then, the Python
+[[https://pypi.org/project/document-contents-extractor/][document-contents-extractor]] package is recommended for extracting the TOC via
+OCR.
+
+* Requirements
+Currently the package requires the ~pdftotext~ (part of poppler-utils), ~pdfoutline~
+(part of [[https://launchpad.net/ubuntu/bionic/+package/fntsample][fntsample]]) and ~djvused~ (part of [[http://djvu.sourceforge.net/][DjVuLibre]]) command
+line utilities to be available.
+
+* Usage
+Extracting the contents and adding them to the document is done in 4 steps:
+1. extraction
+2. cleanup
+3. adjust/correct page numbers
+4. add TOC to document
+
+** 1. Extraction
+Open a pdf or djvu file in Emacs (the pdf-tools and djvu packages are recommended).
+Find the page numbers of the TOC. Then type =M-x toc-extract-pages= and answer the
+subsequent prompts by entering the page numbers of the first and the last page,
+each followed by =RET=.
+
+A buffer with the somewhat cleaned-up extracted text will open in TOC-cleanup
+mode.
+
+** 2. TOC-Cleanup
+In this mode you can further clean up the contents to create a list where
+each line has the structure:
+
+TITLE (SOME) PAGENUMBER
+
+There can be any number of spaces between TITLE and PAGENUMBER. The correct
+page numbers can be edited in the next step. A document outline supports
+different levels, and levels are automatically assigned in order of increasing
+number of preceding spaces, i.e. the lines with the fewest preceding
+spaces are assigned level 0 etc., and lines with an equal number of spaces get
+assigned the same level.
+#+BEGIN_SRC
+Contents 1
+Chapter 1 2
+  Section 1 3
+    Section 1.1 4
+Chapter 2 5
+#+END_SRC
+There are some handy functions to assist in the cleanup. =C-c C-j= jumps
+automatically to the next line not ending with a number and joins it with the
+next line. If the indentation structure of the different lines does not
+correspond with the levels, then the levels can be set automatically from the
+number of separators in the indices with =M-x toc-cleanup-set-level-by-index=. The
+default separator is a ~.~ but a different separator can be entered by preceding
+the function invocation with the universal argument (=C-u=). Some documents
+contain a structure like
+#+BEGIN_SRC
+1 Chapter 1 1
+Section 1 2
+#+END_SRC
+Here the indentation can be set with ~M-x replace-regexp~, replacing ~^[^0-9]~
+with a space character followed by ~\&~.
+
+Type =C-c C-c= when finished.
+
+** 3. TOC-tabular (adjust page numbers)
+This mode provides the functionality for easy adjustment of page numbers. The
+buffer can be navigated with the =up/down= arrow keys. The =left= and =right=
+arrow keys will shift =down/up= all the page numbers from the current line and
+below (combine with =SHIFT= for setting individual page numbers).
+
+Type =C-c C-c= when done.
+
+** 4. TOC-mode (add outline to document)
+The text of this buffer should have the right structure for adding the contents
+to (for pdfs, a copy of) the original document. Final adjustments can be made but
+should not be necessary. Just type =C-c C-c= to add the contents to the
+document.
+
+For pdf, a file =pdfwithtoc.pdf= is created in the same folder as the original
+pdf. For djvu, the TOC is simply added to the original djvu file.
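
The level rules described under TOC-Cleanup can be illustrated with a short Python sketch. It is not code from the package (which is written in Emacs Lisp) and is not part of this commit. The function names `levels_from_indentation` and `levels_from_index` are hypothetical, and the handling of a trailing separator (so that `1.1.` counts the same as `1.1`) is an assumption rather than documented behaviour of =toc-cleanup-set-level-by-index=.

```python
def levels_from_indentation(lines):
    """Assign outline levels by ranking the distinct leading-space counts.

    The smallest indentation found becomes level 0, the next larger one
    level 1, and so on; lines with equal indentation share a level.
    """
    entries = [(len(line) - len(line.lstrip(" ")), line.strip())
               for line in lines if line.strip()]
    widths = sorted({width for width, _ in entries})
    rank = {width: level for level, width in enumerate(widths)}
    return [(rank[width], text) for width, text in entries]


def levels_from_index(lines, separator="."):
    """Assign levels from the number of separators in the leading index.

    Assumption: the index is the first whitespace-delimited token, and a
    trailing separator is ignored.
    """
    result = []
    for line in lines:
        text = line.strip()
        if not text:
            continue
        index = text.split()[0].strip(separator)
        result.append((index.count(separator), text))
    return result


if __name__ == "__main__":
    toc = [
        "Contents 1",
        "Chapter 1 2",
        "  Section 1 3",
        "    Section 1.1 4",
        "Chapter 2 5",
    ]
    for level, text in levels_from_indentation(toc):
        print(level, text)
```

Ranking the distinct indentation widths, rather than dividing by a fixed step, matches the README's wording that levels follow the order of increasing preceding spaces, so irregular indentation (for example 1 and 5 spaces) still maps to levels 1 and 2.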