On Wed, Jan 22 2020, Daniel Kahn Gillmor <d...@fifthhorseman.net> wrote:
> Hi Sean--
>
> On Fri 2020-01-17 09:26:38 -0700, Sean Whitton wrote:
>> I think the easiest thing to do would be for one of you to prepare a
>> single patch, signed off, and for the other to write an e-mail signing
>> it off.  I'll then do a code review of the latest version of the script.
>
> The attached git-formatted patch is also present on the imap-dl-squashed
> branch on https://salsa.debian.org/dkg/mailscripts.  jrollins confirmed
> that it was OK, which is why it bears both of our signoffs.
>
> Thanks for considering imap-dl for inclusion within mailscripts!
>
>      --dkg
>
> From 9e5c1a893c17343102b042de23bdaa0f91b37d66 Mon Sep 17 00:00:00 2001
> From: Daniel Kahn Gillmor <d...@fifthhorseman.net>
> Date: Sun, 15 Sep 2019 19:55:07 -0400
> Subject: [PATCH] Add imap-dl, a simple imap downloader
>
> getmail upstream appears to have no plans to convert to python3 in the
> near future.
>
> Some of us use only a minimal subset of features of getmail, and it
> would be nice to have something simpler, with the main complexity
> offloaded to the modern python3 stdlib.
>
> This patch represents a squashed series of changes from both Jameson
> Graef Rollins and Daniel Kahn Gillmor (dkg), though dkg is primarily
> responsible for any remaining bugs.
>
> Signed-off-by: Jameson Graef Rollins <jroll...@finestructure.net>
> Signed-off-by: Daniel Kahn Gillmor <d...@fifthhorseman.net>

I confirm that I truly do sign off on this code, and fully support its
inclusion in mailscripts.

jamie.


> ---
>  Makefile                           |   4 +-
>  debian/control                     |   2 +
>  debian/mailscripts.bash-completion |   1 +
>  debian/mailscripts.install         |   1 +
>  debian/mailscripts.manpages        |   1 +
>  imap-dl                            | 254 +++++++++++++++++++++++++++++
>  imap-dl.1.pod                      |  88 ++++++++++
>  7 files changed, 350 insertions(+), 1 deletion(-)
>  create mode 100755 imap-dl
>  create mode 100644 imap-dl.1.pod
>
> diff --git a/Makefile b/Makefile
> index af30616..ec3d851 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -1,15 +1,17 @@
>  MANPAGES=mdmv.1 mbox2maildir.1 \
>       notmuch-slurp-debbug.1 notmuch-extract-patch.1 maildir-import-patch.1 \
> +     imap-dl.1 \
>       email-extract-openpgp-certs.1 \
>       email-print-mime-structure.1 \
>       notmuch-import-patch.1
> -COMPLETIONS=completions/bash/email-print-mime-structure
> +COMPLETIONS=completions/bash/email-print-mime-structure completions/bash/imap-dl
>  
>  all: $(MANPAGES) $(COMPLETIONS)
>  
>  check:
>       ./tests/email-print-mime-structure.sh
>       mypy --strict ./email-print-mime-structure
> +     mypy --strict ./imap-dl
>  
>  clean:
>       rm -f $(MANPAGES)
> diff --git a/debian/control b/debian/control
> index bc8268a..21afa45 100644
> --- a/debian/control
> +++ b/debian/control
> @@ -77,3 +77,5 @@ Description: collection of scripts for manipulating e-mail on Debian
>   email-print-mime-structure -- tree view of a message's MIME structure
>   .
>   email-extract-openpgp-certs -- extract OpenPGP certificates from a message
> + .
> + imap-dl -- download messages from an IMAP mailbox to a maildir
> diff --git a/debian/mailscripts.bash-completion b/debian/mailscripts.bash-completion
> index 435576f..657de01 100644
> --- a/debian/mailscripts.bash-completion
> +++ b/debian/mailscripts.bash-completion
> @@ -1 +1,2 @@
>  completions/bash/email-print-mime-structure
> +completions/bash/imap-dl
> diff --git a/debian/mailscripts.install b/debian/mailscripts.install
> index 2c060df..3739c49 100644
> --- a/debian/mailscripts.install
> +++ b/debian/mailscripts.install
> @@ -1,5 +1,6 @@
>  email-extract-openpgp-certs /usr/bin
>  email-print-mime-structure /usr/bin
> +imap-dl /usr/bin
>  maildir-import-patch /usr/bin
>  mbox2maildir /usr/bin
>  mdmv /usr/bin
> diff --git a/debian/mailscripts.manpages b/debian/mailscripts.manpages
> index 1de088f..a915617 100644
> --- a/debian/mailscripts.manpages
> +++ b/debian/mailscripts.manpages
> @@ -1,5 +1,6 @@
>  email-extract-openpgp-certs.1
>  email-print-mime-structure.1
> +imap-dl.1
>  maildir-import-patch.1
>  mbox2maildir.1
>  mdmv.1
> diff --git a/imap-dl b/imap-dl
> new file mode 100755
> index 0000000..f5d7a85
> --- /dev/null
> +++ b/imap-dl
> @@ -0,0 +1,254 @@
> +#!/usr/bin/python3
> +# PYTHON_ARGCOMPLETE_OK
> +# -*- coding: utf-8 -*-
> +
> +# Copyright (C) 2019 Daniel Kahn Gillmor
> +#
> +# This program is free software: you can redistribute it and/or modify
> +# it under the terms of the GNU General Public License as published by
> +# the Free Software Foundation, either version 3 of the License, or (at
> +# your option) any later version.
> +#
> +# This program is distributed in the hope that it will be useful, but
> +# WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +# General Public License for more details.
> +#
> +# You should have received a copy of the GNU General Public License
> +# along with this program.  If not, see <https://www.gnu.org/licenses/>.
> +
> +DESCRIPTION = '''A simple replacement for a minimalist use of getmail.
> +
> +In particular, if you use getmail to reach an IMAP server as though it
> +were POP (retrieving from the server and optionally deleting), you can
> +point this script to the getmail config and it should do the same
> +thing.
> +
> +It tries to ensure that the configuration file is of the expected
> +type, and terminates by raising an exception if it is not.  It
> +should not lose messages.
> +
> +If there's any interest in supporting other use cases for getmail,
> +patches are welcome.
> +
> +If you've never used getmail, you can make the simplest possible
> +config file like so:
> +
> +----------
> +[retriever]
> +server = mail.example.net
> +username = foo
> +password = sekr1t!
> +
> +[destination]
> +path = /home/foo/Maildir
> +
> +[options]
> +delete = True
> +----------
> +'''
> +
> +import re
> +import sys
> +import ssl
> +import time
> +import imaplib
> +import logging
> +import mailbox
> +import os.path
> +import argparse
> +import statistics
> +import configparser
> +
> +from typing import Dict, List
> +
> +try:
> +    import argcomplete #type: ignore
> +except ImportError:
> +    argcomplete = None
> +
> +_summary_splitter = re.compile(rb'^(?P<id>[0-9]+) \(UID (?P<uid>[0-9]+) RFC822.SIZE (?P<size>[0-9]+)\)$')
> +def break_fetch_summary(line:bytes) -> Dict[str,int]:
> +    '''b'1 (UID 160 RFC822.SIZE 1867)' -> {id: 1, uid: 160, size: 1867}'''
> +    match = _summary_splitter.match(line)
> +    if not match:
> +        raise Exception(f'malformed summary line {line!r}')
> +    ret:Dict[str,int] = {}
> +    i:str
> +    for i in ['id', 'uid', 'size']:
> +        ret[i] = int(match[i])
> +    return ret
> +
> +_fetch_splitter = re.compile(rb'^(?P<id>[0-9]+) \(UID (?P<uid>[0-9]+) (FLAGS \([\\A-Za-z ]*\) )?BODY\[\] \{(?P<size>[0-9]+)\}$')
> +def break_fetch(line:bytes) -> Dict[str,int]:
> +    '''b'1 (UID 160 BODY[] {1867}' -> {id: 1, uid: 160, size: 1867}'''
> +    match = _fetch_splitter.match(line)
> +    if not match:
> +        raise Exception(f'malformed fetch line {line!r}')
> +    ret:Dict[str,int] = {}
> +    i:str
> +    for i in ['id', 'uid', 'size']:
> +        ret[i] = int(match[i])
> +    return ret
> +
> +def pull_msgs(configfile:str, verbose:bool) -> None:
> +    conf = configparser.ConfigParser()
> +    conf.read_file(open(configfile, 'r'))
> +    oldloglevel = logging.getLogger().getEffectiveLevel()
> +    conf_verbose = conf.getint('options', 'verbose', fallback=1)
> +    if conf_verbose > 1:
> +        logging.getLogger().setLevel(logging.INFO)
> +    logging.info('pulling from config file %s', configfile)
> +    delete = conf.getboolean('options', 'delete', fallback=False)
> +    read_all = conf.getboolean('options', 'read_all', fallback=True)
> +    if not read_all:
> +        raise NotImplementedError('imap-dl only supports options.read_all=True, got False')
> +    rtype = conf.get('retriever', 'type', fallback='SimpleIMAPSSLRetriever')
> +    if rtype.lower() != 'simpleimapsslretriever':
> +        raise NotImplementedError('imap-dl only supports retriever.type=SimpleIMAPSSLRetriever, got %s'%(rtype,))
> +    # FIXME: handle `retriever.record_mailbox`
> +    dtype = conf.get('destination', 'type', fallback='Maildir')
> +    if dtype.lower() != 'maildir':
> +        raise NotImplementedError('imap-dl only supports destination.type=Maildir, got %s'%(dtype,))
> +    dst = conf.get('destination', 'path')
> +    dst = os.path.expanduser(dst)
> +    if os.path.exists(dst) and not os.path.isdir(dst):
> +        raise Exception('expected destination directory, but %s is not a directory'%(dst,))
> +    mdst = mailbox.Maildir(dst, create=True)
> +    ca_certs = conf.get('retriever', 'ca_certs', fallback=None)
> +    on_size_mismatch = conf.get('options', 'on_size_mismatch', fallback='exception').lower()
> +    sizes_mismatched:List[int] = []
> +    ctx = ssl.create_default_context(cafile=ca_certs)
> +    with imaplib.IMAP4_SSL(host=conf.get('retriever', 'server'), #type: ignore
> +                           port=int(conf.get('retriever', 'port', fallback=993)),
> +                           ssl_context=ctx) as imap:
> +        logging.info("Logging in as %s", conf.get('retriever', 'username'))
> +        resp = imap.login(conf.get('retriever', 'username'),
> +                          conf.get('retriever', 'password'))
> +        if resp[0] != 'OK':
> +            raise Exception('login failed with %s as user %s on %s'%(
> +                resp,
> +                conf.get('retriever', 'username'),
> +                conf.get('retriever', 'server')))
> +        if verbose: # only enable debugging after login to avoid leaking credentials in the log
> +            imap.debug = 4
> +            logging.info("capabilities reported: %s", ', '.join(imap.capabilities))
> +        resp = imap.select(readonly=not delete)
> +        if resp[0] != 'OK':
> +            raise Exception('selection failed: %s'%(resp,))
> +        if len(resp[1]) != 1:
> +            raise Exception('expected exactly one EXISTS response from select, got %s'%(resp[1]))
> +        n = int(resp[1][0])
> +        if n == 0:
> +            logging.info('No messages to retrieve')
> +            logging.getLogger().setLevel(oldloglevel)
> +            return
> +        resp = imap.fetch('1:%d'%(n), '(UID RFC822.SIZE)')
> +        if resp[0] != 'OK':
> +            raise Exception('initial FETCH 1:%d not OK (%s)'%(n, resp))
> +        pending = list(map(break_fetch_summary, resp[1]))
> +        sizes:Dict[int,int] = {}
> +        for m in pending:
> +            sizes[m['uid']] = m['size']
> +        fetched:Dict[int,int] = {}
> +        uids = ','.join(map(lambda x: str(x['uid']), sorted(pending, key=lambda x: x['uid'])))
> +        totalbytes = sum([x['size'] for x in pending])
> +        logging.info('Fetching %d messages, expecting %d bytes of message content',
> +                     len(pending), totalbytes)
> +        # FIXME: sort by size?
> +        # FIXME: fetch in batches or singly instead of all-at-once?
> +        # FIXME: rolling deletion?
> +        # FIXME: asynchronous work?
> +        before = time.perf_counter()
> +        resp = imap.uid('FETCH', uids, '(UID BODY.PEEK[])')
> +        after = time.perf_counter()
> +        if resp[0] != 'OK':
> +            raise Exception('UID fetch failed %s'%(resp[0]))
> +        for f in resp[1]:
> +            # these objects are weirdly structured. i don't know why
> +            # these trailing close-parens show up.  so this is very
> +            # ad-hoc and nonsense
> +            if isinstance(f, bytes):
> +                if f != b')':
> +                    raise Exception('got bytes object of length %d but expected simple closeparen'%(len(f),))
> +            elif isinstance(f, tuple):
> +                if len(f) != 2:
> +                    raise Exception('expected 2-part tuple, got %d-part'%(len(f),))
> +                m = break_fetch(f[0])
> +                if m['size'] != len(f[1]):
> +                    raise Exception('expected %d octets, got %d'%(
> +                        m['size'], len(f[1])))
> +                if m['size'] != sizes[m['uid']]:
> +                    if on_size_mismatch == 'warn':
> +                        if len(sizes_mismatched) == 0:
> +                            logging.warning('size mismatch: summary said %d octets, fetch sent %d',
> +                                            sizes[m['uid']], m['size'])
> +                        elif len(sizes_mismatched) == 1:
> +                            logging.warning('size mismatch: (mismatches after the first suppressed until summary)')
> +                        sizes_mismatched.append(sizes[m['uid']] - m['size'])
> +                    elif on_size_mismatch == 'exception':
> +                        raise Exception('size mismatch: summary said %d octets, fetch sent %d\n(set options.on_size_mismatch to none or warn to avoid hard failure)'%(
> +                                        sizes[m['uid']], m['size']))
> +                    elif on_size_mismatch != 'none':
> +                        raise Exception('size_mismatch: options.on_size_mismatch should be none, warn, or exception (found "%s")'%(on_size_mismatch,))
> +                fname = mdst.add(f[1].replace(b'\r\n', b'\n'))
> +                logging.info('stored message %d/%d (uid %d, %d bytes) in %s',
> +                             len(fetched) + 1, len(pending), m['uid'], m['size'], fname)
> +                del sizes[m['uid']]
> +                fetched[m['uid']] = m['size']
> +        if sizes:
> +            logging.warning('unhandled UIDs: %s', sizes)
> +        logging.info('%d bytes of %d messages fetched in %g seconds (~%g KB/s)',
> +                     sum(fetched.values()), len(fetched), after - before,
> +                     sum(fetched.values())/((after - before)*1024))
> +        if on_size_mismatch == 'warn' and len(sizes_mismatched) > 1:
> +            logging.warning('%d size mismatches out of %d messages (mismatches in bytes: mean %f, stddev %f)',
> +                            len(sizes_mismatched), len(fetched),
> +                            statistics.mean(sizes_mismatched),
> +                            statistics.stdev(sizes_mismatched))
> +        if delete:
> +            logging.info('trying to delete %d messages from IMAP store', len(fetched))
> +            resp = imap.uid('STORE', ','.join(map(str, fetched.keys())), '+FLAGS', r'(\Deleted)')
> +            if resp[0] != 'OK':
> +                raise Exception('failed to set \\Deleted flag: %s'%(resp))
> +            resp = imap.expunge()
> +            if resp[0] != 'OK':
> +                raise Exception('failed to expunge! %s'%(resp))
> +        else:
> +            logging.info('not deleting any messages, since options.delete is not set')
> +        logging.getLogger().setLevel(oldloglevel)
> +
> +if __name__ == '__main__':
> +    parser = argparse.ArgumentParser(
> +        description=DESCRIPTION,
> +        formatter_class=argparse.RawDescriptionHelpFormatter,
> +    )
> +    parser.add_argument(
> +        'config', nargs='+', metavar='CONFIG',
> +        help="configuration file")
> +    parser.add_argument(
> +        '-v', '--verbose', action='store_true',
> +        help="verbose log output")
> +
> +    if argcomplete:
> +        argcomplete.autocomplete(parser)
> +    elif '_ARGCOMPLETE' in os.environ:
> +        logging.error('Argument completion requested but the "argcomplete" '
> +                      'module is not installed. '
> +                      'Maybe you want to "apt install python3-argcomplete"')
> +        sys.exit(1)
> +
> +    args = parser.parse_args()
> +
> +    if args.verbose:
> +        logging.getLogger().setLevel(logging.INFO)
> +
> +    errs = {}
> +    for confname in args.config:
> +        try:
> +            pull_msgs(confname, args.verbose)
> +        except imaplib.IMAP4.error as e:
> +            logging.error('IMAP failure for config file %s: %s', confname, e)
> +            errs[confname] = e
> +    if errs:
> +        sys.exit(1)
> diff --git a/imap-dl.1.pod b/imap-dl.1.pod
> new file mode 100644
> index 0000000..1407d05
> --- /dev/null
> +++ b/imap-dl.1.pod
> @@ -0,0 +1,88 @@
> +=encoding utf8
> +
> +=head1 NAME
> +
> +imap-dl -- a simple replacement for a minimalist use of getmail
> +
> +=head1 SYNOPSIS
> +
> +B<imap-dl> [B<-v>|B<--verbose>] B<configfile>...
> +
> +=head1 DESCRIPTION
> +
> +If you use getmail to reach an IMAP server as though it were POP
> +(retrieving from the server, storing it in a maildir and optionally
> +deleting), you can point this script to the getmail config and it
> +should do the same thing.
> +
> +It tries to ensure that the configuration file is of the expected
> +type, and terminates by raising an exception if it is not.  It
> +should not lose messages.
> +
> +If there's any interest in supporting other use cases for getmail,
> +patches are welcome.
> +
> +=head1 OPTIONS
> +
> +B<-v> or B<--verbose> causes B<imap-dl> to print more details
> +about what it is doing.
> +
> +In addition to parts of the standard B<getmail> configuration,
> +B<imap-dl> supports the following keywords in the config file:
> +
> +B<options.on_size_mismatch> can be set to B<exception>, B<none>, or
> +B<warn>.  This governs what to do when the remote IMAP server claims a
> +different size in the message summary list than the actual message
> +retrieval (default: B<exception>).
> +
> +=head1 EXAMPLE CONFIG
> +
> +If you've never used getmail, you can make the simplest possible
> +config file like so:
> +
> +=over 4
> +
> +    [retriever]
> +    server = mail.example.net
> +    username = foo
> +    password = sekr1t!
> +
> +    [destination]
> +    path = /home/foo/Maildir
> +
> +    [options]
> +    delete = True
> +
> +=back
> +
> +=head1 LIMITATIONS
> +
> +B<imap-dl> is currently deliberately minimal.  It is designed to be
> +used by someone who treats their IMAP mailbox like a POP server.
> +
> +It works with IMAP-over-TLS only, and it just fetches all messages
> +from the default IMAP folder.  It does not support all the various
> +features of getmail.
> +
> +B<imap-dl> is deliberately implemented in a modern version of python3,
> +and tries to just use the standard library.  It will not be backported
> +to python2.
> +
> +B<imap-dl> uses imaplib, which means that it does synchronous calls to
> +the imap server.  A more clever implementation would use asynchronous
> +python to avoid latency/roundtrips.
> +
> +B<imap-dl> does not know how to wait and listen for new mail using
> +IMAP IDLE.  This would be a nice additional feature.
> +
> +B<imap-dl> does not yet know how to deliver to an MDA (or to
> +B<notmuch-insert>).  This would be a nice thing to be able to do.
> +
> +=head1 SEE ALSO
> +
> +https://tools.ietf.org/html/rfc3501, http://pyropus.ca/software/getmail/
> +
> +=head1 AUTHOR
> +
> +B<imap-dl> and this manpage were written by Daniel Kahn Gillmor,
> +inspired by some functionality from the getmail project.
> -- 
> 2.24.1
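
For anyone reading the patch without an IMAP server to hand, here is a
minimal standalone sketch of the first stage of pull_msgs: parsing the
"(UID RFC822.SIZE)" summary lines and building the UID set for the bulk
fetch.  The regex is modelled on _summary_splitter from the patch; the
names parse_summary and uid_set and the sample lines are made up for
illustration and are not part of imap-dl.

```python
import re
from typing import Dict, List

# Modelled on _summary_splitter in the patch: sequence number, UID, RFC822.SIZE.
SUMMARY = re.compile(
    rb'^(?P<id>[0-9]+) \(UID (?P<uid>[0-9]+) RFC822\.SIZE (?P<size>[0-9]+)\)$')


def parse_summary(line: bytes) -> Dict[str, int]:
    """Turn one FETCH summary line into {'id': ..., 'uid': ..., 'size': ...}."""
    match = SUMMARY.match(line)
    if not match:
        raise ValueError(f'malformed summary line {line!r}')
    # int() accepts ASCII digit bytes, so no decoding step is needed.
    return {key: int(match[key]) for key in ('id', 'uid', 'size')}


def uid_set(summaries: List[Dict[str, int]]) -> str:
    """Comma-separated UIDs in ascending order, as passed to UID FETCH."""
    ordered = sorted(summaries, key=lambda s: s['uid'])
    return ','.join(str(s['uid']) for s in ordered)


# Hypothetical server responses, deliberately out of UID order.
lines = [b'2 (UID 161 RFC822.SIZE 512)', b'1 (UID 160 RFC822.SIZE 1867)']
pending = [parse_summary(line) for line in lines]
print(uid_set(pending))
print(sum(p['size'] for p in pending))
```

The per-UID sizes collected here are what the script later compares
against the size announced in each BODY[] response, which is where
options.on_size_mismatch comes into play.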

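The configuration handling can also be tried in isolation.  The sketch
below feeds the example config from the manpage to configparser and
applies the same fallbacks pull_msgs uses (port 993, delete off,
on_size_mismatch set to exception).  The helper name load_settings and
the returned dict are assumptions for illustration only; the script
itself reads the values inline.

```python
import configparser
from typing import Any, Dict

# The example config from the imap-dl manpage.
EXAMPLE = '''
[retriever]
server = mail.example.net
username = foo
password = sekr1t!

[destination]
path = /home/foo/Maildir

[options]
delete = True
'''


def load_settings(text: str) -> Dict[str, Any]:
    """Read a getmail-style config and apply the defaults used in the patch."""
    conf = configparser.ConfigParser()
    conf.read_string(text)
    return {
        'server': conf.get('retriever', 'server'),
        'port': conf.getint('retriever', 'port', fallback=993),
        'path': conf.get('destination', 'path'),
        'delete': conf.getboolean('options', 'delete', fallback=False),
        'on_size_mismatch': conf.get(
            'options', 'on_size_mismatch', fallback='exception').lower(),
    }


settings = load_settings(EXAMPLE)
print(settings['port'], settings['delete'], settings['on_size_mismatch'])
```

Note that configparser's default interpolation treats "%" specially, so
a password containing a percent sign would need it doubled ("%%") in a
config read this way.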