On Wed, Jan 22 2020, Daniel Kahn Gillmor <d...@fifthhorseman.net> wrote:
> Hi Sean--
>
> On Fri 2020-01-17 09:26:38 -0700, Sean Whitton wrote:
>> I think the easiest thing to do would be for one of you to prepare a
>> single patch, signed off, and for the other to write an e-mail signing
>> it off. I'll then do a code review of the latest version of the script.
>
> The attached git-formatted patch is also present on the imap-dl-squashed
> branch on https://salsa.debian.org/dkg/mailscripts. jrollins confirmed
> that it was OK, which is why it bears both of our signoffs.
>
> Thanks for considering imap-dl for inclusion within mailscripts!
>
> --dkg
>
> From 9e5c1a893c17343102b042de23bdaa0f91b37d66 Mon Sep 17 00:00:00 2001
> From: Daniel Kahn Gillmor <d...@fifthhorseman.net>
> Date: Sun, 15 Sep 2019 19:55:07 -0400
> Subject: [PATCH] Add imap-dl, a simple imap downloader
>
> getmail upstream appears to have no plans to convert to python3 in the
> near future.
>
> Some of us use only a minimal subset of features of getmail, and it
> would be nice to have something simpler, with the main complexity
> offloaded to the modern python3 stdlib.
>
> This patch represents a squashed series of changes from both Jameson
> Graef Rollins and Daniel Kahn Gillmor (dkg), though dkg is primarily
> responsible for any remaining bugs.
>
> Signed-off-by: Jameson Graef Rollins <jroll...@finestructure.net>
> Signed-off-by: Daniel Kahn Gillmor <d...@fifthhorseman.net>

I confirm that I truly do sign off on this code, and fully support its
inclusion in mailscripts.

jamie.

> ---
>  Makefile                           |   4 +-
>  debian/control                     |   2 +
>  debian/mailscripts.bash-completion |   1 +
>  debian/mailscripts.install         |   1 +
>  debian/mailscripts.manpages        |   1 +
>  imap-dl                            | 254 +++++++++++++++++++++++++++++
>  imap-dl.1.pod                      |  88 ++++++++++
>  7 files changed, 350 insertions(+), 1 deletion(-)
>  create mode 100755 imap-dl
>  create mode 100644 imap-dl.1.pod
>
> diff --git a/Makefile b/Makefile
> index af30616..ec3d851 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -1,15 +1,17 @@
>  MANPAGES=mdmv.1 mbox2maildir.1 \
>  	notmuch-slurp-debbug.1 notmuch-extract-patch.1 maildir-import-patch.1 \
> +	imap-dl.1 \
>  	email-extract-openpgp-certs.1 \
>  	email-print-mime-structure.1 \
>  	notmuch-import-patch.1
> -COMPLETIONS=completions/bash/email-print-mime-structure
> +COMPLETIONS=completions/bash/email-print-mime-structure completions/bash/imap-dl
>  
>  all: $(MANPAGES) $(COMPLETIONS)
>  
>  check:
>  	./tests/email-print-mime-structure.sh
>  	mypy --strict ./email-print-mime-structure
> +	mypy --strict ./imap-dl
>  
>  clean:
>  	rm -f $(MANPAGES)
> diff --git a/debian/control b/debian/control
> index bc8268a..21afa45 100644
> --- a/debian/control
> +++ b/debian/control
> @@ -77,3 +77,5 @@ Description: collection of scripts for manipulating e-mail on Debian
>   email-print-mime-structure -- tree view of a message's MIME structure
>   .
>   email-extract-openpgp-certs -- extract OpenPGP certificates from a message
> + .
> + imap-dl -- download messages from an IMAP mailbox to a maildir
> diff --git a/debian/mailscripts.bash-completion b/debian/mailscripts.bash-completion
> index 435576f..657de01 100644
> --- a/debian/mailscripts.bash-completion
> +++ b/debian/mailscripts.bash-completion
> @@ -1 +1,2 @@
>  completions/bash/email-print-mime-structure
> +completions/bash/imap-dl
> diff --git a/debian/mailscripts.install b/debian/mailscripts.install
> index 2c060df..3739c49 100644
> --- a/debian/mailscripts.install
> +++ b/debian/mailscripts.install
> @@ -1,5 +1,6 @@
>  email-extract-openpgp-certs /usr/bin
>  email-print-mime-structure /usr/bin
> +imap-dl /usr/bin
>  maildir-import-patch /usr/bin
>  mbox2maildir /usr/bin
>  mdmv /usr/bin
> diff --git a/debian/mailscripts.manpages b/debian/mailscripts.manpages
> index 1de088f..a915617 100644
> --- a/debian/mailscripts.manpages
> +++ b/debian/mailscripts.manpages
> @@ -1,5 +1,6 @@
>  email-extract-openpgp-certs.1
>  email-print-mime-structure.1
> +imap-dl.1
>  maildir-import-patch.1
>  mbox2maildir.1
>  mdmv.1
> diff --git a/imap-dl b/imap-dl
> new file mode 100755
> index 0000000..f5d7a85
> --- /dev/null
> +++ b/imap-dl
> @@ -0,0 +1,254 @@
> +#!/usr/bin/python3
> +# PYTHON_ARGCOMPLETE_OK
> +# -*- coding: utf-8 -*-
> +
> +# Copyright (C) 2019 Daniel Kahn Gillmor
> +#
> +# This program is free software: you can redistribute it and/or modify
> +# it under the terms of the GNU General Public License as published by
> +# the Free Software Foundation, either version 3 of the License, or (at
> +# your option) any later version.
> +#
> +# This program is distributed in the hope that it will be useful, but
> +# WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> +# General Public License for more details.
> +#
> +# You should have received a copy of the GNU General Public License
> +# along with this program. If not, see <https://www.gnu.org/licenses/>.
> +
> +DESCRIPTION = '''A simple replacement for a minimalist use of getmail.
> +
> +In particular, if you use getmail to reach an IMAP server as though it
> +were POP (retrieving from the server and optionally deleting), you can
> +point this script to the getmail config and it should do the same
> +thing.
> +
> +It tries to ensure that the configuration file is of the expected
> +type, and terminates by raising an exception if it is not. It
> +should not lose messages.
> +
> +If there's any interest in supporting other use cases for getmail,
> +patches are welcome.
> +
> +If you've never used getmail, you can make the simplest possible
> +config file like so:
> +
> +----------
> +[retriever]
> +server = mail.example.net
> +username = foo
> +password = sekr1t!
> +
> +[destination]
> +path = /home/foo/Maildir
> +
> +[options]
> +delete = True
> +----------
> +'''
> +
> +import re
> +import sys
> +import ssl
> +import time
> +import imaplib
> +import logging
> +import mailbox
> +import os.path
> +import argparse
> +import statistics
> +import configparser
> +
> +from typing import Dict, List
> +
> +try:
> +    import argcomplete #type: ignore
> +except ImportError:
> +    argcomplete = None
> +
> +_summary_splitter = re.compile(rb'^(?P<id>[0-9]+) \(UID (?P<uid>[0-9]+) RFC822.SIZE (?P<size>[0-9]+)\)$')
> +def break_fetch_summary(line:bytes) -> Dict[str,int]:
> +    '''b'1 (UID 160 RFC822.SIZE 1867)' -> {id: 1, uid: 160, size: 1867}'''
> +    match = _summary_splitter.match(line)
> +    if not match:
> +        raise Exception(f'malformed summary line {line!r}')
> +    ret:Dict[str,int] = {}
> +    i:str
> +    for i in ['id', 'uid', 'size']:
> +        ret[i] = int(match[i])
> +    return ret
> +
> +_fetch_splitter = re.compile(rb'^(?P<id>[0-9]+) \(UID (?P<uid>[0-9]+) (FLAGS \([\\A-Za-z ]*\) )?BODY\[\] \{(?P<size>[0-9]+)\}$')
> +def break_fetch(line:bytes) -> Dict[str,int]:
> +    '''b'1 (UID 160 BODY[] {1867}' -> {id: 1, uid: 160, size: 1867}'''
> +    match = _fetch_splitter.match(line)
> +    if not match:
> +        raise Exception(f'malformed fetch line {line!r}')
> +    ret:Dict[str,int] = {}
> +    i:str
> +    for i in ['id', 'uid', 'size']:
> +        ret[i] = int(match[i])
> +    return ret
> +
> +def pull_msgs(configfile:str, verbose:bool) -> None:
> +    conf = configparser.ConfigParser()
> +    conf.read_file(open(configfile, 'r'))
> +    oldloglevel = logging.getLogger().getEffectiveLevel()
> +    conf_verbose = conf.getint('options', 'verbose', fallback=1)
> +    if conf_verbose > 1:
> +        logging.getLogger().setLevel(logging.INFO)
> +    logging.info('pulling from config file %s', configfile)
> +    delete = conf.getboolean('options', 'delete', fallback=False)
> +    read_all = conf.getboolean('options', 'read_all', fallback=True)
> +    if not read_all:
> +        raise NotImplementedError('imap-dl only supports options.read_all=True, got False')
> +    rtype = conf.get('retriever', 'type', fallback='SimpleIMAPSSLRetriever')
> +    if rtype.lower() != 'simpleimapsslretriever':
> +        raise NotImplementedError('imap-dl only supports retriever.type=SimpleIMAPSSLRetriever, got %s'%(rtype,))
> +    # FIXME: handle `retriever.record_mailbox`
> +    dtype = conf.get('destination', 'type', fallback='Maildir')
> +    if dtype.lower() != 'maildir':
> +        raise NotImplementedError('imap-dl only supports destination.type=Maildir, got %s'%(dtype,))
> +    dst = conf.get('destination', 'path')
> +    dst = os.path.expanduser(dst)
> +    if os.path.exists(dst) and not os.path.isdir(dst):
> +        raise Exception('expected destination directory, but %s is not a directory'%(dst,))
> +    mdst = mailbox.Maildir(dst, create=True)
> +    ca_certs = conf.get('retriever', 'ca_certs', fallback=None)
> +    on_size_mismatch = conf.get('options', 'on_size_mismatch', fallback='exception').lower()
> +    sizes_mismatched:List[int] = []
> +    ctx = ssl.create_default_context(cafile=ca_certs)
> +    with imaplib.IMAP4_SSL(host=conf.get('retriever', 'server'), #type: ignore
> +                           port=int(conf.get('retriever', 'port', fallback=993)),
> +                           ssl_context=ctx) as imap:
> +        logging.info("Logging in as %s", conf.get('retriever', 'username'))
> +        resp = imap.login(conf.get('retriever', 'username'),
> +                          conf.get('retriever', 'password'))
> +        if resp[0] != 'OK':
> +            raise Exception('login failed with %s as user %s on %s'%(
> +                resp,
> +                conf.get('retriever', 'username'),
> +                conf.get('retriever', 'server')))
> +        if verbose: # only enable debugging after login to avoid leaking credentials in the log
> +            imap.debug = 4
> +        logging.info("capabilities reported: %s", ', '.join(imap.capabilities))
> +        resp = imap.select(readonly=not delete)
> +        if resp[0] != 'OK':
> +            raise Exception('selection failed: %s'%(resp,))
> +        if len(resp[1]) != 1:
> +            raise Exception('expected exactly one EXISTS response from select, got %s'%(resp[1]))
> +        n = int(resp[1][0])
> +        if n == 0:
> +            logging.info('No messages to retrieve')
> +            logging.getLogger().setLevel(oldloglevel)
> +            return
> +        resp = imap.fetch('1:%d'%(n), '(UID RFC822.SIZE)')
> +        if resp[0] != 'OK':
> +            raise Exception('initial FETCH 1:%d not OK (%s)'%(n, resp))
> +        pending = list(map(break_fetch_summary, resp[1]))
> +        sizes:Dict[int,int] = {}
> +        for m in pending:
> +            sizes[m['uid']] = m['size']
> +        fetched:Dict[int,int] = {}
> +        uids = ','.join(map(lambda x: str(x['uid']), sorted(pending, key=lambda x: x['uid'])))
> +        totalbytes = sum([x['size'] for x in pending])
> +        logging.info('Fetching %d messages, expecting %d bytes of message content',
> +                     len(pending), totalbytes)
> +        # FIXME: sort by size?
> +        # FIXME: fetch in batches or singly instead of all-at-once?
> +        # FIXME: rolling deletion?
> +        # FIXME: asynchronous work?
> +        before = time.perf_counter()
> +        resp = imap.uid('FETCH', uids, '(UID BODY.PEEK[])')
> +        after = time.perf_counter()
> +        if resp[0] != 'OK':
> +            raise Exception('UID fetch failed %s'%(resp[0]))
> +        for f in resp[1]:
> +            # these objects are weirdly structured. i don't know why
> +            # these trailing close-parens show up. so this is very
> +            # ad-hoc and nonsense
> +            if isinstance(f, bytes):
> +                if f != b')':
> +                    raise Exception('got bytes object of length %d but expected simple closeparen'%(len(f),))
> +            elif isinstance(f, tuple):
> +                if len(f) != 2:
> +                    raise Exception('expected 2-part tuple, got %d-part'%(len(f),))
> +                m = break_fetch(f[0])
> +                if m['size'] != len(f[1]):
> +                    raise Exception('expected %d octets, got %d'%(
> +                        m['size'], len(f[1])))
> +                if m['size'] != sizes[m['uid']]:
> +                    if on_size_mismatch == 'warn':
> +                        if len(sizes_mismatched) == 0:
> +                            logging.warning('size mismatch: summary said %d octets, fetch sent %d',
> +                                            sizes[m['uid']], m['size'])
> +                        elif len(sizes_mismatched) == 1:
> +                            logging.warning('size mismatch: (mismatches after the first suppressed until summary)')
> +                        sizes_mismatched.append(sizes[m['uid']] - m['size'])
> +                    elif on_size_mismatch == 'exception':
> +                        raise Exception('size mismatch: summary said %d octets, fetch sent %d\n(set options.on_size_mismatch to none or warn to avoid hard failure)'%(
> +                                        sizes[m['uid']], m['size']))
> +                    elif on_size_mismatch != 'none':
> +                        raise Exception('size_mismatch: options.on_size_mismatch should be none, warn, or exception (found "%s")'%(on_size_mismatch,))
> +                fname = mdst.add(f[1].replace(b'\r\n', b'\n'))
> +                logging.info('stored message %d/%d (uid %d, %d bytes) in %s',
> +                             len(fetched) + 1, len(pending), m['uid'], m['size'], fname)
> +                del sizes[m['uid']]
> +                fetched[m['uid']] = m['size']
> +        if sizes:
> +            logging.warning('unhandled UIDs: %s', sizes)
> +        logging.info('%d bytes of %d messages fetched in %g seconds (~%g KB/s)',
> +                     sum(fetched.values()), len(fetched), after - before,
> +                     sum(fetched.values())/((after - before)*1024))
> +        if on_size_mismatch == 'warn' and len(sizes_mismatched) > 1:
> +            logging.warning('%d size mismatches out of %d messages (mismatches in bytes: mean %f, stddev %f)',
> +                            len(sizes_mismatched), len(fetched),
> +                            statistics.mean(sizes_mismatched),
> +                            statistics.stdev(sizes_mismatched))
> +        if delete:
> +            logging.info('trying to delete %d messages from IMAP store', len(fetched))
> +            resp = imap.uid('STORE', ','.join(map(str, fetched.keys())), '+FLAGS', r'(\Deleted)')
> +            if resp[0] != 'OK':
> +                raise Exception('failed to set \\Deleted flag: %s'%(resp))
> +            resp = imap.expunge()
> +            if resp[0] != 'OK':
> +                raise Exception('failed to expunge! %s'%(resp))
> +        else:
> +            logging.info('not deleting any messages, since options.delete is not set')
> +    logging.getLogger().setLevel(oldloglevel)
> +
> +if __name__ == '__main__':
> +    parser = argparse.ArgumentParser(
> +        description=DESCRIPTION,
> +        formatter_class=argparse.RawDescriptionHelpFormatter,
> +    )
> +    parser.add_argument(
> +        'config', nargs='+', metavar='CONFIG',
> +        help="configuration file")
> +    parser.add_argument(
> +        '-v', '--verbose', action='store_true',
> +        help="verbose log output")
> +
> +    if argcomplete:
> +        argcomplete.autocomplete(parser)
> +    elif '_ARGCOMPLETE' in os.environ:
> +        logging.error('Argument completion requested but the "argcomplete" '
> +                      'module is not installed. '
> +                      'Maybe you want to "apt install python3-argcomplete"')
> +        sys.exit(1)
> +
> +    args = parser.parse_args()
> +
> +    if args.verbose:
> +        logging.getLogger().setLevel(logging.INFO)
> +
> +    errs = {}
> +    for confname in args.config:
> +        try:
> +            pull_msgs(confname, args.verbose)
> +        except imaplib.IMAP4.error as e:
> +            logging.error('IMAP failure for config file %s: %s', confname, e)
> +            errs[confname] = e
> +    if errs:
> +        sys.exit(1)
> diff --git a/imap-dl.1.pod b/imap-dl.1.pod
> new file mode 100644
> index 0000000..1407d05
> --- /dev/null
> +++ b/imap-dl.1.pod
> @@ -0,0 +1,88 @@
> +=encoding utf8
> +
> +=head1 NAME
> +
> +imap-dl -- a simple replacement for a minimalist use of getmail
> +
> +=head1 SYNOPSIS
> +
> +B<imap-dl> [B<-v>|B<--verbose>] B<configfile>...
> +
> +=head1 DESCRIPTION
> +
> +If you use getmail to reach an IMAP server as though it were POP
> +(retrieving from the server, storing it in a maildir and optionally
> +deleting), you can point this script to the getmail config and it
> +should do the same thing.
> +
> +It tries to ensure that the configuration file is of the expected
> +type, and terminates by raising an exception if it is not. It
> +should not lose messages.
> +
> +If there's any interest in supporting other use cases for getmail,
> +patches are welcome.
> +
> +=head1 OPTIONS
> +
> +B<-v> or B<--verbose> causes B<imap-dl> to print more details
> +about what it is doing.
> +
> +In addition to parts of the standard B<getmail> configuration,
> +B<imap-dl> supports the following keywords in the config file:
> +
> +B<options.on_size_mismatch> can be set to B<exception>, B<none>, or
> +B<warn>. This governs what to do when the remote IMAP server claims a
> +different size in the message summary list than the actual message
> +retrieval (default: B<exception>).
> +
> +=head1 EXAMPLE CONFIG
> +
> +If you've never used getmail, you can make the simplest possible
> +config file like so:
> +
> +=over 4
> +
> +  [retriever]
> +  server = mail.example.net
> +  username = foo
> +  password = sekr1t!
> +
> +  [destination]
> +  path = /home/foo/Maildir
> +
> +  [options]
> +  delete = True
> +
> +=back
> +
> +=head1 LIMITATIONS
> +
> +B<imap-dl> is currently deliberately minimal. It is designed to be
> +used by someone who treats their IMAP mailbox like a POP server.
> +
> +It works with IMAP-over-TLS only, and it just fetches all messages
> +from the default IMAP folder. It does not support all the various
> +features of getmail.
> +
> +B<imap-dl> is deliberately implemented in a modern version of python3,
> +and tries to just use the standard library. It will not be backported
> +to python2.
> +
> +B<imap-dl> uses imaplib, which means that it makes synchronous calls to
> +the IMAP server. A more clever implementation would use asynchronous
> +python to avoid latency/roundtrips.
> +
> +B<imap-dl> does not know how to wait and listen for new mail using
> +IMAP IDLE. This would be a nice additional feature.
> +
> +B<imap-dl> does not yet know how to deliver to an MDA (or to
> +B<notmuch-insert>). This would be a nice thing to be able to do.
> +
> +=head1 SEE ALSO
> +
> +https://tools.ietf.org/html/rfc3501, http://pyropus.ca/software/getmail/
> +
> +=head1 AUTHOR
> +
> +B<imap-dl> and this manpage were written by Daniel Kahn Gillmor,
> +inspired by some functionality from the getmail project.
> -- 
> 2.24.1
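
The two regular expressions in the patch carry most of the IMAP response parsing. As a minimal standalone sketch (not part of the patch), the snippet below reuses the `_summary_splitter` and `_fetch_splitter` patterns verbatim under the hypothetical names `SUMMARY` and `FETCH`, with a hypothetical `parse()` helper that raises `ValueError` rather than a bare `Exception`, to show which response lines they accept:

```python
import re
from typing import Dict

# Same pattern as _summary_splitter: one line per message in the reply
# to FETCH 1:n (UID RFC822.SIZE).
SUMMARY = re.compile(rb'^(?P<id>[0-9]+) \(UID (?P<uid>[0-9]+) RFC822.SIZE (?P<size>[0-9]+)\)$')

# Same pattern as _fetch_splitter: the header half of each (header, body)
# tuple from UID FETCH <uids> (UID BODY.PEEK[]). The optional FLAGS group
# lets it accept replies that also report the message's flags.
FETCH = re.compile(rb'^(?P<id>[0-9]+) \(UID (?P<uid>[0-9]+) (FLAGS \([\\A-Za-z ]*\) )?BODY\[\] \{(?P<size>[0-9]+)\}$')


def parse(pattern: "re.Pattern[bytes]", line: bytes) -> Dict[str, int]:
    """Hypothetical helper: pull sequence number, UID and size out as ints."""
    match = pattern.match(line)
    if match is None:
        raise ValueError(f'malformed response line {line!r}')
    # int() accepts the bytes groups returned by a bytes pattern.
    return {key: int(match[key]) for key in ('id', 'uid', 'size')}


print(parse(SUMMARY, b'1 (UID 160 RFC822.SIZE 1867)'))
# -> {'id': 1, 'uid': 160, 'size': 1867}
print(parse(FETCH, b'2 (UID 161 FLAGS (\\Seen) BODY[] {512}'))
# -> {'id': 2, 'uid': 161, 'size': 512}
```

The size captured from the FETCH header is the literal's octet count, which is why the script can compare it both against `len(f[1])` and against the earlier RFC822.SIZE summary.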
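
In the patch, an invalid `options.on_size_mismatch` value is only reported when a size mismatch actually occurs, so a misspelled setting can go unnoticed for a long time. A rough sketch of one way to check it up front, alongside the other getmail keys the script inspects, is below; `load_settings`, `ImapDlSettings` and `MISMATCH_POLICIES` are hypothetical names, and the password is deliberately left out of the returned object:

```python
import configparser
from dataclasses import dataclass
from typing import Optional

MISMATCH_POLICIES = ('exception', 'warn', 'none')


@dataclass
class ImapDlSettings:
    """Hypothetical container for the keys imap-dl reads (no password)."""
    server: str
    port: int
    username: str
    path: str
    delete: bool
    on_size_mismatch: str
    ca_certs: Optional[str]


def load_settings(text: str) -> ImapDlSettings:
    conf = configparser.ConfigParser()
    conf.read_string(text)
    # Reject the getmail features imap-dl does not implement.
    if not conf.getboolean('options', 'read_all', fallback=True):
        raise NotImplementedError('only options.read_all=True is supported')
    rtype = conf.get('retriever', 'type', fallback='SimpleIMAPSSLRetriever')
    if rtype.lower() != 'simpleimapsslretriever':
        raise NotImplementedError(f'unsupported retriever.type {rtype!r}')
    dtype = conf.get('destination', 'type', fallback='Maildir')
    if dtype.lower() != 'maildir':
        raise NotImplementedError(f'unsupported destination.type {dtype!r}')
    # Validate the policy before connecting, not on the first mismatch.
    policy = conf.get('options', 'on_size_mismatch', fallback='exception').lower()
    if policy not in MISMATCH_POLICIES:
        raise ValueError(f'options.on_size_mismatch must be one of '
                         f'{", ".join(MISMATCH_POLICIES)}, got {policy!r}')
    return ImapDlSettings(
        server=conf.get('retriever', 'server'),
        port=conf.getint('retriever', 'port', fallback=993),
        username=conf.get('retriever', 'username'),
        path=conf.get('destination', 'path'),
        delete=conf.getboolean('options', 'delete', fallback=False),
        on_size_mismatch=policy,
        ca_certs=conf.get('retriever', 'ca_certs', fallback=None),
    )


# The minimal config from the manpage.
EXAMPLE = """\
[retriever]
server = mail.example.net
username = foo
password = sekr1t!

[destination]
path = /home/foo/Maildir

[options]
delete = True
"""

settings = load_settings(EXAMPLE)
print(settings.server, settings.port, settings.delete, settings.on_size_mismatch)
# -> mail.example.net 993 True exception
```

The `fallback=` defaults mirror the ones in `pull_msgs()`, so a config that the script accepts should load the same way here.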
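
The storage step in the patch is a single `mailbox.Maildir.add()` call on the fetched bytes, after converting the CRLF line endings that IMAP uses on the wire to the LF endings of local maildir files. A minimal sketch of that step in isolation, using a hypothetical `store()` helper and a throwaway temporary directory in place of a real `destination.path`, and assuming a POSIX system where `os.linesep` is `\n`:

```python
import mailbox
import os
import tempfile


def store(maildir_path: str, raw: bytes) -> str:
    """Hypothetical helper mirroring the patch's mdst.add(...) call."""
    md = mailbox.Maildir(maildir_path, create=True)
    # IMAP transfers messages with CRLF line endings; convert to LF first.
    return md.add(raw.replace(b'\r\n', b'\n'))


raw = b'From: foo@example.net\r\nSubject: test\r\n\r\nhello\r\n'
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'Maildir')
    key = store(path, raw)
    stored = mailbox.Maildir(path, create=False).get_bytes(key)
    # A freshly added message lands in the new/ subdirectory.
    new_count = len(os.listdir(os.path.join(path, 'new')))

print(stored)
```

Because `add()` writes to `tmp/` and then renames into `new/`, a crash mid-write should not leave a truncated message visible to mail readers.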