Hello Daniel, On Wed 22 Jan 2020 at 04:52PM -05, Daniel Kahn Gillmor wrote:
> The attached git-formatted patch is also present on the imap-dl-squashed > branch on https://salsa.debian.org/dkg/mailscripts. jrollins confirmed > that it was OK, which is why it bears both of our signoffs. Thanks for sorting that out. Here are my comments on the current version of imap-dl. Looking forward to including it in mailscripts. > diff --git a/imap-dl b/imap-dl > new file mode 100755 > index 0000000..f5d7a85 > --- /dev/null > +++ b/imap-dl > + > +_summary_splitter = re.compile(rb'^(?P<id>[0-9]+) \(UID (?P<uid>[0-9]+) > RFC822.SIZE (?P<size>[0-9]+)\)$') > +def break_fetch_summary(line:bytes) -> Dict[str,int]: > + '''b'1 (UID 160 RFC822.SIZE 1867)' -> {id: 1, uid: 160, size: 1867}''' > + match = _summary_splitter.match(line) > + if not match: > + raise Exception(f'malformed summary line {line!r}') > + ret:Dict[str,int] = {} > + i:str > + for i in ['id', 'uid', 'size']: > + ret[i] = int(match[i]) > + return ret > + > +_fetch_splitter = re.compile(rb'^(?P<id>[0-9]+) \(UID (?P<uid>[0-9]+) (FLAGS > \([\\A-Za-z ]*\) )?BODY\[\] \{(?P<size>[0-9]+)\}$') > +def break_fetch(line:bytes) -> Dict[str,int]: > + '''b'1 (UID 160 BODY[] {1867}' -> {id: 1, uid: 160, size: 1867}''' > + match = _fetch_splitter.match(line) > + if not match: > + raise Exception(f'malformed fetch line {line!r}') > + ret:Dict[str,int] = {} > + i:str > + for i in ['id', 'uid', 'size']: > + ret[i] = int(match[i]) > + return ret These two functions are almost identical; please refactor. > + if verbose: # only enable debugging after login to avoid leaking > credentials in the log > + imap.debug = 4 > + logging.info("capabilities reported: %s", ', > '.join(imap.capabilities)) I am not familiar with idiomatic logging in Python, but shouldn't the logging.info call be outside of the 'if' block, like all the others? And shouldn't this be `if verbose or conf_verbose`? > + if n == 0: > + logging.info('No messages to retrieve') > + logging.getLogger().setLevel(oldloglevel) > + return This is not a blocker, but it would be better to refactor such that resetting the log level does not involve you repeating yourself here and at the end of pull_msgs. Can you use a 'with' keyword to temporarily set the loglevel, perhaps? Or move the part of pull_msgs after config parsing into its own subroutine. > + uids = ','.join(map(lambda x: str(x['uid']), sorted(pending, > key=lambda x: x['uid']))) This line could be shorter and easier to read, afaict. How about: from operator import itemgetter uids = ','.join(map(str, sorted(map(itemgetter('uid'), pending)))) or uids = ','.join(map(str, sorted([x['uid'] for x in pending]))) (untested) > + for f in resp[1]: > + # these objects are weirdly structured. i don't know why > + # these trailing close-parens show up. so this is very > + # ad-hoc and nonsense > + if isinstance(f, bytes): > + if f != b')': > + raise Exception('got bytes object of length %d but > expected simple closeparen'%(len(f),)) > + elif isinstance(f, tuple): What is f is neither bytes nor tuple? Should probably raise an exception in that case. Is there really no documentation from the IMAP library you're using about the closeparens? > + if on_size_mismatch == 'warn': > + if len(sizes_mismatched) == 0: > + logging.warning('size mismatch: summary said %d > octets, fetch sent %d', > + sizes[m['uid']], m['size']) > + elif len(sizes_mismatched) == 1: > + logging.warning('size mismatch: (mismatches > after the first suppressed until summary)') > + sizes_mismatched.append(sizes[m['uid']] - m['size']) > + elif on_size_mismatch == 'exception': > + raise Exception('size mismatch: summary said %d > octets, fetch sent %d\n(set options.on_size_mismatch to none or warn to avoid > hard failure)', > + sizes[m['uid']], m['size']) > + elif on_size_mismatch != 'none': > + raise Exception('size_mismatch: > options.on_size_mismatch should be none, warn, or exception (found "%s")', > on_size_mismatch) Checking whether size_mismatch contains the wrong sort of value should not happen in the middle of the IMAP response processing code. Please move the check to where on_size_mismatch gets set. Would be nice if you could avoid the really long lines here. > + fname = mdst.add(f[1].replace(b'\r\n', b'\n')) Could a message contain a mixture of UNIX and Windows line endings, such that this line corrupts the message? If not, please write a comment saying why it is always safe to perform this replacement. Maybe python has a e-mail message processing library with a function to transform line endings safely? > + if delete: > + logging.info('trying to delete %d messages from IMAP store', > len(fetched)) > + resp = imap.uid('STORE', ','.join(map(str, fetched.keys())), > '+FLAGS', r'(\Deleted)') > + if resp[0] != 'OK': > + raise Exception('failed to set \\Deleted flag: %s'%(resp)) > + resp = imap.expunge() Not a blocker, but it would be nice if the user could request that the expunge step be skipped. Also, will imap-dl skip messages with the deleted flag? Do you think it should? > diff --git a/imap-dl.1.pod b/imap-dl.1.pod > new file mode 100644 > index 0000000..1407d05 > --- /dev/null > +++ b/imap-dl.1.pod > + > +It tries to ensure that the configuration file is of the expected > +type, and will terminate raising an exception, and it should not lose > +messages. It will always terminate raising an exception? :) > + > +If there's any interest in supporting other use cases for getmail, > +patches are welcome. I think this should be limited to other /similarly simple/ use cases for getmail -- we don't want imap-dl to reimplement getmail, right? Same comment applies to the DESCRIPTION in the script itself. > +B<options.on_size_mismatch> can be set to B<exception>, B<none>, or > +B<warn>. This governs what to do when the remote IMAP server claims a > +different size in the message summary list than the actual message > +retrieval (default: B<exception>). I think that you should use 'terminate' or 'error' instead of 'exception' here. Exceptions are thrown by subroutines in programming languages, and caught by other subroutines, but imap-dl is not a subroutine or library of them, it's a program on the user's PATH. Programs executed by a user do not throw exceptions to be caught by the user, they just exit nonzero. > +B<imap-dl> is deliberately implemented in a modern version of python3, > +and tries to just use the standard library. It will not be backported > +to python2. s/just use/use just/ -- Sean Whitton
signature.asc
Description: PGP signature