Hello Daniel,

On Wed 22 Jan 2020 at 04:52PM -05, Daniel Kahn Gillmor wrote:

> The attached git-formatted patch is also present on the imap-dl-squashed
> branch on https://salsa.debian.org/dkg/mailscripts.  jrollins confirmed
> that it was OK, which is why it bears both of our signoffs.

Thanks for sorting that out.  Here are my comments on the current
version of imap-dl.  Looking forward to including it in mailscripts.

> diff --git a/imap-dl b/imap-dl
> new file mode 100755
> index 0000000..f5d7a85
> --- /dev/null
> +++ b/imap-dl
> +
> +_summary_splitter = re.compile(rb'^(?P<id>[0-9]+) \(UID (?P<uid>[0-9]+) 
> RFC822.SIZE (?P<size>[0-9]+)\)$')
> +def break_fetch_summary(line:bytes) -> Dict[str,int]:
> +    '''b'1 (UID 160 RFC822.SIZE 1867)' -> {id: 1, uid: 160, size: 1867}'''
> +    match = _summary_splitter.match(line)
> +    if not match:
> +        raise Exception(f'malformed summary line {line!r}')
> +    ret:Dict[str,int] = {}
> +    i:str
> +    for i in ['id', 'uid', 'size']:
> +        ret[i] = int(match[i])
> +    return ret
> +
> +_fetch_splitter = re.compile(rb'^(?P<id>[0-9]+) \(UID (?P<uid>[0-9]+) (FLAGS 
> \([\\A-Za-z ]*\) )?BODY\[\] \{(?P<size>[0-9]+)\}$')
> +def break_fetch(line:bytes) -> Dict[str,int]:
> +    '''b'1 (UID 160 BODY[] {1867}' -> {id: 1, uid: 160, size: 1867}'''
> +    match = _fetch_splitter.match(line)
> +    if not match:
> +        raise Exception(f'malformed fetch line {line!r}')
> +    ret:Dict[str,int] = {}
> +    i:str
> +    for i in ['id', 'uid', 'size']:
> +        ret[i] = int(match[i])
> +    return ret

These two functions are almost identical; please refactor.

> +        if verbose: # only enable debugging after login to avoid leaking 
> credentials in the log
> +            imap.debug = 4
> +            logging.info("capabilities reported: %s", ', 
> '.join(imap.capabilities))

I am not familiar with idiomatic logging in Python, but shouldn't the
logging.info call be outside of the 'if' block, like all the others?

And shouldn't this be `if verbose or conf_verbose`?

> +        if n == 0:
> +            logging.info('No messages to retrieve')
> +            logging.getLogger().setLevel(oldloglevel)
> +            return

This is not a blocker, but it would be better to refactor such that
resetting the log level does not involve you repeating yourself here and
at the end of pull_msgs.

Can you use a 'with' keyword to temporarily set the loglevel, perhaps?
Or move the part of pull_msgs after config parsing into its own
subroutine.

> +        uids = ','.join(map(lambda x: str(x['uid']), sorted(pending, 
> key=lambda x: x['uid'])))

This line could be shorter and easier to read, afaict.  How about:

    from operator import itemgetter
    uids = ','.join(map(str, sorted(map(itemgetter('uid'), pending))))

or

    uids = ','.join(map(str, sorted([x['uid'] for x in pending])))

(untested)

> +        for f in resp[1]:
> +            # these objects are weirdly structured. i don't know why
> +            # these trailing close-parens show up.  so this is very
> +            # ad-hoc and nonsense
> +            if isinstance(f, bytes):
> +                if f != b')':
> +                    raise Exception('got bytes object of length %d but 
> expected simple closeparen'%(len(f),))
> +            elif isinstance(f, tuple):

What is f is neither bytes nor tuple?  Should probably raise an
exception in that case.

Is there really no documentation from the IMAP library you're using
about the closeparens?

> +                    if on_size_mismatch == 'warn':
> +                        if len(sizes_mismatched) == 0:
> +                            logging.warning('size mismatch: summary said %d 
> octets, fetch sent %d',
> +                                            sizes[m['uid']], m['size'])
> +                        elif len(sizes_mismatched) == 1:
> +                            logging.warning('size mismatch: (mismatches 
> after the first suppressed until summary)')
> +                        sizes_mismatched.append(sizes[m['uid']] - m['size'])
> +                    elif on_size_mismatch == 'exception':
> +                        raise Exception('size mismatch: summary said %d 
> octets, fetch sent %d\n(set options.on_size_mismatch to none or warn to avoid 
> hard failure)',
> +                                        sizes[m['uid']], m['size'])
> +                    elif on_size_mismatch != 'none':
> +                        raise Exception('size_mismatch: 
> options.on_size_mismatch should be none, warn, or exception (found "%s")', 
> on_size_mismatch)

Checking whether size_mismatch contains the wrong sort of value should
not happen in the middle of the IMAP response processing code.  Please
move the check to where on_size_mismatch gets set.

Would be nice if you could avoid the really long lines here.

> +                fname = mdst.add(f[1].replace(b'\r\n', b'\n'))

Could a message contain a mixture of UNIX and Windows line endings, such
that this line corrupts the message?  If not, please write a comment
saying why it is always safe to perform this replacement.

Maybe python has a e-mail message processing library with a function to
transform line endings safely?

> +        if delete:
> +            logging.info('trying to delete %d messages from IMAP store', 
> len(fetched))
> +            resp = imap.uid('STORE', ','.join(map(str, fetched.keys())), 
> '+FLAGS', r'(\Deleted)')
> +            if resp[0] != 'OK':
> +                raise Exception('failed to set \\Deleted flag: %s'%(resp))
> +            resp = imap.expunge()

Not a blocker, but it would be nice if the user could request that the
expunge step be skipped.

Also, will imap-dl skip messages with the deleted flag?  Do you think it
should?

> diff --git a/imap-dl.1.pod b/imap-dl.1.pod
> new file mode 100644
> index 0000000..1407d05
> --- /dev/null
> +++ b/imap-dl.1.pod
> +
> +It tries to ensure that the configuration file is of the expected
> +type, and will terminate raising an exception, and it should not lose
> +messages.

It will always terminate raising an exception? :)

> +
> +If there's any interest in supporting other use cases for getmail,
> +patches are welcome.

I think this should be limited to other /similarly simple/ use cases for
getmail -- we don't want imap-dl to reimplement getmail, right?

Same comment applies to the DESCRIPTION in the script itself.

> +B<options.on_size_mismatch> can be set to B<exception>, B<none>, or
> +B<warn>.  This governs what to do when the remote IMAP server claims a
> +different size in the message summary list than the actual message
> +retrieval (default: B<exception>).

I think that you should use 'terminate' or 'error' instead of
'exception' here.  Exceptions are thrown by subroutines in programming
languages, and caught by other subroutines, but imap-dl is not a
subroutine or library of them, it's a program on the user's PATH.
Programs executed by a user do not throw exceptions to be caught by the
user, they just exit nonzero.

> +B<imap-dl> is deliberately implemented in a modern version of python3,
> +and tries to just use the standard library.  It will not be backported
> +to python2.

s/just use/use just/

-- 
Sean Whitton

Attachment: signature.asc
Description: PGP signature

Reply via email to