On Thu, Sep 12, 2019 at 07:55:23AM +1000, Zenaan Harkness wrote:
> Why is Gnu sieve so extremely fast at batch processing an mbox file,
> while Dovecot's sieve-filter is an order of magnitude slower?
> 
> Sequence:
> 
>  - use mpop or getmail to download emails (pipelined) into a temporary mbox file
>  - filter that file
> 
> Gnu sieve just flies through a local mbox file, saving emails to
> other local mbox files.
> 
> Gnu sieve rejects too many emails with "malformed" errors, so after a
> few years I bit the bullet and upgraded to Dovecot's sieve-filter.
> 
> Dovecot's sieve-filter, at present, is an order of magnitude slower.
> 
> Here's my filter command (one line):
> 
> /usr/bin/sieve-filter -veW -c $HOME/etc/email/sieve-dovecot-config.conf -o mail_location=mbox:~/mail:INBOX=~/mail/Inbox:INDEX=:UTF-8:VOLATILEDIR=/tmp/dovecot-volatile/%2.256Nu/%u:SUBSCRIPTIONS=dovecot_subscriptions ~/etc/email/sieve.rc email-incoming-unsorted
> 
> The sieve script is fine now that I have the correct "require"
> clauses (hint: "capability strings").
> 
> File ~/etc/email/sieve-dovecot-config.conf:
> 
>   protocols = pop
>   lda_mailbox_autocreate = yes
>   lda_mailbox_autosubscribe = yes
>   mail_fsync = never
> 
> There's no re-sending of emails into my local Postfix SMTP server - I
> checked the system logs and confirmed this (journalctl -f).
> 
> I suspect that Gnu sieve was writing each email directly to the
> appropriate sieve-determined mbox file (perhaps with only a single
> sync at the end of the batch, which is what I've tried to achieve
> above with mail_fsync = never), and that sieve-filter is instead
> passing each email through some (Dovecot) LDA?
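
For illustration, here is a minimal Python sketch of the batch model described above: read the source mbox once, route each message with a rule table built once, append it to its destination mbox, and sync each destination only once at the end of the run. The header-substring RULES table, the `route`/`batch_filter` names and the folder layout are hypothetical stand-ins for a compiled sieve script; this is not how Gnu sieve or Dovecot's sieve-filter are actually implemented, and unlike sieve-filter it leaves the source mbox untouched.

```python
import mailbox
import os

# Hypothetical rule table standing in for a compiled sieve script (NOT sieve syntax):
# (header name, substring to look for, destination mbox path relative to the mail root).
RULES = [
    ("List-Id", "zfs-devel", "l/z/zdev"),
    ("List-Id", "awx-project", "l/ansible/awx"),
]


def route(msg, rules, default):
    """Return the destination of the first rule whose header contains its substring."""
    for header, needle, dest in rules:
        if needle in str(msg.get(header, "")):
            return dest
    return default


def batch_filter(src_path, mail_root, rules, default="Inbox"):
    """Route every message in src_path; return a {destination: count} summary."""
    src = mailbox.mbox(src_path, create=False)
    boxes = {}   # destination -> open, locked mbox, reused for every message
    counts = {}
    try:
        for msg in src:
            dest = route(msg, rules, default)
            if dest not in boxes:
                path = os.path.join(mail_root, dest)
                os.makedirs(os.path.dirname(path), exist_ok=True)
                box = mailbox.mbox(path)
                box.lock()
                boxes[dest] = box
            boxes[dest].add(msg)   # appended now; the fsync is deferred to close()
            counts[dest] = counts.get(dest, 0) + 1
    finally:
        for box in boxes.values():
            box.close()   # flushes (one fsync) and unlocks each destination once
        src.close()
    return counts
```

Keeping each destination open for the whole run is what makes this cheap: the rules are built once, and each destination file is synced once rather than once per message.
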
> 
> Here's the output for a sieve-filter batch processing of 11 emails:
> 
> $ /usr/bin/sieve-filter -veW -c /home/zen/etc/email/sieve-dovecot-config.conf -o mail_location=mbox:/home/zen/mail:INBOX=/home/zen/mail/Inbox:INDEX=:UTF-8:VOLATILEDIR=/tmp/dovecot-volatile/%2.256Nu/%u:SUBSCRIPTIONS=dovecot_subscriptions /home/zen/etc/email/sieve.rc email-incoming-unsorted
> # PS0 Timestamp: 20190912@07:02:23
> info: filtering: [Tue, 3 Sep 2019 05:17:16 -0500; 10240 bytes] `Re: VentureBeat: The death of disk? H...'.
> info: msgid=<CAMjeLr91T9R7APsuxQVuM3WbqDsxAfwn4=oydedx4fmcord...@mail.gmail.com>: stored mail into mailbox 'l/cp/cp'.
> info: message expunged from source mailbox upon successful move.
> info: filtering: [Tue, 3 Sep 2019 07:29:53 -0400; 12968 bytes] `[zfs-devel] xattr naming format in Zo...'.
> info: msgid=<15675101930.d5ba2e.12...@composer.zfsonlinux.topicbox.com>: stored mail into mailbox 'l/z/zdev'.
> info: message expunged from source mailbox upon successful move.
> info: filtering: [Tue, 03 Sep 2019 15:29:09 +0300; 20461 bytes] `Re: [zfs-devel] xattr naming format i...'.
> info: msgid=<23955051567513...@sas1-02732547ccc0.qloud-c.yandex.net>: stored mail into mailbox 'l/z/zdev'.
> info: message expunged from source mailbox upon successful move.
> info: filtering: [Tue, 3 Sep 2019 18:20:42 +0530; 18065 bytes] `Re: [Gluster-users] Issues with Geo-r...'.
> info: msgid=<CADmkyZMxrfOANrAP+_URAHJcMqCqh=igdajtszkfq5pczsu...@mail.gmail.com>: stored mail into mailbox 'l/gl/user'.
> info: message expunged from source mailbox upon successful move.
> info: filtering: [Tue, 3 Sep 2019 09:34:20 -0400; 13342 bytes] `Re: tasksel'.
> info: msgid=<20190903133420.gs6...@eeg.ccf.org>: stored mail into mailbox 'l/deb/user'.
> info: message expunged from source mailbox upon successful move.
> info: filtering: [Tue, 3 Sep 2019 06:56:07 -0700 (PDT); 12390 bytes] `[awx-project] Re: AWX on Kubernetes m...'.
> info: msgid=<0715adb7-540f-4cff-9282-e1252c53c...@googlegroups.com>: stored mail into mailbox 'l/ansible/awx'.
> info: message expunged from source mailbox upon successful move.
> info: filtering: [Tue, 3 Sep 2019 07:01:27 -0700 (PDT); 12220 bytes] `[awx-project] Re: AWX on Kubernetes m...'.
> info: msgid=<949b2c17-4254-49f1-83b4-cd54d15aa...@googlegroups.com>: stored mail into mailbox 'l/ansible/awx'.
> info: message expunged from source mailbox upon successful move.
> info: filtering: [Tue, 3 Sep 2019 10:14:58 -0400; 25313 bytes] `Re: [zfs-devel] xattr naming format i...'.
> info: msgid=<cab5c7xphcdfx1w3ya9fyrl-kq8buicr4jbidqrufjj9nogk...@mail.gmail.com>: stored mail into mailbox 'l/z/zdev'.
> info: message expunged from source mailbox upon successful move.
> info: filtering: [Tue, 3 Sep 2019 17:10:22 +0200; 7567 bytes] `Re: [asterisk-users] Playing MP3's in...'.
> info: msgid=<20190903151022.354xpe6ds2vglher@red.localdomain>: stored mail into mailbox 'l/as/users'.
> info: message expunged from source mailbox upon successful move.
> info: filtering: [Wed, 4 Sep 2019 01:04:49 +0900; 14858 bytes] `Re: [Hyperledger Fabric] a primitive ...'.
> info: msgid=<160901d8-b903-9e9a-91ac-267571b0e...@gmx.com>: stored mail into mailbox 'l/hl/fabric'.
> info: message expunged from source mailbox upon successful move.
> info: filtering: [Tue, 3 Sep 2019 09:55:22 -0700 (PDT); 13337 bytes] `[awx-project] Re: AWX on Kubernetes m...'.
> info: msgid=<f9bc4e6a-8445-4b34-927a-35f577ffc...@googlegroups.com>: stored mail into mailbox 'l/ansible/awx'.
> info: message expunged from source mailbox upon successful move.
> 2 ▶︎️ zen@eye 20190912@07:02:30 ~ $
> 
> 
> So Dovecot's sieve-filter spends about 3/4 of a second on each email
> it processes. Watching it is painful given how fast Gnu sieve has
> been for the last few years; it's almost (but not quite) as slow as
> my previous fetchmail per-email download time.
> 
> Attached is a -D (debug) run of sieve-filter on 20 emails, a slightly
> larger batch than the above, which took roughly 15 seconds to run.
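
The per-email figure can be checked against the run shown above: the PS0 timestamp (07:02:23) and the next prompt (07:02:30) bracket the 11 messages. A small Python check, keeping in mind that the prompt timestamps only have one-second resolution:

```python
from datetime import datetime

# Timestamp format used by the shell prompt above, e.g. 20190912@07:02:23
FMT = "%Y%m%d@%H:%M:%S"


def seconds_per_email(start: str, end: str, count: int) -> float:
    """Average wall-clock seconds per email between two prompt timestamps."""
    elapsed = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return elapsed.total_seconds() / count


if __name__ == "__main__":
    # 11 emails between the PS0 timestamp and the next prompt
    print(round(seconds_per_email("20190912@07:02:23", "20190912@07:02:30", 11), 2))  # 0.64
    # Attached -D run: 20 emails in roughly 15 seconds
    print(15 / 20)  # 0.75
```

Across the two runs that is between about 0.6 and 0.75 seconds per email.
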
> 
> Any help appreciated...


On another test run of ~600 emails, sieve-filter consistently ran at
~100% of one CPU for about 4 minutes to process them. That suggests
that, although this looks like it should be a batch process,
sieve-filter is perhaps reloading the rules for every single email it
processes, even though I gave it a whole mbox rather than a single
email.
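
For scale, ~4 minutes for ~600 emails is roughly 0.4 seconds per email at ~100% CPU. The suspected difference can be illustrated with a hypothetical Python sketch: `compile_rules` stands in for parsing a sieve script (here just one regex per non-blank line), and `CountingCompiler` records how often the rules get loaded. This only models the hypothesis in this message; it does not show what sieve-filter actually does internally.

```python
import re
from typing import Callable, List

Rules = List[re.Pattern]


def compile_rules(script: str) -> Rules:
    """Hypothetical stand-in for parsing a sieve script: one regex per non-blank line."""
    return [re.compile(line.strip()) for line in script.splitlines() if line.strip()]


class CountingCompiler:
    """Wraps compile_rules and counts how many times the rules are (re)loaded."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, script: str) -> Rules:
        self.calls += 1
        return compile_rules(script)


def matches(rules: Rules, message: str) -> bool:
    return any(rule.search(message) for rule in rules)


def filter_reloading(messages: List[str], script: str,
                     compile_fn: Callable[[str], Rules]) -> List[bool]:
    # Suspected slow path: the script is re-parsed for every message.
    return [matches(compile_fn(script), m) for m in messages]


def filter_batch(messages: List[str], script: str,
                 compile_fn: Callable[[str], Rules]) -> List[bool]:
    # Batch path: parse once, reuse the compiled rules for the whole mbox.
    rules = compile_fn(script)
    return [matches(rules, m) for m in messages]


if __name__ == "__main__":
    script = "zfs-devel\nawx-project\n"
    messages = ["List-Id: <zfs-devel.example.org>", "Subject: tasksel",
                "List-Id: <awx-project.example.org>"]
    slow, fast = CountingCompiler(), CountingCompiler()
    assert filter_reloading(messages, script, slow) == filter_batch(messages, script, fast)
    print(slow.calls, fast.calls)  # 3 1
```

Both shapes give the same filtering result; only the number of rule loads differs (one per message versus one per run). Whether sieve-filter actually behaves like the first function is the open question here.
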

Can sieve-filter batch process a whole mbox (the way I expected it
to) without reloading the sieve rules for every email?
