On Thu, Sep 12, 2019 at 07:55:23AM +1000, Zenaan Harkness wrote:
> Why is Gnu sieve so extremely fast at batch processing an mbox file,
> while Dovecot's sieve-filter is an order of magnitude slower?
>
> Sequence:
>
> - mpop or getmail to pipeline download emails into a temp mbox file
> - filter that file
>
> Gnu sieve just flies through a local mbox file, saving emails to
> other local mbox files.
>
> Gnu sieve rejects too many emails with "malformed" errors, so after a
> few years I bit the bullet and upgraded to Dovecot's sieve-filter.
>
> Dovecot's sieve-filter, at present, is an order of magnitude slower.
>
> Here's my filter command (one line):
>
> /usr/bin/sieve-filter -veW -c $HOME/etc/email/sieve-dovecot-config.conf -o
> mail_location=mbox:~/mail:INBOX=~/mail/Inbox:INDEX=:UTF-8:VOLATILEDIR=/tmp/dovecot-volatile/%2.256Nu/%u:SUBSCRIPTIONS=dovecot_subscriptions
> ~/etc/email/sieve.rc email-incoming-unsorted
>
> The sieve script is fine now that I have the correct "require"
> clauses (hint: "capability strings").
>
> File ~/etc/email/sieve-dovecot-config.conf:
>
> protocols = pop
> lda_mailbox_autocreate = yes
> lda_mailbox_autosubscribe = yes
> mail_fsync = never
>
> There's no re-sending of emails into my local Postfix SMTP server; I
> checked the system logs and confirmed this (journalctl -f).
>
> I suspect that Gnu sieve was directly writing each email to the
> appropriate sieve-determined mbox file (perhaps with only a sync at
> the end of a single batch process, which is what I've attempted to
> achieve above with sieve-filter), and that sieve-filter is instead
> passing each email through some (Dovecot) LDA?
>
> Here's the output for a sieve-filter batch processing of 11 emails:
>
> $ /usr/bin/sieve-filter -veW -c /home/zen/etc/email/sieve-dovecot-config.conf
> -o
> mail_location=mbox:/home/zen/mail:INBOX=/home/zen/mail/Inbox:INDEX=:UTF-8:VOLATILEDIR=/tmp/dovecot-volatile/%2.256Nu/%u:SUBSCRIPTIONS=dovecot_subscriptions
> /home/zen/etc/email/sieve.rc email-incoming-unsorted
> # PS0 Timestamp: 20190912@07:02:23
> info: filtering: [Tue, 3 Sep 2019 05:17:16 -0500; 10240 bytes] `Re: VentureBeat: The death of disk? H...'.
> info: msgid=<CAMjeLr91T9R7APsuxQVuM3WbqDsxAfwn4=oydedx4fmcord...@mail.gmail.com>: stored mail into mailbox 'l/cp/cp'.
> info: message expunged from source mailbox upon successful move.
> info: filtering: [Tue, 3 Sep 2019 07:29:53 -0400; 12968 bytes] `[zfs-devel] xattr naming format in Zo...'.
> info: msgid=<15675101930.d5ba2e.12...@composer.zfsonlinux.topicbox.com>: stored mail into mailbox 'l/z/zdev'.
> info: message expunged from source mailbox upon successful move.
> info: filtering: [Tue, 03 Sep 2019 15:29:09 +0300; 20461 bytes] `Re: [zfs-devel] xattr naming format i...'.
> info: msgid=<23955051567513...@sas1-02732547ccc0.qloud-c.yandex.net>: stored mail into mailbox 'l/z/zdev'.
> info: message expunged from source mailbox upon successful move.
> info: filtering: [Tue, 3 Sep 2019 18:20:42 +0530; 18065 bytes] `Re: [Gluster-users] Issues with Geo-r...'.
> info: msgid=<CADmkyZMxrfOANrAP+_URAHJcMqCqh=igdajtszkfq5pczsu...@mail.gmail.com>: stored mail into mailbox 'l/gl/user'.
> info: message expunged from source mailbox upon successful move.
> info: filtering: [Tue, 3 Sep 2019 09:34:20 -0400; 13342 bytes] `Re: tasksel'.
> info: msgid=<20190903133420.gs6...@eeg.ccf.org>: stored mail into mailbox 'l/deb/user'.
> info: message expunged from source mailbox upon successful move.
> info: filtering: [Tue, 3 Sep 2019 06:56:07 -0700 (PDT); 12390 bytes] `[awx-project] Re: AWX on Kubernetes m...'.
> info: msgid=<0715adb7-540f-4cff-9282-e1252c53c...@googlegroups.com>: stored mail into mailbox 'l/ansible/awx'.
> info: message expunged from source mailbox upon successful move.
> info: filtering: [Tue, 3 Sep 2019 07:01:27 -0700 (PDT); 12220 bytes] `[awx-project] Re: AWX on Kubernetes m...'.
> info: msgid=<949b2c17-4254-49f1-83b4-cd54d15aa...@googlegroups.com>: stored mail into mailbox 'l/ansible/awx'.
> info: message expunged from source mailbox upon successful move.
> info: filtering: [Tue, 3 Sep 2019 10:14:58 -0400; 25313 bytes] `Re: [zfs-devel] xattr naming format i...'.
> info: msgid=<cab5c7xphcdfx1w3ya9fyrl-kq8buicr4jbidqrufjj9nogk...@mail.gmail.com>: stored mail into mailbox 'l/z/zdev'.
> info: message expunged from source mailbox upon successful move.
> info: filtering: [Tue, 3 Sep 2019 17:10:22 +0200; 7567 bytes] `Re: [asterisk-users] Playing MP3's in...'.
> info: msgid=<20190903151022.354xpe6ds2vglher@red.localdomain>: stored mail into mailbox 'l/as/users'.
> info: message expunged from source mailbox upon successful move.
> info: filtering: [Wed, 4 Sep 2019 01:04:49 +0900; 14858 bytes] `Re: [Hyperledger Fabric] a primitive ...'.
> info: msgid=<160901d8-b903-9e9a-91ac-267571b0e...@gmx.com>: stored mail into mailbox 'l/hl/fabric'.
> info: message expunged from source mailbox upon successful move.
> info: filtering: [Tue, 3 Sep 2019 09:55:22 -0700 (PDT); 13337 bytes] `[awx-project] Re: AWX on Kubernetes m...'.
> info: msgid=<f9bc4e6a-8445-4b34-927a-35f577ffc...@googlegroups.com>: stored mail into mailbox 'l/ansible/awx'.
> info: message expunged from source mailbox upon successful move.
> 2 ▶︎️ zen@eye 20190912@07:02:30 ~ $
>
> So about 3/4 of a second is spent by Dovecot's sieve-filter on each
> email that it processes. Watching it is painful given how fast Gnu
> sieve has been for the last few years; it's almost (but not quite)
> as slow as my previous fetchmail per-email download time.
>
> Attached is a -D debug run of sieve-filter on 20 emails, slightly
> longer than the above; it took roughly 15 seconds to run.
>
> Any help appreciated...
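The log above follows a fixed three-line pattern per message: a "filtering" line, a "stored mail into mailbox" line and an "expunged" line. As a minimal sketch for checking a longer run, the Python below counts those lines and tallies the destination mailboxes. It assumes the sieve-filter output has been saved unwrapped, one entry per line (not the mail-quoted copy). `summarize_sieve_log` is a hypothetical helper, not part of Dovecot, and the message IDs in the sample are elided.

```python
import re
from collections import Counter

# Line patterns taken from the sieve-filter -v output quoted above.
FILTERING_RE = re.compile(r"^info: filtering: ", re.MULTILINE)
STORED_RE = re.compile(r"stored mail into mailbox '([^']+)'")
EXPUNGED_RE = re.compile(r"^info: message expunged from source mailbox", re.MULTILINE)


def summarize_sieve_log(log_text: str) -> dict:
    """Count filtered/stored/expunged messages and tally destination mailboxes.

    Hypothetical helper: assumes one unwrapped log entry per line.
    """
    destinations = Counter(STORED_RE.findall(log_text))
    return {
        "filtered": len(FILTERING_RE.findall(log_text)),
        "stored": sum(destinations.values()),
        "expunged": len(EXPUNGED_RE.findall(log_text)),
        "destinations": destinations,
    }


# Excerpt of three messages from the run above (message IDs elided).
SAMPLE = """\
info: filtering: [Tue, 3 Sep 2019 07:29:53 -0400; 12968 bytes] `[zfs-devel] xattr naming format in Zo...'.
info: msgid=<elided>: stored mail into mailbox 'l/z/zdev'.
info: message expunged from source mailbox upon successful move.
info: filtering: [Tue, 3 Sep 2019 06:56:07 -0700 (PDT); 12390 bytes] `[awx-project] Re: AWX on Kubernetes m...'.
info: msgid=<elided>: stored mail into mailbox 'l/ansible/awx'.
info: message expunged from source mailbox upon successful move.
info: filtering: [Tue, 3 Sep 2019 10:14:58 -0400; 25313 bytes] `Re: [zfs-devel] xattr naming format i...'.
info: msgid=<elided>: stored mail into mailbox 'l/z/zdev'.
info: message expunged from source mailbox upon successful move.
"""

summary = summarize_sieve_log(SAMPLE)
print(summary["filtered"], summary["stored"], summary["expunged"])  # 3 3 3
print(summary["destinations"].most_common())  # [('l/z/zdev', 2), ('l/ansible/awx', 1)]
```

If the three counts differ on a real run, some messages were filtered without being stored or expunged.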

On another test run of ~600 emails, sieve-filter consistently ran at ~100% of one CPU for about 4 minutes. That makes me suspect that, although this looks like it should be a batch process, sieve-filter may be reloading the rules for every single email it processes, even though I gave it a whole mbox rather than a single email.

Can sieve-filter batch process a whole mbox, the way I expected it to work, without reloading the sieve rules for every email?
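The per-message cost can be worked out from the figures in this thread. The PS0 and prompt timestamps bracket the 11-message run: 7 s at one-second resolution, so really somewhere between about 6 and 8 s. The -D run covered 20 emails in roughly 15 s, and the larger run about 600 emails in about 4 minutes. Here is a short Python sketch of that arithmetic. `seconds_per_message` is an illustrative name, and the 15 s and 4 min inputs are the rough estimates given above.

```python
from datetime import datetime


def seconds_per_message(elapsed_s: float, messages: int) -> float:
    """Average wall-clock seconds spent per filtered message."""
    return elapsed_s / messages


# Run 1: the PS0 timestamp and the next prompt bracket the 11-message run.
FMT = "%Y%m%d@%H:%M:%S"
start = datetime.strptime("20190912@07:02:23", FMT)
end = datetime.strptime("20190912@07:02:30", FMT)
run1 = (end - start).total_seconds()  # 7.0 s, at one-second resolution

# (elapsed seconds, message count) for each run described in the thread.
runs = {
    "11 emails (timestamps)": (run1, 11),
    "20 emails (-D run, ~15 s)": (15.0, 20),
    "~600 emails (~4 min)": (4 * 60.0, 600),
}

for label, (elapsed, count) in runs.items():
    print(f"{label}: {seconds_per_message(elapsed, count):.2f} s/email")
```

That works out to about 0.64, 0.75 and 0.40 s per message. These are wall-clock averages only. On their own they don't show whether the time goes into reloading the script, writing mailboxes, or something else.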