Hi All,
I have finally managed to import some of the mails from our current user
base to test out dbmail. I am doing the tests with 2.0.3 and 2.1.3 but have
noticed severe performance problems with 2.1.3. The IMAP client I used in
the test is Outlook Express.
Normal message insertion and single-message retrieval do not show any
problems with either version. However, when I try to download all the
headers of a folder, top reports memory usage like this:
  PID USERNAME PRI NICE  SIZE   RES STATE  TIME   WCPU    CPU COMMAND
76199 nobody    60    0  561M  478M RUN    3:22 84.86% 84.86% dbmail-imapd
Memory usage usually goes even higher, up to around 800MB; then the
growth suddenly stops and the process hangs (but the memory is not
released until I kill it manually!) because it reports that it has caught
a segfault. The error message looks like 'GMime .... assert(string !=
NULL)'.
From maillog, I see these 3 queries get logged:
1) SELECT pm.rfcsize FROM dbmail_physmessage pm, dbmail_messages msg WHERE
pm.id = msg.physmessage_id AND msg.message_idnr = '358' AND msg.status< '2'
AND msg.mailbox_idnr = '1'
2) SELECT physmessage_id FROM dbmail_messages WHERE message_idnr = '358'
3) SELECT messageblk FROM dbmail_messageblks WHERE physmessage_id = '358'
ORDER BY messageblk_idnr
The first two queries were fine; the only problem is with the third query:
it looked to me as if dbmail was waiting for MySQL to return the result
set, but when I ran EXPLAIN on the query in MySQL everything appeared sane,
i.e. the problem should be in dbmail. When I switched back to the
dbmail-imapd of 2.0.3, everything worked again without problems. Trace
level is already set to 10.
From the above observations, I believe it is gmime (or the code that deals
with gmime) in 2.1.3 that causes the problem. I have not consulted the
source tree yet, but I think dbmail is somehow reading the contents of the
whole folder into memory without releasing it after each FETCH command.
So, finally, my questions:
1) why was it necessary to change from the internal mime parser to gmime?
does the internal mime parser read everything into memory as well?
2) is this memory hogging a normal behavior (i.e. the memory is supposed to
be free'd when the sync finishes) or something unexpected?
3) why is 'full MIME parsing' ever needed when IMAP clients only expect
to know certain meta info such as subject, sender, rfclines, and so on?
Why aren't only the wanted fields extracted from a message, instead of
reading the whole big message into memory and passing it to a MIME
parser? Can I consider the additional workload of the MIME parser redundant?
4) after all, if message parsing (or just extracting some meta data from
the headers?) is really needed for each message, why isn't the parsing done
when the message is inserted? The headers could then be saved in a more
compact format in the database, and no extra parsing would be needed when
the message is retrieved.
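
To make questions 3) and 4) more concrete, here is a rough sketch of the
idea in Python. It is not dbmail code and says nothing about how dbmail or
gmime actually work. It assumes the message arrives as an ordered sequence
of blocks (as the third query above would return them), that headers end at
the first CRLF blank line, and that only a few header fields are wanted.
The function names, the sample address and the sample blocks are all made
up for illustration.

```python
from email.parser import BytesHeaderParser

# Assumption: blocks use CRLF line endings, so headers end at the first CRLF CRLF.
HEADER_END = b"\r\n\r\n"

def read_header_bytes(blocks):
    """Consume blocks only until the blank line that ends the header section."""
    buf = b""
    for block in blocks:
        buf += block
        end = buf.find(HEADER_END)
        if end != -1:
            # Stop here: the remaining (body) blocks are never requested.
            return buf[:end + len(HEADER_END)]
    return buf  # no blank line found: the message has headers only

def extract_fields(blocks, wanted=("Subject", "From", "Date")):
    """Parse just the header section and return the requested fields."""
    headers = BytesHeaderParser().parsebytes(read_header_bytes(blocks))
    return {name: headers.get(name) for name in wanted}

def demo_blocks():
    """Hypothetical stand-in for rows ordered by messageblk_idnr."""
    yield b"From: mc@example.org\r\nSubject: dbmail test\r\n"
    yield b"Date: Sat, 1 Jan 2005 00:00:00 +0000\r\n\r\nsmall body"
    # Reaching this point would mean the body was fetched unnecessarily.
    raise AssertionError("body blocks should never be requested")

fields = extract_fields(demo_blocks())
print(fields)
```

The same dictionary could just as well be computed once at insertion time
and stored next to the message, which is what question 4) is getting at.
Whether that fits the dbmail schema is exactly what I am asking.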
Just a side note: only mail folders containing messages with a 'big and
perhaps complex MIME structure' (i.e. those generated by Outlook, Outlook
Express, ...) cause the mentioned problem. Folders with 'mailing list
style' plain-text mails with a simple MIME structure are unaffected.
Any ideas?
P.S.: sorry for asking so many questions at once :-) hopefully they are
not too stupid to be asked here...
Cheers,
mc.