Hi All,

I have finally managed to import some mail from our current user base to test dbmail. I am testing with 2.0.3 and 2.1.3, and have noticed severe performance problems with 2.1.3. The IMAP client used in the tests is Outlook Express.

Normal message insertion and single-message retrieval show no problems with either version. However, when I try to download all headers of a folder, top reports memory usage like this:

  PID USERNAME PRI NICE  SIZE   RES STATE  TIME   WCPU    CPU COMMAND
76199 nobody    60    0  561M  478M RUN    3:22 84.86% 84.86% dbmail-imapd

Memory usage usually climbs even higher, to around 800MB; then the growth suddenly stops and the process hangs, reporting that it has caught a segfault. The memory is not released until the process is killed manually. The error message looks like 'GMime .... assert(string != NULL)'.

From maillog, I see these 3 queries get logged:
1) SELECT pm.rfcsize FROM dbmail_physmessage pm, dbmail_messages msg WHERE pm.id = msg.physmessage_id AND msg.message_idnr = '358' AND msg.status< '2' AND msg.mailbox_idnr = '1'
2) SELECT physmessage_id FROM dbmail_messages WHERE message_idnr = '358'
3) SELECT messageblk FROM dbmail_messageblks WHERE physmessage_id = '358' ORDER BY messageblk_idnr

The first two queries are fine; the problem is with the third. It looked as if dbmail was waiting for MySQL to return the result set, but when I ran EXPLAIN on the query in MySQL everything looked sane, so the problem appears to be in dbmail. When I switch back to the dbmail-imapd from 2.0.3, everything works again without problems. Trace level is already set to 10.

From the above observations, I believe it is gmime (or the code that deals with gmime) in 2.1.3 that causes the problem. I have not looked at the source tree yet, but I suspect dbmail is reading the contents of the whole folder into memory without releasing it after each FETCH command. So, finally, my questions:

1) Why was it necessary to change from the internal MIME parser to gmime? Does the internal MIME parser read everything into memory as well?
2) Is this memory hogging normal behavior (i.e. the memory is supposed to be freed when the sync finishes), or something unexpected?
3) Why is 'full MIME parsing' needed at all when IMAP clients only want certain meta info such as subject, sender, rfclines, and so on? Why aren't only the wanted fields extracted, instead of reading the whole big message into memory and passing it to a MIME parser? Can I consider the extra work done by the MIME parser redundant?
4) If message parsing (or just extracting some metadata from the headers) really is needed for each message, why isn't it done when the message is inserted? The headers could then be saved in a more compact format in the database, and no extra parsing would be needed when the message is retrieved.
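
To make questions 3 and 4 more concrete, here is a rough sketch of the idea in Python. This is not dbmail code (dbmail is written in C), and the `header_cache` table, the list of wanted fields, and all function names are made up for illustration. It only shows that the header section can be cut off at the first blank line, parsed once at insertion time, and then served from a small table without touching the message body again.

```python
import sqlite3
from email.parser import BytesHeaderParser

# Hypothetical set of fields a client's header listing needs.
WANTED = ("Subject", "From", "Date")


def header_bytes(raw: bytes) -> bytes:
    """Return only the header section: everything up to the first blank line."""
    ends = [raw.find(sep) + len(sep)
            for sep in (b"\r\n\r\n", b"\n\n") if sep in raw]
    return raw[:min(ends)] if ends else raw


def extract_headers(raw: bytes) -> dict:
    """Parse the header section only; the body is never handed to the parser."""
    msg = BytesHeaderParser().parsebytes(header_bytes(raw))
    return {name: str(msg.get(name, "")) for name in WANTED}


def open_cache() -> sqlite3.Connection:
    """Create an in-memory stand-in for a header table (hypothetical schema)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE header_cache ("
        " message_idnr INTEGER PRIMARY KEY,"
        " subject TEXT, sender TEXT, date TEXT, rfcsize INTEGER)"
    )
    return db


def store_message(db: sqlite3.Connection, message_idnr: int, raw: bytes) -> None:
    """Insertion time: parse the headers once and save the wanted fields."""
    h = extract_headers(raw)
    db.execute(
        "INSERT INTO header_cache VALUES (?, ?, ?, ?, ?)",
        (message_idnr, h["Subject"], h["From"], h["Date"], len(raw)),
    )


def fetch_headers(db: sqlite3.Connection, message_idnr: int):
    """Retrieval time: no parsing, just a lookup by message id."""
    return db.execute(
        "SELECT subject, sender, date, rfcsize FROM header_cache"
        " WHERE message_idnr = ?",
        (message_idnr,),
    ).fetchone()
```

With something along these lines, downloading all headers of a folder would cost one small row per message, regardless of how large or complex the message bodies are. Whether this fits dbmail's actual design is exactly what I am asking.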

Just a side note: only mail folders containing messages with 'big and perhaps complex MIME structure' (i.e. those generated by Outlook, Outlook Express, ...) cause the problem. Folders of 'mailing list style' plain-text mails with simple MIME structure are unaffected.

Any ideas?


P.S.: sorry for asking so many questions at the same time :-) hopefully they are not too stupid to be asked here...


Cheers,
mc.
