Date: Fri, 21 Feb 2003 12:30:21 -0500 (EST)
From: David A Powicki <[EMAIL PROTECTED]>

[...] This computer seems happy and fast in all respects, except for mailbox creations (4 seconds) and deletions (about 2 seconds) and ACL updates (2 seconds). A truss of an imapd that creates a mailbox and then deletes it reveals 75 iterations of this:
Are these the only system calls that are getting iterated?

If you were seeing:

    fcntl(5, F_SETLKW, 0xFFBEE870) = 0
    fstat(5, 0xFFBEE998) = 0
    stat("/var/cyrus/imap/mailboxes.db", 0xFFBED3D0) = 0
    fcntl(5, F_SETLKW, 0xFFBED448) = 0
    open(...)
    dup(...)

I'd diagnose it as follows: this iteration is unusual. What is happening is that the process is getting an exclusive lock on mailboxes.db and then making sure it has the latest copy of the file. (It compares the inode of the file descriptor it has locked with the inode of the "mailboxes.db" file.) If it is iterating these system calls, it's discovering that some other process has replaced mailboxes.db.

The next question is: what's actually taking the time? Use "truss -D" instead of plain truss. This will give you times on the fsync()s, which may be taking substantial fractions of a second.

We've noticed a bug with some Solaris setups where fsync() times gradually climb until they're intolerable. Remounting the filesystem without logging and then with logging again seems to clear this bogus behavior. In fact, on our frontend systems (which are the ones that are susceptible to this) we have a cron job that runs the following once a day:

---
#!/bin/sh
/usr/sbin/mount -o remount,noatime /
/usr/sbin/mount -o remount,noatime,logging /
---

Our backend systems use vxfs and don't seem to suffer from this problem. An fsync() should take a fraction of a second, not multiple seconds.

Another possibility (we haven't tested this in production) is to change "use_osync" from 0 to 1 in cyrusdb_skiplist.c. Some benchmarks I've done show this to yield better performance on Solaris (but not on Linux).

Larry
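For readers unfamiliar with the lock-then-verify loop described above, here is a minimal C sketch of it, assuming POSIX fcntl() record locks. It is not the Cyrus skiplist code, and `lock_latest` is a hypothetical name chosen for illustration. It takes an exclusive lock, compares the device and inode of the locked descriptor with those of the path, and when another process has renamed a new file into place it reopens the path and retries, roughly the fcntl/fstat/stat/open/dup sequence seen in the truss output.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper (not the Cyrus API): take an exclusive fcntl() lock
 * on *fdp and make sure *fdp refers to the file currently named by path.
 * If another process has replaced the file (e.g. by rename), the inode
 * numbers differ, so we reopen the path onto the same descriptor number
 * and try again.  Returns 0 on success, -1 on error (errno set). */
int lock_latest(int *fdp, const char *path)
{
    for (;;) {
        struct flock fl = {0};
        struct stat fd_st, path_st;

        fl.l_type = F_WRLCK;      /* exclusive (write) lock */
        fl.l_whence = SEEK_SET;
        fl.l_start = 0;
        fl.l_len = 0;             /* 0 = lock the whole file */
        if (fcntl(*fdp, F_SETLKW, &fl) == -1) {
            if (errno == EINTR)
                continue;         /* interrupted by a signal: retry */
            return -1;
        }

        if (fstat(*fdp, &fd_st) == -1 || stat(path, &path_st) == -1) {
            int saved = errno;
            fl.l_type = F_UNLCK;  /* don't leave the lock held on failure */
            fcntl(*fdp, F_SETLK, &fl);
            errno = saved;
            return -1;
        }

        /* Same device and inode: we hold the lock on the latest copy. */
        if (fd_st.st_dev == path_st.st_dev && fd_st.st_ino == path_st.st_ino)
            return 0;

        /* Stale descriptor: release the lock, open the new file and move it
         * onto *fdp (dup2 closes the old one), then loop to lock it. */
        fl.l_type = F_UNLCK;
        fcntl(*fdp, F_SETLK, &fl);
        int newfd = open(path, O_RDWR);
        if (newfd == -1)
            return -1;
        if (dup2(newfd, *fdp) == -1) {
            int saved = errno;
            close(newfd);
            errno = saved;
            return -1;
        }
        close(newfd);
    }
}
```

Each pass through the loop beyond the first means the file was replaced between the open and the lock, so a process that keeps looping is seeing other writers rewrite the database repeatedly.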
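To measure fsync() latency on a suspect filesystem independently of truss, a small timing harness can help. This is a hypothetical micro-benchmark, not part of Cyrus: `sync_write_ms` times 4 KB writes made durable either with write() followed by fsync(), or through a descriptor opened with O_SYNC. Treating O_SYNC as what `use_osync` selects in cyrusdb_skiplist.c is an assumption here; check the source before relying on that mapping.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical micro-benchmark: average milliseconds per durable 4 KB
 * write to path.  use_osync == 0: pwrite() then fsync();
 * use_osync != 0: pwrite() on a descriptor opened with O_SYNC
 * (assumed analogue of the skiplist "use_osync" switch).
 * Returns a negative value on error. */
double sync_write_ms(const char *path, int use_osync, int iterations)
{
    char buf[4096];
    struct timespec t0, t1;

    if (iterations <= 0)
        return -1.0;

    int flags = O_WRONLY | O_CREAT | O_TRUNC | (use_osync ? O_SYNC : 0);
    int fd = open(path, flags, 0600);
    if (fd == -1)
        return -1.0;
    memset(buf, 'x', sizeof buf);

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < iterations; i++) {
        /* Rewrite the same block each time so only sync cost grows. */
        if (pwrite(fd, buf, sizeof buf, 0) != (ssize_t)sizeof buf) {
            close(fd);
            return -1.0;
        }
        if (!use_osync && fsync(fd) == -1) {
            close(fd);
            return -1.0;
        }
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    close(fd);

    double ms = (double)(t1.tv_sec - t0.tv_sec) * 1e3
              + (double)(t1.tv_nsec - t0.tv_nsec) / 1e6;
    return ms / iterations;
}
```

Calling it with, say, 100 iterations on the affected filesystem, once with `use_osync` set to 0 and once with 1, gives per-write figures to compare against the "fraction of a second" expectation; repeating the run right after the remount cron job shows whether the remount brings the numbers back down.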