Re: cyrus murder, mupdate sucking up CPU

Patrick Radtke Tue, 07 Mar 2006 07:42:19 -0800

We have the same/similar problems with mupdate on RHEL4.

Our problem usually shows up when we are creating new users or if users are creating new mailboxes. The mailbox creation may hang or go extremely slowly (and eventually start hanging). This seems to be linked to when a frontend restarts and is synching its mailbox list.

Mupdate uses 99% of CPU while apparently doing nothing. If we attach with strace -f -p, the process does go idle, but it also stops doing anything at all (nothing is logged to the log files from that point on).
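
Since attaching strace seems to change the behaviour, one way to see which mupdate thread is burning CPU without attaching to it is to sample the per-thread counters under /proc. This is only a minimal sketch, assuming a Linux procfs; the function names are made up for illustration and are not part of Cyrus or any existing tool:

```python
import os
import time


def thread_cpu_ticks(pid):
    """Return {tid: utime + stime in clock ticks} for every thread of pid."""
    ticks = {}
    task_dir = f"/proc/{pid}/task"
    for tid in os.listdir(task_dir):
        try:
            with open(f"{task_dir}/{tid}/stat") as f:
                stat = f.read()
        except OSError:
            continue  # thread exited between listdir() and open()
        # The comm field sits in parentheses and may contain spaces, so
        # split after the last ')'. What remains starts at field 3 (state).
        fields = stat.rsplit(")", 1)[1].split()
        utime, stime = int(fields[11]), int(fields[12])  # stat fields 14, 15
        ticks[int(tid)] = utime + stime
    return ticks


def busiest_threads(pid, interval=1.0):
    """Sample twice; return [(tid, cpu_percent)] sorted busiest first."""
    hz = os.sysconf("SC_CLK_TCK")  # clock ticks per second
    before = thread_cpu_ticks(pid)
    time.sleep(interval)
    after = thread_cpu_ticks(pid)
    usage = []
    for tid, total in after.items():
        # Threads that appeared during the interval count as zero.
        delta = total - before.get(tid, total)
        usage.append((tid, 100.0 * delta / hz / interval))
    return sorted(usage, key=lambda item: item[1], reverse=True)
```

Calling busiest_threads() with the PID of mupdate returns thread IDs that should correspond to the [pid NNNN] values strace -f prints, so a spinning thread can be identified first and traced afterwards.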

If we restart the murder master, then all our frontends (10) and backends (14) reconnect and the murder master starts dropping connections, and the frontends connect again and then get disconnected (and so on). We're still investigating this one. The worker thread count keeps increasing as the frontends keep reconnecting. It seems our only way to restart the murder master is by using iptables to block connections from the backends and then slowly re-allow connections once the frontends have re-synched. It appears that frontends re-synching and backends creating mailboxes at the same time do not get along in our setup.
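
Roughly, the iptables workaround can be planned like this. It is only a sketch: the backend addresses are placeholders from the documentation range, port 3905 is assumed because it is the registered mupdate port, the batch size is arbitrary, and the helper names are invented for illustration. The script only builds the command lines, it does not run them:

```python
MUPDATE_PORT = 3905  # assumption: mupdate listens on its registered port


def rule(action, ip, port=MUPDATE_PORT):
    """Build one iptables command; action is '-I' (insert) or '-D' (delete)."""
    return f"iptables {action} INPUT -p tcp -s {ip} --dport {port} -j REJECT"


def block_rules(backends, port=MUPDATE_PORT):
    """Commands to run on the murder master before restarting it."""
    return [rule("-I", ip, port) for ip in backends]


def unblock_batches(backends, batch_size=2, port=MUPDATE_PORT):
    """Commands that re-allow the backends a few at a time."""
    batches = []
    for start in range(0, len(backends), batch_size):
        batch = backends[start:start + batch_size]
        batches.append([rule("-D", ip, port) for ip in batch])
    return batches


if __name__ == "__main__":
    backends = ["192.0.2.11", "192.0.2.12", "192.0.2.13"]  # placeholders
    print("# before restarting the master:")
    print("\n".join(block_rules(backends)))
    for n, batch in enumerate(unblock_batches(backends), start=1):
        print(f"# batch {n}, once the frontends have re-synched:")
        print("\n".join(batch))
```

The frontends are deliberately left out of the block list so they can re-synch first; each batch of delete commands is meant to be applied by hand once the master looks stable again.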



-Patrick


On Mar 3, 2006, at 2:53 PM, Aleksandar Milivojevic wrote:

I asked about this problem earlier while trying out version 2.3.1. I've just compiled 2.3.3 (Simon's SRPM package) and am still having the same problem. This is the show stopper for me for upgrading from 2.2 to 2.3.
The problem is that the mupdate process sucks up all the CPU cycles it can get.

Now for the weird stuff.
Running strace -p 3990 (3990 being the PID of the mupdate process) just shows it waiting in the accept system call.
However, running strace -f -p 3990 showed this:

[pid  3995] clock_gettime(CLOCK_REALTIME,  <unfinished ...>
[pid  3998] futex(0x8122134, FUTEX_WAKE, 1 <unfinished ...>
[pid  3995] <... clock_gettime resumed> {1141412737, 901972000}) = 0
[pid  3994] <... futex resumed> )       = 0
[pid  3998] <... futex resumed> )       = 1
[pid  3995] futex(0x8119fe0, FUTEX_WAKE, 1 <unfinished ...>
[pid  3994] futex(0x8122134, FUTEX_WAKE, 1 <unfinished ...>
[pid  3998] gettimeofday( <unfinished ...>
[pid  3995] <... futex resumed> )       = 0
[pid  3994] <... futex resumed> )       = 0
[pid  3998] <... gettimeofday resumed> {1141412737, 902155}, NULL) = 0
[pid  3995] futex(0x8119fe4, FUTEX_WAIT, -106641967, {59, 994760000} <unfinished ...>
[pid  3994] time( <unfinished ...>
[pid  3998] clock_gettime(CLOCK_REALTIME,  <unfinished ...>
[pid  3995] <... futex resumed> )       = -1 EAGAIN (Resource temporarily unavailable)
[pid  3994] <... time resumed> NULL)    = 1141412737
[pid  3998] <... clock_gettime resumed> {1141412737, 902307000}) = 0
[pid  3995] futex(0x8119fe0, FUTEX_WAIT, 2, NULL <unfinished ...>
[pid  3994] select(7, [6], NULL, NULL, {0, 0} <unfinished ...>
[pid  3992] <... clock_gettime resumed> {1141412737, 903913000}) = 0
Now the strange thing: after I exit strace, mupdate starts to behave and goes idle. Attaching to it again with strace still shows the same output, but it is consuming almost no CPU cycles. However, it is still huge, around 170MB.
Even stranger, if I restart it (stop Cyrus, start it again), the new mupdate process also seems to work OK!? Reboot the system, and I get the same problem again.
Could it be that I'm hitting a bug somewhere else in the system (like the kernel)? Is anybody else running Cyrus 2.3.x in a murder configuration on CentOS4 or RHEL4 (update 2)?


----
Cyrus Home Page: http://asg.web.cmu.edu/cyrus
Cyrus Wiki/FAQ: http://cyruswiki.andrew.cmu.edu
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html

