Re: Miserable performance of cyrus-imapd 2.3.9 -- seems to be locking issues

2008-02-29 Thread Nik Conwell

On Feb 28, 2008, at 4:38 PM, Jeff Fookson wrote:

> is about 200GB.  There are typically about 200 'imapd'
> processes at a given time and a hugely varying number of 'lmtpds'
> (from about 6 to many hundreds during times of greatest pathology).
> System load is correspondingly in the 2-15 range, but can spike to
> 50-70!

Typically when deadlocks free you get load spikes as work can now  
progress.  It implies one thing was holding the lock for a long time -  
that thing itself probably being impeded by something else.  If there  
was high activity of many things hitting the lock, you wouldn't expect  
to see spikes - the system might even look idle as everything is just  
waiting for the lock.

> waits of  upwards of 1-2 minutes to get a write lock as shown by the
> example below (this is from a trace of an 'lmtpd')
>
> [strace -f -p 9817 -T]
> 9817  fcntl(10, F_SETLKW, {type=F_WRLCK, whence=SEEK_SET, start=0,
> len=0}) = 0 <84.998159>
[...]
> Can anyone suggest what we might do next to debug the problem further?

Good job with the strace.  Now figure out what fd 10 is, either by  
lsof or earlier in the strace output (look for "= 10" and that should  
show what opened it).

Then install lslk and figure out who is holding the lock on that file  
and for how long, etc.  Then look at that process to see what it's  
doing for so long (strace again).
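
For anyone following along, here is a minimal sketch of the first step: pulling the slow lock waits out of saved strace output so you know which pid/fd pairs to chase. It assumes the output was captured with -f -T and that each call sits on a single line (the quoted trace above was wrapped by the mail client). The names slow_lock_waits and fd_target are made up for this example, and the second sample line is invented for contrast. The /proc lookup only works on Linux and only while the process is still alive; lsof tells you the same thing.

```python
import os
import re

# One line of `strace -f -T` output for a blocking lock request, e.g.
#   9817  fcntl(10, F_SETLKW, {type=F_WRLCK, ...}) = 0 <84.998159>
LOCK_CALL = re.compile(
    r"^(?P<pid>\d+)\s+fcntl\((?P<fd>\d+),\s*F_SETLKW,.*\)\s*=\s*-?\d+"
    r".*<(?P<secs>\d+\.\d+)>\s*$"
)


def slow_lock_waits(lines, threshold=1.0):
    """Return (pid, fd, seconds) for every F_SETLKW call that took >= threshold."""
    hits = []
    for line in lines:
        m = LOCK_CALL.match(line.strip())
        if m and float(m.group("secs")) >= threshold:
            hits.append((int(m.group("pid")), int(m.group("fd")), float(m.group("secs"))))
    return hits


def fd_target(pid, fd):
    """Linux only: resolve an open descriptor of a live process to a path via /proc."""
    return os.readlink(f"/proc/{pid}/fd/{fd}")


sample = [
    # the call from the original post, joined back onto one line
    "9817  fcntl(10, F_SETLKW, {type=F_WRLCK, whence=SEEK_SET, start=0, len=0}) = 0 <84.998159>",
    # made-up fast call, for contrast only
    "9817  fcntl(10, F_SETLKW, {type=F_UNLCK, whence=SEEK_SET, start=0, len=0}) = 0 <0.000021>",
]
for pid, fd, secs in slow_lock_waits(sample):
    print(f"pid {pid} waited {secs:.1f}s for a lock on fd {fd}")
```

On the live box you would then call fd_target(pid, fd) for each hit to get the file name, and go looking for the lock holder on that file.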

-nik


Cyrus Home Page: http://cyrusimap.web.cmu.edu/
Cyrus Wiki/FAQ: http://cyrusimap.web.cmu.edu/twiki
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html


Re: Miserable performance of cyrus-imapd 2.3.9 -- seems to be locking issues

2008-02-29 Thread Kenneth Marshall
On Fri, Feb 29, 2008 at 07:16:24AM +0100, Pascal Gienger wrote:
> Jeff Fookson <[EMAIL PROTECTED]> wrote:
> 
> > Databases are all skiplist.
> 
> As a rule of thumb, do not use skiplist for the duplicate delivery 
> suppression database (deliver.db). Even if everybody hates it, use 
> BerkeleyDB, Version 4.4.52 or higher. Give it a quite fair amount of shared 
> memory. And run cyr_expire often to prune that database so that no entry 
> is older than - say - 3 days.
> 
> We have approx 10-15 messages/sec incoming on one node.

I would like to add that we use skiplist for the deliver.db here
with a hardware caching controller for a system with 7500 accounts
and have no performance problems. It is key to run cyr_expire to
keep it pruned. Also, with your setup (software RAID + DRBD) you
would benefit from the in memory nature of the BerkeleyDB format.
That one change may make a significant improvement for your system.

Cheers,
Ken



Re: Miserable performance of cyrus-imapd 2.3.9 -- seems to be locking issues

2008-02-29 Thread Gabor Gombas
On Thu, Feb 28, 2008 at 04:56:18PM -0600, Kenneth Marshall wrote:

> It may be that the software RAID 5 is your problem. Without the
> use of NVRAM for a cache, all of the writes need all 3 disks.
> That will cause quite a bottle-neck.

It's much worse than that. Since metadata updates are almost certainly
smaller than the stripe size, every metadata update will look like this:

- read the full stripe (i.e. read from ALL disks)
- calculate the new parity
- write back the modification & the new parity

That sure as hell will kill your performance. Move at least the metadata
partition to a RAID1 or RAID10 array. With Linux, you can do RAID10 even
with just 3 disks, but you will of course lose 1/2 disk capacity
compared to RAID5.
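
To put rough numbers on the three steps above, here is a back-of-the-envelope sketch. It is a simplified model that follows the description literally (read every member disk, then write the changed chunk and the parity chunk); a real md implementation may read fewer chunks per update, so treat the output as an illustration, not a measurement. The function names are invented for this example.

```python
def raid5_small_write_ios(n_disks):
    """I/Os for one sub-stripe update, following the three steps listed above."""
    reads = n_disks   # read the full stripe: one read per member disk
    writes = 2        # write back the modified chunk plus the new parity chunk
    return reads, writes


def mirror_small_write_ios(copies=2):
    """I/Os for the same update on a mirror: no reads, one write per copy."""
    return 0, copies


layouts = {
    "RAID5 (3 disks)": raid5_small_write_ios(3),
    "RAID1 (2 copies)": mirror_small_write_ios(2),
}
for name, (reads, writes) in layouts.items():
    total = reads + writes
    print(f"{name}: {reads} reads + {writes} writes = {total} I/Os per metadata update")
```

Under this model every small metadata write on the 3-disk RAID5 costs five disk operations, against two on a mirror, and the reads have to finish before the writes can start.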

Gabor

-- 
 -
 MTA SZTAKI Computer and Automation Research Institute
Hungarian Academy of Sciences
 -



Re: Dual & Quad Core processors

2008-02-29 Thread Adam Tauno Williams

On Fri, 2008-02-29 at 08:45 -0600, Sam Egelhof wrote:
> Does Cyrus-imapd take advantage of Dual and\or Quad core processors?

Yes.

>  We are looking at upgrading our server to either 2x Dual core Xeon’s
> or 1 x Quad core Xeon processor. Does Cyrus have the ability to take
> advantage of this?

Yes




Dual & Quad Core processors

2008-02-29 Thread Sam Egelhof
Does Cyrus-imapd take advantage of Dual and\or Quad core processors? We are 
looking at upgrading our server to either 2x Dual core Xeon's or 1 x Quad core 
Xeon processor. Does Cyrus have the ability to take advantage of this?

Thanks,

Sam


Re: Miserable performance of cyrus-imapd 2.3.9 -- seems to be locking issues

2008-02-29 Thread Allen Chen
I just got out of this kind of situation.
If your OS is Linux, can you post /etc/syslog.conf?

Allen

Jeff Fookson wrote:
> Folks-
>
> I am hoping to get some help and guidance as to why our installation of 
> cyrus-imapd 2.3.9
> is unusably slow. Here are the specifics:
>
> The software is running on a 1.6GHz Opteron with 2GB memory supporting a 
> user base of about 400
> users. The average rate of arriving mail is on the order of 1-2 
> messages/sec. The active mailstore
> is about 200GB.  There are typically about 200  'imapd'
> processes at a given time and a hugely varying number of 'lmtpds' (from 
> about 6 to many hundreds during
> times of greatest pathology). System load is correspondingly in the 2-15 
> range, but can spike to 50-70!
>
> Our users complain that the system is extremely sluggish during the day 
> when the system is most busy.
>
> The most obvious thing we observe is that both the lmtpds and the imapds 
> are spending HUGE times waiting
> on locks. Even when the system load is only 1-2, an 'strace' attached to 
> an instance of lmtpd or imapd shows
> waits of  upwards of 1-2 minutes to get a write lock as shown by the 
> example below (this is from a trace of an 'lmtpd')
>
> [strace -f -p 9817 -T]
> 9817  fcntl(10, F_SETLKW, {type=F_WRLCK, whence=SEEK_SET, start=0, 
> len=0}) = 0 <84.998159>
>
> We strongly suspect that these long waits on locks are what is 
> causing the slowness our users are reporting.
>
> We are under the impression that a single instance of cyrus-imapd scales 
> well up to about 1000 users (with about 1MB active
> memory per 'imapd' process),  and so we are baffled as to what might be 
> going on.
>
> A non-standard aspect of our installation which may have something to do 
> with the problem is that we are
> running cyrus on an lvm2 partition that itself is running on top of 
> drbd. Thinking that the remote writes
> to the drbd secondary might be causing delays, we put the primary in 
> stand-alone mode so that the drbd layer
> was not doing any network activity (the drbd link is running at gigabit 
> speed on its own crossover cable to
> the secondary box) and saw no significant change in behavior. Any issues 
> due to locking and the lvm2 layer
> would, of course, still be present even with drbd's activity reduced to 
> just local writes.
>
> Can anyone suggest what we might do next to debug the problem further? 
> Needless to say, our users get
> extremely unhappy when trivial operations in their mail clients take 
> over a minute to complete.
>
> Thank you for any thoughts or advice.
>
> Jeff Fookson
>
>   




Re: Miserable performance of cyrus-imapd 2.3.9 -- seems to be locking issues

2008-02-29 Thread Allen Chen
Can you put a "-" just before /var/log/messages and 
/var/log/cyrus/imapd.log in your /etc/syslog.conf? (just like 
-/var/log/maillog)
and restart syslog: service syslog restart.
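
Background for the suggestion: in the classic sysklogd syslog.conf format, a file target without a leading "-" is synced to disk after every message, which can hurt on a busy mail server. The sketch below lists the targets that would be affected. The sample excerpt is hypothetical (it is not the poster's attached file, and the local6 facility is only a guess), and sync_file_targets is a name invented for this example. Note that it flags anything starting with "/", so device targets such as /dev/console show up too.

```python
def sync_file_targets(conf_text):
    """List file targets that lack the '-' prefix, i.e. are synced on every message."""
    targets = []
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue                      # skip blanks and comments
        parts = line.split(None, 1)       # selector, then action
        if len(parts) == 2 and parts[1].startswith("/"):
            targets.append(parts[1])      # "-/path" targets do not match here
    return targets


# Hypothetical syslog.conf excerpt, for illustration only.
sample = """\
# log anything of level info or higher
*.info;mail.none        /var/log/messages
mail.*                  -/var/log/maillog
local6.*                /var/log/cyrus/imapd.log
"""
for path in sync_file_targets(sample):
    print(f"synced on every write: {path}")
```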


Allen


Jeff Fookson wrote:
> Allen Chen wrote:
>
>> I just got out of this kind of situation.
>> If your OS is Linux, can you post /etc/syslog.conf?
>>
>> Allen
>
>
> Allen-
>
> Yes, the installation is running under CentOS4.4, kernel 2.6.18.8. 
> I've attached our /etc/syslog.conf.
> I am really curious what you found and got out of that makes you 
> suspect syslog involvement.
> Thanks.
>
> Jeff
>




Re: Miserable performance of cyrus-imapd 2.3.9 -- seems to be locking issues

2008-02-29 Thread Simon Matter
> Can you put a "-" just before /var/log/messages and
> /var/log/cyrus/imapd.log in your /etc/syslog.conf? (just like
> -/var/log/maillog)
> and restart syslog: service syslog restart.

Another culprit can be name resolution. At least localhost and the server's
own hostnames should be listed in the hosts file for fast lookups. Postfix
shows how important this is: if you are using postfix on the imap server and
deliver via an LMTP socket, postfix in its default configuration will still
resolve everything using DNS, even localhost. It simply doesn't care about
your resolver configuration. So, for the final-delivery postfix instance, I
always use "disable_dns_lookups = yes", which means use your OS resolver to
look up hosts and not the builtin DNS resolver.
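
A quick way to check the hosts-file half of this is sketched below. It only parses text in the usual /etc/hosts layout (address first, then names) and reports which names are missing. The host names in the sample are placeholders and missing_host_entries is a name made up for this example.

```python
def missing_host_entries(hosts_text, required_names):
    """Return the required names that do not appear in hosts-file text."""
    known = set()
    for raw in hosts_text.splitlines():
        fields = raw.split("#", 1)[0].split()   # drop comments, split on whitespace
        known.update(name.lower() for name in fields[1:])  # fields[0] is the address
    return [name for name in required_names if name.lower() not in known]


# Placeholder content; on a real server read /etc/hosts instead.
sample_hosts = """\
127.0.0.1   localhost localhost.localdomain
192.0.2.10  mail.example.org   # server's FQDN
"""
wanted = ["localhost", "mail.example.org", "mail"]
print("missing from hosts file:", missing_host_entries(sample_hosts, wanted))
```

On the server itself you would pass in the contents of /etc/hosts plus whatever names the MTA and Cyrus use to reach each other.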

Another problem can be handling of groups. If there is any special
configuration I suggest to look at it as well.

Simon



Re: Dual & Quad Core processors

2008-02-29 Thread Rob Mueller

> Does Cyrus-imapd take advantage of Dual and\or Quad core processors? We are
> looking at upgrading our server to either 2x Dual core Xeon's or 1 x Quad core
> Xeon processor. Does Cyrus have the ability to take advantage of this?

Since it uses a multi-process model, yes it does.

However, that's not what you should upgrade your servers for. Cyrus uses very 
little CPU time on any modern processor. Even with tens of thousands of users, 
our servers with 3-year-old 2.4GHz Netburst Xeons never get more than about 30% 
CPU usage (out of 400%, because they're dual-processor, hyper-threaded machines).

Instead cyrus is incredibly IO hungry. You should have as much RAM as possible 
for caching, and a battery backed NVRAM RAID controller to try and improve the 
random write IO that's generated. These will do much more for you than 
upgrading your CPU will.

Rob


Migrate & upgrade 2.3.1->2.3.7: seen status on INBOX

2008-02-29 Thread Steve Huston
We're upgrading our mail server hardware, and at the same time moving
from Fedora to CentOS for the OS.  The version of Cyrus we run currently
is 2.3.1, and the one on CentOS is 2.3.7
(cyrus-imapd-2.3.7-1.1.el5.x86_64.rpm).

I've set up the configuration to match our old system, and verified that
logging in and the like works, and the next step was to test moving the
mails over.  After exporting the mboxlist on the old server and
importing it on the new one, I started with my own mail; copy over
/var/spool/imap/user/huston/ to the new server, then copy
/var/imap/user/h/huston.[seen,sub].  I had to reconstruct the mail
store, and then everything seemed fine - except all the messages in my
inbox were flagged as unread.

I thought the seen state for all mails was in the .seen file, so I'm
confused why it's correct on all the subfolders but flagging all in the
inbox as new.  Any pointers?  A Google of the archives only turned up a
similar question from last year with no response.

Thanks!

-- 
Steve Huston - W2SRH - Unix Sysadmin, Dept. of Astrophysical Sciences
  Princeton University  |ICBM Address: 40.346525   -74.651285
126 Peyton Hall |"On my ship, the Rocinante, wheeling through
  Princeton, NJ   08544 | the galaxies; headed for the heart of Cygnus,
(609) 258-7375  | headlong into mystery."  -Rush, 'Cygnus X-1'



Re: possible sieve/cyrus redirect issue?

2008-02-29 Thread Todd Lyons

On Thu, Feb 28, 2008 at 10:10:46AM -0600, [EMAIL PROTECTED] wrote:

>   Also, as part of the same email message, the log entry below always shows
>   for that user... I don't know what that means
>
>   sendmail[24713]: m1QIIdVB024713: Authentication-Warning: server.jsums.edu:
>   cyrus set sender to [EMAIL PROTECTED] using -f

Add user "cyrus" to the trusted-users file in your sendmail
configuration directory (typically /etc/mail/trusted-users on Linux
boxen).

-- 
Regards...  Todd
we're off on the usual strange tangents.  next will be whether
it is ethical to walk in your neighbor's open house if they're
running ipv6:-).  --Randy Bush
Linux kernel 2.6.22-14-generic   load average: 0.02, 0.03, 0.00



Re: Miserable performance of cyrus-imapd 2.3.9 -- seems to be locking issues

2008-02-29 Thread Kenneth Marshall
Delivery through the lmtpd process should not take long enough
to cause this type of backlog unless there is a performance
bottle-neck, such as the delivery DB format that has been suggested
previously, particularly in such a small system.

Cheers,
Ken
On Thu, Feb 28, 2008 at 04:09:58PM -0600, Paul M Fleming wrote:
> Limit the number of lmtpd daemons to around 10 -- that solved the issue 
> for me.. We let sendmail handle the queuing. It is more than likely a 
> locking issue..
> 
> 
> Michael Bacon wrote:
> > What database format are you using for the mailboxes database?  What kind 
> > of storage is the "metapartition" (usually /var/imap) on?  What kind of 
> > storage are your mail partitions on?
> > 
> > 
> > --On Thursday, February 28, 2008 2:38 PM -0700 Jeff Fookson 
> > <[EMAIL PROTECTED]> wrote:
> > 
> >> [...]
> >>
> >> --
> >> Jeffrey E. Fookson, PhD    Phone: (520) 621 3091
> >> Support Systems Analyst, Principal [EMAIL PROTECTED]
> >> Steward Observatory
> >> University of Arizona
> >>


Re: Miserable performance of cyrus-imapd 2.3.9 -- seems to be locking issues

2008-02-29 Thread Henrique de Moraes Holschuh
On Thu, 28 Feb 2008, Michael Bacon wrote:
> I've never seen drbd used for Cyrus, but it looks like other folks have 
> done it.  The combination of drbd+lvm2+ext3 might put you somewhere 
> unpleasant, but I'll have to let the Linux-heads jump in on that one.

Don't try it with 4k stacks, IMO.  It could blow up badly.  Stacked devices
and filesystems have this nasty tendency to eat up way too much stack :(

And whatever you do, don't do mailspool IO patterns over Linux raid5 with
the raid bitmap updates enabled and ext3.  Performance goes to crap.  I
don't exactly know how to enable or disable these bitmaps, though.  Look at
mdadm's manpage.

> > a linux software RAID 5 (3 SATA disks). On top of the md layer is the
> > drbd device; on top of that is an lvm2 logical volume; on top of that is
> > an ext3 filesystem, mounted
> > as '/var/imap'. The mail is then in /var/imap/mail and the metadata in
> > /var/imap/config (and we also have /var/imap/certs for the ssl stuff, and
> > /var/imap/sieve for sieve scripts).

Do look into that md raid bitmap option, remember that using lvm anywhere in
a chain kills any and all write-barrier support which means a full
sync-cache command to the HD even if it is a nice SCSI one, remember that
drbd is not a lightning bolt either (you do have a direct gigabit ethernet
link in use just for the drbd sync, don't you?), and remember to inform lvm
AND ext3 of the raid stripe size when making the filesystems and lvm
volumes.

Also, the usual mount tricks like noatime should apply.

-- 
  "One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie." -- The Silicon Valley Tarot
  Henrique Holschuh



Re: Miserable performance of cyrus-imapd 2.3.9 -- seems to be locking issues

2008-02-29 Thread Simon Matter
> Michael Bacon wrote:
>
>> What database format are you using for the mailboxes database?  What
>> kind of storage is the "metapartition" (usually /var/imap) on?  What
>> kind of storage are your mail partitions on?
>
> Databases are all skiplist. Our mail partition and the metapartition are

skiplist is good.

> both on the same filesystem, as we intended that both be part of the
> same drbd mirror. That partition is
> a linux software RAID 5 (3 SATA disks). On top of the md layer is the

Software RAID 5 seems fine for data, but I strongly suggest a separate RAID 1
for config.

> drbd device; on top of that is an lvm2 logical volume; on top of that is

I don't think LVM2 is the problem here, I'm using it almost everywhere.
The same with ext3.

I have never used drbd in production, but could it be that it's causing
your problems? I've done some intensive benchmarks with different
solutions like AOE and gnbd and found that it performs quite badly for
certain types of usage.
Couldn't you test by simply mounting the LVM device without the drbd layer
(maybe with an offset where the real filesystem begins)?

What I know for sure is that your server should cope just fine with that
number of connections.

Simon

> an ext3 filesystem, mounted
> as '/var/imap'. The mail is then in /var/imap/mail and the metadata in
> /var/imap/config (and we also have /var/imap/certs for the ssl stuff,
> and /var/imap/sieve for sieve scripts).
>
> Thanks.
>
> Jeff Fookson
>



Cyrus Home Page: http://cyrusimap.web.cmu.edu/
Cyrus Wiki/FA