Ron Johnson <ron.l.john...@cox.net> posted 4a5700df.9060...@cox.net, excerpted below, on Fri, 10 Jul 2009 03:50:39 -0500:
> On 2009-07-09 21:28, Duncan wrote:
>> Ron Johnson <ron.l.john...@cox.net> posted:
>>
>>> I boosted dirty_ratio from 10 to 30 and dirty_background_ratio from 5
>>> to 10.
>>>
>>> Now tasks.nzb.tmp seems to get written 3x faster.
>>>
>>> Thanks!!!
>>
>> Cool! =:^)
>
> The problem now is that the nntp pipes are so fat that, "because" of
> multi-threading, the tasks.nzb.tmp flush starts again only 3-5 seconds
> after the previous flush completed.

OK, let's get some real solid data to work with here:

vm.dirty_background_ratio and vm.dirty_ratio are a % of RAM, which you said is 8 gig.  So every 1% is 80 meg (well, 81.92 meg), the default 5 & 10% are ~410 & ~820 meg, and your new 10 & 30% are ~820 meg & just under 2.5 gig.

FWIW, note that at least in newer kernels (this may be part of the mods made for 2.6.29 or 2.6.30, however) there are also vm.dirty_bytes and vm.dirty_background_bytes.  These parallel the ratio settings above, and default to 0, meaning the ratio settings are used instead.  You may find it more useful to set the bytes rather than the ratios, avoiding the percentage-of-RAM conversion, if you prefer to work in bytes.  Of course, then you're converting MB or GB to bytes in 2^10 (1024) multiplier units instead of 10^3 multiplier units, but it may still be easier.  It's up to you.  I think (but haven't tested) that if you set the bytes values manually, the ratio values should reset to zero, indicating they're not in use, just as the byte values are zero while the ratios are in use.

Trying to push those further is pointless, since you'd be pushing back the syncing of the downloaded data cache as well, and that's what's going to be triggering it... *IF* it's data size triggering it at all, now.  See below.

Do you have any data on the thruput of your disk?  (You can get a reasonably realistic read benchmark with hdparm -t /dev/<whatever>.)  And how does that compare to the thruput on your Internet pipe?  Disk thruput, once one gets past the small on-disk cache, should run 50-100 MB/sec depending on disk speed, tho 3-way striped or higher RAID arrays should push it higher.  Your inet pipe might be fast, but you're one lucky bastard if it's faster than that (we're talking gigabit Ethernet speeds here), so it should be reasonably safe to assume the net /should/ be the bottleneck.

But anyway, taking those estimated disk speeds of 50-100 MB/sec, you're talking ~8-17 seconds to clear that 820 meg write backlog, let alone the 2.5 gig backlog.

HOWEVER, I suspect that you're not hitting those caps now, but rather, the time-based caps: vm.dirty_*_centisecs, where * = writeback and expire.  The defaults there are 499 (5 sec) for writeback, the "background" setting, and 2999 (30 sec) for expire, the full-priority setting.  Since you said 3-5 seconds, I'm guessing you're now hitting the 5-second writeback timer, not that 820 meg background write trigger.

However, you don't want to increase those too much either.  Like the ratio triggers, set too high they'll both take "forever" to clear the backlog when they do trigger, and, since the writeback and expire timers give some indication of your data vulnerability window in the event of a crash (at least with data=ordered, and to a somewhat lesser extent with data=writeback on 2.6.30 and newer kernels), expose you to a larger data-loss window if you do crash.  That said, a writeback of 10 sec doesn't seem unreasonable to me, and is what I have here.
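If it helps to sanity-check the arithmetic, here's a quick back-of-the-envelope sketch in Python, nothing pan-specific: the 8 gig of RAM and the 50-100 MB/sec disk guess are just this thread's numbers, and the *_bytes knobs may simply not exist before 2.6.29.

#!/usr/bin/env python3
# Back-of-the-envelope sketch: convert the vm.dirty_* ratios into byte
# counts for 8 gig of RAM, estimate how long a full flush would take at
# a guessed 50-100 MB/sec, and dump the current knobs from /proc/sys/vm.
# The RAM and disk figures are just the numbers from this thread.

RAM_BYTES = 8 * 1024**3          # 8 GiB, as reported
DISK_MB_PER_SEC = (50, 100)      # rough single-disk estimate

def threshold_bytes(ratio_percent):
    """Dirty data allowed (in bytes) before this trigger fires."""
    return RAM_BYTES * ratio_percent // 100

for name, ratio in (("dirty_background_ratio", 10), ("dirty_ratio", 30)):
    nbytes = threshold_bytes(ratio)
    print("%s = %d%%  ->  %.0f MiB" % (name, ratio, nbytes / 2**20))
    for speed in DISK_MB_PER_SEC:
        print("  time to flush at %d MB/sec: ~%.0f sec"
              % (speed, nbytes / (speed * 10**6)))

# The live values, ratios, bytes and timers alike.  The *_bytes knobs
# only exist on kernels new enough to have them (2.6.29-ish and later).
for knob in ("dirty_ratio", "dirty_background_ratio",
             "dirty_bytes", "dirty_background_bytes",
             "dirty_writeback_centisecs", "dirty_expire_centisecs"):
    try:
        with open("/proc/sys/vm/" + knob) as f:
            print("vm.%s = %s" % (knob, f.read().strip()))
    except IOError:
        print("vm.%s not present on this kernel" % knob)

It only reads /proc and doesn't change anything, so it's safe to run mid-download.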
FWIW, I left the expire at 30 seconds.  But you may wish to push writeback to 15-20s or even 30s, and expire to at least double that, tho I'd suggest not more than a minute.  Of course the balance is up to you, but I expect at least 10s for writeback is a no-brainer in your case, and 15-20 may well be preferred.  You may even try 30, with a minute for the expire.

> This is why I was pushing for on-disk structures instead of having to
> flush memory structures to disk.
>
> If SQLite is bad, then maybe Berkeley DB.  It's got rollbacks, fine
> grained locking.  KLibido, a KDE newsreader, uses it for a similar
> purpose.

FWIW, I used klibido for a while, and loved all the download-graph eye candy.  If I were as strongly into binaries as you apparently are, I'd almost certainly favor it over pan, especially as I'm normally a KDE user.

However, the one thing that was irritating about it (besides no filters at all, back when I was using it, probably a couple years ago now) was its BerkDB usage.  BerkDB has a policy of incompatible minor updates, so every time I updated it, I'd have to recompile klibido.  But it wasn't just that; there were patches required for klibido that were hard to get, at least on Gentoo.  It was rather frustrating.  That's why I ultimately ended up dropping it, as it wasn't worth hassling with the updates for the level of binary downloading I do.  Plus, the pan rewrite was going strong by then, with its automated multi-server support that old-pan (the C version, 0.14.x) lacked, and I had been using pan for years for text, and for binaries before klibido, so it was easy enough to switch back to from klibido as well.

> *Something* so that a 10KB change in tasks does not require a 300MB file
> to be written to disk.

I still believe that's barking up the wrong tree, at least for tasks.nzb (the 3+ gig header file for a single group is a different matter entirely).

Now, something that actually DOES make sense would be what pan already does for the newsgroup state: don't write it back to disk for every single change!  Pan delays the read-message tracking (newsrc) writes, and possibly the header file writes (I'm not sure), until group change.  FWIW, the rewrite used to delay them until pan exited entirely, but I appealed to Charles to return the pre-rewrite behavior of writing on group change, since losing read-message tracking for thousands of messages across tens of groups if pan crashed (or X, or the system; at the time, 2006, I was running an early unstable xorg with composite, which was crashing X on me pretty regularly, naturally taking pan with it) was NOT cool!

Now, with tasks.nzb we probably don't want to wait that long, but pan could EASILY build up changes in memory and actually write out tasks.nzb every minute or so.  To me, that sounds MUCH more sane, and easier to do than switching to a new proprietary tasks format at this point, and losing up to a minute's download on either dialup or broadband shouldn't be a big issue.  (That's why I didn't make it a set number of downloads: tuned for dialup, it'd basically write at every update like we have now, a performance killer for broadband, while tuned for broadband, losing perhaps hours of state on dialup would be VERY frustrating.  Thus, make it write the file at most every minute, or 30 seconds, or whatever, but NOT every single freakin' update!)

Actually, one would hope that'd be a reasonably small patch, likely able to go into K. Haley's git repo version.  If anyone with the coding skills is so motivated, of course...
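To make that concrete, here's a minimal sketch of the throttling logic, in Python rather than pan's actual C++, with made-up names (TaskListSaver, save_tasks) that aren't anything in pan's source: flag the task list dirty on every change, but only rewrite tasks.nzb when at least a minute has passed since the last write, plus a forced write on clean shutdown.

import time

FLUSH_INTERVAL = 60.0    # seconds; "every minute or so", per the above

class TaskListSaver:
    """Throttle tasks.nzb rewrites instead of saving on every update."""

    def __init__(self, save_tasks):
        self._save_tasks = save_tasks    # callable that rewrites tasks.nzb
        self._dirty = False
        self._last_write = 0.0

    def mark_dirty(self):
        """Call on every task-list change (task added, segment done, etc.)."""
        self._dirty = True
        self._maybe_flush()

    def _maybe_flush(self, force=False):
        now = time.monotonic()
        if self._dirty and (force or now - self._last_write >= FLUSH_INTERVAL):
            self._save_tasks()
            self._dirty = False
            self._last_write = now

    def flush_on_exit(self):
        """Always write on clean shutdown, so nothing is lost there."""
        self._maybe_flush(force=True)

The worst case after a crash is then a FLUSH_INTERVAL's worth of re-downloading, which is exactly the dialup-vs-broadband tradeoff argued above.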
>> BTW, I don't know what kernel you are on,
>
> A home-rolled 2.6.28.
>
>> uncovered and stop-gap fixed in kernel 2.6.29, fixed more correctly in
>> 2.6.30, and possibly settle-in fixes to in-development 2.6.31
>
> I'll have to upgrade soon.
> I lost a WHOLE LOT of downloads from an ext4-over-lvm2 mishap.  Must
> have been an unclean (or non-existent) umount on shutdown.

Just be sure to consider the default switch to data=writeback for ext3 on 2.6.30, and act accordingly.  (If you decide to, you can set data=ordered either in the mount options, or in the filesystem superblock itself using tune2fs; there's a sketch at the end of this message for checking what you're actually mounted with.)  I'm not sure the data=writeback fixes that help for normally tiny config files are going to do much good for massive downloads of the type we're discussing here, and it'd be a shame to have ext3 dump on you too, after the update and the resulting switch to data=writeback by default.

But other than that, you may well find the changes in 2.6.30 make all this other tweaking unnecessary.  The improvement has been dramatic, according to the benchmarks, tho the worst case they targeted for improvement was a rather different usage scenario than what you're doing.

Also, I was a naysayer on ext4 before, but with 2.6.31 the change rate has slowed enough that it's looking to be a reasonably mature fs now, and (were I on ext3, not reiserfs) presuming that holds, I'd consider it for myself starting with either the 2.6.31 full release, or by the 2.6.32 full release.  I would, however, use data=ordered on it, just as I do for reiserfs (where it's still the default) and would for ext3.  The exception would be partitions where I'm OK with losing whatever's on them (/tmp, any local cache of Internet data that's trivially redownloadable...); data=writeback is fine for those.

-- 
Duncan - List replies preferred.  No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman
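P.S. Since the data= question above will matter again after the kernel upgrade, here's a small read-only sketch (Python again, nothing pan-specific) that reports which data= journalling mode each ext3/ext4 mount is actually using, and prints, without running them, the fstab tweak or tune2fs command you could use to pin data=ordered.  Double-check the tune2fs option name against tune2fs(8) on your system before actually running it.

#!/usr/bin/env python3
# Read-only check: which data= journalling mode is each ext3/ext4
# filesystem actually mounted with?  Nothing is modified; the tune2fs
# command is only printed as a suggestion.

def ext_mounts():
    with open("/proc/mounts") as f:
        for line in f:
            dev, mountpoint, fstype, opts = line.split()[:4]
            if fstype in ("ext3", "ext4"):
                yield dev, mountpoint, fstype, opts.split(",")

for dev, mnt, fstype, opts in ext_mounts():
    mode = next((o for o in opts if o.startswith("data=")), "data=(default)")
    print("%s on %s (%s): %s" % (dev, mnt, fstype, mode))
    if mode != "data=ordered":
        print("  fstab:  add data=ordered to the options for %s" % mnt)
        print("  or:     tune2fs -o journal_data_ordered %s" % dev)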