Ron Johnson <ron.l.john...@cox.net> posted 4a5700df.9060...@cox.net, excerpted below, on Fri, 10 Jul 2009 03:50:39 -0500:
> On 2009-07-09 21:28, Duncan wrote:
>> Ron Johnson <ron.l.john...@cox.net> posted:
>>
>>> I boosted dirty_ratio from 10 to 30 and dirty_background_ratio from 5
>>> to 10.
>>>
>>> Now tasks.nzb.tmp seems to get written 3x faster.
>>>
>>> Thanks!!!
>>
>> Cool! =:^)
>
> The problem now is that the nntp pipes are so fat that, "because" of
> multi-threading, the tasks.nzb.tmp flush starts again only 3-5 seconds
> after the previous flush completed.

OK, let's get some real solid data to work with here:

vm.dirty_background_ratio and vm.dirty_ratio are a % of RAM, which you said is 8 gig.  So every 1% is 80 meg (well, 81.92 meg), the default 5 & 10% are ~410 & ~820 meg, and your new 10 & 30% are ~820 meg & just under 2.5 gig.

FWIW, note that at least in newer kernels (this may be part of the mods made for 2.6.29 or 2.6.30, however) there are also vm.dirty_bytes and vm.dirty_background_bytes.  These parallel the ratio settings above, and default to 0, meaning the ratio settings are used instead.  You may find it more useful to set the bytes rather than the ratios, avoiding the percentage-of-RAM conversion, if you prefer to work in bytes.  Of course, then you're converting MB or GB to bytes in 2^10 (1024) multiplier units instead of 10^3 multiplier units, but it may still be easier.  It's up to you.  I think (but haven't tested) that if you set the bytes values manually, the ratio values should reset to zero, indicating they're not in use, just as the byte values are zero while the ratios are in use.

Trying to push those further is pointless, since you'd be pushing back the syncing of the downloaded data cache as well, and that's what's going to be triggering it... *IF* it's data size triggering it at all, now.  See below.

Do you have any data on the thruput of your disk?  (You can get a reasonably realistic read benchmark with hdparm -t /dev/<whatever>.)  And how does that compare to the thruput on your Internet pipe?  Disk thruput, once one gets past the small on-disk cache, should run 50-100 MB/sec depending on disk speed, tho 3-way striped or higher RAID arrays should push it higher.  Your inet pipe might be fast, but you're one lucky bastard if it's faster than that (we're talking gigabit Ethernet speeds here), so it should be reasonably safe to assume the net /should/ be the bottleneck.

But anyway, taking those estimated disk speeds of 50-100 MB/sec, you're talking ~8-17 seconds to clear that 820 meg write backlog, let alone the 2.5 gig backlog.

HOWEVER, I suspect that you're not hitting those caps now, but rather, the time-based caps: vm.dirty_*_centisecs, where * = writeback and expire.  The defaults there are 499 (5 sec) for writeback, the "background" setting, and 2999 (30 sec) for expire, the full-priority setting.  Since you said 3-5 seconds, I'm guessing you're now hitting the 5-second writeback timer, not that 820 meg background write trigger.

However, you don't want to increase those too much either.  Like the ratio triggers, set too high they'll both take "forever" to clear the backlog when they do trigger, and, since the writeback and expire timers give some indication of your data vulnerability window in the event of a crash (at least with data=ordered, and to a somewhat lesser extent with data=writeback on 2.6.30 and newer kernels), expose you to a larger data-loss window if you do crash.  That said, a writeback of 10 sec doesn't seem unreasonable to me, and is what I have here.
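If it helps to sanity-check the arithmetic, here's a quick back-of-the-envelope sketch in Python, nothing pan-specific: the 8 gig of RAM and the 50-100 MB/sec disk guess are just this thread's numbers, and the *_bytes knobs may simply not exist before 2.6.29.

#!/usr/bin/env python3
# Back-of-the-envelope sketch: convert the vm.dirty_* ratios into byte
# counts for 8 gig of RAM, estimate how long a full flush would take at
# a guessed 50-100 MB/sec, and dump the current knobs from /proc/sys/vm.
# The RAM and disk figures are just the numbers from this thread.

RAM_BYTES = 8 * 1024**3          # 8 GiB, as reported
DISK_MB_PER_SEC = (50, 100)      # rough single-disk estimate

def threshold_bytes(ratio_percent):
    """Dirty data allowed (in bytes) before this trigger fires."""
    return RAM_BYTES * ratio_percent // 100

for name, ratio in (("dirty_background_ratio", 10), ("dirty_ratio", 30)):
    nbytes = threshold_bytes(ratio)
    print("%s = %d%%  ->  %.0f MiB" % (name, ratio, nbytes / 2**20))
    for speed in DISK_MB_PER_SEC:
        print("  time to flush at %d MB/sec: ~%.0f sec"
              % (speed, nbytes / (speed * 10**6)))

# The live values, ratios, bytes and timers alike.  The *_bytes knobs
# only exist on kernels new enough to have them (2.6.29-ish and later).
for knob in ("dirty_ratio", "dirty_background_ratio",
             "dirty_bytes", "dirty_background_bytes",
             "dirty_writeback_centisecs", "dirty_expire_centisecs"):
    try:
        with open("/proc/sys/vm/" + knob) as f:
            print("vm.%s = %s" % (knob, f.read().strip()))
    except IOError:
        print("vm.%s not present on this kernel" % knob)

It only reads /proc and doesn't change anything, so it's safe to run mid-download.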
FWIW, I left the expire at 30 seconds.  But you may wish to push writeback to 15-20s or even 30s, and expire to at least double that, tho I'd suggest not more than a minute.  Of course the balance is up to you, but I expect at least 10s for writeback is a no-brainer in your case, and 15-20 may well be preferred.  You may even try 30, with a minute for the expire.

> This is why I was pushing for on-disk structures instead of having to
> flush memory structures to disk.
>
> If SQLite is bad, then maybe Berkeley DB.  It's got rollbacks, fine
> grained locking.  KLibido, a KDE newsreader, uses it for a similar
> purpose.

FWIW, I used klibido for a while, and loved all the download-graph eye candy.  If I were as strongly into binaries as you apparently are, I'd almost certainly favor it over pan, especially as I'm normally a KDE user.

However, the one thing that was irritating about it (besides no filters at all, back when I was using it, probably a couple years ago now) was its BerkDB usage.  BerkDB has a policy of incompatible minor updates, so every time I updated it, I'd have to recompile klibido.  But it wasn't just that; there were patches required for klibido that were hard to get, at least on Gentoo.  It was rather frustrating.  That's why I ultimately ended up dropping it, as it wasn't worth hassling with the updates for the level of binary downloading I do.  Plus, the pan rewrite was going strong by then, with its automated multi-server support that old-pan (the C version, 0.14.x) lacked, and I had been using pan for years for text, and for binaries before klibido, so it was easy enough to switch back to from klibido as well.

> *Something* so that a 10KB change in tasks does not require a 300MB file
> to be written to disk.

I still believe that's barking up the wrong tree, at least for tasks.nzb (the 3+ gig header file for a single group is a different matter entirely).

Now, something that actually DOES make sense would be what pan already does for the newsgroup state: don't write it back to disk for every single change!  Pan delays the read-message tracking (newsrc) writes, and possibly the header file writes (I'm not sure), until group change.  FWIW, the rewrite used to delay them until pan exited entirely, but I appealed to Charles to return the pre-rewrite behavior of writing on group change, since losing read-message tracking for thousands of messages across tens of groups if pan crashed (or X, or the system; at the time, 2006, I was running an early unstable xorg with composite, which was crashing X on me pretty regularly, naturally taking pan with it) was NOT cool!

Now, with tasks.nzb we probably don't want to wait that long, but pan could EASILY build up changes in memory and actually write out tasks.nzb every minute or so.  To me, that sounds MUCH more sane, and easier to do than switching to a new proprietary tasks format at this point, and losing up to a minute's download on either dialup or broadband shouldn't be a big issue.  (That's why I didn't make it a set number of downloads: tuned for dialup, it'd basically write at every update like we have now, a performance killer for broadband, while tuned for broadband, losing perhaps hours of state on dialup would be VERY frustrating.  Thus, make it write the file at most every minute, or 30 seconds, or whatever, but NOT every single freakin' update!)

Actually, one would hope that'd be a reasonably small patch, likely able to go into K. Haley's git repo version.  If anyone with the coding skills is so motivated, of course...
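To make that concrete, here's a minimal sketch of the throttling logic, in Python rather than pan's actual C++, with made-up names (TaskListSaver, save_tasks) that aren't anything in pan's source: flag the task list dirty on every change, but only rewrite tasks.nzb when at least a minute has passed since the last write, plus a forced write on clean shutdown.

import time

FLUSH_INTERVAL = 60.0    # seconds; "every minute or so", per the above

class TaskListSaver:
    """Throttle tasks.nzb rewrites instead of saving on every update."""

    def __init__(self, save_tasks):
        self._save_tasks = save_tasks    # callable that rewrites tasks.nzb
        self._dirty = False
        self._last_write = 0.0

    def mark_dirty(self):
        """Call on every task-list change (task added, segment done, etc.)."""
        self._dirty = True
        self._maybe_flush()

    def _maybe_flush(self, force=False):
        now = time.monotonic()
        if self._dirty and (force or now - self._last_write >= FLUSH_INTERVAL):
            self._save_tasks()
            self._dirty = False
            self._last_write = now

    def flush_on_exit(self):
        """Always write on clean shutdown, so nothing is lost there."""
        self._maybe_flush(force=True)

The worst case after a crash is then a FLUSH_INTERVAL's worth of re-downloading, which is exactly the dialup-vs-broadband tradeoff argued above.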
>> BTW, I don't know what kernel you are on,
>
> A home-rolled 2.6.28.
>
>> uncovered and stop-gap fixed in kernel 2.6.29, fixed more correctly in
>> 2.6.30, and possibly settle-in fixes to in-development 2.6.31
>
> I'll have to upgrade soon.
> I lost a WHOLE LOT of downloads from an ext4-over-lvm2 mishap.  Must
> have been an unclean (or non-existent) umount on shutdown.

Just be sure to consider the default switch to data=writeback for ext3 on 2.6.30, and act accordingly.  (If you decide to, you can set data=ordered either in the mount options, or in the filesystem superblock itself using tune2fs; there's a sketch at the end of this message for checking what you're actually mounted with.)  I'm not sure the data=writeback fixes that help for normally tiny config files are going to do much good for massive downloads of the type we're discussing here, and it'd be a shame to have ext3 dump on you too, after the update and the resulting switch to data=writeback by default.

But other than that, you may well find the changes in 2.6.30 make all this other tweaking unnecessary.  The improvement has been dramatic, according to the benchmarks, tho the worst case they targeted for improvement was a rather different usage scenario than what you're doing.

Also, I was a naysayer on ext4 before, but with 2.6.31 the change rate has slowed enough that it's looking to be a reasonably mature fs now, and (were I on ext3, not reiserfs) presuming that holds, I'd consider it for myself starting with either the 2.6.31 full release, or by the 2.6.32 full release.  I would, however, use data=ordered on it, just as I do for reiserfs (where it's still the default) and would for ext3.  The exception would be partitions where I'm OK with losing whatever's on them (/tmp, any local cache of Internet data that's trivially redownloadable...); data=writeback is fine for those.

-- 
Duncan - List replies preferred.  No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman
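P.S. Since the data= question above will matter again after the kernel upgrade, here's a small read-only sketch (Python again, nothing pan-specific) that reports which data= journalling mode each ext3/ext4 mount is actually using, and prints, without running them, the fstab tweak or tune2fs command you could use to pin data=ordered.  Double-check the tune2fs option name against tune2fs(8) on your system before actually running it.

#!/usr/bin/env python3
# Read-only check: which data= journalling mode is each ext3/ext4
# filesystem actually mounted with?  Nothing is modified; the tune2fs
# command is only printed as a suggestion.

def ext_mounts():
    with open("/proc/mounts") as f:
        for line in f:
            dev, mountpoint, fstype, opts = line.split()[:4]
            if fstype in ("ext3", "ext4"):
                yield dev, mountpoint, fstype, opts.split(",")

for dev, mnt, fstype, opts in ext_mounts():
    mode = next((o for o in opts if o.startswith("data=")), "data=(default)")
    print("%s on %s (%s): %s" % (dev, mnt, fstype, mode))
    if mode != "data=ordered":
        print("  fstab:  add data=ordered to the options for %s" % mnt)
        print("  or:     tune2fs -o journal_data_ordered %s" % dev)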