[EMAIL PROTECTED] posted [EMAIL PROTECTED], excerpted below, on Tue, 09 Dec 2008 04:27:04 +0100:
> I have a quad core with 4 GB of memory running kubuntu hardy.
>
> I do follow a large set of newsgroups with lots of articles.
>
> It seems to me that PAN could use some performance tuning. I cannot
> believe that it takes so long to sort some 1000000 articles. They are
> just strings?
>
> Sometimes, while switching between groups, Pan eats about 80% CPU for
> 15-30 seconds, effectively eating up my PC. Also it used about 1GB of
> non-shared memory.

There was a bug with Ubuntu (Hardy I think, but I'm not a *buntu person and IDR for sure) where pan on *GNOME* (NOT KDE, and NOT XFCE) would become non-responsive for half an hour or more at times. However, the same folks that saw that said it went away when they switched to KDE or XFCE (thus the NOTs above). I guessed it was a shared library issue, but it was never traced to one, only to GNOME.

But what you're seeing is normal. Keep in mind that if pan is saying a million articles, that's after combining multiparts. In some groups, that could mean ten or fifty million actual single-part articles.

Do you have pan set to download new overviews/headers when you enter the group? It saves the current set, including threading, when you exit a group, and only has to load that from disk; but when you pull down new overviews/headers (including when you enter a group, if you have pan set to do it then), it must process all the new ones that come in, figuring out where they plug in to the existing set, combining multiparts, etc.

Also, as Jim mentions, old-pan was /seriously/ scale challenged, and would start having trouble at 100k individual overviews/headers (it didn't combine them like new-pan does). A couple million... even if you had gigs and gigs of RAM, it would sit and churn for an hour or more, and forget anything above that. It just didn't scale well at all, and a couple million overviews was seriously pushing it, period.

New-pan (what *buntu ships) had quite a lot of serious work go into it to improve memory use and scaling, and it actually does quite well now, in general scaling linearly or better. With a reasonable amount of memory it'll handle 10 or 20 million overviews without issue, tho processing that many overviews does take time/memory/cpu, no way around it. As I said, if you're working in a multipart group (say mp3s) and pan says a million headers unread, that's likely to be a good ten million individual message parts, more in movie and iso groups, less in jpeg groups.

One of the things pan now does to save memory is track strings and combine them where possible. This is why it displays multiparts as a single part, thus being able to track the subject and author only once for the whole multipart. However, it does more than that. If you look at pan's data files, it counts the number of times an author's name occurs, for instance, and for regulars it will store the name in memory only once, using a much shorter reference the additional times. All this sort of stuff it sorts out and plugs into its database system as it's downloading the overviews. Then when you leave the group, it saves it all and pulls the next group's data off of disk. (There are rough sketches of both ideas just below, if you're curious.)

It may be just strings, but try working with a few million of them at, say, a kilobyte of header data each (a million times 1 KB in headers, that's a gig right there!), and tell me when you're done that you still can't believe it takes pan half a minute and a gig of memory to process it all! But if you code and are good with database type stuff, and can make it more efficient, I'm sure Charles would like to see it.
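Just to illustrate what the multipart combining buys you, here's a toy sketch of the idea. This is NOT pan's actual code or data structures; the names (RawOverview, multipart_key, etc.) are made up. It just shows how a pile of per-part overview lines whose subjects differ only by the (nn/mm) marker collapse into one displayed article:

#include <iostream>
#include <map>
#include <regex>
#include <string>
#include <vector>

struct RawOverview {          // one line of the server's overview output
  std::string subject;        // e.g. "holiday.mp3 (03/27)"
  std::string author;
  unsigned long bytes;
};

struct Article {              // what the header pane actually shows
  std::string subject;        // part marker stripped
  std::string author;
  unsigned parts = 0;         // how many raw overviews were folded in
  unsigned long bytes = 0;
};

// Strip a trailing "(nn/mm)" part marker so all parts share one key.
static std::string multipart_key(const std::string& subject) {
  static const std::regex part_re(R"(\s*\(\d+/\d+\)\s*$)");
  return std::regex_replace(subject, part_re, "");
}

int main() {
  std::vector<RawOverview> raw = {
    {"holiday.mp3 (01/03)", "alice", 250000},
    {"holiday.mp3 (02/03)", "alice", 250000},
    {"holiday.mp3 (03/03)", "alice", 120000},
    {"trip report",         "bob",     2000},
  };

  std::map<std::string, Article> combined;
  for (const auto& o : raw) {
    const std::string key = multipart_key(o.subject);
    Article& a = combined[key];          // one entry per multipart
    a.subject = key;
    a.author  = o.author;                // subject/author kept once
    a.parts  += 1;
    a.bytes  += o.bytes;
  }

  std::cout << raw.size() << " raw overviews -> "
            << combined.size() << " displayed articles\n";
  for (const auto& [key, a] : combined)
    std::cout << a.subject << "  (" << a.parts << " parts, "
              << a.bytes << " bytes) by " << a.author << '\n';
}

Scale that up from 4 raw overviews to ten million and you can see why "only a million articles" still means a lot of churning when the group is loaded.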
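The author-counting trick is in the same spirit. Here's another made-up sketch (StringPool is my name for it, not anything in pan's source): each distinct author string is stored once and handed out as a small integer id, so a regular poster who shows up in 50,000 overviews costs one string plus 50,000 small references instead of 50,000 copies.

#include <cstdint>
#include <iostream>
#include <string>
#include <unordered_map>
#include <vector>

class StringPool {
 public:
  uint32_t intern(const std::string& s) {
    auto it = index_.find(s);
    if (it != index_.end()) {           // seen before: bump the use count
      ++uses_[it->second];
      return it->second;
    }
    uint32_t id = static_cast<uint32_t>(strings_.size());
    strings_.push_back(s);              // store the actual text only once
    uses_.push_back(1);
    index_.emplace(s, id);
    return id;
  }
  const std::string& lookup(uint32_t id) const { return strings_[id]; }
  uint32_t use_count(uint32_t id) const { return uses_[id]; }

 private:
  std::vector<std::string> strings_;
  std::vector<uint32_t> uses_;
  std::unordered_map<std::string, uint32_t> index_;
};

int main() {
  StringPool authors;
  // Each overview keeps only the small id, not the full From: string.
  std::vector<uint32_t> overview_author_ids;
  for (const char* from : {"alice@example", "bob@example",
                           "alice@example", "alice@example"})
    overview_author_ids.push_back(authors.intern(from));

  uint32_t alice = overview_author_ids.front();
  std::cout << authors.lookup(alice) << " appears "
            << authors.use_count(alice) << " times, stored once\n";
}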
I know there were 2-3 database coder guys that experimented with various enhancements to old-pan, with the results reflected in the changes made to handling in new-pan, but if you believe it's possible to do better, please see what you can do, and if it's actually better in practice, by all means, file a bug with the patches and let Charles know. It's not like any of us are going to complain if it's made faster or less memory intensive! =:^)

Meanwhile, how do you monitor CPU usage? Are you monitoring it per core, or overall only? Most of new-pan is single-threaded, because Charles had gone multi-threaded in old-pan and found the complexity and thread-race bugs just not worth it for the limited increase in performance. Instead, new-pan now hatches threads only in limited performance-critical sections (like when starting multiple connections at once, one place I know it's used, as I remember Charles fixing a bug I had with it). So pan will likely be using near 100% of a single core, but the others should remain mostly idle, I /think/. (It has been a while since I did binaries and IDR for sure.) Also, it may be disk I/O related, if you have only a single disk and that group's data isn't in cache yet.

I run a dual dual-core Opteron 290 (2.8 GHz) here, so I have four cores too, but I'm running Gentoo/~amd64 with everything compiled for my specific hardware, which will help some (BTW, you didn't mention whether you were running 32-bit or 64-bit kubuntu; 4 gigs on 32-bit is going to be less efficient than 4 gigs on 64-bit), and I run a 4-disk kernel/md RAID with pan's data on RAID-6, which means it's two-way striped. RAID striping really /does/ help, and not just with pan; you might be surprised how much.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman

_______________________________________________
Pan-users mailing list
Pan-users@nongnu.org
http://lists.nongnu.org/mailman/listinfo/pan-users