Heinz Mezera posted on Tue, 05 Jul 2016 12:47:21 +0200 as excerpted:

> Hello pan-users,
>
> does the size of newsrc-1 influence pan's time to start, to quit or its
> performance?
>
> I use Ubuntu's 16.04 version of pan (0.139-5build1) and it takes rather
> long until pan appears on Ubuntu's desktop.
>
> Can I compact newsrc-1 or reduce its size somehow?
I suspect your problem isn't the newsrc file, but something else... [discussed below, but first...]

To answer your question somewhat directly, however, the newsrc file(s, one per configured server) can indeed be compacted some, and that /might/ affect startup time, tho in my own experience there's a far worse trigger of startup delay that I suspect is the real problem. Still, the newsrc files can be made more efficient.

These newsrc files follow a standard text-based format and can be edited using a standard text editor. As always, making a backup of the unaltered file before you begin is recommended, just in case you screw up the edits. Rather than describe the format in detail, I'll simply provide you a google link...

https://www.google.com/search?q=newsrc+file+format

There is however one caveat about pan's usage. (Current) Pan doesn't use the subscription info in the newsrc (tho old C-based pan, 0.14.x, did, before the C++ rewrite), because a newsrc is inherently single-server, and pan's subscriptions apply across all configured servers that carry the group. So pan uses a different method to track group subscriptions.

What pan /does/ track in the newsrcs, however, is the per-server, per-newsgroup article sequence numbers, so it knows which articles on each server you've already seen, and thus which headers not to download again. It's this comma-separated list of article numbers and ranges that appears at the end of the newsrc line for any group you've visited (or seen a cross-posted message in). And you can consolidate these article-number lists by removing the gaps and making the ranges continuous.

It's worth noting that news servers initially communicate what they currently have using only a low-water and a high-water mark, plus an /estimated/ count of the number of messages available, with that estimate allowed to be /more/ than the number of currently available messages, but never /less/.
These are IOW the lowest-numbered message still available (unexpired) and the highest-numbered message available (the latest message to arrive), plus the estimate. Missing article numbers between the low and high water marks are specifically allowed -- this lets servers remove messages reported as spam or as copyright violations, etc. Sometimes these missing messages will be filled in later (some servers are infamous for doing this, infamous because it screws up some news clients). Often they're not.

And it's these gaps in the server store, along with simply not visiting the newsgroup for longer than its expiration period if your server does expire messages (some dedicated news service providers effectively don't expire messages, these days), that appear as gaps in pan's sequence-number lists -- because it never saw those messages.

Now, if you're reasonably sure your server doesn't fill in article sequence numbers, only ever increasing them, or if you simply don't care to see what are likely old messages if they are filled in, you can cut out all the commas and make the list a single range, from 1 (or whatever the lowest number in the existing list is) to the highest number. If the server does do fill-ins, you might still be able to make the oldest messages a continuous range, while leaving the gaps in anything newer than, say, a month old, just in case.

So, to take one example line from the linuxtopia google hit (the first hit in the google search above as I write this; note that this page is from a book copyrighted in 2003, and its mention of pan as an exception to the newsrc format is... dated, as pan does use the format now):

news.software.readers! 1-95504,137265,137274,140059,140091,140117

You can edit that to:

news.software.readers! 1-140117

Much shorter!
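If you'd rather not edit every line by hand, the consolidation can be scripted. Here's a minimal Python sketch, assuming the usual newsrc line shape (group name, then ':' or '!', then comma-separated article numbers and lo-hi ranges) and that you run it with pan closed so pan doesn't overwrite your edits. The function names and the per-group `cutoffs` mapping are hypothetical, my own invention rather than anything pan provides. It saves a .bak copy first, as recommended above.

```python
import re
import shutil

# A newsrc line: group name, then ':' (subscribed) or '!' (unsubscribed),
# then comma-separated article numbers and lo-hi ranges already seen.
NEWSRC_LINE = re.compile(r"^(?P<group>[^:!\s]+)(?P<mark>[:!])\s*(?P<ranges>.*)$")


def parse_ranges(text):
    """'1-95504,137265' -> [(1, 95504), (137265, 137265)], sorted."""
    ranges = []
    for part in text.split(","):
        part = part.strip()
        if not part:
            continue
        lo, _, hi = part.partition("-")
        ranges.append((int(lo), int(hi or lo)))
    return sorted(ranges)


def format_ranges(ranges):
    """Inverse of parse_ranges: single numbers stay bare, others become lo-hi."""
    return ",".join(str(lo) if lo == hi else f"{lo}-{hi}" for lo, hi in ranges)


def compact_line(line, keep_gaps_above=None):
    """Collapse the article list of one newsrc line.

    keep_gaps_above=None: merge everything into lowest..highest.
    Otherwise: merge only ranges starting at or below that article number
    and keep newer ranges (and the gaps between them) unchanged.
    """
    line = line.rstrip("\n")
    m = NEWSRC_LINE.match(line)
    if not m or not m.group("ranges").strip():
        return line  # blank line or group with no article list: leave as-is
    ranges = parse_ranges(m.group("ranges"))
    if keep_gaps_above is None:
        old, new = ranges, []
    else:
        old = [r for r in ranges if r[0] <= keep_gaps_above]
        new = [r for r in ranges if r[0] > keep_gaps_above]
    merged = [(old[0][0], max(hi for _, hi in old))] if old else []
    return f"{m.group('group')}{m.group('mark')} {format_ranges(merged + new)}"


def compact_file(path, cutoffs=None):
    """Rewrite a newsrc file in place after saving a .bak copy.

    cutoffs maps group name -> article number; groups not listed are
    collapsed completely. Article numbers are per group, so no single
    cutoff number works for every group.
    """
    cutoffs = cutoffs or {}
    shutil.copy2(path, path + ".bak")
    with open(path) as f:
        lines = f.read().splitlines()
    out = []
    for line in lines:
        m = NEWSRC_LINE.match(line)
        out.append(compact_line(line, cutoffs.get(m.group("group")) if m else None))
    with open(path, "w") as f:
        f.write("\n".join(out) + "\n")
```

For the default location that would be something like `compact_file(os.path.expanduser("~/.pan2/newsrc-1"))`. If your server does fill-ins, pass a per-group cutoff instead; you'd have to pick that article number yourself, since the newsrc records numbers, not dates.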
=:^)

Unfortunately, if you follow a lot of groups, all that manual editing could be a big chore (unless you write a script to automate the process, which should be possible), with, I suspect, rather limited results in terms of startup.

Instead, what I've found to take the real time, particularly on spinning rust drives (I'm on SSD now and haven't had to worry about it since I upgraded), is large message caches.

Note that pan's cache size is configurable. It defaults to 10 MB, which shouldn't be an issue for startup time, but at that size pan will also start dumping already-downloaded articles to make room for new ones rather quickly, particularly if you do binaries. For a usage pattern that saves off attachments directly, with no further use for the messages in cache after that, 10 MB is fine. For a usage pattern more like mine, however, where I tend to download a bunch of stuff to cache so it's local, and then go thru it later, a cache size of several GB may be more appropriate.

Similarly, if you have groups that you effectively archive, keeping all messages without expiring them at all, as I do with my text groups, a cache of several gigs will likely hold several years' worth of text-group messages. (I have text messages going back to 2002 in some groups. My cache for my text-groups pan instance[1] is, as of now, 1.4 GiB, so over those roughly 14 years the average usage is about 100 MB/year.)

Once that cache gets to a few hundred MiB, you'll start noticing that pan gets slower and slower on *first* startup as the cache grows. (Pan will start up faster after the first start, since the cache files are then already in the system's file cache in RAM. At least it will if you have enough memory to hold the full pan message cache in RAM. If you're running 1 GiB or less of RAM... probably not so much.) This is because pan loads those messages every time it starts, in order to rethread them -- it keeps track of message threading in memory.

Back when I was on spinning rust, I found a few ways to deal with this.
One was to set pan to start with my X user session, so it could grind away for several minutes loading stuff while I did other things. A few minutes later, when I had completed other tasks, pan would generally be up (in the system tray) and ready to go. I'd normally keep pan running constantly, in the system tray, until I was ready to end the X user session.

Another I found quite by accident. I periodically do backups of the multiple partitions on my system, and every few years, I'll boot to the backup, wipe away the normal working partition, and copy things back from the backup to the working copy, renewing it. I found that, at least with some filesystems (I was using reiserfs at the time), pan evidently fragments the cache files rather heavily. I believe this is most likely to happen when multiple threads are downloading files at once, writing them in parallel and fragmenting them in the process. By backing up the cache files, erasing the working cache copy, and copying everything back into place, the new copy was defragmented by the copy process, and pan started up much faster after that, even tho it still had the same size cache. Of course over time it slowed down again as I added new messages to my newsgroup archive, but now that I knew the trick, I could defrag the cache any time the start time got too long, and pan would start up faster again.

And of course, as I mentioned, putting it on SSD sped things up dramatically, because SSDs have essentially zero seek time, so fragmentation doesn't affect them anywhere near as badly (tho it can still have some effect, since the number of I/O operations per file grows with the number of fragments).

That's definitely what took the load time for me: pan reading all those files from cache into memory so it could rethread them.

There's a simple way to confirm whether this is your problem or not. With pan closed, simply rename the article-cache directory to something else, so pan will create a new, empty cache when it starts.
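For anyone who'd rather script that test, here's a minimal Python sketch. It assumes the layout described in this post ($PAN_HOME if set, otherwise ~/.pan2/, with the cache in an article-cache directory); the helper names and the ".off" suffix are hypothetical choices, not anything pan itself uses. Run it with pan closed. It prints the cache's size and file count before renaming, which also tells you whether you're anywhere near the few-hundred-MiB range where the slowdown shows up.

```python
import os
from pathlib import Path


def pan_home():
    """Pan's config/cache home: $PAN_HOME if set, else ~/.pan2."""
    env = os.environ.get("PAN_HOME")
    return Path(env).expanduser() if env else Path.home() / ".pan2"


def tree_size(path):
    """Return (total bytes, file count) for everything under path."""
    total = count = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.lstat(os.path.join(root, name)).st_size
            count += 1
    return total, count


def set_aside(name="article-cache", suffix=".off"):
    """Rename <pan home>/<name> to <name><suffix>, reporting its size first.

    Only run this with pan closed. Nothing is deleted; rename it back to undo.
    """
    src = pan_home() / name
    dst = src.with_name(src.name + suffix)
    if not src.exists():
        raise FileNotFoundError(src)
    if dst.exists():
        raise FileExistsError(dst)  # don't clobber an earlier set-aside copy
    if src.is_dir():
        size, files = tree_size(src)
        print(f"{src}: {files} files, {size / 2**20:.1f} MiB")
    src.rename(dst)
    return dst
```

To undo, quit pan, delete the fresh, empty article-cache it created, and rename article-cache.off back. The same helper works for any other file or directory in the pan home, e.g. `set_aside("score")` if your scorefile uses what I'm assuming is pan's default name, `score`; check your pan home to be sure.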
If the cache is your slowdown, pan should start much faster, likely nearly instantly, with no cache to load. Tho of course if you've never upped your cache size from the default 10 MB, the cache is unlikely to be the problem, and you probably won't notice a difference with the above test.

Finally, I should mention that a big scorefile will slow pan down at startup. There are ways to dramatically optimize the scorefile, but that's a different subject, one we can deal with later if you find it to be the problem. Meanwhile, however, you can test it using the same technique I suggested above for testing the cache. Simply rename the scorefile and see if pan starts faster with an empty one. If the scorefile turns out to be your problem, post back with the results and we can deal with that then.

---
[1] Text-groups pan instance: It is possible to have several separately configured pan instances, each with its own configuration and cache. ~/.pan2/ is only the default location. If the $PAN_HOME variable is set in pan's environment when it starts, pan will use the location found in that variable as its configuration and cache home instead. I've taken advantage of this to set up a number of pan wrapper scripts here, pan.text, pan.test, and pan.bin, that each point at a different config and cache. This lets me manage my unexpiring text-group-archive cache separately from my binaries cache, which is also unexpiring and set rather large, but cleared manually from time to time.

-- 
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman

_______________________________________________
Pan-users mailing list
Pan-users@nongnu.org
https://lists.nongnu.org/mailman/listinfo/pan-users