Hello Duncan, a detailed answer as usual!
Forgive me if I do not follow standard answering methods and summarize here:

- Pan takes approx. 10 minutes to start
- ls -lh newsrc-1 --> 567M (a manual edit would be an enormous task, and
  I've _never_ done any scripting before)
- renamed the article-cache directory and reduced the cache size to 10 MB
  (from 100)
- only one news server
- fewer than a dozen groups subscribed (a few text, the others binaries)
- my working style: read new headers, decide which ones might be of
  interest and save them to disk; I never go back to yesterday's or older
  articles
- I don't use scores (or I'm not aware of it)

Should I remove/rename the .pan2 directory and start from scratch?

kr
Heinz

On Wednesday, 06.07.2016, at 06:22 +0000, Duncan wrote:
> Heinz Mezera posted on Tue, 05 Jul 2016 12:47:21 +0200 as excerpted:
>
> > Hello pan-users,
> >
> > does the size of newsrc-1 influence pan's time to start, to quit or
> > its performance?
> >
> > I use Ubuntu's 16.04 version of pan (0.139-5build1) and it takes
> > rather long until pan appears on Ubuntu's desktop.
> >
> > Can I compact newsrc-1 or reduce its size somehow?
>
> I suspect your problem isn't the newsrc file, but something else...
> [discussed below, but first...]
>
> To answer your question somewhat directly, however, the newsrc file(s,
> one per configured server) can indeed be compacted some, and that
> /might/ affect startup time, tho in my own experience there's a far
> worse trigger of startup delay that I suspect is the real problem.
> However, the newsrc files can be made more efficient.
>
> These newsrc files follow a standard text-based format and can be
> edited using a standard text editor.  As always, making a backup of the
> unaltered file before you begin is recommended, just in case you screw
> up the edits.
>
> Rather than describe the format in detail, I'll simply provide you a
> google link...
>
> https://www.google.com/search?q=newsrc+file+format
>
> There is however one caveat about pan's usage.
> (Current) Pan doesn't use the subscription info in the newsrc (tho old
> C-based pan, 0.14.x, did, before the C++ rewrite), because a newsrc is
> inherently single-server, and pan's subscriptions apply across all
> configured servers that carry the group.  So pan uses a different
> method to track group subscriptions.
>
> What pan /does/ track in the newsrcs, however, is the per-server
> per-newsgroup article sequence numbers, so it knows which ones on each
> server you've already seen so it knows not to download those headers
> again.
>
> It's this sequence of comma-separated article numbers that appears at
> the end of the newsrc line for any group you've visited (or seen a
> cross-posted message in).
>
> And you can consolidate these article numbers lists by removing the
> gaps and making the ranges continuous.
>
> It's worth noting that news servers initially communicate what they
> currently have using only a high-water and a low-water mark, plus an
> /estimated/ count of the number of messages available, with that
> estimate allowed to be /more/ than the number of currently available
> messages, but never /less/.  These are IOW the lowest numbered message
> still available (unexpired), and the highest numbered message available
> (the latest message to arrive), plus the estimate.  Missing article
> numbers between the high and low water marks are specifically allowed
> -- this lets servers remove messages reported as spam or as copyright
> violations, etc.  Sometimes these missing messages will be filled in
> later (some servers are infamous for doing this, infamous because it
> screws up some news clients).  Often they're not.
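
[Note: to make the high-water/low-water reporting described above a bit more concrete, here is a minimal Python sketch. It assumes the server answers an NNTP GROUP command with a status line of the form "211 count low high group", which is the layout given in RFC 3977. The names GroupStatus, parse_group_response and minimum_gaps, and the sample numbers, are made up for illustration; none of this is pan's own code.]

```python
from typing import NamedTuple


class GroupStatus(NamedTuple):
    count: int  # server's *estimate* of the number of available articles
    low: int    # low-water mark: lowest article number still available
    high: int   # high-water mark: highest article number available
    name: str


def parse_group_response(line: str) -> GroupStatus:
    """Parse a '211 count low high group' status line (hypothetical helper)."""
    code, count, low, high, name = line.split()[:5]
    if code != "211":
        raise ValueError(f"not a successful GROUP response: {line!r}")
    return GroupStatus(int(count), int(low), int(high), name)


def minimum_gaps(status: GroupStatus) -> int:
    """Lower bound on the missing article numbers between the two marks.

    The count may overestimate but, per the description above, never
    underestimate, so at least (span - count) numbers have no article.
    """
    span = status.high - status.low + 1
    return max(0, span - status.count)


# Sample line with illustrative numbers only.
status = parse_group_response("211 1234 3000234 3002322 misc.test")
print(status.low, status.high, minimum_gaps(status))
```

A client that only remembers "everything up to the high-water mark" never needs to know where the gaps are; a client that stores every number it actually saw ends up with the long comma-separated lists discussed in this thread.
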
>
> And it's these gaps in the server store, along with simply not visiting
> the newsgroup for longer than its expiration period if your server does
> expire messages (some dedicated news service providers effectively
> don't expire messages, these days), that appear as gaps in pan's
> sequence number lists -- because it never saw those messages.
>
>
> Now, if you're reasonably sure your server doesn't fill in article
> sequence numbers, only ever increasing them, or if you simply don't
> care to see what are likely old messages if they are filled in, you can
> cut out all the commas and make the list a single range, from 1 or
> whatever the lowest number is in the existing list, to the highest
> number.  If the server does do fill-ins, you might still be able to
> make the oldest messages a continuous range, while leaving the gaps in
> anything newer than say a month old, just in case.
>
> So, to take one example line from the linuxtopia google hit (the first
> hit in the google search above, as I write this; note that this page is
> from a book copyrighted in 2003, and its mention of pan as an exception
> to the newsrc format is... dated, pan does use the format now):
>
> news.software.readers! 1-95504,137265,137274,140059,140091,140117
>
> You can edit that to:
>
> news.software.readers! 1-140117
>
> Much shorter! =:^)
>
> Unfortunately, if you follow a lot of groups, all that manual editing
> could be a big chore (unless you can figure out a nice script to
> automate the process, should be possible), with, I suspect, rather
> limited results in terms of startup.
>
>
> Instead, what I've found to take the real time, particularly on
> spinning rust drives (I'm on SSD now and haven't had to worry about it
> since I upgraded to SSD), is large message caches.
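
[Note on the newsrc consolidation above: the manual edit that turns "1-95504,137265,...,140117" into "1-140117" could be scripted roughly as follows. This is a minimal Python sketch, not a tested tool. It assumes every line looks like the example quoted above: a group name ending in ':' or '!', followed by comma-separated article numbers and ranges. The function names and the file paths in the usage comment are placeholders. Collapsing the list marks every number in the gaps as already seen, so the caveat about servers that fill in old numbers applies. Run it only with pan closed, and write to a new file rather than over the original.]

```python
import re

# Group name ending in ':' or '!', then the article-number list.
LINE_RE = re.compile(r"^(\S+[:!])\s*(.*)$")


def compact_line(line: str) -> str:
    """Collapse the article-number list on one newsrc line to a single range."""
    line = line.rstrip("\n")
    match = LINE_RE.match(line)
    if not match:
        return line  # leave anything unrecognised untouched
    head, ranges = match.groups()
    numbers = [int(n) for n in re.findall(r"\d+", ranges)]
    if not numbers:
        return line  # group with no article numbers recorded
    low, high = min(numbers), max(numbers)
    return f"{head} {low}-{high}" if low != high else f"{head} {low}"


def compact_file(src: str, dst: str) -> None:
    """Stream src line by line into dst, so a large file is not read at once."""
    with open(src, encoding="utf-8", errors="surrogateescape") as fin, \
         open(dst, "w", encoding="utf-8", errors="surrogateescape") as fout:
        for line in fin:
            fout.write(compact_line(line) + "\n")


print(compact_line(
    "news.software.readers! 1-95504,137265,137274,140059,140091,140117"
))  # prints: news.software.readers! 1-140117

# Hypothetical usage, with placeholder paths:
# compact_file("newsrc-1", "newsrc-1.compacted")
```

After checking the output file by eye, the original can be renamed as a backup and the compacted copy moved into its place.
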
>
> Note that pan's cache size is configurable, but defaults to 10 MB,
> which shouldn't be an issue, but also means pan will rather quickly
> start dumping already downloaded articles to make room for more,
> particularly if you do binaries.  For a usage pattern that saves off
> attachments directly, with no further use for the messages in cache
> after that, 10 MB is fine.  For a usage pattern more like mine,
> however, where I tend to download a bunch of stuff to cache so it's
> local, and then go thru it later, a cache size of several GB may be
> more appropriate.  Similarly, if you have groups that you effectively
> archive, keeping all messages without expiring them at all, as I do
> with my text groups, a cache of several gigs will likely hold several
> years' worth of text-group messages.  (I have text messages going back
> to 2002 in some groups.  My cache for my text-groups pan instance[1]
> is, as of now, 1.4 GiB, so the average usage is roughly 100 MB/year.)
>
> Once that cache gets to a few hundred MiB, you'll start noticing pan
> startup gets slower and slower on *first* startup, as the cache gets
> bigger and bigger.  (Pan will start up faster after the first start,
> since everything's already cached.  At least it will if you have enough
> memory to cache into RAM the full pan message cache.  If you're running
> 1 GiB or less of RAM... probably not so much.)  This is because pan
> loads those messages every time it starts, in order to rethread them --
> it keeps track of message threading in memory.
>
> Back when I was on spinning rust, I found a few ways to deal with this.
>
> One was, set pan to start with my X user session, so it could grind
> away for several minutes loading stuff while I did other things.  A few
> minutes later when I had completed other tasks, pan would generally be
> up (in the system tray) and ready to go.  I'd normally keep pan running
> constantly, in the system tray, until I was ready to end the user X
> session.
>
> Another I found quite by accident.  I periodically do backups of the
> multiple partitions on my system, and every few years, I'll boot to the
> backup, wipe away the normal working partition, and copy things back
> from the backup to the working copy, renewing it.
>
> I found that at least with some filesystems (I was using reiserfs at
> the time), pan evidently fragments the cache files rather heavily.  I
> believe this is most likely to happen when multiple threads are
> downloading files at once, writing them in parallel and fragmenting
> them in the process.
>
> By backing up the cache files, erasing the working cache copy, and
> copying everything back into place, the new copy was defragmented due
> to the copy process, and pan started up much faster after that, even
> tho it still had the same size cache.
>
> Of course over time it slowed down again as I added new messages to my
> newsgroup archive, but now that I knew the trick, I could defrag the
> cache any time the start time got too long, and pan would start up
> faster again.
>
> And of course as I mentioned, putting it on SSD sped things up
> dramatically, because SSDs have zero seek time, so fragmentation
> doesn't affect them anything close to as badly (tho it can still have
> some effect due to IOPs per file increasing with the number of
> fragments).
>
>
> That's what definitely took the load time for me, pan reading all those
> files from cache into memory, so it could rethread them.
>
> There's a simple way to confirm whether this is your problem or not.
> With pan closed, simply rename the article-cache directory to something
> else, so pan will recreate a new, empty cache, when it starts.  If the
> cache is your slowdown, pan should start much faster, likely nearly
> instantly, with no cache to load.
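
[Note: the rename test described above can also be done from a short script, which keeps the old directory under a timestamped name so it can be moved back afterwards. A minimal Python sketch, assuming the cache is the article-cache directory under ~/.pan2, or under $PAN_HOME if that variable is set (see footnote [1]). The helper names set_aside and pan_home and the ".aside-" suffix are hypothetical. Only run it with pan closed.]

```python
import os
import time
from pathlib import Path
from typing import Optional


def pan_home() -> Path:
    # Assumption: $PAN_HOME, when set, replaces the default ~/.pan2.
    return Path(os.environ.get("PAN_HOME", Path.home() / ".pan2"))


def set_aside(path: Path) -> Optional[Path]:
    """Rename path to a timestamped sibling.

    Returns the new path, or None if there was nothing to rename.
    """
    if not path.exists():
        return None
    stamp = time.strftime("%Y%m%d-%H%M%S")
    backup = path.with_name(f"{path.name}.aside-{stamp}")
    path.rename(backup)
    return backup


# Hypothetical usage (with pan closed):
# moved = set_aside(pan_home() / "article-cache")
# print(f"moved to {moved}" if moved else "no article-cache directory found")
```

To undo the test, close pan, delete the freshly created empty cache, and rename the set-aside directory back to article-cache.
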
>
> Tho of course if you've never upped your cache size from the default 10
> MB, the cache is unlikely to be the problem, and you probably won't
> notice a difference with the above test.
>
>
> Finally, I should mention that a big scorefile will slow pan down at
> startup.  There are ways to dramatically optimize the scorefile, but
> that's a different subject, that we can deal with later if you find it
> to be the problem.  Meanwhile, however, you can test it using the same
> technique I suggested above for testing the cache.  Simply rename the
> scorefile and see if pan starts faster with an empty one.  If the
> scorefile turns out to be your problem, post back with the results and
> we can deal with that, then.
>
> ---
> [1] Text-groups pan instance: It is possible to have several separately
> configured pan instances, each with its own configuration and cache.
> ~/.pan2/ is only the default location.  If the $PAN_HOME variable is
> found to be set in pan's environment as it starts, it will use the
> location found in that variable as its configuration and cache home,
> instead.  I've taken advantage of this to set up a number of pan
> wrapper scripts here, pan.text, pan.test, and pan.bin, that each point
> at a different config and cache.  This lets me manage my unexpiring
> text-group-archive cache separately from my binaries cache, also
> unexpiring and set rather large, but cleared manually from time to
> time.

_______________________________________________
Pan-users mailing list
Pan-users@nongnu.org
https://lists.nongnu.org/mailman/listinfo/pan-users