Jurgen Defurne posted on Sun, 22 Aug 2010 18:10:14 +0200 as excerpted:

> I am a regular user of Pan for some high technical newsgroups.
>
> What I would like is to have the contents of these groups as a local
> archive which can be searched using Pan.
>
> I have already tried two ways to do this. The first one was using 'Cache
> Article' after selecting all articles, but it seems that when the cache
> gets beyond a certain size, older cached articles disappear.
>
> I am now trying with 'Save Articles...', but this creates one file,
> which cannot be incrementally updated.
>
> What other (simple, preferably) possibilities do there exist, not
> necessarily using Pan for storage, but certainly for reading and
> searching?
You're running into pan's default cache size limit, 10 MB. That setting, like several others, *IS* available in the config files, but is not made available in the GUI, basically because while pan only requires gtk+, it's a gnome family app, and gnome in general caters to the "simple" users who are apparently afraid of too many config options, even when they'd be seriously useful for some users!

(FWIW, that's one reason that despite all the problems with kde4, I'm still a kde user -- kde's comparable policy is to create a generally sane default, but expose far more options in the configuration for those who wish to use them. But knode doesn't handle binaries as well as pan does, and klibido handles binaries but not text, and I'm not sure if it was ported to kde4, either, so pan it is.)

Anyway, desktop environment politics aside...

As you may know, pan's config and data are stored in ~/.pan2/ by default. In that directory (or whatever one you have pan's files stored in, if you've made use of the PAN_HOME environment variable to point pan at a different location), find preferences.xml. As usual, if you're going to edit config files, do so with the app you're editing the config for, pan in this case, closed.

In preferences.xml, the preferences are grouped by type, and then alphabetically by name. Look for type int, name "cache-size-megs". Make it whatever integer number of megs you like.

Here, I make use of the PAN_HOME environment variable I mentioned to run multiple pan "instances", each pointed at a different data dir. The way I have it set up, I have one for text groups, one for binaries, and a third for testing, but of course, you can split it up however you like. I mention this by way of explaining how and why I have multiple preferences.xml files, each with a different cache size.
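If you'd rather script that edit than do it by hand, here's a minimal Python sketch. It assumes the entry already exists in preferences.xml in the same shape as the examples in this post, that is `<int name='cache-size-megs' value='...'/>`, and it bails out if it doesn't find exactly one such entry. The function names (`pan_home`, `set_cache_size`, `update_preferences`) are made up for illustration; they're not part of pan. As with a manual edit, run it only while pan is closed.

```python
import os
import re
from pathlib import Path

# Matches the entry in the form shown in this post, e.g.
#   <int name='cache-size-megs' value='5120'/>
# and tolerates either single or double quotes.
_CACHE_RE = re.compile(
    r"""(<int\s+name=(['"])cache-size-megs\2\s+value=(['"]))\d+(\3\s*/>)"""
)


def pan_home(env=None):
    """Return pan's data directory: $PAN_HOME if set, else ~/.pan2."""
    env = os.environ if env is None else env
    custom = env.get("PAN_HOME")
    return Path(custom) if custom else Path.home() / ".pan2"


def set_cache_size(xml_text, megs):
    """Return xml_text with cache-size-megs changed to `megs` (positive int)."""
    if not isinstance(megs, int) or megs <= 0:
        raise ValueError("megs must be a positive integer")
    new_text, count = _CACHE_RE.subn(
        lambda m: f"{m.group(1)}{megs}{m.group(4)}", xml_text
    )
    if count != 1:
        # Assumption: the entry is present exactly once; otherwise do nothing.
        raise ValueError(f"expected one cache-size-megs entry, found {count}")
    return new_text


def update_preferences(megs, env=None):
    """Rewrite preferences.xml in place, keeping a .bak copy of the original."""
    prefs = pan_home(env) / "preferences.xml"
    original = prefs.read_text(encoding="utf-8")
    updated = set_cache_size(original, megs)  # raises before anything is written
    prefs.with_name(prefs.name + ".bak").write_text(original, encoding="utf-8")
    prefs.write_text(updated, encoding="utf-8")
    return prefs
```

Calling `update_preferences(5120)` would set a 5 gig cache for whichever instance PAN_HOME currently points at (or ~/.pan2 if it's unset). The sketch does a plain text substitution rather than parsing and re-serializing the XML, so the rest of the file stays byte-for-byte as pan wrote it.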
For my text groups instance, I have:

<int name='cache-size-megs' value='5120'/>

Since those groups are mostly text and I've set the expiration to none for the servers in that instance, I have posts going back years in some groups, to when the pan C++ rewrite was introduced with 0.90, as it changed file formats for a number of things. (Actually, it's back further than that on some gmane.org groups/lists, gmane of course being a list2news archive and gateway, presenting a whole bunch of mailing lists as newsgroups, with unexpiring posts.) The cache is still only ~2 gig, so I'm a long way from maxing it out.

The test instance is, I think, still at the default. I use a separate test instance so I can visit groups without subscribing, say if someone reports a problem post that I want to try, and not have pan storing information about groups I don't really care about and am not subscribed to, in my other instances.

The binaries instance has a cache on a dedicated 12 gig partition, so I've set its cache size to an arbitrary number, a bit above 12 gigs.

<int name='cache-size-megs' value='12500'/>

And while I've not actually done binaries in some time (it seems I've just too many other things I find interesting to do, and just never get to it), I have actually tested that 12 gig a few times, some years ago. Pan handles it fine, or at least did, back then.

So provided you set unexpiring for your server(s), you shouldn't have a problem setting a cache size into the double-digit gigs if necessary, or maintaining an archive going back as far as you can get messages, without them expiring locally just because they expire on whatever server you're using.

The one caveat I have noticed is that the more data you keep around, the longer pan takes to load up, especially from cold disk cache.
My way around that has been to assign pan its own dedicated desktop (kwin allows you to configure specific apps to always appear on a specific desktop, and that's what I do with pan), and to start it when I start X/KDE, keeping it running pretty much all the time I'm in X, so it only shuts down when I shut down X/KDE.

If you like, you can put pan on its own partition, and periodically back it up, then wipe the partition and copy everything back, thus defragging it, speeding up initial load.

Also, I run a 4-disk kernel/md RAID-1 now, but previously ran a RAID-6, which, with four spindles, is effectively two-way striped for read access. To my surprise, when reading multiple files, as is the case when pan is loading, the kernel is good enough at scheduling parallel I/O on the RAID-1 that it NOTICEABLY shrank my load time when I switched to that, as compared to the RAID-6. I had thought that the RAID-6 would be faster due to the effective two-way striping for read access, but I was wrong: the kernel's good enough at scheduling on the RAID-1 that it apparently keeps all four disks reading data in parallel, so pan loads faster from mirrored RAID than from striped RAID.

....

That's one option, all-pan. The other option would be to run a personal news-server installation, like leafnode.

Leafnode would download the messages to your local disk and store them there, then serve them locally to pan. Doing it this way, you could leave pan's cache size untouched (or maybe even shrink it), and point it at your local server instead of the remote. You'd still set pan not to expire articles, so it'd keep its article index intact, but it wouldn't need a big cache, since it's pulling from the local leafnode (or whatever) server anyway. You'd then set leafnode to unexpiring as well, so it continued to retain articles back as far as you could get.
One advantage to this, if you're doing enough binaries that you're waiting on pan to download anyway, is that the local server would presumably be running all the time in the background, downloading messages as they came in, so they'd always be available virtually instantly from pan, since they're already stored locally. No waiting on the network connection to the server. But that's probably not that significant an issue unless you're still on an analog modem dialup connection, because on anything much faster than that, if you're downloading enough data that you're waiting on the network, you'll quickly be looking for more room for your archive -- which will soon measure in terabytes, not gigabytes.

However, there's another possible advantage, as well. Pan's loadup should be faster if it's only caching the default 10 MB.

.....

Meanwhile, pan does have one serious limitation, in terms of search (and of scoring). It only scores and searches the message overviews -- basically, the information in the header pane: author, subject, etc. (tho message-ids are also in the overviews and form the basis for the watch/kill/score thread feature). Pan is unable to score or search on actual message content.

If you're happy with pan's searching already, and just need a larger cache to search on, that's fine. But do be aware that if you do want/need to search on message content, you'll need to use something else.

Of course, with kde's nepomuk/strigi indexing (what I'm familiar with since I run kde), or beagle (AFAIK the gnome indexer), or google-desktop indexing, or whatever, you can point that at either the pan cache (for option one above) or leafnode's cache (for option two), and get full content indexing, if that's what you want. You can then open the file using whatever editor you have associated, find the subject and date info, and use pan to view the whole thread in context, if desired.
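If you don't want to set up a desktop indexer, a crude content search over the stored articles can be sketched in a few lines of Python. This is only an illustration, not anything pan or leafnode ships: it assumes each article sits in its own file as an ordinary headers-plus-body message, and `search_articles` is a made-up name. Pass it whatever directory your pan cache or local news spool actually lives in; I'm deliberately not hardcoding a path.

```python
import email
from email import policy
from pathlib import Path


def search_articles(cache_dir, needle):
    """Yield (path, subject, date, message_id) for articles whose plain-text
    body contains `needle` (case-insensitive).

    Assumption: every regular file under cache_dir holds one article.
    """
    needle = needle.lower()
    for path in sorted(Path(cache_dir).rglob("*")):
        if not path.is_file():
            continue
        with path.open("rb") as fh:
            msg = email.message_from_binary_file(fh, policy=policy.default)
        body_part = msg.get_body(preferencelist=("plain",))
        if body_part is None:
            continue  # no text/plain part, e.g. a pure binary post
        try:
            text = body_part.get_content()
        except (LookupError, UnicodeError):
            continue  # unknown or broken charset: skip rather than guess
        if needle in text.lower():
            # Subject and date are what you'd then look for in pan's header pane.
            yield path, msg["Subject"], msg["Date"], msg["Message-ID"]
```

Something like `for hit in search_articles(some_dir, "raid-1"): print(*hit)` then gives you the subject and date to look up in pan. It rereads every file on each search, so for a multi-gig archive a real indexer will be far faster.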
So pan can still be used to view the thread, once you find a post that interests you based on content. It's just that if you do want full post content search, not just subject/author search, you'll need to use something else for the initial search, and can then open the thread in pan if you like.

If that's a limitation you can live with, great. Otherwise, you should probably look for a different news client, one with full post search capability.

-- 
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman

_______________________________________________
Pan-users mailing list
Pan-users@nongnu.org
http://lists.nongnu.org/mailman/listinfo/pan-users