Pedro via Pan-users posted on Tue, 22 Oct 2024 11:43:00 +1000 as excerpted:
> I would like an option to show how many headers they are, and to
> download a range.
>
> the current (and always) options were all headers, or last 100k
>
> I have tried last 1000000000000 headers.
>
> it always locks up. loses what it had downloaded.

Are you tracking memory usage? Pan's likely just... running out of memory! See the (much longer!) discussion below.

> so an option would be on the fly saving what it has got
>
> and not to try again for those ones.
>
> we have the drive space now for gigabytes of headers.
>
> any other tips or workarounds appreciated

Pan keeps (and has always kept) headers in memory, building a threading model as it goes, and actually rebuilds it (tho with some optimizations on reload) every time it restarts. So keep enough history around, particularly on spinning rust, and you *WILL* notice pan taking some time to load up.

(I effectively archive, as an unexpiring cache, a number of text groups including this mailing list via gmane.io's list2news service. Back before I switched to ssd, pan was taking 10 minutes to start, so I actually scripted a system service to cat the entire pan-text-instance cache to /dev/null, thus caching it in RAM. Then I had pan set to start with my kde login and would only shut it down momentarily between restarts, thus keeping stuff cached and the pan restart time to normally a few seconds. With ssds I've not really seen the problem with the handful of text groups I archive, tho I suspect I probably would again, either with cache-loading speed or just the sheer scaling of the memory issue under discussion here, if I was working with multiple millions of headers as is typical on high-retention binary groups.)
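For anyone wanting to try the cache-warming trick described above, here is a minimal sketch in Python rather than a shell one-liner. It simply reads every regular file under a directory and throws the data away, which has the same effect on the kernel page cache as cat to /dev/null. The function name `warm_cache` and the chunk size are illustrative assumptions, not anything pan ships.

```python
import os

def warm_cache(root, chunk_size=1 << 20):
    """Read every regular file under `root` and discard the data.

    Equivalent in spirit to `cat files > /dev/null`: the reads pull the
    files into the kernel page cache. Returns the total bytes read.
    """
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip FIFOs, sockets and broken symlinks; opening them
            # could block or fail.
            if not os.path.isfile(path):
                continue
            try:
                with open(path, "rb") as fh:
                    while True:
                        chunk = fh.read(chunk_size)
                        if not chunk:
                            break
                        total += len(chunk)
            except OSError:
                # Unreadable file: skip it rather than abort the pass.
                continue
    return total
```

Typical use would be `warm_cache(os.environ.get("PAN_HOME", os.path.expanduser("~/.pan")))` run once after boot. Whether the data stays cached depends on how much free RAM the system has, so this only helps when the cache comfortably fits in memory.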

Over the years pan's memory handling has triggered issues. The earliest were on 32-bit, where memory address space is limited to 4 GB and, depending on kernel build options, individual application memory is often limited to 2 GB (50/50 kernel/userspace split), tho at some efficiency loss when switching between userspace and kernelspace it's possible to do 4G/4G, with each having the full 4 GB addressable.

I remember back then, when 32-bit was still king and people were running into issues with 2 GB RAM, the complaint was pan couldn't do much over 100K headers (I'm not sure if that has anything to do with the 100K default, but it could). Some optimizations later (combining string segments so frequent poster strings are stored once and referenced, with big series where much of the subject line is duplicated similarly handled, for instance), the cap was a bit north of 200K for most people. But by then 64-bit amd64/x86_64 was becoming more common, along with 8 GB memory systems, and the complaints mostly disappeared for a while.

64-bit doesn't have that address-space limitation, but even with those optimizations pan does still run into scaling issues due to memory usage, generally somewhere above a million headers but for most systems (I believe) near 200 million, /maybe/ half a billion if you're lucky enough to have 32 or 64 gig RAM. It all depends on how much memory you actually have.

If I count the zeros correctly you're trying a trillion headers... So just how much memory do you have, do you have swap enabled and if so how much, and how fast are the (hopefully) SSDs it's on? Because you're very likely hitting your system's memory limits, and the "lockup" is the memory-thrashing "live-lock", either as you get GiB deep into swap or, without swap, before the OOM-killer (out-of-memory killer) is activated.
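One way to answer the memory questions above on Linux is to watch pan's resident set size alongside the system's available memory and free swap while headers download. The sketch below assumes the standard /proc/meminfo and /proc/<pid>/status text formats; the helper names `parse_kib_fields` and `memory_snapshot` are made up for illustration.

```python
def parse_kib_fields(text):
    """Parse 'Key:   12345 kB' lines into a {key: KiB} dict.

    Lines that don't end in a numeric value followed by 'kB'
    (for example 'Name: pan') are ignored.
    """
    fields = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and len(parts) == 2 and parts[1] == "kB" and parts[0].isdigit():
            fields[key.strip()] = int(parts[0])
    return fields

def memory_snapshot(pid):
    """Return pan's RSS plus system available memory and free swap, in KiB."""
    with open("/proc/meminfo") as fh:
        mem = parse_kib_fields(fh.read())
    with open(f"/proc/{pid}/status") as fh:
        proc = parse_kib_fields(fh.read())
    return {
        "pan_rss_kib": proc.get("VmRSS"),
        "mem_available_kib": mem.get("MemAvailable"),
        "swap_free_kib": mem.get("SwapFree"),
    }
```

Calling `memory_snapshot(pid)` every few seconds with pan's process ID (from `pidof pan` or similar) should show RSS climbing while available memory and free swap shrink, if memory exhaustion really is the cause of the lockup.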
The other possibility, of course, is that you have ulimits set and pan's simply hitting its application memory limit before the system itself runs into problems. That won't help pan, but it should limit the damage to pan itself instead of locking up and potentially crashing other things on the system, depending on what the OOM-killer picks to kill.

Meanwhile, for years (well over a decade, must be getting close to two, making "decades" possibly accurate...) now, there has been discussion of switching pan's header handling to some sort of database format, allowing the database to handle it "on-disk", with only a working set in actual RAM. Charles Kerr mentioned it a few times back when he was still primary/lead pan dev, but my personal suspicion is that he was a C and C++ dev who didn't consider himself a database dev and simply wasn't comfortable doing it without someone more familiar with the pitfalls of that area. (And believe me, as I've seen it in other areas including email, where I switched clients over the problem, it takes a *very* good coder, or often several, and generally several years of stabilizing, before most database app-implementations are stable enough to *not* regularly lose data due to a corrupted database, etc.) Regardless of his reason, though, to my knowledge no effort at it was ever made public.

Several lead devs and, as I said, what must be nearing two decades later, the suggestion continues to appear from time to time. But now there's actually some development. Continue reading. =:^)

Recently, Dominique Dumont (whom I usually shorten to DD) stepped up from Debian pan maintainer to upstream pan lead dev (we were without one for a few years). His first priority of course was updating pan code to work with current versions of the libraries it depends on, etc, given it was behind from several years of neglect. The worst of that is now done and he seems well into dealing with the second priority, porting still-working but deprecated library usage before it stops working too.
Redoing the icon-handling code was part of that. Now, pan is more stable and on a better track in terms of its future than it has been for many years. =:^)

Now that the critical and nearing-critical stuff is done, DD's expanding into some of the deeper projects. One of the first was porting/modernizing the build system from gnu-auto* to cmake. That is now done and seems to be stable after a few initial hiccoughs. =:^)

Another was rewriting some rather legacy color handling (I believe the old code was using recently deprecated calls, so it falls under that too). As someone who /needs/ a non-default light-on-dark color-scheme for medical reasons and who builds and runs live-git pan direct from the git repos, I was personally involved in reporting and getting the hiccoughs fixed there, and I'm happy to say the new color code that was broken in 0.159 was fixed for 0.160 (with 0.161 current). =:^)

But potentially more challenging, and certainly more apropos to the current topic, DD has recently started (announced in June) a sqlite database-porting effort. The announcement says it's available for testing as the sqlite branch in the git repo, and back then, only the news server information was ported, with sqlite storage for the group information next on his list. He said it would take a few months, potentially 1-2 years for all pan data, so don't hold your breath. And while I'm not building that branch yet (at the time I was stuck on the auto* builds and wasn't even doing cmake yet; I'm on cmake now but haven't tried switching to that branch and building with sqlite yet), as I've said, my experience is that it can take quite some time to stabilize database code, so even if it's "working" I could easily see it not really being /stable/ for some time after that. We'll see...

So there's an effort underway, altho the branch is experimental and hasn't been merged to main/master yet. If you're into building from sources you may wish to try it.
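To make the "on-disk, with only a working set in RAM" idea concrete, here is a minimal sketch using Python's standard sqlite3 module. It is emphatically NOT the schema or code from pan's sqlite branch; the table name, columns, and function names are all invented for illustration. It also shows the two behaviours asked for in the original request: each batch is committed as it arrives, and headers already stored are skipped instead of fetched into the table again.

```python
import sqlite3

def open_store(path=":memory:"):
    """Open (or create) a header store. Pass a file path to keep it on disk."""
    conn = sqlite3.connect(path)
    # Hypothetical schema: one row per header, keyed by article number.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS headers ("
        " article_number INTEGER PRIMARY KEY,"
        " message_id TEXT NOT NULL,"
        " subject TEXT NOT NULL,"
        " author TEXT NOT NULL)"
    )
    return conn

def save_batch(conn, batch):
    """Commit one batch of (number, message_id, subject, author) rows.

    INSERT OR IGNORE leaves already-stored article numbers untouched, so
    an interrupted download can resume without redoing earlier work.
    """
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR IGNORE INTO headers VALUES (?, ?, ?, ?)", batch
        )

def load_window(conn, first, last):
    """Fetch only the headers in an article-number range, not the whole group."""
    cur = conn.execute(
        "SELECT article_number, subject, author FROM headers"
        " WHERE article_number BETWEEN ? AND ? ORDER BY article_number",
        (first, last),
    )
    return cur.fetchall()
```

With a file path instead of the default `:memory:`, the table lives on disk and only the rows returned by `load_window` need to be held in RAM, which is the property the in-memory header model lacks.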

I'm building from sources (on gentoo, using an ebuild for the purpose), but haven't tried that branch yet, and I've not seen anything more about it on-list, so I don't know its current status. But I believe it's still there (I've not actually checked recently when I do my git pulls) to try if you want...

Of course, as the announcement suggested and as I'll second, be sure to back up your ~/.pan dir (or whatever you have PAN_HOME pointed at if different) before you try it. And I'll add, based on experience with other projects: do expect some stability issues and potentially restoring or rebuilding the database at times, because that does tend to happen with new database code.

The announcement can be found on-list as (including gmane.io newsgroup info, DD's email address deleted as gmane mungs those for spam control, spaces added around the @ in the message-id hoping to keep it from trying to mung that further too):

From: Dominique Dumont
Newsgroups: gmane.comp.gnome.apps.pan.devel,gmane.comp.gnome.apps.pan.user
Subject: Experiment on Sqlite storage
Date: Sat, 22 Jun 2024 17:08:29 +0200
Message-ID: <2942917.e9J7NaK4W3 @ ylum>
Xref: news.gmane.io gmane.comp.gnome.apps.pan.devel:1714 gmane.comp.gnome.apps.pan.user:16222

-- 
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman

_______________________________________________
Pan-users mailing list
Pan-users@nongnu.org
https://lists.nongnu.org/mailman/listinfo/pan-users