Re: [Pan-users] request (if this is the way to do it) for pan to be better at downloading ALL the headers.

Duncan Tue, 22 Oct 2024 23:48:40 -0700

Pedro via Pan-users posted on Tue, 22 Oct 2024 11:43:00 +1000 as
excerpted:


> I would like an option to show how many headers they are, and to
> download a range.
> 
> the current (and always) options were all headers, or last 100k
> 
> I have tried last 1000000000000 headers.
> 
> it always locks up. loses what it had downloaded.

Are you tracking memory usage?  Pan's likely just... running out of 
memory!  See the (much longer!) discussion below.

> so an option would be on the fly saving what it has got
> 
> and not to try again for those ones.
> 
> we have the drive space now for gigabytes of headers.
> 
> any other tips or workarounds appreciated

Pan keeps (and has always kept) headers in memory, building a threading 
model as it goes, and actually rebuilds it (tho with some optimizations on 
reload) every time it restarts, so keep enough history around and, 
particularly on spinning rust, you *WILL* notice pan taking some time to 
load up.

(I effectively archive, unexpiring-cache, a number of text groups 
including this mailing list via gmane.io's list2news service.  Back before 
I switched to ssd, pan was taking 10 minutes to start, so I actually 
scripted a system-service to cat the entire pan-text-instance cache to 
/dev/null, thus caching it in RAM.  Then I had pan set to start with my 
kde login and would only shut it down momentarily between restarts, thus 
keeping stuff cached and the pan restart time to normally a few seconds.  
With ssds I've not really seen the problem with the handful of text groups 
I archive, tho I suspect I probably would again, either with cache-loading 
speed or just the sheer scaling of memory issue under discussion here, if 
I was working with multiple millions of headers as is typical on high-
retention binary groups.)
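
For anyone wanting to try the same cache-warming trick, here is a minimal 
sketch of the idea in Python rather than the original cat-to-/dev/null 
script (which isn't reproduced here).  The function name and chunk size 
are my own illustrative choices, and the directory to pass in is whatever 
your pan instance actually uses.  It only reads and discards data, so the 
kernel keeps whatever fits in its page cache:

```python
import os


def warm_page_cache(root, chunk_size=1 << 20):
    """Read every regular file under root and throw the data away.

    Reading is enough to pull the files into the kernel page cache
    (as far as free RAM allows).  Returns the number of bytes read.
    """
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.isfile(path):
                continue  # skip FIFOs, sockets, dangling symlinks
            try:
                with open(path, "rb") as fh:
                    while True:
                        chunk = fh.read(chunk_size)
                        if not chunk:
                            break
                        total += len(chunk)
            except OSError:
                # Unreadable or vanished file: carry on with the rest.
                continue
    return total


# Hypothetical usage; substitute the real cache directory:
# warm_page_cache("/path/to/pan/instance")
```

Whether this helps depends on having enough free RAM to hold the cache; 
on a memory-starved system it would just push other things out.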

Over the years pan's memory handling has triggered issues, the earliest 
for 32-bit, where memory address space is limited to 4 GB and depending on 
kernel build options individual application memory is often limited to 2 
GB (50/50 kernel/userspace split), tho at some efficiency loss when 
switching between user/kernelspace it's possible to do 4G/4G with 
userspace and kernelspace each having 4GB addressable and switching 
between them.

I remember back then, when 32-bit was still king and people were running 
into issues with 2 GB RAM, the complaint was pan couldn't do much over 
100K headers (I'm not sure if that has anything to do with the 100K 
default, but it could).

Some optimizations later (combining string segments so frequent poster 
strings are stored once and referenced, with big series where much of the 
subject line is duplicated similarly handled, for instance), the cap was a 
bit north of 200K for most people.  But by then 64-bit amd64/x86_64 was 
becoming more common, along with 8 GB memory systems, and the complaints 
mostly disappeared for a while.
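
To illustrate the "stored once and referenced" idea, here is a small 
Python sketch of a string pool.  This is *not* pan's code (pan is C++ and 
its real implementation surely differs); the class and method names are 
made up for the example, and it only covers the repeated-poster case, not 
the shared-subject-prefix handling:

```python
class StringPool:
    """Keep one shared copy of each distinct string."""

    def __init__(self):
        self._pool = {}

    def intern(self, text):
        # setdefault stores text on first sight and afterwards returns
        # the already-stored object for any equal string.
        return self._pool.setdefault(text, text)

    def __len__(self):
        return len(self._pool)


pool = StringPool()

# Two equal author strings built separately at runtime...
first = "".join(["Frequent ", "Poster"])
second = "".join(["Frequent ", "Poster"])

# ...collapse to one shared object once they go through the pool.
shared_a = pool.intern(first)
shared_b = pool.intern(second)
```

With a few hundred thousand headers and a few hundred regular posters, 
each header then holds a reference rather than its own copy of the name.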

64-bit doesn't have that address-space limitation, but even with those 
optimizations pan does still run into scaling issues due to memory usage, 
generally somewhere above a million headers.  Depending on how much memory 
you actually have, the ceiling for most systems is (I believe) near 200 
million, /maybe/ half a billion if you're lucky enough to have 32 or 64 
gig RAM.  
If I count the zeros correctly (twelve) you're trying a trillion headers... 
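
For a feel of the scale, here is some back-of-the-envelope arithmetic in 
Python.  The bytes-per-header figure is a pure placeholder assumption on 
my part, not a measurement of pan, so plug in your own number; the header 
counts are the ones mentioned above plus the quoted 1000000000000:

```python
GIB = 1024 ** 3


def estimated_gib(header_count, bytes_per_header):
    """Rough in-memory size, in GiB, for header_count headers."""
    return header_count * bytes_per_header / GIB


# ASSUMPTION: placeholder only, not measured from pan.
ASSUMED_BYTES_PER_HEADER = 100

for count in (200_000_000, 500_000_000, 1_000_000_000_000):
    size = estimated_gib(count, ASSUMED_BYTES_PER_HEADER)
    print(f"{count:>17,d} headers -> {size:,.1f} GiB")
```

Whatever per-header figure you pick, the last line comes out thousands of 
times larger than the first, which is the point.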

So just how much memory do you have?  Do you have swap enabled, and if so 
how much, and how fast are the (hopefully) SSDs it's on?  Because you're 
very likely hitting your system's memory limits, and the "lockup" is the 
memory-thrashing "live-lock", either as you get GiB into swap or, without 
swap, before the OOM-killer (out-of-memory killer) is activated.
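
If you'd rather watch the numbers than guess, on Linux the kernel reports 
per-process memory in /proc/<pid>/status.  Here is a small Python sketch 
that pulls out the resident (VmRSS) and swapped (VmSwap) sizes; the 
function names are mine, and you'd feed it pan's PID from top, ps or 
similar:

```python
def parse_status_kib(status_text):
    """Extract VmRSS and VmSwap (in kB) from /proc/<pid>/status text."""
    wanted = ("VmRSS", "VmSwap")
    result = {}
    for line in status_text.splitlines():
        key, _, rest = line.partition(":")
        if key in wanted:
            # Lines look like "VmRSS:\t  123456 kB"
            result[key] = int(rest.split()[0])
    return result


def memory_of(pid):
    """Read and parse /proc/<pid>/status for a running process (Linux)."""
    with open(f"/proc/{pid}/status") as fh:
        return parse_status_kib(fh.read())


# Hypothetical usage, with 12345 standing in for pan's real PID:
# print(memory_of(12345))
```

Run it every few seconds during a big header download; if VmRSS climbs 
toward your installed RAM and VmSwap starts growing, that's the thrashing 
described above.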

Unless of course you have ulimits set and pan's simply hitting its 
application memory limit before the system itself runs into problems, 
which won't help pan, but should help limit the damage to it instead of 
locking up and potentially crashing other things on the system, depending 
on what the OOM-killer picks to kill.
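
As a sketch of the limit idea, Python's standard resource module can set 
the same address-space limit that the shell's ulimit -v does, applied only 
to the child process being launched.  The helper name is made up, the cap 
is whatever you choose, and this assumes Linux (other systems treat 
RLIMIT_AS differently):

```python
import resource
import subprocess


def run_with_memory_cap(argv, max_bytes):
    """Run argv with its virtual address space capped at max_bytes."""

    def apply_limit():
        # Runs in the child just before exec, so only the launched
        # program is limited, not the caller.
        resource.setrlimit(resource.RLIMIT_AS, (max_bytes, max_bytes))

    return subprocess.run(argv, preexec_fn=apply_limit)


# Hypothetical usage: start pan with an 8 GiB cap.
# run_with_memory_cap(["pan"], 8 * 1024 ** 3)
```

With a cap in place, allocations beyond it fail inside pan itself, so pan 
may well abort, but the rest of the system should stay responsive.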


Meanwhile, for years (well over a decade, must be getting close to two, 
making "decades" possibly accurate...) now, there has been discussion of 
switching pan's header handling to some sort of database format, allowing 
the database to handle it "on-disk", with only a working-set in actual 
RAM.  Charles Kerr mentioned it a few times back when he was still 
primary/lead pan dev, but my personal suspicion is that he was a C and C++ 
dev but didn't consider himself a database dev and simply wasn't 
comfortable doing it without someone more familiar with the pitfalls of 
that area.  (And believe me as I've seen it in other areas including 
email, where I switched clients over the problem, it takes a *very* good 
coder, or often several, generally several years of stabilizing, before 
most database app-implementations are stable enough to *not* regularly 
lose data due to corrupted database, etc.)  Regardless of his reason, 
though, to my knowledge no effort at it was ever made public.

Several lead devs and, as I said, what must be nearing two decades later, 
the suggestion continues to appear from time to time.  But now there's 
actually some development.  Continue reading. =:^)

Recently, Dominique Dumont (which I usually shorten to DD) stepped up as 
upstream pan lead dev (we were without for a few years) from Debian pan 
maintainer.  His first priority of course was updating pan code to work 
with current versions of the libraries it depends on, etc, given it was 
behind from several years of neglect.  The worst of that is now done and 
he seems well into dealing with the second priority, porting still working 
but deprecated library usage before it stops working too.  Redoing the 
icon-handling code was part of that.  Now, pan is more stable and on a 
better track in terms of its future than it has been for many years. =:^)

Now that the critical and nearing-critical stuff is done, DD's expanding 
into some of the deeper projects.  One of the first was porting/
modernizing the build system from gnu-auto* to cmake.  That is now done 
and seems to be stable after a few initial hiccoughs. =:^)  Another was 
rewriting some rather legacy color handling (I believe the old code was 
using recently deprecated calls so it falls under that too).  As someone 
who /needs/ a non-default light-on-dark color-scheme for medical reasons 
and who builds and runs live-git pan direct from the git repos, I was 
personally involved in reporting and getting the hiccoughs fixed there, 
and I'm happy to say the new color code that was broken in 0.159 was fixed 
for 0.160 (with 0.161 current). =:^)  


But potentially more challenging, and certainly more apropos to the 
current topic, DD has recently started (announced in June) a sqlite 
database-porting effort.  The announcement says it's available for testing 
as the sqlite branch in the git repo, and back then, only the news server 
information was ported, with sqlite storage for the group information next 
on his list.
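
To make the on-disk idea concrete, here is a small sketch using Python's 
built-in sqlite3 module.  To be clear, this is *not* the schema or code 
from the sqlite branch (I haven't looked at it); the table, column and 
function names are all invented for illustration.  It shows headers being 
saved in batches as they arrive, re-fetched ones being ignored, and only 
a requested range being loaded back:

```python
import sqlite3


def open_header_store(path):
    """Open (or create) a header database with a hypothetical schema."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS headers ("
        " group_name TEXT NOT NULL,"
        " article_number INTEGER NOT NULL,"
        " subject TEXT,"
        " author TEXT,"
        " PRIMARY KEY (group_name, article_number))"
    )
    return conn


def save_batch(conn, group_name, batch):
    """Store (article_number, subject, author) tuples in one transaction."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR IGNORE INTO headers VALUES (?, ?, ?, ?)",
            [(group_name, num, subj, auth) for num, subj, auth in batch],
        )


def load_range(conn, group_name, low, high):
    """Load only the headers numbered low..high for one group."""
    cur = conn.execute(
        "SELECT article_number, subject, author FROM headers"
        " WHERE group_name = ? AND article_number BETWEEN ? AND ?"
        " ORDER BY article_number",
        (group_name, low, high),
    )
    return cur.fetchall()
```

The point is just that what's already been saved survives a crash and 
that RAM only has to hold the range being viewed, not the whole group.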

He said it would take a few months, potentially 1-2 years for all pan 
data, so don't hold your breath.  And while I'm not building that branch 
yet (at the time I was stuck on the auto* builds and wasn't even doing 
cmake yet, I'm on cmake now but haven't tried switching to that branch and 
building with sqlite yet), as I've said, my experience is that it can take 
quite some time to stabilize database code, so even if it's "working" I 
could easily see it not really /stable/ for some time after that.  We'll 
see...

So there's an effort underway altho the branch is experimental and hasn't 
been merged to main/master yet.  If you're into building from sources you 
may wish to try it.  I'm building from sources (on gentoo, using an ebuild 
for the purpose), but haven't tried that branch yet, and I've not seen 
anything more about it on-list, so I don't know current status.  But I 
believe it's still there (I've not actually checked recently when I do my 
git pulls) to try if you want...

Of course as the announcement suggested and as I'll second, be sure to 
back up your ~/.pan dir (or whatever you have PAN_HOME pointed at if 
different) before you try it, and I'll add, based on experience with other 
projects, do expect some stability issues and potentially restoring or 
rebuilding the database at times, because that does tend to happen with 
new database code.
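
A plain cp -a or tar does the job, but for the sake of a concrete 
example, here is a Python sketch of the same backup.  It takes PAN_HOME if 
set and otherwise falls back to ~/.pan as mentioned above (check which 
directory your pan actually uses); the function names and the backup 
naming scheme are my own:

```python
import os
import shutil
import time


def pan_data_dir(environ=os.environ):
    """Return PAN_HOME if set, else ~/.pan (the default assumed here)."""
    return environ.get("PAN_HOME") or os.path.join(
        os.path.expanduser("~"), ".pan"
    )


def backup_pan_data(source, dest_parent):
    """Copy source into a new timestamped directory under dest_parent."""
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = os.path.join(dest_parent, f"pan-backup-{stamp}")
    # copytree refuses to overwrite an existing target directory.
    shutil.copytree(source, target, symlinks=True)
    return target


# Hypothetical usage; run it with pan shut down, and keep the
# destination outside the pan directory itself:
# backup_pan_data(pan_data_dir(), "/path/to/backups")
```

Do it with pan closed so the files aren't changing mid-copy.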

The announcement can be found on-list as (including gmane.io newsgroup 
info, DD's email address deleted as gmane mungs those for spam control, 
spaces added around the @ in the message-id hoping to keep it from trying 
to mung that further too):

From: Dominique Dumont
Newsgroups: gmane.comp.gnome.apps.pan.devel,gmane.comp.gnome.apps.pan.user
Subject: Experiment on Sqlite storage
Date: Sat, 22 Jun 2024 17:08:29 +0200
Message-ID: <2942917.e9J7NaK4W3 @ ylum>
Xref: news.gmane.io gmane.comp.gnome.apps.pan.devel:1714
        gmane.comp.gnome.apps.pan.user:16222

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman


_______________________________________________
Pan-users mailing list
Pan-users@nongnu.org
https://lists.nongnu.org/mailman/listinfo/pan-users
