On Sunday, 10 August 2025 11:08:12 Central European Summer Time Duncan wrote:
> I suppose most computer-experienced folks have at some point experienced
> at least one "catastrophic" database corruption failure. Hopefully we've
> mostly learned the importance of timely backups in the process, but we all
> carry those scars and often fear dealing with "databased" data -- as
> opposed to plain-text data where at worst, partial recovery is available
> via manual text-editing.
Whatever scheme is used by Pan, you should have backups in case of disk
failure, ideally using a 3-2-1 strategy [1]. I had one SSD fail completely
in April. Fortunately, I had backups.

> So it's certainly possible to have a reasonably stable binary-format
> database solution, as long as folks sufficiently familiar with database
> stability coding techniques are handling it.

Well, SQLite has put a lot of effort into reliability. Looking at "How To
Corrupt An SQLite Database File" [2], it looks like it's quite hard to
corrupt a DB. In Pan's case, I don't remember exactly what corrupted the
DB: either a stray pointer or a double Ctrl-C while the DB was stuck in a
too-long query (I had to optimize some queries during my tests).

> Which is I very strongly
> suspect why Charles ultimately didn't do it despite his recognizing it
> needed done -- he simply wasn't sure he could pull off the stability he
> himself wanted. And while I'm not as sure on the reasoning, that could
> be why Heinrich never attempted it either.

I guess that SQLite was too new 20 years ago.

> So anyway, this has been not a small, if as yet unstated, concern for me.
> Yes I know the reasons and have now seen pan's scalability struggle for
> nearing two and a half decades now, and I 100% agree it's long past time
> the move to database needs done, but... I'm still fearful.

That's why I'm asking for tests and advising people to create (and test)
several backups. I now run duplicity daily to create incremental backups.

> Here's hoping it goes well! May you truly be the coder saving the day in
> that regard with pan as you have been in just stepping up as upstream pan
> maintainer in the first place! =:^)

Fingers crossed ;-)

> > To limit the consequence of a corrupted DB, I think I need to split the
> > DB in 3 separate files:
> > - server (mostly filled manually by user)
> > - groups
> > - articles and headers
> >
> > Hopefully, only articles and headers would be corrupted in case of a
> > crash.
>
> That makes a lot of sense. Is this likely to be simple enough to be
> implemented soon? I've been considering switching to the database branch,
> and after that split might be an opportune time to do so.

Well, actually, that would impact most of the SQL statements I wrote. And
it would not fix your concern: most of your unrecoverable data is old
articles, and articles would be stored in the "articles and headers" DB,
which would be the biggest and the one modified most often. Splitting the
DB would only spare people from re-entering server information and
re-downloading groups in case of corruption, which is not a big deal to
lose anyway.

On the other hand, splitting the DB is quite a lot of work (several hours)
for a theoretical problem. SQLite is used extensively by Firefox and a lot
of other programs, and I don't recall people complaining about lost data.
That's why I'm asking for feedback. If people report DB corruption, I'll
analyse the case and then decide if splitting the DB is worth the trouble.
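For what it's worth, if the split ever happens, SQLite's ATTACH mechanism
would let a single connection see the three files at once, so cross-file
queries stay possible. The snippet below is only a minimal sketch of that
idea, not code from the sqlite branch; the file names and alias names are
invented for illustration:

    #include <sqlite3.h>
    #include <cstdio>

    int main()
    {
      sqlite3 *db = nullptr;

      // Open the "main" file (server data, mostly filled manually by the user).
      if (sqlite3_open("server.db", &db) != SQLITE_OK) {
        std::fprintf(stderr, "open failed: %s\n", sqlite3_errmsg(db));
        sqlite3_close(db);
        return 1;
      }

      // Attach the two other files. Their tables become visible to this
      // connection as groups_db.<table> and articles_db.<table>, so a
      // corrupted articles_db file could be deleted and rebuilt without
      // touching server or group data.
      char *err = nullptr;
      int rc = sqlite3_exec(db,
                            "ATTACH DATABASE 'groups.db'   AS groups_db;"
                            "ATTACH DATABASE 'articles.db' AS articles_db;",
                            nullptr, nullptr, &err);
      if (rc != SQLITE_OK) {
        std::fprintf(stderr, "attach failed: %s\n", err);
        sqlite3_free(err);
      }

      sqlite3_close(db);
      return rc == SQLITE_OK ? 0 : 1;
    }

The catch is the one mentioned above: the existing SQL statements would
still have to be checked against the new layout, which is where the
"several hours of work" would go.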
> But my current use-case, in practice, is archiving text-groups, never
> expiring headers and with a (multi-gig dedicated partition) cache several
> times bigger than my multi-decade accumulation so it won't be expiring
> anything either.

OK then, that's a use case I need to test. I've been focused on big binary
groups but I've not yet tested big text groups.

> In the case of my ISP's old groups, I'm /literally/ archiving them, as
> they're no longer a provider and while some of the groups do appear on the
> public newsgroup tree, what I have is archived from their original servers
> and likely no longer publicly available (the NSA surely has them archived
> too but that's not public).

Do you use a free news server? (They tend to have a short retention.) Paid
servers have a longer retention; I see articles dating back to 2008 on my
test group.

> My "news server" configured for them is set
> to zero connections and of course the DNS address is now invalid, so
> what's in my pan text-instance cache for them is /literally/ archived --
> if that was corrupted without backup, it couldn't be replaced, but by the
> same token, it's no longer updated, so a backup will always remain
> "current" for that server.

Indeed.

> Obviously the gmane lists/groups could be refetched if corrupted locally
> as long as it remains a public server, but I'd prefer not to need to do
> that, certainly beyond the year-ish I might worst-case go between backups
> but ideally beyond say a month (a more reliable backup frequency in any
> case).

I've been hacking on pan/sqlite for 18 months. I've had several crashes per
hacking session and I've corrupted the DB 2 or 3 times.

> How good a fit for my text-post-archive use-case do you believe pan's
> database backend, once stable, will be? I'm assuming that any database-
> corrupted messages no longer on the server and not on a local backup
> either simply won't be recoverable...

Well, SQLite DB corruption where the SQLite library cannot read the DB at
all should never happen (see [2]). Another possible failure mode is that
the DB is readable but the data in it was corrupted by a pan bug; in that
case, the data can be fixed with SQL queries. Barring that, the recovery
strategies are:
- a local DB backup (preferably incremental backups) -- see the sketch
  below
- re-starting from the Pan text files (kept in a backup) to re-create the
  DB
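To make the first recovery option concrete: besides file-level tools such
as duplicity, SQLite itself provides an online backup API that can copy a
live database into a separate file while it is still in use. The snippet
below is only a hypothetical helper to show the API, not something that
exists in pan today; the function name and paths are made up:

    #include <sqlite3.h>
    #include <cstdio>

    // Hypothetical helper: copy the open database `src` into `dest_path`
    // using SQLite's online backup API. Works while `src` is in use.
    static int backup_db(sqlite3 *src, const char *dest_path)
    {
      sqlite3 *dest = nullptr;
      int rc = sqlite3_open(dest_path, &dest);
      if (rc == SQLITE_OK) {
        sqlite3_backup *job = sqlite3_backup_init(dest, "main", src, "main");
        if (job) {
          sqlite3_backup_step(job, -1);   // -1 means copy all pages in one pass
          sqlite3_backup_finish(job);
        }
        rc = sqlite3_errcode(dest);       // SQLITE_OK if the copy succeeded
      }
      sqlite3_close(dest);
      return rc;
    }

    // Usage: backup <source.db> <destination.db>
    int main(int argc, char **argv)
    {
      if (argc != 3) {
        std::fprintf(stderr, "usage: %s <source.db> <destination.db>\n", argv[0]);
        return 1;
      }
      sqlite3 *src = nullptr;
      if (sqlite3_open(argv[1], &src) != SQLITE_OK) {
        sqlite3_close(src);
        return 1;
      }
      int rc = backup_db(src, argv[2]);
      sqlite3_close(src);
      return rc == SQLITE_OK ? 0 : 1;
    }

The sqlite3 command-line shell exposes the same mechanism through its
.backup command, which may be more convenient to run from cron between
full duplicity runs.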
> Regardless, this should be a somewhat different use-case to try the
> database code on... as long as I'm keeping good backups!

I cannot insist enough on creating *and* testing backups.

> Might there be an option to still build with the old text-based backend
> for text-message-archive-use-cases like mine? (Not that I really consider
> that practical, but maybe...)

No, the code to read/write Pan text files is thorny and quite fragile. I've
already removed the code to write the files and I hope to get rid of the
readers in a few years.

> Or perhaps more practical, call the new database version pan3 or some such
> (or start its pre-stable release versions at 1.9xx and bump to 2.0 on
> stabilization), and continue maintaining pan 0.xxx as the text-based
> version?

Uh, sorry, no. To make pan with sqlite practical, I had to rewrite quite a
lot of internal code. Maintaining these two versions would take too much
time for me.

> Or maybe I should look into a more traditional text-based news client for
> that archiving use-case? (OTOH, current pan has the convenience of being
> able to handle the occasionally posted binary, say screenshots attached to
> messages here on the pan-user list/group, without issue, while many
> traditional text-based news clients require jumping thru hoops for
> binaries.)

How about first giving pan/sqlite a chance before jumping ship?

> Any recommendations on other news clients that are still around and
> maintained if so? (Sure I can and will when necessary look myself, but
> early-stage thinking via typing, ATM.)

20 years ago, I was quite happy with Gnus under Emacs.

> Or maybe I should just run something like a leafnode local server, as
> arguably a more appropriate news archive in the first place? Then I can
> have it be the unexpiring archive and let pan's database corrupt and be
> rebuilt from a fresh pull from leafnode as necessary?

Yes, that's a good idea (and that's another way to back up your data! ;-) )

> And assuming I do decide I need to look for a pan archive-alternative, how
> long do you anticipate it'll be before (a) a database-stable release-
> version pan is available,

6 months to a year.

> and (b) the older text-based pan bitrots to the
> point it's no longer easy to build against reasonably current and distro-
> available libraries? (Obviously the latter depends somewhat on the
> distro, but compared to things like gtk2, python2, etc, thus giving people
> a good idea for their distro by comparison.)

I would guess several years. Pan only uses C and C++ libs that tend to stay
backward compatible for several years. Even an old c++13 compiler is still
usable.

> > Note that, as long as sqlite branch is not merged, I don't guarantee
> > that the DB stays compatible. Updating the branch may mean that DB needs
> > to be destroyed.
>
> Useful to have specified... and a practical reason I should probably wait
> until that database split to switch, given I've not done so yet! =:^)

I would still welcome feedback. I'm not asking you to use pan/sqlite as
your main news reader, just to give it a try with *your* use case. I'd
suggest to:
- copy your .pan2 dir to .pan2-sqlite
- run pan/sqlite with PAN_HOME set to .pan2-sqlite
- open one group (that triggers the migration process for that group); the
  migration itself can be quite long (5 minutes for 27M headers)
- play with the group and tell us what happens.

> [1] Due to being on the autism spectrum my messages, as regulars surely
> know, tend to the long side (understatement) because we/I tend to see a
> complex picture where properly addressing one subject, to us, means
> covering how it touches all sorts of other stuff most other folks consider
> tangentially related at best.

No problem. I reply to what seems relevant from my point of view. Please
get back to me if I missed one of your important points.

All the best

[1] https://www.seagate.com/fr/fr/blog/what-is-a-3-2-1-backup-strategy/
[2] https://sqlite.org/howtocorrupt.html
