David Chmelik posted on Fri, 8 Mar 2024 05:19:17 -0000 (UTC) as excerpted:

> On Tue, 30 Sep 2014 21:10:56 +0000 (UTC), Duncan wrote:
>> As to your question, years ago I was the person who asked to bump the
>> max cache size from 1 GiB -- I needed 4 GiB at the time and it was
>> bumped to 20, which was great.
>
> What size do you recommend if I currently use 1,500+ newsgroups, and
> some are binary but dead, so let's say all plain-text, but some are
> high-traffic like the Linux kernel listserv on gmane?  I rarely read
> that; it's more out of curiosity.  There's maybe under 40 I'd read daily
> if they have traffic, but many/most don't except occasionally/rarely,
> though usually there's something daily.  Most are miscellaneous
> subjects, like computer science/engineering & software I just
> occasionally have questions on, like here, but other times don't keep up
> on, and just select and mark read.

Interesting/good question.  The discussion below gets a bit technical and
arguably goes on a couple tangents.  Jump to the 4th paragraph from the
end if you're just interested in some recommendations.  Read thru if you
like technical detail and find tangents interesting! =:^)

Primarily, practical news-cache size depends on how you use pan and how
long you intend to retain messages.

Pan's cache-size default, way too small by my usage (text or binary, I
have separate instances for each), appears to be designed primarily for
either text-only with some short-term (a few sessions) caching, or
process-as-you-go (not even a single full session) with anything above
trivial numbers of binaries.

My usage, instead, is archiving for text, and for binary, multi-session
sampling and download-interesting-to-cache first session, then go through
again when everything's cached so access is instant, to sort out what I
downloaded and either delete directly if I decide I don't want to save it
permanently after all, or sort and save off to permanent storage, then
delete from pan (which I believe deletes from cache).

While the default size would (for my usage) keep text around a few
sessions so I could refer back to messages if I wanted to see a full
message when it was context-quoted in the reply, it certainly wasn't
suitable for long-term "archiving" storage of any sort.  For binaries it
was HORRIBLE, as I'd hit the cache-size limit and start deleting older
messages in the first session, before I did anything but read the
overview!  I wasn't even reading downloaded messages before they were
deleted due to cache limits!

So you said basically text, what I'd call *MANY* groups (1500), with some
high traffic and perhaps a few trivial binaries.  Great.  But how much do
you download to keep around even if you don't read it, and how long do
you actually want to KEEP it around?

Here, for text and trivial binaries (say, at the level of HTML messages
in some text groups that allow them, which the kernel group, while high
traffic, does NOT AFAIK; plus the occasional screenshot, etc), I have
only a relatively few groups, but with near all traffic to them archived
(unexpiring-cached), in some cases since 2002...

Here's what compsize (transparent compression report for btrfs) says for
my text instance dedicated partition, basically the .pan directory but
mostly cache:

$$ sudo compsize /nt/
Processed 278330 files, 180543 regular extents (180543 refs), 99005 inline.
Type       Perc     Disk Usage   Uncompressed Referenced
TOTAL       47%      999M         2.0G         2.0G
none       100%       14M          14M          14M
zstd        47%      985M         2.0G         2.0G

(Compsize says the article cache itself is 970M used, 1.9G uncompressed,
so it is indeed most of the above.  And the 14 M that's incompressible is
in the cache, so I'll presume it's pre-compressed binaries sent
yencoded, because MIME/UUE encoding is inefficient/compressible.)

So roughly 2 GiB uncompressed, compressed down to ~half size or ~1 gig
using zstd (level 3, default for btrfs if zstd compression is chosen)
compression.  Only a trivial 14 MiB is incompressible.

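As a rough cross-check of the compsize figures above, here is a minimal
Python sketch that sums the per-type rows and recomputes the overall
ratio.  The function name and the dictionary layout are illustrative
assumptions, not anything compsize or pan provides, and because compsize
rounds "2.0G" the result only lands near the reported 47%, not exactly
on it.

```python
# Figures copied from the compsize report above, converted to MiB.
# Assumption: "2.0G" is treated as 2048 MiB; compsize rounds its output,
# so the recomputed percentage is approximate.
ROWS = {
    "none": {"disk": 14, "uncompressed": 14},
    "zstd": {"disk": 985, "uncompressed": 2048},
}


def compression_summary(rows):
    """Return (disk_total, uncompressed_total, disk as percent of uncompressed)."""
    disk = sum(r["disk"] for r in rows.values())
    uncompressed = sum(r["uncompressed"] for r in rows.values())
    return disk, uncompressed, 100.0 * disk / uncompressed


disk, uncompressed, percent = compression_summary(ROWS)
print(f"{disk} MiB on disk for ~{uncompressed} MiB of data ({percent:.0f}%)")
```

The summed disk usage matches the TOTAL row (999M), and the percentage
comes out a little under half, consistent with the "~half size" reading.
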
Here's the btrfs filesystem usage report for that partition, which is
btrfs raid1, so I can use half that 10 GiB total space:

Overall:
    Device size:                  10.00GiB
    Device allocated:              3.06GiB
    Device unallocated:            6.94GiB
    Device missing:                  0.00B
    Device slack:                    0.00B
    Used:                          2.37GiB
    Free (estimated):              3.63GiB      (min: 3.63GiB)
    Free (statfs, df):             3.63GiB
    Data ratio:                       2.00
    Metadata ratio:                   2.00
    Global reserve:               16.69MiB      (used: 0.00B)
    Multiple profiles:                  no

Data,RAID1: Size:1.00GiB, Used:853.71MiB (83.37%)
   /dev/sdd8       1.00GiB
   /dev/sdc8       1.00GiB

Metadata,RAID1: Size:512.00MiB, Used:357.91MiB (69.90%)
   /dev/sdd8     512.00MiB
   /dev/sdc8     512.00MiB

System,RAID1: Size:32.00MiB, Used:16.00KiB (0.05%)
   /dev/sdd8      32.00MiB
   /dev/sdc8      32.00MiB

Unallocated:
   /dev/sdd8       3.47GiB
   /dev/sdc8       3.47GiB

Now btrfs stores small files (2048 byte and under by default, which I use
here) in-line in the metadata, and some of those text-message cache files
will certainly qualify, thus explaining the difference between the
reported data usage of ~854 MiB here while compsize said 999 MiB -- some
of that 999 is stored in the metadata, not data.  Total used including
metadata is 2.37 gig, but that's across both physical devices, so divide
by two for raid1: ~1.2 gig of data+metadata.  The 3.63 GiB reported Free
pre-accounts for the raid1, including 3.47 GiB not allocated (per device)
plus the still unused space within the data chunks.

So of the 5 GiB effective space (5 gig per device but raid1 across two
devices), ~1.2 gig is used, ~3.6 gig is free, and the other ~0.2 gig is
in the unused metadata, system chunk, etc.  But if I wasn't using btrfs
compression it'd be roughly half full.  All in all, pretty reasonable
usage for a dedicated-usage partition where you want some room to grow.

Finally, the pan cache settings for that: Again, it's set unexpiring
(server settings) so it effectively caches "forever", and in prefs, size
of article cache is set to 5120 MiB = 5 GiB.
Which pan couldn't actually hit if I weren't using btrfs compression
because the filesystem itself is exactly 5 GiB, and there's metadata
overhead plus the non-article-cache files in the pan dir.  But with
compression it should actually be able to hit that 5 GiB, and could
probably hit ~9 GiB or so, assuming the same near 2:1 compression ratio
continues.  So I have room to set that higher as my archive continues to
grow...

Now a guess at translating that for you...

Many more groups (say 100 times as many...), still mostly text, but
presumably you aren't archiving "forever", and if I've interpreted your
description correctly, you probably don't download as much of the groups
as I do.  However, at least one of those groups is LKML (the kernel
list), far higher traffic (if enforced text-only) than anything I
subscribe to and archive.

At a guess, I'd say start with a gig.  That should reasonably safely
accommodate even your 100X the number of groups, text-mostly, for a
"reasonable" period of a month or so, which I'll say is about the max
time discussion threads are likely to be active so you can refer back to
previous articles without re-downloading, again assuming you're not
downloading everything in the group.  If you want to be extra safe or see
messages you know you downloaded disappearing (and your filesystems
aren't going haywire due to crashing and filesystem immaturity... btrfs
is generally past that now but was still a bit iffy when I started with
it), double that to 2 GiB (uncompressed), which again is roughly what I'm
seeing with some groups near-archived for 20+ years now, but at ~1% of
the groups.

Even with ~1500 groups, text-mostly, downloading-to-cache near all
messages, I'd be quite surprised to see usage over 2 GiB with an
effective lifetime of under a month (even two), because that's simply
*HARD* to do with text-mostly groups ...
*UNLESS* you're grabbing some prolifically AI-spammed groups or something
(the *HARD* to do assumes *humans* actually writing all those messages --
two GiB of data is simply a LOT of text for even a few hundred /humans/
to write over a couple months, but automate it with AI and that
assumption's out the window!)

If you're considering a dedicated partition, 5 gig for it should be good,
as it is for me.

If you're actually archiving those 1500 groups... I'd say start with
10 GiB, but until you have say a year of history to make a reasonable
projection into the future, watch the usage and consider the possibility
of having to adjust that up or being able to adjust it down, with a
dedicated partition, if used, similarly larger: maybe 20 or 25 gig.  With
a year of history you should be able to project /reasonably/ comfortably
the usage out to storage replacement cycle lengths (double the year's
activity for a reasonable margin and multiply to cover your time until
expected upgrade, then increase by 50% or double again for dedicated
partition size if used -- unless of course activity is multiplying, as it
well could be on groups with uncontrolled AI spam).

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman

_______________________________________________
Pan-users mailing list
Pan-users@nongnu.org
https://lists.nongnu.org/mailman/listinfo/pan-users