what is the reason for this requirement?
if it's really some sort of mass binary download it would
make sense to use pan just for looking at previews and for
grabbing the nzb-files to feed them to some standalone
nzb-reader.
Duncan wrote:
Duncan posted on Tue, 30 Sep 2014 21:10:56 +0000 as excerpted:
bubba posted on Tue, 30 Sep 2014 14:20:51 -0400 as excerpted:
i need a large article cache size, like 200gb. i can't seem to get
more than 16384 allocated in the pan settings. i altered the
preferences.xml file:
<int name='cache-size-megs' value='200000'/>
but it has no effect on the maximum allowed 'size of article cache (in
mib)' in pan -> edit -> edit preferences. i have a terabyte free on
that hard drive.
is the max size hard-coded or am i missing something blindingly
obvious?
[Y]ears ago I was the person who asked to bump the max cache size from
1 GiB -- I needed 4 GiB at the time and it was bumped to 20, which was
great.
Later it was bumped again and I had /thought/ that the last time it was
made effectively unlimited. However, that may be incorrect, or it may
now be running up against the maximum size limit of the type of integer
used.
FWIW, I'm running 12 gig on a dedicated cache partition, here.
Is your pan 32-bit, or 64-bit? I don't claim to be a coder myself, but
chances are I can make at least some sense of the code and see how it
works, possibly coming up with a patch for you... unless it /is/ running
into a maxint condition and changing that is more complex than a simple
type change.
OK, took a look at the code. Again I'm no coder so this is going to look
a bit simplistic to them and I might be doing something stupid here like
off-by-one on the bits or something, but I can read enough code to
analyze and come up with patches occasionally, and the explanation might
be interesting to others who don't read code even as well as I do, so
let's see... This is for current live-git pan, the commit you see in my
headers, since I'm posting with pan and that's included:
I did a search on "cache-size" and came up with three hits, all in the
pan/gui subdir (line and filename):
985:prefs-ui.cc
975:pan.cc
1149:gui.cc
Relevant pair of lines in prefs-ui.cc:
w = new_spin_button ("cache-size-megs", 10, 1024*16, prefs);
l = gtk_label_new(_("Size of article cache (in MiB):"));
OK, this is pretty obviously setting up the preferences GUI, with a max
of 1024*16 (MiB), thus 16 GiB. The spinbox in the GUI is confined to
that max, *BUT*, that doesn't /necessarily/ mean pan won't honor a higher
setting if you set it yourself. In fact, there's precedent for that in
the number of connections allowed per server...
History: Pan is GNKSA compliant[1], and while parts of GNKSA are arguably
dated, a couple years ago when it came up, the overwhelming feeling on
the list appeared to be that it was worth keeping that 100%, because once
we (of course really pan's devs, but the feeling was strong enough it
gave them a clear signal where users wanted to be) let that slide in one
area, where would we eventually end up? Pan would be in danger of losing
everything that made pan /pan/.
The problem is that GNKSA specifies that a news client can allow only up
to four connections per server, while today, paid news providers often
allow 50-ish connections. While most such providers don't seriously
limit per-connection speeds and four connections is very often more than
enough to saturate a user's Internet link, some users wanted to set more.
The compromise pan has allowed for quite some time rests on the fact that
GNKSA specifies how many connections (4 per server) a compliant client
can allow a user to set, NOT that a client must limit to that number of
connections if a user edits the config file directly. Thus, for many
years now, I think since the C++ rewrite introduced as 0.90, while the
GUI spinners limit the connections per server to four, pan has actually
attempted to use whatever was set in the config file, thus letting the
user set a full 50 connections for a server if they want, as long as they
do it by directly editing the config file!
So it's quite possible that while pan only allows setting up to 16 GiB
cache size in the GUI, it'll actually use more if a user sets it...
provided the integer-type used doesn't overflow. But the above code just
sets up the UI and says nothing about the integer type used to actually
store the number. Let's see what the other hits have to say...
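To make the GUI-limit-versus-stored-value distinction concrete, here is a minimal sketch. ToyPrefs and spin_button_value are hypothetical names made up for illustration; this is not pan's actual preferences code, and whether pan writes the clamped value back to the config is exactly the open question.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for a preferences store (NOT pan's real class).
struct ToyPrefs {
  std::map<std::string, int> ints;

  // Return the stored value, or the fallback if the key is unset.
  int get_int(const std::string& key, int fallback) const {
    auto it = ints.find(key);
    return it == ints.end() ? fallback : it->second;
  }

  void set_int(const std::string& key, int value) { ints[key] = value; }
};

// What a spin button limited to [lo, hi] would display for a stored value.
// The stored value itself is left untouched.
int spin_button_value(const ToyPrefs& prefs, const std::string& key,
                      int lo, int hi) {
  assert(lo <= hi);
  return std::min(std::max(prefs.get_int(key, lo), lo), hi);
}
```

With 200000 stored under "cache-size-megs", get_int still returns 200000, while a spin button limited to 10..1024*16 would show 16384.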
In pan.cc the relevant lines are:
  if (gui)
  {
    // load the preferences...
    ...
    // instantiate the backend...
    const int cache_megs = prefs.get_int ("cache-size-megs", 10);
OK, so we have (signed) int. That may be a problem as the spec says int
is only required to hold 16 bits (tho some platforms may standardize on
larger, 32-bit or 64-bit ints, for instance). A signed int (since it
isn't uint, unsigned int) reserves one bit for the sign, thus 15 bits.
1024 is 2^10 so we have five bits to play with, but 0 is a number too and
counts as positive so the range is one less on the positive side.
32*1024-1=32767
We're using four bits for that 16*1024 above. Upping that to five bits
for 32*1024 may well be possible, but beyond that could get complicated
for some archs at least.
Since a negative cache size doesn't make a lot of sense, uint would be an
option, gaining us a bit, to 64 GiB, but that's still way under your
desired 200 GiB.
Changing that to a long might be possible but gets a bit complex for my
non-coder abilities (tho if I were determined enough I'd definitely
experiment with it for my own patches).
Of course if your platform defines an int as 32-bit or 64-bit, then
upping it to say 256*1024 shouldn't be a problem for that platform, and
you could certainly apply the patch yourself, tho I don't know enough
about the various platforms (both MS Windows and *ix) pan runs on to know
if it's safe to stay at int everywhere or if a switch to long is
required. But 32-bit should be plenty in any case, since we're dealing
in MiB already and that'd take us to about 2 PiB max for a signed int
(2^31 MiB; 4 PiB if unsigned), which I guess should be a /few/ years
away, anyway! =:^)
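The range arithmetic above can be checked with a small sketch. The helper names max_signed_megs and fits are made up for illustration; nothing here comes from pan's source.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Largest value a signed integer of the given width can hold,
// read here as a cache size in MiB.
inline std::int64_t max_signed_megs(int bits)
{
  assert(bits >= 2 && bits <= 64);
  if (bits == 64)
    return std::numeric_limits<std::int64_t>::max();
  return (std::int64_t{1} << (bits - 1)) - 1;  // 2^(bits-1) - 1
}

// True if a cache size in MiB fits in a signed integer of that width.
inline bool fits(std::int64_t megs, int bits)
{
  return megs >= 0 && megs <= max_signed_megs(bits);
}
```

For a 16-bit int the limit is 32767 MiB, so fits(200000, 16) is false, while fits(200000, 32) is true: a 200000 MiB cache only works where int is at least 32 bits wide.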
Meanwhile, I'd suggest trying say 32*1024-1 (=32767) MiB, and see if pan
actually uses that, first, regardless of what the GUI says. If my point
above is correct, pan should take that even if the GUI says differently.
Tho I'm not sure how pan actually manages cache -- if it lets it get a
bit above that and then deletes back down to it, that might blow up,
while if it gets to that and then deletes say a GiB to make room, it
should be fine.
If that works, then try above that, say your 200 GiB. If your platform
uses 32-bit or 64-bit ints, that'll be fine. If not, it won't.
Meanwhile, our last hit, in gui.cc:
void GUI :: prefs_dialog_destroyed (GtkWidget *)
{
  const Quark& group (_header_pane->get_group());
  if (!group.empty() && _prefs._rules_changed)
  {
    _prefs._rules_changed = !_prefs._rules_changed;
    _header_pane->rules(_prefs._rules_enabled);
  }
  _cache.set_max_megs(_prefs.get_int("cache-size-megs",10));
}
This appears to be where pan actually loads the setting when the prefs
dialog is closed. Note that _cache.set_max_megs is OUTSIDE the if-
conditional so applies WHENEVER the prefs dialog is closed. That means
whether you've actually changed anything or not.
Which unless I'm mistaken means that if you set a larger cache manually,
you'll have to be very careful NOT TO OPEN the prefs dialog, since the
moment you close it, pan will reset to the max 16*1024, 16 GiB cache
size, thus triggering a delete of anything above that in the cache! That
could make for quite some frustration -- I know as I remember similar
problems way back when I originally requested that bump from 1 GiB max,
back in the day.
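As an illustration of why a lowered limit is destructive, here is a toy cache that evicts oldest-first as soon as its limit drops. ToyCache is a hypothetical class, and the oldest-first policy is an assumption; I have not checked how pan's real cache code decides what to delete.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>

// Toy article cache for illustration only (NOT pan's real cache code).
class ToyCache {
public:
  explicit ToyCache(std::int64_t max_megs) : max_megs_(max_megs) {}

  // Store an article of the given size, then trim back under the limit.
  void add(std::int64_t article_megs) {
    sizes_.push_back(article_megs);
    used_megs_ += article_megs;
    trim();
  }

  // Changing the limit takes effect at once: a lower limit evicts
  // articles immediately.
  void set_max_megs(std::int64_t max_megs) {
    assert(max_megs >= 0);
    max_megs_ = max_megs;
    trim();
  }

  std::int64_t used_megs() const { return used_megs_; }

private:
  // Drop the oldest articles until usage is within the limit.
  void trim() {
    while (used_megs_ > max_megs_ && !sizes_.empty()) {
      used_megs_ -= sizes_.front();
      sizes_.pop_front();
    }
  }

  std::deque<std::int64_t> sizes_;  // article sizes, oldest first
  std::int64_t max_megs_;
  std::int64_t used_megs_ = 0;
};
```

Fill such a cache with twenty 1000 MiB articles under a 200000 MiB limit, then call set_max_megs(16*1024): four articles are dropped on the spot, leaving 16000 MiB.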
That concludes my initial code analysis. I don't know whether you even
build from source and guess that if you do, either you can patch as well
or you can at least figure out what to try and manually change from the
discussion above, so I won't post patches ATM, anyway.
Meanwhile, filling in the alternatives I mentioned earlier...
The alternative would be modifying your download style a bit. Pan's
assumptions about how people do downloading are obviously different than
yours and mine, and a huge cache isn't needed for its way. But some of
us obviously use pan differently, and while it generally works, it's
not quite the easy fit it would be if those assumptions were different.
More about that later when I have more time and/or have taken a look.
What I was referring to here is simply the fact that pan seems to assume
that binary downloaders direct-download, that is, find what they want and
tell pan to save attachments directly. Pan really doesn't use a lot of
cache in this mode because it's deleting posts as fast as it's
downloading them and saving the attachments, so its 10 MiB default cache
is generally enough.
OTOH, there's a very different type of binary group usage that I do, and
with your request, that I guess you do as well. I tend to want to set
pan up to download anything that looks interesting to cache and then go
away for awhile, say to work or to sleep. When I get back or wake up
several hours later, all those posts are cached locally and I can browse
thru them as I like, basically instantly, sorting as I go, saving off
attachments that I decide I really want to keep and then normally
deleting the messages, deleting others without permanent saving if after
looking at them I decide they're not worth keeping. Of course this
requires a *MUCH* larger cache, generally large enough to contain the
entire download session, since in the first stage it's all only
downloaded to cache, where it must remain until I've gone thru it.
But, as attachment sizes and volume increase, there comes a point at
which the pre-cache method doesn't work so well. For still images and
even mp3s and low-ish resolution mpegs of a few minutes max, it still
works reasonably well, and for that a cache size of say 16 Gig should be
fine since that's more or less what you can sort thru after a single
session anyway.
But if you're looking at a 200 GiB cache, I'm guessing you're doing
rather larger attachments, ISO-images and/or half-hour minimum possibly
HD-resolution TV programs and feature-length movies. Several hundred MiB
files minimum, 4.7 GiB DVD images, possibly full 20-ish GiB Bluray
images, and/or perhaps whole TV series at a time.
For this, pre-caching really doesn't work so well anyway, in part because
pan doesn't direct-preview them as it does still-images. As such, the
direct-save method becomes the only practical method, both because it
doesn't require the huge cache, and because you have to save the files
off to view them anyway.
So IMHO anyway, you might wish to reconsider your download method. At
your volume the direct-save method is likely to be most practical in any
case, and shouldn't require that huge cache.
One other possibility. It's possible (at least on Linux, don't know
about MS Windows or Apple OSX) to run multiple different pan instances
at once. If you are active in enough groups and they split by subject
well enough, you could do that. There's a variable that can be set to
point pan at a directory other than its default, and you can set this
differently to have multiple different pan setups. I do that here. If
multiple pan setups each with a 16 gig cache would work...
The environment variable in question is PAN_HOME. Here I set it in a
wrapper script to one of my three pan profiles, with one script for each
profile: bin, text, test. Of course if you want and the groups are
separate enough, you can make that say mp3s, dvds, tvprogs. Or whatever.
Specifically, for my text groups I set PAN_HOME to ~/pan/text, so it uses
the settings there instead of in the default ~/.pan2. That lets me have
different settings (including different cache sizes) for each of my
profiles.
Then where appropriate, I use symlinks in each profile, pointing to files
in ~/pan/globals for some files (like my shared scorefile), and for my
binary profile, pointing my cache to a separate, dedicated partition,
that's only used for cache for my pan binary profile.
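A wrapper along those lines can be sketched as a tiny launcher. A two-line shell script does the same job; this is just the same idea in C++. It assumes a POSIX system, a "pan" binary on the PATH, and profiles kept under ~/pan/<profile>; profile_dir and launch_pan are made-up names.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>
#include <unistd.h>

// Build the per-profile directory, e.g. /home/me/pan/text.
// The ~/pan/<profile> layout is an assumption taken from the text above.
std::string profile_dir(const std::string& home, const std::string& profile)
{
  assert(!profile.empty());
  return home + "/pan/" + profile;
}

// Point PAN_HOME at the profile and replace this process with pan.
// Returns (non-zero) only if something went wrong.
int launch_pan(const std::string& profile)
{
  const char* home = std::getenv("HOME");
  if (!home)
    return 1;
  const std::string dir = profile_dir(home, profile);
  if (setenv("PAN_HOME", dir.c_str(), 1) != 0)  // 1 = overwrite existing
    return 1;
  char* const argv[] = { const_cast<char*>("pan"), nullptr };
  execvp("pan", argv);  // only returns on failure
  return 1;
}
```

Calling launch_pan("text") from a small main() would start pan with the settings, and thus the cache size, stored in that profile.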
Meanwhile, yet another possibility I didn't think of earlier...
You can set up a local news server such as leafnode. You'd then configure
it to do the mass downloading from your NSP into its cache, and could
then simply point pan at your local leafnode or whatever news server.
Then you could probably leave pan's cache at the default 10 MiB size and/
or even place it in tmpfs (a RAM-based filesystem), since your server
with its own cache of whatever size would be local anyway.
So here's hoping you find at least some of that helpful... =:^)
---
[1] GNKSA: Good Net-Keeping Seal of Approval. While even its keepers
acknowledge it's a bit outdated today, back in the day it served as a
widely accepted guideline for news-client acceptable net behavior. See
the pan website for details about pan's compliance and a link, and the
list archives for previous discussions about pan's compliance here.
--
Kind regards
Andreas Nastke
_______________________________________________
Pan-users mailing list
Pan-users@nongnu.org
https://lists.nongnu.org/mailman/listinfo/pan-users