Ron Johnson posted on Sun, 09 Oct 2011 22:15:24 -0500 as excerpted:

> On 10/09/2011 09:58 PM, Lacrocivious Acrophosist wrote:
>> Ron Johnson<ron.l.johnson@...> writes:
>>
>>> [64-bit pan]
>>>
>>> One of the first things that I did was try out Pan on a binary group.
>>>
>>> Many hours later, it had fetched 6 weeks of headers and consumed 6.8GB
>>> of RAM. The 2+ years of data in Giganews would require 123GB of RAM.
>>>
>>> :(
>>>
>> [I]s this 64-bit performance different from 32-bit performance
>
> It's a fact that 32-bit Pan runs out of *process* address space at
> around 2GB. 64-bit Pan doesn't technically have that problem, but
> effectively it does, although it does for all practical intents.

Well, the 32-bit part isn't quite accurate, or at least it's accurate for only a subset of 32-bit.

The 32-bit memory limit has to do with the amount of memory addressable by a 32-bit number. 2^32 = 4 Gi, so if the addressable unit is a single 8-bit byte, as is the case on x86 (both 32-bit and 64-bit), that's a directly addressable memory area of 4 GiB.

However, from the viewpoint of an application, traditionally that address space is divided in half between user-space and kernel-space addresses, 2 GiB for each. That's where the above 2 GiB limit comes from. (Do note, however, that this viewpoint is per-app -- each app gets its own 2-gig userspace address-space view, it's not shared.)

But, at least on Linux, as that 4 GiB total space began to look smaller and smaller, a number of alternatives were developed. Generally, these options are applied to the kernel configuration pre-build, so it depends on what kernel you choose to run. As is often the case, that means those who configure and compile their own sources have far more choices exposed to them than those who choose to run pre-built binaries.[1]

In addition to the 2:2 split, there are also 3:1 and 1:3 split choices available, so it should be possible to run a 3-gig pan on a 32-bit kernel set for 3-gig of userspace and 1-gig of kernel-space.

Additionally, there's a 4:4 mode option, where userspace and kernelspace each have their own separately addressed 4-gig space. This allows apps to utilize a full 4 GiB userspace, but at a measurable performance cost, since the hardware must now switch between address modes and fully flush the TLB (translation lookaside buffer) each time it switches between usermode and kernelmode, even if it's the same app. A full TLB flush costs a significant number of CPU cycles, so this means a significant drop in efficiency for apps that make a lot of calls to kernel services.
As such, the 4:4 mode is not recommended for desktop use, but it *IS* available, for those who need to use the full 4 GiB address space in either usermode or kernelmode, for a particular workload.

So if one chooses the right kernel, it should be possible to run even 32-bit pan up to close to 4 GiB userspace memory. However, given the kernel calls pan makes for access to both the network and block-storage devices, this could be somewhat slow.

Meanwhile, it should be noted that back in the 16-bit era, a segmented memory space addressing scheme was generally used, as opposed to the "flat address space" scheme generally used for 32-bit and 64-bit processing. Were it to be necessary, the same segmented memory addressing scheme could certainly be used again. However, a flat address scheme is *VASTLY* simpler, and as it happens, 64-bit machines that eliminated the 4 GiB addressing barrier were introduced at just the right time to avoid forcing people back into segmented address schemes, as would certainly have become the case had 64-bit consumer-level systems not been introduced.

That brings us to 64-bit. As mentioned, the additional address space available in 64-bit eliminates the problem for the foreseeable future, allowing flat addressing to be used up into the PiB, IIRC. As it happens, the additional number-space addresses a number of other problems as well, including the fact that it's quickly becoming feasible to brute-force a 32-bit random number space in many cases, so there are significant security advantages to switching to 64-bit as well.

All that said, the base fact remains true; per-app usable memory on a 32-bit system is QUITE restricted, for usage measured in the GiB. 64-bit removes that barrier.

But, while 8-16 GiB physical memory machines aren't entirely unusual in the consumer space any more, much beyond that certainly remains so. Thus, a 128 GiB magnitude memory requirement remains almost as much out of practical range as it always has.
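
For anyone who wants to check the arithmetic, here's a rough Python sketch. It assumes memory use grows linearly with the weeks of headers fetched (my assumption, not something measured), and it treats each split's userspace share as a hard ceiling, which is optimistic since program code, libraries and stacks eat into it too. The function names are made up for illustration.

```python
GIB = 2**30

# Userspace share (in GiB) of a 32-bit address space under each split
# discussed above; "4:4" gives userspace its own full 4 GiB.
USER_GIB_BY_SPLIT = {"1:3": 1, "2:2": 2, "3:1": 3, "4:4": 4}


def projected_gb(observed_gb, observed_weeks, target_weeks):
    """Linear projection: assumes memory scales with weeks of headers."""
    return observed_gb / observed_weeks * target_weeks


def splits_that_fit(required_gib):
    """Return the splits whose userspace ceiling is at least required_gib."""
    return [s for s, gib in USER_GIB_BY_SPLIT.items() if gib >= required_gib]


# A 32-bit number addresses 4 Gi units; with byte addressing, 4 GiB.
assert 2**32 == 4 * GIB

# Ron's figures: 6 weeks of headers consumed 6.8 GB.
per_week = projected_gb(6.8, 6, 1)
two_years = projected_gb(6.8, 6, 104)
print(f"{per_week:.2f} GB/week, {two_years:.0f} GB for 104 weeks")

# Which 32-bit splits could hold a 3 GiB pan, at least in principle?
print(splits_that_fit(3))
```

At roughly 1.13 GB per week, two years comes to about 118 GB, so Ron's 123 GB for "2+ years" is in the same ballpark, and no 32-bit split gets anywhere near it.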
It at least should be possible to buy systems that have that sort of memory slot capacity, these days, but it'll be a few years yet before consumer-level machines ship with that sort of populated memory, for sure.

But hey, at least it's possible to reasonably project the sort of memory it'd require, on a reasonable consumer-level system, now. That's definitely progress from a few years ago, both because pan uses far less memory than it did before the rewrite (back in the C pan days, version 0.14 era, pan would choke on ~2 million headers, regardless of the memory one had in the machine; it simply didn't scale at all), and because of 64-bit and 4-8 GiB memory being reasonably common, these days.

>> As for the multi-bazillion-header binary groups... is there *any* 'old
>> style' newsreader capable of downloading all their headers? By 'old
>> style' I mean newsreaders intended to include conversation. Giganews,
>> for one, would seem to me to make this nearly impossible due to their
>> vast retention span.
>
> Any "straight to file" news reader could do it, given the time to d/l
> all the headers.
>
> Pan's fatal binary group flaw is that it stores all the headers in
> memory before writing them out to disk.

Not only that, but it reads them all back into memory every time it starts! That's why pan takes so long to load from cold-disk-cache if you up the cache size and set no-expiration, even on text groups, as I do here.

The biggest problem is pan's assumption that it has all the information necessary to maintain its threading structure in memory at all times.
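
To make the "straight to file" idea concrete, here's a minimal Python sketch of the opposite approach: each header goes to disk as it arrives, and only a message-ID to byte-offset map stays in memory, so a single header can be read back with one seek rather than by loading the whole file. This is emphatically NOT how pan works. The file layout, function names and fake headers are all invented for illustration, and a real client would also need to persist the index itself and keep whatever reference data threading requires.

```python
import os
import tempfile


def append_headers(path, headers):
    """Append (message_id, header_line) pairs to `path` one at a time.

    Returns {message_id: byte_offset}. Only the offsets are kept in
    memory; the header text itself goes straight to disk.
    """
    index = {}
    with open(path, "ab") as f:
        for message_id, line in headers:
            index[message_id] = f.tell()
            f.write(line.encode("utf-8") + b"\n")
    return index


def read_header(path, index, message_id):
    """Seek to one header and read only that line."""
    with open(path, "rb") as f:
        f.seek(index[message_id])
        return f.readline().rstrip(b"\n").decode("utf-8")


def fake_headers(n):
    """Hypothetical stand-in for headers streamed from a news server."""
    for i in range(n):
        yield f"<{i}@example.invalid>", f"Subject: part {i}"


path = os.path.join(tempfile.mkdtemp(), "headers.dat")
index = append_headers(path, fake_headers(1000))
print(read_header(path, index, "<42@example.invalid>"))
```

The in-memory dict here is still one entry per header, so it only shrinks the problem rather than solving it; moving that map to disk as well is the harder part.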
For pan to really become a disk-based client, storing most of that info on disk and reading in only a much smaller working set at once, it needs some sort of header indexing, or at least hashing, system. That system would let it figure out what it needs to read in from a vastly larger on-disk store in order to work with and properly thread, at a minimum, the currently displayed article headers, likely plus some pages from the article list before and after the currently displayed set.

But curing that "it's all in memory and immediately accessible for threading purposes" logic is likely to be quite a big job, unfortunately. I wonder if it's that challenge that had Charles finally giving up, after leading pan's development all those years, thru at least one rewrite?

---
[1] This is of course one of the big advantages Gentoo touts in general: for the most part, it deliberately exposes these choices to the end user, yet still within a structured and highly automated framework. So while the builds do take time, for the most part they can be done in the background, and the time spent actually administering the system, at least once the basic choices are made, can compare reasonably favorably to that for other distros.

However, Linux is modular enough, and the kernel separate enough from the rest of the system, that it's not /that/ difficult to learn how to configure and build just the kernel on a regular distro, if one so desires, as one might if they aren't happy with the available pre-compiled kernels on their distro of choice. In fact, I was configuring and building my own kernel, on the then Mandrake (now Mandriva/Mageia) binary distro I was running as a Linux newbie, even before I had finished my switch to Linux and before I had chosen my desktop environment and my defaults for apps such as the browser, mail and news clients.

--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master -- and if you use the program, he is your master." Richard Stallman _______________________________________________ Pan-users mailing list Pan-users@nongnu.org https://lists.nongnu.org/mailman/listinfo/pan-users