freeslkr <freeslkr.w...@mailnull.com> posted gn87u5$4g...@ger.gmane.org, excerpted below, on Sun, 15 Feb 2009 05:07:18 +0000:
> Maybe it's this _very_annoying_ bug?
> http://bugzilla.gnome.org/show_bug.cgi?id=533686

Hmm. I've seen a couple of mentions of that behavior, but have never actually observed it, either on my two current servers (gmane.org, and the Cox servers outsourced to highwinds-media) or previously, when I had a subscription to a paid server.

I'd speculate that it may be caused either by an optimization bug or by bugs (optimization or code) in particular versions of the distribution-shipped libraries that pan depends on.

Given that the entire subthread doesn't show up, it would seem to be an issue with the way pan threads the messages, and it MAY be related to a different bug that also occurred on Ubuntu with certain library versions. In that case, pan would stall for some period when run on GNOME, but NOT when run on XFCE or KDE, while threading new messages after downloading new headers in a group of some size. IIRC it was a single-thread, high-CPU-utilization stall, as if pan were in a loop that repeatedly hit an assertion and bailed, then hit it again processing the next header.

As I said, the folks reporting it were all on Ubuntu, at the time 8.04 (I have no idea whether it was fixed for 8.10), and someone discovered that it ONLY happened when running GNOME, NOT when running KDE or XFCE on the same installation. That DEFINITELY points to some sort of shared library issue, but it was never pinned down.

Anyway, given that the OP here is on Ubuntu 8.04 and the bug poster said Debian, in 8.05, AND that they are both threading bugs, I find the coincidence at minimum "interesting".

So, Rick, are you running GNOME or XFCE or KDE (or something else), and does the behavior change if you switch desktop environments? Also, to everybody running Ubuntu 8.10 or the 9.04 pre-releases (alphas/betas/whatever-they-are-at-this-point), does the problem still occur?
If not, the domain of possible culprit libraries is definitely limited, and it should be relatively easy to pin the problem down to a specific library and version (or set of versions). Once that has been done, it may be possible to upgrade just that single package. If that isn't possible due to ABI incompatibility, someone could compile the upgrade and perhaps make it available to anyone needing it who trusts them not to malware it, at least.

The speculation would be that there's a namespace collision: two different libraries (perhaps different versions of the same one) provide incompatible functions with the same name. When GNOME loads, it loads the one. Then pan loads and uses those incompatible functions (incompatible because pan was built against the other library's headers) instead of loading the compatible functions of the same name from the other library. But it doesn't abort, because all the functions it needs are there.

Then, when it comes to actually calling the bad functions, it would normally segfault, but being a well-behaved C++ app, pan has assertions set to catch such unexpected problems (instead of causing a security issue or a messy segfault, as would be likely without them). They trigger, throwing pan into an error-recovery routine, which copes by dropping its attempt to thread that message. Pan then goes on to the next message. In a group with enough headers, this might trigger often enough to lock pan up at 100% CPU for the 20 minutes or so that people were reporting for the other bug, but if there are just a few, the extra processing time wouldn't be noticed; the threads just wouldn't show. If a redisplay is triggered, a different code path is used and the problem functions are never called, so the messages suddenly show up.

It's worth noting that for efficiency reasons pan only threads headers once, when they first come in.
That they appear on redisplay therefore indicates that it's not the threading itself that gets botched, but rather the display of that threading as it occurs. Again, the redisplay apparently uses a different code path which doesn't call the problem functions, so it works fine.

If a desktop environment other than GNOME is used, the incompatible version of the library won't normally be in memory (possibly with some exceptions, if other apps are using it), and pan (or more accurately the glibc loader, ld.so.*) will find and load the compatible version of the library. Since it'd be unlikely (impossible?? maybe possible with fast user switching??) to have pan stay loaded while GNOME is then loaded, the effect of the pan-compatible version of the library on GNOME's behavior, should the pan version be loaded first, remains unknown.

Assuming that it IS a variation on the same bug (and I'll be pretty close to convinced if switching desktop environments turns out to change the behavior of this bug too), it's likely that diffing the output of "ldd $(which pan)" run from a terminal window in GNOME against the output of the same command run from a terminal window in XFCE or KDE will point to the culprit library. If it doesn't, it's because there's a piece of the puzzle I'm not aware of yet, either in the way library loading and ldd work (I'm most definitely NOT an oracle on the subject), or some aspect of the bug that's more complicated than I'm speculating and likely beyond my ability to understand at this time. But it /should/ work, given what I know and understand at this moment.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman

_______________________________________________
Pan-users mailing list
Pan-users@nongnu.org
http://lists.nongnu.org/mailman/listinfo/pan-users
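For anyone who wants to try the ldd comparison suggested above, here is a minimal sketch in Python. It assumes each listing was saved to a file from the respective desktop's terminal (e.g. `ldd $(which pan) > ldd-gnome.txt`), and it assumes the usual GNU ldd line format, `name => path (0xADDRESS)`. The script name ldd_compare.py and the file names are hypothetical. It compares the two listings by library name and resolved path, dropping the trailing load addresses, which may differ from run to run and would otherwise make a plain diff flag nearly every line.

```python
"""Compare two saved `ldd` listings of pan, ignoring load addresses.

Hypothetical usage (script and file names are assumptions):
    ldd $(which pan) > ldd-gnome.txt   # from a GNOME terminal
    ldd $(which pan) > ldd-xfce.txt    # from an XFCE or KDE terminal
    python3 ldd_compare.py ldd-gnome.txt ldd-xfce.txt
"""
import re
import sys

# Trailing load address such as "(0x00007f3a2c000000)". It is stripped
# because it may change between runs and would swamp a plain diff.
_ADDR = re.compile(r"\s*\(0x[0-9a-fA-F]+\)\s*$")


def parse_ldd(text):
    """Map each library name to the path ldd resolved it to."""
    libs = {}
    for raw in text.splitlines():
        line = _ADDR.sub("", raw).strip()
        if not line:
            continue
        if "=>" in line:
            # "libfoo.so.1 => /usr/lib/libfoo.so.1" or "libfoo.so.1 => not found"
            name, path = (part.strip() for part in line.split("=>", 1))
        else:
            # Lines like "/lib/ld-linux.so.2 (0x...)" name the loader directly.
            name = path = line
        libs[name] = path
    return libs


def compare(a, b):
    """Return (only_in_a, only_in_b, resolved_differently), each sorted."""
    only_a = sorted(set(a) - set(b))
    only_b = sorted(set(b) - set(a))
    changed = sorted(
        (name, a[name], b[name]) for name in set(a) & set(b) if a[name] != b[name]
    )
    return only_a, only_b, changed


def main(args):
    if len(args) != 2:
        print("usage: ldd_compare.py FIRST_LISTING SECOND_LISTING", file=sys.stderr)
        return 2
    with open(args[0]) as f1, open(args[1]) as f2:
        first, second = parse_ldd(f1.read()), parse_ldd(f2.read())
    only_a, only_b, changed = compare(first, second)
    for name in only_a:
        print(f"only in {args[0]}: {name} => {first[name]}")
    for name in only_b:
        print(f"only in {args[1]}: {name} => {second[name]}")
    for name, path_a, path_b in changed:
        print(f"{name}: {path_a}  vs  {path_b}")
    return 0


if __name__ == "__main__":
    main(sys.argv[1:])
```

Keying on the library name rather than diffing raw lines means a library that resolves to a different file in the two sessions shows up as one entry listing both paths, which is exactly the kind of difference the culprit-library theory predicts.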