walt <[EMAIL PROTECTED]> posted [EMAIL PROTECTED], excerpted below, on Tue, 05 Aug 2008 02:43:59 +0000:
> Ah! What you want is a highly-threaded newsreader. Pan2 isn't there yet, and don't hold your breath. I believe I read a post from Duncan saying that legacy pan is multi-threaded, but I'm not certain. Duncan?

Yes, legacy-pan was fully multi-threaded. New-pan's main loop and tasks are single-threaded, because Charles decided multithreading was too hard to debug. However, it splits off threads for selected tasks. For example, setting up connections should be multithreaded IIRC -- assuming it has tasks to use them, it spins off a thread for each connection setup, then feeds the negotiated connection back to the main processing thread. Similarly, the combiner/decoder splits off threads so it doesn't interfere with downloading. There may be other areas as well, but otherwise, (new-)pan is single-threaded.

And if you note... the threaded jobs are all basically timing-independent, so there should be few if any threading-related race conditions or similar problems. THAT's the big difference from legacy-pan, and the reason Charles set up new-pan's threading (or lack thereof) the way he did.

> Meanwhile, there is a bug *somewhere* that needs fixing. But where?

GL's suggestion, I think, was that it was just taking the time to sort all those headers, and there wasn't a lot that could be done about it. I don't believe that's the case, because it's not behaving that way for everyone, and where it is, it's fairly recent -- it was working fine some months ago. That implies either a deoptimization somewhere, or a bug where pan gets stuck in a loop processing the same header for a while, but eventually either gives up and moves on or gets it right.

I'm guessing it's a deoptimization in something. But... in what? As I pointed out, the gtk_tree_model_iter_next proc has shown up near the top of all three traces so far.
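To make the threading split described earlier in this reply a bit more concrete, here's a rough Python sketch of the pattern: a single-threaded main loop that owns all the shared state, plus helper threads for timing-independent jobs (pretend connection setup) that hand their results back through a thread-safe queue. The names setup_connection and main_loop are made up for illustration; this is NOT pan's actual C++ code, just the general shape of the idea.

```python
import queue
import threading

# Hypothetical sketch of "single-threaded main loop + helper threads".
# Helpers never touch shared state directly; they only put results on a queue.

def setup_connection(server, results):
    """Runs in a helper thread; stands in for a slow network handshake."""
    conn = f"connected:{server}"  # placeholder for a real negotiated connection
    results.put((server, conn))

def main_loop(servers):
    results = queue.Queue()  # thread-safe hand-off point back to the main thread
    workers = [threading.Thread(target=setup_connection, args=(s, results))
               for s in servers]
    for w in workers:
        w.start()
    ready = {}
    # Only the main thread reads the queue and updates `ready`,
    # so no locking is needed around it.
    for _ in servers:
        server, conn = results.get()  # blocks until some helper finishes
        ready[server] = conn
    for w in workers:
        w.join()
    return ready
```

Because each helper's job is independent and the only shared object is the queue, there's little room for the kind of race conditions that made legacy-pan hard to debug.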
Now, that /could/ simply be processing different posts, but I find it curious that it was in all three, as if pan is spending an inordinate amount of time there. That's my first guess for a deoptimization. But those traces are indeed kind of like shooting blind. It /looks/ like it may be that proc, but maybe not, too.

The strace run will show system (kernel) calls, file opens and the like. If pan is looping on the same data, that might show repeated calls with the same arguments. Depending on what they are, that might again be normal. Or not. But it should provide a more complete picture than we have now.

After that, assuming it doesn't make the problem plain, I'd suggest the debug backtraces again, only take several in quick succession (30 seconds of runtime apart). If they show patterns similar to the ones we have already, then find the bits similar to this:

#16 0x0000000000491bb7 in PanTreeStore::insert_sorted (this=0x2ccbad0, [EMAIL PROTECTED]) at pan-tree.cc:828

and compare the values for "this" and "new_parent". If they are changing, then pan is simply processing a bunch of data; maybe slowly, but it's working through it. If it's still processing the same post ("this"), trying to attach it under the same parent, 30 seconds later, THEN WE FOUND A PROBLEM. We won't yet know for sure whether it's looping on the same one (unless the rest of the trace comparison verifies the looping) or whether we have a serious deoptimization, but any decent 64-bit machine should be fast enough that it shouldn't be working on threading the same post for 30 seconds. If it is, and it's still working on it 30 seconds after that, that's a full minute on the same post, which points to an even worse bottleneck.
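For anyone who'd rather not do the suspend/bt/resume dance by hand, here's a minimal Python sketch, untested against pan itself. It assumes gdb is installed and that you can attach to the running pan (same user or root, and pan built with debug info so the argument values show). It uses gdb's batch mode (`gdb -p PID -batch -ex "thread apply all bt"`), which attaches, prints the backtraces, and detaches, so the process is stopped only while gdb is attached. The helper names (take_backtrace, pointer_args, unchanged_frames) are hypothetical, and the frame parsing is deliberately simple; C++ names with spaces in them (templates) won't parse.

```python
import re
import subprocess
import time

# Frame lines like:
# #16 0x0000000000491bb7 in PanTreeStore::insert_sorted (this=0x2ccbad0, ...) at pan-tree.cc:828
FRAME_RE = re.compile(r"^#\d+\s+(?:0x[0-9a-f]+\s+in\s+)?(\S+)\s*\((.*)$")
# Pointer-valued args such as this=0x2ccbad0 or a reference shown as new_parent=@0x...
PTR_ARG_RE = re.compile(r"\b(\w+)=@?(0x[0-9a-fA-F]+)")

def take_backtrace(pid):
    """Attach, dump all thread backtraces, detach (one suspend/bt/resume step)."""
    out = subprocess.run(
        ["gdb", "-p", str(pid), "-batch", "-ex", "thread apply all bt"],
        capture_output=True, text=True)
    return out.stdout

def collect_snapshots(pid, count=3, interval=30):
    """Take `count` backtrace snapshots, `interval` seconds apart."""
    snaps = []
    for i in range(count):
        snaps.append(take_backtrace(pid))
        if i < count - 1:
            time.sleep(interval)
    return snaps

def pointer_args(bt_text):
    """Map function name -> {arg name: pointer} (first frame seen per function)."""
    found = {}
    for line in bt_text.splitlines():
        m = FRAME_RE.match(line.strip())
        if not m:
            continue
        func, rest = m.groups()
        args = dict(PTR_ARG_RE.findall(rest))
        if args:
            found.setdefault(func, args)
    return found

def unchanged_frames(earlier, later):
    """Functions whose pointer args (this, new_parent, ...) didn't change."""
    a, b = pointer_args(earlier), pointer_args(later)
    return sorted(f for f in a if b.get(f) == a[f])
```

Usage would be something like `snaps = collect_snapshots(pid)` (with the pid from `pidof pan`) followed by `unchanged_frames(snaps[0], snaps[-1])`. If PanTreeStore::insert_sorted keeps showing up with the same this/new_parent values, that's the "same post 30 seconds later" case above.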
Here are a couple more with values that could be tracked:

#20 0x00000000004f05a9 in pan::DataImpl::MyTree::apply_filter (this=0x292e330, [EMAIL PROTECTED]) at my-tree.cc:235
#21 0x00000000004f0d9a in pan::DataImpl::MyTree::add_articles (this=0x292e330, [EMAIL PROTECTED]) at my-tree.cc:351

If they show different values 30 seconds apart, try 10 seconds apart. Really, it shouldn't be taking 10 seconds either, unless you're deep into swap and thrashing the disk. Disk is of course much slower than memory, so if it's thrashing, that would account for the slowdown. In that case we're looking in the wrong place ATM, but the reports haven't sounded to me like the disk is being hit that hard during all this, so I've assumed that's not it. Under 10 seconds, it likely depends on your system: CPU speed, memory, etc.

Of course, the above assumes you know how to have gdb suspend the run (an interrupt, Ctrl-C, does that) to do the bt, then resume it ("continue"), then suspend it again 30 seconds later, keeping both backtraces (plus a few more from the suspend/bt/resume/wait-30 cycle) to compare later. I'm not a gdb guru, so I'd be feeling my way on that too.

As for 64-bit gcc not being ready for prime time yet, I would have agreed during the gcc 3.x series, but in the 4.x series, from 4.1.x on anyway, I think it's reasonably mature. It hasn't had the decades that x86_32 has, but I'd say it's pretty close, all things considered. I know it has been very stable and reasonable here.

Now, one thing I /don't/ know about is how good the generic (not chip-specific) x86_64 optimization is, or for that matter the Intel em64t optimization, because I run and optimize for amd64/k8 here (plus SSE3, since my CPUs have it; early AMD 64-bit chips didn't, so it has to be added separately if you have it).
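On the "added separately if you have it" point: on Linux, /proc/cpuinfo lists the CPU's feature flags, and the kernel reports SSE3 under the name "pni" (Prescott New Instructions). Here's a small Python sketch that reads that text and lists which extra gcc -m options the CPU could use on top of something like -march=k8 (which by itself only assumes SSE2). The mapping table and function names are my own illustration, not anything from gcc or the kernel; it's just a convenience check.

```python
# Hypothetical helper: map /proc/cpuinfo flag names to gcc -m options.
# The kernel reports SSE3 as "pni"; the others use their own names.
CPUINFO_TO_GCC = {
    "pni": "-msse3",
    "ssse3": "-mssse3",
    "sse4a": "-msse4a",      # AMD-only extension
    "sse4_1": "-msse4.1",
    "sse4_2": "-msse4.2",
}

def cpu_flags(cpuinfo_text):
    """Return the set of flags from the first 'flags' line of cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()

def extra_sse_options(cpuinfo_text):
    """gcc options for the SSE extensions this CPU reports, in table order."""
    flags = cpu_flags(cpuinfo_text)
    return [opt for flag, opt in CPUINFO_TO_GCC.items() if flag in flags]
```

Run it as `extra_sse_options(open("/proc/cpuinfo").read())`; on a k8 that has SSE3, you'd expect it to suggest -msse3.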
However, x86_64 code uses the same instructions in general, just ordered differently when specifically targeting Intel vs. AMD vs. generic, so (with the exception of SSE3/4, etc.) the instructions should be the same on x86_64 regardless. That said, I /am/ optimizing mine for amd-k8, not the generic x86_64 that binary distributions probably compile for, and it /is/ possible the generic target isn't quite as solid as the specifically targeted code I've been using and can thus vouch for. Still, I'd put the chances of gcc screwing up at about the same as with 32-bit, not more. While it does happen, I'd consider a hardware problem more likely.

-- 
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master -- and if you use the program, he is your master." Richard Stallman

_______________________________________________
Pan-users mailing list
Pan-users@nongnu.org
http://lists.nongnu.org/mailman/listinfo/pan-users