Rhialto posted on Sun, 14 Sep 2014 00:46:21 +0200 as excerpted:

[So this is a couple weeks old, but I had it saved in my unread queue to reply to later, as the last few weeks have been /way/ too busy...]
> I have the impression that the multi-server fetch strategy of Pan is
> as follows.  It keeps headers of its various servers, and remembers
> which server claims to have which articles.
>
> When a multi-part article is to be saved, it looks at all the parts
> and assigns which server is going to supply which.  Then, it tries to
> download them from those pre-determined servers.

Not exactly.  That wouldn't make good use of the available bandwidth when one server has much faster connections than another.

I believe it's more like this.  Assuming the servers are at the same rank (primary, or first/second/etc backup), as pan gets to a particular article in the queue, it will try to fetch it from whatever server has a free connection first, thus trying to keep all the connections active.

When a server isn't very good and is missing a lot of articles, pan will naturally get way ahead on that server, since it's skipping so many articles that aren't there.  Which is fine: it means pan will get every article it can from that server, leaving the more reliable servers to fill in the gaps.  Pan will be further behind on those, since they carry more of the content and pan will be picking up from them whatever the first server didn't have.

Of course the server with lots of holes will finish first, as it's skipping ahead because so many articles are missing, but that just means pan will naturally get what it can from it and then idle those connections, since there's nothing else it can get from that server.

> However, my news server quite often claims to have some article, but
> then when Pan tries to fetch it, it doesn't have it.

If a server /claims/ to have an article and then doesn't, that would indeed throw a monkey wrench in the works, since pan would try to get those articles from that server and end up waiting for articles it claimed to have but that will never show up.  However, that would /normally/ just stall those connections waiting for nothing, while pan picks up pretty much everything else on the other, still active, connections.

> However there is now no failover to another server.  In the mean time,
> the article stays Queued at for instance 95% and never finishes.

This is probably waiting on the stalled connections.

Actually, this sounds like something rather different: the infamous TCP dropped-packet congestion issue.

TCP has a problem when a connection gets unreliable and is regularly dropping packets.  TCP interprets dropped packets as congestion and throttles its speed accordingly.  But when too many packets are getting dropped, it cuts back on speed, and cuts back and cuts back, until the connection is effectively just sitting there doing nothing. =:^(

Unfortunately, there's little to be done at that point except reset the connection and start over.  At some point TCP will automatically try that as well, but if the reset packets get dropped too... then the connection basically stays frozen until one side or the other times it out.
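For what it's worth, a couple of standard command-line checks can confirm whether stuck connections are what you're hitting.  On Linux, something like the following (options from memory, so check the man pages; 119 is plain NNTP, 563 is NNTPS) lists pan's server connections, and ss with -i additionally prints per-connection TCP details such as the congestion window (cwnd), which shrinks toward nothing on a connection that's collapsing the way described above:

  $ netstat -tn | egrep ':(119|563) '
  $ ss -tin '( dport = :119 or dport = :563 )'

A connection that still shows up there long after pan has gone idle, or even after it has exited, is a pretty good sign of exactly this sort of stall.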
The gotcha for services with a limited number of connections, like many ISPs' own news servers back in the day when they actually still had them (at least in the US, few ISPs include news service any longer; your headers suggest .nl, which seems to be more news-friendly and may not have dropped it the way US ISPs did), is that if their timeouts are long enough (like a day), they can still be counting those long-dead connections against the connections allowed from your IP or login, and refuse to let you connect any more because they register you as already at max connections.

If that happens, you may have to try to get a new IP (on many DHCP systems you can get a different IP by changing your MAC address, an arguably somewhat technical trick, but possible for those who know how), or, if that's not possible due either to lack of know-how or to having a static IP assigned, you may have to have the NSP "reset" your connection count manually.  I know that from experience, unfortunately, tho not having an ISP-supplied news service at all, as is usually the case these days, is even worse. =:^(

I guess that's one reason some of the big NSPs offer 20 or 50 connections per paid account these days.  Paid accounts often don't cap per-connection speed and there's no reason to actually /use/ that many connections, but having that many available /will/ mean less trouble if connections get stuck "on" for some reason, so it means significantly lower support costs and/or less user unhappiness and fewer dropped accounts. =:^)

Anyway...

> I found a tedious workaround.  What I can do is to edit the news
> servers' priorities.  Then I need to remove the articles from the
> download queue and re-save them.  Only then do the new server priority
> settings take effect.

Question: Does quitting and restarting pan, without doing the server-priority switchup, help?  When pan is shut down, does netstat or whatever open-connection reporting tool you use still report open connections to that server, and/or does pan refuse to die?  If so, a reboot may be necessary to get rid of the stalled connections, but if I'm correct, getting them out of the picture should let you download the articles without redoing server priorities.

Do you have tcptraceroute available as a troubleshooting tool?  It can be used to check the route using actual TCP packets of an appropriate size (1500 bytes, normally) on the appropriate TCP port, in case the results differ from those for normal ICMP or UDP traceroute packets.

What about mtr (Matt's traceroute)?  It uses normal traceroute packets, but does continuous tracing and nice graphing of the results instead of just 1-3 shots per hop.  I'm wondering whether either of them registers any packet loss.  I'd guess they do when you notice the problem.
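If you do have them, the invocations are simple.  Here news.example.com is just a placeholder for your NSP's hostname, and options vary a bit between versions, so treat these as a starting point rather than gospel:

  $ tcptraceroute news.example.com 119 1500
  $ mtr --report -c 100 news.example.com

The first traces the route using 1500-byte TCP packets aimed at port 119; the second sends 100 probes per hop and prints a per-hop loss/latency summary.  If your mtr build is new enough, -T -P 119 should make it probe with TCP to the NNTP port as well.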
> If this happens for a significant number of articles it is a lot of
> work to do, and if you wait for all downloads to cease, maybe not
> everything that is downloaded but not decoded is in the cache anymore.

That is indeed a problem.  You can try increasing your cache size... up to the 16 GiB max, as discussed in a different thread recently.  The default cache size is 10 MiB, which is a bit small if you're having issues, or if you prefer to download to cache and then browse when everything's local and thus instantly available, as I tend to do.

> I'm looking in the code (starting at task-article.cc which seems
> related to this) to see where this happens and if I can change it,
> but I'm not familiar with the code and it's quite complex.  Maybe
> somebody with more experience / knowledge can have a look?

Well, I'm not a coder, so am unlikely to be of help there.  I can do limited analysis and come up with the occasional patch if things aren't too complex, but I guess you're either at that point or beyond, so...

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman

_______________________________________________
Pan-users mailing list
Pan-users@nongnu.org
https://lists.nongnu.org/mailman/listinfo/pan-users