Rock posted on Tue, 09 Jul 2013 05:11:42 +0000 as excerpted: > Have you ever seen this problem (and do you have insight into the > cause)? > > I'm using Pan 0.135, which is the latest easily possible on Centos6 > (long story) > and, at times, but only with aioe.org:119, I end up with triplicate > posts, each with its own message id, generally sent within the span of a > few seconds to a minute, based on the time stamps.
I'm reasonably sure I know the issue, and it's not exactly pan's fault, as I'll explain. However, if you like, you can use the "ngrep" tool (homepage http://ngrep/ sourceforge.net , if you'd like to read a bit about it before installing...), to grep your net connection for packets on port 119, etc, and see what's actually being sent/received. That's one of the better net-activity diagnostics I know of, and I too learned about it here. =:^) > I can generally tell when this is going to happen, because when I hit > the File->Send Article (or Control Return), Pan, when set up by default > to not create a message ID, hangs saying "sending", for, oh, about 30 > seconds or so, and then it ends up sending three exact messages all with > different message ids. This matches the pattern I'd expect... > The weird thing is that it doesn't happen frequently (but it's > embarrassing when it does happen); and that it only happens with > aioe.org:119 (and, from a look at the event log, in general, it seems to > happen when there are numerous connection errors with aioe.org such that > I often resort to setting aioe to 0 connections just to quiet down the > log file. So does this... > Through extensive work on the aioe side, we've ruled out that AIOE is > creating the three messages (they're actually receiving three separate > messages) and, in fact, when it happens too much, they ban me for a > week or two (so there are negative repercussions from what Pan is doing > wrongly). ... and this... > What I "think" is happening, is this sequence (but I can't prove it): > a) Pan experiences intermittent connection errors I'm with you so far. > b) Pan sometimes sends single messages, and sometimes sends triplicate > messages. Well... sort of. But this is where you go wild, as I expect you're missing some of the pieces to the puzzle. > c) The aioe server is the innocent recipient of multiple messages, and > creates multiple message IDs for those messages Mostly true (tho the server could certainly try a bit harder to detect duplicates sent without message-id from the same user within a narrow timeframe), but misses the point, as you're missing pieces of the puzzle. > After experiencing this anomaly perhaps a dozen times in the past year, > I gave up trying to debug (because I just don't know how); and simply > set Pan so that it created a (hopefully unique) message ID. It took > about two weeks before aioe let me back on - and then, now, with Pan > setting unique message IDs, the problem went away. So, in a sense, it's > 'solved'. It's worth noting here that in current pan, the option to let the server assign message-id is gone. Pan always assigns a message-id now, but does have an option to let you set the domain used in the message-id if you like. Perhaps this sort of trouble is one reason why that option disappeared. > But, I ask, in the general sense: Have you seen something similar? I'm not sure I've ever seen this personally, but I believe I've seen it reported on the list a few times, and I understand enough about NNTP to have a pretty good idea of what's happening. If you like, you can have a look at RFC 3977, Network News Transfer Protocol (NNTP) (dated October 2006 and obsoleting early RFC 977 while updating RFC 2980, with a further update as RFC 6048, Nov 2010, altho that updates the NNTP LIST command only and thus isn't of interest to us): http://tools.ietf.org/html/rfc3977 Of interest to us is section 6.3, Article Posting, and in particular, 6.3.1, the POST command. Direct link to that: http://tools.ietf.org/html/rfc3977#section-6.3.1 Basically, when a post is sent, the server MUST return a response code indicating success (response code 240) or failure (probably 441). (Such codes often have a plain English description appended as well, for use by humans doing a manual telnet connection and/or for troubleshooting, but the numeric code is what a news client will be looking for.) A 340 code is the "OK to send article" response after the initial POST command, before the actual article content is posted. As covered in section 3.2, response codes... http://tools.ietf.org/html/rfc3977#section-3.2 For the first digit, 2xx series responses indicate command completed OK, 4xx series indicate syntax correct but failed for some reason, and 3xx indicates OK so far, send the rest. The second digit indicates the function response, with x4x being the posting function. There's posting examples at the end of section 6.3.1. =:^) Back to section 6.3.1, the POST command, now quoting from the RFC: >>>> Following the presentation of the termination sequence by the client, the server MUST return a response indicating success or failure of the article transfer. [...] A response of 240 SHOULD indicate that, barring unforeseen server errors, the posted article will be made available on the server and/or transferred to other servers, as appropriate, possibly following further processing. In other words, articles not wanted by the server SHOULD be rejected with a 441 response, rather than being accepted and then discarded silently. However, the client SHOULD NOT assume that the article has been successfully transferred unless it receives an affirmative response from the server and SHOULD NOT assume that it is being made available to other clients without explicitly checking (for example, using the STAT command). If the session is interrupted before the response is received, it is possible that an affirmative response was sent but has been lost. Therefore, in any subsequent session, the client SHOULD either check whether the article was successfully posted before resending or ensure that the server will allocate the same message-id to the new attempt (see Appendix A.2). The latter approach is preferred since the article might not have been made available for reading yet (for example, it may have to go through a moderation process). <<<< So basically, what pan's doing is waiting for that 240/441 response and acting accordingly. It's worth noting here that NNTP is a TCP based protocol, and TCP is connection-based, so if a packet is lost, there will be several retransmit requests, before finally giving up on that connection as it eventually times out at one end or the other, or is RESET. Thus the hangs are pan waiting for that response. If the connection is bad and it doesn't get it, there will be retransmission attempts, but if the connection is bad enough that they fail to reach pan as well... that last paragraph kicks in. Of course if pan's not setting the message-ids but instead letting the server do it, both it and the server have far less to go on when they try to pick up the pieces from the lost connection. As you can see in the quote, ensuring the same message-id is the recommended recovery option, and of course pan can't do that if it didn't set the message-id and thus doesn't know it. Thus as I said, this very issue very likely has something to do with why current pan always sets a message-id, but does allow the user to choose the domain name that appears as part of that message-id, if desired. So... I'd /bet/ that both pan and the server are observing at least the letter of the RFCs (the MUSTs), but not necessarily the spirit (as seen in the SHOULDs). The primary culprit here is the bad network conditions, but either side could do better. The server could do better by applying common-sense checks to messages posted within seconds of each other, particularly if they do not include the message-id, since in that case obviously the client isn't observing the recommended SHOULD, and is thus suspect. The client, pan, could do better by following the recommendation to ensure the same message-id, basically by assigning the message-id on its own, as is apparently still an option in the outdated version you're using, but is simply done without giving the user the choice, in current pan. So for your older pan version, you've "solved" the issue the RFC- recommended way, by ensuring that the same message-id gets used, by having pan set it. =:^) > Do you know how to debug why Pan is sending messages in triplicate? As I mentioned up top, the ngrep (network grep) tool is very useful for troubleshooting actual network traffic issues such as this, letting you grep the specific connection via IP and port, as well as regex-matching specific packets if you have a better idea what you're looking for than simply "everything from that connection." Here, for example, you could ngrep the connection (by IP and port) for POST, and for result codes of 240, 340 and 441. But I think the RFC and the behavior you see are clear enough that you shouldn't actually need to do that for this problem, tho of course following the wire-line logs of an actual session is likely to be quite educational and is thus something I'd recommend in its own right. =:^) FWIW, I used ngrep here, when configuring pan for the new header compression as well as the secure connections features, to verify that it was working as intended. (Obviously, ngrep won't yield much of use from the content of the secure connection except for verifying that it's no longer clear-text, so I set that up last, but the TCP packet headers will still be in the clear, or the packets couldn't be routed.) Your pan is of course too old to have those features but it's still worth doing an ngrep on the traffic at least once, just to see how the protocol actually works. =:^) -- Duncan - List replies preferred. No HTML msgs. "Every nonfree program has a lord, a master -- and if you use the program, he is your master." Richard Stallman _______________________________________________ Pan-users mailing list Pan-users@nongnu.org https://lists.nongnu.org/mailman/listinfo/pan-users