Volker Wysk posted on Fri, 27 Jul 2018 20:51:49 +0200 as excerpted:

> Hi!
>
> I've set up pan, and it works fine, but for one problem. When I save
> an article's attachments (to /tmp/news in this case), I get this:
>
> ------------------------------snip------------------------------
> desktop /tmp/news $ ls -l
> -rw-r--r-- 1 v v 82911 Jul 27 20:41 THBC01-06 - Trisha Yearwood Wvocal - Powerful Thing.zip
> -rw-r--r-- 1 v v   126 Jul 27 20:41 THBC01-06 - Trisha Yearwood Wvocal - Powerful Thing.zip.ERRORS
>
> desktop /tmp/news $ cat THBC01-06\ -\ Trisha\ Yearwood\ Wvocal\ -\ Powerful\ Thing.zip.ERRORS
> Warning: Data looks suspicious. Decoded file might be corrupt.
> Warning: Data looks suspicious. Decoded file might be corrupt.
>
> desktop /tmp/news $ unzip -v *zip
> Archive:  THBC01-06 - Trisha Yearwood Wvocal - Powerful Thing.zip
> error [THBC01-06 - Trisha Yearwood Wvocal - Powerful Thing.zip]:  missing 2995200 bytes in zipfile
>   (attempting to process anyway)
>  Length   Method    Size  Cmpr    Date    Time   CRC-32   Name
> --------  ------  ------- ---- ---------- ----- --------  ----
>  2801664  Defl:N  2713384   3% 2005-08-09 13:25 e19a6a15  THBC01-06 - Yearwood, Trisha wvocal - Powerful Thing.mp3
>  1260096  Defl:N   364051  71% 2005-08-03 22:30 5b936c3c  THBC01-06 - Yearwood, Trisha wvocal - Powerful Thing.cdg
> --------          -------  ---                            -------
>  4061760          3077435  24%                            2 files
> ------------------------------snip------------------------------
>
> The subject is "Attn : THBC fill-ins - "THBC01-06 - Trisha Yearwood Wvocal -
> Powerful Thing.zip" yEnc (13/13)".
> Could it have to do something with yEnd?
>
> I've tried it with various news postings, and it's always the same.
>
> Any hints?

I only rarely do binaries and I haven't seen *.ERRORS files like that, so I can't say for sure. But I've been using pan, and on this list, for over a decade and a half now, and, more importantly for this particular issue, I have some pre-pan knowledge of yenc's functionality and somewhat controversial history, plus USENET binary experience going back to before pan (I was much more active with binary downloading until about a decade ago). Based on that, I believe I have a reasonable idea of what's going on.

1) Yenc, unlike earlier encoding formats, records the file size and a CRC32 checksum (multipart posts carry a per-part CRC32 as well). So, again unlike earlier encoding formats, yenc *CAN* actually say with reasonable certainty that something's corrupt: if the decoded size and checksum don't match the recorded ones, the data isn't what was posted. That explains the *.ERRORS files: size and checksum aren't matching.
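To make the size/checksum point concrete, here's a minimal single-part yenc encoder and a checking decoder in Python. It's a sketch, not pan's decoder: it uses only the single-part =ybegin/=yend fields (size=, crc32=), escapes only the four always-critical bytes (NUL, LF, CR, '='), whereas real encoders may escape a few more (for example a '.' at the start of a line), and it ignores the multipart part=/pcrc32= fields. The demo shows that a flipped byte keeps the size but breaks the CRC32, while a lost line breaks both, and it measures the overhead against base64's 4/3 expansion.

```python
import base64
import zlib

# Bytes that must be escaped after the +42 shift: NUL, LF, CR and '=' itself.
CRITICAL = {0x00, 0x0A, 0x0D, 0x3D}


def yenc_encode(data: bytes, name: str, line_len: int = 128) -> bytes:
    """Single-part yenc sketch: shift each byte by 42, escape critical results."""
    body = bytearray()
    line = bytearray()
    for b in data:
        c = (b + 42) % 256
        if c in CRITICAL:
            # Escape as '=' followed by (c + 64), both kept on the same line.
            line += bytes([0x3D, (c + 64) % 256])
        else:
            line.append(c)
        if len(line) >= line_len:
            body += line + b"\r\n"
            line = bytearray()
    if line:
        body += line + b"\r\n"
    header = f"=ybegin line={line_len} size={len(data)} name={name}\r\n".encode()
    # The trailer records the size and CRC32 that the decoder checks against.
    trailer = f"=yend size={len(data)} crc32={zlib.crc32(data):08x}\r\n".encode()
    return header + bytes(body) + trailer


def yenc_decode_checked(article: bytes) -> tuple[bytes, list[str]]:
    """Decode the sketch format above and compare with the =yend size/crc32."""
    lines = article.split(b"\r\n")
    begin = next(i for i, ln in enumerate(lines) if ln.startswith(b"=ybegin "))
    end = next(i for i, ln in enumerate(lines) if ln.startswith(b"=yend "))
    fields = dict(tok.split(b"=", 1) for tok in lines[end].split()[1:])
    expected_size = int(fields[b"size"])
    expected_crc = int(fields[b"crc32"], 16)

    data = bytearray()
    for line in lines[begin + 1:end]:
        escaped = False
        for c in line:
            if escaped:
                data.append((c - 64 - 42) % 256)  # undo the escape, then the shift
                escaped = False
            elif c == 0x3D:
                escaped = True
            else:
                data.append((c - 42) % 256)

    errors = []
    if len(data) != expected_size:
        errors.append(f"size mismatch: got {len(data)}, expected {expected_size}")
    if zlib.crc32(data) != expected_crc:
        errors.append("crc32 mismatch")
    return bytes(data), errors


if __name__ == "__main__":
    original = bytes(range(256)) * 40
    article = yenc_encode(original, "test.bin")
    print("clean:", yenc_decode_checked(article)[1])

    # Flip one encoded byte: the size still matches, the CRC32 does not.
    start = article.index(b"\r\n") + 2
    flipped = bytearray(article)
    flipped[start] ^= 0x01
    print("flipped:", yenc_decode_checked(bytes(flipped))[1])

    # Drop a whole body line, as with lost data: both checks fail.
    lines = article.split(b"\r\n")
    print("truncated:", yenc_decode_checked(b"\r\n".join(lines[:5] + lines[6:]))[1])

    body = article[start:article.index(b"=yend ")]
    print(f"yenc overhead:   {len(body) / len(original) - 1:.1%}")
    print(f"base64 overhead: {len(base64.b64encode(original)) / len(original) - 1:.1%}")
```

Because the trailer carries both numbers, a decoder can tell "right size, wrong content" apart from "data missing", which is roughly the distinction the *.ERRORS warnings and the unzip output are drawing here.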
(As the zip listing shows, zip records size and checksum too, and it sees corruption as well, which is expected when yenc is already reporting errors while decoding.)

2) Also unlike the earlier encoding types, yenc takes express advantage of a specific characteristic of news peering. Unlike the standard internet message formats used in mail and earlier news standards, which had to stay cleanly convertible through all sorts of 7-bit and otherwise technically limited transports, news is /almost/ 8-bit-clean. That lets yenc be much more efficient than the standard 4/3 size expansion of the time (33% encoding overhead: every 4 bytes of encoding carried only 3 bytes of original data), with only a few percent of overhead.

In practice this means that if a yenc-encoded message gets transferred by non-8-bit-clean methods instead of staying on the direct news-peering network yenc was specifically designed for, it can get corrupted. It's /possible/ that's what you're seeing, particularly if the problem only shows up on messages from specific posters and/or posters uploading via specific news providers. If so, you may be able to compare the Path headers of affected and unaffected posts and isolate the path component the affected posts have in common. That helps if you have accounts on multiple news providers with different peering, and thus different routing for the affected posts; if someone on a different provider who gets them "clean" (because their feed doesn't take the problem path) can reupload them to you; or if your provider is willing to work with you to clean up their receiving path, possibly by changing their peering a bit to avoid the bad component.
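The Path comparison can be sketched in a few lines of Python. Entries in a Path header are separated by '!', with your own server at the left end and the poster's end at the right. This sketch assumes you've copied the Path headers of a few affected and unaffected posts by hand; it skips empty entries (from '!!'), entries starting with '.' (diagnostic markers such as the '.POSTED' some servers add) and the conventional 'not-for-mail' tail. The hostnames in the example are hypothetical.

```python
def path_components(path_header: str) -> list[str]:
    """Split the body of a Path: header into server names."""
    parts = [p.strip() for p in path_header.split("!")]
    return [p for p in parts if p and not p.startswith(".") and p != "not-for-mail"]


def suspect_components(affected: list[str], unaffected: list[str]) -> list[str]:
    """Servers that appear in every affected path but in no unaffected path."""
    if not affected:
        return []
    common = set(path_components(affected[0]))
    for path in affected[1:]:
        common &= set(path_components(path))
    clean: set[str] = set()
    for path in unaffected:
        clean.update(path_components(path))
    return sorted(common - clean)


if __name__ == "__main__":
    # Hypothetical Path headers, copied from article headers by hand.
    affected = [
        "reader.myisp.example!hub.bad.example!peer-a.example!not-for-mail",
        "reader.myisp.example!hub.bad.example!.POSTED!peer-b.example!not-for-mail",
    ]
    unaffected = ["reader.myisp.example!peer-a.example!not-for-mail"]
    print(suspect_components(affected, unaffected))
```

With only a handful of samples this narrows things down rather than proving anything; a server that merely happens to sit on every affected route will show up too, so more samples from varied posters make the result more trustworthy.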
3) The other possibility, more probable especially if you're using a low quality provider like ISPs' own news services tended to be (back when ISPs provided news; few do these days), is entirely unrelated to yenc, except that yenc, thanks to the size and checksum it records, may help you spot the corruption more easily and sooner.

Backing up a bit... ls says the zip file is 82911 bytes (~81 KiB), but unzip says 2995200 bytes are missing, so the full archive should be about 82911 + 2995200 = 3078111 bytes (~2.9 MiB). Obviously something's missing!

First the obvious: are all the parts showing up? Is pan displaying an assembled-puzzle icon before you attempt to download? If it's still an unassembled-puzzle icon, pan knows not all the parts are there. You can still force pan to download, because some files can still be played with some corruption, skipping the missing part, particularly if the first part is there, and sometimes that's useful. But zip files and the like normally won't work that way, unless of course par files are available and enough of them are usable to repair the missing and corrupted parts.

Second, are there errors in the log? Particularly on low quality servers and/or when messages are already expiring, the server may /say/ everything's there, but some parts will actually be missing when you go to download them.

Third, some parts may simply be corrupt, and possibly much (!!) smaller than they should be. One thing that at least /used/ to be common on low quality providers, again often ISP-level ones (because unlike dedicated news providers they aren't selling news directly, they're selling a connection to the net, so you'll likely keep paying the ISP even if the news service stinks), was that either they or their peering connections would limit article sizes. The most common symptom was a bunch of last parts showing up, because those are smaller than the full-size parts, which wouldn't fit through the size filter.
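The "are all the parts there, and are any suspiciously small" checks can be sketched in Python. This assumes you have per-part byte counts from somewhere (for example the :bytes field of the server's overview data, or pan's header pane); the function names and the 0.5 threshold are hypothetical choices for illustration. The last part is normally short, so only the other parts are compared against the median.

```python
import re
from statistics import median

# Matches a trailing "(n/N)" part counter, as in "... yEnc (13/13)".
PART_COUNTER = re.compile(r"\((\d+)/(\d+)\)\s*$")


def part_counter(subject: str):
    """Return (part, total) from a subject line, or None if there's no counter."""
    m = PART_COUNTER.search(subject)
    return (int(m.group(1)), int(m.group(2))) if m else None


def check_parts(sizes: dict[int, int], total: int, small_ratio: float = 0.5):
    """Return (missing part numbers, suspiciously small non-final parts).

    sizes maps part number -> article size in bytes.
    """
    missing = [n for n in range(1, total + 1) if n not in sizes]
    non_final = {n: s for n, s in sizes.items() if n != total}
    if not non_final:
        return missing, []
    typical = median(non_final.values())
    small = sorted(n for n, s in non_final.items() if s < small_ratio * typical)
    return missing, small


if __name__ == "__main__":
    subject = ('Attn : THBC fill-ins - "THBC01-06 - Trisha Yearwood Wvocal - '
               'Powerful Thing.zip" yEnc (13/13)')
    _, total = part_counter(subject)
    # Hypothetical size-filter case: only the short last part made it through.
    print(check_parts({13: 60000}, total))
```

The size-filter symptom described above shows up as every part except the last being reported missing, while a truncated or corrupt part that did arrive shows up in the second list instead.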
Another possibility, a favorite trick of the RIAA/MPAA etc. types, is to deliberately supersede just one part of each multipart post with a corrupt version, thus corrupting the entire download. There are three ways to avoid that, two for the downloader and one for the uploader. For the uploader, there are techniques (I'm not familiar with them) that make superseding/canceling more difficult. For the downloader, it's either get there immediately after the message is posted, hopefully before the supersede has had a chance to take effect, so you get the original upload rather than the corrupted supersede, OR use a good quality provider that doesn't honor cancels/supersedes in the binary groups, precisely to avoid problems like this. (Of course, at least in the US, news providers must honor copyright takedown requests or be responsible for the violations themselves, but that's a longer-term thing, and I expect they take down the whole thing, not just individual parts. IOW, a good provider won't honor binary cancels/supersedes, but WILL very likely have to honor full takedown requests, tho those typically take some time to process. Anyway, there again, getting there as soon as the parts are complete is clearly best, but...)

(They use this trick on P2P as well, claiming to have a good copy of just one chunk of many but serving a corrupt one, thus corrupting the entire thing. Smart P2Pers and/or the best P2P clients avoid that too, by tracking "bad" peers, etc.)

4) Finally, while it appears unlikely here due to the size difference, it's possible for a decoder such as pan to assemble the parts in the wrong order. Pan used to have a bug that made this somewhat common, but that was fixed some years ago, and I know of no such bug in reasonably current versions. It could still happen under unusual circumstances, though, say, the poster mislabeling the order.
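One way a decoder can be robust against mislabeled order is to place each part by the byte offsets yenc multipart posts carry on their =ypart line (begin= and end=, 1-based and inclusive) rather than by the "(n/N)" in the subject. Here's a minimal Python sketch of that idea; it assumes the parts have already been yenc-decoded into bytes, and the function names are hypothetical, not pan's or uudeview's.

```python
import re

YPART = re.compile(rb"^=ypart begin=(\d+) end=(\d+)")


def parse_ypart(line: bytes) -> tuple[int, int]:
    """Return the (begin, end) offsets from an =ypart line."""
    m = YPART.match(line)
    if not m:
        raise ValueError("not an =ypart line")
    return int(m.group(1)), int(m.group(2))


def assemble_by_offset(parts, total_size: int):
    """Place decoded parts by offset; return (data, list of missing byte ranges).

    parts: iterable of (begin, end, data), offsets 1-based and inclusive.
    Missing ranges are left as zero bytes in the returned data.
    """
    buf = bytearray(total_size)
    spans = []
    for begin, end, data in parts:
        if end > total_size or len(data) != end - begin + 1:
            raise ValueError(f"part {begin}-{end}: got {len(data)} bytes")
        buf[begin - 1:end] = data
        spans.append((begin, end))
    gaps = []
    next_expected = 1
    for begin, end in sorted(spans):
        if begin > next_expected:
            gaps.append((next_expected, begin - 1))
        next_expected = max(next_expected, end + 1)
    if next_expected <= total_size:
        gaps.append((next_expected, total_size))
    return bytes(buf), gaps


if __name__ == "__main__":
    original = bytes(range(200))
    # Parts arriving (or labeled) out of order still land in the right place.
    parts = [(101, 200, original[100:200]), (1, 100, original[:100])]
    data, gaps = assemble_by_offset(parts, len(original))
    print(data == original, gaps)
```

Offsets only help if the poster's encoder wrote them correctly, so this is a cross-check on subject numbering rather than a guarantee.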
If you suspect something like this (as I said, unlikely here, since the files are smaller, not just corrupted), you can have pan save the raw messages as "text", still encoded. You'd then do the decoding and assembly manually, likely at the terminal, with another tool such as uudeview, using the flexibility of the manual process to specify a custom assembly order and/or do other similar troubleshooting.

So, as they say, the above is a bit of a crapshoot, but hopefully it's /somewhat/ helpful, if in no other way than by providing a bit of technical background on yenc.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman

_______________________________________________
Pan-users mailing list
Pan-users@nongnu.org
https://lists.nongnu.org/mailman/listinfo/pan-users