Thufir posted on Mon, 10 Sep 2012 04:52:28 -0700 as excerpted:

> Is there a tutorial for yenc? I've read the wikipedia entry and am as
> clueless as when I started.
>
> I did find:
>
> http://packages.ubuntu.com/precise/news/python-yenc
>
> but it has no manpage entry :(
>
>
> What do you do with the multipart attachments after you've downloaded
> them?
>
> From going through the list archives on gmane, it looks like pan is
> supposed to magically handle multi-part yenc attachments spread across
> multiple messages?
>
> However, what about videos? Or, x type file?
There are two kinds of splitting. The first is pre-splitting, where the file is split before posting, so several files, each a piece of the larger one, are actually posted. The second is attachment splitting, where a single file is split across multiple individual messages, each carrying part of the attachment, and the attachment isn't complete without all the parts.

Because it's the posting app that does attachment splitting, on the fly, the original poster often wouldn't still have the individual parts. So if someone ended up with all the parts but one or two and asked for a repost of just those, and nobody else had gotten them intact and could post just those parts, the original poster would have to repost the entire set, that being the only way to get the same split. (This was a much bigger problem back before PAR files became common. The error correction and data redundancy they provide make it possible to recover from individual corrupted or missing messages these days, as long as not too many are missing and the PARs were posted.)

For that reason, the biggest files would often be pre-split into smaller individual files before posting. These smaller files would in turn be attachment-split as described above.

On the download end, the news client generally handles reassembly of attachment-split files automatically... as long as all the parts are there. This is the bit that pan does transparently. As long as all the parts are there, you don't even see that it's a bunch of individual messages combined to recreate the attachment; pan lists only a single entry for the whole set of posts required to reassemble that file. (If parts are missing, pan still lists it as a single entry, but with an x/y indicator as part of the subject line so you know how many parts are available vs. missing. You can still force pan to reassemble what it has, using the forced "read message" function, but the attachment will be corrupt due to the missing data.)
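For a feel for the numbers behind pan's x/y indicator: assuming a posting app that puts a fixed number of the file's bytes into each message (an assumption; apps differ, and yenc adds a little encoding overhead on top), the total part count y is just a rounded-up division. Both sizes below are hypothetical:

```shell
#!/bin/sh
# Hypothetical sizes, for illustration only; real posting apps differ in how
# they size parts, and yenc adds a little encoding overhead per message.
file_size=734003200   # bytes in the attachment being posted (700 MiB)
part_size=500000      # bytes of the file carried per message (assumed setting)

# Ceiling division: any leftover bytes need one more, shorter, message.
parts=$(( (file_size + part_size - 1) / part_size ))
echo "$parts messages"

# The x/y indicator compares the parts actually available against that total.
have=1467             # hypothetical: two messages never arrived
echo "$have/$parts"
```

With those made-up numbers, a 700 MiB file becomes 1469 messages, and losing any one of them leaves the attachment incomplete unless PAR data can fill the gap.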
Pan and yenc are both entirely agnostic about pre-split files. Each pre-split piece was posted as a separate file, and that's what pan downloads and saves: the smaller individual pre-split files, which the user must manually reassemble after saving, just as the poster pre-split them before posting. (Tho some of the power-posting apps handle pre-splitting based on pre-configured parameters as well. But the pieces are still posted as smaller individual files that must be reassembled into a whole.)

How you do that reassembly depends on how the original larger file was pre-split.

In the simplest case, it was simply split into equal-size chunks, nothing added, nothing subtracted, with each chunk numbered appropriately. Ideally a series of 10 chunks will be numbered *.01 thru *.10, not *.1 thru *.10, and similarly a series of 100 chunks will be numbered *.001 thru *.100, not *.1 thru *.100, thus preserving file listing order. In this case, a simple redirected cat (short for conCATenate) command suffices for recombining:

cat file.mpg.* > file.mpg

(That's Unix/Linux. On MS, back in the DOS days anyway, it was similar but used the copy command. Alternatively, the poster would often include a file.mpg.bat batchfile script for reassembly. Downloaders could use the script or just type the command themselves, but the existence of a *.bat file was a sure sign that this type of splitting had been used, so it was nice to see it even on Linux, since it meant the simple cat command method would work. I'd assume it remains about the same today.)

If the numeric file suffixes weren't 0-padded appropriately, it's a bit more difficult, but still /reasonably/ simple.
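To see the 0-padded case end to end, here's a small sketch that plays both roles: it pre-splits a throwaway file the way a poster might, then reassembles it with the same cat command and checks the result with cmp before deleting the pieces. It assumes Linux with a reasonably recent GNU coreutils (for split's --numeric-suffixes=FROM option); the file names and sizes are made up.

```shell
#!/bin/sh
# Sketch assuming GNU coreutils split with --numeric-suffixes=FROM support.
set -e

# Work in an empty scratch directory so the wildcard can only match our pieces.
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical stand-in for a real video: 10 KiB of random bytes.
head -c 10240 /dev/urandom > original.mpg

# Poster's side: pre-split into 1 KiB chunks file.mpg.01 thru file.mpg.10.
# -a 2 makes every suffix two digits; --numeric-suffixes=1 starts at 01.
split -b 1024 -a 2 --numeric-suffixes=1 original.mpg file.mpg.

# Downloader's side: 0-padded suffixes list in order, so a plain cat works.
cat file.mpg.* > file.mpg

# Test before deleting anything: cmp -s is silent and succeeds only if identical.
if cmp -s original.mpg file.mpg; then
    echo "file.mpg reassembled correctly"
    rm file.mpg.*
fi
```

The split step is only there to produce something to reassemble; with real downloads you'd just cd into the directory holding the pieces and run the cat line.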
Taking the *.1 thru *.100 example (make sure you have wrapping turned off to view this; the below assumes you know the wildcards aren't going to match unintended files, presumably because you're working in a subdir that only contains the pieces you want to assemble into one):

# first combine the single-digit suffix files, creating a
# double-digit-suffix file that orders before any of the existing ones
cat file.mpg.? > file.mpg.01

# next combine the double-digit suffix files
# (including the one created above)
cat file.mpg.?? > file.mpg.001

# OK, now the triple-digit suffixes
cat file.mpg.??? > file.mpg

# Now test, and once you're sure it works, delete all the parts
rm file.mpg.*

Another common type of pre-split file that uses sequential numeric extensions is the RAR archive format. This archive format seems to be most common in East Asia, and thus among East Asian posters. It's much like zip or the combined tar.* compression AND archive formats in that it's a compression as well as an archive format, and thus can contain whole directories. But unlike with zip and tar.*, the archiver's ability to split the archive at preset sizes is often used as well, with these parts then posted. For these files you simply use unrar (the GPL'd unrar-free variant only unarchives; rar itself is proprietary) or some other (un)archiver that handles rar files, since the split is a native part of the format and the unarchiving process thus knows how to reassemble before unarchiving. IIRC, this format can normally be identified by the fact that in addition to the numbered files, one file (IIRC the first part, but it's been years...) is simply *.rar.

Then there are the various proprietary splitters, some of which append or prepend various metadata to each chunk, thus requiring the same software, often MS-only, for reassembly. Fortunately, because these DO require specific software for reassembly, they don't tend to be very popular.

Finally, there are the PAR formats.
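Going back to the non-0-padded *.1 thru *.100 case for a moment: if you'd rather not rely on how the shell sorts wildcard matches at all, a plain counting loop can supply the order instead, and can check that no piece is missing first. This is a POSIX sh sketch with a hypothetical fixed count of 100; the demo-setup part creates tiny numbered pieces of its own so the script runs standalone.

```shell
#!/bin/sh
# Sketch: rebuild file.mpg from non-0-padded pieces file.mpg.1 thru file.mpg.$last,
# letting a counting loop (not wildcard sorting) supply the order.
set -e
last=100   # hypothetical piece count; use the real highest suffix

# Demo setup only: create tiny numbered pieces in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir"
i=1
while [ "$i" -le "$last" ]; do
    printf '%s\n' "$i" > "file.mpg.$i"
    i=$((i + 1))
done

# First pass: refuse to assemble if any piece in the series is missing.
missing=0
i=1
while [ "$i" -le "$last" ]; do
    if [ ! -f "file.mpg.$i" ]; then
        echo "missing: file.mpg.$i" >&2
        missing=1
    fi
    i=$((i + 1))
done

# Second pass: concatenate 1, 2, ... $last in numeric order.
if [ "$missing" -eq 0 ]; then
    i=1
    while [ "$i" -le "$last" ]; do
        cat "file.mpg.$i"
        i=$((i + 1))
    done > file.mpg
    echo "assembled $last pieces into file.mpg"
fi
```

For real pieces, drop the demo-setup loop, cd into the directory holding them, and set last to the highest suffix. A plain "cat file.mpg.*" here would instead produce the order 1, 10, 100, 11, and so on.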
I actually only worked with PAR-1 files back in the day (I know PAR-2 was more common later, and for all I know there's PAR-3+ now...), and don't remember the specifics too well, but the general idea is that these provide additional redundant data for error correction. Depending on how much PAR data was posted, something like 10-30% of the parts may be missing and the data may still be recoverable, provided you have enough PAR files. PAR-1 sets use a *.par file plus numbered *.p01, *.p02, etc. recovery files, while PAR-2 sets use a small *.par2 index file plus *.vol*.par2 recovery files. You don't need these unless the main post is corrupted or partially missing, as they're only used for recovery of missing/corrupted data, but if you DO need to recover missing/corrupted data, definitely get help from someone with more current experience with these than I have.

That should be a reasonable general overview, anyway... With any luck, you'll only have to do the redirected-cat style assembly. =:^)

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman

_______________________________________________
Pan-users mailing list
Pan-users@nongnu.org
https://lists.nongnu.org/mailman/listinfo/pan-users