On Wed, Nov 26, 2025 at 12:29:14 -0500, [email protected] wrote:
> Does anybody here know of an AWK or sed program to convert mbox files to HTML?
> [...]
> I know that maildir is the currently favored approach for mail storage, but I
> have well over 100 MB of emails (or pseudo emails) stored in mbox files, and
> want to convert them for easy viewing on the Internet (by anyone).

Why did you specifically ask for awk or sed? They don't seem like the best choices for programming languages to implement this.

With that large of an input, I would avoid bash. It'll be slow. Also, it has no useful libraries.

You're processing a large amount of text, in a fairly well-defined format, so any language that's good at text processing should do the job. Perl, Python, or Tcl would be my picks, but that's probably my personal bias.

I'm guessing that what you want to end up with would be a directory containing one file per message, plus some sort of index.html file that links to all of them.

If all the messages were plain pre-MIME "header and body", you could probably write a program to do that in less than an hour.

It's going to be tricky if you need to parse MIME attachments. At that point, you'll probably need to break out whatever MIME libraries your chosen language has. Even if it's just to discard the attachments, using a MIME library is a better approach than scrubbing out the MIME metadata lines with raw text manipulation.

If you actually want to preserve and link to the attachments, then the MIME libraries become indispensable.

Finally, you need to think about what you want to do with multipart messages. A whole bunch of email these days is written in either HTML or some kind of "rich text", and then gets sent out as a multipart message, with the original HTML (or rich text converted to HTML) as the "preferred" part, and the same HTML or rich text converted to regular text as a "fallback" part. Would you attempt to offer both parts somehow? Or just offer the HTML part "as is" (probably with some of the headers reattached above it)?
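
To make the "one file per message, plus an index.html" idea concrete, here is a rough Python sketch using only the standard library (mailbox, email, html). Treat it as a starting point under stated assumptions, not a finished converter. The function names (mbox_to_html, plain_body), the msgNNNNN.html file naming, and the choice of headers to show are all made up for illustration. It also makes one of the multipart decisions for you: it shows only the text/plain part, escaped inside a <pre> block, and drops HTML parts and attachments. That is just one of the options discussed above, picked here because it avoids publishing untrusted HTML.

```python
import html
import mailbox
from email import message_from_bytes, policy
from pathlib import Path

HEADERS = ("From", "To", "Date", "Subject")  # headers reattached above each body


def plain_body(msg):
    """Return the text/plain part of a message, or a placeholder if there is none."""
    part = msg.get_body(preferencelist=("plain",))
    if part is None:
        return "(no plain-text part)"
    try:
        return part.get_content()
    except LookupError:  # the message names a charset Python does not know
        return part.get_payload(decode=True).decode("utf-8", errors="replace")


def mbox_to_html(mbox_path, out_dir):
    """Write one HTML file per message plus index.html; return the message count."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    links = []
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for n, key in enumerate(box.keys(), start=1):
            # Parse with the modern policy so get_body()/get_content() are available.
            msg = message_from_bytes(box.get_bytes(key), policy=policy.default)
            head = "".join(
                f"<b>{h}:</b> {html.escape(str(msg.get(h, '')))}<br>\n"
                for h in HEADERS
            )
            subject = html.escape(str(msg.get("Subject", "(no subject)")))
            page = (
                '<!DOCTYPE html>\n<html><head><meta charset="utf-8">'
                f"<title>{subject}</title></head>\n<body>\n{head}"
                f"<pre>{html.escape(plain_body(msg))}</pre>\n</body></html>\n"
            )
            name = f"msg{n:05d}.html"  # hypothetical naming scheme
            (out / name).write_text(page, encoding="utf-8", errors="replace")
            links.append(f'<li><a href="{name}">{subject}</a></li>')
    finally:
        box.close()
    index = (
        '<!DOCTYPE html>\n<html><head><meta charset="utf-8">'
        "<title>Index</title></head>\n<body><ul>\n"
        + "\n".join(links)
        + "\n</ul></body></html>\n"
    )
    (out / "index.html").write_text(index, encoding="utf-8", errors="replace")
    return len(links)
```

You would call it as something like mbox_to_html("archive.mbox", "site") and then point a browser at site/index.html. If you decide you do want the HTML part or the attachments, msg.get_body(preferencelist=("html", "plain")) and msg.iter_attachments() are the places to start, but then you have to think about sanitizing whatever HTML you publish.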

