Thanks for the helpful reply -- some comments interspersed below:

On Wednesday, November 26, 2025 01:30:09 PM Greg Wooledge wrote:
> On Wed, Nov 26, 2025 at 12:29:14 -0500, [email protected] wrote:
> > Does anybody here know of an AWK or sed program to convert mbox files to
> > HTML?
[...]
> > I know that maildir is the currently favored approach for mail storage,
> > but I have well over 100 MB of emails (or pseudo emails) stored in mbox
> > files, and want to convert them for easy viewing on the Internet (by
> > anyone).
>
> Why did you specifically ask for awk or sed? They don't seem like the
> best choices for programming languages to implement this.

I thought they would be languages I could reasonably "handle" -- I have
little knowledge of or experience with Perl, C[++], and Python (and Tcl),
for example. (The last general-purpose languages I was reasonably fluent in
were Algol and Pascal. (I might be forgetting some.)) If I found a
reasonably well-written and well-documented program in some other language
that already does most of what I need, I imagine that I could modify it as
required.

> With that large of an input, I would avoid bash. It'll be slow. Also,
> it has no useful libraries.
>
> You're processing a large amount of text, in a fairly well-defined format,
> so any language that's good at text processing should do the job. Perl,
> Python, or Tcl would be my picks, but that's probably my personal bias.
>
> I'm guessing that what you want to end up with would be a directory
> containing one file per message, plus some sort of index.html file that
> links to all of them.

I hadn't thought that far ahead, but that seems like a good approach.

> If all the messages were plain pre-MIME "header and
> body", you could probably write a program to do that in less than an hour.
>
> It's going to be tricky if you need to parse MIME attachments. At that
> point, you'll probably need to break out whatever MIME libraries your
> chosen language has. Even if it's just to discard the attachments, using
> a MIME library is a better approach than scrubbing out the MIME metadata
> lines with raw text manipulation. If you actually want to preserve and
> link to the attachments, then the MIME libraries become indispensable.

Yeah, MIME. The "pseudo emails" I referred to are basically my own plain
text notes without attachments. But I do want to deal with "real" emails as
well and will have to deal with MIME -- that requires more thought, or I may
defer it until some indefinite time in the future.

> Finally, you need to think about what you want to do with multipart
> messages. A whole bunch of email these days is written in either HTML
> or some kind of "rich text", and then gets sent out as a multipart
> message, with the original HTML (or rich text converted to HTML) as the
> "preferred" part, and the same HTML or rich text converted to regular
> text as a "fallback" part. Would you attempt to offer both parts
> somehow? Or just offer the HTML part "as is" (probably with some of
> the headers reattached above it)?

As with MIME: my notes are not multipart, and I may defer that until some
indefinite time in the future.
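
For reference, here is a rough Python sketch of the layout described above
(one HTML file per message, plus an index.html linking to all of them). It
uses only Python's standard mailbox, email, and html modules. The function
names mbox_to_html and body_text, the msgNNNNN.html naming scheme, and the
choice to show only the first text/plain part are my own assumptions for
illustration, not anything proposed in this thread. Attachments and HTML
parts are simply ignored, and encoded (RFC 2047) headers are shown as-is.

```python
import html
import mailbox
from pathlib import Path


def body_text(msg):
    """Return the first text/plain part of a message as a str."""
    # walk() also yields the message itself when it is not multipart.
    for part in msg.walk():
        if part.get_content_type() != "text/plain":
            continue
        payload = part.get_payload(decode=True)  # bytes, or None
        if payload is None:
            continue
        charset = part.get_content_charset() or "utf-8"
        try:
            return payload.decode(charset, errors="replace")
        except LookupError:  # unknown charset name in the message
            return payload.decode("utf-8", errors="replace")
    return "(no plain-text part)"


def mbox_to_html(mbox_path, out_dir):
    """Write one HTML page per message plus index.html; return the count."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    links = []
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for n, msg in enumerate(box, start=1):
            subject = str(msg.get("Subject", "(no subject)"))
            sender = str(msg.get("From", "(unknown sender)"))
            date = str(msg.get("Date", "(no date)"))
            # Escape everything so message text cannot inject markup.
            page = (
                '<!DOCTYPE html>\n<html><head><meta charset="utf-8">'
                f"<title>{html.escape(subject)}</title></head><body>\n"
                f"<p>From: {html.escape(sender)}<br>"
                f"Date: {html.escape(date)}</p>\n"
                f"<pre>{html.escape(body_text(msg))}</pre>\n"
                "</body></html>\n"
            )
            name = f"msg{n:05d}.html"
            (out / name).write_text(page, encoding="utf-8")
            links.append(
                f'<li><a href="{name}">{html.escape(subject)}</a></li>'
            )
    finally:
        box.close()
    index = (
        '<!DOCTYPE html>\n<html><head><meta charset="utf-8">'
        "<title>Messages</title></head><body>\n<ul>\n"
        + "\n".join(links)
        + "\n</ul>\n</body></html>\n"
    )
    (out / "index.html").write_text(index, encoding="utf-8")
    return len(links)
```

Calling mbox_to_html("notes.mbox", "out") (both names hypothetical) would
write out/msg00001.html, out/msg00002.html, and so on, plus out/index.html.
Putting the body in a <pre> block keeps plain-text notes readable without
any further formatting decisions.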

