On Wed, Nov 26, 2025 at 3:40 PM <[email protected]> wrote:
>
> Does anybody here know of an AWK or sed program to convert mbox files to HTML?

I don't think sed and awk are good choices for the task. In the past, I wrote a C++ program to parse an mbox file and analyze the files in the collection. I had a terrible time parsing Subject: lines with emojis and printing them. Those are RFC 2047 encoded-words (here, Base64-encoded UTF-8), and they look like "=?utf-8?b?4pyF?=". The parsing and conversion from UTF-8 was not bad. And conversion to PDF was not bad. But all the open source tools, like LibreOffice and OpenOffice, could not print them properly.

> My google (well, ddg) fu has not been very helpful -- I've turned up
> proprietary programs to do that, most of which run on Windows :-(

Try <https://www.google.com/search?q=parse+mbox+site:github.com>.

> I need to have the source code as I will need to modify the conversion in
> some special ways.
>
> I guess I could use something other than AWK or sed, but I'm reluctant to use
> (and learn) some other language (including things like Perl, C[++], or
> Python, although I think I'd like the syntax of Python the best, just wish it
> was compiled instead of interpreted (P-code, iiuc)).
>
> I know that maildir is the currently favored approach for mail storage, but I
> have well over 100 MB of emails (or pseudo emails) stored in mbox files, and
> want to convert them for easy viewing on the Internet (by anyone).

One last point... the mbox format is described in RFC 4155 (the application/mbox media type registration), <https://datatracker.ietf.org/doc/rfc4155/>.

Jeff
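
For illustration only, here is a minimal Python sketch of the two steps discussed above: decoding an RFC 2047 encoded-word such as "=?utf-8?b?4pyF?=" and turning an mbox file into a bare HTML listing. It is not the C++ program mentioned above. It assumes Python 3 and uses only the standard library (mailbox, email.header, html). The function names decode_header_value and mbox_to_html are hypothetical, and the output is just a list of senders and subjects, not a full message renderer.

```python
import html
import mailbox
from email.header import decode_header, make_header


def decode_header_value(raw):
    """Decode RFC 2047 encoded-words in a header value into a plain str."""
    if raw is None:
        return ""  # header missing from the message
    # decode_header splits the value into (bytes-or-str, charset) pairs;
    # make_header joins them back into a single Unicode header.
    return str(make_header(decode_header(str(raw))))


def mbox_to_html(path):
    """Return an HTML page listing the sender and subject of each message."""
    # create=False: fail if the file is missing instead of creating it.
    box = mailbox.mbox(path, create=False)
    try:
        items = []
        for msg in box:
            sender = html.escape(decode_header_value(msg["From"]))
            subject = html.escape(decode_header_value(msg["Subject"]))
            items.append(f"<li>{sender}: {subject}</li>")
    finally:
        box.close()
    return (
        '<!DOCTYPE html>\n<html><head><meta charset="utf-8"></head><body><ul>\n'
        + "\n".join(items)
        + "\n</ul></body></html>\n"
    )
```

A call such as print(mbox_to_html("archive.mbox")) (the file name is a placeholder) would write the page to standard output. Message bodies, attachments, and malformed headers are deliberately left out of this sketch and would need their own handling.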

