On Wed, Nov 26, 2025 at 3:40 PM <[email protected]> wrote:
>
> Does anybody here know of an AWK or sed program to convert mbox files to HTML?

I don't think sed and awk are good choices for the task.

In the past, I wrote a C++ program to parse an mbox file and analyze
the files in the collection.

I had a terrible time parsing Subject: lines with emojis and printing
them.  They arrive as RFC 2047 encoded-words (UTF-8 text, Base64
encoded), which look like "=?utf-8?b?4pyF?=".  The parsing and
conversion from UTF-8 was not bad.  And conversion to PDF was not bad.
But the open source tools I tried, like LibreOffice and OpenOffice,
could not print them properly.
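
To show what that decoding step involves, here is a minimal sketch in
Python.  It is not the C++ program mentioned above.  It uses the
standard library's email.header module, and decode_subject is just a
name I made up for the example.

```python
from email.header import decode_header, make_header


def decode_subject(raw: str) -> str:
    """Decode RFC 2047 encoded-words in a header value to a Unicode string."""
    # decode_header splits the value into (bytes, charset) chunks;
    # make_header reassembles them into a Header that str() renders as text.
    return str(make_header(decode_header(raw)))


# The sample from above decodes to one character, U+2705.
print(decode_subject("=?utf-8?b?4pyF?="))
```

Decoding is the easy half.  Whether the result displays or prints
correctly still depends on the fonts available to whatever renders it.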

> My google (well, ddg) fu has not been very helpful -- I've turned up 
> proprietary programs to do that, most of which run on Windows :-(

Try <https://www.google.com/search?q=parse+mbox+site:github.com>.

> I need to have the source code as I will need to modify the conversion in 
> some special ways.
>
> I guess I could use something other than AWK or sed, but I'm reluctant to use 
> (and learn) some other language (including things like Perl, C[++], or 
> Python, although I think I'd like the syntax of Python the best, just wish it 
> was compiled instead of interpreted (P-code, iiuc)).
>
> I know that maildir is the currently favored approach for mail storage, but I 
> have well over 100 MB of emails (or pseudo emails) stored in mbox files, and 
> want to convert them for easy viewing on the Internet (by anyone).

One last point... the mbox format is described in RFC 4155, which
registers the application/mbox media type,
<https://datatracker.ietf.org/doc/rfc4155/>.
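
If Python turns out to be acceptable, its standard library already
reads mbox files, so a converter can start small.  The following is
only a sketch under some assumptions: it renders the Subject, From, and
first text/plain part of each message, ignores attachments and HTML
parts, and the function names (header_text, body_text, mbox_to_html)
are mine, not from any existing tool.

```python
import html
import mailbox
from email.header import decode_header, make_header


def header_text(value) -> str:
    """Decode RFC 2047 encoded-words in a header; empty string if absent."""
    if value is None:
        return ""
    return str(make_header(decode_header(str(value))))


def body_text(msg) -> str:
    """Return the first text/plain part as str, or "" if there is none."""
    for part in msg.walk():
        if part.get_content_type() != "text/plain":
            continue
        payload = part.get_payload(decode=True) or b""
        charset = part.get_content_charset() or "utf-8"
        try:
            return payload.decode(charset, errors="replace")
        except LookupError:  # charset name Python does not know
            return payload.decode("utf-8", errors="replace")
    return ""


def mbox_to_html(path: str) -> str:
    """Render every message in the mbox at `path` into one HTML page."""
    out = ['<!DOCTYPE html>', '<html><head><meta charset="utf-8"></head><body>']
    box = mailbox.mbox(path, create=False)  # do not create a missing file
    try:
        for msg in box:
            out.append("<article>")
            out.append("<h2>" + html.escape(header_text(msg["Subject"])) + "</h2>")
            out.append("<p>From: " + html.escape(header_text(msg["From"])) + "</p>")
            out.append("<pre>" + html.escape(body_text(msg)) + "</pre>")
            out.append("</article>")
    finally:
        box.close()
    out.append("</body></html>")
    return "\n".join(out)
```

Write the returned string to a file opened with encoding="utf-8" and
it can be served as a static page.  Special handling would go in
body_text or in the loop inside mbox_to_html.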

Jeff
