on Fri, Dec 28, 2001 at 11:12:49AM -0800, Karsten M. Self (kmself@ix.netcom.com) wrote:
> on Sun, Dec 23, 2001 at 10:49:46AM -0900, Christopher S. Swingley ([EMAIL PROTECTED]) wrote:
> > I need to write a program that extracts the ASCII text portion
> > of email messages for insertion into a database.  I looked at the
> > libmailtools-perl package, but it doesn't look like it can deal with
> > the annoying variety of mail that I may need to parse (The silly +'s
> > at the end of lines, MIME-attached HTML, vcards, etc.).
> >
> > What I want is a filter that I pass an email in, and out pops the
> > ASCII, 72-line width formatted message.  All attachments, HTML mail,
> > vcards and strangeness are removed.
>
> I'm looking for something vaguely similar.
>
> I think what I'm looking for is a tool that will strictly decode
> quoted-printable mail, base64-encoded mail, and other representations
> that don't resolve as plaintext.  I _don't_ need to resolve HTML or
> other tagging formats.
>
> The objective is to get the mail body into a form that can be scanned
> for website references.  I use this as part of my spam response system,
> with a script that extracts URLs, strips these to the host portion,
> resolves the IP, queries WHOIS, and parses this for response email
> addresses.
>
> This isn't possible on messages which are quoted-printable (though this
> appears to be possible by converting the string "=2E" to "."), or
> otherwise encoded (the plaintext isn't available).
>
> I've explored a number of options, including munpack, uudecode,
> metamail, but none appears to do what I want reliably.  My current
> workaround is to pipe a message segment from the "view-attachments" menu
> within mutt.  I'd like to be able to run this from either the index
> mode, or against an mbox or maildir folder.
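
The decode-then-scan step described above can be roughed out with the
email package in Python's standard library, which undoes
quoted-printable and base64 transfer encodings when a part's content is
read. This is only a sketch under assumptions, not a tested replacement
for the existing script: the names plain_text_body, url_hosts and
URL_RE are made up for illustration, the URL pattern is deliberately
crude, only text/plain parts are kept, and the IP lookup and WHOIS
steps are left out.

```python
import email
import re
from email import policy
from urllib.parse import urlsplit

# Crude pattern for http/https URLs (an assumption; real spam uses
# many tricks this will miss).
URL_RE = re.compile(r"https?://[^\s<>\"']+", re.IGNORECASE)


def plain_text_body(raw_bytes):
    """Return the decoded text/plain parts of one raw message."""
    msg = email.message_from_bytes(raw_bytes, policy=policy.default)
    chunks = []
    for part in msg.walk():
        # Skip multipart containers, HTML, vcards, images, and so on.
        if part.get_content_type() != "text/plain":
            continue
        if part.get_content_disposition() == "attachment":
            continue
        # get_content() reverses quoted-printable or base64 encoding
        # and decodes the declared charset to a str.
        chunks.append(part.get_content())
    return "\n".join(chunks)


def url_hosts(text):
    """Return the distinct host names of URLs in text, in order seen."""
    hosts = []
    for url in URL_RE.findall(text):
        host = urlsplit(url).hostname
        if host and host not in hosts:
            hosts.append(host)
    return hosts
```

Used as a filter, one would read a single message from standard input
as bytes, pass it to plain_text_body(), and print each entry of
url_hosts() on its own line. Walking an mbox or maildir folder could
probably be layered on top with the standard mailbox module, but that
is not shown here.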

uudeview was suggested to me off-list.  However, it doesn't seem to
allow for piping data in and/or out.  And when it does, you're liable
to get a graphic image file dumped in a directory someplace.

Grumble.

-- 
Karsten M. Self <kmself@ix.netcom.com>       http://kmself.home.netcom.com/
 What part of "Gestalt" don't you understand?             Home of the brave
  http://gestalt-system.sourceforge.net/                   Land of the free
  We freed Dmitry! Boycott Adobe! Repeal the DMCA! http://www.freesklyarov.org
Geek for Hire                     http://kmself.home.netcom.com/resume.html