On 11 Jun 2003 Grzesiek Sedek <[EMAIL PROTECTED]> wrote:
> Anyone have an idea how to extract clear text from an inbox file (the
> actual file is from MS Entourage on the Mac, called Messages)? It got
> corrupted and the mail client does not read it. It's quite big, 500 MB,
> so I have to do it at least semi-automatically. The main problem is the
> attachments (I don't need them): they are quite big, and the rest of the
> content is text.
You do not describe what the contents of the file look like, so I must
guess at what distinguishes attachments from message texts. My guess,
then, is that the 500 MB file is essentially a text file, and that the
attachments you want to get rid of are big solid blocks of characters:
long sequences of lines, all of the same length, without any spaces in
them. If that is true, a simple sed command will suffice:

    sed -e '/^[^ ][^ ]*$/d' Messages > Messages_attachments_stripped

This says: delete all lines that are not empty and do not contain
spaces.

Be careful. You may want to refine the regular expression that selects
the lines to be deleted. As it stands, a line like

    -------------------------------

that someone may have used in a message text to make a line stand out
as a header will also be deleted, as will lines delimiting the parts of
a message, like

    --346095821--1674543256--1308352331

Ben
-- 
B.F.M. Kal
Anjelierstraat 1, 2014 TC Haarlem, Netherlands
tel +31 23 5324909, [EMAIL PROTECTED]
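One possible refinement of the sed command in the reply, sketched as a small shell function. The function name `strip_attachments` is hypothetical, and the sketch assumes the attachments are base64-encoded: the base64 alphabet (A-Z, a-z, 0-9, "+", "/", "=" padding) contains no dash, so an encoded line never begins with "-". Lines starting with "-" are passed through before the delete rule is reached, which keeps both separator lines and MIME part boundaries.

```shell
#!/bin/sh
# Hypothetical helper, for illustration only.
# Assumption: attachment data is base64, so its lines never start with "-".
strip_attachments() {
    # $1: input mailbox file, $2: output file
    # Rule 1: a line starting with "-" (e.g. "-----" or a MIME boundary
    #         such as "--346095821--1674543256--1308352331") branches to
    #         the end of the script, so it is printed unchanged.
    # Rule 2: any other non-empty line containing no spaces is deleted.
    sed -e '/^-/b' \
        -e '/^[^ ][^ ]*$/d' \
        "$1" > "$2"
}
```

Usage would be `strip_attachments Messages Messages_attachments_stripped`. The `b` command without a label jumps to the end of the script, so sed prints the protected line and never applies the delete rule to it. Attachments in other encodings (BinHex, for instance, whose alphabet does include "-") would need a different test.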