This is OT, but I thought I'd start with this list as it is the list that I deal with more than any other. If no one here can help, suggestions for a better list to try will be appreciated.
I've never used Perl, but I'm hoping Perl can do the job for me.

What I need to do: I have multiple large files (one example is 5.4 MB). Each is essentially a data dump from a database--I have no control over the database or the format of the dump. The file is ugly, with lots of extraneous characters--I want to run a series of regular expression search and replace commands over the file to clean it up.

Some of the things that may make it tough:

* There are essentially no line breaks (0Ah or 0Dh)--the file is one long 5.4 MB line. (Well, there are 4 line breaks for some short lines at the beginning of the file, maybe somewhere between 32 and 80 characters on each of those 4 lines.)
* The file can, and often will, have UTF-8 characters in it (iiuc--the file contains URLs, some of which, I'm sure, can include UTF-8 characters, or maybe some other encoding??). The search and replace doesn't particularly have to handle UTF-8 search terms (because the keywords and punctuation I will search on will be plain ASCII), but any UTF-8 characters have to remain "intact" after the search and replace.

I'm hoping that I can write a Perl script that may be something like this:

* Code to open a file (which I will need to learn / find)
* Multiple statements of the form "s/<search regular expression>/<replace regular expression>/g"

(Aside: the replace probably doesn't have to be a regular expression, but it will need to include things like line break characters (\n).)

I did try to do this with one of the editors I use (I started with Kate), but Kate breaks that 5.4 MB "line" into multiple lines of about 4096 bytes / characters (at inconvenient places). Although I got the job (almost) done, it required a lot of manual intervention / correction, so I want to automate it with a tool that can work on very long lines without inserting line breaks (other than those I require).
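To make the outline above more concrete, here is a rough sketch of the workflow I have in mind. It is written in Python rather than Perl, purely for illustration, since the Perl version is what I'm asking about. The rules and the names clean, clean_file and RULES are made up for the example; the real patterns would depend on what the dump actually looks like. The sketch assumes the whole file fits in memory (5.4 MB should) and works on raw bytes, on the assumption that plain-ASCII patterns can then never touch the bytes of a multi-byte UTF-8 character.

```python
import re

# Hypothetical rules: (ASCII byte pattern, replacement bytes).
# These are placeholders; the real ones depend on the dump format.
RULES = [
    (rb"\|{2,}", b"\n"),      # a run of two or more pipes becomes a line break
    (rb"[\x00-\x08]", b""),   # drop stray low control bytes
]


def clean(data: bytes) -> bytes:
    """Apply every rule, in order, to the whole buffer at once."""
    for pattern, replacement in RULES:
        data = re.sub(pattern, replacement, data)
    return data


def clean_file(src: str, dst: str) -> None:
    """Read src in one go, clean it, and write the result to dst."""
    # Binary mode: no decoding and no newline translation, so the only
    # line breaks in the output are the original ones plus those the
    # rules insert, and non-ASCII bytes pass through unchanged.
    with open(src, "rb") as f:
        data = f.read()
    with open(dst, "wb") as f:
        f.write(clean(data))
```

Usage would be something like clean_file("dump.txt", "dump-clean.txt"), with both file names being placeholders. The point of reading everything in one piece is that the tool never sees "lines" at all, so a 5.4 MB line is no different from any other input.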
If some simpler tool can do the job, I'll consider that as well (I have occasionally used awk, and maybe sed, though I don't think sed ever proved useful for me). Any help appreciated.