Line-ending anomalies are one of the reasons I prefer using Perl over sed
for this kind of grunt work.
I chomp() all incoming lines then add \n on the outbound stream. Also like
to use a different s///g delimiter than / to avoid LTS (leaning toothpick
syndrome).
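
The delimiter trick is not limited to Perl: sed also accepts another character after the s command. A minimal sketch, reusing two of the patterns from the script quoted below (the function name fix_urls is only an illustration, not part of the original script):

```shell
# Hypothetical helper: the same kind of substitution as in the quoted
# script, but with '|' as the s delimiter so the slashes in the URLs
# need no backslash escapes.
fix_urls() {
    sed -e 's|/new\.gif|/icons/new.gif|g' \
        -e 's|/iterhome/iter|/iter/local_share|g'
}

# Example: filter one line of HTML through the helper.
printf '%s\n' '<img src="/new.gif">' | fix_urls
```

Any character that does not occur in the pattern or the replacement can serve as the delimiter; '|' and '#' are common choices for URLs.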
Gene
On Wed, 17 May 2000, Prentice wrote:
> Thanks for the help, guys. Below is a portion of my sed script. It just goes
> through a bunch of html files and replaces outdated URLs with the correct
> locations. All substitutions have the same format.
>
> s/\/new\.gif/\/icons\/new.gif/g
> s/\/http\:\/\/.*\.pppl\.gov\/nstxhome\/nstx\/controls/nstx.pppl.gov\/local\/controls/g
> s/\/http\:\/\/.*\.pppl\.gov\/nstxhome\/nstx\/software/nstx.pppl.gov\/local\/software/g
> s/\/http\:\/\/.*\.pppl\.gov\/nstxhome\/nstxhome/nstx.pppl.gov/g
> s/\/http\:\/\/.*\.pppl\.gov\/nstxhome\/nstx/nstx.pppl.gov\/local/g
> s/\/iterhome\/iter/\/iter\/local_share/g
> s/\/iterhome/\/iter/g
> s/\/iter/\/iter/g
> s/\/database/\/iter\/database/g
>
> I think I may have solved my problem. Most of these html files were created on
> Macs or Windows PCs. Using tr, I replace the returns w/ a newline and then pipe
> it into sed:
>
> tr "[\r]" "[\n]" < ${1} | $sed > ${1}.new 2>/dev/null
>
> The "2>/dev/null" is to get rid of the sed errors about the missing newline on
> the last line of the file. This appears to be working with one exception. On
> the Macs and most PC editors, the files still appear normally after being
> massaged by tr and sed. The exception is Windows notepad - when the altered
> files are opened w/ notepad, they become one long line with some junk characters
> appearing in lieu of newlines. Wordpad, Word and Netscape composer don't have
> any problems. I figure if anyone uses notepad and complains, I can just tell them
> to use a better editor (which would be just about anything else).
>
> Ideally, I would like to return the files to the original condition regarding
> newlines and returns, but putting
>
> tr "[\n]" "[\r]"
>
> on the end of the above pipe didn't do it. Any ideas?
>
>
>
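
One possible reason the trailing tr did not restore the files: on DOS/Windows files, turning every CR into LF makes each CR LF pair into two LFs, so mapping LF back to CR yields CR CR rather than CR LF. A minimal sketch of an alternative that records the original style first and restores it after filtering. The function names eol_style and filter_keep_eol are hypothetical, and the sketch assumes each file uses a single line-ending style throughout:

```shell
# Hypothetical helper: report the line-ending style of a file as
# "crlf" (DOS/Windows), "cr" (classic Mac), or "lf" (Unix),
# based on whether the file contains CR and/or LF bytes.
eol_style() {
    cr=$(( $(tr -cd '\r' < "$1" | wc -c) ))
    lf=$(( $(tr -cd '\n' < "$1" | wc -c) ))
    if [ "$cr" -gt 0 ] && [ "$lf" -gt 0 ]; then
        echo crlf
    elif [ "$cr" -gt 0 ]; then
        echo cr
    else
        echo lf
    fi
}

# Hypothetical helper: run a filter on FILE with Unix line endings,
# then put the original line endings back on the way out.
# Usage: filter_keep_eol FILE COMMAND [ARGS...]
filter_keep_eol() {
    file=$1
    shift
    case $(eol_style "$file") in
        crlf) tr -d '\r' < "$file" | "$@" | awk '{ printf "%s\r\n", $0 }' ;;
        cr)   tr '\r' '\n' < "$file" | "$@" | tr '\n' '\r' ;;
        *)    "$@" < "$file" ;;
    esac
}
```

With these defined, something like `filter_keep_eol sm.html sed -f substitutions.sed > sm.html.new` would leave Mac files with CR endings and Windows files with CR LF endings, which should also keep notepad happy.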
> On Tue, 16 May 2000, Pete Peterson wrote:
> > The RedHat digest form of the redhat-list is severely hosed. Your message
> > just came today, though you sent it almost a week ago. The same thing
> > happened with a message *I* sent a week ago: it just appeared today,
> > although later messages have appeared previously. Perhaps you've
> > already received some suggestions, but I offer these anyhow:
> >
> > sed doesn't join lines unless you try very hard to make it do so (or it's
> > broken).
> >
> > It normally reads the input, one line at a time and runs all the commands
> > in sequence on that line. You *can* append multiple lines to the buffer
> > and delete the embedded newlines, but it doesn't happen gratuitously.
> >
> > Were you working, on Linux, with a file that was generated on a Unix
> > machine or some "foreign" file format such as DOS, Windoze, RSX-11, VMS,
> > CP/M, :-) ...? Perhaps there's some disagreement between sed and the
> > file on what constitutes a "line". Try "cat -v -e sm.html | less".
> > That will show up any strange characters, and each line should end with
> > "$" which is how that 'cat' command shows newline characters.
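
To illustrate what that check shows, here is a small sketch that feeds three made-up two-line samples through the same cat options. The wrapper name show_eol is hypothetical; -v and -e are the options named above and are available in GNU and BSD cat:

```shell
# Hypothetical wrapper around the command suggested above: make control
# characters visible and mark each newline with "$".
show_eol() {
    cat -v -e "$@"
}

# DOS/Windows sample: each line ends in CR LF.
printf 'one\r\ntwo\r\n' | show_eol
# Classic Mac sample: CR only, so there is no newline at all.
printf 'one\rtwo\r' | show_eol
echo
# Unix sample: LF only.
printf 'one\ntwo\n' | show_eol
```

With GNU cat, the DOS sample shows each line ending in "^M$", the Mac sample appears as the single unterminated line "one^Mtwo^M", and the Unix sample shows a plain "$" at the end of each line. A file that looks like the Mac sample would explain both the one-long-line output and the missing-newline warning.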
> >
> > It would be helpful if you showed us the content of 'substitutions.sed'
> > or at least a representative sample.
> >
> > Many Unix editors which, like me, are of the "do what I tell you, not
> > what you THINK I meant" persuasion, don't insist on putting a Newline at
> > the end of the file and some, like Emacs, allow you to specify whether
> > you want it to add one, ask you, or quietly accept what you gave it.
> > Some LINE-ORIENTED utilities, like sed, will give you a complaint if the
> > last line doesn't have a newline and will not consider it to be a "line"
> > if it doesn't. Just edit the file with vi, Emacs, or whatever your
> > favorite editor might be, insert the newline and that warning will go
> > away.
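
For a whole directory of files, the same fix can be scripted instead of done by hand in an editor. A minimal sketch, assuming the files already use Unix (LF) line endings; the function name ensure_final_newline is hypothetical:

```shell
# Hypothetical helper: append a newline to FILE only when the file is
# non-empty and its last byte is not already a newline.
ensure_final_newline() {
    # Command substitution strips a trailing newline, so the result is
    # empty exactly when the last byte is a newline.
    if [ -s "$1" ] && [ -n "$(tail -c 1 "$1")" ]; then
        printf '\n' >> "$1"
    fi
}
```

Running it over the files before sed, rather than sending stderr to /dev/null, keeps any other sed errors visible. On a CR-only file it adds the final LF but sed would still see the whole file as one line.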
> >
> >
> > pete
> >
> >
> >
> > pete peterson
> > GenRad, Inc.
> > 7 Technology Park Drive
> > Westford, MA 01886-0033
> >
> > [EMAIL PROTECTED] or [EMAIL PROTECTED]
> > +1-978-589-7478 (GenRad); +1-978-256-5829 (Home: Chelmsford, MA)
> > +1-978-589-2088 (Closest FAX); +1-978-589-7007 (Main GenRad FAX)
> >
> >
> >
> >
> > > Date: Wed, 10 May 2000 17:24:07 -0400
> > > From: Prentice <[EMAIL PROTECTED]>
> > > To: [EMAIL PROTECTED]
> > > Subject: OT: help w/ sed
> > >
> > >
> > > I rearranged the layout of a webserver, and I need to fix URLs that are no
> > > longer correct in the html files. I wrote a sed script to do this, and it makes
> > > the substitutions correctly, but it strips the newlines off the ends of the
> > > lines so the output is one long line. Sed also issues a warning about no
> > > newline at the end of the file:
> > >
> > > $sed -f substitutions.sed sm.html
> > > sed: Missing newline at end of file somefile.html.
> > > <a bunch of output from sed that's one single line.... >
> > >
> > > How do I keep sed from removing the newlines at the end of each line? What is
> > > causing that error message?
> > >
> > > Prentice
> > > [EMAIL PROTECTED]
> > > Princeton Plasma Physics Lab
> > > http://www.pppl.gov
>
--
-----------------------------------
Gene Wilburn -}{- [EMAIL PROTECTED]
http://www.NorthernJourney.com
-----------------------------------