Line-ending anomalies are one of the reasons I prefer using Perl over sed
for this kind of grunt work.

I chomp() all incoming lines, then add \n back on the outbound stream. I also
like to use an s///g delimiter other than / to avoid LTS (leaning toothpick
syndrome).
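
A rough sketch of the same two habits, written in plain shell and sed rather than Perl so it lines up with the script quoted below. The name fix_urls is made up for illustration, | is just one possible delimiter, and the three rules are copied from the quoted script:

```shell
# fix_urls: hypothetical helper; reads HTML on stdin, writes to stdout.
#
# "read" drops the trailing newline of each line (much like chomp), and
# the "|| [ -n ... ]" test keeps a final line that has no newline at all.
# printf then puts exactly one \n back on every line before sed sees it.
#
# The sed rules use | as the s delimiter, so the slashes in the paths
# need no backslashes.
fix_urls() {
    while IFS= read -r line || [ -n "$line" ]; do
        printf '%s\n' "$line"
    done | sed -e 's|/new\.gif|/icons/new.gif|g' \
               -e 's|/iterhome/iter|/iter/local_share|g' \
               -e 's|/iterhome|/iter|g'
}
```

This only deals with LF-terminated lines. Carriage returns pass through untouched, so CRLF or bare-CR files still need converting first.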

Gene

On Wed, 17 May 2000, Prentice wrote:

> Thanks for the help, guys. Below is a portion of my sed script. It just goes
> through a bunch of html files and replaces outdated URLs with the correct
> locations. All substitutions are the same format.
> 
> s/\/new\.gif/\/icons\/new.gif/g
> 
> s/\/http\:\/\/.*\.pppl\.gov\/nstxhome\/nstx\/controls/nstx.pppl.gov\/local\/controls/g
> 
> s/\/http\:\/\/.*\.pppl\.gov\/nstxhome\/nstx\/software/nstx.pppl.gov\/local\/software/g
> s/\/http\:\/\/.*\.pppl\.gov\/nstxhome\/nstxhome/nstx.pppl.gov/g
> s/\/http\:\/\/.*\.pppl\.gov\/nstxhome\/nstx/nstx.pppl.gov\/local/g
> s/\/iterhome\/iter/\/iter\/local_share/g
> s/\/iterhome/\/iter/g
> s/\/iter/\/iter/g
> s/\/database/\/iter\/database/g
> 
> I think I may have solved my problem. Most of these html files were created on
> Macs or Windows PCs. Using tr, I replace the returns w/ a newline and then pipe
> it into sed:
> 
> tr "[\r]" "[\n]" < ${1} | $sed > ${1}.new 2>/dev/null
> 
> The "2>/dev/null" is to get rid of the sed errors about the missing newline on
> the last line of the file. This appears to be working with one exception. On
> the Macs and most PC editors, the files still appear normally after being
> massaged by tr and sed. The exception is Windows notepad - when the altered
> files are opened w/ notepad, they become one long line with some junk characters
> appearing in lieu of newlines. Wordpad, Word and Netscape composer don't have
> any problems. I figure if anyone uses notepad and complains, I can just tell them
> to use a better editor (which would be just about anything else).
> 
> Ideally, I would like to return the files to the original condition regarding
> newlines and returns, but putting 
> 
> tr "[\n]" "[\r]"  
> 
> on the end of the above pipe didn't do it.  Any ideas?
> 
> 
> 
> On Tue, 16 May 2000, Pete Peterson wrote:
> > The RedHat digest form of the redhat-list is severely hosed.  Your message
> > just came today, though you sent it almost a week ago.  The same thing
> > happened with a message *I* sent a week ago: it just appeared today,
> > although later messages have appeared previously.  Perhaps you've
> > already received some suggestions, but I offer these anyhow:
> > 
> >   sed doesn't join lines unless you try very hard to make it do so (or it's
> >   broken). 
> > 
> >   It normally reads the input, one line at a time and runs all the commands
> >   in sequence on that line.  You *can* append multiple lines to the buffer
> >   and delete the embedded newlines, but it doesn't happen gratuitously.
> > 
> >   Were you working, on Linux, with a file that was generated on a Unix
> >   machine or some "foreign" file format such as DOS, Windoze, RSX-11, VMS,
> >   CP/M, :-)  ...?   Perhaps there's some disagreement between sed and the
> >   file on what constitutes a "line".  Try "cat -v -e sm.html | less".
> >   That will show up any strange characters, and each line should end with
> >   "$" which is how that 'cat' command shows newline characters.
> > 
> >   It would be helpful if you showed us the content of 'substitutions.sed'
> >   or at least a representative sample.
> > 
> >   Many Unix editors which, like me, are of the "do what I tell you, not
> >   what you THINK I meant" persuasion, don't insist on putting a Newline at
> >   the end of the file and some, like Emacs, allow you to specify whether
> >   you want it to add one, ask you, or quietly accept what you gave it.
> >   Some LINE-ORIENTED utilities, like sed, will give you a complaint if the
> >   last line doesn't have a newline and will not consider it to be a "line"
> >   if it doesn't.  Just edit the file with vi, Emacs, or whatever your
> >   favorite editor might be, insert the newline and that warning will go
> >   away.
> > 
> > 
> >           pete
> > 
> > 
> > 
> >         pete peterson
> >         GenRad, Inc.
> >         7 Technology Park Drive
> >         Westford, MA 01886-0033
> > 
> >         [EMAIL PROTECTED] or [EMAIL PROTECTED]
> >         +1-978-589-7478 (GenRad);  +1-978-256-5829 (Home: Chelmsford, MA)
> >         +1-978-589-2088 (Closest FAX); +1-978-589-7007 (Main GenRad FAX)
> >  
> > 
> > 
> > 
> > > Date: Wed, 10 May 2000 17:24:07 -0400
> > > From: Prentice <[EMAIL PROTECTED]>
> > > To: [EMAIL PROTECTED]
> > > Subject: OT: help w/ sed
> > > 
> > > 
> > > I rearranged the layout of a webserver, and I need to fix URLs that are no
> > > longer correct in the html files. I wrote a sed script to do this, and it makes
> > > the substitutions correctly, but it strips the newlines off the ends of the
> > > lines so the output is one long line. Sed also issues a warning about no
> > > newline at the end of the file:
> > > 
> > > $sed -f substitutions.sed sm.html
> > > sed: Missing newline at end of file somefile.html.
> > > <a bunch of output from sed  that's one single line.... >
> > > 
> > > How do I keep sed from removing the newlines at the end of each line? What is
> > > causing that error message?
> > >
> > > Prentice
> > > [EMAIL PROTECTED]
> > > Princeton Plasma Physics Lab
> > > http://www.pppl.gov
> 
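
On the question quoted above about putting the original line endings back: one possible sketch, assuming the files used CRLF (DOS/Windows) endings to begin with. The helper names to_lf and to_crlf are made up for illustration; sm.html and substitutions.sed are the names from the quoted messages. Old Mac files with a bare CR would instead need tr '\r' '\n' on the way in and tr '\n' '\r' on the way out.

```shell
# A literal carriage return, so the sed scripts do not rely on \r support.
cr=$(printf '\r')

# to_lf: strip the CR from each CRLF line ending (stdin -> stdout).
to_lf()   { sed "s/${cr}\$//"; }

# to_crlf: put a CR back at the end of each line (stdin -> stdout).
to_crlf() { sed "s/\$/${cr}/"; }

# Intended use, with the file names from the quoted messages:
#   to_lf < sm.html | sed -f substitutions.sed | to_crlf > sm.html.new
```

The conversion back has to be the last stage of the pipe, before the output redirection, so that it sees what sed wrote.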

-- 
-----------------------------------
Gene Wilburn -}{- [EMAIL PROTECTED]
  http://www.NorthernJourney.com
-----------------------------------


