On 11/10/05, Sourabh Bora <[EMAIL PROTECTED]> wrote:
> Hi,
> I am making a small tool for offline web browsing. For this I need
> to change the source of HTML files.
> Let me explain:
>
> In a web page the hyperlinks are written as
>
> href="http://www.micronux.com/catalog/"
>
> I want this particular string converted to
>
> href="./micronux.com_catalog"
>
> The logic is:
> 1) delete http://www.
> 2) replace '/', '?', etc. with '_'
>
> I want to write a script using sed or awk that will do all the conversion
> in a file.
Since the responses so far have suggested alternatives rather than explaining how to do what you're asking, and knowing how to do this sort of thing is valuable in and of itself, here are some examples, though not a complete sed script. In fact, I'm going to use Perl's regular expressions, since those are the ones with which I'm most familiar.

    s|http://www\.||i

This does a case-insensitive replacement of "http://www." with "" (the empty string). The "|" is the delimiter around the search target and the replacement. The standard delimiter is a slash, but then you'd have to escape every slash in the URL and write

    s/http:\/\/www\.//i

which is a bit harder to read.

For the other replacements, \w matches "word" characters, where word characters are a-z, A-Z, 0-9, and _. You can do the replacement as

    s|[^\w.\s]|_|g

I've used "|" again for consistency. The "g" at the end tells Perl to make this replacement as many times as it can on the current line, not just once. The expression in brackets means anything that is not (^) a word character (\w), a period (just . here, since a period inside brackets is literal), or whitespace (\s).

To prepend the "./", you can do

    s|^|\./|

where the caret, now outside of brackets, matches the beginning of the line.

If you have a URL saved in $url, you can then do the following in Perl:

    $url =~ s|http://www\.||i;
    $url =~ s|[^\w.\s]|_|g;
    $url =~ s|^|\./|;

Note that this doesn't quite do what you want, since it produces a trailing "_" (http://www.micronux.com/catalog/ becomes ./micronux.com_catalog_). I'll leave getting rid of that as an exercise, with the added note that "$" matches the end of a line.

--
Michael A. Marsh
http://www.umiacs.umd.edu/~mmarsh
http://mamarsh.blogspot.com
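For reference, the three substitutions above can be combined into one small Perl sketch that also applies them to href attributes in a chunk of HTML. This is a sketch under stated assumptions, not the poster's own code: the names url_to_local and rewrite_hrefs are hypothetical, the trailing-"_" removal is one possible answer to the exercise, and the href pattern assumes double-quoted attributes pointing at http://www. URLs (single-quoted or unquoted attributes are left untouched).

```perl
use strict;
use warnings;

# Hypothetical helper: the three substitutions from the reply, plus an
# assumed extra step that strips trailing "_" (the "exercise").
sub url_to_local {
    my ($url) = @_;                # copy the argument before running any regex
    $url =~ s|http://www\.||i;     # drop "http://www." (case-insensitive)
    $url =~ s|[^\w.\s]|_|g;        # anything not a word char, '.', or whitespace -> '_'
    $url =~ s|_+$||;               # assumption: strip trailing underscores
    $url =~ s|^|\./|;              # prepend "./"
    return $url;
}

# Hypothetical helper: rewrite every href="http://www..." value in a string.
# Assumes double-quoted href attributes; /e evaluates the replacement as code.
sub rewrite_hrefs {
    my ($html) = @_;
    $html =~ s{href="(http://www\.[^"]*)"}{'href="' . url_to_local($1) . '"'}gie;
    return $html;
}

my $sample = '<a href="http://www.micronux.com/catalog/">Catalog</a>';
print rewrite_hrefs($sample), "\n";
# prints <a href="./micronux.com_catalog">Catalog</a>
```

To convert a saved page, one could read the whole file into a string, pass it through rewrite_hrefs, and write the result back out. For anything beyond simple pages, an HTML parser is more robust than a regular expression.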