On Wed, 8 Nov 2006 15:23:44 -0500
Joey Hess <[EMAIL PROTECTED]> wrote:

> >     # usage: URLgrep pattern URL
> >     # (ad hoc grep switches return first instance of 'pattern' 
> >     # in URL and next line, with numbered lines.)
> >     URLgrep() { wget -o /dev/null --output-document=- "$2" |
> > html2text -ascii -nobs | grep -in -A 1 -m 1  "$1" ; }
> 
> The fact that this can be implemented as a simple shell pipeline (or
> more likely, as many different shell pipelines, depending on exact
> need) is a good indication that it's not a good candidate for
> moreutils.

I think you're about half right, though not for the same reasons, but let's
start with the other half...

The premise that a util shouldn't exist because it's comparatively easy to
implement seems questionable.  Why have a 'head' or 'tail' command if a
'sed' or 'awk' one-liner can do the same?

It's not likely that ordinary users would remember byzantine
switchery like '-o /dev/null --output-document=- "$2"' or '-ascii
-nobs'.  I can't, yet I wrote it; it doesn't seem "simple" to me.
It took a lot of trial and error to find switches that worked.
Even the knowledge that the 'wget' and 'html2text' utils exist, and that
they can be piped together, isn't something to take for granted.

On the other hand, you're right to notice that 'URLgrep' is too ad hoc.
That got me thinking: the flaw of 'URLgrep' is that it's not general
enough.  A 'URLcat' would be much more general.

 URLcat() { wget -o /dev/null --output-document=- "$1" | html2text -ascii -nobs ; }

(I guess if it's called 'cat', there should probably be a loop so it
can take multiple URLs and actually 'cat' them...)
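A looping version might look something like this (just a sketch: it assumes
'wget' and 'html2text' are installed and on PATH):

```shell
# URLcat: dump one or more URLs to stdout as plain text, in order.
# Assumes wget and html2text are installed.
URLcat() {
    for url in "$@"; do
        wget -o /dev/null --output-document=- "$url" | html2text -ascii -nobs
    done
}
```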

Then we can pipe that to 'grep' or 'wc' or just about anything, and have
all the command line switches of those utils for free.  It's even
"simpler" than before and yet a much more versatile tool.
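For example, the original 'URLgrep' falls out as a one-liner on top of
'URLcat' (again a sketch assuming wget and html2text, plus GNU grep for
the -A and -m switches):

```shell
# URLcat: dump a URL as plain text.
URLcat() { wget -o /dev/null --output-document=- "$1" | html2text -ascii -nobs ; }

# The old URLgrep becomes a thin wrapper; wc, sed, less, etc. compose the same way.
URLgrep() { URLcat "$2" | grep -in -A 1 -m 1 "$1" ; }
```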

