Re: Help with regex

Rasoul Hajikhani Tue, 18 Sep 2001 10:51:21 -0700
Curtis,
thanks for your email. My $args is just a string. Would that make any
difference?
-r
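
For what it's worth, a plain string should not by itself stop the match. A minimal sketch, assuming `$args` holds one tag per string: the helper names `matches_original` and `matches_relaxed` are hypothetical, the first wraps the regex from the original question, and the second wraps a looser variant (requiring at least one whitespace character between `a` and `href`, which is an assumption of this sketch rather than the regex quoted below).

```perl
use strict;
use warnings;

# Hypothetical helper: the regex from the original question, unchanged.
sub matches_original {
    my ($args) = @_;
    return $args =~ /^\<A HREF=.*/i ? 1 : 0;
}

# Hypothetical helper: tolerate whitespace (including newlines) inside the tag.
# \s+ between "a" and "href" is an assumption made for this sketch.
sub matches_relaxed {
    my ($args) = @_;
    return $args =~ /^<a\s+href\s*=/i ? 1 : 0;
}

my @samples = (
    '<A HREF="test.cgi">',                # plain string, tag at the very start
    '  <A HREF="test.cgi">',              # leading whitespace before the tag
    "<a\n href=\n \"somefile.html\"\n>",  # tag split across lines in one string
    '<a name="bob">',                     # anchor without an href
);

for my $args (@samples) {
    (my $shown = $args) =~ s/\n/\\n/g;    # make embedded newlines visible
    printf "%-36s original=%d relaxed=%d\n",
        $shown, matches_original($args), matches_relaxed($args);
}
```

If the first sample matches but the real data does not, the usual suspects are leading whitespace, a newline inside the tag, or more than one space between `A` and `HREF`, so it may be worth printing `$args` in delimiters to see exactly what it contains.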

Curtis Poe wrote:
> 
> --- Rasoul Hajikhani <[EMAIL PROTECTED]> wrote:
> > Hi there,
> > I am trying to match an expression that would perform different tasks
> > depending on the returned value:
> >
> >       #if (arguments begin with "<A HREF=")
> >       if ($args =~ /^\<A HREF=.*/i)
> >       {
> >               # do this
> >       }
> >       else
> >       {
> >               # do this
> >       }
> >
> > but it always fails to return anything. Can someone tell me what I am
> > doing wrong? Appreciate all the help...
> > -r
> 
> Parsing HTML with a regular expression is difficult and error-prone.  I would
> strongly recommend against it.  The following snippet only works for a very
> small test case:
> 
>     foreach my $args ( <DATA> ) {
>         if ($args =~ /^<\s*a\s*href\s*=/i) {
>             print "HREF: $args";
>         } else  {
>             print "Not an HREF: $args";
>         }
>     }
>     __DATA__
>     <a href="test.cgi">
>     <a hREf = "something_else.htm">
>     <a name="bob">
>     <a    href   =   '#bob'>
> 
> Knowing how your data gets into the system is at least as important as how
> your data leaves the system.  Knowing your data source allows you to craft a
> better solution to the problem.  For example, consider your regex:
> 
>     /^\<A HREF=.*/i
> 
> What is the source of the data?  Is it generated by another process or could
> humans affect it?  There are several places where you can insert whitespace
> into that anchor tag, have valid HTML, and cause your regex to fail.  Here's
> an example which will break your code *and* mine:
> 
>     <a
>      href=
>      "somefile.html"
>     >
> 
> That's annoying, but some of the documents I get have HTML formatted like
> that.  Also, you don't need the dot star at the end.  You don't use that
> information and forcing the regex engine to match it is wasteful.
> 
> I would recommend learning to use HTML::TokeParser or a similar module to
> parse HTML.  If you are only extracting links, try HTML::LinkExtor.
> 
> Cheers,
> Curtis "Ovid" Poe
> 
> =====
> Senior Programmer
> Onsite! Technology (http://www.onsitetech.com/)
> "Ovid" on http://www.perlmonks.org/
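
The HTML::TokeParser suggestion above could look roughly like the following. This is only a sketch under stated assumptions: the helper name `leading_href` is hypothetical, `$args` is assumed to be a plain string, and HTML::TokeParser (part of the HTML-Parser distribution on CPAN, not a core module) is assumed to be installed.

```perl
use strict;
use warnings;
use HTML::TokeParser;   # from CPAN (HTML-Parser distribution); not core Perl

# Hypothetical helper: if the string begins with an <a ...> start tag that
# carries an href attribute, return the href value; otherwise return undef.
sub leading_href {
    my ($args) = @_;

    # Passing a scalar reference makes the parser read from the string itself.
    my $parser = HTML::TokeParser->new(\$args) or return;
    my $token  = $parser->get_token or return;

    # Start-tag tokens have the shape ["S", $tag, \%attr, \@attrseq, $text];
    # tag and attribute names are lower-cased by the parser.
    return unless $token->[0] eq 'S' && $token->[1] eq 'a';
    return $token->[2]{href};
}

for my $args (
    '<a href="test.cgi">',
    '<a hREf = "something_else.htm">',
    '<a name="bob">',
    "<a\n href=\n \"somefile.html\"\n>",
) {
    my $href = leading_href($args);
    print defined $href ? "HREF: $href\n" : "Not an HREF\n";
}
```

Because the parser, not a pattern, decides where the tag and its attributes begin and end, the multi-line anchor from the message above should be handled the same way as the single-line ones. A string with leading whitespace or text before the tag yields a text token first, so this sketch treats it as not beginning with an anchor.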
