Re: Extracting Titles from a bunch of HTML files

Randal L. Schwartz Wed, 02 Jan 2002 07:34:55 -0800

>>>>> "Richard" == Richard S Crawford <[EMAIL PROTECTED]> writes:


Richard> I have a directory containing over 250 HTML files.  What is the best
Richard> way to extract the title (between the <TITLE> and </TITLE> tags) of
Richard> each file without having to open the file and read in the contents,
Richard> which seems like it would be very slow?

Psychic Powers?

No, seriously, you're gonna have to open it.  But you can minimize the
fuss and muss by using HTML::HeadParser, which reads a minimal amount
until it can see the entire head (usually a small part of the
document), and then gives you an interface to ask for just the title.

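    use HTML::HeadParser;
    # <*> globs every entry in the current directory
    for (<*>) {
      print "$_: ";
      # parse_file returns false if the file cannot be opened
      if ((my $hp = HTML::HeadParser->new)->parse_file($_)) {
        # header() is an HTTP::Headers object; title is empty if no <TITLE>
        if (my $title = $hp->header->title) {
          print $title;
        } else {
          print "[none]";
        }
      } else {
        print "[cannot parse]";
      }
      print "\n";
    }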
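
To make the "reads a minimal amount" part concrete, here is a rough sketch of feeding the parser by hand instead of calling parse_file. It relies on the documented behavior that parse() returns false once the head is complete. The helper name title_of and the 512-byte chunk size are my own choices for illustration, not anything the module requires.

```perl
use strict;
use warnings;
use HTML::HeadParser;

# Hypothetical helper: return the title of one HTML file, or undef if
# the file cannot be opened or no title was found.
sub title_of {
    my ($path) = @_;
    open my $fh, '<', $path or return undef;

    my $hp = HTML::HeadParser->new;
    my $chunk;
    # Feed the file in small pieces; parse() returns false as soon as
    # the parser has seen the end of the head, so we stop reading there.
    while (read $fh, $chunk, 512) {
        $hp->parse($chunk) or last;
    }
    close $fh;
    $hp->eof;    # flush anything still buffered in the parser

    my $title = $hp->header('Title');
    return $title;
}
```

Something like `print title_of($_) // "[none]", "\n" for <*>;` would then list the titles. For a file with a large body, only the chunks up to the end of the head are ever read.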

-- 
Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
<[EMAIL PROTECTED]> <URL:http://www.stonehenge.com/merlyn/>
Perl/Unix/security consulting, Technical writing, Comedy, etc. etc.
See PerlTraining.Stonehenge.com for onsite and open-enrollment Perl training!
