>>>>> "Richard" == Richard S Crawford <[EMAIL PROTECTED]> writes:
Richard> I have a directory containing over 250 HTML files. What is the best
Richard> way to extract the title (between the <TITLE> and </TITLE> tags) of
Richard> each file without having to open the file and read in the contents,
Richard> which seems like it would be very slow?

Psychic Powers?

No, seriously, you're gonna have to open it. But you can minimize the
fuss and muss by using HTML::HeadParser, which reads a minimal amount
until it can see the entire head (usually a small part of the
document), and then gives you an interface to ask for just the title.

    use HTML::HeadParser;

    for (<*>) {    # every entry in the current directory
        print "$_: ";
        # parse_file returns false if the file cannot be opened
        if ((my $hp = HTML::HeadParser->new)->parse_file($_)) {
            if (my $title = $hp->header->title) {
                print $title;
            } else {
                print "[none]";
            }
        } else {
            print "[cannot parse]";
        }
        print "\n";
    }
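
If you're curious what "reads a minimal amount" looks like in practice,
here is a rough sketch of the same idea with the chunked read spelled out
by hand. It relies on the documented behavior that parse() returns false
once the head is finished. The helper name title_of, the 512-byte chunk
size, and the *.html/*.htm glob are my own assumptions for illustration,
not anything HTML::HeadParser requires.

```perl
use strict;
use warnings;
use HTML::HeadParser;

# title_of() is a hypothetical helper name, not part of HTML::HeadParser.
# Returns the title string, or undef if there is none or the file is unreadable.
sub title_of {
    my ($path) = @_;
    open my $fh, '<', $path or return;
    my $hp = HTML::HeadParser->new;
    # Feed the parser small chunks. parse() returns false as soon as the
    # head is complete, so we stop reading at that point.
    while (read($fh, my $chunk, 512)) {
        $hp->parse($chunk) or last;
    }
    close $fh;
    $hp->eof;    # flush anything still buffered if the file ended early
    return scalar $hp->header('Title');
}

# Assumption: the HTML files are the ones named *.html or *.htm.
for my $file (glob '*.html *.htm') {
    my $title = title_of($file);
    printf "%s: %s\n", $file, defined $title ? $title : '[none]';
}
```

For a typical page this touches only the first chunk or two of each file,
which is about as cheap as it gets without those psychic powers.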
--
Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
<[EMAIL PROTECTED]> <URL:http://www.stonehenge.com/merlyn/>
Perl/Unix/security consulting, Technical writing, Comedy, etc. etc.
See PerlTraining.Stonehenge.com for onsite and open-enrollment Perl training!