On Thu, Jan 1, 2009 at 10:16, <[email protected]> wrote:
snip
> use HTML::TableExtract;
snip
This loads the HTML::TableExtract module so that your program can use its code.
snip
> $te = HTML::TableExtract->new( headers => [qw(Date Price Cost)] );
snip
This creates (instantiates) a new object of the class
HTML::TableExtract. The constructor takes a headers argument that
tells it which column headers to look for.
snip
> $te->parse($html_string);
snip
This line tells the object to look, in the HTML contained in
$html_string, for the table specified by the headers passed to the
constructor. If this method call returns a true value then $te will
contain the data from the tables that matched. If it returns a false
value then no table matched.
You can get a handle to each table found by calling the tables method
on $te. You should be able to call the column method on each table to
print the desired column, but there appears to be a bug in at least
the latest version of the code (2.10, dating from 2006). The bug
occurs around line 900, where the module's author tries to use a row
object as an index. You can change that function to look like this:
sub column {
    my $self = shift;
    my $c    = shift;
    my @column;
    my $r = 0;    # numeric row index, used instead of the row object
    foreach my $row ($self->rows) {
        push(@column, $self->cell($r++, $c));
    }
    wantarray ? @column : \@column;
}
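The last line of that function uses wantarray so the caller gets a list in list context and an array reference in scalar context. Here is a minimal sketch of just that idiom, using only core Perl; the squares function is a made-up helper for illustration and is not part of HTML::TableExtract.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of HTML::TableExtract): returns the
# squares of its arguments as a list in list context, or as an array
# reference in scalar context.
sub squares {
    my @result = map { $_ * $_ } @_;
    return wantarray ? @result : \@result;
}

my @list = squares(1, 2, 3);    # list context: a plain list
my $ref  = squares(1, 2, 3);    # scalar context: an array reference

print "list:   @list\n";
print "ref:    @$ref\n";
print "is ref: ", ref($ref), "\n";
```

This is why for my $cell ($table->column(0)) can loop over the cells directly: a for loop calls the method in list context.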
Here is a program similar to what you described.
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TableExtract;
my $te = HTML::TableExtract->new(headers => [qw/foo bar baz/])
    or die "could not create table extract object\n";

# read in all of the lines from the DATA section and join them
# into one scalar to pass to the parse method
$te->parse(join "", <DATA>)
    or die "could not find table\n";

my $i = 1;
for my $table ($te->tables) {
    print "table $i column 0:\n";
    $i++;
    for my $cell ($table->column(0)) {
        print "\t$cell\n";
    }
}
__DATA__
<table>
<tr><th>foo</th><th>bar</th><th>baz</th></tr>
<tr><td>1</td><td>a</td><td>z</td></tr>
<tr><td>2</td><td>b</td><td>y</td></tr>
<tr><td>3</td><td>c</td><td>x</td></tr>
</table>
<table>
<tr><th>foo</th><th>bar</th><th>baz</th></tr>
<tr><td>1</td><td>a</td><td>z</td></tr>
<tr><td>2</td><td>b</td><td>y</td></tr>
<tr><td>3</td><td>c</td><td>x</td></tr>
</table>
And here is one that works without fixing the broken module:
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TableExtract;
my $te = HTML::TableExtract->new(headers => [qw/foo bar baz/])
    or die "could not create table extract object\n";

# read in all of the lines from the DATA section and join them
# into one scalar to pass to the parse method
$te->parse(join "", <DATA>)
    or die "could not find table\n";

my $i = 1;
for my $table ($te->tables) {
    print "table $i column 0:\n";
    $i++;
    for my $col ($table->columns) {
        for my $cell (@$col) {
            print "\t$cell\n";
        }
        last;    # only the first column is wanted
    }
}
__DATA__
<table>
<tr><th>foo</th><th>bar</th><th>baz</th></tr>
<tr><td>1</td><td>a</td><td>z</td></tr>
<tr><td>2</td><td>b</td><td>y</td></tr>
<tr><td>3</td><td>c</td><td>x</td></tr>
</table>
<table>
<tr><th>foo</th><th>bar</th><th>baz</th></tr>
<tr><td>1</td><td>a</td><td>z</td></tr>
<tr><td>2</td><td>b</td><td>y</td></tr>
<tr><td>3</td><td>c</td><td>x</td></tr>
</table>
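If you want a column other than the first, another option is to build it from the rows yourself. This is only a sketch: it assumes rows gives back one array reference per table row, as the module's documentation describes, and column_of is a made-up helper name. So that it runs without the module installed, the @rows data below is a hand-written stand-in for what $table->rows would return for the sample table above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of HTML::TableExtract): given a column
# index and a list of rows (each an array reference), return the cells
# in that column.
sub column_of {
    my ($c, @rows) = @_;
    my @column = map { $_->[$c] } @rows;
    return wantarray ? @column : \@column;
}

# Hand-written stand-in for $table->rows on the sample table above
# (assumption: the header row is not included in the data rows)
my @rows = ([1, 'a', 'z'], [2, 'b', 'y'], [3, 'c', 'x']);

my @first  = column_of(0, @rows);
my @second = column_of(1, @rows);

print "column 0: @first\n";
print "column 1: @second\n";
```

In a real program you would call it as column_of(1, $table->rows) inside the loop over $te->tables.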
I will see what I can do about getting the module fixed in CPAN.
--
Chas. Owens
wonkden.net
The most important skill a programmer can have is the ability to read.