On Wednesday, Nov 26, 2003, at 12:30 US/Pacific, Paul Kraus wrote:
Someone want to show me how this module can help parse out html?
I want to grab the text between <td>text</td>, and be able to apply a regexp to get what I want.
The problem is my text is among 10,000 td tags, with the only difference
being what the above <th> tag has in it.
So if th tag = then store text between <td> into an array.

My first concern here is: did you mean <th> or <tr>?
A simple table would look like:
<table>
<tr>
<th>header1</th>
<th>header2</th>
<th>header3</th>
</tr>
<tr>
<td>_Row_1_Cell_1_</td>
<td>_Row_1_Cell_2_</td>
<td>_Row_1_Cell_3_</td>
</tr>
<tr>
<td>_Row_2_Cell_1_</td>
<td>_Row_2_Cell_2_</td>
<td>_Row_2_Cell_3_</td>
</tr>
<tr>
<td>_Row_3_Cell_1_</td>
<td>_Row_3_Cell_2_</td>
<td>_Row_3_Cell_3_</td>
</tr>
</table>

You have almost written your algorithm:

    # assumes $p is an HTML::TokeParser::Simple parser for the page
    while ( my $token = $p->get_token )
    {
        last if ( $token->is_start_tag('table') );
    }

    # there is a table opening tag; our hope now is that
    # we can get our keys from the headers
    my $count  = 0;
    my $header = {};

    while ( my $token = $p->get_token )
    {
        next if ( $token->is_start_tag( qr/^t[rh]$/ ) );    # don't care
        last if ( $token->is_end_tag('/tr') );               # finished with headers
        if ( $token->is_end_tag('/th') )
        {
            $count++;    # header cell closed, move on to the next column
            next;
        }
        if ( $token->is_text() )
        {
            my $text = $token->as_is();
            # <some_pattern> is a placeholder for your own regexp
            $header->{$count} = $text
                if ( $text =~ <some_pattern> );
        }
    }

    #
    # read the first row of headers, now to meander forward
    #

At this point we know that if defined($header->{$count}) is true,
this is a column we have to grot data from and put into whatever storage you have set up.
That works basically the same way we grotted out the header sections, and is left as an exercise for the reader.
CAVEAT: simply because it looks like Perl, does not mean that I have written Perl, or that the code will actually work. It is merely a demonstration in algorithm creation.
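
For anyone who wants something closer to runnable, here is a minimal sketch of the whole idea in one pass: remember which column numbers have a matching <th>, then keep the <td> text that shows up in those columns. It assumes HTML::TokeParser::Simple is installed from CPAN. The sample HTML, the sub name cells_under_header and the $wanted pattern are all made up for illustration, and it assumes each cell holds a single run of plain text.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TokeParser::Simple;

# Hypothetical sample input, shaped like the table above.
my $html = <<'HTML';
<table>
<tr><th>header1</th><th>header2</th><th>header3</th></tr>
<tr><td>_Row_1_Cell_1_</td><td>_Row_1_Cell_2_</td><td>_Row_1_Cell_3_</td></tr>
<tr><td>_Row_2_Cell_1_</td><td>_Row_2_Cell_2_</td><td>_Row_2_Cell_3_</td></tr>
</table>
HTML

# Hypothetical pattern: which header(s) we care about.
my $wanted = qr/^header2$/;

# Return the text of every <td> whose column header matches $wanted.
sub cells_under_header {
    my ( $html, $wanted ) = @_;
    my $p = HTML::TokeParser::Simple->new( \$html );

    my %keep;            # column index => header text, for matching headers
    my @found;           # cell text collected from the matching columns
    my $col     = 0;     # index of the cell we are currently in
    my $in_cell = '';    # 'th', 'td', or '' when between cells

    while ( my $token = $p->get_token ) {
        if    ( $token->is_start_tag('tr') ) { $col     = 0 }       # new row
        elsif ( $token->is_start_tag('th') ) { $in_cell = 'th' }
        elsif ( $token->is_start_tag('td') ) { $in_cell = 'td' }
        elsif ( $token->is_end_tag('th') || $token->is_end_tag('td') ) {
            $in_cell = '';
            $col++;                                   # next column in this row
        }
        elsif ( $token->is_text && $in_cell ) {
            my $text = $token->as_is;
            if ( $in_cell eq 'th' ) {
                # header cell: remember the column if it matches
                $keep{$col} = $text if $text =~ $wanted;
            }
            elsif ( exists $keep{$col} ) {
                # data cell sitting under a header we wanted
                push @found, $text;
            }
        }
    }
    return @found;
}

print "$_\n" for cells_under_header( $html, $wanted );
```

With the sample table this should print the two cells that sit under header2. If your cells contain nested markup or entities, the text may arrive as several tokens, so you would need to join them before matching.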
ciao drieux
