[EMAIL PROTECTED] wrote:
> I'm trying to get the data on the "Central London Property Price Guide"
> box at the left hand side of this page
> http://www.findaproperty.com/regi0018.html
>
> I have managed to get the data :) but when I start looking for tables I
> only get tables of depth 1 how do I go about accessing inner tables?
> same happens for links...
>
> this is what I've got so far
>
> import sys
> from urllib import urlopen
> from BeautifulSoup import BeautifulSoup
>
> data = urlopen('http://www.findaproperty.com/regi0018.html').read()
> soup = BeautifulSoup(data)
>
> for tables in soup('table'):
>     table = tables('table')
>     if not table: continue
>     print table  # this returns only 1 table
There's something fishy here. soup('table') is shorthand for
soup.findAll('table'), which searches the whole tree recursively, so it
should yield all the tables in the document, even nested ones. For
example, this program:
data = '''
<body>
<table width='100%'>
<tr><td>
<TABLE WIDTH='150'>
<tr><td>Stuff</td></tr>
</table>
</td></tr>
</table>
</body>
'''
from BeautifulSoup import BeautifulSoup as BS
soup = BS(data)
for table in soup('table'):
    print table.get('width')
prints:
100%
150
Another tidbit: if I open the page in Firefox, save it, and feed the
saved file to BeautifulSoup, it finds 25 tables, and this code finds the
table you want:
from BeautifulSoup import BeautifulSoup
data2 = open('regi0018-firefox.html')
soup = BeautifulSoup(data2)
print len(soup('table'))
priceGuide = soup('table', dict(bgcolor="#e0f0f8", border="0",
                                cellpadding="2", cellspacing="2",
                                width="150"))[1]
print priceGuide.tr
prints:
25
<tr><td bgcolor="#e0f0f8" valign="top"><font face="Arial"
size="2"><b>Central London Property Price Guide</b></font></td></tr>
Looking at the saved file, Firefox has clearly done some cleanup. So I
think you have to look at why BS is not processing the original data the
way you want. It seems to be choking on something.
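One way to see what's really in the raw markup, independent of how
BeautifulSoup recovers from bad HTML, is to count <table> start tags
with the standard library parser. This is just a diagnostic sketch (the
module is html.parser in Python 3, HTMLParser in Python 2); if it
reports more tables than BeautifulSoup does, the original page's markup
is what's tripping BS up:

```python
from html.parser import HTMLParser  # in Python 2: from HTMLParser import HTMLParser

class TableCounter(HTMLParser):
    """Count <table> start tags and track how deeply they nest."""
    def __init__(self):
        HTMLParser.__init__(self)
        self.count = 0      # total <table> start tags seen
        self.depth = 0      # current nesting depth
        self.max_depth = 0  # deepest nesting seen

    def handle_starttag(self, tag, attrs):
        if tag == 'table':
            self.count += 1
            self.depth += 1
            self.max_depth = max(self.max_depth, self.depth)

    def handle_endtag(self, tag):
        if tag == 'table' and self.depth:
            self.depth -= 1

counter = TableCounter()
counter.feed("<table><tr><td>"
             "<table><tr><td>Stuff</td></tr></table>"
             "</td></tr></table>")
print(counter.count, counter.max_depth)  # prints: 2 2
```

You could feed it the raw data from urlopen and compare the count
against len(soup('table')) to confirm whether tables are being dropped
during parsing.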
Kent
--
http://mail.python.org/mailman/listinfo/python-list