[EMAIL PROTECTED] wrote:
> Kent Johnson writes:
>
>
>>[EMAIL PROTECTED] wrote:
>>
>>
>>>List of states:
>>>http://en.wikipedia.org/wiki/U.S._state
>>>
>>>: soup = BeautifulSoup(html)
>>>: # Get the second table (list of states).
>>>: table = soup.first('table').findNext('table')
>>>: print table
>>>
>>>...
>>><tr>
>>><td>WY</td>
>>><td>Wyo.</td>
>>><td><a href="/wiki/Wyoming" title="Wyoming">Wyoming</a></td>
>>><td><a href="/wiki/Cheyenne%2C_Wyoming" title="Cheyenne,
>>>Wyoming">Cheyenne</a></td>
>>><td><a href="/wiki/Cheyenne%2C_Wyoming" title="Cheyenne,
>>>Wyoming">Cheyenne</a></td>
>>><td><a href="/wiki/Image:Flag_of_Wyoming.svg" class="image" title=""><img
>>>src="http://upload.wikimedia.org/wikipedia/commons/thumb/b/bc/Flag_of_Wyomin
>>>g.svg/45px-Flag_of_Wyoming.svg.png" width="45" alt="" height="30"
>>>longdesc="/wiki/Image:Flag_of_Wyoming.svg" /></a></td>
>>></tr>
>>></table>
>>>
>>>Of each row (tr), I want to get the cells (td): 1,3,4
>>>(postal,state,capital). But cells 3 and 4 have anchors.
>>
>>So dig into the cells and get the data from the anchor.
>>
>>cells = row('td')
>>cells[0].string
>>cells[2]('a').string
>>cells[3]('a').string
>>
>>Kent
>>
>>_______________________________________________
>>Tutor maillist - Tutor@python.org
>>http://mail.python.org/mailman/listinfo/tutor
>
>
> for row in table('tr'):
>     cells = row('td')
>     print cells[0]
>
> IndexError: list index out of range

It works for me:

In [1]: from BeautifulSoup import BeautifulSoup as bs

In [2]: soup=bs('''<tr>
   ...: <td>WY</td>
   ...: <td>Wyo.</td>
   ...: <td><a href="/wiki/Wyoming" title="Wyoming">Wyoming</a></td>
   ...: <td><a href="/wiki/Cheyenne%2C_Wyoming" title="Cheyenne,
   ...: Wyoming">Cheyenne</a></td>
   ...: <td><a href="/wiki/Cheyenne%2C_Wyoming" title="Cheyenne,
   ...: Wyoming">Cheyenne</a></td>
   ...: <td><a href="/wiki/Image:Flag_of_Wyoming.svg" class="image" title=""><img
   ...: src="http://upload.wikimedia.org/wikipedia/commons/thumb/b/bc/Flag_of_Wyomin
   ...: g.svg/45px-Flag_of_Wyoming.svg.png" width="45" alt="" height="30"
   ...: longdesc="/wiki/Image:Flag_of_Wyoming.svg" /></a></td>
   ...: </tr>
   ...: </table> '''
   ...:
   ...:
   ...:
   ...: )

In [18]: rows=soup('tr')

In [19]: rows
Out[19]:
[<tr>
<td>WY</td>
<td>Wyo.</td>
<td><a href="/wiki/Wyoming" title="Wyoming">Wyoming</a></td>
<td><a href="/wiki/Cheyenne%2C_Wyoming" title="Cheyenne,
Wyoming">Cheyenne</a></td>
<td><a href="/wiki/Cheyenne%2C_Wyoming" title="Cheyenne,
Wyoming">Cheyenne</a></td>
<td><a href="/wiki/Image:Flag_of_Wyoming.svg" class="image" title=""><img src="http://upload.
g.svg/45px-Flag_of_Wyoming.svg.png" width="45" alt="" height="30" longdesc="/wiki/Image:Flag_
</tr>]

In [21]: cells=rows[0]('td')

In [22]: cells
Out[22]:
[<td>WY</td>,
 <td>Wyo.</td>,
 <td><a href="/wiki/Wyoming" title="Wyoming">Wyoming</a></td>,
 <td><a href="/wiki/Cheyenne%2C_Wyoming" title="Cheyenne,
Wyoming">Cheyenne</a></td>,
 <td><a href="/wiki/Cheyenne%2C_Wyoming" title="Cheyenne,
Wyoming">Cheyenne</a></td>,
 <td><a href="/wiki/Image:Flag_of_Wyoming.svg" class="image" title=""><img src="http://upload n
g.svg/45px-Flag_of_Wyoming.svg.png" width="45" alt="" height="30" longdesc="/wiki/Image:Flag_

In [23]: cells[0].string
Out[23]: 'WY'

In [24]: cells[2].a.string
Out[24]: 'Wyoming'

In [25]: cells[3].a.string

Kent
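
One possible explanation for the IndexError, offered as an assumption rather than something the thread confirms: the real table may contain rows with no td cells at all, for example a header row built from th cells. For such a row, row('td') is an empty list and cells[0] fails. The sketch below illustrates the guard using only Python 3's standard html.parser module instead of BeautifulSoup, so it runs without extra installs. RowCollector, extract_states, and the SAMPLE markup are hypothetical names and data made up for this illustration.

```python
from html.parser import HTMLParser


class RowCollector(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # one list of cell texts per <tr>
        self._cell = None   # text pieces of the <td> being read, else None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag == "td" and self.rows:
            self._cell = []

    def handle_data(self, data):
        # Text inside nested tags such as <a> also arrives here.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self.rows[-1].append("".join(self._cell).strip())
            self._cell = None


# Hypothetical sample: a header row made of <th> cells, then one data row.
SAMPLE = """
<table>
<tr><th>Postal</th><th>Abbr.</th><th>State</th><th>Capital</th></tr>
<tr>
<td>WY</td>
<td>Wyo.</td>
<td><a href="/wiki/Wyoming" title="Wyoming">Wyoming</a></td>
<td><a href="/wiki/Cheyenne%2C_Wyoming" title="Cheyenne, Wyoming">Cheyenne</a></td>
</tr>
</table>
"""


def extract_states(html):
    """Return (postal, state, capital) tuples, skipping rows without enough <td> cells."""
    parser = RowCollector()
    parser.feed(html)
    parser.close()
    result = []
    for cells in parser.rows:
        if len(cells) < 4:
            # Header rows (only <th>) yield no <td> cells; indexing them
            # would raise IndexError, so skip them.
            continue
        result.append((cells[0], cells[2], cells[3]))
    return result


print(extract_states(SAMPLE))
```

The same guard carries over to the BeautifulSoup loop quoted above: check len(cells) after cells = row('td') and skip the row when it is too short, before indexing into it.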