Re: [Tutor] beautifulsoup

cs Mon, 03 Oct 2016 23:34:10 -0700

On 04Oct2016 13:35, Crusier <crus...@gmail.com> wrote:

I am trying to scrape from the (span class= 'Number'). The code looks
like this on the pages I am scraping:


   <div id="DetailMainBox">
   <table>
   <tr>
<td rowspan="2" class="styleA">
<span class="UP">99&nbsp;</span><span class="Change">10.00    (-0.1%)</span>
<span class="Portfolio"><a href="../../members/index.php"
class="ThemeColor" target="_blank">Menu<img src="../images/more.gif"
width="11" height="11" border="0" align="absmiddle" /></a></span>
</td>



   <td class="styleB">Max Quantity<span class="RT"></span><br>
<span class="Number">100.000</span></span> </td>
   <td class="styleB">Average Quantity<span class="RT"></span><br />
<span class="Number">822</span></td>

<td class="styleB">Previous Order<br />
<span class="Number">96</span></td>

   <td class="styleB">Max Price<br />
<span class="Number">104</span></td>

   <td class="styleB">Number of Trades<br />
<span class="Number">383</span></td>
</tr>

   <tr>
<td class="styleB">Min Price<span class="RT"></span><br>
<span class="Number">59</span></td>
<td class="styleB">Total Amount<span class="RT"></span><br />
<span class="Number">800</span></td>

<td class="styleB">Start<br />
<span class="Number">10</span></td>

<td class="styleB">Low<br />
<span class="Number">98 </span></td>

I have tried to use BeautifulSoup to scrape the data. However, it
prints nothing on the screen:

    from bs4 import BeautifulSoup

    html = response.content
    soup = BeautifulSoup(html,"html.parser")
    title =  soup.select('td.styleB')[0].next_sibling
    title1 = soup.find_all('span', attrs={'class': 'Number'}).next_sibling
    print(title1)

I am hoping that I could retrieve the number as follows:

Max Quantity: 100
Average Quantity: 822
Previous Order: 96
Max Price: 104
Number of Trades: 383
Min Price: 59
Total Amount: 800
Start: 10
Low: 98

Please advise what the problem is with how my code handles the
query. Thank you.

You perform several steps here before your print. Break them up: "soup.select", "[0]", "next_sibling" etc., and print the intermediate values along the way.
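To make that concrete, here is a minimal sketch of splitting the chain up. The html string is a trimmed, hypothetical stand-in for your page (in your script it would come from response.content), and the variable names are mine:

```python
from bs4 import BeautifulSoup

# Hypothetical, trimmed-down copy of the markup from the question.
html = """
<table><tr>
<td class="styleB">Max Quantity<span class="RT"></span><br>
<span class="Number">100.000</span></td>
<td class="styleB">Average Quantity<span class="RT"></span><br />
<span class="Number">822</span></td>
</tr></table>
"""
soup = BeautifulSoup(html, "html.parser")

# Step 1: what does the selector match?
cells = soup.select("td.styleB")
print(len(cells))

# Step 2: what is the first match?
first = cells[0]
print(repr(first))

# Step 3: what is its next sibling? repr() makes whitespace visible.
sibling = first.next_sibling
print(type(sibling).__name__, repr(sibling))
```

With this sample the sibling is only the whitespace between the two td tags, which prints as a blank line. That may be why nothing useful shows up on your screen.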


As a wild guess, might:

 title =  soup.select('td.styleB')[0].next_sibling

fetch this?

 <span class="RT"></span>

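I also suspect that next_sibling returns the next node in the DOM tree (possibly just the whitespace between tags), not the text inside the tag. Note too that find_all returns a list of tags, so you need to index it (or loop over it) before asking for anything that belongs to a single tag. Your title1 might come out better as: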


 title1 = soup.find_all('span', attrs={'class': 'Number'})[0].get_text()

if I recall how to grab the text inside a tag (str() of the tag would give you the markup instead). Also, don't you want a loop around your find_all?


Eg:

   for tag in soup.find_all('span', attrs={'class': 'Number'}):
     print(tag)             # the whole tag, markup included
     print(tag.get_text())  # just the text inside the tag

Anyway, put in more print()s in the middle of your traversal of the DOM. That should show where things are going wrong.
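For what it's worth, here is a rough sketch of one way to get the "label: number" output you describe, by walking each td.styleB cell rather than the spans alone. It is untested against your real page: the html string is a hypothetical cut-down sample, and the names rows, label and number are mine.

```python
from bs4 import BeautifulSoup

# Hypothetical sample; in the real script this would be response.content.
html = """
<table><tr>
<td class="styleB">Max Quantity<span class="RT"></span><br>
<span class="Number">100.000</span></td>
<td class="styleB">Average Quantity<span class="RT"></span><br />
<span class="Number">822</span></td>
<td class="styleB">Low<br />
<span class="Number">98 </span></td>
</tr></table>
"""
soup = BeautifulSoup(html, "html.parser")

rows = []
for td in soup.select("td.styleB"):
    # The label is the first non-blank piece of text in the cell.
    label = next(td.stripped_strings)
    # The value lives in the span with class "Number" inside the same cell.
    number = td.find("span", class_="Number").get_text(strip=True)
    rows.append((label, number))

for label, number in rows:
    print(f"{label}: {number}")
```

Working per cell keeps each label next to its own number, so you do not need next_sibling at all.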


Cheers,
Cameron Simpson <c...@zip.com.au>