On 04Oct2016 13:35, Crusier <crus...@gmail.com> wrote:
I am trying to scrape the values from the (span class='Number') elements. The
markup looks like this on the pages I am scraping:
<div id="DetailMainBox">
<table>
<tr>
<td rowspan="2" class="styleA">
<span class="UP">99 </span><span class="Change">10.00 (-0.1%)</span>
<span class="Portfolio"><a href="../../members/index.php"
class="ThemeColor" target="_blank">Menu<img src="../images/more.gif"
width="11" height="11" border="0" align="absmiddle" /></a></span>
</td>
<td class="styleB">Max Quantity<span class="RT"></span><br>
<span class="Number">100.000</span></span> </td>
<td class="styleB">Average Quantity<span class="RT"></span><br />
<span class="Number">822</span></td>
<td class="styleB">Previous Order<br />
<span class="Number">96</span></td>
<td class="styleB">Max Price<br />
<span class="Number">104</span></td>
<td class="styleB">Number of Trades<br />
<span class="Number">383</span></td>
</tr>
<tr>
<td class="styleB">Min Price<span class="RT"></span><br>
<span class="Number">59</span></td>
<td class="styleB">Total Amount<span class="RT"></span><br />
<span class="Number">800</span></td>
<td class="styleB">Start<br />
<span class="Number">10</span></td>
<td class="styleB">Low<br />
<span class="Number">98 </span></td>
I have tried to use BeautifulSoup to scrape the data. However, it
returns nothing on the screen:
from bs4 import BeautifulSoup
html = response.content
soup = BeautifulSoup(html,"html.parser")
title = soup.select('td.styleB')[0].next_sibling
title1 = soup.find_all('span', attrs={'class': 'Number'}).next_sibling
print(title1)
I am hoping to retrieve the numbers as follows:
Max Quantity: 100
Average Quantity: 822
Previous Order: 96
Max Price: 104
Number of Trades: 383
Min Price: 59
Total Amount: 800
Start: 10
Low: 98
Please advise what the problem is with the way my code handles the
query. Thank you.
You perform several steps here before your print. Break them up. "soup.select",
"[0]", "next_sibling" etc and print the intermediate values along the way.
As a wild guess, might:
title = soup.select('td.styleB')[0].next_sibling
fetch this?
<span class="RT"></span>
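That guess can be checked by splitting the chain into separate steps. Here is a
minimal sketch, assuming a cut-down copy of the quoted markup (the real page is
not available here, so the html string below is only a stand-in); it prints each
intermediate value in turn:

```python
from bs4 import BeautifulSoup

# Stand-in markup (assumption): two cells shaped like the ones quoted above.
html = """
<table><tr>
<td class="styleB">Max Quantity<span class="RT"></span><br>
<span class="Number">100.000</span></td>
<td class="styleB">Average Quantity<span class="RT"></span><br />
<span class="Number">822</span></td>
</tr></table>
"""

soup = BeautifulSoup(html, "html.parser")

cells = soup.select("td.styleB")   # step 1: a list of matching <td> tags
print(len(cells))

first = cells[0]                   # step 2: the first <td> tag
print(repr(first))

sibling = first.next_sibling       # step 3: whatever node follows that <td>
print(type(sibling).__name__, repr(sibling))
```

With this stand-in the node after the first cell is the newline between the two
cells, not a tag, which is the sort of thing the intermediate prints make
obvious.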
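I also suspect that next_sibling returns the next node in the DOM tree (a tag,
or just the whitespace between tags), not the text you want. Note too that
find_all returns a list of tags, and a list has no next_sibling, so you need to
pick an element first. Your title1 might come out better as:
title1 = soup.find_all('span', attrs={'class': 'Number'})[0].get_text()
since get_text() grabs the text inside a tag, whereas str() gives you the tag's
markup. Also, don't you want a loop around your find_all?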
Eg:
for tag in soup.find_all('span', attrs={'class': 'Number'}):
    print(tag)             # the whole tag, markup included
    print(tag.get_text())  # just the text inside the tag
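To see why the loop matters, here is a small sketch on a one-cell stand-in
(the markup string is an assumption, trimmed from the quoted page). It shows
that the result of find_all is a list-like object with no next_sibling, and
contrasts a tag's markup with the text inside it:

```python
from bs4 import BeautifulSoup

# Stand-in markup (assumption): a single cell from the quoted page.
soup = BeautifulSoup('<td><span class="Number">96</span></td>', "html.parser")

found = soup.find_all("span", attrs={"class": "Number"})
print(isinstance(found, list))     # the result behaves like a list of tags

try:
    found.next_sibling             # lists have no such attribute
except AttributeError as exc:
    print("AttributeError:", exc)

tag = found[0]                     # pick one tag out of the list first
markup = str(tag)                  # the tag as markup
text = tag.get_text(strip=True)    # only the text inside the tag
print(markup)
print(text)
```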
Anyway, put in more print()s in the middle of your traversal of the DOM. That
should show where things are going wrong.
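For the "Label: number" output you are after, one possible approach is to walk
the td.styleB cells rather than the spans, so that each label stays with its
number. This is only a sketch: the html string is a stand-in for your page, and
label_value_pairs is a made-up helper name, not part of BeautifulSoup.

```python
from bs4 import BeautifulSoup

# Stand-in markup (assumption): three cells shaped like the quoted ones.
html = """
<table><tr>
<td class="styleB">Max Quantity<span class="RT"></span><br>
<span class="Number">100.000</span></td>
<td class="styleB">Average Quantity<span class="RT"></span><br />
<span class="Number">822</span></td>
<td class="styleB">Low<br />
<span class="Number">98 </span></td>
</tr></table>
"""


def label_value_pairs(soup):
    """Return (label, number_text) tuples, one per td.styleB cell."""
    pairs = []
    for cell in soup.select("td.styleB"):
        number = cell.find("span", class_="Number")
        if number is None:
            continue  # skip cells that carry no number
        # The first non-blank string in the cell is taken to be the label.
        label = next(cell.stripped_strings)
        pairs.append((label, number.get_text(strip=True)))
    return pairs


soup = BeautifulSoup(html, "html.parser")
pairs = label_value_pairs(soup)
for label, value in pairs:
    print(f"{label}: {value}")
```

The values come back as strings exactly as they appear in the page (for
example "100.000"), so any conversion to numbers is left to you.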
Cheers,
Cameron Simpson <c...@zip.com.au>