Following up on what Walter said: if a browser with cookies and JavaScript disabled doesn't show the content, the data isn't in the page's HTML and you need a different approach.
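A quick way to confirm that this is the situation is to fetch the page without a browser and see whether a value you can see on screen (for example "1453.IMC") is in the raw HTML at all. Here is a minimal sketch using only the standard library; the helper names are mine, not from any library:

```python
import urllib.request


def fetch_raw_html(url, timeout=10):
    """Return the HTML exactly as the server sends it; no JavaScript is run."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        # Fall back to UTF-8 if the server does not declare a charset (assumption).
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def appears_in_raw_html(html, marker):
    """True if the marker text is already present in the unrendered HTML."""
    return marker in html


# Example use (needs network access, so it is left commented out):
# html = fetch_raw_html("http://data.tsci.com.cn/stock/00939/STK_Broker.htm")
# print(appears_in_raw_html(html, "1453.IMC"))
```

If this prints False while the browser shows the value, the data is being filled in by a script after the page loads, which matches what 'View Page Source' versus 'Inspect' is telling you.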
The most complete option is a headless browser. However, a headless browser has its own share of issues: speed, complexity, and so on.

A potentially more useful method is to look at the traffic (Live HTTP Headers for Firefox) to get a feel for exactly what the browser requires. At the same time, look at the JavaScript functions the page calls. I've found it's often enough to craft the requisite cookies and curl calls to simulate what the browser sends. In a few cases, though, I've run into situations where a headless browser is the only real solution.

On Fri, Jul 29, 2016 at 3:28 AM, Crusier <crus...@gmail.com> wrote:
> I am using Python 3 on Windows 7.
>
> However, I am unable to download some of the data listed in the web
> site as follows:
>
> http://data.tsci.com.cn/stock/00939/STK_Broker.htm
>
> 453.IMC 98.28M 18.44M 4.32 5.33 1499.Optiver 70.91M 13.29M 3.12 5.34
> 7387.花旗环球 52.72M 9.84M 2.32 5.36
>
> When I use Google Chrome and use 'View Page Source', the data does not
> show up at all. However, when I use 'Inspect', I can able to read the
> data.
>
> '<th>1453.IMC</th>'
> '<td>98.28M</td>'
> '<td>18.44M</td>'
> '<td>4.32</td>'
> '<td>5.33</td>'
>
> '<th>1499.Optiver </th>'
> '<td> 70.91M</td>'
> '<td>13.29M </td>'
> '<td>3.12</td>'
> '<td>5.34</td>'
>
> Please kindly explain to me if the data is hide in CSS Style sheet or
> is there any way to retrieve the data listed.
>
> Thank you
>
> Regards,
> Crusier
>
> from bs4 import BeautifulSoup
> import urllib
> import requests
>
> stock_code = ('00939', '0001')
>
> def web_scraper(stock_code):
>
>     broker_url = 'http://data.tsci.com.cn/stock/'
>     end_url = '/STK_Broker.htm'
>
>     for code in stock_code:
>
>         new_url = broker_url + code + end_url
>         response = requests.get(new_url)
>         html = response.content
>         soup = BeautifulSoup(html, "html.parser")
>         Buylist = soup.find_all('div', id ="BuyingSeats")
>         Selllist = soup.find_all('div', id ="SellSeats")
>
>         print(Buylist)
>         print(Selllist)
>
> web_scraper(stock_code)
> _______________________________________________
> Tutor maillist - Tutor@python.org
> To unsubscribe or change subscription options:
> https://mail.python.org/mailman/listinfo/tutor
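
To make the "watch the traffic, then replay it" idea from the top of this reply a bit more concrete, here is a rough sketch using only the standard library. I have not traced this particular site, so the data URL is whatever you find in the traffic capture (the one in the usage note is a made-up placeholder), and the header set is only a guess at what a script-driven request typically carries. build_request, CellCollector and extract_cells are illustrative names of my own.

```python
import urllib.request
from html.parser import HTMLParser

# The page the browser visits (taken from the original post).
PAGE_URL = "http://data.tsci.com.cn/stock/00939/STK_Broker.htm"


def build_request(data_url, cookies=None):
    """Build a request that imitates what the browser sent for data_url.

    data_url and cookies must come from your own traffic capture;
    nothing here is specific to the site in question.
    """
    headers = {
        "User-Agent": "Mozilla/5.0",           # some servers reject the default Python agent
        "Referer": PAGE_URL,                   # the page that triggered the request
        "X-Requested-With": "XMLHttpRequest",  # often sent by script-driven requests (assumption)
    }
    if cookies:
        headers["Cookie"] = "; ".join(f"{k}={v}" for k, v in cookies.items())
    return urllib.request.Request(data_url, headers=headers)


class CellCollector(HTMLParser):
    """Collect the text of every <th> and <td> cell, in document order."""

    def __init__(self):
        super().__init__()
        self.cells = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag in ("th", "td"):
            self._in_cell = True
            self.cells.append("")

    def handle_endtag(self, tag):
        if tag in ("th", "td"):
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.cells[-1] += data


def extract_cells(html):
    """Return the stripped text of each table cell found in html."""
    parser = CellCollector()
    parser.feed(html)
    parser.close()
    return [cell.strip() for cell in parser.cells]
```

Usage would look something like urllib.request.urlopen(build_request("http://example.invalid/data", {"session": "abc"}), timeout=10), followed by extract_cells() on the decoded body if what comes back is an HTML fragment like the one 'Inspect' shows. If the endpoint returns JSON instead, json.loads() on the body replaces the cell parsing. If the request depends on values computed by the page's scripts, this approach gets painful and a headless browser is likely the easier route.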