[Tutor] IDLE Subprocess Startup Error
Hello,

I’ve been using Python’s IDLE for a couple of weeks now and it has been working fine, but a few days ago I started getting this error message:

"IDLE's subprocess didn't make connection. Either IDLE can't start a subprocess or personal firewall software is blocking the connection."

I tried uninstalling it and restarting my computer. I also found a couple of people who said deleting the .py files they had created worked for them, but it did not work for me. I am not sure what else to do. Do you have any suggestions?

I am using OS X El Capitan 10.11.5 on a Mac.

Regards,
Darah Pereira

___
Tutor maillist - Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor
Re: [Tutor] IDLE Subprocess Startup Error
On 29/07/16 05:24, Darah via Tutor wrote:
> I’ve been using Python’s IDLE for a couple of weeks now and
> it has been working fine but a few days ago I started getting
> this error message
> "IDLE's subprocess didn't make connection.
> Either IDLE can't start a subprocess or personal firewall
> software is blocking the connection."

What version of IDLE are you using? That used to be a common error message, but since Python v2.6 I've never had it. If it's an older version than that, you could try upgrading your Python version.

Other than that, there is a dedicated IDLE mailing list which is quite responsive and should be able to help if you don't get an answer here. The gmane link is:

gmane.comp.python.idle

hth

--
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos
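[Aside from the archive editor: a frequently reported cause of this IDLE error is a user script whose filename shadows a standard-library module, which is presumably why "deleting the created .py files" fixed it for some people. A hedged, standard-library-only sketch to check for that; the function name is mine, not from the thread:]

```python
import os
import sys

def find_shadowing(dirpath):
    """Return .py files in dirpath whose base name shadows a stdlib module."""
    # sys.stdlib_module_names needs Python 3.10+; fall back to a short
    # hand-picked list of frequently shadowed names on older versions.
    stdlib = set(getattr(sys, "stdlib_module_names",
                         ("random", "string", "types", "code", "socket")))
    hits = []
    for name in sorted(os.listdir(dirpath)):
        base, ext = os.path.splitext(name)
        if ext == ".py" and base in stdlib:
            hits.append(name)
    return hits

# Run it against the folder your scripts live in, e.g.:
# print(find_shadowing("."))
```

If it reports anything (say, a file named random.py), renaming or moving that file before restarting IDLE is worth a try.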
[Tutor] Unable to download , using Beautifulsoup
I am using Python 3 on Windows 7. However, I am unable to download some of the data listed in the web site as follows:

http://data.tsci.com.cn/stock/00939/STK_Broker.htm

453.IMC 98.28M 18.44M 4.32 5.33
1499.Optiver 70.91M 13.29M 3.12 5.34
7387.花旗环球 52.72M 9.84M 2.32 5.36

When I use Google Chrome and use 'View Page Source', the data does not show up at all. However, when I use 'Inspect', I am able to read the data.

'1453.IMC'
'98.28M'
'18.44M'
'4.32'
'5.33'

'1499.Optiver '
' 70.91M'
'13.29M '
'3.12'
'5.34'

Please kindly explain to me whether the data is hidden in a CSS style sheet, or if there is any way to retrieve the data listed.

Thank you

Regards,
Crusier

from bs4 import BeautifulSoup
import requests

stock_code = ('00939', '0001')

def web_scraper(stock_code):
    broker_url = 'http://data.tsci.com.cn/stock/'
    end_url = '/STK_Broker.htm'

    for code in stock_code:
        new_url = broker_url + code + end_url
        response = requests.get(new_url)
        html = response.content
        soup = BeautifulSoup(html, "html.parser")
        Buylist = soup.find_all('div', id="BuyingSeats")
        Selllist = soup.find_all('div', id="SellSeats")
        print(Buylist)
        print(Selllist)

web_scraper(stock_code)
Re: [Tutor] Unable to download , using Beautifulsoup
On 29/07/16 08:28, Crusier wrote:
> When I use Google Chrome and use 'View Page Source', the data does not
> show up at all. However, when I use 'Inspect', I can able to read the
> data.
>
> '1453.IMC'
> '98.28M'
> '3.12'
> '5.34'
>
> Please kindly explain to me if the data is hide in CSS Style sheet or
> is there any way to retrieve the data listed.

I don't know the answer, but I would suggest that if you print out (or send to a file) the entire HTML source returned by the server, you can see what is actually happening, and from that perhaps figure out what to do with BS to extract what you need.

> from bs4 import BeautifulSoup
> import urllib
> import requests
>
> stock_code = ('00939', '0001')
>
> def web_scraper(stock_code):
>
>     broker_url = 'http://data.tsci.com.cn/stock/'
>     end_url = '/STK_Broker.htm'
>
>     for code in stock_code:
>         new_url = broker_url + code + end_url
>         response = requests.get(new_url)
>         html = response.content

Try sending html to a file and examining it in a text editor...

--
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos
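[Aside from the archive editor: Alan's "send html to a file" suggestion needs only a couple of lines inside the poster's loop. A minimal sketch; the helper name is mine, not from the thread:]

```python
def save_raw_html(content, path):
    """Write the raw bytes of response.content to a file so the HTML the
    server actually sent can be opened in a text editor and compared
    with what the browser's Inspect pane shows."""
    with open(path, "wb") as f:
        f.write(content)
    return path

# Inside the original loop, after `html = response.content`, something like:
# save_raw_html(html, code + "_raw.html")
```

If the saved file contains an empty BuyingSeats div, that confirms the data is filled in later by Javascript rather than hidden by CSS.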
Re: [Tutor] Unable to download , using Beautifulsoup
Hi

On 29 July 2016 at 08:28, Crusier wrote:
> I am using Python 3 on Windows 7.
>
> When I use Google Chrome and use 'View Page Source', the data does not
> show up at all. However, when I use 'Inspect', I can able to read the
> data.
>
> Please kindly explain to me if the data is hide in CSS Style sheet or
> is there any way to retrieve the data listed.

Using Inspect is not the same as View Source. The inspect tools give you the DOM tree as it currently is in the browser. That tree may have been modified by any number of things (most likely Javascript) since the initial page source was loaded. It is likely that the data you're trying to get at is fetched dynamically after the initial page source is loaded, which is why you don't see it when using "view source".

As an experiment, you can temporarily disable your browser's Javascript engine and reload the web page. If this is what's occurring, you should then find that you can't see the data you're after at all, even with Inspect. (To do this for Chrome, see here: https://productforums.google.com/forum/#!topic/chrome/BYOQskiuGU0)

So, if this is what's going on, it presents you with a bit of a problem. Obviously the Python "requests" module is not a full browser and does not include a Javascript runtime, so it cannot by itself yield the same result as a real browser if the page content is in fact the result of dynamic population by Javascript after loading the initial HTML page source.

To get around this you fundamentally need a browser of some kind that you can control, one that includes a Javascript runtime that can process and construct the DOM (and render the page image if you so desire) before you retrieve the data you're after. It should be possible to do this; there are projects and questions on the internet about it.
Firstly, there's a project named "Selenium" that provides a way of automating various browsers and has Python bindings (I used this some years ago). So you could conceivably use Python + Selenium + (Chrome or Firefox, say) to fetch the page and then get the data out. This has the disadvantage that there's going to be a real browser and browser window floating around.

A slightly better alternative would be to use a "headless" (displayless) browser, such as PhantomJS. It is basically a browser engine with lots of ways to control and automate it. It does not (to my knowledge) include Python bindings directly, but Selenium includes a PhantomJS driver (I think).

There are lighter-weight options like "jsdom" and "slimerjs", but I have no idea whether these would suffice, or whether they have Python wrappers. Perhaps the best option might be Ghost.py, which sounds like it might be exactly what you need, but I have no experience with it.

So, I'm afraid achieving what you want will require a rather more complicated solution than what you've currently got. :(

Nevertheless, here are some links for you:

Ghost.py:
http://jeanphix.me/Ghost.py/
http://ghost-py.readthedocs.io/en/latest/#

PhantomJS:
http://phantomjs.org/

PhantomJS & Python:
http://stackoverflow.com/questions/13287490/is-there-a-way-to-use-phantomjs-in-python
http://toddhayton.com/2015/02/03/scraping-with-python-selenium-and-phantomjs/

SlimerJS:
http://docs.slimerjs.org/0.9/quick-start.html

While answering this question I also stumbled over the following page listing (supposedly) almost every headless browser or framework in existence:

https://github.com/dhamaniasad/HeadlessBrowsers

I see there are a couple of other possible options on there, but I'll leave it up to you to investigate.

Good luck,
Walter
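[Aside from the archive editor: to make Walter's Selenium suggestion concrete, here is an untested sketch using Selenium's modern API with headless Chrome. It assumes the selenium package and a matching chromedriver are installed; the element id comes from the original code.]

```python
def fetch_rendered_text(url, element_id):
    """Load url in headless Chrome and return the text content of the
    element with element_id after Javascript has populated the DOM."""
    # Imports are kept inside the function so the sketch can be read
    # (and the function defined) even where selenium is not installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.common.by import By

    options = Options()
    options.add_argument("--headless")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        return driver.find_element(By.ID, element_id).text
    finally:
        driver.quit()

# Hypothetical usage (needs a network connection and chromedriver):
# fetch_rendered_text("http://data.tsci.com.cn/stock/00939/STK_Broker.htm",
#                     "BuyingSeats")
```

The driver.quit() in the finally block matters: without it a crashed scrape leaves a browser process running in the background.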
Re: [Tutor] Unable to download , using Beautifulsoup
Following up on what Walter said: if the browser doesn't generate the content without cookies/Javascript enabled, you need a different approach.

The most "complete" approach is to use a headless browser. However, the use/implementation of a headless browser has its own share of issues: speed, complexity, etc.

A potentially more useful method is to look at the traffic (livehttpheaders for Firefox) to get a feel for exactly what the browser requires. At the same time, look at the subordinate Javascript functions. I've found it's often enough to craft the requisite cookies/curl functions in order to simulate the browser data. In a few cases, though, I've run across situations where a headless browser is the only real solution.

On Fri, Jul 29, 2016 at 3:28 AM, Crusier wrote:
> I am using Python 3 on Windows 7.
>
> However, I am unable to download some of the data listed in the web
> site as follows:
>
> http://data.tsci.com.cn/stock/00939/STK_Broker.htm
>
> When I use Google Chrome and use 'View Page Source', the data does not
> show up at all. However, when I use 'Inspect', I can able to read the
> data.
>
> Please kindly explain to me if the data is hide in CSS Style sheet or
> is there any way to retrieve the data listed.
>
> [sample data and code snipped]
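[Aside from the archive editor: a sketch of bruce's "simulate the browser" idea. The browser's network tool is used first to find the request that actually carries the data; that request is then rebuilt in requests with browser-like headers. The URL below is just the page URL as a placeholder, and the helper name is mine -- the real data endpoint must come from the Network tab.]

```python
import requests

def build_browser_like_request(url, referer):
    """Prepare a GET request carrying headers a real browser would send;
    it can be sent later with requests.Session().send(...)."""
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36",
        "Referer": referer,
        "Accept": "text/html,application/xhtml+xml,*/*",
    }
    return requests.Request("GET", url, headers=headers).prepare()

# Placeholder URL -- substitute the endpoint seen in the Network tab.
req = build_browser_like_request(
    "http://data.tsci.com.cn/stock/00939/STK_Broker.htm",
    "http://data.tsci.com.cn/stock/00939/STK_Broker.htm")
print(req.method, req.url)
```

Separating request construction from sending makes it easy to compare the prepared headers against what the browser sent before putting any load on the site.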
Re: [Tutor] Unable to download , using Beautifulsoup
On 29/07/16 23:10, bruce wrote:
> The most "complete" is the use of a headless browser. However, the
> use/implementation of a headless browser has its' own share of issues.
> Speed, complexity, etc...

Walter and Bruce have jumped ahead a few steps from where I was heading, but basically it's an increasingly common scenario: web pages are no longer primarily HTML but rather are Javascript programs that fetch data dynamically.

A headless browser is the brute-force way to deal with such issues, but a better (purer?) way is to access the same API that the browser is using. Many web sites now publish RESTful APIs with web services that you can call directly. It is worth investigating whether your target has this. If so, that will generally provide a much nicer solution than trying to drive a headless browser.

Finally, you need to consider whether you have the right to the data without running a browser. Many sites provide information for free but get paid by adverts. If you bypass the web screen (adverts) you bypass their revenue, and they do not allow that. So you need to be sure that you are legally entitled to scrape data from the site or use an API.

Otherwise you may be on the wrong end of a lawsuit, or at best be contributing to the demise of the very site you are trying to use.

--
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos
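[Aside from the archive editor: on Alan's legality point, one mechanical check (not legal advice) is the site's robots.txt, which the standard library can evaluate. A sketch, using an invented robots.txt body for illustration; the real one would be fetched from the target site, e.g. http://data.tsci.com.cn/robots.txt.]

```python
from urllib import robotparser

def allowed_by_robots(robots_txt, user_agent, url):
    """Evaluate an already-fetched robots.txt body against a target URL."""
    rp = robotparser.RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp.can_fetch(user_agent, url)

# Invented robots.txt for illustration only.
example = "User-agent: *\nDisallow: /private/"
print(allowed_by_robots(example, "mybot", "http://example.com/stock/x.htm"))
```

robots.txt is a statement of the operator's wishes, not the whole legal picture; the site's terms of service still apply either way.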
Re: [Tutor] Unable to download , using Beautifulsoup
Hey Alan... Wow, APIs... yeah, that would be cool!!!

I've worked on scraping data from lots of public sites that have no issue with it (as long as you're kind) but have no clue/resources regarding offering APIs.

However, yeah, if you're looking to "rip off" a site that has adverts, it's probably not a cool thing to do, no matter what tools are used.

On Fri, Jul 29, 2016 at 6:59 PM, Alan Gauld via Tutor wrote:
> On 29/07/16 23:10, bruce wrote:
>
> > The most "complete" is the use of a headless browser. However, the
> > use/implementation of a headless browser has its' own share of issues.
> > Speed, complexity, etc...
>
> Walter and Bruce have jumped ahead a few steps from where I was
> heading but basically it's an increasingly common scenario where
> web pages are no longer primarily html but rather are
> Javascript programs that fetch data dynamically.
>
> [rest of reply snipped]