[Tutor] IDLE Subprocess Startup Error

2016-07-29 Thread Darah via Tutor
Hello, 

I’ve been using Python’s IDLE for a couple of weeks now and it has been
working fine, but a few days ago I started getting this error message:
"IDLE's subprocess didn't make connection. Either IDLE can't start a
subprocess or personal firewall software is blocking the connection."

I tried uninstalling it and restarting my computer, and found a couple of
people who said deleting the created .py files worked for them, but it did
not work for me.

I am not sure what else to do. Do you have any suggestions?

I am using OS X El Capitan 10.11.5 on a Mac.




Regards, 

Darah Pereira
___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] IDLE Subprocess Startup Error

2016-07-29 Thread Alan Gauld via Tutor
On 29/07/16 05:24, Darah via Tutor wrote:

> I’ve been using Python’s IDLE for a couple of weeks now and 
> it has been working fine but a few days ago I started getting
> this error message
> "IDLE's subprocess didn't make connection.
> Either IDLE can't start a subprocess or personal firewall
> software is blocking the connection.”


What version of IDLE are you using? That used to be a common error
message, but I haven't seen it since Python v2.6. If yours is an older
version than that, you could try upgrading your Python installation.

Other than that there is a dedicated IDLE mailing list which
is quite responsive and should be able to help if you don't
get an answer here. The gmane link is:

gmane.comp.python.idle


hth
-- 
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos




[Tutor] Unable to download , using Beautifulsoup

2016-07-29 Thread Crusier
I am using Python 3 on Windows 7.

However, I am unable to download some of the data listed on the web
site as follows:

http://data.tsci.com.cn/stock/00939/STK_Broker.htm

453.IMC 98.28M 18.44M 4.32 5.33 1499.Optiver 70.91M 13.29M 3.12 5.34
7387.花旗环球 52.72M 9.84M 2.32 5.36

When I use Google Chrome and use 'View Page Source', the data does not
show up at all. However, when I use 'Inspect', I am able to read the
data.

'1453.IMC'
'98.28M'
'18.44M'
'4.32'
'5.33'

'1499.Optiver '
' 70.91M'
'13.29M '
'3.12'
'5.34'

Please kindly explain whether the data is hidden in a CSS stylesheet,
or whether there is any way to retrieve the data listed.

Thank you

Regards, Crusier

from bs4 import BeautifulSoup
import requests


stock_code = ('00939', '0001')

def web_scraper(stock_code):

    broker_url = 'http://data.tsci.com.cn/stock/'
    end_url = '/STK_Broker.htm'

    for code in stock_code:

        new_url = broker_url + code + end_url
        response = requests.get(new_url)
        html = response.content
        soup = BeautifulSoup(html, "html.parser")
        Buylist = soup.find_all('div', id="BuyingSeats")
        Selllist = soup.find_all('div', id="SellSeats")

        print(Buylist)
        print(Selllist)


web_scraper(stock_code)


Re: [Tutor] Unable to download , using Beautifulsoup

2016-07-29 Thread Alan Gauld via Tutor
On 29/07/16 08:28, Crusier wrote:

> When I use Google Chrome and use 'View Page Source', the data does not
> show up at all. However, when I use 'Inspect', I am able to read the
> data.
> 
> '1453.IMC'
> '98.28M'

> '3.12'
> '5.34'
> 
> Please kindly explain whether the data is hidden in a CSS stylesheet,
> or whether there is any way to retrieve the data listed.

I don't know the answer, but I would suggest that if you print out
(or send to a file) the entire HTML source returned by the server,
you can see what is actually happening, and from that perhaps figure
out what to do with BeautifulSoup to extract what you need.


> from bs4 import BeautifulSoup
> import urllib
> import requests
> 
> stock_code = ('00939', '0001')
> 
> def web_scraper(stock_code):
> 
> broker_url = 'http://data.tsci.com.cn/stock/'
> end_url = '/STK_Broker.htm'
> 
> for code in stock_code:
> new_url  = broker_url + code + end_url
> response = requests.get(new_url)
> html = response.content

Try sending html to a file and examining it in a
text editor...
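For example, something along these lines (a rough standard-library
sketch, untested against that particular site) would dump the raw HTML
to a file you can open in a text editor:

```python
import urllib.request

def save_page(url, path):
    """Fetch a page and write the raw HTML bytes to a file,
    so it can be examined in a text editor."""
    with urllib.request.urlopen(url) as resp:
        html = resp.read()        # raw bytes, exactly as served
    with open(path, "wb") as f:
        f.write(html)
    return path

# Usage (not run here):
# save_page("http://data.tsci.com.cn/stock/00939/STK_Broker.htm",
#           "stk_broker_00939.html")
```

If the data you saw in Inspect is missing from that file, the server
never sent it in the page and it must be arriving some other way.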


-- 
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos




Re: [Tutor] Unable to download , using Beautifulsoup

2016-07-29 Thread Walter Prins
Hi

On 29 July 2016 at 08:28, Crusier  wrote:

> I am using Python 3 on Windows 7.
>


> When I use Google Chrome and use 'View Page Source', the data does not
> show up at all. However, when I use 'Inspect', I am able to read the
> data.
>
> Please kindly explain whether the data is hidden in a CSS stylesheet,
> or whether there is any way to retrieve the data listed.
>
>

Using Inspect is not the same as View Source. The Inspect tools give
you the DOM tree as it currently is in the browser. That tree may have
been modified by any number of things (most likely JavaScript) since
the initial page source was loaded. It is likely that the data you're
trying to get at is fetched dynamically after the initial page source
is fetched, which is why you don't see it when using "View Source."

As an experiment, you can temporarily disable your browser's JavaScript
engine and reload the web page. If this is what's occurring, you should
then find that you can't see the data you're after at all, even with
Inspect. (To do this in Chrome, see here:
https://productforums.google.com/forum/#!topic/chrome/BYOQskiuGU0)

So, if this is what's going on, then it presents you with a bit of a
problem. Obviously the Python "requests" module is not a full browser
and does not include a JavaScript runtime, so it cannot by itself yield
the same result as a real browser if the page content is in fact
populated dynamically by JavaScript after the initial HTML page source
is loaded.

To get around this you would fundamentally need a browser of some kind
that you can control, one that includes a JavaScript runtime that can
process and construct the DOM (and render the page image, if you so
desire) before you retrieve the data you're after.

It should be possible to do this; there are projects and questions on
the internet about it. Firstly, there's a project named "Selenium" that
provides a way of automating various browsers and has Python bindings
(I used this some years ago). So you could conceivably use
Python + Selenium + (Chrome or Firefox, say) to fetch the page and then
get the data out. This has the disadvantage that there's going to be a
real browser and browser window floating around.
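A very rough sketch of what that might look like (assuming the selenium
package is installed along with a matching browser driver, e.g.
geckodriver for Firefox; the URL-building mirrors the code in the
original post):

```python
# Rough sketch only: drive a real browser with Selenium so that
# JavaScript runs before we read the page.
BROKER_URL_TEMPLATE = "http://data.tsci.com.cn/stock/{code}/STK_Broker.htm"

def broker_url(code):
    """Build the per-stock broker page URL from the original post."""
    return BROKER_URL_TEMPLATE.format(code=code)

def fetch_rendered_html(url):
    """Load url in Firefox via Selenium and return the DOM as it
    stands after JavaScript has run (unlike requests/urllib)."""
    from selenium import webdriver    # imported here so the rest of
    driver = webdriver.Firefox()      # the module works without selenium
    try:
        driver.get(url)
        return driver.page_source     # the post-JavaScript DOM
    finally:
        driver.quit()

# Usage (not run here):
# html = fetch_rendered_html(broker_url("00939"))
```

The returned `page_source` can then be fed to BeautifulSoup exactly as
in the original script.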

A slightly better alternative would be to use a "headless" (displayless)
browser, such as PhantomJS. It is basically a browser engine with lots
of ways to control and automate it. It does not (to my knowledge)
include Python bindings directly, but Selenium includes a PhantomJS
driver (I think). There are lighter-weight options like "jsdom" and
"slimerjs", but I have no idea whether these would suffice, or whether
they have Python wrappers.

Perhaps the best option might be Ghost.py, which sounds like it might be
exactly what you need, but I have no experience with it.

So, I'm afraid to achieve what you want will require a rather more
complicated solution than what you've currently got.  :(

Nevertheless, here's some links for you:

Ghost.py:
http://jeanphix.me/Ghost.py/
http://ghost-py.readthedocs.io/en/latest/#

PhantomJS:
http://phantomjs.org/

PhantomJS & Python:
http://stackoverflow.com/questions/13287490/is-there-a-way-to-use-phantomjs-in-python
http://toddhayton.com/2015/02/03/scraping-with-python-selenium-and-phantomjs/

SlimerJS:
http://docs.slimerjs.org/0.9/quick-start.html

While answering this question I also stumbled over the following page,
which lists (supposedly) almost every headless browser or framework in
existence:
https://github.com/dhamaniasad/HeadlessBrowsers

I see there are a couple of other possible options on there, but I'll
leave it up to you to investigate.

Good luck,

Walter


Re: [Tutor] Unable to download , using Beautifulsoup

2016-07-29 Thread bruce
Following up on what Walter said:

If the browser doesn't generate the content unless cookies/JavaScript
are enabled, you need a different approach.

The most "complete" is the use of a headless browser. However, the
use/implementation of a headless browser has its' own share of issues.
Speed, complexity, etc...

A potentially more useful method is to look at the traffic (e.g. with
livehttpheaders for Firefox) to get a feel for exactly what the browser
requires. At the same time, look at the associated JavaScript functions.

I've found it's often enough to craft the requisite cookies and request
headers (e.g. with curl) in order to simulate the browser's traffic.
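As a sketch of that idea in Python (the endpoint below is a placeholder;
the real URL has to be read out of the browser's network traffic):

```python
import urllib.request

# PLACEHOLDER endpoint: find the real URL by watching the browser's
# traffic (livehttpheaders, or Chrome's Network panel).
DATA_URL = "http://data.tsci.com.cn/PLACEHOLDER"

def fetch_like_browser(url, referer):
    """Replay a request with browser-like headers, which is often
    enough to get the dynamically fetched data directly."""
    req = urllib.request.Request(url, headers={
        "User-Agent": "Mozilla/5.0",  # many sites reject the default UA
        "Referer": referer,           # some endpoints check the referer
    })
    with urllib.request.urlopen(req) as resp:
        return resp.read()

# Usage (not run here):
# data = fetch_like_browser(DATA_URL,
#     "http://data.tsci.com.cn/stock/00939/STK_Broker.htm")
```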

In a few cases, though, I've run across situations where a headless
browser is the only real solution.



On Fri, Jul 29, 2016 at 3:28 AM, Crusier  wrote:

> I am using Python 3 on Windows 7.
>
> However, I am unable to download some of the data listed in the web
> site as follows:
>
> http://data.tsci.com.cn/stock/00939/STK_Broker.htm
>
> 453.IMC 98.28M 18.44M 4.32 5.33 1499.Optiver 70.91M 13.29M 3.12 5.34
> 7387.花旗环球 52.72M 9.84M 2.32 5.36
>
> When I use Google Chrome and use 'View Page Source', the data does not
> show up at all. However, when I use 'Inspect', I can able to read the
> data.
>
> '1453.IMC'
> '98.28M'
> '18.44M'
> '4.32'
> '5.33'
>
> '1499.Optiver '
> ' 70.91M'
> '13.29M '
> '3.12'
> '5.34'
>
> Please kindly explain to me if the data is hide in CSS Style sheet or
> is there any way to retrieve the data listed.
>
> Thank you
>
> Regards, Crusier
>
> from bs4 import BeautifulSoup
> import urllib
> import requests
>
>
>
>
> stock_code = ('00939', '0001')
>
> def web_scraper(stock_code):
>
> broker_url = 'http://data.tsci.com.cn/stock/'
> end_url = '/STK_Broker.htm'
>
> for code in stock_code:
>
> new_url  = broker_url + code + end_url
> response = requests.get(new_url)
> html = response.content
> soup = BeautifulSoup(html, "html.parser")
> Buylist = soup.find_all('div', id ="BuyingSeats")
> Selllist = soup.find_all('div', id ="SellSeats")
>
>
> print(Buylist)
> print(Selllist)
>
>
>
> web_scraper(stock_code)
> ___
> Tutor maillist  -  Tutor@python.org
> To unsubscribe or change subscription options:
> https://mail.python.org/mailman/listinfo/tutor
>


Re: [Tutor] Unable to download , using Beautifulsoup

2016-07-29 Thread Alan Gauld via Tutor
On 29/07/16 23:10, bruce wrote:

> The most "complete" is the use of a headless browser. However, the
> use/implementation of a headless browser has its' own share of issues.
> Speed, complexity, etc...

Walter and Bruce have jumped ahead a few steps from where I was
heading, but basically it's an increasingly common scenario: web pages
are no longer primarily HTML but rather JavaScript programs that fetch
data dynamically.

A headless browser is the brute-force way to deal with such issues,
but a better (purer?) way is to access the same API that the browser
itself is using. Many web sites now publish RESTful APIs with web
services that you can call directly. It is worth investigating whether
your target has this; if so, that will generally provide a much nicer
solution than trying to drive a headless browser.
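If such an API existed, using it would be as simple as something like
this (the endpoint is entirely hypothetical; I don't know of a
published API for this particular site):

```python
import json
import urllib.request

# Hypothetical endpoint: substitute whatever the site actually documents.
API_URL = "https://api.example.com/stock/{code}/brokers"

def fetch_broker_data(code):
    """Call a (hypothetical) JSON web service instead of scraping HTML."""
    with urllib.request.urlopen(API_URL.format(code=code)) as resp:
        return json.loads(resp.read().decode("utf-8"))

# Usage (not run here):
# data = fetch_broker_data("00939")
```

No HTML parsing, no JavaScript: you get structured data back directly.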

Finally, you need to consider whether you have the right to the data
without running a browser. Many sites provide information for free but
get paid by adverts. If you bypass the web page (and its adverts) you
bypass their revenue, and they do not allow that. So you need to be
sure that you are legally entitled to scrape data from the site or use
an API.

Otherwise you may be on the wrong end of a lawsuit, or at best be
contributing to the demise of the very site you are trying to use.

-- 
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos




Re: [Tutor] Unable to download , using Beautifulsoup

2016-07-29 Thread bruce
Hey Alan...

Wow, APIs... yeah, that would be cool!

I've worked on scraping data from lots of public sites that have no
issue with it (as long as you're kind) but have no clue/resources for
offering APIs.

However, yeah, if you're looking to "rip off" a site that has adverts,
it's probably not a cool thing to do, no matter what tools are used.



On Fri, Jul 29, 2016 at 6:59 PM, Alan Gauld via Tutor 
wrote:

> On 29/07/16 23:10, bruce wrote:
>
> > The most "complete" is the use of a headless browser. However, the
> > use/implementation of a headless browser has its' own share of issues.
> > Speed, complexity, etc...
>
> Walter and Bruce have jumped ahead a few steps from where I was
> heading but basically it's an increasingly common scenario where
> web pages are no longer primarily html but rather are
> Javascript programs that fetch data dynamically.
>
> A headless browser is the brute force way to deal with such issues
> but a better (purer?) way is to access the same API that the browser
> is using. Many web sites now publish RESTful APIs with web
> services that you can call directly. It is worth investigating
> whether your target has this. If so that will generally provide
> a much nicer solution than trying to drive a headless browser.
>
> Finally you need to consider whether you have the right to the
> data without running a browser? Many sites provide information
> for free but get paid by adverts. If you bypass the web screen
> (adverts) you  bypass their revenue and they do not allow that.
> So you need to be sure that you are legally entitled to scrape
> data from the site or use an API.
>
> Otherwise you may be on the wrong end of a lawsuit, or at
> best be contributing to the demise of the very site you are
> trying to use.
>
> --
> Alan G
> Author of the Learn to Program web site
> http://www.alan-g.me.uk/
> http://www.amazon.com/author/alan_gauld
> Follow my photo-blog on Flickr at:
> http://www.flickr.com/photos/alangauldphotos
>
>
> ___
> Tutor maillist  -  Tutor@python.org
> To unsubscribe or change subscription options:
> https://mail.python.org/mailman/listinfo/tutor
>