Hi,

On 29 July 2016 at 08:28, Crusier <crus...@gmail.com> wrote:
> I am using Python 3 on Windows 7.
>
> When I use Google Chrome and use 'View Page Source', the data does not
> show up at all. However, when I use 'Inspect', I can able to read the
> data.
>
> Please kindly explain to me if the data is hide in CSS Style sheet or
> is there any way to retrieve the data listed.
>

Using inspect is not the same as view source. The inspect tools show you the DOM tree as it currently is in the browser. That tree may have been modified by any number of things (most likely JavaScript) since the initial page source was loaded. It is likely that the data you're trying to get at is fetched dynamically after the initial page source is loaded, which is why you don't see it when using "view source".

As an experiment, you can temporarily disable your browser's JavaScript engine and reload the web page. If this is what's occurring, you should then find that you can't see the data you're after at all, even with inspect. (To do this for Chrome, see here: https://productforums.google.com/forum/#!topic/chrome/BYOQskiuGU0)

So, if this is what's going on, it presents you with a bit of a problem. The Python "requests" module is not a full browser and does not include a JavaScript runtime, so it cannot by itself yield the same result as a real browser if the page content is in fact populated dynamically by JavaScript after the initial HTML page source has loaded.

To get around this you would need a browser of some kind that you can control, and that includes a JavaScript runtime that can process and construct the DOM (and render the page image if you so desire) before you retrieve the data you're after.

It should be possible to do this; there are projects and questions on the internet about it. Firstly, there's a project named "Selenium" that provides a way of automating various browsers, and has Python bindings (I used this some years ago).
So you could conceivably use Python + Selenium + (Chrome or Firefox, say) to fetch the page and then get the data out. This has the disadvantage that a real browser and browser window will be floating around.

A slightly better alternative would be to use a "headless" (displayless) browser, such as PhantomJS. It is basically a browser engine with lots of ways to control and automate it. It does not (to my knowledge) include Python bindings directly, but Selenium includes a PhantomJS driver (I think).

There are lighter-weight options like "jsdom" and "slimerjs", but I have no idea whether these would suffice or whether they have Python wrappers.

Perhaps the best option might be Ghost.py, which sounds like it might be exactly what you need, but I have no experience with it.

So, I'm afraid that achieving what you want will require a rather more complicated solution than what you've currently got. :(

Nevertheless, here are some links for you:

Ghost.py:
http://jeanphix.me/Ghost.py/
http://ghost-py.readthedocs.io/en/latest/#

PhantomJS:
http://phantomjs.org/

PhantomJS & Python:
http://stackoverflow.com/questions/13287490/is-there-a-way-to-use-phantomjs-in-python
http://toddhayton.com/2015/02/03/scraping-with-python-selenium-and-phantomjs/

SlimerJS:
http://docs.slimerjs.org/0.9/quick-start.html

While answering this question I also stumbled over the following page, which lists (supposedly) almost every headless browser or framework in existence:
https://github.com/dhamaniasad/HeadlessBrowsers

I see there are a couple of other possible options on there, but I'll leave it up to you to investigate.

Good luck,

Walter

_______________________________________________
Tutor maillist - Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor
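
An addendum on the "view source" versus "inspect" point: the difference can be made concrete with a small sketch that uses only the standard library. Everything in it is made up for illustration (the page markup, the element id "price", the value "123.45", and the names TextById and text_in_source); it is not taken from your actual page. It parses the initial page source the way a non-browser client sees it and shows that the element is still empty, because the script that would fill it is never run.

```python
from html.parser import HTMLParser

# Hypothetical page source: the <div> is empty as delivered by the server
# and would only be filled in once a browser executes the script.
PAGE_SOURCE = """
<html>
  <body>
    <div id="price"></div>
    <script>
      document.getElementById("price").textContent = "123.45";
    </script>
  </body>
</html>
"""


class TextById(HTMLParser):
    """Collect the text inside the first element that has the given id.

    Sketch only: void tags written without a slash (e.g. <br>) inside the
    target element would throw off the depth counter.
    """

    def __init__(self, element_id):
        super().__init__()
        self.element_id = element_id
        self.depth = 0       # > 0 while we are inside the target element
        self.found = False   # set once the target element has been seen
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if self.depth:
            self.depth += 1
        elif not self.found and dict(attrs).get("id") == self.element_id:
            self.found = True
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def text_in_source(html, element_id):
    """Return the text of the element as it appears in the raw source."""
    parser = TextById(element_id)
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks).strip()


if __name__ == "__main__":
    # No JavaScript has run, so this shows what "view source" shows.
    print(repr(text_in_source(PAGE_SOURCE, "price")))
```

Run as a script, this prints an empty string even though "123.45" does appear in the source, inside the script text. A browser's inspect view would show the value inside the div only after the script has run, which is the step that a plain HTTP fetch never performs.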