I think subdomains need their own robots.txt, which neither docs.python.org nor docs.python.org/(2 or 3)/ has, and http://python.org/robots.txt (below) seems a little sparse. For sure /dev/ is not blocked.

# Directions for robots.  See this URL:
# http://www.robotstxt.org/wc/norobots.html
# for a description of the file format.

User-agent: HTTrack
User-agent: puf
User-agent: MSIECrawler
Disallow: /

# The Krugle web crawler (though based on Nutch) is OK.
User-agent: Krugle
Allow: /
Disallow: /moin
Disallow: /pypi
Disallow: /~guido/orlijn/
Disallow: /wwwstats/
Disallow: /ftpstats/

# No one should be crawling us with Nutch.
User-agent: Nutch
Disallow: /

# Hide old versions of the documentation and various large sets of files.
User-agent: *
Disallow: /~guido/orlijn/
Disallow: /wwwstats/
Disallow: /webstats/
Disallow: /ftpstats/
Disallow: /moin
Disallow: /pypi
Disallow: /dev/buildbot/

Vincent Davis
720-301-3003

On Sat, Jan 25, 2014 at 9:04 PM, Nick Coghlan <ncogh...@gmail.com> wrote:

> On 26 January 2014 05:05, Benjamin Peterson <benja...@python.org> wrote:
> >
> > On Sat, Jan 25, 2014, at 10:55 AM, Vincent Davis wrote:
> >> On Sat, Jan 25, 2014 at 10:12 AM, Benjamin Peterson
> >> <benja...@python.org> wrote:
> >>
> >> > Internal links with no version redirect to the Python 2 version for
> >> > backwards compatibility reasons.
> >>
> >> On Sat, Jan 25, 2014 at 10:26 AM, Georg Brandl <g.bra...@gmx.net> wrote:
> >>
> >> > Yep, and the URLs without version never served Python 3 docs as far
> >> > as I can remember, so I don't know where Google has these <title>s
> >> > from.
> >>
> >> That is not consistent with
> >> http://docs.python.org (no version number) redirects to
> >> http://docs.python.org/3/
> >
> > This is recent. It used to go to Python 2 docs.
>
> http://www.python.org/dev/peps/pep-0430/ covers the rationale for the
> current arrangement.
>
> The main issue is the extensive use of existing deep links into the
> Python 2 documentation from Python 2 specific tutorials and other
> references. Those third party references not only include vast numbers
> of online resources that we don't control, but also books that can't
> be updated at all.
>
> So, the canonical URLs on docs.python.org now always include the major
> version number in the path so they're unambiguous, the Python 3 docs
> are displayed by default, and unqualified deep links redirect to
> Python 2 for backwards compatibility.
>
> The robots.txt on python.org is *supposed* to keep the web crawlers
> away from the "/dev/" subtree (since most people searching for Python
> info aren't going to want the docs for an unreleased version), but I
> don't know if that's documented anywhere, or even if it's currently
> still configured that way.
>
> >> Maybe this is related to google search results.
> >> Seems wrong to me to point to 2.7 rather that 3.3 but I am sure there
> >> was discussion about that.
> >
> > The internal links all used to go to Python 2.
>
> There's also a lot of weight given in Google to the extensive array of
> existing unqualified deep links, which relate to Python 2.
>
> >> I looked (googled) for an example of a google link to current version of
> >> python 3.3 documentation. My approach was to google "python" and
> >> something listed in
> >> http://docs.python.org/3/whatsnew/3.3.html
> >> These results all seem to point to http://docs.python.org/dev/library
> >> i.e. 3.4.0b2
>
> Which suggests that the Google web crawler *is* spidering the dev
> docs, which we generally don't want :P
>
> Cheers,
> Nick.
>
> --
> Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
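
For anyone who wants to check the /dev/ question against the python.org robots.txt pasted in this message, here is a minimal sketch using the standard library's urllib.robotparser. It assumes the rules are exactly as pasted (only the Nutch and "User-agent: *" groups are reproduced, and the live file may differ). The helper names build_parser and is_allowed, and the "Googlebot" agent string, are just illustrative choices, not anything taken from the site configuration.

```python
from urllib.robotparser import RobotFileParser

# Excerpt of the python.org robots.txt as pasted in this message.
# Assumption: the live file may have changed since.
ROBOTS_TXT = """\
# No one should be crawling us with Nutch.
User-agent: Nutch
Disallow: /

# Hide old versions of the documentation and various large sets of files.
User-agent: *
Disallow: /wwwstats/
Disallow: /webstats/
Disallow: /ftpstats/
Disallow: /moin
Disallow: /pypi
Disallow: /dev/buildbot/
"""


def build_parser(text):
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


def is_allowed(parser, agent, path):
    """Return True if `agent` may fetch `path` on python.org under these rules."""
    return parser.can_fetch(agent, "http://python.org" + path)


parser = build_parser(ROBOTS_TXT)
for path in ("/dev/", "/dev/peps/pep-0430/", "/dev/buildbot/", "/pypi"):
    # "Googlebot" is an illustrative agent name; it falls under "User-agent: *".
    print(path, is_allowed(parser, "Googlebot", path))
```

Under these rules only /dev/buildbot/ is disallowed inside /dev/; the rest of the subtree is open to any crawler other than Nutch. Note that this only speaks to python.org: a robots.txt applies per host, so it says nothing about what crawlers may fetch on docs.python.org.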
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com