On Mon, May 16, 2011 at 10:27:44PM -0700, Ian Lance Taylor wrote:
> On Mon, May 16, 2011 at 6:42 AM, Richard Guenther
> <richard.guent...@gmail.com> wrote:
> >>>
> >>> httpd being in the top-10 always, fiddling with bugzilla URLs?
> >>> (Note, I don't have access to gcc.gnu.org, I'm relaying info from multiple
> >>> instances of discussion on #gcc and richi poking on it; that said, it
> >>> still might not be web crawlers, that's right, but I'll happily accept
> >>> _any_ load improvement on gcc.gnu.org, however unfounded they might seem)
>
> I think that simply blocking buglist.cgi has dropped bugzilla off the
> immediate radar.  It also seems to have lowered the load, although I'm
> not sure if we are still keeping historical data.
>
> > I for example see also
> >
> > 66.249.71.59 - - [16/May/2011:13:37:58 +0000] "GET
> > /viewcvs?view=revision&revision=169814 HTTP/1.1" 200 1334 "-"
> > "Mozilla/5.0 (compatible; Googlebot/2.1;
> > +http://www.google.com/bot.html)" (35%) 2060117us
> >
> > and viewvc is certainly even worse (from an I/O perspective).  I thought
> > we blocked all bot traffic from the viewvc stuff ...
>
> This is only happening at top level.  I committed this patch to fix this.

You probably know this much better than I do, but would it be possible to
allow only some of Google's crawlers (if all of them try to crawl
bugzilla)?

As I read
http://www.google.com/support/webmasters/bin/answer.py?hl=en&answer=1061943
it should be possible to block the crawlers Googlebot-Mobile,
Mediapartners-Google and AdsBot-Google (which seem to be independent
crawlers?) while still allowing the main Googlebot.  (Well, I don't know
how often each crawler actually hits bugzilla...)
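If the blocking is done in robots.txt (I don't know whether gcc.gnu.org
does it there or in the httpd config), a rough sketch could look like
the following.  The paths are only guesses on my part, not the real
layout of the server:

  # Sketch only: the paths below are assumptions, not gcc.gnu.org's
  # actual layout.

  # Shut out the auxiliary Google crawlers completely.
  User-agent: Googlebot-Mobile
  Disallow: /

  User-agent: Mediapartners-Google
  Disallow: /

  User-agent: AdsBot-Google
  Disallow: /

  # Let the main Googlebot in, but keep it away from the expensive CGIs.
  User-agent: Googlebot
  Disallow: /bugzilla/buglist.cgi
  Disallow: /viewcvs

  # All other crawlers get the same restrictions.
  User-agent: *
  Disallow: /bugzilla/buglist.cgi
  Disallow: /viewcvs

As far as I know a crawler obeys only the most specific User-agent group
that matches it, so Googlebot needs its own Disallow lines repeated even
when "User-agent: *" carries the same ones, and Google documents that
AdsBot ignores "User-agent: *" entirely unless addressed by name.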
Axel