ekompute wrote:
> Hi, can anyone help me with my robot.txt. 

The file has to be named 'robots.txt', not 'robot.txt', or crawlers will not find it.

> My contents for the page reads as
> follows:
> 
> User-agent: *
> Disallow: /Help
> Disallow: /MediaWiki
> Disallow: /Template
> Disallow: /skins/
> 
> But it is not blocking pages like:
> 
>    - http://www.dummipedia.org/Special:Protectedpages
>    - http://dummipedia.org/Special:Allpages

Special pages try to protect themselves: they emit
'<meta name="robots" content="noindex,nofollow" />' in their HTML, so search
engines will not index them even when robots.txt lets the crawler fetch them.
A crawler traversing Special:Allpages would also likely produce too much load,
so it is still worth keeping crawlers away from special pages.
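The meta tag only takes effect after the crawler has already fetched the page;
if you want crawlers not to request the special pages at all, something like
this should work (a sketch, assuming your pages are served from the root as in
your example URLs):

   User-agent: *
   Disallow: /Special: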


> and external pages like:
> 
>    - http://www.stumbleupon.com/
>    - http://www.searchtheweb.com/

robots.txt only applies to your own domain, so it cannot keep crawlers away
from external sites. What MediaWiki can do is mark external links with
rel="nofollow" so search engines do not follow them from your pages; that is
controlled by $wgNoFollowLinks, which is true by default (setting it to false
removes the nofollow attribute):

http://www.mediawiki.org/wiki/Manual:$wgNoFollowLinks
http://www.mediawiki.org/wiki/Manual:$wgNoFollowDomainExceptions
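
A minimal LocalSettings.php sketch, assuming the default behaviour is what you
want and with example.org as a placeholder for any domain you trust:

   # Keep MediaWiki's default of adding rel="nofollow" to external links,
   # so search engines do not follow them from your wiki pages.
   $wgNoFollowLinks = true;

   # Hypothetical example: exempt specific trusted domains from nofollow.
   $wgNoFollowDomainExceptions = array( 'example.org' );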

> As you can see, my robot.txt did not block these pages. Also, should I block
> the print version to prevent what Google calls "duplicate content"? If so,
> how?

Disallow /index.php in your robots.txt; that covers the printable versions,
edit links, old revisions and other action URLs in one go, as long as your
articles themselves are served through short URLs rather than /index.php.
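For example (a sketch, assuming short URLs so that only edit, printable,
history and similar action URLs go through /index.php):

   User-agent: *
   Disallow: /index.php

If your articles are still served as /index.php?title=..., this would block
the whole wiki, so check your URL layout first.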

> Response will be very much appreciated.
> 
> PM Poon

