Hi, I have the following rewrite rule in place on one of our
staging sites to redirect bots and malicious scripts to our
corporate page:
RewriteCond %{HTTP_USER_AGENT} ^$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*(<|>|'|%0A|%0D|%27|%3C|%3E|%00).* [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*(HTTrack|clshttp|archiver|loader|email|nikto|miner|python).* [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*(winhttp|libwww\-perl|curl|wget|harvest|scan|grab|extract).* [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*(Googlebot|SemrushBot|PetalBot|Bytespider|bingbot).* [NC]
RewriteRule (.*) https://guardiandigital.com$1 [L,R=301]
However, it doesn't appear to always work properly:
66.249.68.6 - - [08/Jul/2024:11:43:41 -0400] "GET /robots.txt HTTP/1.1" 200 343 r:"-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" 0/5493 1145/6615/343 H:HTTP/1.1 U:/robots.txt s:200
Instead of changing my rules and then waiting for the condition to be met (i.e., until Googlebot crawls the site again), I'd like to simulate the above request against my ruleset to see whether it matches. Is this possible?
For the user agent, just install a browser extension to spoof the value and make an HTTP request. Alternatively, you can use curl with its -A option to send an arbitrary User-Agent.
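You can also dry-run the conditions offline. Here's a rough sketch that re-implements the RewriteCond chain in Python; it assumes these simple alternations behave the same under Python's re as under Apache's PCRE, which holds for patterns like these but is not guaranteed in general:

```python
import re

# Second condition: suspicious characters / encodings in the User-Agent
SPECIAL = re.compile(r"(<|>|'|%0A|%0D|%27|%3C|%3E|%00)", re.I)

# Remaining conditions: known scraper/bot substrings, case-insensitive ([NC])
BOTS = re.compile(
    r"(HTTrack|clshttp|archiver|loader|email|nikto|miner|python"
    r"|winhttp|libwww\-perl|curl|wget|harvest|scan|grab|extract"
    r"|Googlebot|SemrushBot|PetalBot|Bytespider|bingbot)",
    re.I,
)

def rule_matches(user_agent: str) -> bool:
    """True if the RewriteRule above would fire for this User-Agent."""
    return (user_agent == ""                          # first cond: ^$ (empty UA)
            or SPECIAL.search(user_agent) is not None
            or BOTS.search(user_agent) is not None)

# The exact UA from the log line above:
ua = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
print(rule_matches(ua))  # → True
```

If this prints True for the logged User-Agent (it does, since "Googlebot" is in the alternation), the regexes themselves are fine and the problem is more likely where the rules live, e.g. the directives not being applied in that context.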
I should have mentioned that this is part of a larger effort to redirect bots, while also blocking some clients outright and letting authorized users through. Here's what I've come up with, and it seems to work quite well. It all has to live in .htaccess, because .htaccess is processed after the virtual host config and overrides any RequireAll/RequireAny entries that already appear there. I also learned that RequireAny is default deny: if none of its Require directives succeed, access is refused.
RewriteCond %{HTTP_USER_AGENT} ^$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*(<|>|'|%0A|%0D|%27|%3C|%3E|%00).* [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*(HTTrack|clshttp|archiver|loader|email|nikto|miner|python).* [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*(winhttp|libwww\-perl|curl|wget|harvest|scan|grab|extract).* [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*(Googlebot|SemrushBot|PetalBot|Bytespider|bingbot).* [NC]
RewriteRule (.*) https://guardiandigital.com/$1 [L,R=301]
SetEnvIf User-Agent "(?i:GoogleBot)" googlebot=1
SetEnvIf User-Agent "(?i:SemrushBot)" googlebot=1
SetEnvIf User-Agent "(?i:PetalBot)" googlebot=1
SetEnvIf User-Agent "(?i:Bytespider)" googlebot=1
SetEnvIf User-Agent "(?i:bingbot)" googlebot=1
<RequireAny>
Require ip 1.2.3.4
Require env googlebot
</RequireAny>
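As an aside, the five SetEnvIf lines could be collapsed into a single alternation; this is just a sketch assuming the same bot list, and behavior should be identical since (?i:...) keeps the match case-insensitive:

```
SetEnvIf User-Agent "(?i:GoogleBot|SemrushBot|PetalBot|Bytespider|bingbot)" googlebot=1
```

Keeping them on separate lines does make it easier to add or remove one bot at a time, so either form is defensible.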
I was originally also trying to tie the RewriteRules to the RequireAny block using <If>, but then realized I didn't even need to; each part just gets processed independently. It looks so simple now, but it took me a while to make it this simple.