That IP resolves to rate-limited-proxy-72-14-199-18.google.com, so this is
not the Google search crawler, which is why it ignores your robots.txt. No one
seems to know for sure what the rate-limited-proxy IPs are used for. They
could represent random Chrome users using the Google data saving feature,
he
On Thu, Jun 07, 2018 at 07:57:43PM -0400, shiz wrote:
Hi there,
> Recently, Google has started spidering my website and in addition to normal
> pages, appended "&" to all urls, even the pages excluded by robots.txt
>
> e.g. page.php?page=aaa -> page.php?page=aaa&
>
> Any idea how to redirect/r
'The & to &amp; conversion is another sign of a poor quality crawler.'
I wasn't referring to any of them but to '&'. Important difference.
Also explaining my failure to filter it from parameters since parameters
contains an equal sign. E.g. ...&= something or even &=
& or & would also easy do filt
I see another poster wrote this and deleted it afterwards.
`This is almost certainly not Google as they obey robots.txt. The & to &amp;
conversion is another sign of a poor quality crawler. Check the RDNS and
you will find it's probably some IP faking Google UA, I suggest blocking at
network l
Hi,
When I enable the cache for the image filter, nginx workers crash and I keep
getting 500 errors.
I'm using Nginx 1.14.0
error log:
2018/06/11 12:30:49 [alert] 46105#0: worker process 46705 exited on signal
11 (core dumped)
proxy_cache_path /opt/nginx/img-cache/resized levels=1:2
keys_zone=resizedimages:1
This is almost certainly not Google as they obey robots.txt. The & to &amp;
conversion is another sign of a poor quality crawler. Check the RDNS and
you will find it's probably some IP faking Google UA, I suggest blocking at
network level.
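
For anyone who wants to script the RDNS check, here is a rough sketch in
Python of a forward-confirmed reverse lookup: resolve the IP to a name, check
the name, then resolve the name back and make sure it returns the same IP.
The function names are made up for illustration, and it assumes the search
crawler reverse-resolves under googlebot.com, so adjust the suffix to
whatever you decide to trust. It only handles IPv4.

```python
import socket


def is_googlebot_hostname(hostname):
    """Return True if the PTR name sits under googlebot.com.

    Assumption: the search crawler reverse-resolves to names such as
    crawl-66-249-66-1.googlebot.com. Other Google-owned names, for example
    rate-limited-proxy-*.google.com, are deliberately not accepted here.
    """
    host = hostname.rstrip(".").lower()
    return host.endswith(".googlebot.com")


def verify_crawler_ip(ip, reverse=socket.gethostbyaddr,
                      forward=socket.gethostbyname_ex):
    """Forward-confirmed reverse DNS check for a single IPv4 address.

    The resolver functions are parameters so the logic can be exercised
    without touching the network.
    """
    try:
        # gethostbyaddr returns (hostname, aliases, addresses)
        hostname = reverse(ip)[0]
    except OSError:
        return False  # no PTR record or lookup failure

    if not is_googlebot_hostname(hostname):
        return False

    try:
        # gethostbyname_ex returns (hostname, aliases, addresses)
        addresses = forward(hostname)[2]
    except OSError:
        return False

    # A faked PTR record will not resolve back to the client IP.
    return ip in addresses
```

Calling verify_crawler_ip() with an address from your access log returns
False for anything that merely sends a Google user agent, and also for the
rate-limited-proxy hosts, since those are under google.com rather than
googlebot.com.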
On Fri, Jun 8, 2018 at 1:57 AM shiz wrote:
> Hi,
>
> Recentl