Sorry, this request for help is a bit off topic for this group, but I am really stuck and could do with some help. If you can't help but know where I might be able to get help, I would appreciate a pointer in the right direction.
I run a few sites off one static IP address using virtual hosting. Some sites (www.crazysquirrel.com, www.ruralescapes.co.uk and www.shallowsea.com) are Java based and use the Apache Tomcat connector. Others, such as blog.crazysquirrel.com, are PHP based and hosted straight out of Apache (I'm running Apache 2.0 on Debian).

All the sites appear to work just fine. There don't appear to be any problems with people navigating around them. The problem is with search engine crawlers such as Yahoo Slurp and Googlebot. A large number of requests for pages that belong to one of the other domains are ending up at blog.crazysquirrel.com.

My best guess is that for some reason Slurp and Googlebot are making requests but leaving off the Host header. Now this wouldn't be completely out of spec because they are making HTTP/1.0 requests and as such don't require a Host header. I would have expected, however, that every request from them would come with one since virtual hosting is now so common. It briefly crossed my mind that it was simply a probe to detect virtual hosting, but there are way too many requests going astray (more go astray than the real sites actually get), so I conclude something must be wrong.

A little more digging seems to indicate that the search bots are able to load the first page (say http://www.shallowsea.com/index.html) but then start screwing it up when trying to access the links they find in that page. For example, here is a little snippet of log file from yesterday for blog.crazysquirrel.com.
These are page requests that should have gone to shallowsea.com:

66.249.66.34 - - [12/Oct/2005:14:08:47 +0100] "GET /events.html?change-category=7&resource-name=event HTTP/1.1" 404 209 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.66.34 - - [12/Oct/2005:14:09:14 +0100] "GET /links.html?change-category=51&resource-name=link HTTP/1.1" 404 208 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

In the shallowsea.com log I find this:

66.249.66.34 - - [12/Oct/2005:14:07:09 +0100] "GET /events.html?change-category=60&resource-name=event HTTP/1.1" 200 7073 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.66.34 - - [12/Oct/2005:14:09:50 +0100] "GET /links.html?change-category=54&resource-name=link HTTP/1.1" 200 7547 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

Note the times of these requests. It is fairly obvious that Googlebot is trying to index shallowsea.com but for some reason about half the requests are going to the wrong domain.

Has anyone got any idea what might be going on here? I'm perfectly happy to accept that there is some header that I should be sending back that I am not, but that doesn't feel like the problem as some requests seem to get through fine.

Many thanks,
Graham
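
P.S. One way to check the missing-Host theory would be to send the same GET twice to the shared IP address, once without a Host header and once with it, and compare the status lines. With name-based virtual hosting, Apache normally hands a request with no matching Host header to the first virtual host listed for that address, so a 404 on the first request and a 200 on the second would be consistent with the logs above. The following is only a minimal Python sketch under those assumptions: the function names (build_request, status_line, fetch_status) are made up for illustration, and the server address is whatever the shared static IP happens to be.

```python
import socket


def build_request(path, host=None, version="1.0"):
    """Build a raw GET request; the Host header is omitted when host is None."""
    lines = [f"GET {path} HTTP/{version}"]
    if host is not None:
        lines.append(f"Host: {host}")
    lines.append("Connection: close")
    # A blank line terminates the header block.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


def status_line(raw_response):
    """Return the first line (status line) of a raw HTTP response as text."""
    return raw_response.split(b"\r\n", 1)[0].decode("latin-1")


def fetch_status(server, request, port=80, timeout=10):
    """Send raw request bytes to server:port and return the status line."""
    with socket.create_connection((server, port), timeout=timeout) as sock:
        sock.sendall(request)
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closed the connection
                break
            chunks.append(data)
    return status_line(b"".join(chunks))
```

To try it, call fetch_status(ip, build_request("/events.html?change-category=7&resource-name=event")) and then the same again with host="www.shallowsea.com", where ip is the shared address. If both calls return 200, the missing-Host theory is probably wrong and the problem lies elsewhere.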