Please don't top post.

On 25 November 2010 15:38, Ron Piggott <ron.pigg...@actsministries.org> wrote:
>
> Is "User Agent" suppose to have a hyphen "-" ? Ron
>
> The Verse of the Day
> “Encouragement from God’s Word”
> http://www.TheVerseOfTheDay.info
>
> -----Original Message-----
> From: Richard Quadling
> Sent: Thursday, November 25, 2010 9:16 AM
> To: Deva
> Cc: Shreyas Agasthya ; Ron Piggott ; php-general@lists.php.net ; a...@ashleysheridan.co.uk
> Subject: Re: [PHP] Fw: Spoofing user_agent
>
> On 25 November 2010 11:32, Deva <devendra...@gmail.com> wrote:
>>
>> Use curl
>> http://php.net/manual/en/book.curl.php
>>
>> On Thu, Nov 25, 2010 at 4:41 PM, Shreyas Agasthya <shreya...@gmail.com> wrote:
>>
>>> I feel you should use more of the 4th method here as you are not trying to
>>> read the file but the header level (7th layer) information of the HTTP
>>> protocol.
>>>
>>> http://php.net/manual/en/function.file-get-contents.php
>>>
>>> --Shreyas
>>>
>>> On Thu, Nov 25, 2010 at 4:11 PM, Ron Piggott <ron.pigg...@actsministries.org> wrote:
>>>
>>> > Will the header pass with using file_get_contents , or should I be using
>>> > another command, and if so, which one? Ron
>>> >
>>> > <?php
>>> >
>>> > header('User Agent: RonBot (http://www.example.com)');
>>> > $url = "http://www.example.com";
>>> >
>>> > $input = file_get_contents($url);
>>> >
>>> > The Verse of the Day
>>> > “Encouragement from God’s Word”
>>> > http://www.TheVerseOfTheDay.info
>>> >
>>> > *From:* Shreyas Agasthya <shreya...@gmail.com>
>>> > *Sent:* Thursday, November 25, 2010 4:21 AM
>>> > *To:* Ron Piggott <ron.pigg...@actsministries.org>
>>> > *Cc:* php-general@lists.php.net ; a...@ashleysheridan.co.uk
>>> > *Subject:* Re: [PHP] Fw: Spoofing user_agent
>>> >
>>> > A standard HTTP Request headers is : User Agent (without the
>>> > underscore).
>>> >
>>> > --Shreyas
>>> >
>>> > On Thu, Nov 25, 2010 at 2:36 PM, Ron Piggott <ron.pigg...@actsministries.org> wrote:
>>> >
>>> >> Is this what you are telling me to do:
>>> >>
>>> >> header('user_agent: RonBot (http://www.theverseoftheday.info)');
>>> >>
>>> >> Ron
>>> >>
>>> >> The Verse of the Day
>>> >> “Encouragement from God’s Word”
>>> >> http://www.TheVerseOfTheDay.info
>>> >>
>>> >> From: a...@ashleysheridan.co.uk
>>> >> Sent: Thursday, November 25, 2010 3:34 AM
>>> >> To: Ron Piggott ; php-general@lists.php.net
>>> >> Subject: Re: [PHP] Fw: Spoofing user_agent
>>> >>
>>> >> You need to set it in the header request you make. Putting it in the
>>> >> script you're using as a spider with ini_set won't do anything because the
>>> >> Target site doesn't know anything about it.
>>> >>
>>> >> Thanks,
>>> >> Ash
>>> >> http://www.ashleysheridan.co.uk
>>> >>
>>> >> ----- Reply message -----
>>> >> From: "Ron Piggott" <ron.pigg...@actsministries.org>
>>> >> Date: Thu, Nov 25, 2010 08:25
>>> >> Subject: [PHP] Fw: Spoofing user_agent
>>> >> To: <php-general@lists.php.net>
>>> >>
>>> >> I have wrote a script to generate a sitemap of my web site. It crawls all
>>> >> of the site web pages. (About 30,000)
>>> >>
>>> >> I need help to spoof the user_agent variable so the stats program running
>>> >> in the background ( “AWSTATS” ) will treat the crawl as a bot, not browsing
>>> >> usage.
>>> >>
>>> >> The sitemap generator is a cron job. I tried the syntax:
>>> >> ini_set('user_agent', 'RonBot (http://www.theverseoftheday.info)/'/);
>>> >>
>>> >> This didn’t work. The browsing was attributed to the dedicated IP
>>> >> address.
>>> >>
>>> >> How do I get AWSTATS to access this, such as other entries under the
>>> >> “Robots/Spiders visitors” heading:
>>> >> Unknown robot (identified by 'bot*')
>>> >>
>>> >> I don’t mean any ill will by changing this setting. Thanks for the help.
>>> >>
>>> >> Ron
>>> >>
>>> >> The Verse of the Day
>>> >> “Encouragement from God’s Word”
>>> >> http://www.TheVerseOfTheDay.info
>>> >>
>>> >
>>> > --
>>> > Regards,
>>> > Shreyas Agasthya
>>> >
>>>
>>> --
>>> Regards,
>>> Shreyas Agasthya
>>>
>>
>> --
>> :DJ
>>
>
> It is no use using header(). This sets a header for the client, not
> the server of any file_get_contents() requests.
>
> I use stream_contexts.
>
> $s_Contents = file_get_contents(
>     $s_URL,
>     False,
>     stream_context_create(
>         array(
>             'http' => array(
>                 'method' => 'GET',
>                 'header' => "User-Agent: RonBot (http://www.example.com)\r\n"
>             ),
>         )
>     )
> );
>
> You can supply cookies, or anything else, with the request. Make sure
> you add a \r\n to each of the headers and just concatenate them.
>
> If you are doing this in a loop, then I'd recommend creating a default
> stream context and then the request would just be ...
>
> $s_Contents = file_get_contents($s_URL);
>
> As the default stream context would be applied.
>
> I had to use a default stream context to route all http requests
> through an NTLM authentication proxy server because PHP doesn't deal
> with NTLM authentication.
>
> See my user notes on
> http://docs.php.net/manual/en/function.stream-context-get-default.php.
> Don't bother with the link at the bottom of the user note- it's not
> live.
>
> Richard.
>
> --
> Richard Quadling
> Twitter : EE : Zend
> @RQuadling : e-e.com/M_248814.html : bit.ly/9O8vFY
>
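For the loop case described in the quoted message, here is a minimal sketch of the default stream context approach. It assumes PHP 5.3 or later for stream_context_set_default(). The helper name build_header_block and the cookie value are hypothetical and only there for illustration. It makes no network request; it just sets the default context and prints the header block that later http:// requests would carry.

```php
<?php
// Hypothetical helper (not from the thread): joins header lines,
// terminating each one with CRLF as the quoted message advises.
function build_header_block(array $headers)
{
    $block = '';
    foreach ($headers as $name => $value) {
        $block .= $name . ': ' . $value . "\r\n";
    }
    return $block;
}

$options = array(
    'http' => array(
        'method' => 'GET',
        'header' => build_header_block(array(
            // Note the hyphen: the request header is "User-Agent".
            'User-Agent' => 'RonBot (http://www.example.com)',
            'Cookie'     => 'session=example', // placeholder value
        )),
    ),
);

// Apply the options to every later http:// stream that is opened without
// an explicit context, so calls inside a crawl loop need no extra arguments.
stream_context_set_default($options);

// Read the options back from the default context (no network access needed).
$applied = stream_context_get_options(stream_context_get_default());
echo $applied['http']['header'];

// Inside the crawl loop the request would then simply be:
// $s_Contents = file_get_contents($s_URL);
```

Setting the default once keeps the per-page call short, at the cost of affecting every http:// stream in the script; passing an explicit context to file_get_contents() is the narrower alternative when only some requests should carry the bot identity.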
http://en.wikipedia.org/wiki/User_agent

"... the identity is transmitted via the User-Agent request header, ... "

--
Richard Quadling
Twitter : EE : Zend
@RQuadling : e-e.com/M_248814.html : bit.ly/9O8vFY