ID:               43098
 Updated by:       [EMAIL PROTECTED]
 Reported By:      harvie at email dot cz
-Status:           Open
+Status:           Feedback
 Bug Type:         Performance problem
 Operating System: Linux (Debian Etch) - php5-cli
 PHP Version:      5.2.4
 New Comment:

It won't matter what you set the timeout to if you wrap everything in a
while(1) loop. Of course it just sits there. Try adding some error
checking there.
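
A minimal sketch of what such error checking could look like, assuming
a bounded number of attempts per URL is acceptable. The names
fetch_with_retries() and $fetcher are hypothetical helpers introduced
for illustration; they are not part of PHP or of the reported script.

```php
<?php
// Sketch only: stop after a bounded number of failures instead of
// looping forever. fetch_with_retries() and $fetcher are hypothetical.
function fetch_with_retries($url, $maxAttempts = 3, $fetcher = 'file_get_contents')
{
    for ($attempt = 1; $attempt <= $maxAttempts; $attempt++) {
        // No @ operator here: a failed request should stay visible.
        $data = call_user_func($fetcher, $url);
        if ($data !== false) {
            return $data; // success, hand the body back to the caller
        }
        error_log("attempt $attempt of $maxAttempts failed for $url");
    }
    return false; // give up; the caller can move on to the next URL
}
```

The caller checks for a strict false return, so an empty response body
is still treated as a successful fetch.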



Previous Comments:
------------------------------------------------------------------------

[2007-10-25 17:21:10] harvie at email dot cz

[EMAIL PROTECTED]: It's possible, but in that case
default_socket_timeout should close the socket and let the script
continue with the next URL (the crawler also freezes with many different
URLs). Or does default_socket_timeout not matter here?

It's true that my router sucks, but I don't see any reason why PHP
should fail at the first connectivity problem and end in a total
script freeze. I thought that is why we have the socket timeout option.
Or not?
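
For reference, the timeout can also be set per request through the
"timeout" option of an HTTP stream context rather than the global ini
setting. A minimal sketch, assuming a 1-second read timeout is wanted;
build_http_context() is a hypothetical helper name, and this does not
claim to avoid the freeze described in this report.

```php
<?php
// Sketch only: per-request read timeout via an HTTP stream context.
// build_http_context() is a hypothetical helper, not a PHP function.
function build_http_context($seconds)
{
    return stream_context_create(array(
        'http' => array(
            'timeout' => (float) $seconds, // read timeout in seconds
        ),
    ));
}

$context = build_http_context(1);
// Usage (commented out so the sketch runs without network access):
// $body = file_get_contents('http://w.moreover.com/', false, $context);
// if ($body === false) { /* log the failure, continue with the next URL */ }
```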

------------------------------------------------------------------------

[2007-10-25 11:57:20] [EMAIL PROTECTED]

Are you sure it's not just your network connection freezing? E.g. some
kind of firewall stopping you from connecting to one site too many times
in too short a time? (FYI: your script works fine for me; I stopped it
after 10 minutes.)


------------------------------------------------------------------------

[2007-10-24 19:02:01] harvie at email dot cz

I tried running this on PHP 4 CLI (MS Windows Server 2003).
It returned these errors. Maybe this error is handled differently in
PHP 5, and that causes the hang...

c:/>bugshow.php
#0#1#2#3#4#5#6#7#8#9#10#11#12#13#14#15
Warning: file_get_contents(res:///PHP/http:\\w.moreover.com\): failed
to open stream: No such file or directory in D:\bug.php on line 12
#16#17#18#19#20#21#22#23#24#25#26#27#28#29#30#31#32#33#34#35#36#37#38#39#40#41#4
2#43#44#45#46#47
Warning: file_get_contents(res:///PHP/http:\\w.moreover.com\): failed
to open stream: No such file or directory in D:\bug.php on line 12
#48#49#50

------------------------------------------------------------------------

[2007-10-24 18:16:26] harvie at email dot cz

I ran the script under the strace debugger (this traces the
interpreter's system calls, not the PHP code, of course), in case you
are interested:

# strace ./bugshow.php
execve("./emails.php", ["./emails.php"], [/* 29 vars */]) = 0
uname({sys="Linux", node="harvie-ntb", ...}) = 0
brk(0)                                  = 0x854c000
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or
directory)
mmap2(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1,
0) = 0xb7fa8000
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or
directory)
open("/etc/ld.so.cache", O_RDONLY)      = 3
fstat64(3, {st_mode=S_IFREG|0644, st_size=67381, ...}) = 0
mmap2(NULL, 67381, PROT_READ, MAP_PRIVATE, 3, 0) = 0xb7f97000
close(3)                                = 0

...Lot of irrelevant stuff...

connect(3, {sa_family=AF_INET, sin_port=htons(53),
sin_addr=inet_addr("192.168.2.1")}, 28) = 0
fcntl64(3, F_GETFL)                     = 0x2 (flags O_RDWR)
fcntl64(3, F_SETFL, O_RDWR|O_NONBLOCK)  = 0
gettimeofday({1193256681, 718238}, NULL) = 0
poll([{fd=3, events=POLLOUT, revents=POLLOUT}], 1, 0) = 1
send(3, "\6?\1\0\0\1\0\0\0\0\0\0\1w\10moreover\3com\0\0\1\0\1", 32, 0)
= 32
poll([{fd=3, events=POLLIN, revents=POLLIN}], 1, 5000) = 1
ioctl(3, FIONREAD, [56])                = 0
recvfrom(3,
"\6?\201\200\0\1\0\1\0\0\0\0\1w\10moreover\3com\0\0\1\0"..., 1024, 0,
{sa_family=AF_INET, sin_port=htons(53),
sin_addr=inet_addr("192.168.2.1")}, [16]) = 56
close(3)                                = 0
gettimeofday({1193256681, 768001}, NULL) = 0
socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 3
fcntl64(3, F_GETFL)                     = 0x2 (flags O_RDWR)
fcntl64(3, F_SETFL, O_RDWR|O_NONBLOCK)  = 0
connect(3, {sa_family=AF_INET, sin_port=htons(80),
sin_addr=inet_addr("170.224.8.50")}, 16) = -1 EINPROGRESS (Operation now
in progress)
poll([{fd=3, events=POLLIN|POLLOUT|POLLERR|POLLHUP, revents=POLLOUT}],
1, 1000) = 1
getsockopt(3, SOL_SOCKET, SO_ERROR, [0], [4]) = 0
fcntl64(3, F_SETFL, O_RDWR)             = 0
send(3, "GET / HTTP/1.0\r\n", 16, 0)    = 16
send(3, "Host: w.moreover.com\r\n", 22, 0) = 22
send(3, "\r\n", 2, 0)                   = 2
poll([{fd=3, events=POLLIN|POLLPRI|POLLERR|POLLHUP}], 1, 0) = 0
poll([{fd=3, events=POLLIN|POLLERR|POLLHUP, revents=POLLIN}], 1, 1000)
= 1
recv(3, "HTTP/1.1 200 OK\r\nDate: Wed, 24 O"..., 8192, 0) = 524
poll([{fd=3, events=POLLIN|POLLERR|POLLHUP, revents=POLLIN}], 1, 1000)
= 1
recv(3, "s, online news, current awarenes"..., 8192, 0) = 524
poll([{fd=3, events=POLLIN|POLLPRI|POLLERR|POLLHUP}], 1, 0) = 0
poll([{fd=3, events=POLLIN|POLLERR|POLLHUP}], 1, 1000) = 0
poll([{fd=3, events=POLLIN|POLLPRI|POLLERR|POLLHUP}], 1, 0) = 0
poll([{fd=3, events=POLLIN|POLLERR|POLLHUP}], 1, 1000) = 0
poll([{fd=3, events=POLLIN|POLLPRI|POLLERR|POLLHUP}], 1, 0) = 0
poll([{fd=3, events=POLLIN|POLLERR|POLLHUP}], 1, 1000) = 0
poll([{fd=3, events=POLLIN|POLLPRI|POLLERR|POLLHUP}], 1, 0) = 0
poll([{fd=3, events=POLLIN|POLLERR|POLLHUP}], 1, 1000) = 0
poll([{fd=3, events=POLLIN|POLLPRI|POLLERR|POLLHUP}], 1, 0) = 0
poll([{fd=3, events=POLLIN|POLLERR|POLLHUP}], 1, 1000) = 0
poll([{fd=3, events=POLLIN|POLLPRI|POLLERR|POLLHUP}], 1, 0) = 0
...This repeats a few times a second...
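
In the trace, poll() with a 1000 ms timeout keeps returning 0 (no data
arrived) and the read is simply attempted again. A minimal sketch of a
read loop that stops on the first timed-out read and also enforces an
overall deadline, assuming the stream was opened with fopen() and given
a timeout with stream_set_timeout(). read_with_deadline() and its
parameters are hypothetical names, not part of PHP.

```php
<?php
// Sketch only: read up to $maxBytes, but stop when a single read times
// out or when the overall deadline passes. Hypothetical helper.
function read_with_deadline($stream, $maxBytes, $deadlineSeconds)
{
    $start = microtime(true);
    $data = '';
    while (strlen($data) < $maxBytes && !feof($stream)) {
        $chunk = fread($stream, min(8192, $maxBytes - strlen($data)));
        $meta = stream_get_meta_data($stream);
        if ($chunk === false || !empty($meta['timed_out'])) {
            break; // this read failed or timed out: do not poll again
        }
        $data .= $chunk;
        if (microtime(true) - $start > $deadlineSeconds) {
            break; // overall time budget for this URL is used up
        }
    }
    return $data;
}

// Usage (commented out so the sketch runs without network access):
// $fp = fopen('http://w.moreover.com/', 'r');
// if ($fp !== false) {
//     stream_set_timeout($fp, 1);
//     $body = read_with_deadline($fp, 10000, 5);
//     fclose($fp);
// }
```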

------------------------------------------------------------------------

[2007-10-24 18:00:48] harvie at email dot cz

Description:
------------
I have written a spider/crawler to build a web search engine as a school
project.

So... I have a small problem:
I am using file_get_contents() (I've tried fopen() too...).
The crawler works great, but sometimes it freezes. I tried to
trace which function freezes, and I found it: it's
file_get_contents()...

So I googled and found the default_socket_timeout setting. I set it to 1,
but it still sometimes freezes and never recovers.

I've made this example so you can see that it freezes after a few
iterations. I have supplied a URL that causes my crawler to freeze (I'm
not sure why...):


Reproduce code:
---------------
#!/usr/bin/php
<?php

/* Run and wait for a while; this can totally stop the script at the
dead point... */

ini_set('default_socket_timeout',1);
set_time_limit(0);
//$url='http://ad.doubleclick.net/click';
$url='http://w.moreover.com/';
while(1) {
    @file_get_contents($url, false, null, 0, 10000);
    echo "#";
}

?>

Expected result:
----------------
It will download the file from the specified URL a few times, and after
that it will freeze and never recover...
(It also happens if you use a different URL each time, but it takes
more time...)


Actual result:
--------------
harvie-ntb:/home/harvie/Desktop/crawler# ./bugshow.php
#1#2#3#4#5#6#7#8#9#10#11#12#13#14#15#16#17

And there it freezes for eternity (I thought that it would
continue after 1 second on failure with
ini_set('default_socket_timeout',1);, but the whole script stops. I tried
waiting a really long time...)



------------------------------------------------------------------------


-- 
Edit this bug report at http://bugs.php.net/?id=43098&edit=1
