ID:               47643
 Updated by:       dmi...@php.net
 Reported By:      viper7 at viper-7 dot com
 Status:           Assigned
 Bug Type:         Performance problem
 Operating System: *
 PHP Version:      5.2.6+, 5.3, 6CVS (2009-04-13)
 Assigned To:      dmitry
 New Comment:

The problem occurs because of the "bad" patch for bug #42838.

The diff algorithm sorts the arrays using qsort and then assumes that they
are sorted correctly. With a user comparison function, however, that can't
be guaranteed. In ext/standard/tests/array/bug42838.phpt, for example,
key_compare_func() can't sort the array correctly because the expressions
(0 < 'a') and (0 > 'a') are both false ('a' is interpreted as the number 0).
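
To illustrate why such a comparison cannot give a consistent sort order,
here is a minimal sketch. legacy_cmp() is a hypothetical helper, not part
of PHP and not the function from the test file; it only emulates the rule
described above (a non-numeric string compared with an integer becomes 0)
and, as a simplification, compares two strings purely as strings.

```php
<?php
// Hypothetical helper (an assumption for illustration only): emulates the
// comparison rule described above, where a non-numeric string compared
// with an integer is converted to the number 0.
// Simplification: two strings are always compared as strings.
function legacy_cmp($a, $b) {
    if (is_string($a) && is_string($b)) {
        $r = strcmp($a, $b);
    } else {
        $r = (int)$a - (int)$b;
    }
    // Normalize to -1, 0 or 1.
    return ($r < 0) ? -1 : (($r > 0) ? 1 : 0);
}

var_dump(legacy_cmp(0, 'a'));   // 0 and 'a' compare as equal
var_dump(legacy_cmp(0, 'b'));   // 0 and 'b' compare as equal
var_dump(legacy_cmp('a', 'b')); // yet 'a' sorts before 'b'
```

Since 'a' and 'b' are both "equal" to 0 but not equal to each other, the
order is not transitive, and a sort driven by it may leave the array in an
order that later code cannot rely on.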

It should be fixed in some way.


Previous Comments:
------------------------------------------------------------------------

[2009-06-30 15:22:24] der...@php.net

Dmitry, could you have a look? I have no idea why this occurs.

------------------------------------------------------------------------

[2009-06-30 15:19:43] viper7 at viper-7 dot com

I've tracked down the change that broke things; this is it, but the
exact reason is beyond me, heh. Hopefully this helps.

http://cvs.php.net/viewvc.cgi/php-src/ext/standard/array.c?r1=1.308.2.21.2.51&r2=1.308.2.21.2.52&pathrev=PHP_5_2

------------------------------------------------------------------------

[2009-03-24 21:19:01] cisa at cisa85 dot de

As I described in [1], I use this function to get the performance I
need:


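// Workaround for the slow array_diff() on large arrays.
// Note: array_flip() only accepts string or integer values and
// collapses duplicate values.
function array_diff_fast($data1, $data2) {
    // Flip so that the values become keys and can be tested with isset().
    $data1 = array_flip($data1);
    $data2 = array_flip($data2);

    // Drop every value from $data1 that also occurs in $data2.
    foreach ($data2 as $hash => $key) {
        if (isset($data1[$hash])) unset($data1[$hash]);
    }

    // Flip back to the original key => value orientation.
    return array_flip($data1);
}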

Thanks to Viper for his help.

[1]
http://nohostname.de/blog/2009/03/24/bug-gefunden-array_diff-in-php-526-unglaublich-langsam/
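
The array_diff_fast() function above drops duplicate values because both
arrays are flipped. A possible variant, sketched here under the assumption
that all values are strings or integers, flips only the second array and
keeps the keys and duplicates of the first. The name array_diff_lookup()
is hypothetical.

```php
<?php
// Hypothetical variant of array_diff_fast(): only $data2 is flipped, so
// the original keys and any duplicate values of $data1 are preserved.
// Assumes every value is a string or an integer (a valid array key).
function array_diff_lookup(array $data1, array $data2) {
    // value => key map, used only for fast membership tests
    $lookup = array_flip($data2);

    $result = array();
    foreach ($data1 as $key => $value) {
        if (!isset($lookup[$value])) {
            $result[$key] = $value;
        }
    }
    return $result;
}

$diff = array_diff_lookup(array('a', 'b', 'b', 'c'), array('c', 'd'));
print_r($diff); // keeps 'a', 'b', 'b' under their original keys 0, 1, 2
```

Each element of the first array costs one hash lookup, so the work grows
roughly linearly with the combined size of both arrays.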

------------------------------------------------------------------------

[2009-03-13 11:49:36] viper7 at viper-7 dot com

Description:
------------
This bug was reported in ##php on freenode, and after some thorough
testing on multiple machines we determined it must be an engine bug.

array_diff on two large arrays of md5 hashes (600,000 elements each)
takes approximately 4 seconds on a fast server in PHP 5.2.4 and below
(confirmed with PHP 5.2.0), but over 4 hours (!) on PHP 5.2.6 and
greater (confirmed with PHP 5.2.9 and PHP 5.3.0 beta2).


Reproduce code:
---------------
<?php
$data1 = array(); $data2 = array();
$i=0; $j=500000;
while($i < 600000) {
        $i++; $j++;
        $data1[] = md5($i);
        $data2[] = md5($j);
}
 
$time = microtime(true);

echo "Starting array_diff\n";
$data_diff1 = array_diff($data1, $data2);

$time = microtime(true) - $time;

echo 'array_diff() took ' . number_format($time, 3) .
     ' seconds and returned ' . count($data_diff1) . " entries\n";
?>

Expected result:
----------------
Starting array_diff
array_diff() took 3.778 seconds and returned 500000 entries

Actual result:
--------------
Starting array_diff
array_diff() took 14826.278 seconds and returned 500000 entries


------------------------------------------------------------------------


-- 
Edit this bug report at http://bugs.php.net/?id=47643&edit=1
