ID:               47643
Updated by:       dmi...@php.net
Reported By:      viper7 at viper-7 dot com
Status:           Assigned
Bug Type:         Performance problem
Operating System: *
PHP Version:      5.2.6+, 5.3, 6CVS (2009-04-13)
Assigned To:      dmitry
New Comment:

The problem occurs because of the "bad" patch for bug #42838. The diff
algorithm sorts the arrays using qsort and then assumes that they are
sorted correctly. But with a user comparison function this can't be
guaranteed.

Thus in ext/standard/tests/array/bug42838.phpt key_compare_func()
can't sort the array correctly, because the expressions (0 < 'a') and
(0 > 'a') are both false ('a' is interpreted as the number 0).

It should be fixed in some way.


Previous Comments:
------------------------------------------------------------------------

[2009-06-30 15:22:24] der...@php.net

Dmitry, could you have a look? I have no idea why this occurs.

------------------------------------------------------------------------

[2009-06-30 15:19:43] viper7 at viper-7 dot com

I've tracked down the change that broke things, this is it, but the
exact reason is beyond me heh. Hopefully this helps.

http://cvs.php.net/viewvc.cgi/php-src/ext/standard/array.c?r1=1.308.2.21.2.51&r2=1.308.2.21.2.52&pathrev=PHP_5_2

------------------------------------------------------------------------

[2009-03-24 21:19:01] cisa at cisa85 dot de

Like I described [1] I use this function to get the performance I need:

function array_diff_fast($data1, $data2) {
    // Flip both arrays so the values become keys and can be checked
    // with isset(). Note that duplicate values collapse into one key.
    $data1 = array_flip($data1);
    $data2 = array_flip($data2);

    // Drop every value of $data1 that also exists in $data2.
    foreach ($data2 as $hash => $key) {
        if (isset($data1[$hash])) unset($data1[$hash]);
    }

    // Flip back to the original key => value layout.
    return array_flip($data1);
}

Thanks to Viper for his help.

[1] http://nohostname.de/blog/2009/03/24/bug-gefunden-array_diff-in-php-526-unglaublich-langsam/

------------------------------------------------------------------------

[2009-03-13 11:49:36] viper7 at viper-7 dot com

Description:
------------
This bug was reported in ##php on freenode, and after some thorough
testing on multiple machines we determined it must be an engine bug.

array_diff on two large arrays of md5 hashes (600,000 elements each)
takes approximately 4 seconds on a fast server in PHP 5.2.4 and below
(confirmed with PHP 5.2.0), but over 4 hours (!)
on PHP 5.2.6 and greater (confirmed with PHP 5.2.9 and PHP 5.3.0 beta2).

Reproduce code:
---------------
<?php
$i = 0;
$j = 500000;

while ($i < 600000) {
    $i++;
    $j++;
    $data1[] = md5($i);
    $data2[] = md5($j);
}

$time = microtime(true);

echo "Starting array_diff\n";
$data_diff1 = array_diff($data1, $data2);
$time = microtime(true) - $time;

echo 'array_diff() took ' . number_format($time, 3) . ' seconds and returned ' . count($data_diff1) . " entries\n";
?>

Expected result:
----------------
Starting array_diff
array_diff() took 3.778 seconds and returned 500000 entries

Actual result:
--------------
Starting array_diff
array_diff() took 14826.278 seconds and returned 500000 entries

------------------------------------------------------------------------
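
For illustration only, the ordering problem described in the new comment
at the top of this report can be sketched without depending on how a
particular PHP version compares 0 with 'a'. The helper legacy_loose_cmp()
below is hypothetical (it is not part of PHP and is not the
key_compare_func() from bug42838.phpt). It assumes the coercion rule
stated in that comment, namely that a string compared with an integer is
first converted to a number, so 'a' becomes 0. Numeric strings with a
fractional part are truncated here, which is a simplification.

```php
<?php
// Hypothetical helper, not part of PHP: emulates the coercion described
// in this report, where a string compared with an integer is converted
// to a number first ('a' becomes 0). Two strings are compared as strings.
function legacy_loose_cmp($a, $b)
{
    if (is_int($a) || is_int($b)) {
        $x = (int) $a;
        $y = (int) $b;
        return ($x < $y) ? -1 : (($x > $y) ? 1 : 0);
    }
    // Normalise strcmp() to -1, 0 or 1.
    $r = strcmp($a, $b);
    return ($r < 0) ? -1 : (($r > 0) ? 1 : 0);
}

// 0 ties with both 'a' and 'b', yet 'a' and 'b' do not tie with each
// other, so "equal" is not transitive under this comparator.
$pairs = array(array(0, 'a'), array(0, 'b'), array('a', 'b'));
foreach ($pairs as $p) {
    echo var_export($p[0], true), ' vs ', var_export($p[1], true), ': ',
         legacy_loose_cmp($p[0], $p[1]), "\n";
}
```

Under these assumptions the first two pairs compare as equal and the
third does not. A sort driven by such a comparator may therefore leave
'a' and 'b' in an order that depends on which pairs it happened to
compare, which is why code that assumes a fully sorted result afterwards
cannot rely on it.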