EUREKA!
> -----Original Message-----
> From: Stuart Dallas [mailto:[email protected]]
> Sent: Tuesday, September 03, 2013 6:31 AM
> To: Daevid Vincent
> Cc: [email protected]
> Subject: Re: [PHP] refernces, arrays, and why does it take up so much
> memory?
>
> On 3 Sep 2013, at 02:30, Daevid Vincent <[email protected]> wrote:
>
> > I'm confused on how a reference works I think.
> >
> > I have a DB result set in an array I'm looping over. All I simply want to do
> > is make the array key the "id" of the result set row.
> >
> > This is the basic gist of it:
> >
> > private function _normalize_result_set()
> > {
> > foreach($this->tmp_results as $k => $v)
> > {
> > $id = $v['id'];
> > $new_tmp_results[$id] =& $v; //2013-08-29 [dv] using a reference here cuts the memory usage in half!
>
> You are assigning a reference to $v. In the next iteration of the loop, $v
> will be pointing at the next item in the array, as will the reference you're
> storing here. With this code I'd expect $new_tmp_results to be an array
> where the keys (i.e. the IDs) are correct, but the data in each item matches
> the data in the last item from the original array, which appears to be what
> you describe.
>
> > unset($this->tmp_results[$k]);
>
> Doing this for every loop is likely very inefficient. I don't know how the
> inner workings of PHP process something like this, but I wouldn't be
> surprised if it's allocating a new chunk of memory for a version of the
> array without this element. You may find it better to not unset anything
> until the loop has finished, at which point you can just
> unset($this->tmp_results).
>
> >
> > /*
> > if ($i++ % 1000 == 0)
> > {
> > gc_enable(); // Enable Garbage Collector
> > var_dump(gc_enabled()); // true
> > var_dump(gc_collect_cycles()); // # of elements cleaned up
> > gc_disable(); // Disable Garbage Collector
> > }
> > */
> > }
> > $this->tmp_results = $new_tmp_results;
> > //var_dump($this->tmp_results); exit;
> > unset($new_tmp_results);
> > }
>
>
> Try this:
>
> private function _normalize_result_set()
> {
> // Initialise the temporary variable.
> $new_tmp_results = array();
>
> // Loop around just the keys in the array.
> foreach (array_keys($this->tmp_results) as $k)
> {
> // Store the item in the temporary array with the ID as the key.
> // Note no pointless variable for the ID, and no use of &!
> $new_tmp_results[$this->tmp_results[$k]['id']] = $this->tmp_results[$k];
> }
>
> // Assign the temporary variable to the original variable.
> $this->tmp_results = $new_tmp_results;
> }
>
> I'd appreciate it if you could plug this in and see what your memory usage
> reports say. In most cases, trying to control the garbage collection through
> the use of references is the worst way to go about optimising your code. In
> my code above I'm relying on PHP's copy-on-write feature where data is only
> duplicated when assigned if it changes. No unsets, just using scope to mark
> a variable as able to be cleaned up.
>
> Where is this result set coming from? You'd save yourself a lot of
> memory/time by putting the data in to this format when you read it from the
> source. For example, if reading it from MySQL,
> $this->tmp_results[$row['id']] = $row when looping around the result set.
>
> Also, is there any reason why you need to process this full set of data in
> one go? Can you not break it up in to smaller pieces that won't put as much
> strain on resources?
>
> -Stuart
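For anyone following along, here is a minimal sketch of the aliasing Stuart describes. The three sample rows and the variable names $by_ref and $by_value are made up for illustration; only the `=& $v` pattern comes from the original code.

```php
<?php
// Made-up sample rows standing in for the DB result set.
$rows = array(
    array('id' => 10, 'name' => 'first'),
    array('id' => 20, 'name' => 'second'),
    array('id' => 30, 'name' => 'third'),
);

// Pattern from the original code: store a reference to the loop variable.
$by_ref = array();
foreach ($rows as $k => $v) {
    $by_ref[$v['id']] =& $v; // every entry ends up aliasing the same $v
}
unset($v); // break the last alias so later code cannot write through it

// Plain assignment: each entry keeps its own row (shared copy-on-write).
$by_value = array();
foreach ($rows as $row) {
    $by_value[$row['id']] = $row;
}

echo $by_ref[10]['name'], "\n";   // third
echo $by_value[10]['name'], "\n"; // first
```

The keys come out right in both arrays, but every element of $by_ref holds the last row, which would also explain why the reference version appeared to use half the memory.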
There were reasons I had the $id -- I only showed the relevant parts of the
code for the sake of not overly complicating what I was trying to illustrate.
There is other processing that has to be done in the loop too, and that is
also what I illustrated.
Here is your version effectively:
private function _normalize_result_set() //Stuart
{
    if (!$this->tmp_results || count($this->tmp_results) < 1) return;
    $new_tmp_results = array();
    // Loop around just the keys in the array.
    $D_start_mem_usage = memory_get_usage();
    foreach (array_keys($this->tmp_results) as $k)
    {
        /*
        if ($this->tmp_results[$k]['genres'])
        {
            // rip through each scene's `genres` and store them as an array since we'll need'em later too
            $g = explode('|', $this->tmp_results[$k]['genres']);
            array_pop($g); // there is an extra '' element due to the final | character. :-\
            $this->tmp_results[$k]['g'] = $g;
        }
        */
        // Store the item in the temporary array with the ID as the key.
        // Note no pointless variable for the ID, and no use of &!
        $new_tmp_results[$this->tmp_results[$k]['id']] = $this->tmp_results[$k];
    }
    // Assign the temporary variable to the original variable.
    $this->tmp_results = $new_tmp_results;
    echo "\nMEMORY USED FOR STUART's version: ".number_format(memory_get_usage() - $D_start_mem_usage)." PEAK: (".number_format(memory_get_peak_usage(true)).")<br>\n";
    var_dump($this->tmp_results);
    exit();
}
With the genres block commented out, as above:
MEMORY USED FOR STUART's version: -128 PEAK: (90,439,680)
With the processing in the genres block enabled:
MEMORY USED FOR STUART's version: 97,264,368 PEAK: (187,695,104)
So a slight improvement in peak usage (-28,573,696 bytes) over the original:
MEMORY USED FOR _normalize_result_set(): 97,264,912 PEAK: (216,268,800)
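To reproduce this kind of comparison outside the real code base, something like the following rough sketch could be used. The make_rows() and rekey() helpers, the row contents and the 10,000-row count are all invented for illustration. The absolute numbers depend heavily on the PHP version, so no expected output is shown.

```php
<?php
// Hypothetical stand-in for a DB result set: $n small rows with a
// pipe-terminated `genres` column, as in the thread.
function make_rows($n)
{
    $rows = array();
    for ($i = 1; $i <= $n; $i++) {
        $rows[] = array('id' => $i, 'genres' => 'action|drama|comedy|');
    }
    return $rows;
}

// Re-key by id; optionally attach an exploded genres array to every row.
function rekey(array $rows, $with_genres)
{
    $keyed = array();
    foreach ($rows as $row) {
        if ($with_genres) {
            $g = explode('|', $row['genres']);
            array_pop($g);  // drop the '' left by the trailing |
            $row['g'] = $g; // writing to the row forces a private copy
        }
        $keyed[$row['id']] = $row;
    }
    return $keyed;
}

$rows = make_rows(10000);

$start = memory_get_usage();
$plain = rekey($rows, false);
$cost_plain = memory_get_usage() - $start;

$start = memory_get_usage();
$with_g = rekey($rows, true);
$cost_with_g = memory_get_usage() - $start;

echo "re-key only:    ", number_format($cost_plain), " bytes\n";
echo "re-key + ['g']: ", number_format($cost_with_g), " bytes\n";
```

The re-key-only pass should cost little more than the new outer array, because the rows are shared copy-on-write. The second pass pays for a private copy of every row plus one extra array per row.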
No matter what I tried, however, it seems that, frustratingly, just the simple
act of adding a new hash to each row causes a significant memory jump. That
really blows! Therefore my solution was to not store $g as ['g'] -- which
would seem to be the more efficient way, doing the explode() once and re-using
the array over and over -- and instead I am forced to rip through and explode()
inline in three different places in my code.
We get over 30,000 hits per second, and even with lots of caching, 216MB vs
70-96MB is significant, and the speed hit is only about 1.5 seconds more per
page.
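Since the explode() now lives in three places, one option is to wrap it in a tiny helper so the trailing-| handling stays in one spot. This is only a sketch; the name split_genres() is made up and is not in the actual code.

```php
<?php
// Hypothetical helper: split a pipe-terminated genres string into an array.
// rtrim() strips the trailing | up front, so no array_pop() is needed.
function split_genres($genres)
{
    $trimmed = rtrim((string) $genres, '|');
    return ($trimmed === '') ? array() : explode('|', $trimmed);
}

print_r(split_genres('action|drama|comedy|'));
```

Unlike the explode()/array_pop() pair, this also behaves if a value ever arrives without the final | character, and it returns an empty array for an empty string.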
Here are three distinctly different example pages that exercise different
parts of the code path:
PAGE RENDERED IN 7.0466279983521 SECONDS
MEMORY USED @START: 262,144 - @END: 26,738,688 = 26,476,544 BYTES
MEMORY PEAK USAGE: 69,730,304 BYTES
PAGE RENDERED IN 6.9327299594879 SECONDS
MEMORY USED @START: 262,144 - @END: 53,739,520 = 53,477,376 BYTES
MEMORY PEAK USAGE: 79,167,488 BYTES
PAGE RENDERED IN 7.558168888092 SECONDS
MEMORY USED @START: 262,144 - @END: 50,855,936 = 50,593,792 BYTES
MEMORY PEAK USAGE: 96,206,848 BYTES
Furthermore, I investigated what Jim Giner suggested, and it turns out I could
wedge a hook into our Connection class to mangle the results as they are
fetched, which is actually a more elegant solution overall as we can re-use
that in many more places going forward.
/**
 * Execute a database SQL query and return all the results in an associative array
 *
 * @access public
 * @return array or false
 * @param string $sql the SQL code to execute
 * @param boolean $print (false) Print a color coded version of the query.
 * @param boolean $get_first (false) return the first element only. useful for when 1 row is returned such as "LIMIT 1"
 * @param string $key (null) if a column name, such as 'id' is used here, then that column will be used as the array key
 * @author Daevid Vincent [[email protected]]
 * @date 2013-09-03
 * @see get_instance(), execute(), fetch_query_pair()
 */
public function fetch_query($sql = "", $print = false, $get_first = false, $key = null)
{
    //$D_start_mem_usage = memory_get_usage();
    if (!$this->execute($sql, $print)) return false;
    $tmp = array();
    if (is_null($key))
        while ($arr = $this->fetch_array(MYSQL_ASSOC)) $tmp[] = $arr;
    else
        while ($arr = $this->fetch_array(MYSQL_ASSOC)) $tmp[$arr[$key]] = $arr;
    $this->free_result(); // freeing result from memory
    //echo "\nMEMORY USED FOR fetch_query(): ".number_format(memory_get_usage() - $D_start_mem_usage)." PEAK: (".number_format(memory_get_peak_usage(true)).")<br>\n";
    return (($get_first) ? array_shift($tmp) : $tmp);
}
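The keying step can be tried without a database. Below is a stand-alone sketch of the same idea; index_rows() and the sample rows are hypothetical, and an ArrayIterator stands in for the rows coming back from fetch_array().

```php
<?php
// Hypothetical stand-alone version of the keying logic in fetch_query():
// $rows is anything foreach can walk; $key is the column to index by.
function index_rows($rows, $key = null)
{
    $out = array();
    foreach ($rows as $row) {
        if ($key === null) {
            $out[] = $row;          // plain 0..n-1 list
        } else {
            $out[$row[$key]] = $row; // keyed by the chosen column
        }
    }
    return $out;
}

// Made-up rows; note the duplicate id 7.
$rows = new ArrayIterator(array(
    array('id' => 7, 'title' => 'a'),
    array('id' => 9, 'title' => 'b'),
    array('id' => 7, 'title' => 'c'),
));

$by_id = index_rows($rows, 'id');
$list  = index_rows($rows);

echo count($list), " rows, ", count($by_id), " distinct ids\n"; // 3 rows, 2 distinct ids
```

One thing to watch for: if the $key column is not unique, later rows silently replace earlier ones, so this only makes sense for a primary key or other unique column.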