Hi Folks,

I'm running two buildbots here at home and am getting consistent failures
from the Pandas because of overheating. I've set up a monitor that reports
the current CPU temperature and the allowed maximum, and when a bot passes
90% of that maximum, it shuts itself off.
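
For reference, here is a minimal sketch of that kind of check. It is not
the attached script. It assumes the kernel exposes the temperature and a
trip point in millidegrees Celsius under /sys/class/thermal/thermal_zone0;
some boards and kernels put these values elsewhere, and the function names
are made up for illustration.

```python
from pathlib import Path

# Assumed location of the thermal zone; adjust for the board in question.
ZONE = Path("/sys/class/thermal/thermal_zone0")


def read_millidegrees(path):
    """Read one sysfs value given in millidegrees Celsius."""
    return int(Path(path).read_text().strip())


def load_fraction(current_mc, maximum_mc):
    """Current temperature as a fraction of the allowed maximum."""
    if maximum_mc <= 0:
        raise ValueError("maximum temperature must be positive")
    return current_mc / maximum_mc


def should_shut_down(current_mc, maximum_mc, limit=0.90):
    """True once the reading is past `limit` (90% by default)."""
    return load_fraction(current_mc, maximum_mc) > limit


def report(zone=ZONE):
    """Return a one-line summary plus the shutdown decision for a zone."""
    current = read_millidegrees(Path(zone) / "temp")
    maximum = read_millidegrees(Path(zone) / "trip_point_0_temp")
    summary = (f"{current / 1000:.1f}C of {maximum / 1000:.1f}C "
               f"({load_fraction(current, maximum):.0%})")
    return summary, should_shut_down(current, maximum)
```

Calling report() on the board returns the summary line and a flag saying
whether the bot should stop; what "stop" means (killing the build, halting
the board) is left to the caller.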

The problem is that I'm running with heat-sinks and the boards are on top
of three fans, so there really isn't much more I can do to solve this
problem.

I personally think this is a hardware problem, since everything is on the
same die (CPU, GPU and RAM) and the physical dimensions of the chip are
quite small. I remember when Intel chips started overheating (around the
486DX66): the die was huge (more heat dissipation), RAM and GPU were
separate, and it still needed a hefty heat-sink.

It's true that gates are far smaller today, but it's not true that a
dual-core 1.3GHz + GPU + RAM will produce less heat on a small die than a
66MHz CPU on a huge die, so why anyone thinks it's a good idea to release a
1+GHz chip without *any* form of heat dissipation is beyond my
comprehension.

Manufacturers have only got away with it so far because ARM devices end up
as set-top boxes, mobile phones and tablets, where people rarely use 100%
of the CPU for extended periods of time. However, even those devices will
heat up when playing two-hour films or games, and they do have some form
of heat sink.

We, at the toolchain group, make things worse by using 100% CPU, 24/7,
something that neither Panda boards nor Arndales were designed to do.
However, with ARM moving into the server space, their designs will have to
be re-thought, and what better place than Linaro for making sure we get it
right?

For the time being, I believe we *must* have air conditioning in the Lab
all the time, and we *must* have heat-sinks on every board, and we *must*
monitor the CPU temperature of the boards, at least until we're comfortable
that they're not failing all the time.

Can we make a temperature monitor (like the one attached) a default feature
on Linaro Ubuntu distributions? We could dump that info to the syslog/dmesg
whenever it crosses the (say) 75% threshold, and report more often when it
crosses 95%, possibly dumping the process(es) consuming the most CPU at the
time, to enable post-mortem debugging.
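
To make the proposal concrete, here is a rough sketch of the reporting
policy. The 75% and 95% thresholds are the ones suggested above; the
reporting intervals, the function names and the use of procps `ps` with
--sort to find the busiest processes are assumptions for illustration
only.

```python
import subprocess

WARN = 0.75      # start logging here (threshold from the proposal)
CRITICAL = 0.95  # log more often and name the busiest processes

# Seconds between reports per level: illustrative values only.
INTERVALS = {"ok": None, "warn": 60, "critical": 5}


def classify(fraction):
    """Map a temperature fraction (current / maximum) to a level name."""
    if fraction >= CRITICAL:
        return "critical"
    if fraction >= WARN:
        return "warn"
    return "ok"


def busiest_processes(count=3):
    """Top CPU consumers via `ps`; assumes procps-style --sort support."""
    lines = subprocess.run(
        ["ps", "-eo", "pid,pcpu,comm", "--sort=-pcpu"],
        capture_output=True, text=True, check=True,
    ).stdout.splitlines()
    return lines[1:count + 1]  # skip the header line


def report(fraction, emit=print, processes=busiest_processes):
    """Emit log lines for one reading; return seconds until the next one.

    Returns None below the warning threshold, where nothing is logged.
    """
    level = classify(fraction)
    if level == "ok":
        return INTERVALS[level]
    emit(f"cpu temperature at {fraction:.0%} of maximum ({level})")
    if level == "critical":
        for line in processes():
            emit(f"  busy: {line.strip()}")
    return INTERVALS[level]
```

Passing syslog.syslog from the Python standard library as `emit` would send
the lines to the system log instead of stdout. The process list is only
collected at the critical level, so the monitor itself stays cheap while
the board is merely warm.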

cheers,
--renato

As a side note, the quad-A9 ODroid does ship with a massive heat-sink,
which also serves as a fancy case. Quite clever, really.

Attachment: monitor
Description: Binary data

_______________________________________________
linaro-toolchain mailing list
linaro-toolchain@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-toolchain
