----- Forwarded message from Brian Atkins <[EMAIL PROTECTED]> -----
From: Brian Atkins <[EMAIL PROTECTED]>
Date: Wed, 20 Jun 2007 16:23:29 -0500
To: transhumantech <[EMAIL PROTECTED]>
Subject: [tt] Nvidia unveils Tesla, moves into supercomputing
User-Agent: Thunderbird 2.0.0.4 (Windows/20070604)
http://www.tgdaily.com/content/view/32557/135/
Santa Clara (CA) – Nvidia today announced Tesla, a third product line next
to the GeForce and Quadro graphics products. The company aims to use Tesla
cards and the massive floating point horsepower of its graphics processors
to take over a portion of the lucrative supercomputing market.

The core of each Tesla device is a GeForce 8-series GPU, and the cards
share the general component layout of the high-end Quadro FX 5600
workstation graphics card with 1.5 GB of memory. The only noteworthy
difference between the FX 5600 and a Tesla card is that the
supercomputing-targeted devices lack the graphics outputs on the back
panel, which, we were told, allows Nvidia to increase the clock speed on
Tesla.

While the actual clock speed of the Tesla GeForce GPU is kept under wraps,
Nvidia said that one processor (used in the C870 add-in card) is good for
518 GFlops; two processors (used in the deskside supercomputer D870, which
integrates two C870 cards) will deliver 1 TFlops; and the Tesla GPU server
with four processors will hit 2 TFlops.

In terms of pure number crunching horsepower, Nvidia told us that one
GeForce GPU can match the combined performance of 40 x86 processors. In
addition to the raw performance, Tesla also makes a case for power
efficiency: The C870 is rated at a maximum power consumption of 170 watts
and the GPU server at 800 watts, which may sound like a lot at first
glance. However, 40 low-power x86 processors would typically draw 1600
watts. With a common power budget of about 25 kilowatts per rack, a rack
of Tesla GPU servers has a theoretical maximum performance of more than
60 TFlops – which would put the floating point rating of such a rack
among the 15 fastest supercomputers currently ranked on the Top 500
Supercomputer list.

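The rack figure can be checked with a few lines of arithmetic. The sketch
below uses only the numbers quoted above and assumes, as a simplification
that is not stated in the article, that the power budget is the only limit
on how many GPU servers fit in a rack (space, cooling and host CPUs are
ignored). The variable names are illustrative.

```python
# Figures quoted in the article.
C870_GFLOPS = 518            # peak rating of one Tesla GPU
C870_WATTS = 170             # maximum power of one C870 card
GPUS_PER_SERVER = 4
SERVER_WATTS = 800           # rated power of the 4-GPU server
RACK_BUDGET_WATTS = 25_000   # "about 25 kilowatts" per rack

# Assumption: only whole servers count, and power is the sole constraint.
servers_per_rack = RACK_BUDGET_WATTS // SERVER_WATTS

# Peak throughput if every GPU in the rack ran at its rated 518 GFlops.
server_gflops = C870_GFLOPS * GPUS_PER_SERVER
rack_tflops = servers_per_rack * server_gflops / 1000

# Peak GFlops per watt for a single card, for comparison.
card_gflops_per_watt = C870_GFLOPS / C870_WATTS

print(f"servers per rack: {servers_per_rack}")
print(f"rack peak:        {rack_tflops:.1f} TFlops")
print(f"C870 efficiency:  {card_gflops_per_watt:.2f} GFlops/W")
```

Under these assumptions 31 servers fit in the budget. Whether one counts
4 x 518 GFlops or the rounded 2 TFlops per server, the peak total lands
above 60 TFlops, consistent with the claim above.
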
Similarities to ATI’s stream processor card, implications for developers
Readers who have been following recent general-purpose GPU announcements
will remember that ATI has a product in its portfolio that is very similar
to the Tesla C870 – the stream processor card (which is based on an R580
GPU and 1 GB of memory). Both products follow the same concept: making the
massively parallel processing capability of shader processors available to
run arbitrary code instead of graphics code.

Developers such as John Stone and James Phillips, senior research
programmers at the Beckman Institute for Advanced Science and Technology
at the University of Illinois, have been looking at accelerators such as
GPUs for some time, but have been limited mainly by bugs in shader
drivers. Stone told us that much of his work with GPUs in the past was
focused “on finding driver bugs” and “writing his applications around
them” in order to make the technology usable for scientific simulations.
“There can be a lot of rounding errors and because of this very fact, I
wasn’t very excited about working with GPUs,” he said.

However, both AMD and Nvidia have come up with a programming model to
solve this problem. On AMD’s side, it is called CTM (“close to metal”) and
on Nvidia’s side it is CUDA (“Compute Unified Device Architecture”). At
this time, which model a developer prefers appears to come down to
personal preference: some universities are working with CTM (such as
Stanford’s [EMAIL PROTECTED] project) and others are working with CUDA.

Stone and Phillips are focusing on the Nvidia model as they claim its
C++-based language model is easier to deal with than AMD’s CTM version,
which uses a low-level assembly language.

While CUDA works very much like a regular programming model and, according
to Stone, can deliver results very quickly, the big challenge in
exploiting these devices will be the knowledge needed to write advanced
parallelized code for these GPGPUs. Stone believes that coders who have
written code for (massively parallel) supercomputers before will have an
especially easy transition. Of course, knowledge of the hardware and of
graphics processing, along with a good look at the parallelizable parts of
an application, helps to take advantage of the technology.

Shane Ryoo, a graduate research assistant at the University of Illinois at
Urbana-Champaign, said that CUDA will allow programmers with some
experience in developing threaded applications to get “really good results
right off the bat.” However, it is the fine-tuning process that will
increase the value of GPGPUs: Ryoo noted that the expert knowledge that
allows developers to squeeze the best possible performance out of GPUs can
sometimes accelerate application code by a factor of 5x or greater.

Nvidia is well aware of this challenge and has begun assisting
universities in establishing classes and developing course material
focusing on massively parallel programming and CUDA in particular.
Eventually, the company hopes, GPGPU programming will become a standard
part of computer science course work and help to educate a whole new
generation of programmers. So far, Nvidia has taught courses at the
University of Illinois, the University of California, the University of
North Carolina and Purdue University. Nvidia said that several
universities are developing their own courses, including the University of
Virginia, the University of Pennsylvania, Oregon State University and the
University of Wisconsin. Caltech, MIT, Berkeley and Stanford have been
offering “legacy” GPGPU and GPU programming classes, according to Nvidia
chief scientist David Kirk.

The payoff: Accelerated applications
If the capabilities of these GPGPUs are exploited, there can be a big
payoff. Stone, who is working on Nanoscale Molecular Dynamics (NAMD) as
well as Visual Molecular Dynamics (VMD), said that a virus simulation that
took 110 CPU hours on an SGI Altix Itanium 2 supercomputer at NCSA
required only 27 GPU minutes on a GeForce 8 graphics processor – which
translates into a speedup of roughly 240x.

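As a quick check on that ratio, the sketch below divides the quoted CPU
time by the quoted GPU time. It assumes that "CPU hours" and "GPU minutes"
are directly comparable measures of total compute time, which the article
implies but does not spell out.

```python
# Figures quoted in the article.
cpu_hours = 110     # SGI Altix Itanium 2 run at NCSA
gpu_minutes = 27    # same simulation on a GeForce 8 GPU

# Convert to a common unit before dividing.
cpu_minutes = cpu_hours * 60
speedup = cpu_minutes / gpu_minutes

print(f"speedup: {speedup:.1f}x")
```

The result is a little over 244x, which the article rounds to 240x.
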
In an example that showcases an impact that can touch many lives, Ryoo and
his team are working on an interactive medical MRI application that
substantially increases the resolution of MRI scans thanks to the added
processing power. As a result, they expect to be able to deliver much
finer images, which allow physicians to detect tumors at an earlier stage
or to differentiate between a blip and an actual tumor.

In a demonstration shown during an Nvidia event, a representative from
Headwave, a company that provides geophysical data analysis, highlighted a
4D application that allows users to visualize gigabytes and apparently
even terabytes of data in three dimensions and to apply a time filter to
display changes to geological layers over time. The company claims that
GPUs are accelerating its application by about 2000% and are delivering an
output of about 2000 MB/s.

In fairness, we should mention that Tesla (or stream processor cards for
that matter) will not be able to replace supercomputers, which continue to
provide memory bandwidth that a few Tesla cards cannot match. Scientists
such as Stone believe that products such as Tesla will make their way into
supercomputers to create an overall more balanced environment. “Number
crunching was the limiting factor up until now. Now Infiniband will be a
problem,” he said.

GPGPUs are likely to have a greater impact on deskside supercomputers in
the short term. While scientists today have to apply for expensive
supercomputer time and in most cases have to wait several days until their
application can be processed – if those requests are not turned down
anyway – there is now an opportunity to run many of those tests on a desk
right in the lab. Conceivably, GPGPUs will allow more scientists to run
more and higher quality simulations in less time.

Cost and impact on the consumer
Nvidia’s Tesla products will start at $1300 for the single-GPU add-in
card; the 2-GPU deskside unit will sell for $7500 and the 4-GPU server,
which soon will also be offered in an 8-GPU version, will sell for
$12,000. Leaving aside that, at least to our knowledge, Tesla is not yet
available, these apparently lofty price tags turn out to be bargains on
closer inspection.

The C870 not only undercuts the ATI stream processor card, which currently
sells for about $2000, but also Nvidia’s own workstation products. The
C870, at $1300, compares to a Quadro FX 5600 graphics card, which requires
an investment in the neighborhood of $3000 and up. Clearspeed’s CSX600
accelerator card, which provides a performance of about 100 GFlops, is
selling in volume for about $7500.

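One way to put these prices on a common footing is dollars per peak
GFlops. The sketch below does this only for the two cards whose price and
peak rating are both quoted in this article (the 518 GFlops figure for the
C870 comes from earlier in the article); the dictionary layout and names
are illustrative, and peak ratings say nothing about sustained performance
on real applications.

```python
# (price in USD, peak GFlops), both as quoted in the article.
cards = {
    "Tesla C870": (1300, 518),
    "Clearspeed CSX600 card": (7500, 100),
}

# Dollars per peak GFlops for each card.
dollars_per_gflops = {
    name: price / gflops for name, (price, gflops) in cards.items()
}

for name, cost in dollars_per_gflops.items():
    print(f"{name}: ${cost:.2f} per peak GFlops")

# How many times cheaper the C870 is per peak GFlops.
ratio = (dollars_per_gflops["Clearspeed CSX600 card"]
         / dollars_per_gflops["Tesla C870"])
print(f"ratio: {ratio:.1f}x")
```

On these quoted figures the C870 comes out at roughly $2.50 per peak
GFlops against $75 for the Clearspeed card.
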
A representative of Evolved Machines told us that the company plans to
offer a 12 TFlops Tesla server, which will cost somewhere between $60,000
and $70,000 but will be fast enough to match the floating point
performance of the 19th fastest supercomputer on the Top 500 list.

Stone told us that even if the GPUs per se may appear to be expensive from
a consumer point of view, they “are available for far less money than the
next best thing that is available today.”

So, what does that mean for the consumer? Clearly, there is only an
indirect benefit for most consumers, which we may see in improved research
results down the road. However, like all technologies, these GPUs will get
cheaper over time, and even today a $1300 card would be within reach of
enthusiasts, who often spend substantially more than $5000 on their rig.
The fact is that there is no magic necessary to make these cards work on a
PC – and CUDA even works with GeForce 8 graphics cards, which can be had
for less than $250 in the case of 8600-series models. The real question
is: When will there be applications that take advantage of this
technology, and will they provide enough incentive for consumers to
purchase a GeForce 8 card? Industry experts believe that it will be up to
developers to come up with new applications that take advantage of the
capability of GPGPUs on the desktop.

Nvidia CEO Jen-Hsun Huang told TG Daily that Tesla will be focused
strictly on the enterprise market and will not be making its way to the
consumer market. In the end, it will be up to the GeForce product groups
to leverage CUDA on desktop computers, but at least for now, Nvidia has
little motivation to push this technology for the average consumer:
“Perhaps in the future,” said Huang, “[this technology] could do physics
on the PC, but this would need a Windows API.”

--
Brian Atkins
Singularity Institute for Artificial Intelligence
http://www.singinst.org/
----- End forwarded message -----
--
Eugen* Leitl <a href="http://leitl.org">leitl</a> http://leitl.org
______________________________________________________________
ICBM: 48.07100, 11.36820 http://www.ativel.com http://postbiota.org
8B29F6BE: 099D 78BA 2FD3 B014 B08A 7779 75B0 2443 8B29 F6BE