Bummer, Eugen was quicker than me. Here are some pictures though: http://www.google.com/about/datacenters/gallery/#/
Regards,
Jörg

On Wednesday 17 October 2012 14:40:12 Eugen Leitl wrote:

> (PUE 1.2 doesn't sound so otherworldly to me)

> http://www.wired.com/wiredenterprise/2012/10/ff-inside-google-data-center/all/

> Google Throws Open Doors to Its Top-Secret Data Center

> By Steven Levy 10.17.12 7:30 AM

> Photo: Google/Connie Zhou

> If you’re looking for the beating heart of the digital age — a physical location where the scope, grandeur, and geekiness of the kingdom of bits become manifest—you could do a lot worse than Lenoir, North Carolina. This rural city of 18,000 was once rife with furniture factories. Now it’s the home of a Google data center.

> Engineering prowess famously catapulted the 14-year-old search giant into its place as one of the world’s most successful, influential, and frighteningly powerful companies. Its constantly refined search algorithm changed the way we all access and even think about information. Its equally complex ad-auction platform is a perpetual money-minting machine. But other, less well-known engineering and strategic breakthroughs are arguably just as crucial to Google’s success: its ability to build, organize, and operate a huge network of servers and fiber-optic cables with an efficiency and speed that rocks physics on its heels. Google has spread its infrastructure across a global archipelago of massive buildings—a dozen or so information palaces in locales as diverse as Council Bluffs, Iowa; St. Ghislain, Belgium; and soon Hong Kong and Singapore—where an unspecified but huge number of machines process and deliver the continuing chronicle of human experience.

> This is what makes Google Google: its physical network, its thousands of fiber miles, and those many thousands of servers that, in aggregate, add up to the mother of all clouds. This multibillion-dollar infrastructure allows the company to index 20 billion web pages a day. To handle more than 3 billion daily search queries. To conduct millions of ad auctions in real time. To offer free email storage to 425 million Gmail users. To zip millions of YouTube videos to users every day. To deliver search results before the user has finished typing the query. In the near future, when Google releases the wearable computing platform called Glass, this infrastructure will power its visual search results.

> The problem for would-be bards attempting to sing of these data centers has been that, because Google sees its network as the ultimate competitive advantage, only critical employees have been permitted even a peek inside, a prohibition that has most certainly included bards. Until now.

> A server room in Council Bluffs, Iowa. Previous spread: A central cooling plant in Google’s Douglas County, Georgia, data center. Photo: Google/Connie Zhou

> Here I am, in a huge white building in Lenoir, standing near a reinforced door with a party of Googlers, ready to become that rarest of species: an outsider who has been inside one of the company’s data centers and seen the legendary server floor, referred to simply as “the floor.” My visit is the latest evidence that Google is relaxing its black-box policy. My hosts include Joe Kava, who’s in charge of building and maintaining Google’s data centers, and his colleague Vitaly Gudanets, who populates the facilities with computers and makes sure they run smoothly.
> A sign outside the floor dictates that no one can enter without hearing protection, either salmon-colored earplugs that dispensers spit out like trail mix or panda-bear earmuffs like the ones worn by airline ground crews. (The noise is a high-pitched thrum from fans that control airflow.) We grab the plugs. Kava holds his hand up to a security scanner and opens the heavy door. Then we slip into a thunderdome of data …

> Urs Hölzle had never stepped into a data center before he was hired by Sergey Brin and Larry Page. A hirsute, soft-spoken Swiss, Hölzle was on leave as a computer science professor at UC Santa Barbara in February 1999 when his new employers took him to the Exodus server facility in Santa Clara. Exodus was a colocation site, or colo, where multiple companies rent floor space. Google’s “cage” sat next to servers from eBay and other blue-chip Internet companies. But the search company’s array was the most densely packed and chaotic. Brin and Page were looking to upgrade the system, which often took a full 3.5 seconds to deliver search results and tended to crash on Mondays. They brought Hölzle on to help drive the effort.

> It wouldn’t be easy. Exodus was “a huge mess,” Hölzle later recalled. And the cramped hodgepodge would soon be strained even more. Google was not only processing millions of queries every week but also stepping up the frequency with which it indexed the web, gathering every bit of online information and putting it into a searchable format. AdWords—the service that invited advertisers to bid for placement alongside search results relevant to their wares—involved computation-heavy processes that were just as demanding as search. Page had also become obsessed with speed, with delivering search results so quickly that it gave the illusion of mind reading, a trick that required even more servers and connections. And the faster Google delivered results, the more popular it became, creating an even greater burden. Meanwhile, the company was adding other applications, including a mail service that would require instant access to many petabytes of storage. Worse yet, the tech downturn that left many data centers underpopulated in the late ’90s was ending, and Google’s future leasing deals would become much more costly.

> For Google to succeed, it would have to build and operate its own data centers—and figure out how to do it more cheaply and efficiently than anyone had before. The mission was codenamed Willpower. Its first built-from-scratch data center was in The Dalles, a city in Oregon near the Columbia River.

> Hölzle and his team designed the $600 million facility in light of a radical insight: Server rooms did not have to be kept so cold. The machines throw off prodigious amounts of heat. Traditionally, data centers cool them off with giant computer room air conditioners, or CRACs, typically jammed under raised floors and cranked up to arctic levels. That requires massive amounts of energy; data centers consume up to 1.5 percent of all the electricity in the world.

> Google realized that the so-called cold aisle in front of the machines could be kept at a relatively balmy 80 degrees or so—workers could wear shorts and T-shirts instead of the standard sweaters. And the “hot aisle,” a tightly enclosed space where the heat pours from the rear of the servers, could be allowed to hit around 120 degrees. That heat could be absorbed by coils filled with water, which would then be pumped out of the building and cooled before being circulated back inside. Add that to the long list of Google’s accomplishments: The company broke its CRAC habit.
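To get a feel for the water-side cooling described above, here is a minimal back-of-the-envelope sketch. The article only gives the aisle temperatures; the heat load, coil water temperatures, and flow figures below are invented for illustration and are not Google's numbers.

    # Rough sizing of a water-cooled hot aisle: Q = m_dot * c_p * dT.
    # Everything except the specific heat of water is an assumed example value.

    C_P_WATER = 4186.0  # J/(kg*K), specific heat of water

    def water_flow_for_heat_load(heat_load_kw, water_in_c, water_out_c):
        """Return the water mass flow (kg/s) needed to carry away heat_load_kw."""
        delta_t = water_out_c - water_in_c
        return heat_load_kw * 1000.0 / (C_P_WATER * delta_t)

    # Assume one enclosed hot aisle dissipating 250 kW, with coil water
    # entering at 25 C and leaving at 40 C (hypothetical values).
    flow = water_flow_for_heat_load(250.0, 25.0, 40.0)
    print(f"~{flow:.1f} kg/s (~{flow * 60:.0f} liters per minute) of water")
    # -> roughly 4 kg/s, i.e. a few hundred liters per minute for that aisle

The point of the sketch is only that modest water flows can carry away very large heat loads when the water is allowed to warm up by a few tens of degrees, which is why warm aisles plus water coils beat refrigerated air.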
> Google also figured out money-saving ways to cool that water. Many data centers relied on energy-gobbling chillers, but Google’s big data centers usually employ giant towers where the hot water trickles down through the equivalent of vast radiators, some of it evaporating and the remainder attaining room temperature or lower by the time it reaches the bottom. In its Belgium facility, Google uses recycled industrial canal water for the cooling; in Finland it uses seawater.

> The company’s analysis of electrical flow unearthed another source of waste: the bulky uninterruptible-power-supply systems that protected servers from power disruptions in most data centers. Not only did they leak electricity, they also required their own cooling systems. But because Google designed the racks on which it placed its machines, it could make space for backup batteries next to each server, doing away with the big UPS units altogether. According to Joe Kava, that scheme reduced electricity loss by about 15 percent.

> All of these innovations helped Google achieve unprecedented energy savings. The standard measurement of data center efficiency is called power usage effectiveness, or PUE. A perfect number is 1.0, meaning all the power drawn by the facility is put to use. Experts considered 2.0—indicating half the power is wasted—to be a reasonable number for a data center. Google was getting an unprecedented 1.2.
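For reference, PUE is simply total facility power divided by the power delivered to the IT equipment, so the figures above translate directly into overhead percentages. A quick illustration (the 10 MW load is an arbitrary example, not a Google number):

    # PUE = total facility power / IT equipment power.
    # 1.0 means zero overhead; 2.0 means as much power goes to cooling,
    # power conversion, lighting, etc. as to the servers themselves.

    def overhead_fraction(pue):
        """Fraction of total facility power that never reaches the IT load."""
        return 1.0 - 1.0 / pue

    for pue in (2.0, 1.5, 1.2):
        print(f"PUE {pue}: {overhead_fraction(pue):.0%} of facility power is overhead")

    # Hypothetical 10 MW IT load, just to show the scale of the difference:
    it_load_mw = 10.0
    print(f"At PUE 2.0 the site draws {it_load_mw * 2.0} MW; "
          f"at PUE 1.2 it draws {it_load_mw * 1.2} MW for the same servers.")

So a move from 2.0 to 1.2 cuts the non-compute share of the bill from about half of the total to roughly a sixth.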
> For years Google didn’t share what it was up to. “Our core advantage really was a massive computer network, more massive than probably anyone else’s in the world,” says Jim Reese, who helped set up the company’s servers. “We realized that it might not be in our best interest to let our competitors know.”

> But stealth had its drawbacks. Google was on record as being an exemplar of green practices. In 2007 the company committed formally to carbon neutrality, meaning that every molecule of carbon produced by its activities—from operating its cooling units to running its diesel generators—had to be canceled by offsets. Maintaining secrecy about energy savings undercut that ideal: If competitors knew how much energy Google was saving, they’d try to match those results, and that could make a real environmental impact. Also, the stonewalling, particularly regarding The Dalles facility, was becoming almost comical. Google’s ownership had become a matter of public record, but the company still refused to acknowledge it.

> In 2009, at an event dubbed the Efficient Data Center Summit, Google announced its latest PUE results and hinted at some of its techniques. It marked a turning point for the industry, and now companies like Facebook and Yahoo report similar PUEs.

> Make no mistake, though: The green that motivates Google involves presidential portraiture. “Of course we love to save energy,” Hölzle says. “But take something like Gmail. We would lose a fair amount of money on Gmail if we did our data centers and servers the conventional way. Because of our efficiency, we can make the cost small enough that we can give it away for free.”

> Google’s breakthroughs extend well beyond energy. Indeed, while Google is still thought of as an Internet company, it has also grown into one of the world’s largest hardware manufacturers, thanks to the fact that it builds much of its own equipment. In 1999, Hölzle bought parts for 2,000 stripped-down “breadboards” from “three guys who had an electronics shop.” By going homebrew and eliminating unneeded components, Google built a batch of servers for about $1,500 apiece, instead of the then-standard $5,000. Hölzle, Page, and a third engineer designed the rigs themselves. “It wasn’t really ‘designed,’” Hölzle says, gesturing with air quotes.

> More than a dozen generations of Google servers later, the company now takes a much more sophisticated approach. Google knows exactly what it needs inside its rigorously controlled data centers—speed, power, and good connections—and saves money by not buying unnecessary extras. (No graphics cards, for instance, since these machines never power a screen. And no enclosures, because the motherboards go straight into the racks.) The same principle applies to its networking equipment, some of which Google began building a few years ago.

> Outside the Council Bluffs data center, radiator-like cooling towers chill water from the server floor down to room temperature. Photo: Google/Connie Zhou

> So far, though, there’s one area where Google hasn’t ventured: designing its own chips. But the company’s VP of platforms, Bart Sano, implies that even that could change. “I’d never say never,” he says. “In fact, I get that question every year. From Larry.”

> Even if you reimagine the data center, the advantage won’t mean much if you can’t get all those bits out to customers speedily and reliably. And so Google has launched an attempt to wrap the world in fiber. In the early 2000s, taking advantage of the failure of some telecom operations, it began buying up abandoned fiber-optic networks, paying pennies on the dollar. Now, through acquisition, swaps, and actually laying down thousands of strands, the company has built a mighty empire of glass.

> But when you’ve got a property like YouTube, you’ve got to do even more. It would be slow and burdensome to have millions of people grabbing videos from Google’s few data centers. So Google installs its own server racks in various outposts of its network—mini data centers, sometimes connected directly to ISPs like Comcast or AT&T—and stuffs them with popular videos. That means that if you stream, say, a Carly Rae Jepsen video, you probably aren’t getting it from Lenoir or The Dalles but from some colo just a few miles from where you are.
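The edge-caching pattern in that last paragraph is easy to sketch. Below is a minimal, hypothetical illustration (a small LRU cache at an edge site that falls back to an origin data center on a miss); it is not Google's implementation, and the class name, capacity, and object names are invented for the example.

    # Toy edge cache: serve popular objects locally, fetch the rest from origin.
    from collections import OrderedDict

    class EdgeCache:
        def __init__(self, capacity, fetch_from_origin):
            self.capacity = capacity
            self.fetch_from_origin = fetch_from_origin  # callable: video_id -> bytes
            self.store = OrderedDict()                  # LRU order: oldest first

        def get(self, video_id):
            if video_id in self.store:                  # cache hit: serve locally
                self.store.move_to_end(video_id)
                return self.store[video_id]
            data = self.fetch_from_origin(video_id)     # miss: go back to a data center
            self.store[video_id] = data
            if len(self.store) > self.capacity:         # evict least recently used
                self.store.popitem(last=False)
            return data

    # Usage with a stand-in "origin":
    origin = lambda vid: f"<bytes of {vid}>"
    edge = EdgeCache(capacity=2, fetch_from_origin=origin)
    edge.get("call-me-maybe")   # miss, fetched from the origin data center
    edge.get("call-me-maybe")   # hit, served from the nearby edge rack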
> Over the years, Google has also built a software system that allows it to manage its countless servers as if they were one giant entity. Its in-house developers can act like puppet masters, dispatching thousands of computers to perform tasks as easily as running a single machine. In 2002 its scientists created Google File System, which smoothly distributes files across many machines. MapReduce, a Google system for writing cloud-based applications, was so successful that an open source version called Hadoop has become an industry standard. Google also created software to tackle a knotty issue facing all huge data operations: When tasks come pouring into the center, how do you determine instantly and most efficiently which machines can best afford to take on the work? Google has solved this “load-balancing” issue with an automated system called Borg.
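For anyone who hasn't met the MapReduce model the article mentions: the programmer supplies a map function and a reduce function, and the framework handles splitting the input, running the work across many machines, and regrouping intermediate results by key. A toy, single-machine word count in that style is below; it illustrates the programming model only, not Google's or Hadoop's actual code.

    # Minimal word count in the MapReduce style, run serially for illustration.
    # A real framework shards the inputs, runs mappers and reducers on many
    # machines, and shuffles the intermediate pairs between them.
    from collections import defaultdict

    def map_phase(document):
        # Emit (key, value) pairs: one ("word", 1) per word.
        for word in document.split():
            yield word.lower(), 1

    def reduce_phase(key, values):
        # Combine all values emitted for the same key.
        return key, sum(values)

    def mapreduce(documents):
        intermediate = defaultdict(list)
        for doc in documents:                      # "map" step (parallel in practice)
            for key, value in map_phase(doc):
                intermediate[key].append(value)    # "shuffle": group by key
        return dict(reduce_phase(k, v) for k, v in intermediate.items())

    print(mapreduce(["the data center is a computer",
                     "a warehouse full of computers"]))
    # e.g. {'the': 1, 'data': 1, 'center': 1, 'is': 1, 'a': 2, ...}

The appeal is that the same two small functions scale from one laptop to thousands of machines because the framework, not the programmer, owns distribution and fault handling.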
> These innovations allow Google to fulfill an idea embodied in a 2009 paper written by Hölzle and one of his top lieutenants, computer scientist Luiz Barroso: “The computing platform of interest no longer resembles a pizza box or a refrigerator but a warehouse full of computers … We must treat the data center itself as one massive warehouse-scale computer.”

> This is tremendously empowering for the people who write Google code. Just as your computer is a single device that runs different programs simultaneously—and you don’t have to worry about which part is running which application—Google engineers can treat seas of servers like a single unit. They just write their production code, and the system distributes it across a server floor they will likely never be authorized to visit. “If you’re an average engineer here, you can be completely oblivious,” Hölzle says. “You can order x petabytes of storage or whatever, and you have no idea what actually happens.”

> But of course, none of this infrastructure is any good if it isn’t reliable. Google has innovated its own answer for that problem as well—one that involves a surprising ingredient for a company built on algorithms and automation: people.

> At 3 am on a chilly winter morning, a small cadre of engineers begin to attack Google. First they take down the internal corporate network that serves the company’s Mountain View, California, campus. Later the team attempts to disrupt various Google data centers by causing leaks in the water pipes and staging protests outside the gates—in hopes of distracting attention from intruders who try to steal data-packed disks from the servers. They mess with various services, including the company’s ad network. They take a data center in the Netherlands offline. Then comes the coup de grâce—cutting most of Google’s fiber connection to Asia.

> Turns out this is an inside job. The attackers, working from a conference room on the fringes of the campus, are actually Googlers, part of the company’s Site Reliability Engineering team, the people with ultimate responsibility for keeping Google and its services running. SREs are not merely troubleshooters but engineers who are also in charge of getting production code onto the “bare metal” of the servers; many are embedded in product groups for services like Gmail or search. Upon becoming an SRE, members of this geek SEAL team are presented with leather jackets bearing a military-style insignia patch. Every year, the SREs run this simulated war—called DiRT (disaster recovery testing)—on Google’s infrastructure. The attack may be fake, but it’s almost indistinguishable from reality: Incident managers must go through response procedures as if they were really happening. In some cases, actual functioning services are messed with. If the teams in charge can’t figure out fixes and patches to keep things running, the attacks must be aborted so real users won’t be affected. In classic Google fashion, the DiRT team always adds a goofy element to its dead-serious test—a loony narrative written by a member of the attack team. This year it involves a Twin Peaks-style supernatural phenomenon that supposedly caused the disturbances. Previous DiRTs were attributed to zombies or aliens.

> Some halls in Google’s Hamina, Finland, data center remain vacant—for now. Photo: Google/Connie Zhou

> As the first attack begins, Kripa Krishnan, an upbeat engineer who heads the annual exercise, explains the rules to about 20 SREs in a conference room already littered with junk food. “Do not attempt to fix anything,” she says. “As far as the people on the job are concerned, we do not exist. If we’re really lucky, we won’t break anything.” Then she pulls the plug—for real—on the campus network. The team monitors the phone lines and IRC channels to see when the Google incident managers on call around the world notice that something is wrong. It takes only five minutes for someone in Europe to discover the problem, and he immediately begins contacting others.

> “My role is to come up with big tests that really expose weaknesses,” Krishnan says. “Over the years, we’ve also become braver in how much we’re willing to disrupt in order to make sure everything works.” How did Google do this time? Pretty well. Despite the outages in the corporate network, executive chair Eric Schmidt was able to run a scheduled global all-hands meeting. The imaginary demonstrators were placated by imaginary pizza. Even shutting down three-fourths of Google’s Asia traffic capacity didn’t shut out the continent, thanks to extensive caching. “This is the best DiRT ever!” Krishnan exclaimed at one point.

> The SRE program began when Hölzle charged an engineer named Ben Treynor with making Google’s network fail-safe. This was especially tricky for a massive company like Google that is constantly tweaking its systems and services—after all, the easiest way to stabilize it would be to freeze all change. Treynor ended up rethinking the very concept of reliability. Instead of trying to build a system that never failed, he gave each service a budget—an amount of downtime it was permitted to have. Then he made sure that Google’s engineers used that time productively. “Let’s say we wanted Google+ to run 99.95 percent of the time,” Hölzle says. “We want to make sure we don’t get that downtime for stupid reasons, like we weren’t paying attention. We want that downtime because we push something new.”
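The downtime-budget idea translates directly into minutes of allowable unavailability. A quick calculation, using the 99.95 percent target quoted above (the time windows are just the obvious calendar conversions):

    # Convert an availability target into a downtime ("error") budget.
    def downtime_budget_minutes(availability, window_days):
        """Minutes of allowed downtime in a window for a given availability."""
        return (1.0 - availability) * window_days * 24 * 60

    for days, label in ((30, "per 30-day month"), (365, "per year")):
        budget = downtime_budget_minutes(0.9995, days)
        print(f"99.95% availability allows ~{budget:.0f} minutes of downtime {label}")
    # -> about 22 minutes per 30-day month, about 263 minutes (~4.4 hours) per year.

The budget framing means a team that has burned none of those minutes can afford to ship risky changes, while a team that has already used them slows down; that is the trade-off Hölzle is describing.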
> Nevertheless, accidents do happen—as Sabrina Farmer learned on the morning of April 17, 2012. Farmer, who had been the lead SRE on the Gmail team for a little over a year, was attending a routine design review session. Suddenly an engineer burst into the room, blurting out, “Something big is happening!” Indeed: For 1.4 percent of users (a large number of people), Gmail was down. Soon reports of the outage were all over Twitter and tech sites. They were even bleeding into mainstream news.

> The conference room transformed into a war room. Collaborating with a peer group in Zurich, Farmer launched a forensic investigation. A breakthrough came when one of her Gmail SREs sheepishly admitted, “I pushed a change on Friday that might have affected this.” Those responsible for vetting the change hadn’t been meticulous, and when some Gmail users tried to access their mail, various replicas of their data across the system were no longer in sync. To keep the data safe, the system froze them out.

> The diagnosis had taken 20 minutes, designing the fix 25 minutes more—pretty good. But the event went down as a Google blunder. “It’s pretty painful when SREs trigger a response,” Farmer says. “But I’m happy no one lost data.” Nonetheless, she’ll be happier if her future crises are limited to DiRT-borne zombie attacks.

> One scenario that DiRT never envisioned was the presence of a reporter on a server floor. But here I am in Lenoir, earplugs in place, with Joe Kava motioning me inside.

> We have passed through the heavy gate outside the facility, with remote-control barriers evoking the Korean DMZ. We have walked through the business offices, decked out in Nascar regalia. (Every Google data center has a decorative theme.) We have toured the control room, where LCD dashboards monitor every conceivable metric. Later we will climb up to catwalks to examine the giant cooling towers and backup electric generators, which look like Beatle-esque submarines, only green. We will don hard hats and tour the construction site of a second data center just up the hill. And we will stare at a rugged chunk of land that one day will hold a third mammoth computational facility.

> But now we enter the floor. Big doesn’t begin to describe it. Row after row of server racks seem to stretch to eternity. Joe Montana in his prime could not throw a football the length of it.

> During my interviews with Googlers, the idea of hot aisles and cold aisles has been an abstraction, but on the floor everything becomes clear. The cold aisle refers to the general room temperature—which Kava confirms is 77 degrees. The hot aisle is the narrow space between the backsides of two rows of servers, tightly enclosed by sheet metal on the ends. A nest of copper coils absorbs the heat. Above are huge fans, which sound like jet engines jacked through Marshall amps.

> We walk between the server rows. All the cables and plugs are in front, so no one has to crack open the sheet metal and venture into the hot aisle, thereby becoming barbecue meat. (When someone does have to head back there, the servers are shut down.) Every server has a sticker with a code that identifies its exact address, useful if something goes wrong. The servers have thick black batteries alongside. Everything is uniform and in place—nothing like the spaghetti tangles of Google’s long-ago Exodus era.

> Blue lights twinkle, indicating … what? A web search? Someone’s Gmail message? A Glass calendar event floating in front of Sergey’s eyeball? It could be anything.

> Every so often a worker appears—a long-haired dude in shorts propelling himself by scooter, or a woman in a T-shirt who’s pushing a cart with a laptop on top and dispensing repair parts to servers like a psychiatric nurse handing out meds. (In fact, the area on the floor that holds the replacement gear is called the pharmacy.)

> How many servers does Google employ? It’s a question that has dogged observers since the company built its first data center. It has long stuck to “hundreds of thousands.” (There are 49,923 operating in the Lenoir facility on the day of my visit.) I will later come across a clue when I get a peek inside Google’s data center R&D facility in Mountain View. In a secure area, there’s a row of motherboards fixed to the wall, an honor roll of generations of Google’s homebrewed servers. One sits atop a tiny embossed plaque that reads july 9, 2008. google’s millionth server. But executives explain that this is a cumulative number, not necessarily an indication that Google has a million servers in operation at once.
> Wandering the cold aisles of Lenoir, I realize that the magic number, if it is even obtainable, is basically meaningless. Today’s machines, with multicore processors and other advances, have many times the power and utility of earlier versions. A single Google server circa 2012 may be the equivalent of 20 servers from a previous generation. In any case, Google thinks in terms of clusters—huge numbers of machines that act together to provide a service or run an application. “An individual server means nothing,” Hölzle says. “We track computer power as an abstract metric.” It’s the realization of a concept Hölzle and Barroso spelled out three years ago: the data center as a computer.

> As we leave the floor, I feel almost levitated by my peek inside Google’s inner sanctum. But a few weeks later, back at the Googleplex in Mountain View, I realize that my epiphanies have limited shelf life. Google’s intention is to render the data center I visited obsolete. “Once our people get used to our 2013 buildings and clusters,” Hölzle says, “they’re going to complain about the current ones.”

> Asked in what areas one might expect change, Hölzle mentions data center and cluster design, speed of deployment, and flexibility. Then he stops short. “This is one thing I can’t talk about,” he says, a smile cracking his bearded visage, “because we’ve spent our own blood, sweat, and tears. I want others to spend their own blood, sweat, and tears making the same discoveries.” Google may be dedicated to providing access to all the world’s data, but some information it’s still keeping to itself.

> Senior writer Steven Levy (steven_l...@wired.com) interviewed Mary Meeker in issue 20.10.

--
*************************************************************
Jörg Saßmannshausen
University College London
Department of Chemistry
Gordon Street
London
WC1H 0AJ

email: j.sassmannshau...@ucl.ac.uk
web: http://sassy.formativ.net

Please avoid sending me Word or PowerPoint attachments.
See http://www.gnu.org/philosophy/no-word-attachments.html

_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf