It depends on the nature of the tasks, but I'm sure you could use it for back-end processing; load balancing would come as part of the job distribution.
You probably want to check the website for the types of workloads it supports. (To make the MapReduce description quoted below concrete, a minimal word-count sketch is appended after this message.)

Matt

--
Matthew Wallis
[email protected]

> On 7 Feb 2015, at 7:48 pm, Jonathan Aquilina <[email protected]> wrote:
>
> Can it be used, for example, in a web hosting application to process site
> requests, in the form of load balancing etc.?
>
> Sent from my iPhone
>
>> On 07 Feb 2015, at 09:45, Matt Wallis <[email protected]> wrote:
>>
>> Hi Jonathan,
>>
>>> On 7 Feb 2015, at 6:20 pm, Jonathan Aquilina <[email protected]>
>>> wrote:
>>>
>>> Can someone explain to me what exactly the purpose of Hadoop is, and what
>>> we mean when we say "big data"? Is this for data storage and retrieval?
>>> Number crunching?
>>
>> Hadoop can be thought of as HTC, High Throughput Computing, over a
>> collection of simple servers. Where in HPC you might have hundreds of
>> nodes with a shared file system working on the same copy of the data,
>> Hadoop distributes the data to local storage on each node of the cluster
>> using the Hadoop Distributed File System (HDFS), then collects the output
>> at the end. I believe it has built-in redundancy, allowing you to
>> distribute the same job to 2 or 3 nodes for fault tolerance. It means your
>> "cluster" can be very simple: no complex parallel file systems, no
>> specialised networks, no redundancy at the hardware level.
>>
>> Hadoop was originally built with MapReduce as its core application, but a
>> number of other applications that run on it can be found on the Apache
>> website.
>>
>> As for big data, this is basically about taking things like 10 billion
>> tweets, breaking them up into chunks of 500,000 or so, and doing analytics
>> on them. Things like that break up very easily for distribution, as there
>> is usually very little linkage between individual tweets.
>>
>> Hadoop came out of the need for places like Google, Yahoo, PayPal and eBay
>> to process terabytes of transaction logs an hour. They already had the
>> servers, but they were in data centres all over the world. Rather than
>> hook them all up to some common file server, they built a system to
>> package up the data and the application and send them wherever they could
>> be processed the quickest. Send it 3 times to make sure it gets done, then
>> pull back the results at the end.
>>
>> Matt.
_______________________________________________
Beowulf mailing list, [email protected] sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf
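
To make the quoted MapReduce description concrete, here is a minimal word-count
job, essentially the canonical example from the Apache Hadoop MapReduce
tutorial; the class names and the two command-line paths are placeholders, not
anything taken from the thread above. Each map task runs on a node holding a
local block of the input, and the reduce step gathers the partial counts back
together, which is the "distribute the data, collect the output at the end"
pattern Matt describes:

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Map phase: runs on whichever node holds a block of the input,
    // emitting (word, 1) for every token it sees.
    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private final Text word = new Text();

        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    // Reduce phase: the framework groups all (word, 1) pairs by word
    // across the cluster, and this sums them into a final count.
    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values,
                           Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class); // local pre-sum per node
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));   // input dir
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // output dir
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

You would run it with something like "hadoop jar wordcount.jar WordCount
/input/tweets /output/counts" (both paths hypothetical); the framework handles
splitting the input, scheduling map tasks next to the data, and retrying any
tasks that fail.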

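One nuance on the redundancy point above: strictly speaking, HDFS replicates
each data block across nodes (three copies by default, set by the
dfs.replication property), and the framework re-runs failed or slow tasks
rather than literally sending every job out three times. A small sketch, again
in Java, of how replication can be read and changed through the HDFS API; the
file path is hypothetical:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReplicationCheck {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Default number of copies HDFS keeps of each block of a new file;
        // 3 is the stock default, and it can be tuned per file.
        conf.set("dfs.replication", "3");

        FileSystem fs = FileSystem.get(conf);
        Path file = new Path("/data/tweets/part-00000"); // hypothetical path

        // Report how many copies of this file's blocks the cluster keeps.
        FileStatus status = fs.getFileStatus(file);
        System.out.println(file + " replication = " + status.getReplication());

        // Raise an existing file to 3 copies so two node losses are survivable.
        fs.setReplication(file, (short) 3);
    }
}

With three copies of every block, the scheduler has three candidate nodes
where a map task can run against local data, which is what provides both the
fault tolerance and the data locality described in the thread.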