Hello Jonathan. Here is a good document to get you thinking: http://www.cs.berkeley.edu/~rxin/db-papers/WarehouseScaleComputing.pdf
Although Doug said "Oh, and Hadoop clusters are not going to supplant your HPC cluster", I believe there is an ongoing effort to converge cloud computing (e.g. Hadoop) and HPC. The key points are laid out in the link I provided. To me, the convergence comes down to:

- strong scalability
- reliability/fault tolerance
- programming productivity
- standardized/cheap infrastructure

Joshua

------ Original Message ------
Received: 09:20 AM PST, 02/07/2015
From: Jonathan Aquilina <[email protected]>
To: Douglas Eadline <[email protected]>
Cc: Beowulf <[email protected]>
Subject: Re: [Beowulf] hadoop

> Hey Douglas,
>
> Thanks for the information. What has me curious is whether it can be used,
> for example, in applications which don't involve large amounts of data.
>
> If you or anyone has any resources, like ebooks or useful websites to read
> up on it, it would be great if you could send them. The reason is that
> where I am working we deal with lots of live telemetry in terms of
> positioning etc., and since we are going to be moving our system away from
> Windows to open source technologies such as angular.js for the web site of
> our platform, as well as mongodb and nodejs, we will be implementing Hadoop
> from Amazon to take advantage of Amazon's Elastic MapReduce.
>
> ---
> Regards,
> Jonathan Aquilina
> Founder Eagle Eye T
>
> On 2015-02-07 17:33, Douglas Eadline wrote:
>
> > Jonathan
> >
> > I understand your confusion. Hadoop and Big Data reached
> > overused-but-not-well-understood status years ago.
> >
> > First, Hadoop started out as a MapReduce engine. This all
> > changed with Hadoop V2 and YARN (Yet Another Resource Negotiator).
> > Hadoop V2 can be considered a platform for applications that need
> > parallel access to large amounts of unstructured data (i.e. raw data not
> > in a traditional database). It can also be used with its own database, HBase,
> > which is based on Google's Bigtable.
> >
> > The idea is this: a "Hadoop" cluster has a large amount of storage
> > using HDFS (or possibly another parallel filesystem). This is often referred
> > to as the "Data Lake." Raw data is dumped in the lake. There is no
> > ETL (Extract, Transform, and Load) step. Various Hadoop YARN frameworks use
> > this data. YARN provides a very dynamic resource allocation model and the
> > ability to provide data locality to your application (i.e. the traditional
> > MapReduce idea of "move the computation to the data").
> >
> > Thus in a Hadoop V2 cluster you can have MapReduce applications (which
> > support many of the popular apps like Pig and Hive). It also supports
> > Spark, Storm, Giraph, and even MPI (not the most efficient, but it works).
> > There are many other applications being ported to YARN.
> >
> > Second, Big Data is usually defined by Volume, Velocity, and Variety.
> > The definition seems to be whatever a vendor wants it to be, however.
> > It reminds me of products that suddenly became "grid ready" in years past.
> > Again, such designations mean as much as "now works with binary data."
> >
> > Finally, if you are interested in Hadoop YARN you can check out the book
> > "Apache Hadoop YARN: Moving beyond MapReduce and Batch Processing with
> > Apache Hadoop 2" (I helped write it). There are also many online resources.
> > The first chapter of the book has the history of Hadoop as written by
> > one of the developers. It is quite interesting to read and helps dispel
> > many of the Hadoop myths. You can read this chapter for free here:
> >
> > http://ptgmedia.pearsoncmg.com/images/9780321934505/samplepages/0321934504.pdf [2]
> >
> > That is enough Hadoop for Saturday morning. Oh, and Hadoop clusters
> > are not going to supplant your HPC cluster.
> >
> > --
> > Doug
> >
> >> Can someone explain to me what exactly the purpose of hadoop is and what
> >> we mean when we say big data? Is this for data storage and retrieval?
> >> Number crunching?
> >> --
> >> Regards,
> >> Jonathan Aquilina
> >> Founder Eagle Eye T
>
> Links:
> ------
> [1] http://www.beowulf.org/mailman/listinfo/beowulf
> [2] http://ptgmedia.pearsoncmg.com/images/9780321934505/samplepages/0321934504.pdf

_______________________________________________
Beowulf mailing list, [email protected] sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
