On Tuesday, August 19, 2014 06:33:29 AM Rich Freeman wrote:
> On Tue, Aug 19, 2014 at 5:34 AM, J. Roeleveld <jo...@antarean.org> wrote:
> > On Monday, August 18, 2014 10:53:51 AM Alec Ten Harmsel wrote:
> >> On Mon 18 Aug 2014 10:50:23 AM EDT, Rich Freeman wrote:
> >> > Hadoop is a very specialized tool. It does what it does very well,
> >> > but if you want to use it for something other than map/reduce then
> >> > consider carefully whether it is the right tool for the job.
> >>
> >> Agreed; unless you have decent hardware and can comfortably measure
> >> your data in TB, it'll be quicker to use something else once you factor
> >> in the administration time and learning curve.
> >
> > The benefit of clustering technologies is that you don't need high-end
> > hardware to start with. You can use the old hardware you found collecting
> > dust in the basement.
> >
> > The learning curve isn't as steep as it used to be. There are plenty of
> > tools to make it easier to start using Hadoop.
>
> As long as you're counting words and don't mind coding everything in Java.
> :)
>
> I found that if you want to avoid using Java, then the available
> documentation plummets, and I'm pretty sure the version I was
> attempting to use was buggy - it was losing records in the sort/reduce
> phase I believe. Or perhaps I was just using it incorrectly, but the
> same exact code worked just fine when I ran it on a single host with a
> smaller dataset and just piped map | sort | reduce without using
> Hadoop. The documentation was pretty sparse on how to get Hadoop to
> work via stdin/out with non-Java code and it is quite possible I
> wasn't quite doing things right. In the end my problem wasn't big
> enough to necessitate using Hadoop and I used GNU parallel instead.
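
For reference, the single-host "map | sort | reduce" pipe described above can be sketched in a few lines of Python. This is only an illustration of the pattern, not the code from the quoted message: the word-count task and the names map_words, reduce_counts and run_local are made up for the example. It assumes the usual streaming convention of one tab-separated key/value record per line, with the reducer receiving its input sorted by key.

```python
from itertools import groupby


def map_words(lines):
    """Map step (hypothetical example): emit one 'word<TAB>1' record per word."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reduce_counts(sorted_records):
    """Reduce step: sum the counts of each run of identical keys.

    Relies on the input being sorted, so that all records for one key
    are adjacent, which is what the sort stage of the pipe provides.
    """
    pairs = (record.split("\t", 1) for record in sorted_records)
    for word, group in groupby(pairs, key=lambda pair: pair[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


def run_local(lines):
    """Single-host stand-in for `map | sort | reduce`, no Hadoop involved."""
    return list(reduce_counts(sorted(map_words(lines))))


# Small demonstration on in-memory data.
for record in run_local(["the cat", "The dog the"]):
    print(record)
```

Running the same two functions on a small sample like this is a cheap way to check whether records go missing in the code itself or only once the cluster does the sorting.
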
You don't need Java knowledge to develop against Hadoop. There is at least
one commercial product that handles this:
http://www.informatica.com/Images/01603_powerexchange-for-hadoop_ds_en-US.pdf

It has a nice and easy graphical interface. The same "code" that works
against a relational database also works with Hadoop. The tool does the
translation.

I would be surprised if there are no other tools that make it easier to
develop code to work with Hadoop. I just haven't had a reason to search
for those yet.

--
Joost