Jeff Layton wrote:
BTW - I saw Karen's post about using Java with HadoopFS. Be sure to pay attention to that since getting a good 64-bit Java implementation for Linux is not always easy. There are a few out there (Sun has an early access program to a 64-bit Java) but the reports I've heard are that it's still early.
Yeah, 64-bit Java is sorta-kinda working. Sun just released a 64-bit Java plugin for NPAPI browsers (e.g. Firefox/Mozilla) oh, only ... 5 years after the first RFE. Not sure how well baked it is; I am playing with it for some of our customers.
64-bit Java shouldn't be hard, as Java VMs are supposed to hide details of the underlying architecture. It is a VM. But at the end of the day, there could be (considerable) differences in execution due to object size differences ... that is, unless you completely ignore the underlying native intrinsic data sizes, your execution could have some ... er ... unexpected results.
Which might be why 64-bit Java is so hard to create. They work so hard to hide the details of the underlying system (OS, CPU, memory, IO, network) from you, that in moving to a new ABI, there is so much to change, that it is ... non-trivial ... to do so.
This said, I hear of Java's use in HPC every now and then. Some apps are interesting in that they leverage some capability of the underlying platform, like the Pervasive Software DataRush effort, and allow you to hide latency by massively threading their analyses. But as we have noted to many, I don't see the great unwashed masses/hordes of HPC developers rushing to Java due to its (many) downsides. There is a real, tangible, measurable performance penalty to abstraction. Introduce too much and you spend more time traversing the abstraction classes than you do doing the computation. Heck, we can't even write good compilers for non-OO code (e.g. compilers that generate near optimal instruction paths on existing CPUs on a significant fraction of HPC code bases). Are we expecting to write even better JIT compilers and optimizers to solve a more difficult problem than the one we have basically punted on?
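To make the abstraction penalty concrete, here is a rough sketch, in Python rather than Java, and not a benchmark of any real code. It computes the same sum of squares twice: once in a plain loop, and once through three small wrapper classes (Value, Squarer, Accumulator, all hypothetical names invented for this illustration). CPython has no JIT to inline the wrappers away, so the ratio it prints is only an analogy for what an optimizer has to undo.

```python
import timeit


def direct_sum(values):
    """Sum of squares with no abstraction layers."""
    total = 0.0
    for v in values:
        total += v * v
    return total


class Value:
    """Hypothetical wrapper that hides a raw number behind an accessor."""

    def __init__(self, x):
        self._x = x

    def get(self):
        return self._x


class Squarer:
    """Hypothetical 'operation' object: one method call per element."""

    def apply(self, value):
        return value.get() * value.get()


class Accumulator:
    """Hypothetical 'reduction' object: another method call per element."""

    def __init__(self):
        self._total = 0.0

    def add(self, x):
        self._total += x

    def total(self):
        return self._total


def layered_sum(values):
    """Same arithmetic as direct_sum, routed through the wrapper classes."""
    squarer = Squarer()
    acc = Accumulator()
    for v in values:
        acc.add(squarer.apply(Value(v)))
    return acc.total()


if __name__ == "__main__":
    data = [float(i) for i in range(10_000)]
    assert direct_sum(data) == layered_sum(data)
    t_direct = timeit.timeit(lambda: direct_sum(data), number=50)
    t_layered = timeit.timeit(lambda: layered_sum(data), number=50)
    print(f"direct: {t_direct:.3f}s  layered: {t_layered:.3f}s  "
          f"ratio: {t_layered / t_direct:.1f}x")
```

Both functions do identical floating point work; whatever extra time the layered version takes is spent purely on object creation and method dispatch. How much of that a given JIT can claw back is exactly the open question.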
I am a huge believer in programmer productivity (though I dispute the notion that Java's incredibly draconian type system coupled with its verbosity actually contributes to productivity), but underlying code performance is still one of the most important aspects in HPC. DataRush solves this by hiding latency of each task, by having so many tasks to work on. Sort of a Java version (weak analogy) of the old Tera MTA system. Other codes like Hadoop could do similar things ... schedule so much work that some actually gets done.
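A minimal sketch of the "schedule so much work that some actually gets done" idea, in Python, and emphatically not how DataRush or Hadoop are implemented. Each task here mostly waits (the sleep stands in for IO or memory latency; LATENCY_S and the function names are assumptions made up for this example), so running many of them at once hides the stall even though no single task gets any faster.

```python
import time
from concurrent.futures import ThreadPoolExecutor

LATENCY_S = 0.05  # hypothetical per-task stall (IO wait, remote fetch, ...)


def task(i):
    """One unit of work: mostly waiting, then a trivial computation."""
    time.sleep(LATENCY_S)  # stands in for the latency we want to hide
    return i * i


def run_serial(n):
    """Run n tasks one after another; total time is about n * LATENCY_S."""
    start = time.perf_counter()
    results = [task(i) for i in range(n)]
    return results, time.perf_counter() - start


def run_oversubscribed(n, workers):
    """Run n tasks on a pool of threads so the stalls overlap."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(task, range(n)))  # map preserves input order
    return results, time.perf_counter() - start


if __name__ == "__main__":
    serial, t_serial = run_serial(32)
    threaded, t_threaded = run_oversubscribed(32, workers=32)
    assert serial == threaded
    print(f"serial: {t_serial:.2f}s  oversubscribed: {t_threaded:.2f}s")
```

Note what this does and does not buy you: throughput goes up, but each individual task still takes LATENCY_S. Once the tasks become compute bound rather than latency bound, piling on more threads stops helping and single-thread performance is what matters again.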
A nascent (yet very real) problem for Java in addition to the above mentioned, for HPC usage going forward, is their complete lack of support for accelerators. Maybe someday, in another decade or so, they will start to support GPU computation ... not talking about OpenCL support, but real execution on the many cores that accelerators supply. The underlying architecture is changing fast enough that I don't think they can keep up. And end users want the performance. This provides a net incentive not to use Java, as it can't currently (or in the foreseeable future) support the emerging personal supercomputing systems with accelerators. Sure it can run on the CPUs, but then like all other codes, it runs head first into the memory wall, the IO bandwidth walls, and so forth.
Sun of course will claim that the trick is to massively multithread the code, which means you don't focus on individual thread performance but on overall throughput. Which somewhat flies in the face of what HPC developers have been talking about for decades (tune for single processors first, then for parallel).
So I won't disparage the users or use of Java in HPC, other than to note that the future on that platform in HPC may not be as bright as some marketeers might suggest.
N.B. the recent MPI class we gave suggested that we need to re-tool it to focus more upon Fortran than C. There was no interest in Java from the class I polled. Some researchers want to use Matlab for their work, but most university computing facilities are loath to spend the money to get site licenses for Matlab. Unfortunate, as Matlab is a very cool tool (I first played with it in 1988 ...); it's just not fast. The folks at Interactive Supercomputing might be able to help with this with their compiler.
--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: land...@scalableinformatics.com
web  : http://www.scalableinformatics.com
       http://jackrabbit.scalableinformatics.com
phone: +1 734 786 8423 x121
fax  : +1 866 888 3112
cell : +1 734 612 4615
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf