Re: [Beowulf] Java vs C++ for interfacing to parallel library

Robert G. Brown Sun, 20 Aug 2006 15:02:26 -0700

On Sun, 20 Aug 2006, Joe Landman wrote:



Robert G. Brown wrote:

[...]

Java, octave, matlab, python, perl etc. are MUCH WORSE in this regard.
All require NONTRIVIAL encapsulation of the library into the interactive
environment.  I have never done an actual encapsulation into any of


Can't speak to Octave/Java/Matlab.  Python and Perl make this relatively
easy.  In Perl you have the Inline:: modules.  If you have installed
Inline::C, this example

          #!/usr/bin/perl
          use Inline C;
          greet('Joe');

          __END__
          __C__
          void greet(char* name) {
            printf("Hello %s!\n", name);
          }

does this

        [EMAIL PROTECTED]:~]
        105 >./inline.pl
        Hello Joe!

Obviously this is a trivial example, but if you create a reasonable set
of APIs that you can express as we have indicated, perhaps passing
function prototypes in via a header file, plus a little config stuff at
the front end to give paths to libraries, this is not generally very hard.


I'm a bit skeptical about this for heavy lifting.  As in, could you
encapsulate the GSL in this way?  I doubt it.

Only when you have some ... odd ... structures or objects passing back
and forth which require a bit more work.


What's an odd structure?  All typedef structs (an object by any other
name) are "odd" in that they aren't part of the standard language
specification.  However C also permits a variety of freeform data
objects created by alloc'ing a block of memory and setting up any form
of offset addressing one needs/likes.

Python has similar facilities.  Generally speaking the dynamic
languages (Perl, Python, Ruby) are pretty easy to wrap around things
and link with other stuff, as long as the API/data structures are pretty
clean.


Ay, that's the rub...;-) That and what you consider "pretty easy"...:-)

Hmmm... methinks you are thinking of strongly typed languages.  In
non-strongly-typed languages, internal data types are not usually opaque
unless they are objects with well formed classes/accessors behind them.
Even then, the data stores tend to be quite flexible. In Perl (as an
example) you have several types.  SV's (scalar variables), AV's (array
variables), and HV's (hash variables), as well as pointers to the same.
Notice that I didn't talk about ints, floats, etc.  Python has a
similar view, though its data types include "lower level" types (ints,
floats, ...).  In Ruby, everything is an object.


I'm simply pointing out that in perl (the case I'm most familiar with)
it is really quite difficult to know the details of what a data object
looks like from the "inside".  When I talk about an array object in C, I
know EXACTLY what it looks like.  A **array holds addresses of a set of
*vectors of data.  A ***array holds addresses of a set of **arrays, each
holding addresses of a set of *vectors.  I can take steps to ensure that
the actual data of the array is in a single contiguous block of memory
or can allocate vector blocks all over the place or I can just let
malloc generate them wherever it likes.  In perl, arrays are an opaque
data type.  One cannot in general assume that an ordinary C subroutine
can dereference a perl array passed by reference as a **array or a
matrix[][].  I'm assuming that this is what the "Inline" stuff above
does -- perform all the requisite translations of perl data types into
forms that C can grok.  In SOME cases they may be pretty much the same
as in C -- but I doubt it.
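
To make the "single contiguous block" case concrete, here is a minimal
sketch in C.  The names alloc_matrix and free_matrix are made up for
illustration; the only point is that the caller decides the layout: a
vector of row addresses (the **array) pointing into one block of data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Allocate a rows x cols matrix of doubles whose data lives in ONE
 * contiguous, zero-filled block, plus a vector of row pointers so that
 * m[i][j] works.  Returns NULL on failure.
 * (alloc_matrix/free_matrix are hypothetical names for this sketch.) */
double **alloc_matrix(size_t rows, size_t cols)
{
    assert(rows > 0 && cols > 0);
    if (cols > SIZE_MAX / rows)          /* rows * cols would overflow */
        return NULL;

    double **m = malloc(rows * sizeof *m);   /* the **array: row addresses */
    if (m == NULL)
        return NULL;

    double *block = calloc(rows * cols, sizeof *block);  /* the data itself */
    if (block == NULL) {
        free(m);
        return NULL;
    }

    for (size_t i = 0; i < rows; i++)
        m[i] = block + i * cols;         /* row i starts at offset i*cols */

    return m;
}

/* Release a matrix obtained from alloc_matrix. */
void free_matrix(double **m)
{
    if (m != NULL) {
        free(m[0]);   /* m[0] is the start of the contiguous data block */
        free(m);
    }
}
```

With this layout m[1] is exactly m[0] + cols, so the same data can be
handed to a routine that expects a flat double * as well as one that
expects a double **.  Nothing comparable can be assumed about a perl
array without going through whatever translation layer the interface
provides.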

I think perl uses a (de facto) struct/complex object for even simple
$variable types for a variety of reasons -- probably mostly because
"Perl is a contextually polymorphic language whose scalars can be
strings, numbers, or references (which includes objects)."  Hence there
is metadata associated with the storage of even the simplest data
objects.  I vaguely remember all sorts of dire warnings in the language
reference manual on this very point although I'm not going to go dig out
my copy to verify my recollection.
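
As an illustration of what "metadata associated with the storage" means
for an interface writer, here is a deliberately simplified sketch in C.
It is NOT perl's actual internal layout (scalar_t, scalar_kind and
scalar_to_double are invented for this example); it only shows the kind
of tag-plus-payload object a polymorphic scalar implies, and the
conversion step that has to happen before a plain C routine sees it.

```c
#include <assert.h>
#include <stdlib.h>

/* HYPOTHETICAL tagged scalar: not perl's real SV, just the general idea. */
typedef enum { SCALAR_INT, SCALAR_DOUBLE, SCALAR_STRING } scalar_kind;

typedef struct {
    scalar_kind kind;      /* metadata: what the payload currently holds */
    union {
        long        i;
        double      d;
        const char *s;     /* NUL-terminated text */
    } as;
} scalar_t;

/* Convert whatever the scalar holds into the plain double that an
 * ordinary C subroutine expects. */
double scalar_to_double(const scalar_t *sv)
{
    assert(sv != NULL);
    switch (sv->kind) {
    case SCALAR_INT:    return (double)sv->as.i;
    case SCALAR_DOUBLE: return sv->as.d;
    case SCALAR_STRING: return strtod(sv->as.s, NULL);  /* parse the text */
    }
    return 0.0;  /* not reached for a valid kind */
}
```

Every argument crossing the boundary needs a step like scalar_to_double
in one direction and its inverse in the other, which is the translation
work under discussion.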

The point being that there is a nontrivial step for ANY language
translating data structures and objects, quite possibly including the
very simplest ones e.g. simple scalar variables containing e.g. ints,
uints, doubles, into a subroutine.  A subroutine written to use uint
inputs is going to be unhappy or do odd things when fed with ints.  What
can one do about this when feeding it from a perl variable named $i that
is intrinsically polymorphic?  When you set $anumber = -214748365; in
perl and pass it to a C routine, did you just pass it a signed long long
integer or an unsigned integer?  C will expect a binary representation
in a definite type, but perl has no such type.  Ditto char variables --
what is a perl string and how is it terminated and what does perl do
with char s variables one byte wide?  In C this is a valid concept -- an
unterminated single byte character.  In perl I'm GUESSING that it ALWAYS
saves a character as a string in an actual struct, with the usual
metadata and probably with a terminator.

There is some discussion of this and some advice here:

  http://world.std.com/~swmcd/steven/perl/pm/xs/intro/index.html

discussing "XS" -- a translation/interface system that permits one to
integrate C source into perl native.  See also man perlxs, of course.
This is "the right way" to integrate libraries or complex C sources with
perl, but I quote:

  If you want to write XS, you have to learn it. Learning XS is very
  difficult, for two reasons.

  The first is that the core Perl docs, such as perlxs and perlguts,
  tacitly assume that you already understand XS. Accordingly, they omit or
  gloss over crucial assumptions and background information. This sounds
  bad, but it is actually rather common in the Unix world.

  The second is that you can't learn XS. Not as such. Not from the top
  down. This problem is much more profound than the first, and it stems
  not from any inadequacy in the documentation, but from what XS is and
  isn't.

  The Perl docs refer to XS as a language, but it isn't. XS is a
  collection of macros. The XS language processor is a program called
  xsubpp, where pp is short for PreProcessor, and PreProcessor is a
  polite term for macro expander. xsubpp expands XS macros into the bits
  of C code necessary to connect the Perl interpreter to your C-language
  subroutines.

  Because XS isn't a language, it lacks structure. The underlying C code
  has structure, but you can't see it, because it is hidden behind the
  macros. This makes it virtually impossible to learn XS on its own
  terms.

...

  In order to learn XS, you have to work from the bottom up. You have to
  learn the Perl C API. You have to understand Perl's internal data
  structures. You have to understand how the Perl stack works, and how a C
  subroutine gets access to it. You have to understand how C subroutines
  get linked into the Perl executable. You have to understand the data
  paths through the DynaLoader module that bind the name of a Perl
  subroutine to the entry point of a C subroutine.

As I suggested, to interface a complex program or library with perl
"correctly" (efficiently), you have to learn the perl API.  Which is Not
Trivial At All.  Basically, to use perlxs, you start by learning how
perl is written -- all its data types and memory management and how
routines are written and called -- and THEN suddenly you see how to use
perlxs to encapsulate your C code to run as commands in native perl.

Or (as this site also suggests) you can try SWIG: http://www.swig.org/.

This is a quick-and-dirty solution that works (AFAICS) by first putting
your C routines in a SWIG wrapper, which is relatively simple because
SWIG is designed to simply wrap C routines.  Then SWIG does all the
magic, pre-encapsulated, of translating into and out of the language of
choice via its (now hidden) API.  Presumably it has library layers that
manage all of the interfacing cleanly for you.

NEITHER of these seems to be for the faint of heart or anyone
less than a bloomin' expert top coder.  XS for the Ubercoder who thinks
nothing of reading the source code of the linux kernel as a pleasant
summer diversion, SWIG for less lofty but still highly competent coders.
At a guess, with SWIG you can do any simple project, but probably not
encapsulate the GSL.  With XS you can encapsulate the GSL directly into
perl, but only if you are a class of programmer that, frankly, I find a
bit frightening (think of a "Ramanujan" or "Riemann" of programmers) or a
team of very good programmers.  I don't know which class the referenced
library is in.

My first experience with Perl (my gosh, more than a decade ago) was
wrapping a long calculation that we were doing to extract total energies
by autogenerating input files and preparing data sets.  Took me a few
hours to write the code, started it running, and two weeks later, we had
our results.  The perl code did not do the calculations, that was the
fortran code.  The perl code drove the fortran, extracted the relevant
information and wrote it to a file, and prepared the next input.  I
would hate to think how long the dynamics would have taken had it been
written in anything other than fortran/c.


Yeah, I do the same thing, only I use programs written in C native, and
write perl programs only as controllers (which is really what it is
for).  Good old "system()" or the various flavors of exec*() -- just let
perl generate your command line(s) for you and screen or file scrape
your results in for processing.  This is the SIMPLEST encapsulation of C
libraries that permits their use in perl -- just wrap them up with a
simple command line interface and stdout output and embed them into perl
code that way.  It has the further advantage that you can actually run
the programs by hand from the command line.
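
The controller side of this pattern is only a few lines in any
language.  Purely to keep the examples in one language, here is a
sketch of it in C using POSIX popen() rather than perl's system() or
backticks; run_and_scrape is a made-up name, and it assumes the wrapped
program prints a single number on stdout.

```c
#define _POSIX_C_SOURCE 200809L  /* for popen/pclose */
#include <assert.h>
#include <stdio.h>

/* Run `cmd` through the shell and scrape the first number it prints on
 * stdout into *result.  Returns 0 on success, -1 if the command could
 * not be started, printed no number, or exited with nonzero status. */
int run_and_scrape(const char *cmd, double *result)
{
    assert(cmd != NULL && result != NULL);

    FILE *pipe = popen(cmd, "r");
    if (pipe == NULL)
        return -1;

    int fields = fscanf(pipe, "%lf", result);

    /* Drain any remaining output so the child is not killed by a
     * closed pipe before it finishes. */
    while (fgetc(pipe) != EOF)
        ;

    int status = pclose(pipe);
    return (fields == 1 && status == 0) ? 0 : -1;
}
```

Called as run_and_scrape("echo 3.5", &x), this leaves 3.5 in x.  The
wrapped program never needs to know anything about the caller's data
types, which is the whole attraction: text on stdout is the interface.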

There is something to be said for using the Unix "build complex tools
out of pipelines of simple tools" approach to things.  Having tackled a
couple of GUI programs with some back end complexity at this point, I
learned the hard way that the only sane way to proceed here is to put
the complexity in a library so you can focus on the UI at the UI stage,
and the complexity at the library stage.  That forces you to build an API,
and THEN you can sometimes do the required integration relatively
simply.  But it is almost invariably much simpler to write a simple
program that outputs the results to a file and then generate a graph of
the file with e.g. gnuplot by hand than to write a program that puts the
same results into memory and generates a graph of them via library calls
inside the program.  Not as nice for users, but way simpler for the
programmer.  If making something commercial, it might be justified.  If
you just want to make a bunch of plots, it probably isn't.

The point being that you have to interface these opaque and not
obviously documented data types to the C library calls.  This is surely
possible -- it is how all those perl libraries, matlab toolboxes, java
interfaces come about.  It will probably require that you learn WAY more
about how the language itself is implemented at the source level than
you are likely to want to know, and it is probably not going to be
terribly easy...


Hmmm.  Did speak to Perl and python above.  Not sure how to do it with
Octave, but the Matlab folks have some good connectivity with external
libraries.  I don't know if it is easy to extend.  Java likes to talk to
Java.


I think that in all cases the recipe is pretty much the same.  Either
use SWIG (or equivalent), use macros or wrappers designed to help you
encapsulate, and/or learn the language API way down deep.  Deep enough
to fully understand its method of data management and command linkage.
I'd be deeply suspicious of anyone who says this is simple.  At the very
least, I'd want to know their IQ and compare it to my own and if it is
more than 10 points higher reject their conclusion out of hand...;-)

With the exception of the people who "do" this regularly, of course.
They've invested the (possibly considerable) amount of time necessary to
MAKE it simple.  For example, >>I'm<< not going to do a port of
dieharder into R (as yet another interactive programming language
interface) -- but I'm willing to simply encapsulate the dieharder
routines in a library interface that I'm HOPING will be simple enough to
encapsulate in R for somebody that has already figured out and written
extensions for it.

   rgb

--
Robert G. Brown                        http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:[EMAIL PROTECTED]


_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf
