On 1/29/2019 11:50 PM, Jeff Newmiller wrote:
Thanks very much for providing these coding examples! I think this is a
good way to learn some R.
Alan
On Tue, 29 Jan 2019, Alan Feuerbacher wrote:
On 1/28/2019 7:51 PM, Jeff Newmiller wrote:
If you forge on with your preconceptions of how such a simulation
should be implemented then you will be able to reproduce your failure
just as spectacularly using R as you did using Octave.
I think I've come to the same conclusion. :-)
It is crucial to employ vectorization of your algorithms if you want
good performance with either Octave or R. That vectorization may
either be over time or over separate simulations.
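(A tiny illustration of the difference, not tied to the simulation: both versions below compute the same sum of squares, but the second hands the whole vector to compiled code in one call.)

x <- runif( 1e6 )
s <- 0
for ( xi in x ) s <- s + xi * xi   # element-by-element loop, slow in R
s <- sum( x * x )                  # vectorized: one call over the whole vector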
Please explain further, if you don't mind. My background is not in
programming, but in analog microchip circuit design (I'm now retired).
Thus I'm a user of circuit simulators, not a programmer of them. Also,
I'm running this stuff on my home computers, either Linux or Windows
machines.
I am running simulations of a million cases of power plant
performance over 25 years in about a minute. I know someone who used
R to simulate a CFD river flow problem in a class in a few minutes,
while others using Fortran or Matlab were struggling to get
comparable runs completed in many hours. I believe the difference was
in how the data were structured and manipulated more than the
language that was being used. I think R's strong capabilities for
presenting results make it advantageous over Octave, though.
After my failed attempt at using Octave, I realized that most likely
the main contributing factor was that I was not able to figure out an
efficient data structure to model one person. But C lent itself
perfectly to my idea of how to go about programming my simulation. So
here's a simplified, pseudocode-style example of what I did:
Don't model one person... model an array of people.
To model a single reproducing woman I used this C construct:
typedef struct woman {
    int    isAlive;
    int    isPregnant;
    double age;
    /* . . . */
} WOMAN;
# e.g.
Nwomen <- 100
women <- data.frame( isAlive    = rep( TRUE,  Nwomen )
                   , isPregnant = rep( FALSE, Nwomen )
                   , age        = rep( 20,    Nwomen )
                   )
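With that layout a whole column can be updated for everyone in a single vectorized step, e.g. (illustrative only, not part of your model):

women$age <- women$age + 1/365                # advance every woman by one day
women$isAlive[ women$age > 110 ] <- FALSE     # an arbitrary upper age limit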
Then I allocated memory for a big array of these things, using the C
malloc() function, which gave me the equivalent of this statement:
WOMAN women[NWOMEN]; /* An array of NWOMEN woman-structs */
After some initialization I set up two loops:
for ( j = 0; j < numberOfYears; j++ ) {
    for ( i = 0; i < numberOfWomen; i++ ) {
        updateWomen( &women[i] );   /* update the i-th woman for this time step */
    }
}
for ( j in seq.int( numberOfYears ) ) {
    # let vectorized data storage automatically handle the other for loop
    women <- updateWomen( women )
}
The function updateWomen() figures out things like whether the woman
becomes pregnant or gives birth on a given day, dies, etc.
You can keep your "fixed size" allocation strategy, with flags indicating
whether specific rows are in use, or you can work only with valid rows and
add rows as needed for children. In the latter case it is best to compute a
logical vector that identifies all of the birthing mothers as a subset of
the data frame, build a set of children rows using the birthing mothers'
subset as input, and then rbind the new rows onto the updated women data
frame as appropriate. The clearest approach for the individual decision
calculations is the vectorized "ifelse" function, though under certain
circumstances putting an indexed subset on the left side of an assignment
can modify memory "in place". (The functional-programming restriction
against this is probably a foreign idea to a dyed-in-the-wool C programmer,
but R usually prevents you from modifying the variable that was passed into
a function, automatically making a local copy of the input as needed to
prevent such backwash into the caller's context.)
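To make that concrete, here is a minimal sketch of a vectorized update step along those lines. The probabilities, the age limits, and the one-year time step are invented for illustration, not taken from your model:

updateWomen <- function( women, pDeath = 0.01, pBirth = 0.10 ) {
    n <- nrow( women )
    # vectorized random draws, one per woman, with no explicit inner loop
    dies   <- runif( n ) < pDeath
    births <- ( runif( n ) < pBirth ) & women$age >= 15 & women$age <= 45
    women$isAlive    <- women$isAlive & !dies
    # the vectorized "ifelse" makes the per-woman decision in one call
    women$isPregnant <- ifelse( women$isAlive, births, FALSE )
    women$age        <- women$age + 1
    # build rows for the newborn girls and rbind them onto the data frame
    newMothers <- women$isAlive & women$isPregnant
    if ( any( newMothers ) ) {
        children <- data.frame( isAlive    = TRUE
                              , isPregnant = FALSE
                              , age        = rep( 0, sum( newMothers ) )
                              )
        women <- rbind( women, children )
    }
    women
}

One call then advances the whole population by one step, which is what lets the explicit per-woman loop disappear.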
I added other refinements that are not relevant here, such as random
variations of various parameters, using the GNU Scientific Library
random number generator functions.
R has quite sophisticated random number generation by default.
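For example (the distributions and parameters here are arbitrary, just to show the built-in generators):

set.seed( 42 )                       # reproducible stream
runif( 5 )                           # uniform draws on [0, 1)
rbinom( 5, size = 1, prob = 0.1 )    # Bernoulli 0/1 events with p = 0.1
rnorm( 5, mean = 20, sd = 2 )        # normally distributed variation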
If you can suggest a data construct in R or Octave that does something
like this, and uses your idea of vectorization, I'd like to hear it.
I'd like to implement it and compare results with my C implementation.
If your problems truly need a compiled language, the Rcpp package
lets you mix C++ with R quite easily and then you get the best of
both worlds. (C and Fortran are supported, but they are a bit more
finicky to set up than C++).
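As a rough sketch of what that mixing looks like (the toy function here is made up, not part of any real simulation):

library( Rcpp )
cppFunction('
  NumericVector ageByOneDay( NumericVector age ) {
    // a compiled C++ loop, callable from R like any other function
    for ( int i = 0; i < age.size(); i++ ) age[i] += 1.0 / 365.0;
    return age;
  }
')
ageByOneDay( c( 20, 33.5, 41 ) )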
I don't know the answer to that, but perhaps you can help decide.
Alan
On January 28, 2019 4:00:07 PM PST, Alan Feuerbacher
<alan...@comcast.net> wrote:
On 1/28/2019 4:20 PM, Rolf Turner wrote:
On 1/29/19 10:05 AM, Alan Feuerbacher wrote:
Hi,
I recently learned of the existence of R through a physicist friend who
uses it in his research. I've used Octave for a decade, and C for 35
years, but would like to learn R. These all have advantages and
disadvantages for certain tasks, but as I'm new to R I hardly know how to
evaluate them. Any suggestions?
* C is fast, but with a syntax that is (to my mind) virtually
incomprehensible. (You probably think differently about this.)
I've been doing it long enough that I have little problem with it,
except for pointers. :-)
* In C, you essentially have to roll your own for all tasks; in R,
practically anything (well ...) that you want to do has already been
programmed up. CRAN is a wonderful resource, and there's more on github.
* The syntax of R meshes beautifully with *my* thought patterns;
YMMV.
* Why not just bog in and try R out? It's free, it's readily available,
and there are a number of good online tutorials.
I just installed R on my Linux Fedora system, so I'll do that.
I wonder if you'd care to comment on my little project that prompted
this? As part of another project, I wanted to model population growth
starting from a handful of individuals. This is exponential in
the long run, of course, but I wanted to see how a few basic parameters
affected the outcome. Using Octave, I modeled a single person as a
"cell", which in Octave has a good deal of overhead. The program
basically looped over the entire population, and updated each person
according to the parameters, which included random statistical
variations. So when the total population reached, say, 10,000, with an
update time of 1 day, the program had to execute 10,000 x 365 update
operations for each year of growth. For large populations, say 100,000,
the program did not return even after 24 hours of run time.
So I switched to C, and used its "struct" declaration and an array of
structs to model the population. This allowed the program to complete in
under a minute as opposed to 24 hours+. So in line with your comments, C
is far more efficient than Octave.
How do you think R would fare in this simulation?
Alan