Re: [Rd] Philosophy behind converting Fortran to C for use in R

2017-06-07 Thread Martyn Byng
Hi,

Just a quick comment on (1).

The C-Fortran interface has been standardized since Fortran 2003.  However, it 
does require the Fortran interface that is being called from C to have been 
written with C interoperability in mind, as specific C-interoperable types etc. 
must be used.

Trying to call a Fortran interface that hasn't been written using C 
interoperable types still suffers from the issues that Bill describes.
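
To make the non-interoperable case concrete, here is a minimal sketch in C of
what a caller typically has to assume about a Fortran routine that was not
written with BIND(C). The routine NBLANK and the wrapper count_blanks() are
hypothetical, and the convention shown (lower-case name, trailing underscore,
arguments by reference, hidden string length passed last) is only one common
choice; other compilers may differ on any of these points. The Fortran side is
replaced by a C stand-in so that the example is self-contained.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* C stand-in for a hypothetical Fortran routine
 *       SUBROUTINE NBLANK(STR, N)
 *       CHARACTER*(*) STR
 *       INTEGER N
 * as a C caller might see it under ONE common, non-standard convention. */
void nblank_(const char *str, int *n, size_t str_len)
{
    assert(str != NULL && n != NULL);
    *n = 0;
    /* Fortran strings carry no terminating NUL, hence the explicit length. */
    for (size_t i = 0; i < str_len; i++)
        if (str[i] == ' ')
            (*n)++;
}

/* Hypothetical wrapper that keeps the assumed convention in one place. */
int count_blanks(const char *s)
{
    int n = 0;
    nblank_(s, &n, strlen(s));
    return n;
}
```

With a routine declared BIND(C) and C-interoperable argument types, the C
prototype should instead follow from the standard, so none of these
assumptions would be needed.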
 
Martyn

-----Original Message-----
From: R-devel [mailto:r-devel-boun...@r-project.org] On Behalf Of William 
Dunlap via R-devel
Sent: 06 June 2017 22:34
To: Avraham Adler 
Cc: R-devel 
Subject: Re: [Rd] Philosophy behind converting Fortran to C for use in R

Here are three reasons for converting Fortran code, especially older
Fortran code, to C:

1. The C-Fortran interface is not standardized.  Various Fortran compilers
pass logical and character arguments in various ways.  Various Fortran
compilers mangle function and common block names in various ways.  You can
avoid that problem by restricting R to using a certain Fortran compiler,
but that can make porting R to a new platform difficult.

2. By default, variables in Fortran routines are not allocated on the
stack, but are statically allocated, making recursion hard.

3. New CS graduates tend not to know Fortran.

(There are good reasons for not translating as well, risk and time being
the main ones.)
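
Point 2 can be illustrated in C terms, assuming that a statically allocated
Fortran local behaves like a C local declared `static`. This is a sketch of
the failure mode only; both function names are made up for the example.

```c
#include <assert.h>

/* Local with static storage: one copy shared by all active calls. */
long fact_static(int n)
{
    static int k;
    assert(n >= 0);
    k = n;
    if (k <= 1)
        return 1;
    long sub = fact_static(k - 1);  /* the nested call overwrites k */
    return k * sub;                 /* k no longer holds this call's n */
}

/* Local with automatic (stack) storage: each call has its own copy. */
long fact_auto(int n)
{
    int k = n;
    assert(n >= 0);
    if (k <= 1)
        return 1;
    long sub = fact_auto(k - 1);
    return k * sub;
}
```

fact_auto(4) gives 24, whereas fact_static(4) gives 1, because every pending
call reads the same k after the innermost call has set it to 1.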


Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Tue, Jun 6, 2017 at 1:27 PM, Avraham Adler 
wrote:

> Hello.
>
> This is not a question about a bug or even best practices; rather I'm
> trying to understand the philosophy or theory as to why certain
> portions of the R codebase are written as they are. If this question
> is better posed elsewhere, please point me in the proper direction.
>
> In the thread about the issues with the Tukey line, Martin said [1]:
>
> > when this topic came up last (for me) in Dec. 2014, I did spend about 2
> > days work (or more?) to get the FORTRAN code from the 1981 - book (which
> > is abbreviated the "ABC of EDA") from a somewhat useful OCR scan into
> > compilable Fortran code and then f2c'ed, wrote an R interface function
> > found problems…
>
> I have seen this in the R source code and elsewhere, that native
> Fortran is converted to C via f2c and then run as C within R. This is
> notwithstanding R's ability to use Fortran, either directly through
> .Fortran() [2] or via .Call() using simple helper C-wrappers [3].
>
> I'm curious as to the reason. Is it because much of the code was
> written before Fortran 90 compilers were freely available? Does it
> help with maintenance or make debugging easier? Is it faster or more
> likely to compile cleanly?
>
> Thank you,
>
> Avi
>
> [1] https://stat.ethz.ch/pipermail/r-devel/2017-May/074363.html
> [2] Such as kmeans does for the Hartigan-Wong method in the stats package
> [3] Such as the mvtnorm package does
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] sum() returns NA on a long *logical* vector when nb of TRUE values exceeds 2^31

2017-06-07 Thread Martin Maechler
> Martin Maechler 
> on Tue, 6 Jun 2017 09:45:44 +0200 writes:

> Hervé Pagès 
> on Fri, 2 Jun 2017 04:05:15 -0700 writes:

>> Hi, I have a long numeric vector 'xx' and I want to use
>> sum() to count the number of elements that satisfy some
>> criteria like non-zero values or values lower than a
>> certain threshold etc...

>> The problem is: sum() returns an NA (with a warning) if
>> the count is greater than 2^31. For example:

>> > xx <- runif(3e9)
>> > sum(xx < 0.9)
>> [1] NA
>> Warning message:
>> In sum(xx < 0.9) : integer overflow - use sum(as.numeric(.))

>> This already takes a long time and doing
>> sum(as.numeric(.)) would take even longer and require
>> allocation of 24 GB of memory just to store an
>> intermediate numeric vector made of 0s and 1s. Plus,
>> having to do sum(as.numeric(.)) every time I need to
>> count things is not convenient and is easy to forget.
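
The arithmetic behind the 24 GB figure, and the kind of one-pass count that
avoids the intermediate vector, can be sketched in C as follows. Both helper
names are hypothetical; this only illustrates the idea and is not how sum()
is implemented in R.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes needed to store n doubles (ignoring any vector header). */
uint64_t double_vector_bytes(uint64_t n)
{
    return n * (uint64_t)sizeof(double);
}

/* Count the elements strictly below `threshold` in a single pass,
 * using a 64-bit counter and no intermediate 0/1 vector. */
int64_t count_below(const double *x, size_t n, double threshold)
{
    assert(x != NULL || n == 0);
    int64_t count = 0;
    for (size_t i = 0; i < n; i++)
        if (x[i] < threshold)
            count++;
    return count;
}
```

With 8-byte doubles, 3e9 elements need 3e9 * 8 = 2.4e10 bytes, which is the
24 GB mentioned above.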

>> It seems that sum() on a logical vector could be modified
>> to return the count as a double when it cannot be
>> represented as an integer.  Note that length() already
>> does this so that wouldn't create a precedent. Also and
>> FWIW prod() avoids the problem by always returning a
>> double, whatever the type of the input is (except on a
>> complex vector).

>> I can provide a patch if this change sounds reasonable.

> This sounds very reasonable, thank you Hervé, for the
> report, and even more for a (small) patch.

I was made aware of the fact that R treats logical and
integer very often identically in the C code, and in general we
even mention that logicals are treated as 0/1/NA integers in
arithmetic.

For the present case that would mean that we should also
safe-guard against *integer* overflow in sum(.)  and that is
not something we have done / wanted to do in the past...  Speed
being one reason.

So this ends up being more delicate than I had thought at first,
because changing  sum()  only would mean that

  sum(LOGI)   and
  sum(as.integer(LOGI))

would start to differ for a logical vector LOGI.
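
The coupling between the two types can be sketched in C. At the C level R
stores a logical vector as an array of int holding 0, 1 or NA (NA being
INT_MIN), so one summation routine can serve both types. The function below
is a simplified, hypothetical stand-in, not the code in src/main/summary.c:
it accumulates in 64 bits and returns NA when the total does not fit in an
int.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

#define NA_INT_SKETCH INT_MIN   /* stand-in for R's NA_INTEGER / NA_LOGICAL */

/* Sum n ints (or logicals stored as 0/1). Returns the NA stand-in if any
 * element is NA or if the total does not fit in an int; *overflow tells
 * the two cases apart. Assumes n is small enough that the 64-bit
 * accumulator itself cannot overflow. */
int isum_sketch(const int *x, size_t n, int *overflow)
{
    assert(overflow != NULL && (x != NULL || n == 0));
    int64_t s = 0;
    *overflow = 0;
    for (size_t i = 0; i < n; i++) {
        if (x[i] == NA_INT_SKETCH)
            return NA_INT_SKETCH;
        s += x[i];
    }
    if (s > INT_MAX || s < -INT_MAX) {
        *overflow = 1;
        return NA_INT_SKETCH;
    }
    return (int)s;
}
```

If only the logical path were changed to return a double, the same 0/1 data
would give different answers depending on whether it is typed logical or
integer.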

So, for now this is something that must be approached carefully,
and the R Core team may want to discuss "in private" first.

I'm sorry for having raised possibly unrealistic expectations.
Martin

> Martin

>> Cheers, H.

>> -- 
>> Hervé Pagès

>> Program in Computational Biology Division of Public
>> Health Sciences Fred Hutchinson Cancer Research Center
>> 1100 Fairview Ave. N, M1-B514 P.O. Box 19024 Seattle, WA
>> 98109-1024

>> E-mail: hpa...@fredhutch.org Phone: (206) 667-5791 Fax:
>> (206) 667-1319



Re: [Rd] sum() returns NA on a long *logical* vector when nb of TRUE values exceeds 2^31

2017-06-07 Thread Hervé Pagès

Hi Martin,

On 06/07/2017 03:54 AM, Martin Maechler wrote:

Martin Maechler 
 on Tue, 6 Jun 2017 09:45:44 +0200 writes:



Hervé Pagès 
 on Fri, 2 Jun 2017 04:05:15 -0700 writes:


 >> Hi, I have a long numeric vector 'xx' and I want to use
 >> sum() to count the number of elements that satisfy some
 >> criteria like non-zero values or values lower than a
 >> certain threshold etc...

 >> The problem is: sum() returns an NA (with a warning) if
 >> the count is greater than 2^31. For example:

 >> > xx <- runif(3e9)
 >> > sum(xx < 0.9)
 >> [1] NA
 >> Warning message:
 >> In sum(xx < 0.9) : integer overflow - use sum(as.numeric(.))

 >> This already takes a long time and doing
 >> sum(as.numeric(.)) would take even longer and require
 >> allocation of 24 GB of memory just to store an
 >> intermediate numeric vector made of 0s and 1s. Plus,
 >> having to do sum(as.numeric(.)) every time I need to
 >> count things is not convenient and is easy to forget.

 >> It seems that sum() on a logical vector could be modified
 >> to return the count as a double when it cannot be
 >> represented as an integer.  Note that length() already
 >> does this so that wouldn't create a precedent. Also and
 >> FWIW prod() avoids the problem by always returning a
 >> double, whatever the type of the input is (except on a
 >> complex vector).

 >> I can provide a patch if this change sounds reasonable.

 > This sounds very reasonable, thank you Hervé, for the
 > report, and even more for a (small) patch.

I was made aware of the fact that R treats logical and
integer very often identically in the C code, and in general we
even mention that logicals are treated as 0/1/NA integers in
arithmetic.

For the present case that would mean that we should also
safe-guard against *integer* overflow in sum(.)  and that is
not something we have done / wanted to do in the past...  Speed
being one reason.

So this ends up being more delicate than I had thought at first,
because changing  sum()  only would mean that

   sum(LOGI)  and
   sum(as.integer(LOGI))

would start to differ for a logical vector LOGI.

So, for now this is something that must be approached carefully,
and the R Core team may want to discuss "in private" first.

I'm sorry for having raised possibly unrealistic expectations.


No worries. Thanks for taking my proposal into consideration.
Note that the isum() function in src/main/summary.c is already using
a 64-bit accumulator to accommodate intermediate sums > INT_MAX.
So it should be easy to modify the function so that it overflows only for
much bigger final sums, without altering performance. Seems like
R_XLEN_T_MAX would be the natural threshold.
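
A minimal sketch of that idea, under stated assumptions: count_true_sketch()
and its limit parameter are hypothetical, and this is not the actual isum()
code. The count is accumulated in 64 bits and handed back as a double;
overflow is reported only when the count exceeds limit, for which 2^52 (the
value of R_XLEN_T_MAX on 64-bit builds, as far as I know) would be the
natural choice. The limit is a parameter here only so that the overflow
branch can be exercised on small inputs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Count the non-zero entries of a 0/1 vector (NA handling omitted).
 * Returns 0 and stores the count in *out, or returns 1 (leaving *out
 * untouched) if the count exceeds `limit`. */
int count_true_sketch(const int *x, size_t n, int64_t limit, double *out)
{
    assert(out != NULL && (x != NULL || n == 0));
    int64_t s = 0;
    for (size_t i = 0; i < n; i++)
        s += (x[i] != 0);
    if (s > limit)
        return 1;
    *out = (double)s;   /* exact for counts up to 2^53 */
    return 0;
}
```

The loop is the same single pass over the data as before; only the final
range check and the return type differ.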

Cheers,
H.



Martin

 > Martin

 >> Cheers, H.


--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fredhutch.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319
