[Rd] R 'base' returning 0 as sum of NAs

2017-01-11 Thread Alex Ivan Howard
Dear R Team

The following line returns 0 (zero) as answer:
sum(c(NA_real_, NA_real_, NA_real_, NA_real_), na.rm = TRUE)

One would, however, have expected it to return 'NaN', as is the case with
function 'mean':

> mean(c(NA_real_, NA_real_, NA_real_, NA_real_), na.rm = TRUE)
[1] NaN

The problem in other words:
I have a vector filled with missing numbers. I run the 'sum' function on
it, but instruct it to remove all missing values first. Consequently, the
sum function is left with an empty numeric vector. There is nothing to sum
over, so it shouldn't actually be able to return a concrete numeric value?
Shouldn't it thus rather return either NA ('unknown'/'missing') or - in the
fashion of the mean function - NaN ('not a number')?

With the current state of affairs, the sum function poses the grave danger
of introducing zeros to one's data (and subsequently other values as well,
as soon as the zeros get taken up in further calculations).

I hope my e-mail finds you well and I wish the R team all of the best for
2017 :)

Kind regards

Alex I. Howard

Web: www.nova.org.za
Phone: +27 (0) 44 695 0749
VoiP: +27 (0) 87 751 3490
Fax: +27 (0) 86 538 7958


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] R 'base' returning 0 as sum of NAs

2017-01-11 Thread Duncan Murdoch

On 11/01/2017 5:33 AM, Alex Ivan Howard wrote:

> Dear R Team
>
> The following line returns 0 (zero) as answer:
> sum(c(NA_real_, NA_real_, NA_real_, NA_real_), na.rm = TRUE)
>
> One would, however, have expected it to return 'NaN', as is the case with
> function 'mean':
>
> > mean(c(NA_real_, NA_real_, NA_real_, NA_real_), na.rm = TRUE)
> [1] NaN



The two expressions are long versions of

sum(numeric())
mean(numeric())

It is reasonable that an empty sum is zero.  The mean is 0/0, so NaN is 
reasonable.
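
As a short illustration of the point above (a sketch only; the variable names are made up for this example), the empty sum being zero is what keeps sums consistent when a vector is split into pieces, one of which may be empty:

```r
# Illustrative values; 'x' and 'empty' are hypothetical names for this sketch.
x <- c(1.5, 2.5)
empty <- numeric()

empty_sum  <- sum(empty)    # the empty sum
empty_mean <- mean(empty)   # sum / length, i.e. 0/0

# Splitting a vector into parts must not change the total,
# which only works if the empty part contributes 0.
split_ok <- sum(c(x, empty)) == sum(x) + sum(empty)

print(empty_sum)
print(empty_mean)
print(split_ok)
```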


If this doesn't suit your needs, then you should put in special checks 
for empty datasets.


Duncan Murdoch
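
A minimal sketch of the kind of special check suggested above. The helper name sum_or_na is hypothetical (it is not part of base R), and returning NA for an all-missing input is an assumption about what the caller wants:

```r
# Hypothetical helper: like sum(x, na.rm = TRUE), but returns NA when
# nothing is left to sum after removing missing values.
sum_or_na <- function(x) {
  if (all(is.na(x))) {
    # Also covers length-0 input, because all(logical(0)) is TRUE.
    return(NA_real_)
  }
  sum(x, na.rm = TRUE)
}

all_missing <- sum_or_na(c(NA_real_, NA_real_, NA_real_, NA_real_))
some_values <- sum_or_na(c(1, NA, 2))
print(all_missing)
print(some_values)
```

Whether NA or NaN is the better sentinel depends on the downstream code; the check itself is the same either way.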




Re: [Rd] accelerating matrix multiply

2017-01-11 Thread Cohn, Robert S
> Do you have R code (including set.seed(.) if relevant) to show how to
> generate the large square matrices you've mentioned in the beginning?  So we
> get to some reproducible benchmarks?


Hi Martin,

Here is the program I used. I only generate 2 random numbers and reuse them to 
make the benchmark run faster. Let me know if there is something I can do to 
help--alternate benchmarks, tests, experiments with compilers other than icc.

MKL LAPACK behavior is undefined for NaNs, so I left the check in and just made 
it more efficient on a CPU with SIMD. Thanks for looking at this.

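set.seed(1)
# Matrix dimensions (30k x 30k, as described earlier in this thread)
m <- 30000
n <- 30000
# Only 2 random numbers are generated; matrix() recycles them to fill each matrix
A <- matrix(runif(2), nrow = m, ncol = n)
B <- matrix(runif(2), nrow = m, ncol = n)
print(typeof(A[1, 2]))
print(A[1, 2])

# Matrix multiply, timed three times
system.time(C <- B %*% A)
system.time(C <- B %*% A)
system.time(C <- B %*% A)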
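
For readers who want to try the benchmark at a smaller scale first, here is a hedged sketch. The helper time_matmul and its arguments are hypothetical names introduced for illustration; unlike the program above it fills the matrices with n * n random numbers, and the timings depend entirely on the machine and the BLAS in use:

```r
# Hypothetical helper: time B %*% A for n x n double matrices.
time_matmul <- function(n, reps = 3, seed = 1) {
  set.seed(seed)
  A <- matrix(runif(n * n), nrow = n, ncol = n)
  B <- matrix(runif(n * n), nrow = n, ncol = n)
  # Collect the elapsed (wall clock) time of each repetition.
  vapply(
    seq_len(reps),
    function(i) system.time(B %*% A)[["elapsed"]],
    numeric(1)
  )
}

# Small size so the sketch runs quickly; increase n to approach the
# sizes discussed in the thread.
elapsed <- time_matmul(200)
print(elapsed)
```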

-Original Message-
From: Martin Maechler [mailto:maech...@stat.math.ethz.ch] 
Sent: Tuesday, January 10, 2017 8:59 AM
To: Cohn, Robert S 
Cc: r-devel@r-project.org
Subject: Re: [Rd] accelerating matrix multiply

> Cohn, Robert S 
> on Sat, 7 Jan 2017 16:41:42 + writes:

> I am using R to multiply some large (30k x 30k double) matrices on a 
> 64 core machine (xeon phi).  I added some timers to src/main/array.c 
> to see where the time is going. All of the time is being spent in the 
> matprod function, most of that time is spent in dgemm. 15 seconds is 
> in matprod in some code that is checking if there are NaNs.

> > system.time (C <- B %*% A)
> nancheck: wall time 15.240282s
>    dgemm: wall time 43.111064s
>  matprod: wall time 58.351572s
>     user   system  elapsed 
> 2710.154   20.999   58.398
> 
> The NaN checking code is not being vectorized because of the early 
> exit when NaN is detected:
> 
>   /* Don't trust the BLAS to handle NA/NaNs correctly: PR#4582
>    * The test is only O(n) here.
>    */
>   for (R_xlen_t i = 0; i < NRX*ncx; i++)
>       if (ISNAN(x[i])) {have_na = TRUE; break;}
>   if (!have_na)
>       for (R_xlen_t i = 0; i < NRY*ncy; i++)
>           if (ISNAN(y[i])) {have_na = TRUE; break;}
> 
> I tried deleting the 'break'. By inspecting the asm code, I verified 
> that the loop was not being vectorized before, but now is vectorized. 
> Total time goes down:
> 
> > system.time (C <- B %*% A)
> nancheck: wall time  1.898667s
>    dgemm: wall time 43.913621s
>  matprod: wall time 45.812468s
>     user   system  elapsed 
> 2727.877   20.723   45.859
> 
> The break accelerates the case when there is a NaN, at the expense of 
> the much more common case when there isn't a NaN. If a NaN is 
> detected, it doesn't call dgemm and calls its own matrix multiply, 
> which makes the NaN check time insignificant so I doubt the early exit 
> provides any benefit.
> 
> I was a little surprised that the O(n) NaN check is costly compared to 
> the O(n**2) dgemm that follows. I think the reason is that the NaN check 
> is single-threaded and not vectorized, and my machine can do 2048 
> floating point ops/cycle when you consider the cores/dual issue/8 way 
> SIMD/muladd, so the constant factor will be significant even for 
> large matrices.
> 
> Would you consider deleting the breaks? I can submit a patch if that 
> will help. Thanks.
> 
> Robert
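
An R-level illustration of what the check quoted above looks for. This is only a sketch with made-up values; it uses anyNA() from base R as a stand-in and says nothing about how the C loop in array.c is implemented or how fast it is:

```r
# Illustrative 2 x 2 matrices; A has one missing value in row 2, column 1.
A <- matrix(c(1, NA, 3, 4), nrow = 2)
B <- diag(2)

# R-level analogue of the have_na flag: is there any NA/NaN in either operand?
has_na <- anyNA(A) || anyNA(B)

# Entries of the product that are computed from the missing value
# are missing themselves.
C <- B %*% A

print(has_na)
print(is.na(C[2, 1]))
```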

Thank you Robert for bringing the issue up ("again", possibly).
Within R core, some have seen somewhat similar timings on some platforms (gcc), 
but much less dramatic differences e.g. on macOS with clang.

As seen in the source code you cite above, the current implementation was 
triggered by a nasty BLAS bug .. actually also showing up only on some 
platforms, possibly depending on runtime libraries in addition to the compilers 
used.

Do you have R code (including set.seed(.) if relevant) to show how to 
generate the large square matrices you've mentioned in the beginning?  So we 
get to some reproducible benchmarks?

With best regards,
Martin Maechler



Re: [Rd] bug with strptime, %OS, and "."

2017-01-11 Thread Upton, Stephen (Steve) (CIV)
Works for me:
> strptime("17_35_14.01234.mp3","%H_%M_%OS")$sec
[1] 14.01234
> strptime("17_35_14.mp3","%H_%M_%OS")$sec
[1] 14

Just leave off the ".mp3" in your time pattern.

Relevant section from the help ("Details") for strptime:
strptime converts character vectors to class "POSIXlt": its input x is first
converted by as.character. Each input string is processed as far as
necessary for the format specified: any trailing characters are ignored.

R version 3.3.2 (2016-10-31)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

Stephen C. Upton
Faculty Associate - Research
SEED (Simulation Experiments & Efficient Designs) Center
Operations Research Department
Naval Postgraduate School
Mobile: 804-994-4257
NIPR: scup...@nps.edu
SIPR: upto...@nps.navy.smil.mil
SEED Center web site: http://harvest.nps.edu
-Original Message-
From: R-devel [mailto:r-devel-boun...@r-project.org] On Behalf Of
frede...@ofb.net
Sent: Tuesday, January 10, 2017 9:59 PM
To: Dirk Eddelbuettel
Cc: R-devel
Subject: Re: [Rd] bug with strptime, %OS, and "."

On Tue, Jan 10, 2017 at 08:13:21PM -0600, Dirk Eddelbuettel wrote:
> 
> On 10 January 2017 at 17:48, frede...@ofb.net wrote:
> | Hi R Devel,
> | 
> | I just ran into a corner case with 'strptime'. Recall that the "%OS"
> | conversion accepts fractional seconds:
> | 
> | > strptime("17_35_14.01234.mp3","%H_%M_%OS.mp3")$sec
> | [1] 14.01234
> | 
> | Unfortunately for my application it seems to be "greedy", in that it 
> | tries to parse a decimal point which might belong to the rest of the
> | format:
> | 
> | > strptime("17_35_14.mp3","%H_%M_%OS.mp3")
> | [1] NA
> 
> Maybe just don't use the optional O:
> 
>    R> strptime("17_35_14.mp3","%H_%M_%S.mp3")$sec
>    [1] 14
>    R> 
>    R> strptime("17_35_14.mp3","%H_%M_%S.mp3")
>    [1] "2017-01-10 17:35:14 CST"
>    R>

For my application I wanted to be able to accept both formats, "14.mp3" and
"14.01234.mp3". Since "14" and "14.01234" both parse as numbers, I thought
"%OS" should accept both.

Yes, I can work around it fairly easily.

Frederick
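
One possible workaround, sketched under assumptions: the helper name parse_clip_time is hypothetical, the ".mp3" suffix and the tz = "UTC" choice are illustrative, and the approach relies on the behaviour shown earlier in this thread, where "%OS" parses both "14" and "14.01234" once the extension is out of the format:

```r
# Hypothetical helper: accept both "HH_MM_SS.mp3" and "HH_MM_SS.fff.mp3".
parse_clip_time <- function(x) {
  # Remove the file extension so the format does not need a literal ".mp3"
  # after the (possibly fractional) seconds.
  stem <- sub("\\.mp3$", "", x)
  strptime(stem, "%H_%M_%OS", tz = "UTC")
}

parsed <- parse_clip_time(c("17_35_14.mp3", "17_35_14.01234.mp3"))
secs <- parsed$sec
print(secs)
```

Stripping the suffix explicitly, rather than relying on trailing characters being ignored, keeps the intent visible and makes it easy to add a check that the suffix was really there.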
