[Rd] Get memory address of an R data frame

2020-01-09 Thread lille stor
Hello,

I would like for my C function to be able to manipulate some values stored in 
an R data frame.

To achieve this, I need the (real) memory address where the R data frame stores 
its data (hopefully in a contiguous way). Then, from R, I call the C function 
and pass this memory address as a parameter.

The question: how can we get the memory address of the R data frame?

Thank you!

L.

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Get memory address of an R data frame

2020-01-09 Thread Ezra Tucker
Hi Lille,

Is it possible you're looking for tracemem() or inspect() ?

> x <- data.frame(z = 1:10)
> tracemem(x)
[1] "<0x55aa743e0bc0>"

> x[1] <- 2L
tracemem[0x55aa743e0bc0 -> 0x55aa778f6ad0]:
tracemem[0x55aa778f6ad0 -> 0x55aa778f6868]: [<-.data.frame [<-
tracemem[0x55aa778f6868 -> 0x55aa778f5b48]: [<-.data.frame [<-

> .Internal(inspect(x))
@55aa743e0bc0 19 VECSXP g0c1 [OBJ,MARK,NAM(7),TR,ATT] (len=1, tl=0)
  @55aa7440d420 13 INTSXP g0c0 [MARK,NAM(7)]  1 : 10 (compact)
ATTRIB:
  @55aa743f9ea0 02 LISTSXP g0c0 [MARK]
    TAG: @55aa72ac98a0 01 SYMSXP g0c0 [MARK,NAM(7),LCK,gp=0x6000] "names" (has value)
    @55aa743e0fb0 16 STRSXP g0c1 [MARK,NAM(7)] (len=1, tl=0)
      @55aa72be1c70 09 CHARSXP g0c1 [MARK,gp=0x61] [ASCII] [cached] "z"
    TAG: @55aa72ac9d70 01 SYMSXP g0c0 [MARK,NAM(7),LCK,gp=0x4000] "class" (has value)
    @55aa73ca59b8 16 STRSXP g0c1 [MARK,NAM(7)] (len=1, tl=0)
      @55aa72b562b8 09 CHARSXP g0c2 [MARK,gp=0x61,ATT] [ASCII] [cached] "data.frame"
    TAG: @55aa72ac9670 01 SYMSXP g0c0 [MARK,NAM(7),LCK,gp=0x4000] "row.names" (has value)
    @55aa743e1c98 13 INTSXP g0c1 [MARK,NAM(7)] (len=2, tl=0) -2147483648,-10





[Rd] mean

2020-01-09 Thread Lipatz Jean-Luc
Hello,

Is there a reason for the following behaviour?
> mean(c("1","2","3"))
[1] NA
Warning message:
In mean.default(c("1", "2", "3")) :
  argument is not numeric or logical: returning NA

But:
> var(c("1","2","3"))
[1] 1

And also:
> median(c("1","2","3"))
[1] "2"

But:
> quantile(c("1","2","3"),p=.5)
Error in (1 - h) * qs[i] : 
  non-numeric argument to binary operator

It sounds like a lack of symmetry.
Best regards.


Jean-Luc LIPATZ
Insee - Direction générale
Responsable de la coordination sur le développement de R et la mise en oeuvre 
d'alternatives à SAS



Re: [Rd] Get memory address of an R data frame

2020-01-09 Thread Joris Meys
Hi Lille,

To my understanding, there's no need to get the actual memory address of
the R data frame, as .Call() or .External() can be used in a "call by
reference" way as well. This would be contrary to standard R behaviour, so
if you use that in a package, make sure you document it!

There's a detailed explanation on how to deal with R objects in C code in
the manual "Writing R extensions" here :

https://cran.r-project.org/doc/manuals/R-exts.html#Handling-R-objects-in-C

Especially check the section "Named objects and copying", which explains in
more detail how to control the standard R behaviour. Also keep in mind that
data frames are list-like structures, which are handled differently from
atomic vectors.

Hope this helps.
Kind regards
Joris



-- 
Joris Meys
Statistical consultant

Department of Data Analysis and Mathematical Modelling
Ghent University
Coupure Links 653, B-9000 Gent (Belgium)



Re: [Rd] mean

2020-01-09 Thread Marc Schwartz via R-devel



Hi,

It would appear, whether by design or just inconsistent implementations, 
perhaps by different authors over time, that the checks for whether or not the 
input vector is numeric differ across the functions.

A further inconsistency is for median(), where:

> median(c("1", "2", "3", "4"))
[1] NA
Warning message:
In mean.default(sort(x, partial = half + 0L:1L)[half + 0L:1L]) :
  argument is not numeric or logical: returning NA

as a result of there being 4 elements, rather than 3, and the internal checks 
in the code, where in the case of the input vector having an even number of 
elements, mean() is used:

    if (n%%2L == 1L) 
        sort(x, partial = half)[half]
    else mean(sort(x, partial = half + 0L:1L)[half + 0L:1L])


Similarly:

> median(factor(c("1", "2", "3")))
Error in median.default(factor(c("1", "2", "3"))) : need numeric data

because the input vector is a factor, rather than character, and the initial 
check has:

    if (is.factor(x) || is.data.frame(x)) 
        stop("need numeric data")
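
The three cases discussed above can be reproduced side by side. This is only a small sketch that calls base R's median() as it behaves in this thread; the variable names are made up for illustration, and the exact wording of the error message depends on the locale.

```r
# Odd length: median.default() returns the middle element of the sorted
# vector, so a character input gives a character result.
odd <- median(c("1", "2", "3"))

# Even length: the two middle elements are passed to mean(), which
# returns NA (with a warning) for character input.
even <- suppressWarnings(median(c("1", "2", "3", "4")))

# Factors are rejected up front by the is.factor() check.
fac <- tryCatch(median(factor(c("1", "2", "3"))),
                error = function(e) conditionMessage(e))

print(odd)   # "2"
print(even)  # NA
print(fac)   # the error text ("need numeric data" in an English locale)
```

So the result type depends on both the class of the input and the parity of its length, which is the inconsistency being pointed out.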


Regards,

Marc Schwartz



Re: [Rd] mean

2020-01-09 Thread Marc Schwartz via R-devel
Jean-Luc,

Please keep the communications on the list, for the benefit of others, now and 
in the future, via the list archive. I am adding r-devel back here.

I can't speak to the rationale in some of these cases. As I noted, it may be 
(is likely) due to differing authors over time, and there may have been 
relevant use cases at the time that the code was written, resulting in the 
various checks. Presumably, the additional checks were not incorporated into 
the other functions to enforce a level of consistency.

We will need to wait for someone from R Core to comment.

Regards,

Marc

> On Jan 9, 2020, at 8:34 AM, Lipatz Jean-Luc  wrote:
> 
> OK, inconsistencies.
> 
> The last test you wrote is a bit strange. I agree that it is useful to warn 
> about a computation that makes no sense in the case of factors. But why 
> test for data frames? If you go that way using random structures, you can also 
> try:
> 
>> median(list(1,2),list(3,4),list(4,5))
> Error in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) 
> return(x[FALSE][NA]) : 
>  argument is not interpretable as logical
> In addition: Warning message:
> In if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]) :
>  the condition has length > 1 and only the first element will be used
> 
> giving a message which, despite its length, doesn't really explain the 
> reason for the error.
> 
> Why not a test on the arguments, like:
>  if (!is.numeric(x)) 
>      stop("need numeric data")
> 
> 



Re: [Rd] Get memory address of an R data frame

2020-01-09 Thread Stepan

Hello Lille,

The raw data of a data.frame (or more precisely of a list, because a data.frame 
is just a list with the "data.frame" class) is an array of R-specific data 
structures (SEXP), so a generic C function will not be able to work with 
them.
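
The list nature of a data frame can be checked from R itself. A minimal sketch, using a made-up two-column data frame (the names df and col_types are only for illustration):

```r
df <- data.frame(x = 1:3, y = c(0.5, 1.5, 2.5))

print(typeof(df))          # "list"
print(is.list(df))         # TRUE
print(class(unclass(df)))  # "list"

# Each column is its own vector with its own type, so there is no single
# contiguous block holding all the values of the data frame.
col_types <- vapply(df, typeof, character(1))
print(col_types)           # x is "integer", y is "double"
```

This is why a C function expecting one plain array cannot simply be handed "the address of the data frame".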


As a pre-processing step, you may allocate an array for the pointers to 
the raw data of the columns yourself (hopefully there will be only a few 
compared to the size of the columns themselves). For this you'll need the 
functions VECTOR_ELT to access the columns and DATAPTR to get their raw 
data (and possibly TYPEOF to find out their type). Note that this won't 
work for a data frame that contains another list. If this memory layout 
doesn't work for you, then you may need to copy the whole data frame.


If you want to update the data from C, then keep in mind that

1) R vectors have value semantics and you should not be altering the raw 
data of any vector unless you know that it's not referenced from anywhere 
else -- otherwise you should make a copy, alter that copy instead and 
return it as the result from your C function.


2) R has a generational garbage collector, so it *must* know about 
references between R objects, and so you should use SET_VECTOR_ELT to 
update the data of a list (some would say that you can update the raw 
data if you really understand how the GC and R internals work; I would 
say: just don't).
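
Point 1) can be seen from the R side. A minimal sketch with made-up names (a, b): after an assignment both names may refer to the same underlying object, and it is only R's own copy-on-modify that keeps them apart.

```r
a <- data.frame(z = 1:3)
b <- a            # no copy is made yet; both names can share the same data
b$z[1] <- 99L     # modifying b through R makes a copy, so a is left untouched

print(a$z)        # 1 2 3
print(b$z)        # 99 2 3
```

C code that wrote into the shared data directly would bypass this copy and change a as well, which is the situation the two points above warn about.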


Best,
Stepan



Re: [Rd] Get memory address of an R data frame

2020-01-09 Thread Tomas Kalibera

On 1/9/20 1:03 PM, Ezra Tucker wrote:

> Hi Lille,
>
> Is it possible you're looking for tracemem() or inspect() ?


Please note these functions are only for debugging. They should never be 
called from programs or packages. One should never try to manipulate 
pointers from R directly or even hold them (except for what "external 
pointer" objects allow, as described in Writing R Extensions).


Tomas





Re: [Rd] Get memory address of an R data frame

2020-01-09 Thread Stepan

On 09. 01. 20 15:41, lille stor wrote:

> I believe this could be done without creating side effects (e.g. 
> crash) as we are just talking about changing values.


That is exactly the issue that my last two points warn about. Example:

a <- mtcars
.Call("my_innocent_function", a)

Would you expect the mtcars data.frame to be altered after this code 
is executed? What if some existing code relies on mtcars always 
containing the same data, which is a perfectly valid assumption given the R 
specification?


If what you are trying to do is to have a mutable data frame, then this 
goes against the philosophy of R. You can get mutability with 
environments and other R types that are intentionally mutable and whose 
mutability is documented.
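
The difference between the two kinds of objects can be sketched in a few lines. The helper names (bump_env, bump_df) and the field n are hypothetical and only serve the illustration:

```r
# Environments have reference semantics: the callee changes the caller's object.
bump_env <- function(env) {
  env$n <- env$n + 1L
  invisible(NULL)
}
counter <- new.env()
counter$n <- 0L
bump_env(counter)
bump_env(counter)
print(counter$n)  # 2

# Data frames have value semantics: the callee only changes its own copy.
bump_df <- function(d) {
  d$n <- d$n + 1L
  d
}
df <- data.frame(n = 0L)
invisible(bump_df(df))
print(df$n)       # 0
```

Code that needs in-place updates can therefore keep its state in an environment, where readers of the code expect mutation, rather than in a data frame.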


You can get data.frame mutability with the data.table package, but the 
tricks it's doing under the hood may bite back. In its source code you 
can also see how these things can be done, but unless you really need 
to, I would advise against implementing this yourself.


Best,
Stepan



Re: [Rd] mean

2020-01-09 Thread peter dalgaard
I think median() behaves as designed: As long as the argument can be ordered, 
the "middle observation" makes sense, except when the middle falls between two 
categories, and you can't define an average of the two candidates for a median.

The "sick man" would seem to be var(). Notice that it is also inconsistent with 
cov():

> cov(c("1","2","3","4"),c("1","2","3","4") )
Error in cov(c("1", "2", "3", "4"), c("1", "2", "3", "4")) : 
  is.numeric(x) || is.logical(x) is not TRUE
> var(c("1","2","3","4"),c("1","2","3","4") )
[1] 1.666667

-pd



-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd@cbs.dk  Priv: pda...@gmail.com



Re: [Rd] mean

2020-01-09 Thread Marc Schwartz via R-devel
Peter,

Thanks for the reply.

If that were the case, then should not the following be allowed to work with 
ordered factors?

> median(factor(c("1", "2", "3"), ordered = TRUE))
Error in median.default(factor(c("1", "2", "3"), ordered = TRUE)) : 
  need numeric data

At least on the surface, if you can lexically order a character vector:

> median(c("red", "blue", "green"))
[1] "green"

you can also order a factor, or ordered factor, and if the number of elements 
is odd, return a median value.
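
What such a median could look like for ordered factors is sketched below. This is a hypothetical helper (median_ordered is not part of base R) and only one possible reading of the suggestion: it returns the middle level for an odd number of non-missing values and refuses the even case.

```r
# Hypothetical helper, not part of base R.
median_ordered <- function(x) {
  stopifnot(is.ordered(x))
  x <- x[!is.na(x)]
  n <- length(x)
  if (n == 0L) return(x[NA_integer_])
  if (n %% 2L == 0L)
    stop("even number of elements: the median may fall between two levels")
  # sort() orders a factor by its level codes, not alphabetically
  sort(x)[(n + 1L) %/% 2L]
}

sizes <- factor(c("large", "small", "medium"),
                levels = c("small", "medium", "large"), ordered = TRUE)
m <- median_ordered(sizes)
print(as.character(m))  # "medium"
```

Note that the ordering comes from the declared levels, so the result differs from what a lexical sort of the labels would give.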

Regards,

Marc



Re: [Rd] mean

2020-01-09 Thread Stephen Ellison
Note that in 

> > quantile(c("1","2","3"),p=.5)
> Error in (1 - h) * qs[i] : 
>  non-numeric argument to binary operator
the default quantile type (7) does not work for non-numerics.

Quantile types 1 and 3 work as expected:

> quantile(c("1","2","3"),p=.5, type=1)
50% 
"2" 
> quantile(c("1","2","3"),p=.5, type=3)
50% 
"2"
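
The reason type 1 works here is that it only picks an existing element of the sorted data and never does arithmetic on the values. A minimal sketch of that definition follows; quantile_type1 is a hypothetical helper for a single probability, not the code of stats::quantile(), and it ignores the floating-point fuzz that the real function applies.

```r
# Hypothetical helper illustrating the type-1 (inverse empirical CDF) rule.
quantile_type1 <- function(x, p) {
  stopifnot(length(p) == 1L, p >= 0, p <= 1)
  xs <- sort(x)
  n <- length(xs)
  j <- max(1L, ceiling(n * p))  # index of the order statistic to return
  xs[j]
}

print(quantile_type1(c("3", "1", "2"), 0.5))   # "2"
print(quantile_type1(c(10, 20, 30, 40), 0.5))  # 20
```

The default type 7 instead interpolates between two order statistics, which is where the multiplication in the error message comes from.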


Steve E





Re: [Rd] Get memory address of an R data frame

2020-01-09 Thread Pages, Herve
On 1/9/20 06:56, Stepan wrote:
> On 09. 01. 20 15:41, lille stor wrote:
> 
>> I believe this could be done without creating side effects (e.g. 
>> crash) as we are just talking about changing values.

A crash would certainly be an annoying "side effect" ;-)

As Stepan explained, data.frame objects like most objects in R should 
never be modified in-place. If you're looking for a data-frame-like 
structure with a reference semantic where in-place modifications are 
allowed, please take a look at the data.table package.

H.

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fredhutch.org
Phone:  (206) 667-5791
Fax:(206) 667-1319


Re: [Rd] SUGGESTION: Settings to disable forked processing in R, e.g. parallel::mclapply()

2020-01-09 Thread Henrik Bengtsson
I'd like to pick up this thread started on 2019-04-11
(https://hypatia.math.ethz.ch/pipermail/r-devel/2019-April/077632.html).
Modulo all the other suggestions in this thread, would my proposal of
being able to disable forked processing via an option or an
environment variable make sense?  I've prototyped a working patch that
works like:

> options(fork.allowed = FALSE)
> unlist(parallel::mclapply(1:2, FUN = function(x) Sys.getpid()))
[1] 14058 14058
> parallel::mcmapply(1:2, FUN = function(x) Sys.getpid())
[1] 14058 14058
> parallel::pvec(1:2, FUN = function(x) Sys.getpid() + x/10)
[1] 14058.1 14058.2
> f <- parallel::mcparallel(Sys.getpid())
Error in allowFork(assert = TRUE) :
  Forked processing is not allowed per option ‘fork.allowed’ or
environment variable ‘R_FORK_ALLOWED’
> cl <- parallel::makeForkCluster(1L)
Error in allowFork(assert = TRUE) :
  Forked processing is not allowed per option ‘fork.allowed’ or
environment variable ‘R_FORK_ALLOWED’
>


The patch is:

Index: src/library/parallel/R/unix/forkCluster.R
===
--- src/library/parallel/R/unix/forkCluster.R (revision 77648)
+++ src/library/parallel/R/unix/forkCluster.R (working copy)
@@ -30,6 +30,7 @@

 newForkNode <- function(..., options = defaultClusterOptions, rank)
 {
+allowFork(assert = TRUE)
 options <- addClusterOptions(options, list(...))
 outfile <- getClusterOption("outfile", options)
 port <- getClusterOption("port", options)
Index: src/library/parallel/R/unix/mclapply.R
===
--- src/library/parallel/R/unix/mclapply.R (revision 77648)
+++ src/library/parallel/R/unix/mclapply.R (working copy)
@@ -28,7 +28,7 @@
 stop("'mc.cores' must be >= 1")
 .check_ncores(cores)

-if (isChild() && !isTRUE(mc.allow.recursive))
+if (!allowFork() || (isChild() && !isTRUE(mc.allow.recursive)))
 return(lapply(X = X, FUN = FUN, ...))

 ## Follow lapply
Index: src/library/parallel/R/unix/mcparallel.R
===
--- src/library/parallel/R/unix/mcparallel.R (revision 77648)
+++ src/library/parallel/R/unix/mcparallel.R (working copy)
@@ -20,6 +20,7 @@

 mcparallel <- function(expr, name, mc.set.seed = TRUE, silent = FALSE, mc.affinity = NULL, mc.interactive = FALSE, detached = FALSE)
 {
+allowFork(assert = TRUE)
 f <- mcfork(detached)
 env <- parent.frame()
 if (isTRUE(mc.set.seed)) mc.advance.stream()
Index: src/library/parallel/R/unix/pvec.R
===
--- src/library/parallel/R/unix/pvec.R (revision 77648)
+++ src/library/parallel/R/unix/pvec.R (working copy)
@@ -25,7 +25,7 @@

 cores <- as.integer(mc.cores)
 if(cores < 1L) stop("'mc.cores' must be >= 1")
-if(cores == 1L) return(FUN(v, ...))
+if(cores == 1L || !allowFork()) return(FUN(v, ...))
 .check_ncores(cores)

 if(mc.set.seed) mc.reset.stream()

with a new file src/library/parallel/R/unix/allowFork.R:

allowFork <- function(assert = FALSE) {
    value <- Sys.getenv("R_FORK_ALLOWED")
    if (nzchar(value)) {
        value <- switch(value,
            "1"=, "TRUE"=, "true"=, "True"=, "yes"=, "Yes"= TRUE,
            "0"=, "FALSE"=,"false"=,"False"=, "no"=, "No" = FALSE,
            stop(gettextf("invalid environment variable value: %s==%s",
                          "R_FORK_ALLOWED", value)))
        value <- as.logical(value)
    } else {
        value <- TRUE
    }
    value <- getOption("fork.allowed", value)
    if (is.na(value)) {
        stop(gettextf("invalid option value: %s==%s", "fork.allowed", value))
    }
    if (assert && !value) {
        stop(gettextf("Forked processing is not allowed per option %s or environment variable %s",
                      sQuote("fork.allowed"), sQuote("R_FORK_ALLOWED")))
    }
    value
}
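
Until something like this exists in the parallel package, a similar fallback can be approximated in user code. The sketch below is only an illustration: fork_allowed() and lapply_maybe_forked() are hypothetical helpers, and 'fork.allowed' / 'R_FORK_ALLOWED' are simply the names proposed in the patch, so treat them as assumptions rather than settings that R itself is known to honour.

```r
# Hypothetical user-level helpers mirroring the proposed option.
fork_allowed <- function() {
  env <- Sys.getenv("R_FORK_ALLOWED", unset = "")
  # An unset or unrecognised value counts as "allowed" in this sketch.
  default <- !(tolower(env) %in% c("0", "false", "no"))
  isTRUE(getOption("fork.allowed", default))
}

lapply_maybe_forked <- function(X, FUN, ..., mc.cores = 2L) {
  # Forking is unavailable on Windows, so fall back there as well.
  if (!fork_allowed() || .Platform$OS.type == "windows")
    return(lapply(X, FUN, ...))
  parallel::mclapply(X, FUN, ..., mc.cores = mc.cores)
}

old <- options(fork.allowed = FALSE)
pids <- unlist(lapply_maybe_forked(1:2, function(x) Sys.getpid()))
options(old)
print(all(pids == Sys.getpid()))  # TRUE: everything ran in the current process
```

Unlike the patch, a wrapper of this kind only protects calls that go through it; code in other packages that calls mclapply() directly is unaffected, which is the argument for handling this inside parallel itself.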

/Henrik

On Mon, Apr 15, 2019 at 3:12 AM Tomas Kalibera  wrote:
>
> On 4/15/19 11:02 AM, Iñaki Ucar wrote:
> > On Mon, 15 Apr 2019 at 08:44, Tomas Kalibera  
> > wrote:
> >> On 4/13/19 12:05 PM, Iñaki Ucar wrote:
> >>> On Sat, 13 Apr 2019 at 03:51, Kevin Ushey  wrote:
> >>>> I think it's worth saying that mclapply() works as documented
> >>> Mostly, yes. But it says nothing about fork's copy-on-write and memory
> >>> overcommitment, and that this means that it may work nicely or fail
> >>> spectacularly depending on whether, e.g., you operate on a long
> >>> vector.
> >> R cannot possibly replicate documentation of the underlying operating
> >> systems. It clearly says that fork() is used and readers who may not
> >> know what fork() is need to learn it from external sources.
> >> Copy-on-write is an elementary property of fork().
> > Just to be precise, copy-on-write is an optimization widely deployed
> > in most modern *nixes, particularly for the architectures in which R
> > usually runs. But it is not an elementary property; it is not even
> > possible without an MMU.
>
> Yes, old Unix systems wi