Re: [Rd] Debate: Shall some of Microsoft R Open Code be ported to mainstream R?

2017-11-01 Thread Martin Maechler
> Iñaki Úcar 
> on Tue, 31 Oct 2017 14:55:44 +0100 writes:

> 2017-10-31 14:34 GMT+01:00 Juan Telleria:
>> As far as I can tell, OpenBLAS for Windows might be
>> an option worth considering: http://www.openblas.net
>> 
>> But Intel MKL also seems to be free*:
>> https://software.intel.com/en-us/articles/free-mkl

> install.packages("rmsfact") 
> sub(".*because ", "", rmsfact::rmsfact(8))

"Amen"!

... and thank you, Iñaki, for alerting us to the rmsfact package.
Cool!

Martin Maechler
ETH Zurich and R Core Team

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] Memory address of character datatype

2017-11-01 Thread lille stor
Hi,
 
To get the memory address at which the value of variable "x" (of datatype
"numeric") is stored, one does the following in R (32-bit):
 
      library(pryr)
      x <- 1024
      # 24 is needed to jump over the variable info and point to the
      # data itself (i.e. 1024)
      addr <- as.numeric(address(x)) + 24
 
The question now is: what is the value of the jump needed to obtain the
memory address at which the value of variable "x" (of datatype "character")
is stored?
 

      library(pryr)
      x <- "abc"
      # What should the value of the jump be so that it points to the
      # data of variable "x" (i.e. abc)?
      addr <- as.numeric(address(x)) + ??
 
Thank you in advance!
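
One way to see why a single fixed offset may not be enough here, sketched with base R only: as far as R's internals go, a character vector is a STRSXP whose data area holds pointers to separate CHARSXP objects, and the bytes of "abc" live in the CHARSXP rather than directly after the header of "x". The call .Internal(inspect(x)) is an undocumented debugging aid, so its output format is an implementation detail and may differ between R versions; treat this as an illustration, not a supported interface.

```r
x <- "abc"

# .Internal(inspect()) is an undocumented debugging aid; capture its
# printed output so it can be examined as text.
out <- capture.output(.Internal(inspect(x)))

# The first line describes the character vector itself (STRSXP); the
# indented line below it describes the string element (CHARSXP), which
# is a separate object with its own address.
print(out)
```

The two addresses shown after the "@" signs are different objects, so reaching the characters would involve following a pointer, not just adding a constant to the address of "x".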


Re: [Rd] Bug in model.matrix.default for higher-order interaction encoding when specific model terms are missing

2017-11-01 Thread Arie ten Cate
Hello Tyler,

Thank you for searching for, and finding, the basic description of the
behavior of R in this matter.

I think your example is in agreement with the book.

But let me first note the following. You write: "F_j refers to a
factor (variable) in a model and not a categorical factor". However:
"a factor is a vector object used to specify a discrete
classification" (start of chapter 4 of "An Introduction to R"). You
might also see the description of the R function factor().

You note that the book says about a factor F_j:
  "... F_j is coded by contrasts if T_{i(j)} has appeared in the
formula and by dummy variables if it has not"

You find:
   "However, the example I gave demonstrated that this dummy variable
encoding only occurs for the model where the missing term is the
numeric-numeric interaction, ~(X1+X2+X3)^3-X1:X2."

We have here T_i = X1:X2:X3. Also: F_j = X3 (the only factor). Then
T_{i(j)} = X1:X2, which is dropped from the model. Hence the X3 in T_i
must be encoded by dummy variables, as indeed it is.
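
The reasoning above can be checked with a small sketch. The data frame below is made up purely for illustration (it is not the design from the original report); X3 is given the hypothetical levels A, B and C. According to ?terms.object, an entry of 1 in the "factors" attribute means the variable is coded by contrasts in that term, and 2 means it is coded by dummy variables for all levels.

```r
# Made-up illustration data: two numeric variables and one factor.
d <- data.frame(
  X1 = c(1, 2, 3, 4, 5, 6),
  X2 = c(2, 1, 4, 3, 6, 5),
  X3 = factor(c("A", "B", "C", "A", "B", "C"))
)

f_full    <- ~ (X1 + X2 + X3)^3
f_dropped <- ~ (X1 + X2 + X3)^3 - X1:X2

# Coding chosen for X3 inside the three-way term:
# 1 = contrasts, 2 = dummy variables (see ?terms.object).
code_full    <- attr(terms(f_full),    "factors")["X3", "X1:X2:X3"]
code_dropped <- attr(terms(f_dropped), "factors")["X3", "X1:X2:X3"]
print(c(full = code_full, dropped = code_dropped))

# Column names of the resulting model matrices.
cols_full    <- colnames(model.matrix(f_full, d))
cols_dropped <- colnames(model.matrix(f_dropped, d))
print(grep("^X1:X2:X3", cols_full, value = TRUE))
print(grep("^X1:X2:X3", cols_dropped, value = TRUE))
```

With the default treatment contrasts, the three-way term in the reduced formula should get one column per level of X3, including the baseline level, whereas the full formula omits the baseline column.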

  Arie

On Tue, Oct 31, 2017 at 4:01 PM, Tyler wrote:
> Hi Arie,
>
> Thank you for your further research into the issue.
>
> Regarding Stata: On the other hand, JMP gives model matrices that use the
> main effects contrasts in computing the higher order interactions, without
> the dummy variable encoding. I verified this both by analyzing the linear
> model given in my first example and noting that JMP has one more degree of
> freedom than R for the same model, as well as looking at the generated model
> matrices. It's easy to find a design where JMP will allow us to fit our
> model with goodness-of-fit estimates and R will not, due to the extra
> degree(s) of freedom required. Let's keep the conversation limited to R.
>
> I want to refocus on my original bug report, which was not for a
> missing main effects term, but rather for a missing lower-order interaction
> term. The behavior of model.matrix.default() for a missing main effects term
> is a nice example to demonstrate how model.matrix encodes with dummy
> variables instead of contrasts, but doesn't demonstrate the inconsistent
> behavior my bug report highlighted.
>
> I went looking for documentation on this behavior, and the issue stems not
> from model.matrix.default(), but rather from how the terms() function
> interprets the formula. This "clever" replacement of contrasts by dummy
> variables to maintain marginality (presuming that's the reason) is not
> described anywhere in the documentation for either the model.matrix() or
> the terms() function.
> In order to find a description for the behavior, I had to look in the
> underlying C code, buried above the "TermCode" function of the "model.c"
> file, which says:
>
> "TermCode decides on the encoding of a model term. Returns 1 if variable
> ``whichBit'' in ``thisTerm'' is to be encoded by contrasts and 2 if it is to
> be encoded by dummy variables.  This is decided using the heuristic
> described in Statistical Models in S, page 38."
>
> I do not have a copy of this book, and I suspect most R users do not
> either. Thankfully, however, some of the pages describing this behavior were
> available as part of Amazon's "Look Inside" feature--but if not for that, I
> would have no idea what heuristic R was using. Since those pages could be
> made unavailable by Amazon at any time, at the very least we have a problem
> with a lack of documentation.
>
> However, I still believe there is a bug when comparing R's implementation to
> the heuristic described in the book. From Statistical Models in S, pages
> 38-39:
>
> "Suppose F_j is any factor included in term T_i. Let T_{i(j)} denote the
> margin of T_i for factor F_j--that is, the term obtained by dropping F_j
> from T_i. We say that T_{i(j)} has appeared in the formula if there is some
> term T_i' for i' < i such that T_i' contains all the factors appearing in
> T_{i(j)}. The usual case is that T_{i(j)} itself is one of the preceding
> terms. Then F_j is coded by contrasts if T_{i(j)} has appeared in the
> formula and by dummy variables if it has not"
>
> Here, F_j refers to a factor (variable) in a model and not a categorical
> factor, as specified later in that section (page 40): "Numeric variables
> appear in the computations as themselves, uncoded. Therefore, the rule does
> not do anything special for them, and it remains valid, in a trivial sense,
> whenever any of the F_j is numeric rather than categorical."
>
> Going back to my original example with three variables: X1 (numeric), X2
> (numeric), X3 (categorical). This heuristic prescribes encoding X1:X2:X3
> with contrasts as long as X1:X2, X1:X3, and X2:X3 exist in the formula. When
> any of the preceding terms do not exist, this heuristic tells us to use
> dummy variables to encode the interaction (e.g. "F_j [the interaction term]
> is coded ... by dummy variables if it [any of the marginal terms obtained by
> dropping a single factor in the interaction] has not [appeared in the