Re: [Rd] Objects created by more than one data call?

2013-05-22 Thread Spencer Graves

On 5/21/2013 3:03 PM, William Dunlap wrote:

If you look at
data(package="Ecat")$results[,"Item"]
you will see the items "Hstarts", "Hstarts (Intratesm)", and "Hstarts 
(Intratesq)"
which I think means that the dataset Hstarts is found in 3 .rda files, 
"Hstarts.rda",
"Intratesq.rda", and "Intratesm.rda".  There are duplicate, modulo (filename),
items for "MedExp" as well.



  Thanks for this.  I may get me closer, but I still don't see it:  
(data(Intratesm)) imports only the object Intratesm, etc.  For more 
details, see below.



  Any other suggestions?


  Thanks again,
  Spencer


> Ecdat.data <- data(package="Ecdat")$results
> (Hstarts2 <- grep('Hstarts', Ecdat.data[, 'Item']))
[1] 47 48 49
> (MedExp2 <- grep('MedExp', Ecdat.data[, 'Item']))
[1] 67 68
> Ecdat.data[Hstarts2, ]
 Package LibPath Item
[1,] "Ecdat" "C:/Users/sgraves/pgms/R/R-3.0.0/library" "Hstarts"
[2,] "Ecdat" "C:/Users/sgraves/pgms/R/R-3.0.0/library" "Hstarts (Intratesm)"
[3,] "Ecdat" "C:/Users/sgraves/pgms/R/R-3.0.0/library" "Hstarts (Intratesq)"
 Title
[1,] "Housing Starts"
[2,] "Housing Starts"
[3,] "Housing Starts"
> Ecdat.data[MedExp2,]
 Package LibPath Item
[1,] "Ecdat" "C:/Users/sgraves/pgms/R/R-3.0.0/library" "MedExp"
[2,] "Ecdat" "C:/Users/sgraves/pgms/R/R-3.0.0/library" "MedExp (VietNamH)"
 Title
[1,] "Structure of Demand for Medical Care"
[2,] "Structure of Demand for Medical Care"
> library(Ecdat)
> (data(Intratesm))
[1] "Intratesm"
> (data(Intratesq))
[1] "Intratesq"



Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com



-Original Message-
From: r-devel-boun...@r-project.org [mailto:r-devel-boun...@r-project.org] On 
Behalf
Of Spencer Graves
Sent: Tuesday, May 21, 2013 12:21 PM
To: Prof Brian Ripley
Cc: r-devel@r-project.org
Subject: Re: [Rd] Objects created by more than one data call?

On 5/21/2013 9:03 AM, Prof Brian Ripley wrote:

On 21/05/2013 16:51, Spencer Graves wrote:

On 5/21/2013 7:47 AM, Prof Brian Ripley wrote:

On 21/05/2013 15:28, Spencer Graves wrote:

On 5/20/2013 10:10 PM, Prof Brian Ripley wrote:

On 21/05/2013 00:12, Spencer Graves wrote:

Hello, All:


If I use LazyData with the Ecdat package on R-Forge, "R CMD
check" reports "no visible binding for global variable
'nonEnglishNames'", where 'nonEnglishNames' is a dataset in Ecdat
used
as the default argument for a function.  With LazyData, that NOTE
disappears.  However, then I get, "Warning: objects 'Hstarts',
'Hstarts', 'MedExp' are created by more than one data call".


What do you suggest I do to fix this problem?

Not create the objects in more than one data() call.

Check what each of your data() calls produces.


Thanks.  How do I do that?

Call data() on each in turn, and see what files get added to an empty
workspace.


Like the following?

You missed the 'empty'.  Look at tools:::data2LazyLoadDB to see how
this is checked.


Thanks for the suggestion.  Unfortunately, I tried that function,
including stepping through it line by line, fixing references to other
functions not exported from tools, without enlightenment;  see below.


Thanks again,
Spencer


  > lib.loc = NULL
  > package='Ecdat'
  > pkgpath <- find.package(package, lib.loc, quiet = TRUE)
  > pkgpath
[1] "C:/Users/sgraves/pgms/R/R-3.0.0/library/Ecdat"
  > dataDir <- file.path(pkgpath, "data")
  > dataDir
[1] "C:/Users/sgraves/pgms/R/R-3.0.0/library/Ecdat/data"
  > enc <- tools:::.read_description(file.path(pkgpath,
"DESCRIPTION"))["Encoding"]
  > enc

NA
  > if (!is.na(enc)) {
+ op <- options(encoding = enc)
+ on.exit(options(encoding = op[[1L]]))
+ }
  > file_test("-d", dataDir)
[1] TRUE
  > file.path(dataDir, "Rdata.rds")
[1] "C:/Users/sgraves/pgms/R/R-3.0.0/library/Ecdat/data/Rdata.rds"
  > (file.exists(file.path(dataDir, "Rdata.rds")) &&
file.exists(file.path(dataDir,
+ paste(package, "rdx", sep = "."))) &&
file.exists(file.path(dataDir,
+ paste(package, "rdb", sep = "."
[1] FALSE
  > file.exists(file.path(dataDir,
+ paste(package, "rdx", sep = ".")))
[1] FALSE
  > file.path(dataDir,
+ paste(package, "rdx", sep = "."))
[1] "C:/Users/sgraves/pgms/R/R-3.0.0/library/Ecdat/data/Ecdat.rdx"
  > dataEnv <- new.env(hash = TRUE)
  > tmpEnv <- new.env()
  > f0 <- files <- list_files_with_type(dataDir, "data")
Error: could not find function "list_files_with_type"
  > f0 <- files <- tools:::list_files_with_type(dataDir, "data")
  > files <- unique(basename(file_path_sans_ext(files,
+ TRUE)))
Error in basename(file_path_sans_ext(files, TRUE)) :
could not find function "file_path_sans_ext"
  > files <- unique(basename(tools:::file_path_sans_ext(files,
+ TRUE)))
  > dlist <- vector("list", length(files))
  > files
character(0)
  >  names(dlist) <- files
  > loaded <- character(0L)
  > loaded
character(0)
  > for (f in files) {
+ utils::data(lis

Re: [Rd] Objects created by more than one data call?

2013-05-22 Thread William Dunlap
I used svn to copy the current version of Ecdat from Rforge to my PC
  C:\temp\packages>svn checkout svn://r-forge.r-project.org/svnroot/ecdat/
then fired up R to look at the rda files in it.

> setwd("c:/temp/packages/Ecdat/ecdat/pkg/data")
> read.dcf("../DESCRIPTION")[, c("Package","Version")]
Package Version
"Ecdat" "0.2-3"
> dir.rda <- function(rdaFile) { e <- new.env() ; load(rdaFile, envir=e) ; 
> objects(e, all=TRUE)}
> dir.rda("VietNamH.rda")
[1] "MedExp"
> rdas <- dir(pattern="\\.rda$")
> names(rdas) <- rdas
> z <- lapply(rdas, dir.rda)
> tab <- table(unlist(z))
> tab[tab>1]

Hstarts  MedExp
  3   2
> z[sapply(z, function(zi)"Hstarts" %in% zi)]
$Hstarts.rda
[1] "Hstarts"

$Intratesm.rda
[1] "Hstarts"

$Intratesq.rda
[1] "Hstarts"

> z[sapply(z, function(zi)"MedExp" %in% zi)]
$MedExp.rda
[1] "MedExp"

$VietNamH.rda
[1] "MedExp"

It looks some files don't contain what their names suggest:
> dir.rda("VietNamH.rda")
[1] "MedExp"

The two versions of MedExp are quite different:
> load("VietNamH.rda", envViet <- new.env(parent=emptyenv()))
> load("MedExp.rda", envMed <- new.env(parent=emptyenv()))
> objects(envViet)
[1] "MedExp"
> objects(envMed)
[1] "MedExp"
> all.equal(envViet$MedExp, envMed$MedExp)
 [1] "Names: 11 string mismatches" 
 [2] "Length mismatch: comparison on first 11 components"  
 [3] "Component 1: 'current' is not a factor"  
 ...
[18] "Component 10: Numeric: lengths (5999, 5574) differ"  
[19] "Component 11: 'current' is not a factor" 

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com


> -Original Message-
> From: Spencer Graves [mailto:spencer.gra...@prodsyse.com]
> Sent: Wednesday, May 22, 2013 1:27 PM
> To: William Dunlap
> Cc: r-devel@r-project.org
> Subject: Re: [Rd] Objects created by more than one data call?
> 
> On 5/21/2013 3:03 PM, William Dunlap wrote:
> > If you look at
> > data(package="Ecat")$results[,"Item"]
> > you will see the items "Hstarts", "Hstarts (Intratesm)", and "Hstarts 
> > (Intratesq)"
> > which I think means that the dataset Hstarts is found in 3 .rda files, 
> > "Hstarts.rda",
> > "Intratesq.rda", and "Intratesm.rda".  There are duplicate, modulo 
> > (filename),
> > items for "MedExp" as well.
> 
> 
>Thanks for this.  I may get me closer, but I still don't see it:
> (data(Intratesm)) imports only the object Intratesm, etc.  For more
> details, see below.
> 
> 
>Any other suggestions?
> 
> 
>Thanks again,
>Spencer
> 
> 
>  > Ecdat.data <- data(package="Ecdat")$results
>  > (Hstarts2 <- grep('Hstarts', Ecdat.data[, 'Item']))
> [1] 47 48 49
>  > (MedExp2 <- grep('MedExp', Ecdat.data[, 'Item']))
> [1] 67 68
>  > Ecdat.data[Hstarts2, ]
>   Package LibPath Item
> [1,] "Ecdat" "C:/Users/sgraves/pgms/R/R-3.0.0/library" "Hstarts"
> [2,] "Ecdat" "C:/Users/sgraves/pgms/R/R-3.0.0/library" "Hstarts (Intratesm)"
> [3,] "Ecdat" "C:/Users/sgraves/pgms/R/R-3.0.0/library" "Hstarts (Intratesq)"
>   Title
> [1,] "Housing Starts"
> [2,] "Housing Starts"
> [3,] "Housing Starts"
>  > Ecdat.data[MedExp2,]
>   Package LibPath Item
> [1,] "Ecdat" "C:/Users/sgraves/pgms/R/R-3.0.0/library" "MedExp"
> [2,] "Ecdat" "C:/Users/sgraves/pgms/R/R-3.0.0/library" "MedExp (VietNamH)"
>   Title
> [1,] "Structure of Demand for Medical Care"
> [2,] "Structure of Demand for Medical Care"
>  > library(Ecdat)
>  > (data(Intratesm))
> [1] "Intratesm"
>  > (data(Intratesq))
> [1] "Intratesq"
> 
> 
> > Bill Dunlap
> > Spotfire, TIBCO Software
> > wdunlap tibco.com
> >
> >
> >> -Original Message-
> >> From: r-devel-boun...@r-project.org [mailto:r-devel-boun...@r-project.org] 
> >> On
> Behalf
> >> Of Spencer Graves
> >> Sent: Tuesday, May 21, 2013 12:21 PM
> >> To: Prof Brian Ripley
> >> Cc: r-devel@r-project.org
> >> Subject: Re: [Rd] Objects created by more than one data call?
> >>
> >> On 5/21/2013 9:03 AM, Prof Brian Ripley wrote:
> >>> On 21/05/2013 16:51, Spencer Graves wrote:
>  On 5/21/2013 7:47 AM, Prof Brian Ripley wrote:
> > On 21/05/2013 15:28, Spencer Graves wrote:
> >> On 5/20/2013 10:10 PM, Prof Brian Ripley wrote:
> >>> On 21/05/2013 00:12, Spencer Graves wrote:
>  Hello, All:
> 
> 
>  If I use LazyData with the Ecdat package on R-Forge, "R CMD
>  check" reports "no visible binding for global variable
>  'nonEnglishNames'", where 'nonEnglishNames' is a dataset in Ecdat
>  used
>  as the default argument for a function.  With LazyData, that NOTE
>  disappears.  However, then I get, "Warning: objects 'Hstarts',
>  'Hstarts', 'MedExp' are created by more than one data call".
> 
> 
>  What do you suggest I do to fix this problem?
> >>> Not create the objects in more than one data() call.
> >>>
> >>> Check what each of your data() calls produces.
> >>
> >> Thanks

Re: [Rd] Objects created by more than one data call?

2013-05-22 Thread Spencer Graves

Dear Bill:


On 5/22/2013 2:12 PM, William Dunlap wrote:

I used svn to copy the current version of Ecdat from Rforge to my PC
   C:\temp\packages>svn checkout svn://r-forge.r-project.org/svnroot/ecdat/
then fired up R to look at the rda files in it.


setwd("c:/temp/packages/Ecdat/ecdat/pkg/data")
read.dcf("../DESCRIPTION")[, c("Package","Version")]

Package Version
"Ecdat" "0.2-3"

dir.rda <- function(rdaFile) { e <- new.env() ; load(rdaFile, envir=e) ; 
objects(e, all=TRUE)}
dir.rda("VietNamH.rda")

[1] "MedExp"

rdas <- dir(pattern="\\.rda$")
names(rdas) <- rdas
z <- lapply(rdas, dir.rda)
tab <- table(unlist(z))
tab[tab>1]

Hstarts  MedExp
   3   2

z[sapply(z, function(zi)"Hstarts" %in% zi)]

$Hstarts.rda
[1] "Hstarts"

$Intratesm.rda
[1] "Hstarts"

$Intratesq.rda
[1] "Hstarts"


z[sapply(z, function(zi)"MedExp" %in% zi)]

$MedExp.rda
[1] "MedExp"

$VietNamH.rda
[1] "MedExp"

It looks some files don't contain what their names suggest:

dir.rda("VietNamH.rda")

[1] "MedExp"

The two versions of MedExp are quite different:

load("VietNamH.rda", envViet <- new.env(parent=emptyenv()))
load("MedExp.rda", envMed <- new.env(parent=emptyenv()))
objects(envViet)

[1] "MedExp"

objects(envMed)

[1] "MedExp"

all.equal(envViet$MedExp, envMed$MedExp)

  [1] "Names: 11 string mismatches"
  [2] "Length mismatch: comparison on first 11 components"
  [3] "Component 1: 'current' is not a factor"
  ...
[18] "Component 10: Numeric: lengths (5999, 5574) differ"
[19] "Component 11: 'current' is not a factor"



  Thanks very much.  Now I understand the problem.  For a solution, 
I need to consult with the package author (Yves Croissant;  I'm only the 
maintainer).



  However, your work at least makes it easy for me to describe the 
problem to him.



  Thanks again.
  Spencer


Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com



-Original Message-
From: Spencer Graves [mailto:spencer.gra...@prodsyse.com]
Sent: Wednesday, May 22, 2013 1:27 PM
To: William Dunlap
Cc: r-devel@r-project.org
Subject: Re: [Rd] Objects created by more than one data call?

On 5/21/2013 3:03 PM, William Dunlap wrote:

If you look at
 data(package="Ecat")$results[,"Item"]
you will see the items "Hstarts", "Hstarts (Intratesm)", and "Hstarts 
(Intratesq)"
which I think means that the dataset Hstarts is found in 3 .rda files, 
"Hstarts.rda",
"Intratesq.rda", and "Intratesm.rda".  There are duplicate, modulo (filename),
items for "MedExp" as well.


Thanks for this.  I may get me closer, but I still don't see it:
(data(Intratesm)) imports only the object Intratesm, etc.  For more
details, see below.


Any other suggestions?


Thanks again,
Spencer


  > Ecdat.data <- data(package="Ecdat")$results
  > (Hstarts2 <- grep('Hstarts', Ecdat.data[, 'Item']))
[1] 47 48 49
  > (MedExp2 <- grep('MedExp', Ecdat.data[, 'Item']))
[1] 67 68
  > Ecdat.data[Hstarts2, ]
   Package LibPath Item
[1,] "Ecdat" "C:/Users/sgraves/pgms/R/R-3.0.0/library" "Hstarts"
[2,] "Ecdat" "C:/Users/sgraves/pgms/R/R-3.0.0/library" "Hstarts (Intratesm)"
[3,] "Ecdat" "C:/Users/sgraves/pgms/R/R-3.0.0/library" "Hstarts (Intratesq)"
   Title
[1,] "Housing Starts"
[2,] "Housing Starts"
[3,] "Housing Starts"
  > Ecdat.data[MedExp2,]
   Package LibPath Item
[1,] "Ecdat" "C:/Users/sgraves/pgms/R/R-3.0.0/library" "MedExp"
[2,] "Ecdat" "C:/Users/sgraves/pgms/R/R-3.0.0/library" "MedExp (VietNamH)"
   Title
[1,] "Structure of Demand for Medical Care"
[2,] "Structure of Demand for Medical Care"
  > library(Ecdat)
  > (data(Intratesm))
[1] "Intratesm"
  > (data(Intratesq))
[1] "Intratesq"



Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com



-Original Message-
From: r-devel-boun...@r-project.org [mailto:r-devel-boun...@r-project.org] On

Behalf

Of Spencer Graves
Sent: Tuesday, May 21, 2013 12:21 PM
To: Prof Brian Ripley
Cc: r-devel@r-project.org
Subject: Re: [Rd] Objects created by more than one data call?

On 5/21/2013 9:03 AM, Prof Brian Ripley wrote:

On 21/05/2013 16:51, Spencer Graves wrote:

On 5/21/2013 7:47 AM, Prof Brian Ripley wrote:

On 21/05/2013 15:28, Spencer Graves wrote:

On 5/20/2013 10:10 PM, Prof Brian Ripley wrote:

On 21/05/2013 00:12, Spencer Graves wrote:

Hello, All:


 If I use LazyData with the Ecdat package on R-Forge, "R CMD
check" reports "no visible binding for global variable
'nonEnglishNames'", where 'nonEnglishNames' is a dataset in Ecdat
used
as the default argument for a function.  With LazyData, that NOTE
disappears.  However, then I get, "Warning: objects 'Hstarts',
'Hstarts', 'MedExp' are created by more than one data call".


 What do you suggest I do to fix this problem?

Not create the objects in more than one data() call.

Check what each of your data() calls produces.

 Thanks.  How do I do that?

Call data() on each in turn, and see what files get added to an empty
workspace.

Like the following?

You missed the 'empty'.  Look at

Re: [Rd] Inconsistent results from .C()

2013-05-22 Thread Robin Evans
Update: I did eventually discover an error in the C code which is probably
ultimately responsible for the bug (the variable eff_size is allowed to get
too large, and overrun the end of lev2[]).  However the behaviour of R (or
Rstudio) in response to this is still somewhat mystifying to me!

Robin

On 21 May 2013 14:10, Robin Evans  wrote:

> Sure!  C code is below if it helps.  The gist is that the function
> oneMargin forms two matrices M and C, mostly by repeatedly taking
> Kronecker products.
>
> Robin
>
> void kronecker (int *A, int *B, int *dima, int *dimb, int *C) {
>   int k = 0;
>   int i1,i2,j1,j2;
>
> for (i2 = 0; i2 < dima[1]; i2++) {
> for (j2 = 0; j2 < dimb[1]; j2++) {
> for (i1 = 0; i1 < dima[0]; i1++) {
> for (j1 = 0; j1 < dimb[0]; j1++) {
>   C[k] = A[i1 + dima[0]*i2]*B[j1 + dimb[0]*j2];
>   k++;
> 
> }
>
> void iterate (int *x, int *lev) {
>   bool ok = FALSE;
>   int i=0;
>
>   do {
> if(x[i] < lev[i]-1) {
>   x[i]++;
>   ok = TRUE;
> }
> else {
>   x[i] = 0;
> }
> i++;
>   }
>   while (!ok);
> }
>
> void oneMargin(int *Mar, int *Eff, int *neff, int *lev, int *nvar, int
> *M, int *C) {
>
>   int *M2, *mult, *Cj, *Cj2;
>   int i,j=1,k,l,m;
>
>   /* determine size of M to output */
>   for (i = 0; i < nvar[0]; i++) {
> j *= (Mar[i] == 1 ? lev[i]*lev[i] : lev[i]);
>   }
>   M2 = malloc(sizeof(int)*j);
>   mult = malloc(sizeof(int)*j);
>
>   M[0] = 1;
>   int M_size[2] = {1,1}, mult_size[2], Cj_size[2], C_size[2] = {0,0};
>
>   for (i = nvar[0]-1; i >= 0; i--) {
> if (Mar[i] == 1) {
>   for (j = 0; j < lev[i]*lev[i]; j++) {
> mult[j] =  ((j % (lev[i]+1)) == 0) ? 1 : 0;
>   }
>   mult_size[0] = mult_size[1] = lev[i];
> }
> else {
>   for (j = 0; j < lev[i]; j++) {
> mult[j] =  1;
>   }
>   mult_size[0] = 1;
>   mult_size[1] = lev[i];
> }
>
> kronecker(M, mult, M_size, mult_size, M2);
> M_size[0] *= mult_size[0];
> M_size[1] *= mult_size[1];
>
> /* copy M2 over to M */
> for (j=0; j < M_size[0]*M_size[1]; j++) {
>   M[j] = M2[j];
> }
>   }
>
>   free(M2);
>
>   /* Create C matrix */
>   int index[nvar[0]];
>   int prod;
>   int eff_size = 0;
>   /*swapped = 0; */
>   mult_size[0] = 1;
>
>   C_size[0] = 1;
>   for (j = 0; j < nvar[0]; j++) {
> C_size[0] *= (Mar[j] == 1 ? lev[j] : 1);
>   }
>
>   Cj = malloc(sizeof(int)*C_size[0]);
>   Cj2 = malloc(sizeof(int)*C_size[0]);
>   int *lev2;
>   lev2 = malloc(sizeof(int)*nvar[0]);
>
>   /* loop over effects */
>   for (i = 0; i < neff[0]; i++) {
> prod = 1;
> for (j = 0; j < nvar[0]; j++) {
>   if (Eff[j + i*nvar[0]]) {
> prod *= lev[j]-1;
> lev2[eff_size] = lev[j]-1;
> index[eff_size] = 0;
> eff_size++;
>}
> }
>
> /* loop over states of effect (excluding corner points) */
> for (j = 0; j < prod; j++) {
>   Cj[0] = 1;
>   Cj_size[0] = Cj_size[1] = 1;
>
>   k = eff_size;
>   /* loop over variables */
>   for (l = nvar[0]-1; l >= 0; l--) {
> /* skip variables not in margin */
> if (Mar[l] == 0) continue;
> mult_size[1] = lev[l];
>
> /* kronecker factor depends on whether or not variable is in
> effect */
> if (Eff[i*nvar[0]+l] == 0) {
>   /* for variables not in effect, just repeat matrix */
>   for (m = 0; m < lev[l]; m++) {
> mult[m] = 1;
>   }
> }
> else {
>   /* otherwise multiply based on state */
>   k--;
>   for (m = 0; m < lev[l]; m++) {
> mult[m] = (m == index[k]) ? lev[l] - 1 : -1;
>   }
> }
> kronecker(Cj, mult, Cj_size, mult_size, Cj2);
>
> Cj_size[1] *= mult_size[1];
>
> /* copy Cj2 over to Cj */
> for (k=0; k < Cj_size[0]*Cj_size[1]; k++) {
>   Cj[k] = Cj2[k];
> }
> if (Cj_size[0]*Cj_size[1] > C_size[0]) Rprintf("pointer screwup");
>   }
>
>   /* copy row over to C */
>   for (m = 0; m < C_size[0]; m++) C[C_size[0]*C_size[1]+m] = Cj[m];
>   C_size[1] += 1;
>   iterate(index, lev2);
> }
>   }
>
>   free(lev2);
>   free(Cj);
>   free(Cj2);
>   free(mult);
> }
>
>
> On 21 May 2013 14:04, R. Michael Weylandt 
> wrote:
> >
> >  It might also help if you can point us to the C code to help debug.
> >
> > MW
> >
> > On Tue, May 21, 2013 at 10:53 AM, Robin Evans  wrote:
> > > I should add to this that I'm running on Scientific Linux 6.  I later
> > > noticed that the bug only seems to occur when I run the code from
> Rstudio,
> > > and not if I use the terminal directly, so this may be the key to the
> > > problem.
> > >
> > > Robin
> > >
> > > On 20 May 2013 16:12, Robin Evans  wrote:
> > >
> > >> Hello,
> > >>
> > >> I've run into a problem which is both maddening and rather hard to
> > >> replicate, therefore I wondered if someone might know of a plausible
> > >> explanation for it.  I 

Re: [Rd] Inconsistent results from .C()

2013-05-22 Thread William Dunlap
Had you tried using the new-to-3.0.0 options(CBoundCheck=TRUE)?

[from the NEWS file]
There is a new option, options(CBoundsCheck=), which controls how .C()
and .Fortran() pass arguments to compiled code. If true (which can be
enabled by setting the environment variable R_C_BOUNDS_CHECK to yes),
raw, integer, double and complex arguments are always copied, and checked
for writing off either end of the array on return from the compiled code
(when a second copy is made). This also checks individual elements of 
character
vectors passed to .C().

valgrind helps find memory misuse also.

When you write on memory that you didn't allocate, expect to get mystifying
behavior.  Nonrepeatable results are a sign of memory misuse.

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com


> -Original Message-
> From: r-devel-boun...@r-project.org [mailto:r-devel-boun...@r-project.org] On 
> Behalf
> Of Robin Evans
> Sent: Wednesday, May 22, 2013 4:16 PM
> To: R Devel List
> Subject: Re: [Rd] Inconsistent results from .C()
> 
> Update: I did eventually discover an error in the C code which is probably
> ultimately responsible for the bug (the variable eff_size is allowed to get
> too large, and overrun the end of lev2[]).  However the behaviour of R (or
> Rstudio) in response to this is still somewhat mystifying to me!
> 
> Robin
> 
> On 21 May 2013 14:10, Robin Evans  wrote:
> 
> > Sure!  C code is below if it helps.  The gist is that the function
> > oneMargin forms two matrices M and C, mostly by repeatedly taking
> > Kronecker products.
> >
> > Robin
> >
> > void kronecker (int *A, int *B, int *dima, int *dimb, int *C) {
> >   int k = 0;
> >   int i1,i2,j1,j2;
> >
> > for (i2 = 0; i2 < dima[1]; i2++) {
> > for (j2 = 0; j2 < dimb[1]; j2++) {
> > for (i1 = 0; i1 < dima[0]; i1++) {
> > for (j1 = 0; j1 < dimb[0]; j1++) {
> >   C[k] = A[i1 + dima[0]*i2]*B[j1 + dimb[0]*j2];
> >   k++;
> > 
> > }
> >
> > void iterate (int *x, int *lev) {
> >   bool ok = FALSE;
> >   int i=0;
> >
> >   do {
> > if(x[i] < lev[i]-1) {
> >   x[i]++;
> >   ok = TRUE;
> > }
> > else {
> >   x[i] = 0;
> > }
> > i++;
> >   }
> >   while (!ok);
> > }
> >
> > void oneMargin(int *Mar, int *Eff, int *neff, int *lev, int *nvar, int
> > *M, int *C) {
> >
> >   int *M2, *mult, *Cj, *Cj2;
> >   int i,j=1,k,l,m;
> >
> >   /* determine size of M to output */
> >   for (i = 0; i < nvar[0]; i++) {
> > j *= (Mar[i] == 1 ? lev[i]*lev[i] : lev[i]);
> >   }
> >   M2 = malloc(sizeof(int)*j);
> >   mult = malloc(sizeof(int)*j);
> >
> >   M[0] = 1;
> >   int M_size[2] = {1,1}, mult_size[2], Cj_size[2], C_size[2] = {0,0};
> >
> >   for (i = nvar[0]-1; i >= 0; i--) {
> > if (Mar[i] == 1) {
> >   for (j = 0; j < lev[i]*lev[i]; j++) {
> > mult[j] =  ((j % (lev[i]+1)) == 0) ? 1 : 0;
> >   }
> >   mult_size[0] = mult_size[1] = lev[i];
> > }
> > else {
> >   for (j = 0; j < lev[i]; j++) {
> > mult[j] =  1;
> >   }
> >   mult_size[0] = 1;
> >   mult_size[1] = lev[i];
> > }
> >
> > kronecker(M, mult, M_size, mult_size, M2);
> > M_size[0] *= mult_size[0];
> > M_size[1] *= mult_size[1];
> >
> > /* copy M2 over to M */
> > for (j=0; j < M_size[0]*M_size[1]; j++) {
> >   M[j] = M2[j];
> > }
> >   }
> >
> >   free(M2);
> >
> >   /* Create C matrix */
> >   int index[nvar[0]];
> >   int prod;
> >   int eff_size = 0;
> >   /*swapped = 0; */
> >   mult_size[0] = 1;
> >
> >   C_size[0] = 1;
> >   for (j = 0; j < nvar[0]; j++) {
> > C_size[0] *= (Mar[j] == 1 ? lev[j] : 1);
> >   }
> >
> >   Cj = malloc(sizeof(int)*C_size[0]);
> >   Cj2 = malloc(sizeof(int)*C_size[0]);
> >   int *lev2;
> >   lev2 = malloc(sizeof(int)*nvar[0]);
> >
> >   /* loop over effects */
> >   for (i = 0; i < neff[0]; i++) {
> > prod = 1;
> > for (j = 0; j < nvar[0]; j++) {
> >   if (Eff[j + i*nvar[0]]) {
> > prod *= lev[j]-1;
> > lev2[eff_size] = lev[j]-1;
> > index[eff_size] = 0;
> > eff_size++;
> >}
> > }
> >
> > /* loop over states of effect (excluding corner points) */
> > for (j = 0; j < prod; j++) {
> >   Cj[0] = 1;
> >   Cj_size[0] = Cj_size[1] = 1;
> >
> >   k = eff_size;
> >   /* loop over variables */
> >   for (l = nvar[0]-1; l >= 0; l--) {
> > /* skip variables not in margin */
> > if (Mar[l] == 0) continue;
> > mult_size[1] = lev[l];
> >
> > /* kronecker factor depends on whether or not variable is in
> > effect */
> > if (Eff[i*nvar[0]+l] == 0) {
> >   /* for variables not in effect, just repeat matrix */
> >   for (m = 0; m < lev[l]; m++) {
> > mult[m] = 1;
> >   }
> > }
> > else {
> >   /* otherwise multiply based on state */
> >   k--;
> >   for (m = 0; m < lev[l]; m++) {
> > mult[m] = (m ==

Re: [Rd] Inconsistent results from .C()

2013-05-22 Thread Prof Brian Ripley
lev2 is malloc-ed in the code you supplied.  The new option Bill refers 
to is about input variables in .C, i.e. lev and not lev2.  Other array 
overruns can be found by other tools: Bill mentioned valgrind, and 
AddressSanitizer finds a different set of overruns.  See 
http://cran.r-project.org/doc/manuals/r-release/R-exts.html#Using-gctorture-and-memory-access 
.


On 23/05/2013 01:18, William Dunlap wrote:

Had you tried using the new-to-3.0.0 options(CBoundCheck=TRUE)?

 [from the NEWS file]
 There is a new option, options(CBoundsCheck=), which controls how .C()
 and .Fortran() pass arguments to compiled code. If true (which can be
 enabled by setting the environment variable R_C_BOUNDS_CHECK to yes),
 raw, integer, double and complex arguments are always copied, and checked
 for writing off either end of the array on return from the compiled code
 (when a second copy is made). This also checks individual elements of 
character
 vectors passed to .C().

valgrind helps find memory misuse also.

When you write on memory that you didn't allocate, expect to get mystifying
behavior.  Nonrepeatable results are a sign of memory misuse.

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com



-Original Message-
From: r-devel-boun...@r-project.org [mailto:r-devel-boun...@r-project.org] On 
Behalf
Of Robin Evans
Sent: Wednesday, May 22, 2013 4:16 PM
To: R Devel List
Subject: Re: [Rd] Inconsistent results from .C()

Update: I did eventually discover an error in the C code which is probably
ultimately responsible for the bug (the variable eff_size is allowed to get
too large, and overrun the end of lev2[]).  However the behaviour of R (or
Rstudio) in response to this is still somewhat mystifying to me!

Robin

On 21 May 2013 14:10, Robin Evans  wrote:


Sure!  C code is below if it helps.  The gist is that the function
oneMargin forms two matrices M and C, mostly by repeatedly taking
Kronecker products.

Robin

void kronecker (int *A, int *B, int *dima, int *dimb, int *C) {
   int k = 0;
   int i1,i2,j1,j2;

 for (i2 = 0; i2 < dima[1]; i2++) {
 for (j2 = 0; j2 < dimb[1]; j2++) {
 for (i1 = 0; i1 < dima[0]; i1++) {
 for (j1 = 0; j1 < dimb[0]; j1++) {
   C[k] = A[i1 + dima[0]*i2]*B[j1 + dimb[0]*j2];
   k++;
 
}

void iterate (int *x, int *lev) {
   bool ok = FALSE;
   int i=0;

   do {
 if(x[i] < lev[i]-1) {
   x[i]++;
   ok = TRUE;
 }
 else {
   x[i] = 0;
 }
 i++;
   }
   while (!ok);
}

void oneMargin(int *Mar, int *Eff, int *neff, int *lev, int *nvar, int
*M, int *C) {

   int *M2, *mult, *Cj, *Cj2;
   int i,j=1,k,l,m;

   /* determine size of M to output */
   for (i = 0; i < nvar[0]; i++) {
 j *= (Mar[i] == 1 ? lev[i]*lev[i] : lev[i]);
   }
   M2 = malloc(sizeof(int)*j);
   mult = malloc(sizeof(int)*j);

   M[0] = 1;
   int M_size[2] = {1,1}, mult_size[2], Cj_size[2], C_size[2] = {0,0};

   for (i = nvar[0]-1; i >= 0; i--) {
 if (Mar[i] == 1) {
   for (j = 0; j < lev[i]*lev[i]; j++) {
 mult[j] =  ((j % (lev[i]+1)) == 0) ? 1 : 0;
   }
   mult_size[0] = mult_size[1] = lev[i];
 }
 else {
   for (j = 0; j < lev[i]; j++) {
 mult[j] =  1;
   }
   mult_size[0] = 1;
   mult_size[1] = lev[i];
 }

 kronecker(M, mult, M_size, mult_size, M2);
 M_size[0] *= mult_size[0];
 M_size[1] *= mult_size[1];

 /* copy M2 over to M */
 for (j=0; j < M_size[0]*M_size[1]; j++) {
   M[j] = M2[j];
 }
   }

   free(M2);

   /* Create C matrix */
   int index[nvar[0]];
   int prod;
   int eff_size = 0;
   /*swapped = 0; */
   mult_size[0] = 1;

   C_size[0] = 1;
   for (j = 0; j < nvar[0]; j++) {
 C_size[0] *= (Mar[j] == 1 ? lev[j] : 1);
   }

   Cj = malloc(sizeof(int)*C_size[0]);
   Cj2 = malloc(sizeof(int)*C_size[0]);
   int *lev2;
   lev2 = malloc(sizeof(int)*nvar[0]);

   /* loop over effects */
   for (i = 0; i < neff[0]; i++) {
 prod = 1;
 for (j = 0; j < nvar[0]; j++) {
   if (Eff[j + i*nvar[0]]) {
 prod *= lev[j]-1;
 lev2[eff_size] = lev[j]-1;
 index[eff_size] = 0;
 eff_size++;
}
 }

 /* loop over states of effect (excluding corner points) */
 for (j = 0; j < prod; j++) {
   Cj[0] = 1;
   Cj_size[0] = Cj_size[1] = 1;

   k = eff_size;
   /* loop over variables */
   for (l = nvar[0]-1; l >= 0; l--) {
 /* skip variables not in margin */
 if (Mar[l] == 0) continue;
 mult_size[1] = lev[l];

 /* kronecker factor depends on whether or not variable is in
effect */
 if (Eff[i*nvar[0]+l] == 0) {
   /* for variables not in effect, just repeat matrix */
   for (m = 0; m < lev[l]; m++) {
 mult[m] = 1;
   }
 }
 else {
   /* otherwise multiply based on state */
   k--;
   for (m = 0; m < lev[l]; m++) {
 mult[m] = (m == index[k]) ? lev[l]