Hi all,

I'm posting this here as it discusses an issue with an external C library. If 
it would be better in R-Help, then I'll repost.

I'm using an external library that I've written, which provides a large data 
set (>500MB in a highly condensed format) and the tools to return values from 
it. The functionality has been tested call by call and under valgrind, and 
works fine with no memory leaks. After retrieval, I process the data in R. A 
specific function is causing a problem that, judging by the symptoms, appears 
to be related to the garbage collector.

In the C code, a matrix is created using

PROTECT(retVal = allocMatrix(INTSXP, x, y));

Values are written into this matrix using

INTEGER(retVal)[translatedOffset]=z;

where "translatedOffset" is a conversion from a row/column pair to an offset as 
shown in R-exts.pdf.
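
For readers unfamiliar with the layout: R stores matrices in column-major 
order, so the translation amounts to row + col * nrow with zero-based indices. 
A minimal standalone sketch follows; the helper name translate_offset and its 
bounds check are illustrative assumptions, not the code from my library.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (name and signature are assumptions): map a
 * zero-based (row, col) pair to an offset into column-major storage,
 * which is how R lays out a matrix with nrow rows and ncol columns. */
size_t translate_offset(size_t row, size_t col, size_t nrow, size_t ncol)
{
    /* Reject out-of-range indices; writing past the end of the buffer
     * would corrupt memory silently. */
    assert(row < nrow && col < ncol);
    return row + col * nrow;
}
```

In the .Call routine the result would be used as the index into 
INTEGER(retVal), with nrow and ncol being the dimensions passed to allocMatrix.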

The last two lines of the function call are:

UNPROTECT(1);
return retVal;

This returns the completed SEXP object to R, where processing continues. The 
shared library was compiled with R CMD SHLIB and is called using .Call.

In R, we continue to process the data, replacing -1s with NAs (I couldn't find 
a way to do that in C that would make it back into R), sorting it, and trimming 
it. All of these operations are carried out on the original data.
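
For reference, the -1 to NA replacement can in principle be done on the C side: 
R's C API provides NA_INTEGER, and an INTSXP element set to that value appears 
as NA in R. The sketch below is standalone and only illustrative: the helper 
name is hypothetical, and NA_INT_STANDIN stands in for NA_INTEGER (which R 
represents as INT_MIN) so that it compiles without the R headers.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Stand-in so the sketch compiles without the R headers. In a real
 * .Call routine, include <Rinternals.h> and use NA_INTEGER instead;
 * R represents the integer NA as INT_MIN. */
#define NA_INT_STANDIN INT_MIN

/* Hypothetical helper: replace every -1 sentinel in buf[0..n-1] with
 * the NA value and return the number of cells replaced. */
size_t replace_sentinel_with_na(int *buf, size_t n)
{
    size_t replaced = 0;
    assert(buf != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (buf[i] == -1) {
            buf[i] = NA_INT_STANDIN;
            replaced++;
        }
    }
    return replaced;
}
```

In a .Call routine this would be applied to INTEGER(retVal), with n equal to 
the number of cells in the matrix, before the UNPROTECT(1) call.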

If I carry out the processing step by step from the interpreter, everything is 
fine and the data comes out as I would expect. But when I run the R code that 
carries out those steps, every now and again (around one time in five) the 
returned data is garbage. I'm expecting a bias per iteration in the range 
-5 <= bias <= 5, but for the corrupted data I'm getting results on the order of 
hundreds of thousands (e.g. -220627.7). If I call the routine that carries out 
the processing for a single iteration from the interpreter, I sometimes get the 
correct data and sometimes (with the same frequency) get garbage.

There are two possibilities that I can envisage.
1) Race condition: R is starting to execute the R code after the .Call before 
the .Call has returned, thus the data is corrupted.
2) Garbage collector: the GC is collecting my data between the UNPROTECT(1); 
call and the assignment to an R variable.

The created matrices can be large (where x > 1000, y > 100000), but the garbage 
doesn't appear to be related to the size of the matrix.

Any ideas what steps I could take to proceed with this? Or other possibilities 
than those I've suggested? For reasons of confidentiality I'm unable to release 
test code, and the large dataset might make testing difficult.

Thanks in advance

-- 
Jon Senior <j...@restlesslemon.co.uk>

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
