[Rd] Return values from .Call and garbage collection

2009-01-27 Thread Jon Senior
Hi all,

I'm posting this here as it discusses an issue with an external C library. If 
it would be better in R-Help, then I'll repost.

I'm using an external library which I've written, which provides a large set of 
data (>500MB in a highly condensed format) and the tools to return values from 
the data. The functionality has been tested call by call and using valgrind and 
works fine, with no memory leaks. After retrieval, I process the data in R. A 
specific function is causing a problem that appears to be related to the 
garbage collector (judging by symptoms).

In the C code, a Matrix is created using

PROTECT(retVal = allocMatrix(INTSXP, x, y));

Values are written into this matrix using

INTEGER(retVal)[translatedOffset]=z;

where "translatedOffset" is a conversion from a row/column pair to an offset as 
shown in R-exts.pdf.
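
For reference, a minimal sketch of that row/column-to-offset conversion, assuming 0-based indices on the C side and R's column-major storage. The names col_major_offset and set_cell are hypothetical, and a plain int buffer stands in for INTEGER(retVal) so the sketch compiles without the R headers:

```c
#include <assert.h>
#include <stddef.h>

/* R stores a matrix column by column, so element (row, col) of a matrix
 * with nrow rows sits at row + col * nrow (indices are 0-based in C). */
static size_t col_major_offset(size_t row, size_t col, size_t nrow)
{
    assert(row < nrow);            /* guard against an out-of-range row */
    return row + col * nrow;
}

/* Hypothetical helper: write z into a buffer laid out like INTEGER(retVal)
 * for a matrix with nrow rows and ncol columns. */
static void set_cell(int *data, size_t nrow, size_t ncol,
                     size_t row, size_t col, int z)
{
    assert(col < ncol);            /* guard against an out-of-range column */
    data[col_major_offset(row, col, nrow)] = z;
}
```

With allocMatrix(INTSXP, x, y), x would play the role of nrow and y the role of ncol.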

The last two lines of the function call are:

UNPROTECT(1);
return retVal;

The shared library was compiled with R CMD SHLIB and is called using .Call, 
which returns the completed SEXP object to R, where processing continues.

In R, we continue to process the data, replacing -1s with NAs (I couldn't find 
a way to do that in C that would make it back into R), sorting it, and trimming 
it. All of these operations are carried out on the original data.
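
One possible way to do the -1 to NA replacement on the C side, sketched under assumptions: R represents an integer NA as the value NA_INTEGER, which R defines as INT_MIN. The macro NA_INT_STANDIN and the function mark_missing below are hypothetical names used so the sketch compiles without the R headers; inside a real .Call routine one would use NA_INTEGER from the R headers instead.

```c
#include <limits.h>
#include <stddef.h>

/* Assumption: stand-in for R's NA_INTEGER (defined by R as INT_MIN). */
#define NA_INT_STANDIN INT_MIN

/* Replace every occurrence of `sentinel` (here -1) with the NA stand-in.
 * Returns the number of cells that were replaced. */
static size_t mark_missing(int *data, size_t n, int sentinel)
{
    size_t replaced = 0;
    for (size_t i = 0; i < n; i++) {
        if (data[i] == sentinel) {
            data[i] = NA_INT_STANDIN;
            replaced++;
        }
    }
    return replaced;
}
```

Applied to the buffer behind INTEGER(retVal) with n = x * y before returning, the cells would then show up as NA in R.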

If I carry out the processing step by step from the interpreter, everything is 
fine and the data comes out as I would expect. But when I run the R code to 
carry out those steps, every now and again (around one time in five) the 
returned data is garbage. I'm expecting to receive a bias per iteration that 
should be -5 <= bias <= 5, but for the corrupted data I'm getting results that 
are off by hundreds of thousands (e.g. -220627.7). If I call the routine which 
carries out the processing for one iteration from the interpreter, sometimes I 
get the correct data, sometimes (with the same frequency) I get garbage.

There are two possibilities that I can envisage.
1) Race condition: R is starting to execute the R code after the .Call before 
the .Call has returned, thus the data is corrupted.
2) Garbage collector: the GC is collecting my data between the UNPROTECT(1); 
call and the assignment to an R variable.

The created matrices can be large (where x > 1000, y > 10), but the garbage 
doesn't appear to be related to the size of the matrix.

Any ideas what steps I could take to proceed with this? Or other possibilities 
than those I've suggested? For reasons of confidentiality I'm unable to release 
test code, and the large dataset might make testing difficult.

Thanks in advance

-- 
Jon Senior 

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] Return values from .Call and garbage collection [Additional information added]

2009-01-27 Thread Jon Senior
Hi all,

R version 2.8.1 (2008-12-22), running on Fedora 10 on a Centrino dual-core with 
3GB of RAM.

Any ideas what steps I could take to proceed with this? Or other possibilities 
than those I've suggested? For reasons of confidentiality I'm unable to release 
test code, and the large dataset might make testing difficult.

Thanks in advance

-- 
Jon Senior 



Re: [Rd] Return values from .Call and garbage collection

2009-01-27 Thread Jon Senior
On Tue, 27 Jan 2009 12:25:12 -
"Sklyar, Oleg (London)" wrote:

> Most likely issue is your code itself, out of range indexing, failure to
> initialise all elements of the allocated structure correctly, 1 and not
> 0-based indexing, use of other R variables for initialisation that
> should have been protected but were not etc. 

Apologies one and all. Oleg was right. I was writing into a Matrix but my 
method for doing so was wrong. Error fixed, and now it's all behaving. Thanks.
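
For anyone hitting similar symptoms, a small sketch of two of the points Oleg lists: giving every element a defined value before writing selectively (allocMatrix does not initialise integer storage), and converting 1-based indices to 0-based offsets. The names fill_all and offset_from_one_based are hypothetical, and a plain int buffer stands in for INTEGER(retVal); this is an illustration, not the actual fix that was applied.

```c
#include <assert.h>
#include <stddef.h>

/* Give every cell a defined value before any selective writes. */
static void fill_all(int *data, size_t n, int value)
{
    for (size_t i = 0; i < n; i++)
        data[i] = value;
}

/* Convert 1-based (R-style) row/column indices to a 0-based,
 * column-major offset, checking both indices against the dimensions. */
static size_t offset_from_one_based(size_t r_row, size_t r_col,
                                    size_t nrow, size_t ncol)
{
    assert(r_row >= 1 && r_row <= nrow);
    assert(r_col >= 1 && r_col <= ncol);
    return (r_row - 1) + (r_col - 1) * nrow;
}
```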

-- 
Jon Senior 



[Rd] Problem building DLL under Windows

2009-04-03 Thread Jon Senior
Apologies if this has appeared before, but I've searched the archives and all 
the documentation and I can't find anything which helps.

I'm trying to build a DLL under Windows. The process (more on that later) works 
fine under Linux and gives the illusion of working under Windows, but 
attempting to load the resulting DLL using dyn.load results in:

Error in inDL(x, as.logical(local), as.logical(now), ...) :
  unable to load shared library: 'C:/Documents... '
  LoadLibrary failure: Invalid access to memory location.

Searching Google suggests that the LoadLibrary message is unique to R (or no one 
else is admitting to it).

The first problem is that the library is actually a wrapper around an existing 
library to make it usable under R, but the original library is built as a 
static object (and for reasons of controlling exciting versioning problems, I'd 
prefer it to stay that way).

Under Linux, I pass the library to gcc using PKG_LIBS=-static lib.a and it 
builds fine.

Under Windows, I put the following in Makevars.win:
PKG_LIBS = -Lc:/Path/To/Library -llib.name
and it builds fine (GCC returns with no errors and I have what appears to be an 
appropriately sized DLL in the directory).

I've tried passing various flags into gcc, but I really don't know what I'm 
doing at this point with regard to building under Windows. (I have a pretty good 
grasp of how to compile libraries under Linux, and understand the concepts 
involved in shared libraries.) I get the impression, however, that I'm missing 
something about Windows DLLs.

Compulsory version information:
OS: Windows XP SP2
R: 2.8.1
GCC: 4.2.1-sjlj (mingw32-2) (From Rtools29.exe)

For the record, I've read 
http://www.stats.uwo.ca/faculty/murdoch/software/compilingDLLs/readme.packages.txt
which hints at some requirement for DLLs to use _cdecl. I've started exploring 
along this line, but there's a lot of documentation to trawl through to make 
sense of it all and I don't want to go off chasing a red herring if I just need 
to pass a special --make-it-work flag to gcc.

In the only thread I found which appeared to have any similarities, Prof. 
Ripley said that there was a solution (or hint): "It is there, unfortunately 
along with a lot of uninformed speculation." Of course, the uninformed 
speculation is still in the archives, making it no easier to find now than in 
August 2007! Perhaps someone who understands this stuff (or has some experience 
of it) could provide a hint as to how to proceed. :-)

Thanks in advance.

-- 
Jon Senior 



Re: [Rd] Problem building DLL under Windows

2009-04-03 Thread Jon Senior
On Fri, 3 Apr 2009 12:36:00 -0400
Simon Urbanek wrote:

> That is true only for very specific architectures and OS combinations  
> but not on most systems (including Linux). Shared objects must be  
> compiled to contain position-independent code (PIC) such that they can  
> be re-located when loaded dynamically. In general you cannot use a  
> static library in a package unless the library was specifically  
> compiled with -fPIC.

It was indeed compiled with -fPIC on Linux. I had forgotten that and wonder if 
it might be related.
 
> Also please note that the above is possibly not what you want: -static  
> is not an option that applies to the library - it's a global option  
> for the linker which affects *all* libraries and possibly even the crt  
> code and compiler-related libraries (this depends on the platform). It  
> may cause additional problems since you may need to link R library  
> dynamically. All this is not related to Windows - this applies in  
> general on any platform (including Linux).

AFAICT -static can be used to force inclusion of a library statically. That may 
not be the case; the bulk of my coding experience is in Java, not wrapping 
esoteric functions into R-callable C code! :-) Strangely though, it seems to 
work fine. I'll have to do some more tinkering to see if it can be forced into a 
single build, rather than a 2-stage one. 

> See above, this may not be DLL-specific. Additionally, please make  
> sure you're using the right tools (MinGW gcc) for both your static  
> library and the package (you have indicated that you do, but just  
> making sure :))..

Checked and double-checked. It's a clean installation of XP running under QEMU 
and has nothing but R and Rtools installed.

> I have tested a toy example with your setup and all was working just  
> fine, so for further help you may have to reveal exactly what library  
> you are using etc. since the devil may be in the details (if the  
> general advice above doesn't help).

OK. Since my previous mail, I discovered the pedump tool, and found that both 
pedump and objdump will segfault (or so it appears; Windoze doesn't give enough 
information to be sure!) if I attempt to retrieve the Ordinal table from the 
freshly compiled DLL. This suggests that something is going awry during the 
second stage of the build (since I can retrieve the same information from the 
.lib file).

The problem with adding more details is the nature of the work. At the minute, 
I'm obliged to keep the fine details secret. I had a feeling that this might 
have been the case, but I thought it was worth seeing if someone had already 
encountered this and solved it in a different specific case.

> AFAICS that is only mentioned with respect to VC - the current tools  
> are smart enough with gcc. There are some issues when importing  
> variables from R itself, but that should not be related to your code  
> (unless you use this feature outside of the standard R headers).

OK.

Thanks for your help. Looks like the next step is probably going to be 
combining both compilation steps into a single Makefile. I really hate 
Makefiles! :-(

-- 
Jon Senior 
