Re: [Rd] A memory management question

2005-09-05 Thread Luke Tierney
On Mon, 5 Sep 2005, [EMAIL PROTECTED] wrote:

> Luke Tierney <[EMAIL PROTECTED]> wrote:
>
>> This is not supported by the memory manager.  Using SETLENGTH to
>> change the length would confuse the garbage collector--we should
>> probably remove SETLENGTH from the headers.
>
>> The memory manager does over-allocate small vectors by rounding up to
>> convenient sizes, and the real size could be computed, but this is not
>> true for large allocations--these correspond to malloc calls for the
>> requested size--and in any case the memory manager relies on LENGTH
>> giving the correct amount (maybe not heavily but this could change).
>
>> A GC does not move objects.
>
>> Using R level vectors for the purpose you describe is in any case
>> tricky since it is hard to reliably prevent copying. You are better
>> off using something like an external pointer into an R-allocated
>> object that is only accessible through the external pointer.  Then you
>> can manage the filled length yourself.
>
> Ok... since GC does not move objects, and large vectors are allocated
> using a regular malloc, and malloc/free manages space independent of
> the LENGTH information, it seems that SETLENGTH would be "safe" if it
> was possible to guarantee for an interval of time that this particular
> value would not be moved or released due to any user activity?
>
> What if I create the full-length vector, make it visible using
> defineVar(), then protect the vector by creating a reference with
> R_MakeExternalPtr(), and R_PreserveObject() this reference?  Then
> shouldn't the vector be left alone until I release the reference?
> And I could then play with SETLENGTH() on that vector safely, so long
> as I restore it before releasing the reference, and so long as I only
> perform operations that modify the vector in-place?

It might or might not work now but is not guaranteed to do so reliably
in the future.  Seeing the risks of leaving SETLENGTH exposed, it is
very likely that SETLENGTH will be removed from the sources after the
2.2.0 release.

If you provide your own methods to read and write the external pointer
then you don't need this; this is safer than relying on undocumented
behavior of [ and [<- in any case.  You also then don't need to use
R_PreserveObject unless you really need to use it from the C level
outside of a context where an R reference exists.
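
As a rough illustration of managing the filled length yourself, here is
a minimal sketch in plain C with no R headers.  The names fillbuf,
fillbuf_new, fillbuf_append, fillbuf_get and fillbuf_free are made up
for this example; in real code the struct would sit behind the external
pointer and the accessors would be called from your own read and write
methods.  The point is only that the allocation size stays fixed while
a separate counter tracks how much is in use.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the storage behind an external pointer:
   the allocation never changes size; only `filled` does. */
typedef struct {
    unsigned char *data;
    size_t capacity;  /* true allocated size in bytes */
    size_t filled;    /* bytes currently in use */
} fillbuf;

/* Allocate a buffer with room for `capacity` bytes; NULL on failure. */
fillbuf *fillbuf_new(size_t capacity)
{
    fillbuf *b = malloc(sizeof *b);
    if (b == NULL) return NULL;
    b->data = malloc(capacity ? capacity : 1);
    if (b->data == NULL) { free(b); return NULL; }
    b->capacity = capacity;
    b->filled = 0;
    return b;
}

/* Append n bytes; returns 0 on success, -1 if they do not fit. */
int fillbuf_append(fillbuf *b, const void *src, size_t n)
{
    assert(b != NULL);
    if (n > b->capacity - b->filled) return -1;
    memcpy(b->data + b->filled, src, n);
    b->filled += n;
    return 0;
}

/* Read one byte; returns -1 for an index beyond the filled part. */
int fillbuf_get(const fillbuf *b, size_t i)
{
    assert(b != NULL);
    return i < b->filled ? (int)b->data[i] : -1;
}

/* Release the buffer and its storage. */
void fillbuf_free(fillbuf *b)
{
    if (b != NULL) { free(b->data); free(b); }
}
```

Because readers only ever go through fillbuf_get, nothing outside the
accessors needs to know the true size, which is the property SETLENGTH
was being used to fake.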

luke



> i.e., should something like this work:
>
> static SEXP ptr;
>
> void do_init(void)
> {
>     SEXP s = PROTECT(allocVector(RAWSXP, 1000));
>     defineVar(install("mystuff"), s, R_BaseEnv);
>     ptr = R_MakeExternalPtr(RAW(s), R_NilValue, s);
>     R_PreserveObject(ptr);
>     SETLENGTH(s, 0);
>     UNPROTECT(1);
> }
>
> void do_extend(void)
> {
>     SEXP s = R_ExternalPtrProtected(ptr);
>     memcpy(RAW(s)+LENGTH(s), "", 4);
>     SETLENGTH(s, LENGTH(s)+4);
> }
>
> void do_finish(void)
> {
>     SEXP s = R_ExternalPtrProtected(ptr);
>     SETLENGTH(s, 1000);
>     R_ReleaseObject(ptr);
> }
>
> i.e., if the user tries to modify "mystuff", they'll end up with a
> copy, but the value pointed to by ptr will hang around (no longer
> accessible by the user) until do_finish() is called?
>
> -- Dave
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

-- 
Luke Tierney
Chair, Statistics and Actuarial Science
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa                  Phone:  319-335-3386
Department of Statistics and        Fax:    319-335-3017
   Actuarial Science
241 Schaeffer Hall                  email:  [EMAIL PROTECTED]
Iowa City, IA 52242                 WWW:    http://www.stat.uiowa.edu

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] RODBC and 64 bit

2005-09-05 Thread Florian Hahne
I forgot to include some more information.

Here is my sessionInfo
R version 2.2.0, 2005-08-24, x86_64-unknown-linux-gnu

attached base packages:
[1] "grid"      "tools"     "methods"   "stats"     "graphics"  "grDevices"
[7] "utils"     "datasets"  "base"

other attached packages:
       RODBC        prada RColorBrewer      Biobase
     "1.1-4"      "1.4.7"      "0.2-3"      "1.6.6"

and I am running SUSE 9.3 on an Intel Pentium 650 3.4GHz CPU with EM64T
technology.

The TDS driver is the latest release of freeTDS (version 0.64), the DSN
is configured using unixODBC and I am connecting to a MSSQL 2000 Server.

Hope this helps,
Florian

-- 
Florian Hahne
Molecular Genome Analysis (B050)
German Cancer Research Center
Im Neuenheimer Feld 580
D-69120 Heidelberg
Germany
room TP3 2.204
phone ++49 6221 42-4764
email [EMAIL PROTECTED]



Re: [Rd] .Call with C and Fortran together (PR#8122)

2005-09-05 Thread Thomas Lumley

>
> On some machines I don't get the segmentation fault, but I don't get the
> message "Just a simple test" either (when using "cg" as the subroutine's
> name).  I believe this is a bug in R because if I change my C interface
> again to return 0 instead of R_NilValue, and then use it with another C
> program which loads the dynamic library and calls the function
> simple_program(), everything works perfectly.
>

I don't think it is an R bug.  I think it is because there is already a
Fortran function called cg in R.  The fact that changing the name matters
suggests that you have a linking problem, and this turns out to be the
case.

When I try running your code under gdb in R as Peter Dalgaard suggested 
(after editing it to use R's macros for calling fortran from C instead of 
"cfortran.h" which I don't have), I get

> .Call("simple_program")
  Calling the function...

Program received signal SIGSEGV, Segmentation fault.
0x081604e5 in cg_ (nm=0x9e5dda4, n=0xbfefccfc, ar=0xbfefcce8, ai=0x89a826,
    wr=0x9e5dda4, wi=0x9790cc0, matz=0x56090a0, zr=0x80992d4, zi=0x0, fv1=0x0,
    fv2=0x9e745f8, fv3=0x89a810, ierr=0x706d6973) at eigen.f:3416
3416  IERR = 10 * N
Current language:  auto; currently fortran


That is, your program is calling the Fortran subroutine CG in eigen.f, 
rather than your CG.

There should be some set of linker flags that makes sure your definition
of CG is used, but I don't know what it would be (and it's probably very
platform-dependent).

-thomas



Re: [Rd] A memory management question

2005-09-05 Thread dhinds
Luke Tierney <[EMAIL PROTECTED]> wrote:

> It might or might not work now but is not guaranteed to do so reliably
> in the future.  Seeing the risks of leaving SETLENGTH exposed, it is
> very likely that SETLENGTH will be removed from the sources after the
> 2.2.0 release.

> If you provide your own methods to read and write the external pointer
> then you don't need this; this is safer than relying on undocumented
> behavior of [ and [<- in any case.  You also then don't need to use
> R_PreserveObject unless you really need to use it from the C level
> outside of a context where an R reference exists.

I'm not sure I follow this.  Maybe I should explain the context for
the problem.

textConnection("xyz", "w") creates a connection, the output of which
is deposited in a char vector named "xyz", which is updated line by
line as output is sent to the connection.  The current code maintains
a pointer to "xyz" in the form of an unprotected SEXP.  Hence if the
user does rm(xyz), bad things happen.  A small bug, I admit.

I think the best fix is to use a protected reference to the result
vector.  I think this is safe and doesn't rely on any abuse of the
interfaces.

There's also a performance issue, that the result is updated after
every line of output, resulting in a vast amount of copying if a large
result is accumulated.  This is the part that could be fixed by using
SETLENGTH to manage the length of the protected result vector.
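
To put a rough number on the copying, here is a small sketch in plain C
that only counts element copies; it does not touch R at all.  It
assumes each update copies every existing element into a new vector
that is one element longer, and compares that with doubling the
capacity whenever the vector is full.  Both function names are made up
for the illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Elements copied when the result is reallocated to an exact fit
   after every appended line (assumed model of the current behavior). */
size_t copies_exact_fit(size_t n_lines)
{
    size_t copies = 0;
    for (size_t len = 0; len < n_lines; len++)
        copies += len;  /* the existing len elements move to the new vector */
    return copies;
}

/* Elements copied when the capacity doubles each time the vector is full. */
size_t copies_doubling(size_t n_lines)
{
    size_t copies = 0, capacity = 1;
    for (size_t len = 0; len < n_lines; len++) {
        if (len == capacity) {
            assert(capacity <= SIZE_MAX / 2);  /* guard against overflow */
            copies += len;
            capacity *= 2;
        }
    }
    return copies;
}
```

Under these assumptions, 1000 lines cost 499500 element copies with
exact-fit reallocation and 1023 with doubling.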

I'm not sure what you mean by undocumented behavior of [ and [<-.  I
think all I'm relying on is that as long as an outstanding reference
to the result vector exists, R has to make sure the reference
remains valid, and hence can't change the memory allocation of the
result vector in any way.  I don't care what else happens to the
contents of the vector, as long as I get to control when it is
released.  It is ok with me if the user modifies the result vector
in-place, since my reference stays valid.  So I don't actually care
how [ and [<- work.

I think the only undocumented thing I'm relying on is that the memory
manager doesn't pay attention to the LENGTH of objects that it isn't
actively doing anything to.  Currently, it actually only uses LENGTH
in one spot: for updating R_LargeVallocSize when a large vector is
released.  The true allocation sizes for individual objects are always
kept in another place (either by malloc, or in the node class of the
object).

It seems like in this limited usage, SETLENGTH does represent a useful
feature, by permitting safe over-allocation of a protected object, and
might be worth preserving (and documenting) for that purpose.  

Of course, the real problem here is the semantics of textConnection(),
which make life much more difficult and can't be changed because they
are specified outside of R.

-- Dave



Re: [Rd] A memory management question

2005-09-05 Thread Luke Tierney
On Mon, 5 Sep 2005, [EMAIL PROTECTED] wrote:

> Luke Tierney <[EMAIL PROTECTED]> wrote:
>
>> It might or might not work now but is not guaranteed to do so reliably
>> in the future.  Seeing the risks of leaving SETLENGTH exposed, it is
>> very likely that SETLENGTH will be removed from the sources after the
>> 2.2.0 release.
>
>> If you provide your own methods to read and write the external pointer
>> then you don't need this; this is safer than relying on undocumented
>> behavior of [ and [<- in any case.  You also then don't need to use
>> R_PreserveObject unless you really need to use it from the C level
>> outside of a context where an R reference exists.
>
> I'm not sure I follow this.  Maybe I should explain the context for
> the problem.
>
> textConnection("xyz", "w") creates a connection, the output of which
> is deposited in a char vector named "xyz", which is updated line by
> line as output is sent to the connection.  The current code maintains
> a pointer to "xyz" in the form of an unprotected SEXP.  Hence if the
> user does rm(xyz), bad things happen.  A small bug, I admit.
>
> I think the best fix is to use a protected reference to the result
> vector.  I think this is safe and doesn't rely on any abuse of the
> interfaces.
> 
> There's also a performance issue, that the result is updated after
> every line of output, resulting in a vast amount of copying if a large
> result is accumulated.  This is the part that could be fixed by using
> SETLENGTH to manage the length of the protected result vector.
>
> I'm not sure what you mean by undocumented behavior of [ and [<-.  I
> think all I'm relying on is that as long as an outstanding reference
> to the result vector exists, R has to make sure the reference
> remains valid, and hence can't change the memory allocation of the
> result vector in any way.  I don't care what else happens to the
> contents of the vector, as long as I get to control when it is
> released.  It is ok with me if the user modifies the result vector
> in-place, since my reference stays valid.  So I don't actually care
> how [ and [<- work.

It would have helped to explain what you are up to.  I had to guess
and guessed wrong, so forget the [ and [<- issue for now.

> I think the only undocumented thing I'm relying on is that the memory
> manager doesn't pay attention to the LENGTH of objects that it isn't
> actively doing anything to.  Currently, it actually only uses LENGTH
> in one spot: for updating R_LargeVallocSize when a large vector is
> released.  The true allocation sizes for individual objects are always
> kept in another place (either by malloc, or in the node class of the
> object).
>
> It seems like in this limited usage, SETLENGTH does represent a useful
> feature, by permitting safe over-allocation of a protected object, and
> might be worth preserving (and documenting) for that purpose.

I am not comfortable making this available at this point.  It might be
useful to have but would need careful thought.  Without some way to
find out the true length there are potential problems.  Without some
way of making sure the fields in VECSXP and STRSXP that are added are
valid there are potential problems (not the first time but if the size
is shrunk and then increased).  Not that this can't be resolved but it
would take time that I don't have now, and this isn't high priority
enough to schedule in the near future.  So for now you should not use
SETLENGTH if you want your code to work beyond 2.2.0.
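
The shrink-then-grow hazard can be sketched in plain C without any R
headers.  This is only an illustration under assumed names (ptrvec,
PTRVEC_CAPACITY and ptrvec_set_length are all hypothetical): a vector
of pointers with a fixed true size and a smaller visible length, where
growing the visible length must reset the slots that become visible
again so that no stale pointer is exposed.

```c
#include <assert.h>
#include <stddef.h>

#define PTRVEC_CAPACITY 8  /* hypothetical fixed true size */

typedef struct {
    const char *slot[PTRVEC_CAPACITY];
    size_t length;  /* visible length, never more than PTRVEC_CAPACITY */
} ptrvec;

/* Change the visible length.  Slots that become visible are reset to
   NULL, so a shrink followed by a grow cannot expose an old pointer.
   Returns 0 on success, -1 if new_length exceeds the true size. */
int ptrvec_set_length(ptrvec *v, size_t new_length)
{
    assert(v != NULL);
    if (new_length > PTRVEC_CAPACITY) return -1;
    for (size_t i = v->length; i < new_length; i++)
        v->slot[i] = NULL;
    v->length = new_length;
    return 0;
}
```

A bare length setter has neither the capacity check nor the reset
loop, which is where the potential problems come from.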

> Of course, the real problem here is the semantics of textConnection(),
> which make life much more difficult and can't be changed because they
> are specified outside of R.

It may be possible to expand the semantics by adding a logical
argument that controls whether the vector is to be over-allocated and
filled with zero length strings and truncated to the true length on
close.  Another variant would be to have a logical argument that says
to keep the input internally and provide a function, say
textConnectionOutput, to retrieve the internal output.  I would then
use a linked list internally.  The semantics of close complicate this
a bit; this function would probably need to optionally close the
connection to get a final complete line.
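
A minimal sketch of the linked-list variant, in plain C with no R
headers, might look like the following.  All names here (line_node,
line_list and the line_list_* functions) are made up for the
illustration and are not part of R; the idea is just that each
completed line is appended in constant time and the lines are gathered
into an array only when the output is requested.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct line_node {
    char *text;               /* owned copy of one output line */
    struct line_node *next;
} line_node;

typedef struct {
    line_node *head, *tail;
    size_t count;
} line_list;

void line_list_init(line_list *l)
{
    assert(l != NULL);
    l->head = l->tail = NULL;
    l->count = 0;
}

/* Append a copy of `text` in O(1); returns 0 on success, -1 on failure. */
int line_list_push(line_list *l, const char *text)
{
    size_t len = strlen(text) + 1;
    line_node *n = malloc(sizeof *n);
    if (n == NULL) return -1;
    n->text = malloc(len);
    if (n->text == NULL) { free(n); return -1; }
    memcpy(n->text, text, len);
    n->next = NULL;
    if (l->tail != NULL) l->tail->next = n; else l->head = n;
    l->tail = n;
    l->count++;
    return 0;
}

/* Gather the lines into a malloc'ed array of borrowed pointers, in
   insertion order.  The caller frees the array but not the strings. */
const char **line_list_collect(const line_list *l)
{
    const char **out = malloc((l->count ? l->count : 1) * sizeof *out);
    size_t i = 0;
    if (out == NULL) return NULL;
    for (const line_node *n = l->head; n != NULL; n = n->next)
        out[i++] = n->text;
    return out;
}

/* Free every node and reset the list to empty. */
void line_list_clear(line_list *l)
{
    line_node *n = l->head;
    while (n != NULL) {
        line_node *next = n->next;
        free(n->text);
        free(n);
        n = next;
    }
    line_list_init(l);
}
```

The retrieval function would do the equivalent of line_list_collect,
building the character vector once rather than after every line.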

luke



Re: [Rd] Writing R-extensions

2005-09-05 Thread Luke Tierney
On Mon, 5 Sep 2005, Berwin A Turlach wrote:

> G'day Luke,
>
>> "LT" == Luke Tierney <[EMAIL PROTECTED]> writes:
>
>>> On Sat, 27 Aug 2005, Berwin A Turlach wrote:
>
>>>> 3) The final sentence in the section on `Registering S3
>>>> methods' is:
>>>>
>>>> Any methods for a generic defined in a package that does not
>>>> use a name space should be exported, and the package defining
>>>> and exporting the method should be attached to the search path
>>>> if the methods are to be found.
>>>>
>>>> [...] is the implication of that sentence that if I have a
>>>> package with a name space which defines a method for a generic
>>>> defined in another package that does not use a name space,
>>>> then this method is only found if my package is attached to
>>>> the search path and mere loading of the namespace is not
>>>> sufficient?
>
>LT> There is no typo here and your reading in the paragraph above
>LT> is correct.
> Thanks for the clarification.
>
> May I suggest that nevertheless there is a typo in this sentence and
> it should be " the package defining and exporting the methods..."?
> One reason why this sentence had me puzzled was that it uses twice
> "methods" and once "method". :)

All three are now "methods".

Best,

luke



[Rd] win.packages.html not found

2005-09-05 Thread Gabor Grothendieck
I am using Windows XP and

"R version 2.2.0, 2005-09-03"

and am getting the following message when I try to install or check a
package using Rcmd check or Rcmd install:

Error in get(x, envir, mode, inherits) : variable "win.packages.html" was not found

It seems to be looking for the indicated file but can't find it.  I have
not seen this message before.  I am not sure if it's related to 2.2.0 or
something else.

Can anyone suggest how to proceed?

Thanks.



Re: [Rd] win.packages.html not found

2005-09-05 Thread Gabor Grothendieck
To answer my own question: I had mixed up my library paths, so it
seems the tools package from R 2.1 was being used even though I was
running R 2.2.  Once I corrected that, the error message went away.

On 9/5/05, Gabor Grothendieck <[EMAIL PROTECTED]> wrote:
> I am using Windows XP and
> 
> "R version 2.2.0, 2005-09-03"
> 
> and am getting the following message when I try to install or check a
> package using Rcmd check or Rcmd install:
> 
> Error in get(x, envir, mode, inherits) : variable "win.packages.html" was not found
> 
> It seems to be looking for the indicated file but can't find it.  I have
> not seen this message before.  I am not sure if it's related to 2.2.0 or
> something else.
> 
> Can anyone suggest how to proceed?
> 
> Thanks.
>



[Rd] Problem in R 2.2.0 with environments and [

2005-09-05 Thread Gabor Grothendieck
I have found a problem with R 2.2.0 under Windows XP.

Under R 2.1.1 patched I get the following result as expected.  First
we define a function f which displays the names of its arguments,
rather than their values.  We define a variable x whose value
is an environment and whose class is c("x", "environment").
f(x, x) then gives the expected result of "x" and "x".  If we then
assign f to "[.x", x[x] also gives "x" and "x" as expected under
R 2.1.1 patched but _not_ under R 2.2.0.

First we show it under R 2.1.1 where everything works as expected:

> f <- function(x, y) { print(deparse(substitute(x))); print(deparse(substitute(y))) }
> x <- .GlobalEnv
> class(x) <- c("x", "environment")
> f(x, x)
[1] "x"
[1] "x"
> "[.x" <- f
> x[x]  ## this is what we would have expected, so it's OK
[1] "x"
[1] "x"
> 
> R.version.string
[1] "R version 2.1.1, 2005-06-23"

Now let's repeat the above under R 2.2.0.  f(x,x) still works as
expected, but x[x] does not: unlike in R 2.1.1, f(x,x) and x[x] now
give different results even though "[.x" was set equal to f.

> f <- function(x, y) { print(deparse(substitute(x))); print(deparse(substitute(y))) }
> x <- .GlobalEnv
> class(x) <- c("x", "environment")
> f(x, x)
[1] "x"
[1] "x"
> # now x[x] and f(x,x) should give same result
> "[.x" <- f
> x[x]  ### does not give the same as f(x,x) 
[1] ""
[1] ""
> 
> R.version.string
[1] "R version 2.2.0, 2005-09-03"



Re: [Rd] A memory management question

2005-09-05 Thread dhinds
Luke Tierney <[EMAIL PROTECTED]> wrote:

> I am not comfortable making this available at this point.  It might be
> useful to have but would need careful thought.  Without some way to
> find out the true length there are potential problems.  Without some
> way of making sure the fields in VECSXP and STRSXP that are added are
> valid there are potential problems (not the first time but if the size
> is shrunk and then increased).  Not that this can't be resolved but it
> would take time that I don't have now, and this isn't high priority
> enough to schedule in the near future.  So for now you should not use
> SETLENGTH if you want your code to work beyond 2.2.0.

Ok, that's fine... given the lack of other valid uses of SETLENGTH, it
doesn't seem worth preserving it just for this one debatable usage.

> It may be possible to expand the semantics by adding a logical
> argument that controls whether the vector is to be over-allocated and
> filled with zero length strings and truncated to the true length on
> close.  Another variant would be to have a logical argument that says
> to keep the input internally and provide a function, say
> textConnectionOutput, to retrieve the internal output.

These are possible... or optionally just don't reveal the intermediate
output at all, and just make the final result visible on close...

-- Dave



Re: [Rd] Problem in R 2.2.0 with environments and [

2005-09-05 Thread Gabor Grothendieck
Sorry. I think this "problem" was actually the same one as my previous
post where I set my library path wrong.  Once I set it correctly both
versions worked fine.

On 9/5/05, Gabor Grothendieck <[EMAIL PROTECTED]> wrote:
> I have found a problem with R 2.2.0 under Windows XP.
> 
> Under R 2.1.1 patched I get the following result as expected.  First
> we define a function f which displays the names of its arguments,
> rather than their values.  We define a variable x whose value
> is an environment and whose class is c("x", "environment").
> f(x, x) then gives the expected result of "x" and "x".  If we then
> assign f to "[.x", x[x] also gives "x" and "x" as expected under
> R 2.1.1 patched but _not_ under R 2.2.0.
> 
> First we show it under R 2.1.1 where everything works as expected:
> 
> > f <- function(x, y) { print(deparse(substitute(x))); print(deparse(substitute(y))) }
> > x <- .GlobalEnv
> > class(x) <- c("x", "environment")
> > f(x, x)
> [1] "x"
> [1] "x"
> > "[.x" <- f
> > x[x]  ## this is what we would have expected, so it's OK
> [1] "x"
> [1] "x"
> >
> > R.version.string
> [1] "R version 2.1.1, 2005-06-23"
> 
> Now let's repeat the above under R 2.2.0.  f(x,x) still works as
> expected, but x[x] does not: unlike in R 2.1.1, f(x,x) and x[x] now
> give different results even though "[.x" was set equal to f.
> 
> > f <- function(x, y) { print(deparse(substitute(x))); print(deparse(substitute(y))) }
> > x <- .GlobalEnv
> > class(x) <- c("x", "environment")
> > f(x, x)
> [1] "x"
> [1] "x"
> > # now x[x] and f(x,x) should give same result
> > "[.x" <- f
> > x[x]  ### does not give the same as f(x,x)
> [1] ""
> [1] ""
> >
> > R.version.string
> [1] "R version 2.2.0, 2005-09-03"
>
