Re: [Rd] typo in `eurodist'

2005-12-09 Thread Jari Oksanen
Dear all,

There really seem to be many exciting issues in spelling and in
detecting spelling errors. However, a more disturbing feature in
'eurodist' to me is that the distances seem to be wrong. There are
several cases where the triangle inequality is violated so that a trip
from A to B is shorter when you make a detour via X instead of going
directly (see require(fortunes); fortune("eurodist") for an example). A
quick look revealed that you can find such a shorter detour for 104 of
210 "distances" of 'eurodist'. There is no guarantee that these shortest
path distances would be correct, but at least they are metric.
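
For readers who want to reproduce the check, a minimal sketch follows. It assumes the 'eurodist' object shipped with the datasets package and uses a plain Floyd-Warshall pass; the variable names are illustrative only, and the count it reports depends on the version of the data, so no figure is asserted here.

```r
# Shortest-path distances among the towns in 'eurodist' (Floyd-Warshall).
d  <- as.matrix(eurodist)              # full symmetric matrix, one row per town
sp <- d
for (k in seq_len(nrow(sp))) {
    # allow a detour via town k wherever that is shorter
    sp <- pmin(sp, outer(sp[, k], sp[k, ], "+"))
}
excess       <- d - sp                 # > 0 where a detour beats the direct entry
n_violations <- sum(excess[lower.tri(excess)] > 0)
n_pairs      <- sum(lower.tri(d))      # number of town pairs
as.dist(excess)                        # differences laid out like the table below
```

Comparing n_violations with n_pairs gives the "detours out of distances" ratio discussed above.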

Just for fun, here are the differences between actual eurodist's and
shortest paths among the towns in the eurodist data:

Athens Barcelona Brussels Calais Cherbourg
Barcelona 1036
Brussels   635 0
Calais 705130
Cherbourg  819 00  0
Cologne448   1390  0 0
Copenhagen 507   459  525537   545
Geneva 879 00  0 0
Gibralta  1037 00  0 2
Hamburg438   2140  0 0
Hook of Holland530 00  0 0
Lisbon1623 1  216135 0
Lyons 1022 00  0 0
Madrid1036 00  0 0
Marseilles1037 01  0 0
Milan  879410 1092
Munich 445610 26 0
Paris  798 00  0 0
Rome 0 00  991
Stockholm  508   459  525537   546
Vienna   070   32 35 0
Cologne Copenhagen Geneva Gibralta Hamburg
Barcelona
Brussels
Calais
Cherbourg
Cologne
Copenhagen  222
Geneva  790300
Gibralta  0499  0
Hamburg   0  0  0   49
Hook of Holland   0  0 460   0
Lisbon  3986626000 334
Lyons 0327  00   0
Madrid   26499  00  48
Marseilles1327  00   0
Milan 0171  0   40 102
Munich0  0  0   89   0
Paris 0450  00   0
Rome  0 98 810  29
Stockholm   215  0300  539   0
Vienna0  0  0   70   0
Hook of Holland Lisbon Lyons Madrid Marseilles
Barcelona
Brussels
Calais
Cherbourg
Cologne
Copenhagen
Geneva
Gibralta
Hamburg
Hook of Holland
Lisbon  240
Lyons 1  0
Madrid0  0 0
Marseilles1264 0  0
Milan 1744 0115  0
Munich067065 70160
Paris 0150 0  0  1
Rome  0608   134  1  0
Stockholm   581272   327539327
Vienna067270 41  0
Milan Munich Paris Rome Stockholm
Barcelona
Brussels
Calais
Cherbourg
Cologne
Copenhagen
Geneva
Gibralta
Hamburg
Hook of Holland
Lisbon
Lyons
Madrid
Marseilles
Milan
Munich  0
Paris  57  0
Rome0 2991
Stockholm 171  0   451  105
Vienna139  0 00 1

It seems that "marginal" towns (Athens, Lisbon, Stockholm, Copenhagen)
have the largest discrepancies.

It also seems that the names are not 'localized', but weird English
forms are used for places like København and Wien so dear to the R core
developers.

cheers, jari oksanen

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] Blocking problem with embeded R (windows)

2005-12-09 Thread Simon Knapp
Hi all,

I am trying to make calls to R from an MFC application running on XP
and am having problems blocking the application while the call
executes.

I have tried the following approaches to using R from the application
(note that I set a wait cursor while R is executing).

1) call rcmd in BATCH mode using system(). This works well, except
that I get the cmd window popping up... which makes the app look
pretty tacky.

2) use the COM interface. This works OK... sometimes. When I call
R_Proxy_evaluate_noreturn by pressing OK in the dialog that starts the
execution, if the cursor happens to be over the application's window
when the dialog disappears, then I get my wait cursor and the
application blocks. If the cursor is not over the application's window,
then I don't get the wait cursor and the application seems to block
after the first mouse click within the application's window.

3) use Rproxy.dll directly. The application does not block and I don't
get a wait cursor at all.

4) integrate the code used by Rproxy into my application
(verbatim). The application does not block and I don't get a wait
cursor at all.

The things I have read about DLLs make statements like "a dll is just
code and data loaded into your application's process", which I have
taken to imply that the application should block while R is executing.
This also seems to be implied by the discussion around the rtest
example in r-ext.pdf.

Can someone offer any advice on whether there is some way to make my
application block when configuring R? If not, is there a simple way to
make the app block (I have never coded using threads before, am a
relative newbie to MFC and am struggling to figure out how I would
block otherwise).

Help would be greatly appreciated,
Simon Knapp



Re: [Rd] Blocking problem with embeded R (windows)

2005-12-09 Thread Prof Brian Ripley
You are not calling R, but rproxy.dll, part of a (D)COM interface.  Try 
calling R itself (via R.dll).

On Fri, 9 Dec 2005, Simon Knapp wrote:

> Hi all,
>
> I am trying to make calls to R from an MFC application running on XP
> and am having problems blocking the application while the call
> executes.
>
> I have tried the following approaches to using R from the application
> (note that I set a wait cursor while R is executing).
>
> 1) call rcmd in BATCH mode using system(). This works well, except
> that I get the cmd window popping up... which makes the app look
> pretty tacky.
>
> 2) use the com interface. This works OK... sometimes. When I call
> R_Proxy_evaluate_noreturn by pressing OK in the dialog that starts the
> execution, if the cursor happens to be over the applications window
> when the dialog disappears, then I get my wait cursor and the
> application blocks. If the cursor is not over the applications window,
> then I don't get the wait cursor and the application seems to block
> after the first mouse click within the applications window.
>
> 3) use Rproxy.dll directly. The application does not block and I don't
> get a wait cursor at all.
>
> 4) integrate the code used by Rproxy into my application (in
> verbatim). The application does not block and I don't get a wait
> cursor at all.
>
> The things I have read about DLLs make statements like "a dll is just
> code and data loaded into your applications process", which I have
> taken to imply that the application should block while R is executing.
> This also seems to be implied by the discussion around the rtest
> example r-ext.pdf.
>
> Can someone offer any advice on whether there is some way to make my
> application block when configuring R? If not, is there a simple way to
> make the app block (I have never coded using threads before, am a
> relative newbie to MFC and am struggling to figure out how I would to
> block otherwise).
>
> Help would be greatly appreciated,
> Simon Knapp
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>
>

-- 
Brian D. Ripley,                  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595



[Rd] Blocking problem with embeded R (windows)

2005-12-09 Thread Simon Knapp
Thanks for that rapid reply!

In the fourth approach, I have compiled the functions R_proxy_init(),
R_proxy_evaluate_noreturn(), R_proxy_term() and the callback functions
defined in Baier's code directly into my application (I got them from the R
source distribution and commented out the other functions). Rproxy.dll is
not on my path.

When I look through R_proxy_init(), it does the same things that are done
in the rtest example (as far as I can tell). Hence, I thought that I was
calling R itself when initialising the dll.

I am using Baier's function R_proxy_evaluate_noreturn() because it seemed
wiser to use code that was written by someone who knows what they are doing
than to roll my own! I don't understand enough about the R IO functions to feel
comfortable using them and am having trouble finding documentation on them. I'm
slowly learning about them, and the rest of R, by reading the code. Is there
any documentation around on these and on the R source in general?

Thanks again for the rapid reply
Simon Knapp



[Rd] How to implement package-specific options?

2005-12-09 Thread Bjørn-Helge Mevik
Dear developeRs,

What is the preferred way to implement package-specific options?

Should one simply use options() -- e.g. options(myoption = myvalue)?
(And how should one document such options?)

Or is it better to implement a separate mechanism, perhaps something
like ps.options()?


-- 
Bjørn-Helge Mevik



Re: [Rd] How to implement package-specific options?

2005-12-09 Thread Prof Brian Ripley

On Fri, 9 Dec 2005, Bjørn-Helge Mevik wrote:

> Dear developeRs,
>
> What is the preferred way to implement package-specific options?
>
> Should one simply use options() -- e.g. options(myoption = myvalue)?
> (And how should one document such options?)

You can, but documentation is the problem.  It is possible to have your
own options.Rd and help() will detect this and report there are two or
more, but end-users may be confused.

> Or is it better to implement a separate mechanism, perhaps something
> like ps.options()?


I think package sm() has a good solution.
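
A separate mechanism of that kind could be sketched roughly as below. This is only an illustration in the spirit of a package-level options function such as sm.options(), not code taken from any package: the names .mypkg_opts and mypkg.options and the two default options are hypothetical.

```r
# Private store for the package's options (names and defaults are made up).
.mypkg_opts <- new.env(parent = emptyenv())
assign("verbose", FALSE, envir = .mypkg_opts)
assign("tolerance", 1e-8, envir = .mypkg_opts)

mypkg.options <- function(...) {
    args <- list(...)
    if (length(args) == 0L)
        return(as.list(.mypkg_opts))                     # no arguments: report everything
    if (is.null(names(args)))
        return(mget(unlist(args), envir = .mypkg_opts))  # character names: query only
    unknown <- setdiff(names(args), ls(.mypkg_opts))
    if (length(unknown))
        stop("unknown option(s): ", paste(unknown, collapse = ", "))
    old <- mget(names(args), envir = .mypkg_opts)
    for (nm in names(args)) assign(nm, args[[nm]], envir = .mypkg_opts)
    invisible(old)                                       # previous values of what was changed
}
```

With this layout the package documents mypkg.options in its own help page, so it does not clash with the base options.Rd; previous values can be restored with do.call(mypkg.options, old).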

--
Brian D. Ripley,                  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595


[Rd] ID for machine?

2005-12-09 Thread Duncan Murdoch
Does R have a function to obtain a name of the machine that it is 
running on?  I'm going to be writing results to a database from several 
different machines, and I'd like to be able to identify where they came 
from.

Duncan Murdoch



Re: [Rd] ID for machine?

2005-12-09 Thread McGehee, Robert
Not an R function, per se, but
> system("uname -n", intern = TRUE)
returns my computer's network node hostname on both Windows and Linux.

You can use 'uname -a' for more information.

-Original Message-
From: Duncan Murdoch [mailto:[EMAIL PROTECTED] 
Sent: Friday, December 09, 2005 9:31 AM
To: R-devel
Subject: [Rd] ID for machine?


Does R have a function to obtain a name of the machine that it is 
running on?  I'm going to be writing results to a database from several 
different machines, and I'd like to be able to identify where they came 
from.

Duncan Murdoch



Re: [Rd] ID for machine?

2005-12-09 Thread Dimitris Rizopoulos
maybe

Sys.info()["nodename"]

could be helpful in Windows.

Best,
Dimitris



Dimitris Rizopoulos
Ph.D. Student
Biostatistical Centre
School of Public Health
Catholic University of Leuven

Address: Kapucijnenvoer 35, Leuven, Belgium
Tel: +32/(0)16/336899
Fax: +32/(0)16/337015
Web: http://www.med.kuleuven.be/biostat/
 http://www.student.kuleuven.be/~m0390867/dimitris.htm


- Original Message - 
From: "Duncan Murdoch" <[EMAIL PROTECTED]>
To: "R-devel" 
Sent: Friday, December 09, 2005 3:31 PM
Subject: [Rd] ID for machine?


> Does R have a function to obtain a name of the machine that it is
> running on?  I'm going to be writing results to a database from 
> several
> different machines, and I'd like to be able to identify where they 
> came
> from.
>
> Duncan Murdoch
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
> 


Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm



Re: [Rd] ID for machine?

2005-12-09 Thread Seth Falcon
On  9 Dec 2005, [EMAIL PROTECTED] wrote:
> Does R have a function to obtain a name of the machine that it is
> running on?  I'm going to be writing results to a database from
> several different machines, and I'd like to be able to identify
> where they came from.

Sys.info has nodename which should be the hostname.



Re: [Rd] ID for machine?

2005-12-09 Thread Janusz Kawczak
How about something as trivial as
>system('hostname')? Is this for the Winds system?

Janusz.

On Fri, 9 Dec 2005, Duncan Murdoch wrote:

> Does R have a function to obtain a name of the machine that it is
> running on?  I'm going to be writing results to a database from several
> different machines, and I'd like to be able to identify where they came
> from.
>
> Duncan Murdoch
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>



Re: [Rd] ID for machine?

2005-12-09 Thread Duncan Murdoch
On 12/9/2005 9:48 AM, Dimitris Rizopoulos wrote:
> maybe
> 
> Sys.info()["nodename"]
> 
> could be helpful in Windows.

Thanks to all who replied.  This seems like the most portable solution. 
  (The code will be running on all sorts of machines, and this isn't 
guaranteed to work, but I think it's pretty likely to, whereas some of 
the system() calls are a little more fragile.)
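
A small wrapper along these lines might look like the sketch below. The function name machine_id and the environment-variable fallbacks are assumptions made for illustration; the only call taken from this thread is Sys.info(), which returns NULL on platforms where it is not implemented.

```r
machine_id <- function() {
    info <- Sys.info()                       # NULL where not implemented
    node <- if (is.null(info)) "" else unname(info[["nodename"]])
    if (is.na(node) || !nzchar(node))        # fall back to common environment variables
        node <- Sys.getenv("COMPUTERNAME",
                           unset = Sys.getenv("HOSTNAME", unset = ""))
    if (!nzchar(node)) "unknown" else node   # always return a non-empty string
}
```

The returned string can then be written alongside each result row, so the database records which machine produced it.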

Duncan



Re: [Rd] qt for df < 1

2005-12-09 Thread Luke Tierney
On Thu, 8 Dec 2005, Peter Dalgaard wrote:

> roger koenker <[EMAIL PROTECTED]> writes:
>
>> I was experimenting yesterday with a binomial make.link option
>> for estimating student t binary response models, tentatively
>> called gossit, and I noticed eventually that the R qt function doesn't
>> like df < 1.  Vaguely recalling that Splus didn't seem to mind such
>> weirdness,  I checked on our soon to be defunct Splus6.2 and
>> sure enough, it produced plausible answers instead of R's NA's.
>> Of course, I have no way of judging the quality of these answers,
>> but I'm curious about whether someone has already looked into
>> this can of worms.
>
> Well the help page has:
>
> For 'qt' only values of at least one are currently supported.
>
> and someone must have written that...
>
> R does have pt for df < 1, so a temporary fix using uniroot() seems
> doable.
>
>

Something like

 qqt<-function(p,df) sign(p-0.5)*sqrt(qf(1-2*pmin(p,1-p),1,df))

seems to do reasonably, at least in terms of consistency with pt, down
to 0.2 or maybe 0.1 df based on

 f<-function(d, n = 101) {
     x<-seq(0, 1, len = n)
     max(abs(pt(qqt(x, d), d) - x))
 }
 plot(function(d) log10(sapply(d, f)), .01,1)
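
For comparison, the uniroot() route suggested in the quoted text might look like the sketch below. The function name qt_uniroot, the bracket-widening loop and the tolerance are assumptions made for illustration, not code from this thread; it handles one probability at a time and will fail if the bracket overflows for very small p and df.

```r
qt_uniroot <- function(p, df, tol = 1e-10) {
    stopifnot(length(p) == 1L, p > 0, p < 1, df > 0)
    if (p == 0.5) return(0)                # median of the t distribution
    q  <- min(p, 1 - p)                    # work in the lower tail by symmetry
    lo <- -1
    while (pt(lo, df) > q) lo <- lo * 10   # widen until the root is bracketed
    root <- uniroot(function(x) pt(x, df) - q,
                    lower = lo, upper = 0, tol = tol)$root
    if (p < 0.5) root else -root           # mirror back for the upper tail
}
```

Checking pt(qt_uniroot(p, df), df) against p gives the same kind of consistency test as f() above.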

luke

-- 
Luke Tierney
Chair, Statistics and Actuarial Science
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa                  Phone:             319-335-3386
Department of Statistics and        Fax:               319-335-3017
   Actuarial Science
241 Schaeffer Hall                  email:      [EMAIL PROTECTED]
Iowa City, IA 52242                 WWW:  http://www.stat.uiowa.edu



Re: [Rd] [R] data.frame() size

2005-12-09 Thread Matthew Dowle

Hi,

Please see below for a post on r-help regarding data.frame() and the
possibility of dropping rownames, for space and time reasons.
I've made some changes, attached, and they seem to be working well. I see the
expected space (90% saved) and time (10 times faster) savings. There are no
doubt some bugs, and it needs more work and testing, but I thought I would post
first at this stage.

Could some changes along these lines be made to R ? I'm happy to help with
testing and further work if required. In the meantime I can work with
overloaded functions, which fixes the problems in my case.

Functions affected :

   dim.data.frame
   format.data.frame
   print.data.frame
   data.frame
   [.data.frame
   as.matrix.data.frame

Modified source code attached.

Regards,
Matthew


-Original Message-
From: Matthew Dowle 
Sent: 09 December 2005 09:44
To: 'Peter Dalgaard'
Cc: 'r-help@stat.math.ethz.ch'
Subject: RE: [R] data.frame() size



That explains it. Thanks. I don't need rownames though, as I'll only ever
use integer subscripts. Is there any way to drop them, or even better not
create them in the first place? The memory saved (90%) by not having them
and the 10 times speed up would be very useful. I think I need a data.frame
rather than a matrix because I have columns of different types in real life.

> rownames(d) = NULL
Error in "dimnames<-.data.frame"(`*tmp*`, value = list(NULL, c("a", "b" : 
invalid 'dimnames' given for data frame


-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Peter
Dalgaard
Sent: 08 December 2005 18:57
To: Matthew Dowle
Cc: 'r-help@stat.math.ethz.ch'
Subject: Re: [R] data.frame() size


Matthew Dowle <[EMAIL PROTECTED]> writes:

> Hi,
> 
> In the example below why is d 10 times bigger than m, according to
> object.size ? It also takes around 10 times as long to create, which 
> fits with object.size() being truthful.  gcinfo(TRUE) also indicates a 
> great deal more garbage collector activity caused by data.frame() than 
> matrix().
> 
> $ R --vanilla
> 
> > nr = 1000000
> > system.time(m<<-matrix(integer(1), nrow=nr, ncol=2))
> [1] 0.22 0.01 0.23 0.00 0.00
> > system.time(d<<-data.frame(a=integer(nr), b=integer(nr)))
> [1] 2.81 0.20 3.01 0.00 0.00  # 10 times longer
> 
> > dim(m)
> [1] 1000000       2
> > dim(d)
> [1] 1000000       2   # same dimensions
> 
> > storage.mode(m)
> [1] "integer"
> > sapply(d, storage.mode)
> a b 
> "integer" "integer"   # same storage.mode
> 
> > object.size(m)/1024^2
> [1] 7.629616
> > object.size(d)/1024^2
> [1] 76.29482  # but 10 times bigger
> 
> > sum(sapply(d, object.size))/1024^2
> [1] 7.629501  # or is it ? If it's not really 10 times bigger,
> # why 10 times longer above ?

Row names!!


> r <- as.character(1:1e6)
> object.size(r)
[1] 72000056
> object.size(r)/1024^2
[1] 68.6646

'nuff said?
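
To see where the bytes go on a given installation, the pieces can be measured separately, as in the sketch below. It uses a smaller nr than the example above to keep it quick, and the variable names are illustrative. The figures depend on the R version and platform, so none are quoted here; later versions of R store automatic row names in a compact form, so the row-name share may be far smaller than in the output above.

```r
nr <- 1e5
d  <- data.frame(a = integer(nr), b = integer(nr))

col_bytes   <- sum(sapply(d, object.size))     # the two integer columns only
total_bytes <- as.numeric(object.size(d))      # columns plus all attributes
# cost of holding the same number of row names as character strings
rn_bytes    <- as.numeric(object.size(as.character(seq_len(nr))))

c(total = total_bytes, columns = col_bytes, rownames_as_strings = rn_bytes)
```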

-- 
   O__   Peter Dalgaard Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark  Ph:  (+45) 35327918
~~ - ([EMAIL PROTECTED])  FAX: (+45) 35327907




Re: [Rd] qt for df < 1

2005-12-09 Thread roger koenker
On Dec 9, 2005, at 10:05 AM, Luke Tierney wrote:

> On Thu, 8 Dec 2005, Peter Dalgaard wrote:
>
>> roger koenker <[EMAIL PROTECTED]> writes:
>>
>>> I was experimenting yesterday with a binomial make.link option
>>> for estimating student t binary response models, tentatively
>>> called gossit, and I noticed eventually that the R qt function  
>>> doesn't
>>> like df < 1.  Vaguely recalling that Splus didn't seem to mind such
>>> weirdness,  I checked on our soon to be defunct Splus6.2 and
>>> sure enough, it produced plausible answers instead of R's NA's.
>>> Of course, I have no way of judging the quality of these answers,
>>> but I'm curious about whether someone has already looked into
>>> this can of worms.
>>
>> Well the help page has:
>>
>> For 'qt' only values of at least one are currently supported.
>>
>> and someone must have written that...
>>
>> R does have pt for df < 1, so a temporary fix using uniroot() seems
>> doable.
>>
>>
>
> Something like
>
> qqt<-function(p,df) sign(p-0.5)*sqrt(qf(1-2*pmin(p,1-p),1,df))
>
> seems to do reasonably, at least in terms of consistency with pt, down
> to 0.2 or mayby 0.1 df based on
>
> f<-function(d, n = 101) {
>   x<-seq(0, 1, len = n)
>   max(abs(pt(qqt(x, d), d) - x))
> }
> plot(function(d) log10(sapply(d, f)), .01,1)
>
> luke

Splus6.2 seems to be using just this approach based on similar testing.

Pure schadenfreude makes it hard to resist mentioning that in Splus6.2
qf(0, df1, df2) gives -Inf rather than 0, which caused some difficulties
initially with my attempt to replicate the comparison.

Roger


>
> -- 
> Luke Tierney
> Chair, Statistics and Actuarial Science
> Ralph E. Wareham Professor of Mathematical Sciences
> University of Iowa  Phone: 319-335-3386
> Department of Statistics andFax:   319-335-3017
>Actuarial Science
> 241 Schaeffer Hall  email:  [EMAIL PROTECTED]
> Iowa City, IA 52242 WWW:  http://www.stat.uiowa.edu



Re: [Rd] [R] data.frame() size

2005-12-09 Thread Gabor Grothendieck
There was nothing attached in the copy that came through
to me.

By the way, there was some discussion earlier this year
on a light-weight data.frame class but I don't think anyone
ever posted any code.




Re: [Rd] [R] data.frame() size

2005-12-09 Thread Hin-Tak Leung
Gabor Grothendieck wrote:
> There was nothing attached in the copy that came through
> to me.

I would like to see that patch also.

> By the way, there was some discussion earlier this year
> on a light-weight data.frame class but I don't think anyone
> ever posted any code.

It may have been me. I am working on a bit-packed data.frame
which only uses 2-bits per unit of data, so it is 4 units per RAWSXP.
(work in progress, nothing to show).

So I am very interested to see the patch.

Yes, I took a couple of weeks reading/learning where all the
memory has gone in data.frame. The rowname/column names allocation is
a bit stupid. Each rowname and each column name is a full
R object, so there is a 32 (or 28) byte overhead just from managing
that, before the STRSXP for the actual string, which is another X bytes.
So for a 1 x N data.frame with integers for content,
the content is 4-byte * N, but the rowname/columnname is 32 * N -ish
(a 9x increase). Word is 32-bit on most people's machines, and
I am counting the extra one from which you have to keep the address
of each SEXPREC somewhere, so it is 7+1 = 8, if I understand it correctly.
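
The arithmetic behind those figures can be spelled out as below. This only restates the numbers in the paragraph above (32-bit words, 7 header words plus one pointer); the variable names are made up for illustration and the per-string payload ("X bytes") is left out.

```r
word_bytes     <- 4                              # one 32-bit word
words_per_name <- 7 + 1                          # SEXPREC header words + one pointer to it
name_overhead  <- words_per_name * word_bytes    # 32 bytes per row or column name
int_bytes      <- 4                              # one integer of actual content
ratio <- (int_bytes + name_overhead) / int_bytes # 9: content plus name vs content alone
```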

Here is the relevant comment, quoted verbatim from around line 225 of 
"src/include/Rinternals.h":

/* The generational collector uses a reduced version of SEXPREC as a
header in vector nodes.  The layout MUST be kept consistent with
the SEXPREC definition.  The standard SEXPREC takes up 7 words on
most hardware; this reduced version should take up only 6 words.
In addition to slightly reducing memory use, this can lead to more
favorable data alignment on 32-bit architectures like the Intel
Pentium III where odd word alignment of doubles is allowed but much
less efficient than even word alignment. */

Hin-Tak Leung


Re: [Rd] [R] data.frame() size

2005-12-09 Thread Liaw, Andy
I believe Gabor was referring to this:

http://tolstoy.newcastle.edu.au/R/devel/05/05/0837.html

Andy

From: Hin-Tak Leung
> 
> Gabor Grothendieck wrote:
> > There was nothing attached in the copy that came through
> > to me.
> 
> I like to see that patch also.
> 
> > By the way, there was some discussion earlier this year
> > on a light-weight data.frame class but I don't think anyone
> > ever posted any code.
> 
> It may have been me. I am working on a bit-packed data.frame
> which only uses 2-bits per unit of data, so it is 4 units per RAWSXP.
> (work in progress, nothing to show).
> 
> So I am very interested to see the patch.
> 
> Yes, I took a couple of weeks reading/learning where have all the
> memory gone in data.frame. The rowname/column names allocation is
> a bit stupid. Each rowname and each column name is a full
> R object, so there is a 32(or 28) byte overhead just from managing
> that, before the STRSXP for the actual string, which is 
> another X bytes.
> so for an 1 x N data.frame with integers for content, the
> the content is 4-byte * N, but the rowname/columnname is 32 * N -ish.
> (a 9x increase). Word is 32-bit on most people's machines, and
> I am counting the extra one from which you have to keep the address
> of each SEXPREC somewhere, so it is 7+1 = 8, if I understand 
> it correctly.
> 
> Here is the relevant comment, quoted verbatum from around line 225 of 
> "src/include/Rinternals.h":
> 
> /* The generational collector uses a reduced version of SEXPREC as a
> header in vector nodes.  The layout MUST be kept consistent with
> the SEXPREC definition.  The standard SEXPREC takes up 7 words on
> most hardware; this reduced version should take up only 6 words.
> In addition to slightly reducing memory use, this can lead to more
> favorable data alignment on 32-bit architectures like the Intel
> Pentium III where odd word alignment of doubles is 
> allowed but much
> less efficient than even word alignment. */
> 
> Hin-Tak Leung
> 
> > On 12/9/05, Matthew Dowle <[EMAIL PROTECTED]> wrote:
> > 
> >>Hi,
> >>
> >>Please see below for post on r-help regarding data.frame() and the
> >>possibility of dropping rownames, for space and time reasons.
> >>I've made some changes, attached, and it seems to be 
> working well. I see the
> >>expected space (90% saved) and time (10 times faster) 
> savings. There are no
> >>doubt some bugs, and needs more work and testing, but I 
> thought I would post
> >>first at this stage.
> >>
> >>Could some changes along these lines be made to R ? I'm 
> happy to help with
> >>testing and further work if required. In the meantime I can 
> work with
> >>overloaded functions which fixes the problems in my case.
> >>
> >>Functions effected :
> >>
> >>  dim.data.frame
> >>  format.data.frame
> >>  print.data.frame
> >>  data.frame
> >>  [.data.frame
> >>  as.matrix.data.frame
> >>
> >>Modified source code attached.
> >>
> >>Regards,
> >>Matthew
> >>
> >>
> >>-Original Message-
> >>From: Matthew Dowle
> >>Sent: 09 December 2005 09:44
> >>To: 'Peter Dalgaard'
> >>Cc: 'r-help@stat.math.ethz.ch'
> >>Subject: RE: [R] data.frame() size
> >>
> >>
> >>
> >>That explains it. Thanks. I don't need rownames though, as I'll only ever
> >>use integer subscripts. Is there any way to drop them, or even better not
> >>create them in the first place? The memory saved (90%) by not having them
> >>and the 10 times speed up would be very useful. I think I need a data.frame
> >>rather than a matrix because I have columns of different types in real life.
> >>
> >>
> >>>rownames(d) = NULL
> >>
> >>Error in "dimnames<-.data.frame"(`*tmp*`, value = list(NULL, c("a", "b" :
> >>   invalid 'dimnames' given for data frame
> >>
> >>
> >>-Original Message-
> >>From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Peter Dalgaard
> >>Sent: 08 December 2005 18:57
> >>To: Matthew Dowle
> >>Cc: 'r-help@stat.math.ethz.ch'
> >>Subject: Re: [R] data.frame() size
> >>
> >>
> >>Matthew Dowle <[EMAIL PROTECTED]> writes:
> >>
> >>
> >>>Hi,
> >>>
> >>>In the example below why is d 10 times bigger than m, according to
> >>>object.size ? It also takes around 10 times as long to create, which
> >>>fits with object.size() being truthful.  gcinfo(TRUE) also indicates a
> >>>great deal more garbage collector activity caused by data.frame() than
> >>>matrix().
> >>>
> >>>$ R --vanilla
> >>>
> >>>
> >>>> nr = 100
> >>>> system.time(m<<-matrix(integer(1), nrow=nr, ncol=2))
> >>>[1] 0.22 0.01 0.23 0.00 0.00
> >>>
> >>>> system.time(d<<-data.frame(a=integer(nr), b=integer(nr)))
> >>>[1] 2.81 0.20 3.01 0.00 0.00  # 10 times longer
> >>>
> >>>> dim(m)
> >>>[1] 100   2
> >>>
> >>>> dim(d)
> >>>[1] 100   2   # same dimensions
> >>>
> >>>> storage.mode(m)
> >>>[1] "integer"
> >>>
> >>>> sapply(d, storage.mode)
> >>>
> >>>a b
>

[Rd] segfault following a detach

2005-12-09 Thread James Bullard
Hello, first off, thanks for all of the previous help; hopefully someone
will have some insight on this question. I am attempting to track down a
segmentation fault which occurs only after detach(2) is called in the
code. I have replaced detach(2) with detach(package:DSA) and that
fails as well; when I remove the detach calls entirely, it does
not segfault. It has proved difficult to track down (at least for me)
because it does not happen when the call is made: detach returns, and
then some time later (~ 30 seconds - 1 minute) a segmentation fault
occurs. I have run it in the debugger and the backtrace is below. When
I step through the code of do_detach, it does not appear to be happening
at any consistent location. I assume this means that some worker thread
is involved, but the backtrace is not helpful (at least to me).

1.) Can I make the backtrace after the segfault more informative?
2.) Can I set some breakpoints elsewhere which might be more instructive,
as I do not see much going on in do_detach? Suggestions?

The library I am working with is in C and uses Nag. It uses the
registration facilities, although I have the problem even when I do not
use them. Specifically, I have defined the function
void R_init_DSA(DllInfo *info). However, as I said, if I comment this out
it appears to behave identically.

Also, I have run the whole test case under valgrind to see if I could
track down the problem there (I assume I am trashing some of R's memory).
However, the only messages I get from valgrind are below, all related
to the registration code. It does not appear to segfault when I run it
under valgrind, but I have no idea why this would be the case as I am
*very* new to valgrind.

I am a little out of my league here so any help would be greatly 
appreciated. OS and R version information is below. Thanks as always for 
all of the help.

thanks, jim

 > R.version
 
platform i686-pc-linux-gnu
arch     i686
os       linux-gnu
system   i686, linux-gnu
status
major    2
minor    2.0
year     2005
month    10
day      06
svn rev  35749
language R


(gdb) backtrace
#0  0xb71655d0 in ?? ()
#1  0x0872fc70 in ?? ()
#2  0x0872fc58 in ?? ()
#3  0xb69b7ab8 in ?? ()
#4  0xb71654d5 in ?? ()
#5  0x in ?? ()
#6  0x in ?? ()
#7  0x4399ca09 in ?? ()
#8  0x in ?? ()
#9  0x in ?? ()
#10 0x in ?? ()
#11 0x0872fc18 in ?? ()
#12 0x08ee0fe0 in ?? ()
#13 0x in ?? ()
#14 0xb69c5c30 in __JCR_LIST__ () from /lib/tls/i686/cmov/libpthread.so.0
#15 0xb69b7b4c in ?? ()
#16 0xb69bcae0 in start_thread () from /lib/tls/i686/cmov/libpthread.so.0
#17 0xb69bcae0 in start_thread () from /lib/tls/i686/cmov/libpthread.so.0
#18 0xb7d09c9a in clone () from /lib/tls/i686/cmov/libc.so.6

--- valgrind output, after detach(.) is called ---
==20262== Conditional jump or move depends on uninitialised value(s)
==20262==at 0x1B92D888: R_getDLLRegisteredSymbol (Rdynload.c:665)
==20262==by 0x1B92D9C5: R_dlsym (Rdynload.c:735)
==20262==by 0x1B92D0BD: R_callDLLUnload (Rdynload.c:412)
==20262==by 0x1B92D15B: DeleteDLL (Rdynload.c:439)
==20262==
==20262== Conditional jump or move depends on uninitialised value(s)
==20262==at 0x1B92D8D2: R_getDLLRegisteredSymbol (Rdynload.c:681)
==20262==by 0x1B92D9C5: R_dlsym (Rdynload.c:735)
==20262==by 0x1B92D0BD: R_callDLLUnload (Rdynload.c:412)
==20262==by 0x1B92D15B: DeleteDLL (Rdynload.c:439)
==20262==
==20262== Conditional jump or move depends on uninitialised value(s)
==20262==at 0x1B92D8D7: R_getDLLRegisteredSymbol (Rdynload.c:681)
==20262==by 0x1B92D9C5: R_dlsym (Rdynload.c:735)
==20262==by 0x1B92D0BD: R_callDLLUnload (Rdynload.c:412)
==20262==by 0x1B92D15B: DeleteDLL (Rdynload.c:439)
==20262==
==20262== Conditional jump or move depends on uninitialised value(s)
==20262==at 0x1B92D8DB: R_getDLLRegisteredSymbol (Rdynload.c:696)
==20262==by 0x1B92D9C5: R_dlsym (Rdynload.c:735)
==20262==by 0x1B92D0BD: R_callDLLUnload (Rdynload.c:412)
==20262==by 0x1B92D15B: DeleteDLL (Rdynload.c:439)
==20262==
==20262== Conditional jump or move depends on uninitialised value(s)
==20262==at 0x1B92D8E0: R_getDLLRegisteredSymbol (Rdynload.c:696)
==20262==by 0x1B92D9C5: R_dlsym (Rdynload.c:735)
==20262==by 0x1B92D0BD: R_callDLLUnload (Rdynload.c:412)
==20262==by 0x1B92D15B: DeleteDLL (Rdynload.c:439)
==20262==
==20262== Conditional jump or move depends on uninitia

Re: [Rd] segfault following a detach

2005-12-09 Thread Duncan Temple Lang


Hi Jim,

 Can you send me a copy of the package and an R script
that causes the problem and I'll take a look at it.

 D.

James Bullard wrote:
> Hello, first off, thanks for all of the previous help; hopefully someone
> will have some insight on this question. I am attempting to track down a
> segmentation fault which occurs only after detach(2) is called in the
> code. I have replaced detach(2) with detach(package:DSA) and that
> fails as well; when I remove the detach calls entirely, it does
> not segfault. It has proved difficult to track down (at least for me)
> because it does not happen when the call is made: detach returns, and
> then some time later (~ 30 seconds - 1 minute) a segmentation fault
> occurs. I have run it in the debugger and the backtrace is below. When
> I step through the code of do_detach, it does not appear to be happening
> at any consistent location. I assume this means that some worker thread
> is involved, but the backtrace is not helpful (at least to me).
>
> 1.) Can I make the backtrace after the segfault more informative?
> 2.) Can I set some breakpoints elsewhere which might be more instructive,
> as I do not see much going on in do_detach? Suggestions?
>
> The library I am working with is in C and uses Nag. It uses the
> registration facilities, although I have the problem even when I do not
> use them. Specifically, I have defined the function
> void R_init_DSA(DllInfo *info). However, as I said, if I comment this out
> it appears to behave identically.
>
> Also, I have run the whole test case under valgrind to see if I could
> track down the problem there (I assume I am trashing some of R's memory).
> However, the only messages I get from valgrind are below, all related
> to the registration code. It does not appear to segfault when I run it
> under valgrind, but I have no idea why this would be the case as I am
> *very* new to valgrind.
> 
> I am a little out of my league here so any help would be greatly 
> appreciated. OS and R version information is below. Thanks as always for 
> all of the help.
> 
> thanks, jim
> 
>  > R.version
>  
> platform i686-pc-linux-gnu
> arch     i686
> os       linux-gnu
> system   i686, linux-gnu
> status
> major    2
> minor    2.0
> year     2005
> month    10
> day      06
> svn rev  35749
> language R
> 
> 
> (gdb) backtrace
> #0  0xb71655d0 in ?? ()
> #1  0x0872fc70 in ?? ()
> #2  0x0872fc58 in ?? ()
> #3  0xb69b7ab8 in ?? ()
> #4  0xb71654d5 in ?? ()
> #5  0x in ?? ()
> #6  0x in ?? ()
> #7  0x4399ca09 in ?? ()
> #8  0x in ?? ()
> #9  0x in ?? ()
> #10 0x in ?? ()
> #11 0x0872fc18 in ?? ()
> #12 0x08ee0fe0 in ?? ()
> #13 0x in ?? ()
> #14 0xb69c5c30 in __JCR_LIST__ () from /lib/tls/i686/cmov/libpthread.so.0
> #15 0xb69b7b4c in ?? ()
> #16 0xb69bcae0 in start_thread () from /lib/tls/i686/cmov/libpthread.so.0
> #17 0xb69bcae0 in start_thread () from /lib/tls/i686/cmov/libpthread.so.0
> #18 0xb7d09c9a in clone () from /lib/tls/i686/cmov/libc.so.6
> 
> --- valgrind output, after detach(.) is called ---
> ==20262== Conditional jump or move depends on uninitialised value(s)
> ==20262==at 0x1B92D888: R_getDLLRegisteredSymbol (Rdynload.c:665)
> ==20262==by 0x1B92D9C5: R_dlsym (Rdynload.c:735)
> ==20262==by 0x1B92D0BD: R_callDLLUnload (Rdynload.c:412)
> ==20262==by 0x1B92D15B: DeleteDLL (Rdynload.c:439)
> ==20262==
> ==20262== Conditional jump or move depends on uninitialised value(s)
> ==20262==at 0x1B92D8D2: R_getDLLRegisteredSymbol (Rdynload.c:681)
> ==20262==by 0x1B92D9C5: R_dlsym (Rdynload.c:735)
> ==20262==by 0x1B92D0BD: R_callDLLUnload (Rdynload.c:412)
> ==20262==by 0x1B92D15B: DeleteDLL (Rdynload.c:439)
> ==20262==
> ==20262== Conditional jump or move depends on uninitialised value(s)
> ==20262==at 0x1B92D8D7: R_getDLLRegisteredSymbol (Rdynload.c:681)
> ==20262==by 0x1B92D9C5: R_dlsym (Rdynload.c:735)
> ==20262==by 0x1B92D0BD: R_callDLLUnload (Rdynload.c:412)
> ==20262==by 0x1B92D15B: DeleteDLL (Rdynload.c:439)
> ==20262==
> ==20262== Conditional jump or move depends on uninitialised value(s)
> ==20262==at 0x1B92D8DB: R_getDLLRegisteredSymbol (Rdynload.c:696)
> ==20262==by 0x1B92D9C5: R_dlsym (Rdynload.c:735)
> ==20262==by 0x1B92D0BD: R_callDLLUnload (Rdynload.c:412)
> ==20262==by 0x1B92D15B: DeleteDLL (Rdynl

[Rd] an Update on the "Woods" package--classification and constrained L1 regression for binary response

2005-12-09 Thread Izmirlian, Grant \(NIH/NCI\) [E]
Hello R-devel:

This is an update on my R package, "woods", which does bagged classification trees
using data structures in C. Most of the comments of my earlier post still
apply, with some additions (marked *):

   (i) fits a single classification tree to a dataset (R function CT)
  (ii) basic functionality of Random Forest, e.g. bagged trees with choices
       about sample size, with/without replacement, size of (random) subset
       of covariates drawn when nodes are split.  Result contains the oob
       votes, and a matrix representing the forest structure.
*(iii) for each element of the sample, discovers all unique paths from a root
       node to a terminal node as a sequence of splits on covariates and uses
       these to fit a lasso regression to the binary response using a full
       C implementation of the Turlach lasso2 function gl1ce.

It is now available at http://mysite.verizon.net/izmirlian/woods_1.00.tar.gz


Grant Izmirlian
NCI

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] weights in nls

2005-12-09 Thread Patrick Burns
It would probably be more polite to give a warning
in 'nls' that the 'weights' argument is ignored.  Something
like the following should do:

if(!missing(weights)) warning("weights are not currently implemented")

 > version
 _  
platform i386-pc-mingw32
arch     i386
os       mingw32
system   i386, mingw32
status   Under development (unstable)
major    2
minor    3.0
year     2005
month    12
day      07
svn rev  36656
language R


Patrick Burns
[EMAIL PROTECTED]
+44 (0)20 8525 0696
http://www.burns-stat.com
(home of S Poetry and "A Guide for the Unwilling S User")



[Rd] external pointers

2005-12-09 Thread Ross Boylan
I have some C data I want to pass back to R opaquely, and then back to
C.  I understand external pointers are the way to do so.

I'm trying to find how they interact with garbage collection and object
lifetime, and what I need to do so that the memory lives until the
calling R process ends.

Could anyone give me some pointers?  I haven't found much documentation.
An earlier message suggested looking at simpleref.nw, but I can't find
that file.

So the overall pattern, from R, would look like
opaque <- setup(arg1, arg2, )  # setup calls a C fn
docompute(arg1, argb, opaque)  # many times. docompute also calls C
# and then, when I return, opaque and the memory it's wrapping get
# cleaned up.  If necessary I could do
teardown(opaque)  # at the end

"C" is actually C++ via a C interface, if that matters.  In particular,
the memory allocated will likely be from the C++ run-time, and needs C++
destructors.

-- 
Ross Boylan  wk:  (415) 514-8146
185 Berry St #5700   [EMAIL PROTECTED]
Dept of Epidemiology and Biostatistics   fax: (415) 514-8150
University of California, San Francisco
San Francisco, CA 94107-1739 hm:  (415) 550-1062



Re: [Rd] external pointers

2005-12-09 Thread Byron Ellis
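Use a C finalizer...

/* Called by R's garbage collector once the external pointer is unreachable. */
void MyObject_finalize(SEXP opaque) {
    MyObject *obj = (MyObject*)R_ExternalPtrAddr(opaque);
    if(NULL != obj) {
        delete obj;                  /* runs the C++ destructor */
        R_ClearExternalPtr(opaque);  /* guards against a second delete */
    }
}

and in your setup code...

PROTECT(p = R_MakeExternalPtr(...));
R_RegisterCFinalizer(p,MyObject_finalize);
/* ... with a matching UNPROTECT(1) before returning p */
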
On Dec 9, 2005, at 3:04 PM, Ross Boylan wrote:

> I have some C data I want to pass back to R opaquely, and then back to
> C.  I understand external pointers are the way to do so.
>
> I'm trying to find how they interact with garbage collection and object
> lifetime, and what I need to do so that the memory lives until the
> calling R process ends.
>
> Could anyone give me some pointers?  I haven't found much documentation.
> An earlier message suggested looking at simpleref.nw, but I can't find
> that file.
>
> So the overall pattern, from R, would look like
> opaque <- setup(arg1, arg2, )  # setup calls a C fn
> docompute(arg1, argb, opaque)  # many times. docompute also calls C
> # and then, when I return, opaque and the memory it's wrapping get
> # cleaned up.  If necessary I could do
> teardown(opaque)  # at the end
>
> "C" is actually C++ via a C interface, if that matters.  In particular,
> the memory allocated will likely be from the C++ run-time, and needs C++
> destructors.
>
> -- 
> Ross Boylan  wk:  (415) 514-8146
> 185 Berry St #5700   [EMAIL PROTECTED]
> Dept of Epidemiology and Biostatistics   fax: (415) 514-8150
> University of California, San Francisco
> San Francisco, CA 94107-1739 hm:  (415) 550-1062
>

---
Byron Ellis ([EMAIL PROTECTED])
"Oook" -- The Librarian



[Rd] segfault following a detach

2005-12-09 Thread Izmirlian, Grant \(NIH/NCI\) [E]
Jim:

This reminds me of problems I've had before, but usually they occur when I quit
R, i.e. q(), because when testing and developing I can't remember ever actually
detaching a package. I can, however, think of countless times I've gotten a
segmentation fault upon quitting R. Usually this boils down to a hidden return
argument that is given an insufficient allocation of memory. For example:

  "foo" <- function(x, y, z){
     nx <- length(x)
     ny <- length(y)
     nz <- length(z)
     ans <- .C("bar",
               x = as.double(x),
               y = as.double(y),
               z = as.double(z),
               res1 = as.double(rep(0, nx)),
               res2 = as.double(rep(0, nx*ny)),  # too small if bar writes nx*ny*nz values
               PACKAGE = "FooBar")
     list(result = ans$res1)
  }

Notice that only ans$res1 is returned, so it is easy to forget about ans$res2,
as I have often done! Now suppose that the C routine actually needs space for
nx*ny*nz doubles (say) at the pointer passed as res2, instead of just the nx*ny
provided. Although you would expect a segmentation fault at runtime, it is my
experience that sometimes the function completes and the segmentation fault
doesn't happen until I quit R.
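
One way to guard against the undersized res2 described above is to have the C routine receive the allocated length and refuse to write when it is too small. The following is only a sketch under assumptions: the name bar_checked, the extra res2_len and status arguments, and the toy computation (products of x, y and z) are all hypothetical and not taken from any real package.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical .C-style routine: every argument arrives as a pointer.
 * res2_len is the number of doubles the caller really allocated for res2.
 * status is set to 0 on success and to 1 if the lengths are invalid or
 * res2 is too small; in the failure case nothing is written to res1/res2. */
void bar_checked(const double *x, const double *y, const double *z,
                 const int *nx, const int *ny, const int *nz,
                 double *res1, double *res2, const int *res2_len,
                 int *status)
{
    assert(x && y && z && nx && ny && nz && res1 && res2 && res2_len && status);

    const int n_x = *nx, n_y = *ny, n_z = *nz;
    if (n_x < 0 || n_y < 0 || n_z < 0 || *res2_len < 0) {
        *status = 1;
        return;
    }

    /* Space the routine is about to use, computed without int overflow. */
    const size_t needed = (size_t)n_x * (size_t)n_y * (size_t)n_z;
    if ((size_t)*res2_len < needed) {
        *status = 1;
        return;
    }

    for (int i = 0; i < n_x; i++) {
        res1[i] = 0.0;
        for (int j = 0; j < n_y; j++) {
            for (int k = 0; k < n_z; k++) {
                const double v = x[i] * y[j] * z[k];
                const size_t idx =
                    ((size_t)i * (size_t)n_y + (size_t)j) * (size_t)n_z + (size_t)k;
                res2[idx] = v;      /* always within the checked capacity */
                res1[i] += v;
            }
        }
    }
    *status = 0;
}
```

On the R side, the wrapper would then allocate res2 as double(nx*ny*nz), pass res2_len = as.integer(nx*ny*nz), and stop with an error if the returned status is non-zero, so a size mismatch shows up immediately rather than at q().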

I hope that these comments are helpful,

Grant Izmirlian,
NCI



Re: [Rd] segfault following a detach

2005-12-09 Thread Byron Ellis
That sounds like your C function is smashing some of the header  
information in a chunk of memory somewhere past res2 so that cleanup  
during quit fails.
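
A rough way to see this kind of overrun while testing, sketched in C under assumptions: place sentinel values directly after the space a routine is supposed to use, then count how many were overwritten. The names fill_ones, count_clobbered_guards and GUARD_VALUE are hypothetical, and the guard region here is deliberately allocated as part of the same block so that the demonstration itself never writes outside its own allocation.

```c
#include <assert.h>
#include <stdlib.h>

/* Sentinel written into the guard region; any other value found there
 * afterwards means the routine wrote past the space it was given. */
#define GUARD_VALUE (-12345.678)

typedef void (*fill_fn)(double *out, int n_written);

/* Hypothetical stand-in for a routine that writes n_written doubles,
 * whether or not the caller meant to provide that much space. */
void fill_ones(double *out, int n_written)
{
    for (int i = 0; i < n_written; i++)
        out[i] = 1.0;
}

/* Allocate n_provided doubles followed by n_guard sentinel doubles, run the
 * routine, and return how many sentinels were overwritten (-1 if allocation
 * fails). The guard must be large enough to absorb the overrun, so every
 * write stays inside the single allocation made here. */
int count_clobbered_guards(fill_fn routine, int n_provided,
                           int n_written, int n_guard)
{
    assert(n_provided >= 0 && n_written >= 0 && n_guard >= 0);
    assert(n_written <= n_provided + n_guard);

    size_t total = (size_t)n_provided + (size_t)n_guard;
    double *buf = malloc((total ? total : 1) * sizeof *buf);
    if (buf == NULL)
        return -1;

    for (int i = 0; i < n_guard; i++)
        buf[n_provided + i] = GUARD_VALUE;

    routine(buf, n_written);

    int clobbered = 0;
    for (int i = 0; i < n_guard; i++)
        if (buf[n_provided + i] != GUARD_VALUE)
            clobbered++;

    free(buf);
    return clobbered;
}
```

A non-zero count means the routine wrote beyond what it was given. The check can miss an overrun that happens to write the sentinel value itself, so it is a testing aid, not a proof of correctness.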

On Dec 9, 2005, at 4:28 PM, Izmirlian, Grant (NIH/NCI) [E] wrote:

> Jim:
>
> This reminds me of problems I've had before, but usually they occur when I quit
> R, i.e. q(), because when testing and developing I can't remember ever actually
> detaching a package. I can, however, think of countless times I've gotten a
> segmentation fault upon quitting R. Usually this boils down to a hidden return
> argument that is given an insufficient allocation of memory. For example:
>
>   "foo" <- function(x, y, z){
>      nx <- length(x)
>      ny <- length(y)
>      nz <- length(z)
>      ans <- .C("bar",
>                x = as.double(x),
>                y = as.double(y),
>                z = as.double(z),
>                res1 = as.double(rep(0, nx)),
>                res2 = as.double(rep(0, nx*ny)),
>                PACKAGE = "FooBar")
>      list(result = ans$res1)
>   }
>
> Notice that only ans$res1 is returned, so it is easy to forget about ans$res2,
> as I have often done! Now suppose that the C routine actually needs space for
> nx*ny*nz doubles (say) at the pointer passed as res2, instead of just the nx*ny
> provided. Although you would expect a segmentation fault at runtime, it is my
> experience that sometimes the function completes and the segmentation fault
> doesn't happen until I quit R.
>
> I hope that these comments are helpful,
>
> Grant Izmirlian,
> NCI
>

---
Byron Ellis ([EMAIL PROTECTED])
"Oook" -- The Librarian
