[Rd] generic sample
Hi, When x is a data frame, sample(x) is not really sensible : R> sample(iris) # sends back all the iris data set. What about a generic sample function (from R-devel/src/library/base/R) with a method for class `data.frame` that would sample the row indexes and then do the subset. See the code below. Cheers, Romain sample <- function(x, ...) UseMethod("sample") sample.data.frame <- function(x, ...){ x[ sample(1:nrow(x), ...), ] } sample.default <- function(x, size, replace=FALSE, prob=NULL) { if(length(x) == 1 && x >= 1) { if(missing(size)) size <- x .Internal(sample(x, size, replace, prob)) } else { if(missing(size)) size <- length(x) x[.Internal(sample(length(x), size, replace, prob))] } } R> set.seed(4) R> sample(iris, 5) Sepal.Length Sepal.Width Petal.Length Petal.WidthSpecies 88 6.3 2.3 4.4 1.3 versicolor 24.9 3.0 1.4 0.2 setosa 44 5.0 3.5 1.6 0.6 setosa 41 5.0 3.5 1.3 0.3 setosa 119 7.7 2.6 6.9 2.3 virginica -- *mangosolutions* /data analysis that delivers/ Tel +44 1249 467 467 Fax +44 1249 467 468 __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] Retrieving function name
Hi, Does anyone know how I can retrieve a function name, for example If I have a function f as follows: f <- function( myfunc ) { print( name_of(myfunc) ); } I want to know what I should have as "name_of" such that I could call this with : f( median ) and it would print "median" or f( my_function_name ) and it would print "m_function_name". So far all I can get is the function definition that myfunc points to. I thought the structure command might do it but this also just gives back the function definition and not the original name. Any suggestions? Tom -- Dr. Thomas McCallum Systems Architect, Level E Limited ETTC, The King's Buildings Mayfield Road, Edinburgh EH9 3JL, UK Work +44 (0) 131 472 4813 Fax: +44 (0) 131 472 4719 http://www.levelelimited.com Email: [EMAIL PROTECTED] Level E is a limited company incorporated in Scotland. The contents of this e-mail are privileged and/or confidential. If you are not the intended recipient, please notify the sender and ensure this e-mail is deleted and not read, copied or disclosed. It is your responsibility to scan this e-mail and any attachments for computer viruses or other defects. Level E does not accept liability for any loss or damage which may result from this e-mail or any attachment. E-mail is not secure and can be intercepted, corrupted or amended. Level E does not accept liability for errors or omissions arising as a result of interrupted or defective transmission. Any views, opinions, conclusions or other information in this e-mail which do not relate to the business of Level E Limited are not authorised by Level E. Unless specifically stated and authorised by Level E, nothing in this e-mail shall be taken to be an offer or acceptance of any contract of any nature. E-mail entering or leaving Level E's system is subject to random monitoring and recording. __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Retrieving function name
probably you're looking for: f <- function(FUN, x){ list(funName = deparse(substitute(FUN)), value = FUN(x)) } f(mean, rnorm(10)) f(median, rnorm(10)) I hope it helps. Best, Dimitris Dimitris Rizopoulos Ph.D. Student Biostatistical Centre School of Public Health Catholic University of Leuven Address: Kapucijnenvoer 35, Leuven, Belgium Tel: +32/(0)16/336899 Fax: +32/(0)16/337015 Web: http://med.kuleuven.be/biostat/ http://www.student.kuleuven.be/~m0390867/dimitris.htm - Original Message - From: "Tom McCallum" <[EMAIL PROTECTED]> To: Sent: Thursday, November 09, 2006 4:28 PM Subject: [Rd] Retrieving function name > Hi, > > Does anyone know how I can retrieve a function name, for example > > If I have a function f as follows: > > f <- function( myfunc ) { > print( name_of(myfunc) ); > } > > I want to know what I should have as "name_of" such that I could > call this > with : > f( median ) > and it would print "median" > > or f( my_function_name ) and it would print "m_function_name". > > So far all I can get is the function definition that myfunc points > to. > > I thought the structure command might do it but this also just gives > back > the function definition and not the original name. > > Any suggestions? > > Tom > > -- > Dr. Thomas McCallum > Systems Architect, > Level E Limited > ETTC, The King's Buildings > Mayfield Road, > Edinburgh EH9 3JL, UK > Work +44 (0) 131 472 4813 > Fax: +44 (0) 131 472 4719 > http://www.levelelimited.com > Email: [EMAIL PROTECTED] > > Level E is a limited company incorporated in Scotland. The contents > of > this e-mail are privileged and/or confidential. If you are not the > intended recipient, please > notify the sender and ensure this e-mail is deleted and not read, > copied > or disclosed. It is your responsibility to scan this e-mail and any > attachments for > computer viruses or other defects. Level E does not accept liability > for > any loss or damage which may result from this e-mail or any > attachment. > E-mail is not secure > and can be intercepted, corrupted or amended. Level E does not > accept > liability for errors or omissions arising as a result of interrupted > or > defective transmission. > Any views, opinions, conclusions or other information in this e-mail > which > do not relate to the business of Level E Limited are not authorised > by > Level E. Unless > specifically stated and authorised by Level E, nothing in this > e-mail > shall be taken to be an offer or acceptance of any contract of any > nature. > E-mail entering or leaving Level E's system is subject to random > monitoring and recording. > > __ > R-devel@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-devel > Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Retrieving function name
"Tom McCallum" <[EMAIL PROTECTED]> writes: > Hi, > > Does anyone know how I can retrieve a function name, for example > > If I have a function f as follows: > > f <- function( myfunc ) { > print( name_of(myfunc) ); > } > > I want to know what I should have as "name_of" such that I could call this > with : > f( median ) > and it would print "median" > > or f( my_function_name ) and it would print "m_function_name". > > So far all I can get is the function definition that myfunc points to. > > I thought the structure command might do it but this also just gives back > the function definition and not the original name. > > Any suggestions? Depending on what you really want, this is either impossible or trivial. The trivial version is f <- function(x) print(deparse(substitute(x))) and the impossible one is to get something that prints "mean" if you do something like x<-1 f(switch(x, 1 = mean, 2 = median, 3 = sd, 4 = IQR)) or g <- function(foo) f(foo) g(mean) or indeed does anything sensible with f(function(x,y) x*y) Thing is, functions do not "have names", they can be anonymous or assigned to multiple names, or be passed as arguments. -- O__ Peter Dalgaard Øster Farimagsgade 5, Entr.B c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K (*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918 ~~ - ([EMAIL PROTECTED]) FAX: (+45) 35327907 __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] allocVector bug ?
On Wed, 8 Nov 2006, Vladimir Dergachev wrote: > On Wednesday 08 November 2006 12:56 pm, Luke Tierney wrote: >> On Mon, 6 Nov 2006, Vladimir Dergachev wrote: >>> Hi Luke, >>> >>> >>> I generally agree with this, however I believe that current logic breaks >>> down for large allocation sizes and my code ends up spending 70% (and up) >>> of computer time spinning inside garbage collector (I run oprofile to >>> observe what is going on). >> >> Again please be careful about these sorts of statements. I am sure >> there are bugs in the memory manager and places where things "break >> down" but this isn't one of them. The memory manager is quite >> deliberately biased towards keeping the total allocation low, if >> necessary at the expense of some extra gc overhead. This is needed if >> we want to use the same settings across a wide range of >> configurations, some of which have relatively little memory available >> (think student labs). The memory manager does try to learn about the >> needs of a session, and as a result triggering value get adjusted. It >> is not true that every large allocation causes a gc. This may be true >> _initially_, but once total memory usage stabilizes at a particular >> level it is no longer true (look at the way the heap limits are >> adjusted). >> >> This approach of adjusting based on usage within a session is >> reasonable and works well for longer sessions. It may not work well >> for short scripts that need large allocations. I doubt that any >> automated setting can work well in that situation while at the same >> time keeping memory usage in other settings low. So it may be useful >> to find ways of specifying a collection strategy appropriate for these >> situations. If you can send me a simplified version of your usage >> scenario then I will give this some thought and see if we can come up >> with some reasonable ways of allowing user code to tweak gc behavior >> for these situations. >> > > Hi Luke, > > Yes, I gladly concede the point that for a heuristic algorithm the notion > of what is a "bug" is murky (besides crashes, etc, which is not what I am not > talking about). > > Here is why I called this a bug: > > 1. My understanding is that each time gc() needs to increase memory it > performs a full garbage collection run. Right ? The allocation process does not call gc before every call to malloc. It only calls gc if the allocation would cross a threshold level. Those theshold levels are adjusted in an effort to compromise between keeping memory footprint low and not calling gc too often. The code you quote below is part of this adjustment process. If this process is working properly then as memory use grows there will initially be more gc activity and then less as the thresholds adjust. The command line arguments mentioned in ?Memory can be used to influence some of this behavior. Calling gcinfo(TRUE) will show you when the internally triggered collections occur, and which level of collection occurs. > > 2. This is not a problem with small memory sizes as they imply > (presumably) small number of objects. > > 3. However, if one wants to allocate many objects (say columns in a data > frame or just vectors) this results in large penalty > > Example 1: This simulates allocation of a data.frame with some character > columns which are assumed to be factors. On my system first assignment is > nearly instantaneous, why subsequent assignments take slightly less than 0.1 > seconds each. I'm not sure these are quite doing what you intend. You define Chars but don't use it. Also, system.time by default calls gc() before doing the evaluation. Giving FALSE as the second argument may give you a more realistic picture. > > L<-list() > Chars<-as.character(1:10) > for(i in 1:100)L[[i]]<-system.time(assign(paste("test", i), 1:100)) > Times<-do.call(rbind, L) > > Example 2: Same as example 1 but we first grow the memory with fake > allocation: > > L<-list() > Chars<-as.character(1:10) > Data<-1:1 > rm(Data) > for(i in 1:100)L[[i]]<-system.time(assign(paste("test", i), 1:100)) > Times<-do.call(rbind, L) > > In this case the first 20 or so allocations are very quck (faster than 0.02 > sec) and then garbage collector kicks in and the time rises to 0.08 seconds > each - still less than in Example 1. > > This example is relevant because this sequence of allocations is exactly what > happens when one uses read.table or scan (or database query) to load data. > > What is more, if the user then manipulates the loaded data by creating columns > that are a combination of existing ones then this is very slow as well. > > I looked more carefully at your code in src/main/memory.c, function > AdjustHeapSize: > > R_VSize = VNeeded; >if (vect_occup > R_VGrowFrac) { > R_size_t change = R_VGrowIncrMin + R_VGrowIncrFrac * R_NSize; > if (R_MaxVSize - R_VSize >= change) > R_VSize += change; >} > > Could it be that R_NSize
Re: [Rd] allocVector bug ?
On Thursday 09 November 2006 12:21 pm, Luke Tierney wrote: > On Wed, 8 Nov 2006, Vladimir Dergachev wrote: > > On Wednesday 08 November 2006 12:56 pm, Luke Tierney wrote: > >> On Mon, 6 Nov 2006, Vladimir Dergachev wrote: > > > > Hi Luke, > > > > Yes, I gladly concede the point that for a heuristic algorithm the > > notion of what is a "bug" is murky (besides crashes, etc, which is not > > what I am not talking about). > > > > Here is why I called this a bug: > > > > 1. My understanding is that each time gc() needs to increase memory > > it performs a full garbage collection run. Right ? > > The allocation process does not call gc before every call to malloc. > It only calls gc if the allocation would cross a threshold level. > Those theshold levels are adjusted in an effort to compromise between > keeping memory footprint low and not calling gc too often. The code > you quote below is part of this adjustment process. If this process > is working properly then as memory use grows there will initially be > more gc activity and then less as the thresholds adjust. Well, I was seeing it call gc for every large vector. This probably happens be only for those larger than R_VGrowIncrFrac * R_NSize. On my system R_NSize is never more than 1e6 so this would explain the problems when using 1e6 (and larger) vectors. > > > 2. This is not a problem with small memory sizes as they imply > > (presumably) small number of objects. > > > > 3. However, if one wants to allocate many objects (say columns in a > > data frame or just vectors) this results in large penalty > > > > Example 1: This simulates allocation of a data.frame with some character > > columns which are assumed to be factors. On my system first assignment is > > nearly instantaneous, why subsequent assignments take slightly less than > > 0.1 seconds each. > > I'm not sure these are quite doing what you intend. You define Chars > but don't use it. Also, system.time by default calls gc() before > doing the evaluation. Giving FALSE as the second argument may give you > a more realistic picture. The Chars are defined to create lots of ncells and make gc() run time more realistic. It also mimics having a data.frame with a few factor columns. As for system.time - thank you, I missed that ! Setting gcFirst=FALSE changes behavior in the first example to be 2 times faster and makes all the allocations in the second example faster. I guess that extra call to gc() caused R_VSize to shrink too fast. > > I looked more carefully at your code in src/main/memory.c, function > > AdjustHeapSize: > > > > R_VSize = VNeeded; > >if (vect_occup > R_VGrowFrac) { > > R_size_t change = R_VGrowIncrMin + R_VGrowIncrFrac * R_NSize; > > if (R_MaxVSize - R_VSize >= change) > > R_VSize += change; > >} > > > > Could it be that R_NSize should be R_VSize ? This would explain why I see > > a problem in case R_VSize>>R_NSize. > > That does indeed look like a bug and that R_NSize should be R_VSize -- > well spotted, thanks. I will need to experiment with this a bit more > to see if it can safely be changed. It will increase the memory > footprint a bit. Probaly not by enough to matter but if it does we > may need to adjust some of the tuning constants. > Would there be something I can help you with ? Is there a script to run through common usage patterns ? thank you ! Vladimir Dergachev > Best, > > luke > __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] Failing to install R-2.4.0 on FC4
Hi everyone, Firstly, I'm sorry for the cross-post. I re-read the posting guide and it appears my question is more related to r-devel (i think :-) ) I downloaded the source code available at: http://cran.fhcrc.org/src/base/R-2/R-2.4.0.tar.gz to a linux machine (Linux 2.6.11-1.1369_FC4smp #1 SMP). I successfully configured and compiled it, which means I'm able to run it from the R-2.4.0/bin directory. I want to do a system wide installation (via 'sudo make install') and help2man fails, ie, when I execute the installation command mentioned previously I get: make[1]: Entering directory `/home/bcarvalh/R-2.4.0/m4' make[1]: Nothing to be done for `install'. make[1]: Leaving directory `/home/bcarvalh/R-2.4.0/m4' make[1]: Entering directory `/home/bcarvalh/R-2.4.0/tools' make[1]: Nothing to be done for `install'. make[1]: Leaving directory `/home/bcarvalh/R-2.4.0/tools' make[1]: Entering directory `/home/bcarvalh/R-2.4.0/doc' installing doc ... help2man: can't get `--version' info from ../bin/R make[1]: *** [R.1] Error 255 make[1]: Leaving directory `/home/bcarvalh/R-2.4.0/doc' make: *** [install] Error 1 Searching the archive, I did find a similar report (http:// tolstoy.newcastle.edu.au/R/devel/06/01/3699.html) but the proposed solution did not work for me. Any suggestion? Thank you very much, Benilton Carvalho PhD Candidate Department of Biostatistics Johns Hopkins University __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Failing to install R-2.4.0 on FC4
Benilton Carvalho <[EMAIL PROTECTED]> writes: > Hi everyone, > > Firstly, I'm sorry for the cross-post. I re-read the posting guide > and it appears my question is more related to r-devel (i think :-) ) > > I downloaded the source code available at: > > http://cran.fhcrc.org/src/base/R-2/R-2.4.0.tar.gz > > to a linux machine (Linux 2.6.11-1.1369_FC4smp #1 SMP). > > I successfully configured and compiled it, which means I'm able to > run it from the R-2.4.0/bin directory. > > I want to do a system wide installation (via 'sudo make install') and > help2man fails, ie, when I execute the installation command mentioned > previously I get: > > make[1]: Entering directory `/home/bcarvalh/R-2.4.0/m4' > make[1]: Nothing to be done for `install'. > make[1]: Leaving directory `/home/bcarvalh/R-2.4.0/m4' > make[1]: Entering directory `/home/bcarvalh/R-2.4.0/tools' > make[1]: Nothing to be done for `install'. > make[1]: Leaving directory `/home/bcarvalh/R-2.4.0/tools' > make[1]: Entering directory `/home/bcarvalh/R-2.4.0/doc' > installing doc ... > help2man: can't get `--version' info from ../bin/R > make[1]: *** [R.1] Error 255 > make[1]: Leaving directory `/home/bcarvalh/R-2.4.0/doc' > make: *** [install] Error 1 > > Searching the archive, I did find a similar report (http:// > tolstoy.newcastle.edu.au/R/devel/06/01/3699.html) but the proposed > solution did not work for me. > > Any suggestion? Well if nothing else works, there's a Fedora Extra RPM for R-2.4.0... http://fedoraproject.org/extras/4/i386/repodata/repoview/R-0-2.4.0-2.fc4.1.html The symptoms suggest that you can run ../bin/R, but root cannot, so check permissions. -- O__ Peter Dalgaard Øster Farimagsgade 5, Entr.B c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K (*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918 ~~ - ([EMAIL PROTECTED]) FAX: (+45) 35327907 __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] invert argument in grep
Hello, What about an `invert` argument in grep, to return elements that are *not* matching a regular expression : R> grep("pink", colors(), invert = TRUE, value = TRUE) would essentially return the same as : R> colors() [ - grep("pink", colors()) ] I'm attaching the files that I modified (against today's tarball) for that purpose. Cheers, Romain -- *mangosolutions* /data analysis that delivers/ Tel +44 1249 467 467 Fax +44 1249 467 468 grep <- function(pattern, x, ignore.case = FALSE, extended = TRUE, perl = FALSE, value = FALSE, fixed = FALSE, useBytes = FALSE, invert = FALSE) { pattern <- as.character(pattern) ## when value = TRUE we return names if(!is.character(x)) x <- structure(as.character(x), names=names(x)) ## behaves like == for NA pattern if (is.na(pattern)) { if(value) return(structure(rep.int(as.character(NA), length(x)), names = names(x))) else return(rep.int(NA, length(x))) } if(perl) .Internal(grep.perl(pattern, x, ignore.case, value, useBytes, invert)) else .Internal(grep(pattern, x, ignore.case, extended, value, fixed, useBytes, invert)) } sub <- function(pattern, replacement, x, ignore.case = FALSE, extended = TRUE, perl = FALSE, fixed = FALSE, useBytes = FALSE) { pattern <- as.character(pattern) replacement <- as.character(replacement) if(!is.character(x)) x <- as.character(x) if (is.na(pattern)) return(rep.int(as.character(NA), length(x))) if(perl) .Internal(sub.perl(pattern, replacement, x, ignore.case, useBytes)) else .Internal(sub(pattern, replacement, x, ignore.case, extended, fixed, useBytes)) } gsub <- function(pattern, replacement, x, ignore.case = FALSE, extended = TRUE, perl = FALSE, fixed = FALSE, useBytes = FALSE) { pattern <- as.character(pattern) replacement <- as.character(replacement) if(!is.character(x)) x <- as.character(x) if (is.na(pattern)) return(rep.int(as.character(NA), length(x))) if(perl) .Internal(gsub.perl(pattern, replacement, x, ignore.case, useBytes)) else .Internal(gsub(pattern, replacement, x, ignore.case, extended, fixed, useBytes)) } regexpr <- function(pattern, text, extended = TRUE, perl = FALSE, fixed = FALSE, useBytes = FALSE) { pattern <- as.character(pattern) text <- as.character(text) if(perl) .Internal(regexpr.perl(pattern, text, useBytes)) else .Internal(regexpr(pattern, text, extended, fixed, useBytes)) } gregexpr <- function(pattern, text, extended = TRUE, perl = FALSE, fixed = FALSE, useBytes = FALSE) { pattern <- as.character(pattern) text <- as.character(text) if(perl) .Internal(gregexpr.perl(pattern, text, useBytes)) else .Internal(gregexpr(pattern, text, extended, fixed, useBytes)) } agrep <- function(pattern, x, ignore.case = FALSE, value = FALSE, max.distance = 0.1) { pattern <- as.character(pattern) if(!is.character(x)) x <- as.character(x) ## behaves like == for NA pattern if (is.na(pattern)){ if (value) return(structure(rep.int(as.character(NA), length(x)), names = names(x))) else return(rep.int(NA, length(x))) } if(!is.character(pattern) || (length(pattern) < 1) || ((n <- nchar(pattern)) == 0)) stop("'pattern' must be a non-empty character string") if(!is.list(max.distance)) { if(!is.numeric(max.distance) || (max.distance < 0)) stop("'max.distance' must be non-negative") if(max.distance < 1)# transform percentages max.distance <- ceiling(n * max.distance) max.insertions <- max.deletions <- max.substitutions <- max.distance } else { ## partial matching table <- c("all", "deletions", "insertions", "substitutions") ind <- pmatch(names(max.distance), table) if(any(is.na(ind))) warning("unknown match distance components ignored") max.distance <- max.distance[!is.na(ind)] names(max.distance) <- table[ind] ## sanity checks comps <- unlist(max.distance) if(!all(is.numeric(comps)) || any(comps < 0)) stop("'max.distance' components must be non-negative") ## extract restrictions if(is.null(max.distance$all)) max.distance$all <- 0.1 max.insertions <- max.deletions <- max.substitutions <- max.distance$all if(!is.null(max.distance$deletions)) max.deletions <- max.distance$deletions if(!is.null(max.distance$insertions)) max.insertions <- max.distance$insertions if(!is.null(max.distance$substitutions)) max.substitutions <- max.distance$su