Re: [Rd] R extension memory leak detection question
On 3/12/21 7:43 PM, xiaoyan yu wrote:
> I am writing a C++ program based on R extensions and am also trying to test
> the program with the Google address sanitizer.
>
> I thought that if I don't protect the variable returned by an allocation API
> such as Rf_allocVector, there would be a memory leak. However, the address
> sanitizer didn't report it. Is my understanding correct? Or will I see the
> memory leak only if I compile the R source code with the address sanitizer?

Yes, you should use special options for compilation and linking to use address sanitizer. See Writing R Extensions, section 4.3.3.

If you allocate an R object using Rf_allocVector(), but don't protect it, it means this object is available for the garbage collector to reclaim. So it is not a memory leak.

Memory leaks with a garbage collector are much less common than without, because if the program loses a pointer to some piece of memory, that piece will automatically be reclaimed (not leaked). Still, memory leaks are possible if the program forgets about a pointer to some piece of memory that is no longer needed, and keeps that pointer in, say, some global structure. Such memory leaks would not be found using address sanitizer.

Address sanitizer/Undefined behavior sanitizer can sometimes find errors caused by the program forgetting to protect an R object, but this is relatively rare, as they don't understand the R heap specifically, so you cannot assume that if you create such an example, the error will always be found.

Best
Tomas

> Please help!
>
> Thanks,
> Xiaoyan

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
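The difference between an object that nothing references (reclaimed by the collector) and one that is still referenced from a long-lived structure (a genuine leak) can be observed from R itself. The following is only an R-level sketch of that idea, not the C API discussed in the message: it assumes that an explicit gc() runs pending finalizers, and the names flags, registry, lost and kept are made up for the illustration.

```r
# Flags that the finalizers flip when the collector reclaims an object.
flags <- new.env()
flags$lost_reclaimed <- FALSE
flags$kept_reclaimed <- FALSE

# Case 1: the only reference is dropped, so the object becomes garbage.
lost <- new.env()
reg.finalizer(lost, function(obj) assign("lost_reclaimed", TRUE, envir = flags))
rm(lost)

# Case 2: a forgotten reference survives in a long-lived structure.
registry <- list()
kept <- new.env()
reg.finalizer(kept, function(obj) assign("kept_reclaimed", TRUE, envir = flags))
registry$kept <- kept
rm(kept)

invisible(gc())  # force a full collection

flags$lost_reclaimed  # should be TRUE: nothing referenced the object
flags$kept_reclaimed  # should be FALSE: 'registry' still holds it
```

The second case is the kind of leak that a garbage collector cannot fix and that address sanitizer would not be expected to report, since the memory is still reachable.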
Re: [Rd] Potential improvements of ave?
Hi Thomas,

These are some great suggestions.
But I can't help but feel there's a much bigger problem here.

Intuitively, the ave function could (or should) sort the data.
Then the indexing step becomes almost trivial, in terms of both time
and space complexity.
And the ave function is not the only example of where a problem
becomes much simpler, if the data is sorted.

Historically, I've never found base R functions user-friendly for
aggregation purposes, or for sorting.
(At least, not by comparison to SQL).

But that's not the main problem.
It would seem preferable to sort the data, only once.
(Rather than sorting it repeatedly, or not at all).

Perhaps, objects such as vectors and data.frame(s) could have a
boolean attribute, to indicate if they're sorted.
Or functions such as ave could have a sorted argument.
In either case, if true, the function assumes the data is sorted and
applies a more efficient algorithm.

B.

On Sat, Mar 13, 2021 at 1:07 PM SOEIRO Thomas wrote:
>
> Dear all,
>
> I have two questions/suggestions about ave, but I am not sure if it's
> relevant for bug reports.
>
>
> 1) I have performance issues with ave in a case where I didn't expect it.
> The following code runs as expected:
>
> set.seed(1)
>
> df1 <- data.frame(id1 = sample(1:1e2, 5e2, TRUE),
>                   id2 = sample(1:3, 5e2, TRUE),
>                   id3 = sample(1:5, 5e2, TRUE),
>                   val = sample(1:300, 5e2, TRUE))
>
> df1$diff <- ave(df1$val,
>                 df1$id1,
>                 df1$id2,
>                 df1$id3,
>                 FUN = function(i) c(diff(i), 0))
>
> head(df1[order(df1$id1,
>                df1$id2,
>                df1$id3), ])
>
> But when expanding the data.frame (* 1e4), ave fails (Error: cannot allocate
> vector of size 1110.0 Gb):
>
> df2 <- data.frame(id1 = sample(1:(1e2 * 1e4), 5e2 * 1e4, TRUE),
>                   id2 = sample(1:3, 5e2 * 1e4, TRUE),
>                   id3 = sample(1:(5 * 1e4), 5e2 * 1e4, TRUE),
>                   val = sample(1:300, 5e2 * 1e4, TRUE))
>
> df2$diff <- ave(df2$val,
>                 df2$id1,
>                 df2$id2,
>                 df2$id3,
>                 FUN = function(i) c(diff(i), 0))
>
> This use case does not seem extreme to me (e.g. aggregate et al work
> perfectly on this data.frame).
> So my question is: Is this expected/intended/reasonable? i.e. Does ave need
> to be optimized?
>
>
> 2) Gabor Grothendieck pointed out in 2011 that drop = TRUE is needed to avoid
> warnings in case of unused levels
> (https://stat.ethz.ch/pipermail/r-devel/2011-February/059947.html).
> Is it relevant/possible to expose the drop argument explicitly?
>
>
> Thanks,
>
> Thomas
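The allocation failure quoted above is consistent with ave() combining its grouping arguments through interaction(), whose level set is the cross product of the levels of each factor rather than the combinations that actually occur. For df2 that is roughly 1e6 x 3 x 5e4 = 1.5e11 candidate levels, and at 8 bytes each this is on the order of the 1110 Gb in the error message. A possible workaround, shown here as a small sketch and not as a proposed change to ave(), is to build a single key that only takes observed values; the variable names and the ":" separator are assumptions made for the example.

```r
set.seed(1)
n <- 500
df <- data.frame(id1 = sample(1:100, n, TRUE),
                 id2 = sample(1:3, n, TRUE),
                 id3 = sample(1:5, n, TRUE),
                 val = sample(1:300, n, TRUE))

step <- function(i) c(diff(i), 0)

# Original call: the grouping factor carries every combination of the
# levels of id1, id2 and id3, observed or not.
full <- ave(df$val, df$id1, df$id2, df$id3, FUN = step)

# Sketch of a workaround: a single key with one level per observed
# combination only (assumes ":" cannot occur inside the ids).
key <- paste(df$id1, df$id2, df$id3, sep = ":")
compact <- ave(df$val, key, FUN = step)

identical(full, compact)
```

On the small data both calls group the rows in the same way, so the comparison is a quick check that the key does not change the result before trying it on the large data.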
[Rd] Faster sorting algorithm...
Hi,
I am not sure if this is the right mailing list, so apologies in advance if it is not.

I found the following link/presentation:
https://www.r-project.org/dsc/2016/slides/ParallelSort.pdf

The implementation of fsort is interesting but incomplete (not sure why?) and can be improved or made faster (at least 25% I believe). I might be wrong but there are maybe a couple of bugs as well.

My questions are:

1/ Is the R Core team interested in a faster sorting algo? (Multithread or even single threaded)

2/ I see an issue with the license, which is MPL-2.0, and hence not compatible with base R, Python and Julia. Is there an interest to change the license of fsort so all 3 languages (and all the people using these languages) can benefit from it? (Like suggested on the first page)

Please let me know if there is an interest to address the above points, I would be happy to look into it (free of charge of course!).

Thank you
Best regards
Morgan
Re: [Rd] Faster sorting algorithm...
Isn’t the default method now “radix” which is the data.table sort, and isn’t that already parallel using openmp where available?

Avi

On Mon, Mar 15, 2021 at 12:26 PM Morgan Morgan wrote:
> 1/ Is the R Core team interested in a faster sorting algo? (Multithread or
> even single threaded)
Re: [Rd] Faster sorting algorithm...
The default method for sort is not radix in every case (notably not for character vectors). You might want to read the documentation of sort.

For your second question, I invite you to look at the code of fsort. It is implemented only for positive finite doubles, and falls back to data.table:::forder ... when the type is anything other than positive double...

Please read the pdf link I sent, everything is explained in it.

Thank you
Morgan

On Mon, 15 Mar 2021, 16:52 Avraham Adler, wrote:
> Isn’t the default method now “radix” which is the data.table sort, and
> isn’t that already parallel using openmp where available?
>
> Avi
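The point about the sort method can be checked directly with the method argument of sort(). The sketch below only uses base R; the vectors x and s are made up for the illustration, and the remark about numeric vectors reflects my reading of the sort documentation rather than anything stated in the thread.

```r
set.seed(1)
x <- runif(1e4)

# For a short double vector, requesting radix explicitly is expected to
# give the same result as the default method.
identical(sort(x), sort(x, method = "radix"))

# Character vectors: radix sorting compares in the C locale, so upper-case
# letters come first. The default result depends on the session's collation.
s <- c("b", "A", "a", "B")
sort(s, method = "radix")
sort(s)
```

In many locales the last two calls return different orderings, which is one reason radix is not simply used for every type.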
[Rd] inheritance and attach
This change in R-devel just bit me. Under the newest release, if I attach() another .RData directory, the methods are not detected.
Was it intentional? Running in Linux. Here is a script of an example that works fine under 3.6.2 but fails in R-devel.

tmt% mkdir temp1
tmt% cd temp1
tmt% R
# define a silly method, just for testing

charlie <- function(x, ...)
    UseMethod("charlie")

charlie.default <- function(x, ...) {
    cat("default method ", x, "\n")
    x + 2
}

charlie.character <- function(x, ...) {
    cat("character method ", x, "\n")
    as.character(as.numeric(x) + 2)
}

> quit("yes")

tmt% cd ..
tmt% R
> attach("temp1/.RData")
> charlie(4)
Error in UseMethod("charlie") :
  no applicable method for 'charlie' applied to an object of class "c('double', 'numeric')"

The use case was my local test environment for the survival package. I can work around it.

--
Terry M Therneau, PhD
Department of Health Science Research
Mayo Clinic
thern...@mayo.edu

"TERR-ree THUR-noh"
Re: [Rd] Potential improvements of ave?
Hi Abby,

Thank you for your positive feedback.

I agree with your general comment about sorting.

For ave specifically, ordering may not help because the output must maintain the order of the input (as ave returns only x and not the entire data.frame).

Thanks,

Thomas

From: Abby Spurdle
Sent: Monday, March 15, 2021 10:22
To: SOEIRO Thomas
Cc: r-devel@r-project.org
Subject: Re: [Rd] Potential improvements of ave?

> Intuitively, the ave function could (or should) sort the data.
> Then the indexing step becomes almost trivial, in terms of both time
> and space complexity.
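The constraint mentioned above (the result has to come back in input order) does not by itself rule out a sort-based approach, because the permutation returned by order() can be used to scatter the results back. The following is a rough sketch of that idea under stated assumptions: ave_sorted is a hypothetical helper, not part of base R; it assumes a single atomic key without NAs and a FUN that returns one value per input element.

```r
# Hypothetical helper: apply FUN within groups by sorting once, then put
# the results back at the input positions.
ave_sorted <- function(x, key, FUN) {
  o <- order(key)                    # ties keep their input order
  runs <- rle(key[o])$lengths        # length of each contiguous group
  grp <- rep(seq_along(runs), runs)  # group index per sorted element
  res <- unlist(lapply(split(x[o], grp), FUN), use.names = FALSE)
  out <- res
  out[o] <- res                      # undo the sort
  out
}

set.seed(1)
key <- sample(1:20, 200, TRUE)
x <- sample(1:300, 200, TRUE)
step <- function(i) c(diff(i), 0)

all.equal(ave_sorted(x, key, step), ave(x, key, FUN = step))
```

Whether this is faster than the split-based implementation would need measuring; the sketch only shows that sorting and preserving input order are compatible.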
Re: [Rd] inheritance and attach
Terry,

NEWS: CHANGES IN R 4.0.0 NEW FEATURES

    \item S3 method lookup now by default skips the elements of the
    search path between the global and base environments.

If you use attach(), S3 methods in the attached environment are hence no longer found by dispatch (because it sits between the global and base environments) unless you register them using .S3method(). Without registration, you have to load them into the global env for them to work, since that is now the only environment that doesn't require registration.

Cheers,
Simon

> On Mar 16, 2021, at 7:19 AM, Therneau, Terry M., Ph.D. via R-devel wrote:
>
> This change in R-devel just bit me. Under the newest release, if I attach()
> another .RData directory, the methods are not detected.
> Was it intentional?
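The registration route described above can be tried without a saved workspace. The sketch below is an approximation of the reported scenario, not a reproduction of it: the environment ws and the name "charlie_demo" are made up, and the closures are given the global environment explicitly to mimic functions that were defined at top level before being saved. It needs R >= 4.0.0, where .S3method() is available.

```r
# Stand-in for objects loaded from a saved workspace: a generic and a
# method whose closure environment is the global environment.
ws <- new.env()
ws$charlie <- function(x, ...) UseMethod("charlie")
ws$charlie.default <- function(x, ...) x + 2
environment(ws$charlie) <- globalenv()
environment(ws$charlie.default) <- globalenv()

attach(ws, name = "charlie_demo")

# On R >= 4.0.0 this call is expected to fail, because the method sits on
# the search path between the global and base environments.
before <- tryCatch(charlie(4), error = function(e) conditionMessage(e))

# Register the method explicitly, then dispatch again.
.S3method("charlie", "default", charlie.default)
after <- charlie(4)

detach("charlie_demo")
after
```

The alternative mentioned in the message, loading the objects into the global environment with load() instead of attach(), avoids the registration step altogether.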
Re: [Rd] [EXTERNAL] Re: inheritance and attach
Thanks Simon. I missed that. It is a sensible change.

I had trouble because I had just changed computing environments this weekend (a forced change due to an institutional directive), and this caught me right after that so I spent some time chasing my tail. Murphy's law...

Terry T.

On 3/15/21 4:45 PM, Simon Urbanek wrote:
> NEWS: CHANGES IN R 4.0.0 NEW FEATURES
>
>     \item S3 method lookup now by default skips the elements of the
>     search path between the global and base environments.
Re: [Rd] Potential improvements of ave?
Abby,

Vectors do have an internal mechanism for knowing that they are sorted, via ALTREP (it was one of 2 core motivating features for 'smart vectors', the other being knowledge about the presence of NAs).

Currently I don't think we expose it at the R level, though it is part of the official C API. I don't know of any plans for this to change, but I suppose it could. Plus, for functions in R itself, we could even use it without exposing it more widely. A number of functions, including sort itself, already do this in fact, but more could. I'd be interested in hearing which functions you think would particularly benefit from this.

~G

On Mon, Mar 15, 2021 at 12:01 PM SOEIRO Thomas wrote:
> From: Abby Spurdle
> Sent: Monday, March 15, 2021 10:22
>
> Perhaps, objects such as vectors and data.frame(s) could have a
> boolean attribute, to indicate if they're sorted.
> Or functions such as ave could have a sorted argument.
> In either case, if true, the function assumes the data is sorted and
> applies a more efficient algorithm.
Re: [Rd] Faster sorting algorithm...
In principle, I agree that faster ranking/sorting algorithms are important, and should be a priority.
But I can't help but feel that the paper focuses on textbook-oriented problems.

Given that in real world problems, there's almost always some form of prior knowledge:
Wouldn't it be better, from a management perspective, to focus on sorting algorithms, that incorporate that prior knowledge?

I'm not sure whether that's an R-devel discussion, or for another forum...

On Tue, Mar 16, 2021 at 5:25 AM Morgan Morgan wrote:
> I found the following link/presentation:
> https://www.r-project.org/dsc/2016/slides/ParallelSort.pdf