Re: [Rd] [bug] droplevels() also drop object attributes (comment…)
> Serge Bibauw on Mon, 15 May 2017 11:59:32 -0400 writes:

> Hi,
> Just reporting a small bug… not really a big deal, but I don’t think that is intended: droplevels() also drops all object’s attributes.

Yes. The help page for droplevels (or the simple definition of 'droplevels.factor') clearly indicates that the method for factors is really just a call to factor(x, exclude = *) and that _is_ quite an important base function whose semantics should not be changed lightly. Still, let's continue:

Looking a bit, I see that the current behavior of factor() {and hence droplevels} has been unchanged in this respect for the whole history of R, well, at least for more than 17 years (R 1.0.1, April 2000).

I'd agree there _is_ a bug, at least in the documentation, which does *not* mention that currently all attributes are dropped but "names", "levels" (and "class").

OTOH, factor() would only need a small change to make it preserve all attributes (but "class" and "levels", which are set explicitly).

I'm sure this will break some checks in some packages. Is it worth it?

E.g., our own R QC checks currently check (the printing of) the following (in tests/reg-tests-2.R):

  > ## some tests of factor matrices
  > A <- factor(7:12)
  > dim(A) <- c(2, 3)
  > A
       [,1] [,2] [,3]
  [1,] 7    9    11
  [2,] 8    10   12
  Levels: 7 8 9 10 11 12
  > str(A)
   factor [1:2, 1:3] 7 8 9 10 ...
   - attr(*, "levels")= chr [1:6] "7" "8" "9" "10" ...
  > A[, 1:2]
       [,1] [,2]
  [1,] 7    9
  [2,] 8    10
  Levels: 7 8 9 10 11 12
  > A[, 1:2, drop=TRUE]
  [1] 7  8  9  10
  Levels: 7 8 9 10

With the proposed change to factor(), the last call would change its result:

  > A[, 1:2, drop=TRUE]
       [,1] [,2]
  [1,] 7    9
  [2,] 8    10
  Levels: 7 8 9 10

because 'drop=TRUE' calls factor(..) and that would also preserve the "dim" attribute. I would think that the changed behavior _is_ better, and is also according to documentation, because the help page for [.factor explains that 'drop = TRUE' drops levels, but _not_ that it transforms a factor matrix into a factor (vector).
Martin

> Example:
>> > test <- c("hello", "something", "hi")
>> > test <- factor(test)
>> > comment(test) <- "this is a test"
>> > attr(test, "description") <- "this is another test"
>> > attributes(test)
>> $levels
>> [1] "hello" "hi" "something"
>>
>> $class
>> [1] "factor"
>>
>> $comment
>> [1] "this is a test"
>>
>> $description
>> [1] "this is another test"
>>
>> > test <- droplevels(test)
>> > attributes(test)
>> $levels
>> [1] "hello" "hi" "something"
>>
>> $class
>> [1] "factor"

> Serge

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
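As a possible workaround while factor() keeps its current behavior, the extra attributes can be saved and restored around the call. This is only a sketch: droplevels_keep is a hypothetical helper name, not a base R function, and it assumes the attributes to carry over are everything except "levels", "class" and "names".

```r
# Hypothetical helper (not part of base R): drop unused levels but
# carry over extra attributes such as "comment" or "description".
droplevels_keep <- function(x) {
  stopifnot(is.factor(x))
  old <- attributes(x)
  y <- droplevels(x)
  # "levels" and "class" are set by factor(); "names" is already kept
  extra <- old[setdiff(names(old), c("levels", "class", "names"))]
  for (nm in names(extra)) attr(y, nm) <- extra[[nm]]
  y
}

# Same kind of object as in the report, with one unused level
test <- factor(c("hello", "hi"), levels = c("hello", "hi", "something"))
comment(test) <- "this is a test"
attr(test, "description") <- "this is another test"

kept <- droplevels_keep(test)
names(attributes(kept))
```

The design choice here is to leave factor() untouched, so code relying on the long-standing behavior is not affected.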
Re: [Rd] stopifnot() does not stop at first non-TRUE argument
> Hervé Pagès on Mon, 15 May 2017 16:54:46 -0700 writes:

> Hi,
> On 05/15/2017 10:41 AM, luke-tier...@uiowa.edu wrote:
>> This is getting pretty convoluted.
>>
>> The current behavior is consistent with the description at the top of
>> the help page -- it does not promise to stop evaluation once the first
>> non-TRUE is found. That seems OK to me -- if you want sequencing you
>> can use
>>
>> stopifnot(A)
>> stopifnot(B)
>>
>> or
>>
>> stopifnot(A && B)

> My main use case for using stopifnot() is argument checking. In that
> context, I like the conciseness of

> stopifnot(
>   A,
>   B,
>   ...
> )

> I think it's a common use case (and a pretty natural thing to do) to
> order/organize the expressions in a way such that it only makes sense
> to continue evaluating if all was OK so far e.g.

> stopifnot(
>   is.numeric(x),
>   length(x) == 1,
>   is.na(x)
> )

I agree. And that's how I have used stopifnot() in many cases myself, sometimes even more "extremely" than the above example, using assertions that only make sense if previous assertions were fulfilled, such as

   stopifnot(is.numeric(n), length(n) == 1, n == round(n), n >= 0)

or in the Matrix package, first checking some class properties and then things that only make sense for objects with those properties.

> At least that's how things are organized in the stopifnot() calls that
> accumulated in my code over the years. That's because I was convinced
> that evaluation would stop at the first non-true expression (as
> suggested by the man page). Until recently when I got a warning issued
> by an expression located *after* the first non-true expression. This
> was pretty unexpected/confusing!

> If I can't rely on this "sequencing" feature, I guess I can always
> do

> stopifnot(A)
> stopifnot(B)
> ...

> but I lose the conciseness of calling stopifnot() only once.
> I could also use

> stopifnot(A && B && ...)

> but then I lose the conciseness of the error message i.e. it's going
> to be something like

> Error: A && B && ... is not TRUE

> which can be pretty long/noisy compared to the message that reports
> only the 1st error.

> Conciseness/readability of the single call to stopifnot() and
> conciseness of the error message are the features that made me
> adopt stopifnot() in the 1st place.

Yes, and that had been my design goal when I created it.

I do tend to agree with Hervé and Serguei here.

> If stopifnot() cannot be revisited
> to do "sequencing" then that means I will need to revisit all my calls
> to stopifnot().

>>
>> I could see an argument for a change that in the multiple argument
>> case reports _all_ that fail; that would seem more useful to me than
>> twisting the code into knots.

Interesting... but really differing from the current documentation,

> Why not. Still better than the current situation. But only if that
> semantic seems more useful to people. Would be sad if usefulness
> of one semantic or the other was decided based on trickiness of
> implementation.

Well, the trickiness should definitely play a role. Apart from functionality and semantics, long term maintenance and code readability, even elegance, have shown to be very important aspects of good code in ca 30 years of S and R programming.

OTOH, as mentioned above, the creation of good error messages has been an important design goal of stopifnot() and hence I'm willing to accept the extra complexity of "patching up" the call used in the error / warning messages.

Also, as a change to what I posted yesterday, I now plan to follow Peter Dalgaard's suggestion of using
   eval( .. )
instead of
   eval(cl[[i]], envir = )
as there may be cases where the former behaves better in lazy evaluation situations. (Other opinions on that ?)

Martin

> Thanks,
> H.

>>
>> Best,
>>
>> luke
>>
>> On Mon, 15 May 2017, Martin Maechler wrote:
>>
>>> Serguei Sokol on Mon, 15 May 2017 16:32:20 +0200 writes:
>>>
>>> > On 15/05/2017 at 15:37, Martin Maechler wrote:
>>> >>> Serguei Sokol
>>> >>> on Mon, 15 May 2017 13:14:34 +0200 writes:
>>> >> > I see in the archives that the attachment cannot pass.
>>> >> > So, here is the code:
>>> >>
>>> >> [... MM: I needed to reformat etc to match closely to
>>> >> the current source code which is in
>>> >> https://svn.r-project.org/R/trunk/src/library/base/R/stop.R
>>> >> or its corresponding github mirror
>>> >>
>>>
Re: [Rd] stopifnot() does not stop at first non-TRUE argument
On 15/05/2017 at 19:41, luke-tier...@uiowa.edu wrote:
> This is getting pretty convoluted.
>
> The current behavior is consistent with the description at the top of
> the help page -- it does not promise to stop evaluation once the first
> non-TRUE is found.

Hm... we can read in the man page:

  ‘stopifnot(A, B)’ is conceptually equivalent to

    { if(any(is.na(A)) || !all(A)) stop(...);
      if(any(is.na(B)) || !all(B)) stop(...) }

and this behavior does promise to stop at the first non-TRUE value without evaluating the remaining conditions.

Sergueï.
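To make the point about the "conceptually equivalent" code concrete, here is a small sketch that writes the man page expansion out as a two-argument function. The name conceptual_stopifnot2 and the error messages are made up for illustration; the only claim is that, thanks to lazy evaluation of arguments, B is never evaluated once A fails.

```r
# Literal transcription of the "conceptually equivalent" code for two
# arguments; function name and messages are illustrative only.
conceptual_stopifnot2 <- function(A, B) {
  if (any(is.na(A)) || !all(A)) stop("A is not all TRUE")
  if (any(is.na(B)) || !all(B)) stop("B is not all TRUE")
  invisible(NULL)
}

# The second argument has a visible side effect if it is ever evaluated
evaluated <- FALSE
msg <- tryCatch(
  conceptual_stopifnot2(1 > 2, { evaluated <- TRUE; TRUE }),
  error = function(e) conditionMessage(e)
)
msg        # "A is not all TRUE"
evaluated  # FALSE: B was never forced
```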
[Rd] Wish for arima function: add a data argument and a formula-type for regressors
Hi,

Using arima on data that are in a data frame, especially when adding xreg, would be much easier if the arima function contained

1) a "data=" argument
2) the possibility to include the covariate(s) in a formula style.

Ideally the call could be something like

> arima(symptome, order=c(1,0,0), xreg=~trait01*mesure0, data=anxiete)

( or arima(symptome~trait01*mesure0, order=c(1,0,0), data=anxiete) )

instead of the present:

> anxiete$interact = anxiete$trait01*anxiete$mesure0
> arima(anxiete$symptome, order=c(1,0,0), xreg=anxiete[, c("trait01", "mesure0", "interact")])

Background: Especially in psychology, so-called single case analyses often involve the interaction effect of treatment and the usual training effect, typically with ARMA-type errors, which results in the above model. Typically, all the needed data are in a data.frame.

An additional advantage concerns the names of the coefficients in the output. If there is only one regressor:

> arima(anxiete$symptome, order=c(1,0,0), xreg=anxiete[, c("trait01")])
[...]
Coefficients:
         ar1  intercept  anxiete[, c("trait01")]
      0.5649    33.8623                  -8.1225
s.e.  0.1073     0.5969                   0.8052

but the naming convention changes with several regressors:

> arima(anxiete$symptome, order=c(1,0,0), xreg=anxiete[, c("trait01", "mesure0", "interact")])
[...]
Coefficients:
         ar1  intercept  trait01  mesure0  interact
      0.2715    34.1363  -5.5777   0.0075   -0.1809
s.e.  0.1211     0.6685   0.9009   0.0342    0.0490

-- 
Prof. Olivier Renaud
http://www.unige.ch/fapse/mad/
Methodology & Data Analysis - Psychology Dept - University of Geneva
UniMail, Office 4138 - 40, Bd du Pont d'Arve - CH-1211 Geneva 4
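Until something like this exists in stats, a thin wrapper can approximate the wished-for interface. The sketch below is an assumption-laden illustration, not a proposal for the actual arima() signature: arima_df is a hypothetical name, the regressors are given as a one-sided formula, and the data are simulated only so that the example runs.

```r
# Hypothetical wrapper (not part of stats): build xreg from a one-sided
# formula evaluated in `data`, then call stats::arima().
arima_df <- function(y, order, xreg, data, ...) {
  X <- model.matrix(xreg, data = data)
  # arima() adds its own intercept, so drop the one from model.matrix()
  X <- X[, colnames(X) != "(Intercept)", drop = FALSE]
  stats::arima(data[[y]], order = order, xreg = X, ...)
}

# Simulated stand-in for the data frame from the message
set.seed(1)
n <- 60
anxiete <- data.frame(trait01 = rep(0:1, each = n / 2),
                      mesure0 = seq_len(n) - n / 2)
anxiete$symptome <- 34 - 5 * anxiete$trait01 -
  0.2 * anxiete$trait01 * anxiete$mesure0 +
  as.numeric(arima.sim(list(ar = 0.3), n = n))

fit <- arima_df("symptome", order = c(1, 0, 0),
                xreg = ~ trait01 * mesure0, data = anxiete, method = "ML")
names(coef(fit))
```

Because the regressor matrix always carries column names, the coefficient names follow the formula terms whether there is one regressor or several.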
[Rd] tweaking Sys.timezone()
Hi,

On my system (Linux, Mageia 5) Sys.timezone() returns NA but with a minor tweak it could work as expected, i.e. returning "Europe/Paris".

Here is the problem. At some point it does

  lt <- normalizePath("/etc/localtime")

On my system /etc/localtime is a symlink pointing to /usr/share/zoneinfo/Europe/Paris. So far so good. With the next two operations the right answer should be found:

  if (grepl(pat <- "^/usr/share/zoneinfo/", lt)) sub(pat, "", lt)

Unfortunately, on my system "/usr/share" is also a symlink, so lt resolves to "/home/local/usr_share/zoneinfo/Europe/Paris" and not to "/usr/share/zoneinfo/Europe/Paris". So the test above fails.

As the keyword in this story is zoneinfo, could we modify the pattern to look like

  if (grepl(pat <- "^.*/zoneinfo/", lt)) sub(pat, "", lt)

? In this way, we make no assumption about where exactly "zoneinfo/*" resides. We have found it, no matter where, so use it.

Hoping it could find its way into a future R release.

Best,
Serguei.
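The difference between the two patterns can be checked directly on the two paths mentioned above. This is a minimal sketch; zone_from_path is a made-up helper that only wraps the grepl()/sub() pair from the message.

```r
# Illustrative helper wrapping the grepl()/sub() pair from the message
zone_from_path <- function(lt, pat) {
  if (grepl(pat, lt)) sub(pat, "", lt) else NA_character_
}

strict <- "^/usr/share/zoneinfo/"  # current pattern
loose  <- "^.*/zoneinfo/"          # proposed pattern

resolved <- "/home/local/usr_share/zoneinfo/Europe/Paris"
zone_from_path(resolved, strict)  # NA
zone_from_path(resolved, loose)   # "Europe/Paris"
zone_from_path("/usr/share/zoneinfo/Europe/Paris", loose)  # "Europe/Paris"
```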
Re: [Rd] stopifnot() does not stop at first non-TRUE argument
On Tue, 16 May 2017, Serguei Sokol wrote:

> On 15/05/2017 at 19:41, luke-tier...@uiowa.edu wrote:
>> This is getting pretty convoluted.
>>
>> The current behavior is consistent with the description at the top of
>> the help page -- it does not promise to stop evaluation once the first
>> non-TRUE is found.
>
> Hm... we can read in the man page:
> ‘stopifnot(A, B)’ is conceptually equivalent to
>   { if(any(is.na(A)) || !all(A)) stop(...);
>     if(any(is.na(B)) || !all(B)) stop(...) }
> and this behavior does promise to stop at first non-TRUE value
> without evaluation of the rest of conditions.

Yes: that is why I explicitly referenced the description at the top of the page.

Changing the 'conceptually equivalent' bit to reflect what is happening is easy. The changes being discussed, and their long term maintenance, are not.

Best,

luke

> Sergueï.

-- 
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa                  Phone:             319-335-3386
Department of Statistics and        Fax:               319-335-3017
   Actuarial Science
241 Schaeffer Hall                  email:   luke-tier...@uiowa.edu
Iowa City, IA 52242                 WWW:  http://www.stat.uiowa.edu
Re: [Rd] stopifnot() does not stop at first non-TRUE argument
On Tue, 16 May 2017, Martin Maechler wrote: Hervé Pagès on Mon, 15 May 2017 16:54:46 -0700 writes: > Hi, > On 05/15/2017 10:41 AM, luke-tier...@uiowa.edu wrote: >> This is getting pretty convoluted. >> >> The current behavior is consistent with the description at the top of >> the help page -- it does not promise to stop evaluation once the first >> non-TRUE is found. That seems OK to me -- if you want sequencing you >> can use >> >> stopifnot(A) >> stopifnot(B) >> >> or >> >> stopifnot(A && B) > My main use case for using stopifnot() is argument checking. In that > context, I like the conciseness of > stopifnot( > A, > B, > ... > ) > I think it's a common use case (and a pretty natural thing to do) to > order/organize the expressions in a way such that it only makes sense > to continue evaluating if all was OK so far e.g. > stopifnot( > is.numeric(x), > length(x) == 1, > is.na(x) > ) I agree. And that's how I have used stopifnot() in many cases myself, sometimes even more "extremely" than the above example, using assertions that only make sense if previous assertions were fulfilled, such as stopifnot(is.numeric(n), length(n) == 1, n == round(n), n >= 0) or in the Matrix package, first checking some class properties and then things that only make sense for objects with those properties. > At least that's how things are organized in the stopifnot() calls that > accumulated in my code over the years. That's because I was convinced > that evaluation would stop at the first non-true expression (as > suggested by the man page). Until recently when I got a warning issued > by an expression located *after* the first non-true expression. This > was pretty unexpected/confusing! > If I can't rely on this "sequencing" feature, I guess I can always > do > stopifnot(A) > stopifnot(B) > ... > but I loose the conciseness of calling stopifnot() only once. > I could also use > stopifnot(A && B && ...) > but then I loose the conciseness of the error message i.e. 
it's going to be something like

> Error: A && B && ... is not TRUE

> which can be pretty long/noisy compared to the message that reports
> only the 1st error.

> Conciseness/readability of the single call to stopifnot() and
> conciseness of the error message are the features that made me
> adopt stopifnot() in the 1st place.

Yes, and that had been my design goal when I created it.

I do tend to agree with Hervé and Serguei here.

> If stopifnot() cannot be revisited
> to do "sequencing" then that means I will need to revisit all my calls
> to stopifnot().

>>
>> I could see an argument for a change that in the multiple argument
>> case reports _all_ that fail; that would seem more useful to me than
>> twisting the code into knots.

Interesting... but really differing from the current documentation,

> Why not. Still better than the current situation. But only if that
> semantic seems more useful to people. Would be sad if usefulness
> of one semantic or the other was decided based on trickiness of
> implementation.

Well, the trickiness should definitely play a role. Apart from functionality and semantics, long term maintenance and code readability, even elegance, have shown to be very important aspects of good code in ca 30 years of S and R programming.

OTOH, as mentioned above, the creation of good error messages has been an important design goal of stopifnot() and hence I'm willing to accept the extra complexity of "patching up" the call used in the error / warning messages.

Also, as a change to what I posted yesterday, I now plan to follow Peter Dalgaard's suggestion of using
   eval( .. )
instead of
   eval(cl[[i]], envir = )
as there may be cases where the former behaves better in lazy evaluation situations. (Other opinions on that ?)

If you go this route it would be useful to step back and think about whether there might be some useful primitives to add to make this easier, such as

- providing a dotsLength function for computing the number of arguments captured in a ... argument

- providing a dotsElt function for extracting the i-th element instead of going through the eval(sprintf("..%d", i)) construct.

- maybe something for extracting the expression for the i-th argument.

These might be more generally useful and make the code more readable and maintainable.

Best,

luke

Martin

> Thanks,
> H.

>>
>> Best,
>>
>> luke
>>
>> On Mon, 15 May 2017, Martin Maechler wrote:
>>
>>> Serguei Sokol on Mon, 15 May 2017 16:32:20 +0200 writes:
>>>
>>> > On 15/05/2017 at 15:37, Martin Maechler wrote:
>>> >>> Serguei Sokol
>>> >>> on Mon, 15 May 2017 13:14:34 +0200 writes:
>>> >> > I see in the archives that the attachment cannot pass.
>>> >> > So, here is the co
Re: [Rd] stopifnot() does not stop at first non-TRUE argument
switch(i, ...)
extracts 'i'-th argument in '...'. It is like
eval(as.name(paste0("..", i))) .

Just mentioning other things:
- For 'n',
  n <- nargs()
  can be used.
- sys.call() can be used in place of match.call() .
---
> peter dalgaard on Mon, 15 May 2017 16:28:42 +0200 writes:

> I think Hervé's idea was just that if switch can evaluate arguments selectively, so can stopifnot(). But switch() is .Primitive, so does it from C.

if he just meant that, then "yes, of course" (but not so interesting).

> I think it is almost a no-brainer to implement a sequential stopifnot if dropping to C code is allowed. In R it gets trickier, but how about this:

Something like this, yes, that's close to what Serguei Sokol had proposed (and of course I *do* want to keep the current sophistication of stopifnot(), so this is really too simple)

> Stopifnot <- function(...)
> {
>   n <- length(match.call()) - 1
>   for (i in 1:n)
>   {
>     nm <- as.name(paste0("..",i))
>     if (!eval(nm)) stop("not all true")
>   }
> }
> Stopifnot(2+2==4)
> Stopifnot(2+2==5, print("Hey!!!") == "Hey!!!")
> Stopifnot(2+2==4, print("Hey!!!") == "Hey!!!")
> Stopifnot(T,T,T,T,T,T,T,T,T,T,T,T,T,T,T,T,F,T)

>> On 15 May 2017, at 15:37 , Martin Maechler wrote:
>>
>> I'm still curious about Hervé's idea on using switch() for the
>> issue.

> --
> Peter Dalgaard, Professor,
> Center for Statistics, Copenhagen Business School
> Solbjerg Plads 3, 2000 Frederiksberg, Denmark
> Phone: (+45)38153501
> Office: A 4.23
> Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com
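Putting the pieces from this message together (nargs() for the count, sys.call() for the expressions, and the "..i" symbols for forcing one argument at a time) gives roughly the following. It is a simplified sketch and not the actual stopifnot() source: seq_stopifnot is a made-up name, and the error message only imitates the "is not TRUE" style.

```r
# Simplified sketch of a sequential check; not the real stopifnot().
seq_stopifnot <- function(...) {
  n  <- nargs()     # number of arguments passed
  cl <- sys.call()  # unevaluated call, used for the error message
  for (i in seq_len(n)) {
    # Force only the i-th promise in '...'
    r <- eval(as.name(paste0("..", i)))
    if (!(is.logical(r) && !anyNA(r) && all(r)))
      stop(sprintf("%s is not TRUE", deparse(cl[[i + 1L]])[1L]),
           call. = FALSE)
  }
  invisible(NULL)
}

# The second expression would set `hit` if it were ever evaluated
hit <- FALSE
res <- tryCatch(
  seq_stopifnot(2 + 2 == 5, { hit <- TRUE; TRUE }),
  error = function(e) conditionMessage(e)
)
res  # "2 + 2 == 5 is not TRUE"
hit  # FALSE
```

Using seq_len(n) rather than 1:n keeps the loop from running when the function is called with no arguments.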
Re: [Rd] stopifnot() does not stop at first non-TRUE argument
> On 16 May 2017, at 18:37 , Suharto Anggono Suharto Anggono via R-devel > wrote: > > switch(i, ...) > extracts 'i'-th argument in '...'. It is like > eval(as.name(paste0("..", i))) . Hey, that's pretty neat! -pd > > Just mentioning other things: > - For 'n', > n <- nargs() > can be used. > - sys.call() can be used in place of match.call() . > --- >> peter dalgaard >>on Mon, 15 May 2017 16:28:42 +0200 writes: > >> I think Hervé's idea was just that if switch can evaluate arguments >> selectively, so can stopifnot(). But switch() is .Primitive, so does it from >> C. > > if he just meant that, then "yes, of course" (but not so interesting). > >> I think it is almost a no-brainer to implement a sequential stopifnot if >> dropping to C code is allowed. In R it gets trickier, but how about this: > > Something like this, yes, that's close to what Serguei Sokol had proposed > (and of course I *do* want to keep the current sophistication > of stopifnot(), so this is really too simple) > >> Stopifnot <- function(...) >> { >> n <- length(match.call()) - 1 >> for (i in 1:n) >> { >> nm <- as.name(paste0("..",i)) >> if (!eval(nm)) stop("not all true") >> } >> } >> Stopifnot(2+2==4) >> Stopifnot(2+2==5, print("Hey!!!") == "Hey!!!") >> Stopifnot(2+2==4, print("Hey!!!") == "Hey!!!") >> Stopifnot(T,T,T,T,T,T,T,T,T,T,T,T,T,T,T,T,F,T) > > >>> On 15 May 2017, at 15:37 , Martin Maechler >>> wrote: >>> >>> I'm still curious about Hervé's idea on using switch() for the >>> issue. 
> >> -- >> Peter Dalgaard, Professor, >> Center for Statistics, Copenhagen Business School >> Solbjerg Plads 3, 2000 Frederiksberg, Denmark >> Phone: (+45)38153501 >> Office: A 4.23 >> Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com > > __ > R-devel@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-devel -- Peter Dalgaard, Professor, Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 Frederiksberg, Denmark Phone: (+45)38153501 Office: A 4.23 Email: pd@cbs.dk Priv: pda...@gmail.com __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] stopifnot() does not stop at first non-TRUE argument
> > on Tue, 16 May 2017 09:49:56 -0500 writes: > On Tue, 16 May 2017, Martin Maechler wrote: >>> Hervé Pagès >>> on Mon, 15 May 2017 16:54:46 -0700 writes: >> >> > Hi, >> > On 05/15/2017 10:41 AM, luke-tier...@uiowa.edu wrote: >> >> This is getting pretty convoluted. >> >> >> >> The current behavior is consistent with the description at the top of >> >> the help page -- it does not promise to stop evaluation once the first >> >> non-TRUE is found. That seems OK to me -- if you want sequencing you >> >> can use >> >> >> >> stopifnot(A) >> >> stopifnot(B) >> >> >> >> or >> >> >> >> stopifnot(A && B) >> >> > My main use case for using stopifnot() is argument checking. In that >> > context, I like the conciseness of >> >> > stopifnot( >> > A, >> > B, >> > ... >> > ) >> >> > I think it's a common use case (and a pretty natural thing to do) to >> > order/organize the expressions in a way such that it only makes sense >> > to continue evaluating if all was OK so far e.g. >> >> > stopifnot( >> > is.numeric(x), >> > length(x) == 1, >> > is.na(x) >> > ) >> >> I agree. And that's how I have used stopifnot() in many cases >> myself, sometimes even more "extremely" than the above example, >> using assertions that only make sense if previous assertions >> were fulfilled, such as >> >> stopifnot(is.numeric(n), length(n) == 1, n == round(n), n >= 0) >> >> or in the Matrix package, first checking some class properties >> and then things that only make sense for objects with those properties. >> >> >> > At least that's how things are organized in the stopifnot() calls that >> > accumulated in my code over the years. That's because I was convinced >> > that evaluation would stop at the first non-true expression (as >> > suggested by the man page). Until recently when I got a warning issued >> > by an expression located *after* the first non-true expression. This >> > was pretty unexpected/confusing! 
>> >> > If I can't rely on this "sequencing" feature, I guess I can always >> > do >> >> > stopifnot(A) >> > stopifnot(B) >> > ... >> >> > but I loose the conciseness of calling stopifnot() only once. >> > I could also use >> >> > stopifnot(A && B && ...) >> >> > but then I loose the conciseness of the error message i.e. it's going >> > to be something like >> >> > Error: A && B && ... is not TRUE >> >> > which can be pretty long/noisy compared to the message that reports >> > only the 1st error. >> >> >> > Conciseness/readability of the single call to stopifnot() and >> > conciseness of the error message are the features that made me >> > adopt stopifnot() in the 1st place. >> >> Yes, and that had been my design goal when I created it. >> >> I do tend agree with Hervé and Serguei here. >> >> > If stopifnot() cannot be revisited >> > to do "sequencing" then that means I will need to revisit all my calls >> > to stopifnot(). >> >> >> >> >> I could see an argument for a change that in the multiple argumetn >> >> case reports _all_ that fail; that would seem more useful to me than >> >> twisting the code into knots. >> >> Interesting... but really differing from the current documentation, >> >> > Why not. Still better than the current situation. But only if that >> > semantic seems more useful to people. Would be sad if usefulness >> > of one semantic or the other was decided based on trickiness of >> > implementation. >> >> Well, the trickiness should definitely play a role. >> Apart from functionality and semantics, long term maintenance >> and code readibility, even elegance have shown to be very >> important aspects of good code in ca 30 years of S and R programming. >> >> OTOH, as mentioned above, the creation of good error messages >> has been an important design goal of stopifnot() and hence I'm >> willing to accept the extra complexity of "patching up" the call >> used in the error / warning messages. 
>> >> Also, as a change to what I posted yesterday, I now plan to follow >> Peter Dalgaard's suggestion of using >> eval( .. ) >> instead of eval(cl[[i]], envir = ) >> as there may be cases where the former behaves better in lazy >> evaluation situations. >> (Other opinions on that ?) > If you go this route it would be useful to step back and think about > whether there might be some useful primitives to add to make this > easier, such as > - provide a dotsLength function for computing the number arguments > captured in a ... argument actually my current version did not use that
[Rd] Consider increasing the size of HSIZE
The HSIZE constant, which sets the size of the hash table used to store symbols, is currently defined as `#define HSIZE 4119`. This value was last increased in r5182 on 1999-07-15.

https://github.com/jimhester/hashsize#readme contains code which simulates a normal R workflow by loading a handful of packages. In the example more than 20,000 symbols are included in the hash table, resulting in a load factor of greater than 5. The histogram in the linked repository shows the distribution of bucket sizes for the hash table.

This high load factor means most queries into the hash table result in a collision, requiring an additional linear search of the linked list for each bucket.

It is common for growable hash tables to increase their size when the load factor is greater than .75, so I think it would be of benefit to increase the HSIZE constant considerably; to 32768 or possibly 65536. This will result in increased memory requirements for the hash table, but far fewer collisions.

To get an idea of the performance implications the repository includes some benchmarks of looking up the first element in a given hash bucket, and the last element (for buckets over 10 elements long). The results are somewhat noisy. For longer symbol names, hashing the name and performing string comparisons while searching the list tend to dominate the time. But for symbols of similar length there is a 2X-4X increase in lookup performance between retrieving the first element in a bucket and retrieving the last (indicated by the `total` column in the table).

Increasing the size of `HSIZE` seems like an easy way to improve the performance of an operation that occurs thousands if not millions of times for every R session, with very limited cost in memory.
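For a rough sense of the numbers, the load factor is just the symbol count divided by the table size. The sketch below uses 20,000 symbols as an illustrative lower bound taken from the figure quoted above; the real count in the linked example is higher.

```r
# Load factor = number of stored symbols / number of buckets
load_factor <- function(n_symbols, hsize) n_symbols / hsize

n_symbols <- 20000                 # illustrative lower bound only
sizes <- c(4119, 32768, 65536)     # current and proposed HSIZE values
lf <- setNames(round(load_factor(n_symbols, sizes), 2), sizes)
lf
# 4119 -> 4.86, 32768 -> 0.61, 65536 -> 0.31
```

With either proposed size the load factor at this symbol count falls below the .75 threshold mentioned above.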
Re: [Rd] stopifnot() does not stop at first non-TRUE argument
On 05/16/2017 09:59 AM, peter dalgaard wrote: On 16 May 2017, at 18:37 , Suharto Anggono Suharto Anggono via R-devel wrote: switch(i, ...) extracts 'i'-th argument in '...'. It is like eval(as.name(paste0("..", i))) . Hey, that's pretty neat! Indeed! Seems like this topic is even more connected to switch() than I anticipated... H. -pd Just mentioning other things: - For 'n', n <- nargs() can be used. - sys.call() can be used in place of match.call() . --- peter dalgaard on Mon, 15 May 2017 16:28:42 +0200 writes: I think Hervé's idea was just that if switch can evaluate arguments selectively, so can stopifnot(). But switch() is .Primitive, so does it from C. if he just meant that, then "yes, of course" (but not so interesting). I think it is almost a no-brainer to implement a sequential stopifnot if dropping to C code is allowed. In R it gets trickier, but how about this: Something like this, yes, that's close to what Serguei Sokol had proposed (and of course I *do* want to keep the current sophistication of stopifnot(), so this is really too simple) Stopifnot <- function(...) { n <- length(match.call()) - 1 for (i in 1:n) { nm <- as.name(paste0("..",i)) if (!eval(nm)) stop("not all true") } } Stopifnot(2+2==4) Stopifnot(2+2==5, print("Hey!!!") == "Hey!!!") Stopifnot(2+2==4, print("Hey!!!") == "Hey!!!") Stopifnot(T,T,T,T,T,T,T,T,T,T,T,T,T,T,T,T,F,T) On 15 May 2017, at 15:37 , Martin Maechler wrote: I'm still curious about Hervé's idea on using switch() for the issue. 
-- Peter Dalgaard, Professor, Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 Frederiksberg, Denmark Phone: (+45)38153501 Office: A 4.23 Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com __ R-devel@r-project.org mailing list https://urldefense.proofpoint.com/v2/url?u=https-3A__stat.ethz.ch_mailman_listinfo_r-2Ddevel&d=DwIGaQ&c=eRAMFD45gAfqt84VtBcfhQ&r=BK7q3XeAvimeWdGbWY_wJYbW0WYiZvSXAJJKaaPhzWA&m=mLJLORFCunDiCafHllurGVVVHiMf85ExkM7B5DngfIk&s=helOsmplADBmY6Ct7r30onNuD8a6GKz6yuSgjPxljeU&e= -- Hervé Pagès Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M1-B514 P.O. Box 19024 Seattle, WA 98109-1024 E-mail: hpa...@fredhutch.org Phone: (206) 667-5791 Fax:(206) 667-1319 __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] problem running test on a system without /etc/localtime
Hi all,

A problem with tests while building R.

I'm packaging R for the Sisyphus repository, and the package build environment, by design, doesn't have an /etc/localtime file present. This causes a failure with Sys.timezone during the test run:

[builder@localhost tests]$ ../bin/R --vanilla < reg-tests-1d.R
> ## PR#17186 - Sys.timezone() on some Debian-derived platforms
> (S.t <- Sys.timezone())
Error in normalizePath("/etc/localtime") :
  (converted from warning) path[1]="/etc/localtime": No such file or directory
Calls: Sys.timezone -> normalizePath
Execution halted

This is caused by this code:

> Sys.timezone
function (location = TRUE)
{
    tz <- Sys.getenv("TZ", names = FALSE)
    if (!location || nzchar(tz))
        return(Sys.getenv("TZ", unset = NA_character_))
>>  lt <- normalizePath("/etc/localtime")
[remainder of the code skipped]

The file /etc/localtime is optional and is not guaranteed to be present on any platform. And anyway, it is a good idea to first check that the file exists before calling normalizePath.

Sure, this can be worked around by setting the TZ environment variable, but that causes tests to fail in another place:

[builder@localhost tests]$ TZ="GMT" ../bin/R --vanilla < reg-tests-1d.R
> ## format()ing invalid hand-constructed POSIXlt objects
> d <- as.POSIXlt("2016-12-06"); d$zone <- 1
> tools::assertError(format(d))
Error: Failed to get error in evaluating format(d)
Execution halted

It seems that the best solution will be to patch Sys.timezone.

-- 
KM
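The "check that the file exists first" idea could look roughly like this. It is a sketch of the guard only, not the actual Sys.timezone() code: tz_from_localtime is a hypothetical name, and the zoneinfo pattern is an assumption used to keep the example complete.

```r
# Hypothetical guard, not the real Sys.timezone(): return NA instead of
# warning when the localtime file is absent.
tz_from_localtime <- function(path = "/etc/localtime") {
  if (!file.exists(path)) return(NA_character_)
  lt <- normalizePath(path)
  pat <- "^.*/zoneinfo/"  # assumed pattern, for illustration
  if (grepl(pat, lt)) sub(pat, "", lt) else NA_character_
}

# A path that does not exist yields NA without any warning
missing_file <- file.path(tempdir(), "no-such-localtime")
tz_from_localtime(missing_file)
```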
Re: [Rd] problem running test on a system without /etc/localtime
On 17 May 2017 at 03:35, Kirill Maslinsky wrote:
| I'm packaging R for Sisyphus repository and package build environment,
| by design, doesn't have /etc/localtime file present. This causes failure
| with Sys.timezone during test run:
[...]
| It seems that the best solution will be to patch Sys.timezone.

The file-based approach was AFAIK never successfully standardized. Setting
a TZ is a defensible fallback.

At some point last year I got so annoyed about this (and have the historical
Debian attitude that a config file may be preferable to an environment
variable [ which I now think is wrong for some things like TZ ]) that I
wrote the 'gettz' package.

Quick demo in a Docker container with nothing set:

edd@max:~$ docker run --rm -ti r-base /bin/bash
root@f3848979cab4:/# echo $TZ

root@f3848979cab4:/# R

R version 3.4.0 (2017-04-21) -- "You Stupid Darkness"
Copyright (C) 2017 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> Sys.getenv("TZ") # as expected
[1] ""
> install.packages("gettz")
Installing package into ‘/usr/local/lib/R/site-library’
(as ‘lib’ is unspecified)
trying URL 'https://cran.rstudio.com/src/contrib/gettz_0.0.3.tar.gz'
Content type 'application/x-gzip' length 9064 bytes
==================================================
downloaded 9064 bytes

* installing *source* package ‘gettz’ ...
** package ‘gettz’ successfully unpacked and MD5 sums checked
** libs
g++ -I/usr/share/R/include -DNDEBUG -fpic -g -O2 -fdebug-prefix-map=/build/r-base-3.4.0=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g -c gettz.cpp -o gettz.o
g++ -shared -L/usr/lib/R/lib -Wl,-z,relro -o gettz.so gettz.o -L/usr/lib/R/lib -lR
installing to /usr/local/lib/R/site-library/gettz/libs
** R
** preparing package for lazy loading
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded
* DONE (gettz)

The downloaded source packages are in
        ‘/tmp/RtmpLvuVz8/downloaded_packages’
> gettz::gettz()
[1] "Etc/UTC"
>

As I recall, R got patched for R 3.3.3 or R 3.4.0 to return "" in more
cases. gettz is a little smarter about looking in more locations than R was
at the time (and hence not dissimilar to what was suggested earlier today,
but operates at compiled-code level). It uses a trick I found on
StackOverflow (and which is credited in the package).

It is certainly not perfect, but it is "good enough" for the uses I had in
packages requiring some localtime information.

Dirk

--
http://dirk.eddelbuettel.com | @eddelbuettel | e...@debian.org
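To make the "look in more than one place" idea concrete, here is a rough R-level sketch of a lookup chain: the TZ environment variable first, then a one-line timezone file. This is an assumption-laden illustration and not how gettz works (gettz does its lookup in compiled code); the function name guess_tz and its tz_file argument are invented for the example.

```r
# Hypothetical sketch, NOT the gettz implementation: try TZ first, then a
# Debian-style one-line timezone file, and give up with NA otherwise.
guess_tz <- function(tz_file = "/etc/timezone") {
    tz <- Sys.getenv("TZ", unset = "")
    if (nzchar(tz))
        return(tz)                      # an explicit TZ setting wins
    if (file.exists(tz_file)) {
        line <- readLines(tz_file, n = 1L, warn = FALSE)
        if (length(line) == 1L && nzchar(line))
            return(trimws(line))        # e.g. "Europe/Zurich"
    }
    NA_character_                       # no information found
}
```

In a container like the one above, with TZ unset and no timezone file, such a function would return NA and leave the choice of default to the caller.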
Re: [Rd] problem running test on a system without /etc/localtime
On Tue, May 16, 2017 at 5:35 PM, Kirill Maslinsky wrote:
> Hi all,
>
> A problem with tests while building R.
>
> I'm packaging R for Sisyphus repository and package build environment,
> by design, doesn't have /etc/localtime file present. This causes failure
> with Sys.timezone during test run:
>
> [builder@localhost tests]$ ../bin/R --vanilla < reg-tests-1d.R
>
>> ## PR#17186 - Sys.timezone() on some Debian-derived platforms
>> (S.t <- Sys.timezone())
> Error in normalizePath("/etc/localtime") :
>   (converted from warning) path[1]="/etc/localtime": No such file or
> directory
> Calls: Sys.timezone -> normalizePath
> Execution halted
>
> This is caused by this code:
>
>> Sys.timezone
> function (location = TRUE)
> {
>     tz <- Sys.getenv("TZ", names = FALSE)
>     if (!location || nzchar(tz))
>         return(Sys.getenv("TZ", unset = NA_character_))
>>>  lt <- normalizePath("/etc/localtime")
> [remainder of the code skipped]
>
> File /etc/localtime is optional and is not guaranteed to be present on
> any platform. And anyway, it is a good idea to first check that file
> exists before calling normalizePath.

Looking at the code
(https://github.com/wch/r-source/blob/R-3-4-branch/src/library/base/R/datetime.R#L26),
could it be that mustWork = FALSE (instead of the default NA) avoids the
warning that causes this check error?

Index: src/library/base/R/datetime.R
===================================================================
--- src/library/base/R/datetime.R (revision 72684)
+++ src/library/base/R/datetime.R (working copy)
@@ -23,7 +23,7 @@
 {
     tz <- Sys.getenv("TZ", names = FALSE)
     if(!location || nzchar(tz))
         return(Sys.getenv("TZ", unset = NA_character_))
-    lt <- normalizePath("/etc/localtime") # Linux, macOS, ...
+    lt <- normalizePath("/etc/localtime", mustWork = FALSE) # Linux, macOS, ...
     if (grepl(pat <- "^/usr/share/zoneinfo/", lt)) sub(pat, "", lt)
     else if (lt == "/etc/localtime" && file.exists("/etc/timezone") &&
              dir.exists("/usr/share/zoneinfo") &&

/Henrik

>
> Sure, this can be worked around by setting TZ environment variable, but
> that causes tests to fail in another place:
>
> [builder@localhost tests]$ TZ="GMT" ../bin/R --vanilla < reg-tests-1d.R
>
>> ## format()ing invalid hand-constructed POSIXlt objects
>> d <- as.POSIXlt("2016-12-06"); d$zone <- 1
>> tools::assertError(format(d))
> Error: Failed to get error in evaluating format(d)
> Execution halted
>
> It seems that the best solution will be to patch Sys.timezone.
>
> --
> KM
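As a quick check of what the mustWork argument changes, the sketch below calls normalizePath() on a path that does not exist under each of the three settings. The file name "no-such-localtime" is just a placeholder for a missing file. With the default (NA) a warning is signalled, which a test run that turns warnings into errors (as the "(converted from warning)" message above suggests) will treat as fatal; with FALSE the call stays silent; with TRUE it is an error.

```r
# A path that is assumed not to exist (placeholder name for this sketch).
no_such <- file.path(tempdir(), "no-such-localtime")

# mustWork = NA (the default): a warning is signalled for the missing path.
warned <- tryCatch({ normalizePath(no_such); FALSE },
                   warning = function(w) TRUE)

# mustWork = FALSE: no warning; the result for a non-existent path is
# platform dependent, but on Unix-alikes it is typically the input itself.
quiet <- tryCatch({ normalizePath(no_such, mustWork = FALSE); TRUE },
                  warning = function(w) FALSE)

# mustWork = TRUE: a missing path is an error.
failed <- tryCatch({ normalizePath(no_such, mustWork = TRUE); FALSE },
                   error = function(e) TRUE)
```

If the unresolved path does come back unchanged, the lt == "/etc/localtime" branch in the hunk above is the one that would then consult /etc/timezone.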