Re: [Rd] Submitting an updated package version to CRAN (Warning: non-ASCII characters)
Thank you, Prof Brian Ripley. That is a helpful resource!

Since my only non-ASCII characters are in an example data frame, I think I will use the iconv() function to convert that example data frame to UTF-8. Then I will add R (>= 2.10) to the Depends field of the DESCRIPTION file, as the resource indicates.

On Mon, May 23, 2016 at 3:46 AM, Prof Brian Ripley wrote:

> On 21/05/2016 21:25, Luck Buttered wrote:
>
>> Dear all:
>>
>> I am updating the version of an R package I submitted to CRAN last year
>> and came across two questions on which I would be grateful for any input:
>>
>> 1) In the updated version of the package, I am adding a second example
>> dataset. This example dataset is a subset of a public database that
>> contains thousands of names. Upon running devtools::check(), I am only
>> getting one warning ("Warning: found non-ASCII strings").
>>
>> It seems this warning is coming from special characters in some of the
>> names. As it is ideal that the names should not be altered, I did not know
>> what approach to take. Should I simply include a note in my CRAN
>> submission indicating that the non-ASCII characters are meaningfully
>> inherent to the example data? Or should I convert the names to ASCII
>> characters (if that is easily possible for so many cases), and indicate
>> to users that names have been altered (special characters removed)?
>
> You should follow the advice of the manual:
> https://cran.r-project.org/doc/manuals/r-release/R-exts.html#Encoding-issues
> There is not enough detail here to know what you currently do (let alone
> what you should do), but that message indicates that the encoding of
> non-ASCII strings (what you call 'special characters') has not been
> declared (and to be portable they should be in UTF-8).
>
>> 2) I have never submitted an updated version of a package to CRAN. I am
>> considering following a similar process to what I did to submit my
>> original version of the package to CRAN. That is, using
>> devtools::release() and including a note in a file called
>> cran-comments.md to indicate that this is not an original version
>> submission, but rather an updated version submission. I found this
>> advice on Hadley Wickham's site (http://r-pkgs.had.co.nz/release.html),
>> but could not determine if this was appropriate for version update
>> submissions as well.
>
> There is a list for discussing package preparation, r-package-devel.
>
>> Thank you for sharing any advice!
>
> --
> Brian D. Ripley, rip...@stats.ox.ac.uk
> Emeritus Professor of Applied Statistics, University of Oxford
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
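For readers following this thread, here is a minimal sketch of the iconv() approach mentioned in the reply above. The data frame `people` and the helper `to_utf8()` are hypothetical, and the sketch assumes the original strings are Latin-1 encoded; the thread does not say what encoding the real dataset uses, so the `from` argument would need to be adjusted accordingly.

```r
# Hypothetical example data: "\xeb" and "\xe9" are Latin-1 bytes for
# e-diaeresis and e-acute. The real dataset's encoding is an assumption.
name_latin1 <- c("Zo\xeb", "Jos\xe9", "Ann")
people <- data.frame(name = name_latin1, stringsAsFactors = FALSE)

# Hypothetical helper: convert every character column to UTF-8.
# iconv() marks the converted non-ASCII strings as UTF-8, which is the
# encoding declaration that the check warning asks for.
to_utf8 <- function(df, from = "latin1") {
  is_chr <- vapply(df, is.character, logical(1))
  df[is_chr] <- lapply(df[is_chr], iconv, from = from, to = "UTF-8")
  df
}

people_utf8 <- to_utf8(people)

# Pure ASCII strings stay "unknown"; the others are now declared UTF-8.
print(Encoding(people_utf8$name))
print(all(validUTF8(people_utf8$name)))
```

The converted data frame could then be saved into the package's data/ directory in the usual way, with the dependence on R (>= 2.10) declared in DESCRIPTION as discussed above.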
[Rd] factors with non-unique ("duplicated") levels have been deprecated since 2009 -- are *more* deprecated now -- and why you should be hesitant misusing suppressWarnings()
From this bug report (it's a proposal for speedup only, not a bug),
https://bugs.r-project.org/bugzilla/show_bug.cgi?id=16895#c6
the fact that you can construct factors with non-unique aka "duplicated" levels in R has been re-raised.

As mentioned there, we had a small discussion here (on 'R-devel') a bit more than 7 years ago, where I had said that indeed R core had decided that factors with duplicated levels will be deprecated from R version 2.10.0 on ... indeed a while ago.

As factors are not S4 objects, there is no really formal class definition and no inherent class validation, but even then in 2009, we had changed `levels<-` such that it raised a warning when the levels were not unique:

    > aba <- c("a","b","a"); x <- factor(aba, levels=aba)
    Warning message:
    In `levels<-`(`*tmp*`, value = if (nl == nL) as.character(labels) else paste0(labels, :
      duplicated levels in factors are deprecated
    >

We've finally decided to make this an error in R-devel (which is planned for release, probably as R 3.4.0, in April 2017):

    > aba <- c("a","b","a"); x <- factor(aba, levels=aba)
    Error in `levels<-`(`*tmp*`, value = if (nl == nL) as.character(labels) else paste0(labels, :
      factor level [3] is duplicated
    >

If you know R well, you'll know that it is still very easy to construct factors in R with invalid levels. For this reason, also *printing* such factors now produces a warning:

    > f
    [1] 1 2 2 3 3 2 2 1
    Levels: 1 2 2 3
    Warning message:
    In print.factor(x) : duplicated level [3] in factor
    >

We have found at least two packages that are affected by this change by no longer passing 'R CMD check' on R-devel:

1) plyr --- but there it is just a check which has previously checked the *warning* mentioned above, which now is an error. So only the check must be amended (quite easily).

2) MicroDatosEs: now fails in example(censo2010),

and that is the reason for this posting: I would claim that it is not primarily the fault of the 'MicroDatosEs' maintainer, but actually of a package that it depends on, 'memisc'. Now that has a "nice" S4 method for producing a factor from an "item.vector" (though I would find an as(..) method [defined via setAs(..)] much more natural than an 'as.factor()' method):

    > selectMethod("as.factor", "item.vector")
    Method Definition:

    function (x)
    {
        labels <- x@value.labels
        if (length(labels)) {
            values <- labels@values
            labels <- labels@.Data
        }
        else {
            values <- labels <- sort(unique(x@.Data))
        }
        filter <- x@value.filter
        use.levels <- if (length(filter))
            is.valid2(values, filter)
        else TRUE
        f <- suppressWarnings(factor(x@.Data, levels = values[use.levels],
            labels = labels[use.levels]))
        if (length(attr(x, "contrasts")))
            contrasts(f) <- contrasts(x)
        f
    }

and the suppressWarnings(..) has "ensured" all these years since 2009 that users and package writers were never alerted to the programming "glitch" (of not ensuring levels/labels were correct). They should have seen that factor() was sometimes called in situations where it produced an invalid factor, namely one where some levels were duplicated, and so the memisc authors could have ensured that the above method would produce correct factors.

Martin Maechler,
R core team / ETH Zurich
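To illustrate the alternative to a blanket suppressWarnings(), here is a minimal sketch that validates the levels explicitly before calling factor(). The helper `strict_factor()` is hypothetical and is not part of base R or memisc; it only shows one way a caller could detect duplicated levels on any R version, instead of relying on (or silencing) the condition raised by `levels<-`.

```r
aba <- c("a", "b", "a")

# Hypothetical helper: check the levels up front instead of silencing factor().
strict_factor <- function(x, levels) {
  dup <- unique(levels[duplicated(levels)])
  if (length(dup) > 0L) {
    stop("duplicated factor levels: ", paste(dup, collapse = ", "))
  }
  factor(x, levels = levels)
}

# The problem surfaces as an explicit, catchable error with a clear message.
msg <- tryCatch(strict_factor(aba, levels = aba),
                error = function(e) conditionMessage(e))
print(msg)

# Once the caller knows about it, de-duplicating the levels is a one-liner.
f <- strict_factor(aba, levels = unique(aba))
print(levels(f))
```

If some warning really must be silenced, handling only that specific condition (for example with withCallingHandlers() and a test on the message) would keep unrelated warnings visible, which is the point made about suppressWarnings() above.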
Re: [Rd] RProfmem output format
I'm picking up this 5-year old thread.

1. About the four memory allocations without a stacktrace

I think the four memory allocations without a stacktrace reported by Rprofmem():

    > Rprofmem(); x <- raw(2000); Rprofmem("")
    > cat(readLines("Rprofmem.out", n=5, warn=FALSE), sep="\n")
    192 :360 :360 :1064 :2040 :"raw"

are due to some initialization of R that is independent of Rprofmem(), because they can be avoided if one allocates some memory before (in a fresh R session):

    > z <- raw(1000); dummy <- gc()
    > Rprofmem(); x <- raw(2000); Rprofmem("")
    > cat(readLines("Rprofmem.out", n=5, warn=FALSE), sep="\n")
    2040 :"raw"

2. About missing newlines when stacktrace is empty

As a refresher, the problem is that memory allocations with an empty stack trace are reported without newlines, i.e.

    192 :360 :360 :1064 :2040 :"raw"

The question is why this is not reported as:

    192 :
    360 :
    360 :
    1064 :
    2040 :"raw"

This was/is because the C function R_OutputStackTrace(), part of src/main/memory.c, looks like:

    static void R_OutputStackTrace(FILE *file)
    {
        int newline = 0;
        RCNTXT *cptr;

        for (cptr = R_GlobalContext; cptr; cptr = cptr->nextcontext) {
            if ((cptr->callflag & (CTXT_FUNCTION | CTXT_BUILTIN))
                && TYPEOF(cptr->call) == LANGSXP) {
                SEXP fun = CAR(cptr->call);
                if (!newline) newline = 1;
                fprintf(file, "\"%s\" ",
                        TYPEOF(fun) == SYMSXP ? CHAR(PRINTNAME(fun)) : "");
            }
        }
        if (newline) fprintf(file, "\n");
    }

Thomas, your last comment was:

> Yes. It's obviously better to always print a newline, and so clearly
> deliberate not to, that I suspect there may have been a good reason.
> If I can't work it out (after my grant deadline this week) I will just
> assume it's wrong.

When I search the code and the commit history (https://github.com/wch/r-source/commit/3d5eb2a09f2d75893efdc8bbf1c72d17603886a0), it appears that this was there from the very first commit.
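As an aside on the consumer side of this format: until a newline is always written, a reader of the file has to split run-together records itself. The following is a minimal sketch of one way to do that in R, using the output line shown above as a literal string. The variable names and the regular expression are my own assumptions, and the sketch deliberately ignores "new page:" records and anything else Rprofmem() may write.

```r
# The output line from the example above, with four stack-less allocations
# fused onto the same line as the allocation made by raw().
txt <- '192 :360 :360 :1064 :2040 :"raw" '

# One record = "<bytes> :" followed by zero or more quoted function names,
# each terminated by a space (assumed record shape, based on the output above).
pattern <- '[0-9]+ :("[^"]*" )*'
records <- regmatches(txt, gregexpr(pattern, txt))[[1]]

# Split each record into the allocation size and the (possibly empty) trace.
bytes <- as.numeric(sub(" :.*$", "", records))
trace <- trimws(sub("^[0-9]+ :", "", records))

allocs <- data.frame(bytes = bytes, trace = trace, stringsAsFactors = FALSE)
print(allocs)
```

For a real profile, `txt` would come from something like paste(readLines("Rprofmem.out"), collapse = " "). With a newline always printed, a plain line-by-line read would suffice and this kind of splitting would be unnecessary.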
Also, searching the code for usages of R_OutputStackTrace(), I only find R_ReportAllocation() and R_ReportNewPage(), both part of src/main/memory.c (see below).

    static void R_ReportAllocation(R_size_t size)
    {
        if (R_IsMemReporting) {
            if(size > R_MemReportingThreshold) {
                fprintf(R_MemReportingOutfile, "%lu :", (unsigned long) size);
                R_OutputStackTrace(R_MemReportingOutfile);
            }
        }
        return;
    }

    static void R_ReportNewPage(void)
    {
        if (R_IsMemReporting) {
            fprintf(R_MemReportingOutfile, "new page:");
            R_OutputStackTrace(R_MemReportingOutfile);
        }
        return;
    }

Could it be that when you wrote it you had another usage for R_OutputStackTrace() in mind as well? If so, it makes sense that R_OutputStackTrace() shouldn't output a newline if the stack trace was empty. But if the above is the only usage, to me it looks pretty safe to always add a newline.

    > sessionInfo()
    R version 3.3.0 Patched (2016-05-26 r70682)
    Platform: x86_64-w64-mingw32/x64 (64-bit)
    Running under: Windows 7 x64 (build 7601) Service Pack 1

    locale:
    [1] LC_COLLATE=English_United States.1252
    [2] LC_CTYPE=English_United States.1252
    [3] LC_MONETARY=English_United States.1252
    [4] LC_NUMERIC=C
    [5] LC_TIME=English_United States.1252

    attached base packages:
    [1] stats     graphics  grDevices utils     datasets  methods   base

/Henrik

On Sun, May 15, 2011 at 1:16 PM, Thomas Lumley wrote:

> On Mon, May 16, 2011 at 1:02 AM, Hadley Wickham wrote:
>> So what causes allocations when the call stack is empty? Something
>> internal? Does the garbage collector trigger allocations (i.e. could
>> it be caused by moving data to contiguous memory)?
>
> The garbage collector doesn't move anything, it just swaps pointers in
> a linked list.
>
> The lexer, parser, and evaluator all have to do some work before a
> function context is set up for the top-level function, so I assume
> that's where it is happening.
>
>> Any ideas what the correct thing to do with these memory allocations is?
>> Ignore them because they're not really related to the function they're
>> attributed to? Sum them up?
>>
>>> I don't see why this is done, and I may well be the person who did it
>>> (I don't have svn on this computer to check), but it is clearly
>>> deliberate.
>>
>> It seems like it would be more consistent to always print a newline,
>> and then it would be obvious those allocations occurred when the call
>> stack was empty. This would make parsing the file a little bit
>> easier.
>
> Yes. It's obviously better to always print a newline, and so clearly
> deliberate not to, that I suspect there may have been a good reason.
> If I can't work it out (after my grant deadline this week) I will just
> assume it's wrong.
>
>    -thomas
>
> --
> Thomas Lumley
> Professor of Biostatistics
> University of Auckland

__
R-devel@r-project.org mailin