date:20160604

Re: [Rd] Submitting an updated package version to CRAN (Warning: non-ASCII characters)

2016-06-04 Thread Luck Buttered

Thank you, Prof Brian Ripley. That is a helpful resource!

Since my only non-ASCII characters are in an example data frame, I think I
will use the iconv() function to convert the strings in that data frame to
UTF-8. Then I will declare a dependency on R (>= 2.10) in the Depends
field of the DESCRIPTION file, as the resource indicates.
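
A minimal sketch of that conversion, for illustration only. The data frame `people`, the helper `to_utf8()` and the source encoding "latin1" are assumptions; the thread does not say how the real data are encoded.

```r
# Hypothetical example data: two names contain latin1 bytes (e-acute, e-diaeresis).
# The real dataset and its source encoding are not given in the thread.
people <- data.frame(name = c("Jos\xe9", "Zo\xeb", "Ann"),
                     stringsAsFactors = FALSE)

# Convert every character column to UTF-8. iconv() marks non-ASCII results
# as UTF-8, which is the declared encoding that the check asks for.
to_utf8 <- function(df, from = "latin1") {
  is_chr <- vapply(df, is.character, logical(1))
  df[is_chr] <- lapply(df[is_chr], iconv, from = from, to = "UTF-8")
  df
}

people <- to_utf8(people)
Encoding(people$name)   # "UTF-8" "UTF-8" "unknown" (pure ASCII stays unmarked)
```

The converted data frame would then be saved again into the package's data directory before re-running the check.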


On Mon, May 23, 2016 at 3:46 AM, Prof Brian Ripley wrote:

> On 21/05/2016 21:25, Luck Buttered wrote:
>
>> Dear all:
>>
>> I am updating an R package that I submitted to CRAN last year and have
>> two questions on which I would be grateful for any input:
>>
>> 1) In the updated version of the package, I am adding a second example
>> dataset. This example dataset is a subset of a public database that
>> contains thousands of names. Upon running devtools::check(), I am only
>> getting one warning. ("Warning: found non-ASCII strings").
>>
>> It seems this warning is coming from special characters in some of the
>> names. As it is ideal that the names should not be altered, I did not know
>> what approach to take. Should I simply include a note in my CRAN
>> submission
>> indicating that the non-ASCII characters are meaningfully inherent to the
>> example data? Or, should I convert the names to ASCII characters (if that
>> is easily possible for so many cases), and indicate to users that names
>> have been altered (special characters removed)?
>>
>
> You should follow the advice of the manual:
> https://cran.r-project.org/doc/manuals/r-release/R-exts.html#Encoding-issues
> .  There is not enough detail here to know what you currently do (let alone
> what you should do), but that message indicates that the encoding of
> non-ASCII strings (what you call 'special characters') has not been declared
> (and to be portable they should be in UTF-8).
>
>> 2) I have never submitted an updated version of a package to CRAN. I am
>> considering following a similar process to what I did to submit my
>> original version of the package to CRAN. That is, using devtools::release()
>> and including a note in a file called cran-comments.md to indicate that
>> this is not an original version submission, but rather an updated version
>> submission. I found this advice on Hadley Wickham's site
>> (http://r-pkgs.had.co.nz/release.html), but could not determine if this
>> was appropriate for version update submissions as well.
>>
>
> There is a list for discussing package preparation, r-package-devel.
>
>
>> Thank you for sharing any advice!
>>
>
>
>
> --
> Brian D. Ripley,  rip...@stats.ox.ac.uk
> Emeritus Professor of Applied Statistics, University of Oxford
>
>

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

[Rd] factors with non-unique ("duplicated") levels have been deprecated since 2009 -- are more deprecated now -- and why you should be hesitant misusing suppressWarnings()

2016-06-04 Thread Martin Maechler

From this bug report (it's a proposal for speedup only, not a bug),
   https://bugs.r-project.org/bugzilla/show_bug.cgi?id=16895#c6
the fact that you can construct factors with non-unique aka
"duplicated" levels in R  has been re-raised.  As mentioned there,
we had a small discussion here (on 'R-devel') a bit more than 7 years
ago,  where I had said that indeed R core had decided
that factors with duplicated levels will be deprecated from R version
2.10.0 on ... indeed a while ago.

As factors are not S4 objects, there is no formal class definition and
no inherent class validation, but even back in 2009 we changed
`levels<-` so that it raises a warning when the levels are not unique:

> aba <- c("a","b","a"); x <- factor(aba, levels=aba)
Warning message:
In `levels<-`(`*tmp*`, value = if (nl == nL) as.character(labels) else
paste0(labels,  :
  duplicated levels in factors are deprecated
>

We've finally decided to make this an error in R-devel  (which is
planned for release, probably as R 3.4.0, in April 2017):

> aba <- c("a","b","a"); x <- factor(aba, levels=aba)
Error in `levels<-`(`*tmp*`, value = if (nl == nL)
as.character(labels) else paste0(labels,  :
  factor level [3] is duplicated
>

If you know R well, you'll know that it is still very easy to
construct factors in R with invalid levels.
For this reason, also *printing* such factors now produces a warning:

> f
[1] 1 2 2 3 3 2 2 1
Levels: 1 2 2 3
Warning message:
In print.factor(x) : duplicated level [3] in factor
>


We have found at least two packages that are affected by this change
by no longer passing 'R CMD check' on R-devel:
1) plyr --- but there it is just a check which has previously checked
the *warning* mentioned above, which now is an error.  So only the
check must be amended (quite easily)
2) MicroDatosEs: now fails in  example(censo2010).
  and that is the reason for this posting:   I would claim that it is
not primarily the fault of 'MicroDatosEs' maintainer,  but actually of
a package that it depends on, 'memisc'.
Now memisc has a "nice" S4 method for producing a factor from an
"item.vector" (though I would find an as(..) method [defined via
setAs(..)] much more natural than an 'as.factor()' method):

> selectMethod("as.factor", "item.vector")
Method Definition:

function (x)
{
    labels <- x@value.labels
    if (length(labels)) {
        values <- labels@values
        labels <- labels@.Data
    }
    else {
        values <- labels <- sort(unique(x@.Data))
    }
    filter <- x@value.filter
    use.levels <- if (length(filter))
        is.valid2(values, filter)
    else TRUE
    f <- suppressWarnings(factor(x@.Data, levels = values[use.levels],
        labels = labels[use.levels]))
    if (length(attr(x, "contrasts")))
        contrasts(f) <- contrasts(x)
    f
}



and the suppressWarnings(..) has "ensured" all these years since
2009 that users and package writers were never alerted to the
programming "glitch" (of not ensuring that levels/labels were correct).
Otherwise they would have seen that factor() was sometimes called in
situations where it produced an invalid factor, namely one with
duplicated levels, and the memisc authors could have ensured that the
above method produces correct factors.
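
As an illustration only (this is not the memisc code, nor a fix proposed in this message), one way to avoid both the invalid factor and the suppressWarnings() call is to map values to labels first and build the factor from the unique labels. The helper name `make_factor()` and the example data are hypothetical.

```r
# Hypothetical inputs: four coded values, two of which share the label "mid".
values <- c(1, 2, 3, 4)
labels <- c("low", "mid", "mid", "high")
x <- c(1, 2, 3, 4, 3, 2)

make_factor <- function(x, values, labels) {
  # Fail loudly on ambiguous input instead of silencing the warning.
  if (anyDuplicated(values))
    stop("duplicated values: ",
         paste(values[duplicated(values)], collapse = ", "))
  # Translate each value to its label, then use each label only once as a level.
  mapped <- labels[match(x, values)]
  factor(mapped, levels = unique(labels))
}

f <- make_factor(x, values, labels)
levels(f)                 # "low" "mid" "high"
anyDuplicated(levels(f))  # 0, i.e. no duplicated levels
```

Values that share a label end up in the same level, so the resulting factor is valid without any warning having to be suppressed.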

Martin Maechler,
R core team / ETH Zurich

Re: [Rd] RProfmem output format

2016-06-04 Thread Henrik Bengtsson

I'm picking up this 5-year-old thread.

1. About the four memory allocations without a stacktrace

I think the four memory allocations without a stacktrace reported by Rprofmem():

> Rprofmem(); x <- raw(2000); Rprofmem("")
> cat(readLines("Rprofmem.out", n=5, warn=FALSE), sep="\n")
192 :360 :360 :1064 :2040 :"raw"

are due to some initialization of R that is independent of Rprofmem(),
because they can be avoided if one allocates some memory before (in a
fresh R session):

> z <- raw(1000); dummy <- gc()
> Rprofmem(); x <- raw(2000); Rprofmem("")
> cat(readLines("Rprofmem.out", n=5, warn=FALSE), sep="\n")
2040 :"raw"


2. About missing newlines when stacktrace is empty

As a refresher, the problem is that memory allocations with an empty
stacktrace are reported without newlines, i.e.

192 :360 :360 :1064 :2040 :"raw"

The question is why this is not reported as:

192 :
360 :
360 :
1064 :
2040 :"raw"

This was/is because the C function R_OutputStackTrace(), part of
src/main/memory.c, looks like:

static void R_OutputStackTrace(FILE *file)
{
    int newline = 0;
    RCNTXT *cptr;

    for (cptr = R_GlobalContext; cptr; cptr = cptr->nextcontext) {
        if ((cptr->callflag & (CTXT_FUNCTION | CTXT_BUILTIN))
            && TYPEOF(cptr->call) == LANGSXP) {
            SEXP fun = CAR(cptr->call);
            if (!newline) newline = 1;
            fprintf(file, "\"%s\" ",
                    TYPEOF(fun) == SYMSXP ? CHAR(PRINTNAME(fun)) :
                    "<Anonymous>");
        }
    }
    if (newline) fprintf(file, "\n");
}


Thomas, your last comment was:

> Yes. It's obviously better to always print a newline, and so clearly
> deliberate not to, that I suspect there may have been a good reason.
> If I can't work it out (after my grant deadline this week) I will just
> assume it's wrong.

When I search the code and the commit history
(https://github.com/wch/r-source/commit/3d5eb2a09f2d75893efdc8bbf1c72d17603886a0),
it appears that this was there from the very first commit.  Also,
searching the code for usages of R_OutputStackTrace(), I only find
R_ReportAllocation() and R_ReportNewPage(), both part of
src/main/memory.c (see below).

static void R_ReportAllocation(R_size_t size)
{
    if (R_IsMemReporting) {
        if(size > R_MemReportingThreshold) {
            fprintf(R_MemReportingOutfile, "%lu :", (unsigned long) size);
            R_OutputStackTrace(R_MemReportingOutfile);
        }
    }
    return;
}

static void R_ReportNewPage(void)
{
    if (R_IsMemReporting) {
        fprintf(R_MemReportingOutfile, "new page:");
        R_OutputStackTrace(R_MemReportingOutfile);
    }
    return;
}


Could it be that when you wrote it you had another usage for
R_OutputStackTrace() in mind as well?  If so, it makes sense that
R_OutputStackTrace() shouldn't output a newline if the stack trace was
empty.  But if the above is the only usage, to me it looks pretty safe
to always add a newline.
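
For anyone who has to parse the current output in the meantime, here is a rough sketch of how the missing newlines could be restored on the R side. The helper name `split_rprofmem()` is made up for this illustration, and it assumes that no function name in a stack trace contains a colon directly followed by a digit or by "new page".

```r
# Split Rprofmem() output so that each allocation record is on its own line.
split_rprofmem <- function(lines) {
  # A ':' directly followed by a digit or by "new page" means that a record
  # with an empty stack trace ran straight into the next record.
  fixed <- gsub(":(?=[0-9]|new page)", ":\n", lines, perl = TRUE)
  # Break at the inserted newlines and drop the trailing space after a trace.
  trimws(unlist(strsplit(fixed, "\n", fixed = TRUE)))
}

records <- split_rprofmem('192 :360 :360 :1064 :2040 :"raw" ')
cat(records, sep = "\n")
```

Applied to the example line from above, this gives the five records "192 :", "360 :", "360 :", "1064 :" and 2040 :"raw", one per line.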

> sessionInfo()
R version 3.3.0 Patched (2016-05-26 r70682)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

/Henrik


On Sun, May 15, 2011 at 1:16 PM, Thomas Lumley wrote:
> On Mon, May 16, 2011 at 1:02 AM, Hadley Wickham wrote:
>> So what causes allocations when the call stack is empty?  Something
>> internal?  Does the garbage collector trigger allocations (i.e. could
>> it be caused by moving data to contiguous memory)?
>
> The garbage collector doesn't move anything, it just swaps pointers in
> a linked list.
>
> The lexer, parser, and evaluator all have  to do some work before a
> function context is set up for the top-level function, so I assume
> that's where it is happening.
>
>> Any ideas what the correct thing to do with these memory allocations is?
>> Ignore them because they're not really related to the function they're
>> attributed to?  Sum them up?
>>
>>> I don't see why this is done, and I may well be the person who did it
>>> (I don't have svn on this computer to check), but it is clearly
>>> deliberate.
>>
>> It seems like it would be more consistent to always print a newline,
>> and then it would be obvious those allocations occurred when the call
>> stack was empty.  This would make parsing the file a little bit
>> easier.
>
> Yes. It's obviously better to always print a newline, and so clearly
> deliberate not to, that I suspect there may have been a good reason.
> If I can't work it out (after my grant deadline this week) I will just
> assume it's wrong.
>
>
>-thomas
>
> --
> Thomas Lumley
> Professor of Biostatistics
> University of Auckland
