[Rd] bug in "all.vars()"

2020-12-15 Thread Daniel Lüdecke
Hello,
I'm not sure if the following is intended, or a bug, but "all.vars()"
returns different values, depending on whether "TRUE" or "T" is used inside
"poly()":

m1 <- lm(Sepal.Length ~ poly(Sepal.Width, 2, raw = TRUE) + Petal.Length,
         data = iris)
m2 <- lm(Sepal.Length ~ poly(Sepal.Width, 2, raw = T) + Petal.Length,
         data = iris)

all.vars(formula(m1))
#> [1] "Sepal.Length" "Sepal.Width"  "Petal.Length"
all.vars(formula(m2))
#> [1] "Sepal.Length" "Sepal.Width"  "T"            "Petal.Length"

I know that "T" is not a reserved word, but rather a global variable.
Nonetheless, I just wanted to be clear whether this is a bug or everything
works as intended.

Best
Daniel


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] bug in "all.vars()"

2020-12-15 Thread Duncan Murdoch

On 15/12/2020 7:33 a.m., Daniel Lüdecke wrote:

> Hello,
> I'm not sure if the following is intended, or a bug, but "all.vars()"
> returns different values, depending on whether "TRUE" or "T" is used inside
> "poly()":
>
> m1 <- lm(Sepal.Length ~ poly(Sepal.Width, 2, raw = TRUE) + Petal.Length,
>          data = iris)
> m2 <- lm(Sepal.Length ~ poly(Sepal.Width, 2, raw = T) + Petal.Length,
>          data = iris)
>
> all.vars(formula(m1))
> #> [1] "Sepal.Length" "Sepal.Width"  "Petal.Length"
> all.vars(formula(m2))
> #> [1] "Sepal.Length" "Sepal.Width"  "T"            "Petal.Length"
>
> I know that "T" is no reserved word, but rather a global variable.
> Nonetheless, I just wanted to be clear if there is a bug, or everything
> works as intended.



It looks as though it is working as documented.  formula(m2) gives

Sepal.Length ~ poly(Sepal.Width, 2, raw = T) + Petal.Length

and all.vars says it returns "a character vector containing all the 
names which occur in an expression or call."  "T" is a name.


There are good reasons why "R CMD check" warns about using T when you 
mean TRUE.
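
To make the distinction concrete, here is a small sketch (not part of the original exchange) showing that TRUE is parsed as a logical constant while T is parsed as an ordinary name, which appears to be why all.vars() reports it. The object names f1, f2, vars1, vars2 and extra are illustrative only; no model fit is needed because all.vars() works on the unevaluated formula.

```r
# Formulas only; all.vars() inspects the unevaluated expression.
f1 <- Sepal.Length ~ poly(Sepal.Width, 2, raw = TRUE) + Petal.Length
f2 <- Sepal.Length ~ poly(Sepal.Width, 2, raw = T) + Petal.Length

vars1 <- all.vars(f1)
vars2 <- all.vars(f2)

# TRUE is a constant in the parse tree; T is a symbol that happens to
# be bound to TRUE in the base package.
true_is_constant <- is.logical(quote(TRUE))
t_is_name <- is.name(quote(T))

# Names that appear only in the second formula.
extra <- setdiff(vars2, vars1)
print(extra)
```

Since T is a regular variable that can be reassigned (T <- 0 is legal), spelling out TRUE in formulas avoids both the extra entry and any surprise from a masked T.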


Duncan Murdoch



Re: [Rd] quantile() names

2020-12-15 Thread Merkle, Edgar C.
Avi,

On Mon, 2020-12-14 at 18:00 -0500, Avi Gross wrote:

> Question: is the part that Ed Merkle is asking about the change in the
> expected NAME associated with the output?

You are right: the question is about the name changing to "98%", when the 
returned object is the 97.5th percentile.

It is indeed easy to set names=FALSE here. But there can still be a problem 
when the user sets options(digits=2), then a package calls quantile(x, .975) 
and expects an object that has a name of "97.5%".

I think the easiest solution is to tell the user not to set options(digits=2), 
but it also seems like the "98%" name is not the best result. But Gabriel is 
correct that we would still need to consider how to handle something like 
quantile(x, 1/3). Maybe it is not a big enough issue to warrant changing 
anything.
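
One possible workaround for package code is sketched below, under the assumption that the package is free to build its own labels: request unnamed quantiles and construct the names with sprintf(), whose "%g" conversion does not consult options(digits). The helper name quantile_labeled() is hypothetical and not part of any package.

```r
# Hypothetical helper: quantiles whose names do not depend on options(digits).
quantile_labeled <- function(x, probs, ...) {
  q <- quantile(x, probs, names = FALSE, ...)
  # "%g" formats with up to 6 significant digits regardless of
  # options(digits); "%%" is a literal percent sign.
  names(q) <- sprintf("%g%%", 100 * probs)
  q
}

old <- options(digits = 2)   # simulate the user's setting
q <- quantile_labeled(1:1000, c(0.95, 0.975, 0.99))
options(old)                 # restore the previous setting

names(q)
q[["97.5%"]]
```

The stored value is still 975.025; only printing is affected by the digits option. A probability such as 1/3 would be labeled "33.3333%" by this sketch, so the question of how much precision a label should carry remains a design choice.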

Ed




> He changed a sort of global parameter affecting how many digits he wants any
> compliant function to display. So when he asked for a named vector, the
> chosen name was based on his request and limited when possible to two
> digits.
>
> x <- 1:1000
> temp <- quantile(x, .975)
>
> If you examine temp, you will see it is a vector containing (as it happens)
> a single numeric item (as it happens a double) with the value of 975. But
> the name associated is a character string with a "%" appended as shown
> below:
>
> str(temp)
> Named num 975
> - attr(*, "names")= chr "98%"
>
> If you do not want a name attached to the vector, add an option:
>
> quantile(x, .975, names=FALSE)
>
> If you want the name to be longer or different, you can do that after.
>
> names(temp)
> [1] "98%"
>
> So change it yourself:
>
> temp
> 98%
> 975
> names(temp) <- paste(round(temp, 3), "%", sep="")
> temp
> 975.025%
> 975
>
> The above is for illustration with tabs inserted to show what is in the
> output. You probably do not need a name for your purposes and if you ask for
> multiple quantiles you might need to adjust the above.
>
> Of course if you wanted another non-default "type" of calculation, what Abby
> offered may also apply.
>
> -----Original Message-----
> From: R-devel <r-devel-boun...@r-project.org> On Behalf Of Abby Spurdle
> Sent: Monday, December 14, 2020 4:48 PM
> To: Merkle, Edgar C. <merk...@missouri.edu>
> Cc: r-devel@r-project.org
> Subject: Re: [Rd] quantile() names
>
> The "value" is *not* 975.
> It's 975.025.
>
> The results that you're observing, are merely the byproduct of formatting.
>
> Maybe, you should try:
>
> quantile (x, .975, type=4)
>
> Which perhaps, using default options, produces the result you're expecting?
>
> On Tue, Dec 15, 2020 at 8:55 AM Merkle, Edgar C. <merk...@missouri.edu>
> wrote:
>
> > All,
> >
> > Consider the code below
> >
> > options(digits=2)
> > x <- 1:1000
> > quantile(x, .975)
> >
> > The value returned is 975 (the 97.5th percentile), but the name has been
> > shortened to "98%" due to the digits option. Is this intended? I would have
> > expected the name to also be "97.5%" here. Alternatively, the returned value
> > might be 980 in order to match the name of "98%".
> >
> > Best,
> > Ed



Re: [Rd] quantile() names

2020-12-15 Thread Avi Gross via R-devel
Thank you for explaining, Ed. It makes looking at the issue raised much
easier.

 

As I understand it, you are not really asking about something fully in your
control. You are asking how any function like quantile() should behave when
a user has altered something global, or at least global within a package,
such as this:

> quantile(x, c(.95, .975, .99000))
    95%   97.5%     99%
950.050 975.025 990.010
> dig.it <- options(digits=2)
> dig.it
$digits
[1] 7

 

I did it that way so I could re-set it!
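
The save-and-restore idiom can be sketched as follows. This is a minimal illustration rather than anything from quantile() itself; the function name show_two_digits() is hypothetical. It relies on options() returning the previous values, so they can be handed back to options() later.

```r
# options() returns the previous values invisibly, so they can be restored.
show_two_digits <- function(value) {
  old <- options(digits = 2)
  on.exit(options(old), add = TRUE)  # runs even if an error occurs
  format(value)                      # format() consults getOption("digits")
}

before <- getOption("digits")
out <- show_two_digits(975.025)
after <- getOption("digits")

out
identical(before, after)
```

Confining the change to a function with on.exit() keeps the setting from leaking into other code, which is the situation discussed in this thread.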

 

I looked to see if quantile() is written in base R and it seems to be a
generic that I would have to hunt down so I stopped for now.

 

Here is what I get BEFORE changing the option for digits:

> x <- 1:1000
> quantile(x, probs=c(.95, .975, .99000))
    95%   97.5%     99%
950.050 975.025 990.010

 

Note I used the fuller version asking for multiple thresholds so I could see
what happened if I used more zeroes. Note that trailing zeroes are not shown
in the name of the third element of the vector. So I can suggest the program
is not getting the unevaluated text to use but is using the value of the
vector. Now I set the number of digits to 2, globally, and repeat:

> quantile(x, probs=c(.95, .975, .99000))
95% 98% 99%
950 975 990

 

I notice several things, as others have pointed out. There seems to be a
truncation in the values shown, so nothing is now shown past the decimal
point. But maybe not, as adding an argument of 1/3 gives 334 rather than 333.

> quantile(x, probs=c(.95, .975, .99000, 1/3))
95% 98% 99% 33%
950 975 990 334

 

Now the names are apparently rounded as discussed, with the percent symbol
appended.

 

So what would you propose? Within the function there seem to be two parts
dealing with displaying the result and it looks like the original number
loses precision as handing the above to round(., 7) shows no change. So are
you asking it to format the name differently from the value even though there
is a global option set specifying the digits they want?

 

If it really mattered, I suggest one solution may be to allow one or two
additional arguments to a function like quantile like:

 

quantile(x, ., digits=5, names=c("95%", "97.5%", .) )

 

So if a user really wanted to live in their own world of fewer digits they
could specify what labels they wanted and could ask for "high", "Higher" and
"HIGHEST" or whatever makes them happy. But, as noted, any user wanting that
level of control can change the labels afterward. You are correct, though,
that a package using quantile() and calling out the results individually by
name will not be able to use that technique consistently and reliably. But
can they use it now? I tried variations on $.95%, such as
quantile(x, c(.95, .975, .99000))$`95%`, and they fail, and the same for
using [] notation. These identifiers were not chosen to be used this way.
You can get them positionally:

> quantile(x, c(.95, .975, .99000))[1]
95%
950
> quantile(x, c(.95, .975, .99000))[2]
98%
975

 

If you convert the darn output from a vector to a list, though, it works,
using grave accents:

> as.list(quantile(x, c(.95, .975, .99000)))$`98%`
[1] 975

 

So, I doubt many would play games like me to find some way to select by
name. Odds are they might use position or get one at a time. The name is
more for humans to read, I would think.
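
As a check on the selection options, the sketch below (run with default options, and not part of the original experiments) suggests that character subscripts with [ and [[ do work on the named vector that quantile() returns; it is the $ operator that is restricted to lists and other recursive objects. The variable names are illustrative.

```r
x <- 1:1000
q <- quantile(x, c(0.95, 0.975, 0.99))

# `$` is defined for lists (and other recursive objects), not atomic vectors,
# so this call signals an error that is caught here.
dollar_result <- tryCatch(q$`95%`, error = function(e) "error")

# Character subscripts work on a named atomic vector.
by_single <- q["95%"]     # keeps the name
by_double <- q[["95%"]]   # drops the name

# Positional access does not depend on how the names were formatted.
by_position <- unname(q[1])
```

Name-based access still depends on the exact label text, so positional access or names=FALSE remains the safer choice when the labels may vary.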

 

 

Just my two cents. When an instruction impacts multiple places, it can be
ambiguous and changing global variables is, well, global.

 

Which raises another question: why did the people making choices choose
silly names that are all numeric, with maybe a decimal point, and ending in
a character like % that has other uses? A cousin of quantile is fivenum(),
which returns Tukey's five number summary as used in making boxplots:

> fivenum(x)
[1]    1  250  500  750 1000

 

This returned a vector with no names. You can only index it by number,
albeit the columns are always in a fixed order and you know what to expect
in each. Another cousin returns a more complex structure:

> boxplot.stats(x)
$stats
[1]    1  250  500  750 1000

$n
[1] 1000

$conf
[1] 476 525

$out
integer(0)

> boxplot.stats(x)$stats
[1]    1  250  500  750 1000

 

That is a list of items but the first item is a vector with no names that is
the same as for fivenum().

 

Would it make more sense if the names of the output looked more like:

> temp <- quantile(x, c(.95, .975, .99000))
> names(temp) <- c("perc95", "perc98", "perc99")
> temp
perc95 perc98 perc99
   950    975    990

 

So you could do this to a vector:

> temp["perc98"]
perc98
   975

Or do even more to a list:

> as.list(temp)$perc98
[1] 975

 

My feeling is some things are not really bugs but more like FEATURES you
normally live with and if it matters, work around it. I had trouble a while
ago with a lavaan() case I ran where very rarely the program simply broke.
When in a big loo

[Rd] UTF-8 characters in Rd files

2020-12-15 Thread Gábor Csárdi
Dear list,

I am trying to see if there is a way to expand the set of UTF-8
characters that we can use in Rd files. The main blocker is LaTeX when
building the PDF manual.

It is possible to use the inputenx LaTeX package, instead of inputenc,
which is an improvement, by setting the RD2PDF_INPUTENC env var before
running R CMD Rd2pdf. However, I don't see a way to make this
automatic for a package, so we cannot use it for CRAN packages, as far
as I can tell.

Am I missing something? Would it make sense to have a way to specify
RD2PDF_INPUTENC (and possibly other similar env vars) for packages?

As for the possible implementation, one way would be to have a file
called `environ` or something similar in /man, that could define env
vars, and Rd2pdf would just read it in with readRenviron().
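
A rough sketch of what such a file and its reading step might look like is given below. It only demonstrates readRenviron() on a KEY=value file; the file name and location are hypothetical, a temporary file stands in for the proposed one, and nothing here implies that Rd2pdf currently does this.

```r
# Write a stand-in for the proposed file; the name and location are
# hypothetical, only the KEY=value syntax is what readRenviron() expects.
env_file <- tempfile("environ")
writeLines("RD2PDF_INPUTENC=inputenx", env_file)

# readRenviron() sets the variables in the current R process and
# returns TRUE (invisibly) if the file was read.
ok <- readRenviron(env_file)
value <- Sys.getenv("RD2PDF_INPUTENC")
unlink(env_file)

value
```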

Thanks,
Gabor
