I think this is a "Doctor, it hurts when I do this" issue. 

The root of it is that as.character() behaves differently on integer and 
floating-point (double) values.

> factor(100000)
[1] 1e+05
Levels: 1e+05

> factor(100000,levels=100000)
[1] 1e+05
Levels: 1e+05

> factor(100000,levels=100000:100000)
[1] <NA>
Levels: 100000

> factor(as.integer(100000),levels=100000:100000)
[1] 100000
Levels: 100000

Or, more directly, it is the difference between these two:

> as.character(seq(99999L,100001L,1L))
[1] "99999"  "100000" "100001"
> as.character(seq(99999L,100001L,1))
[1] "99999"  "1e+05"  "100001"

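In the second call the double step (1 rather than 1L) makes seq() return 
doubles, and the formatting code has detected that "1e+05" is shorter than 
"100000". It never converts integers to scientific notation, which is why 
the first call is unaffected.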
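
To make the type difference visible, here is a minimal sketch. The variable 
names ints and dbls are only for illustration. It shows that the two seq() 
calls hold the same values and differ only in storage type, which is what 
as.character() reacts to.

```r
# Same values, different storage types (ints/dbls are illustrative names)
ints <- seq(99999L, 100001L, 1L)   # integer step: result stays integer
dbls <- seq(99999L, 100001L, 1)    # double step: result is double

typeof(ints)
typeof(dbls)

# Numerically the two vectors are equal ...
all(ints == dbls)

# ... but their character forms need not be, which is what trips up factor()
as.character(ints) == as.character(dbls)
```

Since factor() matches the data against the levels by their character 
forms, a double 100000 does not match an integer level 100000.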

You can play whack-a-mole with this sort of issue: fix a perceived problem in 
one place only to find a new problem popping up elsewhere. It is probably 
better just never to trust character conversion of numbers beyond 99999.
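
If you do need factor levels from such numbers, one way to sidestep the 
conversion is sketched below. It assumes the values are whole numbers within 
integer range. The helper lab() is a hypothetical name, and the format() 
variant is the idea from the quoted message, not something factor() does 
itself. Note that format() gives all elements a common number of decimals, so 
the second option is only safe as shown for whole numbers.

```r
x <- c(99999, 100000, 100001)        # doubles holding whole numbers

# Option 1: make data and levels the same type (integer) before factor()
f1 <- factor(as.integer(x), levels = 99999:100001)

# Option 2: build the labels explicitly, never using scientific notation.
# lab() is a hypothetical helper, applied to both data and levels.
lab <- function(v) format(v, scientific = FALSE, trim = TRUE)
f2 <- factor(lab(x), levels = lab(99999:100001))

anyNA(f1)   # check that no element was lost
anyNA(f2)
```

Either way, the point is to control the conversion on both sides of the 
match yourself rather than leaving it to as.character().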

- pd



> On 23 May 2024, at 18:33 , Andrew Gustar <andrew_gus...@msn.com> wrote:
> 
> This thread on stackoverflow illustrates the problem... 
> https://stackoverflow.com/questions/78523612/r-factor-from-numeric-vector-drops-every-100-000th-element-from-its-levels
> 
> The issue is that factor(), applied to numeric values, uses as.character(), 
> which converts numbers to character strings according to the value of scipen. 
> The stackoverflow thread illustrates a case where this causes some factor 
> levels to become NA. There is also an inconsistency between the treatment of 
> numeric and integer values.
> 
> On the face of it, using format(..., scientific = FALSE) instead of 
> as.character() would solve the problem, but this probably needs careful 
> thinking through in case of other side effects!
> 
> 
> 
> ______________________________________________
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd....@cbs.dk  Priv: pda...@gmail.com
