Currently, substring defaults to last=1000000L, which strongly
suggests the intent is to default to "nchar(x)" without having to
compute/allocate that up front.

Unfortunately, this default silently truncates "very large" strings,
whose "width" may exceed 1000000L.

The max width of a string is .Machine$integer.max-1:

# works
x = strrep(" ", .Machine$integer.max-1L)
# fails
x = strrep(" ", .Machine$integer.max)
Error in strrep(" ", .Machine$integer.max) :
  'Calloc' could not allocate memory (18446744071562067968 of 1 bytes)

(see also the comment in src/main/character.c: "Character strings in R
are less than 2^31-1 bytes, so we use int not size_t.")

So it seems to me either .Machine$integer.max or
.Machine$integer.max-1L would be a more sensible default. Am I missing
something?
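
For illustration only, here is a sketch of what the proposed default
would look like from the user's side. substring_all is a hypothetical
wrapper, not anything in base R, and I am assuming substring() clamps a
too-large last to the end of the string:

```r
# Hypothetical wrapper, not part of base R: same as substring(), but with
# `last` defaulting to the largest representable integer.
substring_all <- function(text, first, last = .Machine$integer.max) {
  substring(text, first, last)
}

# One character past the current 1000000L default.
x <- strrep("a", 1000001L)

nchar(substring_all(x, 1L))
substring_all("hello", 2L)
```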

Mike C

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel