Using strcapture seems like a great workaround for use cases of this kind, at
least in base R. I also agree that filling with NA makes less sense for
regmatches(..., gregexpr(...)): its output is a list, so non-matches are already
retained as zero-length elements (character(0)). Also, I suppose that in the
meantime the stringr package can be used when non-dropping vector outputs are
desired.

However, I do think that non-dropping regex extraction for vector outputs,
whether from regmatches(..., regexpr(...)) or from strextract, would be a great
(optional) design feature to have in base R. It would be consistent with the
rest of the language, where missing values (NA) are generally not dropped from
vectors and conceptually correspond to empty matches. It would also help R reach
greater feature parity with MATLAB and Pandas in this respect (granted, Pandas
is a library rather than a language of its own).
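To make the behavior concrete, here is a small sketch: regexpr() reports a
non-match as -1, and regmatches() then silently drops that element, so the
result is shorter than the input. The helper str_extract_na() below is a
hypothetical name chosen for illustration (it is not a base R or stringr
function); it shows one way to get the non-dropping behavior with base R only.

```r
x <- c("apple123", "banana", "cherry45")

# regexpr() returns -1 for "banana"; regmatches() drops it,
# so the result has length 2 instead of 3.
regmatches(x, regexpr("[0-9]+", x))
# returns c("123", "45")

# Hypothetical helper (not part of base R): same extraction, but
# non-matches and NA inputs become NA_character_, so the output
# always has the same length as the input.
str_extract_na <- function(x, pattern, ...) {
  m <- regexpr(pattern, x, ...)
  out <- rep(NA_character_, length(x))
  hit <- !is.na(m) & m != -1L   # positions that actually matched
  out[hit] <- regmatches(x, m)  # regmatches() returns only those matches
  out
}

str_extract_na(x, "[0-9]+")
# returns c("123", NA, "45")

# For comparison, strcapture() already keeps non-matches as NA rows
# (requires a capture group per column of 'proto').
strcapture("([0-9]+)", x, proto = data.frame(num = character()))
```

stringr::str_extract(x, "[0-9]+") gives the same length-preserving result as
the sketch above.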

Although I have written personal wrappers and used stringr to get the
non-dropping behavior in the past, I have nevertheless found the behavior of
base R string operations mildly astonishing (in the sense of the principle of
least astonishment, POLA), and I think others may have as well. As the stringr
documentation puts it, "they lag behind the string operations in other
programming languages, so that some things that are easy to do in languages like
Ruby or Python are rather hard to do in R." Since consistent, robust string
operations are often a standard base feature of other data science and
scientific programming languages, I think this minor change would be a real
improvement to the language and would hopefully help promote adoption of R,
especially given the surge in text-based data analysis in recent years.

Alternatively, although I don't use the Tidyverse packages very often, stringr
seems like a good candidate for inclusion in base or recommended R if the R Core
team and the package developer see fit (just a suggestion, and probably a long
shot).

However, I will try not to belabor this point further. In any case, thank you!

Best,
CG

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
