[Rd] Subset has No Examples for Vector Data

2023-10-10 Thread Dario Strbenac via R-devel
Hello,

Could the documentation page for subset gain an example of how to use it for 
something other than a data frame or matrix? I arrived at

> random <- LETTERS[rpois(100, 10)]
> subset(table(random), x > 10)
named integer(0)

I expected a part of the table to be returned rather than an empty vector.

--
Dario Strbenac
University of Sydney
Camperdown NSW 2050
Australia
__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Subset has No Examples for Vector Data

2023-10-10 Thread Rui Barradas

At 11:00 on 10/10/2023, Dario Strbenac via R-devel wrote:

Hello,

Could the documentation page for subset gain an example of how to use it for 
something other than a data frame or matrix? I arrived at


random <- LETTERS[rpois(100, 10)]
subset(table(random), x > 10)

named integer(0)

I expected a part of the table to be returned rather than an empty vector.


Hello,

If you want to subset then you must refer to a variable in the original 
data set. In your example there is no 'x' in the output of table.



set.seed(2023)
random <- LETTERS[rpois(100, 10)]
(tbl <- table(random))
#> random
#>  C  D  E  F  G  H  I  J  K  L  M  N  P  Q  S
#>  1  2  4  4  8 13 14 10 17  9 11  2  1  3  1

subset(tbl, tbl > 10)
#> random
#>  H  I  K  M
#> 13 14 17 11


So subset() does handle vector data as wanted.
The empty result comes from your call, not from subset(): the condition has
to refer to the table itself, not to an 'x' that the table does not contain.
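
A small sketch of the difference may help (the object names below are made up
for illustration). For a data frame, subset() evaluates the condition with the
columns in scope. A plain vector or table has no columns, so the condition is
an ordinary logical vector and has to name the object itself.

```r
# Data frame: 'x' is a column, so the condition can use it directly.
df <- data.frame(x = c(5, 20, 30))
df_big <- subset(df, x > 10)

# Named vector: no columns, so the condition names the vector itself.
counts <- c(a = 5, b = 20, c = 30)
vec_big <- subset(counts, counts > 10)

# Same for a table built from known data (no random numbers involved).
tbl <- table(rep(c("A", "B", "C"), times = c(12, 3, 11)))
tbl_big <- subset(tbl, tbl > 10)

# Plain indexing selects the same elements.
stopifnot(identical(names(tbl_big), names(tbl[tbl > 10])))
```

For vectors and tables, x[cond] is therefore usually just as clear as
subset(x, cond); the one difference is that subset() drops elements where the
condition is NA.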


Hope this helps,

Rui Barradas





Re: [Rd] FR: valid_regex() to test string validity as a regular expression

2023-10-10 Thread Michael Chirico via R-devel
> Grepping an empty string might work in many cases...

That's precisely why a base R offering is important, as a surer way of
validating in all cases. To be clear I am trying to directly access the
results of tre_regcomp().

> it is probably more portable to simply be prepared to propagate such
> errors from the actual use on real inputs

That works best in self-contained calls -- foo(re) and we execute re inside
foo().

But the specific context where I found myself looking for a regex validator
is more complicated (https://github.com/r-lib/lintr/pull/2225). User
supplies a regular expression in a configuration file, only "later" is it
actually supplied to grepl().

Until now, we've followed your suggestion -- just surface the regex error at
run time. But our goal is to make it friendlier and fail earlier, at "compile
time" as the config is loaded, "long" before any regex is actually executed.

At a bare minimum this is a good place to return a classed warning (say
invalid_regex_warning) to allow finer control than tryCatch(condition=).
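
A rough sketch of what such a load-time check could look like with today's
tools. The function name validate_config_regex() is hypothetical, and the class
name is only the one floated above; nothing like this exists in base R. It
still relies on running grepl() against an empty string, so it inherits the
limits of that approach.

```r
# Hypothetical helper: check one pattern when the config is loaded.
# Returns TRUE if grepl() accepts the pattern; otherwise signals a
# classed warning and returns FALSE.
validate_config_regex <- function(pattern) {
  stopifnot(is.character(pattern), length(pattern) == 1L, !is.na(pattern))
  problem <- tryCatch(
    {
      grepl(pattern, "")
      NULL
    },
    warning = function(w) w,
    error = function(e) e
  )
  if (is.null(problem)) {
    return(TRUE)
  }
  # Re-signal as a warning with its own class so callers can target it.
  warning(structure(
    class = c("invalid_regex_warning", "warning", "condition"),
    list(
      message = paste0(
        "invalid regular expression '", pattern, "': ",
        conditionMessage(problem)
      ),
      call = NULL
    )
  ))
  FALSE
}

# Usage: collect the problems for all configured patterns in one pass.
seen <- character()
ok <- withCallingHandlers(
  vapply(c("^a+$", "("), validate_config_regex, logical(1)),
  invalid_regex_warning = function(w) {
    seen <<- c(seen, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)
```

The handler is keyed on the class, so unrelated warnings raised while loading
the config pass through untouched.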

On Mon, Oct 9, 2023, 11:30 PM Tomas Kalibera wrote:

>
> On 10/10/23 01:57, Michael Chirico via R-devel wrote:
>
> It will be useful to package authors trying to validate input which is
> supposed to be a valid regular expression.
>
> As near as I can tell, the only way we can do so now is to run any
> regex function and check for the warning and/or condition to bubble
> up:
>
> valid_regex <- function(str) {
>   stopifnot(is.character(str), length(str) == 1L)
>   !inherits(tryCatch(grepl(str, ""), condition = identity), "condition")
> }
>
> That's pretty hefty/inscrutable for such a simple validation. I see a
> variety of similar approaches in CRAN packages [1], all slightly
> different. It would be good for R to expose a "canonical" way to run
> this validation.
>
> At root, the problem is that R does not expose the regex compilation
> routines like 'tre_regcomp', so from the R side we have to resort to
> hacky approaches.
>
> Hi Michael,
>
> I don't think you need compilation functions for that. If a regular
> expression is found invalid by a specific third party library R uses, the
> library should return an error to R and R should return an error to you,
> and you should probably propagate that to your users. Grepping an empty
> string might work in many cases as a test, but it is probably more portable
> to simply be prepared to propagate such errors from the actual use on real
> inputs. In theory, there could be some optimization for a particular case,
> so the checking may not be the same - but the same goes, say, for
> compilation and checking.
>
> Things get slightly complicated by encoding/useBytes modes
> (tre_regwcomp, tre_regncomp, tre_regwncomp, tre_regcompb,
> tre_regncompb; all in tre.h), but all are already present in other
> regex routines, so this is doable.
>
> Re encodings, R strings should simply be valid in their encoding. This is
> not just for regular expressions but also for anything else. You shouldn't
> assume that R can handle invalid strings in any reasonable way. You
> definitely shouldn't try adding invalid strings in tests - behavior with
> invalid strings is unspecified. To test whether a string is valid, there is
> validEnc() (or validUTF8()). But, again, it is probably safest to propagate
> errors from the regular expression R functions (in case the checks differ,
> particularly for non-UTF-8); also, duplicating the encoding checks can be a
> non-trivial overhead.
>
> If there was a strong need to have an automated way to somehow classify
> errors coming specifically from the regex libraries, perhaps R could attach
> some classes to them when the library reports them.
>
> Tomas
>
> Exposing a function to compile regular expressions is common in other
> languages, e.g. Go [2], Python [3], JavaScript [4].
>
> [1] 
> https://github.com/search?q=lang%3AR+%2Fis%5Ba-zA-Z0-9._%5D*reg%5Ba-zA-Z0-9._%5D*ex.*%28%3C-%7C%3D%29%5Cs*function%2F+org%3Acran&type=code
> [2] https://pkg.go.dev/regexp#Compile
> [3] https://docs.python.org/3/library/re.html#re.compile
> [4] 
> https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp
>

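On the encoding point in the quoted reply, a minimal illustration of
validUTF8() with made-up inputs. It only shows what the check reports; it is
not a suggestion to duplicate the checks the regex functions already perform.

```r
good <- "caf\u00e9"   # e-acute written as a Unicode escape, stored as UTF-8
bad  <- "caf\xe9"     # a lone Latin-1 byte, which is not valid UTF-8

# One logical per input string: TRUE where the bytes form valid UTF-8.
utf8_ok <- validUTF8(c(good, bad))
```

validEnc() performs the analogous check against each string's declared
encoding, so its answer for the second string can depend on the session's
locale.
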


Re: [Rd] Subset has No Examples for Vector Data

2023-10-10 Thread Martin Maechler
> Rui Barradas 
> on Tue, 10 Oct 2023 12:17:19 +0100 writes:

> At 11:00 on 10/10/2023, Dario Strbenac via R-devel wrote:
>> Hello,
>> 
>> Could the documentation page for subset gain an example of how to use it
>> for something other than a data frame or matrix? I arrived at
>> 
>>> random <- LETTERS[rpois(100, 10)]
>>> subset(table(random), x > 10)
>> named integer(0)
>> 
>> I expected a part of the table to be returned rather than an empty
>> vector.
>> 

> If you want to subset then you must refer to a variable in the original 
> data set. In your example there is no 'x' in the output of table.


> set.seed(2023)
> random <- LETTERS[rpois(100, 10)]
> (tbl <- table(random))
> #> random
> #>  C  D  E  F  G  H  I  J  K  L  M  N  P  Q  S
> #>  1  2  4  4  8 13 14 10 17  9 11  2  1  3  1

> subset(tbl, tbl > 10)
> #> random
> #>  H  I  K  M
> #> 13 14 17 11


> So it is subsetting vector data as wanted.
> It is your expectation that a part of the table should be returned that 
> is not in agreement with the data you have.

> Hope this helps,

> Rui Barradas

Thank you, Rui, for helping!

yes, *help*ing: that (original post) was very much for R-help, not at all
for R-devel ...

Martin
