Hello,
I forgot about %in%, probably because the original post used regexes.
And rowSums is much faster than apply.
In my tests, Bert's code is 7 times faster than mine; the only differences are
%in% instead of grepl and rowSums instead of apply(no, 1, any).
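
For anyone who wants to repeat the comparison, here is a minimal sketch. The enlarged data set `big` and the wrapper names `f_grepl` and `f_in` are made up for illustration and are not from the thread; timings depend on the machine, so none are quoted here.

```r
x <- c("E10", "E103", "E104", "E109", "E101", "E108", "E105", "E100",
       "E106", "E102", "E107", "E11", "E119", "E113", "E115", "E111",
       "E114", "E110", "E118", "E116", "E112", "E117")
unwanted <- c("E102", "E112")

# Hypothetical larger data set so that the timings are measurable.
set.seed(2020)
big <- as.data.frame(replicate(5, sample(x, 5e4, TRUE)))

# Regex matching per column, then apply() over the rows.
f_grepl <- function(d) {
  no <- sapply(d, function(col) grepl(paste(unwanted, collapse = "|"), col))
  d[!apply(no, 1, any), ]
}

# Exact matching with %in%, then rowSums() to count flagged cells per row.
f_in <- function(d) {
  bad <- sapply(d, function(col) col %in% unwanted)
  d[!rowSums(bad), ]
}

res_grepl <- f_grepl(big)
res_in <- f_in(big)

# Both approaches should keep the same rows for these codes.
print(identical(res_grepl, res_in))

print(system.time(f_grepl(big)))
print(system.time(f_in(big)))
```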
Hope this helps,
Rui Barradas
At 18:34 on 03/06/20, Bert Gunter wrote:
Regexes are not needed. Using Rui's example:
> bad <- mapply(function(x) x %in% unwanted, dat)
> dat[!rowSums(bad), ]
     V1   V2   V3   V4   V5
2  E117 E113 E119 E100  E10
4  E114  E11 E119 E119 E114
5  E109 E111 E103 E103 E100
7  E108 E113 E119 E117  E11
8  E114 E105  E10 E109 E110
9  E119 E116 E108 E118 E119
10 E100 E110 E104 E111 E101
13 E111 E116 E101 E110 E116
15 E103  E11 E108  E10 E113
16 E111 E117 E103 E115 E119
17 E104 E110 E104 E117 E114
19 E100 E108  E10 E111 E105
20 E109 E115 E117 E108 E106
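
A small sketch of why exact matching is the safer choice here, and of why `!rowSums(bad)` works. The vector `codes` and the result names are made up for illustration; the point is only that an unanchored regex also matches inside longer codes, while %in% compares whole values.

```r
codes <- c("E10", "E103", "E11", "E112")

# Unanchored regex: "E10" also matches inside "E103".
by_regex <- grepl("E10", codes)
print(by_regex)      # TRUE TRUE FALSE FALSE

# Exact matching: only the element equal to "E10" is flagged.
by_in <- codes %in% "E10"
print(by_in)         # TRUE FALSE FALSE FALSE

# If a regex is really wanted, anchor it at both ends.
by_anchored <- grepl("^(E10)$", codes)
print(by_anchored)   # TRUE FALSE FALSE FALSE

# rowSums(bad) counts the flagged cells in each row; negating a number
# gives TRUE only for 0, so !rowSums(bad) keeps rows with no flagged cell.
negated <- !c(0, 1, 2)
print(negated)       # TRUE FALSE FALSE
```

Writing `rowSums(bad) == 0` instead of `!rowSums(bad)` gives the same rows and may read more clearly.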
Bert Gunter
"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
On Wed, Jun 3, 2020 at 9:57 AM Rui Barradas <ruipbarra...@sapo.pt> wrote:
Hello,
If you want to filter out rows with any of the values in an 'unwanted'
vector, try the following.
First, create a test data set.
x <- scan(what = character(), text = '
"E10" "E103" "E104" "E109" "E101" "E108" "E105" "E100" "E106" "E102"
"E107" "E11" "E119" "E113" "E115" "E111" "E114" "E110" "E118"
"E116" "E112"
"E117"
')
set.seed(2020)
dat <- replicate(5, sample(x, 20, TRUE))
dat <- as.data.frame(dat)
Now, remove all rows that have at least one of "E102" or "E112".
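unwanted <- c("E102", "E112")
# Flag every cell that matches one of the unwanted codes
# (one logical column per variable).
no <- sapply(dat, function(x){
  grepl(paste(unwanted, collapse = "|"), x)
})
# A row is dropped if any of its cells is flagged.
no <- apply(no, 1, any)
dat[!no, ]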
That's it, if I understood the problem.
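
The original question also asked to keep only rows that contain at least one of E109, E119 or E149. A minimal base R sketch that combines both conditions follows. It assumes the real data frame is shaped like the test data `dat`; the names `wanted`, `has_wanted` and `has_unwanted` are made up for illustration.

```r
x <- c("E10", "E103", "E104", "E109", "E101", "E108", "E105", "E100",
       "E106", "E102", "E107", "E11", "E119", "E113", "E115", "E111",
       "E114", "E110", "E118", "E116", "E112", "E117")
set.seed(2020)
dat <- as.data.frame(replicate(5, sample(x, 20, TRUE)))

wanted <- c("E109", "E119", "E149")
unwanted <- c("E102", "E112")

# TRUE for rows with at least one wanted code in any column.
has_wanted <- rowSums(sapply(dat, function(col) col %in% wanted)) > 0
# TRUE for rows with at least one unwanted code in any column.
has_unwanted <- rowSums(sapply(dat, function(col) col %in% unwanted)) > 0

# Keep rows that have a wanted code and no unwanted code.
controls <- dat[has_wanted & !has_unwanted, ]
print(controls)
```

Note that "E149" does not occur in the test data, so only E109 and E119 can select rows here.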
Hope this helps,
Rui Barradas
At 15:55 on 03/06/20, Ana Marija wrote:
> Hello.
>
> I am trying to filter only rows that have ANY of these variables:
> E109, E119, E149
>
> so I did:
> controls=t %>% filter_all(any_vars(. %in% c("E109", "E119","E149")))
>
> then I checked what I got:
>> s0 <- sapply(controls, function(x) grep('^E10', x, value = TRUE))
>> d0=unlist(s0)
>> d10=unique(d0)
>> d10
> [1] "E10" "E103" "E104" "E109" "E101" "E108" "E105" "E100" "E106" "E102"
> [11] "E107"
> s1 <- sapply(controls, function(x) grep('^E11', x, value = TRUE))
> d1=unlist(s1)
> d11=unique(d1)
>> d11
> [1] "E11" "E119" "E113" "E115" "E111" "E114" "E110" "E118" "E116" "E112"
> [11] "E117"
>
> I need help with changing this command
> controls=t %>% filter_all(any_vars(. %in% c("E109", "E119","E149")))
>
> so that in the output I do not have any rows that include E102 or E112?
>
> Thanks
> Ana
>
> ______________________________________________
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>