Re: [R] how to filter variables which appear in any row but do not include

Rui Barradas Wed, 03 Jun 2020 12:32:26 -0700

Hello,

I forgot about %in%, maybe because the original post used regexes.
And rowSums is much faster than apply.


In my tests this is 7 times faster than mine but with %in% instead of grepl and apply(no, 1, any).
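
A rough way to check the rowSums versus apply point is sketched below. It rebuilds the example data from the quoted message, confirms that both ways of collapsing the logical matrix flag the same rows, and then times them on a larger matrix. The matrix `big` and the names `drop_apply` and `drop_rowsums` are made up for this illustration, and the timings will differ from machine to machine.

```r
# Example data, as in the quoted message further down.
x <- c("E10", "E103", "E104", "E109", "E101", "E108", "E105", "E100",
       "E106", "E102", "E107", "E11", "E119", "E113", "E115", "E111",
       "E114", "E110", "E118", "E116", "E112", "E117")
set.seed(2020)
dat <- as.data.frame(replicate(5, sample(x, 20, TRUE)))
unwanted <- c("E102", "E112")

# Logical matrix: TRUE where a cell holds an unwanted code.
bad <- sapply(dat, function(col) col %in% unwanted)

# Two ways to reduce the matrix to one flag per row.
drop_apply   <- apply(bad, 1, any)
drop_rowsums <- rowSums(bad) > 0

# Both should flag exactly the same rows.
identical(unname(drop_apply), unname(drop_rowsums))

# Rough timing on a larger, made-up logical matrix.
big <- matrix(sample(c(TRUE, FALSE), 1e6, TRUE), ncol = 5)
system.time(apply(big, 1, any))
system.time(rowSums(big) > 0)
```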

Hope this helps,

Rui Barradas

At 18:34 on 03/06/20, Bert Gunter wrote:

Regexes are not needed. Using Rui's example:

 > bad <- mapply(function(x) x %in% unwanted,dat)
 > dat[!rowSums(bad),]

      V1   V2   V3   V4   V5
2  E117 E113 E119 E100  E10
4  E114  E11 E119 E119 E114
5  E109 E111 E103 E103 E100
7  E108 E113 E119 E117  E11
8  E114 E105  E10 E109 E110
9  E119 E116 E108 E118 E119
10 E100 E110 E104 E111 E101
13 E111 E116 E101 E110 E116
15 E103  E11 E108  E10 E113
16 E111 E117 E103 E115 E119
17 E104 E110 E104 E117 E114
19 E100 E108  E10 E111 E105
20 E109 E115 E117 E108 E106
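
The original question asked for rows that contain any of E109, E119, E149 but none of E102 or E112. A minimal base R sketch of combining the two conditions with the same %in% and rowSums idea follows, assuming Rui's example data. The names `wanted`, `has_wanted`, `has_unwanted` and `controls` are chosen here for illustration only.

```r
# Example data, as in Rui's message.
x <- c("E10", "E103", "E104", "E109", "E101", "E108", "E105", "E100",
       "E106", "E102", "E107", "E11", "E119", "E113", "E115", "E111",
       "E114", "E110", "E118", "E116", "E112", "E117")
set.seed(2020)
dat <- as.data.frame(replicate(5, sample(x, 20, TRUE)))

wanted   <- c("E109", "E119", "E149")  # codes to keep, from the original question
unwanted <- c("E102", "E112")          # codes that must not appear in a kept row

# One flag per row for each condition.
has_wanted   <- rowSums(sapply(dat, function(col) col %in% wanted)) > 0
has_unwanted <- rowSums(sapply(dat, function(col) col %in% unwanted)) > 0

# Keep rows with at least one wanted code and no unwanted code.
controls <- dat[has_wanted & !has_unwanted, ]
controls
```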

Bert Gunter

"The trouble with having an open mind is that people keep coming along and sticking things into it."

-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)

On Wed, Jun 3, 2020 at 9:57 AM Rui Barradas <ruipbarra...@sapo.pt> wrote:


    Hello,

    If you want to filter out rows with any of the values in an 'unwanted'
    vector, try the following.

    First, create a test data set.

    x <- scan(what = character(), text = '
    "E10"  "E103" "E104" "E109" "E101" "E108" "E105" "E100" "E106" "E102"
    "E107" "E11"  "E119" "E113" "E115" "E111" "E114" "E110" "E118"
    "E116" "E112"
    "E117"
    ')

    set.seed(2020)
    dat <- replicate(5, sample(x, 20, TRUE))
    dat <- as.data.frame(dat)


    Now, remove all rows that have at least one of "E102" or "E112":


    unwanted <- c("E102", "E112")
    no <- sapply(dat, function(x){
        grepl(paste(unwanted, collapse = "|"), x)
    })
    no <- apply(no, 1, any)
    dat[!no, ]
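
One caveat with the grepl approach: an unanchored pattern such as "E102|E112" matches any string that merely contains one of the codes. That does not matter for the example data, but it could for other code lists. A small sketch of the difference, using a made-up vector `codes` that is not from the thread:

```r
codes    <- c("E102", "E1020", "E112", "E11")  # made-up values for illustration
unwanted <- c("E102", "E112")

# Unanchored alternation: TRUE for any string containing a code.
loose <- grepl(paste(unwanted, collapse = "|"), codes)
loose   # TRUE TRUE TRUE FALSE ("E1020" is caught as well)

# Anchored with ^( ... )$: the whole string has to match.
strict <- grepl(paste0("^(", paste(unwanted, collapse = "|"), ")$"), codes)
strict  # TRUE FALSE TRUE FALSE

# %in% compares whole strings, so it agrees with the anchored pattern.
exact <- codes %in% unwanted
identical(strict, exact)
```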


    That's it, if I understood the problem.


    Hope this helps,

    Rui Barradas


    At 15:55 on 03/06/20, Ana Marija wrote:
     > Hello.
     >
     > I am trying to filter only rows that have ANY of these variables:
     > E109, E119, E149
     >
     > so I did:
     > controls=t %>% filter_all(any_vars(. %in% c("E109", "E119","E149")))
     >
     > then I checked what I got:
     >> s0 <- sapply(controls, function(x) grep('^E10', x, value = TRUE))
     >> d0=unlist(s0)
     >> d10=unique(d0)
     >> d10
     >   [1] "E10"  "E103" "E104" "E109" "E101" "E108" "E105" "E100" "E106" "E102"
     > [11] "E107"
     > s1 <- sapply(controls, function(x) grep('^E11', x, value = TRUE))
     > d1=unlist(s1)
     > d11=unique(d1)
     >> d11
     >   [1] "E11"  "E119" "E113" "E115" "E111" "E114" "E110" "E118" "E116" "E112"
     > [11] "E117"
     >
     > I need help with changing this command
     > controls=t %>% filter_all(any_vars(. %in% c("E109", "E119","E149")))
     >
     > so that in the output I do not have any rows that include E102 or E112?
     >
     > Thanks
     > Ana
     >
     > ______________________________________________
     > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
     > https://stat.ethz.ch/mailman/listinfo/r-help
     > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
     > and provide commented, minimal, self-contained, reproducible code.
     >