Thank you. The problem was not finding the mode but applying it the R way (I tend to loop over each row of the data frame, which I believe is NOT the R way). I'll try both suggestions.
Best regards
Luigi
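For reference, a minimal sketch of what a loop-free version could look like. It is not the code posted by anyone in the thread: it assumes the set of possible characters (`V`) is known and small, that the first column is an id to be dropped, and it introduces the hypothetical names `counts`, `best` and `tied`. The idea is to make one vectorized pass per candidate value instead of one pass per row.

```r
# Toy data shaped like the example in the thread (seed added only for reproducibility).
set.seed(1)
V  <- c("A", "B", "C", "D")
df <- data.frame(n = 1:10,
                 col_01 = sample(V, 10, replace = TRUE),
                 col_02 = sample(V, 10, replace = TRUE),
                 col_03 = sample(V, 10, replace = TRUE),
                 col_04 = sample(V, 10, replace = TRUE),
                 col_05 = sample(V, 10, replace = TRUE),
                 stringsAsFactors = FALSE)

# Character matrix of the letter columns only (assumption: column 1 is an id).
m <- as.matrix(df[, -1])

# counts[i, j] = number of times V[j] occurs in row i.
# This loops over the 4 candidate values, not over the rows.
counts <- vapply(V, function(v) rowSums(m == v, na.rm = TRUE), numeric(nrow(m)))

# Index of the largest count per row; ties go to the first value in V.
best    <- max.col(counts, ties.method = "first")
df$most <- V[best]

# Flag rows where more than one value reaches the maximum count.
row_max <- counts[cbind(seq_len(nrow(counts)), best)]
df$tied <- rowSums(counts == row_max) > 1
```

Each `m == v` comparison allocates a logical matrix as large as the data, so for a table of 1 000 000 x 1000 it may be necessary to process the rows in blocks. Whether this is actually faster than the `apply()` solutions would need to be timed on the real data.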
On Sat, Oct 31, 2020 at 5:40 PM Bert Gunter <bgunter.4...@gmail.com> wrote:
>
> As usual, a web search ("find statistical mode in R") brought up something
> that is possibly useful -- Did you try this before posting? If not, please do
> so in future and let us know what your results were if you subsequently post
> here.
>
> Here's what SO suggested:
>
> Mode <- function(x) {
>   ux <- unique(x)
>   ux[which.max(tabulate(match(x, ux)))]
> }
>
> # ergo:
> apply(as.matrix(df),1,Mode)
>
> Note that all the functionality in Mode is via .Internal functions. So you
> can determine whether this is faster than Jim's code for your use case, but
> I'm pretty sure it will be faster than yours. However, note that this gives
> only the value of the *first* mode if there is more than one, while Jim's
> code alerts you to multiple modes.
>
> Bert Gunter
>
> "The trouble with having an open mind is that people keep coming along and
> sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>
>
> On Sat, Oct 31, 2020 at 2:29 AM Jim Lemon <drjimle...@gmail.com> wrote:
>>
>> Hi Luigi,
>> If I understand your request:
>>
>> library(prettyR)
>> apply(as.matrix(df),1,Mode)
>> [1] "C" "B" "D" ">1 mode" ">1 mode" ">1 mode" "D"
>> [8] "C" "B" ">1 mode"
>>
>> Jim
>>
>> On Sat, Oct 31, 2020 at 7:56 PM Luigi Marongiu <marongiu.lu...@gmail.com>
>> wrote:
>>
>> > Hello,
>> > I have a large dataframe (1 000 000 rows, 1000 columns) where the
>> > columns contain a character. I would like to determine the most common
>> > character for each row.
>> > In the example below, I can parse one row at the time and find the
>> > most common character (apart for ties...). But I think this will be
>> > very slow and memory consuming.
>> > Is there a way to run it more efficiently?
>> > Thank you
>> >
>> > ```
>> > V = c("A", "B", "C", "D")
>> > df = data.frame(n = 1:10,
>> >                 col_01 = sample(V, 10, replace = TRUE, prob = NULL),
>> >                 col_02 = sample(V, 10, replace = TRUE, prob = NULL),
>> >                 col_03 = sample(V, 10, replace = TRUE, prob = NULL),
>> >                 col_04 = sample(V, 10, replace = TRUE, prob = NULL),
>> >                 col_05 = sample(V, 10, replace = TRUE, prob = NULL),
>> >                 stringsAsFactors = FALSE)
>> >
>> > q = vector()
>> > for(i in 1:nrow(df)) {
>> >   x = as.vector(t(df[i,2:ncol(df)]))
>> >   q[i] = names(which.max(table(x)))
>> > }
>> > df$most = q
>> > ```
>> >
>> > ______________________________________________
>> > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> > https://stat.ethz.ch/mailman/listinfo/r-help
>> > PLEASE do read the posting guide
>> > http://www.R-project.org/posting-guide.html
>> > and provide commented, minimal, self-contained, reproducible code.

--
Best regards,
Luigi
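On the point raised in the thread that the Stack Overflow `Mode` returns only the *first* mode: a minimal sketch of a variant that returns every tied value. The name `all_modes` is hypothetical (it is not part of prettyR or base R), and the `"/"` separator is an arbitrary choice for illustration.

```r
# Hypothetical helper: return all values that share the highest count,
# in order of first appearance. Built on the same unique/match/tabulate idea.
all_modes <- function(x) {
  ux     <- unique(x)
  counts <- tabulate(match(x, ux))
  ux[counts == max(counts)]
}

all_modes(c("A", "B", "B", "C"))       # "B"
all_modes(c("A", "B", "B", "A", "C"))  # "A" "B"

# Row-wise use on a small character matrix, collapsing ties into one string.
m <- rbind(c("A", "B", "B", "C"),
           c("A", "B", "B", "A"))
apply(m, 1, function(r) paste(all_modes(r), collapse = "/"))  # "B" "A/B"
```

Note that `apply()` still visits each row in turn, so this sketch addresses the tie handling only, not the speed on a very large data frame.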