Thank you. The problem was not finding the mode but applying it the R
way (I tend to loop over each row of the data frame,
which I believe is NOT the R way).
I'll try both suggestions.
Best regards
Luigi
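
A note on the two suggestions quoted below: the Stack Overflow `Mode` returns only the first mode, while the prettyR version flags ties. The following is a minimal base-R sketch of a variant that reports ties. The name `mode_or_tie` and its `tie_label` argument are illustrative and not part of any package; the label text simply mirrors the output shown in Jim's reply.

```r
# Hypothetical helper: most frequent value of x, or a label when tied
mode_or_tie <- function(x, tie_label = ">1 mode") {
  ux <- unique(x)                      # distinct values, in order of appearance
  counts <- tabulate(match(x, ux))     # how often each distinct value occurs
  top <- ux[counts == max(counts)]     # all values reaching the maximum count
  if (length(top) > 1) tie_label else top
}

# Small fixed example: one row per case
m <- rbind(c("A", "A", "B", "C", "D"),   # single mode
           c("A", "A", "B", "B", "C"),   # two modes
           c("D", "C", "D", "C", "D"))   # single mode
res <- apply(m, 1, mode_or_tie)
print(res)
```

On the example data frame this would be called as `apply(as.matrix(df[, -1]), 1, mode_or_tie)`, assuming the first column `n` is only a row index.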

On Sat, Oct 31, 2020 at 5:40 PM Bert Gunter <bgunter.4...@gmail.com> wrote:
>
> As usual, a web search ("find statistical mode in R") brought up something
> that is possibly useful. Did you try this before posting? If not, please do
> so in future, and let us know what your results were if you subsequently post
> here.
>
> Here's what SO suggested:
>
> Mode <- function(x) {
>    ux <- unique(x)
>    ux[which.max(tabulate(match(x, ux)))]
> }
>
> # ergo (dropping the index column n so only the character columns are used):
> apply(as.matrix(df[, -1]), 1, Mode)
>
> Note that everything in Mode is done by functions implemented in compiled
> code (unique, match, tabulate, which.max). You can test whether this is
> faster than Jim's code for your use case, but I'm pretty sure it will be
> faster than yours. However, note that this gives only the value of the
> *first* mode if there is more than one, while Jim's code alerts you to
> multiple modes.
>
> Bert Gunter
>
> "The trouble with having an open mind is that people keep coming along and 
> sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)
>
>
> On Sat, Oct 31, 2020 at 2:29 AM Jim Lemon <drjimle...@gmail.com> wrote:
>>
>> Hi Luigi,
>> If I understand your request:
>>
>> library(prettyR)
>> apply(as.matrix(df),1,Mode)
>> [1] "C"       "B"       "D"       ">1 mode" ">1 mode" ">1 mode" "D"
>> [8] "C"       "B"       ">1 mode"
>>
>> Jim
>>
>> On Sat, Oct 31, 2020 at 7:56 PM Luigi Marongiu <marongiu.lu...@gmail.com>
>> wrote:
>>
>> > Hello,
>> > I have a large dataframe (1 000 000 rows, 1000 columns) where each
>> > column holds single characters. I would like to determine the most common
>> > character for each row.
>> > In the example below, I can parse one row at a time and find the
>> > most common character (apart from ties...). But I think this will be
>> > very slow and memory consuming.
>> > Is there a way to run it more efficiently?
>> > Thank you
>> >
>> > ```
>> > V = c("A", "B", "C", "D")
>> > df = data.frame(n = 1:10,
>> >        col_01 = sample(V, 10, replace = TRUE, prob = NULL),
>> >        col_02 = sample(V, 10, replace = TRUE, prob = NULL),
>> >        col_03 = sample(V, 10, replace = TRUE, prob = NULL),
>> >        col_04 = sample(V, 10, replace = TRUE, prob = NULL),
>> >        col_05 = sample(V, 10, replace = TRUE, prob = NULL),
>> >        stringsAsFactors = FALSE)
>> >
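>> > q = character(nrow(df))   # preallocate the result instead of growing it
>> > for(i in seq_len(nrow(df))) {
>> >   x = unlist(df[i, -1])   # values of row i, without the index column n
>> >   q[i] = names(which.max(table(x)))   # on ties: first value in sorted order
>> > }
>> > df$most = q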
>> > ```
>> >
>> > ______________________________________________
>> > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> > https://stat.ethz.ch/mailman/listinfo/r-help
>> > PLEASE do read the posting guide
>> > http://www.R-project.org/posting-guide.html
>> > and provide commented, minimal, self-contained, reproducible code.
>> >
>>



-- 
Best regards,
Luigi
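
For a data frame of the size described in the original question, calling an R function once per row may still be slow. One possible alternative, sketched here under the assumption that the set of possible values (`V` in the original example) is small and known in advance, counts each value across all rows with `rowSums` and then picks the most frequent one per row with `max.col`. The names `row_modes`, `counts` and `tied` are illustrative only.

```r
# Hypothetical helper: row-wise mode of a character matrix m whose
# entries all come from the small, known set `values`
row_modes <- function(m, values) {
  # one column of counts per candidate value (rows of m x length(values))
  counts <- matrix(
    vapply(values, function(v) rowSums(m == v), numeric(nrow(m))),
    nrow = nrow(m)
  )
  best <- max.col(counts, ties.method = "first")   # index of first maximum
  top  <- counts[cbind(seq_len(nrow(counts)), best)]
  list(mode = values[best],
       tied = rowSums(counts == top) > 1)          # TRUE if several modes
}

m <- rbind(c("A", "A", "B", "C", "D"),
           c("A", "A", "B", "B", "C"),
           c("D", "C", "D", "C", "D"))
out <- row_modes(m, c("A", "B", "C", "D"))
print(out)
```

Like the Stack Overflow `Mode`, this returns only the first mode, but the `tied` flag marks rows with more than one. Each comparison `m == v` allocates a logical matrix as large as `m`, so for very large inputs it may be necessary to process blocks of rows at a time.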
