Re: [R] which() vs. just logical selection in df

1/k^c Wed, 14 Oct 2020 15:24:01 -0700

Hi Dr. Snow, & R-helpers,

Thank you for your reply! I hadn't heard of the {microbenchmark}
package and was excited to try it. I did check the source of which()
beforehand, which includes a step that removes NAs, and my data have
no missing values:


sum(is.na(dat$gender2))
sum(is.na(dat$gender))
sum(is.na(dat$y))

[1] 0
[1] 0
[1] 0
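
The same check can be done for all columns in one pass. A minimal
sketch, assuming a small made-up data frame (`toy` is a placeholder
standing in for `dat`, which is not reproduced here):

```r
# Hypothetical stand-in for 'dat'; the real data are not reproduced here.
toy <- data.frame(
  gender  = c("f", "m", "f", "other"),
  gender2 = c("f", "m", "other", "other"),
  y       = c(1.2, 3.4, 5.6, 7.8),
  stringsAsFactors = FALSE
)

# Number of NAs per column, computed in a single call.
na_counts <- colSums(is.na(toy))
print(na_counts)

# TRUE if any cell in the data frame is NA.
has_na <- any(is.na(toy))
print(has_na)
```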

I still see a difference of roughly 10 ms in the timings reported by
microbenchmark between the following two methods, one with and one
without which(). The direction is the opposite of what I expected,
since which() is an extra step.

microbenchmark(
  head(
    dat[which(dat$gender2=="other"),],), times=100L)
microbenchmark(
  head(
    dat[dat$gender2=="other",],), times=100L)

                                         expr      min       lq     mean
 head(dat[which(dat$gender2 == "other"), ], ) 62.93803 74.25939  88.4704
        head(dat[dat$gender2 == "other", ], ) 71.8914  87.95844 103.7231
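
Both expressions can also go into a single microbenchmark() call, so
that the timings land in one table. A sketch on made-up data; the data
frame `toy` and the labels `with_which` and `logical_only` are
placeholders, and the timing part only runs if the package is
installed:

```r
# Hypothetical stand-in for 'dat' (much smaller than the real file).
set.seed(1)
toy <- data.frame(
  gender2 = sample(c("f", "m", "other"), 1e5, replace = TRUE),
  y       = rnorm(1e5),
  stringsAsFactors = FALSE
)

# The two subsetting forms under comparison.
with_which   <- function(d) head(d[which(d$gender2 == "other"), ])
logical_only <- function(d) head(d[d$gender2 == "other", ])

# Run the timing only when microbenchmark is available.
if (requireNamespace("microbenchmark", quietly = TRUE)) {
  mb <- microbenchmark::microbenchmark(
    with_which   = with_which(toy),
    logical_only = logical_only(toy),
    times = 100L
  )
  print(mb)
}
```

Naming the expressions keeps the printed table readable, and the
numbers on a toy data frame say nothing about the 2.5e6-row file.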

Is which() invoking C-level code by chance, making it slightly faster
on average? The difference likely becomes important on terabytes of
data. The addition of which() still seems superfluous to me, and I'd
like to know whether it's considered best practice to keep it. What is
R invoking when which() isn't called explicitly? Is R invoking which()
eventually anyway?
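
Whatever happens internally, the two forms can at least be compared on
their results. A minimal sketch on a made-up data frame (`toy`, with
one NA added on purpose; none of this is the real data): the forms
agree when there are no NAs and differ only in how an NA in the
condition is handled.

```r
# Hypothetical toy data; 'gender2' has one NA on purpose.
toy <- data.frame(
  gender2 = c("f", "other", NA, "other"),
  y       = c(1, 2, 3, 4),
  stringsAsFactors = FALSE
)

keep <- toy$gender2 == "other"    # logical vector: FALSE TRUE NA TRUE

by_logical <- toy[keep, ]         # the NA in 'keep' yields an all-NA row
by_which   <- toy[which(keep), ]  # which() drops the NA position

print(nrow(by_logical))
print(nrow(by_which))

# Without NAs the two forms select the same rows.
clean <- toy[!is.na(toy$gender2), ]
same  <- isTRUE(all.equal(
  clean[clean$gender2 == "other", ],
  clean[which(clean$gender2 == "other"), ]
))
print(same)
```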

Cheers!
Keith

> Message: 2
> Date: Mon, 12 Oct 2020 13:01:36 -0600
> From: Greg Snow <538...@gmail.com>
> To: "1/k^c" <kchambe...@gmail.com>
> Cc: r-help <r-help@r-project.org>
> Subject: Re: [R] which() vs. just logical selection in df
> Message-ID:
>         <cafeqcdyuuhh5tz7t5nj8cs_4xb61mneugasncekd485ebnr...@mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> I would suggest using the microbenchmark package to do the time
> comparison.  This will run each a bunch of times for a more meaningful
> comparison.
>
> One possible reason for the difference is the number of missing values
> in your data (along with the number of columns).  Consider the
> difference in the following results:
>
> > x <- c(1,2,NA)
> > x[x==1]
> [1]  1 NA
> > x[which(x==1)]
> [1] 1
>
>
>
> On Sat, Oct 10, 2020 at 5:25 PM 1/k^c <kchambe...@gmail.com> wrote:
> >
> > Hi R-helpers,
> >
> > Does anyone know why adding which() makes the select call more
> > efficient than just using logical selection in a dataframe? Doesn't
> > which() technically add another conversion/function call on top of the
> > logical selection? Here is a reproducible example with a slight
> > difference in timing.
> >
> > # Surrogate data - the timing here isn't interesting
> > urltext <- paste("https://drive.google.com/",
> >                  "uc?id=1AZ-s1EgZXs4M_XF3YYEaKjjMMvRQ7",
> >                  "-h8&export=download", sep="")
> > download.file(url=urltext, destfile="tempfile.csv") # download file first
> > dat <- read.csv("tempfile.csv", stringsAsFactors = FALSE, header=TRUE,
> >                   nrows=2.5e6) # read the file; 'nrows' is a slight
> >                                          # overestimate
> > dat <- dat[,1:3] # select just the first 3 columns
> > head(dat, 10) # print the first 10 rows
> >
> > # Select using which() as the final step ~ 90ms total time on my macbook air
> > system.time(
> >   head(
> >     dat[which(dat$gender2=="other"),],),
> >   gcFirst=TRUE)
> >
> > # Select skipping which() ~130ms total time
> > system.time(
> >   head(
> >     dat[dat$gender2=="other", ]),
> >   gcFirst=TRUE)
> >
> > Now I would think that the second one without which() would be more
> > efficient. However, every time I run these, the first version, with
> > which() is more efficient by about 20ms of system time and 20ms of
> > user time. Does anyone know why this is?
> >
> > Cheers!
> > Keith
> >
> > ______________________________________________
> > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>
>
>
> --
> Gregory (Greg) L. Snow Ph.D.
> 538...@gmail.com
>
>
>
>
> ------------------------------
>
> Message: 3
> Date: Mon, 12 Oct 2020 08:33:44 +0200 (CEST)
> From: =?UTF-8?Q?Frauke_G=C3=BCnther?= <guent...@leibniz-bips.de>
> To: "r-help@r-project.org" <r-help@r-project.org>
> Cc: William Michels <w...@caa.columbia.edu>, "s...@posteo.org"
>         <s...@posteo.org>
> Subject: Re: [R]  Fwd:  Help using the exclude option in the neuralnet
>         package
> Message-ID: <957726669.124476.1602484424...@srvmail.bips.eu>
> Content-Type: text/plain; charset="utf-8"
>
> Dear all,
>
> the exclude and constant.weights options are used as follows:
>
> exclude: A matrix with n rows and 3 columns will exclude n weights. The
> first column refers to the layer, the second column to the input neuron and
> the third column to the output neuron of the weight.
>
> constant.weights: A vector specifying the values of the weights that are
> excluded from the training process and treated as fixed.
>
> Please refer to the following example:
>
> Not using exclude and constant.weights (all weights are trained):
>
> > nn <- neuralnet(Species == "setosa" ~ Petal.Length + Petal.Width, iris,
> +                 linear.output = FALSE)
> >
> > nn$weights
> [[1]]
> [[1]][[1]]
> [,1]
> [1,] 6.513239
> [2,] -0.815920
> [3,] -5.859802
> [[1]][[2]]
> [,1]
> [1,] -4.597934
> [2,] 9.179436
>
> Using exclude (2 weights are excluded --> NA):
>
> > nn <- neuralnet(Species == "setosa" ~ Petal.Length + Petal.Width, iris,
> +                 linear.output = FALSE,
> +                 exclude = matrix(c(1,2,1, 2,2,1), byrow=TRUE, nrow=2))
> > nn$weights
> [[1]]
> [[1]][[1]]
> [,1]
> [1,] -0.2815942
> [2,] NA
> [3,] 0.2481212
> [[1]][[2]]
> [,1]
> [1,] -0.6934932
> [2,] NA
>
> Using exclude and constant.weights (2 weights are excluded and treated as
> fixed --> 100 and 1000, respectively):
>
> > nn <- neuralnet(Species == "setosa" ~ Petal.Length + Petal.Width, iris,
> +                 linear.output = FALSE,
> +                 exclude = matrix(c(1,2,1, 2,2,1), byrow=TRUE, nrow=2),
> +                 constant.weights = c(100,1000))
> > nn$weights
> [[1]]
> [[1]][[1]]
> [,1]
> [1,] 0.554119
> [2,] 100.000000
> [3,] 1.153611
> [[1]][[2]]
> [,1]
> [1,] -0.3962524
> [2,] 1000.0000000
>
> I hope you will find this example helpful.
>
> Sincerely,
> Frauke
>
>
> >     William Michels <w...@caa.columbia.edu> wrote on 10.10.2020 18:16:
> >
> >
> >     Forwarding: Question re "neuralnet" package on the R-Help mailing list:
> >
> >     https://stat.ethz.ch/pipermail/r-help/2020-October/469020.html
> >
> >     If you are so inclined, please reply to:
> >
> >     r-help@r-project.org
> >
> >     ---------- Forwarded message ---------
> >     From: Dan Ryan <dan.r...@unbc.ca>
> >     Date: Fri, Oct 9, 2020 at 3:52 PM
> >     Subject: Re: [R] Help using the exclude option in the neuralnet package
> >     To: r-help@r-project.org
> >
> >     Good Morning,
> >
> >     I am using the neuralnet package in R, and am able to produce some
> >     basic neural nets, and use the output.
> >
> >     I would like to exclude some of the weights and biases from the
> >     iteration process and fix their values.
> >
> >     However I do not seem to be able to correctly define the exclude and
> >     constant.weights vectors.
> >
> >     Question: Can someone point me to an example where exclude and
> >     constant.weights are used? I have searched the R help archive, and
> >     haven't found any examples which use these on the web.
> >
> >     Thank you in advance for any help.
> >
> >     Sincerely
> >
> >     Dan
> >
> >     [[alternative HTML version deleted]]
> >
> >
>
>         [[alternative HTML version deleted]]
>
>
>
>
> ------------------------------
>
> Message: 4
> Date: Tue, 13 Oct 2020 08:04:32 +0200
> From: Ablaye Ngalaba <ablayengal...@gmail.com>
> To: R-help@r-project.org
> Subject: [R] package for kernel on R
> Message-ID:
>         <caokwqv2yoqppsbujzv3i4ehayhnrvzp3vurxeba28flksud...@mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> Hello,
> Please, I want to know which package to install on R when coding the kernel
> functions
>
>         [[alternative HTML version deleted]]
>
>
>
>
> ------------------------------
>
> Message: 5
> Date: Tue, 13 Oct 2020 09:09:00 +0200
> From: Ablaye Ngalaba <ablayengal...@gmail.com>
> To: R-help@r-project.org
> Subject: [R] help for R code
> Message-ID:
>         <CAOkWQv0LsgxkHdqpai1=9bplmp6tadnwziqtiha8zrirkf2...@mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> Good morning dear administrators,
> Please help me to code this code in R.
> I use in this file the redescription function Φ which by making a scalar
> product gives a . You can also choose instead of the redescription function
> Φ a kernel k(x,x).
>
>
>
>
>                   Sincerely
>
>         [[alternative HTML version deleted]]
>
>
>
>
> ------------------------------
>
> Message: 6
> Date: Tue, 13 Oct 2020 11:21:45 +0300
> From: Eric Berger <ericjber...@gmail.com>
> To: Ablaye Ngalaba <ablayengal...@gmail.com>
> Cc: R mailing list <R-help@r-project.org>
> Subject: Re: [R] help for R code
> Message-ID:
>         <caggjw74tp-+l6gg0_blbnayl657ejw+_fvq+tscsadgej8v...@mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> Hi Ablaye,
> The CRAN repository has thousands of available R packages. To help
> people find relevant packages amid such a huge collection, there are
> some 'task view' pages that group packages according to a particular
> task. I am guessing that you are interested in kernels because of
> their use in machine learning, so you might want to look at the
> Machine Learning task view at:
>
> https://cran.r-project.org/web/views/MachineLearning.html
>
> If you search for 'kernels' on that page you will find
>
> 'Support Vector Machines and Kernel Methods' which mentions a few
> packages that use kernels.
>
> Good luck,
> Eric
>
>
> On Tue, Oct 13, 2020 at 10:09 AM Ablaye Ngalaba <ablayengal...@gmail.com> 
> wrote:
> >
> > Good morning dear administrators,
> > Please help me to code this code in R.
> > I use in this file the redescription function Φ which by making a scalar
> > product gives a . You can also choose instead of the redescription function
> > Φ a kernel k(x,x).
> >
> >
> >
> >
> >                   Sincerely
> >
> >         [[alternative HTML version deleted]]
> >
>
>
>
>
> ------------------------------
>
>
> End of R-help Digest, Vol 212, Issue 12
> ***************************************
