Re: [R] How important is set.seed

Ebert,Timothy Aaron Tue, 22 Mar 2022 10:04:17 -0700

That approach would start the trainControl method at set.seed(123) and it would 
start ran_search at set.seed(123).
I am not sure whether that would be good, especially in this context. I am not 
clear on how the results are being compared, but there could be some differences 
if one method made a few extra calls to an RNG (random number generator).
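
A small sketch of the point about extra RNG calls, using only base R (the 
variable names are made up for illustration): two runs start from the same 
seed, but one consumes a single extra random number first, so the samples 
that follow no longer match.

```r
# Run A: seed, then draw five normal values
set.seed(123)
a <- rnorm(5)

# Run B: same seed, but one extra RNG call happens first
set.seed(123)
extra <- runif(1)    # stands in for any hidden call to the RNG
b <- rnorm(5)

# Run C: same seed, same sequence of calls as run A
set.seed(123)
c_run <- rnorm(5)

identical(a, c_run)  # same seed and same calls: the draws match
identical(a, b)      # the extra call shifted the stream: the draws differ
```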

I would think it makes more sense to ask how approach 1 differs from approach 2 
over a wide range of seeds. You are not testing the RNG, and I am not sure 
using the same seed for each model makes a difference unless the analysis is a 
paired samples approach. Might it be more effective to remove the initial 
set.seed() and then replace the second set.seed with set.seed(NULL) ?

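Otherwise wrap this into a loop:

N1 <- 100
set.seed(123)
# N1 distinct integer seeds (set.seed() expects an integer)
seed1 <- sample(20:345689, N1)
for (i in seq_len(N1)) {
  set.seed(seed1[i])
  # code for the first model
  set.seed(seed1[i])
  # code for the second model
}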

Or use set.seed(NULL) between the models.
You will need some variable to store the relevant results from each model, and 
some code to display the results. For the loop approach I suggest setting up a 
matrix or two that can be indexed using the for loop index.
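
As a rough sketch of that bookkeeping, assuming toy data and two stand-in 
"approaches" (a bootstrap mean and a bootstrap median, both invented here in 
place of the real models), the results matrix can be filled by row inside the 
loop and summarised across seeds afterwards:

```r
n_seeds <- 20
set.seed(123)
seeds <- sample(20:345689, n_seeds)   # distinct integer seeds

# Toy data, invented purely for illustration
x <- c(2.1, 3.4, 1.9, 5.6, 4.2, 3.8, 2.7, 4.9)

# One row per seed, one column per approach
results <- matrix(NA_real_, nrow = n_seeds, ncol = 2,
                  dimnames = list(NULL, c("approach1", "approach2")))

for (i in seq_len(n_seeds)) {
  set.seed(seeds[i])
  results[i, "approach1"] <- mean(sample(x, replace = TRUE))    # stand-in for model 1
  set.seed(seeds[i])
  results[i, "approach2"] <- median(sample(x, replace = TRUE))  # stand-in for model 2
}

# Summarise across seeds instead of reporting a single seed
colMeans(results)
apply(results, 2, sd)
```

Because both approaches see the same seed within each iteration, the rows are 
paired, which is what makes a paired comparison across seeds possible.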

Tim

From: Neha gupta <neha.bologn...@gmail.com>
Sent: Tuesday, March 22, 2022 12:03 PM
To: Ebert,Timothy Aaron <teb...@ufl.edu>
Cc: Jeff Newmiller <jdnew...@dcn.davis.ca.us>; r-help@r-project.org
Subject: Re: How important is set.seed

Thank you again Tim

d=readARFF("my data")

set.seed(123)

tr <- d[index, ]
ts <- d[-index, ]

ctrl <- trainControl(method = "repeatedcv",number=10)

set.seed(123)
ran_search <- train(lneff ~ ., data = tr,
                    method = "mlp",
                    tuneLength = 30,
                    metric = "MAE",
                    preProc = c("center", "scale", "nzv"),
                    trControl = ctrl)
getTrainPerf(ran_search)

Would it be good?

On Tue, Mar 22, 2022 at 4:34 PM Ebert,Timothy Aaron <teb...@ufl.edu> wrote:
My inclination is to follow Jeff’s advice and put it at the beginning of the 
program.
You can always experiment:

set.seed(42)
rnorm(5,5,5)
rnorm(5,5,5)
runif(5,0,3)

As long as the commands are executed in the order they are written, the 
outcome is the same every time. set.seed() gives you reproducible outcomes. 
However, the second rnorm() does not give you the same outcome as the first. 
set.seed() fixes the starting point of the sequence, so if you want the first 
and second rnorm() call to give the same results you will need another 
set.seed(42) before the second call.
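
To make that concrete, here is a short base R sketch (the object names are 
only illustrative) that re-seeds before a repeated call:

```r
set.seed(42)
first  <- rnorm(5, 5, 5)
second <- rnorm(5, 5, 5)   # continues the stream, so it differs from `first`

set.seed(42)               # reset to the same starting point
again  <- rnorm(5, 5, 5)

identical(first, again)    # TRUE
identical(first, second)   # FALSE
```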

Note also that it does not matter if you pause: whether you run the above code 
as a chunk or run each command individually, you get the same result (as long 
as you do it in the sequence written). So, if you set the seed, run some code, 
take a break, and come back to write some more code, you might get in trouble 
because R is still continuing the stream started by the original set.seed() 
command.
To solve this issue use
set.seed(Sys.time())

Or

set.seed(NULL)
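
A brief sketch of what set.seed(NULL) does in practice, using base R only and 
made-up object names: it re-initialises the generator as if no seed had been 
set, so later draws are no longer tied to the earlier seed, while an explicit 
seed still restores the reproducible stream.

```r
set.seed(42)
seeded <- runif(3)        # reproducible: same values on every run

set.seed(NULL)            # re-initialise the RNG as if no seed had been set
fresh <- runif(3)         # varies from run to run

set.seed(42)
seeded_again <- runif(3)  # an explicit seed restores the reproducible stream

identical(seeded, seeded_again)
```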

Some of this is just good programming style and workflow:

Import data
Declare variables and constants (set.seed() typically goes here)
Define functions
Body of code
Generate output
Clean up ( set.seed(NULL) would go here, along with removing unused variables 
and such)

Regards,
Tim

From: Neha gupta <neha.bologn...@gmail.com>
Sent: Tuesday, March 22, 2022 10:48 AM
To: Ebert,Timothy Aaron <teb...@ufl.edu>
Cc: Jeff Newmiller <jdnew...@dcn.davis.ca.us>; r-help@r-project.org
Subject: Re: How important is set.seed

Hello Tim

In some of the examples I see in the tutorials, they put the random seed just 
before the model training, e.g. before the train function in the caret library. 
Should I follow this?

Best regards
On Tuesday, March 22, 2022, Ebert,Timothy Aaron <teb...@ufl.edu> wrote:
Ah, so maybe what you need is to think of “set.seed()” as a treatment in an 
experiment. You could use a random number generator to select an appropriate 
number of seeds, then use those seeds repeatedly in the different models to see 
how seed selection influences outcomes. I am not quite sure how many seeds 
would constitute a good sample. For me that would depend on what I find and how 
long a run takes.
In parallel processing you set the seed in the master process and then use a 
random number generator to set seeds in each worker.
Tim

From: Neha gupta <neha.bologn...@gmail.com>
Sent: Tuesday, March 22, 2022 6:33 AM
To: Ebert,Timothy Aaron <teb...@ufl.edu>
Cc: Jeff Newmiller <jdnew...@dcn.davis.ca.us>; r-help@r-project.org
Subject: Re: How important is set.seed

Thank you all.

Actually I need set.seed because I have to evaluate the consistency of the 
feature selection generated by different models, so I think for this it's 
recommended to use the seed.

Warm regards

On Tuesday, March 22, 2022, Ebert,Timothy Aaron <teb...@ufl.edu> wrote:
If you are using the program for data analysis then set.seed() is not necessary 
unless you are developing a reproducible example. In a standard analysis it is 
mostly counter-productive because one should then ask if your presented results 
are an artifact of a specific seed that you selected to get a particular 
result. However, when you need a reproducible example, are debugging a program, 
or otherwise need the same result with every run of the program, set.seed() is 
an essential tool.
Tim

-----Original Message-----
From: R-help <r-help-boun...@r-project.org> On Behalf Of Jeff Newmiller
Sent: Monday, March 21, 2022 8:41 PM
To: r-help@r-project.org; Neha gupta <neha.bologn...@gmail.com>; r-help mailing 
list <r-help@r-project.org>
Subject: Re: [R] How important is set.seed

First off, "ML models" do not all use random numbers (for prediction I would 
guess very few of them do). Learn and pay attention to what the functions you 
are using do.

Second, if you use random numbers properly and understand the precision that 
your specific use case offers, then you don't need to use set.seed. However, in 
practice, using set.seed can allow you to temporarily avoid chasing precision 
gremlins, or set up specific test cases for testing code, not results. It is 
your responsibility to not let this become a crutch... a randomized simulation 
that is actually sensitive to the seed is unlikely to offer an accurate result.

Where to put set.seed depends a lot on how you are performing your simulations. 
In general each process should set it once uniquely at the beginning, and if 
you use parallel processing then use the features of your parallel processing 
framework to ensure that this happens. Beware of setting all worker processes 
to use the same seed.
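
A minimal sketch of giving each worker its own stream, assuming the base 
`parallel` package and its L'Ecuyer-CMRG generator. The worker count and 
variable names are illustrative, and no cluster is actually started here; the 
workers are only simulated in a single session. With a real cluster, 
parallel::clusterSetRNGStream() handles this distribution for you.

```r
library(parallel)

RNGkind("L'Ecuyer-CMRG")   # generator that supports independent streams
set.seed(123)              # one master seed

n_workers <- 4             # illustrative worker count
streams <- vector("list", n_workers)
s <- .Random.seed
for (w in seq_len(n_workers)) {
  streams[[w]] <- s
  s <- nextRNGStream(s)    # jump ahead to the next independent stream
}

# Simulate what each worker would do: install its stream, then draw
draws <- vapply(streams, function(st) {
  assign(".Random.seed", st, envir = .GlobalEnv)
  runif(1)
}, numeric(1))

draws                      # one value per worker, each from its own stream
```

Note that RNGkind() changes the generator for the rest of the session, so a 
script that needs the default generator afterwards would have to switch back.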

On March 21, 2022 5:03:30 PM PDT, Neha gupta <neha.bologn...@gmail.com> wrote:
>Hello everyone
>
>I want to know
>
>(1) In which cases, we need to use set.seed while building ML models?
>
>(2) Which is the exact location we need to put the set.seed function i.e.
>when we split data into train/test sets, or just before we train a model?
>
>Thank you
>
>       [[alternative HTML version deleted]]
>
>______________________________________________
>R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>https://stat.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting guide
>http://www.R-project.org/posting-guide.html
>and provide commented, minimal, self-contained, reproducible code.

--
Sent from my phone. Please excuse my brevity.
