Re: [R] Selecting random subset by ID

Jeff Newmiller Fri, 07 Sep 2018 13:07:21 -0700

IMO it is worth pointing out that you don't have to write code that solves your 
problem (else why have this list?), but communication works best when you write 
code that creates a mock data set illustrating what you are starting from, along 
with some mock output showing the result you want.

The mock input can sometimes be the output of the dput function on a subset of 
your data, but in your case would probably be something more like

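set.seed(42)  # make the random draws reproducible
# one row per individual: id, an individual-level effect a1, and n observations
ids <- data.frame( id = 1:8000,
                   a1 = rnorm(8000, 0, 1),
                   n  = sample(2:15, 8000, replace = TRUE) )
# repeat each individual's row n times to get an unbalanced panel
dta <- ids[rep(ids$id, ids$n), ]
dta$a0 <- rnorm(nrow(dta), 1, 2)   # observation-level noise
dta$value <- with(dta, a0 + a1)    # mock response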

The way I built this data may not match how your data is actually structured, 
but avoiding exactly that kind of misunderstanding is the reason to learn how 
to make such an example before you ask your question.
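
As a small illustration of the dput route mentioned above, here is a minimal 
sketch. The data frame "small" and its columns are hypothetical stand-ins for a 
slice of real data, not anything taken from the original question.

```r
# Hypothetical small data frame standing in for a slice of real data
small <- data.frame(id = c(1L, 1L, 2L), count = c(3L, 0L, 5L))

# dput() prints R code that rebuilds the object; paste that text into an email
dput(small)

# Round trip: capture the printed text, evaluate it, and compare to the original
txt <- paste(capture.output(dput(small)), collapse = "\n")
rebuilt <- eval(parse(text = txt))
identical(small, rebuilt)
```

Anyone on the list can paste the dput output into their own session and work 
with the same object you have.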

You may find that reading the above helps you answer your own question, or you 
can confirm that this data set is close enough and show what code you tried 
starting with this data.
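
For what it is worth, one possible starting point with the mock data might look 
like the sketch below. It assumes the four subsets are drawn independently, so 
they may share individuals (4 x 2500 exceeds 8000, so four disjoint subsets are 
not possible). The names all_ids, subsets and keep are mine, and this is only 
one of several reasonable ways to do it.

```r
# Mock data as above, repeated here so the block runs on its own
set.seed(42)
ids <- data.frame(id = 1:8000,
                  a1 = rnorm(8000, 0, 1),
                  n  = sample(2:15, 8000, replace = TRUE))
dta <- ids[rep(ids$id, ids$n), ]
dta$a0 <- rnorm(nrow(dta), 1, 2)
dta$value <- with(dta, a0 + a1)

# Assumption: each subset is an independent draw, so subsets may overlap
all_ids <- unique(dta$id)
subsets <- lapply(1:4, function(i) {
  keep <- sample(all_ids, 2500)   # 2500 distinct ids, sampled without replacement
  dta[dta$id %in% keep, ]         # keep every row belonging to the sampled ids
})

# Count the distinct individuals in each subset as a sanity check
sapply(subsets, function(d) length(unique(d$id)))
```

Sampling the ids rather than the rows keeps all observations for a selected 
individual together, which matters for panel data.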

Oh, and by the way, sending your emails to this list formatted as HTML is a 
good way to corrupt your code examples, because this list forwards only the 
plain text part of your email. Use the plain text setting in your email program 
to avoid further miscommunication.

More on reproducible examples [1][2][3].

[1] http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example

[2] http://adv-r.had.co.nz/Reproducibility.html

[3] https://cran.r-project.org/web/packages/reprex/index.html (read the 
vignette)

On September 7, 2018 12:00:07 PM PDT, Bert Gunter <bgunter.4...@gmail.com> 
wrote:
>?sample
>
>Should get you started
>
>We expect you to first make an effort to learn about and write your
>own code, rather than asking us to write it for you.
>
>-- Bert
>
>Bert Gunter
>
>"The trouble with having an open mind is that people keep coming along
>and sticking things into it."
>-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>
>On Fri, Sep 7, 2018 at 11:38 AM David Joubert
><david.joub...@uottawa.ca> wrote:
>>
>> Hello R users,
>>
>> I am working with a large dataset, including roughly 50 000 sequential
>> observations (variable "count") for 8000 individuals (variable "id"). The
>> dataset is very unbalanced, meaning that some individuals have few
>> observations and others have many. Because I plan on running Generalized
>> Linear Models for panel data using pglm and the package has file size
>> restrictions, I want to create 4 randomly selected subsets of 2500
>> individuals from the main dataset. What functions and code would I use to
>> do this?
>>
>> Thanks in advance,
>>
>> David Joubert
>>
>>
>>
>>         [[alternative HTML version deleted]]
>>
>> ______________________________________________
>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>

-- 
Sent from my phone. Please excuse my brevity.

