Re: [R] Nested structure data simulation

varin sacha via R-help Sun, 19 May 2019 08:15:13 -0700

Dear Boris,

Great! But what about Mark in your R code? Don't we have to specify in the 
R code that Mark ranges from 1 to 6 in steps of 0.5 (1; 1.5; 2; 2.5; 3; 3.5; 
4; 4.5; 5; 5.5; 6)?
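
One way to keep marks on that scale, sketched under the assumption that a raw
continuous score is simulated first and then snapped to the nearest half point.
The helper name snap_to_scale and the mean and standard deviation of the raw
scores are hypothetical choices for illustration, not part of any package or of
the thread:

```r
# Allowed marks: 1, 1.5, ..., 6 (11 values)
mark_scale <- seq(1, 6, by = 0.5)

# Hypothetical helper: round to the nearest 0.5, then clamp to [1, 6]
snap_to_scale <- function(x) {
  pmin(pmax(round(x * 2) / 2, 1), 6)
}

set.seed(123)
raw   <- rnorm(2000, mean = 4, sd = 1)  # assumed raw score distribution
marks <- snap_to_scale(raw)

# Every simulated mark lies on the allowed scale
stopifnot(all(marks %in% mark_scale))
```

Drawing directly with sample(mark_scale, 2000, replace = TRUE) would also
respect the scale, but the marks would then carry no school, class, or gender
effect.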


By the way, to fit a linear mixed model I use the lme4 package; its lmer 
function takes the variables as in the example below:

library(lme4)
mm <- lmer(Mark ~ Gender + (1 | School / Class), data = Dataset)

With your R code, how should I write the lmer call to make it work?
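
For reference, a sketch of how the call might look with a data frame built as in
Boris's message quoted below: only the data argument changes. The sizes are his;
the effect sizes and standard deviations used to fill Mark are assumptions made
purely so that the model has something to estimate. Because the class labels
(c1 to c100) are unique across schools, (1 | School / Class) is equivalent here
to (1 | School) + (1 | Class).

```r
library(lme4)

set.seed(123)
nS <- 10; nTpS <- 5; nCpT <- 2; nPpC <- 20
n  <- nS * nTpS * nCpT * nPpC

mySim <- data.frame(
  School  = paste0("s", rep(1:nS, each = nTpS * nCpT * nPpC)),
  Teacher = paste0("t", rep(1:(nS * nTpS), each = nCpT * nPpC)),
  Class   = paste0("c", rep(1:(nS * nTpS * nCpT), each = nPpC)),
  Gender  = sample(c("boy", "girl"), n, prob = c(0.4, 0.6), replace = TRUE),
  stringsAsFactors = FALSE
)

# Assumed data-generating values (illustration only)
school_eff <- rnorm(nS, sd = 0.5)                 # one intercept per school
class_eff  <- rnorm(nS * nTpS * nCpT, sd = 0.3)   # one intercept per class
mySim$Mark <- 4 + 0.3 * (mySim$Gender == "girl") +
  school_eff[rep(1:nS, each = nTpS * nCpT * nPpC)] +
  class_eff[rep(1:(nS * nTpS * nCpT), each = nPpC)] +
  rnorm(n, sd = 0.8)                              # pupil-level noise

# Same formula as before, with mySim as the data set
mm <- lmer(Mark ~ Gender + (1 | School / Class), data = mySim)
summary(mm)
```

If the teacher level should enter the model as well, (1 | School / Teacher / Class)
would be the analogous nested specification.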

Best,
S.

On Sunday, 19 May 2019 at 15:26:39 UTC+2, Boris Steipe 
<boris.ste...@utoronto.ca> wrote:

Fair enough - there are additional assumptions needed, which I make as follows:
  - each class has the same size
  - each teacher teaches the same number of classes
  - the number of boys and girls is random within a class
  - 60% of the pupils are girls (just to illustrate that the split does not
    have to be equal)
  

To make the dependencies explicit, I define the sizes as follows, in a way that 
cannot become inconsistent.

nS   <- 10   # Schools
nTpS <- 5    # Teachers per school
nCpT <- 2    # Classes per teacher
nPpC <- 20   # Pupils per class
stopifnot(nS * nTpS * nCpT * nPpC == 2000)  # Validate: 2000 pupils in total


mySim <- data.frame(School  = paste0("s", rep(1:nS, each = nTpS*nCpT*nPpC)),
                    Teacher = paste0("t", rep(1:(nTpS*nS), each = nCpT*nPpC)),
                    Class  = paste0("c", rep(1:(nCpT*nTpS*nS), each = nPpC)),
                    Gender  = sample(c("boy", "girl"),
                                    (nS*nTpS*nCpT*nPpC),
                                    prob = c(0.4, 0.6),
                                    replace = TRUE),
                    Mark    = numeric(nS*nTpS*nCpT*nPpC),
                    stringsAsFactors = FALSE)
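
A quick way to confirm that the nesting came out as intended is to count
distinct labels at each level. The following sketch rebuilds the same data frame
(without the Mark column) and checks it with base R only; the intermediate
object names are just illustrative.

```r
nS <- 10; nTpS <- 5; nCpT <- 2; nPpC <- 20
n  <- nS * nTpS * nCpT * nPpC

set.seed(123)
mySim <- data.frame(
  School  = paste0("s", rep(1:nS, each = nTpS * nCpT * nPpC)),
  Teacher = paste0("t", rep(1:(nS * nTpS), each = nCpT * nPpC)),
  Class   = paste0("c", rep(1:(nS * nTpS * nCpT), each = nPpC)),
  Gender  = sample(c("boy", "girl"), n, prob = c(0.4, 0.6), replace = TRUE),
  stringsAsFactors = FALSE
)

# Helper: number of distinct values of x within each group g
n_distinct_by <- function(x, g) tapply(x, g, function(v) length(unique(v)))

# Nesting: each class has exactly one teacher, each teacher one school
stopifnot(all(n_distinct_by(mySim$Teacher, mySim$Class) == 1))
stopifnot(all(n_distinct_by(mySim$School, mySim$Teacher) == 1))

# Balance: 20 pupils per class, 2 classes per teacher, 5 teachers per school
stopifnot(all(table(mySim$Class) == nPpC))
stopifnot(all(n_distinct_by(mySim$Class, mySim$Teacher) == nCpT))
stopifnot(all(n_distinct_by(mySim$Teacher, mySim$School) == nTpS))

head(mySim)
```

If any of the stopifnot() calls fails, the each = arguments no longer agree with
the size constants.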
                    

Then you fill mySim$Mark with values from your linear mixed model ...

mySim$Mark[i] <- simMarks(mySim[i, ])  # row i; simMarks() is a placeholder, or something equivalent.
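
Since simMarks is only a placeholder, here is one possible vectorised version,
assuming a random-intercept model with normally distributed school, teacher, and
class effects plus a fixed gender effect. All numeric values (grand mean 4,
effect sizes, standard deviations) and the output file name are made up for
illustration; the result is rounded to the 1 to 6 half-point scale.

```r
set.seed(2019)
nS <- 10; nTpS <- 5; nCpT <- 2; nPpC <- 20
nT <- nS * nTpS      # 50 teachers
nC <- nT * nCpT      # 100 classes
n  <- nC * nPpC      # 2000 pupils

# Integer index of each pupil's school, teacher and class
school_id  <- rep(1:nS, each = nTpS * nCpT * nPpC)
teacher_id <- rep(1:nT, each = nCpT * nPpC)
class_id   <- rep(1:nC, each = nPpC)

mySim <- data.frame(
  School  = paste0("s", school_id),
  Teacher = paste0("t", teacher_id),
  Class   = paste0("c", class_id),
  Gender  = sample(c("boy", "girl"), n, prob = c(0.4, 0.6), replace = TRUE),
  stringsAsFactors = FALSE
)

# Assumed random intercepts, one draw per school / teacher / class
u_school  <- rnorm(nS, mean = 0, sd = 0.5)
u_teacher <- rnorm(nT, mean = 0, sd = 0.3)
u_class   <- rnorm(nC, mean = 0, sd = 0.3)

# Linear predictor plus pupil-level noise
raw <- 4 + 0.3 * (mySim$Gender == "girl") +
  u_school[school_id] + u_teacher[teacher_id] + u_class[class_id] +
  rnorm(n, mean = 0, sd = 0.8)

# Snap to the nearest 0.5 and clamp to the 1..6 scale
mySim$Mark <- pmin(pmax(round(raw * 2) / 2, 1), 6)

# Optionally write the table to a text file (file name is an assumption)
# write.table(mySim, "mySim.txt", sep = "\t", quote = FALSE)
```

Rounding and clamping make the marks only approximately Gaussian, so variance
estimates from a model fitted to them will differ somewhat from the values used
to generate the data.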


All good?

Cheers,
Boris



> On 2019-05-19, at 08:05, varin sacha <varinsa...@yahoo.fr> wrote:
> 
> Many thanks to all of you for your responses.
> 
> So, I will try to be clearer with a larger example. The end of my mail is the 
> most important part for understanding what I am trying to do. I am trying to 
> simulate data to fit a linear mixed model (nested, not crossed). More 
> precisely, I would love to get, at the end of the process, a table (.txt) 
> with rows and columns. The rows (column 1) will be the 2000 pupils and the 
> other columns the different variables: column 2 = class; column 3 = teacher; 
> column 4 = school; column 5 = gender (boy or girl); column 6 = mark in French.
> 
> Pupils are nested in classes, and classes are nested in schools. The teachers 
> are part of the process.
> 
> I want to simulate a dataset with n=2000 pupils, 100 classes, 50 teachers and 
> 10 schools.
> - Pupil n°1 to pupil n°2000 (p1, p2, p3, p4, ..., p2000)
> - Class n°1 to class n°100 (c1, c2, c3, c4, ..., c100)
> - Teacher n°1 to teacher n°50 (t1, t2, t3, t4, ..., t50)
> - School n°1 to school n°10 (s1, s2, s3, s4, ..., s10)
> 
> The nested structure is as follows: 
> 
> -- School 1 with teachers 1 to 5 (t1, t2, t3, t4 and t5), with classes 
> 1 to 10 (c1, c2, c3, c4, c5, c6, c7, c8, c9, c10), pupils n°1 to 
> n°200 (p1, p2, p3, p4, ..., p200).
> 
> -- School 2 with teachers 6 to 10, with classes 11 to 20, 
> pupils n°201 to n°400
> 
> -- and so on
> 
> The table (.txt) I would love to get at the end is the following :
> 
>     Class  Teacher  School  gender  Mark
> 1   c1     t1       s1      boy     5
> 2   c1     t1       s1      boy     5.5
> 3   c1     t1       s1      girl    4.5
> 4   c1     t1       s1      girl    6
> 5   c1     t1       s1      boy     3.5
> 6   ...    ...      ...     ...     ...
> 
> 
> The first 20 rows with c1, with t1, with s1, gender (randomly selected) and 
> mark (randomly selected) from 1 to 6
> The rows 21 to 40 with c2 with t1 with s1
> The rows 41 to 60 with c3 with t2 with s1
> The rows 61 to 80 with c4 with t2 with s1
> The rows 81 to 100 with c5 with t3 with s1
> The rows 101 to 120 with c6 with t3 with s1
> The rows 121 to 140 with c7 with t4 with s1
> The rows 141 to 160 with c8 with t4 with s1
> The rows 161 to 180 with c9 with t5 with s1
> The rows 181 to 200 with c10 with t5 with s1
> 
> The rows 201 to 220 with c11 with t6 with s2
> The rows 221 to 240 with c12 with t6 with s2
> 
> And so on...
> 
> Is it possible to do that ? Or am I dreaming ?
> 
> 
> On Sunday, 19 May 2019 at 10:45:43 UTC+2, Linus Chen <linus.l.c...@gmail.com> 
> wrote:
> 
> Dear varin sacha,
> 
> I think it will help us help you, if you give a clearer description of
> what exactly you want.
> 
> I assume the situation is that you know what data structure you
> want, but do not know
> how to conveniently create such a structure.
> And that is where others can help you.
> So, please, describe the wanted data structure more thoroughly,
> ideally with an example.
> 
> Thanks,
> Lei
> 
> On Sat, May 18, 2019 at 10:04 PM varin sacha via R-help
> <r-help@r-project.org> wrote:
>> 
>> Dear Boris,
>> 
>> Yes, top-down, no problem. Many thanks, but in your code did you not forget 
>> "teacher" ? As a reminder teacher has to be nested with classes. I mean the 
>> 50 pupils belonging to C1 must be with (teacher 1) T1, the 50 pupils 
>> belonging to C2 with T2, the 50 pupils belonging to C3 with T3 and so on.
>> 
>> Best,
>> 
>> 
>> On Saturday, 18 May 2019 at 16:52:48 UTC+2, Boris Steipe 
>> <boris.ste...@utoronto.ca> wrote:
>> 
>> Can you build your data top-down?
>> 
>> 
>> 
>> schools <- paste("s", 1:6, sep="")
>> 
>> classes <- character()
>> for (school in schools) {
>>  classes <- c(classes, paste(school, paste("c", 1:5, sep=""), sep = "."))
>> }
>> 
>> pupils <- character()
>> for (class in classes) {
>>  pupils <- c(pupils, paste(class, paste("p", 1:10, sep=""), sep = "."))
>> }
>> 
>> 
>> 
>> B.
>> 
>> 
>> 
>>> On 2019-05-18, at 09:57, varin sacha via R-help <r-help@r-project.org> 
>>> wrote:
>>> 
>>> Dear R-Experts,
>>> 
>>> In a data simulation, I would like a balanced distribution with a nested 
>>> structure for classroom and teacher (not for school). I mean 50 pupils 
>>> belonging to C1, 50 other pupils belonging to C2, 50 other pupils belonging 
>>> to C3 and so on. Then I want the 50 pupils belonging to C1 with T1, the 50 
>>> pupils belonging to C2 with T2, the 50 pupils belonging to C3 with T3 and 
>>> so on. The schools don’t have to be nested; I just want a balanced 
>>> distribution, I mean 60 pupils in S1, 60 other pupils in S2 and so on.
>>> Here below the reproducible example.
>>> Many thanks for your help.
>>> 
>>> ##############
>>> set.seed(123)
>>> # Random generation of the columns
>>> pupils <- 1:300
>>> classroom <- sample(c("C1","C2","C3","C4","C5","C6"), 300, replace = TRUE)
>>> teacher <- sample(c("T1","T2","T3","T4","T5","T6"), 300, replace = TRUE)
>>> school <- sample(c("S1","S2","S3","S4","S5"), 300, replace = TRUE)
>> 
>>> ##############
>>> 
>>> ______________________________________________
>>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
> 
