Hi,

I want to simulate data in which a percentage of the id values are
duplicated across multiple rows (with a unique value of another factor
variable on each row of a duplicated id). I'm having trouble figuring
out an efficient way to do so.

For example, consider the mock output below. [Note: Although the mock
data doesn't show this, I eventually want 73% of ids to appear once,
22% to appear twice (one duplicate) and 5% to appear three times (two
duplicates). Also, I would like the 'al' variable to be randomly
selected, perhaps using sample(), from a 3-level factor with levels
"pt", "th", "ob", AND for an id with duplicates to have unique values
of the 'al' variable.]

Something like this:

id    z     al

1     .5    "pt"
2     .4    "ob"
3     .7    "pt"
4     .3    "th"
5     .5    "pt"
5     .6    "ob"
6     .3    "th"
6     .2    "ob"
7     .1    "pt"
7     .3    "th"
7     .1    "ob"

This would be the general idea, although I will eventually create a
much larger data set with z based on rnorm(), etc.
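
To make the target concrete, here is a rough sketch of the kind of thing
I have in mind. It is only one possible approach, not necessarily an
efficient one. The names n_id, levels_al, n_rows and sim, the choice of
1000 ids, and the seed are all placeholders I made up for illustration.
It draws the number of rows per id with sample(), so the 73/22/5 split
holds only approximately, and it samples 'al' without replacement
within each id so that a duplicated id never repeats a level.

```r
set.seed(42)                         # arbitrary seed, for reproducibility only

n_id      <- 1000                    # number of distinct ids (assumed)
levels_al <- c("pt", "th", "ob")     # the three levels of 'al'

# Number of rows per id: 1 (no duplicate), 2 (one duplicate) or
# 3 (two duplicates), drawn with probabilities 0.73, 0.22 and 0.05.
n_rows <- sample(1:3, n_id, replace = TRUE, prob = c(0.73, 0.22, 0.05))

# Repeat each id as many times as it has rows.
id <- rep(seq_len(n_id), times = n_rows)

# Within each id, draw 'al' WITHOUT replacement so its values are unique.
al <- unlist(lapply(n_rows, function(k) sample(levels_al, k)))

sim <- data.frame(
  id = id,
  z  = rnorm(length(id)),            # placeholder for the real z
  al = factor(al, levels = levels_al)
)

head(sim, 10)
prop.table(table(table(sim$id)))     # observed share of ids with 1, 2, 3 rows
```

If the split needs to be exact rather than approximate, I suppose n_rows
could instead be built with rep() from fixed counts and then shuffled
with sample(), but I have not worked that through.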

Any help toward a solution is much appreciated!

AC

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
