Hi I am bit confused what you want to achieve. As I can not reproduce code without your data I just guess.
If I understand you want to select from all your data randomly 98 values for 98 animals (one for each animal). I presume your id2 is sorted. One option # make sorted ids id2<-sample(1:5, 100, replace=T) id2<-sort(id2) # how many unique ids len<-rle(id2)$lengths # how many values are from beginning shift.len<-c(0,cumsum(len))[-(length(len)+1)] # get one value from each id samp<-sapply(sapply(split(id2, id2), function(x) 1:length(x)), sample, 1) # just test id2[samp+shift.len] [1] 1 2 3 4 5 The other option is randomise vector of indices ss<-sample(1:100) sort data.frame according those randomised indices and select let say first one sapply(split(daf[ss,], daf[ss,1]), function(x) x[1,]) But i believe that there are even better options. Regards Petr r-help-boun...@r-project.org napsal dne 26.02.2010 14:31:06: > > Hello R users, > > I have been using R for a while now for basic stats but I'm now trying to > get my head around looping scripts and in some places I am failing! > > I have a data set with c. 1200 data points on 98 individual animals with > data on each row representing a daily measure and I am asking the question > "what variables affect the animal's behaviour?" > > the dataset includes these variables for analyses: > > presence of behaviour, absence of behaviour, site, year, rain, air temp, ID, > Day > > Listed below as they appear in the data set: > > BEH_T, BEH_F, SITE, YEAR, PRECIP_MM_DAY, PUP_AGE_EST, MO_AIR_TEMP, ID2, > DAY > > with BEH_T & BEH_F = the response variable for a binomial GLM > > here is the head of the dataset > (NB there are only two years and two sites) > > BEH_T BEH_F SITE YEAR PRECIP_MM_DAY PUP_AGE_EST MO_AIR_TEMP ID2 DAY > [1,] 14 10 1 2007 0 12 10.98750 1 1 > [2,] 37 23 1 2007 0 13 11.47333 1 2 > [3,] 56 22 1 2007 0 14 12.16667 1 3 > [4,] 43 23 1 2007 0 16 10.91515 1 5 > [5,] 62 16 1 2007 0 17 12.81026 1 6 > [6,] 30 20 1 2007 0 19 8.67037 1 8 > > (Sorry the headings are skewed) > > Because I don't want to do too complex a model to start with (just wanting > to learn first with a 'simple' model) I have issues with independence of the > data as there are repeats of individuals - i.e. data taken on the same IDs > on different days. So in order to account for that I have decided to random > sample one data point for each ID then run the GLM on that data for x number > of simulations to see if the explanatory variables are the same/similar > across all models. (This will reduce my data set to 98 data points, but it > is the best way I can see of doing this without doing mixed-effects models, > since not all IDs are seen at both sites in both years). > > I am also using the MuMIn package for running all subsets of your model > > > the code I'm using is: > > > for (S in 1:2){ > Sample.dat<-ALL.R[1,] > for (I in 1:98) { > tmp<-ALL.R[ALL.R$ID2==I,] > max<-dim(tmp)[1] > if (I==1) Sample.dat<-tmp[sample(1:max,1),] else { > Sample.dat<-rbind(Sample.dat,tmp[sample(1:max,1),]) > m1.R<-glm(cbind(Sample.dat$BEH_T, Sample.dat$BEH_F) ~ Sample.dat$SITE + > Sample.dat$YEAR + Sample.dat$PRECIP_MM_DAY + Sample.dat$PUP_AGE_EST + > Sample.dat$MO_AIR_TEMP, family="binomial") > mod<-dredge(m1.R)}}} > > At this point I have two issues if I do it manually then it seems to work > i.e. gives me one output (e.g shown at bottom of post) where I then want to > take the first line, the model with the best AIC using mod[1,] - no problem! > > However, letting the code run and for example using print ((mod[1,])) at the > end it prints out the first line of 98 outputs - so I'm not too sure what > I've done wrong here, but it appears to be running a model for each ID - > something basic no doubt! > > Ideally, what I want to do is take a random sample of the data then run the > model get one output for that take the top line (i.e. the best AIC) and save > this, then run this routine say 100 times, saving that top line every time, > then having a look at the results and take a model average. Anytime I've got > close to this I have issues with overwriting the previous first line of the > model selection and I can't seem to identify how to set this loop up > properly. > > Any advice or guidance would be most appreciated, I have tried to explain my > issues clearly but if more info is required please just ask, > > Many thanks in advance to those of you that took the time to read this! > > Ross > > Ross Culloch > Ph.D. Student > Durham University > UK > > > > > > > > Here is an example of the model selection table from usingMuMIn: > > > Model selection table > (Intr) S.$MO_ S.$PRE S.$PUP S.$SIT S.$YEA k Dev. AIC AICc > delta weight > 30 645.8000 0.03841 -0.02148 0.2882 -0.3212 5 304.0 687.1 687.7 > 0.000 0.707 > 32 648.8000 0.03811 0.0009399 -0.02172 0.2857 -0.3227 6 304.0 689.0 690.0 > 2.249 0.230 > 26 785.1000 -0.02543 0.4678 -0.3905 4 312.8 693.9 694.3 > 6.630 0.026 > 31 794.2000 0.0037260 -0.02627 0.4519 -0.3950 5 312.5 695.5 696.2 > 8.493 0.010 > 22 582.7000 0.04703 0.2641 -0.2899 4 314.7 695.8 696.2 > 8.529 0.010 > 21 582.8000 0.06893 -0.01967 -0.2899 4 314.9 696.0 696.4 > 8.717 0.009 > 29 573.1000 0.04787 -0.0039980 0.2762 -0.2851 5 314.3 697.4 698.0 > 10.330 0.004 > 28 600.1000 0.06612 0.0046710 -0.02092 -0.2985 5 314.4 697.4 698.1 > 10.370 0.004 > 20 0.7526 0.05509 -0.01808 0.2450 4 321.0 702.0 702.5 > 14.770 0.000 > 10 530.4000 0.07447 -0.2639 3 324.0 703.1 703.3 > 15.640 0.000 > 27 0.7493 0.05556 -0.0022820 -0.01753 0.2519 5 320.8 703.9 704.6 > 16.850 0.000 > 19 530.0000 0.07455 -0.0001489 -0.2637 4 324.0 705.1 705.5 > 17.820 0.000 > 16 743.4000 0.4875 -0.3698 3 328.7 707.8 708.0 > 20.310 0.000 > 9 0.5512 0.06094 0.2286 3 328.8 707.9 708.1 > 20.430 0.000 > 8 0.6828 0.08019 -0.01688 3 328.9 708.0 708.2 > 20.540 0.000 > 18 0.5584 0.06173 -0.0059840 0.2481 4 327.8 708.9 709.3 > 21.620 0.000 > 25 739.9000 -0.0016930 0.4944 -0.3681 4 328.6 709.7 710.1 > 22.410 0.000 > 17 0.6856 0.07953 0.0012680 -0.01720 4 328.9 709.9 710.4 > 22.670 0.000 > 2 0.4985 0.08406 2 335.8 712.8 713.0 > 25.270 0.000 > 7 0.4996 0.08516 -0.0023780 3 335.6 714.7 714.9 > 27.240 0.000 > 14 1.0760 -0.02288 0.5151 3 340.8 719.9 720.1 > 32.420 0.000 > 23 1.0760 0.0003492 -0.02296 0.5136 4 340.8 721.9 722.3 > 34.590 0.000 > 5 0.8587 0.5304 2 354.0 731.0 731.1 > 43.440 0.000 > 12 0.8663 -0.0042170 0.5473 3 353.5 732.5 732.8 > 45.070 0.000 > 24 967.8000 0.0198500 -0.03274 -0.4813 4 358.3 739.4 739.8 > 52.140 0.000 > 15 942.8000 -0.02909 -0.4689 3 370.5 749.5 749.8 > 62.090 0.000 > 13 915.8000 0.0151200 -0.4556 3 384.7 763.7 764.0 > 76.290 0.000 > 6 900.3000 -0.4478 2 391.8 768.9 769.0 > 81.320 0.000 > 11 1.3530 0.0176300 -0.02957 3 402.3 781.3 781.6 > 93.890 0.000 > 4 1.3940 -0.02630 2 412.3 789.4 789.5 > 101.800 0.000 > 3 1.1010 0.0134300 2 424.4 801.4 801.6 > 113.800 0.000 > 1 1.1550 1 430.3 805.4 805.4 > 117.700 0.000 > > > > -- > View this message in context: http://n4.nabble.com/Loop-overwrite-and-data- > output-problems-tp1570593p1570593.html > Sent from the R help mailing list archive at Nabble.com. > > ______________________________________________ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.