[R] GGPlot plot
Dear R help, I am new to ggplot so I apologize if my question is a bit obvious. I would like to create a plot that compares the fraction of positive values of a variable called PASP, out of the number of subjects, for two groups of subjects coded with a dummy variable called SUBJC. The variable PASP is discrete and only takes the values 0, 4, 8, ... My data look like this:

PASP SUBJC
 0    0
 4    1
 0    0
 8    0
 4    0
 0    1
 0    1
 .    .
 .    .
 .    .

I would like to calculate the fraction of positive levels of PASP out of the total number of observations, split by SUBJC = 0 and 1. I am new to ggplot and I do not know how to organize the data, or what to use to summarize them, in order to obtain a picture like the one attached. I hope my request is clear. Thanks for any help you can provide. Francesca

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
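A minimal ggplot2 sketch of one way to obtain such a picture, assuming "positive" means PASP > 0 and that the fraction is taken within each SUBJC group (dplyr is used for the summary; the toy data just mirror the shape described above):

library(ggplot2)
library(dplyr)

# toy data in the shape described in the question
fpdf <- data.frame(PASP  = c(0, 4, 0, 8, 4, 0, 0),
                   SUBJC = c(0, 1, 0, 0, 0, 1, 1))

# proportion of PASP > 0 within each SUBJC group
props <- fpdf %>%
  group_by(SUBJC) %>%
  summarise(prop_pos = mean(PASP > 0))

# one bar per group
ggplot(props, aes(x = factor(SUBJC), y = prop_pos)) +
  geom_col() +
  labs(x = "SUBJC", y = "Proportion of PASP > 0")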
Re: [R] GGPlot plot
Thanks for the answer. Il gio 19 lug 2018, 01:04 Jim Lemon ha scritto: > Hi Francesca, > This looks like a fairly simple task. Try this: > > fpdf<-read.table(text="PASP SUBJC > 0 0 > 4 1 > 0 0 > 8 0 > 4 0 > 0 1 > 0 1", > header=TRUE) > # get the number of positive PASP results by group > ppos<-by(fpdf$SUBJC,fpdf$PASPpos,sum) > # get the number of subjects per group > spg<-c(sum(fpdf$SUBJC==0),sum(fpdf$SUBJC==1)) > barplot(ppos/spg,names.arg=c(0,1),xlab="Group", > ylab="Proportion PASP > 0",main="Proportion of PASP positive by group") > > Jim > > On Thu, Jul 19, 2018 at 2:47 AM, Francesca > wrote: > > Dear R help, > > > > I am new to ggplot so I apologize if my question is a bit obvious. > > > > I would like to create a plot where a compare the fraction of the values > of a variable called PASP out of the number of subjects, for two groups of > subject codified with a dummy variable called SUBJC. > > > > The variable PASP is discrete and only takes values 0,4,8.. > > > > My data are as following: > > > > > > > > PASP SUBJC > > > > > > > > 0 0 > > > > 4 1 > > > > 0 0 > > > > 8 0 > > > > 4 0 > > > > 0 1 > > > > 0 1 > > > > . . > > > > . . > > > > . . > > > > > > > > > > I would like to calculate the fraction of positive levels of PASP out of > the total number of observations, divided per values of SUBJ=0 and 1. I am > new to the use of GGPlot and I do not know how to organize the data and > what to use to summarize these data as to obtain a picture as follows: > > > > > > > > > > > > I hope my request is clear. Thanks for any help you can provide. > > > > Francesca > > > > > > > > __ > > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > > and provide commented, minimal, self-contained, reproducible code. > [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
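One caveat about the snippet above: fpdf$PASPpos is never created, so the by() call will not run as posted. A small sketch of what was presumably intended (counting PASP > 0 within each SUBJC group before dividing by the group sizes):

# count of positive PASP values and number of observations in each SUBJC group
ppos <- tapply(fpdf$PASP > 0, fpdf$SUBJC, sum)
spg  <- tapply(fpdf$PASP, fpdf$SUBJC, length)

# proportion of positive PASP by group
barplot(ppos / spg, xlab = "Group", ylab = "Proportion PASP > 0",
        main = "Proportion of PASP positive by group")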
[R] Collecting output of regressions in an intelligent way
Dear R Contributors, I am asking for suggestions on how to organize the output of a series of regressions and tests in an intelligent way. I estimate a series of VAR models with an increasing number of lags and then perform a Wald test to check for Granger causality. I would like to learn a way to do this that does not force me to copy and paste code. This is what I do. First I estimate VAR models with an increasing number of lags:

V.6<-VAR(cbind(index1,ma_fin),p=6,type="both")
V.7<-VAR(cbind(index1,ma_fin),p=7,type="both")
V.8<-VAR(cbind(index1,ma_fin),p=8,type="both")
V.9<-VAR(cbind(index1,ma_fin),p=9,type="both")

then I inspect the results and check the significance of the regressors:

summary(V.6)
summary(V.7)
summary(V.8)
summary(V.9)
summary(V.10)

then I use the estimated VARs to perform the test:

wald_fin7.1<-wald.test(b=coef(V.7$varresult[[1]]), Sigma=vcov(V.7$varresult[[1]]), Terms=c(2,4,6,8,10,12))
wald_fin8.1<-wald.test(b=coef(V.8$varresult[[1]]), Sigma=vcov(V.8$varresult[[1]]), Terms=c(2,4,6,8,10,12,14))
wald_fin9.1<-wald.test(b=coef(V.9$varresult[[1]]), Sigma=vcov(V.9$varresult[[1]]), Terms=c(2,4,6,8,10,12,14,16))
wald_fin10.1<-wald.test(b=coef(V.10$varresult[[1]]), Sigma=vcov(V.10$varresult[[1]]), Terms=c(2,4,6,8,10,12,14,16,18))

# then collect the test results in a table:
wald_fin<-rbind(wald_fin7.1$result$chi2, wald_fin12.1$result$chi2, wald_fin21.1$result$chi2,
                wald_fin7.2$result$chi2, wald_fin12.2$result$chi2, wald_fin21.2$result$chi2)

My idea is that it should be possible to create all these variables with a loop over the object names, but that is a level of coding well above my personal knowledge and ability. I hope someone can help. Thanks in advance
--
Francesca
--
Francesca Pancotto, PhD
Università di Modena e Reggio Emilia
Viale A. Allegri, 9
40121 Reggio Emilia
Office: +39 0522 523264
Web: https://sites.google.com/site/francescapancotto/
--

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
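A sketch of how the repetition might be folded into a loop, assuming index1 and ma_fin are the series from the post, that VAR() comes from the vars package and wald.test() from the aod package (as the posted calls suggest), and that the lag range and the Terms pattern simply mirror the even-numbered coefficient positions used above (p - 1 of them for a model with p lags):

library(vars)  # VAR()
library(aod)   # wald.test()

lags <- 6:10
y <- cbind(index1, ma_fin)   # the two series used in the posted calls

# one VAR per lag order, kept in a named list instead of V.6, V.7, ...
V <- lapply(lags, function(p) VAR(y, p = p, type = "both"))
names(V) <- paste0("V.", lags)

# summaries can then be printed in one go with lapply(V, summary)

# Wald test on the first equation of each fit
wald_list <- lapply(seq_along(lags), function(i) {
  fit <- V[[i]]$varresult[[1]]
  wald.test(b = coef(fit), Sigma = vcov(fit),
            Terms = seq(2, by = 2, length.out = lags[i] - 1))
})

# collect the chi-square results in one table, one row per lag order
wald_tab <- do.call(rbind, lapply(wald_list, function(w) w$result$chi2))
rownames(wald_tab) <- names(V)

Keeping the fits in a named list rather than in numbered objects is what makes the final rbind() a one-liner instead of a hand-written list of object names.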
[R] Compressing code help in a loop
Dear Contributors, I have a problem with a loop. I need to create a variable that takes the values 1, 2, ..., 19, corresponding to the values of a variable in a data.frame, p_int$p_made, which runs from 406 to 211. The problem is that these values sort in the wrong way when I try to compress the loop: the system reads them as 107, 111, 207, 211, 311, 406, 407, 408, 409, 410, 411, while they encode quarter-year pairs and should be ordered 406-107-207-307-407... The only solution I found is really clumsy. It is the following:

p_m<-matrix(0,dim(p_int)[1],1)
for (i in 1:length(p_int$p_made)){
  if (p_int$p_made[i]==406) p_m[i]<-1
  else if (p_int$p_made[i]==107) p_m[i]<-2
  else if (p_int$p_made[i]==207) p_m[i]<-3
  else if (p_int$p_made[i]==307) p_m[i]<-4
  else if (p_int$p_made[i]==407) p_m[i]<-5
  else if (p_int$p_made[i]==108) p_m[i]<-6
  else if (p_int$p_made[i]==208) p_m[i]<-7
  else if (p_int$p_made[i]==308) p_m[i]<-8
  else if (p_int$p_made[i]==408) p_m[i]<-9
  else if (p_int$p_made[i]==109) p_m[i]<-10
  else if (p_int$p_made[i]==209) p_m[i]<-11
  else if (p_int$p_made[i]==309) p_m[i]<-12
  else if (p_int$p_made[i]==409) p_m[i]<-13
  else if (p_int$p_made[i]==110) p_m[i]<-14
  else if (p_int$p_made[i]==210) p_m[i]<-15
  else if (p_int$p_made[i]==310) p_m[i]<-16
  else if (p_int$p_made[i]==410) p_m[i]<-17
  else if (p_int$p_made[i]==111) p_m[i]<-18
  else if (p_int$p_made[i]==211) p_m[i]<-19
}

Can anyone help me find something more efficient? Thanks in advance.
Francesca
--
Francesca
--
Francesca Pancotto
Associate Professor
University of Modena and Reggio Emilia
Viale A. Allegri, 9
40121 Reggio Emilia
Office: +39 0522 523264
Web: https://sites.google.com/site/francescapancotto/
--

______________________________________________
R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
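A more compact alternative, sketched under the assumption that the desired order is exactly the quarter sequence described above: list the codes once in that order and let match() do the lookup.

# quarter codes in chronological order (Q4-2006 through Q2-2011)
quarter_codes <- c(406, 107, 207, 307, 407,
                   108, 208, 308, 408,
                   109, 209, 309, 409,
                   110, 210, 310, 410,
                   111, 211)

# position of each p_made value in that sequence: 406 -> 1, 107 -> 2, ..., 211 -> 19
p_m <- match(p_int$p_made, quarter_codes)

The same vector can also be used as the levels= argument of factor(p_int$p_made, levels = quarter_codes) if an ordered factor is more convenient than an integer code.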
[R] Dates in a data.frame
Dear Contributors, I have a problem concerning the replication of a variable with a date structure. I have the following database, bank.list.m, of 12000 observations:

        name  date
aba.1   ABA   2006-10-24
aba.2   ABA   2006-11-30
aba.3   ABA   2006-10-24
aba.4   ABA   2006-11-30
aba.5   ABA   2006-10-24
aba.6   ABA   2006-11-30
aba.7   ABA   2006-10-24
aba.8   ABA   2006-11-30
aba.9   ABA   2006-10-24
aba.10  ABA   2006-11-30

and the following database, day.spot, with 960 observations:

  date        spot
1 2006-01-02  1.1826
2 2006-01-03  1.1875
3 2006-01-04  1.2083
4 2006-01-05  1.2088
5 2006-01-06  1.2093
6 2006-01-09  1.2078

The dates in the second database are a subset of the dates in the first database. What I need to do is attach the value of the variable spot, as reported in the second database, at the exact position of the corresponding date in the first database. I tried the following:

dates<-table(bank.list.m$date)
test<-as.data.frame(dates)
dates.v<-as.Date(test$Var1)
x<-as.data.frame(dates.v)
x$index<-c(1:960)
x$spot.v<-day.spot$spot[x$index]

but I do not seem to get anywhere; I think I only replicated the values of the day.spot variable. Any help? Thanks for your time and patience!
Francesca
--
Francesca
------
Francesca Pancotto, PhD
Università di Modena e Reggio Emilia
Viale A. Allegri, 9
40121 Reggio Emilia
Office: +39 0522 523264
Web: https://sites.google.com/site/francescapancotto/
--

______________________________________________
R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
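One way to line the two tables up is to treat this as a lookup by date rather than by position; a minimal sketch, assuming both date columns are (or can be coerced to) Date objects with the column names shown above:

# make sure both date columns are Date objects
bank.list.m$date <- as.Date(bank.list.m$date)
day.spot$date    <- as.Date(day.spot$date)

# for every row of bank.list.m, look up the spot value with the same date
bank.list.m$spot <- day.spot$spot[match(bank.list.m$date, day.spot$date)]

# merge() gives an equivalent result (all.x = TRUE keeps unmatched rows as NA):
# merged <- merge(bank.list.m, day.spot[, c("date", "spot")],
#                 by = "date", all.x = TRUE)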
[R] Two geom_bar with counts to put in the same plot
Dear Contributors, I would like to ask help on how to create a plot that is the overlapping of two other plots. It is a geom_bar structure, where I want to count the occurrences of two variables, participation1 and participation2 that I recoded as factors as ParticipationNOPUN and ParticipationPUN to have nice names in the legend. The variables to "count" in the two plots are delta11_L and delta2_L These are my data and code to create the two plots. I would like to put them in the same plot as superimposed areas so that I see the change in the distribution of counts in the two cases. This is DB: participation1 participation2 ParticipantsNOPUN ParticipantsPUN delta11_L delta2_L [1,] 1 1 2 2 00 [2,] 1 1 2 2 -10 -10 [3,] 1 1 2 2 -100 [4,] 1 1 2 2 00 [5,] 1 1 2 2 00 [6,] 1 1 2 2 00 [7,] 1 0 2 1 -30 30 [8,] 1 1 2 2 0 10 [9,] 1 0 2 1 10 40 [10,] 1 1 2 2 00 [11,] 0 0 1 1 200 [12,] 1 1 2 2 100 [13,] 1 1 2 2 00 [14,] 1 1 2 2 00 [15,] 1 1 2 2 20 10 [16,] 1 1 2 2 00 [17,] 1 1 2 2 00 [18,] 1 1 2 2 -10 30 [19,] 0 0 1 1 30 10 [20,] 1 1 2 2 10 10 [21,] 1 1 2 2 00 [22,] 1 1 2 2 00 [23,] 1 1 2 2 0 -10 [24,] 1 1 2 2 0 -20 [25,] 1 1 2 2 10 -10 [26,] 1 1 2 2 00 [27,] 1 1 2 2 00 First PLOT(I need to subset the data to eliminate some NA. NB: the two dataframes end up not having the same number of rows for this reason): ggplot(data=subset(DB, !is.na(participation1)), aes(x = delta11_L, fill =ParticipantsNOPUN))+ geom_bar(position = "dodge")+ theme_bw(base_size = 12) + labs(x="Delta Contributions (PGG w/out punishment)")+ theme(legend.position = "top",legend.title = element_blank()) +scale_fill_brewer(palette="Set1") Second PLOT: ggplot(DB, aes(x = delta2_L, fill =ParticipantsPUN) , aes(x = delta2_L, fill =ParticipantsPUN))+ geom_bar(position = "dodge")+ theme_bw(base_size = 12) + labs(x="Delta Contributions (PGG w/punishment)")+ theme(legend.position = "top",legend.title = element_blank()) +scale_fill_brewer(palette="Set1") is it possible to create a density plot of the two counts data on the same plot? Do I need to create a variable count or long data format? Thanks [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
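One way to get both count distributions into a single plot is to reshape to long format first, so that "which delta" becomes a variable that can be mapped to fill. A sketch, assuming DB is (or is converted to) a data frame with the column names posted above, and collapsing over the participation split so that only the two delta distributions are compared (tidyr and dplyr assumed for the reshape):

library(ggplot2)
library(tidyr)
library(dplyr)

# stack delta11_L and delta2_L into one value column plus an indicator
DB_long <- as.data.frame(DB) %>%
  pivot_longer(cols = c(delta11_L, delta2_L),
               names_to = "which_delta", values_to = "delta") %>%
  filter(!is.na(delta))

# dodged bars of the two count distributions on a common x axis
ggplot(DB_long, aes(x = factor(delta), fill = which_delta)) +
  geom_bar(position = "dodge") +
  theme_bw(base_size = 12) +
  labs(x = "Delta Contributions") +
  theme(legend.position = "top", legend.title = element_blank()) +
  scale_fill_brewer(palette = "Set1")

Replacing geom_bar() with geom_density(alpha = 0.4) on the same long data gives the superimposed-area version asked about at the end of the question.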
Re: [R] Two geom_bar with counts to put in the same plot
Hi here it is;. THANKS! dput(DATASET) structure(c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 1, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 2, 2, 2, 2, 2, 2, 1, 2, 1, 2, 1, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 1, 1, 2, 2, 1, 2, 1, 2, 2, 1, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 1, 2, 1, 2, 2, 1, 2, 2, 1, 2, 2, 1, 1, 2, 2, 2, 2, 1, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 1, 1, 2, 1, 1, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 1, 1, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 0, -10, -10, 0, 0, 0, -30, 0, 10, 0, 20, 10, 0, 0, 20, 0, 0, -10, 30, 10, 0, 0, 0, 0, 10, 0, 0, 0, 0, 0, 0, 0, 0, -10, 0, 0, 0, 40, -10, 0, 10, 0, 10, 0, -20, 0, 0, 0, 10, -20, 10, -10, 40, -10, -10, 10, 20, 10, 0, 0, 0, 0, 0, 0, -10, 0, 0, 20, 0, 0, 0, 0, 10, 0, 0, 0, 10, 0, -10, 10, 0, 0, 10, 10, 10, 0, 0, 0, 0, 0, -10, 0, 0, 0, 20, 0, 0, 20, 0, 0, 0, 0, 0, 0, 0, 10, 0, 10, 0, 0, 0, 20, -20, 0, 0, -10, 0, 0, 0, 0, -10, 10, 0, 20, 0, 0, 0, 0, 0, -10, 0, 0, 0, 0, 0, -10, 0, 0, -10, 0, -10, 30, -10, 0, 0, 10, -10, 0, -10, -10, 0, 10, 0, 0, 0, 0, 
0, 0, 0, 0, 10, 0, 0, 10, 10, 0, 0, -20, -10, 0, 0, 0, 0, 0, 0, 10, 30, 40, 30, 30, 30, 30, 20, 20, 40, 20, 20, 10, 20, 30, 20, 40, 20, 30, 20, 30, 20, 20, 30, 20, 40, 10, 20, 10, 30, 30, 30, 30, 10, 30, 30, 20, 10, 40, 30, 40, 40, 30, 20, 10, 10, 20, 20, 30, 40, 40, 40, 40, 0, 20, 20, 40, 10, 20, 20, 10, 0, -10, 0, 0, 0, 0, 30, 10, 40, 0, 0, 0, 0, 0, 10, 0, 0, 30, 10, 10, 0, 0, -10, -20, -10, 0, 0, 0, -10, 10, 0, 40, 0, 30, 0, 10, 0, 40, 0, 0, -10, 0, 10, 40, -10, 0, 0, 0, 10, 0, 10, -10, 40, 10, 20, 10, 40, 0, 10, -10, 0, 40, 0, 0, -10, 0, 0, 20, -10, 0, 10, 0, 30, -10, 0, 0, 0, -10, 40, 10, 10, 0, 10, -10, 0, 10, 0, 10, 0, -20, 20, 0, 0, -20, 20, 0, -30, 20, 0, 0, 20, 10, 0, 20, 30, 0, 0, -10, 10, 10, 0, -10, 40, 10, 0, 10, 0, 0, 20, 10, 20, 30, 0, 40, 30, 0, 20, 40, -10, 0, 0, 0, -10, 0, 20, -10, 0, 0, 10, 0, 0, 20, -20, -20, 0, 20, 0, 0, 10, 0, -10, -10, 20, -10, 0, 0, 0, 0, 0, 0, 0, -10, 30, 10, 0, 0, 10, 20, 10, -10, 10, 0, 0, -10, 30, -20, 10, 0, 0, 0, 10, 10, 10, 10, -10, 0, 20, 10, 10, 10, 0, -10, -10, 0, 0, 10, 20, 0, -10, 10, 0, 10, 20, 10, 0, 0, 0, 0, 10, 10, 10, 30, 10, 0, 0, -10, 40, 0, 0, 10, 10, 40, 30, -10, 0, 0, 10, 20, 0, 0, 10, 40, 0, 0, -10, -20), .Dim = c(236L, 6L ), .Dimnames = list(NULL, c("participation1", "participation2", "ParticipantsNOPUN", "ParticipantsPUN", "delta
Re: [R] Two geom_bar with counts to put in the same plot
Hi! It is not exactly what I wanted but more than I suspected I could get. Thanks a lot, this is awesome! Francesca On Wed, 4 Dec 2019 at 14:04, Rui Barradas wrote: > Hello, > > Please keep R-Help in the thread. > > As for the question, the following divides by facets, participation1/2 > with values 0/1. See if that's what you want. > > > idv <- grep("part", names(DB)[-(3:4)], ignore.case = TRUE, value = TRUE) > dblong <- reshape2::melt(DB[-(3:4)], id.vars = idv) > dblong <- reshape2::melt(dblong, id.vars = c("variable", "value")) > names(dblong) <- c("deltaVar", "delta", "participationVar", > "participation") > dblong <- dblong[complete.cases(dblong),] > > ggplot(dblong, aes(x = delta, fill = deltaVar)) + >geom_density(aes(alpha = 0.2)) + >scale_alpha_continuous(guide = "none") + >facet_wrap(participationVar ~ participation) > > > Hope this helps, > > Rui Barradas > > Às 08:25 de 04/12/19, Francesca escreveu: > > Dear Rui > > the code works and the final picture is aesthetical as I wanted(very > > beautiful indeed), but I probably did not explain that the two > > distributions that I want to overlap, must be different by participation > > 1 and participation 2, which are to dummy variables that identify : > > Participation 1(equivalent to PARTICIPATIONNOPUN): 1 participants, 0 non > > participants, for the variable delta11_L > > Participation 2(equivalent to PARTICIPATIONPUN): 1 participants, 0 non > > participants, for the variable delta2_L > > > > The density plots are four in the end rather than 2: I compare delta11_L > > for Participants1 vsnon participants and delta2_L for Participants 2 vs > > non Participants 2, > > I basically want to verify whether the population of Participants vs Non > > participants, change going from delta11_L to delta2_L > > > > > > Sorry for being unclear. > > Thanks for any help. > > Francesca > > > > On Wed, 4 Dec 2019 at 09:16, Rui Barradas > <mailto:ruipbarra...@sapo.pt>> wrote: > > > > Hello, > > > > Is it as simple as this? The code below does not separate the > > participant1 and participant2, only the 'delta' variables. > > > > > > idv <- grep("part", names(DB)[-(3:4)], ignore.case = TRUE, value = > TRUE) > > dblong <- reshape2::melt(DB[-(3:4)], id.vars = idv) > > head(dblong) > > > > ggplot(dblong, aes(x = value, fill = variable)) + > > geom_density(aes(alpha = 0.2)) + > > scale_alpha_continuous(guide = "none") > > > > > > I will also repost the data, since you have posted a matrix and this > > code needs a data.frame. 
> > > > > > DB <- > > structure(list(participation1 = c(1, 1, 1, 1, 1, 1, 1, 1, 1, > > 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, > > 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, > > 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, > > 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, > > 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, > > 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, > > 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, > > 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, NA, > > NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, > > NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, > > NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, > > NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), participation2 = c(1, > > 1, 1, 1, 1, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, > > 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, > > 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1, > > 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, > > 1, 1, 1, 1, 1, 0, 1, 0, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 0, 0, 1, > > 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 0, 1, 1, > > 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, > > 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, > > 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, > > 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, > > 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, > > 1, 1, 1, 1), ParticipantsNOPUN = structure(
Re: [R] Two geom_bar with counts to put in the same plot
Hi, sorry for bothering again. I was wondering how I can reshape the data, if in your code, I would like to have only two panels, where in the panel with Participation =0, I represent delta11_L of participation1==0 and delta2_L of participation2==0, and in the right panel, I want Participation=1, but representing together delta11_L of participation1==1, and delta2_L of participation2==1. I get messed up with the joint melting of participation, which determines the facet, but then I cannot assign the proper fill to the density plots which depend on it, and on the other hand I would like to have in the same plot with mixed participation. I hope it is clear. Nonetheless, the previous plot is useful to understand something I had not thought about. Thanks again for your time. F. -- > Il giorno 4 dic 2019, alle ore 15:27, Francesca > ha scritto: > > Hi! > It is not exactly what I wanted but more than I suspected I could get. Thanks > a lot, this is awesome! > Francesca > > On Wed, 4 Dec 2019 at 14:04, Rui Barradas <mailto:ruipbarra...@sapo.pt>> wrote: > Hello, > > Please keep R-Help in the thread. > > As for the question, the following divides by facets, participation1/2 > with values 0/1. See if that's what you want. > > > idv <- grep("part", names(DB)[-(3:4)], ignore.case = TRUE, value = TRUE) > dblong <- reshape2::melt(DB[-(3:4)], id.vars = idv) > dblong <- reshape2::melt(dblong, id.vars = c("variable", "value")) > names(dblong) <- c("deltaVar", "delta", "participationVar", "participation") > dblong <- dblong[complete.cases(dblong),] > > ggplot(dblong, aes(x = delta, fill = deltaVar)) + >geom_density(aes(alpha = 0.2)) + >scale_alpha_continuous(guide = "none") + >facet_wrap(participationVar ~ participation) > > > Hope this helps, > > Rui Barradas > > Às 08:25 de 04/12/19, Francesca escreveu: > > Dear Rui > > the code works and the final picture is aesthetical as I wanted(very > > beautiful indeed), but I probably did not explain that the two > > distributions that I want to overlap, must be different by participation > > 1 and participation 2, which are to dummy variables that identify : > > Participation 1(equivalent to PARTICIPATIONNOPUN): 1 participants, 0 non > > participants, for the variable delta11_L > > Participation 2(equivalent to PARTICIPATIONPUN): 1 participants, 0 non > > participants, for the variable delta2_L > > > > The density plots are four in the end rather than 2: I compare delta11_L > > for Participants1 vsnon participants and delta2_L for Participants 2 vs > > non Participants 2, > > I basically want to verify whether the population of Participants vs Non > > participants, change going from delta11_L to delta2_L > > > > > > Sorry for being unclear. > > Thanks for any help. > > Francesca > > > > On Wed, 4 Dec 2019 at 09:16, Rui Barradas > <mailto:ruipbarra...@sapo.pt> > > <mailto:ruipbarra...@sapo.pt <mailto:ruipbarra...@sapo.pt>>> wrote: > > > > Hello, > > > > Is it as simple as this? The code below does not separate the > > participant1 and participant2, only the 'delta' variables. > > > > > > idv <- grep("part", names(DB)[-(3:4)], ignore.case = TRUE, value = TRUE) > > dblong <- reshape2::melt(DB[-(3:4)], id.vars = idv) > > head(dblong) > > > > ggplot(dblong, aes(x = value, fill = variable)) + > > geom_density(aes(alpha = 0.2)) + > > scale_alpha_continuous(guide = "none") > > > > > > I will also repost the data, since you have posted a matrix and this > > code needs a data.frame. 
> > > > > > DB <- > > structure(list(participation1 = c(1, 1, 1, 1, 1, 1, 1, 1, 1, > > 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, > > 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, > > 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, > > 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, > > 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, > > 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, > > 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, > > 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, NA, > > NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, > > NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N
Re: [R] Two geom_bar with counts to put in the same plot
Exactly. I was trying to remelt data in the right way, but I could not get there yet. Can you suggest me this code? Thanks a lot F. -- > Il giorno 5 dic 2019, alle ore 11:11, Jim Lemon ha > scritto: > > Hi Francesca, > Do you want something like this? > > Jim > > On Thu, Dec 5, 2019 at 6:58 PM Francesca wrote: >> >> Hi, sorry for bothering again. >> I was wondering how I can reshape the data, if in your code, >> I would like to have only two panels, where in the panel with Participation >> =0, I represent delta11_L of participation1==0 >> and delta2_L of participation2==0, and in the right panel, I want >> Participation=1, but representing together >> delta11_L of participation1==1, and delta2_L of participation2==1. >> >> I get messed up with the joint melting of participation, which determines >> the facet, but then I cannot assign the proper fill to the density plots >> which depend on it, and on the other hand I would like to have in the same >> plot with mixed participation. >> >> I hope it is clear. >> Nonetheless, the previous plot is useful to understand something I had not >> thought about. >> Thanks again for your time. >> F. >> -- >> > [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Two geom_bar with counts to put in the same plot
This is a consolation, because I cannot get it in ggplot either! Thanks for the code! F. > Il giorno 5 dic 2019, alle ore 11:17, Jim Lemon ha > scritto: > > Sorry it's not ggplot, I couldn't work that one out. > > # using the data frame structure that Rui kindly added > # and perhaps Rui can work out how to do this in ggplot > DBcomplete<-DB[complete.cases(DB),] > library(plotrix) > png("fp.png") > par(mfrow=c(1,2)) > density11_0<-density(DBcomplete$delta11_L[DBcomplete$participation1==0]) > density2_0<-density(DBcomplete$delta2_L[DBcomplete$participation1==0]) > plot(0,xlim=c(-30,50),ylim=c(0,max(density11_0$y)),type="n", > xlab="delta",ylab="density",main="participation == 0") > plot_bg("lightgray") > grid(col="white") > polygon(density11_0,col="#ff773344") > polygon(density2_0,col="#3377ff44") > density11_1<-density(DBcomplete$delta11_L[DBcomplete$participation1==1]) > density2_1<-density(DBcomplete$delta2_L[DBcomplete$participation1==1]) > plot(0,xlim=c(-30,50),ylim=c(0,max(density11_1$y)),type="n", > xlab="delta",ylab="density",main="participation == 1") > plot_bg("lightgray") > grid(col="white") > polygon(density11_1,col="#ff773344") > polygon(density2_1,col="#3377ff44") > par(cex=0.9) > legend(5,0.11,c("delta11_L","delta2_L"),fill=c("#ff773344","#3377ff44")) > dev.off() > > Jim > > On Thu, Dec 5, 2019 at 9:14 PM Francesca wrote: >> >> Exactly. I was trying to remelt data in the right way, but I could not get >> there yet. Can you suggest me this code? >> Thanks a lot >> F. >> >> -- >> >> Il giorno 5 dic 2019, alle ore 11:11, Jim Lemon ha >> scritto: >> >> Hi Francesca, >> Do you want something like this? >> >> Jim >> >> On Thu, Dec 5, 2019 at 6:58 PM Francesca >> wrote: >> >> >> Hi, sorry for bothering again. >> I was wondering how I can reshape the data, if in your code, >> I would like to have only two panels, where in the panel with Participation >> =0, I represent delta11_L of participation1==0 >> and delta2_L of participation2==0, and in the right panel, I want >> Participation=1, but representing together >> delta11_L of participation1==1, and delta2_L of participation2==1. >> >> I get messed up with the joint melting of participation, which determines >> the facet, but then I cannot assign the proper fill to the density plots >> which depend on it, and on the other hand I would like to have in the same >> plot with mixed participation. >> >> I hope it is clear. >> Nonetheless, the previous plot is useful to understand something I had not >> thought about. >> Thanks again for your time. >> F. >> -- >> >> >> >> [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
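For the record, roughly the same two-panel picture can also be drawn in ggplot2 by reshaping first; a sketch, assuming the DB data frame Rui posted, complete cases only, and splitting panels on participation1 as in Jim's base-graphics code:

library(ggplot2)
library(tidyr)

DBcc <- DB[complete.cases(DB), ]   # Rui's data frame, complete rows only

# one row per (observation, delta variable) pair
dens_dat <- pivot_longer(DBcc, cols = c(delta11_L, delta2_L),
                         names_to = "deltaVar", values_to = "delta")

ggplot(dens_dat, aes(x = delta, fill = deltaVar)) +
  geom_density(alpha = 0.4) +
  facet_wrap(~ participation1, labeller = label_both) +
  theme_bw()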
[R] "And" condition spanning over multiple columns in data frame
Dear contributors, I need to create a set of new columns based on conditions on a dataframe, as follows. I have managed to do the trick for one column, but I cannot find a good example where the condition is extended to the whole dataframe. I have this dataframe, called c10Dt:

  id cp1 cp2 cp3 cp4 cp5 cp6 cp7 cp8 cp9 cp10 cp11 cp12
1  1  NA  NA  NA  NA  NA  NA  NA  NA  NA   NA   NA   NA
2  4   8  18  15  10  12  11   9  18   8   16   15   NA
3  3   8   5   5   4  NA   5  NA   6  NA   10   10   10
4  3   5   5   4   4   3   2   1   3   2    1    1    2
5  1  NA  NA  NA  NA  NA  NA  NA  NA  NA   NA   NA   NA
6  2   5   5  10  10   9  10  10  10  NA   10    9   10

The columns are id, cp1, cp2, and so on. What I need to do is the following, shown here for just one column:

c10Dt <- mutate(c10Dt, exit1= ifelse(is.na(cp1) & id!=1, 1, 0))

So I create a new variable, called exit1, in which the program takes cp1, checks whether it is NA, and if it is NA and at the same time the value of the column id is not 1, it returns 1, otherwise 0. In other words, I want to flag the cases in which id = 2, 3 or 4 has an NA in the corresponding cell of the matrix. I managed to do it manually, column by column, but I feel there should be something smarter. The problem is that I need to replicate this over all the columns from cp2 to cp12, while keeping the id column fixed. I have tried

c10Dt %>% mutate(x=across(starts_with("cp"), ~ifelse(. == NA)) & id!=1,1,0 )

but the problem with across is that it applies the condition only to the cp_ columns. How do I tell R to use the column id together with all the other columns? Thanks for any help provided.
Francesca
--

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide https://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
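For what it is worth, the function passed to across() is evaluated inside the data frame, so it can refer to id directly; also note that a comparison like `. == NA` always returns NA, so the NA test needs is.na(). A sketch (the exit_ prefix for the new column names is just an illustrative choice):

library(dplyr)

c10Dt <- c10Dt %>%
  mutate(across(starts_with("cp"),
                ~ ifelse(is.na(.x) & id != 1, 1, 0),
                .names = "exit_{.col}"))

This creates exit_cp1, ..., exit_cp12 in one call, each computed from its cp column together with the fixed id column.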
[R] (no subject)
Dear Contributors, I hope someone has run into a similar issue. I have this data set:

   cp1 cp2 role groupid
1   10  13    4       5
2    5  10    3       1
3    7   7    4       6
4   10   4    2       7
5    5   8    3       2
6    8   7    4       4
7    8   8    4       7
8   10  15    3       3
9   15  10    2       2
10   5   5    2       4
11  20  20    2       5
12   9  11    3       6
13  10  13    4       3
14  12   6    4       2
15   7   4    4       1
16  10   0    3       7
17  20  15    3       8
18  10   7    3       4
19   8  13    3       5
20  10   9    2       6

I need to take averages by group, using the values of the column groupid, and create a twin dataset in which the individual values are replaced by the group mean. So, for example, for groupid 3 I calculate the mean, (12+18)/2, and then in the new dataframe I replace the values 12 and 18, in the same positions, with that mean. I found this solution, where db10_means is the output dataset and db10 is my initial data:

db10_means<-db10 %>%
  group_by(groupid) %>%
  mutate(across(starts_with("cp"), list(mean = mean)))

It works, except for NA values: it assigns NA to all group members whenever the group contains an NA, while in some cases the group is made of some NAs and some values. So, when a group has two values and one NA, I would like the rows with a value to get the mean of those two, and the row with NA to keep NA. Here mean() is called without the na.rm=TRUE option, but it is not obvious to me how to pass it in this construct, and I am not even sure that would be enough to solve my problem. Thanks for any help provided.
--
Francesca
--

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide https://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
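One possible way, sketched with dplyr as in the posted code: compute the group mean with na.rm = TRUE, but only write it into the rows that were not NA to begin with (the _mean suffix mirrors the list(mean = mean) naming above):

library(dplyr)

db10_means <- db10 %>%
  group_by(groupid) %>%
  mutate(across(starts_with("cp"),
                ~ ifelse(is.na(.x), NA, mean(.x, na.rm = TRUE)),
                .names = "{.col}_mean")) %>%
  ungroup()

If an entire group is NA, every row of that group stays NA, which matches the "all NA is NA" requirement stated later in this thread.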
Re: [R] (no subject)
Sorry for posting a non understandable code. In my screen the dataset looked correctly. I recreated my dataset, folllowing your example: test<-data.frame(matrix(c( 8, 8, 5 , 5 ,NA ,NA , 1, 15, 20, 5, NA, 17, 2 , 5 , 5, 2 , 5 ,NA, 5 ,10, 10, 5 ,12, NA), c( 18, 5, 5, 5, NA, 9, 2, 2, 10, 7 , 5, 19, NA, 10, NA, 4, NA, 8, NA, 5, 10, 3, 17, NA), c( 4, 3, 3, 2, 2, 4, 3, 3, 2, 4, 4 ,3, 4, 4, 4, 2, 2, 3, 2, 3, 3, 2, 2 ,4), c(3, 8, 1, 2, 4, 2, 7, 6, 3, 5, 1, 3, 8, 4, 7, 5, 8, 5, 1, 2, 4, 7, 6, 6))) colnames(test)<-c("cp1","cp2","role","groupid") What I have done so far is the following, that works: test %>% group_by(groupid) %>% mutate(across(starts_with("cp"), list(mean = mean))) But the problem is with NA: everytime the mean encounters a NA, it creates NA for all group members. I need the software to calculate the mean ignoring NA. So when the group is made of three people, mean of the three. If the group is two values and an NA, calculate the mean of two. My code works , creates a mean at each position for three subjects, replacing instead of the value of the single, the group mean. But when NA appears, all the group gets NA. Perhaps there is a different way to obtain the same result. On Mon, 16 Sept 2024 at 11:35, Rui Barradas wrote: > Às 08:28 de 16/09/2024, Francesca escreveu: > > Dear Contributors, > > I hope someone has found a similar issue. > > > > I have this data set, > > > > > > > > cp1 > > cp2 > > role > > groupid > > 1 > > 10 > > 13 > > 4 > > 5 > > 2 > > 5 > > 10 > > 3 > > 1 > > 3 > > 7 > > 7 > > 4 > > 6 > > 4 > > 10 > > 4 > > 2 > > 7 > > 5 > > 5 > > 8 > > 3 > > 2 > > 6 > > 8 > > 7 > > 4 > > 4 > > 7 > > 8 > > 8 > > 4 > > 7 > > 8 > > 10 > > 15 > > 3 > > 3 > > 9 > > 15 > > 10 > > 2 > > 2 > > 10 > > 5 > > 5 > > 2 > > 4 > > 11 > > 20 > > 20 > > 2 > > 5 > > 12 > > 9 > > 11 > > 3 > > 6 > > 13 > > 10 > > 13 > > 4 > > 3 > > 14 > > 12 > > 6 > > 4 > > 2 > > 15 > > 7 > > 4 > > 4 > > 1 > > 16 > > 10 > > 0 > > 3 > > 7 > > 17 > > 20 > > 15 > > 3 > > 8 > > 18 > > 10 > > 7 > > 3 > > 4 > > 19 > > 8 > > 13 > > 3 > > 5 > > 20 > > 10 > > 9 > > 2 > > 6 > > > > > > > > I need to to average of groups, using the values of column groupid, and > > create a twin dataset in which the mean of the group is replaced instead > of > > individual values. > > So for example, groupid 3, I calculate the mean (12+18)/2 and then I > > replace in the new dataframe, but in the same positions, instead of 12 > and > > 18, the values of the corresponding mean. > > I found this solution, where db10_means is the output dataset, db10 is my > > initial data. > > > > db10_means<-db10 %>% > >group_by(groupid) %>% > >mutate(across(starts_with("cp"), list(mean = mean))) > > > > It works perfectly, except that for NA values, where it replaces to all > > group members the NA, while in some cases, the group is made of some NA > and > > some values. > > So, when I have a group of two values and one NA, I would like that for > > those with a value, the mean is replaced, for those with NA, the NA is > > replaced. > > Here the mean function has not the na.rm=T option associated, but it > > appears that this solution cannot be implemented in this case. I am not > > even sure that this would be enough to solve my problem. > > Thanks for any help provided. > > > Hello, > > Your data is a mess, please don't post html, this is plain text only > list. 
Anyway, I managed to create a data frame by copying the data to a > file named "rhelp.txt" and then running > > > > db10 <- scan(file = "rhelp.txt", what = character()) > header <- db10[1:4] > db10 <- db10[-(1:4)] |> as.numeric() > db10 <- matrix(db10, ncol = 4L, byrow = TRUE) |> >as.data.frame() |> >setNames(header) > > str(db10) > #> 'data.frame':25 obs. of 4 variables: > #> $ cp1: num 1 5 3 7 10 5 2 4 8 10 ... > #> $ cp2: num 10 2 1 4
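As an aside, the test object posted above is probably not built the way it looks: matrix() takes nrow, ncol and byrow as its second, third and fourth arguments, so the four c() vectors are not bound together as columns. A sketch of the same example data assembled explicitly as a data frame (values copied from the post):

test <- data.frame(
  cp1     = c(8, 8, 5, 5, NA, NA, 1, 15, 20, 5, NA, 17,
              2, 5, 5, 2, 5, NA, 5, 10, 10, 5, 12, NA),
  cp2     = c(18, 5, 5, 5, NA, 9, 2, 2, 10, 7, 5, 19,
              NA, 10, NA, 4, NA, 8, NA, 5, 10, 3, 17, NA),
  role    = c(4, 3, 3, 2, 2, 4, 3, 3, 2, 4, 4, 3,
              4, 4, 4, 2, 2, 3, 2, 3, 3, 2, 2, 4),
  groupid = c(3, 8, 1, 2, 4, 2, 7, 6, 3, 5, 1, 3,
              8, 4, 7, 5, 8, 5, 1, 2, 4, 7, 6, 6)
)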
Re: [R] (no subject)
All' Na Is Na. Il lun 16 set 2024, 16:29 Bert Gunter ha scritto: > See the na.rm argument of ?mean > > But what happens if all values are NA? > > -- Bert > > > On Mon, Sep 16, 2024 at 7:24 AM Francesca > wrote: > > > > Sorry for posting a non understandable code. In my screen the dataset > > looked correctly. > > > > > > I recreated my dataset, folllowing your example: > > > > test<-data.frame(matrix(c( 8, 8, 5 , 5 ,NA ,NA , 1, 15, 20, 5, NA, 17, > > 2 , 5 , 5, 2 , 5 ,NA, 5 ,10, 10, 5 ,12, NA), > > c( 18, 5, 5, 5, NA, 9, 2, 2, 10, 7 , 5, > 19, > > NA, 10, NA, 4, NA, 8, NA, 5, 10, 3, 17, NA), > > c( 4, 3, 3, 2, 2, 4, 3, 3, 2, 4, 4 ,3, 4, 4, 4, > 2, > > 2, 3, 2, 3, 3, 2, 2 ,4), > > c(3, 8, 1, 2, 4, 2, 7, 6, 3, 5, 1, 3, 8, 4, 7, 5, > > 8, 5, 1, 2, 4, 7, 6, 6))) > > colnames(test)<-c("cp1","cp2","role","groupid") > > > > What I have done so far is the following, that works: > > test %>% > > group_by(groupid) %>% > > mutate(across(starts_with("cp"), list(mean = mean))) > > > > But the problem is with NA: everytime the mean encounters a NA, it > creates > > NA for all group members. > > I need the software to calculate the mean ignoring NA. So when the group > is > > made of three people, mean of the three. > > If the group is two values and an NA, calculate the mean of two. > > > > My code works , creates a mean at each position for three subjects, > > replacing instead of the value of the single, the group mean. > > But when NA appears, all the group gets NA. > > > > Perhaps there is a different way to obtain the same result. > > > > > > > > On Mon, 16 Sept 2024 at 11:35, Rui Barradas > wrote: > > > > > Às 08:28 de 16/09/2024, Francesca escreveu: > > > > Dear Contributors, > > > > I hope someone has found a similar issue. > > > > > > > > I have this data set, > > > > > > > > > > > > > > > > cp1 > > > > cp2 > > > > role > > > > groupid > > > > 1 > > > > 10 > > > > 13 > > > > 4 > > > > 5 > > > > 2 > > > > 5 > > > > 10 > > > > 3 > > > > 1 > > > > 3 > > > > 7 > > > > 7 > > > > 4 > > > > 6 > > > > 4 > > > > 10 > > > > 4 > > > > 2 > > > > 7 > > > > 5 > > > > 5 > > > > 8 > > > > 3 > > > > 2 > > > > 6 > > > > 8 > > > > 7 > > > > 4 > > > > 4 > > > > 7 > > > > 8 > > > > 8 > > > > 4 > > > > 7 > > > > 8 > > > > 10 > > > > 15 > > > > 3 > > > > 3 > > > > 9 > > > > 15 > > > > 10 > > > > 2 > > > > 2 > > > > 10 > > > > 5 > > > > 5 > > > > 2 > > > > 4 > > > > 11 > > > > 20 > > > > 20 > > > > 2 > > > > 5 > > > > 12 > > > > 9 > > > > 11 > > > > 3 > > > > 6 > > > > 13 > > > > 10 > > > > 13 > > > > 4 > > > > 3 > > > > 14 > > > > 12 > > > > 6 > > > > 4 > > > > 2 > > > > 15 > > > > 7 > > > > 4 > > > > 4 > > > > 1 > > > > 16 > > > > 10 > > > > 0 > > > > 3 > > > > 7 > > > > 17 > > > > 20 > > > > 15 > > > > 3 > > > > 8 > > > > 18 > > > > 10 > > > > 7 > > > > 3 > > > > 4 > > > > 19 > > > > 8 > > > > 13 > > > > 3 > > > > 5 > > > > 20 > > > > 10 > > > > 9 > > > > 2 > > > > 6 > > > > > > > > > > > > > > > > I need to to average of groups, using the values of column groupid, > and > > > > create a twin dataset in which the mean of the group is replaced > instead > > > of > > > > individual
Re: [R] (no subject)
Sorry, my typing was corrected by the computer. When I have a NA, there should be a missing value. So, if a group has 2 values and a NA, the two that have values, should be replaced by the mean of the two, the third should be NA. The NA is the participant that dropped out. On Tue, 17 Sept 2024 at 02:27, Bert Gunter wrote: > Hmmm... typos and thinkos ? > > Maybe: > mean_narm<- function(x) { >m <- mean(x, na.rm = T) >if (is.nan (m)) NA else m > } > > -- Bert > > On Mon, Sep 16, 2024 at 4:40 PM CALUM POLWART wrote: > > > > Rui's solution is good. > > > > Bert's suggestion is also good! > > > > For Berts suggestion you'd make the list bit > > > > list(mean = mean_narm) > > > > But prior to that define a function: > > > > mean_narm<- function(x) { > > > > m <- mean(x, na.rm = T) > > > > if (!is.Nan (m)) { > > m <- NA > > } > > > > return (m) > > } > > > > Would do what you suggested in your reply to Bert. > > > > On Mon, 16 Sep 2024, 19:48 Rui Barradas, wrote: > > > > > Às 15:23 de 16/09/2024, Francesca escreveu: > > > > Sorry for posting a non understandable code. In my screen the dataset > > > > looked correctly. > > > > > > > > > > > > I recreated my dataset, folllowing your example: > > > > > > > > test<-data.frame(matrix(c( 8, 8, 5 , 5 ,NA ,NA , 1, 15, 20, 5, > NA, 17, > > > > 2 , 5 , 5, 2 , 5 ,NA, 5 ,10, 10, 5 ,12, NA), > > > > c( 18, 5, 5, 5, NA, 9, 2, 2, 10, 7 , > 5, > > > 19, > > > > NA, 10, NA, 4, NA, 8, NA, 5, 10, 3, 17, NA), > > > > c( 4, 3, 3, 2, 2, 4, 3, 3, 2, 4, 4 ,3, 4, > 4, 4, > > > 2, > > > > 2, 3, 2, 3, 3, 2, 2 ,4), > > > > c(3, 8, 1, 2, 4, 2, 7, 6, 3, 5, 1, 3, 8, 4, > 7, > > > 5, > > > > 8, 5, 1, 2, 4, 7, 6, 6))) > > > > colnames(test)<-c("cp1","cp2","role","groupid") > > > > > > > > What I have done so far is the following, that works: > > > > test %>% > > > >group_by(groupid) %>% > > > >mutate(across(starts_with("cp"), list(mean = mean))) > > > > > > > > But the problem is with NA: everytime the mean encounters a NA, it > > > creates > > > > NA for all group members. > > > > I need the software to calculate the mean ignoring NA. So when the > group > > > is > > > > made of three people, mean of the three. > > > > If the group is two values and an NA, calculate the mean of two. > > > > > > > > My code works , creates a mean at each position for three subjects, > > > > replacing instead of the value of the single, the group mean. > > > > But when NA appears, all the group gets NA. > > > > > > > > Perhaps there is a different way to obtain the same result. > > > > > > > > > > > > > > > > On Mon, 16 Sept 2024 at 11:35, Rui Barradas > > > wrote: > > > > > > > >> Às 08:28 de 16/09/2024, Francesca escreveu: > > > >>> Dear Contributors, > > > >>> I hope someone has found a similar issue. > > > >>> > > > >>> I have this data set, > > > >>> > > > >>> > > > >>> > > > >>> cp1 > > > >>> cp2 > > > >>> role > > > >>> groupid > > > >>> 1 > > > >>> 10 > > > >>> 13 > > > >>> 4 > > > >>> 5 > > > >>> 2 > > > >>> 5 > > > >>> 10 > > > >>> 3 > > > >>> 1 > > > >>> 3 > > > >>> 7 > > > >>> 7 > > > >>> 4 > > > >>> 6 > > > >>> 4 > > > >>> 10 > > > >>> 4 > > > >>> 2 > > > >>> 7 > > > >>> 5 > > > >>> 5 > > > >>> 8 > > > >>> 3 > > > >>> 2 > > > >>> 6 > > > >>> 8 > > > >>> 7 > > > >>> 4 > > > >>> 4 > > > >>> 7 > > > >>> 8 > > &g
[R] Loops
Dear Contributors, I am asking for help with a problem involving loops, which I always get confused by. I would like to perform the following procedure in a compact way. Consider a matrix p of 100 rows and three columns. I need to calculate the sum over blocks of rows of each column separately, as follows:

fa1<-(colSums(p[1:25,]))
fa2<-(colSums(p[26:50,]))
fa3<-(colSums(p[51:75,]))
fa4<-(colSums(p[76:100,]))
fa5<-(colSums(p[1:100,]))

and then I need to apply the following to each of them:

fa1b<-c()
for (i in 1:3){
  fa1b[i]<-(100-(100*abs(fa1[i]/sum(fa1[i])-(1/3))))
}

fa2b<-c()
for (i in 1:3){
  fa2b[i]<-(100-(100*abs(fa2[i]/sum(fa2[i])-(1/3))))
}

and so on. Is there a more efficient way to do this? Thanks for your time!
Francesca
--
Francesca Pancotto, PhD
Università di Modena e Reggio Emilia
Viale A. Allegri, 9
40121 Reggio Emilia
Office: +39 0522 523264
Web: https://sites.google.com/site/francescapancotto/
--

______________________________________________
R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Loops
Thanks to you all, they are very useful and I am learning a lot. Best, Francesca On 27 January 2013 19:20, arun wrote: > > > Hi, > > You could use library(plyr) as well > library(plyr) > pnew<-colSums(aaply(laply(split(as.data.frame(p),((1:nrow(as.data.frame(p))-1)%/% > 25)+1),as.matrix),c(2,3),function(x) x)) > res<-rbind(t(pnew),colSums(p)) > row.names(res)<-1:nrow(res) > res<- 100-100*abs(res/rowSums(res)-(1/3)) > A.K. > > > - Original Message - > From: Rui Barradas > To: Francesca > Cc: r-help@r-project.org > Sent: Sunday, January 27, 2013 6:17 AM > Subject: Re: [R] Loops > > Hello, > > I think there is an error in the expression > > 100-(100*abs(fa1[i]/sum(fa1[i])-(1/3))) > > Note that fa1[i]/sum(fa1[i]) is always 1. If it's fa1[i]/sum(fa1), try > the following, using lists to hold the results. > > > # Make up some data > set.seed(6628) > p <- matrix(runif(300), nrow = 100) > > idx <- seq(1, 100, by = 25) > fa <- lapply(idx, function(i) colSums(p[i:(i + 24), ])) > fa[[5]] <- colSums(p) > > fab <- lapply(fa, function(x) 100 - 100*abs(x/sum(x) - 1/3)) > fab > > You can give names to the lists elements, if you want to. > > > names(fa) <- paste0("fa", 1:5) > names(fab) <- paste0("fa", 1:5, "b") > > > Hope this helps, > > Rui Barradas > > Em 27-01-2013 08:02, Francesca escreveu: > > Dear Contributors, > > I am asking help on the way how to solve a problem related to loops for > > that I always get confused with. > > I would like to perform the following procedure in a compact way. > > > > Consider that p is a matrix composed of 100 rows and three columns. I > need > > to calculate the sum over some rows of each > > column separately, as follows: > > > > fa1<-(colSums(p[1:25,])) > > > > fa2<-(colSums(p[26:50,])) > > > > fa3<-(colSums(p[51:75,])) > > > > fa4<-(colSums(p[76:100,])) > > > > fa5<-(colSums(p[1:100,])) > > > > > > > > and then I need to apply to each of them the following: > > > > > > fa1b<-c() > > > > for (i in 1:3){ > > > > fa1b[i]<-(100-(100*abs(fa1[i]/sum(fa1[i])-(1/3 > > > > } > > > > > > fa2b<-c() > > > > for (i in 1:3){ > > > > fa2b[i]<-(100-(100*abs(fa2[i]/sum(fa2[i])-(1/3 > > > > } > > > > > > and so on. > > > > Is there a more efficient way to do this? > > > > Thanks for your time! > > > > Francesca > > > > -- > > Francesca Pancotto, PhD > > Università di Modena e Reggio Emilia > > Viale A. Allegri, 9 > > 40121 Reggio Emilia > > Office: +39 0522 523264 > > Web: https://sites.google.com/site/francescapancotto/ > > -- > > > > [[alternative HTML version deleted]] > > > > > > > > __ > > R-help@r-project.org mailing list > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > > and provide commented, minimal, self-contained, reproducible code. > > > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > > -- Francesca -- Francesca Pancotto, PhD Università di Modena e Reggio Emilia Viale A. Allegri, 9 40121 Reggio Emilia Office: +39 0522 523264 Web: https://sites.google.com/site/francescapancotto/ -- [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] On the calculation of crossed differences
Dear Contributors, I am back asking for help concerning the same type of dataset as in a previous help request. I needed to sum data over subsamples of three time series, each made of 100 observations. Several solutions were proposed, among which:

db<-p
dim( db ) <- c(25,4,3)
db2 <- apply(db, c(2,3), sum)
db3 <- t(apply(db2, 1, function(poff) 100-(100*abs(poff/sum(poff)-(1/3))) ) )

My request now concerns the function at the end of the calculation of db3. Instead of the difference from a fixed number, here 1/3, I need to calculate the following differences. Consider that db3 is a 4x3 matrix; I need to calculate

(db3[1,1]-db3[1,2])+(db3[1,1]-db3[1,3])*0.5

and store it in a cell, then

(db3[1,2]-db3[1,1])+(db3[1,2]-db3[1,3])*0.5

and store it in a cell, then

(db3[1,3]-db3[1,2])+(db3[1,3]-db3[1,2])*0.5

and store it in a cell, and then repeat this for each of the four rows of the same matrix. The resulting matrix should be composed of these distances. I need to repeat this for each of the subsamples. I realize that some calculations are repeated, but I have not found a strategy that does not require repeating them.
Francesca
------
Francesca Pancotto, PhD
Università di Modena e Reggio Emilia
Viale A. Allegri, 9
40121 Reggio Emilia
Office: +39 0522 523264
Web: https://sites.google.com/site/francescapancotto/
--

______________________________________________
R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
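A sketch of one way to apply such a formula row by row, assuming the pattern of the first two formulas above carries over to the third (each column minus one of the other columns, plus half its difference from the remaining column; the third formula in the post repeats db3[1,2] twice, which may be a typo, so the pairing below is a guess and should be adjusted if wrong):

# crossed differences for one row x of db3 (assumed pattern, see note above)
crossed_diff <- function(x) {
  c((x[1] - x[2]) + (x[1] - x[3]) * 0.5,
    (x[2] - x[1]) + (x[2] - x[3]) * 0.5,
    (x[3] - x[1]) + (x[3] - x[2]) * 0.5)
}

# apply row by row; t() puts the result back into 4 x 3 form
db4 <- t(apply(db3, 1, crossed_diff))

The same call can then be reused unchanged on the db3 matrix of each subsample.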
[R] Selecting elements in lists with a row condition
Dear Contributors, I am asking for some advice on how to solve the following problem. I have a list composed of 78 elements, each of which is a matrix of factors and numbers, similar to the following:

  bank_name date     px_last_CIB Q.Y       p_made p_for
1 CIB       10/02/06 1.33        p406-q406 406    406
2 CIB       10/23/06 1.28        p406-q406 406    406
3 CIB       11/22/06 1.28        p406-q406 406    406
4 CIB       10/02/06 1.35        p406-q107 406    107
5 CIB       10/23/06 1.32        p406-q107 406    107
6 CIB       11/22/06 1.32        p406-q107 406    107

--
Francesca
--
Francesca Pancotto, PhD
Università di Modena e Reggio Emilia
Viale A. Allegri, 9
40121 Reggio Emilia
Office: +39 0522 523264
Web: https://sites.google.com/site/francescapancotto/
--

______________________________________________
R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Selecting elements in lists with a row condition
Dear Contributors, sorry, the previous message was sent accidentally. I am asking for some advice on how to solve the following problem. I have a list composed of 78 elements, each of which is a matrix of factors and numbers, similar to the following:

  bank_name date     px_last_CIB Q.Y       p_made p_for
1 CIB       10/02/06 1.33        p406-q406 406    406
2 CIB       10/23/06 1.28        p406-q406 406    406
3 CIB       11/22/06 1.28        p406-q406 406    406
4 CIB       10/02/06 1.35        p406-q107 406    107
5 CIB       10/23/06 1.32        p406-q107 406    107
6 CIB       11/22/06 1.32        p406-q107 406    107

Each of these matrices differs in the value of the column bank_name and in the suffix of the px_last_ column, which carries the same name as bank_name (here _CIB). Moreover, each matrix has a different number of rows, so I cannot simply bind them into one large matrix. I need to build a matrix made of the rows, taken from every element of the list, that satisfy the criterion that the column p_made equals 406. That is, from each matrix contained in the list I need to pick the rows meeting this condition. It seems difficult to me, but perhaps it is super easy. Thanks for any help you can provide.
Francesca

On 4 February 2014 12:42, Francesca wrote:
> Dear Contributors
> I am asking some advice on how to solve the following problem.
> I have a list composed of 78 elements, each of which is a matrix of
> factors and numbers, similar to the following
>
>   bank_name date     px_last_CIB Q.Y       p_made p_for
> 1 CIB       10/02/06 1.33        p406-q406 406    406
> 2 CIB       10/23/06 1.28        p406-q406 406    406
> 3 CIB       11/22/06 1.28        p406-q406 406    406
> 4 CIB       10/02/06 1.35        p406-q107 406    107
> 5 CIB       10/23/06 1.32        p406-q107 406    107
> 6 CIB       11/22/06 1.32        p406-q107 406    107
>
> --
> Francesca
> --
> Francesca Pancotto, PhD
> Università di Modena e Reggio Emilia
> Viale A. Allegri, 9
> 40121 Reggio Emilia
> Office: +39 0522 523264
> Web: https://sites.google.com/site/francescapancotto/
> --

--
Francesca
--
Francesca Pancotto, PhD
Università di Modena e Reggio Emilia
Viale A. Allegri, 9
40121 Reggio Emilia
Office: +39 0522 523264
Web: https://sites.google.com/site/francescapancotto/
--

______________________________________________
R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
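A sketch of one way to do this with lapply() plus do.call(rbind, ...), assuming each list element is a data frame laid out as above, that the list is called bank_list (a hypothetical name), and that the bank-specific price column is always the third one; the rename is needed so that rows from different banks can be stacked despite the differing px_last_ suffixes:

# keep the p_made == 406 rows of every element and harmonise the price column name
sel <- lapply(bank_list, function(d) {
  d <- d[d$p_made == 406, , drop = FALSE]   # rows meeting the criterion
  names(d)[3] <- "px_last"                  # drop the bank-specific suffix (assumed 3rd column)
  d
})

# stack everything into one data frame
result <- do.call(rbind, sel)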
[R] Reorganize(stack data) a dataframe inducing names
Dear Contributors, thanks for your collaboration. I am trying to reorganize a data frame that looks like this:

  n1.Index Date     PX_LAST n2.Index Date.1   PX_LAST.1 n3.Index Date.2   PX_LAST.2
1 NA       04/02/07 1.34    NA       04/02/07 1.36      NA       04/02/07 1.33
2 NA       04/09/07 1.34    NA       04/09/07 1.36      NA       04/09/07 1.33
3 NA       04/16/07 1.34    NA       04/16/07 1.36      NA       04/16/07 1.33
4 NA       04/30/07 1.36    NA       04/30/07 1.40      NA       04/30/07 1.37
5 NA       05/07/07 1.36    NA       05/07/07 1.40      NA       05/07/07 1.37
6 NA       05/14/07 1.36    NA       05/14/07 1.40      NA       05/14/07 1.37
7 NA       05/22/07 1.36    NA       05/22/07 1.40      NA       05/22/07 1.37

while what I would like to obtain is the stacked version:

n1.Index Date     PX_LAST
n1.Index 04/02/07 1.34
n1.Index 04/09/07 1.34
n1.Index 04/16/07 1.34
n1.Index 04/30/07 1.36
n1.Index 05/07/07 1.36
n1.Index 05/14/07 1.36
n1.Index 05/22/07 1.36
n2.Index 04/02/07 1.36
n2.Index 04/16/07 1.36
n2.Index 04/16/07 1.36
n2.Index 04/30/07 1.40
n2.Index 05/07/07 1.40
n2.Index 05/14/07 1.40
n2.Index 05/22/07 1.40
n3.Index 04/02/07 1.33
n3.Index 04/16/07 1.33
n3.Index 04/16/07 1.33
n3.Index 04/30/07 1.37

I have tried the function stack, but it handles only one block of values at a time. Then I tested the melt function from the reshape package, but it does not seem to reproduce the organization of the data I want, as it takes the dates as the id values. PS: the n1 index names are not ordered in the original database, so I cannot fill in the NAs with the names using a recursive formula. Thank you for any help you can provide.
Francesca
--
Francesca
--
Francesca Pancotto, PhD
Dipartimento di Economia
Università di Bologna
Piazza Scaravilli, 2
40126 Bologna
Office: +39 051 2098135
Cell: +39 393 6019138
Web: http://www2.dse.unibo.it/francesca.pancotto/
--

______________________________________________
R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
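A base-R sketch of one way to stack the blocks, assuming the wide data frame is called wideDF (a hypothetical name), that its columns come in groups of three as shown above, and that the index name to carry into the first output column is taken from the header of each block's first column:

# starting column of each three-column block: 1, 4, 7, ...
blocks <- seq(1, ncol(wideDF), by = 3)

stacked <- do.call(rbind, lapply(blocks, function(j) {
  block <- wideDF[, j:(j + 2)]
  data.frame(Index   = names(wideDF)[j],   # e.g. "n1.Index", recycled over the block's rows
             Date    = block[[2]],
             PX_LAST = block[[3]])
}))

Because the index name is read from the column header rather than filled down a column, it does not matter that the n1/n2/n3 names are unordered in the original database.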
Re: [R] Reorganize(stack data) a dataframe inducing names
Dear Contributors, thanks for any help you can provide. I searched the threads but could not find a query that matched my needs. This is my database:

      index time                 date     values
13732 27965 DATA.Q211.SUM.Index  04/08/11 1.42
13733 27974 DATA.Q211.SUM.Index  05/10/11 1.45
13734 27984 DATA.Q211.SUM.Index  06/01/11 1.22
13746 28615 DATA.Q211.TDS.Index  04/07/11 1.35
13747 28624 DATA.Q211.TDS.Index  05/20/11 1.40
13754 29262 DATA.Q211.UBS.Index  05/02/11 1.30
13755 29272 DATA.Q211.UBS.Index  05/03/11 1.48
13761 29915 DATA.Q211.UCM.Index  04/28/11 1.43
13768 30565 DATA.Q211.VDE.Index  05/02/11 1.48
13775 31215 DATA.Q211.WF.Index   04/14/11 1.44
13776 31225 DATA.Q211.WF.Index   05/12/11 1.42
13789 31865 DATA.Q211.WPC.Index  04/01/11 1.40
13790 31875 DATA.Q211.WPC.Index  04/08/11 1.42
13791 31883 DATA.Q211.WPC.Index  05/10/11 1.43
13804 32515 DATA.Q211.XTB.Index  04/29/11 1.50
13805 32525 DATA.Q211.XTB.Index  05/30/11 1.40
13806 32532 DATA.Q211.XTB.Index  06/28/11 1.43

I need to select only the rows of this database that correspond to the first occurrence of each of the strings in the index column. In the example shown I would like to obtain a new data.frame which is:

      index time                 date     values
13732 27965 DATA.Q211.SUM.Index  04/08/11 1.42
13746 28615 DATA.Q211.TDS.Index  04/07/11 1.35
13754 29262 DATA.Q211.UBS.Index  05/02/11 1.30
13761 29915 DATA.Q211.UCM.Index  04/28/11 1.43
13768 30565 DATA.Q211.VDE.Index  05/02/11 1.48
13775 31215 DATA.Q211.WF.Index   04/14/11 1.44
13789 31865 DATA.Q211.WPC.Index  04/01/11 1.40
13804 32515 DATA.Q211.XTB.Index  04/29/11 1.50

As you can see, it is not the whole string that changes, but a substring within it. I want to select only the first row associated with each distinct substring. I know how to select rows based on a fixed substring condition on the index column, but I cannot use that here because the substring changes and, moreover, the number of occurrences per substring is variable. Thank you for any help you can provide.
Francesca

______________________________________________
R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Reorganize(stack data) a dataframe inducing names
Works perfectly. Thanks. f. On 1 August 2011 18:22, jim holtman wrote: > Try this: had to add extra names to your data since it was not clear > how it was organized. Next time use 'dput' to enclose data. > > > x <- read.table(textConnection(" index time key date values > + 13732 27965 DATA.Q211.SUM.Index04/08/11 1.42 > + 13733 27974 DATA.Q211.SUM.Index05/10/11 1.45 > + 13734 27984 DATA.Q211.SUM.Index06/01/11 1.22 > + 13746 28615 DATA.Q211.TDS.Index04/07/11 1.35 > + 13747 28624 DATA.Q211.TDS.Index05/20/11 1.40 > + 13754 29262 DATA.Q211.UBS.Index05/02/11 1.30 > + 13755 29272 DATA.Q211.UBS.Index05/03/11 1.48 > + 13761 29915 DATA.Q211.UCM.Index04/28/11 1.43 > + 13768 30565 DATA.Q211.VDE.Index05/02/11 1.48 > + 13775 31215 DATA.Q211.WF.Index 04/14/11 1.44 > + 13776 31225 DATA.Q211.WF.Index 05/12/11 1.42 > + 13789 31865 DATA.Q211.WPC.Index04/01/11 1.40 > + 13790 31875 DATA.Q211.WPC.Index04/08/11 1.42 > + 13791 31883 DATA.Q211.WPC.Index05/10/11 1.43 > + 13804 32515 DATA.Q211.XTB.Index04/29/11 1.50 > + 13805 32525 DATA.Q211.XTB.Index05/30/11 1.40 > + 13806 32532 DATA.Q211.XTB.Index06/28/11 1.43") > + , header = TRUE > + , as.is = TRUE > + ) > > closeAllConnections() > > x > index time key date values > 1 13732 27965 DATA.Q211.SUM.Index 04/08/11 1.42 > 2 13733 27974 DATA.Q211.SUM.Index 05/10/11 1.45 > 3 13734 27984 DATA.Q211.SUM.Index 06/01/11 1.22 > 4 13746 28615 DATA.Q211.TDS.Index 04/07/11 1.35 > 5 13747 28624 DATA.Q211.TDS.Index 05/20/11 1.40 > 6 13754 29262 DATA.Q211.UBS.Index 05/02/11 1.30 > 7 13755 29272 DATA.Q211.UBS.Index 05/03/11 1.48 > 8 13761 29915 DATA.Q211.UCM.Index 04/28/11 1.43 > 9 13768 30565 DATA.Q211.VDE.Index 05/02/11 1.48 > 10 13775 31215 DATA.Q211.WF.Index 04/14/11 1.44 > 11 13776 31225 DATA.Q211.WF.Index 05/12/11 1.42 > 12 13789 31865 DATA.Q211.WPC.Index 04/01/11 1.40 > 13 13790 31875 DATA.Q211.WPC.Index 04/08/11 1.42 > 14 13791 31883 DATA.Q211.WPC.Index 05/10/11 1.43 > 15 13804 32515 DATA.Q211.XTB.Index 04/29/11 1.50 > 16 13805 32525 DATA.Q211.XTB.Index 05/30/11 1.40 > 17 13806 32532 DATA.Q211.XTB.Index 06/28/11 1.43 > > # get index of first occurance of 'key' column > > indx <- !duplicated(x$key) > > x[indx,] > index time key date values > 1 13732 27965 DATA.Q211.SUM.Index 04/08/11 1.42 > 4 13746 28615 DATA.Q211.TDS.Index 04/07/11 1.35 > 6 13754 29262 DATA.Q211.UBS.Index 05/02/11 1.30 > 8 13761 29915 DATA.Q211.UCM.Index 04/28/11 1.43 > 9 13768 30565 DATA.Q211.VDE.Index 05/02/11 1.48 > 10 13775 31215 DATA.Q211.WF.Index 04/14/11 1.44 > 12 13789 31865 DATA.Q211.WPC.Index 04/01/11 1.40 > 15 13804 32515 DATA.Q211.XTB.Index 04/29/11 1.50 > > > > > > > > On Mon, Aug 1, 2011 at 11:13 AM, Francesca > wrote: > > Dear Contributors > > thanks for any help you can provide. I searched the threads > > but I could not find any query that satisfied my needs. 
> > This is my database: > > index time values > > 13732 27965 DATA.Q211.SUM.Index04/08/11 1.42 > > 13733 27974 DATA.Q211.SUM.Index05/10/11 1.45 > > 13734 27984 DATA.Q211.SUM.Index06/01/11 1.22 > > 13746 28615 DATA.Q211.TDS.Index04/07/11 1.35 > > 13747 28624 DATA.Q211.TDS.Index05/20/11 1.40 > > 13754 29262 DATA.Q211.UBS.Index05/02/11 1.30 > > 13755 29272 DATA.Q211.UBS.Index05/03/11 1.48 > > 13761 29915 DATA.Q211.UCM.Index04/28/11 1.43 > > 13768 30565 DATA.Q211.VDE.Index05/02/11 1.48 > > 13775 31215 DATA.Q211.WF.Index 04/14/11 1.44 > > 13776 31225 DATA.Q211.WF.Index 05/12/11 1.42 > > 13789 31865 DATA.Q211.WPC.Index04/01/11 1.40 > > 13790 31875 DATA.Q211.WPC.Index04/08/11 1.42 > > 13791 31883 DATA.Q211.WPC.Index05/10/11 1.43 > > 13804 32515 DATA.Q211.XTB.Index04/29/11 1.50 > > 13805 32525 DATA.Q211.XTB.Index05/30/11 1.40 > > 13806 32532 DATA.Q211.XTB.Index06/28/11 1.43 > > > > I need to select only the rows of this database that correspond to each > > of the first occurrences of the string represented in column > > index. In the example shown I would like to obtain a new > > data.frame which is > > > > index time values > > 13732 27965 DATA.Q211.SUM
[R] Simulation over data repeatedly for four loops
Dear Contributors, I am trying to perform a simulation over sample data, but I need to reproduce the same simulation over 4 groups of data. My ability with for loop is null, in particular related to dimensions as I always get, no matter what I try, "number of items to replace is not a multiple of replacement length" This is what I intend to do: replicate this operation for four times, where the index for the four groups is in the part of the code: datiPc[[1]][,2]. I have to replicate the following code 4 times, where the changing part is in the data from which I pick the sample, the data that are stored in datiPc[[1]][,2]. If I had to use data for the four samples, I would substitute the 1 with a j and replicate a loop four times, but it never worked. My desired final outcome is a matrix with 1 observations for each couple of extracted samples, i.e. 8 columns of 1 observations of means. db<-c() # Estrazione dei campioni dai dati di PGG e TRUST estr1 <- c(); estr2 <- c(); m1<-c() m2<-c() tmp1<- data1[[1]][,2]; tmp2<- data2[[2]][,2]; for(i in 1:100){ estr1<-sample(tmp1, 1000, replace = TRUE) estr2<-sample(tmp2, 1000, replace = TRUE) m1[i]<-mean(estr1,na.rm=TRUE) m2[i]<-mean(estr2,na.rm=TRUE) } db<-data.frame(cbind(m1,m2)) Thanks for any help you can provide. Best Regards -- Francesca -- [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
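A minimal sketch of one way to wrap the resampling in a loop over the four groups. The question draws the two samples from objects like datiPc[[j]][,2] and data2[[j]][,2]; here two toy lists (datiPc and datiT, the latter a placeholder name for the second source) stand in for them so the code runs on its own, with the variable of interest in column 2 of each element:

set.seed(1)
datiPc <- replicate(4, data.frame(id = 1:500, y = rnorm(500)), simplify = FALSE)
datiT  <- replicate(4, data.frame(id = 1:500, y = rnorm(500)), simplify = FALSE)

n_rep <- 100
db <- matrix(NA_real_, nrow = n_rep, ncol = 2 * 4)   # two columns of means per group
for (j in 1:4) {
  for (i in 1:n_rep) {
    s1 <- sample(datiPc[[j]][, 2], 1000, replace = TRUE)
    s2 <- sample(datiT[[j]][, 2],  1000, replace = TRUE)
    db[i, 2 * j - 1] <- mean(s1, na.rm = TRUE)
    db[i, 2 * j]     <- mean(s2, na.rm = TRUE)
  }
}
colnames(db) <- paste0(rep(c("m1_g", "m2_g"), 4), rep(1:4, each = 2))
db <- as.data.frame(db)
head(db)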
[R] Help
Dear Contributors I would like to perform this operation using a loop, instead of repeating the same operation many times. The numbers from 1 to 4 refer to different groups that are in the database and for which I have the same data.

x <- c(1,3,7)
datiP1 <- datiP[datiP$city == 1, x]
datiP2 <- datiP[datiP$city == 2, x]
datiP3 <- datiP[datiP$city == 3, x]
datiP4 <- datiP[datiP$city == 4, x]

-- Thank you for any help you can provide. Francesca [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
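A minimal sketch of the loop, and of split(), which does the same job in one line; a toy datiP stands in for the real one, and columns 1, 3 and 7 are the ones selected above:

set.seed(1)
datiP <- data.frame(matrix(rnorm(40 * 7), ncol = 7))   # toy data: 7 placeholder columns
datiP$city <- rep(1:4, each = 10)

x <- c(1, 3, 7)
datiP_list <- list()
for (j in 1:4) datiP_list[[j]] <- datiP[datiP$city == j, x]

# the same thing without an explicit loop: one data frame per city, in a named list
datiP_list2 <- lapply(split(datiP, datiP$city), function(d) d[, x])
str(datiP_list2)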
Re: [R] computing scores from a factor analysis
William, I had a problem similar to Wolfgang and I solved it through your help. Many thanks! Just an observation which sounded strange to me ( I am not a statistician, just a wildlife biologist) I have noticed that running the pca using principal with raw data (and therefore using scores=TRUE in the command line) gives different pca scores than running the same pca with the correlation matrix (using scores=FALSE in the command line and therefore calculating the scores in the way you suggested to Wolfgang). Is that normal? Thanks francesca -- View this message in context: http://r.789695.n4.nabble.com/computing-scores-from-a-factor-analysis-tp4306234p4372993.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] How to fix indeces in a loop
Dear Contributors, I have an easy question for you which is puzzling me instead. I am running loops similar to the following: for (i in c(100,1000,1)){ print((mean(i))) #var<-var(rnorm(i,0,1)) } This is what I obtain: [1] 100 [1] 1000 [1] 1 In this case I ask the software to print out the result, but I would like to store it in an object. I have tried a second loop, because if I index the out put variable with the i , i get thousands of records which I do not want(a matrix of dimension 1). for (i in c(100,1000,1)){ for (j in 1:3){ x[j]<-((mean(i))) #var<-var(rnorm(i,0,1)) }} This is the x: [,1] [,2] [,3] [1,] 1 NA NA [2,] 1 NA NA [3,] 1 NA NA Clearly the object x is storing only the last value of i, 1. I would like to save a vector of dimension 3 with content 100,1000,1, but I do not know how to fix the index in an efficient manner. Thanks for any help you can provide. Francesca __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] How to fix indeces in a loop
Thanks a lot! Francesca On 18 May 2012 18:21, arun wrote: > Hi Francesca, > >> for(i in 1:length(x1<-c(100,1000,1))){ > j<-x1[i] > x1[i]<-mean(j) > } > >> x1 > [1] 100 1000 1 > > > > A.K. > > > > - Original Message - > From: Francesca > To: r-help@r-project.org > Cc: > Sent: Friday, May 18, 2012 10:59 AM > Subject: [R] How to fix indeces in a loop > > Dear Contributors, > I have an easy question for you which is puzzling me instead. > I am running loops similar to the following: > > > for (i in c(100,1000,1)){ > > print((mean(i))) > #var<-var(rnorm(i,0,1)) > } > > This is what I obtain: > > [1] 100 > [1] 1000 > [1] 1 > > In this case I ask the software to print out the result, but I would > like to store it in an object. > I have tried a second loop, because if I index the out put variable > with the i , i get thousands of records which I do not want(a matrix > of dimension 1). > > for (i in c(100,1000,1)){ > for (j in 1:3){ > x[j]<-((mean(i))) > #var<-var(rnorm(i,0,1)) > }} > > This is the x: > > [,1] [,2] [,3] > [1,] 1 NA NA > [2,] 1 NA NA > [3,] 1 NA NA > > Clearly the object x is storing only the last value of i, 1. > > I would like to save a vector of dimension 3 with content 100,1000,1, > but I do not know how to fix the index in an efficient manner. > > Thanks for any help you can provide. > Francesca > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- Francesca -- Francesca Pancotto, PhD Università di Modena e Reggio Emilia Viale A. Allegri, 9 40121 Reggio Emilia Office: +39 0522 523264 Web: http://www2.dse.unibo.it/francesca.pancotto/ __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
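A minimal sketch of an alternative that avoids the index bookkeeping entirely: sapply() builds the result vector directly, one element per value looped over (mean() of a single number is of course the number itself, as in the thread):

x1 <- sapply(c(100, 1000, 1), mean)   # same values, stored in order
x1
# [1]  100 1000    1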
[R] Collecting results of a test with array
Dear contributors I have tried this experiment: x<-c() for (i in 1:12){ x[i]<-list(cbind(x1[i],x2[i])) #this is a list of 12 couples of time series I am using to perform a test } # that compares them 2 by 2 # # #trace statistic test<-data.frame() cval<-array( , dim=c(2,3,12)) for (i in 2:12){ for (k in 1:2){ for (j in 1:3){ result[k,j,i]<- ((ca.jo(data.frame(x[i]),ecdet="none",type="trace", spec="longrun",K=2))@cval[k,j]) }}} I have a problem in collecting the results of a test. The function ca.jo creates an object with various attributes, one of which is the "cval" that i can access through @cval. The attribute cval is an object of dimension 2X3. I am running recursively the test with ca.jo for 12 couples of time series, so I have an output of 12 matrices of 2X3 elements and I would like to create an object like an array of dimension (2,3,12) which contains each matrix @cval produced by ca.jo for the 12 subjects that i tested. Can anyone help me with that? I hope my explanation of the problem is clear. Thanks in advance for any help. -- Francesca -- Francesca Pancotto, PhD Università di Modena e Reggio Emilia Viale A. Allegri, 9 40121 Reggio Emilia Office: +39 0522 523264 Web: http://www2.dse.unibo.it/francesca.pancotto/ -- [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
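A minimal sketch of one way to collect the critical values, assuming the urca package (which provides ca.jo) is installed and that x is the list of 12 two-column series described above: each @cval slot is a 2 x 3 matrix, so it can be assigned to one slice of the array in a single step. Toy random-walk series are used here so the code runs on its own:

library(urca)
set.seed(1)
x <- replicate(12,
               data.frame(a = cumsum(rnorm(100)), b = cumsum(rnorm(100))),
               simplify = FALSE)                    # stand-ins for the 12 real pairs

cval <- array(NA_real_, dim = c(2, 3, 12))
for (i in 1:12) {
  fit <- ca.jo(x[[i]], ecdet = "none", type = "trace", spec = "longrun", K = 2)
  cval[, , i] <- fit@cval                           # the whole 2 x 3 matrix at once
}
dimnames(cval) <- list(rownames(fit@cval), colnames(fit@cval), paste0("pair", 1:12))
cval[, , 1]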
[R] Select part of character row name in a data frame
Dear R contributors, I have a problem with selecting, in an efficient way, rows of a data frame according to a condition that is part of a row name of the table. The data frame has 64 rows and 2 columns; the row names are very long, but I need to select rows according to a small part of the name and perform calculations on the subsets. This is the example:

                                        X      Y
"Unique to strat "                 0.0482  28.39
"Unique to crt.dummy"              0.0441  25.92
"Unique to gender "                0.0159   9.36
"Unique to age "                   0.0839  49.37
"Unique to gg_right1 "             0.0019   1.10
"Unique to strat:crt.dummy "       0.0689  40.54
"Common to strat, and crt.dummy "  -0.0392 -23.09
"Common to strat, and gender "     -0.0031  -1.84
"Common to crt.dummy, and gender "  0.0038   2.21
"Common to strat, and age "         0.0072   4.21

X and Y are the two columns of variables, while “Unique to strat”, etc., are the row names. I am interested in selecting, for example, only those rows whose name contains “strat”. It would be very easy if these names were simple, but they are not, and they also contain spaces. I tried select with matches from dplyr, but that works on column names and I did not find how to use it on row names, which are of course character values. Thanks for any help you can provide. -- Francesca Pancotto, PhD __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Select part of character row name in a data frame
Thanks a lot, so simple so efficient! I will study more the grep command I did not know. Thanks! Francesca Pancotto > Il giorno 19 ott 2017, alle ore 12:12, Enrico Schumann > ha scritto: > > df[grep("strat", row.names(df)), ] [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Select part of character row name in a data frame
I did not need to select the whole character sentence, otherwise I would know how to do it.. from basic introduction to R as you suggest. Grep works perfectly. f. -- Francesca Pancotto, PhD > Il giorno 19 ott 2017, alle ore 18:01, Jeff Newmiller > ha scritto: > > (Re-)read the discussion of indexing (both `[` and `[[`) and be sure to get > clear on the difference between matrices and data frames in the Introduction > to R document that comes with R. There are many ways to create numeric > vectors, character vectors, and logical vectors that can then be used as > indexes, including the straightforward way: > > df[ c( > "Unique to strat ", > "Unique to strat:crt.dummy ", > "Common to strat, and crt.dummy ", > "Common to strat, and gender ", > "Common to strat, and age ") ,] > -- > Sent from my phone. Please excuse my brevity. > > On October 19, 2017 3:14:53 AM PDT, Francesca PANCOTTO > wrote: >> Thanks a lot, so simple so efficient! >> >> I will study more the grep command I did not know. >> >> Thanks! >> >> >> Francesca Pancotto >> >>> Il giorno 19 ott 2017, alle ore 12:12, Enrico Schumann >> ha scritto: >>> >>> df[grep("strat", row.names(df)), ] >> >> >> [[alternative HTML version deleted]] >> >> __ >> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide >> http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Map of Italy data filled at the level of the province
Dear Users I am very new to the use of ggplot. I am supposed to make a plot of Italian provinces in which I have to fill the color of some provinces with the values of a variable(I do not provide the data because it is irrelevant which data to use). Right now I explored the function map in maps package thanks to which I managed to plot the map of Italy with provinces borders and select only those provinces contained in the vector nomi(which is just a list of character elements with the names of the provinces which are just like counties in the US). map("italy",col=1:20, regions=nomi) The problem is to fill the provinces level with the values of a variable that is the variable of interest: I found a series of examples based on US data extracted from very hard to get databases. Can anyone provide an easy example where to start from? Thanks in advance Francesca -- Francesca Pancotto Professore Associato di Politica Economica Università degli Studi di Modena e Reggio Emilia Palazzo Dossetti - Viale Allegri, 9 - 42121 Reggio Emilia Office: +39 0522 523264 Web: https://sites.google.com/site/francescapancotto/ -- [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
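A minimal sketch of one way to start, assuming ggplot2 and maps are installed: map_data("italy") returns the province polygons with a region column, so a data frame of province values can be merged in and mapped to fill. The valore column below is made-up data, and in practice the names in nomi have to be matched against the (sometimes differently spelled) region names used by the maps package:

library(ggplot2)
library(maps)

italy <- map_data("italy")                      # province polygons, one row per vertex
dati  <- data.frame(region = unique(italy$region),
                    valore = runif(length(unique(italy$region))))   # fake values to fill with

italy <- merge(italy, dati, by = "region", all.x = TRUE)
italy <- italy[order(italy$order), ]            # restore the vertex drawing order lost by merge()

ggplot(italy, aes(x = long, y = lat, group = group, fill = valore)) +
  geom_polygon(colour = "grey50") +
  coord_quickmap()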
[R] Small vector into large data frame
Dear Contributors I seem not to get the general rule applying to the use of loops. I need some help. I have a database in which I need to generate a variable according to the following rule. This is the head of the database:

      bank_name       date px_last       Q_Y p_made p_for p_m p_f
aba.1       ABA 2006-10-24    1.28 p406-q406    406   406   1   1
aba.2       ABA 2006-11-30    1.31 p406-q406    406   406   1   1
aba.3       ABA 2006-10-24    1.29 p406-q107    406   107   1   2
aba.4       ABA 2006-11-30    1.33 p406-q107    406   107   1   2
aba.5       ABA 2006-10-24    1.31 p406-q207    406   207   1   3
aba.6       ABA 2006-11-30    1.35 p406-q207    406   207   1   3

The variable p_f takes values from 1 to 19 in a non-regular way. Then I have a vector of 19 elements:

> spot$pxlast
 [1] 1.32 1.34 1.35 1.43 1.46 1.58 1.58 1.41 1.40 1.33 1.40 1.46 1.43 1.35 1.22 1.36 1.34 1.42 1.42

I need to create a variable to attach to the data frame, which is composed of 11500 rows, that takes the value 1.32 when p_f==1, 1.34 when p_f==2, and so on. It seems so easy but I cannot find a way to do it in an efficient way. Thanks in advance for any help. Francesca [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
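A minimal sketch: since p_f already takes the integer values 1..19, it can be used directly to index the 19-element vector, so no loop is needed. Toy objects below stand in for the real data frame (called dat here) and for spot:

spot <- list(pxlast = c(1.32, 1.34, 1.35, 1.43, 1.46, 1.58, 1.58, 1.41, 1.40,
                        1.33, 1.40, 1.46, 1.43, 1.35, 1.22, 1.36, 1.34, 1.42, 1.42))
dat  <- data.frame(bank_name = "ABA", p_f = c(1, 1, 2, 2, 3, 3))

dat$spot_px <- spot$pxlast[dat$p_f]   # 1.32 wherever p_f == 1, 1.34 wherever p_f == 2, ...
dat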
[R] Organize regression output
Dear Contributors I would like to ask for some help concerning the automation of an analysis, which is beyond my current knowledge. I have a list of regression models. I call them

models=c(ra,rb,rc,rd,re,rf,rg,rh)

I can access the output of each of them using, for example for the first one, ra$coefficients, and I obtain

(Intercept)      coeff1      coeff2         age      gender
 0.62003033  0.00350807 -0.03817848 -0.01513533 -0.18668972

and I know that ra$coefficients[1] would give me the intercept of this model. What I need to do is to collect the coefficients of each regression in models, and calculate and place in a table the following simple summations:

ra                       rb                       rc ...
intercept                intercept                intercept
intercept+coeff1         intercept+coeff1         intercept+coeff1
intercept+coeff2         intercept+coeff2         intercept+coeff2
intercept+coeff1+coeff2  intercept+coeff1+coeff2  intercept+coeff1+coeff2

The calculations are trivial (I know how to do them in steps), but what is difficult for me is to come up with a procedure that organizes the data in an efficient way. I tried some steps, starting by collecting the coefficients, but I think I am going the wrong way:

calcolati <- list()
for (i in c(ra,rb,rc,rd,re,rf,rg,rh)) {
  calcolati[[i]] <- i$coefficients[1]
}

Thanks for any help you can provide. f. -- Francesca Pancotto Web: https://sites.google.com/site/francescapancotto/ Energie: http://www.energie.unimore.it/ -- [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
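A minimal sketch of one way to build such a table, with two toy regressions standing in for ra, rb, ...; the two key points are to keep the fitted models in a list() rather than c() (which would flatten the model objects), and to let sapply() assemble one column per model:

set.seed(1)
d  <- data.frame(y = rnorm(50), x1 = rnorm(50), x2 = rnorm(50))
ra <- lm(y ~ x1 + x2, data = d)            # placeholders for the real fitted models
rb <- lm(y ~ x1 + x2, data = d[1:40, ])
models <- list(ra = ra, rb = rb)           # add rc, rd, ... in the same way

tab <- sapply(models, function(m) {
  b <- coef(m)                             # b[1] = intercept, b[2] = coeff1, b[3] = coeff2
  c(intercept      = b[[1]],
    int_plus_c1    = b[[1]] + b[[2]],
    int_plus_c2    = b[[1]] + b[[3]],
    int_plus_c1_c2 = b[[1]] + b[[2]] + b[[3]])
})
tab                                        # one column per model, one row per sum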
[R] question on DPpackage
Hi to everyone, I'm a PhD student and I'm involved in nonparametric analyses of hierarchical models. I tried to use the package DPpackage on my data, but I encountered some problems in interpreting the outputs. Can anybody help me? The problem can be summarized as follows: I have a logit hierarchical model for survival (i.e. binary response) in patients affected by heart failure (the cohort consists of n=536 subjects), admitted to J different hospitals. The idea is to explain survival by means of a linear predictor of relevant covariates with a random effect on the grouping factor, represented by the hospital of admission. The main goal would be the reconstruction of the random-effect density, because we are interested in finding (if present) groups of "similar" hospitals. Now, I found some trouble in doing this because I don't understand how the outputs "ss", "ncluster" and "mub" work. In particular, how does the package compute "ss" starting from the "ncluster" distribution? How should I interpret the vector "mub"? I tried to look into the Fortran code, but I'm not used to that language so it didn't help me a lot. Could anybody make me understand how "ss" and "ncluster" are related, and how to build the predictive densities of the random effect starting from the "mub" elements? It would be a great gain in my work. Looking forward to hearing from you. Thanks a lot in advance. Regards Francesca Ieva -- - Francesca Ieva MOX - Modeling and Scientific Computing Dipartimento di Matematica "F.Brioschi" Politecnico di Milano Via Bonardi 9, 20133 Milano, Italy mailto: francesca.i...@fastwebnet.it francesca.i...@mail.polimi.it Voice: +39 02 2399 4604 Skype: francesca.ieva - __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] How to perform an ols estimation with lm ignoring NA values
Dear R community, probably my question is obvious but I did not find any solution yet by browsing the mailing list archives. I need to perform a simple OLS regression on a dataset of cross-section data, with no temporal dimension. In this data set there are missing values. I would like the software to perform the OLS regression but simply ignore these observations and use the rest. I tried to use na.action=na.omit in lm( y~x, na.action=na.omit) but it seems to have no effect. Thank you for any available help. Francesca -- Post - doc Finance HEC Management School of the University of Liège Rue Louvrex, 14 , Bldg N1 , B-4000 Liège Belgium Web: https://mail.sssup.it/~pancotto __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
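A minimal sketch of the usual diagnosis: lm() already drops incomplete rows by default (its na.action defaults to na.omit), so when the option seems to have no effect the missing values are often not stored as real NA, e.g. they were read in as "." and the whole column became character or factor. With a toy data frame:

d <- data.frame(y = c(1.2, 2.3, NA, 4.1, 5.0),
                x = c("0.5", "1.1", "1.9", ".", "3.2"))   # "." was meant as missing

str(d)                                   # x is not numeric, so lm() cannot treat it as a covariate
d$x <- as.numeric(as.character(d$x))     # non-numeric codes become real NA (coercion warning)

fit <- lm(y ~ x, data = d, na.action = na.exclude)   # fitted on the complete rows only
summary(fit)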
[R] Rotating pca scores
Dear Folks I need to rotate PCA loadings and scores using R. I have run a pca using princomp and I have rotated PCA results with varimax. Using varimax R gives me back just rotated PC loadings without rotated PC scores. Does anybody know how I can obtain/calculate rotated PC scores with R? Your kindly help is appreciated in advance Francesca -- Francesca Iordan 11A Sharon Gardens E97RX London UK contact: +44 750 5485255 Italian contact: +39 349 7313294 Skype: jordanfrancesca [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
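A minimal sketch of one common way to get them for an orthogonal (varimax) rotation: varimax() returns both the rotated loadings and the rotation matrix, and the rotated scores are the unrotated scores multiplied by that same rotation matrix. Toy data stand in for the real matrix:

set.seed(1)
X  <- matrix(rnorm(200 * 6), ncol = 6)        # toy data
pc <- princomp(X)

k   <- 3                                      # number of components to rotate
rot <- varimax(pc$loadings[, 1:k])

rotated_loadings <- rot$loadings
rotated_scores   <- pc$scores[, 1:k] %*% rot$rotmat   # rotate the scores with the same matrix
head(rotated_scores)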
[R] Step and AIC
Hello everybody, I would need some help from you. I am trying to fit a logistic model to some presence absence data of animals living on river islands. I have got 12 predictor variables and I am trying to use a stepwise forward method to fit the best logistic model to my data. I am using the function STEP (stats). I have a question for you. Can I use step function if my variables have a binomial distribution? Reading the explanations of the function, I have understood that step is more suitable for dealing with gaussian distributed variables. Is that right? I apologize in advance for this question, but I am just at the beginning of my long path to handle and know statistics and R. regards Francesca [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
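A minimal sketch: step() also works on glm fits, and with family = binomial the AIC it minimises comes from the binomial likelihood, so presence/absence data are fine. Toy data below, with x1..x3 standing in for the 12 real predictors; forward selection needs a starting (null) model plus a scope giving the largest model to consider:

set.seed(1)
d <- data.frame(pres = rbinom(100, 1, 0.5),
                x1 = rnorm(100), x2 = rnorm(100), x3 = rnorm(100))

null_fit <- glm(pres ~ 1,            data = d, family = binomial)
full_fit <- glm(pres ~ x1 + x2 + x3, data = d, family = binomial)

step(null_fit, scope = formula(full_fit), direction = "forward")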
[R] Exponentiate very large numbers
Dear R experts, I have the logarithms of 2 values: log(a) = 1347 log(b) = 1351 And I am trying to solve this expression: exp( ln(a) ) - exp( ln(0.1) + ln(b) ) But of course every time I try to exponentiate the log(a) or log(b) values I get Inf. Are there any tricks I can use to get a real result for exp( ln(a) ) - exp( ln(0.1) + ln(b) ), either in logarithm or exponential form? Thank you very much for the help __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Exponentiate very large numbers
I am sorry I have confused you, the logs are all base e: ln(a) = 1347 ln(b) = 1351 And I am trying to solve this expression: exp( ln(a) ) - exp( ln(0.1) + ln(b) ) Thank you. 2013/2/4 francesca casalino : > Dear R experts, > > I have the logarithms of 2 values: > > log(a) = 1347 > log(b) = 1351 > > And I am trying to solve this expression: > > exp( ln(a) ) - exp( ln(0.1) + ln(b) ) > > But of course every time I try to exponentiate the log(a) or log(b) > values I get Inf. Are there any tricks I can use to get a real result > for exp( ln(a) ) - exp( ln(0.1) + ln(b) ), either in logarithm or > exponential form? > > > Thank you very much for the help __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
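A minimal sketch of doing the arithmetic entirely on the log scale: factor out exp(ln(a)) so that only the small exponent ln(0.1) + ln(b) - ln(a) is ever evaluated (packages such as Rmpfr or Brobdingnag are an alternative if many such operations are needed):

la <- 1347                     # ln(a)
lb <- log(0.1) + 1351          # ln(0.1 * b)
k  <- lb - la                  # ln( (0.1*b) / a ), here about 1.697

# a - 0.1*b = exp(la) * (1 - exp(k)) = -exp(la) * expm1(k)
log_abs_result <- la + log(abs(expm1(k)))    # ln |a - 0.1*b|, about 1348.495
sign_result    <- -sign(expm1(k))            # -1 here, because 0.1*b > a

c(sign = sign_result, log_abs = log_abs_result)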
[R] ggplot2 and facet_wrap help
Dear R experts, I am trying to arrange multiple plots, creating one graph for each size1 factor variable in my data frame, and each plot has the median price on the y-axis and the size2 on the x-axis grouped by clarity: library(ggplot2) df <- data.frame(price=matrix(sample(1:1000, 100, replace = TRUE), ncol = 1)) df$size1 = 1:nrow(df) df$size1 = cut(df$size1, breaks=11) df=df[sample(nrow(df)),] df$size2 = 1:nrow(df) df$size2 = cut(df$size2, breaks=11) df=df[sample(nrow(df)),] df$clarity = 1:nrow(df) df$clarity = cut(df$clarity, breaks=6) mydf = aggregate(df$price, by=list(df$size1, df$size2, df$clarity),median) names(mydf)[1] = 'size1' names(mydf)[2] = 'size2' names(mydf)[3] = 'clarity' names(mydf)[4] = 'median_price' # So my data is already in a "long" format I think, but when I do this: ggplot(data=mydf, aes(x=mydf$size2, y=mydf$median_price, group=as.factor(mydf$clarity), colour=as.factor(mydf$clarity))) + geom_line() + facet_wrap(~ factor(mydf$size1)) I get this error: "Error in layout_base(data, vars, drop = drop) : At least one layer must contain all variables used for facetting" Can you please help me understand what I am doing wrong? -fra __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] ggplot2 and facet_wrap help
Dear Ista, Thank you! It works perfectly! -fra 2013/2/18 Ista Zahn : > Hi, > > You are making it more complicated than it needs to be. You already > provided the data.frame in the ggplot call, so you don't need to > specify it in the aes calls. The various factor() and as.factor() > calls are also unnecessary. So stripping away this extra stuff your > plot looks like > > ggplot(data=mydf, aes(x=size2, > y=median_price, > group=clarity, > colour=clarity)) + > geom_line() + > facet_wrap(~ size1) > > which does give the desired display. > > Best, > Ista > > On Mon, Feb 18, 2013 at 6:04 AM, francesca casalino > wrote: >> Dear R experts, >> >> I am trying to arrange multiple plots, creating one graph for each >> size1 factor variable in my data frame, and each plot has the median >> price on the y-axis and the size2 on the x-axis grouped by clarity: >> >> library(ggplot2) >> >> df <- data.frame(price=matrix(sample(1:1000, 100, replace = TRUE), ncol = 1)) >> >> df$size1 = 1:nrow(df) >> df$size1 = cut(df$size1, breaks=11) >> df=df[sample(nrow(df)),] >> df$size2 = 1:nrow(df) >> df$size2 = cut(df$size2, breaks=11) >> df=df[sample(nrow(df)),] >> df$clarity = 1:nrow(df) >> df$clarity = cut(df$clarity, breaks=6) >> >> >> mydf = aggregate(df$price, by=list(df$size1, df$size2, df$clarity),median) >> >> names(mydf)[1] = 'size1' >> names(mydf)[2] = 'size2' >> names(mydf)[3] = 'clarity' >> names(mydf)[4] = 'median_price' >> >> # So my data is already in a "long" format I think, but when I do this: >> >> ggplot(data=mydf, aes(x=mydf$size2, y=mydf$median_price, >> group=as.factor(mydf$clarity), colour=as.factor(mydf$clarity))) + >> geom_line() + facet_wrap(~ factor(mydf$size1)) >> >> >> I get this error: >> "Error in layout_base(data, vars, drop = drop) : >> At least one layer must contain all variables used for facetting" >> >> Can you please help me understand what I am doing wrong? >> -fra >> >> __ >> R-help@r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Submit a new script after all parallel jobs have completed
Dear R experts, I have an R script that creates multiple scripts and submits these simultaneously to a computer cluster, and after all of the multiple scripts have completed and the output has been written in the respective folders, I would like to automatically launch another R script that works on these outputs. I haven't been able to figure out whether there is a way to do this in R: the function 'wait' is not what I want since the scripts are submitted as different jobs and each of them completes and writes its output file at different times, but I actually want to run the subsequent script after all of the outputs appear. Can you please help me find a solution? Thank you very much -fra __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
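A minimal sketch of one way to do the waiting from R itself: poll for the expected output files and continue (or source() the next script) only once they all exist. The file names and the follow-up script below are placeholders; many cluster schedulers also offer job-dependency options that solve this outside R, which is often cleaner:

expected <- sprintf("results/job_%02d.rds", 1:20)   # hypothetical output files, one per job

while (!all(file.exists(expected))) {
  Sys.sleep(60)                                     # check again once a minute
}

source("combine_results.R")                         # hypothetical script that works on the outputs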
[R] SURVREG Function
Hi, I need some help to manage frailty in Survreg function; in particular I'm looking for more information about frailty in survreg function applied to a loglogistic hazard function. Actually I need to develope a predictor for frailty random variable realization (similar to the Proportional Hazard Model's one based on Laplace Tansforms' ratio). I can't find any documentation about AFT models with Gamma Frailty developed in "survival package". Any reference paper to suggest? Thank a million Francesca [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] snpStats reference allele used in genetic associations?
Hi, Does anyone know how to find the reference allele used for genetic associations ran in snpStats? I have ran several associations using snp.rhs.tests, but I cannot tell which allele was used as the "effect allele". Is it the one coded as "Al1" in the SNP.support file? I can find the RAF (risk allele frequency) from the function col.summary, but again, which allele does this refer to? Also the proportions of genotypes from the col.summary is given as "AA/AB/BB", so I cannot understand from that which is coded as the "risk" allele. I could find this in the snpStats paper: "For categorical variables, including SNPs, the user can reorder the categories. The first one will be treated as reference category in the analysis." Thank you very much for your help! Fra [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] snpStats reference allele used in genetic associations?
But then how do you know which allele is the reference and which the risk allele (between A/T/C/G)? 2014-05-26 1:41 GMT+01:00 David Duffy : > francesca casalino asked: > > >> Does anyone know how to find the reference allele used for genetic >> associations ran in snpStats? >> >> A is ref allele, B is risk allele. > > > > [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] snpStats reference allele used in genetic associations?
I am having this problem because I need to run a meta-analysis and to align all the variants between the different studies included in the meta-analysis I need to know the effect allele used to get the beta (so that I can flip the beta if the effect allele is flipped compared to all other studies). Thanks for your help. Francesca 2014-05-26 11:13 GMT+01:00 francesca casalino : > But then how do you know which allele is the reference and which the risk > allele (between A/T/C/G)? > > > 2014-05-26 1:41 GMT+01:00 David Duffy : > > francesca casalino asked: >> >> >>> Does anyone know how to find the reference allele used for genetic >>> associations ran in snpStats? >>> >>> A is ref allele, B is risk allele. >> >> >> >> > [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] snpStats reference allele used in genetic associations?
Dear David, Thank you very much for your help, I really appreciate it. I am not using the read.snps.long() or any other import function, as the data is already in snpMatrix, so I cannot specify it at the input step... I am reading the data as a snpMatrix, so using load() after having called the snpStats package as in the vignette: "Example of genome-wide association testing": require(snpStats) data(for.exercise) The objects loaded with this set are: 1) genotypes (in probability format from imputed data); 2) SNP.support object, with information on the SNPs such as Allele 1, Allele 2, chr, position. Other information that I need for the meta-analysis can be extracted from the 'col.summary' command in snpStats: MAF, and I think RAF (risk allele frequency) can be considered as the Effect allele frequency for the meta-analysis. Then I am using snp.rhs.estimates and snp.rhs.tests for the associations. The problem is that I don't know which allele is taken as the risk allele in the association, is there a way to see this? Is it always the Allele 2 reported in the SNP.support file? Until I understand this, I won't be able to harmonise the SNPs all to one reference, since I don't know if I need to flip the betas when the effect allele is reversed for example... I have noticed this because when comparing the frequency of the Allele 2 (taken as the risk allele) and the RAF which I thought was the frequency associated with it, with the frequencies of the same allele found in the 1000 Genomes, I get concordance up to frequency= 0.5, then a shift in direction happens and I get discordance up to 1 for the reference frequency. Thank you very much for any suggestions you may have, Francesca 2014-05-27 2:07 GMT+01:00 David Duffy : > On Mon, 26 May 2014, francesca casalino wrote: > > I am having this problem because I need to run a meta-analysis and to >> align >> all the variants between the different studies included in the >> meta-analysis I need to know the effect allele used to get the beta (so >> that I can flip the beta if the effect allele is flipped compared to all >> other studies). >> > > This depends on how the data has been sent to you. Obviously, you should > check the "A" allele frequency in the different datasets. If they have > used different genotyping assays and the strand of the SNP is ambiguous, eg > G->C transversion, then this may be the only way to exclude problems. PLINK > offers a tool to check for this using LD patterns. > > Cheers, David. > > > | David Duffy (MBBS PhD) > | email: david.du...@qimrberghofer.edu.au ph: INT+61+7+3362-0217 fax: > -0101 > | Genetic Epidemiology, QIMR Berghofer Institute of Medical Research > | 300 Herston Rd, Brisbane, Queensland 4006, Australia GPG 4D0B994A [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Selecting elements in lists with a row condition
Hello A. k. thanks for the suggestion. I tried this but it does not work. I probably use it in the wrong way. This is what it tells me, do.call(rbind,lapply(bank.list,function(x) x[x[,"p_made"]==406,])) Errore in match.names(clabs, names(xi)) : names do not match previous names What am I doing wrong? f. -- Francesca Pancotto Università degli Studi di Modena e Reggio Emilia Palazzo Dossetti - Viale Allegri, 9 - 42121 Reggio Emilia Office: +39 0522 523264 Web: https://sites.google.com/site/francescapancotto/ -- Il giorno 04/feb/2014, alle ore 16:42, arun ha scritto: > Hi, > Try: > > If `lst1` is the list: > do.call(rbind,lapply(lst1,function(x) x[x[,"p_made"]==406,])) > A.K. > > > > > On Tuesday, February 4, 2014 8:53 AM, Francesca > wrote: > Dear Contributors > sorry but the message was sent involuntary. > I am asking some advice on how to solve the following problem. > I have a list composed of 78 elements, each of which is a matrix of factors > and numbers, similar to the following > > bank_name date px_last_CIB Q.Yp_made p_for > 1 CIB 10/02/061.33 p406-q406406 406 > 2 CIB 10/23/061.28 p406-q406406 406 > 3 CIB 11/22/061.28 p406-q406406 406 > 4 CIB 10/02/061.35 p406-q107406 107 > 5 CIB 10/23/061.32 p406-q107406 107 > 6 CIB 11/22/061.32 p406-q107406 107 > > > Each of these matrixes changes for the column name bank_name and for the > suffix _CIB which reports the name as in bank_name. Moreover each matrix as > a different number of rows, so that I cannot transform it into a large > matrix. > > I need to create a matrix made of the rows of each element of the list that > respect the criterium > that the column p_made is = to 406. > I need to pick each of the elements of each matrix that is contained in the > list elements, that satisfy this condition. > > It seems difficult to me but perhaps is super easy. > Thanks for any help you can provide. > > Francesca > > > > On 4 February 2014 12:42, Francesca wrote: > >> Dear Contributors >> I am asking some advice on how to solve the following problem. >> I have a list composed of 78 elements, each of which is a matrix of >> factors and numbers, similar to the following >> >> bank_name date px_last_CIB Q.Yp_made p_for >> 1 CIB 10/02/061.33 p406-q406406 406 >> 2 CIB 10/23/061.28 p406-q406406 406 >> 3 CIB 11/22/061.28 p406-q406406 406 >> 4 CIB 10/02/061.35 p406-q107 406 107 >> 5 CIB 10/23/061.32 p406-q107406 107 >> 6 CIB 11/22/061.32 p406-q107406 107 >> >> >> -- >> >> Francesca >> >> ------ >> Francesca Pancotto, PhD >> Università di Modena e Reggio Emilia >> Viale A. Allegri, 9 >> 40121 Reggio Emilia >> Office: +39 0522 523264 >> Web: https://sites.google.com/site/francescapancotto/ > >> -- >> > > > > -- > > Francesca > > -- > Francesca Pancotto, PhD > Università di Modena e Reggio Emilia > Viale A. Allegri, 9 > 40121 Reggio Emilia > Office: +39 0522 523264 > Web: https://sites.google.com/site/francescapancotto/ > -- > > [[alternative HTML version deleted]] > > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
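A minimal sketch of why the rbind step fails and one way around it: each element of the list carries a bank-specific column name (px_last_CIB, px_last_ABA, ...), and rbind() refuses to stack data frames whose column names differ, hence "names do not match previous names". Renaming that column to a common name first makes the suggested one-liner work (toy list below):

lst1 <- list(
  data.frame(bank_name = "CIB", date = c("10/02/06", "10/23/06"),
             px_last_CIB = c(1.33, 1.28), p_made = c(406, 107)),
  data.frame(bank_name = "ABA", date = c("10/02/06", "11/30/06"),
             px_last_ABA = c(1.28, 1.31), p_made = c(406, 406))
)

fixed <- lapply(lst1, function(x) {
  names(x) <- sub("^px_last_.*$", "px_last", names(x))   # drop the bank-specific suffix
  x
})

do.call(rbind, lapply(fixed, function(x) x[x[, "p_made"] == 406, ]))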
[R] Unicode symbols not working with ggplot in R
Hi, I am trying to produce a ggplot graph using specific characters in the labels, but ggplots doesn't seem to support certain symbols. For example, when I type: print("\u25E9") it shows a square which is half black, but when I try to use it in ggplot it doesn't print. I am using facet_wrap, but it looks like the problem is in ggplot that doesn't recognise the Unicode symbols and not factet_wrap (please let me know if it is otherwise). I am taking this very helpful example for illustration ( http://r.789695.n4.nabble.com/plus-minus-in-factor-not-plotmath-not-expression-td4681490.html ): junk<-data.frame(gug=c( rep( paste("\u25E9"), 10), rep( paste("\u25E8"), 10) ) ) junk$eks<-1:nrow(junk) junk$why<-with(junk, as.numeric(gug) + eks) print(summary(junk)) library(ggplot2) print( ggplot(data=junk, mapping=aes(x=eks, y=why)) + geom_point() + facet_grid(. ~ gug) ) Is there a way to have R recognise these Unicode symbols? It is not math symbols so plotmath will not be useful here... I'm using a Mac and this is the SessionInfo: > sessionInfo() R version 3.0.2 (2013-09-25) Platform: x86_64-apple-darwin10.8.0 (64-bit) locale: [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_GB.UTF-8 attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] Unicode_0.1-3 ggplot2_0.9.3.1 plyr_1.8reshape2_1.2.2 loaded via a namespace (and not attached): [1] colorspace_1.2-4 dichromat_2.0-0digest_0.6.4 grid_3.0.2 [5] gtable_0.1.2 labeling_0.2 MASS_7.3-29munsell_0.4.2 [9] proto_0.3-10 RColorBrewer_1.0-5 scales_0.2.3 stringr_0.6.2 [13] tcltk_3.0.2tools_3.0.2 Thank you very much for your help! [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Filling missing data in a Panel
Dear R contributors, I have a problem with a database that at the moment I find hard to solve. I have a panel composed of n subjects, whose names in the table that I report is bank_name, and observations for each of the individuals of bank_name from 1 to 18, as reported from the column p_for. As you can see from p_for, there are missing values in the panel that are not present and that create problems to my estimation. Do you know an efficient way to introduce missing values in the rows of the panel so that each cross section bank_name has the same number of observations p_for, even though some of them are NA? Thanks for any help you can provide, Best, Francesca

   row.names bank_name       date px_last       Q_Y p_made p_for
1          2         1   11/30/06    1.31 p406-q406    406   406  1
2         47         1   02/26/09    1.27 p109-q109    109   109 10
3         55         1 06/08/2009    1.40 p209-q209    209   209 11
4         68         1 12/01/2009    1.51 p409-q409    409   409 13
5         87         1   05/26/10    1.22 p210-q210    210   210 15
6         96         1  7/22/2010    1.25 p310-q310    310   310 16
7        221         2   11/14/06    1.30 p406-q406    406   406  1
8         16         2   02/13/07    1.27 p107-q107    107   107  2
9         31         2  5/15/2007    1.36 p207-q207    207   207  3
10       222         3 11/29/2007    1.50 p407-q407    407   407  5
11      1110         3   02/25/08    1.48 p108-q108    108   108  6
12         6         4   02/15/07    1.35 p107-q107    107   107  2
13        18         4  5/24/2007    1.39 p207-q207    207   207  3
14       292         4   08/21/07    1.39 p307-q307    307   307  4
15        38         4 11/29/2007    1.49 p407-q407    407   407  5
16        49         4   01/28/08    1.43 p108-q108    108   108  6
17        61         4   05/15/08    1.52 p208-q208    208   208  7
18        71         4   08/18/08    1.45 p308-q308    308   308  8
19        78         4   11/20/08    1.30 p408-q408    408   408  9
20        88         4   02/19/09    1.35 p109-q109    109   109 10
21       941         4   05/28/09    1.35 p209-q209    209   209 11

-- Francesca Pancotto Università degli Studi di Modena e Reggio Emilia Palazzo Dossetti - Viale Allegri, 9 - 42121 Reggio Emilia Office: +39 0522 523264 Web: https://sites.google.com/site/francescapancotto/ -- [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Filling missing data in a Panel
Thanks a lot, it works perfectly! f. -- Francesca Pancotto Università degli Studi di Modena e Reggio Emilia Palazzo Dossetti - Viale Allegri, 9 - 42121 Reggio Emilia Office: +39 0522 523264 Web: https://sites.google.com/site/francescapancotto/ -- Il giorno 17/feb/2014, alle ore 09:57, arun ha scritto: > Sorry, a typo: vec3 instead of vec2 > > dat3 <- > data.frame(bank_name=vec3,p_for=rep(seq(18),length(unique(dat$bank_name > A.K. > > > > > On , arun wrote: > Hi, > > Looks like one column name is missing. I am not sure about the output you > wanted. May be this helps. > > > dat <- read.table(text="row.names bank_name date px_last Q_Y p_made > q_madep_for > 1 21 11/30/061.31 p406-q406406 406 1 > > 2471 02/26/091.27 p109-q109109 109 10 > 3551 06/08/20091.40 p209-q209209 209 11 > 4681 12/01/20091.51 p409-q409409 409 13 > 5871 05/26/101.22 p210-q210210 210 15 > 6961 7/22/20101.25 p310-q310310 310 16 > 72212 11/14/061.30 p406-q406406 406 1 > 8162 02/13/071.27 p107-q107107 107 2 > 9312 5/15/20071.36 p207-q207207 207 3 > 10 2223 11/29/20071.50 p407-q407407 407 5 > 11 11103 02/25/081.48 p108-q108108 108 6 > 1264 02/15/071.35 p107-q107107 107 2 > 13184 5/24/20071.39 p207-q207207 207 3 > 14 2924 08/21/071.39 p307-q307307 307 4 > 15384 11/29/20071.49 p407-q407407 407 5 > 16494 01/28/081.43 p108-q108108 108 6 > 17614 05/15/081.52 p208-q208208 208 7 > 18714 08/18/081.45 p308-q308308 308 8 > 19784 11/20/081.30 p408-q408408 408 9 > 20884 02/19/091.35 p109-q109109 109 10 > 21 9414 05/28/091.35 p209-q209209 209 > 11",sep="",header=TRUE,stringsAsFactors=FALSE) > ##Possible solution 1 > > tbl <- table(dat$bank_name) > dat2 <- > data.frame(bank_name=as.numeric(rep(names(tbl),max(tbl)-tbl)),p_for=NA) > res1 <- merge(dat,dat2,all=TRUE)[colnames(dat)] > table(res1$bank_name) > # > # 1 2 3 4 > #10 10 10 10 > > > ###2 > > > vec1 <- with(dat,tapply(p_for,list(bank_name),FUN=max)) > vec2 <- as.numeric(rep(names(vec1),each=max(vec1))) > dat2New <- data.frame(bank_name=vec2,p_for=rep(seq(max(vec1)),4)) > res2 <- merge(dat,dat2New,all=TRUE)[colnames(dat)] > table(res2$bank_name) > # > # 1 2 3 4 > #16 16 16 16 > > #or > > 3 > > #using 18 as mentioned in the description > > vec3 <- rep(unique(dat$bank_name),each=18) > dat3 <- > data.frame(bank_name=vec2,p_for=rep(seq(18),length(unique(dat$bank_name > res3 <- merge(dat,dat3,all=TRUE)[colnames(dat)] > table(res3$bank_name) > > # 1 2 3 4 > #18 18 18 18 > > A.K. > > > > > > On Monday, February 17, 2014 2:40 AM, Francesca Pancotto > wrote: > Dear R contributors, > I have a problem with a database that at the moment I find hard to solve. > > I have a panel composed of n subjects, whose names in the table that I report > is bank_name, > and observations for each of the individuals of bank_name from 1 to 18, as > reported from the column p_for. > As you can see from p_for, there are missing values in the panel that are not > present and that create problems to my estimation. > Do you know an efficient way to introduce missing values in the rows of the > panel so that each cross section bank_name has the same number of observations > p_for, even though some of them are NA? > Thanks for any help you can provide, > > Best, > > Francesca > > > row.names bank_name date px_last Q_Y p_made p_for > 1 2 1 11/30/061.31 p406-q406406 406 1 > > 2 47 1 02/26/091.27 p109-q109109 109 10 > 3 55 1 06/08/20091.40 p209-q209209 209 11 > 4 68 1 12/01/20091.51 p409-q409409 409 13 > 5 87 1 05/26/101.22 p210-q210210 210 15
[R] opls-da
Dear all, I would like to apply Orthogonal Projections to Latent Structures Discriminant Analysis (OPLS-DA) to a metabolomic dataset, in order to discriminate two groups of samples. I have looked for an available R package and I have found "K-OPLS" and oscorespls.fit (Orthogonal scores PLSR) from "pls" package. I wonder if K-OPLS performs the same discriminant analysis of OPLS-DA? Is there any other available package for applying OPLS-DA? Thanks in advance for any advice. Best regards, Francesca Chignola -- --- Francesca Chignola, PhD Dulbecco Telethon Institute c/o S. Raffaele Scientific Institute Center of Genomics, BioInformatics and BioStatistics Biomolecular NMR Laboratory 1B4 Via Olgettina 58 20132 Milano Italy - - DAI IL TUO 5 X MILLE AL SAN RAFFAELE. BASTA UNA FIRMA. SE FIRMI PER LA RICERCA SANITARIA DEL SAN RAFFAELE DI MILANO, FIRMI PER TUTTI. C.F. 03 06 42 80 153 INFO: 5permi...@hsr.it - www.5xmille.org __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] objects memory limits
Dear list, I'm a quite new user of R-project, and I've a doubt on objects memory: I open a new R session and the command memory.limits() gives me 1535 Mb of memory (the PC has 2 Gb RAM and 32 bit), I create an integer vector object of 2e8 size, so about 2e8*4 bytes (800Mb) of memory are allocated, a size smaller then memory available. But when I try to make the dataframe of this object it gives me "Errore: cannot allocate vector of size 762.9 Mb". Why cannot I create a dataframe of an object with size smaller then memory available? I also tried to halve the object size but the situation doesn't changes. In R, is the memory of dataframe object smaller then vector object one? Are there different memory limits between objects? Is there a possibility to change limits? This is the command sequence: R version 2.12.0 (2010-10-15) Copyright (C) 2010 The R Foundation for Statistical Computing ISBN 3-900051-07-0 Platform: i386-pc-mingw32/i386 (32-bit) > ls() character(0) > memory.limit() [1] 1535 > x=integer(2e8) > object.size(x) 80024 bytes > rm(x) > ls() character(0) > x=data.frame(integer(2e8)) Errore: cannot allocate vector of size 762.9 Mb Inoltre: Warning messages: 1: In as.data.frame.integer(x[[i]], optional = TRUE) : Reached total allocation of 1535Mb: see help(memory.size) 2: In as.data.frame.integer(x[[i]], optional = TRUE) : Reached total allocation of 1535Mb: see help(memory.size) 3: In as.data.frame.integer(x[[i]], optional = TRUE) : Reached total allocation of 1535Mb: see help(memory.size) 4: In as.data.frame.integer(x[[i]], optional = TRUE) : Reached total allocation of 1535Mb: see help(memory.size) > x=data.frame(integer(1e8)) Errore: cannot allocate vector of size 381.5 Mb Inoltre: Warning messages: 1: In unlist(vlist, recursive = FALSE, use.names = FALSE) : Reached total allocation of 1535Mb: see help(memory.size) 2: In unlist(vlist, recursive = FALSE, use.names = FALSE) : Reached total allocation of 1535Mb: see help(memory.size) 3: In unlist(vlist, recursive = FALSE, use.names = FALSE) : Reached total allocation of 1535Mb: see help(memory.size) 4: In unlist(vlist, recursive = FALSE, use.names = FALSE) : Reached total allocation of 1535Mb: see help(memory.size) > Many thanks. Kind regards. Dr. Francesca Bader University of Trieste Italy [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] XIII Meeting GRASS and GFOSS in Italy
Dear mailing list, I'd like to inform you about the XIII Meeting GRASS and GFOSS, which will take place from 15th to 17th February 2012 at the University of Trieste (edificio H3, aula magna) in Trieste (Italy). The meeting will involve both GRASS users and open source software and data users. More information: http://sites.google.com/site/grassts/ gr...@units.it Kind regards. From the meeting organizers Dr. PhD Francesca Bader Università degli Studi di Trieste [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Wilcox test and data collection
Dear Contributors I have a problem with the collection of data from the results of a test. I need to perform a comparative test over groups of data, recall the p-value, and create a table. My problem is in the way to replicate the analysis over and over again on subsets of data according to a condition. I have this database, called y:

gg t1 t2 d
40  1  1 2
50  2  2 1
45  1  3 1
49  2  1 1
 5  2  1 3
40  1  1 2

where gg takes values from 1 to 100, t1 and t2 have values in (1,2,3) and d in (0,1,2,3). I want to perform tests on the values of gg according to the conditions that

d==0, compare values of gg when t1==1 with values of gg when t1==3
d==1, compare values of gg when t1==1 with values of gg when t1==3
d==2, compare values of gg when t1==1 with values of gg when t1==3
..
then
d==0, compare values of gg when t2==1 with values of gg when t2==3
d==1, ...

then collect the data of a statistic and create a table. The procedure I followed is to create sub-datasets called m0, m1, m2, m3 corresponding to the values of d, i.e.

m0 <- y[y$d==0, c(7,17,18,19)]
m1 <- y[y$d==1, c(7,17,18,19)]
m2 <- y[y$d==2, c(7,17,18,19)]
m3 <- y[y$d==3, c(7,17,18,19)]

then perform the test as follows:

x1 <- wilcox.test(m0[m0$t1==1,1], m0[m0$t1==3,1], correct=FALSE, exact=FALSE, conf.int=TRUE, alternative = c("g")) #ABC ID
x2 <- wilcox.test(m1[m1$t1==1,1], m1[m1$t1==3,1], correct=FALSE, exact=FALSE, conf.int=TRUE, alternative = c("g"))
x3 <- wilcox.test(m2[m2$t1==1,1], m2[m2$t1==3,1], correct=FALSE, exact=FALSE, conf.int=TRUE, alternative = c("g"))
x4 <- wilcox.test(m3[m3$t1==1,1], m3[m3$t1==3,1], correct=FALSE, exact=FALSE, conf.int=TRUE, alternative = c("g"))

Each of these tests creates an object, say x, and I then extract the value of the statistic using x$statistic. How can I automate this? Thank you for any help you can provide. Francesca [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
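A minimal sketch of one way to avoid the copy-and-paste: loop over the values of d and over the two treatment columns, store each test in a list, and pull out the statistic (and p-value) at the end. Toy data stand in for y, creating only the columns used here:

set.seed(1)
y <- data.frame(gg = sample(1:100, 200, replace = TRUE),
                t1 = sample(1:3, 200, replace = TRUE),
                t2 = sample(1:3, 200, replace = TRUE),
                d  = sample(0:3, 200, replace = TRUE))

res <- list()
for (dd in 0:3) {
  for (tt in c("t1", "t2")) {
    sub <- y[y$d == dd, ]
    tst <- wilcox.test(sub$gg[sub[[tt]] == 1], sub$gg[sub[[tt]] == 3],
                       correct = FALSE, exact = FALSE, conf.int = TRUE,
                       alternative = "greater")
    res[[paste0("d", dd, "_", tt)]] <- c(statistic = unname(tst$statistic),
                                         p.value   = tst$p.value)
  }
}
do.call(rbind, res)   # one row per (d, treatment) combination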
Re: [R] Merge two data frames and find common values and non-matching values
Yes, your code did exactly what I needed. Thank you!! -f [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Import in R with White Spaces
Ok I added quoting and it did work...Not sure why, but thank you for both your replies! -f [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Merge two data frames and find common values and non-matching values
Sorry---I thought it worked, but I think I am actually doing something wrong... The problem might be that there are NAs and also duplicated values... My fault. I can't figure out what is going wrong... I'll be more thorough and modify the two data frames to mirror what I actually have, to explain better. df1 is:

      Name Position location
 francesca        A       75
     maria        A       75
  cristina        B       36

And df2 is:

 location Country
       75      UK
       75   Italy
       56  France
       56 Austria

So I thought I had to first eliminate the duplicates like this:

df1_unique <- subset(df1, !duplicated(location))
df2_unique <- subset(df2, !duplicated(location))

After doing this I get df1:

      Name Position location
 francesca        A       75
  cristina        B       36

And df2:

 location Country
       75      UK
       56  France

And I would like to match on "location" and have the output tell me which records match in df1 but not in df2, which match in both, and which are in df2 but do not match anything in df1:

      Name Position location Match
 francesca        A       75     1
  cristina        B       36     0

As William suggested,

df12 <- merge(df1, cbind(df2, fromDF2=TRUE), all.x=TRUE, by="location")
df12$Match <- !is.na(df12$fromDF2)
new_common <- df12[which(df12$Match==TRUE),]

would give me the matching records, which should be correct, but I am not getting the correct result for the non-shared elements (the variants that are in df2 but not in df1):

df2_only <- subset(df1_unique, !(location %in% df2_unique))
df2_only <- df2_unique[-which(df2_unique$location %in% df1_unique$location),]

Neither of these works; they give me the wrong records... My questions are: 1. How do I find the records from df2 which are NOT in df1? 2. Do I need to eliminate the duplicates (or is there a way to record where they came from)? Any help is very much appreciated... THANK YOU very much! [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
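A minimal sketch (not from the original thread) of one way to get all three sets at once with a full outer merge, assuming the de-duplicated frames df1_unique and df2_unique from the post above:

## Sketch: tag each source, do a full outer join on "location", then split
## the result into "both", "df1 only" and "df2 only".
m <- merge(cbind(df1_unique, fromDF1 = TRUE),
           cbind(df2_unique, fromDF2 = TRUE),
           by = "location", all = TRUE)

in_both  <- m[!is.na(m$fromDF1) & !is.na(m$fromDF2), ]  # matched records
df1_only <- m[!is.na(m$fromDF1) &  is.na(m$fromDF2), ]  # in df1, not in df2
df2_only <- m[ is.na(m$fromDF1) & !is.na(m$fromDF2), ]  # in df2, not in df1

If only membership is needed, the duplicates do not have to be removed first; %in% handles them, e.g. df2[!(df2$location %in% df1$location), ].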
Re: [R] Permutation or Bootstrap to obtain p-value for one sample
Thank you very much to both Ken and Peter for the very helpful explanations. Just to understand this better (sorry for repeating, but I am also new to statistics, so please correct me where I am wrong):

Ken's method: randomly sample means, then use these means to construct a distribution of means (the 'null' distribution); I can then use this normal distribution and compare the population mean to my mean using, for example, a z-score. Of note: the initial distributions are not normal, so I thought I needed to base my calculations on the median, but I can use the mean to construct a normal distribution. This would be called a bootstrap test.

Peter's method: randomly sample means, then compare each sampled mean with the population mean and see whether it is higher than or equal to the difference between my sample and the population mean. This is a permutation test, but to actually get a CI and a p-value I would need the bootstrap method.

Did I understand this correctly? I tried to start with Ken's approach for now and followed his steps, but: 1) I get a lot of NaN in the sampling distribution; is this normal? 2) I think I am again doing something wrong when I try to find a p-value. This is what I did:

nreps=1
mean.dist=rep(NA,nreps)
for(replication in 1:nreps)
{
  my.sample=sample(population$Y, 250, replace=F)
  # Peter mentioned that this sampling should be without replacement, so I went for that
  mean.for.rep=mean(my.sample)          # mean for this replication
  mean.dist[replication]=mean.for.rep   # store the mean
}
hist(mean.dist,main="Null Dist of Means", col="chartreuse")  # show the means in a nifty color
mean_dist= mean(mean.dist, na.rm=TRUE)
sd_pop= sd(mean.dist, na.rm=TRUE)
mean_sample= mean(population$Y, na.rm=TRUE)
z_stat= (mean_sample - mean_dist)/(sd_pop/sqrt(2089))
p_value= 2*pnorm(-abs(z_stat))

Is this correct? THANK YOU SO MUCH FOR ALL YOUR HELP!!

2011/10/9 Ken Hutchison
> Hi Francy,
> A bootstrap test would likely be sufficient for this problem, but a
> one-sample t-test isn't advisable or necessary in my opinion. If you use a
> t-test multiple times you are making assumptions about the distribution of
> your data; more importantly, your probability of Type 1 error will be
> increased with each test. So, a valid thing to do would be to sample
> (computation for this problem won't be expensive so do alotta reps) and
> compare your mean to the null distribution of means. I.E.
>
> nreps=1
> mean.dist=rep(NA,nreps)
>
> for(replication in 1:nreps)
> {
> my.sample=sample(population, 2500, replace=T)
> #replace could be false, depends on preference
> mean.for.rep=mean(my.sample) #mean for this replication
> mean.dist[replication]=mean.for.rep #store the mean
> }
>
> hist(mean.dist,main="Null Dist of Means", col="chartreuse")
> #Show the means in a nifty color
>
> You can then perform various tests given the null distribution, or infer
> from where your sample mean lies within the distribution or something to
> that effect. Also, if the distribution is normal, which is somewhat likely
> since it is a distribution of means (shapiro.test or require(nortest)
> ad.test will let you know), you should be able to make inference from that
> using parametric methods (once) which will fit the truth a bit better than a t.test.
> Hope that's helpful,
> Ken Hutchison
>
> On Sat, Oct 8, 2011 at 10:04 AM, francy wrote:
>
>> Hi,
>>
>> I am having trouble understanding how to approach a simulation:
>>
>> I have a sample of n=250 from a population of N=2,000 individuals, and I
>> would like to use either a permutation test or the bootstrap to test whether this
>> particular sample is significantly different from the values of any other
>> random samples of the same population. I thought I needed to take random
>> samples (but I am not sure how many simulations I need to do) of n=250 from
>> the N=2,000 population and maybe do a one-sample t-test to compare the mean
>> score of all the simulated samples, plus the one sample I am trying to prove
>> is different from any others, to the mean value of the population. But
>> I don't know:
>> (1) whether this one-sample t-test would be the right way to do it, and how
>> to go about doing this in R
>> (2) whether a permutation test or bootstrap methods are more appropriate
>>
>> This is the data frame that I have, which is to be sampled:
>> df<-
>> i.e.
>> x y
>> 1 2
>> 3 4
>> 5 6
>> 7 8
>> . .
>> . .
>> . .
>> 2,000
>>
>> I have this sample from df, and would like to test whether it has extreme
>> values of y.
>> sample1<-
>> i.e.
>> x y
>> 3 4
>> 7 8
>> . .
>> . .
>> . .
>> 250
>>
>> For now I only have this:
>>
>> R=999 #Number of simulations, but I don't know how many...
>> t.values =numeric(R) #creates a numeric vector with 999 elements, which
>> will hold the results of each simulation.
>> for (i in 1:R) {
>> sample1 <- df[sample(nrow(df), 250, replace=TRUE),]
>>
>> But I don't know how to continue the
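A minimal sketch (not from the thread itself) of the approach Ken describes, using a made-up population vector pop_y in place of the real data and 10000 replications as a purely illustrative choice:

## Sketch: build a null distribution of means of random samples of size 250
## drawn without replacement from the population, then locate the observed
## sample mean within that distribution.
set.seed(1)
pop_y <- rexp(2000)                 # hypothetical, skewed population values
obs   <- mean(sample(pop_y, 250))   # stands in for the sample being tested

nreps     <- 10000
mean.dist <- replicate(nreps, mean(sample(pop_y, 250, replace = FALSE)))

hist(mean.dist, main = "Null Dist of Means", col = "chartreuse")

## Where does the observed mean fall?  Note that sd(mean.dist) is already a
## standard error, so it is not divided by sqrt(n) again.
z <- (obs - mean(mean.dist)) / sd(mean.dist)
p <- mean(abs(mean.dist - mean(mean.dist)) >= abs(obs - mean(mean.dist)))
c(z = z, p = p)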
Re: [R] Permutation or Bootstrap to obtain p-value for one sample
Dear Peter and Tim, Thank you very much for taking the time to explain this to me! It is much clearer now. And sorry for maybe using the space here inappropriately; I really hope this is OK and gets posted. I think it is really important that non-statisticians like myself get a good idea of the concepts behind the functions of R. I am really grateful you went through this with me. -f

2011/10/9 Tim Hesterberg
> I'll concur with Peter Dalgaard that
> * a permutation test is the right thing to do - your problem is equivalent
>   to a two-sample test,
> * don't bootstrap, and
> * don't bother with t-statistics
> but I'll elaborate a bit on why, including
> * two approaches to the whole problem - and how your approach relates
>   to the usual approach,
> * an interesting tidbit about resampling t statistics.
>
> First, I'm assuming that your x variable is irrelevant, only y matters,
> and that sample1 is a proper subset of df. I'll also assume that you
> want to look for differences in the mean, rather than arbitrary differences
> (in which case you might use e.g. a Kolmogorov-Smirnov test).
>
> There are two general approaches to this problem:
> (1) two-sample problem, sample1$y vs df$y[rows other than sample 1]
> (2) the approach you outlined, thinking of sample1$y as part of df$y.
>
> Consider (1), and call the two data sets y1 and y2.
> The basic permutation test approach is:
> * compute the test statistic theta(y1, y2), e.g. mean(y1)-mean(y2)
> * repeat 999 (or 9999) times:
>     draw a sample of size n1 from the pooled data, call that Y1, call the rest Y2
>     compute theta(Y1, Y2)
> * the P-value for a one-sided test is (1 + k) / (1 + number of resamples),
>   where k is the number of permutation samples with theta(Y1,Y2) >= theta(y1,y2)
>
> The test statistic could be
>   mean(y1) - mean(y2)
>   mean(y1)
>   sum(y1)
>   t-statistic (pooled variance)
>   P-value for a t-test (pooled variance)
>   mean(y1) - mean(pooled data)
>   t-statistic (unpooled variance)
>   P-value for a t-test (unpooled variance)
>   median(y1) - median(y2)
>   ...
> The first six of those are equivalent - they give exactly the same P-value
> for the permutation test. The reason is that those test statistics
> are monotone transformations of each other, given the data.
> Hence, doing the pooled-variance t calculations gains nothing.
>
> Now consider your approach (2). That is equivalent to the permutation
> test using the test statistic mean(y1) - mean(pooled data).
>
> Why not a bootstrap? E.g. pool the data and draw samples of size
> n1 and n2 from the pooled data, independently and with replacement.
> That is similar to the permutation test, but less accurate. Probably
> the easiest way to see this is to suppose there is 1 outlier in the pooled data.
> In any permutation iteration there is exactly 1 outlier among the two samples.
> With bootstrapping, there could be 0, 1, 2, or more.
> The permutation test answers the question: given that there is exactly
> 1 outlier in my combined data, what is the probability that random chance
> would give a difference as large as I observed? The bootstrap would
> answer some other question.
>
> Tim Hesterberg
> NEW! Mathematical Statistics with Resampling and R, Chihara & Hesterberg
> http://www.amazon.com/Mathematical-Statistics-Resampling-Laura-Chihara/dp/1118029852/ref=sr_1_1?ie=UTF8
> http://home.comcast.net/~timhesterberg
> (resampling, water bottle rockets, computers to Guatemala, shower = 2650 light bulbs, ...)
> >On Oct 8, 2011, at 16:04 , francy wrote:
> >
> >> Hi,
> >>
> >> I am having trouble understanding how to approach a simulation:
> >>
> >> I have a sample of n=250 from a population of N=2,000 individuals, and I
> >> would like to use either a permutation test or the bootstrap to test whether this
> >> particular sample is significantly different from the values of any other
> >> random samples of the same population. I thought I needed to take random
> >> samples (but I am not sure how many simulations I need to do) of n=250 from
> >> the N=2,000 population and maybe do a one-sample t-test to compare the mean
> >> score of all the simulated samples, plus the one sample I am trying to prove
> >> is different from any others, to the mean value of the population. But
> >> I don't know:
> >> (1) whether this one-sample t-test would be the right way to do it, and how
> >> to go about doing this in R
> >> (2) whether a permutation test or bootstrap methods are more appropriate
> >>
> >> This is the data frame that I have, which is to be sampled:
> >> df<-
> >> i.e.
> >> x y
> >> 1 2
> >> 3 4
> >> 5 6
> >> 7 8
> >> . .
> >> . .
> >> . .
> >> 2,000
> >>
> >> I have this sample from df, and would like to test whether it has extreme
> >> values of y.
> >> sample1<-
> >> i.e.
> >> x y
> >> 3 4
> >> 7 8
> >> . .
> >> . .
> >> . .
> >> 250
> >>
> >> For now I only have this:
> >>
> >> R=999 #Number of simulations, but I don't know how many...
> >> t.values =numeric(R) #crea
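A minimal sketch (not from the thread) of the two-sample permutation test Tim describes, using theta = mean(y1) - mean(y2) and hypothetical vectors y1 (the sample of 250) and y2 (the remaining 1750 values):

## Sketch: permutation test for a difference in means between a subset and
## the rest of the population values.
set.seed(1)
y1 <- rexp(250)    # hypothetical: the sample being tested
y2 <- rexp(1750)   # hypothetical: the remaining population values

theta_obs <- mean(y1) - mean(y2)
pooled    <- c(y1, y2)
n1        <- length(y1)

R <- 9999
theta_perm <- replicate(R, {
  idx <- sample(length(pooled), n1)            # random relabelling
  mean(pooled[idx]) - mean(pooled[-idx])
})

## One-sided p-value using the (1 + k) / (1 + R) convention from the post
p_value <- (1 + sum(theta_perm >= theta_obs)) / (1 + R)
p_value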
[R] Mean or mode imputation for missing values
Dear R experts, I have a large database made up of mixed data types (numeric, character, factor, ordinal factor) with missing values, and I am looking for a package that would help me impute the missing values using either the mean (if numeric) or the mode (if character/factor). I could maybe use replacement like this:

df$var[is.na(df$var)] <- mean(df$var, na.rm = TRUE)

and go through all the many different variables of the dataset using the mean or the mode for each, but I was wondering whether there is a faster way, or whether a package exists to automate this (doing 'mode' if the column is a factor or character and 'mean' if it is numeric). I have tried the package "dprep" because I wanted to use the function "ce.mimp", but unfortunately it is not available anymore. Thank you for your help, -francy __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
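A minimal sketch (not part of the original thread, and hedged accordingly, since proper multiple imputation is usually preferable) of one way to automate mean/mode filling over all columns of a data frame; the Mode() helper below is an assumption, written here so the sketch is self-contained:

## Sketch: fill NAs column by column -- mean for numeric columns, most
## frequent value for factor/character columns.
Mode <- function(x) {
  x <- x[!is.na(x)]
  if (length(x) == 0) return(NA)        # nothing observed: leave NA
  names(which.max(table(x)))            # first mode if there are ties
}

impute_simple <- function(df) {
  for (j in seq_along(df)) {
    miss <- is.na(df[[j]])
    if (!any(miss)) next
    if (is.numeric(df[[j]])) {
      df[[j]][miss] <- mean(df[[j]], na.rm = TRUE)
    } else {
      df[[j]][miss] <- Mode(df[[j]])    # works for character and factor columns
    }
  }
  df
}

## Usage: df_imputed <- impute_simple(df) on any mixed-type data frame.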
Re: [R] Mean or mode imputation for missing values
Yes, thank you Gu… I am just trying to do this as a rough step and will try other, more appropriate imputation methods later. I am just learning R, and was trying to write the for loop and if-statement by hand, but something is going wrong… This is what I have so far.

A fake data frame:

age <- c(5,8,10,12,NA)
a <- factor(c("aa", "bb", NA, "cc", "cc"))
b <- c("banana", "apple", "pear", "grape", NA)
df_test <- data.frame(age=age, a=a, b=b)
df_test$b <- as.character(df_test$b)

for (var in 1:ncol(df_test)) {
  if (class(df_test$var)=="numeric") {
    df_test$var[is.na(df_test$var)] <- mean(df_test$var, na.rm = TRUE)
  } else if (class(df_test$var)=="character") {
    Mode(df_test$var[is.na(df_test$var)], na.rm = TRUE)
  }
}

where 'Mode' is the function:

function (x, na.rm) {
  xtab <- table(x)
  xmode <- names(which(xtab == max(xtab)))
  if (length(xmode) > 1) xmode <- ">1 mode"
  return(xmode)
}

It seems to just ignore the statements, though, without giving any error… Does anybody have any idea what is going on? Thank you very much for all the great help! -f

2011/10/11 Weidong Gu :
> In your case, it may not be sensible to simply fill missing values by
> mean or mode, as multiple imputation has become the norm these days. For
> your specific question, na.roughfix in the randomForest package would do
> the work.
>
> Weidong Gu
>
> On Tue, Oct 11, 2011 at 8:11 AM, francesca casalino wrote:
>> Dear R experts,
>>
>> I have a large database made up of mixed data types (numeric,
>> character, factor, ordinal factor) with missing values, and I am
>> looking for a package that would help me impute the missing values
>> using either the mean if numerical or the mode if character/factor.
>>
>> I maybe could use replace like this:
>> df$var[is.na(df$var)] <- mean(df$var, na.rm = TRUE)
>> And go through all the many different variables of the datasets using
>> mean or mode for each, but I was wondering if there was a faster way,
>> or if a package existed to automate this (by doing 'mode' if it is a
>> factor or character or 'mean' if it is numeric)?
>>
>> I have tried the package "dprep" because I wanted to use the function
>> "ce.mimp", but unfortunately it is not available anymore.
>>
>> Thank you for your help,
>> -francy
>>
>> __
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
> __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
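A likely explanation, offered as a hedged aside rather than as part of the thread: inside the loop, var is a column number, and df_test$var looks for a column literally named "var", which does not exist, so every class() test sees NULL and both branches are skipped silently. Indexing by position with [[ ]] fixes this. A minimal sketch reusing the df_test example and the Mode() from the message above (which ignores its na.rm argument, so one argument is enough):

## Sketch: same loop, but indexing columns with [[ ]] and assigning the
## result of Mode() back into the column.
for (var in seq_along(df_test)) {
  miss <- is.na(df_test[[var]])
  if (!any(miss)) next
  if (is.numeric(df_test[[var]])) {
    df_test[[var]][miss] <- mean(df_test[[var]], na.rm = TRUE)
  } else if (is.character(df_test[[var]]) || is.factor(df_test[[var]])) {
    df_test[[var]][miss] <- Mode(df_test[[var]][!miss])   # fill with the mode
  }
}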
[R] Creating data frame with residuals of a data frame
Dear experts, I am trying to create a data frame from the residuals I get after applying a linear regression to each column of a data frame, but I don't know how to build this data frame from the resulting list, since its elements have differing numbers of values. For example:

age <- c(5,6,10,14,16,NA,18)
value1 <- c(30,70,40,50,NA,NA,NA)
value2 <- c(2,4,1,4,4,4,4)
df <- data.frame(age, value1, value2)

# Run a linear regression to adjust for age and get residuals:
lm_f <- function(x) {
  x <- residuals(lm(data=df, formula= x ~ age))
}
resid <- apply(df,2,lm_f)
resid <- resid[-1]

Then resid is a list whose elements have different lengths:

$value1
         1          2          3          4
-16.945813  22.906404  -7.684729   1.724138

$value2
          1           2           3           4           5           7
-0.37398374  1.50406504 -1.98373984  0.52845528  0.28455285  0.04065041

I am trying to get both the original variables and their residuals into the same data frame, like this: age, value1, value2, resid_value1, resid_value2. But when I try cbind or other operations I get an error message because they do not have the same number of rows. Can you please help me figure out how to solve this? Thank you. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
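A minimal sketch (not from the thread) of one way to keep the residuals aligned with the original rows: fit with na.action = na.exclude, so residuals() returns a vector padded with NA in the positions that were dropped, and cbind then works directly:

## Sketch: residuals padded back to the full number of rows via na.exclude.
age    <- c(5, 6, 10, 14, 16, NA, 18)
value1 <- c(30, 70, 40, 50, NA, NA, NA)
value2 <- c(2, 4, 1, 4, 4, 4, 4)
df <- data.frame(age, value1, value2)

resid_pad <- sapply(c("value1", "value2"), function(v) {
  residuals(lm(df[[v]] ~ age, data = df, na.action = na.exclude))
})

out <- cbind(df,
             resid_value1 = resid_pad[, "value1"],
             resid_value2 = resid_pad[, "value2"])
out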
[R] Order a data frame based on the order of another data frame
Hi, I am trying to match the order of the rownames of a data frame to the rownames of another data frame (I can't simply sort both sets, because I would then have to change the order of many other connected datasets). Also, the second dataset (snp.matrix$fam) is a snp matrix slot. For example, data_one is:

                        x             y           z
sample_1110001 -0.3352623  -1.141462    -0.4032494
sample_1110005  0.1862424   0.015944     0.1329059
sample_1110420  0.1309120   0.004005596  0.06117253
sample_2220017  0.1145205  -0.125090054  0.04957881

rownames(snp.matrix$fam)
[1] "sample_2220017" "sample_1110420" "sample_1110001"
[4] "sample_1110005"

I would like data_one to look like this:

                        x             y           z
sample_2220017  0.1145205  -0.125090054  0.04957881
sample_1110420  0.1309120   0.004005596  0.06117253
sample_1110001 -0.3352623  -1.141462    -0.4032494
sample_1110005  0.1862424   0.015944     0.1329059

I have tried these, but they don't work:

data_one[order(rownames(snp.matrix$fam)),]
data_one[rownames(data_one)[order(rownames(snp.matrix$fam))],]

Thank you for your help! __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
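A minimal sketch (not from the thread): since data frames can be indexed directly by rowname, the reordering only needs the target rownames, or equivalently match():

## Sketch: put data_one's rows in the order given by snp.matrix$fam's rownames.
target <- rownames(snp.matrix$fam)

data_one_ordered <- data_one[target, ]                            # rowname indexing
## or, equivalently:
data_one_ordered <- data_one[match(target, rownames(data_one)), ]

## The attempts with order() fail because order(rownames(snp.matrix$fam))
## sorts the target names alphabetically instead of describing how to
## rearrange data_one.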
[R] master thesis
Hi, For my master thesis I have 24 micro-plots on which I took measurements over 3 months. The measurements were:
- Rainfall and runoff events throughout the 3 months (runoff being dependent on the rainfall; a coefficient (%) has been computed per rainfall event and per 3 months)
- Soil texture (3 different textures were differentiated)
- Slope (3 classes of slopes)
- Stoniness (one-time measurement)
- Random roughness (throughout 3 months)
- Land use (crop land or grazing land)
- Vegetation cover (throughout 3 months)
- Vegetation height (throughout 3 months, only measured on cropland)
- Antecedent moisture content (throughout 3 months)
Now I would like to investigate the effect of all these variables on the rainfall/runoff. For example, does a steeper slope have a larger effect on the runoff than the soil texture? What are the possibilities in R? Thank you for any feedback, Francesca [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
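One possible starting point, offered as an outside suggestion rather than as part of the thread: treat the per-event runoff coefficient as the response and regress it on the plot-level predictors, using a mixed model to account for repeated events on the same plot. The sketch below uses entirely hypothetical column names (runoff_df, runoff_coef, plot_id, etc.) and the lme4 package as one common choice:

## Sketch (hypothetical data frame 'runoff_df', one row per plot x event).
library(lme4)

m <- lmer(runoff_coef ~ slope_class + texture + stoniness + roughness +
            land_use + veg_cover + antecedent_moisture + rainfall_mm +
            (1 | plot_id),
          data = runoff_df)
summary(m)

## A plain linear model (ignoring the repeated-measures structure) would be
## lm(runoff_coef ~ slope_class + texture + ..., data = runoff_df)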
[R] XIII GRASS and GFOSS italian Meeting
Dear all, we would like to point out the approaching XIII GRASS and GFOSS Italian Meeting, which will take place at the University of Trieste from Wednesday, February 15 until Friday, February 17, 2012. Abstracts can be sent to gr...@units.it until January 20, while registration is open up to February 6, 2012. All important information regarding the meeting can be found at http://sites.google.com/site/grassts/ Kind Regards, Dipartimento di Scienze della Vita, Università degli Studi di Trieste, via Weiss 2, 34127 Trieste, tel. 040 5582072, fax 040 5582011, mail: gr...@units.it http://sites.google.com/site/grassts/ [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Help Package 'CLIMTRENDS' from Archive
Hello, I need to use the 'climtrend' library, which is no longer available on CRAN. I downloaded and installed it from the archive on my PC, but it doesn't work: it tells me it "can't find the function ...". What should I do? I absolutely need to use it; besides installing it, what do I have to do to use it? Thank you in advance for your kindness. Regards, Francesca __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
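A minimal sketch (not from the thread, with a hypothetical file name) of the usual way to install an archived CRAN package: download the source tarball from the CRAN archive, install it from the local file, and then attach it with library() in every session before calling its functions; a "can't find the function" message often just means the package was never attached or did not actually install:

## Sketch: installing an archived CRAN package from its source tarball.
## "climtrends_1.0.tar.gz" is a placeholder -- use the file actually
## downloaded from https://cran.r-project.org/src/contrib/Archive/ ,
## and match the library() call to the package's real name.
install.packages("climtrends_1.0.tar.gz", repos = NULL, type = "source")

library(climtrends)          # must be run in every new R session
## ls("package:climtrends")  # lists the functions the package provides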
[R] Thank you 4 Davide
Hello, The problem was that version 4.0.4 did not support the package, so I tried several old versions until 3.6.2, which installs both climtrend and Rcmdr with its graphical interface!! Solved, and thanks again, Davide!! Francesca (from Italy) [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Create a numeric series in an efficient way
Dear Contributors, I am trying to create a numeric series with repeated numbers. It is not a difficult task, but I do not seem to find an efficient way. This is my solution:

blocB <- c(rep(x = 1, times = 84), rep(x = 2, times = 84), rep(x = 3, times = 84), rep(x = 4, times = 84), rep(x = 5, times = 84), rep(x = 6, times = 84), rep(x = 7, times = 84), rep(x = 8, times = 84), rep(x = 9, times = 84), rep(x = 10, times = 84), rep(x = 11, times = 84), rep(x = 12, times = 84), rep(x = 13, times = 84))

which works, but it is very clumsy, and I need to create other similar variables, changing the number of repetitions (84 in this case). Thanks for any help. F. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
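A minimal sketch (not part of the original message) of the idiomatic one-liner: rep() with the each argument, wrapped in a tiny helper so the block size can vary:

## Sketch: 1..13, each repeated 84 times -- same values as the long c(rep(...)) call.
blocB <- rep(1:13, each = 84)

## Parameterised version for other variables that need a different block size:
make_blocks <- function(n_groups, block_size) rep(seq_len(n_groups), each = block_size)
blocC <- make_blocks(13, 56)   # e.g. 13 groups of 56 (illustrative numbers)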
Re: [R] Create a numeric series in an efficient way
I apologize, I solved the problem, sorry for that. f. Il giorno gio 13 giu 2024 alle ore 16:42 Francesca PANCOTTO < francesca.panco...@unimore.it> ha scritto: > Dear Contributors > I am trying to create a numeric series with repeated numbers, not > difficult task, but I do not seem to find an efficient way. > > This is my solution > > blocB <- c(rep(x = 1, times = 84), rep(x = 2, times = 84), rep(x = 3, > times = 84), rep(x = 4, times = 84), rep(x = 5, times = 84), rep(x = 6, > times = 84), rep(x = 7, times = 84), rep(x = 8, times = 84), rep(x = 9, > times = 84), rep(x = 10, times = 84), rep(x = 11, times = 84), rep(x = 12, > times = 84), rep(x = 13, times = 84)) > > which works but it is super silly and I need to create different variables > similar to this, changing the value of the repetition, 84 in this case. > Thanks for any help. > > > F. > [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Fill NA values in columns with values of another column
Dear Contributors, I have a problem with a database composed of many individuals observed over many periods, for which I need to manipulate the data as follows. Here I report the procedure for the first 32 observations of the first period.

cbind(VB1d[,1],s1id[,1])
      [,1] [,2]
 [1,]    6    8
 [2,]    9    5
 [3,]   NA    1
 [4,]    5    6
 [5,]   NA    7
 [6,]   NA    2
 [7,]    4    4
 [8,]    2    7
 [9,]    2    7
[10,]   NA    3
[11,]   NA    2
[12,]   NA    4
[13,]    5    6
[14,]    9    5
[15,]   NA    5
[16,]   NA    6
[17,]   10    3
[18,]    7    2
[19,]    2    1
[20,]   NA    7
[21,]    7    2
[22,]   NA    8
[23,]   NA    4
[24,]   NA    5
[25,]   NA    6
[26,]    2    1
[27,]    4    4
[28,]    6    8
[29,]   10    3
[30,]   NA    3
[31,]   NA    8
[32,]   NA    1

In column s1id I have numbers from 1 to 8, which are the ids of 8 groups, randomly mixed within the larger group of 32. For each group, I want the value of VB1d that is reported for only two of the group members to be assigned to all four group members. For example, the value 8 in the first row, second column, identifies group 8. The VB1d value for group 8 is 6. At row 28, again with s1id equal to 8, I have 6. But at row 22, where the second column is again 8, VB1d is NA. The situation is the same in every group: only two rows carry the correct number, the other two are NA. I need each group, identified by the values of s1id, to carry the VB1d value that is present for just two of its members. I hope my explanation is acceptable. The task looks complex to me right now, especially because I will need to repeat this procedure for 12 x 14 similar databases. Has anyone ever encountered a similar problem? Thanks in advance for any help provided.
-- Francesca Pancotto, Associate Professor, Political Economy, University of Modena, Largo Santa Eufemia, 19, Modena. Office Phone: +39 0522 523264. Web: https://sites.google.com/view/francescapancotto/home
[[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide https://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
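A minimal sketch (not part of the original message) of one way to do the fill with ave(), which applies a function within each s1id group; it assumes that within a group the non-NA values of VB1d are all equal, as in the example above:

## Sketch: replace every value in a group by the group's (unique) non-NA value.
fill_group <- function(v) {
  known <- v[!is.na(v)]
  if (length(known) == 0) return(v)     # group with no observed value: leave NA
  rep(known[1], length(v))
}

VB1d_filled <- ave(VB1d[, 1], s1id[, 1], FUN = fill_group)
cbind(VB1d_filled, s1id[, 1])

## For the other 12 x 14 databases, the same line can be wrapped in a loop or
## in mapply() over the corresponding columns of VB1d and s1id.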