[R] GGPlot plot

2018-07-18 Thread Francesca
Dear R help,

I am new to ggplot so I apologize if my question is a bit obvious.

I would like to create a plot where I compare the fraction of the values of a
variable called PASP out of the number of subjects, for two groups of subjects
coded with a dummy variable called SUBJC.

The variable PASP is discrete and only takes the values 0, 4, 8, ...

My data are as follows:

 

PASP   SUBJC
0      0
4      1
0      0
8      0
4      0
0      1
0      1
.      .
.      .
.      .




I would like to calculate the fraction of positive levels of PASP out of the
total number of observations, split by SUBJC = 0 and 1. I am new to the use of
ggplot and I do not know how to organize the data or what to use to summarize
it so as to obtain a picture like the following:





I hope my request is clear. Thanks for any help you can provide.

Francesca



__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] GGPlot plot

2018-07-18 Thread Francesca
Thanks for the answer.

On Thu, 19 Jul 2018 at 01:04, Jim Lemon wrote:

> Hi Francesca,
> This looks like a fairly simple task. Try this:
>
> fpdf<-read.table(text="PASP   SUBJC
>  0  0
>  4  1
>  0  0
>  8  0
>  4  0
>  0  1
>  0  1",
>  header=TRUE)
> # get the number of positive PASP results by group
> ppos<-by(fpdf$PASP>0,fpdf$SUBJC,sum)
> # get the number of subjects per group
> spg<-c(sum(fpdf$SUBJC==0),sum(fpdf$SUBJC==1))
> barplot(ppos/spg,names.arg=c(0,1),xlab="Group",
>  ylab="Proportion PASP > 0",main="Proportion of PASP positive by group")
>
> Jim
>
> On Thu, Jul 19, 2018 at 2:47 AM, Francesca 
> wrote:
> > Dear R help,
> >
> > I am new to ggplot so I apologize if my question is a bit obvious.
> >
> > I would like to create a plot where a compare the fraction of the values
> of a variable called PASP out of the number of subjects, for two groups of
> subject codified with a dummy variable called SUBJC.
> >
> > The variable PASP is discrete and only takes values 0,4,8..
> >
> > My data are as following:
> >
> >
> >
> > PASP   SUBJC
> >
> >
> >
> > 0  0
> >
> > 4  1
> >
> > 0  0
> >
> > 8  0
> >
> > 4  0
> >
> > 0  1
> >
> > 0  1
> >
> > .   .
> >
> > .   .
> >
> > .   .
> >
> >
> >
> >
> > I would like to calculate the fraction of positive levels of PASP out of
> the total number of observations, divided per values of SUBJ=0 and 1. I am
> new to the use of GGPlot and I do not know how to organize the data and
> what to use to summarize these data as to obtain a picture as follows:
> >
> >
> >
> >
> >
> > I hope my request is clear. Thanks for any help you can provide.
> >
> > Francesca
> >
> >
> >
> > __
> > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>
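
A roughly equivalent ggplot2 version (an untested sketch, assuming the same
fpdf data frame as above, and that the quantity wanted is the proportion of
subjects with PASP > 0 in each SUBJC group) might be:

library(ggplot2)
library(dplyr)

# proportion of positive PASP per group, then one bar per group
fpdf %>%
  group_by(SUBJC) %>%
  summarise(prop_pos = mean(PASP > 0)) %>%
  ggplot(aes(x = factor(SUBJC), y = prop_pos)) +
  geom_col() +
  labs(x = "Group", y = "Proportion PASP > 0")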


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Collecting output of regressions in an intelligent way

2015-06-25 Thread Francesca
Dear R Contributors
I am asking for some suggestions on how to organize the output of a series of
regressions and tests in an intelligent way.
I estimate a series of VAR models with increasing numbers of lags and then
perform a Wald test to check for Granger causality: I would like to learn a
way to do it that does not force me to produce copy-and-paste code.

This is what I do. First, estimate VAR models with an increasing number of lags:

V.6<-VAR(cbind(index1,ma_fin),p=6,type="both")
V.7<-VAR(cbind(index1,ma_fin),p=7,type="both")
V.8<-VAR(cbind(index1,ma_fin),p=8,type="both")
V.9<-VAR(cbind(index1,ma_fin),p=9,type="both")

then inspect the results and check the significance of the regressors:

summary(V.6)
summary(V.7)
summary(V.8)
summary(V.9)
summary(V.10)

then use the estimated VARs to perform the tests:

wald_fin7.1<-wald.test(b=coef(V.7$varresult[[1]]),
Sigma=vcov(V.7$varresult[[1]]), Terms=c(2,4,6,8,10,12))
wald_fin8.1<-wald.test(b=coef(V.8$varresult[[1]]),
Sigma=vcov(V.8$varresult[[1]]), Terms=c(2,4,6,8,10,12,14))
wald_fin9.1<-wald.test(b=coef(V.9$varresult[[1]]),
Sigma=vcov(V.9$varresult[[1]]), Terms=c(2,4,6,8,10,12,14,16))
wald_fin10.1<-wald.test(b=coef(V.10$varresult[[1]]),
Sigma=vcov(V.10$varresult[[1]]), Terms=c(2,4,6,8,10,12,14,16,18))

# then collect the test results in a table:

wald_fin<-rbind(wald_fin7.1$result$chi2,
wald_fin12.1$result$chi2,wald_fin21.1$result$chi2,
wald_fin7.2$result$chi2,
wald_fin12.2$result$chi2,wald_fin21.2$result$chi2)


My idea is that it should be possible to create all these variables with a
loop over the object names, but that requires a level of coding much higher
than my personal knowledge and ability.
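
For example, something along these lines might work (an untested sketch,
assuming VAR() from the vars package and wald.test() from the aod package, as
in the calls above):

library(vars)
library(aod)

lags <- 6:10
y <- cbind(index1, ma_fin)

# fit one VAR per lag order and keep them in a named list
models <- lapply(lags, function(p) VAR(y, p = p, type = "both"))
names(models) <- paste0("V.", lags)

# inspect the significance of the regressors
summaries <- lapply(models, summary)

# Wald test on the first equation of each model, using the same Terms
# pattern as in the calls above (even positions up to 2*(p - 1))
wald_fin <- lapply(seq_along(lags), function(i) {
  eq1 <- models[[i]]$varresult[[1]]
  wald.test(b = coef(eq1), Sigma = vcov(eq1),
            Terms = seq(2, 2 * (lags[i] - 1), by = 2))
})

# collect the chi-squared results in one table
wald_tab <- do.call(rbind, lapply(wald_fin, function(w) w$result$chi2))
rownames(wald_tab) <- names(models)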

I hope someone can help.

Thanks in advance


-- 

Francesca

--
Francesca Pancotto, PhD
Università di Modena e Reggio Emilia
Viale A. Allegri, 9
40121 Reggio Emilia
Office: +39 0522 523264
Web: https://sites.google.com/site/francescapancotto/
--


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Compressing code help in a loop

2014-11-10 Thread Francesca
Dear Contributors

I have a problem with a loop.

I need to create a variable that takes values 1, 2, ..., 19 corresponding to
the values of a variable in a data frame, p_int$p_made, which takes values
from 406 to 211.

The problem is that these values get ordered in the wrong way when I try to
compress the loop, because the system reads them as

107, 111, 207, 211, 311, 406, 407, 408, 409, 410, 411, ...

while they correspond to quarter-year codes, so they should be ordered as

406-107-207-307-407...

The only solution I found is really clumsy. It is the following:




p_m <- matrix(0, dim(p_int)[1], 1)

for (i in 1:length(p_int$p_made)) {
  if (p_int$p_made[i] == 406) p_m[i] <- 1  else
  if (p_int$p_made[i] == 107) p_m[i] <- 2  else
  if (p_int$p_made[i] == 207) p_m[i] <- 3  else
  if (p_int$p_made[i] == 307) p_m[i] <- 4  else
  if (p_int$p_made[i] == 407) p_m[i] <- 5  else
  if (p_int$p_made[i] == 108) p_m[i] <- 6  else
  if (p_int$p_made[i] == 208) p_m[i] <- 7  else
  if (p_int$p_made[i] == 308) p_m[i] <- 8  else
  if (p_int$p_made[i] == 408) p_m[i] <- 9  else
  if (p_int$p_made[i] == 109) p_m[i] <- 10 else
  if (p_int$p_made[i] == 209) p_m[i] <- 11 else
  if (p_int$p_made[i] == 309) p_m[i] <- 12 else
  if (p_int$p_made[i] == 409) p_m[i] <- 13 else
  if (p_int$p_made[i] == 110) p_m[i] <- 14 else
  if (p_int$p_made[i] == 210) p_m[i] <- 15 else
  if (p_int$p_made[i] == 310) p_m[i] <- 16 else
  if (p_int$p_made[i] == 410) p_m[i] <- 17 else
  if (p_int$p_made[i] == 111) p_m[i] <- 18 else
  if (p_int$p_made[i] == 211) p_m[i] <- 19
}

Can anyone help me find something more efficient?
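
For example (an untested sketch, assuming p_int$p_made only contains the codes
listed above), the whole loop could be replaced by a single match() against
the desired ordering:

# desired quarter-year ordering, written out once
quarter_order <- c(406, 107, 207, 307, 407, 108, 208, 308, 408,
                   109, 209, 309, 409, 110, 210, 310, 410, 111, 211)

# match() returns 1 for 406, 2 for 107, ..., 19 for 211
p_m <- match(p_int$p_made, quarter_order)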


Thanks in advance.


Francesca

-- 

Francesca

--
Francesca Pancotto
Associate Professor
University of Modena and Reggio Emilia
Viale A. Allegri, 9
40121 Reggio Emilia
Office: +39 0522 523264
Web: https://sites.google.com/site/francescapancotto/
--


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Dates in a data.frame

2014-11-17 Thread Francesca

Dear Contributors

I have a problem concerning the replication of a variable with the date
structure. I have the following database of 12000 observations,
bank.list.m:

        name  date
aba.1   ABA   2006-10-24
aba.2   ABA   2006-11-30
aba.3   ABA   2006-10-24
aba.4   ABA   2006-11-30
aba.5   ABA   2006-10-24
aba.6   ABA   2006-11-30
aba.7   ABA   2006-10-24
aba.8   ABA   2006-11-30
aba.9   ABA   2006-10-24
aba.10  ABA   2006-11-30


and the following, day.spot, with 960 obs.:

  date        spot
1 2006-01-02  1.1826
2 2006-01-03  1.1875
3 2006-01-04  1.2083
4 2006-01-05  1.2088
5 2006-01-06  1.2093
6 2006-01-09  1.2078


The dates in the second data frame are a subset of the dates in the first.
What I need to do is to attach the value of the variable spot reported in the
second data frame to the row with the corresponding date in the first.

I tried the following:



dates <- table(bank.list.m$date)
test <- as.data.frame(dates)
dates.v <- as.Date(test$Var1)
x <- as.data.frame(dates.v)
x$index <- c(1:960)
x$spot.v <- day.spot$spot[x$index]


but I do not seem to go anywhere.

I think I only replicated the values of the day.spot variable.
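
Would something along these lines be the right direction (a rough, untested
sketch, assuming both date columns are of class Date)?

# look up each date of bank.list.m in day.spot and copy the matching spot
bank.list.m$spot <- day.spot$spot[match(bank.list.m$date, day.spot$date)]

# or, keeping all rows of bank.list.m, the same thing with merge():
# bank.list.m <- merge(bank.list.m, day.spot, by = "date", all.x = TRUE)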

Any help?

Thanks for your time and patience!

Francesca

-- 

Francesca

------
Francesca Pancotto, PhD
Università di Modena e Reggio Emilia
Viale A. Allegri, 9
40121 Reggio Emilia
Office: +39 0522 523264
Web: https://sites.google.com/site/francescapancotto/
--


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Two geom_bar with counts to put in the same plot

2019-12-03 Thread Francesca
Dear Contributors,
I would like to ask for help on how to create a plot that is the overlay of
two other plots.
It is a geom_bar structure, where I want to count the occurrences of two
variables, participation1 and participation2, which I recoded as the factors
ParticipantsNOPUN and ParticipantsPUN to have nice names in the legend.
The variables to "count" in the two plots are delta11_L and delta2_L.
These are my data and code to create the two plots. I would like to put them
in the same plot as superimposed areas, so that I can see the change in the
distribution of counts between the two cases.
This is DB:

      participation1 participation2 ParticipantsNOPUN ParticipantsPUN delta11_L delta2_L
 [1,]              1              1                 2               2         0        0
 [2,]              1              1                 2               2       -10      -10
 [3,]              1              1                 2               2       -10        0
 [4,]              1              1                 2               2         0        0
 [5,]              1              1                 2               2         0        0
 [6,]              1              1                 2               2         0        0
 [7,]              1              0                 2               1       -30       30
 [8,]              1              1                 2               2         0       10
 [9,]              1              0                 2               1        10       40
[10,]              1              1                 2               2         0        0
[11,]              0              0                 1               1        20        0
[12,]              1              1                 2               2        10        0
[13,]              1              1                 2               2         0        0
[14,]              1              1                 2               2         0        0
[15,]              1              1                 2               2        20       10
[16,]              1              1                 2               2         0        0
[17,]              1              1                 2               2         0        0
[18,]              1              1                 2               2       -10       30
[19,]              0              0                 1               1        30       10
[20,]              1              1                 2               2        10       10
[21,]              1              1                 2               2         0        0
[22,]              1              1                 2               2         0        0
[23,]              1              1                 2               2         0      -10
[24,]              1              1                 2               2         0      -20
[25,]              1              1                 2               2        10      -10
[26,]              1              1                 2               2         0        0
[27,]              1              1                 2               2         0        0

First plot (I need to subset the data to eliminate some NAs; NB: the two
data frames end up not having the same number of rows for this reason):

ggplot(data = subset(DB, !is.na(participation1)),
       aes(x = delta11_L, fill = ParticipantsNOPUN)) +
  geom_bar(position = "dodge") + theme_bw(base_size = 12) +
  labs(x = "Delta Contributions (PGG w/out punishment)") +
  theme(legend.position = "top", legend.title = element_blank()) +
  scale_fill_brewer(palette = "Set1")

Second plot:

ggplot(DB, aes(x = delta2_L, fill = ParticipantsPUN)) +
  geom_bar(position = "dodge") + theme_bw(base_size = 12) +
  labs(x = "Delta Contributions (PGG w/punishment)") +
  theme(legend.position = "top", legend.title = element_blank()) +
  scale_fill_brewer(palette = "Set1")



Is it possible to create a density plot of the two count distributions on the
same plot?
Do I need to create a count variable, or reshape the data to long format?
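
For example, a rough sketch of the long-format idea (untested, and assuming DB
has been converted to a data frame) could be:

library(tidyr)
library(ggplot2)

# stack delta11_L and delta2_L into one column, with a label column "measure"
DBlong <- pivot_longer(DB, cols = c(delta11_L, delta2_L),
                       names_to = "measure", values_to = "delta")

ggplot(subset(DBlong, !is.na(delta)), aes(x = delta, fill = measure)) +
  geom_bar(position = "dodge") +
  theme_bw(base_size = 12) +
  scale_fill_brewer(palette = "Set1")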
Thanks


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Two geom_bar with counts to put in the same plot

2019-12-03 Thread Francesca
Hi,
here it is. Thanks!

dput(DATASET)
structure(c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 
1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 
1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 1, 
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 
1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 0, 1, 1, 1, 
1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 
0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 0, 1, 1, 0, 
1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 
1, 0, 0, 1, 0, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 
1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 
1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 
1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 
1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 
2, 1, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 
2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 
2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 
2, 1, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 
2, 2, 2, 2, 2, 2, 2, 1, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 
2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 
2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 2, 2, 2, 2, 2, 2, 
1, 2, 1, 2, 1, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 
2, 2, 2, 2, 1, 1, 2, 2, 1, 2, 1, 2, 2, 1, 2, 2, 2, 2, 2, 1, 2, 
2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 
1, 2, 1, 2, 1, 2, 2, 1, 2, 2, 1, 2, 2, 1, 1, 2, 2, 2, 2, 1, 2, 
2, 1, 2, 2, 2, 2, 2, 2, 2, 1, 1, 2, 1, 1, 2, 2, 1, 2, 2, 2, 2, 
2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 
2, 2, 2, 2, 2, 1, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 
1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 
2, 2, 2, 2, 1, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 1, 1, 2, 2, 2, 
1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 0, 
-10, -10, 0, 0, 0, -30, 0, 10, 0, 20, 10, 0, 0, 20, 0, 0, -10, 
30, 10, 0, 0, 0, 0, 10, 0, 0, 0, 0, 0, 0, 0, 0, -10, 0, 0, 0, 
40, -10, 0, 10, 0, 10, 0, -20, 0, 0, 0, 10, -20, 10, -10, 40, 
-10, -10, 10, 20, 10, 0, 0, 0, 0, 0, 0, -10, 0, 0, 20, 0, 0, 
0, 0, 10, 0, 0, 0, 10, 0, -10, 10, 0, 0, 10, 10, 10, 0, 0, 0, 
0, 0, -10, 0, 0, 0, 20, 0, 0, 20, 0, 0, 0, 0, 0, 0, 0, 10, 0, 
10, 0, 0, 0, 20, -20, 0, 0, -10, 0, 0, 0, 0, -10, 10, 0, 20, 
0, 0, 0, 0, 0, -10, 0, 0, 0, 0, 0, -10, 0, 0, -10, 0, -10, 30, 
-10, 0, 0, 10, -10, 0, -10, -10, 0, 10, 0, 0, 0, 0, 0, 0, 0, 
0, 10, 0, 0, 10, 10, 0, 0, -20, -10, 0, 0, 0, 0, 0, 0, 10, 30, 
40, 30, 30, 30, 30, 20, 20, 40, 20, 20, 10, 20, 30, 20, 40, 20, 
30, 20, 30, 20, 20, 30, 20, 40, 10, 20, 10, 30, 30, 30, 30, 10, 
30, 30, 20, 10, 40, 30, 40, 40, 30, 20, 10, 10, 20, 20, 30, 40, 
40, 40, 40, 0, 20, 20, 40, 10, 20, 20, 10, 0, -10, 0, 0, 0, 0, 
30, 10, 40, 0, 0, 0, 0, 0, 10, 0, 0, 30, 10, 10, 0, 0, -10, -20, 
-10, 0, 0, 0, -10, 10, 0, 40, 0, 30, 0, 10, 0, 40, 0, 0, -10, 
0, 10, 40, -10, 0, 0, 0, 10, 0, 10, -10, 40, 10, 20, 10, 40, 
0, 10, -10, 0, 40, 0, 0, -10, 0, 0, 20, -10, 0, 10, 0, 30, -10, 
0, 0, 0, -10, 40, 10, 10, 0, 10, -10, 0, 10, 0, 10, 0, -20, 20, 
0, 0, -20, 20, 0, -30, 20, 0, 0, 20, 10, 0, 20, 30, 0, 0, -10, 
10, 10, 0, -10, 40, 10, 0, 10, 0, 0, 20, 10, 20, 30, 0, 40, 30, 
0, 20, 40, -10, 0, 0, 0, -10, 0, 20, -10, 0, 0, 10, 0, 0, 20, 
-20, -20, 0, 20, 0, 0, 10, 0, -10, -10, 20, -10, 0, 0, 0, 0, 
0, 0, 0, -10, 30, 10, 0, 0, 10, 20, 10, -10, 10, 0, 0, -10, 30, 
-20, 10, 0, 0, 0, 10, 10, 10, 10, -10, 0, 20, 10, 10, 10, 0, 
-10, -10, 0, 0, 10, 20, 0, -10, 10, 0, 10, 20, 10, 0, 0, 0, 0, 
10, 10, 10, 30, 10, 0, 0, -10, 40, 0, 0, 10, 10, 40, 30, -10, 
0, 0, 10, 20, 0, 0, 10, 40, 0, 0, -10, -20), .Dim = c(236L, 6L
), .Dimnames = list(NULL, c("participation1", "participation2", 
"ParticipantsNOPUN", "ParticipantsPUN", "delta

Re: [R] Two geom_bar with counts to put in the same plot

2019-12-04 Thread Francesca
Hi!
It is not exactly what I wanted but more than I suspected I could get.
Thanks a lot, this is awesome!
Francesca

On Wed, 4 Dec 2019 at 14:04, Rui Barradas  wrote:

> Hello,
>
> Please keep R-Help in the thread.
>
> As for the question, the following divides by facets, participation1/2
> with values 0/1. See if that's what you want.
>
>
> idv <- grep("part", names(DB)[-(3:4)], ignore.case = TRUE, value = TRUE)
> dblong <- reshape2::melt(DB[-(3:4)], id.vars = idv)
> dblong <- reshape2::melt(dblong, id.vars = c("variable", "value"))
> names(dblong) <- c("deltaVar", "delta", "participationVar",
> "participation")
> dblong <- dblong[complete.cases(dblong),]
>
> ggplot(dblong, aes(x = delta, fill = deltaVar)) +
>geom_density(aes(alpha = 0.2)) +
>scale_alpha_continuous(guide = "none") +
>facet_wrap(participationVar ~ participation)
>
>
> Hope this helps,
>
> Rui Barradas
>
> Às 08:25 de 04/12/19, Francesca escreveu:
> > Dear  Rui
> > the code works and the final picture is aesthetical as I wanted(very
> > beautiful indeed), but I probably did not explain that the two
> > distributions that I want to overlap, must be different by participation
> > 1 and participation 2, which are to dummy variables that identify :
> > Participation 1(equivalent to PARTICIPATIONNOPUN): 1 participants, 0 non
> > participants, for the variable delta11_L
> > Participation 2(equivalent to PARTICIPATIONPUN): 1 participants, 0 non
> > participants, for the variable delta2_L
> >
> > The density plots are four in the end rather than 2: I compare delta11_L
> > for Participants1 vsnon participants and delta2_L for Participants 2 vs
> > non Participants 2,
> > I basically want to verify whether the population of Participants vs Non
> > participants, change going from delta11_L to delta2_L
> >
> >
> > Sorry for being unclear.
> > Thanks for any help.
> > Francesca
> >
> > On Wed, 4 Dec 2019 at 09:16, Rui Barradas  > <mailto:ruipbarra...@sapo.pt>> wrote:
> >
> > Hello,
> >
> > Is it as simple as this? The code below does not separate the
> > participant1 and participant2, only the 'delta' variables.
> >
> >
> > idv <- grep("part", names(DB)[-(3:4)], ignore.case = TRUE, value =
> TRUE)
> > dblong <- reshape2::melt(DB[-(3:4)], id.vars = idv)
> > head(dblong)
> >
> > ggplot(dblong, aes(x = value, fill = variable)) +
> > geom_density(aes(alpha = 0.2)) +
> > scale_alpha_continuous(guide = "none")
> >
> >
> > I will also repost the data, since you have posted a matrix and this
> > code needs a data.frame.
> >
> >
> > DB <-
> > structure(list(participation1 = c(1, 1, 1, 1, 1, 1, 1, 1, 1,
> > 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
> > 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
> > 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1,
> > 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0,
> > 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
> > 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
> > 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
> > 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, NA,
> > NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
> > NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
> > NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
> > NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), participation2 = c(1,
> > 1, 1, 1, 1, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1,
> > 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1,
> > 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1,
> > 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1,
> > 1, 1, 1, 1, 1, 0, 1, 0, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 0, 0, 1,
> > 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 0, 1, 1,
> > 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1,
> > 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1,
> > 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1,
> > 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1,
> > 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
> > 1, 1, 1, 1), ParticipantsNOPUN = structure(

Re: [R] Two geom_bar with counts to put in the same plot

2019-12-04 Thread Francesca
Hi, sorry for bothering again.
I was wondering how I can reshape the data if, with your code, I want only two
panels: in the panel with Participation = 0 I would represent delta11_L for
participation1 == 0 together with delta2_L for participation2 == 0, and in the
right panel, with Participation = 1, I would represent together delta11_L for
participation1 == 1 and delta2_L for participation2 == 1.

I get mixed up with the joint melting of the participation variables, which
determines the facets: I then cannot assign the proper fill to the density
plots, which depends on them, while still having the mixed participation
groups in the same plot.

I hope it is clear. 
Nonetheless, the previous plot is useful to understand something I had not 
thought about.
Thanks again for your time.
F.
--

> On 4 Dec 2019, at 15:27, Francesca wrote:
> 
> Hi!
> It is not exactly what I wanted but more than I suspected I could get. Thanks 
> a lot, this is awesome!
> Francesca
> 
> On Wed, 4 Dec 2019 at 14:04, Rui Barradas  <mailto:ruipbarra...@sapo.pt>> wrote:
> Hello,
> 
> Please keep R-Help in the thread.
> 
> As for the question, the following divides by facets, participation1/2 
> with values 0/1. See if that's what you want.
> 
> 
> idv <- grep("part", names(DB)[-(3:4)], ignore.case = TRUE, value = TRUE)
> dblong <- reshape2::melt(DB[-(3:4)], id.vars = idv)
> dblong <- reshape2::melt(dblong, id.vars = c("variable", "value"))
> names(dblong) <- c("deltaVar", "delta", "participationVar", "participation")
> dblong <- dblong[complete.cases(dblong),]
> 
> ggplot(dblong, aes(x = delta, fill = deltaVar)) +
>geom_density(aes(alpha = 0.2)) +
>scale_alpha_continuous(guide = "none") +
>facet_wrap(participationVar ~ participation)
> 
> 
> Hope this helps,
> 
> Rui Barradas
> 
> Às 08:25 de 04/12/19, Francesca escreveu:
> > Dear  Rui
> > the code works and the final picture is aesthetical as I wanted(very 
> > beautiful indeed), but I probably did not explain that the two 
> > distributions that I want to overlap, must be different by participation 
> > 1 and participation 2, which are to dummy variables that identify :
> > Participation 1(equivalent to PARTICIPATIONNOPUN): 1 participants, 0 non 
> > participants, for the variable delta11_L
> > Participation 2(equivalent to PARTICIPATIONPUN): 1 participants, 0 non
> > participants, for the variable delta2_L
> > 
> > The density plots are four in the end rather than 2: I compare delta11_L 
> > for Participants1 vsnon participants and delta2_L for Participants 2 vs
> > non Participants 2,
> > I basically want to verify whether the population of Participants vs Non 
> > participants, change going from delta11_L to delta2_L
> > 
> > 
> > Sorry for being unclear.
> > Thanks for any help.
> > Francesca
> > 
> > On Wed, 4 Dec 2019 at 09:16, Rui Barradas  > <mailto:ruipbarra...@sapo.pt> 
> > <mailto:ruipbarra...@sapo.pt <mailto:ruipbarra...@sapo.pt>>> wrote:
> > 
> > Hello,
> > 
> > Is it as simple as this? The code below does not separate the
> > participant1 and participant2, only the 'delta' variables.
> > 
> > 
> > idv <- grep("part", names(DB)[-(3:4)], ignore.case = TRUE, value = TRUE)
> > dblong <- reshape2::melt(DB[-(3:4)], id.vars = idv)
> > head(dblong)
> > 
> > ggplot(dblong, aes(x = value, fill = variable)) +
> > geom_density(aes(alpha = 0.2)) +
> > scale_alpha_continuous(guide = "none")
> > 
> > 
> > I will also repost the data, since you have posted a matrix and this
> > code needs a data.frame.
> > 
> > 
> > DB <-
> > structure(list(participation1 = c(1, 1, 1, 1, 1, 1, 1, 1, 1,
> > 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
> > 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
> > 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1,
> > 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0,
> > 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
> > 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
> > 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
> > 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, NA,
> > NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
> > NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N

Re: [R] Two geom_bar with counts to put in the same plot

2019-12-05 Thread Francesca
Exactly. I was trying to remelt the data in the right way, but I could not get
there yet. Can you suggest the code for this?
Thanks a lot
F.

--

> On 5 Dec 2019, at 11:11, Jim Lemon wrote:
> 
> Hi Francesca,
> Do you want something like this?
> 
> Jim
> 
> On Thu, Dec 5, 2019 at 6:58 PM Francesca  wrote:
>> 
>> Hi, sorry for bothering again.
>> I was wondering how I can reshape the data, if in your code,
>> I would like to have only two panels, where in the panel with Participation 
>> =0, I represent delta11_L of participation1==0
>> and delta2_L of participation2==0, and in the right panel, I want 
>> Participation=1, but representing together
>> delta11_L of participation1==1, and delta2_L of participation2==1.
>> 
>> I get messed up with the joint melting of participation, which determines 
>> the facet, but then I cannot assign the proper fill to the density plots 
>> which depend on it, and on the other hand I would like to have in the same 
>> plot with mixed participation.
>> 
>> I hope it is clear.
>> Nonetheless, the previous plot is useful to understand something I had not 
>> thought about.
>> Thanks again for your time.
>> F.
>> --
>> 
> 



__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Two geom_bar with counts to put in the same plot

2019-12-05 Thread Francesca
This is a consolation, because I cannot get it in ggplot either!
Thanks for the code!

F.

> On 5 Dec 2019, at 11:17, Jim Lemon wrote:
> 
> Sorry it's not ggplot, I couldn't work that one out.
> 
> # using the data frame structure that Rui kindly added
> # and perhaps Rui can work out how to do this in ggplot
> DBcomplete<-DB[complete.cases(DB),]
> library(plotrix)
> png("fp.png")
> par(mfrow=c(1,2))
> density11_0<-density(DBcomplete$delta11_L[DBcomplete$participation1==0])
> density2_0<-density(DBcomplete$delta2_L[DBcomplete$participation1==0])
> plot(0,xlim=c(-30,50),ylim=c(0,max(density11_0$y)),type="n",
> xlab="delta",ylab="density",main="participation == 0")
> plot_bg("lightgray")
> grid(col="white")
> polygon(density11_0,col="#ff773344")
> polygon(density2_0,col="#3377ff44")
> density11_1<-density(DBcomplete$delta11_L[DBcomplete$participation1==1])
> density2_1<-density(DBcomplete$delta2_L[DBcomplete$participation1==1])
> plot(0,xlim=c(-30,50),ylim=c(0,max(density11_1$y)),type="n",
> xlab="delta",ylab="density",main="participation == 1")
> plot_bg("lightgray")
> grid(col="white")
> polygon(density11_1,col="#ff773344")
> polygon(density2_1,col="#3377ff44")
> par(cex=0.9)
> legend(5,0.11,c("delta11_L","delta2_L"),fill=c("#ff773344","#3377ff44"))
> dev.off()
> 
> Jim
> 
> On Thu, Dec 5, 2019 at 9:14 PM Francesca  wrote:
>> 
>> Exactly. I was trying to remelt data in the right way, but I could not get 
>> there yet. Can you suggest me this code?
>> Thanks a lot
>> F.
>> 
>> --
>> 
>> Il giorno 5 dic 2019, alle ore 11:11, Jim Lemon  ha 
>> scritto:
>> 
>> Hi Francesca,
>> Do you want something like this?
>> 
>> Jim
>> 
>> On Thu, Dec 5, 2019 at 6:58 PM Francesca  
>> wrote:
>> 
>> 
>> Hi, sorry for bothering again.
>> I was wondering how I can reshape the data, if in your code,
>> I would like to have only two panels, where in the panel with Participation 
>> =0, I represent delta11_L of participation1==0
>> and delta2_L of participation2==0, and in the right panel, I want 
>> Participation=1, but representing together
>> delta11_L of participation1==1, and delta2_L of participation2==1.
>> 
>> I get messed up with the joint melting of participation, which determines 
>> the facet, but then I cannot assign the proper fill to the density plots 
>> which depend on it, and on the other hand I would like to have in the same 
>> plot with mixed participation.
>> 
>> I hope it is clear.
>> Nonetheless, the previous plot is useful to understand something I had not 
>> thought about.
>> Thanks again for your time.
>> F.
>> --
>> 
>> 
>> 
>> 



__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] "And" condition spanning over multiple columns in data frame

2024-09-12 Thread Francesca
Dear contributors,
I need to create a set of columns based on conditions on a data frame, as
follows.
I have managed to do the trick for one column, but I cannot seem to find any
good example where the condition is extended to the whole data frame.

I have this data frame, called c10Dt:



  id cp1 cp2 cp3 cp4 cp5 cp6 cp7 cp8 cp9 cp10 cp11 cp12
1  1  NA  NA  NA  NA  NA  NA  NA  NA  NA   NA   NA   NA
2  4   8  18  15  10  12  11   9  18   8   16   15   NA
3  3   8   5   5   4  NA   5  NA   6  NA   10   10   10
4  3   5   5   4   4   3   2   1   3   2    1    1    2
5  1  NA  NA  NA  NA  NA  NA  NA  NA  NA   NA   NA   NA
6  2   5   5  10  10   9  10  10  10  NA   10    9   10

Columns are id, cp1, cp2.. and so on.

What I need to do is the following, made on just one column:

c10Dt <-  mutate(c10Dt, exit1= ifelse(is.na(cp1) & id!=1, 1, 0))

So I create a new variable, called exit1: the program takes cp1, checks
whether it is NA, and if it is NA while the value of the column "id" is not 1,
it returns 1, otherwise 0.
So what I want is to flag all the cases in which id = 2, 3, or 4 and the
corresponding value in the matrix is NA.
I managed to do it manually, column by column, but I feel there should be
something smarter here.

The problem is that I need to replicate this over all the columns from cp2 to
cp12, while keeping the id column fixed.

I have tried with

c10Dt %>%
  mutate(x=across(starts_with("cp"), ~ifelse(. == NA)) & id!=1,1,0 )

but the problem with across() is that it applies the condition only to the cp
columns. How do I tell R to use the column id together with all the other
columns?
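
Is something along these lines the right way to do it (an untested sketch,
assuming a recent dplyr with the .names argument to across())?

library(dplyr)

# inside across() the other columns, such as id, are still visible,
# so the condition can combine the current cp column with id
c10Dt <- c10Dt %>%
  mutate(across(starts_with("cp"),
                ~ ifelse(is.na(.x) & id != 1, 1, 0),
                .names = "exit_{.col}"))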


Thanks for any help provided.


Francesca


--


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide https://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] (no subject)

2024-09-16 Thread Francesca
Dear Contributors,
I hope someone has found a similar issue.

I have this data set,



   cp1 cp2 role groupid
1   10  13    4       5
2    5  10    3       1
3    7   7    4       6
4   10   4    2       7
5    5   8    3       2
6    8   7    4       4
7    8   8    4       7
8   10  15    3       3
9   15  10    2       2
10   5   5    2       4
11  20  20    2       5
12   9  11    3       6
13  10  13    4       3
14  12   6    4       2
15   7   4    4       1
16  10   0    3       7
17  20  15    3       8
18  10   7    3       4
19   8  13    3       5
20  10   9    2       6



I need to take averages by group, using the values of the column groupid, and
create a twin dataset in which the group mean replaces the individual values.
So, for example, for groupid 3 I calculate the mean (12+18)/2 and then, in the
new data frame, I put the corresponding mean in the same positions, instead of
12 and 18.
I found this solution, where db10_means is the output dataset, db10 is my
initial data.

db10_means<-db10 %>%
  group_by(groupid) %>%
  mutate(across(starts_with("cp"), list(mean = mean)))

It works perfectly, except for NA values: it assigns NA to all group members,
while in some cases the group is made of some NAs and some values.
So, when I have a group of two values and one NA, I would like the mean to be
assigned to those with a value, and NA to those with NA.
Here the mean function is called without the na.rm = TRUE option, and it is
not obvious how to pass it in this case; I am not even sure that this alone
would solve my problem.
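
Would passing na.rm through a small anonymous function be enough, for example
(an untested sketch)?

library(dplyr)

db10_means <- db10 %>%
  group_by(groupid) %>%
  mutate(across(starts_with("cp"),
                list(mean = ~ mean(.x, na.rm = TRUE))))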
Thanks for any help provided.

-- 

Francesca


--


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide https://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] (no subject)

2024-09-16 Thread Francesca
Sorry for posting unreadable data; on my screen the dataset looked correct.


I recreated my dataset, following your example:

test <- data.frame(cbind(c( 8,  8,  5,  5, NA, NA,  1, 15, 20,  5, NA, 17,
                            2,  5,  5,  2,  5, NA,  5, 10, 10,  5, 12, NA),
                         c(18,  5,  5,  5, NA,  9,  2,  2, 10,  7,  5, 19,
                           NA, 10, NA,  4, NA,  8, NA,  5, 10,  3, 17, NA),
                         c( 4,  3,  3,  2,  2,  4,  3,  3,  2,  4,  4,  3,
                            4,  4,  4,  2,  2,  3,  2,  3,  3,  2,  2,  4),
                         c( 3,  8,  1,  2,  4,  2,  7,  6,  3,  5,  1,  3,
                            8,  4,  7,  5,  8,  5,  1,  2,  4,  7,  6,  6)))
colnames(test) <- c("cp1", "cp2", "role", "groupid")

What I have done so far is the following, which works:

test %>%
  group_by(groupid) %>%
  mutate(across(starts_with("cp"), list(mean = mean)))

But the problem is with NAs: every time the mean encounters an NA, it produces
NA for all group members.
I need the software to calculate the mean ignoring NAs: when the group is made
of three people, the mean of the three; if the group is two values and an NA,
the mean of the two.

My code works: it creates a mean at each position for the three subjects,
replacing each individual value with the group mean.
But when an NA appears, the whole group gets NA.

Perhaps there is a different way to obtain the same result.



On Mon, 16 Sept 2024 at 11:35, Rui Barradas  wrote:

> Às 08:28 de 16/09/2024, Francesca escreveu:
> > Dear Contributors,
> > I hope someone has found a similar issue.
> >
> > I have this data set,
> >
> >
> >
> > cp1
> > cp2
> > role
> > groupid
> > 1
> > 10
> > 13
> > 4
> > 5
> > 2
> > 5
> > 10
> > 3
> > 1
> > 3
> > 7
> > 7
> > 4
> > 6
> > 4
> > 10
> > 4
> > 2
> > 7
> > 5
> > 5
> > 8
> > 3
> > 2
> > 6
> > 8
> > 7
> > 4
> > 4
> > 7
> > 8
> > 8
> > 4
> > 7
> > 8
> > 10
> > 15
> > 3
> > 3
> > 9
> > 15
> > 10
> > 2
> > 2
> > 10
> > 5
> > 5
> > 2
> > 4
> > 11
> > 20
> > 20
> > 2
> > 5
> > 12
> > 9
> > 11
> > 3
> > 6
> > 13
> > 10
> > 13
> > 4
> > 3
> > 14
> > 12
> > 6
> > 4
> > 2
> > 15
> > 7
> > 4
> > 4
> > 1
> > 16
> > 10
> > 0
> > 3
> > 7
> > 17
> > 20
> > 15
> > 3
> > 8
> > 18
> > 10
> > 7
> > 3
> > 4
> > 19
> > 8
> > 13
> > 3
> > 5
> > 20
> > 10
> > 9
> > 2
> > 6
> >
> >
> >
> > I need to to average of groups, using the values of column groupid, and
> > create a twin dataset in which the mean of the group is replaced instead
> of
> > individual values.
> > So for example, groupid 3, I calculate the mean (12+18)/2 and then I
> > replace in the new dataframe, but in the same positions, instead of 12
> and
> > 18, the values of the corresponding mean.
> > I found this solution, where db10_means is the output dataset, db10 is my
> > initial data.
> >
> > db10_means<-db10 %>%
> >group_by(groupid) %>%
> >mutate(across(starts_with("cp"), list(mean = mean)))
> >
> > It works perfectly, except that for NA values, where it replaces to all
> > group members the NA, while in some cases, the group is made of some NA
> and
> > some values.
> > So, when I have a group of two values and one NA, I would like that for
> > those with a value, the mean is replaced, for those with NA, the NA is
> > replaced.
> > Here the mean function has not the na.rm=T option associated, but it
> > appears that this solution cannot be implemented in this case. I am not
> > even sure that this would be enough to solve my problem.
> > Thanks for any help provided.
> >
> Hello,
>
> Your data is a mess, please don't post html, this is plain text only
> list. Anyway, I managed to create a data frame by copying the data to a
> file named "rhelp.txt" and then running
>
>
>
> db10 <- scan(file = "rhelp.txt", what = character())
> header <- db10[1:4]
> db10 <- db10[-(1:4)] |> as.numeric()
> db10 <- matrix(db10, ncol = 4L, byrow = TRUE) |>
>as.data.frame() |>
>setNames(header)
>
> str(db10)
> #> 'data.frame':25 obs. of  4 variables:
> #>  $ cp1: num  1 5 3 7 10 5 2 4 8 10 ...
> #>  $ cp2: num  10 2 1 4

Re: [R] (no subject)

2024-09-16 Thread Francesca
If all are NA, it is NA.


On Mon, 16 Sep 2024 at 16:29, Bert Gunter wrote:

> See the na.rm argument of ?mean
>
> But what happens if all values are NA?
>
> -- Bert
>
>
> On Mon, Sep 16, 2024 at 7:24 AM Francesca 
> wrote:
> >
> > Sorry for posting a non understandable code. In my screen the dataset
> > looked correctly.
> >
> >
> > I recreated my dataset, folllowing your example:
> >
> > test<-data.frame(matrix(c( 8,  8,  5 , 5 ,NA ,NA , 1, 15, 20,  5, NA, 17,
> >  2 , 5 , 5,  2 , 5 ,NA,  5 ,10, 10,  5 ,12, NA),
> > c( 18,  5,  5,  5, NA,  9,  2,  2, 10,  7 , 5,
> 19,
> > NA, 10, NA, 4, NA,  8, NA,  5, 10,  3, 17, NA),
> > c( 4, 3, 3, 2, 2, 4, 3, 3, 2, 4, 4 ,3, 4, 4, 4,
> 2,
> > 2, 3, 2, 3, 3, 2, 2 ,4),
> > c(3, 8, 1, 2, 4, 2, 7, 6, 3, 5, 1, 3, 8, 4, 7, 5,
> > 8, 5, 1, 2, 4, 7, 6, 6)))
> > colnames(test)<-c("cp1","cp2","role","groupid")
> >
> > What I have done so far is the following, that works:
> >  test %>%
> >   group_by(groupid) %>%
> >   mutate(across(starts_with("cp"), list(mean = mean)))
> >
> > But the problem is with NA: everytime the mean encounters a NA, it
> creates
> > NA for all group members.
> > I need the software to calculate the mean ignoring NA. So when the group
> is
> > made of three people, mean of the three.
> > If the group is two values and an NA, calculate the mean of two.
> >
> > My code works , creates a mean at each position for three subjects,
> > replacing instead of the value of the single, the group mean.
> > But when NA appears, all the group gets NA.
> >
> > Perhaps there is a different way to obtain the same result.
> >
> >
> >
> > On Mon, 16 Sept 2024 at 11:35, Rui Barradas 
> wrote:
> >
> > > Às 08:28 de 16/09/2024, Francesca escreveu:
> > > > Dear Contributors,
> > > > I hope someone has found a similar issue.
> > > >
> > > > I have this data set,
> > > >
> > > >
> > > >
> > > > cp1
> > > > cp2
> > > > role
> > > > groupid
> > > > 1
> > > > 10
> > > > 13
> > > > 4
> > > > 5
> > > > 2
> > > > 5
> > > > 10
> > > > 3
> > > > 1
> > > > 3
> > > > 7
> > > > 7
> > > > 4
> > > > 6
> > > > 4
> > > > 10
> > > > 4
> > > > 2
> > > > 7
> > > > 5
> > > > 5
> > > > 8
> > > > 3
> > > > 2
> > > > 6
> > > > 8
> > > > 7
> > > > 4
> > > > 4
> > > > 7
> > > > 8
> > > > 8
> > > > 4
> > > > 7
> > > > 8
> > > > 10
> > > > 15
> > > > 3
> > > > 3
> > > > 9
> > > > 15
> > > > 10
> > > > 2
> > > > 2
> > > > 10
> > > > 5
> > > > 5
> > > > 2
> > > > 4
> > > > 11
> > > > 20
> > > > 20
> > > > 2
> > > > 5
> > > > 12
> > > > 9
> > > > 11
> > > > 3
> > > > 6
> > > > 13
> > > > 10
> > > > 13
> > > > 4
> > > > 3
> > > > 14
> > > > 12
> > > > 6
> > > > 4
> > > > 2
> > > > 15
> > > > 7
> > > > 4
> > > > 4
> > > > 1
> > > > 16
> > > > 10
> > > > 0
> > > > 3
> > > > 7
> > > > 17
> > > > 20
> > > > 15
> > > > 3
> > > > 8
> > > > 18
> > > > 10
> > > > 7
> > > > 3
> > > > 4
> > > > 19
> > > > 8
> > > > 13
> > > > 3
> > > > 5
> > > > 20
> > > > 10
> > > > 9
> > > > 2
> > > > 6
> > > >
> > > >
> > > >
> > > > I need to to average of groups, using the values of column groupid,
> and
> > > > create a twin dataset in which the mean of the group is replaced
> instead
> > > of
> > > > individual 

Re: [R] (no subject)

2024-09-16 Thread Francesca
Sorry, my typing was corrected by the computer.
When there is an NA, there should be a missing value.
So, if a group has two values and an NA, the two that have values should be
replaced by the mean of the two, and the third should stay NA.
The NA is a participant who dropped out.

On Tue, 17 Sept 2024 at 02:27, Bert Gunter  wrote:

> Hmmm... typos and thinkos ?
>
> Maybe:
> mean_narm<- function(x) {
>m <- mean(x, na.rm = T)
>if (is.nan (m)) NA else m
> }
>
> -- Bert
>
> On Mon, Sep 16, 2024 at 4:40 PM CALUM POLWART  wrote:
> >
> > Rui's solution is good.
> >
> > Bert's suggestion is also good!
> >
> > For Berts suggestion you'd make the list bit
> >
> > list(mean = mean_narm)
> >
> > But prior to that define a function:
> >
> > mean_narm<- function(x) {
> >
> > m <- mean(x, na.rm = T)
> >
> > if (!is.Nan (m)) {
> > m <- NA
> > }
> >
> > return (m)
> > }
> >
> > Would do what you suggested in your reply to Bert.
> >
> > On Mon, 16 Sep 2024, 19:48 Rui Barradas,  wrote:
> >
> > > Às 15:23 de 16/09/2024, Francesca escreveu:
> > > > Sorry for posting a non understandable code. In my screen the dataset
> > > > looked correctly.
> > > >
> > > >
> > > > I recreated my dataset, folllowing your example:
> > > >
> > > > test<-data.frame(matrix(c( 8,  8,  5 , 5 ,NA ,NA , 1, 15, 20,  5,
> NA, 17,
> > > >   2 , 5 , 5,  2 , 5 ,NA,  5 ,10, 10,  5 ,12, NA),
> > > >  c( 18,  5,  5,  5, NA,  9,  2,  2, 10,  7 ,
> 5,
> > > 19,
> > > > NA, 10, NA, 4, NA,  8, NA,  5, 10,  3, 17, NA),
> > > >  c( 4, 3, 3, 2, 2, 4, 3, 3, 2, 4, 4 ,3, 4,
> 4, 4,
> > > 2,
> > > > 2, 3, 2, 3, 3, 2, 2 ,4),
> > > >  c(3, 8, 1, 2, 4, 2, 7, 6, 3, 5, 1, 3, 8, 4,
> 7,
> > > 5,
> > > > 8, 5, 1, 2, 4, 7, 6, 6)))
> > > > colnames(test)<-c("cp1","cp2","role","groupid")
> > > >
> > > > What I have done so far is the following, that works:
> > > >   test %>%
> > > >group_by(groupid) %>%
> > > >mutate(across(starts_with("cp"), list(mean = mean)))
> > > >
> > > > But the problem is with NA: everytime the mean encounters a NA, it
> > > creates
> > > > NA for all group members.
> > > > I need the software to calculate the mean ignoring NA. So when the
> group
> > > is
> > > > made of three people, mean of the three.
> > > > If the group is two values and an NA, calculate the mean of two.
> > > >
> > > > My code works , creates a mean at each position for three subjects,
> > > > replacing instead of the value of the single, the group mean.
> > > > But when NA appears, all the group gets NA.
> > > >
> > > > Perhaps there is a different way to obtain the same result.
> > > >
> > > >
> > > >
> > > > On Mon, 16 Sept 2024 at 11:35, Rui Barradas 
> > > wrote:
> > > >
> > > >> Às 08:28 de 16/09/2024, Francesca escreveu:
> > > >>> Dear Contributors,
> > > >>> I hope someone has found a similar issue.
> > > >>>
> > > >>> I have this data set,
> > > >>>
> > > >>>
> > > >>>
> > > >>> cp1
> > > >>> cp2
> > > >>> role
> > > >>> groupid
> > > >>> 1
> > > >>> 10
> > > >>> 13
> > > >>> 4
> > > >>> 5
> > > >>> 2
> > > >>> 5
> > > >>> 10
> > > >>> 3
> > > >>> 1
> > > >>> 3
> > > >>> 7
> > > >>> 7
> > > >>> 4
> > > >>> 6
> > > >>> 4
> > > >>> 10
> > > >>> 4
> > > >>> 2
> > > >>> 7
> > > >>> 5
> > > >>> 5
> > > >>> 8
> > > >>> 3
> > > >>> 2
> > > >>> 6
> > > >>> 8
> > > >>> 7
> > > >>> 4
> > > >>> 4
> > > >>> 7
> > > >>> 8
> > &g

[R] Loops

2013-01-27 Thread Francesca
Dear Contributors,
I am asking for help on how to solve a problem related to for loops, which I
always get confused by.
I would like to perform the following procedure in a compact way.

Consider a matrix p composed of 100 rows and three columns. I need to
calculate the sum over some rows of each column separately, as follows:

fa1<-(colSums(p[1:25,]))
fa2<-(colSums(p[26:50,]))
fa3<-(colSums(p[51:75,]))
fa4<-(colSums(p[76:100,]))
fa5<-(colSums(p[1:100,]))



and then I need to  apply to each of them the following:


fa1b<-c()
for (i in 1:3){
  fa1b[i]<-(100-(100*abs(fa1[i]/sum(fa1[i])-(1/3))))
}

fa2b<-c()
for (i in 1:3){
  fa2b[i]<-(100-(100*abs(fa2[i]/sum(fa2[i])-(1/3))))
}


and so on.

Is there a more efficient way to do this?

Thanks for your time!

Francesca

--
Francesca Pancotto, PhD
Università di Modena e Reggio Emilia
Viale A. Allegri, 9
40121 Reggio Emilia
Office: +39 0522 523264
Web: https://sites.google.com/site/francescapancotto/
--


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Loops

2013-01-28 Thread Francesca
Thanks to you all,
they are very useful and I am learning a lot.
Best,
Francesca

On 27 January 2013 19:20, arun  wrote:

>
>
> Hi,
>
> You could use library(plyr) as well
> library(plyr)
> pnew<-colSums(aaply(laply(split(as.data.frame(p),((1:nrow(as.data.frame(p))-1)%/%
> 25)+1),as.matrix),c(2,3),function(x) x))
> res<-rbind(t(pnew),colSums(p))
> row.names(res)<-1:nrow(res)
> res<- 100-100*abs(res/rowSums(res)-(1/3))
> A.K.
>
>
> - Original Message -
> From: Rui Barradas 
> To: Francesca 
> Cc: r-help@r-project.org
> Sent: Sunday, January 27, 2013 6:17 AM
> Subject: Re: [R] Loops
>
> Hello,
>
> I think there is an error in the expression
>
> 100-(100*abs(fa1[i]/sum(fa1[i])-(1/3)))
>
> Note that fa1[i]/sum(fa1[i]) is always 1. If it's fa1[i]/sum(fa1), try
> the following, using lists to hold the results.
>
>
> # Make up some data
> set.seed(6628)
> p <- matrix(runif(300), nrow = 100)
>
> idx <- seq(1, 100, by = 25)
> fa <- lapply(idx, function(i) colSums(p[i:(i + 24), ]))
> fa[[5]] <- colSums(p)
>
> fab <- lapply(fa, function(x) 100 - 100*abs(x/sum(x) - 1/3))
> fab
>
> You can give names to the lists elements, if you want to.
>
>
> names(fa) <- paste0("fa", 1:5)
> names(fab) <- paste0("fa", 1:5, "b")
>
>
> Hope this helps,
>
> Rui Barradas
>
> Em 27-01-2013 08:02, Francesca escreveu:
> > Dear Contributors,
> > I am asking help on the way how to solve a problem related to loops for
> > that I always get confused with.
> > I would like to perform the following procedure in a compact way.
> >
> > Consider that p is a matrix composed of 100 rows and three columns. I
> need
> > to calculate the sum over some rows of each
> > column separately, as follows:
> >
> > fa1<-(colSums(p[1:25,]))
> >
> > fa2<-(colSums(p[26:50,]))
> >
> > fa3<-(colSums(p[51:75,]))
> >
> > fa4<-(colSums(p[76:100,]))
> >
> > fa5<-(colSums(p[1:100,]))
> >
> >
> >
> > and then I need to  apply to each of them the following:
> >
> >
> > fa1b<-c()
> >
> > for (i in 1:3){
> >
> > fa1b[i]<-(100-(100*abs(fa1[i]/sum(fa1[i])-(1/3
> >
> > }
> >
> >
> > fa2b<-c()
> >
> > for (i in 1:3){
> >
> > fa2b[i]<-(100-(100*abs(fa2[i]/sum(fa2[i])-(1/3
> >
> > }
> >
> >
> > and so on.
> >
> > Is there a more efficient way to do this?
> >
> > Thanks for your time!
> >
> > Francesca
> >
> > --
> > Francesca Pancotto, PhD
> > Università di Modena e Reggio Emilia
> > Viale A. Allegri, 9
> > 40121 Reggio Emilia
> > Office: +39 0522 523264
> > Web: https://sites.google.com/site/francescapancotto/
> > --
> >
> > [[alternative HTML version deleted]]
> >
> >
> >
> > __
> > R-help@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>


-- 

Francesca

--
Francesca Pancotto, PhD
Università di Modena e Reggio Emilia
Viale A. Allegri, 9
40121 Reggio Emilia
Office: +39 0522 523264
Web: https://sites.google.com/site/francescapancotto/
--


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] On the calulation of crossed differences

2013-01-28 Thread Francesca
Dear Contributors,
I am back asking for help concerning the same type of dataset I was asking
about before, in a previous help request.

I needed to sum data over subsamples of three time series, each of them made
of 100 observations. The solutions proposed were various, among them:

db <- p
dim(db) <- c(25, 4, 3)
db2 <- apply(db, c(2, 3), sum)
db3 <- t(apply(db2, 1, function(poff) 100 - (100*abs(poff/sum(poff) - (1/3)))))

My request now is about the function at the end of the calculation of db3.

If, instead of the difference from a fixed number (here 1/3), I need to
calculate the following differences: consider that db3 is a 4x3 matrix; I need
to calculate

(db3[1,1] -db3[1,2])+(db3[1,1] -db3[1,3])*0.5 and store it to a cell,
then

(db3[1,2] -db3[1,1])+(db3[1,2] -db3[1,3])*0.5 and store it to a cell,
then

(db3[1,3] -db3[1,2])+(db3[1,3] -db3[1,2])*0.5 and store it to a cell,
then

and repeat this for each of the four rows of the same matrix. The resulting
matrix should be composed of these distances.
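
Would something like this be a reasonable way to do it (a rough, untested
sketch, assuming the third formula follows the same pattern as the first two,
i.e. each column is compared with the other two taken in ascending order)?

crossed_diff <- function(x) {            # x is one row of db3 (length 3)
  sapply(seq_along(x), function(j) {
    others <- x[-j]
    (x[j] - others[1]) + (x[j] - others[2]) * 0.5
  })
}

db4 <- t(apply(db3, 1, crossed_diff))    # 4 x 3 matrix of the stored values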

I need to repeat this for each of the subsamples. I realize that there are
calculations that are repeated, but I did not find a strategy that does
not require

Francesca


------
Francesca Pancotto, PhD
Università di Modena e Reggio Emilia
Viale A. Allegri, 9
40121 Reggio Emilia
Office: +39 0522 523264
Web: https://sites.google.com/site/francescapancotto/
--


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Selecting elements in lists with a row condition

2014-02-04 Thread Francesca
Dear Contributors
I am asking some advice on how to solve the following problem.
I have a list composed of 78 elements, each of which is a matrix of factors
and numbers, similar to the following

  bank_name  date      px_last_CIB  Q.Y        p_made  p_for
1 CIB        10/02/06  1.33         p406-q406  406     406
2 CIB        10/23/06  1.28         p406-q406  406     406
3 CIB        11/22/06  1.28         p406-q406  406     406
4 CIB        10/02/06  1.35         p406-q107  406     107
5 CIB        10/23/06  1.32         p406-q107  406     107
6 CIB        11/22/06  1.32         p406-q107  406     107


-- 

Francesca

--
Francesca Pancotto, PhD
Università di Modena e Reggio Emilia
Viale A. Allegri, 9
40121 Reggio Emilia
Office: +39 0522 523264
Web: https://sites.google.com/site/francescapancotto/
--


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Selecting elements in lists with a row condition

2014-02-04 Thread Francesca
Dear Contributors
sorry, but the previous message was sent involuntarily.
I am asking some advice on how to solve the following problem.
I have a list composed of 78 elements, each of which is a matrix of factors
and numbers, similar to the following

  bank_name  date      px_last_CIB  Q.Y        p_made  p_for
1 CIB        10/02/06  1.33         p406-q406  406     406
2 CIB        10/23/06  1.28         p406-q406  406     406
3 CIB        11/22/06  1.28         p406-q406  406     406
4 CIB        10/02/06  1.35         p406-q107  406     107
5 CIB        10/23/06  1.32         p406-q107  406     107
6 CIB        11/22/06  1.32         p406-q107  406     107


Each of these matrices changes in the column bank_name and in the suffix _CIB,
which repeats the name in bank_name. Moreover, each matrix has a different
number of rows, so I cannot transform the list into one large matrix.

I need to create a matrix made of the rows of each element of the list that
meet the criterion that the column p_made equals 406. That is, from each
matrix contained in the list, I need to pick the rows that satisfy this
condition.

It seems difficult to me, but perhaps it is super easy.
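
Would something like this work (an untested sketch, assuming the list is
called mylist; the varying px_last_* column name probably needs to be made
uniform before the pieces can be stacked)?

# keep only the p_made == 406 rows of every element of the list
sel <- lapply(mylist, function(d) d[d$p_made == 406, ])

# give every piece the same column names, then stack them
sel <- lapply(sel, function(d)
  setNames(d, c("bank_name", "date", "px_last", "Q.Y", "p_made", "p_for")))
result <- do.call(rbind, sel)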
Thanks for any help you can provide.

Francesca



On 4 February 2014 12:42, Francesca  wrote:

> Dear Contributors
> I am asking some advice on how to solve the following problem.
> I have a list composed of 78 elements, each of which is a matrix of
> factors and numbers, similar to the following
>
> bank_name   date px_last_CIB   Q.Yp_made p_for
> 1   CIB 10/02/061.33 p406-q406406 406
> 2   CIB 10/23/061.28 p406-q406406 406
> 3   CIB 11/22/061.28 p406-q406406 406
> 4   CIB 10/02/061.35 p406-q107406 107
> 5   CIB 10/23/061.32 p406-q107406 107
> 6   CIB 11/22/061.32 p406-q107    406 107
>
>
> --
>
> Francesca
>
> --
> Francesca Pancotto, PhD
> Università di Modena e Reggio Emilia
> Viale A. Allegri, 9
> 40121 Reggio Emilia
> Office: +39 0522 523264
> Web: https://sites.google.com/site/francescapancotto/
> --
>



-- 

Francesca

--
Francesca Pancotto, PhD
Università di Modena e Reggio Emilia
Viale A. Allegri, 9
40121 Reggio Emilia
Office: +39 0522 523264
Web: https://sites.google.com/site/francescapancotto/
--


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Reorganize(stack data) a dataframe inducing names

2011-07-27 Thread Francesca
Dear Contributors,
thanks for your collaboration.
I am trying to reorganize a data frame that looks like this:

  n1.Index  Date      PX_LAST  n2.Index  Date.1    PX_LAST.1  n3.Index  Date.2    PX_LAST.2
1 NA        04/02/07  1.34     NA        04/02/07  1.36       NA        04/02/07  1.33
2 NA        04/09/07  1.34     NA        04/09/07  1.36       NA        04/09/07  1.33
3 NA        04/16/07  1.34     NA        04/16/07  1.36       NA        04/16/07  1.33
4 NA        04/30/07  1.36     NA        04/30/07  1.40       NA        04/30/07  1.37
5 NA        05/07/07  1.36     NA        05/07/07  1.40       NA        05/07/07  1.37
6 NA        05/14/07  1.36     NA        05/14/07  1.40       NA        05/14/07  1.37
7 NA        05/22/07  1.36     NA        05/22/07  1.40       NA        05/22/07  1.37


What I would like to obtain is the stacked data:

n1.Index  Date      PX_LAST
n1.Index  04/02/07  1.34
n1.Index  04/09/07  1.34
n1.Index  04/16/07  1.34
n1.Index  04/30/07  1.36
n1.Index  05/07/07  1.36
n1.Index  05/14/07  1.36
n1.Index  05/22/07  1.36
n2.Index  04/02/07  1.36
n2.Index  04/16/07  1.36
n2.Index  04/16/07  1.36
n2.Index  04/30/07  1.40
n2.Index  05/07/07  1.40
n2.Index  05/14/07  1.40
n2.Index  05/22/07  1.40
n3.Index  04/02/07  1.33
n3.Index  04/16/07  1.33
n3.Index  04/16/07  1.33
n3.Index  04/30/07  1.37

I have tried the function stack, but it uses only one argument. Then I tested
the melt function from the package reshape, but it does not seem to reproduce
the correct organization of the data, as it takes the dates as the id values.
PS: the n1 index names are not ordered in the original data, so I cannot fill
in the NA with the names using a recursive formula.
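
Would stacking the three-column blocks by hand be an option, for example (an
untested sketch, assuming the data frame is called df and its columns come in
groups of three as shown above)?

blocks <- lapply(seq(1, ncol(df), by = 3), function(j) {
  out <- df[, j:(j + 2)]
  names(out) <- c("Index", "Date", "PX_LAST")
  out$Index <- names(df)[j]   # label each block with its original column name
  out
})
stacked <- do.call(rbind, blocks)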
Thank you for any help you can provide.
Francesca

-- 
Francesca

--
Francesca Pancotto, PhD
Dipartimento di Economia
Università di Bologna
Piazza Scaravilli, 2
40126 Bologna
Office: +39 051 2098135
Cell: +39 393 6019138
Web: http://www2.dse.unibo.it/francesca.pancotto/
--


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Reorganize(stack data) a dataframe inducing names

2011-08-01 Thread Francesca
Dear Contributors
thanks for any help you can provide. I searched the threads
but I could not find any query that satisfied my needs.
This is my database:
 index time values
13732  27965  DATA.Q211.SUM.Index  04/08/11  1.42
13733  27974  DATA.Q211.SUM.Index  05/10/11  1.45
13734  27984  DATA.Q211.SUM.Index  06/01/11  1.22
13746  28615  DATA.Q211.TDS.Index  04/07/11  1.35
13747  28624  DATA.Q211.TDS.Index  05/20/11  1.40
13754  29262  DATA.Q211.UBS.Index  05/02/11  1.30
13755  29272  DATA.Q211.UBS.Index  05/03/11  1.48
13761  29915  DATA.Q211.UCM.Index  04/28/11  1.43
13768  30565  DATA.Q211.VDE.Index  05/02/11  1.48
13775  31215  DATA.Q211.WF.Index   04/14/11  1.44
13776  31225  DATA.Q211.WF.Index   05/12/11  1.42
13789  31865  DATA.Q211.WPC.Index  04/01/11  1.40
13790  31875  DATA.Q211.WPC.Index  04/08/11  1.42
13791  31883  DATA.Q211.WPC.Index  05/10/11  1.43
13804  32515  DATA.Q211.XTB.Index  04/29/11  1.50
13805  32525  DATA.Q211.XTB.Index  05/30/11  1.40
13806  32532  DATA.Q211.XTB.Index  06/28/11  1.43

I need to select only the rows of this database that correspond to each
of the first occurrences of the string represented in column
index. In the example shown I would like to obtain a new
data.frame which is

index time values
13732  27965  DATA.Q211.SUM.Index  04/08/11  1.42
13746  28615  DATA.Q211.TDS.Index  04/07/11  1.35
13754  29262  DATA.Q211.UBS.Index  05/02/11  1.30
13761  29915  DATA.Q211.UCM.Index  04/28/11  1.43
13768  30565  DATA.Q211.VDE.Index  05/02/11  1.48
13775  31215  DATA.Q211.WF.Index   04/14/11  1.44
13789  31865  DATA.Q211.WPC.Index  04/01/11  1.40
13804  32515  DATA.Q211.XTB.Index  04/29/11  1.50

As you can see, it is not the whole string that changes, but rather a
substring within it. I want to select only the first row in which each
distinct substring appears.
I know how to select rows according to a substring condition on the
index column, but I cannot use that here because the substring changes
and, moreover, the number of occurrences per substring is variable.

Thank you for any help you can provide.
Francesca

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Reorganize(stack data) a dataframe inducing names

2011-08-02 Thread Francesca
Works perfectly. Thanks.
f.

On 1 August 2011 18:22, jim holtman  wrote:

> Try this:  had to add extra names to your data since it was not clear
> how it was organized.  Next time use 'dput' to enclose data.
>
> > x <- read.table(textConnection(" index time  key date   values
> + 13732  27965 DATA.Q211.SUM.Index 04/08/11 1.42
> + 13733  27974 DATA.Q211.SUM.Index 05/10/11 1.45
> + 13734  27984 DATA.Q211.SUM.Index 06/01/11 1.22
> + 13746  28615 DATA.Q211.TDS.Index 04/07/11 1.35
> + 13747  28624 DATA.Q211.TDS.Index 05/20/11 1.40
> + 13754  29262 DATA.Q211.UBS.Index 05/02/11 1.30
> + 13755  29272 DATA.Q211.UBS.Index 05/03/11 1.48
> + 13761  29915 DATA.Q211.UCM.Index 04/28/11 1.43
> + 13768  30565 DATA.Q211.VDE.Index 05/02/11 1.48
> + 13775  31215 DATA.Q211.WF.Index  04/14/11 1.44
> + 13776  31225 DATA.Q211.WF.Index  05/12/11 1.42
> + 13789  31865 DATA.Q211.WPC.Index 04/01/11 1.40
> + 13790  31875 DATA.Q211.WPC.Index 04/08/11 1.42
> + 13791  31883 DATA.Q211.WPC.Index 05/10/11 1.43
> + 13804  32515 DATA.Q211.XTB.Index 04/29/11 1.50
> + 13805  32525 DATA.Q211.XTB.Index 05/30/11 1.40
> + 13806  32532 DATA.Q211.XTB.Index 06/28/11 1.43")
> + , header = TRUE
> + , as.is = TRUE
> + )
> > closeAllConnections()
> > x
>   index  time key date values
> 1  13732 27965 DATA.Q211.SUM.Index 04/08/11   1.42
> 2  13733 27974 DATA.Q211.SUM.Index 05/10/11   1.45
> 3  13734 27984 DATA.Q211.SUM.Index 06/01/11   1.22
> 4  13746 28615 DATA.Q211.TDS.Index 04/07/11   1.35
> 5  13747 28624 DATA.Q211.TDS.Index 05/20/11   1.40
> 6  13754 29262 DATA.Q211.UBS.Index 05/02/11   1.30
> 7  13755 29272 DATA.Q211.UBS.Index 05/03/11   1.48
> 8  13761 29915 DATA.Q211.UCM.Index 04/28/11   1.43
> 9  13768 30565 DATA.Q211.VDE.Index 05/02/11   1.48
> 10 13775 31215  DATA.Q211.WF.Index 04/14/11   1.44
> 11 13776 31225  DATA.Q211.WF.Index 05/12/11   1.42
> 12 13789 31865 DATA.Q211.WPC.Index 04/01/11   1.40
> 13 13790 31875 DATA.Q211.WPC.Index 04/08/11   1.42
> 14 13791 31883 DATA.Q211.WPC.Index 05/10/11   1.43
> 15 13804 32515 DATA.Q211.XTB.Index 04/29/11   1.50
> 16 13805 32525 DATA.Q211.XTB.Index 05/30/11   1.40
> 17 13806 32532 DATA.Q211.XTB.Index 06/28/11   1.43
> > # get index of first occurance of 'key' column
> > indx <- !duplicated(x$key)
> > x[indx,]
>   index  time key date values
> 1  13732 27965 DATA.Q211.SUM.Index 04/08/11   1.42
> 4  13746 28615 DATA.Q211.TDS.Index 04/07/11   1.35
> 6  13754 29262 DATA.Q211.UBS.Index 05/02/11   1.30
> 8  13761 29915 DATA.Q211.UCM.Index 04/28/11   1.43
> 9  13768 30565 DATA.Q211.VDE.Index 05/02/11   1.48
> 10 13775 31215  DATA.Q211.WF.Index 04/14/11   1.44
> 12 13789 31865 DATA.Q211.WPC.Index 04/01/11   1.40
> 15 13804 32515 DATA.Q211.XTB.Index 04/29/11   1.50
> >
> >
>
>
>
> On Mon, Aug 1, 2011 at 11:13 AM, Francesca 
> wrote:
> > Dear Contributors
> > thanks for any help you can provide. I searched the threads
> > but I could not find any query that satisfied my needs.
> > This is my database:
> >  index time values
> > 13732  27965 DATA.Q211.SUM.Index04/08/11 1.42
> > 13733  27974 DATA.Q211.SUM.Index05/10/11 1.45
> > 13734  27984 DATA.Q211.SUM.Index06/01/11 1.22
> > 13746  28615 DATA.Q211.TDS.Index04/07/11 1.35
> > 13747  28624 DATA.Q211.TDS.Index05/20/11 1.40
> > 13754  29262 DATA.Q211.UBS.Index05/02/11 1.30
> > 13755  29272 DATA.Q211.UBS.Index05/03/11 1.48
> > 13761  29915 DATA.Q211.UCM.Index04/28/11 1.43
> > 13768  30565 DATA.Q211.VDE.Index05/02/11 1.48
> > 13775  31215 DATA.Q211.WF.Index 04/14/11 1.44
> > 13776  31225 DATA.Q211.WF.Index 05/12/11 1.42
> > 13789  31865 DATA.Q211.WPC.Index04/01/11 1.40
> > 13790  31875 DATA.Q211.WPC.Index04/08/11 1.42
> > 13791  31883 DATA.Q211.WPC.Index05/10/11 1.43
> > 13804  32515 DATA.Q211.XTB.Index04/29/11 1.50
> > 13805  32525 DATA.Q211.XTB.Index05/30/11 1.40
> > 13806  32532 DATA.Q211.XTB.Index06/28/11 1.43
> >
> > I need to select only the rows of this database that correspond to each
> > of the first occurrences of the string represented in column
> > index. In the example shown I would like to obtain a new
> > data.frame which is
> >
> > index time values
> > 13732  27965 DATA.Q211.SUM

[R] Simulation over data repeatedly for four loops

2011-11-12 Thread Francesca
Dear Contributors,

I am trying to perform a simulation over sample data, but I need to
reproduce the same simulation over 4 groups of data. My ability with for
loops is very limited, in particular with respect to dimensions, as no
matter what I try I always get

"number of items to replace is not a multiple of replacement length"

This is what I intend to do: replicate this operation four times, where
the index for the four groups is in the part of the code datiPc[[1]][,2].
I have to replicate the following code 4 times, where the changing part
is the data from which I pick the sample, i.e. the data stored in
datiPc[[1]][,2]. If I had to use the data for the four samples, I would
substitute the 1 with a j and replicate the loop four times, but that
never worked.


My desired final outcome is a matrix with 1 observations for each
couple of extracted samples, i.e. 8 columns of 1 observations of means.



db <- c()

# extraction of the samples from the PGG and TRUST data
estr1 <- c()
estr2 <- c()
m1 <- c()
m2 <- c()

tmp1 <- data1[[1]][, 2]
tmp2 <- data2[[2]][, 2]

for (i in 1:100) {
  estr1 <- sample(tmp1, 1000, replace = TRUE)
  estr2 <- sample(tmp2, 1000, replace = TRUE)
  m1[i] <- mean(estr1, na.rm = TRUE)
  m2[i] <- mean(estr2, na.rm = TRUE)
}

db <- data.frame(cbind(m1, m2))
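
A sketch of how the same loop might be wrapped over the four groups,
assuming (as in the post) that data1 and data2 are lists with one element
per group and that column 2 holds the variable to resample; all names and
sizes here are taken from the code above and may differ in the real data:

n.rep <- 100
m1 <- matrix(NA, nrow = n.rep, ncol = 4)   # means of samples drawn from data1
m2 <- matrix(NA, nrow = n.rep, ncol = 4)   # means of samples drawn from data2

for (j in 1:4) {                           # loop over the four groups
  tmp1 <- data1[[j]][, 2]
  tmp2 <- data2[[j]][, 2]
  for (i in 1:n.rep) {
    m1[i, j] <- mean(sample(tmp1, 1000, replace = TRUE), na.rm = TRUE)
    m2[i, j] <- mean(sample(tmp2, 1000, replace = TRUE), na.rm = TRUE)
  }
}

db <- data.frame(m1, m2)                   # 8 columns of means, one per group and source
colnames(db) <- c(paste0("m1.group", 1:4), paste0("m2.group", 1:4))
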
Thanks for any help you can provide.
Best Regards

-- 

Francesca
--

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Help

2011-11-11 Thread Francesca
Dear Contributors
I would like to perform this operation using a loop, instead of repeating
the same operation many times.
The numbers from 1 to 4 refer to different groups that are in the
database and for which I have the same data.


x<-c(1,3,7)

datiP1 <- datiP[datiP$city ==1,x];

datiP2 <- datiP[datiP$city ==2,x];

datiP3 <- datiP[datiP$city ==3,x]

datiP4 <- datiP[datiP$city ==4,x];
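
A minimal sketch of two equivalent ways to avoid repeating the line for
every city, assuming datiP and the column selection x from above:

# keep the four pieces together in a named list instead of four objects
datiP.list <- lapply(1:4, function(k) datiP[datiP$city == k, x])
names(datiP.list) <- paste0("datiP", 1:4)

# or simply split on the grouping variable
datiP.split <- split(datiP[, x], datiP$city)
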
-- 

Thank you for any help you can provide.

Francesca

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] computing scores from a factor analysis

2012-02-09 Thread francesca
William,
I had a problem similar to Wolfgang and I solved it through your help.
Many thanks!

Just an observation which sounded strange to me ( I am not a statistician,
just a wildlife biologist)
I have noticed  that  running the pca using principal with raw data  (and
therefore using scores=TRUE in the command line) gives different pca scores
than running the same pca with the correlation matrix (using scores=FALSE in
the command line and therefore calculating the scores in the way you
suggested to Wolfgang). Is that normal?

Thanks

francesca




--
View this message in context: 
http://r.789695.n4.nabble.com/computing-scores-from-a-factor-analysis-tp4306234p4372993.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] How to fix indeces in a loop

2012-05-18 Thread Francesca
Dear Contributors,
I have an easy question for you which is puzzling me instead.
I am running loops similar to the following:


for (i in c(100,1000,1)){

print((mean(i)))
#var<-var(rnorm(i,0,1))
}

This is what I obtain:

[1] 100
[1] 1000
[1] 1

In this case I ask the software to print out the result, but I would
like to store it in an object.
I have tried a second loop, because if I index the output variable
with i, I get thousands of records which I do not want (a matrix
of dimension 1).

for (i in c(100,1000,1)){
for (j in 1:3){
x[j]<-((mean(i)))
#var<-var(rnorm(i,0,1))
}}

This is the x:

  [,1] [,2] [,3]
[1,] 1   NA   NA
[2,] 1   NA   NA
[3,] 1   NA   NA

Clearly the object x is storing only the last value of i, 1.

I would like to save a vector of dimension 3 with content 100,1000,1,
but I do not know how to fix the index in an efficient manner.

Thanks for any help you can provide.
Francesca

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to fix indeces in a loop

2012-05-18 Thread Francesca
Thanks a lot!
Francesca

On 18 May 2012 18:21, arun  wrote:
> Hi Francesca,
>
>> for(i in 1:length(x1<-c(100,1000,1))){
>  j<-x1[i]
>  x1[i]<-mean(j)
>  }
>
>> x1
> [1]   100  1000 1
>
>
>
> A.K.
>
>
>
> - Original Message -
> From: Francesca 
> To: r-help@r-project.org
> Cc:
> Sent: Friday, May 18, 2012 10:59 AM
> Subject: [R] How to fix indeces in a loop
>
> Dear Contributors,
> I have an easy question for you which is puzzling me instead.
> I am running loops similar to the following:
>
>
> for (i in c(100,1000,1)){
>
> print((mean(i)))
> #var<-var(rnorm(i,0,1))
> }
>
> This is what I obtain:
>
> [1] 100
> [1] 1000
> [1] 1
>
> In this case I ask the software to print out the result, but I would
> like to store it in an object.
> I have tried a second loop, because if I index the out put variable
> with the i , i get thousands of records which I do not want(a matrix
> of dimension 1).
>
> for (i in c(100,1000,1)){
> for (j in 1:3){
> x[j]<-((mean(i)))
> #var<-var(rnorm(i,0,1))
> }}
>
> This is the x:
>
>       [,1] [,2] [,3]
> [1,] 1   NA   NA
> [2,] 1   NA   NA
> [3,] 1   NA   NA
>
> Clearly the object x is storing only the last value of i, 1.
>
> I would like to save a vector of dimension 3 with content 100,1000,1,
> but I do not know how to fix the index in an efficient manner.
>
> Thanks for any help you can provide.
> Francesca
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 

Francesca

--
Francesca Pancotto, PhD
Università di Modena e Reggio Emilia
Viale A. Allegri, 9
40121 Reggio Emilia
Office: +39 0522 523264
Web: http://www2.dse.unibo.it/francesca.pancotto/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Collecting results of a test with array

2012-05-25 Thread Francesca
Dear contributors
I have tried this experiment:

x <- c()
for (i in 1:12){
  # x is a list of 12 couples of time series I am using to perform
  # a test that compares them 2 by 2
  x[i] <- list(cbind(x1[i], x2[i]))
}

# trace statistic
test <- data.frame()
cval <- array(, dim = c(2, 3, 12))
for (i in 2:12){
  for (k in 1:2){
    for (j in 1:3){
      cval[k, j, i] <- ((ca.jo(data.frame(x[i]), ecdet = "none", type = "trace",
                               spec = "longrun", K = 2))@cval[k, j])
    }
  }
}

I have a problem in collecting the results of a test.
The function ca.jo creates an object with various slots,
one of which is "cval", which I can access through @cval.
The slot cval is an object of dimension 2X3.
I am running the test with ca.jo recursively for 12
couples of time series, so I have an output of 12 matrices of 2X3
elements, and I would like to create an object like an array
of dimension (2,3,12) which contains each matrix @cval
produced by ca.jo for the 12 subjects that I tested.

Can anyone help me with that?
I hope my explanation of the problem is clear.
Thanks in advance for any help.
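
A sketch of the usual pattern for this, assuming (as above) that x is a
list of 12 two-column matrices and that ca.jo comes from the urca package;
the whole 2x3 @cval matrix can be assigned in one step, so the inner loops
over k and j are not needed:

library(urca)

cval <- array(NA, dim = c(2, 3, 12))
for (i in 1:12) {
  fit <- ca.jo(data.frame(x[[i]]), ecdet = "none", type = "trace",
               spec = "longrun", K = 2)
  cval[, , i] <- fit@cval      # store the full 2 x 3 matrix for couple i
}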

-- 

Francesca

--
Francesca Pancotto, PhD
Università di Modena e Reggio Emilia
Viale A. Allegri, 9
40121 Reggio Emilia
Office: +39 0522 523264
Web: http://www2.dse.unibo.it/francesca.pancotto/
--

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Select part of character row name in a data frame

2017-10-19 Thread Francesca PANCOTTO
Dear R contributors,

I have a problem in selecting, in an efficient way, the rows of a data
frame according to a condition, which is a part of the row name.

The data frame is made of 64 rows and 2 columns. The row names are very
long, but I need to select rows according to a small part of them and
perform calculations on the subsets.

This is the example:

                                          X        Y
"Unique to strat  "                  0.0482    28.39
"Unique to crt.dummy"                0.0441    25.92
"Unique to gender   "                0.0159     9.36
"Unique to age   "                   0.0839    49.37
"Unique to gg_right1  "              0.0019     1.10
"Unique to strat:crt.dummy "         0.0689    40.54
"Common to strat, and crt.dummy "   -0.0392   -23.09
"Common to strat, and gender "      -0.0031    -1.84
"Common to crt.dummy, and gender "   0.0038     2.21
"Common to strat, and age "          0.0072     4.21

X and Y are the two columns of variables, while "Unique to strat", etc.,
are the row names. I am interested in selecting, for example, only those
rows whose name contains "strat". It would be very easy if these names
were simple, but they are not, and they also involve spaces.
I tried select() with matches() from dplyr, but that works on column
names, and I did not find how to use it on row names, which are of course
character values.

Thanks for any help you can provide.

--
Francesca Pancotto, PhD

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Select part of character row name in a data frame

2017-10-19 Thread Francesca PANCOTTO
Thanks a lot, so simple so efficient!

I will study more the grep command I did not know.

Thanks!


Francesca Pancotto

> Il giorno 19 ott 2017, alle ore 12:12, Enrico Schumann 
>  ha scritto:
> 
>  df[grep("strat", row.names(df)), ]


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Select part of character row name in a data frame

2017-10-20 Thread Francesca PANCOTTO
I did not need to select on the whole character string; otherwise I would
know how to do it from the basic Introduction to R, as you suggest.
grep() works perfectly.

f.

--
Francesca Pancotto, PhD
> Il giorno 19 ott 2017, alle ore 18:01, Jeff Newmiller 
>  ha scritto:
> 
> (Re-)read the discussion of indexing (both `[` and `[[`) and be sure to get 
> clear on the difference between matrices and data frames in the Introduction 
> to R document that comes with R. There are many ways to create numeric 
> vectors, character vectors, and logical vectors that can then be used as 
> indexes, including the straightforward way:
> 
> df[ c(
> "Unique to strat  ",
> "Unique to strat:crt.dummy ",
> "Common to strat, and crt.dummy ",
> "Common to strat, and gender ",
> "Common to strat, and age ") ,]
> -- 
> Sent from my phone. Please excuse my brevity.
> 
> On October 19, 2017 3:14:53 AM PDT, Francesca PANCOTTO 
>  wrote:
>> Thanks a lot, so simple so efficient!
>> 
>> I will study more the grep command I did not know.
>> 
>> Thanks!
>> 
>> 
>> Francesca Pancotto
>> 
>>> Il giorno 19 ott 2017, alle ore 12:12, Enrico Schumann
>>  ha scritto:
>>> 
>>> df[grep("strat", row.names(df)), ]
>> 
>> 
>>  [[alternative HTML version deleted]]
>> 
>> __
>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Map of Italy data filled at the level of the province

2016-06-02 Thread francesca Pancotto
Dear Users
I am very new to the use of ggplot. I need to make a plot of the Italian
provinces in which I fill some provinces with the colour corresponding to
the values of a variable (I do not provide the data because it is
irrelevant which data are used).

So far I have explored the map() function in the maps package, with which
I managed to plot the map of Italy with province borders and to select
only the provinces contained in the vector nomi (a character vector with
the names of the provinces, which are roughly like counties in the US):

map("italy", col = 1:20, regions = nomi)

The problem is how to fill the provinces with the values of the variable
of interest: the examples I found are all based on US data extracted from
databases that are very hard to get.
databases.

Can anyone provide an easy example where to start from?
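
A rough sketch of one way to start, assuming a data frame dat with a
column provincia whose entries match the polygon names of the "italy"
map database and a numeric column value (both names are placeholders;
the polygon names in the database may not match nomi exactly):

library(maps)

m    <- map("italy", plot = FALSE)                 # polygon names of the provinces
vals <- dat$value[match(m$names, dat$provincia)]   # value for each polygon (NA if absent)
brks <- cut(vals, breaks = 5)                      # 5 colour classes
cols <- heat.colors(5)[brks]                       # NA stays uncoloured

map("italy", fill = TRUE, col = cols)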

Thanks in advance
Francesca

--
Francesca Pancotto
Professore Associato di Politica Economica
Università degli Studi di Modena e Reggio Emilia
Palazzo Dossetti - Viale Allegri, 9 - 42121 Reggio Emilia
Office: +39 0522 523264
Web: https://sites.google.com/site/francescapancotto/
--


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Small vector into large data frame

2014-11-14 Thread Francesca Pancotto
Dear Contributors
I seem not to get the general rule for the use of loops.
I need some help. I have a database in which I need to generate a variable
according to the following rule.


This is the database head

      bank_name        date  px_last        Q_Y  p_made  p_for  p_m  p_f
aba.1       ABA  2006-10-24     1.28  p406-q406     406    406    1    1
aba.2       ABA  2006-11-30     1.31  p406-q406     406    406    1    1
aba.3       ABA  2006-10-24     1.29  p406-q107     406    107    1    2
aba.4       ABA  2006-11-30     1.33  p406-q107     406    107    1    2
aba.5       ABA  2006-10-24     1.31  p406-q207     406    207    1    3
aba.6       ABA  2006-11-30     1.35  p406-q207     406    207    1    3


The variable p_f takes values from 1 to 19 in an irregular way.

Then I have a vector of 19 elements:

> spot$pxlast
 [1] 1.32 1.34 1.35 1.43 1.46 1.58 1.58 1.41 1.40 1.33 1.40 1.46 1.43 1.35 1.22 
1.36 1.34 1.42 1.42

I need to create a variable to attach to the data frame (which has 11500
rows) that takes the values
1.32 when p_f==1
1.34 when p_f==2
and so on.
It seems so easy, but I cannot find an efficient way to do it.
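
A minimal sketch, assuming the big data frame is called dat (hypothetical
name) and that p_f is exactly the position of the matching element of
spot$pxlast:

dat$spot_px <- spot$pxlast[dat$p_f]   # plain vector indexing, one line

If p_f were not guaranteed to run over 1..19, indexing via match() on an
explicit key would be the safer variant.
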
Thanks in advance for any help.



Francesca



[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Organize regression output

2016-12-09 Thread francesca Pancotto
Dear Contributors

I would like to ask for some help concerning the automation of an
analysis that sounds hard to me.
I have a list of regression models. 
I call them models=c(ra,rb,rc,rd,re,rf,rg,rh)

I can access the output of each of them using for example, for the first

ra$coefficients

and i obtain

(Intercept)       coeff1       coeff2          age       gender
 0.62003033   0.00350807  -0.03817848  -0.01513533  -0.18668972
and I know that ra$coefficients[1] would give me the intercept of this model.

What i need to do is to collect the coefficients of each regression in models, 
and calculate and place in a table, the following simple summation:


                         ra                       rb                       rc   ...

intercept                intercept                intercept
intercept+coeff1         intercept+coeff1         intercept+coeff1
intercept+coeff2         intercept+coeff2         intercept+coeff2
intercept+coeff1+coeff2  intercept+coeff1+coeff2  intercept+coeff1+coeff2


The calculations are trivial (I know how to do them in steps), but what is
difficult for me is to invent a procedure that organizes the data in an
efficient way.

I tried some steps, starting with collecting the coefficients, but I think
I am going the wrong way:

calcolati <- list()
for (i in c(ra,rb,rc,rd,re,rf,rg,rh))
{
  calcolati[[i]] <- i$coefficients[1]
}
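
A sketch of one way to get the whole table at once, assuming the models
are kept in a list (note that c() would flatten the fitted objects, which
is probably why the loop above misbehaves) and that coeff1 and coeff2 are
the 2nd and 3rd coefficients of every model:

models <- list(ra = ra, rb = rb, rc = rc, rd = rd,
               re = re, rf = rf, rg = rg, rh = rh)

tab <- sapply(models, function(m) {
  b <- unname(coef(m))
  c(intercept       = b[1],
    intercept_c1    = b[1] + b[2],
    intercept_c2    = b[1] + b[3],
    intercept_c1_c2 = b[1] + b[2] + b[3])
})
tab   # 4 rows, one column per model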

Thanks for any help you can provide.

f.
--
Francesca Pancotto
Web: https://sites.google.com/site/francescapancotto/ 
<https://sites.google.com/site/francescapancotto/>
Energie: 
http://www.energie.unimore.it/ <http://www.energie.unimore.it/>
--


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] question on DPpackage

2010-03-01 Thread Francesca Ieva

Hi to everyone,
I'm a PhD student and I'm involved in nonparametric analyses of
hierarchical models. I tried to use the DPpackage on my data, but I
encountered some problems in interpreting the outputs. Can anybody help me?
The problem can be summarized as follows: I have a logit hierarchical model
for survival (i.e. binary response) in patients affected by heart
failure (the cohort consists of n=536 subjects), admitted to J different
hospitals. The idea is to explain survival by means of a linear predictor
of relevant covariates with a random effect on the grouping factor,
represented by the hospital of admission. The main goal would be the
reconstruction of the random-effect density, because we are interested in
finding (if present) groups of "similar" hospitals.
Now, I have some trouble in doing this because I don't understand how the
outputs "ss", "ncluster" and "mub" work. In particular, how does the
package compute "ss" starting from the "ncluster" distribution? How should I
interpret the vector "mub"?  I tried to look into the Fortran code, but
I'm not used to that language, so it didn't help me a lot. Could anybody
help me understand how "ss" and "ncluster" are related, and how to build
the predictive densities of the random effect starting from the "mub"
elements? It would be a great gain for my work.


Looking forward to hear news from you
thanks a lot in advance

Regards
Francesca Ieva


--
-

Francesca Ieva

MOX - Modeling and Scientific Computing
Dipartimento di Matematica "F.Brioschi"
Politecnico di Milano
Via Bonardi 9, 20133 Milano, Italy

mailto: francesca.i...@fastwebnet.it
francesca.i...@mail.polimi.it

Voice: +39 02 2399 4604

Skype: francesca.ieva

-

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] How to perform an ols estimation with lm ignoring NA values

2009-10-22 Thread Francesca Pancotto

Dear R community,
Probably my question is obvious, but I did not find any solution yet by
browsing the mailing list archives.
I need to perform a simple OLS regression on a dataset with cross-section
data, i.e. with no temporal dimension. In this data set there are
missing values. I would like the software to perform the OLS regression
and simply ignore these cases, using the rest.

I tried to use the argument na.action=na.omit in

lm( y~x, na.action=na.omit)
but it seems to have no effect.
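
For what it's worth, na.omit is in fact the default na.action in lm() (via
options("na.action")), so rows with NA in y or x should be dropped
automatically; if nothing seems to change, the missings may not actually be
coded as NA. A minimal sketch of the usual checks, assuming the data live
in a data frame dat (hypothetical name):

sum(is.na(dat$y)); sum(is.na(dat$x))    # are the missings really NA, or e.g. "" or -999?
fit <- lm(y ~ x, data = dat, na.action = na.omit)
summary(fit)                            # reports how many observations were deleted
length(residuals(fit))                  # number of complete cases actually used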

Thank you for any available help.
Francesca

-- 
Post - doc Finance
HEC Management School of the University of Liège
Rue Louvrex, 14 ,
Bldg N1 , B-4000 Liège
Belgium
Web: https://mail.sssup.it/~pancotto

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Rotating pca scores

2010-01-18 Thread francesca . iordan
Dear Folks

I need to rotate PCA loadings and scores using R.
I have run a pca using princomp and I have rotated PCA results with  
varimax. Using varimax R gives me back just rotated PC loadings without  
rotated PC scores.


Does anybody know how I can obtain/calculate rotated PC scores with R?
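
One commonly used recipe (a sketch using a built-in data set; conventions
differ on whether the scores are standardized first) is to post-multiply
the unrotated scores by the varimax rotation matrix:

pca <- princomp(USArrests, cor = TRUE)
k   <- 2                                      # number of components to rotate
rot <- varimax(pca$loadings[, 1:k])           # rotated loadings + rotation matrix
rotated_loadings <- rot$loadings
rotated_scores   <- pca$scores[, 1:k] %*% rot$rotmat
head(rotated_scores)

The principal() function in the psych package also returns rotated scores
directly, which may be the simpler route.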

Your kind help is appreciated in advance

Francesca

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Step and AIC

2010-01-27 Thread francesca . iordan
Hello everybody,
I would need some help from you.
I am trying to fit a logistic model to some presence absence data of  
animals living on river islands. I have got 12 predictor variables and I am  
trying to use a stepwise forward method to fit the best logistic model to  
my data. I am using the function STEP (stats).
I have a question for you. Can I use the step() function if my variables
have a binomial distribution?
Reading the documentation of the function, I understood that step() is
more suitable for dealing with Gaussian-distributed variables.
Is that right?
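
For what it's worth, step() operates on the AIC of whatever fitted model it
is given, so it works with a binomial glm() as well. A sketch, assuming a
data frame isl (hypothetical name) with a 0/1 column presence and the 12
predictors:

null.fit <- glm(presence ~ 1, family = binomial, data = isl)
full.fit <- glm(presence ~ ., family = binomial, data = isl)

fwd <- step(null.fit, scope = formula(full.fit), direction = "forward")
summary(fwd)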

I apologize in advance for this question, but I am just at the beginning of  
my long path to handle and know statistics and R.

regards
Francesca

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Exponentiate very large numbers

2013-02-04 Thread francesca casalino
Dear R experts,

I have the logarithms of 2 values:

log(a) = 1347
log(b) = 1351

And I am trying to solve this expression:

exp( ln(a) ) - exp( ln(0.1) + ln(b) )

But of course every time I try to exponentiate the log(a) or log(b)
values I get Inf. Are there any tricks I can use to get a real result
for exp( ln(a) ) - exp( ln(0.1) + ln(b) ), either in logarithm or
exponential form?
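
Since exp(1347) overflows double precision, one standard trick is to keep
everything on the log scale and factor out the larger term, e.g. with
log1p(). A sketch with the numbers from the post:

la <- 1347               # ln(a)
lb <- 1351               # ln(b)
lc <- log(0.1) + lb      # ln(0.1 * b)

big   <- max(la, lc)
small <- min(la, lc)
log.abs <- big + log1p(-exp(small - big))   # ln(|a - 0.1*b|), no overflow
sgn     <- if (la >= lc) 1 else -1          # sign of a - 0.1*b

c(sign = sgn, log.abs = log.abs)
# here lc is about 1348.7 > la, so a - 0.1*b is negative: it equals -exp(log.abs)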


Thank you very much for the help

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Exponentiate very large numbers

2013-02-05 Thread francesca casalino
I am sorry I have confused you, the logs are all base e:

ln(a) = 1347
ln(b) = 1351

And I am trying to solve this expression:

exp( ln(a) ) - exp( ln(0.1) + ln(b) )


Thank you.

2013/2/4 francesca casalino :
> Dear R experts,
>
> I have the logarithms of 2 values:
>
> log(a) = 1347
> log(b) = 1351
>
> And I am trying to solve this expression:
>
> exp( ln(a) ) - exp( ln(0.1) + ln(b) )
>
> But of course every time I try to exponentiate the log(a) or log(b)
> values I get Inf. Are there any tricks I can use to get a real result
> for exp( ln(a) ) - exp( ln(0.1) + ln(b) ), either in logarithm or
> exponential form?
>
>
> Thank you very much for the help

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] ggplot2 and facet_wrap help

2013-02-18 Thread francesca casalino
Dear R experts,

I am trying to arrange multiple plots, creating one graph for each
size1 factor variable in my data frame, and each plot has the median
price on the y-axis and the size2 on the x-axis grouped by clarity:

library(ggplot2)

df <- data.frame(price=matrix(sample(1:1000, 100, replace = TRUE), ncol = 1))

df$size1 = 1:nrow(df)
df$size1 = cut(df$size1, breaks=11)
df=df[sample(nrow(df)),]
df$size2 = 1:nrow(df)
df$size2 = cut(df$size2, breaks=11)
df=df[sample(nrow(df)),]
df$clarity = 1:nrow(df)
df$clarity = cut(df$clarity, breaks=6)


mydf = aggregate(df$price, by=list(df$size1, df$size2, df$clarity),median)

names(mydf)[1] = 'size1'
names(mydf)[2] = 'size2'
names(mydf)[3] = 'clarity'
names(mydf)[4] = 'median_price'

# So my data is already in a "long" format I think, but when I do this:

ggplot(data=mydf, aes(x=mydf$size2, y=mydf$median_price,
group=as.factor(mydf$clarity), colour=as.factor(mydf$clarity))) +
geom_line() + facet_wrap(~ factor(mydf$size1))


I get this error:
"Error in layout_base(data, vars, drop = drop) :
  At least one layer must contain all variables used for facetting"

Can you please help me understand what I am doing wrong?
-fra

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] ggplot2 and facet_wrap help

2013-02-18 Thread francesca casalino
Dear Ista,

Thank you! It works perfectly!
-fra

2013/2/18 Ista Zahn :
> Hi,
>
> You are making it more complicated than it needs to be. You already
> provided the data.frame in the ggplot call, so you don't need to
> specify it in the aes calls. The various factor() and as.factor()
> calls are also unnecessary. So stripping away this extra stuff your
> plot looks like
>
> ggplot(data=mydf, aes(x=size2,
>  y=median_price,
>  group=clarity,
>  colour=clarity)) +
>   geom_line() +
>   facet_wrap(~ size1)
>
> which does give the desired display.
>
> Best,
> Ista
>
> On Mon, Feb 18, 2013 at 6:04 AM, francesca casalino
>  wrote:
>> Dear R experts,
>>
>> I am trying to arrange multiple plots, creating one graph for each
>> size1 factor variable in my data frame, and each plot has the median
>> price on the y-axis and the size2 on the x-axis grouped by clarity:
>>
>> library(ggplot2)
>>
>> df <- data.frame(price=matrix(sample(1:1000, 100, replace = TRUE), ncol = 1))
>>
>> df$size1 = 1:nrow(df)
>> df$size1 = cut(df$size1, breaks=11)
>> df=df[sample(nrow(df)),]
>> df$size2 = 1:nrow(df)
>> df$size2 = cut(df$size2, breaks=11)
>> df=df[sample(nrow(df)),]
>> df$clarity = 1:nrow(df)
>> df$clarity = cut(df$clarity, breaks=6)
>>
>>
>> mydf = aggregate(df$price, by=list(df$size1, df$size2, df$clarity),median)
>>
>> names(mydf)[1] = 'size1'
>> names(mydf)[2] = 'size2'
>> names(mydf)[3] = 'clarity'
>> names(mydf)[4] = 'median_price'
>>
>> # So my data is already in a "long" format I think, but when I do this:
>>
>> ggplot(data=mydf, aes(x=mydf$size2, y=mydf$median_price,
>> group=as.factor(mydf$clarity), colour=as.factor(mydf$clarity))) +
>> geom_line() + facet_wrap(~ factor(mydf$size1))
>>
>>
>> I get this error:
>> "Error in layout_base(data, vars, drop = drop) :
>>   At least one layer must contain all variables used for facetting"
>>
>> Can you please help me understand what I am doing wrong?
>> -fra
>>
>> __
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Submit a new script after all parallel jobs have completed

2012-09-26 Thread francesca casalino
Dear R experts,

I have an R script that creates multiple scripts and submits these
simultaneously to a computer cluster, and after all of the multiple
scripts have completed and the output has been written in the
respective folders, I would like to automatically launch another R
script that works on these outputs.

I haven't been able to figure out whether there is a way to do this in
R: the function 'wait' is not what I want since the scripts are
submitted as different jobs and each of them completes and writes its
output file at different times, but I actually want to run the
subsequent script after all of the outputs appear. Can you please help
me find a solution?
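
A minimal sketch of one workaround, assuming the jobs write files whose
paths are known in advance (out.files and post_process.R are hypothetical
names); an alternative is to let the cluster scheduler itself express the
dependency, if it supports held/dependent jobs:

out.files <- file.path("results", paste0("job", 1:20, ".RData"))  # expected outputs

while (!all(file.exists(out.files))) {
  Sys.sleep(60)            # poll once a minute until every output exists
}
source("post_process.R")   # the follow-up script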

Thank you very much
-fra

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] SURVREG Function

2012-09-27 Thread Francesca Meucci

Hi,
I need some help to manage frailty in the survreg function; in particular
I'm looking for more information about frailty in survreg applied to a
log-logistic hazard function.
Actually, I need to develop a predictor for the frailty random variable
realization (similar to the Proportional Hazards Model's one, based on the
ratio of Laplace transforms).
I can't find any documentation about AFT models with gamma frailty as
implemented in the "survival" package. Any reference paper to suggest?
Thanks a million
Francesca
  
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] snpStats reference allele used in genetic associations?

2014-05-23 Thread francesca casalino
Hi,

Does anyone know how to find the reference allele used for genetic
associations ran in snpStats?

I have run several associations using snp.rhs.tests, but I cannot tell which
allele was used as the "effect allele". Is it the one coded as "Al1" in the
SNP.support file? I can find the RAF (risk allele frequency) from the
function col.summary, but again, which allele does this refer to? Also the
proportions of genotypes from the col.summary is given as "AA/AB/BB", so I
cannot understand from that which is coded as the "risk" allele.

I could find this in the snpStats paper:
"For categorical variables, including SNPs, the user can reorder the
categories. The first one will be treated as reference category in the
analysis."

Thank you very much for your help!
Fra

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] snpStats reference allele used in genetic associations?

2014-05-26 Thread francesca casalino
But then how do you know which allele is the reference and which the risk
allele (between A/T/C/G)?


2014-05-26 1:41 GMT+01:00 David Duffy :

> francesca casalino  asked:
>
>
>> Does anyone know how to find the reference allele used for genetic
>> associations ran in snpStats?
>>
>>  A is ref allele, B is risk allele.
>
>
>
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] snpStats reference allele used in genetic associations?

2014-05-26 Thread francesca casalino
I am having this problem because I need to run a meta-analysis, and to
align all the variants across the different studies included in the
meta-analysis I need to know the effect allele used to obtain the beta (so
that I can flip the beta if the effect allele is flipped compared to the
other studies).

Thanks for your help.

Francesca


2014-05-26 11:13 GMT+01:00 francesca casalino :

> But then how do you know which allele is the reference and which the risk
> allele (between A/T/C/G)?
>
>
> 2014-05-26 1:41 GMT+01:00 David Duffy :
>
> francesca casalino  asked:
>>
>>
>>> Does anyone know how to find the reference allele used for genetic
>>> associations ran in snpStats?
>>>
>>>  A is ref allele, B is risk allele.
>>
>>
>>
>>
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] snpStats reference allele used in genetic associations?

2014-05-27 Thread francesca casalino
Dear David,

Thank you very much for you help, I really appreciate it.


I am not using the read.snps.long() or any other import function, as the
data is already in snpMatrix, so I cannot specify it at the input step…
I am reading the data as a snpMatrix, so using load() after having called
the snpStats package as in the vignette: "Example of genome-wide
association testing":


require(snpStats)

data(for.exercise)


The objects loaded with this set are: 1) genotypes (in probability format
from imputed data); 2) SNP.support object, with information on  the SNPs
such as Allele 1, Allele 2, chr, position. Other information that I need
for the meta-analysis can be extracted from the 'col.summary' command in
snpStats: MAF, and I think RAF (risk allele frequency) can be considered as
the Effect allele frequency for the meta-analysis.


Then I am using snp.rhs.estimates and snp.rhs.tests for the associations.
The problem is that I don't know which allele is taken as the risk allele
in the association; is there a way to see this? Is it always the Allele 2
reported in the SNP.support file? Until I understand this, I won't be able
to harmonise the SNPs all to one reference, since I don't know whether I
need to flip the betas when the effect allele is reversed, for example…


I have noticed this because, when comparing the frequency of Allele 2
(taken as the risk allele) and the RAF (which I thought was the frequency
associated with it) with the frequencies of the same allele in the
1000 Genomes, I get concordance up to a frequency of 0.5; then the
direction shifts and I get discordance up to 1 for the reference
frequency.


Thank you very much for any suggestions you may have,


Francesca



2014-05-27 2:07 GMT+01:00 David Duffy :

> On Mon, 26 May 2014, francesca casalino wrote:
>
>  I am having this problem because I need to run a meta-analysis and to
>> align
>> all the variants between the different studies included in the
>> meta-analysis I need to know the effect allele used to get the beta (so
>> that I can flip the beta if the effect allele is flipped compared to all
>> other studies).
>>
>
> This depends on how the data has been sent to you.  Obviously, you should
> check the "A" allele frequency in the different datasets.  If they have
> used different genotyping assays and the strand of the SNP is ambiguous, eg
> G->C transversion, then this may be the only way to exclude problems. PLINK
> offers a tool to check for this using LD patterns.
>
> Cheers, David.
>
>
> | David Duffy (MBBS PhD)
> | email: david.du...@qimrberghofer.edu.au  ph: INT+61+7+3362-0217 fax:
> -0101
> | Genetic Epidemiology, QIMR Berghofer Institute of Medical Research
> | 300 Herston Rd, Brisbane, Queensland 4006, Australia  GPG 4D0B994A
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Selecting elements in lists with a row condition

2014-02-04 Thread Francesca Pancotto
Hello A. k. 
thanks for the suggestion.

I tried this but it does not work. I probably use it in the wrong way.
This is what it tells me, 


 do.call(rbind,lapply(bank.list,function(x) x[x[,"p_made"]==406,]))

Errore in match.names(clabs, names(xi)) : 
  names do not match previous names

What am I doing wrong?
f.

--
Francesca Pancotto
Università degli Studi di Modena e Reggio Emilia
Palazzo Dossetti - Viale Allegri, 9 - 42121 Reggio Emilia
Office: +39 0522 523264
Web: https://sites.google.com/site/francescapancotto/
--

Il giorno 04/feb/2014, alle ore 16:42, arun  ha scritto:

> Hi,
> Try:
> 
> If `lst1` is the list:
> do.call(rbind,lapply(lst1,function(x) x[x[,"p_made"]==406,]))
> A.K.
> 
> 
> 
> 
> On Tuesday, February 4, 2014 8:53 AM, Francesca 
>  wrote:
> Dear Contributors
> sorry but the message was sent involuntary.
> I am asking some advice on how to solve the following problem.
> I have a list composed of 78 elements, each of which is a matrix of factors
> and numbers, similar to the following
> 
> bank_name       date  px_last_CIB        Q.Y  p_made  p_for
> 1       CIB  10/02/06         1.33  p406-q406     406    406
> 2       CIB  10/23/06         1.28  p406-q406     406    406
> 3       CIB  11/22/06         1.28  p406-q406     406    406
> 4       CIB  10/02/06         1.35  p406-q107     406    107
> 5       CIB  10/23/06         1.32  p406-q107     406    107
> 6       CIB  11/22/06         1.32  p406-q107     406    107
> 
> 
> Each of these matrixes changes for the column name bank_name and for the
> suffix _CIB which reports the name as in bank_name. Moreover each matrix as
> a different number of rows, so that I cannot transform it into a large
> matrix.
> 
> I need to create a matrix made of the rows of each element of the list that
> respect the criterium
> that the column p_made is = to 406.
> I need to pick each of the elements of each matrix that is contained in the
> list elements, that satisfy this condition.
> 
> It seems difficult to me but perhaps is super easy.
> Thanks for any help you can provide.
> 
> Francesca
> 
> 
> 
> On 4 February 2014 12:42, Francesca  wrote:
> 
>> Dear Contributors
>> I am asking some advice on how to solve the following problem.
>> I have a list composed of 78 elements, each of which is a matrix of
>> factors and numbers, similar to the following
>> 
>> bank_name       date  px_last_CIB        Q.Y  p_made  p_for
>> 1       CIB  10/02/06         1.33  p406-q406     406    406
>> 2       CIB  10/23/06         1.28  p406-q406     406    406
>> 3       CIB  11/22/06         1.28  p406-q406     406    406
>> 4       CIB  10/02/06         1.35  p406-q107     406    107
>> 5       CIB  10/23/06         1.32  p406-q107     406    107
>> 6       CIB  11/22/06         1.32  p406-q107     406    107
>> 
>> 
>> --
>> 
>> Francesca
>> 
>> ------
>> Francesca Pancotto, PhD
>> Università di Modena e Reggio Emilia
>> Viale A. Allegri, 9
>> 40121 Reggio Emilia
>> Office: +39 0522 523264
>> Web: https://sites.google.com/site/francescapancotto/
> 
>> --
>> 
> 
> 
> 
> -- 
> 
> Francesca
> 
> --
> Francesca Pancotto, PhD
> Università di Modena e Reggio Emilia
> Viale A. Allegri, 9
> 40121 Reggio Emilia
> Office: +39 0522 523264
> Web: https://sites.google.com/site/francescapancotto/
> --
> 
> [[alternative HTML version deleted]]
> 
> 
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
> 


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Unicode symbols not working with ggplot in R

2014-02-13 Thread francesca casalino
Hi,

I am trying to produce a ggplot graph using specific characters in the
labels, but ggplot doesn't seem to support certain symbols.

For example, when I type:

print("\u25E9")
it shows a square which is half black, but when I try to use it in ggplot
it doesn't print.

I am using facet_wrap, but it looks like the problem is in ggplot, which
doesn't recognise the Unicode symbols, and not in facet_wrap (please let me
know if it is otherwise).

I am taking this very helpful example for illustration (
http://r.789695.n4.nabble.com/plus-minus-in-factor-not-plotmath-not-expression-td4681490.html
):

junk<-data.frame(gug=c(
rep( paste("\u25E9"), 10),
rep( paste("\u25E8"), 10)
)
)
junk$eks<-1:nrow(junk)
junk$why<-with(junk, as.numeric(gug) + eks)
print(summary(junk))
library(ggplot2)
print(
ggplot(data=junk, mapping=aes(x=eks, y=why))
+ geom_point()
+ facet_grid(. ~ gug)
)
Is there a way to have R recognise these Unicode symbols? It is not math
symbols so plotmath will not be useful here...
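
One speculative thing to check (an assumption, not a confirmed fix): the
facet strip labels are drawn with the font of the graphics device, so the
glyph has to exist in that font. Pointing the theme at a font known to
contain it, and/or using a device with better Unicode support, sometimes
helps:

library(ggplot2)
last_plot() +
  theme(strip.text = element_text(family = "Arial Unicode MS"))  # any installed font with the glyph
# and/or draw on quartz() (OS X) or cairo_pdf("plot.pdf") instead of pdf()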

I'm using a Mac and this is the SessionInfo:

> sessionInfo()
R version 3.0.2 (2013-09-25)
Platform: x86_64-apple-darwin10.8.0 (64-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_GB.UTF-8

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

other attached packages:
[1] Unicode_0.1-3   ggplot2_0.9.3.1 plyr_1.8reshape2_1.2.2

loaded via a namespace (and not attached):
[1] colorspace_1.2-4   dichromat_2.0-0digest_0.6.4   grid_3.0.2

[5] gtable_0.1.2   labeling_0.2   MASS_7.3-29munsell_0.4.2

[9] proto_0.3-10   RColorBrewer_1.0-5 scales_0.2.3   stringr_0.6.2

[13] tcltk_3.0.2tools_3.0.2


Thank you very much for your help!

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Filling missing data in a Panel

2014-02-16 Thread Francesca Pancotto
Dear R contributors, 
I have a problem with a database that at the moment I find hard to solve. 

I have a panel composed of n subjects, whose identifier in the table that I
report is bank_name, with observations for each individual indexed from 1
to 18 by the column p_for.
As you can see from p_for, some periods of the panel are simply not
present, which creates problems for my estimation.
Do you know an efficient way to introduce the missing rows in the panel, so
that each cross-section unit bank_name has the same number of observations
of p_for, even though some of them are NA?
Thanks for any help you can provide,

Best,

Francesca


row.names bank_name        date  px_last        Q_Y  p_made  p_for
1       2         1    11/30/06     1.31  p406-q406     406    406   1
2      47         1    02/26/09     1.27  p109-q109     109    109  10
3      55         1  06/08/2009     1.40  p209-q209     209    209  11
4      68         1  12/01/2009     1.51  p409-q409     409    409  13
5      87         1    05/26/10     1.22  p210-q210     210    210  15
6      96         1   7/22/2010     1.25  p310-q310     310    310  16
7     221         2    11/14/06     1.30  p406-q406     406    406   1
8      16         2    02/13/07     1.27  p107-q107     107    107   2
9      31         2   5/15/2007     1.36  p207-q207     207    207   3
10    222         3  11/29/2007     1.50  p407-q407     407    407   5
11   1110         3    02/25/08     1.48  p108-q108     108    108   6
12      6         4    02/15/07     1.35  p107-q107     107    107   2
13     18         4   5/24/2007     1.39  p207-q207     207    207   3
14    292         4    08/21/07     1.39  p307-q307     307    307   4
15     38         4  11/29/2007     1.49  p407-q407     407    407   5
16     49         4    01/28/08     1.43  p108-q108     108    108   6
17     61         4    05/15/08     1.52  p208-q208     208    208   7
18     71         4    08/18/08     1.45  p308-q308     308    308   8
19     78         4    11/20/08     1.30  p408-q408     408    408   9
20     88         4    02/19/09     1.35  p109-q109     109    109  10
21    941         4    05/28/09     1.35  p209-q209     209    209  11
--
Francesca Pancotto
Università degli Studi di Modena e Reggio Emilia
Palazzo Dossetti - Viale Allegri, 9 - 42121 Reggio Emilia
Office: +39 0522 523264
Web: https://sites.google.com/site/francescapancotto/
--


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Filling missing data in a Panel

2014-02-20 Thread Francesca Pancotto
Thanks a lot, it works perfectly!
f.
--
Francesca Pancotto
Università degli Studi di Modena e Reggio Emilia
Palazzo Dossetti - Viale Allegri, 9 - 42121 Reggio Emilia
Office: +39 0522 523264
Web: https://sites.google.com/site/francescapancotto/
--

Il giorno 17/feb/2014, alle ore 09:57, arun  ha scritto:

> Sorry, a typo: vec3 instead of vec2
> 
>  dat3 <- 
> data.frame(bank_name=vec3,p_for=rep(seq(18),length(unique(dat$bank_name
> A.K.
> 
> 
> 
> 
> On , arun  wrote:
> Hi,
> 
> Looks like one column name is missing.  I am not sure about the output you 
> wanted.  May be this helps.
> 
> 
> dat <- read.table(text="row.names bank_name  date px_last  Q_Y p_made 
> q_madep_for 
> 1  21  11/30/061.31 p406-q406406  406  1  
>
> 2471  02/26/091.27 p109-q109109  109  10
> 3551 06/08/20091.40 p209-q209209  209 11
> 4681 12/01/20091.51 p409-q409409  409 13
> 5871  05/26/101.22 p210-q210210  210  15
> 6961  7/22/20101.25 p310-q310310  310  16
> 72212  11/14/061.30 p406-q406406  406  1   
> 8162  02/13/071.27 p107-q107107  107  2 
> 9312  5/15/20071.36 p207-q207207  207  3
> 10  2223 11/29/20071.50 p407-q407407  407  5   
> 11  11103  02/25/081.48 p108-q108108  108  6   
> 1264  02/15/071.35 p107-q107107  107  2   
> 13184  5/24/20071.39 p207-q207207  207  3
> 14  2924  08/21/071.39 p307-q307307  307  4  
> 15384 11/29/20071.49 p407-q407407  407  5 
> 16494  01/28/081.43 p108-q108108  108  6
> 17614  05/15/081.52 p208-q208208  208  7   
> 18714  08/18/081.45 p308-q308308  308  8   
> 19784  11/20/081.30 p408-q408408  408  9
> 20884  02/19/091.35 p109-q109109  109  10
> 21  9414  05/28/091.35 p209-q209209  209  
> 11",sep="",header=TRUE,stringsAsFactors=FALSE)
> ##Possible solution 1
> 
>  tbl <- table(dat$bank_name)
>  dat2 <- 
> data.frame(bank_name=as.numeric(rep(names(tbl),max(tbl)-tbl)),p_for=NA)
>  res1 <- merge(dat,dat2,all=TRUE)[colnames(dat)]
>  table(res1$bank_name)
> #
> # 1  2  3  4 
> #10 10 10 10 
> 
> 
> ###2
> 
> 
> vec1 <- with(dat,tapply(p_for,list(bank_name),FUN=max))
>  vec2 <- as.numeric(rep(names(vec1),each=max(vec1)))
> dat2New <- data.frame(bank_name=vec2,p_for=rep(seq(max(vec1)),4))
> res2 <- merge(dat,dat2New,all=TRUE)[colnames(dat)]
>  table(res2$bank_name)
> #
> # 1  2  3  4 
> #16 16 16 16 
> 
> #or 
> 
> 3
> 
> #using 18 as mentioned in the description
> 
> vec3 <- rep(unique(dat$bank_name),each=18)
> dat3 <- 
> data.frame(bank_name=vec2,p_for=rep(seq(18),length(unique(dat$bank_name
> res3 <- merge(dat,dat3,all=TRUE)[colnames(dat)]
> table(res3$bank_name)
> 
> # 1  2  3  4 
> #18 18 18 18 
> 
> A.K.
> 
> 
> 
> 
> 
> On Monday, February 17, 2014 2:40 AM, Francesca Pancotto 
>  wrote:
> Dear R contributors, 
> I have a problem with a database that at the moment I find hard to solve.
> 
> I have a panel composed of n subjects, whose names in the table that I report 
> is bank_name,
> and observations for each of the individuals  of bank_name from 1 to 18, as 
> reported from the column p_for.
> As you can see from p_for, there are missing values in the panel that are not 
> present and that create problems to my estimation.
> Do you know an efficient way to introduce missing values in the rows of the 
> panel so that each cross section bank_name has the same number of observations
> p_for, even though some of them are NA?
> Thanks for any help you can provide,
> 
> Best,
> 
> Francesca
> 
> 
> row.names bank_name   date px_last   Q_Y p_made p_for 
> 1  2 1   11/30/061.31 p406-q406406   406   1  
>  
> 2 47 1   02/26/091.27 p109-q109109   109  10  
> 3 55 1 06/08/20091.40 p209-q209209   209  11  
> 4 68 1 12/01/20091.51 p409-q409409   409  13  
> 5 87 1   05/26/101.22 p210-q210210   210  15

[R] opls-da

2010-09-01 Thread Francesca Chignola

Dear all,

I would like to apply Orthogonal Projections to Latent Structures 
Discriminant Analysis (OPLS-DA) to a metabolomic dataset, in order to 
discriminate two groups of samples.
I have looked for an available R package and I have found "K-OPLS" and 
oscorespls.fit (Orthogonal scores PLSR) from "pls" package.

I wonder if K-OPLS performs the same discriminant analysis of OPLS-DA?
Is there any other available package for applying OPLS-DA?
Thanks in advance for any advice.
Best regards,

Francesca Chignola

--
---
Francesca Chignola, PhD
Dulbecco Telethon Institute c/o S. Raffaele Scientific Institute
Center of Genomics, BioInformatics and BioStatistics
Biomolecular NMR Laboratory 1B4
Via Olgettina 58
20132 Milano
Italy
-


-
DAI IL TUO 5 X MILLE AL SAN RAFFAELE. BASTA UNA FIRMA.
SE FIRMI PER LA RICERCA SANITARIA DEL SAN RAFFAELE DI MILANO, FIRMI PER TUTTI.
C.F. 03 06 42 80 153
INFO: 5permi...@hsr.it - www.5xmille.org

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] objects memory limits

2011-03-16 Thread francesca bader
Dear list,

I'm quite a new user of R, and I have a doubt about object memory: I open a
new R session and the command memory.limit() gives me 1535 Mb of memory (the
PC has 2 Gb RAM, 32-bit). I create an integer vector object of size 2e8, so
about 2e8*4 bytes (800 Mb) of memory are allocated, a size smaller than the
memory available. But when I try to make a data frame out of this object it
gives me "Errore: cannot allocate vector of size 762.9 Mb".
Why can't I create a data frame from an object whose size is smaller than
the memory available?
I also tried to halve the object size, but the situation doesn't change.
In R, is the memory limit for a data frame object smaller than the one for
a vector object? Are there different memory limits for different kinds of
objects? Is there a possibility to change the limits?
This is the command sequence:

R version 2.12.0 (2010-10-15)
Copyright (C) 2010 The R Foundation for Statistical Computing
ISBN 3-900051-07-0
Platform: i386-pc-mingw32/i386 (32-bit)


> ls()
character(0)
> memory.limit()
[1] 1535
> x=integer(2e8)
> object.size(x)
80024 bytes
> rm(x)
> ls()
character(0)
> x=data.frame(integer(2e8))
Errore: cannot allocate vector of size 762.9 Mb
Inoltre: Warning messages:
1: In as.data.frame.integer(x[[i]], optional
 = TRUE) :
  Reached total allocation of 1535Mb: see help(memory.size)
2: In as.data.frame.integer(x[[i]], optional = TRUE) :
  Reached total allocation of 1535Mb: see help(memory.size)
3: In as.data.frame.integer(x[[i]], optional = TRUE) :
  Reached total allocation of 1535Mb: see help(memory.size)
4: In as.data.frame.integer(x[[i]], optional = TRUE) :
  Reached total allocation of 1535Mb: see help(memory.size)
> x=data.frame(integer(1e8))
Errore: cannot allocate vector of size 381.5 Mb
Inoltre: Warning messages:
1: In unlist(vlist, recursive = FALSE, use.names = FALSE) :
  Reached total allocation of 1535Mb: see help(memory.size)
2: In unlist(vlist, recursive = FALSE, use.names = FALSE) :
  Reached total allocation of 1535Mb: see help(memory.size)
3: In unlist(vlist, recursive = FALSE, use.names = FALSE) :
  Reached total allocation of 1535Mb: see help(memory.size)
4: In unlist(vlist, recursive =
 FALSE, use.names = FALSE) :
  Reached total allocation of 1535Mb: see help(memory.size)
Many thanks. Kind regards.
Dr. Francesca Bader
University of Trieste
Italy
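
A minimal sketch of what is going on rather than a fix: the error reports the
size of the additional block R tried to allocate, and data.frame() makes at
least one full copy of the vector, so the peak usage is roughly 800 Mb (x) plus
a 762.9 Mb copy, which overruns the 1535 Mb limit of a 32-bit session.

x <- integer(2e8)
gc()                          # x alone already uses roughly 763 Mb
# df <- data.frame(x = x)     # needs a second ~763 Mb copy while both exist
#                             #  -> "cannot allocate vector of size 762.9 Mb"
rm(x)

# practical options: run 64-bit R (memory.limit() can then be raised), keep
# the data as a plain vector or matrix, or reduce the object, e.g.
df_small <- data.frame(x = integer(5e7))   # ~190 Mb, fits comfortably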





  
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] XIII Meeting GRASS and GFOSS in Italy

2011-11-19 Thread francesca bader
Dear mailing list,

I'd like to inform you about the XIII Meeting GRASS and GFOSS, which will take 
place from 15 to 17 February 2012 at the University of Trieste (edificio H3, 
aula magna) in Trieste, Italy. 
The meeting will involve both GRASS users and open source software and data 
users. 
More information:
http://sites.google.com/site/grassts/
gr...@units.it

Kind regards. From the meeting organizers
Dr. PhD Francesca Bader
Università degli Studi di Trieste

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Wilcox test and data collection

2011-09-28 Thread Francesca Pancotto
Dear Contributors
I have a problem collecting the results of a series of tests.
I need to perform a comparative test over groups of data, extract the value
of the p-value, and create a table.
My problem is how to repeat the analysis over and over again over
subsets of the data according to a condition.
I have this database, called y:
 gg t1 t2 d
 40  1  1 2
 50  2  2 1
 45  1  3 1
 49  2  1 1
  5  2  1 3
 40  1  1 2

where gg takes values from 1 to 100, t1 and t2 have values in (1,2,3) and d
in (0,1,2,3)
I want to perform tests on the values of gg according to the conditions that

d==0 , compare values of gg when t1==1 with values of gg when t1==3
d==1 , compare values of gg when t1==1 with values of gg when t1==3
d==2 , compare values of gg when t1==1 with values of gg when t1==3
..
then
d==0 , compare values of gg when t2==1 with values of gg when t2==3
d==1...


then collect the test statistics and create a table.
The procedure I followed is to create sub-datasets called m0, m1, m2, m3,
corresponding
to the values of d, i.e.

m0<- y[y$d==0,c(7,17,18,19)]
m1<- y[y$d==1,c(7,17,18,19)]
m2<- y[y$d==2,c(7,17,18,19)]
m3<- y[y$d==3,c(7,17,18,19)]

then perform the test as follows:

x1<-wilcox.test(m0[m0$t1==1,1],m0[m0$t1==3,1],correct=FALSE, exact=FALSE,
conf.int=TRUE,alternative = c("g"))   #ABC   ID
x2<-  wilcox.test(m1[m1$t1==1,1],m1[m1$t1==3,1],correct=FALSE, exact=FALSE,
conf.int=TRUE,alternative = c("g"))
 x3<-  wilcox.test(m2[m2$t1==1,1],m2[m2$t1==3,1],correct=FALSE, exact=FALSE,
conf.int=TRUE,alternative = c("g"))
x4<- wilcox.test(m3[m3$t1==1,1],m3[m3$t1==3,1],correct=FALSE, exact=FALSE,
conf.int=TRUE,alternative = c("g"))

Each of these tests creates an object, say x, and then I extract the
test statistic using
x$statistic.

How can I automate this?
Thank you for any help you can provide.
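
A minimal sketch of one way to automate this, assuming the selected columns of
y keep the names gg, t1, t2, d, and that gg is the first of the four columns,
as in m0 above:

results <- data.frame()
for (d_val in 0:3) {
  m <- y[y$d == d_val, c(7, 17, 18, 19)]      # same columns as m0..m3
  for (t_var in c("t1", "t2")) {
    w <- wilcox.test(m[m[[t_var]] == 1, 1], m[m[[t_var]] == 3, 1],
                     correct = FALSE, exact = FALSE,
                     conf.int = TRUE, alternative = "g")
    results <- rbind(results,
                     data.frame(d = d_val, variable = t_var,
                                W = unname(w$statistic), p.value = w$p.value))
  }
}
results
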
Francesca

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Merge two data frames and find common values and non-matching values

2011-10-04 Thread francesca casalino
Yes, your code did exactly what I needed.

Thank you!!
-f

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Import in R with White Spaces

2011-10-04 Thread francesca casalino
OK, I added quoting and it did work... not sure why, but thank you both for
your replies!
-f

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Merge two data frames and find common values and non-matching values

2011-10-04 Thread francesca casalino
Sorry, I thought it worked, but I think I am actually doing
something wrong...

The problem might be that there are NAs and there are also duplicated
values... My fault. I can't figure out what is going wrong...
I'll be more thorough and modify the two data frames to mirror more closely
what I have, to explain better:

df1 is:

Name Position location
francesca A 75
maria A 75
cristina B 36

And df2 is:

location Country
75 UK
75 Italy
56 France
56 Austria

So I thought I had to first eliminate the duplicates like this:
df1_unique<-subset(df1, !duplicated(location))
df2_unique<-subset(df2, !duplicated(location))

After doing this I get:

df1 :

Name Position location
francesca A 75
cristina B 36

And df2:

location Country
75 UK
56 France

And I would like to match on "Location" and the output to tell me which
records are matching in df1 and not in df2, the ones matching in both, and
the ones which are in df2 but are not matching in df1...

Name Position Location Match
francesca A 75 1
cristina B 36 0

As William suggested,


df12 <- merge(df1, cbind(df2, fromDF2=TRUE), all.x=TRUE, by="location")
df12$Match <- !is.na(df12$fromDF2)
new_common <- df12[which(df12$Match==TRUE),]

Would give me the records that are matching, which should be correct, but I
am not getting the correct value for the non-shared elements (the variants
that are in df2 but not in df1):
df2_only <- subset(df1_unique, !(location %in% df2_unique))
df2_only<- df2_unique[-which(df2_unique$location %in% df1_unique$location),]


Neither of these work and give me wrong records...
My questions are:

1. How do I calculate the records from df2 which are NOT in df1?
2. Do I need to eliminate the duplicates (or is there a way to record where
they came from)?

Any help is very appreciated...
THANK YOU very much!
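
A minimal sketch under the assumption that matching is on location only; with
%in% there is no need to drop the duplicates first (names from the post):

in_both  <- df1[df1$location %in% df2$location, ]      # in df1 and in df2
df1_only <- df1[!(df1$location %in% df2$location), ]   # in df1, not in df2
df2_only <- df2[!(df2$location %in% df1$location), ]   # in df2, not in df1

df1$Match <- as.integer(df1$location %in% df2$location)  # 1/0 flag, as in the
                                                         # desired output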

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Permutation or Bootstrap to obtain p-value for one sample

2011-10-09 Thread francesca casalino
Thank you very much to both Ken and Peter for the very helpful
explanations.

Just to understand this better (sorry for repeating, but I am also new to
statistics… so please correct me where I am wrong):

Ken' method:
Random sampling of the mean, and then using these means to construct a
distribution of means (the 'null' distribution), and I can then use this
normal distribution and compare the population mean to my mean using, for
example, z-score.
Of note: The initial distributions are not normal, so I thought I needed to
base my calculations on the median, but I can use the mean to construct a
normal distribution.
This would be called a bootstrap test.

Peter's method:
Random sampling of the mean, and then comparing each sampled mean with the
population mean and see if it is higher or equal to the difference between
my sample and the population mean. This is a permutation test, but to
actually get CI and a p-value I would need bootstrap method.

Did I understand this correctly?

I tried to start with Ken's approach for now, and followed his steps, but:

1) I get a lot of NaN in the sampling distribution, is this normal?
2) I think I am doing again something wrong when I try to find a
p-value…This is what I did:

nreps=1
mean.dist=rep(NA,nreps)

for(replication in 1:nreps)
{
my.sample=sample(population$Y, 250, replace=F)
#Peter mentioned that this sampling should be without replacement, so I went
for that---

mean.for.rep=mean(my.sample) #mean for this replication
mean.dist[replication]=mean.for.rep #store the mean
}

hist(mean.dist,main="Null Dist of Means", col="chartreuse")
 #Show the means in a nifty color

mean_dist= mean(mean.dist, na.rm=TRUE)
sd_pop= sd(mean.dist, na.rm=TRUE)

mean_sample= mean(population$Y, na.rm=TRUE)

z_stat= (mean_sample - mean_dist)/(sd_pop/sqrt(2089))
p_value= 2*pnorm(-abs(z_stat))

Is this correct?
THANK YOU SO MUCH FOR ALL YOUR HELP!!
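
For comparison, a minimal sketch that reads the P-value straight off the null
distribution instead of going through a z statistic; population and sample1
below are simulated stand-ins for the objects in the thread:

set.seed(1)
population <- data.frame(Y = rnorm(2000))
sample1    <- population[sample(nrow(population), 250), , drop = FALSE]

nreps <- 9999
null.means <- replicate(nreps,
                        mean(sample(population$Y, 250, replace = FALSE),
                             na.rm = TRUE))   # na.rm guards against NA values
obs.mean <- mean(sample1$Y, na.rm = TRUE)
center   <- mean(population$Y, na.rm = TRUE)

# two-sided empirical P-value: how often a random sample of 250 lands at least
# as far from the population mean as the observed sample did
(1 + sum(abs(null.means - center) >= abs(obs.mean - center))) / (nreps + 1)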

2011/10/9 Ken Hutchison 

> Hi Francy,
>   A bootstrap test would likely be sufficient for this problem, but a
> one-sample t-test isn't advisable or necessary in my opinion. If you use a
> t-test multiple times you are making assumptions about the distribution of
> your data; more importantly, your probability of Type 1 error will be
> increased with each test. So, a valid thing to do would be to sample
> (computation for this problem won't be expensive so do alotta reps) and
> compare your mean to the null distribution of means. I.E.
>
> nreps=1
> mean.dist=rep(NA,nreps)
>
> for(replication in 1:nreps)
> {
> my.sample=sample(population, 2500, replace=T)
> #replace could be false, depends on preference
> mean.for.rep=mean(my.sample) #mean for this replication
> mean.dist[replication]=mean.for.rep #store the mean
> }
>
> hist(mean.dist,main="Null Dist of Means", col="chartreuse")
>  #Show the means in a nifty color
>
> You can then perform various tests given the null distribution, or infer
> from where your sample mean lies within the distribution or something to
> that effect. Also, if the distribution is normal, which is somewhat likely
> since it is a distribution of means: (shapiro.test or require(nortest)
> ad.test will let you know) you should be able to make inference from that
> using parametric methods (once) which will fit the truth a bit better than a
> t.test.
> Hope that's helpful,
>Ken Hutchison
>
>
> On Sat, Oct 8, 2011 at 10:04 AM, francy  wrote:
>
>> Hi,
>>
>> I am having trouble understanding how to approach a simulation:
>>
>> I have a sample of n=250 from a population of N=2,000 individuals, and I
>> would like to use either permutation test or bootstrap to test whether
>> this
>> particular sample is significantly different from the values of any other
>> random samples of the same population. I thought I needed to take random
>> samples (but I am not sure how many simulations I need to do) of n=250
>> from
>> the N=2,000 population and maybe do a one-sample t-test to compare the
>> mean
>> score of all the simulated samples, + the one sample I am trying to prove
>> that is different from any others, to the mean value of the population.
>> But
>> I don't know:
>> (1) whether this one-sample t-test would be the right way to do it, and
>> how
>> to go about doing this in R
>> (2) whether a permutation test or bootstrap methods are more appropriate
>>
>> This is the data frame that I have, which is to be sampled:
>> df<-
>> i.e.
>> x y
>> 1 2
>> 3 4
>> 5 6
>> 7 8
>> . .
>> . .
>> . .
>> 2,000
>>
>> I have this sample from df, and would like to test whether it is has
>> extreme
>> values of y.
>> sample1<-
>> i.e.
>> x y
>> 3 4
>> 7 8
>> . .
>> . .
>> . .
>> 250
>>
>> For now I only have this:
>>
>> R=999 #Number of simulations, but I don't know how many...
>> t.values =numeric(R) #creates a numeric vector with 999 elements,
>> which
>> will hold the results of each simulation.
>> for (i in 1:R) {
>> sample1 <- df[sample(nrow(df), 250, replace=TRUE),]
>>
>> But I don't know how to continue the

Re: [R] Permutation or Bootstrap to obtain p-value for one sample

2011-10-09 Thread francesca casalino
Dear Peter and Tim,

Thank you very much for taking the time to explain this to me! It is much
more clear now.
And sorry for using the space here maybe inappropriately, I really hope this
is OK and gets posted, I think it is really important that non-statisticians
like myself get a good idea of the concepts behind the functions of R. I am
really grateful you went through this with me.

-f

2011/10/9 Tim Hesterberg 

> I'll concur with Peter Dalgaard that
> * a permutation test is the right thing to do - your problem is equivalent
>  to a two-sample test,
> * don't bootstrap, and
> * don't bother with t-statistics
> but I'll elaborate a bit on on why, including
> * two approaches to the whole problem - and how your approach relates
>  to the usual approach,
> * an interesting tidbit about resampling t statistics.
>
> First, I'm assuming that your x variable is irrelevant, only y matters,
> and that sample1 is a proper subset of df.  I'll also assume that you
> want to look for differences in the mean, rather than arbitrary differences
> (in which case you might use e.g. a Kolmogorov-Smirnov test).
>
> There are two general approaches to this problem:
> (1) two-sample problem, sample1$y vs df$y[rows other than sample 1]
> (2) the approach you outlined, thinking of sample1$y as part of df$y.
>
> Consider (1), and call the two data sets y1 and y2
> The basic permutation test approach is
> * compute the test statistic theta(y1, y2), e.g. mean(y1)-mean(y2)
> * repeat B times (B large):
>  draw a sample of size n1 from the pooled data, call that Y1, call the rest
> Y2
>  compute theta(Y1, Y2)
> * P-value for a one-sided test is (1 + k) / (1 + B)
>  where k is the number of permutation samples with theta(Y1,Y2) >=
> theta(y1,y2)
>
> The test statistic could be
>  mean(y1) - mean(y2)
>  mean(y1)
>  sum(y1)
>  t-statistic (pooled variance)
>  P-value for a t-test (pooled variance)
>  mean(y1) - mean(pooled data)
>  t-statistic (unpooled variance)
>  P-value for a t-test (unpooled variance)
>  median(y1) - median(y2)
>  ...
> The first six of those are equivalent - they give exactly the same P-value
> for the permutation test.  The reason is that those test statistics
> are monotone transformations of each other, given the data.
> Hence, doing the pooled-variance t calculations gains nothing.
>
> Now consider your approach (2).  That is equivalent to the permutation
> test using the test statistic:  mean(y1) - mean(pooled data).
>
> Why not a bootstrap?  E.g. pool the data and draw samples of size
> n1 and n2 from the pooled data, independently and with replacement.
> That is similar to the permutation test, but less accurate.  Probably
> the easiest way to see this is to suppose there is 1 outlier in the pooled
> data.
> In any permutation iteration there is exactly 1 outlier among the two
> samples.
> With bootstrapping, there could be 0, 1, 2, 
> The permutation test answers the question - given that there is exactly
> 1 outlier in my combined data, what is the probability that random chance
> would give a difference as large as I observed.  The bootstrap would
> answer some other question.
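
A minimal sketch of the two-sample permutation test outlined above, with
simulated stand-ins for sample1$y (y1) and the remaining df$y (y2), and theta
taken as the difference in means:

set.seed(1)
y1 <- rnorm(250, mean = 0.2)   # placeholder for sample1$y
y2 <- rnorm(1750)              # placeholder for the rest of df$y

B <- 9999
obs <- mean(y1) - mean(y2)     # theta(y1, y2)
pooled <- c(y1, y2)
n1 <- length(y1)

perm <- replicate(B, {
  idx <- sample(length(pooled), n1)        # which values get the "Y1" label
  mean(pooled[idx]) - mean(pooled[-idx])   # theta(Y1, Y2)
})

(1 + sum(perm >= obs)) / (1 + B)           # one-sided P-value, (1 + k)/(1 + B)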
>
> Tim Hesterberg
> NEW!  Mathematical Statistics with Resampling and R, Chihara & Hesterberg
>
> http://www.amazon.com/Mathematical-Statistics-Resampling-Laura-Chihara/dp/1118029852/ref=sr_1_1?ie=UTF8
> http://home.comcast.net/~timhesterberg
>  (resampling, water bottle rockets, computers to Guatemala, shower = 2650
> light bulbs, ...)
>
>
> >On Oct 8, 2011, at 16:04 , francy wrote:
> >
> >> Hi,
> >>
> >> I am having trouble understanding how to approach a simulation:
> >>
> >> I have a sample of n=250 from a population of N=2,000 individuals, and I
> >> would like to use either permutation test or bootstrap to test whether
> this
> >> particular sample is significantly different from the values of any
> other
> >> random samples of the same population. I thought I needed to take random
> >> samples (but I am not sure how many simulations I need to do) of n=250
> from
> >> the N=2,000 population and maybe do a one-sample t-test to compare the
> mean
> >> score of all the simulated samples, + the one sample I am trying to
> prove
> >> that is different from any others, to the mean value of the population.
> But
> >> I don't know:
> >> (1) whether this one-sample t-test would be the right way to do it, and
> how
> >> to go about doing this in R
> >> (2) whether a permutation test or bootstrap methods are more appropriate
> >>
> >> This is the data frame that I have, which is to be sampled:
> >> df<-
> >> i.e.
> >> x y
> >> 1 2
> >> 3 4
> >> 5 6
> >> 7 8
> >> . .
> >> . .
> >> . .
> >> 2,000
> >>
> >> I have this sample from df, and would like to test whether it is has
> extreme
> >> values of y.
> >> sample1<-
> >> i.e.
> >> x y
> >> 3 4
> >> 7 8
> >> . .
> >> . .
> >> . .
> >> 250
> >>
> >> For now I only have this:
> >>
> >> R=999 #Number of simulations, but I don't know how many...
> >> t.values =numeric(R)  #crea

[R] Mean or mode imputation for missing values

2011-10-11 Thread francesca casalino
Dear R experts,

I have a large database made up of mixed data types (numeric,
character, factor, ordinal factor) with missing values, and I am
looking for a package that would help me impute the missing values
using  either the mean if numerical or the mode if character/factor.

I maybe could use replace like this:
df$var[is.na(df$var)] <- mean(df$var, na.rm = TRUE)
And go through all the many different variables of the datasets using
mean or mode for each, but I was wondering if there was a faster way,
or if a package existed to automate this (by doing 'mode' if it is a
factor or character or 'mean' if it is numeric)?

I have tried the package "dprep" because I wanted to use the function
"ce.mimp", btu unfortunately it is not available anymore.

Thank you for your help,
-francy

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Mean or mode imputation for missing values

2011-10-11 Thread francesca casalino
Yes thank you Gu…
I am just trying to do this as a rough step and will try other
imputation methods which are more appropriate later.
I am just learning R, and was trying to write the for loop and
if-statement by hand, but something is going wrong…

This is what I have until now:

*fake array:
age<- c(5,8,10,12,NA)
a<- factor(c("aa", "bb", NA, "cc", "cc"))
b<- c("banana", "apple", "pear", "grape", NA)
df_test <- data.frame(age=age, a=a, b=b)
df_test$b<- as.character(df_test$b)

for (var in 1:ncol(df_test)) {
if (class(df_test$var)=="numeric") {
df_test$var[is.na(df_test$var)] <- mean(df_test$var, na.rm = 
TRUE)
} else if (class(df_test$var)=="character") {
Mode(df_test$var[is.na(df_test$var)], na.rm = TRUE)
}
}

Where 'Mode' is the function:

function (x, na.rm)
{
xtab <- table(x)
xmode <- names(which(xtab == max(xtab)))
if (length(xmode) > 1)
xmode <- ">1 mode"
return(xmode)
}


It seems as if it is just ignoring the statements, though, without giving
any error… Does anybody have any idea what is going on?

Thank you very much for all the great help!
-f
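
A minimal sketch of why the loop appears to do nothing, and one possible fix,
assuming df_test and Mode() are defined as above: df_test$var looks for a
column literally named "var", which does not exist, so every class() test
fails silently. Indexing with [[ ]] (and assigning the mode back, also for the
factor column) behaves as intended:

for (j in seq_along(df_test)) {
  if (is.numeric(df_test[[j]])) {
    df_test[[j]][is.na(df_test[[j]])] <- mean(df_test[[j]], na.rm = TRUE)
  } else if (is.character(df_test[[j]]) || is.factor(df_test[[j]])) {
    df_test[[j]][is.na(df_test[[j]])] <- Mode(df_test[[j]], na.rm = TRUE)
  }
}
df_test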

2011/10/11 Weidong Gu :
> In your case, it may not be sensible to simply fill missing values by
> mean or mode as multiple imputation becomes the norm this day. For
> your specific question, na.roughfix in randomForest package would do
> the work.
>
> Weidong Gu
>
> On Tue, Oct 11, 2011 at 8:11 AM, francesca casalino
>  wrote:
>> Dear R experts,
>>
>> I have a large database made up of mixed data types (numeric,
>> character, factor, ordinal factor) with missing values, and I am
>> looking for a package that would help me impute the missing values
>> using  either the mean if numerical or the mode if character/factor.
>>
>> I maybe could use replace like this:
>> df$var[is.na(df$var)] <- mean(df$var, na.rm = TRUE)
>> And go through all the many different variables of the datasets using
>> mean or mode for each, but I was wondering if there was a faster way,
>> or if a package existed to automate this (by doing 'mode' if it is a
>> factor or character or 'mean' if it is numeric)?
>>
>> I have tried the package "dprep" because I wanted to use the function
>> "ce.mimp", btu unfortunately it is not available anymore.
>>
>> Thank you for your help,
>> -francy
>>
>> __
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Creating data frame with residuals of a data frame

2011-10-24 Thread francesca casalino
Dear experts,

I am trying to create a data frame from the residuals I get after
having applied a linear regression to each column of a data frame, but
I don't know how to create this data frame from the resulting list
since the list has differing numbers of rows.

So for example:
age<- c(5,6,10,14,16,NA,18)
value1<- c(30,70,40,50,NA,NA,NA)
value2<- c(2,4,1,4,4,4,4)
df<- data.frame(age, value1, value2)

#Run linear regression to adjust for age and get residuals:

lm_f <- function(x) {
x<- residuals(lm(data=df, formula= x ~ age))
}
resid <- apply(df,2,lm_f)
resid<- resid[-1]

Then resid is a list with different row numbers:

$value1
 1  2  3  4
-16.945813  22.906404  -7.684729   1.724138

$value2
  1   2   3   4   5   7
-0.37398374  1.50406504 -1.98373984  0.52845528  0.28455285  0.04065041

I am trying to get both the original variable and their residuals in
the same data frame like this:

age, value1, value2, resid_value1, resid_value2

But when I try cbind or other operations I get an error message
because they do not have the same number of rows. Can you please help
me figure out how to solve this?

Thank you.
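
A minimal sketch of one way to keep the residuals aligned with the original
rows, using the df defined above: fitting with na.action = na.exclude makes
residuals() pad with NA wherever an observation was dropped, so every residual
vector has nrow(df) elements and can be cbind-ed back:

resid_cols <- lapply(df[, -1, drop = FALSE], function(v) {
  residuals(lm(v ~ age, data = df, na.action = na.exclude))
})
names(resid_cols) <- paste0("resid_", names(df)[-1])
df_out <- cbind(df, as.data.frame(resid_cols))
df_out   # age, value1, value2, resid_value1, resid_value2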

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Order a data frame based on the order of another data frame

2012-03-05 Thread francesca casalino
Hi, I am trying to match the order of the rownames of a dataframe with
the rownames of another dataframe (I can't simply sort both sets
because I would have to change the order of many other connected
datasets if I did that): Also, the second dataset (snp.matrix$fam) is
a snp matrix slot:

so for example:

data_one:
                         x          y           z
sample_1110001  -0.3352623  -1.141462  -0.4032494
sample_1110005   0.1862424   0.015944   0.1329059
sample_1110420   0.1309120 0.004005596  0.06117253
sample_2220017   0.1145205 -0.125090054 0.04957881

rownames(snp.matrix$fam)
 [1] "sample_2220017" "sample_1110420" "sample_1110001"
 [4] "sample_1110005"

I would like my data_one to look like this:
                         x          y           z
sample_2220017   0.1145205 -0.125090054 0.04957881
sample_1110420   0.1309120 0.004005596  0.06117253
sample_1110001  -0.3352623  -1.141462  -0.4032494
sample_1110005   0.1862424   0.015944   0.1329059


I have tried these, but they don't work:
data_one[order(rownames(snp.matrix$fam)),]
data_one[rownames(data_one)[order(rownames(snp.matrix$fam))],]

Thank you for your help!
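
A minimal sketch with stand-ins mirroring the post (snp.matrix$fam is only
needed for its rownames): indexing by the target rownames, or by match(),
reorders data_one directly, whereas order() applied to the other object's
rownames gives an unrelated permutation.

data_one <- data.frame(x = c(-0.3352623, 0.1862424, 0.1309120, 0.1145205),
                       y = c(-1.141462, 0.015944, 0.004005596, -0.125090054),
                       z = c(-0.4032494, 0.1329059, 0.06117253, 0.04957881),
                       row.names = c("sample_1110001", "sample_1110005",
                                     "sample_1110420", "sample_2220017"))
target <- c("sample_2220017", "sample_1110420",
            "sample_1110001", "sample_1110005")  # rownames(snp.matrix$fam)

data_one[target, ]                               # reordered by row name
data_one[match(target, rownames(data_one)), ]    # equivalent, via match()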

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] master thesis

2012-04-14 Thread Francesca Sorbie
Hi, 

For my master thesis I have 24 micro-plots on which I did measurements during 3 
months. 

The measurements were: 
- Rainfall and runoff events throughout 3 months (runoff being dependent on the 
rainfall; a coefficient (%) has been computed per rainfall event and per 3 months)
- Soil texture (3 different textures were differentiated)
- Slope (3 classes of slopes)
- Stoniness (one time measurement)
- Random roughness (throughout 3 months)
- Land use (crop land or grazing land)
- Vegetation cover (throughout 3 months)
- Vegetation height (throughout 3 months, only measured on cropland)
- Antecedent moisture content (throughout 3 months)

Now I would like to investigate the effect of all these variables on the 
rainfall/runoff. For example does a steeper slope have a larger effect on the 
runoff than the soil texture?
What are the possibilities in R? 
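
Purely as an illustrative sketch: with one row per micro-plot (or per plot and
event) and the measurements as columns, a linear model lets the effects be
compared while holding the other variables fixed. Every object and column name
below is a hypothetical stand-in for the measurements listed above.

set.seed(1)
plots <- data.frame(                     # simulated stand-in, 24 micro-plots
  runoff_coef = runif(24, 0, 60),
  slope_class = factor(sample(1:3, 24, replace = TRUE)),
  texture     = factor(sample(c("sand", "loam", "clay"), 24, replace = TRUE)),
  stoniness   = runif(24, 0, 40),
  land_use    = factor(sample(c("crop", "grazing"), 24, replace = TRUE)),
  veg_cover   = runif(24, 0, 100))

fit <- lm(runoff_coef ~ slope_class + texture + stoniness + land_use +
            veg_cover, data = plots)
summary(fit)            # size and sign of each effect, holding the rest fixed
drop1(fit, test = "F")  # marginal F-test for each variable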

Thank you for any feedback, 
Francesca
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] XIII GRASS and GFOSS italian Meeting

2012-01-18 Thread francesca bader
Dear all,
we would like to point out the approaching XIII GRASS and 
GFOSS italian Meeting which will take place at the University of Trieste from 
Wednesday, February 15 until Friday, February 17, 2012.
Abstracts can be sent to gr...@units.it until January 20, while registration 
is open until February 6, 2012.
All important information regarding the meeting can be found 
at http://sites.google.com/site/grassts/


Kind Regards,
Dipartimento di Scienze della Vita
Università degli Studi di Trieste
via Weiss 2, 34127 Trieste
tel. 040 5582072, fax. 040 5582011
mail: gr...@units.it
http://sites.google.com/site/grassts/
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Help Package 'CLIMTRENDS' from Archive

2021-03-22 Thread francesca brun via R-help
Hello,
I need to run the 'climtrend' library, which is no longer available. I 
downloaded and installed it from the archive on my PC, but it doesn't work: it 
says "I can't find the function ...". What should I do? I absolutely need to 
use it; besides installing it, what do I need to do to use it?
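
A minimal sketch of the usual steps, assuming the .tar.gz was downloaded from
the CRAN archive area; the path, file name, and version below are placeholders,
and the package name must match the one shown in the archive:

# install from the downloaded source file (placeholder path and version)
install.packages("C:/Users/francesca/Downloads/climtrends_1.0.tar.gz",
                 repos = NULL, type = "source")

# installing is not enough: the package must be loaded in every session,
# otherwise its functions are "not found"
library(climtrends)

# an alternative that fetches an archived version directly (may need the
# exact version number of the archived release)
# install.packages("remotes")
# remotes::install_version("climtrends", version = "1.0")
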
thank you in advance for your kindness,
Regards
Francesca
__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Thank you 4 Davide

2021-03-22 Thread francesca brun via R-help
Hello,
The problem was that R version 4.0.4 did not support the package, so I tried 
several older versions; R 3.6.2 installs both climtrend and Rcmdr with its 
graphical interface. Solved! Thanks again, Davide!
Francesca
(from Italy)


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Create a numeric series in an efficient way

2024-06-13 Thread Francesca PANCOTTO via R-help
Dear Contributors
I am trying to create a numeric series with repeated numbers, not a difficult
task, but I do not seem to find an efficient way.

This is my solution

blocB <- c(rep(x = 1, times = 84), rep(x = 2, times = 84), rep(x = 3, times
= 84), rep(x = 4, times = 84), rep(x = 5, times = 84), rep(x = 6, times =
84), rep(x = 7, times = 84), rep(x = 8, times = 84), rep(x = 9, times =
84), rep(x = 10, times = 84), rep(x = 11, times = 84), rep(x = 12, times =
84), rep(x = 13, times = 84))

which works but it is super silly and I need to create different variables
similar to this, changing the value of the repetition, 84 in this case.
Thanks for any help.
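
The each argument of rep() does this in one call, and the repetition count can
then be a variable:

blocB <- rep(1:13, each = 84)    # identical to the thirteen rep() calls above

# a hypothetical helper for the other variables, varying the block length
make_block <- function(n_groups, times) rep(seq_len(n_groups), each = times)
blocC <- make_block(13, 60)      # e.g. blocks of length 60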


F.

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Create a numeric series in an efficient way

2024-06-13 Thread Francesca PANCOTTO via R-help
I apologize, I solved the problem, sorry for that.
f.



Il giorno gio 13 giu 2024 alle ore 16:42 Francesca PANCOTTO <
francesca.panco...@unimore.it> ha scritto:

> Dear Contributors
> I am trying to create a numeric series with repeated numbers, not
> difficult task, but I do not seem to find an efficient way.
>
> This is my solution
>
> blocB <- c(rep(x = 1, times = 84), rep(x = 2, times = 84), rep(x = 3,
> times = 84), rep(x = 4, times = 84), rep(x = 5, times = 84), rep(x = 6,
> times = 84), rep(x = 7, times = 84), rep(x = 8, times = 84), rep(x = 9,
> times = 84), rep(x = 10, times = 84), rep(x = 11, times = 84), rep(x = 12,
> times = 84), rep(x = 13, times = 84))
>
> which works but it is super silly and I need to create different variables
> similar to this, changing the value of the repetition, 84 in this case.
> Thanks for any help.
>
>
> F.
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Fill NA values in columns with values of another column

2024-08-27 Thread Francesca PANCOTTO via R-help
Dear Contributors,
I have a problem with a database composed of many individuals for many
periods, for which I need to perform a manipulation of data as follows.
Here I report the procedure I need to do for the first 32 observations of
the first period.


cbind(VB1d[,1],s1id[,1])
      [,1] [,2]
 [1,]    6    8
 [2,]    9    5
 [3,]   NA    1
 [4,]    5    6
 [5,]   NA    7
 [6,]   NA    2
 [7,]    4    4
 [8,]    2    7
 [9,]    2    7
[10,]   NA    3
[11,]   NA    2
[12,]   NA    4
[13,]    5    6
[14,]    9    5
[15,]   NA    5
[16,]   NA    6
[17,]   10    3
[18,]    7    2
[19,]    2    1
[20,]   NA    7
[21,]    7    2
[22,]   NA    8
[23,]   NA    4
[24,]   NA    5
[25,]   NA    6
[26,]    2    1
[27,]    4    4
[28,]    6    8
[29,]   10    3
[30,]   NA    3
[31,]   NA    8
[32,]   NA    1


In column s1id, I have numbers from 1 to 8, which are the ids of 8 groups,
randomly mixed within the larger group of 32.
For each group, I want the value that is reported for only two of the group
members to be assigned to all four group members.

For example, the value 8 in the first row, second column, is group 8. The value
for group 8 of the variable VB1d is 6. At row 28, again for s1id equal to
8, I have 6.
But in row 22, where the second variable is again 8, VB1d is NA.
It is the same in each group: only two rows have the correct number, the
other two are NA.
I need each group, identified by the values of the variable s1id, to
correctly report the number of variable VB1d that is present for just two
group members.

I hope my explanation is acceptable.
The task appears complex to me right now, especially because I will need to
repeat this procedure across 12 x 14 similar databases.

Has anyone ever encountered a similar problem?
Thanks in advance for any help provided.
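
A minimal sketch for one period, assuming VB1d and s1id are matrices whose
first columns are the ones shown above: ave() applies a function within each
group, so the single reported value can be copied onto the NA rows of the same
group. fill_by_group() is a hypothetical helper name.

fill_by_group <- function(value, group) {
  ave(value, group, FUN = function(v) {
    observed <- v[!is.na(v)][1]          # the value the group members report
    replace(v, is.na(v), observed)
  })
}

VB1d_filled <- fill_by_group(VB1d[, 1], s1id[, 1])

# the same helper can be mapped over the columns of the other periods, e.g.
# VB1d_all <- mapply(fill_by_group, as.data.frame(VB1d), as.data.frame(s1id))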

--

Francesca Pancotto

Associate Professor Political Economy

University of Modena, Largo Santa Eufemia, 19, Modena

Office Phone: +39 0522 523264

Web: https://sites.google.com/view/francescapancotto/home

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide https://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.