Re: [R] Dataframe: Average cells of two rows and replace them with one row

PIKAL Petr Fri, 30 May 2014 00:46:36 -0700

Hi

Please do not use HTML formatting in your posts. It does not bring any advantage.
See inline.


From: Verena Weinbir [mailto:[email protected]]
Sent: Thursday, May 29, 2014 3:33 PM
To: PIKAL Petr
Subject: Re: [R] Dataframe: Average cells of two rows and replace them with one row

Hey,
Thank you for your reply!

I've attached some sample data. When I tried your code it gave me the error 
message, that arguments must have same
Why did you attach the data? Using dput is preferable. When I tried to read your 
data, there was a flaw in the number of items in row 13 (and probably others); 
Excel is not famous for keeping the same formatting across versions.
> test<-read.table("clipboard", header=TRUE, na.strings="NA", dec=",")
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  :
  line 13 did not have 25 elements
So I read only lines 1:10.
> test<-read.table("clipboard", header=TRUE, na.strings="NA", dec=",")
This results in a data frame with two factor variables, Author and Test. BTW, 
there is no variable "Name" in your data.
> str(test)
'data.frame':   10 obs. of  25 variables:
$ Author  : Factor w/ 4 levels "Beck","Joll",..: 2 2 2 2 1 1 1 1 3 4
$ Year    : int  2006 2006 2006 2006 1988 1988 1988 1988 2004 2004
$ Number  : int  720 720 720 720 33 41 41 41 19 26
$ NumberA : int  344 344 344 344 5 6 6 6 9 12
$ NumberB : int  376 376 376 376 28 35 35 35 10 14
$ Age     : num  15 15 15 15 25.5 NA NA NA 37.4 37.2
$ AgeA    : int  NA NA NA NA 27 NA NA NA NA NA
$ AgeB    : int  NA NA NA NA 24 NA NA NA NA NA
$ Test    : Factor w/ 2 levels "green","red": 2 2 2 2 1 1 1 1 1 1
$ ScoreA  : num  64.8 63 64.7 60.6 61 ...
$ ScoreAdv: num  9.96 9.96 9.96 9.96 20.64 ...
$ ScoreB  : num  75.5 73.4 74.6 69.2 70.8 ...
$ ScoreBdv: num  9.04 9.04 9.04 9.04 16.36 ...
$ Sub1    : logi  NA NA NA NA NA NA ...
$ Sub2    : logi  NA NA NA NA NA NA ...
$ Sub3    : logi  NA NA NA NA NA NA ...
$ Sub4    : logi  NA NA NA NA NA NA ...
$ Sub5    : logi  NA NA NA NA NA NA ...
$ Sub6    : logi  NA NA NA NA NA NA ...
$ Sub7    : logi  NA NA NA NA NA NA ...
$ Sub8    : logi  NA NA NA NA NA NA ...
$ Sub8.1  : logi  NA NA NA NA NA NA ...
$ Sub10   : logi  NA NA NA NA NA NA ...
$ yi      : num  1.124 1.092 1.04 0.903 0.515 ...
$ vi      : num  0.00643 0.00638 0.0063 0.00612 0.23337 ...
Here is the output from dput, which you can use to check whether my data are the 
same as yours (that is why dput is preferable).
> dput(test)
structure(list(Author = structure(c(3L, 3L, 3L, 3L, 1L, 1L, 1L,
1L, 4L, 5L, 2L), .Label = c("Beck", "Con", "Joll", "Per(a)",
"Per(b)"), class = "factor"), Year = c(2006L, 2006L, 2006L, 2006L,
1988L, 1988L, 1988L, 1988L, 2004L, 2004L, 2012L), Number = c(720L,
720L, 720L, 720L, 33L, 41L, 41L, 41L, 19L, 26L, 312L), NumberA = c(344L,
344L, 344L, 344L, 5L, 6L, 6L, 6L, 9L, 12L, 156L), NumberB = c(376L,
376L, 376L, 376L, 28L, 35L, 35L, 35L, 10L, 14L, 156L), Age = c(15,
15, 15, 15, 25.5, NA, NA, NA, 37.4, 37.2, 37.25), AgeA = c(NA,
NA, NA, NA, 27, NA, NA, NA, NA, NA, 38.3), AgeB = c(NA, NA, NA,
NA, 24, NA, NA, NA, NA, NA, 36.2), Test = structure(c(3L, 3L,
3L, 3L, 2L, 2L, 2L, 2L, 2L, 2L, 1L), .Label = c("blue", "green",
"red"), class = "factor"), ScoreA = c(64.8, 63, 64.7, 60.6, 61,
60.66, 58.5, 61.66, 87.58, 91.2, 0.26), ScoreAdv = c(9.955, 9.955,
9.955, 9.955, 20.64, 19.38, 20.35, 19.44, 16.79, 15.6, 0.27),
    ScoreB = c(75.5, 73.4, 74.6, 69.2, 70.83, 70.34, 70.91, 71.19,
    98.08, 86.87, 0.3), ScoreBdv = c(9.043, 9.043, 9.043, 9.043,
    16.36, 17.78, 18.23, 18.93, 16.35, 15.73, 0.26), Sub1 = c(NA,
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), Sub2 = c(NA, NA,
    NA, NA, NA, NA, NA, NA, NA, NA, NA), Sub3 = c(NA, NA, NA,
    NA, NA, NA, NA, NA, NA, NA, NA), Sub4 = c(NA, NA, NA, NA,
    NA, NA, NA, NA, NA, NA, NA), Sub5 = c(NA, NA, NA, NA, NA,
    NA, NA, NA, NA, NA, NA), Sub6 = c(NA, NA, NA, NA, NA, NA,
    NA, NA, NA, NA, NA), Sub7 = c(NA, NA, NA, NA, NA, NA, NA,
    NA, NA, NA, NA), Sub8 = c(NA, NA, NA, NA, NA, NA, NA, NA,
    NA, NA, NA), Sub8.1 = c(NA, NA, NA, NA, NA, NA, NA, NA, NA,
    NA, NA), Sub10 = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
    NA), yi = c(1.12396298138735, 1.09245000060079, 1.03992836595652,
    0.90337211588142, 0.514940166844419, 0.510422808437657, 0.629923007603453,
    0.487074464117519, 0.605177248294008, -0.26766583881062,
    0.150551071047105), vi = c(0.0064268782069221, 0.00637821308397186,
    0.00630017975096319, 0.00611528303580472, 0.233373905723904,
    0.212826775760406, 0.211924228535386, 0.222536036643126,
    0.224889816220824, 0.158901797586393, 0.0128772400934118)), .Names = c("Author",
"Year", "Number", "NumberA", "NumberB", "Age", "AgeA", "AgeB",
"Test", "ScoreA", "ScoreAdv", "ScoreB", "ScoreBdv", "Sub1", "Sub2",
"Sub3", "Sub4", "Sub5", "Sub6", "Sub7", "Sub8", "Sub8.1", "Sub10",
"yi", "vi"), class = "data.frame", row.names = c(NA, -11L))
>
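As a minimal sketch of why dput is convenient, the toy data frame below (the small Name/C1/C2/C3 example from the start of this thread, not the real data) is written out with dput and rebuilt from that text. The object names dat, txt, dat2 and same are only illustrative.

```r
# Toy data frame from the start of the thread (illustrative, not the real data)
dat <- data.frame(
  Name = c("A", "B", "C", "C", "D"),
  C1 = c(3, 2, 4, 4, 5),
  C2 = c(3, 7, 3, 4, 5),
  C3 = c(5, 4, 3, 6, 3),
  stringsAsFactors = FALSE
)

# dput() prints R code that recreates the object; capture it as text,
# as if it had been pasted into a plain-text e-mail
txt <- capture.output(dput(dat))

# The recipient evaluates the pasted text to get the object back
dat2 <- eval(parse(text = txt))

# Compare the rebuilt copy with the original
same <- isTRUE(all.equal(dat, dat2))
```

Because the text carries column types and factor levels along with the values, the recipient does not depend on how a spreadsheet or the clipboard happened to format the data.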
I can use aggregate without problems
> test.ag<-aggregate(test[,-1], list(test[,1]), mean, na.rm=T)
Here is the result
> dput(test.ag)
structure(list(Group.1 = structure(1:5, .Label = c("Beck", "Con",
"Joll", "Per(a)", "Per(b)"), class = "factor"), Year = c(1988,
2012, 2006, 2004, 2004), Number = c(39, 312, 720, 19, 26), NumberA = c(5.75,
156, 344, 9, 12), NumberB = c(33.25, 156, 376, 10, 14), Age = c(25.5,
37.25, 15, 37.4, 37.2), AgeA = c(27, 38.3, NaN, NaN, NaN), AgeB = c(24,
36.2, NaN, NaN, NaN), Test = c(NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_), ScoreA = c(60.455, 0.26, 63.275, 87.58,
91.2), ScoreAdv = c(19.9525, 0.27, 9.955, 16.79, 15.6), ScoreB = c(70.8175,
0.3, 73.175, 98.08, 86.87), ScoreBdv = c(17.825, 0.26, 9.043,
16.35, 15.73), Sub1 = c(NaN, NaN, NaN, NaN, NaN), Sub2 = c(NaN,
NaN, NaN, NaN, NaN), Sub3 = c(NaN, NaN, NaN, NaN, NaN), Sub4 = c(NaN,
NaN, NaN, NaN, NaN), Sub5 = c(NaN, NaN, NaN, NaN, NaN), Sub6 = c(NaN,
NaN, NaN, NaN, NaN), Sub7 = c(NaN, NaN, NaN, NaN, NaN), Sub8 = c(NaN,
NaN, NaN, NaN, NaN), Sub8.1 = c(NaN, NaN, NaN, NaN, NaN), Sub10 = c(NaN,
NaN, NaN, NaN, NaN), yi = c(0.535590111750762, 0.150551071047105,
1.03992836595652, 0.605177248294008, -0.26766583881062), vi = 
c(0.220165236665706,
0.0128772400934118, 0.00630513851941547, 0.224889816220824, 0.158901797586393
)), .Names = c("Group.1", "Year", "Number", "NumberA", "NumberB",
"Age", "AgeA", "AgeB", "Test", "ScoreA", "ScoreAdv", "ScoreB",
"ScoreBdv", "Sub1", "Sub2", "Sub3", "Sub4", "Sub5", "Sub6", "Sub7",
"Sub8", "Sub8.1", "Sub10", "yi", "vi"), row.names = c(NA, -5L
), class = "data.frame")
You can see that the Test variable is lost (all NA) because it is not numeric 
and cannot be averaged.
> test.ag$Test
[1] NA NA NA NA NA
length.  Regarding the test variable I want it to look the same as before.
You can check whether the Test variable is the same within each aggregated group.
> aggregate(test$Test, list(test[,1]), paste)
  Group.1                          x
1    Beck green, green, green, green
2     Con                       blue
3    Joll         red, red, red, red
4  Per(a)                      green
5  Per(b)                      green
and if so, you can pick the first one
> aggregate(test$Test, list(test[,1]), function(x) x[1])
  Group.1     x
1    Beck green
2     Con  blue
3    Joll   red
4  Per(a) green
5  Per(b) green
I believe this can also be accomplished in other ways. Now you can add these 
values to the aggregated data, e.g. by:
test.ag$Test <- aggregate(test$Test, list(test[,1]), function(x) x[1])$x
> test.ag$Test
[1] green blue  red   green green
Levels: blue green red
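
Putting the steps above together, here is a minimal sketch on toy data (the Name/Test/C2/C3 columns and all values are made up for illustration, not taken from the attached file). It averages only the numeric columns, checks that Test is constant within each Name, and then attaches the first Test value per group. Using merge instead of direct assignment is just one possible choice.

```r
# Toy data: rows 3 and 4 share the same Name and the same Test value
dat <- data.frame(
  Name = c("A", "B", "C", "C", "D"),
  Test = c("red", "green", "blue", "blue", "red"),
  C2 = c(3, 7, 3, 4, 5),
  C3 = c(5, 4, 3, 6, 3),
  stringsAsFactors = FALSE
)

# Average only the numeric columns, grouped by Name
num <- sapply(dat, is.numeric)
dat.ag <- aggregate(dat[, num, drop = FALSE], list(Name = dat$Name),
                    mean, na.rm = TRUE)

# Check that every group has exactly one distinct Test value
n.test <- tapply(dat$Test, dat$Name, function(x) length(unique(x)))
stopifnot(all(n.test == 1))

# Take the first Test value per group and attach it by Name
first.test <- aggregate(dat$Test, list(Name = dat$Name), function(x) x[1])
dat.ag <- merge(dat.ag, first.test, by = "Name")
names(dat.ag)[names(dat.ag) == "x"] <- "Test"
```

Merging by Name avoids relying on both aggregate results having their rows in the same order.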

I hope this solves your problem. Again, please use plain text and dput for 
presenting data. It is much more convenient.
Regards
Petr
Best,
Verena


On Thu, May 29, 2014 at 10:16 AM, PIKAL Petr <[email protected]> wrote:
Hi

So what do you want to do with the Test variable when averaging?
Did you try the aggregate function?
What were the results?

Please show real data (at least the structure) and the code you used.

Regards
Petr

From: Verena Weinbir [mailto:[email protected]]
Sent: Thursday, May 29, 2014 9:48 AM
To: PIKAL Petr
Cc: r-help
Subject: Re: [R] Dataframe: Average cells of two rows and replace them with one row

Hello,
thank you for your reply.

Actually, the whole rows would have to be averaged anyway - my mistake :-)
Besides the first column "name" there is one other string (chr) variable, "Test", 
in the dataset (the rows I want to average always have the same Test value); 
the other variables are numeric or integer.
Best,
Verena

On Wed, May 28, 2014 at 2:57 PM, PIKAL Petr <[email protected]> wrote:
Hi

AFAIK you cannot average values in only 2 columns while leaving the others 
intact. The exact code depends on what is in columns 2-39 of your data frame. 
If they are numbers, you can average them as well.

Something like

dat.ag <- aggregate(dat[,-1], list(dat$Name), mean, na.rm=TRUE)

if your data frame is named dat and its first column is called Name. You get a 
new object with the values aggregated for each Name.
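
For illustration only, a sketch of that call on the small example from the original question (assuming the data frame is named dat and its first column is Name, as above). Naming the grouping list element Name is an addition here so that the result keeps that column name instead of Group.1.

```r
# Example data from the original question; rows 3 and 4 share Name "C"
dat <- data.frame(
  Name = c("A", "B", "C", "C", "D"),
  C1 = c(3, 2, 4, 4, 5),
  C2 = c(3, 7, 3, 4, 5),
  C3 = c(5, 4, 3, 6, 3),
  stringsAsFactors = FALSE
)

# Average every column except the first, grouped by Name
dat.ag <- aggregate(dat[, -1], list(Name = dat$Name), mean, na.rm = TRUE)
dat.ag
```

The two "C" rows collapse into a single row holding the column means, while rows with a unique Name pass through unchanged.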

If some columns are non-numeric, the problem gets trickier, and the solution 
strongly depends on what mode those columns are and what you want to do with 
them when aggregating the values in columns 40 and 41.

Show us at least the structure of your data frame.

?str

Regards
Petr

> -----Original Message-----
> From: [email protected] [mailto:r-help-bounces@r-project.org]
> On Behalf Of Verena Weinbir
> Sent: Wednesday, May 28, 2014 2:00 PM
> To: arun
> Cc: r-help
> Subject: Re: [R] Dataframe: Average cells of two rows and replace them
> with one row
>
> Hey guys,
>
> thank you very much for your help.  Since I am an R newbie I am still
> checking out how your code works and how I could adapt it to my dataframe,
> which has 124 rows and 41 columns/variables.  The first column would be
> "name", the last ones, 40 and 41, contain the cells I want to average for
> some rows. Is it possible to read the dataframe without copying the whole
> thing into the text="" argument (just tried it and got an error message)?
>
> Thank you!
>
> Verena
>
>
> On Wed, May 28, 2014 at 3:48 AM, arun <[email protected]> wrote:
>
> > Hi,
> > You can also try:
> > dat <- read.table(text="Name C1 C2 C3
> >   1  A  3  3  5
> >   2  B  2  7  4
> >   3  C  4  3  3
> >   4  C  4  4  6
> >   5  D  5  5  3",sep="",header=TRUE,stringsAsFactors=FALSE)
> >
> >
> >  library(plyr)
> >  ddply(dat,.(Name),numcolwise(mean,na.rm=TRUE))
> > A.K.
> >
> >
> > On Tuesday, May 27, 2014 4:08 PM, Verena Weinbir <[email protected]> wrote:
> > Hello,
> >
> > I have a big dataframe, and want to average two specific cells of two
> > specific rows and then replace those two rows with one row which contains
> > the averaged cells. Example (rows 3 and 4: columns C2 and C3 averaged and
> > replaced)
> >
> >     Name C1 C2 C3
> >   1  A  3  3  5
> >   2  B  2  7  4
> >   3  C  4  3  3
> >   4  C  4  4  6
> >   5  D  5  5  3
> >
> >
> >
> >     Name C1 C2  C3
> >   1  A  3  3   5
> >   2  B  2  7   4
> >   3  C  4  3.5 4.5
> >   4  D  5  5   3
> >
> >
> > Many thanks in advance!
> >
> > Best,
> >
> > Verena
> >
> >     [[alternative HTML version deleted]]
> >
> > ______________________________________________
> > [email protected] mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
>

