Hi Michel,

Sorry, I misunderstood your question. You could also try:

library(plyr)
df1New<-droplevels(ddply(TEST,.(Matricule),function(x) {x[,c("Nom","Prenom")]<- x[1,c("Nom","Prenom")];x}))
df2New<-droplevels(ddply(TEST,.(Matricule),function(x) {x[,c("Nom","Prenom")]<- x[nrow(x),c("Nom","Prenom")];x}))
identical(df1,df1New)
#[1] TRUE
identical(df2,df2New)
#[1] TRUE
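A base R variant of the same idea, for anyone who prefers to avoid extra packages. This is only a sketch and has not been timed against the other solutions in this thread: toy, first_idx, last_idx and cols are names made up for illustration, and toy is a small hypothetical stand-in for TEST with character columns instead of factors. It uses ave() to find the first and last row index within each Matricule, then overwrites only Nom and Prenom.

```r
# Hypothetical stand-in for TEST (character columns, not factors)
toy <- data.frame(
  Matricule = c(67L, 67L, 90L, 90L, 108L),
  Nom       = c("GEOF", "GEOF", "LANGUE", "LANGUE-LOPEZ", "GUTIER"),
  Prenom    = c("Jeanine", "Jeannine", "Elodie", "Elodie", "Michele"),
  stringsAsFactors = FALSE
)

# For every row, the index of the first / last row of its Matricule group
first_idx <- ave(seq_len(nrow(toy)), toy$Matricule, FUN = min)
last_idx  <- ave(seq_len(nrow(toy)), toy$Matricule, FUN = max)

# Only these columns are overwritten; all other columns stay untouched
cols <- c("Nom", "Prenom")

toy_first <- toy
toy_first[cols] <- toy[first_idx, cols]   # values taken from the first row of each group

toy_last <- toy
toy_last[cols] <- toy[last_idx, cols]     # values taken from the last row of each group

print(toy_first)
print(toy_last)
```

With factor columns, as in TEST, you would probably also need droplevels() on the result to discard levels that are no longer used.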
#or
library(data.table)
dt1<- data.table(TEST)
dt2<- copy(dt1) # copy() rather than dt2<-dt1: := updates by reference, so a plain assignment would make dt2 change together with dt1
dt1[,Nom:=head(Nom,1),by=Matricule]
dt1[,Prenom:=head(Prenom,1),by=Matricule]
identical(df1,droplevels(as.data.frame(dt1)))
#[1] TRUE
dt2[,Nom:=tail(Nom,1),by=Matricule]
dt2[,Prenom:=tail(Prenom,1),by=Matricule]
identical(df2,droplevels(as.data.frame(dt2)))
#[1] TRUE

#If you are considering speed, then ?data.table() would be useful
set.seed(28)
dfTest<- as.data.frame(matrix(sample(1:50,1e6*5,replace=TRUE),ncol=5))
system.time({res1<-do.call(rbind,lapply(split(dfTest,dfTest$V1),
 FUN=function(x) {x[,c("V2","V3")] <- x[1,c("V2","V3")];x}))})
#   user  system elapsed
#  4.452   0.036   4.499
row.names(res1)<-1:nrow(res1)
dtNew<- data.table(dfTest)
system.time({dtNew[,V2:=head(V2,1),by=V1]
 dtNew[,V3:=head(V3,1),by=V1]
 dtNew<-dtNew[order(V1)] #here, the dataset was not pre-sorted, so just to keep the same order as the above solution
 })
#   user  system elapsed
#  0.132   0.000   0.133
identical(res1,as.data.frame(dtNew))
#[1] TRUE

A.K.

----- Original Message -----
From: Arnaud Michel <michel.arn...@cirad.fr>
To: Berend Hasselman <b...@xs4all.nl>
Cc: R help <r-help@r-project.org>
Sent: Thursday, July 25, 2013 3:59 AM
Subject: Re: [R] Change values in a dateframe-Speed TEST

On 25/07/2013 08:50, Berend Hasselman wrote:
> On 25-07-2013, at 08:35, Arnaud Michel <michel.arn...@cirad.fr> wrote:
>
>> But I just noticed that the two solutions are not comparable:
>> Berend's solution changes only Nom and Prenom, whereas Arun's solution
>> can also change Sexe, Date.de.naissance or other variables. But
>> my question was badly put.
> Indeed :-)
>
> But that can be remedied with (small correction w.r.t. initial solution:
> drop=TRUE removed; not relevant here)
>
> r1 <- droplevels(do.call(rbind,lapply(split(TEST,TEST$Matricule),
> FUN=function(x) {x[,1:ncol(x)] <- x[1,1:ncol(x)];x})))
>
> and
>
> r2 <- droplevels(do.call(rbind,lapply(split(TEST,TEST$Matricule),
> FUN=function(x) {x[,1:ncol(x)] <- x[nrow(x),1:ncol(x)];x})))

Thank you, but I am keeping
{x[,c("Nom","Prénom")] <- x[nrow(x),c("Nom","Prénom")];x}
because the dataframe contains other variables that I do not want to change. I want to change only "Nom" and "Prénom".
PS: ?w.r.t.
Michel

> Less elegant than alternative with ave
>
> Berend
>
>> Michel
>>
>> On 25/07/2013 08:06, Arnaud Michel wrote:
>>> Hi
>>>
>>> For a dataframe named PaysContrat1 with
>>> nrow(PaysContrat1)
>>> [1] 52366
>>>
>>> the system.time results are:
>>>
>>> system.time(droplevels(do.call(rbind,lapply(split(PaysContrat1,PaysContrat1$Matricule),
>>> FUN=function(x) {x[,c("Nom","Prénom")] <-
>>> x[nrow(x),c("Nom","Prénom"),drop=TRUE];x}))))
>>>    user  system elapsed
>>>   14.03    0.00   14.04
>>>
>>> system.time(droplevels(PaysContrat1[with(PaysContrat1,ave(seq_along(Matricule),Matricule,FUN=min))
>>> ,] ))
>>>    user  system elapsed
>>>     0.2     0.0     0.2
>>>
>>> Michel
>>>
>>> On 24/07/2013 15:29, arun wrote:
>>>> Hi Michel,
>>>> You could try:
>>>>
>>>>
>>>> df1New<-droplevels(TEST[with(TEST,ave(seq_along(Matricule),Matricule,FUN=min)),])
>>>> row.names(df1New)<-1:nrow(df1New)
>>>> df2New<-droplevels(TEST[with(TEST,ave(seq_along(Matricule),Matricule,FUN=max)),])
>>>> row.names(df2New)<-1:nrow(df2New)
>>>> identical(df1New,df1)
>>>> #[1] TRUE
>>>> identical(df2New,df2)
>>>> #[1] TRUE
>>>> A.K.
>>>>
>>>> ----- Original Message -----
>>>> From: Arnaud Michel <michel.arn...@cirad.fr>
>>>> To: R help <r-help@r-project.org>
>>>> Cc:
>>>> Sent: Wednesday, July 24, 2013 2:39 AM
>>>> Subject: [R] Change values in a dateframe
>>>>
>>>> Hello
>>>>
>>>> I have the following problem:
>>>> The dataframe TEST has several rows for the same person because
>>>> the values of Nom or the values of Prenom differ,
>>>> while the values of Matricule, Sexe and Date.de.naissance are the same.
>>>>
>>>> TEST <- structure(list(Matricule = c(66L, 67L, 67L, 68L, 89L, 90L, 90L,
>>>> 91L, 108L, 108L, 108L), Nom = structure(c(1L, 2L, 2L, 4L, 8L,
>>>> 5L, 6L, 9L, 3L, 3L, 7L), .Label = c("CHICHE", "GEOF", "GUTIER",
>>>> "JACQUE", "LANGUE", "LANGUE-LOPEZ", "RIVIER", "TRU", "VINCENT"
>>>> ), class = "factor"), Prenom = structure(c(8L, 3L, 4L, 5L, 1L,
>>>> 2L, 2L, 9L, 6L, 7L, 7L), .Label = c("Edgar", "Elodie", "Jeanine",
>>>> "Jeannine", "Michel", "Michele", "Michèle", "Michelle", "Victor"
>>>> ), class = "factor"), Sexe = structure(c(1L, 1L, 1L, 2L, 2L,
>>>> 1L, 1L, 2L, 1L, 1L, 1L), .Label = c("Féminin", "Masculin"), class =
>>>> "factor"),
>>>> Date.de.naissance = structure(c(4L, 2L, 2L, 7L, 6L, 5L, 5L,
>>>> 1L, 3L, 3L, 3L), .Label = c("03/09/1940", "04/03/1946", "07/12/1947",
>>>> "18/11/1945", "27/09/1947", "29/12/1936", "30/03/1935"), class =
>>>> "factor")), .Names = c("Matricule",
>>>> "Nom", "Prenom", "Sexe", "Date.de.naissance"), class = "data.frame",
>>>> row.names = c(NA,
>>>> -11L))
>>>>
>>>>
>>>> I would like to make the information homogeneous and build 2
>>>> dataframes:
>>>> df1, which takes the values of Nom and Prenom from the person's first row
>>>> in TEST when the values differ. The other values (Matricule, Sexe or
>>>> Date.de.naissance) are unchanged.
>>>>
>>>> df1 <- structure(list(Matricule = c(66L, 67L, 67L, 68L, 89L, 90L, 90L,
>>>> 91L, 108L, 108L, 108L), Nom = structure(c(1L, 2L, 2L, 4L, 6L,
>>>> 5L, 5L, 7L, 3L, 3L, 3L), .Label = c("CHICHE", "GEOF", "GUTIER",
>>>> "JACQUE", "LANGUE", "TRU", "VINCENT"), class = "factor"), Prenom =
>>>> structure(c(6L,
>>>> 3L, 3L, 4L, 1L, 2L, 2L, 7L, 5L, 5L, 5L), .Label = c("Edgar",
>>>> "Elodie", "Jeanine", "Michel", "Michele", "Michelle", "Victor"
>>>> ), class = "factor"), Sexe = structure(c(1L, 1L, 1L, 2L, 2L,
>>>> 1L, 1L, 2L, 1L, 1L, 1L), .Label = c("Féminin", "Masculin"), class =
>>>> "factor"),
>>>> Date.de.naissance = structure(c(4L, 2L, 2L, 7L, 6L, 5L, 5L,
>>>> 1L, 3L, 3L, 3L), .Label = c("03/09/1940", "04/03/1946", "07/12/1947",
>>>> "18/11/1945", "27/09/1947", "29/12/1936", "30/03/1935"), class =
>>>> "factor")), .Names = c("Matricule",
>>>> "Nom", "Prenom", "Sexe", "Date.de.naissance"), class = "data.frame",
>>>> row.names = c(NA,
>>>> -11L))
>>>>
>>>> df2, which takes the values of Nom and Prenom from the person's last row
>>>> in TEST when the values differ. The other values (Matricule, Sexe or
>>>> Date.de.naissance) are unchanged.
>>>>
>>>> df2 <- structure(list(Matricule = c(66L, 67L, 67L, 68L, 89L, 90L, 90L,
>>>> 91L, 108L, 108L, 108L), Nom = structure(c(1L, 2L, 2L, 3L, 6L,
>>>> 4L, 4L, 7L, 5L, 5L, 5L), .Label = c("CHICHE", "GEOF", "JACQUE",
>>>> "LANGUE-LOPEZ", "RIVIER", "TRU", "VINCENT"), class = "factor"),
>>>> Prenom = structure(c(6L, 3L, 3L, 4L, 1L, 2L, 2L, 7L, 5L,
>>>> 5L, 5L), .Label = c("Edgar", "Elodie", "Jeannine", "Michel",
>>>> "Michèle", "Michelle", "Victor"), class = "factor"), Sexe =
>>>> structure(c(1L,
>>>> 1L, 1L, 2L, 2L, 1L, 1L, 2L, 1L, 1L, 1L), .Label = c("Féminin",
>>>> "Masculin"), class = "factor"), Date.de.naissance = structure(c(4L,
>>>> 2L, 2L, 7L, 6L, 5L, 5L, 1L, 3L, 3L, 3L), .Label = c("03/09/1940",
>>>> "04/03/1946", "07/12/1947", "18/11/1945", "27/09/1947", "29/12/1936",
>>>> "30/03/1935"), class = "factor")), .Names = c("Matricule",
>>>> "Nom", "Prenom", "Sexe", "Date.de.naissance"), class = "data.frame",
>>>> row.names = c(NA,
>>>> -11L))
>>>>
>>>> Thanks for your help
>>>> Michel
>>>>
>> --
>> Michel ARNAUD
>> Chargé de mission auprès du DRH
>> DGDRD-Drh - TA 174/04
>> Av Agropolis 34398 Montpellier cedex 5
>> tel : 04.67.61.75.38
>> fax : 04.67.61.57.87
>> port: 06.47.43.55.31
>>
>

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.