On 25-07-2013, at 08:35, Arnaud Michel <michel.arn...@cirad.fr> wrote:
> But I just noticed that the two solutions are not comparable:
> the change concerns only Nom and Prenom (solution Berend) and not also Sexe or
> Date.de.naissance or other variables (solution Arun) that can change. But my
> question was badly put.

Indeed :-)

But that can be remedied with the following (a small correction w.r.t. the initial solution: drop=TRUE removed; it is not relevant here):

r1 <- droplevels(do.call(rbind,lapply(split(TEST,TEST$Matricule),
          FUN=function(x) {x[,1:ncol(x)] <- x[1,1:ncol(x)];x})))

and

r2 <- droplevels(do.call(rbind,lapply(split(TEST,TEST$Matricule),
          FUN=function(x) {x[,1:ncol(x)] <- x[nrow(x),1:ncol(x)];x})))

Less elegant than the alternative with ave.

Berend

> Michel
>
> On 25/07/2013 08:06, Arnaud Michel wrote:
>> Hi
>>
>> For a data frame named PaysContrat1, with
>> nrow(PaysContrat1)
>> [1] 52366
>>
>> the system.time() results are:
>>
>> system.time(droplevels(do.call(rbind,lapply(split(PaysContrat1,PaysContrat1$Matricule),
>>     FUN=function(x) {x[,c("Nom","Prénom")] <-
>>     x[nrow(x),c("Nom","Prénom"),drop=TRUE];x}))))
>>    user  system elapsed
>>   14.03    0.00   14.04
>>
>> system.time(droplevels(PaysContrat1[with(PaysContrat1,ave(seq_along(Matricule),Matricule,FUN=min)),]))
>>    user  system elapsed
>>     0.2     0.0     0.2
>>
>> Michel
>>
>> On 24/07/2013 15:29, arun wrote:
>>> Hi Michel,
>>> You could try:
>>>
>>> df1New<-droplevels(TEST[with(TEST,ave(seq_along(Matricule),Matricule,FUN=min)),])
>>> row.names(df1New)<-1:nrow(df1New)
>>> df2New<-droplevels(TEST[with(TEST,ave(seq_along(Matricule),Matricule,FUN=max)),])
>>> row.names(df2New)<-1:nrow(df2New)
>>> identical(df1New,df1)
>>> #[1] TRUE
>>> identical(df2New,df2)
>>> #[1] TRUE
>>> A.K.
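An untested variant, not posted in the thread: match() returns, for every element of Matricule, the position of its first occurrence. That is the same index ave(seq_along(Matricule), Matricule, FUN=min) computes. Matching against rev(Matricule) gives the last occurrence, which is the FUN=max case. Because match() does a single hashed lookup instead of splitting the vector into groups, it may be faster still on a data frame the size of PaysContrat1, but that has not been timed here. The helper names first_index() and last_index() are hypothetical.

```r
# Hypothetical helpers (not from the thread): for every row, the index of the
# first / last row that has the same group value, computed without ave().
first_index <- function(g) match(g, g)
last_index  <- function(g) {
  n <- length(g)
  # the first match in rev(g) is the last occurrence in g
  n + 1L - match(g, rev(g))
}

# Same Matricule values as in TEST
m <- c(66L, 67L, 67L, 68L, 89L, 90L, 90L, 91L, 108L, 108L, 108L)

first_index(m)  # 1 2 2 4 5 6 6 8 9 9 9
last_index(m)   # 1 3 3 4 5 7 7 8 11 11 11

# Agreement with the ave() formulation used above
stopifnot(
  first_index(m) == ave(seq_along(m), m, FUN = min),
  last_index(m)  == ave(seq_along(m), m, FUN = max)
)

# Applied to TEST it would read (not run here):
# df1New <- droplevels(TEST[first_index(TEST$Matricule), ])
# row.names(df1New) <- NULL
```

Setting row.names(df1New) <- NULL should reset the row names to 1..n, as row.names(df1New) <- 1:nrow(df1New) does in the code above.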
>>>
>>> ----- Original Message -----
>>> From: Arnaud Michel <michel.arn...@cirad.fr>
>>> To: R help <r-help@r-project.org>
>>> Cc:
>>> Sent: Wednesday, July 24, 2013 2:39 AM
>>> Subject: [R] Change values in a dateframe
>>>
>>> Hello
>>>
>>> I have the following problem:
>>> The data frame TEST has multiple lines for the same person because
>>> there are different values of Nom or different values of Prenom,
>>> but the values of Matricule, Sexe and Date.de.naissance are the same.
>>>
>>> TEST <- structure(list(Matricule = c(66L, 67L, 67L, 68L, 89L, 90L, 90L,
>>> 91L, 108L, 108L, 108L), Nom = structure(c(1L, 2L, 2L, 4L, 8L,
>>> 5L, 6L, 9L, 3L, 3L, 7L), .Label = c("CHICHE", "GEOF", "GUTIER",
>>> "JACQUE", "LANGUE", "LANGUE-LOPEZ", "RIVIER", "TRU", "VINCENT"
>>> ), class = "factor"), Prenom = structure(c(8L, 3L, 4L, 5L, 1L,
>>> 2L, 2L, 9L, 6L, 7L, 7L), .Label = c("Edgar", "Elodie", "Jeanine",
>>> "Jeannine", "Michel", "Michele", "Michèle", "Michelle", "Victor"
>>> ), class = "factor"), Sexe = structure(c(1L, 1L, 1L, 2L, 2L,
>>> 1L, 1L, 2L, 1L, 1L, 1L), .Label = c("Féminin", "Masculin"), class = "factor"),
>>> Date.de.naissance = structure(c(4L, 2L, 2L, 7L, 6L, 5L, 5L,
>>> 1L, 3L, 3L, 3L), .Label = c("03/09/1940", "04/03/1946", "07/12/1947",
>>> "18/11/1945", "27/09/1947", "29/12/1936", "30/03/1935"), class = "factor")),
>>> .Names = c("Matricule", "Nom", "Prenom", "Sexe", "Date.de.naissance"),
>>> class = "data.frame", row.names = c(NA, -11L))
>>>
>>> I would like to make the information homogeneous and build 2 data frames:
>>> df1, which has the values of Nom and Prenom from the first line of TEST
>>> for each person when there are different values.
>>> The other values (Matricule, Sexe and Date.de.naissance) are unchanged.
>>>
>>> df1 <- structure(list(Matricule = c(66L, 67L, 67L, 68L, 89L, 90L, 90L,
>>> 91L, 108L, 108L, 108L), Nom = structure(c(1L, 2L, 2L, 4L, 6L,
>>> 5L, 5L, 7L, 3L, 3L, 3L), .Label = c("CHICHE", "GEOF", "GUTIER",
>>> "JACQUE", "LANGUE", "TRU", "VINCENT"), class = "factor"), Prenom =
>>> structure(c(6L, 3L, 3L, 4L, 1L, 2L, 2L, 7L, 5L, 5L, 5L), .Label = c("Edgar",
>>> "Elodie", "Jeanine", "Michel", "Michele", "Michelle", "Victor"
>>> ), class = "factor"), Sexe = structure(c(1L, 1L, 1L, 2L, 2L,
>>> 1L, 1L, 2L, 1L, 1L, 1L), .Label = c("Féminin", "Masculin"), class = "factor"),
>>> Date.de.naissance = structure(c(4L, 2L, 2L, 7L, 6L, 5L, 5L,
>>> 1L, 3L, 3L, 3L), .Label = c("03/09/1940", "04/03/1946", "07/12/1947",
>>> "18/11/1945", "27/09/1947", "29/12/1936", "30/03/1935"), class = "factor")),
>>> .Names = c("Matricule", "Nom", "Prenom", "Sexe", "Date.de.naissance"),
>>> class = "data.frame", row.names = c(NA, -11L))
>>>
>>> df2, which has the values of Nom and Prenom from the last line of TEST
>>> for each person when there are different values. The other values
>>> (Matricule, Sexe and Date.de.naissance) are unchanged.
>>>
>>> df2 <- structure(list(Matricule = c(66L, 67L, 67L, 68L, 89L, 90L, 90L,
>>> 91L, 108L, 108L, 108L), Nom = structure(c(1L, 2L, 2L, 3L, 6L,
>>> 4L, 4L, 7L, 5L, 5L, 5L), .Label = c("CHICHE", "GEOF", "JACQUE",
>>> "LANGUE-LOPEZ", "RIVIER", "TRU", "VINCENT"), class = "factor"),
>>> Prenom = structure(c(6L, 3L, 3L, 4L, 1L, 2L, 2L, 7L, 5L,
>>> 5L, 5L), .Label = c("Edgar", "Elodie", "Jeannine", "Michel",
>>> "Michèle", "Michelle", "Victor"), class = "factor"), Sexe =
>>> structure(c(1L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 1L, 1L, 1L), .Label = c("Féminin",
>>> "Masculin"), class = "factor"), Date.de.naissance = structure(c(4L,
>>> 2L, 2L, 7L, 6L, 5L, 5L, 1L, 3L, 3L, 3L), .Label = c("03/09/1940",
>>> "04/03/1946", "07/12/1947", "18/11/1945", "27/09/1947", "29/12/1936",
>>> "30/03/1935"), class = "factor")),
>>> .Names = c("Matricule", "Nom", "Prenom", "Sexe", "Date.de.naissance"),
>>> class = "data.frame", row.names = c(NA, -11L))
>>>
>>> Thanks for your help
>>> Michel
>>>
>>
>
> --
> Michel ARNAUD
> Project officer to the Human Resources Director (DRH)
> DGDRD-Drh - TA 174/04
> Av Agropolis 34398 Montpellier cedex 5
> tel: 04.67.61.75.38
> fax: 04.67.61.57.87
> mobile: 06.47.43.55.31
>

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
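Consider the question exactly as first posed: only Nom and Prenom are made homogeneous, and every other column keeps its original value. A per-row index can then overwrite just those two columns, much like the split()/lapply() version Michel timed, but without splitting the data frame. This is a sketch that assumes the first (or last) row of each Matricule carries the values to keep. homogenize() is a hypothetical helper, the first/last indices are computed with match() (a swapped-in technique, not one posted in the thread), and TEST is rebuilt from the dput() output in the original message so the block runs on its own.

```r
# TEST as posted in the original question (dput() output)
TEST <- structure(list(Matricule = c(66L, 67L, 67L, 68L, 89L, 90L, 90L,
  91L, 108L, 108L, 108L), Nom = structure(c(1L, 2L, 2L, 4L, 8L,
  5L, 6L, 9L, 3L, 3L, 7L), .Label = c("CHICHE", "GEOF", "GUTIER",
  "JACQUE", "LANGUE", "LANGUE-LOPEZ", "RIVIER", "TRU", "VINCENT"
  ), class = "factor"), Prenom = structure(c(8L, 3L, 4L, 5L, 1L,
  2L, 2L, 9L, 6L, 7L, 7L), .Label = c("Edgar", "Elodie", "Jeanine",
  "Jeannine", "Michel", "Michele", "Michèle", "Michelle", "Victor"
  ), class = "factor"), Sexe = structure(c(1L, 1L, 1L, 2L, 2L,
  1L, 1L, 2L, 1L, 1L, 1L), .Label = c("Féminin", "Masculin"), class = "factor"),
  Date.de.naissance = structure(c(4L, 2L, 2L, 7L, 6L, 5L, 5L,
  1L, 3L, 3L, 3L), .Label = c("03/09/1940", "04/03/1946", "07/12/1947",
  "18/11/1945", "27/09/1947", "29/12/1936", "30/03/1935"), class = "factor")),
  .Names = c("Matricule", "Nom", "Prenom", "Sexe", "Date.de.naissance"),
  class = "data.frame", row.names = c(NA, -11L))

# Hypothetical helper: overwrite only the columns in `cols`, taking each
# row's values from the row given by `idx`; all other columns are untouched.
homogenize <- function(d, idx, cols) {
  d[cols] <- d[idx, cols, drop = FALSE]
  droplevels(d)  # drop factor levels that are no longer used
}

m <- TEST$Matricule
first <- match(m, m)                        # first row of each Matricule
last  <- length(m) + 1L - match(m, rev(m))  # last row of each Matricule

df1b <- homogenize(TEST, first, c("Nom", "Prenom"))
df2b <- homogenize(TEST, last,  c("Nom", "Prenom"))

levels(df1b$Nom)  # "CHICHE" "GEOF" "GUTIER" "JACQUE" "LANGUE" "TRU" "VINCENT"
```

With TEST as posted, df1b and df2b should hold the same values as df1 and df2. Because Sexe and Date.de.naissance are constant within each Matricule here, this matches replacing whole rows, but the two approaches would differ on data where those columns also vary.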