Thank you Berend
It is exactly what I wanted.
Michel
On 24/07/2013 09:48, Berend Hasselman wrote:
On 24-07-2013, at 08:39, Arnaud Michel <michel.arn...@cirad.fr> wrote:

Hello

I have the following problem:
The data frame TEST has multiple rows for the same person because
the values of Nom or Prenom differ between rows,
while the values of Matricule, Sexe and Date.de.naissance are the same.

TEST <- structure(list(Matricule = c(66L, 67L, 67L, 68L, 89L, 90L, 90L,
91L, 108L, 108L, 108L), Nom = structure(c(1L, 2L, 2L, 4L, 8L,
5L, 6L, 9L, 3L, 3L, 7L), .Label = c("CHICHE", "GEOF", "GUTIER",
"JACQUE", "LANGUE", "LANGUE-LOPEZ", "RIVIER", "TRU", "VINCENT"
), class = "factor"), Prenom = structure(c(8L, 3L, 4L, 5L, 1L,
2L, 2L, 9L, 6L, 7L, 7L), .Label = c("Edgar", "Elodie", "Jeanine",
"Jeannine", "Michel", "Michele", "Michèle", "Michelle", "Victor"
), class = "factor"), Sexe = structure(c(1L, 1L, 1L, 2L, 2L,
1L, 1L, 2L, 1L, 1L, 1L), .Label = c("Féminin", "Masculin"), class = "factor"),
    Date.de.naissance = structure(c(4L, 2L, 2L, 7L, 6L, 5L, 5L,
    1L, 3L, 3L, 3L), .Label = c("03/09/1940", "04/03/1946", "07/12/1947",
    "18/11/1945", "27/09/1947", "29/12/1936", "30/03/1935"), class = "factor")), .Names = 
c("Matricule",
"Nom", "Prenom", "Sexe", "Date.de.naissance"), class = "data.frame", row.names 
= c(NA,
-11L))
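
A quick way to see which persons are affected is to count the distinct values of Nom and Prenom per Matricule. The sketch below is only an illustration: `toy` is a small made-up data frame that reuses the column names of TEST, and the helper names `n_nom`, `n_prenom` and `conflicts` are hypothetical.

```r
# Toy data (hypothetical), using the same column names as TEST
toy <- data.frame(
  Matricule = c(66L, 67L, 67L, 108L, 108L, 108L),
  Nom       = c("CHICHE", "GEOF", "GEOF", "GUTIER", "GUTIER", "RIVIER"),
  Prenom    = c("Michelle", "Jeanine", "Jeannine", "Michele", "Michele", "Michele"),
  stringsAsFactors = TRUE
)

# Number of distinct values of each column within each Matricule
n_nom    <- tapply(as.character(toy$Nom),    toy$Matricule,
                   function(v) length(unique(v)))
n_prenom <- tapply(as.character(toy$Prenom), toy$Matricule,
                   function(v) length(unique(v)))

# Matricule values (as character) with more than one Nom or Prenom
conflicts <- names(n_nom)[n_nom > 1 | n_prenom > 1]
conflicts
```

The same two `tapply()` calls should work on TEST itself by replacing `toy` with `TEST`.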


I would like to make the information homogeneous by building 2 
data frames:
df1, which takes the values of Nom and Prenom from the first row of each 
person in TEST when the values differ. The other values (Matricule, Sexe 
and Date.de.naissance) are unchanged.

df1 <- structure(list(Matricule = c(66L, 67L, 67L, 68L, 89L, 90L, 90L,
91L, 108L, 108L, 108L), Nom = structure(c(1L, 2L, 2L, 4L, 6L,
5L, 5L, 7L, 3L, 3L, 3L), .Label = c("CHICHE", "GEOF", "GUTIER",
"JACQUE", "LANGUE", "TRU", "VINCENT"), class = "factor"), Prenom = 
structure(c(6L,
3L, 3L, 4L, 1L, 2L, 2L, 7L, 5L, 5L, 5L), .Label = c("Edgar",
"Elodie", "Jeanine", "Michel", "Michele", "Michelle", "Victor"
), class = "factor"), Sexe = structure(c(1L, 1L, 1L, 2L, 2L,
1L, 1L, 2L, 1L, 1L, 1L), .Label = c("Féminin", "Masculin"), class = "factor"),
    Date.de.naissance = structure(c(4L, 2L, 2L, 7L, 6L, 5L, 5L,
    1L, 3L, 3L, 3L), .Label = c("03/09/1940", "04/03/1946", "07/12/1947",
    "18/11/1945", "27/09/1947", "29/12/1936", "30/03/1935"), class = "factor")), .Names = 
c("Matricule",
"Nom", "Prenom", "Sexe", "Date.de.naissance"), class = "data.frame", row.names 
= c(NA,
-11L))

df2, which takes the values of Nom and Prenom from the last row of each 
person in TEST when the values differ. The other values (Matricule, Sexe 
and Date.de.naissance) are unchanged.

df2 <- structure(list(Matricule = c(66L, 67L, 67L, 68L, 89L, 90L, 90L,
91L, 108L, 108L, 108L), Nom = structure(c(1L, 2L, 2L, 3L, 6L,
4L, 4L, 7L, 5L, 5L, 5L), .Label = c("CHICHE", "GEOF", "JACQUE",
"LANGUE-LOPEZ", "RIVIER", "TRU", "VINCENT"), class = "factor"),
    Prenom = structure(c(6L, 3L, 3L, 4L, 1L, 2L, 2L, 7L, 5L,
    5L, 5L), .Label = c("Edgar", "Elodie", "Jeannine", "Michel",
    "Michèle", "Michelle", "Victor"), class = "factor"), Sexe = structure(c(1L,
    1L, 1L, 2L, 2L, 1L, 1L, 2L, 1L, 1L, 1L), .Label = c("Féminin",
    "Masculin"), class = "factor"), Date.de.naissance = structure(c(4L,
    2L, 2L, 7L, 6L, 5L, 5L, 1L, 3L, 3L, 3L), .Label = c("03/09/1940",
    "04/03/1946", "07/12/1947", "18/11/1945", "27/09/1947", "29/12/1936",
    "30/03/1935"), class = "factor")), .Names = c("Matricule",
"Nom", "Prenom", "Sexe", "Date.de.naissance"), class = "data.frame", row.names 
= c(NA,
-11L))

Something like this:

r1 <- droplevels(do.call(rbind, lapply(split(TEST, TEST$Matricule),
    FUN = function(x) {
        # copy Nom and Prenom of the first row to every row of the group
        x[, c("Nom", "Prenom")] <- x[1, c("Nom", "Prenom"), drop = TRUE]
        x
    })))
rownames(r1) <- NULL
r1

r2 <- droplevels(do.call(rbind, lapply(split(TEST, TEST$Matricule),
    FUN = function(x) {
        # copy Nom and Prenom of the last row to every row of the group
        x[, c("Nom", "Prenom")] <- x[nrow(x), c("Nom", "Prenom"), drop = TRUE]
        x
    })))
rownames(r2) <- NULL
r2

#> identical(r1,df1)
#[1] TRUE
#> identical(r2,df2)
#[1] TRUE

Note: I had to change the Prenom and Sexe columns because of encoding issues, 
but that shouldn't have any influence on the above.
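
For comparison, the same idea can probably be written with `ave()`, which fills each group in place instead of splitting and re-binding the data frame. This is only a sketch under assumptions: `toy` is made-up data with the column names of TEST, `fill_from()` is a hypothetical helper, and the columns are plain character vectors (the factor columns of TEST would need `as.character()` first, and the result would not be `identical()` to df1 or df2 without converting back to factors).

```r
# Toy data (hypothetical), using the same column names as TEST
toy <- data.frame(
  Matricule = c(66L, 67L, 67L, 108L, 108L, 108L),
  Nom       = c("CHICHE", "GEOF", "GEOF", "GUTIER", "GUTIER", "RIVIER"),
  Prenom    = c("Michelle", "Jeanine", "Jeannine", "Michele", "Michele", "Michele"),
  stringsAsFactors = FALSE
)

# Replace every value by the first (or last) value seen in the same group
fill_from <- function(x, group, pick = c("first", "last")) {
  pick <- match.arg(pick)
  ave(x, group, FUN = function(v) {
    chosen <- if (pick == "first") v[1] else v[length(v)]
    rep(chosen, length(v))
  })
}

# Counterpart of r1: names taken from the first row of each Matricule
first_df <- toy
first_df$Nom    <- fill_from(toy$Nom,    toy$Matricule, "first")
first_df$Prenom <- fill_from(toy$Prenom, toy$Matricule, "first")

# Counterpart of r2: names taken from the last row of each Matricule
last_df <- toy
last_df$Nom    <- fill_from(toy$Nom,    toy$Matricule, "last")
last_df$Prenom <- fill_from(toy$Prenom, toy$Matricule, "last")
```

One difference worth noting: `ave()` keeps the rows in their original order, whereas `split()` followed by `rbind()` returns the rows grouped by Matricule. For TEST this makes no difference because the rows are already sorted by Matricule.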

Berend




--
Michel ARNAUD
Chargé de mission auprès du DRH
DGDRD-Drh - TA 174/04
Av Agropolis 34398 Montpellier cedex 5
tel : 04.67.61.75.38
fax : 04.67.61.57.87
port: 06.47.43.55.31

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
