Re: [R] deduplication

2010-06-04 Thread Wu Gong
Please try this

## Import data
id1<-c(4,17,9,1,1,1,3,3,6,15,1,1,1,1,3,3,3,3,4,4,4,5,5,12,9,9,10,10)
id2<-c(8,18,10,3,6,7,6,7,7,16,4,5,12,18,4,5,12,18,5,12,18,12,18,18,15,16,15,16)
id<-data.frame(id1 = id1, id2 = id2)

## Create same structure table
id <- id0 <- unique(id)
leng <- nrow(id)
n <- 0

Re: [R] deduplication

2010-06-03 Thread Allan Engelhardt
Maybe something like the following will get you started:

library("igraph")
g <- graph.data.frame(id, directed=FALSE)
neighborhood(g, +Inf)

There is perhaps a more efficient way, but I hope this helps a little.

Allan.

On 03/06/10 14:14, Epi-schnier wrote:
Colleagues, I am trying to de-dupli
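
The igraph suggestion above treats each (id1, id2) pair as an edge and collects everything reachable from each record, which amounts to labelling the connected components of the graph. As a rough illustration of that idea without any packages, here is a minimal base-R sketch using a simple union-find. The function name find_groups and the data frame pairs are hypothetical names chosen for this example, not anything posted in the thread, and the data are only the first six pairs from the id1/id2 vectors posted in this thread.

```r
## First six (id1, id2) pairs from the posted data; each row means
## "these two record IDs belong to the same individual".
pairs <- data.frame(id1 = c(4, 17, 9, 1, 1, 1),
                    id2 = c(8, 18, 10, 3, 6, 7))

## find_groups() is a hypothetical helper for illustration only.
## It labels connected components with a simple union-find.
find_groups <- function(a, b) {
  nodes  <- sort(unique(c(a, b)))   # all distinct IDs
  parent <- seq_along(nodes)        # each node starts as its own root

  ## Follow parent links until reaching a node that is its own parent.
  find_root <- function(i) {
    while (parent[i] != i) i <- parent[i]
    i
  }

  ia <- match(a, nodes)             # positions of each edge's endpoints
  ib <- match(b, nodes)
  for (k in seq_along(ia)) {
    ra <- find_root(ia[k])
    rb <- find_root(ib[k])
    ## Merge the two components; the smaller index becomes the root.
    if (ra != rb) parent[max(ra, rb)] <- min(ra, rb)
  }

  roots <- vapply(seq_along(nodes), find_root, integer(1))
  data.frame(id = nodes, group = match(roots, unique(roots)))
}

groups <- find_groups(pairs$id1, pairs$id2)
print(groups)
```

Records sharing a group number are linked, directly or through a chain of pairs. An explicit R loop like this is likely to be slow on around a million records; with igraph installed, components(graph_from_data_frame(pairs, directed = FALSE))$membership should give an equivalent grouping and is probably the better choice at that scale.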

[R] deduplication

2010-06-03 Thread Epi-schnier
Colleagues, I am trying to de-duplicate a large (long) database (approx. 1 million records) of diagnostic tests. Individuals in the database can have up to 25 observations, but most will have only one. IDs for de-duplication (names, sex, lab number...) are patchy. In a first step, I am using Andreas B