Re: [R] Merging fully overlapping groups

Petr Savicky Wed, 14 Mar 2012 13:12:23 -0700

On Tue, Mar 13, 2012 at 08:56:33PM -0700, mdvaan wrote:
> Hi,
> 
> I have data on individuals (B) who participated in events (A). If ALL
> participants in an event are a subset of the participants in another event I
> would like to remove the smaller event and if the participants in one event
> are exactly similar to the participants in another event I would like to
> remove one of the events (I don't care which one). The following example
> does that however it is extremely slow (and the true dataset is very large).
> What would be a more efficient way to solve the problem? I really appreciate
> your help. Thanks!  
> 
> DF <- data.frame(read.table(textConnection("  A  B
> 12095  69832
> 12095  51750
...


Hi.

Try the following.

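  # Split participants (B) by event (A), keeping events in order of first appearance.
  data <- unique(DF$A)
  gr <- split(DF$B, f=factor(DF$A, levels=data))
  # Sort and deduplicate each group so that identical() compares them as sets.
  gr <- lapply(gr, FUN=sort)
  gr <- lapply(gr, FUN=unique)
  elim <- rep(FALSE, times=length(gr))
  for (i in seq_along(gr)) {
      gr.i <- gr[[i]]
      for (j in seq_along(gr)) {
          gr.j <- gr[[j]]
          if (j < i && identical(gr.i, gr.j)) {
              # Duplicate of an earlier event: keep only the first copy.
              elim[i] <- TRUE
          } else if (i != j) {
              # gr.i is a proper subset of gr.j when their union equals gr.j but not gr.i.
              both <- unique(sort(c(gr.i, gr.j)))
              if (identical(gr.j, both) && !identical(gr.i, both)) {
                  elim[i] <- TRUE
              }
          }
      }
  }
  # Keep only the rows of events that were not eliminated.
  DF1 <- DF[DF$A %in% data[!elim], ]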
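
If the double loop is still too slow, the same elimination rule can be written with matrix operations. The following is only a sketch on a small made-up data frame (the values in DF below are hypothetical, not the original data). It assumes that the number of events is small enough for an events-by-events matrix to fit in memory; the names inc, common, size and the other helpers are introduced here for illustration.

```r
# Hypothetical toy data: event 2 is a proper subset of event 1,
# event 3 has the same participants as event 1, event 4 is unrelated.
DF <- data.frame(A = c(1, 1, 1, 2, 2, 3, 3, 3, 4, 4),
                 B = c(10, 20, 30, 10, 20, 30, 20, 10, 30, 40))

events <- unique(DF$A)

# 0/1 incidence matrix: one row per event, one column per individual.
inc <- (unclass(table(factor(DF$A, levels = events), DF$B)) > 0) + 0

# common[i, j] is the number of individuals shared by events i and j;
# the diagonal holds the size of each event.
common <- tcrossprod(inc)
size <- diag(common)

# Event i is contained in event j when all of its members are shared.
# The vector 'size' is recycled down the columns, so entry [i, j] is
# compared with size[i].
contained <- common == size

# Eliminate i if it is contained in a strictly larger event, or in an
# identical event that appears earlier (j < i).
larger <- outer(size, size, "<")
idx <- seq_along(size)
earlier <- outer(idx, idx, ">")
elim <- rowSums(contained & (larger | earlier)) > 0

DF1 <- DF[DF$A %in% events[!elim], ]
```

On this toy input, events 2 and 3 are eliminated and events 1 and 4 are kept, which is what the loop version produces for the same data. For a very large number of events the dense matrix becomes the bottleneck, and a sparse representation would probably be needed.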

How frequently is an event eliminated in the real data?

Petr Savicky.

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
