On 11/05/2012 09:14 AM, Hermann Norpois wrote:
Hello,

I have start and end coordinates from different experiments (DNase
hypersensitivity data) and now I would like to combine overlapping
intervals. For instance (see my test data below) (2) 30-52 and (3) 49-101
are combined to 30-101. But 49-101 and 70-103 would not be combined because
they are on different chromosomes (chr a and chr b).
Does anybody have an idea?

This data is handled very naturally by the "GRanges" class in Bioconductor's GenomicRanges package:

  source("http://bioconductor.org/biocLite.R")
  biocLite("GenomicRanges")
  # current Bioconductor releases install with BiocManager::install("GenomicRanges")
  library(GenomicRanges)

  gr = GRanges(rep(c("a", "b"), each=3),
               IRanges(c(5, 30, 49, 70, 100, 129),
                       c(10, 52, 101, 103, 130, 140)),
               strand="*")

and then

> reduce(gr)
GRanges with 3 ranges and 0 metadata columns:
      seqnames    ranges strand
         <Rle> <IRanges>  <Rle>
  [1]        a [ 5,  10]      *
  [2]        a [30, 101]      *
  [3]        b [70, 140]      *
  ---
  seqlengths:
    a  b
   NA NA
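
If the intervals are already in a data frame like the `df` from the question, the same object can be built from its columns rather than typed in by hand. The following is only a sketch under that assumption; the name `merged` is made up for illustration, and `as.data.frame()` is used to get back to a plain data frame after merging.

```r
library(GenomicRanges)

# The data frame from the original question
df <- data.frame(chr   = c("a", "a", "a", "b", "b", "b"),
                 start = c(5, 30, 49, 70, 100, 129),
                 end   = c(10, 52, 101, 103, 130, 140))

# Build the GRanges object from the data frame columns
gr <- GRanges(seqnames = df$chr,
              ranges   = IRanges(start = df$start, end = df$end),
              strand   = "*")

# Merge overlapping ranges within each chromosome, then convert the
# result back to an ordinary data frame ("merged" is an illustrative name)
merged <- as.data.frame(reduce(gr))
merged[, c("seqnames", "start", "end")]
```

Ranges on different chromosomes are never combined, because reduce() works within each seqname.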

There are vignettes

  vignette(package="GenomicRanges")

and additional training material, e.g.,

  http://bioconductor.org/help/course-materials/2012/CSC2012/

If you pursue this solution, please follow up with questions on the Bioconductor mailing list:

  http://bioconductor.org/help/mailing-list/

Martin

Thanks
Hermann

df
  chr start end
1   a     5  10
2   a    30  52
3   a    49 101
4   b    70 103
5   b   100 130
6   b   129 140
dput(df)
structure(list(chr = structure(c(1L, 1L, 1L, 2L, 2L, 2L), .Label = c("a",
"b"), class = "factor"), start = c(5, 30, 49, 70, 100, 129),
     end = c(10, 52, 101, 103, 130, 140)), .Names = c("chr", "start",
"end"), row.names = c(NA, -6L), class = "data.frame")

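For readers who want to see the logic without installing Bioconductor, the merge described in the question can also be sketched in base R. This is only an illustration, not the GenomicRanges implementation: `merge_intervals` is a hypothetical helper, it assumes columns named `chr`, `start` and `end`, and it combines only intervals that share at least one position (directly adjacent intervals stay separate).

```r
# The data frame from the original question
df <- data.frame(chr   = c("a", "a", "a", "b", "b", "b"),
                 start = c(5, 30, 49, 70, 100, 129),
                 end   = c(10, 52, 101, 103, 130, 140))

# Hypothetical helper, not part of any package: merge overlapping
# intervals separately for each chromosome.
merge_intervals <- function(df) {
  pieces <- lapply(split(df, df$chr, drop = TRUE), function(d) {
    # Sort by start so overlapping intervals become neighbours
    d <- d[order(d$start, d$end), ]
    # Largest end coordinate among all earlier intervals
    prev_max_end <- c(-Inf, head(cummax(d$end), -1))
    # A new group starts when an interval begins after all earlier ones end
    group <- cumsum(d$start > prev_max_end)
    data.frame(chr   = d$chr[1],
               start = as.vector(tapply(d$start, group, min)),
               end   = as.vector(tapply(d$end, group, max)))
  })
  out <- do.call(rbind, pieces)
  rownames(out) <- NULL
  out
}

res <- merge_intervals(df)
res
```

Splitting by `chr` first is what keeps 49-101 on chromosome a and 70-103 on chromosome b apart, even though their coordinates overlap.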

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



--
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793
