Here is a modification of the algorithm to use a specified value for the overlap:
> vector <- c(0,0.45,1,2,3,3.25,3.33,3.75,4.1,5,6,6.45,7,7.1,8)
> # the following adds 0.5 as the overlap window -- can be changed
> x <- rbind(cbind(value=vector, oper=1, id=seq_along(vector)),
+            cbind(value=vector+0.5, oper=-1, id=seq_along(vector)))
> x <- x[order(x[,'value'], -x[, 'oper']),]
> # determine which ones overlap
> x <- cbind(x, over=cumsum(x[, 'oper']))
> # now partition into groups and only use groups greater than or equal to 3
> # determine where the breaks are (0 values in 'over')
> x <- cbind(x, breaks=cumsum(x[, 'over'] == 0))
> # delete entries with 'over' == 0
> x <- x[x[, 'over'] != 0,]
> # split into groups
> x.groups <- split(x[, 'id'], x[, 'breaks'])
> # only keep those with more than 2 entries
> x.subsets <- x.groups[sapply(x.groups, length) >= 3]
> # print out the subsets
> invisible(lapply(x.subsets, function(a) print(vector[unique(a)])))
[1] 0.00 0.45
[1] 3.00 3.25 3.33 3.75 4.10
[1] 6.00 6.45
[1] 7.0 7.1

On Dec 21, 2007 4:56 AM, Johannes Graumann <[EMAIL PROTECTED]> wrote:
> <posted & mailed>
>
> Dear all,
>
> I'm trying to solve the problem, of how to find clusters of values in a
> vector that are closer than a given value. Illustrated this might look as
> follows:
>
> vector <- c(0,0.45,1,2,3,3.25,3.33,3.75,4.1,5,6,6.45,7,7.1,8)
>
> When using '0.5' as the proximity requirement, the following groups would
> result:
> 0,0.45
> 3,3.25,3.33,3.75,4.1
> 6,6.45
> 7,7.1
>
> Jim Holtman proposed a very elegant solution in
> http://tolstoy.newcastle.edu.au/R/e2/help/07/07/21286.html, which I have
> modified and perused since he wrote it to me. The beauty of this approach
> is that it will not only work for constant proximity requirements as above,
> but also for overlap-windows defined in terms of ppm around each value.
> Now I have an additional need and have found no way (short of iteratively
> step through all the groups returned) to figure out how to do that with
> Jim's approach: how to figure out that 6,6.45 and 7,7.1 are separate
> clusters?
>
> Thanks for any hints, Joh
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?
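
For repeated use, the steps of the session above could be wrapped in a function. What follows is only a sketch: the name find_clusters and its width argument are made up for illustration and are not part of the original post. It assumes each value opens a window [value, value + width] and that windows which overlap or touch belong to the same cluster.

```r
# Hypothetical helper: name and arguments are assumptions, not from the post.
# Groups values whose windows [value, value + width] overlap or touch.
find_clusters <- function(values, width = 0.5) {
  ids <- seq_along(values)
  # one "open" event (+1) and one "close" event (-1) per value
  events <- rbind(cbind(value = values,         oper =  1, id = ids),
                  cbind(value = values + width, oper = -1, id = ids))
  # sort by position; at ties, opens are placed before closes
  events <- events[order(events[, "value"], -events[, "oper"]), ]
  # number of windows still open after each event
  open <- cumsum(events[, "oper"])
  # the group counter increases each time the last open window closes
  group <- cumsum(open == 0)
  # drop the closing events that bring the count back to zero
  keep <- open != 0
  groups <- split(events[keep, "id"], group[keep])
  # a lone value leaves a single row; a real cluster leaves at least three
  groups <- groups[sapply(groups, length) >= 3]
  unname(lapply(groups, function(a) values[unique(a)]))
}

vector <- c(0, 0.45, 1, 2, 3, 3.25, 3.33, 3.75, 4.1, 5, 6, 6.45, 7, 7.1, 8)
clusters <- find_clusters(vector, 0.5)
print(clusters)
```

Because width is simply added to values, a vector of per-value widths of the same length should also be accepted, which may be one way to express windows that scale with each value; that use is an assumption here and has not been checked against the ppm case mentioned in the quoted message.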