Hello,
Would around two orders of magnitude interest you?
f1 <- function(Nodes, Weights){
    drop.index <- duplicated(Nodes)
    n.unique <- Nodes[!drop.index, ]
    w.unique <- numeric(length(n.unique[, 1]))
    lw <- length(Weights)
    for (i in seq_along(w.unique)){
        index <- as.logical(2 == rowSums(
            Nodes == matrix(rep(n.unique[i, ], lw), byrow = TRUE,
                            nrow = lw)))
        w.unique[i] <- sum(Weights[index])
    }
    list(n.unique = n.unique, w.unique = w.unique)
}
f2 <- function(Nodes, Weights){
    rows <- paste(Nodes[, 1], Nodes[, 2], sep = ".")
    w.uniq <- tapply(Weights, rows, sum)
    attributes(w.uniq) <- NULL
    ord <- order(unique(rows))
    # order(ord) gives ranks, mapping tapply's sorted sums back to
    # the first-appearance order used by unique(Nodes)
    list(n.unique = unique(Nodes), w.unique = w.uniq[order(ord)])
}
# Test it
M <- 100 # see text below
n <- 2e5
set.seed(1234)
nd <- matrix(sample(M, n*2, TRUE), n, 2)
ww <- rep(1, n)
t1 <- system.time(r1 <- f1(nd, ww))
t2 <- system.time(r2 <- f2(nd, ww))
identical(r1, r2)
print(rbind(t1=t1, t2=t2, ratio=t1/t2), digits=3)
      user.self sys.self elapsed user.child sys.child
t1       310.41       67  379.07         NA        NA
t2         5.59        0    5.62         NA        NA
ratio     55.53      Inf   67.45         NA        NA
With bigger M the number of distinct coordinate pairs increases, and with
it the number of unique rows. The time taken by f1 grows sharply, while f2
scales up rather slowly, so the ratio above only gets better.
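For completeness, base R's rowsum() can do the grouped sum in one call, along the same lines as f2; with reorder = FALSE the groups come out in order of first appearance, so no reordering step is needed. A sketch (f3 is just an illustrative name, not from the thread, and I have not benchmarked it against f2):

```r
f3 <- function(Nodes, Weights){
    key <- paste(Nodes[, 1], Nodes[, 2], sep = ".")
    # reorder = FALSE keeps groups in order of first appearance,
    # so the sums line up row-for-row with unique(Nodes)
    w <- rowsum(Weights, key, reorder = FALSE)
    list(n.unique = unique(Nodes), w.unique = as.vector(w))
}

nd <- matrix(c(1,2,3,4,5,6,1,2,1,4), ncol = 2, byrow = TRUE)
f3(nd, rep(1, 5))  # the small example from the original question
```
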
Hope this helps,
Rui Barradas
On 28-06-2012 14:06, Weiser, Constantin wrote:
Hi all. I have a - maybe trivial - problem with aggregating a list of
weights.
Here is the problem:
- I start with a set of nodes (X/Y coordinates) and associated weights,
where the set of nodes is typically not unique
- I want a set of unique nodes together with the sums of the associated
weights
I am grateful for any help.
See for example:
# weights:
w <- c(1, 1, 1, 1, 1)
# not unique set of nodes (X/Y-coordinates):
nodes <- matrix(c(1,2,3,4,5,6,1,2,1,4), ncol = 2, byrow = TRUE)
# desired result:
#nodes
[,1] [,2]
[1,] 1 2
[2,] 3 4
[3,] 5 6
[4,] 1 4
#weights
2 1 1 1
This is my solution, but it is very slow (typical size of nodes:
200000 x 2):
weights <- c(1, 1, 1, 1, 1)
nodes <- matrix(c(1,2,3,4,5,6,1,2,1,4), ncol = 2, byrow = TRUE)
## to be replaced by a faster code
drop.index <- duplicated(nodes)
n.unique <- nodes[!drop.index, ]
w.unique <- numeric(length(n.unique[, 1]))
lw <- length(weights)
for (i in seq_along(w.unique)){
    index <- as.logical(2 == rowSums(
        nodes == matrix(rep(n.unique[i, ], lw), byrow = TRUE,
                        nrow = lw)))
    w.unique[i] <- sum(weights[index])
}
##
n.unique
w.unique
Weiser, Constantin
Gutenberg University of Mainz, Germany
Chair of Statistics & Econometrics
Jakob-Welder-Weg 4, 55128 Mainz
House of Law and Economics II, Room 00-116
Tel: 0049 6131 39 22715
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.