Hi:
From what I can tell, Henrik efficiently finds the 50 largest values without
the matrix indices, and Peter efficiently finds the matrix indices without the
corresponding values.
Let's combine the two:
x <- rnorm(8e6)
is.na(x) <- sample(8e6, 1e6)
n <- 50
x1 <- sort(x, decreasing=TRUE)[1:n]
# F
You might also want to consider _partial sorting_ by using the
'partial' argument of sort(), especially when the number of data
points is really large.
Since argument 'decreasing=TRUE' is not supported when using
'partial', you have to flip it yourself by negating the values, e.g.
x <- rnorm(8e6
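A minimal sketch of the negation trick described above, assuming a smaller vector than the 8e6 values used in this thread so that it runs quickly. The names y, part and top_n are illustrative only, and the NAs are dropped explicitly before sorting.

```r
set.seed(1)
x <- rnorm(1e5)                      # smaller than the thread's 8e6, for speed
is.na(x) <- sample(length(x), 1e4)   # sprinkle in some NAs
n <- 50

# Negate so that the n largest values of x become the n smallest of y.
y <- -x[!is.na(x)]

# Partial sort: element n is in its final sorted position, and the
# n - 1 elements before it are all <= it (but in no particular order).
part <- sort(y, partial = n)

# Fully sort only those n candidates, then undo the negation.
top_n <- -sort(part[1:n])
```

Only the final 50 candidates are fully sorted, which is where the saving over a full sort(x, decreasing=TRUE) would come from on large inputs.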
m <- matrix(round(rnorm(4000 * 2000), 4), nr = 4000)
is.na(m) <- sample(8e6, 1e6)
system.time(
  idx <- which(
    matrix(m %in% head(sort(m, decreasing = TRUE), 50),
           nrow = nrow(m)),
    arr.ind = TRUE))
#   user  system elapsed
#   3.12    0.19    3.18
-Peter Ehlers
On 2010-06-18 5:13, Dennis Mu
Hi:
Here's a faked up example:
a <- matrix(rnorm(4000*2000), 4000, 2000)
# Generate some NAs in the matrix
nr <- sample(1:4000, 50)   # 50 random row indices
nc <- sample(1:2000, 50)   # 50 random column indices
a[nr, nc] <- NA
# convert to data frame:
b <- data.frame(row = rep(1:4000, 2000), col = rep(1:2000, each = 4000),
A matrix is just a vector, so order() should work.
I haven't verified the following code.
a <- matrix(rnorm(4000*2000), 4000, 2000)
b <- order(a, na.last=TRUE, decreasing=TRUE)[1:50]
Use %% and %/% to get the row and column numbers.
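A minimal sketch of that index arithmetic, assuming a small 40 x 20 matrix in place of the 4000 x 2000 one so the result is easy to inspect. R stores matrices column by column with 1-based indices, so the linear index is shifted by one before and after the modulo and integer division. The names rows, cols and top are illustrative only.

```r
set.seed(42)
a <- matrix(rnorm(40 * 20), 40, 20)
a[sample(length(a), 100)] <- NA      # some NAs, as in the original question
n <- 50

# Linear (column-major) indices of the n largest values; NAs are ordered last.
b <- order(a, na.last = TRUE, decreasing = TRUE)[1:n]

# Convert 1-based linear indices to 1-based row and column numbers.
rows <- (b - 1) %%  nrow(a) + 1
cols <- (b - 1) %/% nrow(a) + 1

top <- cbind(row = rows, col = cols, value = a[b])
```

Base R's arrayInd(b, dim(a)) should give the same row and column numbers without the manual arithmetic.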
Nikhil Kaza
Asst. Professor,
City and Regional Planning
University of Nor
Hi,
I have a huge matrix (4000 * 2000 data points) and I would like to retrieve
the coordinates (column and row) for the top 50 (or x) values. Some
positions in the matrix have NA as a value. These should be discarded.
My current method is to replace all NAs by 0, then rank all the values and