Hi, It was pointed out to me by Hans Borchers that the timing that I reported (0.3 sec) in the previous email for solving the LSAP problem, for N=1000, was too optimistic, because "X" and "Y" were equivalent up to a permutation.
In order to test this out, I ran a few more experiments with different random variate distributions for X and Y. In all the experiments, I took N = 500. The execution times were faster when X and Y had the same or similar distributions. This is generally around 8 - 9 seconds. The more different the distributions are, the greater the time. For example, when I took the real and imaginary parts of X to be from a t-distribution with 3 degrees of freedom, and Y to be from uniform distribution in (0, 1), the execution times were around 80-90 seconds. n <- 500 x <- rt(n, df=3) + 1i * rt(n, df=3) y <- runif(n) + 1i * runif(n) Cmat <- outer(x, y, FUN=function(x,y) Mod(x - y)) system.time(ans <- solve_LSAP(Cmat, maximum=FALSE)) When I increased N = 1000, the time was about 1400 seconds! Ravi. ____________________________________________________________________ Ravi Varadhan, Ph.D. Assistant Professor, Division of Geriatric Medicine and Gerontology School of Medicine Johns Hopkins University Ph. (410) 502-2619 email: rvarad...@jhmi.edu ----- Original Message ----- From: Ravi Varadhan <rvarad...@jhmi.edu> Date: Saturday, November 14, 2009 10:53 am Subject: Re: [R] A combinatorial optimization problem: finding the best permutation of a complex vector To: "Charles C. Berry" <cbe...@tajo.ucsd.edu> Cc: r-help@r-project.org > Hi, > > I have solved the problem that I had posed before. Here is a > statement of the problem: > > "I have a complex-valued vector X in C^n. Given another > complex-valued vector Y in C^n, I want to find a permutation of Y, > say, Y*, that minimizes ||X - Y*||, the distance between X and Y*. " > > I was talking to Professor Moody T. Chu, who is a well-known numerical > analyst from NC State Univ, and he pointed out that this problem is an > instance of the classical "linear sum assignment problem (LSAP)" in > discrete mathematics. Once this was revealed to me, it didn't take me > long to find out the existence of various algorithms (e.g. Hungarian > algorithm) and codes (C, Fortran, Matlab) for solving this problem. I > also looked in the CRAN task view on optimization and found that the > LSAP solver is present in the "clue" package. Thanks to Kurt Hornik > for this package. > > So, here is an illustration of the "amazing" power of mathematics: > > n <- 1000 > > x <- rt(n, df=3) + 1i * rt(n, df=3) # this is the target vector to be > matched > > y <- x[sample(n)] # this is the vector to be permuted > > # Note: I have chosen a random permutation of the target so that I > know the answer is "x" itself > # and the minimum distance is zero > > Cmat <- outer(x, y, FUN=function(x,y) Mod(x - y)) > > require(clue) > > ans <- solve_LSAP(Cmat, maximum=FALSE) # We are minimizing the linear > sum > > dist <- function(x, y) sqrt(sum(Mod(x - y)^2)) > > dist(x, y[c(ans)]) > > > This is remarkable. It takes only about 0.3 seconds to solve this > difficult combinatorial problem! > > > Best, > Ravi. > > ____________________________________________________________________ > > Ravi Varadhan, Ph.D. > Assistant Professor, > Division of Geriatric Medicine and Gerontology > School of Medicine > Johns Hopkins University > > Ph. (410) 502-2619 > email: rvarad...@jhmi.edu > > > ----- Original Message ----- > From: "Charles C. Berry" <cbe...@tajo.ucsd.edu> > Date: Thursday, November 12, 2009 2:20 pm > Subject: Re: [R] A combinatorial optimization problem: finding the > best permutation of a complex vector > To: Ravi Varadhan <rvarad...@jhmi.edu> > Cc: r-help@r-project.org > > > > On Thu, 12 Nov 2009, Ravi Varadhan wrote: > > > > > > > > Hi, > > > > > > I have a complex-valued vector X in C^n. Given another > > complex-valued > > > vector Y in C^n, I want to find a permutation of Y, say, Y*, that > > > > minimizes ||X - Y*||, the distance between X and Y*. > > > > > > Note that this problem can be trivially solved for "Real" vectors, > > > since > > > real numbers possess the ordering property. Complex numbers, > > however, do > > > not possess this property. Hence the challenge. > > > > > > The obvious solution is to enumerate all the permutations of Y and > > > pick > > > out the one that yields the smallest distance. This, however, is > > > only > > > feasible for small n. I would like to be able to solve this for n > > > as > > > large as 100 - 1000, in which cases the permutation approach is > > > infeasible. > > > > > > I am looking for algorithms, possibly iterative, that can provide > a > > > > > "good" approximate solutions that are not necessarily optimal for > > > > high-dimensional vectors. I can do random sampling, but this can > be > > very > > > inefficient in high-dimensional problems. I am looking for > > efficient > > > algorithms because this step has to be performed in each iteration > > > of an > > > "outer" algorithm. > > > > > > Are there any clever adaptive algorithms out there? > > > > > > > I do not know. > > > > But would you settle for a not-so-clever adaptive heuristic? > > > > If so, see below. > > > > > > > > > Here is an example illustrating the problem: > > > > > > require(e1071) > > > > > > n <- 8 > > > x <- runif(n) + 1i * runif(n) > > > y <- runif(n) + 1i * runif(n) > > > > > > dist <- function(x, y) sqrt(sum(Mod(x - y)^2)) > > > > > > perms <- permutations(n) > > > dim(perms) # [1] 40320 8 > > > tmp <- apply(perms, 1, function(ord) dist(x, y[ord])) > > > z <- y[perms[which.min(tmp), ]] # exact solution > > > dist(x, z) > > > > > > # an aproximate random-sampling approach > > > nsamp <- 10000 > > > perms <- t(replicate(nsamp, sample(1:n, size=n, replace=FALSE))) > > > tmp <- apply(perms, 1, function(ord) dist(x, y[ord])) > > > z.app <- y[perms[which.min(tmp), ]] # approximate solution > > > dist(x, z.app) > > > > > > > The heuristic is to use a stochastic greedy updates. Here is a very > > > simple > > one: > > > > swap.samp <- > > function(index) { > > sub.ind <- sample(seq(along=index),2) > > index[sub.ind]<- rev(sub.ind) > > index > > } > > > > > > z.app <- y > > z.cand <- y > > > > for (i in 1:100) z.cand <- > > if( dist(x,z.app) < dist(x,z.cand) ) { > > > > z.app[swap.samp(1:8)] > > > > } else { > > z.app <- z.cand > > z.cand[swap.samp(1:8)] > > } > > > > On your toy example, this usually finds the min(dist(x,z.app)) in < > > > 100 > > trials. > > > > Note that when > > > > z.diff <- z.app != z.cand > > > > dist(x[ z.diff ],z.app[ z.diff ])^2 - dist(x[ z.diff ],z.cand[ > > z.diff ])^2 > > > > equals > > > > dist(x,z.app)^2 - dist(x,z.cand)^2 > > > > so you could vectorize the above to randomly pair up all the points > > > (for n > > even) then update the n%/%2 pairs all at once. > > > > > > For large problems, you might try to preferentially sample pairs of > > > points > > with similar values of Mod ( x - z.app ) to begin with. The > heuristic > > > > being that in two pairs of points that are both distant, swapping > them > > > > might realize a bigger benefit. > > > > > > -- > > > > Another version is to sample k points, randomly permute, then do a > > greedy > > update: > > > > sub.samp <- > > function(index,n) { > > sub.ind <- sample(seq(along=index),n) > > index[sub.ind]<- sample(sub.ind) > > index > > } > > > > > > # k is 4 here > > > > for (i in 1:100) z.cand <- > > if( dist(x,z.app) < dist(x,z.cand) ){ > > z.app[sub.samp(1:8,4)] > > } else { > > z.app <- z.cand > > z.cand[sub.samp(1:8,4)] > > } > > > > > > On your toy problem, this takes in the low 100's to find the min. > > > > I think you can do the similar vectorization trick here. > > -- > > > > I suppose another variant would repeatedly sample k points, then > > enumerate > > distances for all permutations of them, then choose the best one to > > > update z.app. > > > > Again, in larger problems, one might weight those k points towards > > higher > > values of Mod( x - z.app ) at least to begin with. > > > > HTH, > > > > Chuck > > > > > > > Thanks for any help. > > > > > > Best, > > > Ravi. > > > > > > > > > ____________________________________________________________________ > > > > > > Ravi Varadhan, Ph.D. > > > Assistant Professor, > > > Division of Geriatric Medicine and Gerontology > > > School of Medicine > > > Johns Hopkins University > > > > > > Ph. (410) 502-2619 > > > email: rvarad...@jhmi.edu > > > > > > ______________________________________________ > > > R-help@r-project.org mailing list > > > > > > PLEASE do read the posting guide > > > and provide commented, minimal, self-contained, reproducible code. > > > > > > > Charles C. Berry (858) 534-2098 > > Dept of > Family/Preventive > > Medicine > > E UC San Diego > > La Jolla, San Diego 92093-0901 > > > > > > ______________________________________________ > R-help@r-project.org mailing list > > PLEASE do read the posting guide > and provide commented, minimal, self-contained, reproducible code. ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.