Chuck, Thanks for the reference. MCMC with "swapping" was my original goal and I think I'll go with that in the long run--although using sample() has worked out for now. I was initially concerned that checking all choose(n,2) possible swaps would slow down the process, but in my case for choose(30,2) = 435 this seems not unreasonable. I wonder if for larger vectors an alternative would be desired.
Thanks to everyone for your help-- Andy On Fri, Jan 29, 2010 at 8:23 PM, Charles C. Berry <cbe...@tajo.ucsd.edu>wrote: > On Fri, 29 Jan 2010, Andrew Rominger wrote: > > Being reasonably sure that all valid permutations are equally probable is >> important to me. I've played around with search algorithms in permuting >> contingency tables and find that possible solutions decrease rapidly once >> one starts assigning values, particularly if small values are assigned >> first, so it would seem all solutions are not equally probable (not only >> that but one frequently encounters "dead ends" where there are values left >> to assign and no allowable place to put them). As such I think I'd opt to >> use sample()... several times if needed. >> >> To clarify, yes, I only need one valid permutation, the idea is I'll >> generate 1000s of ordered vectors, and then for each one generate one >> valid >> permutation. >> >> Thanks very much for the help and insights-- >> Andy >> > > Andy, > > If you have some sense of importance sampling and/or MCMC you might look at > > Zaman and Simberloff (2002, Environmental and Ecological Statistics 9, > 405--421). > > which concerns sampling a binary matrix with fixed margins - not quite your > problem, but akin to it in being a combinatorial nightmare without an > obvious direct solution of workable size for real problems. > > They define a neighborhood for each allowable matrix s.t. swapping a pair > of 1's at ij and kl with a pair of 0's at il and kj (which doesn't violate > the margin constraints) leads to a member of the neighborhood. IIRC, the > size of the neighborhood and the sizes of the neighborhoods of the members > of its neighborhood determine the probabilities of staying put or moving to > get the next element of the MCMC chain and which member of the neighborhood > to choose. > > I suppose something like that (i.e. defining neighborhoods of allowable > permutations, measuring their size, and using this to guide sampling or > develop importance weights) might apply in your case. Maybe something like > this: start with an ordering of your n-vector that conforms to the > constraints, look at all the choose(n,2) pairs of elements and check which > of them could be exchanged to yield another conforming ordering; the > allowable swaps define the neighborhood, and their number is its size. > > So, the idea is to develop an MCMC sampler. Run it for each ordered vector > to get past the burn in, then take one value from it. > > HTH, > > Chuck > > >> >> On Thu, Jan 28, 2010 at 3:04 PM, Thomas Lumley <tlum...@u.washington.edu >> >wrote: >> >> On Thu, 28 Jan 2010, Jason Smith wrote: >>> >>> It wouldn't be guaranteed to produce any usable permutation, but it >>> seems >>> >>>> like it would be much faster and so could be repeated until an >>>>> acceptable >>>>> vector is found. What do you think? >>>>> >>>>> Thanks-- >>>>> Andy >>>>> >>>>> >>>>> I think I am not understanding what your ultimate goal is so I'm not >>>> sure I can give you appropriate advice. Are you looking for a single >>>> valid permutation or all of them? >>>> >>>> Since that constraint sets a ceiling on each subsequent value, it >>>> seems like you could solve this problem more easily and quickly by >>>> using a search strategy instead of random sampling or generating all >>>> permutations then testing. The constraint will help prune the search >>>> space so you only generate valid permutations. Once you are examining >>>> a particular element you can determine which of the additional >>>> elements would be valid, so only consider those. >>>> >>>> >>> It's easy to generate valid permutations this way. It does not appear >>> straightforward to ensure that all valid permutations are sampled with >>> equal >>> probability, which I thought was part of the specification of the >>> problem. >>> >>> -thomas >>> >>> >>> Thomas Lumley Assoc. Professor, Biostatistics >>> tlum...@u.washington.edu University of Washington, Seattle >>> >>> >> [[alternative HTML version deleted]] >> >> >> ______________________________________________ >> R-help@r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide >> http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. >> >> > Charles C. Berry (858) 534-2098 > Dept of Family/Preventive > Medicine > E mailto:cbe...@tajo.ucsd.edu UC San Diego > http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901 > > > [[alternative HTML version deleted]] ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.