On Fri, 29 Jan 2010, Andrew Rominger wrote:

Being reasonably sure that all valid permutations are equally probable is
important to me.  I've played around with search algorithms in permuting
contingency tables and find that possible solutions decrease rapidly once
one starts assigning values, particularly if small values are assigned
first, so it would seem all solutions are not equally probable (not only
that but one frequently encounters "dead ends" where there are values left
to assign and no allowable place to put them).  As such I think I'd opt to
use sample()... several times if needed.

To clarify, yes, I only need one valid permutation, the idea is I'll
generate 1000s of ordered vectors, and then for each one generate one valid
permutation.

Thanks very much for the help and insights--
Andy

Andy,

If you have some sense of importance sampling and/or MCMC you might look at

Zaman and Simberloff (2002, Environmental and Ecological Statistics 9,
405--421).

which concerns sampling a binary matrix with fixed margins - not quite your problem, but akin to it in being a combinatorial nightmare without an obvious direct solution of workable size for real problems.

They define a neighborhood for each allowable matrix s.t. swapping a pair of 1's at ij and kl with a pair of 0's at il and kj (which doesn't violate the margin constraints) leads to a member of the neighborhood. IIRC, the size of the neighborhood and the sizes of the neighborhoods of the members of its neighborhood determine the probabilities of staying put or moving to get the next element of the MCMC chain and which member of the neighborhood to choose.

I suppose something like that (i.e. defining neighborhoods of allowable permutations, measuring their size, and using this to guide sampling or develop importance weights) might apply in your case. Maybe something like this: start with an ordering of your n-vector that conforms to the constraints, look at all the choose(n,2) pairs of elements and check which of them could be exchanged to yield another conforming ordering; the allowable swaps define the neighborhood, and their number is its size.

So, the idea is to develop an MCMC sampler. Run it for each ordered vector to get past the burn in, then take one value from it.

HTH,

Chuck



On Thu, Jan 28, 2010 at 3:04 PM, Thomas Lumley <tlum...@u.washington.edu>wrote:

On Thu, 28 Jan 2010, Jason Smith wrote:

 It wouldn't be guaranteed to produce any usable permutation, but it seems
like it would be much faster and so could be repeated until an acceptable
vector is found.  What do you think?

Thanks--
Andy


I think I am not understanding what your ultimate goal is so I'm not
sure I can give you appropriate advice.  Are you looking for a single
valid permutation or all of them?

Since that constraint sets a ceiling on each subsequent value, it
seems like you could solve this problem more easily and quickly by
using a search strategy instead of random sampling or generating all
permutations then testing.  The constraint will help prune the search
space so you only generate valid permutations.  Once you are examining
a particular element you can determine which of the additional
elements would be valid, so only consider those.


It's easy to generate valid permutations this way.  It does not appear
straightforward to ensure that all valid permutations are sampled with equal
probability, which I thought was part of the specification of the problem.

     -thomas


Thomas Lumley                   Assoc. Professor, Biostatistics
tlum...@u.washington.edu        University of Washington, Seattle


        [[alternative HTML version deleted]]

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Charles C. Berry                            (858) 534-2098
                                            Dept of Family/Preventive Medicine
E mailto:cbe...@tajo.ucsd.edu               UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Reply via email to