1) select only denied individuals from the original data. This is S. 2) There is a fixed sample size of exactly t 3) There is a fixed target sum T such that sum(t values from S) = T
You can reduce the problem. All large values where the max(S) + (t-1 smallest values) > T can be eliminated from S. Likewise if (t-1 max S) + min(S) < T then the smallest values can be eliminated. You can estimate the number of options by asking how many ways can I draw t objects from the size of S with replacement. This is the size of S raised to the t power. If Joe has a score of 6 and Kim has a score of 6 these are different outcomes. If not, then S can be reduced to the number of unique values. Could you randomly sample S, filter for the sum=T and then use the random sample? Do you need every case when there are millions of cases (or more)? Is a long execution time important, or what do you consider a long execution time? Are you planning on looking at all t and T within a range? Tim -----Original Message----- From: R-help <r-help-boun...@r-project.org> On Behalf Of Bert Gunter Sent: Monday, April 14, 2025 4:16 PM To: Duncan Murdoch <murdoch.dun...@gmail.com> Cc: r-help@r-project.org Subject: Re: [R] Drawing a sample based on certain condition [External Email] Just a comment: You wish to draw subsets of size n with or without replacement -- and I suspect without replacement is simpler than with -- from a set of positive integers that sum to a fixed value T. This sounds related to the so-called subset sum problem in computational complexity theory: Given a set S of positive integers and positive total, T, is there *any* subset of S (i.e. a sample without replacement) whose sum is T? This is known to be NP-complete. So this means that listing all such subsets must be NP-complete. I don't know whether specifying that, as you have, the size of the subsets must be a fixed number (obviously?) simplifies the problem sufficiently to make it computationally tractable; computational wizards or suitable web searches might tell you this. But for these reasons, your query about how to sample the set of all such subsets -- both those drawn with or without replacement -- seems computationally difficult to me. Cheers, Bert "An educated person is one who can entertain new ideas, entertain others, and entertain herself." On Mon, Apr 14, 2025 at 8:37 AM Duncan Murdoch <murdoch.dun...@gmail.com> wrote: > On 2025-04-14 7:26 a.m., Brian Smith wrote: > > Hi, > > > > For my analytical work, I need to draw a sample of certain sample > > size from a denied population, where population members are marked > > by non-negative integers, such that sum of sample members if fixed. > > For example, > > > > Population = 0:100 > > Sample_size = 10 > > Sample_Sum = 20 > > > > Under this setup if my sample members are X1, X2, ..., X10 then I > > should have X1+X2+...+X10 = 20 > > > > Sample drawing scheme may be with/without replacement > > > > Is there any R function to achieve this? One possibility is to > > employ naive trial-error approach, but this doesnt seem to be > > practical as it would take long time to get the final sample with desired > > properties. > > > > Any pointer would be greatly appreciated. > > One general way to think of this problem is that you are defining a > distribution on the space of all possible samples of size 10, such > that the probability of a sample is X if the sum is 20, and zero > otherwise, and you want to sample from this distribution. > > There's probably a slick method to do that for your example, but if > you've got a general population instead of that special one, I doubt it. > > What I would do is the following: > > Define another distribution on samples that has probabilities that > depend on the sum of the sample, with the highest probabilities > attached to ones with the correct sum, and probabilities for other > sums declining with distance from the sum. For example, maybe > > P(sum) = Y/(1 + abs(sum - 20)) > > for some constant Y. > > You can use MCMC to sample from that distribution and then only keep > the samples where the sum is exactly equal to the target sum. If you > do that, you don't need to care about the value of Y. but you do need > to think about how proposed moves are made, and you probably need to > use a different function than the example above for acceptable efficiency. > > Duncan Murdoch > > ______________________________________________ > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat/ > .ethz.ch%2Fmailman%2Flistinfo%2Fr-help&data=05%7C02%7Ctebert%40ufl.edu > %7C722a00e77d1343db507d08dd7b914147%7C0d4da0f84a314d76ace60a62331e1b84 > %7C0%7C0%7C638802585985433701%7CUnknown%7CTWFpbGZsb3d8eyJFbXB0eU1hcGki > OnRydWUsIlYiOiIwLjAuMDAwMCIsIlAiOiJXaW4zMiIsIkFOIjoiTWFpbCIsIldUIjoyfQ > %3D%3D%7C0%7C%7C%7C&sdata=KNgaZ7e92Zfs51%2F%2B%2B9%2BQkGI5hiSj9OzhDtFz > iWXx2D0%3D&reserved=0 > PLEASE do read the posting guide > https://www/. > r-project.org%2Fposting-guide.html&data=05%7C02%7Ctebert%40ufl.edu%7C7 > 22a00e77d1343db507d08dd7b914147%7C0d4da0f84a314d76ace60a62331e1b84%7C0 > %7C0%7C638802585985456014%7CUnknown%7CTWFpbGZsb3d8eyJFbXB0eU1hcGkiOnRy > dWUsIlYiOiIwLjAuMDAwMCIsIlAiOiJXaW4zMiIsIkFOIjoiTWFpbCIsIldUIjoyfQ%3D% > 3D%7C0%7C%7C%7C&sdata=Vhraq88ySxlq4IUEqvoeRvNdhjvliYdz7VDDvmQzmkE%3D&r > eserved=0 and provide commented, minimal, self-contained, reproducible > code. > [[alternative HTML version deleted]] ______________________________________________ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide https://www.r-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. ______________________________________________ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide https://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.