1) select only denied individuals from the original data. This is S.
2) There is a fixed sample size of exactly t
3) There is a fixed target sum T such that sum(t values from S) = T

You can reduce the problem. All large values where the max(S) + (t-1 smallest 
values) > T can be eliminated from S. Likewise if (t-1 max S) + min(S) < T then 
the smallest values can be eliminated.

You can estimate the number of options by asking how many ways can I draw t 
objects from the size of S with replacement. This is the size of S raised to 
the t power. If Joe has a score of 6 and Kim has a score of 6 these are 
different outcomes. If not, then S can be reduced to the number of unique 
values.

Could you randomly sample S, filter for the sum=T and then use the random 
sample? Do you need every case when there are millions of cases (or more)?

Is a long execution time important, or what do you consider a long execution 
time? Are you planning on looking at all t and T within a range?

Tim

-----Original Message-----
From: R-help <r-help-boun...@r-project.org> On Behalf Of Bert Gunter
Sent: Monday, April 14, 2025 4:16 PM
To: Duncan Murdoch <murdoch.dun...@gmail.com>
Cc: r-help@r-project.org
Subject: Re: [R] Drawing a sample based on certain condition

[External Email]

Just a comment:

You wish to draw subsets of size n with or without replacement -- and I suspect 
without replacement is simpler than with -- from a set of positive integers 
that sum to a fixed value T. This sounds related to the so-called subset sum 
problem in computational complexity theory: Given a set S of positive integers 
and positive total, T, is there *any* subset of S (i.e. a sample without 
replacement) whose sum is T? This is known to be NP-complete. So this means 
that listing all such subsets must be NP-complete. I don't know whether 
specifying that, as you have, the size of the subsets must be a fixed number 
(obviously?) simplifies the problem sufficiently to make it computationally 
tractable; computational wizards or suitable web searches might tell you this. 
But for these reasons,  your query about how to sample the set of all such 
subsets -- both those drawn with or without replacement -- seems 
computationally difficult to me.


Cheers,
Bert



"An educated person is one who can entertain new ideas, entertain others, and 
entertain herself."



On Mon, Apr 14, 2025 at 8:37 AM Duncan Murdoch <murdoch.dun...@gmail.com>
wrote:

> On 2025-04-14 7:26 a.m., Brian Smith wrote:
> > Hi,
> >
> > For my analytical work, I need to draw a sample of certain sample
> > size from a denied population, where population members are marked
> > by non-negative integers, such that sum of sample members if fixed.
> > For example,
> >
> > Population = 0:100
> > Sample_size = 10
> > Sample_Sum = 20
> >
> > Under this setup if my sample members are X1, X2, ..., X10 then I
> > should have X1+X2+...+X10 = 20
> >
> > Sample drawing scheme may be with/without replacement
> >
> > Is there any R function to achieve this? One possibility is to
> > employ naive trial-error approach, but this doesnt seem to be
> > practical as it would take long time to get the final sample with desired 
> > properties.
> >
> > Any pointer would be greatly appreciated.
>
> One general way to think of this problem is that you are defining a
> distribution on the space of all possible samples of size 10, such
> that the probability of a sample is X if the sum is 20, and zero
> otherwise, and you want to sample from this distribution.
>
> There's probably a slick method to do that for your example, but if
> you've got a general population instead of that special one, I doubt it.
>
> What I would do is the following:
>
> Define another distribution on samples that has probabilities that
> depend on the sum of the sample, with the highest probabilities
> attached to ones with the correct sum, and probabilities for other
> sums declining with distance from the sum.  For example, maybe
>
>   P(sum) = Y/(1 + abs(sum - 20))
>
> for some constant Y.
>
> You can use MCMC to sample from that distribution and then only keep
> the samples where the sum is exactly equal to the target sum.  If you
> do that, you don't need to care about the value of Y. but you do need
> to think about how proposed moves are made, and you probably need to
> use a different function than the example above for acceptable efficiency.
>
> Duncan Murdoch
>
> ______________________________________________
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat/
> .ethz.ch%2Fmailman%2Flistinfo%2Fr-help&data=05%7C02%7Ctebert%40ufl.edu
> %7C722a00e77d1343db507d08dd7b914147%7C0d4da0f84a314d76ace60a62331e1b84
> %7C0%7C0%7C638802585985433701%7CUnknown%7CTWFpbGZsb3d8eyJFbXB0eU1hcGki
> OnRydWUsIlYiOiIwLjAuMDAwMCIsIlAiOiJXaW4zMiIsIkFOIjoiTWFpbCIsIldUIjoyfQ
> %3D%3D%7C0%7C%7C%7C&sdata=KNgaZ7e92Zfs51%2F%2B%2B9%2BQkGI5hiSj9OzhDtFz
> iWXx2D0%3D&reserved=0
> PLEASE do read the posting guide
> https://www/.
> r-project.org%2Fposting-guide.html&data=05%7C02%7Ctebert%40ufl.edu%7C7
> 22a00e77d1343db507d08dd7b914147%7C0d4da0f84a314d76ace60a62331e1b84%7C0
> %7C0%7C638802585985456014%7CUnknown%7CTWFpbGZsb3d8eyJFbXB0eU1hcGkiOnRy
> dWUsIlYiOiIwLjAuMDAwMCIsIlAiOiJXaW4zMiIsIkFOIjoiTWFpbCIsIldUIjoyfQ%3D%
> 3D%7C0%7C%7C%7C&sdata=Vhraq88ySxlq4IUEqvoeRvNdhjvliYdz7VDDvmQzmkE%3D&r
> eserved=0 and provide commented, minimal, self-contained, reproducible
> code.
>

        [[alternative HTML version deleted]]

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide https://www.r-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide https://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Reply via email to