Quote:
Originally Posted by Charles Claythorne
I am also fairly surprised that five million trials is approaching the theoretical probabilities to only two significant digits. Statistics is weird!!!
Actually, we can calculate this as well.
Let n be 5,000,000. Let p be the probability of getting exactly 3 hearts in one 5-card draw, or 2717/33320, about 8.154% (as shown already in this thread).
Let a be the significance level (1 minus the confidence level), and let z be the 1-a/2 quantile of a standard normal distribution. The confidence interval is
p +/- z sqrt(p(1-p)/n)
For a 50% confidence level, z = 0.674. The confidence interval is between 8.146% and 8.163%.
For a 99% confidence level, z = 2.576. The confidence interval is between 8.123% and 8.186%.
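For anyone who wants to check these numbers, here is a rough sketch of the same formula in javascript. The helper name normalApproxInterval is made up for illustration, and the z values are just the rounded ones quoted above.

```javascript
// Normal-approximation confidence interval for a binomial proportion:
//   p +/- z * sqrt(p(1-p)/n)
// normalApproxInterval is a made-up helper name, not from any library.
function normalApproxInterval(p, n, z) {
  var halfWidth = z * Math.sqrt(p * (1 - p) / n);
  return { low: p - halfWidth, high: p + halfWidth };
}

var p = 2717 / 33320; // probability of exactly 3 hearts in one 5-card draw
var n = 5000000;      // number of draws

var ci50 = normalApproxInterval(p, n, 0.674); // 50% confidence level
var ci99 = normalApproxInterval(p, n, 2.576); // 99% confidence level

// Print both intervals as percentages with three decimals.
console.log('50%: ' + (ci50.low * 100).toFixed(3) + '% to ' + (ci50.high * 100).toFixed(3) + '%');
console.log('99%: ' + (ci99.low * 100).toFixed(3) + '% to ' + (ci99.high * 100).toFixed(3) + '%');
```

Swapping in a different n or z shows how quickly (or slowly) the interval tightens.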
In order for your confidence interval to shrink by a factor of 10 [due to sample size], or become closer to the theoretical value by one more significant digit, the second term needs to shrink by a factor of 10. The variable n only appears once, and it is inside the root, so it's quite easy to see that the number of trials needs to be increased about 100-fold. However, it should be noted that this approximation is less accurate for small values of n and for probabilities close to 0 or 1. Going backwards, we can estimate that 50,000 trials is accurate to within a few percent of p, and 500 trials to within a few tens of percent.
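The 100-fold rule can be checked with a quick sketch. This assumes the 99% level (z = 2.576) and the same formula as above; relativeErrorPercent and drawsNeeded are made-up helper names, and the half-widths 0.001 and 0.0001 are arbitrary example values.

```javascript
// Half-width of the normal-approximation interval, as a percent of p.
function relativeErrorPercent(p, n, z) {
  return 100 * z * Math.sqrt(p * (1 - p) / n) / p;
}

// Draws needed for a given absolute half-width h,
// from solving h = z * sqrt(p(1-p)/n) for n.
function drawsNeeded(p, z, h) {
  return Math.ceil(z * z * p * (1 - p) / (h * h));
}

var p = 2717 / 33320;
var z = 2.576; // assumed 99% confidence level

// Each 100-fold increase in draws shrinks the relative error about 10-fold.
[500, 50000, 5000000].forEach(function (n) {
  console.log(n + ' draws: about +/- ' + relativeErrorPercent(p, n, z).toFixed(2) + '% of p');
});

// Tightening the half-width 10-fold needs about 100 times as many draws.
var base = drawsNeeded(p, z, 0.001);
var tighter = drawsNeeded(p, z, 0.0001);
console.log('Draws ratio: ' + (tighter / base).toFixed(1));
```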
This is one of the rough approximations used for finding confidence intervals of binomial distributions. There are more accurate ones, but this one should give you a pretty good idea.
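As one example of a more accurate method, here is a sketch of the Wilson score interval next to the rough one. Picking Wilson is an assumption, since no specific alternative is named here. The function names and the small sample (2 successes in 20 draws) are made up for illustration.

```javascript
// Rough normal-approximation interval: phat +/- z * sqrt(phat(1-phat)/n)
function roughInterval(phat, n, z) {
  var halfWidth = z * Math.sqrt(phat * (1 - phat) / n);
  return { low: phat - halfWidth, high: phat + halfWidth };
}

// Wilson score interval for the same inputs.
function wilsonInterval(phat, n, z) {
  var z2 = z * z;
  var denom = 1 + z2 / n;
  var center = (phat + z2 / (2 * n)) / denom;
  var halfWidth = (z / denom) * Math.sqrt(phat * (1 - phat) / n + z2 / (4 * n * n));
  return { low: center - halfWidth, high: center + halfWidth };
}

var z = 2.576; // 99% confidence level

// Hypothetical small sample: 2 successes out of 20 draws.
var roughSmall = roughInterval(2 / 20, 20, z);
var wilsonSmall = wilsonInterval(2 / 20, 20, z);
console.log('Rough lower bound:  ' + roughSmall.low);
console.log('Wilson lower bound: ' + wilsonSmall.low);

// With millions of draws the two intervals nearly coincide.
var roughBig = roughInterval(2717 / 33320, 5000000, z);
var wilsonBig = wilsonInterval(2717 / 33320, 5000000, z);
console.log('Difference in lower bounds at n = 5000000: ' + Math.abs(roughBig.low - wilsonBig.low));
```

With the small sample the rough interval's lower bound goes below zero, which is impossible for a probability, while the Wilson bounds stay between 0 and 1.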
Wrote some javascript code in like 2 minutes for anyone who wants to try:
Code:
<script>
// heartdraw.html
var start = Date.now();
var deck = 52;
var hearts = 13;
var theoretical = 2717/33320; // theoretical probability of exactly 3 hearts in 5 cards
var errors = [];
// CHANGE THE FOLLOWING VARIABLES TO CHOOSE HOW MANY TRIALS AND DRAWS PER TRIAL
var trials = 5;
var totaldraws = 50000;
// Test trials
for (var n = 0; n < trials; n++) {
    var success = 0;
    for (var k = 0; k < totaldraws; k++) {
        if (Math.random() < theoretical) success++;
        // The below is commented out because it's brute force drawing and slower.
        // However, I kept it in for anyone who wants to try other draws without having to calculate the theoretical probabilities.
        // (Comment out the line above if you enable it, or successes get counted twice.)
        /*var handhearts = 0;
        var decksize = deck;
        var heartsleft = hearts;
        for (var i = 0; i < 5; i++) {
            if (Math.random() < heartsleft/decksize) {
                heartsleft--;
                handhearts++;
            }
            decksize--;
        }
        if (handhearts === 3) success++;*/
    }
    // Record the percent error for every trial, so the overall stats below work for any number of trials
    var rate = success/totaldraws;
    var error = (rate/theoretical-1)*100;
    errors.push(error);
    // Display trial results
    if (trials <= 20) { // change the numerical value for display option
        console.log('Out of ' + totaldraws + ' draws, ' + success + ' were successes.');
        if (trials <= 5) { // change the numerical value for display option
            console.log('Fraction of successful draws: ' + rate);
            console.log('Theoretical success rate: ' + theoretical);
            console.log('Percent error: ' + error);
        }
    }
}
// Check trial cumulative stats
if (trials > 1) {
    var maxerror = errors[0];
    var minerror = errors[0];
    var sum = errors[0];
    for (var j = 1; j < errors.length; j++) {
        if (errors[j] > maxerror) maxerror = errors[j];
        else if (errors[j] < minerror) minerror = errors[j];
        sum = sum + errors[j];
    }
    var mean = sum/errors.length;
    console.log('------------ Trial overall results ------------');
    console.log('Mean trial error (percent): ' + mean);
    console.log('Range of trial errors (percent): ' + minerror + ' to ' + maxerror);
}
console.log('Time elapsed: ' + ((Date.now()-start)/1000) + ' seconds.');
</script>
I also haven't done this stuff in a long ass time so if I made a mistake anywhere someone please correct me on it.