Keywords: C Programming | Random Number Generation | Uniform Distribution | Rejection Sampling | Integer Arithmetic
Abstract: This article provides an in-depth exploration of generating random integers within specified ranges in C programming. By analyzing common implementation errors, it explains why simple modulo operations lead to non-uniform distributions and presents a mathematically correct solution based on integer arithmetic. The article includes complete code implementations, mathematical principles, and practical application examples.
Introduction
Generating random numbers is a common requirement in programming practice, particularly in simulation, game development, and cryptography. However, developers working with the C standard library functions often reach for simple but incorrect implementations that produce non-uniform distributions. Drawing on discussions from Stack Overflow, this article analyzes the mathematical principles behind generating random integers in a range and presents a correct implementation.
Common Errors and Mathematical Analysis
Many developers habitually use rand() % N to generate random numbers in the range [0, N), but this approach is biased in general. It produces a uniform distribution only when N divides RAND_MAX + 1, the number of distinct values rand() can return. Because RAND_MAX + 1 is a power of 2 on common implementations, this in practice means N must itself be a power of 2. In all other cases, the modulo operation makes the smallest results appear with higher probability than the rest.
For example, assume RAND_MAX = 10, so rand() returns one of 11 equally likely values from 0 to 10, and we want to generate random numbers from 1 to 6 (simulating a die). If we use (rand() % 6) + 1, then:
- Numbers 1-5 each appear with probability 2/11 ≈ 0.182
- Number 6 appears with probability 1/11 ≈ 0.091
This non-uniform distribution can cause serious problems in practical applications.
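The bias can be checked by enumerating every value a small generator could return and tallying the residues. The following is a minimal sketch; modulo_counts and its rand_max parameter are hypothetical names introduced for illustration, with rand_max standing in for RAND_MAX.

```c
#include <assert.h>

// Counts how many generator outputs in [0, rand_max] land on each residue
// of x % n. The counts array must have room for n entries.
// rand_max is a hypothetical small stand-in for RAND_MAX.
void modulo_counts(unsigned rand_max, unsigned n, unsigned counts[]) {
    assert(n > 0);
    for (unsigned i = 0; i < n; i++) {
        counts[i] = 0;
    }
    // A wider loop variable avoids wrap-around if rand_max is UINT_MAX.
    for (unsigned long long x = 0; x <= rand_max; x++) {
        counts[x % n]++;
    }
}
```

Calling modulo_counts(10, 6, counts) leaves a count of 2 for residues 0 through 4 and a count of 1 for residue 5, so the last face of the die would come up half as often as the others.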
Correct Implementation Method
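Assuming the underlying generator is uniformly distributed over [0, RAND_MAX], the correct approach is to divide that range into equally sized intervals (bins), one per target value, and to discard any leftover values that do not fill a whole bin. Here is the complete implementation code: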
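#include <stdlib.h>  // random(), RAND_MAX

// Assumes 0 <= max <= RAND_MAX.
// Also assumes random() (POSIX, range [0, 2^31 - 1]) covers exactly
// [0, RAND_MAX], as it does on glibc; otherwise call rand() instead.
// Returns a random number in the closed interval [0, max].
long random_at_most(long max) {
    unsigned long
        // max <= RAND_MAX < ULONG_MAX, so these additions cannot overflow
        num_bins = (unsigned long) max + 1,
        num_rand = (unsigned long) RAND_MAX + 1,
        bin_size = num_rand / num_bins,
        defect   = num_rand % num_bins;

    long x;
    do {
        x = random();
    }
    // Reject the last `defect` values; written so that it cannot overflow
    while (num_rand - defect <= (unsigned long) x);

    // Truncating division maps each accepted x to its bin
    return x / bin_size;
}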
Algorithm Principle Explanation
The core idea of this algorithm is to ensure uniform distribution through rejection sampling. The specific steps are:
- Parameter Calculation:
  - num_bins: Number of target intervals (max + 1)
  - num_rand: Range of the random number generator (RAND_MAX + 1)
  - bin_size: Size of each interval (floor division)
  - defect: Excess portion that cannot be evenly distributed
- Rejection Sampling Loop:
The loop ensures that only random numbers that can be evenly distributed across all intervals are accepted. The rejection condition occurs when the random number falls within the last defect values, as these values cannot be evenly allocated to all intervals.
- Result Calculation:
Accepted random numbers are mapped to target intervals through integer division, ensuring each number appears with exactly the same probability.
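To make the three steps concrete, the mapping can be applied to every output of a small generator. This is a sketch under stated assumptions: bin_or_reject is a hypothetical helper, and its rand_max parameter replaces RAND_MAX so that the full output space can be enumerated by hand.

```c
#include <assert.h>
#include <limits.h>

// Maps one raw generator output x in [0, rand_max] to a bin in [0, max],
// or returns -1 when x falls in the rejected tail.
// rand_max is a hypothetical stand-in for RAND_MAX.
long bin_or_reject(unsigned long x, unsigned long rand_max, unsigned long max) {
    assert(rand_max < ULONG_MAX);            // rand_max + 1 must not wrap
    assert(max <= rand_max && x <= rand_max);
    unsigned long num_bins = max + 1;
    unsigned long num_rand = rand_max + 1;
    unsigned long bin_size = num_rand / num_bins;
    unsigned long defect   = num_rand % num_bins;
    if (num_rand - defect <= x) {
        return -1;  // one of the last `defect` values: draw again
    }
    return (long) (x / bin_size);
}
```

With rand_max = 15 and max = 5, the parameters are num_bins = 6, num_rand = 16, bin_size = 2, and defect = 4. Outputs 0 through 11 are accepted and fall two per bin into bins 0 through 5, while outputs 12 through 15 are rejected.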
Extended Applications
For more general [min, max] ranges, the following wrapper function can be used:
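// Returns a random number in the closed interval [min, max].
// Assumes the span max - min is at most RAND_MAX and does not overflow.
long random_in_range(long min, long max) {
    if (min > max) {
        // Error handling: swap min and max
        long temp = min;
        min = max;
        max = temp;
    }
    return min + random_at_most(max - min);
}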
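The wrapper inherits the precondition of random_at_most: the span max - min has to lie in [0, RAND_MAX], and the subtraction itself must not overflow. One way to check this before calling is sketched below; range_span and span_fits are hypothetical helpers, not part of the original code.

```c
#include <assert.h>
#include <stdlib.h>  // RAND_MAX

// Hypothetical helper: width of [min, max], computed in unsigned
// arithmetic so that widely separated bounds cannot overflow a long.
unsigned long range_span(long min, long max) {
    assert(min <= max);
    return (unsigned long) max - (unsigned long) min;
}

// Returns 1 when random_at_most(max - min) would satisfy its
// precondition 0 <= max - min <= RAND_MAX, and 0 otherwise.
int span_fits(long min, long max) {
    return min <= max && range_span(min, max) <= (unsigned long) RAND_MAX;
}
```

A caller could test span_fits(min, max) first and report an error, or fall back to a generator with a wider range, when it returns 0.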
Practical Application Examples
Dice Simulation: Generate random integers from 1 to 6:
long dice_roll = random_in_range(1, 6);
Lottery Number Generation: Select 5 non-repeating numbers from 1 to 49 (requires additional deduplication logic):
// Simplified example: generate single lottery number
long lottery_number = random_in_range(1, 49);
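The deduplication mentioned above can be avoided altogether by shuffling instead of retrying. The sketch below is one possible approach, not the article's own method: it draws distinct numbers with a partial Fisher-Yates shuffle. draw_lottery and bounded_rand are hypothetical names, and the block uses the standard rand() so that it is self-contained.

```c
#include <assert.h>
#include <stdlib.h>  // rand(), RAND_MAX

// Unbiased integer in [0, max] via rejection sampling on rand().
// Assumes 0 <= max <= RAND_MAX.
static long bounded_rand(long max) {
    assert(max >= 0 && max <= RAND_MAX);
    unsigned long num_bins = (unsigned long) max + 1;
    unsigned long num_rand = (unsigned long) RAND_MAX + 1;
    unsigned long bin_size = num_rand / num_bins;
    unsigned long defect   = num_rand % num_bins;
    unsigned long x;
    do {
        x = (unsigned long) rand();
    } while (num_rand - defect <= x);
    return (long) (x / bin_size);
}

// Fills out[0..count-1] with distinct numbers from [1, pool] using a
// partial Fisher-Yates shuffle. The pool is capped at 64 (an arbitrary
// limit for this sketch) so the scratch array can live on the stack.
void draw_lottery(long out[], int count, int pool) {
    assert(count >= 0 && count <= pool && pool <= 64);
    long numbers[64];
    for (int i = 0; i < pool; i++) {
        numbers[i] = i + 1;
    }
    for (int i = 0; i < count; i++) {
        // Pick a position in the not-yet-chosen tail [i, pool - 1].
        int j = i + (int) bounded_rand(pool - 1 - i);
        long tmp = numbers[i];
        numbers[i] = numbers[j];
        numbers[j] = tmp;
        out[i] = numbers[i];
    }
}
```

After seeding with srand(), a call such as draw_lottery(picks, 5, 49) fills picks with five different numbers between 1 and 49. Each pick costs one bounded draw, and no number can repeat because chosen entries are swapped out of the remaining pool.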
Types of Random Number Generators
Random number generators are mainly divided into two types:
- Pseudorandom Number Generator (PRNG): Uses deterministic algorithms to generate seemingly random sequences. C's rand() and random() functions belong to this category. They are reproducible (the same seed always yields the same sequence) and are suitable for most application scenarios.
- True Random Number Generator (TRNG): Based on randomness from physical processes, such as atmospheric noise or radioactive decay. These generators produce truly random numbers but typically require special hardware support.
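The reproducibility of a PRNG can be observed directly with the standard srand() and rand() functions. In this minimal sketch, seeded_sequence is a hypothetical helper that records the first values produced after seeding.

```c
#include <assert.h>
#include <stdlib.h>  // srand(), rand()

// Fills out[0..n-1] with the first n values rand() produces after
// the generator has been seeded with `seed`.
void seeded_sequence(unsigned seed, int out[], int n) {
    assert(n >= 0);
    srand(seed);
    for (int i = 0; i < n; i++) {
        out[i] = rand();
    }
}
```

Two calls with the same seed fill their arrays with identical values, which is useful for repeatable tests and simulations. The actual numbers depend on the C library, so they can differ between platforms.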
Performance Considerations and Optimization
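The rejection sampling method may require multiple calls to the random number generator. Each attempt is rejected with probability defect / num_rand, which is always below 1/2, so fewer than two calls are needed on average, although no fixed upper bound exists in the worst case. For performance-sensitive applications, consider the following optimization strategies: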
- Use a faster or higher-quality random number generator where one is available (such as random() instead of rand() on POSIX systems)
- Precompute optimal parameters for specific ranges
- Use simpler methods in scenarios where approximate uniform distribution is acceptable
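The cost of rejection can be estimated before choosing an optimization. The sketch below computes the per-draw rejection probability and the expected number of generator calls; rejection_probability and expected_calls are hypothetical helpers, and rand_max again stands in for RAND_MAX.

```c
#include <assert.h>
#include <limits.h>

// Probability that one draw from [0, rand_max] is rejected when the
// target interval is [0, max]: defect / num_rand.
double rejection_probability(unsigned long rand_max, unsigned long max) {
    assert(rand_max < ULONG_MAX);  // rand_max + 1 must not wrap
    assert(max <= rand_max);
    unsigned long num_rand = rand_max + 1;
    unsigned long defect   = num_rand % (max + 1);
    return (double) defect / (double) num_rand;
}

// Expected number of generator calls per returned value: 1 / (1 - p),
// the mean of a geometric distribution with success probability 1 - p.
double expected_calls(unsigned long rand_max, unsigned long max) {
    return 1.0 / (1.0 - rejection_probability(rand_max, max));
}
```

For rand_max = 15 and max = 5 the rejection probability is 4/16 = 0.25, giving about 1.33 calls per result. When max + 1 divides rand_max + 1, as with max = 7, nothing is rejected and exactly one call is needed.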
Conclusion
Generating random integers within specified ranges is a seemingly simple problem that contains profound mathematical principles. By understanding the requirements of uniform distribution and the principles of rejection sampling, developers can avoid common pitfalls and write correct and efficient random number generation code. The implementation method provided in this article not only guarantees mathematical correctness but also offers good readability and maintainability.