Generating Random Integers Within a Specified Range in C: Theory and Practice

Keywords: C Programming | Random Number Generation | Uniform Distribution | Rejection Sampling | Integer Arithmetic

Abstract: This article provides an in-depth exploration of generating random integers within specified ranges in C programming. By analyzing common implementation errors, it explains why simple modulo operations lead to non-uniform distributions and presents a mathematically correct solution based on integer arithmetic. The article includes complete code implementations, mathematical principles, and practical application examples.

Introduction

Generating random numbers is a common requirement in programming practice, particularly in simulation, game development, and cryptography. Many developers, however, rely on simple but incorrect idioms built on the C standard library, and the numbers they generate are not uniformly distributed. Drawing on a well-regarded Stack Overflow discussion, this article analyzes the mathematics behind the problem and presents a correct implementation.

Common Errors and Mathematical Analysis

Many developers habitually use rand() % N to generate random numbers in the range [0, N), but this approach is mathematically incorrect in general. It produces a uniform distribution only when N divides RAND_MAX + 1, the number of distinct values rand() can return. Because RAND_MAX + 1 is a power of 2 on most implementations, this in practice means that N must itself be a power of 2. In all other cases, the modulo operation makes some results more likely than others.

For example, assume RAND_MAX = 10, so that rand() returns one of the 11 values 0 through 10, and we want to generate random numbers from 1 to 6 (simulating a die). If we use (rand() % 6) + 1, then:

Numbers 1-5 each appear with probability 2/11 ≈ 0.182
Number 6 appears with probability 1/11 ≈ 0.091

A bias like this distorts simulation statistics and makes games unfair. It shrinks as RAND_MAX grows relative to N, but it never vanishes unless N divides RAND_MAX + 1.
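
The bias can be checked by brute force. The sketch below assumes a hypothetical toy generator that returns each value from 0 to toy_rand_max with equal probability; count_modulo_outcomes and toy_rand_max are illustrative names, not part of any library.

```c
/* Hypothetical helper: for a toy generator whose outputs are exactly
 * 0 .. toy_rand_max, count how many raw values map to each result of
 * (value % n). counts must have room for n entries.
 * Intended for small toy ranges only. */
void count_modulo_outcomes(unsigned toy_rand_max, unsigned n, unsigned *counts) {
  for (unsigned i = 0; i < n; i++) {
    counts[i] = 0;
  }
  for (unsigned value = 0; value <= toy_rand_max; value++) {
    counts[value % n]++;
  }
}
```

With toy_rand_max = 10 and n = 6 the counts are 2, 2, 2, 2, 2, 1, so the last die face is half as likely as each of the others. With toy_rand_max = 11 there are 12 raw values, 6 divides 12, and every count is 2.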

Correct Implementation Method

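Assuming only that the generator's output is uniformly distributed, the correct approach is to divide its output range into equally sized intervals (bins), one per target value, and to discard any leftover values that do not fill a complete bin. Here is the complete implementation code: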

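#include <stdlib.h> // For random() (POSIX) and RAND_MAX

// Assumes 0 <= max <= RAND_MAX
// Also assumes random() covers exactly [0, RAND_MAX]. POSIX random() returns
// values in [0, 2^31 - 1], so this requires RAND_MAX == 2147483647 (as in glibc)
// Returns a random number in the closed interval [0, max]
long random_at_most(long max) {
  unsigned long
    // max <= RAND_MAX < ULONG_MAX, so adding 1 cannot overflow
    num_bins = (unsigned long) max + 1,
    num_rand = (unsigned long) RAND_MAX + 1,
    bin_size = num_rand / num_bins,
    defect   = num_rand % num_bins;

  long x;
  do {
    x = random();
  }
  // Reject the last `defect` values; written this way so nothing overflows
  while (num_rand - defect <= (unsigned long)x);

  // Truncating division is intentional: it maps x to its bin index
  return x / bin_size;
}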

Algorithm Principle Explanation

The core idea of this algorithm is to ensure uniform distribution through rejection sampling. The specific steps are:

Parameter Calculation:
- num_bins: Number of target values (max + 1)
- num_rand: Number of distinct values the generator can return (RAND_MAX + 1)
- bin_size: Number of generator values assigned to each target value (num_rand / num_bins, rounded down)
- defect: Number of leftover generator values that do not fill a complete bin (num_rand % num_bins)

Rejection Sampling Loop:
The loop accepts only values below num_rand - defect, which equals num_bins * bin_size, and draws again whenever the value falls among the last defect values. Those leftover values cannot be shared equally among the bins, so using them would favor some results.

Result Calculation:
An accepted value x is mapped to its bin by the integer division x / bin_size. Every bin contains exactly bin_size generator values, so each result in [0, max] appears with exactly the same probability.
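
The three steps can be verified exhaustively for a small generator. The following sketch assumes a hypothetical toy generator with num_rand equally likely outputs and applies the same accept/reject rule and division as random_at_most to every one of them; count_bin_hits is an illustrative name.

```c
/* Hypothetical check: feed every value a toy generator could return,
 * 0 .. num_rand - 1, through the accept/reject rule and the division used by
 * random_at_most, and count how many raw values land in each bin.
 * Requires 1 <= num_bins <= num_rand; counts must hold num_bins entries.
 * Returns the number of rejected raw values (equal to defect). */
unsigned long count_bin_hits(unsigned long num_rand, unsigned long num_bins,
                             unsigned long *counts) {
  unsigned long bin_size = num_rand / num_bins;
  unsigned long defect = num_rand % num_bins;
  unsigned long rejected = 0;

  for (unsigned long i = 0; i < num_bins; i++) {
    counts[i] = 0;
  }
  for (unsigned long x = 0; x < num_rand; x++) {
    if (num_rand - defect <= x) {
      rejected++;              /* leftover tail: the real loop would draw again */
    } else {
      counts[x / bin_size]++;  /* accepted: truncating division picks the bin */
    }
  }
  return rejected;
}
```

For num_rand = 11 and num_bins = 6, five raw values are rejected and each of the six bins receives exactly one. For num_rand = 32768 and num_bins = 6, only two raw values are rejected and each bin receives 5461.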

Extended Applications

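For more general [min, max] ranges, the following wrapper function can be used, provided that max - min does not exceed RAND_MAX:

// Assumes max - min <= RAND_MAX and that max - min does not overflow long
long random_in_range(long min, long max) {
  if (min > max) {
    // Reversed bounds: swap min and max instead of failing
    long temp = min;
    min = max;
    max = temp;
  }
  return min + random_at_most(max - min);
}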

Practical Application Examples

Dice Simulation: Generate random integers from 1 to 6:

long dice_roll = random_in_range(1, 6);

Lottery Number Generation: Generate a single number from 1 to 49. Selecting 5 non-repeating numbers requires additional deduplication logic on top of this call:

// Simplified example: generate single lottery number
long lottery_number = random_in_range(1, 49);
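
One way to obtain five non-repeating numbers is sketched below. It uses a partial Fisher-Yates shuffle, a swapped-in technique rather than the draw-and-discard deduplication mentioned above. The names bounded_rand, draw_lottery, POOL_SIZE, and DRAW_COUNT are illustrative, and bounded_rand is a stand-in for random_at_most built on rand() so that the example is self-contained.

```c
#include <stdlib.h>

#define POOL_SIZE 49   /* candidate numbers 1..49 */
#define DRAW_COUNT 5   /* how many distinct numbers to pick */

/* Stand-in for random_at_most, using rand() so that the generator's range
 * is exactly [0, RAND_MAX]. Assumes 0 <= max <= RAND_MAX. */
static long bounded_rand(long max) {
  unsigned long num_bins = (unsigned long) max + 1;
  unsigned long num_rand = (unsigned long) RAND_MAX + 1;
  unsigned long bin_size = num_rand / num_bins;
  unsigned long defect = num_rand % num_bins;
  unsigned long x;
  do {
    x = (unsigned long) rand();
  } while (num_rand - defect <= x);
  return (long) (x / bin_size);
}

/* Fill out[0 .. DRAW_COUNT - 1] with distinct numbers from 1..POOL_SIZE.
 * Each step swaps a uniformly chosen not-yet-picked entry into the next
 * output position, so no number can repeat. */
void draw_lottery(long out[DRAW_COUNT]) {
  long pool[POOL_SIZE];
  for (long i = 0; i < POOL_SIZE; i++) {
    pool[i] = i + 1;
  }
  for (long i = 0; i < DRAW_COUNT; i++) {
    long j = i + bounded_rand(POOL_SIZE - 1 - i);  /* uniform in [i, POOL_SIZE - 1] */
    long tmp = pool[i];
    pool[i] = pool[j];
    pool[j] = tmp;
    out[i] = pool[i];
  }
}
```

A caller would seed the generator once, for example with srand, declare long picks[DRAW_COUNT], and call draw_lottery(picks). Because every pick removes its number from the remaining pool, no retry loop is needed.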

Types of Random Number Generators

Random number generators fall into two main types:

Pseudorandom Number Generator (PRNG): Uses a deterministic algorithm to generate a sequence that appears random. The C standard library's rand() and the POSIX random() function belong to this category. Their output is reproducible: the same seed always yields the same sequence. This suits most simulation and game scenarios, but these generators are not suitable for cryptographic use.
True Random Number Generator (TRNG): Derives randomness from physical processes, such as atmospheric noise or radioactive decay. These generators produce truly random numbers but typically require special hardware support.
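
The reproducibility of a PRNG is easy to observe with the standard srand() and rand() functions. The helper below is a hypothetical sketch (same_seed_repeats is an illustrative name) that seeds the generator twice with the same value and compares the two sequences.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical demonstration helper: seed rand() with the same seed twice
 * and report whether the first `count` outputs match (at most 64 are
 * compared). Side effect: resets the global rand() state. */
bool same_seed_repeats(unsigned seed, int count) {
  enum { MAX_COUNT = 64 };
  int first[MAX_COUNT];
  if (count > MAX_COUNT) {
    count = MAX_COUNT;
  }
  srand(seed);
  for (int i = 0; i < count; i++) {
    first[i] = rand();
  }
  srand(seed);  /* same seed: the C standard requires the same sequence */
  for (int i = 0; i < count; i++) {
    if (rand() != first[i]) {
      return false;
    }
  }
  return true;
}
```

A fixed seed is useful for repeatable tests and debugging. Programs that want a different sequence on each run commonly seed once at startup from a varying value such as the current time.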

Performance Considerations and Optimization

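The rejection sampling method may call the random number generator more than once per result. Each draw is rejected with probability defect / num_rand, which is always less than 1/2 because max <= RAND_MAX, so the expected number of calls is below 2. For small ranges such as a die roll it is very close to 1. For performance-sensitive applications, consider the following optimization strategies:

Use a better random number generator where one is available (on some platforms random() has better statistical properties than rand())
Precompute bin_size and defect once when the same range is used repeatedly
Use simpler methods in scenarios where approximate uniform distribution is acceptable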

Conclusion

Generating random integers within a specified range looks simple but hides a subtle mathematical pitfall. By understanding the requirements of uniform distribution and the principle of rejection sampling, developers can avoid the modulo bias and write correct and efficient random number generation code. The implementation presented in this article is mathematically correct under the stated assumptions and remains short and readable.

Copyright Notice: All rights in this article are reserved by the operators of DevGex. Reasonable sharing and citation are welcome; any reproduction, excerpting, or re-publication without prior permission is prohibited.