Keywords: random numbers | probability distribution | Python | SciPy | NumPy
Abstract: This article explores methods for generating random numbers that follow custom discrete probability distributions in Python, using SciPy's rv_discrete, NumPy's random.choice, and the standard library's random.choices. It provides in-depth analysis of implementation principles, efficiency comparisons, and practical examples such as generating non-uniform birthday lists.
Introduction
In the field of random number generation, standard uniform distributions often fail to simulate real-world data, such as birthday distributions with specific probability patterns. This article introduces multiple methods in Python for generating random numbers with custom discrete distributions, aiding users in efficiently handling such tasks.
Method 1: Using scipy.stats.rv_discrete
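The rv_discrete class in the SciPy library lets users define a discrete random variable by passing its values and their probabilities through the values parameter. Its default sampler draws uniform random numbers and maps them through the inverse of the cumulative distribution function (CDF), a technique known as inverse transform sampling. The resulting object also exposes methods such as pmf, cdf, and mean, although its per-call overhead is typically higher than that of the lighter functions described in the following sections.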
from scipy.stats import rv_discrete
values = [1, 2, 3, 4, 5, 6]
probabilities = [0.1, 0.05, 0.05, 0.2, 0.4, 0.2]
custom_dist = rv_discrete(values=(values, probabilities))
random_numbers = custom_dist.rvs(size=10)
print(random_numbers)
This code example generates 10 random numbers from the custom distribution; over many draws, the observed frequencies approach the given probabilities.
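To make the inverse transform idea concrete, here is a minimal sketch written with NumPy. It is not SciPy's actual implementation; the function name sample_inverse_transform and the seed are illustrative assumptions.

```python
import numpy as np

def sample_inverse_transform(values, probabilities, size, rng=None):
    """Draw `size` samples by inverting the CDF of a discrete distribution.

    Hypothetical helper for illustration; not part of SciPy or NumPy.
    """
    rng = np.random.default_rng() if rng is None else rng
    values = np.asarray(values)
    cdf = np.cumsum(probabilities)
    cdf = cdf / cdf[-1]          # force the last entry to exactly 1.0
    u = rng.random(size)         # uniform draws in [0, 1)
    # For each u, find the first index whose CDF entry is strictly greater than u
    idx = np.searchsorted(cdf, u, side="right")
    return values[idx]

values = [1, 2, 3, 4, 5, 6]
probabilities = [0.1, 0.05, 0.05, 0.2, 0.4, 0.2]
samples = sample_inverse_transform(
    values, probabilities, size=10_000, rng=np.random.default_rng(0)
)
print(samples[:10])
```

Each uniform draw lands in exactly one interval of the CDF, and the width of that interval equals the probability of the corresponding value.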
Method 2: Using numpy.random.choice
NumPy's random.choice function offers a lightweight approach for sampling directly from a population, with probabilities specified via the p parameter. The entries of p must sum to 1; otherwise NumPy raises a ValueError. It is suitable for simple scenarios and supports batch generation through the size parameter.
import numpy as np
population = np.arange(1, 7)
weights = [0.1, 0.05, 0.05, 0.2, 0.4, 0.2]
random_sample = np.random.choice(population, p=weights, size=5)
print(random_sample)
By setting the size parameter, multiple samples can be generated in one call, reducing overhead from repeated invocations.
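The same sampling can also be done through NumPy's newer Generator interface, created with np.random.default_rng, which accepts a seed for reproducible runs. The sketch below draws a large batch and compares observed frequencies with the target probabilities; the seed and the sample size are arbitrary choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=42)  # seed chosen arbitrarily for reproducibility
population = np.arange(1, 7)
weights = [0.1, 0.05, 0.05, 0.2, 0.4, 0.2]

sample = rng.choice(population, size=100_000, p=weights)

# Count how often each value 1..6 occurs (index 0 is unused, so drop it)
counts = np.bincount(sample, minlength=7)[1:]
frequencies = counts / sample.size

for value, target, observed in zip(population, weights, frequencies):
    print(f"{value}: target={target:.2f} observed={observed:.3f}")
```

With a batch this large, the observed frequencies should sit close to the targets, which gives a quick sanity check on the weights.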
Method 3: Using random.choices in the Standard Library
Starting from Python 3.6, the standard random module includes the choices function for weighted random selection with replacement. The weights are relative, so they do not need to sum to 1. This method requires no additional libraries and is ideal for quick implementation of custom distribution generation.
from random import choices
population = [1, 2, 3, 4, 5, 6]
weights = [0.1, 0.05, 0.05, 0.2, 0.4, 0.2]
random_samples = choices(population, weights, k=1000)
print(random_samples[:10])
Using the k parameter for batch generation avoids the overhead of calling the function once per sample, which matters for large-scale data simulations.
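When the same distribution is sampled in many separate calls, random.choices also accepts a cum_weights argument, so the running totals can be computed once and reused. The following is a minimal sketch under that assumption; the seed, the loop count, and the batch size are illustrative.

```python
from collections import Counter
from itertools import accumulate
from random import Random

population = [1, 2, 3, 4, 5, 6]
weights = [0.1, 0.05, 0.05, 0.2, 0.4, 0.2]
cum_weights = list(accumulate(weights))  # running totals, computed once

rng = Random(0)  # dedicated generator with an arbitrary seed
samples = []
for _ in range(100):
    # Passing cum_weights avoids re-accumulating the weights on every call
    samples.extend(rng.choices(population, cum_weights=cum_weights, k=100))

tally = Counter(samples)
for value in population:
    print(value, tally[value] / len(samples))
```

Only one of weights or cum_weights may be passed in a single call; supplying both raises a TypeError.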
Efficiency Analysis and Best Practices
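When selecting a method, consider library dependencies, performance needs, and how much functionality is required beyond sampling. scipy.stats.rv_discrete is suited to more complex use cases in which the distribution object itself is useful, for example to evaluate the PMF, the CDF, or moments alongside sampling, while numpy.random.choice and random.choices are lighter options for plain sampling. Drawing all samples in one batched call, or precomputing the cumulative weights when sampling repeatedly from the same distribution, can further improve efficiency. The same tools also apply to continuous distributions once they are discretized, that is, approximated by probabilities on a finite grid of values.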
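As an illustration of discretizing a continuous distribution, the sketch below approximates a standard normal density on a finite grid and samples from the resulting discrete distribution. The target density, the interval [-4, 4], the grid size, and the seed are all assumptions chosen for the example.

```python
import numpy as np

# Hypothetical target: standard normal density restricted to [-4, 4]
grid = np.linspace(-4.0, 4.0, 801)   # grid spacing of 0.01
density = np.exp(-0.5 * grid**2)     # unnormalized; the constant factor cancels
pmf = density / density.sum()        # normalize so the probabilities sum to 1

rng = np.random.default_rng(1)       # arbitrary seed for reproducibility
samples = rng.choice(grid, size=50_000, p=pmf)

print("sample mean:", samples.mean())
print("sample std: ", samples.std())
```

A finer grid reduces the discretization error at the cost of a longer probability vector, and the truncation interval should cover nearly all of the target's probability mass.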
Application Example: Generating Birthday Lists
Assuming a non-uniform birthday distribution, random birthday lists can be generated by defining days and probabilities. The following example uses random.choices for implementation.
from random import choices
days = list(range(1, 366))
# These values sum to 0.565, not 1; random.choices treats them as relative weights
probabilities = [0.003] * 100 + [0.001] * 265
birthdays = choices(days, weights=probabilities, k=50)
print(birthdays)
This approach can be extended to load probability data from files, adapting to real-world application requirements.
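One way to load the weights from a file is sketched below with the standard csv module. The two-column "day,weight" layout and the load_weights helper are hypothetical, and an in-memory string stands in for a file on disk so the example is self-contained.

```python
import csv
import io
from random import Random

# Stand-in for a file on disk; the "day,weight" layout is a hypothetical format
csv_text = "day,weight\n1,3\n2,1\n3,1\n"

def load_weights(handle):
    """Read (day, weight) rows from an open CSV file-like object."""
    days, weights = [], []
    for row in csv.DictReader(handle):
        days.append(int(row["day"]))
        weights.append(float(row["weight"]))
    return days, weights

days, weights = load_weights(io.StringIO(csv_text))
birthdays = Random(0).choices(days, weights=weights, k=50)  # arbitrary seed
print(birthdays)
```

For a real file, the io.StringIO object would be replaced by a handle obtained from open(path, newline=""), where path points to the data file.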
Conclusion
Python offers various tools for generating random numbers with custom distributions, allowing users to choose appropriate methods based on specific contexts. The standard library's random.choices is convenient when no extra dependencies are wanted, while SciPy and NumPy provide stronger support for scientific computing. Discretizing a continuous distribution onto a finite grid of values extends the same methods to more complex distributions.