Anagram Detection Using Prime Number Mapping: Principles, Implementation and Performance Analysis

Keywords: Anagram Detection | Prime Number Mapping | Algorithm Optimization

Abstract: This paper provides an in-depth exploration of core anagram detection algorithms, focusing on the efficient solution based on prime number mapping. By mapping 26 English letters to unique prime numbers and calculating the prime product of strings, the algorithm achieves O(n) time complexity using the fundamental theorem of arithmetic. The article explains the algorithm principles in detail, provides complete Java implementation code, and compares performance characteristics of different methods including sorting, hash table, and character counting approaches. It also discusses considerations for Unicode character processing, big integer operations, and practical applications, offering comprehensive technical reference for developers.

Core Algorithm Principles

Anagram detection is a classic string processing problem that requires determining whether two strings contain the same number and types of characters. The prime number mapping algorithm provides an elegant and efficient solution. The core idea is to map each English letter to a unique prime number, for example: a→2, b→3, c→5, d→7, etc., ensuring 26 letters correspond to 26 distinct prime numbers.

Application of Fundamental Theorem of Arithmetic

According to the fundamental theorem of arithmetic, every integer greater than 1 can be uniquely factored into prime numbers (ignoring order). When we convert strings to products of corresponding prime numbers, if two strings are anagrams, their prime products must be equal; conversely, if the products are equal, the strings must be anagrams. This mathematical property guarantees the algorithm's correctness.

Java Implementation Example

The following complete Java implementation demonstrates how to apply the prime mapping algorithm for anagram detection:

import java.math.BigInteger;

public class PrimeAnagramChecker {
    // Define prime number mapping for 26 English letters
    private static final int[] PRIME_MAPPING = {
        2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41,
        43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101
    };
    
    public static boolean isAnagram(String str1, String str2) {
        // Convert to lowercase and remove whitespace
        str1 = str1.toLowerCase().replaceAll("\\s", "");
        str2 = str2.toLowerCase().replaceAll("\\s", "");
        
        // Different lengths immediately return false
        if (str1.length() != str2.length()) {
            return false;
        }
        
        // Use BigInteger for potentially large products
        BigInteger product1 = BigInteger.ONE;
        BigInteger product2 = BigInteger.ONE;
        
        // Calculate prime product for first string
        for (char c : str1.toCharArray()) {
            if (c < 'a' || c > 'z') {
                throw new IllegalArgumentException("Input contains non-English letter characters");
            }
            int primeIndex = c - 'a';
            product1 = product1.multiply(BigInteger.valueOf(PRIME_MAPPING[primeIndex]));
        }
        
        // Calculate prime product for second string
        for (char c : str2.toCharArray()) {
            if (c < 'a' || c > 'z') {
                throw new IllegalArgumentException("Input contains non-English letter characters");
            }
            int primeIndex = c - 'a';
            product2 = product2.multiply(BigInteger.valueOf(PRIME_MAPPING[primeIndex]));
        }
        
        // Compare if products are equal
        return product1.equals(product2);
    }
    
    public static void main(String[] args) {
        String word1 = "schoolmaster";
        String word2 = "theclassroom";
        String word3 = "theclafsroom";
        
        System.out.println(word1 + " and " + word2 + " are anagrams: " + isAnagram(word1, word2));
        System.out.println(word1 + " and " + word3 + " are anagrams: " + isAnagram(word1, word3));
    }
}

Algorithm Performance Analysis

The prime mapping algorithm has O(n) time complexity, where n is the string length. Each character requires only one multiplication and one array lookup operation, making it efficient. However, the algorithm has several important considerations:

Big Integer Handling: For longer strings, the prime product may exceed the range of Java primitive data types, necessitating the use of the BigInteger class. BigInteger multiplication can theoretically reach O(n²) complexity, but in practical applications with reasonably sized strings, performance remains acceptable.
Character Range Limitations: The basic implementation only supports 26 lowercase English letters. To support uppercase letters, digits, or other characters, the prime mapping table needs expansion.
Space Complexity: The algorithm requires O(m) space to store the prime mapping table, where m is the character set size. For extended Unicode character sets, this may lead to significant memory overhead.

Comparison with Other Algorithms

Referencing solutions from other answers, we can conduct a comprehensive algorithm comparison:

Sorting Method: Using Arrays.sort() to sort character arrays before comparison, with O(n log n) time complexity. Simple to implement but less efficient.
Character Counting Method: Using an integer array of length 26 to record character frequencies, with O(n) time complexity and high space efficiency, but not directly supporting extended character sets.
Hash Table Method: Using HashMap to record character frequencies, with O(n) time complexity, supporting arbitrary character sets, but with some hash operation overhead.
Prime Mapping Method: The algorithm discussed in this paper, with theoretical mathematical elegance, but requiring special attention when handling big integers and extended character sets.

Practical Application Considerations

In actual development, selecting an anagram detection algorithm requires considering the following factors:

Input Scale: For short strings, differences between algorithms are minimal; for long texts, O(n) complexity algorithms should be prioritized.
Character Set Range: If only English text processing is needed, the character counting method is optimal; if multilingual support is required, the hash table method is more suitable.
Performance Requirements: In performance-critical applications, actual benchmarking should be conducted, as theoretical complexity may not reflect actual runtime differences.
Unicode Handling: As mentioned in Answer 5, when processing Unicode characters containing surrogate pairs, special attention is needed. Simple char array processing may lead to incorrect results.

Extensions and Optimizations

The prime mapping algorithm can be optimized and extended in the following ways:

Cache Calculation Results: For frequently checked strings, their prime products can be cached to avoid repeated calculations.
Parallel Computation: For extremely long strings, the string can be segmented, with prime products calculated in parallel for each segment, then combined.
Modular Arithmetic Optimization: To avoid big integer overflow, modular arithmetic can be used during calculation, though this may introduce hash collisions.
Multi-Character Set Support: By dynamically generating prime mapping tables, character sets of arbitrary size can be supported.

Conclusion

The prime mapping algorithm provides a unique and efficient solution for anagram detection, particularly suitable for scenarios requiring algorithmic elegance. Although practical applications require consideration of big integer handling and character set limitations, its O(n) time complexity and solid mathematical foundation make it a technology worthy of in-depth study and application. Developers should balance algorithm simplicity, performance, and functional completeness based on specific requirements.

Copyright Notice: All rights in this article are reserved by the operators of DevGex. Reasonable sharing and citation are welcome; any reproduction, excerpting, or re-publication without prior permission is prohibited.