Simple Digit Recognition OCR with OpenCV-Python: Comprehensive Guide to KNearest and SVM Methods

Nov 23, 2025 · Programming

Keywords: OpenCV | Digit Recognition | KNearest | SVM | OCR | Computer Vision

Abstract: This article provides a detailed implementation of a simple digit recognition OCR system using OpenCV-Python. It analyzes the structure of letter_recognition.data file and explores the application of KNearest and SVM classifiers in character recognition. The complete code implementation covers data preprocessing, feature extraction, model training, and testing validation. A simplified pixel-based feature extraction method is specifically designed for beginners. Experimental results show 100% recognition accuracy under standardized font and size conditions, offering practical guidance for computer vision beginners.

Introduction

Optical Character Recognition (OCR) is a crucial application in computer vision, widely used in document digitization, license plate recognition, and automated data entry. OpenCV, as an open-source computer vision library, provides rich image processing and machine learning capabilities, with KNearest and SVM being two commonly used classification algorithms. This article details the construction of a simple digit recognition system using OpenCV-Python based on practical development experience.

Data File Structure and Feature Analysis

The letter_recognition.data file in OpenCV samples uses comma-separated format, with the first column as character labels (A-Z) and the subsequent 16 columns as feature values. These features originate from the paper "Letter Recognition Using Holland-Style Adaptive Classifiers", including geometric attributes and statistical characteristics of characters. For beginners, understanding these professional features can be challenging, hence we adopt more intuitive pixel values as alternative features.
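To make the layout concrete, the file can be parsed with NumPy alone by converting the letter label to a number while loading. The two rows below are illustrative stand-ins in the same comma-separated format, not an excerpt from the real file:

```python
import io
import numpy as np

# Two illustrative rows in the letter_recognition.data layout:
# a label (A-Z) followed by 16 integer feature values.
sample_text = io.StringIO(
    "T,2,8,3,5,1,8,13,0,6,6,10,8,0,8,0,8\n"
    "I,5,12,3,7,2,10,5,5,4,13,3,9,2,8,4,10\n"
)

# Map the letter label to a number (A=0 ... Z=25) during loading.
data = np.loadtxt(
    sample_text, dtype=np.float32, delimiter=",",
    converters={0: lambda ch: ord(ch) - ord("A")},
)

labels = data[:, 0].astype(np.int32)   # class labels: T -> 19, I -> 8
features = data[:, 1:]                 # 16-dimensional feature vectors
print(features.shape)                  # (2, 16)
```

The same loading pattern works on the real file; only the number of rows changes.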

Training Data Preparation and Annotation

Data preprocessing forms the foundation of an OCR system. Using a training image containing standard digits, we complete data preparation through the following steps:

import sys
import numpy as np
import cv2

# Image reading and preprocessing
im = cv2.imread('pitrain.png')
gray = cv2.cvtColor(im, cv2.COLOR_BGR2GRAY)
blur = cv2.GaussianBlur(gray, (5, 5), 0)
# Inverted binary threshold so digits become white foreground
thresh = cv2.adaptiveThreshold(blur, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                               cv2.THRESH_BINARY_INV, 11, 2)

# Contour detection and filtering (OpenCV 2.x and 4.x return two values as
# shown here; OpenCV 3.x returns an extra image value first)
contours, hierarchy = cv2.findContours(thresh, cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)

samples = np.empty((0,100))
responses = []
keys = [i for i in range(48,58)]  # ASCII codes for digits 0-9

for cnt in contours:
    if cv2.contourArea(cnt) > 50:
        [x,y,w,h] = cv2.boundingRect(cnt)
        
        if h > 28:  # Height filtering
            cv2.rectangle(im, (x,y), (x+w,y+h), (0,0,255), 2)
            roi = thresh[y:y+h, x:x+w]
            roismall = cv2.resize(roi, (10,10))
            cv2.imshow('norm', im)
            key = cv2.waitKey(0)

            if key == 27:  # ESC key to exit
                sys.exit()
            elif key in keys:
                responses.append(int(chr(key)))
                sample = roismall.reshape((1,100))
                samples = np.append(samples, sample, 0)

# Data saving
responses = np.array(responses, np.float32)
responses = responses.reshape((responses.size, 1))
print("training complete")

np.savetxt('generalsamples.data', samples)
np.savetxt('generalresponses.data', responses)

This code implements a semi-automated digit extraction and manual annotation workflow. Contour detection locates candidate digit regions, area and height thresholds filter out noise, and each accepted digit is normalized to 10×10 pixels to produce a 100-dimensional feature vector. The user annotates each highlighted digit by pressing the corresponding key, and the feature data and labels are finally saved to separate text files.
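The saved text files can be sanity-checked by writing and reloading them exactly as the recognition scripts do. The sketch below uses synthetic feature vectors in place of real annotated digits:

```python
import numpy as np

# Synthetic stand-ins: 5 flattened 10x10 "digit" images and their labels.
samples = np.random.rand(5, 100).astype(np.float32)
responses = np.array([3, 1, 4, 1, 5], np.float32).reshape((-1, 1))

np.savetxt('generalsamples.data', samples)
np.savetxt('generalresponses.data', responses)

# Reload the way the training scripts do and verify the shapes match.
loaded_samples = np.loadtxt('generalsamples.data', np.float32)
loaded_responses = np.loadtxt('generalresponses.data', np.float32)
loaded_responses = loaded_responses.reshape((loaded_responses.size, 1))

assert loaded_samples.shape == (5, 100)
assert loaded_responses.shape == (5, 1)
assert np.allclose(samples, loaded_samples, atol=1e-5)
```

A quick check like this catches the most common failure mode: a responses file that loads as a flat 1-D array and silently breaks the classifier's training call.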

KNearest Model Training and Implementation

The K-Nearest Neighbors (KNearest) algorithm is an instance-based machine learning method that classifies a sample by computing its distances to the training samples. OpenCV provides a complete KNearest implementation:

import cv2
import numpy as np

# Training phase
samples = np.loadtxt('generalsamples.data', np.float32)
responses = np.loadtxt('generalresponses.data', np.float32)
responses = responses.reshape((responses.size, 1))

# OpenCV 3/4 API; in OpenCV 2.x this was cv2.KNearest() and
# model.train(samples, responses)
model = cv2.ml.KNearest_create()
model.train(samples, cv2.ml.ROW_SAMPLE, responses)

# Testing phase
im = cv2.imread('pi.png')
out = np.zeros(im.shape, np.uint8)
gray = cv2.cvtColor(im, cv2.COLOR_BGR2GRAY)
thresh = cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                               cv2.THRESH_BINARY_INV, 11, 2)

contours, hierarchy = cv2.findContours(thresh, cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)

for cnt in contours:
    if cv2.contourArea(cnt) > 50:
        [x, y, w, h] = cv2.boundingRect(cnt)
        if h > 28:
            cv2.rectangle(im, (x, y), (x+w, y+h), (0, 255, 0), 2)
            roi = thresh[y:y+h, x:x+w]
            roismall = cv2.resize(roi, (10, 10))
            roismall = roismall.reshape((1, 100))
            roismall = np.float32(roismall)
            # In OpenCV 2.x this call was model.find_nearest(roismall, k=1)
            retval, results, neigh_resp, dists = model.findNearest(roismall, k=1)
            string = str(int(results[0][0]))
            cv2.putText(out, string, (x, y+h), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0))

cv2.imshow('im', im)
cv2.imshow('out', out)
cv2.waitKey(0)

During the testing phase, the program applies identical preprocessing to the input image, extracts each digit region, and converts it to a 100-dimensional feature vector before querying the trained KNearest model. The nearest-neighbour query returns the predicted label together with the neighbours' responses and distances, and the program draws the recognized digit onto the output image.
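Under the hood, k-nearest-neighbour prediction is just a distance comparison followed by a vote. The pure-NumPy sketch below mirrors that computation on synthetic 100-dimensional vectors; it is an illustration of the idea, not a substitute for OpenCV's implementation:

```python
import numpy as np

def nearest_neighbour(samples, responses, query, k=1):
    """Predict by majority vote among the k closest training vectors."""
    dists = np.sqrt(((samples - query) ** 2).sum(axis=1))  # Euclidean distances
    nearest = np.argsort(dists)[:k]                        # indices of k closest
    labels = responses[nearest].astype(np.int64)
    return int(np.bincount(labels).argmax())               # most common label

# Tiny synthetic "training set": three 100-dim vectors with labels 0, 1, 2.
rng = np.random.default_rng(0)
samples = rng.random((3, 100)).astype(np.float32)
responses = np.array([0, 1, 2], np.float32)

# A query sitting very close to sample 1 is classified as label 1.
query = samples[1] + 0.01
print(nearest_neighbour(samples, responses, query))  # 1
```

With k=1 this reduces to picking the single closest training vector, which is exactly what the recognition loop above asks the model to do.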

SVM Classifier Implementation Method

The Support Vector Machine (SVM) is another effective classification algorithm, particularly suitable for small-sample settings. The SVM implementation in OpenCV is as follows:

import cv2
import numpy as np

# Data loading (SVM classification in OpenCV expects integer class labels)
samples = np.loadtxt('generalsamples.data', np.float32)
responses = np.loadtxt('generalresponses.data', np.float32)
responses = responses.reshape((responses.size, 1)).astype(np.int32)

# SVM model configuration (OpenCV 3/4 API; OpenCV 2.x used cv2.SVM()
# with a params dict instead of setter methods)
model = cv2.ml.SVM_create()
model.setType(cv2.ml.SVM_C_SVC)
model.setKernel(cv2.ml.SVM_LINEAR)
model.setC(1)
model.train(samples, cv2.ml.ROW_SAMPLE, responses)

# Prediction function
def predict_digit(roismall):
    roismall = np.float32(roismall.reshape((1, 100)))
    _, result = model.predict(roismall)
    return int(result[0][0])

SVM achieves classification by finding an optimal separating hyperplane and generally offers good generalization. Compared with KNearest, SVM takes longer to train but predicts faster, making it suitable for applications with tight real-time requirements.
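For a linear kernel, a trained binary SVM's prediction reduces to the sign of w·x + b, which is why prediction is so fast. The sketch below illustrates that decision rule with made-up weights; the values of w and b are purely for illustration, not from a trained model:

```python
import numpy as np

def linear_svm_predict(w, b, x):
    """Binary linear-SVM decision: class 1 if w.x + b > 0, else class 0."""
    return int(np.dot(w, x) + b > 0)

# Toy 100-dimensional weight vector and bias (illustrative values only).
w = np.full(100, 0.02, np.float32)
b = np.float32(-1.0)

dark = np.full(100, 0.9, np.float32)   # mostly-ink feature vector
light = np.full(100, 0.1, np.float32)  # mostly-background feature vector

print(linear_svm_predict(w, b, dark))   # 1 (w.x + b = 0.8 > 0)
print(linear_svm_predict(w, b, light))  # 0 (w.x + b = -0.8 < 0)
```

Multi-class digit recognition composes several such binary decisions internally, but each one remains a single dot product, independent of training set size.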

Experimental Results and Analysis

Experiments on a standard test set show that a simple OCR system based on pixel features can achieve 100% recognition accuracy when fonts are uniform and sizes are standardized. The advantage of this method lies in its simple implementation and computational efficiency, making it particularly suitable for beginners to understand and practice.

However, this method also has limitations: it is not robust to font variations, rotation, noise, and other interference. In practical applications, more sophisticated feature extraction methods should be considered, such as HOG (Histogram of Oriented Gradients) or learned deep features, to improve the system's generalization capability.
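To illustrate the direction such features take, the sketch below computes a crude gradient-orientation histogram over a synthetic image. It captures the core HOG idea (orientation statistics weighted by gradient magnitude) but is deliberately simplified and is not OpenCV's HOGDescriptor:

```python
import numpy as np

def orientation_histogram(img, bins=8):
    """Crude HOG-style feature: histogram of gradient orientations over the
    whole image, weighted by gradient magnitude and L1-normalized."""
    gy, gx = np.gradient(img.astype(np.float64))
    mag = np.hypot(gx, gy)                        # gradient magnitude
    ang = np.mod(np.arctan2(gy, gx), np.pi)       # unsigned orientation [0, pi)
    hist, _ = np.histogram(ang, bins=bins, range=(0, np.pi), weights=mag)
    total = hist.sum()
    return hist / total if total > 0 else hist    # normalize for scale invariance

# Synthetic 10x10 image with a vertical edge: gradients point horizontally,
# so all the histogram mass falls in the bin around orientation 0.
img = np.zeros((10, 10))
img[:, 5:] = 1.0
hist = orientation_histogram(img)
print(hist.argmax())  # 0
```

Unlike raw pixels, a feature like this responds to edge structure rather than absolute intensity, which is what buys robustness to lighting and small position shifts.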

Conclusion and Future Directions

This article has detailed the complete implementation of a simple digit recognition OCR system based on OpenCV-Python. By combining pixel-value features with KNearest/SVM classifiers, a fully functional recognition system is constructed. This method provides an excellent introductory exercise for computer vision beginners while laying a foundation for more complex OCR applications.

Future improvement directions include: introducing more robust feature descriptors, integrating deep learning models, and adding data augmentation techniques to enhance model generalization capability. These improvements will enable OCR systems to adapt to more complex real-world application scenarios.

Copyright Notice: All rights in this article are reserved by the operators of DevGex. Reasonable sharing and citation are welcome; any reproduction, excerpting, or re-publication without prior permission is prohibited.