The Underlying Mechanism of Comparing Two Numbers in Assembly Language: An In-Depth Analysis from CMP Instruction to Machine Code

Keywords: Assembly Language | x86 Architecture | CMP Instruction | Machine Code | Binary Comparison

Abstract: This article examines how two numbers are compared in assembly language, using the x86 architecture as an example to detail the syntax, working principles, and machine code representation of the CMP instruction. It first introduces the basic method of combining the CMP instruction with conditional jump instructions (e.g., JE, JG) and the unconditional JMP to implement number comparison. Then, it explores the underlying implementation, explaining how comparison is achieved through subtraction and how the flags (e.g., the zero, sign, overflow, and carry flags) determine the result. Further, the article analyzes the binary representation of machine code, showing how instructions are encoded into sequences of 0s and 1s, and briefly touches on lower-level implementations from machine code to circuit design. By integrating insights from multiple answers, the article provides a perspective that runs from high-level assembly syntax to low-level binary representation, helping readers understand the complete process of number comparison in computer systems.

Basic Syntax for Comparing Two Numbers in Assembly Language

In x86 assembly language, comparing two numbers typically uses the CMP instruction, which compares two operands and sets the processor's flags based on the result. For example, in TASM (Turbo Assembler), code can be written as follows to compare two 8-bit numbers stored in registers BL and BH:

cmp BL, BH
je EQUAL       ; Jump to EQUAL if BL equals BH
jg GREATER     ; Jump to GREATER if BL is greater than BH
jmp LESS       ; Otherwise BL is less than BH: jump to LESS unconditionally

Here, the CMP instruction does not store a result; instead, it reflects the comparison outcome by setting flags such as the zero flag (ZF), sign flag (SF), overflow flag (OF), and carry flag (CF). The conditional jump instructions je and jg then direct program flow based on these flags, while jmp is an unconditional jump that is reached only when neither condition held. Note that jg, jl, jge, and jle treat the operands as signed numbers, whereas ja, jb, jae, and jbe (above, below, above or equal, below or equal) treat them as unsigned. Choosing the right family for the data being compared is what gives the comparison logic its flexibility.
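
The difference between the signed jumps (jg, jl) and the unsigned jumps (ja, jb) can be illustrated with a small Python model. This is only a sketch: the function names as_signed8 and branch_taken are hypothetical, and the model works on the numeric interpretation of the operands rather than on real processor flags.

```python
def as_signed8(value):
    """Interpret an 8-bit pattern (0..255) as a two's-complement signed number."""
    value &= 0xFF
    return value - 256 if value & 0x80 else value


def branch_taken(bl, bh, signed=True):
    """Return the label that a cmp / je / jg-or-ja / jmp sequence would reach."""
    if signed:
        left, right = as_signed8(bl), as_signed8(bh)   # view used by jg / jl
    else:
        left, right = bl & 0xFF, bh & 0xFF             # view used by ja / jb
    if left == right:
        return "EQUAL"
    if left > right:
        return "GREATER"
    return "LESS"


# The same bit pattern 0xFF is -1 as a signed byte but 255 as an unsigned byte.
print(branch_taken(0xFF, 0x01, signed=True))    # LESS
print(branch_taken(0xFF, 0x01, signed=False))   # GREATER
```

The same pair of bytes therefore lands on different labels depending on which jump family follows the CMP.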

Underlying Implementation: From Subtraction to Flags

At the machine level, the CMP instruction implements comparison through a subtraction operation. Specifically, CMP subtracts the second operand from the first but discards the difference, only updating the flags. For example, CMP BL, BH computes BL - BH and sets the flags accordingly: if the result is zero, the zero flag ZF is set; if the most significant bit of the result is 1 (negative when read as a signed number), the sign flag SF is set; if signed overflow occurs, the overflow flag OF is set; and if a borrow is needed (the first operand is smaller when both are read as unsigned), the carry flag CF is set.
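
As a rough illustration of the subtraction described above, the following Python sketch models CMP for two 8-bit operands. The function name cmp8 and the dictionary layout are assumptions made for this example; a real processor also updates the parity and auxiliary carry flags, which are omitted here.

```python
def cmp8(a, b):
    """Model CMP on two 8-bit operands: compute a - b and keep only the flags."""
    a &= 0xFF
    b &= 0xFF
    result = (a - b) & 0xFF                  # CMP discards this difference
    sign_a, sign_b, sign_r = a >> 7, b >> 7, result >> 7
    return {
        "ZF": int(result == 0),              # set when the difference is zero
        "SF": sign_r,                        # copy of the result's top bit
        "CF": int(a < b),                    # set when an unsigned borrow occurs
        # Signed overflow: the operands have different signs and the
        # result's sign differs from the first operand's sign.
        "OF": int(sign_a != sign_b and sign_r != sign_a),
    }


print(cmp8(5, 5))        # equal operands: only ZF is set
print(cmp8(3, 5))        # 3 - 5 wraps to 0xFE: SF and CF are set
print(cmp8(0x80, 0x01))  # -128 - 1 overflows the signed range: OF is set
```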

This design allows subsequent conditional jump instructions to make decisions based on the flags. For instance, jg (jump if greater) is taken only when ZF equals 0 and SF equals OF. The following example shows how comparison is achieved through subtraction and flag checking:

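; Compare r1 and r2 (held in EAX and EBX here, purely for illustration)
    CMP EAX, EBX
    JL lessthan           ; Jump to lessthan if r1 < r2 (signed)
greater_or_equal:
    ; Handle case where r1 >= r2
    JMP l1
lessthan:
    ; Handle case where r1 < r2
l1:

In this example, the JL (jump if less) instruction is taken when the sign flag SF differs from the overflow flag OF. When no overflow occurs, this reduces to checking whether SF is set, i.e., whether the subtraction result is negative. When overflow occurs, the sign of the truncated result is inverted, and comparing SF with OF corrects for that. This is how the processor decides whether r1 is less than r2 as signed values.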
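
On x86, the signed "less than" jump is spelled JL and tests SF ≠ OF rather than SF alone. The Python sketch below shows why the overflow flag matters. The helper names flags_after_cmp8 and conditions are hypothetical, and only a handful of jump conditions are modeled.

```python
def flags_after_cmp8(a, b):
    """Return (ZF, SF, CF, OF) for an 8-bit CMP a, b (a minus b)."""
    a &= 0xFF
    b &= 0xFF
    r = (a - b) & 0xFF
    zf = int(r == 0)
    sf = r >> 7
    cf = int(a < b)
    of = int((a >> 7) != (b >> 7) and (r >> 7) != (a >> 7))
    return zf, sf, cf, of


def conditions(a, b):
    """Evaluate a few x86 jump conditions from the flags left by CMP a, b."""
    zf, sf, cf, of = flags_after_cmp8(a, b)
    return {
        "je": zf == 1,                  # equal
        "jl": sf != of,                 # signed less
        "jg": zf == 0 and sf == of,     # signed greater
        "jb": cf == 1,                  # unsigned below
        "ja": cf == 0 and zf == 0,      # unsigned above
    }


# 0x80 is -128 signed. -128 - 1 wraps to +127, so SF is 0 although the
# first operand is smaller; OF is 1, and SF != OF still reports "less".
print(conditions(0x80, 0x01))
```

For this pair of operands, jl and ja are both true: the first operand is smaller as a signed value but larger as an unsigned one.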

Machine Code Representation: From Assembly to Binary

Assembly instructions are ultimately translated into machine code, i.e., binary sequences of 0s and 1s. In the x86 architecture, the machine code encoding of the CMP instruction depends on operand types and addressing modes. For example, CMP BL, BH (comparing two 8-bit registers) can be encoded as 38 FB in hexadecimal. In binary, this is 00111000 11111011, where the first byte 00111000 is the opcode (CMP r/m8, r8) and the second byte 11111011 is the ModRM byte: 11 selects register-direct mode, 111 encodes BH, and 011 encodes BL. An assembler may instead emit the equivalent form 3A DF, which uses the opcode for CMP r8, r/m8 and swaps the roles of the two register fields.
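
The register-to-register encoding can be reproduced with a few lines of Python. This sketch assumes the standard x86 8-bit register numbers (AL=0, CL=1, DL=2, BL=3, AH=4, CH=5, DH=6, BH=7) and the two opcodes 38 (CMP r/m8, r8) and 3A (CMP r8, r/m8). The function name encode_cmp_r8_r8 is hypothetical, and which of the two forms a given assembler emits is not specified here.

```python
REG8 = {"al": 0, "cl": 1, "dl": 2, "bl": 3, "ah": 4, "ch": 5, "dh": 6, "bh": 7}


def encode_cmp_r8_r8(first, second, opcode=0x38):
    """Encode `cmp first, second` for two 8-bit registers.

    opcode 0x38 (CMP r/m8, r8): first goes in the r/m field, second in reg.
    opcode 0x3A (CMP r8, r/m8): first goes in the reg field, second in r/m.
    """
    if opcode == 0x38:
        reg, rm = REG8[second], REG8[first]
    elif opcode == 0x3A:
        reg, rm = REG8[first], REG8[second]
    else:
        raise ValueError("unsupported opcode for this sketch")
    modrm = (0b11 << 6) | (reg << 3) | rm    # mod=11 means register-direct
    return bytes([opcode, modrm])


for op in (0x38, 0x3A):
    encoded = encode_cmp_r8_r8("bl", "bh", opcode=op)
    print(" ".join(f"{byte:02X}" for byte in encoded),
          "=",
          " ".join(f"{byte:08b}" for byte in encoded))
```

With these assumptions, cmp bl, bh comes out as 38 FB under the first opcode and 3A DF under the second. The byte pair 38 DF would instead describe cmp bh, bl.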

Parsing machine code involves details of the Instruction Set Architecture (ISA), including opcode fields, register fields, and immediate fields. For instance, in x86, the CMP instruction has multiple encoding forms. In 32-bit code, CMP EAX, 23 (comparing register EAX with the constant 23) can be encoded as 83 F8 17, which uses a sign-extended 8-bit immediate, or as 3D 17 00 00 00, which uses a full 32-bit immediate in little-endian order. These binary sequences are ultimately decoded and executed by the processor, driving the underlying circuit operations.
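
To make the immediate forms concrete, the sketch below encodes CMP EAX, imm for 32-bit code using two documented encodings: 83 /7 ib (sign-extended 8-bit immediate) and 3D id (32-bit immediate). The function name encode_cmp_eax_imm and the rule of preferring the shorter form are assumptions for illustration; a real assembler may choose differently.

```python
import struct


def encode_cmp_eax_imm(value):
    """Encode `cmp eax, value` for 32-bit code, preferring the shorter form."""
    if -128 <= value <= 127:
        # 83 /7 ib: ModRM 11 111 000 = 0xF8 selects the /7 (CMP) extension
        # with EAX as the r/m operand; the immediate is one signed byte.
        return bytes([0x83, 0xF8]) + struct.pack("<b", value)
    # 3D id: short form dedicated to EAX, with a little-endian 32-bit immediate.
    return bytes([0x3D]) + struct.pack("<I", value & 0xFFFFFFFF)


for constant in (23, 1000):
    encoded = encode_cmp_eax_imm(constant)
    print(constant, "->", " ".join(f"{byte:02X}" for byte in encoded))
```

Under these assumptions the constant 23 yields 83 F8 17, while 1000 does not fit in a signed byte and yields 3D E8 03 00 00.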

Lower-Level Perspective: From Bit-Level to Circuit Design

Below the machine code level, comparison comes down to bit-level logic. Numbers are encoded as sequences of 0s and 1s, and comparison is implemented through bit-level operations such as subtraction. For example, at the hardware level, subtractor circuits use logic gates (e.g., AND, OR, NOT, and XOR gates) to compute the difference between two binary numbers and to generate the flags.
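
One common way to build such a subtractor is to reuse an adder and compute a - b as a + NOT(b) + 1. The Python sketch below follows that approach one bit at a time using only gate-like operations. It is a simplified model under that assumption, not a description of any particular processor, and the names full_adder and subtract8 are hypothetical.

```python
def full_adder(a, b, carry_in):
    """One-bit full adder built from XOR, AND, and OR operations."""
    partial = a ^ b
    total = partial ^ carry_in
    carry_out = (a & b) | (partial & carry_in)
    return total, carry_out


def subtract8(a, b):
    """Compute a - b over 8 bits as a + NOT(b) + 1 and derive the flags."""
    carry = 1                  # the "+ 1" of two's-complement negation
    carry_into_msb = 0
    result = 0
    for i in range(8):         # ripple from least to most significant bit
        if i == 7:
            carry_into_msb = carry
        bit_a = (a >> i) & 1
        bit_b = ((b >> i) & 1) ^ 1          # NOT gate on each bit of b
        bit_sum, carry = full_adder(bit_a, bit_b, carry)
        result |= bit_sum << i
    flags = {
        "ZF": int(result == 0),
        "SF": result >> 7,
        "CF": carry ^ 1,                    # borrow is the inverted carry-out
        "OF": carry_into_msb ^ carry,       # carry into vs. out of the top bit
    }
    return result, flags


print(subtract8(3, 5))
```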

The transition from machine code to circuit design involves microarchitecture implementation, where the Arithmetic Logic Unit (ALU) in the processor performs comparison operations. An ALU can carry out the comparison with its adder/subtractor and carry chain, or with a dedicated comparator circuit that handles binary numbers bit by bit. For instance, for two 8-bit unsigned numbers, a comparator examines the bits from most significant to least significant, and the first position where they differ determines the output signals.
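
The most-significant-bit-first comparison can be sketched in Python as follows. This models an unsigned magnitude comparator only; the function name compare_unsigned8 is hypothetical, and real hardware evaluates all bit positions in parallel rather than in a loop.

```python
def compare_unsigned8(a, b):
    """Compare two 8-bit unsigned values bit by bit, most significant bit first."""
    for i in range(7, -1, -1):
        bit_a = (a >> i) & 1
        bit_b = (b >> i) & 1
        if bit_a & (bit_b ^ 1):      # a has a 1 where b has a 0
            return "greater"
        if (bit_a ^ 1) & bit_b:      # a has a 0 where b has a 1
            return "less"
    return "equal"                   # every bit position matched


print(compare_unsigned8(0b10010000, 0b10001111))   # decided at bit 4
```

The first differing bit decides the result, so the lower bits of 0b10001111 never influence the outcome in this example.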

At an even lower level, these circuits are implemented by transistors, whose switching states correspond to 0s and 1s. Thus, the entire process of comparing two numbers, from high-level assembly instructions to low-level transistor operations, reflects the hierarchical abstraction of computer systems. Although assembly instructions typically operate on bytes or larger units rather than on individual bits, the bit-level logic is realized through machine code and hardware design.

Summary and Extensions

This article systematically explores the mechanism of comparing two numbers in assembly language, from syntactic aspects to underlying implementations. Key points include: using the CMP instruction and conditional jumps for number comparison, the underlying principle of updating flags via subtraction, binary encoding of machine code, and bit-level and circuit-level implementations. These concepts are based on the x86 architecture but can be generalized to other instruction sets.

In practical applications, assembly syntax may vary by assembler, but the fundamental concepts remain similar. For example, different assemblers might support different comment styles or operand formats, but the core functionality of the CMP instruction is consistent. By understanding these low-level details, developers can optimize code performance and gain deeper insights into how computer systems work.

Copyright Notice: All rights in this article are reserved by the operators of DevGex. Reasonable sharing and citation are welcome; any reproduction, excerpting, or re-publication without prior permission is prohibited.
