Keywords: AndroidManifest.xml | Binary XML | APK Parsing | Java Parsing | Apktool
Abstract: This article analyzes the binary XML format used for AndroidManifest.xml files inside Android APK packages. It examines the encoding mechanisms and data structures, including the header, string tables, tag tree, and attribute storage. It then presents a Java implementation for parsing binary manifests and compares Apktool-based approaches with custom parsing. The article is aimed at developers working outside Android environments and supports security analysis, reverse engineering, and automated testing scenarios that require manifest extraction and interpretation.
Understanding Binary AndroidManifest.xml Format
The AndroidManifest.xml file within Android Application Packages (APK) employs a specialized binary XML format optimized for reduced file size and efficient parsing. Unlike traditional text-based XML, this binary format organizes strings, tags, and attributes into compact binary structures comprising four main sections: header information, string index table, string table, and XML tag tree.
Binary Format Structure Analysis
Binary AndroidManifest.xml files begin with a 36-byte (0x24) header consisting of nine 32-bit little-endian words. Counting words from zero, two of them matter most for parsing: word 3 (byte offset 12) appears to give the offset of the end of the string table, and word 4 (byte offset 16) gives the number of strings in the string table. The string index table follows the header immediately and stores one 32-bit offset per string, each relative to the start of the string table. The string table itself holds UTF-16 little-endian strings, each preceded by a 16-bit length counted in characters.
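The header and string table layout described above can be sketched as follows. This is a minimal sketch, assuming exactly that layout (36-byte header, 32-bit index entries, UTF-16LE strings with a 16-bit length prefix). The names `ManifestStrings`, `readIntLE`, `readShortLE`, and `stringAt` are illustrative and do not come from any library, and string tables stored in other encodings are not handled.

```java
import java.nio.charset.StandardCharsets;

/** Illustrative helper (hypothetical name) for the string table layout described above. */
final class ManifestStrings {
    static final int HEADER_SIZE = 0x24; // nine 32-bit words

    static int readIntLE(byte[] data, int offset) {
        return (data[offset] & 0xFF)
                | (data[offset + 1] & 0xFF) << 8
                | (data[offset + 2] & 0xFF) << 16
                | (data[offset + 3] & 0xFF) << 24;
    }

    static int readShortLE(byte[] data, int offset) {
        return (data[offset] & 0xFF) | (data[offset + 1] & 0xFF) << 8;
    }

    /** Returns the string with the given index, or null for a negative "no string" index. */
    static String stringAt(byte[] xml, int index) {
        if (index < 0) {
            return null; // 0xFFFFFFFF reads as -1 and means "no string"
        }
        int stringCount = readIntLE(xml, 16);          // header word 4
        int indexTableOffset = HEADER_SIZE;             // index table follows the header
        int stringTableOffset = indexTableOffset + stringCount * 4;
        // Each index entry is an offset relative to the start of the string table.
        int entryOffset = stringTableOffset + readIntLE(xml, indexTableOffset + index * 4);
        int charCount = readShortLE(xml, entryOffset);  // 16-bit length, in UTF-16 code units
        return new String(xml, entryOffset + 2, charCount * 2, StandardCharsets.UTF_16LE);
    }
}
```

The sketch performs no bounds checking, so it should only be applied to data whose header has already been validated.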
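The XML tag tree follows the string table and uses marker words to identify document structure. Every tag begins with six 32-bit words: the tag type identifier (0x00100102 for a start tag, 0x00100103 for an end tag), a flags word, the line number in the original source file, a fourth word that is typically 0xFFFFFFFF, the namespace string index (0xFFFFFFFF when there is no namespace), and the element name string index. A tag type of 0x00100101 marks the end of the document. Start tags carry three additional words, of which word 7 (counting from zero) holds the number of attributes that follow.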
Each attribute is stored as a group of five words (20 bytes): the attribute namespace string index, the attribute name string index, the attribute value string index (0xFFFFFFFF when the value is given as a resource ID instead), a flags word, and the resource ID (or a duplicate of the attribute value string index). This structure allows repeated strings to be stored once and referenced by index, and it supports Android's resource system through resource IDs.
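The start tag and attribute layout described above can be sketched as follows. This is a minimal sketch, assuming a start tag of nine words followed by five words per attribute. The names `StartTagReader`, `Attribute`, and `readAttributes` are illustrative, and the returned indices still have to be resolved against the string table.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative reader (hypothetical name) for the attribute groups of one start tag. */
final class StartTagReader {
    static final int START_TAG = 0x00100102;

    /** One decoded attribute; the index fields refer to entries in the string table. */
    static final class Attribute {
        final int namespaceIndex;
        final int nameIndex;
        final int valueIndex;
        final int flags;
        final int resourceId;

        Attribute(int namespaceIndex, int nameIndex, int valueIndex, int flags, int resourceId) {
            this.namespaceIndex = namespaceIndex;
            this.nameIndex = nameIndex;
            this.valueIndex = valueIndex;
            this.flags = flags;
            this.resourceId = resourceId;
        }
    }

    static int readIntLE(byte[] data, int offset) {
        return (data[offset] & 0xFF)
                | (data[offset + 1] & 0xFF) << 8
                | (data[offset + 2] & 0xFF) << 16
                | (data[offset + 3] & 0xFF) << 24;
    }

    /** Reads all attributes of the start tag that begins at tagOffset. */
    static List<Attribute> readAttributes(byte[] xml, int tagOffset) {
        if (readIntLE(xml, tagOffset) != START_TAG) {
            throw new IllegalArgumentException("No start tag at offset " + tagOffset);
        }
        int attributeCount = readIntLE(xml, tagOffset + 7 * 4); // word 7 of the start tag
        int offset = tagOffset + 9 * 4;                         // attributes follow the 9 tag words
        List<Attribute> attributes = new ArrayList<>();
        for (int i = 0; i < attributeCount; i++) {
            attributes.add(new Attribute(
                    readIntLE(xml, offset),        // namespace string index
                    readIntLE(xml, offset + 4),    // name string index
                    readIntLE(xml, offset + 8),    // value string index, or -1
                    readIntLE(xml, offset + 12),   // flags
                    readIntLE(xml, offset + 16))); // resource ID or duplicate value index
            offset += 5 * 4;
        }
        return attributes;
    }
}
```

A value index of -1 (0xFFFFFFFF) signals that the value should be taken from the resource ID word rather than from the string table.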
Programmatic Parsing Implementation
Parsing binary AndroidManifest.xml in a Java environment requires handling both the APK container, which is a ZIP archive, and the binary XML structure. First, extract the manifest file from the APK package:
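import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

public byte[] extractManifestFromApk(String apkPath) throws IOException {
    try (JarFile jarFile = new JarFile(apkPath)) {
        JarEntry manifestEntry = jarFile.getJarEntry("AndroidManifest.xml");
        if (manifestEntry == null) {
            throw new FileNotFoundException("AndroidManifest.xml not found in " + apkPath);
        }
        try (InputStream is = jarFile.getInputStream(manifestEntry)) {
            // available() is only an estimate, so read the whole stream instead (Java 9+)
            return is.readAllBytes();
        }
    }
}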
The core of binary XML parsing involves correctly reading little-endian data and traversing the tag structure:
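public void parseBinaryXml(byte[] xmlData) {
    if (xmlData == null || xmlData.length < 0x24) {
        throw new IllegalArgumentException("Data is too short to contain the 36-byte header");
    }
    // Header words are counted from zero: word 4 (byte offset 16) is the string count
    int stringCount = readLittleEndianInt(xmlData, 16);
    int stringIndexTableOffset = 0x24; // the index table starts right after the header
    int stringTableOffset = stringIndexTableOffset + stringCount * 4;
    // Word 3 (byte offset 12) points at the end of the string table, where the tag tree begins
    int currentOffset = readLittleEndianInt(xmlData, 12);
    while (currentOffset >= 0 && currentOffset + 4 <= xmlData.length) {
        int tagType = readLittleEndianInt(xmlData, currentOffset);
        if (tagType == 0x00100102) { // Start tag
            if (currentOffset + 36 > xmlData.length) {
                break; // truncated start tag
            }
            int attributeCount = readLittleEndianInt(xmlData, currentOffset + 28);
            if (attributeCount < 0 || attributeCount > (xmlData.length - currentOffset - 36) / 20) {
                break; // corrupted attribute count
            }
            // parseStartTag (not shown) resolves the element name and attributes
            parseStartTag(xmlData, currentOffset, stringIndexTableOffset, stringTableOffset);
            currentOffset += 36 + attributeCount * 20; // 9 words plus 5 words per attribute
        } else if (tagType == 0x00100103) { // End tag
            currentOffset += 24; // Skip 6 words
        } else if (tagType == 0x00100101) { // Document end
            break;
        } else {
            currentOffset += 4; // Not a tag marker: skip one word and keep scanning
        }
    }
}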
private int readLittleEndianInt(byte[] data, int offset) {
    return (data[offset + 3] & 0xFF) << 24 |
           (data[offset + 2] & 0xFF) << 16 |
           (data[offset + 1] & 0xFF) << 8 |
           (data[offset] & 0xFF);
}
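As an alternative to manual bit shifting, the standard `java.nio.ByteBuffer` class can perform the same little-endian read. The following is a minimal sketch; the class name `LittleEndianReader` is illustrative.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Illustrative alternative (hypothetical name) to the manual shift-and-mask reader. */
final class LittleEndianReader {
    /** Reads a 32-bit little-endian integer starting at the given byte offset. */
    static int readInt(byte[] data, int offset) {
        // wrap() does not copy the array; getInt(int) is an absolute, bounds-checked read
        return ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN).getInt(offset);
    }
}
```

The absolute `getInt(int)` call throws `IndexOutOfBoundsException` when fewer than four bytes remain at the offset. A parser that reads many values would probably wrap the array once and reuse the buffer rather than wrapping on every call.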
Tool-Based Parsing Approaches
Beyond custom parsing, established tools such as Apktool provide robust decoding. Apktool is primarily a command-line tool, and because it is written in Java it can also be driven from Java programs. It converts binary XML back into a readable text format:
# Decode the APK with the Apktool command-line tool
apktool d application.apk
# The decoded AndroidManifest.xml is written to the output directory (application/ by default)
Because Apktool runs on the JVM, it can be invoked from within programs, which suits automation scenarios. Compared with custom parsing, tool-based approaches offer greater stability and more complete support for Android features, at the cost of an extra dependency and some performance overhead.
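One simple way to drive Apktool from a program is to start it as an external process. The following is a minimal sketch, assuming an `apktool` executable is available on the PATH. The names `ApktoolRunner`, `buildCommand`, and `decodeManifest` are illustrative and are not part of Apktool itself.

```java
import java.io.File;
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

/** Illustrative wrapper (hypothetical name) that runs the apktool command-line tool. */
final class ApktoolRunner {
    /** Builds the command line: "-o" selects the output directory, "-f" overwrites it. */
    static List<String> buildCommand(String apkPath, String outputDir) {
        return Arrays.asList("apktool", "d", apkPath, "-o", outputDir, "-f");
    }

    /** Decodes the APK and returns the location of the decoded manifest. */
    static File decodeManifest(String apkPath, String outputDir)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(apkPath, outputDir))
                .inheritIO() // forward apktool's console output to this process
                .start();
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("apktool exited with code " + exitCode);
        }
        return new File(outputDir, "AndroidManifest.xml");
    }
}
```

Running the tool as a separate process keeps the calling program independent of Apktool's internal classes, though each invocation pays the cost of starting a new JVM.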
Application Scenarios and Considerations
Binary AndroidManifest.xml parsing finds important applications across multiple domains: examining permission declarations in security analysis, verifying manifest configurations in automated testing, and understanding application structure in reverse engineering. Key considerations during parsing include endianness handling, string encoding conversion, and resource ID interpretation.
Practical implementations should include error handling for invalid offsets, corrupted data, and string decoding failures. In production environments, consider verifying the APK signature before parsing so that a tampered manifest is not trusted, and cache parsed results when the same APK is parsed repeatedly.
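Invalid offset detection can start with a check of the header fields before any parsing takes place. The following is a minimal sketch, assuming the header layout described earlier in this article. The names `ManifestValidator` and `validateHeader` are illustrative, and the checks shown are a starting point rather than a complete validation.

```java
/** Illustrative pre-parse check (hypothetical name) for the header fields. */
final class ManifestValidator {
    static final int HEADER_SIZE = 0x24; // nine 32-bit words

    static int readIntLE(byte[] data, int offset) {
        return (data[offset] & 0xFF)
                | (data[offset + 1] & 0xFF) << 8
                | (data[offset + 2] & 0xFF) << 16
                | (data[offset + 3] & 0xFF) << 24;
    }

    /** Throws IllegalArgumentException if the header fields point outside the buffer. */
    static void validateHeader(byte[] xml) {
        if (xml == null || xml.length < HEADER_SIZE) {
            throw new IllegalArgumentException("Data is shorter than the 36-byte header");
        }
        int stringTableEnd = readIntLE(xml, 12); // header word 3
        int stringCount = readIntLE(xml, 16);    // header word 4
        // Long arithmetic prevents a huge count from overflowing int and passing the check
        long indexTableEnd = HEADER_SIZE + 4L * stringCount;
        if (stringCount < 0 || indexTableEnd > xml.length) {
            throw new IllegalArgumentException("String count out of range: " + stringCount);
        }
        if (stringTableEnd < 0 || stringTableEnd > xml.length) {
            throw new IllegalArgumentException("String table end out of range: " + stringTableEnd);
        }
    }
}
```

Failing fast with a descriptive exception lets the caller distinguish a corrupted manifest from a bug in the parser itself.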