Keywords: PHP | strip_tags | HTML tag processing | string manipulation | web development
Abstract: This article provides an in-depth exploration of the strip_tags function in PHP, detailing its operational principles and application scenarios. Through practical case studies, it demonstrates how to remove HTML tags from database strings and extract text of specified lengths. The analysis covers parameter configuration, security considerations, and enhanced solutions for complex scenarios like processing Word-pasted content, aiding developers in effectively handling user-input rich text.
Problem Background and Requirement Analysis
In web development, it is common to pull HTML-laden text from a database and show a plain-text summary of it. In the case discussed here, the first 110 characters of the business_description field need to be displayed, but the field contains HTML entered by the client. Truncating the raw value can cut a tag in half, leaving broken markup on the page and degrading the user experience.
Core Solution: The strip_tags Function
PHP's built-in strip_tags function fits this requirement well. It removes HTML and PHP tags, HTML comments, and NUL bytes from a string and returns the remaining text. It does not validate the HTML, so partial or broken tags can cause more text to be removed than expected.
Basic Usage Demonstration
The signature is: strip_tags(string $string, array|string|null $allowed_tags = null): string. Here, $string is the input to process, and $allowed_tags is an optional list of tags to keep, given as a string such as '<b><i>' or, since PHP 7.4, as an array such as ['b', 'i'].
<?php
$text = '<p>Test paragraph.</p><!-- Comment --> <a href="#fragment">Other text</a>';
echo strip_tags($text);
// Output: Test paragraph. Other text
?>
Practical Application Integration
For the original problem, combine strip_tags with the substr function:
<?php echo substr(strip_tags($row_get_Business['business_description']), 0, 110) . "..."; ?>
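This code first removes all HTML tags, then truncates the result to 110, and finally appends an ellipsis. Two caveats apply: substr counts bytes rather than characters, so it can split a multibyte UTF-8 character, and the ellipsis is appended even when the stripped text is shorter than the limit.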
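One practical side effect of stripping before truncating is that adjacent block elements lose their visual separation, so the last word of one paragraph can run into the first word of the next. The following is a minimal sketch of one way around this; the helper name strip_tags_spaced is hypothetical and not part of PHP, and the approach assumes that an extra space inside a word split by inline tags (for example wo<b>rd</b>) is acceptable for summary text.

```php
<?php
// Hypothetical helper: strip tags without gluing adjacent blocks together.
function strip_tags_spaced(string $html): string
{
    // Put a space in front of every tag so "</p><p>" does not join two words.
    $spaced = str_replace('<', ' <', $html);
    $text = strip_tags($spaced);
    // Collapse the whitespace this introduces; fall back to the unmodified
    // text if preg_replace fails (for example on invalid UTF-8).
    return trim(preg_replace('/\s+/u', ' ', $text) ?? $text);
}

$html = '<p>First paragraph.</p><p>Second paragraph.</p>';
echo strip_tags($html), "\n";        // First paragraph.Second paragraph.
echo strip_tags_spaced($html), "\n"; // First paragraph. Second paragraph.
```

The result can then be truncated in the same way as the output of a plain strip_tags call.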
Advanced Applications and Security Considerations
Selective Tag Preservation
strip_tags supports specifying a list of allowed tags via the second parameter:
<?php
$text = '<b>Bold text</b> and <i>italic text</i>';
echo strip_tags($text, '<b><i>');
// Output: <b>Bold text</b> and <i>italic text</i>
?>
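On the security side, note that strip_tags does not touch the attributes of tags on the allow-list, and that it removes only the script tags themselves, not the text between them. It is therefore not a substitute for output escaping or a dedicated HTML sanitizer. The sketch below illustrates both points with a made-up input string and shows one conservative option for plain-text output: strip everything, then escape with htmlspecialchars.

```php
<?php
// Made-up example input containing an event-handler attribute and a script.
$input = '<b onmouseover="alert(1)">Bold</b> <script>alert(2)</script>';

// The allow-list keeps <b> together with every attribute it carries,
// and the text inside <script> survives as plain text.
$kept = strip_tags($input, '<b>');
echo $kept, "\n";
// <b onmouseover="alert(1)">Bold</b> alert(2)

// For untrusted input displayed as plain text: strip all tags, then escape.
$safe = htmlspecialchars(strip_tags($input), ENT_QUOTES, 'UTF-8');
echo $safe, "\n";
// Bold alert(2)
```

If some formatting tags must be kept for untrusted input, a purpose-built sanitizer library is usually a safer choice than an allow-list passed to strip_tags.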
Handling Complex HTML Content
Content pasted from word processors such as Word often carries typographic quotes, inline CSS, and comment blocks, so it may need more processing than a single strip_tags call. The strip_word_html function from the referenced article demonstrates an enhanced approach:
<?php
function strip_word_html($text, $allowed_tags = '<b><i><sup><sub><em><strong><u><br>') {
    // Replace typographic quotes and dashes with ASCII equivalents
    $search = array('/‘/u', '/’/u', '/“/u', '/”/u', '/—/u');
    $replace = array('\'', '\'', '"', '"', '-');
    $text = preg_replace($search, $replace, $text);
    // Decode HTML entities (entity-encoded tags become real tags and are stripped below)
    $text = html_entity_decode($text, ENT_QUOTES, 'UTF-8');
    // Remove CSS-style /* ... */ comments. preg_replace is used here because
    // mb_eregi_replace does not accept delimited patterns such as '#...#s'.
    if (mb_stripos($text, '/*', 0, 'UTF-8') !== FALSE) {
        $text = preg_replace('#/\*.*?\*/#su', '', $text);
    }
    // Prevent numbers from being misparsed as tags: "<5" becomes "< 5"
    $text = preg_replace('/<([0-9]+)/', '< $1', $text);
    // Core tag stripping (this also removes HTML comments)
    $text = strip_tags($text, $allowed_tags);
    // Clean up excess whitespace
    $text = preg_replace(array('/^\s\s+/', '/\s\s+$/', '/\s\s+/u'), array('', '', ' '), $text);
    // Standardize tag formats; \b stops "<b" from also matching "<br>"
    $search = array('#<(strong|b)\b[^>]*>(.*?)</(strong|b)>#isu',
                    '#<(em|i)\b[^>]*>(.*?)</(em|i)>#isu',
                    '#<u\b[^>]*>(.*?)</u>#isu');
    $replace = array('<b>$2</b>', '<i>$2</i>', '<u>$1</u>');
    $text = preg_replace($search, $replace, $text);
    // Remove any remaining HTML comments; the lazy .*? stops at the first "-->"
    $text = preg_replace('/<!--.*?-->/su', '', $text);
    return $text;
}
?>
Performance Optimization and Best Practices
Character Encoding Handling
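When processing multilingual content, ensure the character encoding is handled correctly. Byte-oriented functions such as strlen and substr can split multibyte characters, so prefer the mb_* equivalents and pass 'UTF-8' explicitly rather than relying on the configured default.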
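The difference between byte-based and character-based functions is easiest to see on a short string. The sketch below uses an illustrative sample string and assumes the mbstring extension is available.

```php
<?php
// "\u{e9}" is the letter e with an acute accent, two bytes in UTF-8.
$text = "Caf\u{e9} au lait";

echo strlen($text), "\n";               // 13 (bytes)
echo mb_strlen($text, 'UTF-8'), "\n";   // 12 (characters)

// Byte-based truncation cuts the accented letter in half,
// leaving a string that is no longer valid UTF-8.
$byteCut = substr($text, 0, 4);
var_dump(mb_check_encoding($byteCut, 'UTF-8'));   // bool(false)

// Character-based truncation keeps the character whole.
$charCut = mb_substr($text, 0, 4, 'UTF-8');
var_dump($charCut === "Caf\u{e9}");               // bool(true)
```

A string cut in the middle of a character typically shows up in the browser as a replacement glyph at the end of the summary.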
Error Handling Mechanisms
In practical applications, appropriate error handling should be added:
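<?php
function safe_strip_and_truncate($text, $length = 110) {
    // Return an empty string for non-string input (null, arrays, and so on)
    if (!is_string($text)) {
        return '';
    }
    $stripped = strip_tags($text);
    // Count characters rather than bytes; short text is returned without an ellipsis
    if (mb_strlen($stripped, 'UTF-8') <= $length) {
        return $stripped;
    }
    return mb_substr($stripped, 0, $length, 'UTF-8') . '...';
}
?>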
Extended Application Scenarios
Search Engine Optimization Summaries
When generating page meta descriptions, similar techniques can be used to ensure summary content is plain text, avoiding HTML tag interference with search engine parsing.
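As a rough illustration of the meta description case, the sketch below strips tags, decodes entities, collapses whitespace, truncates, and then escapes the result for use inside an attribute. The function name meta_description and the 155-character default are assumptions chosen for the example, not values taken from PHP or from any search engine.

```php
<?php
// Hypothetical helper; the name and the default limit are illustrative.
function meta_description(string $html, int $limit = 155): string
{
    // Strip tags first, then decode entities so "&amp;" counts as one character.
    $text = html_entity_decode(strip_tags($html), ENT_QUOTES | ENT_HTML5, 'UTF-8');
    // Collapse line breaks and repeated spaces into single spaces.
    $text = trim(preg_replace('/\s+/u', ' ', $text) ?? $text);
    if (mb_strlen($text, 'UTF-8') > $limit) {
        $text = rtrim(mb_substr($text, 0, $limit, 'UTF-8')) . '...';
    }
    // Escape again so quotes and ampersands are safe inside the attribute.
    return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}

$html = "<h1>Acme &amp; Co.</h1>\n<p>Hand-made \"widgets\" since 1999.</p>";
echo '<meta name="description" content="', meta_description($html), '">', "\n";
// <meta name="description" content="Acme &amp; Co. Hand-made &quot;widgets&quot; since 1999.">
```

Escaping as the last step matters because the decoded text may contain quotes that would otherwise end the content attribute early.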
Email Content Preprocessing
When sending plain text emails, all HTML tags need to be removed, for which strip_tags provides a simple solution.
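Calling strip_tags on its own discards paragraph and line-break structure, which tends to make plain-text emails hard to read. A minimal sketch of one workaround is to turn a few structural tags into newlines before stripping. The helper name html_to_plain_text and the set of tags it handles are assumptions for illustration; real-world email HTML may need a more complete converter.

```php
<?php
// Hypothetical helper; handles only a small, illustrative set of tags.
function html_to_plain_text(string $html): string
{
    // Turn <br> into a newline and block-closing tags into a blank line.
    $text = preg_replace('#<br\s*/?>#i', "\n", $html) ?? $html;
    $text = preg_replace('#</(p|div|h[1-6]|li)>#i', "\n\n", $text) ?? $text;
    // Remove all remaining tags, then decode entities such as &amp;.
    $text = strip_tags($text);
    $text = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    // Collapse three or more consecutive newlines into one blank line.
    $text = preg_replace("/\n{3,}/", "\n\n", $text) ?? $text;
    return trim($text);
}

$html = '<p>Hello Ann,</p><p>Your order has shipped.<br>Thanks &amp; regards</p>';
echo html_to_plain_text($html), "\n";
// Hello Ann,
//
// Your order has shipped.
// Thanks & regards
```

Because the output is sent as plain text rather than rendered as HTML, entities are decoded here and no further escaping is applied.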
Conclusion
The strip_tags function, as a core PHP feature, offers a simple and efficient way to remove HTML tags from text. Combined with appropriate parameters and other string functions, it covers a wide range of text processing needs, from database summaries to plain-text email. It is not an HTML sanitizer, however: attributes on allowed tags are left untouched, so output should still be escaped. In actual development, choose the processing strategy based on the specific scenario, and keep character encoding, performance, and security in mind.