Techniques for Counting Non-Blank Lines of Code in Bash

Keywords: Bash | line counting | non-blank lines

Abstract: This article surveys techniques for counting non-blank lines of code in a project from Bash. It starts with a sed and wc pipeline that handles a single file, extends that pipeline to skip comment lines, and notes the language-specific adjustments this requires. It then turns to recursive counting across multi-file projects, using find with path exclusion and extension-based selection. Finally, it compares the strengths and weaknesses of each approach so that readers can choose the one that fits their project.

Basic Counting Methods

In Bash, a straightforward way to count lines of code is to combine sed and wc. For a single file, the following pipeline can be used: cat foo.c | sed '/^\s*$/d' | wc -l. Here cat reads the file and pipes its content to sed. The regular expression /^\s*$/ matches lines that are empty or contain only whitespace characters (spaces, tabs, etc.), and the d command deletes them. Finally, wc -l counts the lines that remain, which are the non-blank lines. Two details are worth noting. First, \s is a GNU sed extension; the POSIX character class [[:space:]] is the portable equivalent. Second, cat is not required, because sed can read the file directly: sed '/^[[:space:]]*$/d' foo.c | wc -l.
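The pipeline above can be wrapped in a small function. This is a minimal sketch: the function name count_nonblank is hypothetical, and [[:space:]] is used in place of \s so the pattern does not depend on GNU sed.

```shell
#!/usr/bin/env bash
# Sketch: count the non-blank lines of a single file.
# count_nonblank is an illustrative name, not an existing command.
count_nonblank() {
    # Delete lines that are empty or whitespace-only, then count the rest.
    # sed reads the file directly, so no cat is needed.
    sed '/^[[:space:]]*$/d' "$1" | wc -l
}
```

Calling count_nonblank foo.c prints the count for foo.c. On some systems wc -l pads its output with leading spaces, so compare the result numerically rather than as a string.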

Extended Methods for Excluding Comments

Some coding standards treat comment lines as non-code and exclude them from the count. In Perl, for example, a line whose first non-whitespace character is # is a comment. Such lines can be removed by extending the sed script: cat foo.pl | sed '/^\s*#/d;/^\s*$/d' | wc -l. Here /^\s*#/d deletes lines that consist of optional leading whitespace followed by #, and /^\s*$/d deletes blank lines. Lines that contain code followed by a trailing comment are still counted, while a #! interpreter line at the top of a script is removed along with the comments. The method is language-dependent: other languages use different comment syntax (C, for example, uses // and /* */), so the regular expressions must be adapted to the language at hand, and multi-line block comments cannot be handled by a simple per-line pattern.
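As a hedged illustration of the language-specific adjustment, the sketch below defines one function for languages with # comments and one for // comments. Both function names are hypothetical, and neither attempts to handle /* */ block comments or comment markers that appear inside string literals.

```shell
#!/usr/bin/env bash
# Sketch: count lines that are neither blank nor whole-line comments.
# Function names are illustrative; block comments are NOT handled.

# Languages with '#' comments (Perl, shell, Python, ...).
count_hash_code_lines() {
    sed -e '/^[[:space:]]*#/d' -e '/^[[:space:]]*$/d' "$1" | wc -l
}

# Languages with '//' comments (C99, C++, Java, ...).
# \|...| uses '|' as the regex delimiter so the slashes need no escaping.
count_slash_code_lines() {
    sed -e '\|^[[:space:]]*//|d' -e '/^[[:space:]]*$/d' "$1" | wc -l
}
```

For example, count_hash_code_lines foo.pl counts the code lines of a Perl file, and count_slash_code_lines foo.c does the same for a C file that only uses // comments.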

Multi-File Project Solutions

For projects with multiple files and directories, all relevant files must be counted recursively. A typical solution combines find with other tools. For instance: find . -path './pma' -prune -o -path './blog' -prune -o -path './punbb' -prune -o -path './js/3rdparty' -prune -o -print | egrep '\.php|\.as|\.sql|\.css|\.js' | grep -v '\.svn' | xargs cat | sed '/^\s*$/d' | wc -l. The execution flow is as follows. First, find . walks the tree from the current directory; each -path test names a directory to exclude (e.g., ./pma), -prune stops find from descending into it, and -o -print prints every remaining path, directories as well as files. Next, egrep keeps only the paths that match one of the extension patterns (e.g., .php, .js), and grep -v drops any path containing .svn, which removes Subversion metadata. Then xargs cat concatenates the contents of the selected files, sed removes blank lines, and wc -l counts what is left. File types and excluded paths are easy to customize, which makes the approach suitable for large projects. The command has some rough edges, however. The extension patterns are not anchored to the end of the path, so \.js also matches data.json and \.as matches page.asp. Paths that contain whitespace or quotes are split incorrectly by xargs. In addition, egrep is deprecated in favor of grep -E. A more robust variant restricts find to regular files with -type f and selects extensions with -name tests.
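The following sketch shows one way to express the same selection inside find itself. It is not the command from the text: it swaps the egrep, grep -v, and xargs stages for -type f, -name, and -exec cat {} +, which keeps file names with spaces intact. The function name count_project_lines is hypothetical, and the excluded directories and extensions are simply carried over from the example above.

```shell
#!/usr/bin/env bash
# Sketch: count non-blank lines across a source tree.
# count_project_lines is an illustrative name. The root directory is an
# optional argument (default: current directory) and is assumed to be
# given without a trailing slash so the -path tests match.
# First group: directories to skip entirely (including .svn metadata).
# Second group: regular files whose names end in a wanted extension.
count_project_lines() {
    local root=${1:-.}
    find "$root" \
        \( -path "$root/pma" -o -path "$root/blog" -o -path "$root/punbb" \
           -o -path "$root/js/3rdparty" -o -name .svn \) -prune \
        -o -type f \
        \( -name '*.php' -o -name '*.as' -o -name '*.sql' \
           -o -name '*.css' -o -name '*.js' \) \
        -exec cat {} + |
        sed '/^[[:space:]]*$/d' |
        wc -l
}
```

Running count_project_lines from the project root prints a single total. Because -name '*.js' matches only names that end in .js, files such as data.json are no longer counted. One caveat remains: a file that lacks a trailing newline is joined to the first line of the next file by cat, which can lower the total by one per such file.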

Technical Comparison and Best Practices

Basic methods are simple and efficient for quick single-file statistics but lack flexibility. Extended methods improve accuracy by excluding comments but require language-specific adjustments, potentially increasing maintenance overhead. Multi-file solutions are powerful, supporting recursive counting and custom filtering, yet the commands are more complex and depend on correct path and extension settings. In practical applications, it is advisable to choose an appropriate method based on project scale: for small scripts, basic or extended methods suffice; for large projects, multi-file solutions better meet needs. Additionally, automation through scripting, such as encapsulating commands in Bash functions, can enhance reusability. Regardless of the approach, ensuring accurate regular expressions and filter conditions is crucial to avoid counting errors.
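As an illustration of encapsulating the commands in a Bash function, the sketch below prints a per-file count followed by a total. The name count_report and the tab-separated output format are assumptions made for this example.

```shell
#!/usr/bin/env bash
# Sketch: per-file non-blank line counts plus a grand total.
# count_report and its output format are illustrative choices.
count_report() {
    local total=0 n f
    for f in "$@"; do
        n=$(sed '/^[[:space:]]*$/d' "$f" | wc -l)
        n=$((n))                      # strip any padding wc may add
        printf '%d\t%s\n' "$n" "$f"
        total=$((total + n))
    done
    printf '%d\ttotal\n' "$total"
}
```

A call such as count_report src/*.c prints one line per file and a final total line, which makes it easy to spot files that dominate the count.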

Copyright Notice: All rights in this article are reserved by the operators of DevGex. Reasonable sharing and citation are welcome; any reproduction, excerpting, or re-publication without prior permission is prohibited.
