Keywords: Linux | cut command | Shell data processing
Abstract: This article explores advanced usage of the cut command in Linux systems, focusing on how to extract the first and last columns of a text file in one pass through the multi-range specification of the -f parameter. With worked examples, it demonstrates the field range syntax (-n, n-, n-m) for data extraction tasks and compares cut with other Shell tools such as awk.
In Linux Shell environments, processing structured text files (e.g., tab-separated TSV files) often requires extracting specific columns of data. The cut command, a core text processing tool, accepts several comma-separated ranges in its -f parameter, which makes it possible to select the leading and trailing columns of a file with a single command. This article explains the technique through code examples and points out the pitfalls to watch for.
Basics of cut Command and Field Range Syntax
The cut command specifies fields (columns) using the -f parameter, combined with -d to set the delimiter (default is tab). Field range syntax includes three forms: -n indicates columns 1 to n, n- indicates columns n to the last, and n-m indicates columns n to m. For example, cut -f 1-5 extracts the first 5 columns, and cut -f 10- extracts all columns from the 10th onward.
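To see the three range forms side by side, the following sketch runs each one on a single tab-separated line. The six sample fields (a to f) and the variable names are made up for illustration only.

```shell
# One tab-separated line with six illustrative fields: a b c d e f
line=$(printf 'a\tb\tc\td\te\tf')

# -n : fields 1 through 3
first=$(printf '%s\n' "$line" | cut -f -3)

# n- : field 5 through the last field
tail_part=$(printf '%s\n' "$line" | cut -f 5-)

# n-m : fields 2 through 4
middle=$(printf '%s\n' "$line" | cut -f 2-4)

printf '%s\n' "$first" "$tail_part" "$middle"
```

The three printed lines contain a, b, c; then e, f; then b, c, d, each separated by tabs.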
Combining Multiple Ranges for Keeping the First and Last Columns
To keep the first n and last k columns at the same time, list multiple ranges in the -f parameter, separated by commas. For a file with 20 columns where you want the first 4 and the last 7, the command is: cut -f -4,14-. Here, -4 selects columns 1 to 4, and 14- selects columns 14 to 20 (20 - 7 + 1 = 14, i.e., the last 7 columns); columns 5 to 13 are dropped. Note that "trimming" in this article means extracting these columns, not deleting them; to discard the first 4 and last 7 instead, select the middle range with cut -f 5-13. Because cut has no syntax for counting from the end of a line, the start of the trailing range must be computed from the total number of columns.
# Example: Keeping the first 3 and last 5 columns (assuming total columns is 15, so 15 - 5 + 1 = 11)
cut -f -3,11- input.tsv
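The arithmetic behind the example above can be written out explicitly: to keep the last k of N columns, the trailing range starts at N - k + 1. A minimal sketch, using a generated 15-column line in place of input.tsv; the variable names are illustrative assumptions.

```shell
total=15      # total number of columns (N)
keep_first=3  # columns to keep from the start
keep_last=5   # columns to keep from the end (k)

# The trailing range starts at N - k + 1, here 15 - 5 + 1 = 11.
start=$((total - keep_last + 1))

# Build a single tab-separated line holding the numbers 1..15.
row=$(seq 1 "$total" | paste -s -)

# Equivalent to: cut -f -3,11-
result=$(printf '%s\n' "$row" | cut -f "-${keep_first},${start}-")
printf '%s\n' "$result"
```

Because each field holds its own column number, the output (1, 2, 3, 11, 12, 13, 14, 15) shows directly which columns were selected.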
Advanced Applications and Error Handling
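The multi-range syntax supports complex combinations, such as cut -f 1,2,5,6,10- to extract columns 1, 2, 5, 6, and all columns from the 10th onward. This is useful for extracting non-contiguous columns. In practice, note that column indexing starts at 1 and that cut prints the selected fields in their original file order, each at most once, regardless of the order in which the list names them. Tab is the default delimiter, so -d can be omitted for TSV files; if you set it explicitly, pass a real tab character (e.g., -d $'\t' in Bash), because cut does not interpret the two-character sequence "\t" and GNU cut rejects it as a multi-character delimiter. If the number of columns is uncertain, use head -n 1 to check the field count in the first line.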
# Verify column count, then extract
head -n 1 file.tsv | tr '\t' '\n' | wc -l # Count the fields in the first line
cut -f -5,21- file.tsv # Keep columns 1-5 and 21 onward; for the last k columns, start at (count - k + 1)
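The two manual steps above can be combined so that the range is computed rather than typed by hand. The function below is a sketch under stated assumptions: keep_edges is a hypothetical helper name, both counts are positive integers, and every row is assumed to have as many fields as the first line.

```shell
# keep_edges FIRST LAST FILE
# Print the first FIRST and last LAST tab-separated columns of FILE.
# Assumes FIRST and LAST are positive integers and that all rows have
# the same number of fields as line 1.
keep_edges() {
    first=$1
    last=$2
    file=$3

    # awk reports the field count (NF) of line 1 and then stops reading.
    total=$(awk -F'\t' 'NR == 1 { print NF; exit }' "$file")
    total=${total:-0}   # an empty file yields no count

    # If the two ranges would meet or overlap, every column is selected.
    if [ $((first + last)) -ge "$total" ]; then
        cat "$file"
        return
    fi

    # The trailing range starts at total - last + 1.
    start=$((total - last + 1))
    cut -f "-${first},${start}-" "$file"
}
```

For example, keep_edges 5 10 file.tsv on a 30-column file runs the equivalent of cut -f -5,21- file.tsv.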
Comparison with Other Tools
Compared to awk or sed, the cut command is simpler and usually faster for plain column extraction, which matters most with large files. However, cut does not support regular expressions or conditional filtering, and it cannot count columns from the end of a line; for such scenarios, use or combine it with other tools. For example, awk can calculate the trailing range per line from NF, the number of fields in the current record: awk -F'\t' -v OFS='\t' '{out = $1 OFS $2 OFS $3; for (i = NF-4; i <= NF; i++) out = out OFS $i; print out}'. This keeps the first 3 and last 5 columns without leaving a trailing tab, and assumes each line has at least 8 fields.
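For input where the number of fields varies from line to line, the awk approach can be generalized. The following is one possible sketch, not a definitive implementation: keep_edges_awk is a hypothetical name, and the overlap handling for short rows is a design choice made here for illustration.

```shell
# keep_edges_awk FIRST LAST [FILE...]
# Print the first FIRST and last LAST tab-separated fields of every line,
# computing the trailing range per line from NF. Reads stdin if no FILE.
keep_edges_awk() {
    first=$1
    last=$2
    shift 2
    awk -F'\t' -v OFS='\t' -v first="$first" -v last="$last" '
    {
        start = NF - last + 1
        # On short rows the ranges overlap; print each field only once.
        if (start <= first) start = first + 1

        out = ""
        sep = ""
        for (i = 1; i <= NF; i++) {
            if (i <= first || i >= start) {
                out = out sep $i
                sep = OFS   # separator goes between fields, never after the last
            }
        }
        print out
    }' "$@"
}
```

Usage: keep_edges_awk 3 5 input.tsv. Unlike the cut version, this works when rows have different field counts, at the cost of evaluating a loop for every line.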
Summary and Best Practices
The multi-range form of cut -f extracts the first and last columns of a delimited file in a single command. The key steps are: confirm the delimiter, determine the column count, and compute the ranges (-n for the leading n columns, and N - k + 1 onward for the last k of N columns). In scripts, detect the column count automatically rather than hard-coding it, so the command keeps working when the input layout changes. For rows with varying field counts, or when conditional logic is needed, switch to awk.