How to Split Large Text Files into Smaller Files Using Python
When dealing with large text files, it can be beneficial to split them into smaller files for easier management and processing. In this article, we'll explore two methods to achieve this using Python: splitting by lines and splitting by file size. Both methods are demonstrated with practical code examples and usage instructions.
Method 1: Split by Number of Lines
This method splits a large text file into smaller files, each containing a specified number of lines; the last file holds whatever lines remain. It is the most straightforward approach, and it never cuts a line in half.
Code Example:
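```python
import os

def split_file_by_lines(input_file, lines_per_file):
    """Split input_file into parts of at most lines_per_file lines each."""
    if lines_per_file <= 0:
        raise ValueError("lines_per_file must be a positive integer")
    # Prefix part names with the input file's name (e.g. large_file1_part1.txt)
    # so that splitting several files does not overwrite earlier parts.
    base_name = os.path.splitext(os.path.basename(input_file))[0]
    with open(input_file, 'r') as file:
        file_number = 1
        current_lines = []
        for line in file:
            current_lines.append(line)
            if len(current_lines) == lines_per_file:
                # The with-block closes each part file as soon as it is written
                with open(f'{base_name}_part{file_number}.txt', 'w') as output_file:
                    output_file.writelines(current_lines)
                file_number += 1
                current_lines = []
        # Write remaining lines, if any
        if current_lines:
            with open(f'{base_name}_part{file_number}.txt', 'w') as output_file:
                output_file.writelines(current_lines)
```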
Example Usage:
To use this function, replace large_file1.txt and large_file2.txt with the names of your large files, and adjust the lines_per_file parameter as needed.
```python
split_file_by_lines("large_file1.txt", 1000)
split_file_by_lines("large_file2.txt", 1000)
```
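If you want the splitter to report which files it created (handy for scripting or testing), the line grouping can also be written with itertools.islice, which pulls at most a fixed number of lines from the open file at a time. The sketch below is a hypothetical variant rather than the function above: the name split_by_lines_islice, its output_dir parameter, and the <name>_part<N>.txt naming are assumptions made for illustration.

```python
import os
import tempfile
from itertools import islice


def split_by_lines_islice(input_file, lines_per_file, output_dir="."):
    """Hypothetical variant: write parts of lines_per_file lines, return their paths."""
    base_name = os.path.splitext(os.path.basename(input_file))[0]
    part_paths = []
    with open(input_file, "r") as file:
        part_number = 1
        while True:
            # islice takes at most lines_per_file lines from the file iterator;
            # an empty list means the file is exhausted.
            chunk = list(islice(file, lines_per_file))
            if not chunk:
                break
            path = os.path.join(output_dir, f"{base_name}_part{part_number}.txt")
            with open(path, "w") as output_file:
                output_file.writelines(chunk)
            part_paths.append(path)
            part_number += 1
    return part_paths


if __name__ == "__main__":
    # Demo: split a 2,500-line sample file created in a temporary directory.
    with tempfile.TemporaryDirectory() as tmp:
        sample = os.path.join(tmp, "sample.txt")
        with open(sample, "w") as f:
            f.writelines(f"line {i}\n" for i in range(2500))
        for p in split_by_lines_islice(sample, 1000, output_dir=tmp):
            with open(p) as f:
                print(os.path.basename(p), sum(1 for _ in f))
```

Run as a script, the demo should report two parts of 1,000 lines each and a final part of 500 lines.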
Method 2: Split by File Size
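This method splits a large text file into smaller files based on a specified maximum file size in bytes. It is useful when you need to control the size of each output file regardless of how many lines it contains. Because it works on raw bytes, a split point can fall in the middle of a line (or of a multi-byte UTF-8 character).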
Code Example:
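```python
import os

def split_file_by_size(input_file, max_size):
    """Split input_file into parts of at most max_size bytes each."""
    if max_size <= 0:
        raise ValueError("max_size must be a positive number of bytes")
    # Prefix part names with the input file's name so that splitting
    # several files does not overwrite earlier parts.
    base_name = os.path.splitext(os.path.basename(input_file))[0]
    with open(input_file, 'rb') as file:
        file_number = 1
        part_data = bytearray()
        while True:
            # Read up to 1 KB, but never more than the room left in the
            # current part, so no part exceeds max_size bytes.
            chunk = file.read(min(1024, max_size - len(part_data)))
            if not chunk:
                break
            part_data.extend(chunk)
            if len(part_data) == max_size:
                with open(f'{base_name}_part{file_number}.txt', 'wb') as output_file:
                    output_file.write(part_data)
                file_number += 1
                part_data = bytearray()
        # Write remaining data, if any
        if part_data:
            with open(f'{base_name}_part{file_number}.txt', 'wb') as output_file:
                output_file.write(part_data)
```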
Example Usage:
To use this function, replace large_file1.txt and large_file2.txt with the names of your large files, and adjust the max_size parameter as needed.
```python
split_file_by_size("large_file1.txt", 1024 * 1024)  # 1 MB per file
split_file_by_size("large_file2.txt", 1024 * 1024)  # Adjust as needed
```
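Splitting at fixed byte offsets can leave a part ending mid-line, or even mid-character for UTF-8 text. If every part must contain only whole lines, one option is to pack complete lines into each part until the next line would push it past max_size. This is a different technique from the chunked reader above, shown here as a minimal sketch; the function name split_by_size_keep_lines, its output_dir parameter, and the part naming are hypothetical.

```python
import os
import tempfile


def split_by_size_keep_lines(input_file, max_size, output_dir="."):
    """Hypothetical sketch: pack whole lines into parts of at most max_size bytes.

    A single line longer than max_size is written to a part of its own,
    so that one part will exceed the limit.
    """
    base_name = os.path.splitext(os.path.basename(input_file))[0]
    part_paths = []
    buffer = bytearray()

    def flush():
        # Write the buffered lines to the next numbered part file.
        path = os.path.join(output_dir, f"{base_name}_part{len(part_paths) + 1}.txt")
        with open(path, "wb") as output_file:
            output_file.write(buffer)
        part_paths.append(path)

    with open(input_file, "rb") as file:
        # Iterating a binary file yields lines that keep their b"\n" ending.
        for line in file:
            if buffer and len(buffer) + len(line) > max_size:
                flush()
                buffer.clear()
            buffer.extend(line)
    if buffer:
        flush()
    return part_paths


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        sample = os.path.join(tmp, "sample.txt")
        with open(sample, "wb") as f:
            f.write(b"".join(b"%05d\n" % i for i in range(100)))  # 6 bytes per line
        for p in split_by_size_keep_lines(sample, 100, output_dir=tmp):
            print(os.path.basename(p), os.path.getsize(p))
```

The trade-off is that parts usually come out slightly below max_size (here, 16 six-byte lines fill 96 of the 100 allowed bytes), in exchange for every part being valid, complete text.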
Explanation
split_file_by_lines: Reads the input file line by line and writes every lines_per_file lines to a new output file; any leftover lines go into a final, shorter file.
split_file_by_size: Reads the input file in binary chunks and writes the accumulated data to a new output file once it reaches the max_size limit in bytes; any leftover data goes into a final, smaller file.
Usage:
Replace large_file1.txt and large_file2.txt with the actual filenames you want to split, and adjust lines_per_file or max_size as needed.