TechTorch


How to Split Large Text Files into Smaller Files Using Python

March 04, 2025

When dealing with large text files, it can be beneficial to split them into smaller files for easier management and processing. In this article, we'll explore two methods to achieve this using Python: splitting by lines and splitting by file size. Both methods are demonstrated with practical code examples and usage instructions.

Method 1: Split by Number of Lines

This method splits a large text file into smaller files, each containing a specified number of lines. It's a straightforward approach that keeps line boundaries intact, which makes large files easier to manage.

Code Example:

def split_file_by_lines(input_file, lines_per_file):
    with open(input_file, 'r') as file:
        file_number = 1
        current_lines = []
        for line in file:
            current_lines.append(line)
            if len(current_lines) == lines_per_file:
                with open(f'output_file{file_number}.txt', 'w') as output_file:
                    output_file.writelines(current_lines)
                file_number += 1
                current_lines = []
        # Write remaining lines, if any
        if current_lines:
            with open(f'output_file{file_number}.txt', 'w') as output_file:
                output_file.writelines(current_lines)

Example Usage:

To use this function, replace large_file1.txt and large_file2.txt with the names of your large files, and adjust the lines_per_file parameter as needed.

split_file_by_lines("large_file1.txt", 1000)
split_file_by_lines("large_file2.txt", 1000)
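As a quick sanity check, the self-contained sketch below creates a small sample file and splits it into 3-line pieces. The filename demo_input.txt, the 7-line input, and the 3-line chunk size are arbitrary choices for illustration, not part of the original examples:

```python
def split_file_by_lines(input_file, lines_per_file):
    with open(input_file, 'r') as file:
        file_number = 1
        current_lines = []
        for line in file:
            current_lines.append(line)
            if len(current_lines) == lines_per_file:
                with open(f'output_file{file_number}.txt', 'w') as output_file:
                    output_file.writelines(current_lines)
                file_number += 1
                current_lines = []
        # Write remaining lines, if any
        if current_lines:
            with open(f'output_file{file_number}.txt', 'w') as output_file:
                output_file.writelines(current_lines)

# Build a 7-line demo input file (name and size are arbitrary).
with open('demo_input.txt', 'w') as f:
    for i in range(1, 8):
        f.write(f'line {i}\n')

split_file_by_lines('demo_input.txt', 3)

# 7 lines split 3 at a time -> files of 3, 3, and 1 lines.
for n in (1, 2, 3):
    with open(f'output_file{n}.txt') as f:
        print(n, len(f.readlines()))
```

The last output file simply holds whatever remains, so it may be shorter than lines_per_file.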

Method 2: Split by File Size

This method splits each large text file into smaller files based on a specified maximum file size in bytes. This is useful when you want to control the size of each output file, regardless of the number of lines.

Code Example:

def split_file_by_size(input_file, max_size):
    with open(input_file, 'rb') as file:
        file_number = 1
        part_data = bytearray()
        while True:
            chunk = file.read(1024)  # Read in chunks of 1 KB
            if not chunk:
                break
            part_data.extend(chunk)
            if len(part_data) >= max_size:
                with open(f'output_file{file_number}.txt', 'wb') as output_file:
                    output_file.write(part_data)
                file_number += 1
                part_data = bytearray()
        # Write remaining data, if any
        if part_data:
            with open(f'output_file{file_number}.txt', 'wb') as output_file:
                output_file.write(part_data)

Example Usage:

To use this function, replace large_file1.txt and large_file2.txt with the names of your large files, and adjust the max_size parameter as needed.

split_file_by_size("large_file1.txt", 1024 * 1024)  # 1 MB per file
split_file_by_size("large_file2.txt", 1024 * 1024)  # Adjust as needed
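To see the size-based split in action, the self-contained sketch below writes a 5,000-byte demo file and splits it into pieces of at most 2 KB. The filenames demo_big.txt and part{n}.txt, plus the 5,000-byte and 2,048-byte figures, are arbitrary values chosen for the demo:

```python
import os

def split_file_by_size(input_file, max_size):
    with open(input_file, 'rb') as file:
        file_number = 1
        part_data = bytearray()
        while True:
            chunk = file.read(1024)  # Read in chunks of 1 KB
            if not chunk:
                break
            part_data.extend(chunk)
            if len(part_data) >= max_size:
                with open(f'part{file_number}.txt', 'wb') as output_file:
                    output_file.write(part_data)
                file_number += 1
                part_data = bytearray()
        # Write remaining data, if any
        if part_data:
            with open(f'part{file_number}.txt', 'wb') as output_file:
                output_file.write(part_data)

# Build a 5,000-byte demo input file.
with open('demo_big.txt', 'wb') as f:
    f.write(b'x' * 5000)

split_file_by_size('demo_big.txt', 2048)

# Collect the sizes of the generated parts.
sizes = []
n = 1
while os.path.exists(f'part{n}.txt'):
    sizes.append(os.path.getsize(f'part{n}.txt'))
    n += 1
print(sizes)  # [2048, 2048, 904]
```

Because the function reads 1 KB at a time and flushes once the buffer reaches max_size, each part is at most max_size bytes and the final part holds the remainder.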

Explanation

split_file_by_lines: This function reads the input file line by line and groups lines into smaller files based on the specified lines_per_file parameter.

split_file_by_size: This function reads the input file in 1 KB chunks and writes a new smaller file whenever the accumulated data reaches the specified max_size in bytes.

Usage:

Make sure to replace large_file1.txt and large_file2.txt with the actual filenames you want to split and adjust the parameters as needed.
