Splitting Text Files That Are Too Large to Open

As a consultant, I am often sent CSV and SQL files at the start of a new project so I can get a development version of the client’s environment up and running locally. These projects often involve massive legacy databases, and the files I receive are frequently too large to open in a text editor.

On a Mac (or on Linux), I can use the terminal to split a file like this without having to open it first. One option is to split by line count. For example, let’s say I have a file with 2500 lines in it and I want a break every 1000 lines. I would enter this at the command prompt:

split -l 1000 file_to_split

The original file will remain, but I will also now have three new files. These will be named something like “xaa”, “xab”, and “xac”, and they will contain 1000 lines, 1000 lines, and 500 lines, respectively.
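If you want to check this for yourself, here is a small sketch. It assumes a made-up 2500-line sample file generated with seq inside a scratch directory, so nothing here touches a real project file.

```shell
# Work in a scratch directory so the generated pieces are easy to find and delete.
workdir=$(mktemp -d)
cd "$workdir"

# Build a hypothetical 2500-line sample file (one number per line).
seq 1 2500 > file_to_split

# Split it every 1000 lines; the pieces get the default names xaa, xab, xac.
split -l 1000 file_to_split

# Count the lines in each piece.
wc -l xaa xab xac
```

The counts reported by wc should be 1000, 1000, and 500, and file_to_split itself is left untouched.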

Sometimes I have a 900MB file that I need to upload to a GitHub repo, and GitHub has a limit of 100MB for file uploads. I can split this up into 10 files of 90MB each using this:

split -b 90m file_to_split

I will now have my original file plus 10 new files of up to 90MB each (the last one may be smaller), all of which are under the limit and can be uploaded to GitHub.
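The same idea can be tried at a smaller scale. The sketch below assumes a made-up file of 2,500,000 random bytes and a 1m piece size instead of 90m, so it runs quickly. With -b, the "m" suffix means 1,048,576 bytes. The cat and cmp steps at the end are my own addition, to show that the pieces joined back together in order are identical to the original.

```shell
# Work in a scratch directory so the generated pieces are easy to find and delete.
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical stand-in for the large file: 2,500,000 random bytes.
head -c 2500000 /dev/urandom > file_to_split

# Split into pieces of at most 1m (1,048,576 bytes) each.
split -b 1m file_to_split

# Show the size in bytes of each piece.
wc -c xaa xab xac

# Join the pieces back together in order and compare with the original.
cat xaa xab xac > rejoined
cmp file_to_split rejoined && echo "pieces match the original"
```

The first two pieces come out at the full 1,048,576 bytes and the last one holds the remaining 402,848 bytes, which is why the final piece of a split is usually smaller than the rest.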

The flags are pretty easy to remember: “l” for “lines” and “b” for “bytes”. You can also give the new files a prefix by adding it after the filename. For example, if I wanted the new files to all start with “SplitFile-”, I would use:

split -b 90m file_to_split SplitFile-

And then I would get filenames like “SplitFile-aa”, “SplitFile-ab”, etc.
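A quick sketch of the prefix in action, again assuming a made-up 2500-line sample file in a scratch directory (and splitting by lines rather than bytes to keep it small):

```shell
# Work in a scratch directory so the generated pieces are easy to find and delete.
workdir=$(mktemp -d)
cd "$workdir"

# Build a hypothetical 2500-line sample file (one number per line).
seq 1 2500 > file_to_split

# The last argument is the prefix used for the output filenames.
split -l 1000 file_to_split SplitFile-

# List the pieces: SplitFile-aa, SplitFile-ab, SplitFile-ac.
ls SplitFile-*
```

Because the prefix replaces the default “x”, no “xaa”-style files are created here.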
