Split File into Byte Chunks in Python

What's up, guys?

In this post we'll take a look at a Python script that breaks a file (image, executable, whatever) into several chunks of raw bytes, and another one that merges those chunks to reassemble the original file.

I've been working through a cloud security certification, and it mentioned a technique widely used in the cloud: storing chunks of a file on different virtual machines for security purposes (kind of like a RAID setup).

That made me think: ah, this could be a cool Python script to code! Hence, this tutorial. 🙂

Each script is fewer than 20 lines of code and not difficult at all, just basic file manipulation of raw bytes.

In any case, I have made an introduction to files in Python to get you started – if you are new to programming and/or Python, check it out since it should be all that you need for this tutorial.

Break File into Byte Chunks Script

Before we get started, make sure you have a file in the same directory as the script, and change the filename on the first line of code to match it (include the extension).

Let’s take a look at the script to break a file into multiple chunks of 1024 bytes:

# File to open and break apart (closed automatically when the block ends)
with open("img.jpg", "rb") as fileR:

    chunk = 0

    # Read the first 1024 bytes
    byte = fileR.read(1024)
    while byte:

        # Open a temporary file and write a chunk of bytes
        fileN = "chunk" + str(chunk) + ".txt"
        with open(fileN, "wb") as fileT:
            fileT.write(byte)

        # Read next 1024 bytes (empty at end of file, which stops the loop)
        byte = fileR.read(1024)

        chunk += 1

In this case you can see the file I'm breaking into chunks is img.jpg and each chunk will be 1024 bytes, except possibly the last one, which holds whatever is left over.

Now to explain the script a little bit.

We start off by opening the file in read binary mode. Then we go ahead and read the first 1024 bytes.

Then we start a loop: each pass writes the current chunk of bytes to its own text file and reads the next 1024 bytes. This repeats until the read comes back empty, which means we have reached the end of the file and written all the necessary chunks. Note that you can change the number of bytes, the names of the files and so on to suit your needs.

All text files containing the raw bytes will be written to the current directory. Of course you can alter this to write them to specific locations as well.
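If you want the chunk size and the output location to be easy to change, the same logic can be wrapped in a function. This is just a minimal sketch: the name split_file and its parameters (out_dir, chunk_size, prefix) are my own assumptions for illustration, not part of the script above.

```python
from pathlib import Path


def split_file(source, out_dir=".", chunk_size=1024, prefix="chunk"):
    """Split `source` into numbered chunk files; return how many were written.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    out_path = Path(out_dir)
    # Create the output directory if it does not exist yet
    out_path.mkdir(parents=True, exist_ok=True)

    count = 0
    with open(source, "rb") as src:
        while True:
            data = src.read(chunk_size)
            if not data:
                # An empty read means we reached the end of the file
                break
            # Same naming scheme as the script: chunk0.txt, chunk1.txt, ...
            (out_path / f"{prefix}{count}.txt").write_bytes(data)
            count += 1
    return count
```

For example, split_file("img.jpg", out_dir="chunks", chunk_size=4096) would write 4096-byte chunks into a chunks folder and return the number of files it created.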

Merge Byte Chunks into File Script

Now let’s take a look at the script to reassemble all of the text files into the original file:

# Open the new file for reconstruction
fileM = open("imgCopy.jpg", "wb")

# Manually enter the number of the last chunk file
chunk = 0
chunks = 100

# Piece the file together using all chunks
while chunk <= chunks:
    fileName = "chunk" + str(chunk) + ".txt"

    # Read the whole chunk, whatever its size, and close it right away
    with open(fileName, "rb") as fileTemp:
        byte = fileTemp.read()
    fileM.write(byte)
    print(" - Chunk #" + str(chunk) + " done.")

    chunk += 1

fileM.close()

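First you should notice that you have to manually enter the total number of chunks to reassemble the file. Since numbering starts at 0, the value to use is the number in the name of the last text file in the directory: if the last file is chunk57.txt, set chunks to 57. The first script does not print anything to the terminal, so the directory listing is the place to check. If the value is too high, the script will stop with an error when it tries to open a chunk file that does not exist.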
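If you'd rather not count the chunks by hand, the script could look for the chunk files itself. Here is a rough sketch of that idea, assuming the files follow the chunk0.txt, chunk1.txt, ... naming from the first script; the function name merge_chunks and its parameters are hypothetical.

```python
import re
from pathlib import Path


def merge_chunks(chunk_dir, target, prefix="chunk"):
    """Concatenate the chunk files in `chunk_dir` into `target`, in numeric order.

    Hypothetical helper: assumes files are named <prefix><number>.txt.
    """
    pattern = re.compile(re.escape(prefix) + r"(\d+)\.txt")

    # Collect (number, path) pairs for every file that matches the naming scheme
    numbered = []
    for path in Path(chunk_dir).iterdir():
        match = pattern.fullmatch(path.name)
        if match:
            numbered.append((int(match.group(1)), path))

    # Sort by the numeric part so chunk10 comes after chunk9, not after chunk1
    numbered.sort()

    with open(target, "wb") as out:
        for _, path in numbered:
            out.write(path.read_bytes())
    return len(numbered)
```

Sorting by the number rather than by the filename matters here: a plain alphabetical sort would put chunk10.txt before chunk2.txt and scramble the file.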

You should set the filename on the first line of code to something different from the original so as not to overwrite it.

Compared to the first script, we are simply reversing the process: going chunk by chunk and appending the raw bytes to a new file to reassemble it.
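To make sure the reassembled file really is identical to the original, one option is to compare their hashes. This is only a sketch of one way to check, using SHA-256 from the standard hashlib module; the helper names sha256_of and files_match are made up for this example.

```python
import hashlib


def sha256_of(path, block_size=65536):
    """Return the SHA-256 hex digest of a file, reading it in blocks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling read() until it returns b""
        for block in iter(lambda: f.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def files_match(original, copy):
    """True if both files have the same SHA-256 digest."""
    return sha256_of(original) == sha256_of(copy)
```

With the filenames used in this post, files_match("img.jpg", "imgCopy.jpg") should come back True if every chunk was merged in the right order.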

Overall the scripts are very simple and straightforward.

If you have any questions, please leave a comment below.

Download Scripts
