Python for Data Recovery
What’s up, y’all!
In this post we’ll take a look at Python for data recovery.
We’ll be covering a script I made that reads through a disk (in this example, a USB drive) and checks for specific file signatures in an attempt to recover JPG images (.jpg) from that disk.
If you haven’t seen my previous post about data recovery using a hex editor, I highly recommend it since a lot of the knowledge from that post will be automated using this script.
I’ll also get into how you can use this script to search for other types of files later on… 😉
Let’s get into the tutorial!
Python Installation
Make sure you have Python 3 installed on your system.
We’ll only be using Python’s built-in functions, so no additional libraries are needed.
Hex Editor
I recommend installing a hex editor to check up on the results of the script manually at first.
That way you can verify that you are getting the results that you want from the script.
I would go with the HxD hex editor on Windows and the Bless hex editor on Linux.
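If you want a quick check without leaving Python, a small hex dump helper can show the first bytes of a recovered file in roughly the layout a hex editor uses. This is only a sketch: `hex_dump` is a name I made up for illustration and is not part of the recovery script.

```python
def hex_dump(data, width=16):
    """Return a hex-editor-style dump of 'data' as a string."""
    pad = width * 3 - 1                      # Width of a full row of hex pairs
    lines = []
    for start in range(0, len(data), width):
        row = data[start:start + width]
        hex_part = ' '.join(f'{b:02X}' for b in row)
        # Show printable ASCII, and a dot for everything else
        text_part = ''.join(chr(b) if 32 <= b < 127 else '.' for b in row)
        lines.append(f'{start:08X}  {hex_part.ljust(pad)}  {text_part}')
    return '\n'.join(lines)


# Example: the JPG header bytes the script searches for
sample = b'\xff\xd8\xff\xe0\x00\x10\x4a\x46'
print(hex_dump(sample))
```

To inspect a recovered file, you could read its first few bytes in "rb" mode and pass them to the helper.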
Data Recovery using Python
Once you have Python installed, we’re ready to get into the script.
Download Data Recovery Script
Note that some changes will have to be made, such as assigning the correct drive letter to open (on Windows) or the proper /dev/sdX device path on Linux. Reading a raw disk usually also requires running the script as administrator or root. I’m assuming you know this in advance.
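As a rough illustration of the two path styles, here is a minimal sketch that builds the raw path from a drive letter or device name. The helper name `raw_device_path` and the example names `X` and `sdb` are my own placeholders, so swap in whatever matches your system.

```python
import sys


def raw_device_path(name, platform=sys.platform):
    """Build a raw device path from a drive letter (Windows) or device name (Linux)."""
    if platform.startswith('win'):
        # Windows: \\.\X: opens volume X: as raw bytes
        return '\\\\.\\' + name.rstrip(':\\') + ':'
    # Linux: block devices live under /dev, for example /dev/sdb
    return '/dev/' + name


print(raw_device_path('X', platform='win32'))
print(raw_device_path('sdb', platform='linux'))
```

The resulting string is what you would assign to the drive variable before opening it in "rb" mode.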
For now, we’ll cover data recovery for JPG files specifically:
drive = "\\\\.\\X:"     # Raw path to the drive (replace X: with your drive letter)
fileD = open(drive, "rb")  # Open drive as raw bytes
size = 512              # Size of bytes to read
byte = fileD.read(size) # Read 'size' bytes
offs = 0                # Offset location (number of chunks read so far)
drec = False            # Recovery mode
rcvd = 0                # Recovered file ID

while byte:
    found = byte.find(b'\xff\xd8\xff\xe0\x00\x10\x4a\x46')
    if found >= 0:
        drec = True
        print('==== Found JPG at location: ' + str(hex(found+(size*offs))) + ' ====')
        # Now let's create the recovered file and search for the ending signature
        fileN = open(str(rcvd) + '.jpg', "wb")
        fileN.write(byte[found:])
        while drec:
            byte = fileD.read(size)
            bfind = byte.find(b'\xff\xd9')
            if bfind >= 0:
                fileN.write(byte[:bfind+2])
                fileD.seek((offs+1)*size)
                print('==== Wrote JPG to location: ' + str(rcvd) + '.jpg ====\n')
                drec = False
                rcvd += 1
                fileN.close()
            elif not byte:
                # Reached the end of the disk without a trailer; stop instead of looping forever
                drec = False
                fileN.close()
            else:
                fileN.write(byte)
    byte = fileD.read(size)
    offs += 1
fileD.close()
- We start by opening the disk (replace the X: letter with the letter of the desired disk)
- Then we read the first 512 bytes (this chunk size can be increased, for example to a larger multiple of 512, so the scan needs fewer reads)
- Next we define some preliminary variables:
  - Offset location: counts the chunks read so far; multiplied by the chunk size, it gives the hexadecimal location on disk
  - Data recovery mode: triggers once we match a file signature
  - Recovered file ID: used for naming the recovered files
- Now we get into the main loop of the script (reading chunks of bytes)
- We’ll scan the read bytes for the specific file signature (in this case, JPG)
- Once we locate the file signature in the read bytes, we go into data recovery mode:
  - Print out a message notifying the user that a file was found at that hex location.
  - Create a new file and open it in write binary mode.
  - Then we’ll keep reading chunks of bytes, writing each one to the new file, until we reach the end-of-file signature (trailer).
  - Once we do, we write everything up to and including the trailer, set data recovery mode to off, and increment the filename ID.
- Keep reading bytes until we locate the next file signature that matches our criteria.
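The steps above can be sketched as a function that works on any binary file-like object, which makes the logic easy to try on a fake "disk" in memory before pointing it at a real drive. This is my own restructuring, not the original script: the names `carve`, `HEADER` and `TRAILER` are made up for illustration, and it skips a file if no trailer is found.

```python
import io

HEADER = b'\xff\xd8\xff\xe0\x00\x10\x4a\x46'   # JPG header used by the script
TRAILER = b'\xff\xd9'                          # JPG trailer


def carve(stream, size=512):
    """Yield (offset, data) for each header-to-trailer span found in 'stream'."""
    offs = 0                                   # Number of chunks read so far
    chunk = stream.read(size)
    while chunk:
        found = chunk.find(HEADER)
        if found >= 0:
            start = found + size * offs        # Absolute byte offset of the header
            data = bytearray(chunk[found:])
            complete = False
            while not complete:
                chunk = stream.read(size)
                if not chunk:                  # End of input, no trailer found
                    break
                end = chunk.find(TRAILER)
                if end >= 0:
                    data += chunk[:end + len(TRAILER)]
                    complete = True
                else:
                    data += chunk
            if complete:
                yield start, bytes(data)
            stream.seek((offs + 1) * size)     # Resume scanning after the header chunk
        chunk = stream.read(size)
        offs += 1


# Fake disk: padding, one "image", more padding
disk = b'\x00' * 600 + HEADER + b'abc' * 200 + TRAILER + b'\x00' * 100
for offset, data in carve(io.BytesIO(disk)):
    print(hex(offset), len(data))
```

Like the script, this sketch only checks one chunk at a time, so a signature that straddles two chunks would be missed.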
Recovering Other File Types
If you’re interested in recovering other types of files, you will have to know the corresponding file signature. That is, the magic bytes that identify the start of that specific type of file.
Take a look at this Wikipedia page that contains file signatures: List of file signatures
Alternatively, you can simply open a file of the desired type in a hex editor and check the first few bytes. Once you’ve done that, modify the script in the following line of code:
found = byte.find(b'\xff\xd8\xff\xe0\x00\x10\x4a\x46')
You’ll want to change the following characters: FF D8 FF E0 00 10 4A 46
Make sure to keep the \x in front of each pair of characters, as that is what marks a hexadecimal byte in Python.
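If typing the \x escapes by hand feels error-prone, Python’s built-in bytes.fromhex can turn the spaced hex you see in a hex editor straight into a bytes value. A minimal sketch follows; the two helper names are mine and purely illustrative.

```python
def signature_from_hex(text):
    """Convert a hex-editor style string like 'FF D8 FF E0' into bytes."""
    return bytes.fromhex(text)


def hex_from_signature(data):
    """Format bytes as spaced, upper-case hex pairs for comparison with a hex editor."""
    return ' '.join(f'{b:02X}' for b in data)


header = signature_from_hex('FF D8 FF E0 00 10 4A 46')
# Same value as the literal used in the script
print(header == b'\xff\xd8\xff\xe0\x00\x10\x4a\x46')
print(hex_from_signature(header))
```

You could then pass the resulting value to find() instead of the hand-written literal.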
You’ll also have to modify the trailer (end of file signature) to match the specific file type as well.
For that, change the following line of code:
bfind = byte.find(b'\xff\xd9')
For JPGs, the file finishes with: FF D9
So you’ll have to replace those bytes with the trailer for whatever file type you are looking for.
Note that find() returns the position where the trailer starts, and since FF D9 is two bytes, I manually add 2 in this line of code shortly after so the trailer itself gets written too:
fileN.write(byte[:bfind+2])
The +2 is a manual addition, so if your trailer is longer, change the +2 accordingly.
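One way to avoid adjusting that number by hand is to use the length of the trailer itself. Here is a minimal sketch of that idea; `cut_at_trailer` is a hypothetical helper, and the PNG trailer bytes (49 45 4E 44 AE 42 60 82) are included only as an example of a longer trailer.

```python
JPG_TRAILER = b'\xff\xd9'
PNG_TRAILER = bytes.fromhex('49 45 4E 44 AE 42 60 82')   # Example of an 8-byte trailer


def cut_at_trailer(chunk, trailer):
    """Return 'chunk' up to and including 'trailer', or None if it is absent."""
    bfind = chunk.find(trailer)
    if bfind < 0:
        return None
    # len(trailer) replaces the hard-coded +2
    return chunk[:bfind + len(trailer)]


print(cut_at_trailer(b'image data\xff\xd9leftover', JPG_TRAILER))
print(cut_at_trailer(b'image data' + PNG_TRAILER + b'leftover', PNG_TRAILER))
```

In the script, the equivalent change would be to store the trailer in a variable and add its length instead of 2.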
I have plans to expand this script into searching for multiple file signatures at once, but for now that’ll be it for this tutorial. If you have any questions, leave a comment below! See ya next time. 🙂
It was a very useful and interesting article
But one question I still have is whether the software recovers the original names of the files and their directories. And what is the difference between normal and deep scanning?
Also, thank you again for your very informative (and of course unique) article. I think it would be great if you also added to your list of articles and tutorials Python scripts to defrag a hard disk, view devices connected to Wi-Fi, manipulate RAM data, and convert or compress multimedia file formats. Thanks