Advanced Zip Carver

Carving Zip files is generally very effective, but there are a few limitations which our tools cannot handle:

  • We only support verification of "Deflated" and "Stored" files. If fragmentation occurs within an archived file that uses any other method, we cannot find the exact point of fragmentation, so that particular file in the archive will not be recovered correctly.
  • We do not currently implement an exhaustive search for higher-order fragmentation. We assume only first-order fragmentation and attempt to disambiguate the identified points. In practice this means that we can only carve archives with low levels of fragmentation, and/or archives containing many small files (which increases the frequency of identified points).
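
To illustrate what verification of "Stored" and "Deflated" members involves, here is a minimal sketch. It is not PyFlag's code: the function name verify_member is hypothetical. It assumes the expected length and CRC-32 are already known (in a Zip archive they are recorded in the file headers), and it relies only on the method codes 0 (Stored) and 8 (Deflated) defined by the Zip format and on Python's standard zlib module.

```python
import zlib

# Compression method codes defined by the Zip format.
STORED, DEFLATED = 0, 8


def verify_member(raw, method, expected_size, expected_crc):
    """Return True if `raw` is a consistent archived file body.

    Hypothetical helper: `raw` holds the bytes carved for one archived
    file; the expected size and CRC-32 come from the Zip headers.
    """
    if method == STORED:
        data = raw
    elif method == DEFLATED:
        # Zip members are raw Deflate streams (no zlib header), hence -15.
        d = zlib.decompressobj(-15)
        try:
            data = d.decompress(raw) + d.flush()
        except zlib.error:
            return False  # corrupt stream, e.g. a fragment in the wrong place
        if not d.eof:
            return False  # stream ended before the final Deflate block
    else:
        # Any other method cannot be verified by this sketch.
        raise ValueError("unsupported compression method: %r" % method)
    return len(data) == expected_size and zlib.crc32(data) == expected_crc
```

A failed check says only that the carved bytes are wrong somewhere; locating the fragmentation point requires retrying with different candidate layouts.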

Example

The following is an example of how to use the Zip carver. First obtain the latest development version of PyFlag:

$ darcs get http://www.pyflag.net/pyflag/
$ cd pyflag
$ ./configure
$ make
$ sudo make install

The advanced carvers are only available as standalone tools at this stage and are not incorporated into the GUI.

We now use the carvers to solve the DFRWS 2006 challenge. Carving for Zips requires going through a number of steps. The first step is to index the image for Zip artifacts. The following will create the index file zip_test.idx:

$ python zip_carver.py  -c -i zip_test.idx dfrws-2006-challenge.img 
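
As a rough idea of what indexing for Zip artifacts means, the sketch below scans a buffer for the three standard Zip record signatures. It is an illustration only: the function name scan_zip_signatures is hypothetical, and neither the scanning strategy nor the format of the .idx file written by zip_carver.py is shown here.

```python
import re

# Record signatures defined by the Zip format.
SIGNATURES = {
    b"PK\x03\x04": "local file header",
    b"PK\x01\x02": "central directory file header",
    b"PK\x05\x06": "end of central directory record",
}
_PATTERN = re.compile(b"PK(?:\x03\x04|\x01\x02|\x05\x06)")


def scan_zip_signatures(data):
    """Return (offset, record type) for every Zip signature in `data`.

    Hypothetical helper; a hit is only a candidate, because the same
    four bytes can also occur by chance inside unrelated data.
    """
    return [(m.start(), SIGNATURES[m.group()]) for m in _PATTERN.finditer(data)]
```

For a real image the data would be read in chunks (with a small overlap so that signatures spanning a chunk boundary are not missed) rather than loaded into memory at once.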

Next we need to create the map files. These map files are the initial guess for the mapping functions of each Zip found in the image. Each map file corresponds to a single Central Directory table:

$ python zip_carver.py  -m -i zip_test.idx dfrws-2006-challenge.img
$ ls *.map
14707896.map  16060875.map  23319375.map
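
A mapping function can be thought of as a piecewise-linear function from offsets in the carved file to offsets in the image. The sketch below is one way to model it, not the on-disk .map format: the names and the list-of-pairs representation are hypothetical. The two runs in the usage note are derived from the forward and reverse offsets printed in the forcing output later on this page; treating file offset 0 as the start of the first run is an assumption made for illustration.

```python
import bisect


def image_offset(points, file_offset):
    """Map an offset in the carved file to an offset in the image.

    `points` is a sorted list of (file_offset, image_offset) pairs; each
    pair starts a contiguous run, so within a run the image offset grows
    one for one with the file offset. A new pair marks a discontinuity.
    """
    starts = [f for f, _ in points]
    i = bisect.bisect_right(starts, file_offset) - 1
    if i < 0:
        raise ValueError("offset precedes the first mapped point")
    run_file_start, run_image_start = points[i]
    return run_image_start + (file_offset - run_file_start)
```

With points = [(0, 14709248), (409600, 15306752)], image_offset(points, 408064) gives 15117312 (still in the first run) and image_offset(points, 409600) gives 15306752 (the first byte after the discontinuity).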

Each of these maps can be viewed graphically (this requires gnuplot to be installed), for example:

$ python zip_carver.py  -p -M 16060875.map

We can see the number and the nature of the discontinuities, although their exact positions are inaccurate because we have not tested this file yet. For Zip files this map is usually sufficient to open the file with a zip program despite the errors, because most of the archived files are present and the general structure of the file is correct. Only those archived files which are stored in the region of a discontinuity are corrupted; other files in the archive should be recoverable.

If for some reason we are unable to verify this file, we can just extract it using the initial map (this is the case with the DFRWS 2007 challenge, which contains encrypted Zip files that cannot be verified):

$ python zip_carver.py -M 16060875.map -e output.zip dfrws-2006-challenge.img

To work out exactly where the discontinuities are, we need to force the file. In this example we also ask for the output map to be saved in "output.map":

$ python zip_carver.py -M 16060875.map -f output.zip -F output.map dfrws-2006-challenge.img

...
Ambiguous point found at offset 408064: forward=15117312 vs reverse=15305216...
Total decompressed data: 513037 (513027)
Error occured after parsing 488584 bytes (Decompressed file does not have the expected length)
Errors detected, point removed.(Decompressed file does not have the expected length)
Ambiguous point found at offset 408576: forward=15117824 vs reverse=15305728...
Total decompressed data: 513032 (513027)
Error occured after parsing 488584 bytes (Decompressed file does not have the expected length)
Errors detected, point removed.(Decompressed file does not have the expected length)
Ambiguous point found at offset 409088: forward=15118336 vs reverse=15306240...
Total decompressed data: 513029 (513027)
Error occured after parsing 488584 bytes (Decompressed file does not have the expected length)
Errors detected, point removed.(Decompressed file does not have the expected length)
Ambiguous point found at offset 409600: forward=15118848 vs reverse=15306752...
Match found at offset 409600
Extracting into file output.zip
Saving map in output.map

As can be seen, the discontinuity was moved from its original position to a more accurate one (offset 409600 in this run).
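
The search visible in the output above can be sketched as follows, under stated assumptions. The log suggests that each candidate point lies one 512-byte sector after the previous one, that "forward" is the image offset obtained by continuing the run before the discontinuity, and that "reverse" is the image offset obtained by extending the run after it backwards; that reading is an interpretation of the log, not a description of PyFlag's implementation. The function find_break and its verify callback are hypothetical names.

```python
SECTOR = 512  # spacing between candidate points, as seen in the log


def find_break(image, fwd_start, rev_end, length, verify):
    """Find where a single discontinuity splits one stretch of a file.

    image     -- bytes-like view of the disk image
    fwd_start -- image offset of the first byte (forward mapping)
    rev_end   -- image offset just past the last byte (reverse mapping)
    length    -- number of file bytes in the stretch
    verify    -- callable taking the reassembled bytes, returning bool

    Returns the file-relative offset of the break, or None if no
    candidate verifies. Assumes first-order fragmentation only.
    """
    # Where the stretch would start if the reverse mapping held throughout.
    rev_start = rev_end - length
    # Only try break points that fall on a sector boundary in the image.
    first = (-fwd_start) % SECTOR
    for brk in range(first, length + 1, SECTOR):
        head = image[fwd_start:fwd_start + brk]  # read via forward mapping
        tail = image[rev_start + brk:rev_end]    # read via reverse mapping
        if verify(bytes(head) + bytes(tail)):
            return brk
    return None
```

For a Deflated member, verify would decompress the candidate and compare the decompressed length and CRC-32 with the values in the Zip headers, which corresponds to the "Decompressed file does not have the expected length" rejections in the log. Because every candidate costs one verification, the search is linear in the number of sectors between the two known points.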