pytsk3 2013-12-30

Python bindings for the SleuthKit

This is an unofficial index for the pytsk3 Python module.

#summary The Sleuthkit Python Bindings

= Introduction =

The Sleuthkit is a complete filesystem analysis tool. In the past,
PyFlag shipped a Python binding for a statically compiled version
which was incorporated in the PyFlag source tree (Version 2.78). That
version is now very old and does not support HFS+, which Sleuthkit 3.1
does. At the time there were some important functions that we needed
to link to, but the old libtsk (the shared object produced by older
Sleuthkit builds) did not export them, which is why a slightly
modified version was incorporated in the source tree.

These days things are much better - libtsk3 is designed to be a
general purpose library with many useful functions linked in. The
overall architecture has been tremendously improved and it is now very
easy to use it from an external program.

This is a python binding against the libtsk3 shared object. Our aim is
to make the binding reflect the TSK API as much as possible in
capabilities, while at the same time having a nice pythonic OO
interface:

http://www.sleuthkit.org/sleuthkit/docs/api-docs/index.html

The new binding just links to libtsk3 which should make it easier to
maintain against newer versions. We should be able to rewrite all the
sleuthkit tools in python (using the library and bindings) as a
demonstration of what is possible with the new bindings. This page
documents how to use the binding from a practical point of view - we
want to show examples of how to do some common tasks. There are lots
of sample programs in the samples directory to demonstrate how these
bindings can be used.

To build the bindings, run the standard Python distutils commands at
the top level of the source tree:

{{{
python setup.py build
python setup.py install
}}}

The python binding is autogenerated from the libtsk3 header files
using a small OO C shim. This means that most of the fields in many of
the structs are already available. We aim to provide most of the
functionality using this shim (e.g. traversing and iterating over
lists etc). The authoritative source of documentation is the library
API linked above.

= Basics =

== Listing all the files in a directory ==

The first task is to list all the files in a directory and their
sizes:

{{{
import pytsk3

## Step 1: get an IMG_INFO object
img = pytsk3.Img_Info(url)

## Step 2: Open the filesystem
fs = pytsk3.FS_Info(img)

## Step 3: Open the directory node. The node is located by path or by
## inode, as specified.
directory = fs.open_dir(path=path, inode=inode)

## Step 4: Iterate over all files in the directory and print their
## size and name. What you get in each iteration is a proxy object for
## the TSK_FS_FILE struct - you can further dereference this struct
## into TSK_FS_NAME and TSK_FS_META structs.
for f in directory:
    print("%s %s" % (f.info.meta.size, f.info.name.name))
}}}

The specified url can be any URL that TSK understands. Note that TSK
automatically knows about EWF files and regular dd files. See the
section below on "Extending Img_Info" to support other image types.

The directory can be opened by either path or inode. If path is None
(or unspecified) we use the inode. An inode has to be an integer (all
the bound methods implement sanity checking and will raise an
exception if you provide arguments of the wrong type).

You can iterate over the directory to receive all the File objects
within it. Each File object is just a proxy for the
[http://www.sleuthkit.org/sleuthkit/docs/api-docs/structTSK__FS__FILE.html
TSK_FS_FILE] struct, which can be obtained through the "info"
member. Note that the TSK_FS_FILE struct contains links to
[http://www.sleuthkit.org/sleuthkit/docs/api-docs/structTSK__FS__META.html
TSK_FS_META] and
[http://www.sleuthkit.org/sleuthkit/docs/api-docs/structTSK__FS__NAME.html
TSK_FS_NAME] structs. We just pick specific members of these structs
to print.
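
As a rough sketch of the iteration described above, the helper below collects (size, name) pairs from anything shaped like an opened directory. The helper name `list_entries` and the stand-in objects are hypothetical and exist only so the sketch runs without an image. It also assumes that the `meta` member may be None for some entries, in which case the size is reported as None.

```python
from types import SimpleNamespace


def list_entries(directory):
    """Return (size, name) pairs for every entry in an opened directory.

    `directory` is any iterable yielding objects shaped like the File
    proxies described above (an `info` member with `name` and `meta`).
    """
    entries = []
    for f in directory:
        name = f.info.name.name
        meta = f.info.meta
        # Assumption: meta can be None when no metadata is available
        size = meta.size if meta is not None else None
        entries.append((size, name))
    return entries


# Hypothetical stand-in objects, used instead of a real directory
fake_directory = [
    SimpleNamespace(info=SimpleNamespace(
        name=SimpleNamespace(name="hello.txt"),
        meta=SimpleNamespace(size=12))),
    SimpleNamespace(info=SimpleNamespace(
        name=SimpleNamespace(name="deleted.tmp"),
        meta=None)),
]

for size, name in list_entries(fake_directory):
    print("%s %s" % (size, name))
```

With a real image, the same helper would be called on the object returned by fs.open_dir.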

== Reading a file ==

Now we want to read a file out and write it to stdout (basically the
same as icat).

{{{
import sys

import pytsk3

## Step 1: get an IMG_INFO object
img = pytsk3.Img_Info(url)

## Step 2: Open the filesystem
fs = pytsk3.FS_Info(img)

## Step 3: Open the file using the inode
f = fs.open_meta(inode=inode)

## Step 4: Read all the data and write it to stdout
offset = 0
size = f.info.meta.size
BUFF_SIZE = 1024 * 1024

## Write raw bytes: print would append a newline after every chunk
out = getattr(sys.stdout, "buffer", sys.stdout)

while offset < size:
    available_to_read = min(BUFF_SIZE, size - offset)
    data = f.read_random(offset, available_to_read)
    if not data:
        break

    offset += len(data)
    out.write(data)
}}}

Note that we go to some length to avoid reading the slack here: each
read is capped at the remaining file size. This works around an early
bug in TSK which should be fixed by now.
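
The read loop can also be packaged as a generator, which keeps the slack-avoidance logic in one place. This is only a sketch: `iter_file_chunks` and `FakeFile` are hypothetical names, and the only thing assumed about the file object is the read_random(offset, length) method used above.

```python
import io


def iter_file_chunks(f, size, chunk_size=1024 * 1024):
    """Yield chunks of at most chunk_size bytes, never reading past size."""
    offset = 0
    while offset < size:
        data = f.read_random(offset, min(chunk_size, size - offset))
        if not data:
            break
        offset += len(data)
        yield data


class FakeFile(object):
    """Hypothetical stand-in exposing read_random(offset, length)."""

    def __init__(self, payload):
        self._payload = payload

    def read_random(self, offset, length):
        return self._payload[offset:offset + length]


# Ten bytes of content followed by bytes that play the role of slack
fake = FakeFile(b"0123456789" + b"SLACK")
out = io.BytesIO()
for chunk in iter_file_chunks(fake, size=10, chunk_size=4):
    out.write(chunk)
```

With a real file object this would be called as iter_file_chunks(f, f.info.meta.size).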

== List all the blocks allocated for a file ==

We want to list all the blocks that a file allocates (kind of like
istat).

{{{
import pytsk3

## Step 1: get an IMG_INFO object
img = pytsk3.Img_Info(url)

## Step 2: Open the filesystem
fs = pytsk3.FS_Info(img)

## Step 3: Open the file using the inode
f = fs.open_meta(inode=inode)

## Step 4: List all blocks allocated by this file. Note that in some
## filesystems each file has several attributes and each can allocate
## multiple blocks. So we really need to iterate over all attributes
## of each file:
for attr in f:
    print("Attribute %s, type %s, id %s" % (attr.info.name,
                                             attr.info.type,
                                             attr.info.id))
    for run in attr:
        print("   Blocks %s to %s (%s blocks)" % (run.addr, run.addr + run.len, run.len))
}}}

Example output:
{{{
Attribute N/A, type TSK_FS_ATTR_TYPE_NTFS_SI, id 0
Attribute N/A, type TSK_FS_ATTR_TYPE_NTFS_FNAME, id 3
Attribute N/A, type TSK_FS_ATTR_TYPE_NTFS_FNAME, id 2
Attribute $Data, type TSK_FS_ATTR_TYPE_NTFS_DATA, id 4
   Blocks 89471 to 89477 (6 blocks)
   Blocks 89487 to 89493 (6 blocks)
   Blocks 90023 to 90076 (53 blocks)
}}}
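
Note that in the output above the second number is run.addr + run.len, which is one past the last block of the run. The sketch below converts runs into inclusive ranges and totals the block count. The `Run` tuple and `describe_runs` are hypothetical names used for illustration; only the addr and len members come from the example above.

```python
from collections import namedtuple

# Hypothetical stand-in for a run; only addr and len are used here
Run = namedtuple("Run", ["addr", "len"])


def describe_runs(runs):
    """Return ([(first_block, last_block, count), ...], total_blocks)."""
    ranges = []
    total = 0
    for run in runs:
        # addr + len is one past the end, so the last block is one less
        ranges.append((run.addr, run.addr + run.len - 1, run.len))
        total += run.len
    return ranges, total


# The three runs of the $Data attribute from the example output
runs = [Run(89471, 6), Run(89487, 6), Run(90023, 53)]
ranges, total = describe_runs(runs)
for first, last, count in ranges:
    print("   Blocks %s to %s inclusive (%s blocks)" % (first, last, count))
print("   Total: %s blocks" % total)
```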

== Extending Img_Info ==

Sometimes we want to use image formats that are not available to TSK
natively. We have seen that in order to obtain the FS_Info object, we
must supply it with a valid Img_Info object. It is possible to extend
TSK's support for different image formats by creating a different
Img_Info object that TSK can use when opening a filesystem on it.

The python wrappers are fully extensible. For example, the following
implements an AFF4 image class:

{{{
import pyaff4
import pytsk3

## This is the AFF4 resolver we will use
oracle = pyaff4.Resolver()

class AFF4ImgInfo(pytsk3.Img_Info):
    def __init__(self, url):
        ## Open the image using the AFF4 library
        urn = pyaff4.RDFURN(url)
        self.fd = oracle.open(urn, 'r')
        if not self.fd:
            raise IOError("Unable to open %s" % url)

        ## Call the base class with an empty URL
        pytsk3.Img_Info.__init__(self, '')

    def get_size(self):
        """ This function returns the size of the image """
        return self.fd.size.value

    def read(self, off, length):
        """ This returns byte ranges from the image """
        self.fd.seek(off)
        return self.fd.read(length)

    def close(self):
        """ This is called when we want to close the image """
        self.fd.close()

## Step 1: get an IMG_INFO object (url can be any URL that AFF4 can
## handle)
img = AFF4ImgInfo(url)

## Step 2: Open the filesystem
fs = pytsk3.FS_Info(img, offset=options.offset)

...
}}}

As can be seen, an Img_Info subclass only needs to define the read and
get_size methods to be fully functional (the example also defines
close to release the underlying file). We then instantiate this
object and pass it to FS_Info, which automatically uses the Python
implementation to access the image.

In this way we can provide the sleuthkit with a virtualized image
format, allowing for multiple format support.
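
The same three methods can be backed by any seekable Python file-like object. The class below is a sketch of that idea and deliberately does not derive from pytsk3.Img_Info, so that it runs on its own. In real use it would subclass Img_Info and call the base class constructor as in the AFF4 example. The name `FileObjectImage` is hypothetical.

```python
import io


class FileObjectImage(object):
    """Sketch of read/get_size/close over a seekable file-like object.

    Assumption: a real implementation would derive from pytsk3.Img_Info
    and call its __init__ as shown in the AFF4 example.
    """

    def __init__(self, fileobj):
        self.fd = fileobj

    def get_size(self):
        """Return the size of the image in bytes."""
        position = self.fd.tell()
        self.fd.seek(0, io.SEEK_END)
        size = self.fd.tell()
        self.fd.seek(position)
        return size

    def read(self, off, length):
        """Return up to length bytes starting at offset off."""
        self.fd.seek(off)
        return self.fd.read(length)

    def close(self):
        """Release the underlying file object."""
        self.fd.close()


# An in-memory "image": 512 zero bytes followed by some payload
image = FileObjectImage(io.BytesIO(b"\x00" * 512 + b"filesystem bytes"))
print(image.get_size())
print(image.read(512, 10))
```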
 
|| File || Type || Py Version || Uploaded on || Size ||
|| pytsk3-2013-12-30.tgz (md5) || Source ||  || 2014-02-03 || 87KB ||