skip to navigation
skip to content

internetarchive 0.2.6

A python interface to

Latest Version: 1.7.7


You can install this module via pip:

pip install internetarchive


The Internet Archive stores data in items. You can query the archive using an item identifier:

>>> import internetarchive
>>> item = internetarchive.Item('stairs')
>>> print item.metadata

Items contains files. You can download the entire item:


or you can download just a particular file:

>>> f = item.file('glogo.png')
>>> #writes to disk

You can iterate over files:

>>> for f in item.files():
...     print, f.sha1

Uploading from Python

You can use the IA’s S3-like interface to upload files to an item. You need to supply your IAS3 credentials in environment variables in order to upload. You can retrieve S3 keys from

>>> import os
>>> os.environ['AWS_ACCESS_KEY_ID']='x'
>>> os.environ['AWS_SECRET_ACCESS_KEY']='y'
>>> item = internetarchive.Item('new_identifier')
>>> item.upload('/path/to/image.jpg', dict(mediatype='image', creator='Jake Johnson'))

Item-level metadata must be supplied with the first file uploaded to an item.

You can upload additional files to an existing item:

>>> item = internetarchive.Item('existing_identifier')
>>> item.upload(['/path/to/image2.jpg', '/path/to/image3.jpg'])

You can also upload file-like objects:

>>> import StringIO
>>> fh = StringIO.StringIO('hello world')
>>> = 'hello_world.txt
>>> item.upload(fh)

Uploading from the command-line

You can use the provided ia command-line tool to upload items:

$ export AWS_ACCESS_KEY_ID='xxx'
$ export AWS_SECRET_ACCESS_KEY='yyy'

$ ia upload new_identifier file1.txt file2.txt --metadata="title=foo" --metadata="blah=arg"

Modifying Metadata

You can modify metadata for existing items, using the item.modify_metadata() function. This uses the IA Metadata API under the hood and requires your IAS3 credentials.

>>> import os
>>> os.environ['AWS_ACCESS_KEY_ID']='x'
>>> os.environ['AWS_SECRET_ACCESS_KEY']='y'
>>> item = internetarchive.Item('my_identifier')
>>> md = dict(blah='one', foo=['two', 'three'])
>>> item.modify_metadata(md)

You can also use the provided ia command-line tool to modify metadata. Be sure that the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables are set.

$ ia metadata my_identifier --modify foo=bar baz=foooo


You can search for items using the advanced search engine:

>>> import internetarchive
>>> search = internetarchive.Search('collection:nasa')
>>> print search.num_found

You can iterate over your results:

>>> for result in search.results:
...     print result['identifier']

You can also search using the provided ia command-line script:

$ ia search 'collection:usenet'

A note about uploading items with mixed-case names

The Internet Archive allows mixed-case item identifiers, but Amazon S3 does not allow mixed-case bucket names. The internetarchive python module is built on top of the boto S3 module. boto disallows creation of mixed-case buckets, but allows you to download from existing mixed-case buckets. If you wish to upload a new item to the Internet Archive with a mixed-case item identifier, you will need to monkey-patch the boto.s3.connection.check_lowercase_bucketname function:

>>> import boto
>>> def check_lowercase_bucketname(n):
...     return True

>>> boto.s3.connection.check_lowercase_bucketname = check_lowercase_bucketname

>>> item = internetarchive.Item('TestUpload_pythonapi_20130812')
>>> item.upload('file.txt', dict(mediatype='texts', creator='Internet Archive'))
File Type Py Version Uploaded on Size
internetarchive-0.2.6.tar.gz (md5) Source 2013-08-29 23KB