internetarchive

A Python interface to archive.org.

These details have not been verified by PyPI

Project links

Homepage

GitHub Statistics

View statistics for this project via Libraries.io, or by using our public dataset on Google BigQuery

Project description

Installation

You can install this module via pip:

pip install internetarchive

Downloading

The Internet Archive stores data in items. You can query the archive using an item identifier:

>>> import internetarchive
>>> item = internetarchive.Item('stairs')
>>> print item.metadata

Items contain files. You can download the entire item:

>>> item.download()

or you can download just a particular file:

>>> f = item.file('glogo.png')
>>> f.download()  # writes to disk
>>> f.download('/foo/bar/some_other_name.png')  # writes to the given path

You can iterate over files:

>>> for f in item.files():
...     print f.name, f.sha1
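Since each file object exposes a sha1 attribute, a downloaded copy can be checked against it. The following is a minimal sketch using only the standard library. sha1_of_file is a hypothetical helper, not part of the internetarchive module, and the sketch assumes f.sha1 holds a lowercase hexadecimal digest.

```python
import hashlib


def sha1_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-1 digest of the file at `path` (hypothetical helper)."""
    digest = hashlib.sha1()
    with open(path, "rb") as fh:
        # Read in chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

After a download, comparing sha1_of_file(local_path) with f.sha1 would indicate whether the local copy matches, where local_path is whatever path the file was written to.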

Uploading from Python

You can use the Internet Archive's S3-like interface (IAS3) to upload files to an item. To upload, supply your IAS3 credentials in the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables. S3 keys are available at https://archive.org/account/s3.php and can be set as follows:

>>> import os
>>> os.environ['AWS_ACCESS_KEY_ID']='x'
>>> os.environ['AWS_SECRET_ACCESS_KEY']='y'
>>> item = internetarchive.Item('new_identifier')
>>> item.upload('/path/to/image.jpg', dict(mediatype='image', creator='Jake Johnson'))

Item-level metadata must be supplied with the first file uploaded to an item.
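Because uploads depend on the two environment variables shown above, it can help to check for them before calling item.upload(). This is a small sketch under that assumption. missing_credentials is a hypothetical helper and not part of the module.

```python
import os

# The two variable names used in the upload example above.
REQUIRED_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def missing_credentials(environ=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

An empty list means both variables are present. A non-empty list names what still needs to be exported.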

You can upload additional files to an existing item:

>>> item = internetarchive.Item('existing_identifier')
>>> item.upload(['/path/to/image2.jpg', '/path/to/image3.jpg'])

You can also upload file-like objects:

>>> import StringIO
>>> fh = StringIO.StringIO('hello world')
>>> fh.name = 'hello_world.txt'
>>> item.upload(fh)
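The StringIO module used above exists only in Python 2. On Python 3, a comparable named in-memory file can be built with io.BytesIO, as sketched below. Whether a given release of the module accepts such an object is an assumption and is not confirmed by this page.

```python
import io

# In-memory file-like object with a `name` attribute, mirroring the
# StringIO example above. Acceptance by item.upload() is an assumption.
fh = io.BytesIO(b"hello world")
fh.name = "hello_world.txt"
```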

Uploading from the command-line

You can use the provided ia command-line tool to upload files to an item:

$ export AWS_ACCESS_KEY_ID='xxx'
$ export AWS_SECRET_ACCESS_KEY='yyy'

$ ia upload new_identifier file1.txt file2.txt --metadata="title=foo" --metadata="blah=arg"

Modifying Metadata

You can modify the metadata of an existing item with the item.modify_metadata() method. It uses the IA Metadata API under the hood and requires your IAS3 credentials.

>>> import os
>>> os.environ['AWS_ACCESS_KEY_ID']='x'
>>> os.environ['AWS_SECRET_ACCESS_KEY']='y'
>>> item = internetarchive.Item('my_identifier')
>>> md = dict(blah='one', foo=['two', 'three'])
>>> item.modify_metadata(md)

You can also use the provided ia command-line tool to modify metadata. Be sure that the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables are set.

$ ia metadata my_identifier --modify foo=bar baz=foooo
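To relate the key=value pairs on the command line to the dict passed to item.modify_metadata(), here is an illustrative sketch. parse_pairs is a hypothetical helper and is not the ia tool's actual parser. In particular, collecting repeated keys into a list is an assumption modeled on the foo=['two', 'three'] example above.

```python
def parse_pairs(pairs):
    """Turn ['foo=bar', 'foo=baz'] into {'foo': ['bar', 'baz']} (illustrative only)."""
    metadata = {}
    for pair in pairs:
        # Split on the first '=' only, so values may themselves contain '='.
        key, sep, value = pair.partition("=")
        if not sep or not key:
            raise ValueError("expected key=value, got %r" % pair)
        if key in metadata:
            # Assumption: repeated keys accumulate into a list of values.
            existing = metadata[key]
            if isinstance(existing, list):
                existing.append(value)
            else:
                metadata[key] = [existing, value]
        else:
            metadata[key] = value
    return metadata
```

Under these assumptions, parse_pairs(['foo=bar', 'baz=foooo']) produces a dict of the same shape as md in the Python example.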

Searching

You can search for items using the archive.org advanced search engine:

>>> import internetarchive
>>> search = internetarchive.Search('collection:nasa')
>>> print search.num_found
186911

You can iterate over your results:

>>> for result in search.results:
...     print result['identifier']

You can also search using the provided ia command-line script:

$ ia search 'collection:usenet'
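For readers who want to see what a query against the advanced search engine looks like at the HTTP level, the sketch below builds a request URL for archive.org's advancedsearch.php endpoint without sending it. It is written for Python 3. build_search_url is a hypothetical helper, and whether the Search class issues exactly these parameters is an assumption.

```python
from urllib.parse import urlencode

ADVANCED_SEARCH_URL = "https://archive.org/advancedsearch.php"


def build_search_url(query, fields=("identifier",), rows=50, page=1):
    """Build (but do not fetch) an advanced search URL returning JSON."""
    params = [("q", query)]
    # Each requested field is passed as a repeated fl[] parameter.
    params += [("fl[]", field) for field in fields]
    params += [("rows", rows), ("page", page), ("output", "json")]
    return ADVANCED_SEARCH_URL + "?" + urlencode(params)
```

Calling build_search_url('collection:nasa') yields a URL that can be opened in a browser to inspect the raw response.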

A note about uploading items with mixed-case names

The Internet Archive allows mixed-case item identifiers, but Amazon S3 does not allow mixed-case bucket names. The internetarchive Python module is built on top of the boto S3 module, which refuses to create mixed-case buckets but can download from existing ones. To upload a new item to the Internet Archive with a mixed-case identifier, you will need to monkey-patch the boto.s3.connection.check_lowercase_bucketname function:

>>> import boto.s3.connection
>>> def check_lowercase_bucketname(n):
...     return True

>>> boto.s3.connection.check_lowercase_bucketname = check_lowercase_bucketname

>>> item = internetarchive.Item('TestUpload_pythonapi_20130812')
>>> item.upload('file.txt', dict(mediatype='texts', creator='Internet Archive'))
True
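The monkey-patch above stays in effect for the rest of the process. If that is undesirable, the replacement can be scoped to a block and undone afterwards. The sketch below shows one way to do this with a context manager. patched_attribute is a hypothetical helper, and a simplified stand-in object is used in place of boto.s3.connection so the example runs without boto installed.

```python
import contextlib
import types


@contextlib.contextmanager
def patched_attribute(target, name, replacement):
    """Temporarily set target.<name> to replacement, restoring it on exit."""
    original = getattr(target, name)
    setattr(target, name, replacement)
    try:
        yield
    finally:
        setattr(target, name, original)


# Simplified stand-in for boto.s3.connection; the real function's exact
# behavior is not reproduced here.
fake_connection = types.SimpleNamespace(
    check_lowercase_bucketname=lambda n: n == n.lower()
)

with patched_attribute(fake_connection, "check_lowercase_bucketname", lambda n: True):
    inside = fake_connection.check_lowercase_bucketname("TestUpload")
outside = fake_connection.check_lowercase_bucketname("TestUpload")
```

With the real module, the upload call would go inside the with block, with boto.s3.connection passed as the target.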

Release history

4.1.0

May 7, 2024

4.0.1

Apr 15, 2024

3.7.0

Mar 19, 2024

3.6.0

Dec 27, 2023

3.5.0

May 9, 2023

3.4.0

Apr 5, 2023

3.3.0

Jan 30, 2023

3.2.0

Jan 6, 2023

3.1.0

Jan 6, 2023

3.0.2

Jun 15, 2022

3.0.1

Jun 2, 2022

3.0.0

Mar 17, 2022

2.3.0

Jan 20, 2022

2.2.0

Nov 23, 2021

2.1.0

Aug 27, 2021

2.0.3

May 3, 2021

2.0.2

Apr 6, 2021

2.0.0

Apr 5, 2021

1.9.9

Jan 27, 2021

1.9.6

Nov 10, 2020

1.9.5

Sep 18, 2020

1.9.4

Jun 24, 2020

1.9.3

Apr 7, 2020

1.9.2

Mar 15, 2020

1.9.0

Dec 5, 2019

1.8.5

Jun 7, 2019

1.8.4

Apr 11, 2019

1.8.3

Mar 29, 2019

1.8.2

Mar 28, 2019

1.8.1

Jun 28, 2018

1.8.0

Jun 28, 2018

1.7.7

Mar 5, 2018

1.7.6

Jan 5, 2018

1.7.5

Dec 7, 2017

1.7.4

Nov 6, 2017

1.7.3

Sep 21, 2017

1.7.2

Sep 11, 2017

1.7.1

Jul 25, 2017

1.7.0

Jul 25, 2017

1.6.0

Jun 27, 2017

1.5.0

Feb 17, 2017

1.4.0

Jan 26, 2017

1.3.0

Jan 26, 2017

1.2.0

Jan 26, 2017

1.1.0

Nov 18, 2016

1.0.10

Sep 20, 2016

1.0.9

Aug 16, 2016

1.0.8

Aug 10, 2016

1.0.7

Aug 3, 2016

1.0.6

Jul 14, 2016

1.0.5

Jul 7, 2016

1.0.4

Jun 28, 2016

1.0.3

May 17, 2016

1.0.2

Mar 8, 2016

1.0.1

Mar 4, 2016

1.0.0

Mar 1, 2016

0.9.8

Nov 9, 2015

0.9.7

Nov 5, 2015

0.9.6

Oct 12, 2015

0.9.5

Oct 12, 2015

0.9.3

Sep 28, 2015

0.9.2

Aug 17, 2015

0.9.1

Aug 13, 2015

0.9.0

Aug 13, 2015

0.8.9

Aug 13, 2015

0.8.5

Jul 23, 2015

0.8.4

Jun 18, 2015

0.8.3

May 18, 2015

0.8.2

May 12, 2015

0.8.1

Mar 17, 2015

0.8.0

Mar 9, 2015

0.7.9

Jan 26, 2015

0.7.8

Dec 23, 2014

0.7.7

Dec 18, 2014

0.7.6

Dec 17, 2014

0.7.5

Oct 8, 2014

0.7.4

Oct 8, 2014

0.7.3

Oct 8, 2014

0.7.2

Sep 16, 2014

0.7.1

Aug 25, 2014

0.7.0

Jul 23, 2014

0.6.9

Jul 15, 2014

0.6.8

Jul 11, 2014

0.6.6

Jun 6, 2014

0.6.5

May 29, 2014

0.6.3

May 22, 2014

0.6.2

May 16, 2014

0.6.1

May 15, 2014

0.6.0

May 15, 2014

0.5.9

May 15, 2014

0.5.7

Apr 18, 2014

0.5.5

Apr 14, 2014

0.5.4

Apr 3, 2014

0.5.2

Mar 25, 2014

0.5.1

Jan 31, 2014

0.5.0

Jan 25, 2014

0.4.9

Dec 12, 2013

0.4.8

Nov 26, 2013

0.4.7

Nov 16, 2013

0.4.6

Nov 11, 2013

0.4.5

Nov 11, 2013

0.4.4

Oct 21, 2013

0.4.3

Oct 9, 2013

0.4.2

Oct 9, 2013

0.4.1

Oct 9, 2013

0.4.0

Oct 8, 2013

0.3.9

Oct 7, 2013

0.3.8

Oct 2, 2013

0.3.7

Sep 27, 2013

0.3.6

Sep 25, 2013

0.3.4

Sep 23, 2013

0.3.3

Sep 22, 2013

0.3.2

Sep 17, 2013

0.3.1

Sep 17, 2013

0.3.0

Sep 17, 2013

0.2.9

Sep 16, 2013

0.2.8

Sep 10, 2013

0.2.7

Aug 29, 2013

This version

0.2.6

Aug 29, 2013

0.2.5

Aug 22, 2013

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

internetarchive-0.2.6.tar.gz (24.1 kB)

Uploaded Aug 29, 2013 Source

Hashes for internetarchive-0.2.6.tar.gz

Algorithm	Hash digest
SHA256	`82349c87cec6cf540cbd8edf26473e975f1f9e159bf5abce202ffd55f287ceb2`
MD5	`ad387b8a28d67220315f259b721db068`
BLAKE2b-256	`25318e25dbf43034203d5e8053d6d8336b0f44a600ae34d0e6b1be5b6dedb179`