internetarchive

A python interface to archive.org.

Development Status
- 4 - Beta
Intended Audience
- Developers
License
- OSI Approved :: GNU Affero General Public License v3
Natural Language
- English
Project description

A python interface to archive.org

This package installs a CLI tool named ia for using archive.org from the command-line. It also installs the internetarchive Python module for programmatic access to archive.org. Please report all bugs and issues on GitHub.

Installation

You can install this module via pip:

pip install internetarchive

Alternatively, you can install a few extra dependencies to help speed things up a bit:

pip install "internetarchive[speedups]"

This will install ujson for faster JSON parsing, and gevent for concurrent downloads.
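
To check whether the optional speedup packages ended up in the current environment, something like the following could be used. This is a minimal sketch that relies only on the standard library; the helper name available_speedups is hypothetical and not part of internetarchive.

```python
import importlib.util

# Optional packages named in the text above.
SPEEDUP_MODULES = ("ujson", "gevent")


def available_speedups(modules=SPEEDUP_MODULES):
    """Map each module name to True if it can be found in this environment."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    for name, found in available_speedups().items():
        print(name, "installed" if found else "missing")
```

find_spec only looks the module up; it does not import it, so the check is cheap to run.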

If you want to install this module globally on your system instead of inside a virtualenv, use sudo:

sudo pip install internetarchive

Configuring

You can configure both the ia command-line tool and the Python interface from the command-line:

$ ia configure

You will be prompted to enter your Archive.org login credentials. If authorization is successful, a config file containing your Archive.org S3 keys will be saved on your computer. These keys are used for uploading and modifying metadata.

Command-Line Usage

Help is available by typing ia --help. You can also get help on a command: ia <command> --help. Available subcommands are configure, metadata, upload, download, search, delete, list, and catalog.

Downloading

To download the entire TripDown1905 item:

$ ia download TripDown1905

ia download usage examples:

#download just the mp4 files using ``--glob``
$ ia download TripDown1905 --glob='*.mp4'

#download all the mp4 files using ``--format``:
$ ia download TripDown1905 --format='512Kb MPEG4'

#download multiple formats from an item:
$ ia download TripDown1905 --format='512Kb MPEG4' --format='Ogg Video'

#list all the formats in an item:
$ ia metadata --formats TripDown1905

#download a single file from an item:
$ ia download TripDown1905 TripDown1905_512kb.mp4

#download multiple files from an item:
$ ia download TripDown1905 TripDown1905_512kb.mp4 TripDown1905.ogv
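
To make the effect of a --glob pattern concrete, here is a small Python sketch of shell-style matching over a list of file names. It uses the standard library's fnmatch module and is only an illustration of glob semantics; how ia itself implements --glob is not described here. The helper name select_by_glob and the third file name are made up for the example.

```python
from fnmatch import fnmatchcase


def select_by_glob(file_names, pattern):
    """Return the file names that match a shell-style glob pattern."""
    return [name for name in file_names if fnmatchcase(name, pattern)]


# The first two names come from the examples above; the third is hypothetical.
names = ["TripDown1905_512kb.mp4", "TripDown1905.ogv", "TripDown1905_notes.txt"]
print(select_by_glob(names, "*.mp4"))  # prints ['TripDown1905_512kb.mp4']
```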

Uploading

You can use the provided ia command-line tool to upload items. After configuring ia, you can upload files like so:

#upload files:
$ ia upload <identifier> file1 file2 --metadata="title:foo" --metadata="blah:arg"

#upload from `stdin`:
$ curl http://dumps.wikimedia.org/kywiki/20130927/kywiki-20130927-pages-logging.xml.gz |
  ia upload <identifier> - --remote-name=kywiki-20130927-pages-logging.xml.gz --metadata="title:Uploaded from stdin."
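
The --metadata arguments above are "key:value" strings. When driving uploads from a script, it can help to turn such strings into the kind of dict that the Python interface takes. The sketch below is one way to do that; the function name parse_metadata_args is hypothetical, and collecting repeated keys into a list is an assumption, not documented behavior of ia.

```python
def parse_metadata_args(pairs):
    """Turn ["key:value", ...] into a dict, splitting on the first colon only."""
    metadata = {}
    for pair in pairs:
        key, sep, value = pair.partition(":")
        if not sep or not key:
            raise ValueError("expected 'key:value', got %r" % pair)
        if key in metadata:
            # Assumption: a repeated key collects its values into a list.
            existing = metadata[key]
            if isinstance(existing, list):
                existing.append(value)
            else:
                metadata[key] = [existing, value]
        else:
            metadata[key] = value
    return metadata


print(parse_metadata_args(["title:foo", "blah:arg"]))
```

Splitting on the first colon only keeps values such as URLs intact.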

Metadata

You can use the ia command-line tool to download item metadata in JSON format:

$ ia metadata TripDown1905
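
The JSON printed by ia metadata can be post-processed with any JSON tool. The sketch below groups file names by format in Python. It assumes the document has a top-level "files" list whose entries carry "name" and "format" keys; that layout is an assumption here, and the sample data is hand-written for illustration rather than real output from archive.org. The function name files_by_format is hypothetical.

```python
import json
from collections import defaultdict


def files_by_format(metadata_json):
    """Group file names by their "format" field in a metadata JSON document."""
    doc = json.loads(metadata_json)
    grouped = defaultdict(list)
    for entry in doc.get("files", []):
        grouped[entry.get("format", "unknown")].append(entry.get("name"))
    return dict(grouped)


# Hand-written sample, not real output from archive.org.
sample = json.dumps({
    "metadata": {"identifier": "TripDown1905"},
    "files": [
        {"name": "TripDown1905_512kb.mp4", "format": "512Kb MPEG4"},
        {"name": "TripDown1905.ogv", "format": "Ogg Video"},
    ],
})
print(files_by_format(sample))
```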

You can also modify metadata after configuring ia.

$ ia metadata <identifier> --modify="foo:bar" --modify="baz:foooo"

Data Mining

IA Mine can be used for data mining Archive.org metadata and search results: https://github.com/jjjake/iamine.

Searching

You can search using the provided ia command-line script:

$ ia search 'subject:"market street" collection:prelinger'

Parallel Downloading

If you have the GNU parallel tool installed, then you can combine ia search and ia metadata to quickly retrieve data for many items in parallel:

$ ia search 'subject:"market street" collection:prelinger' | parallel -j40 'ia metadata {} > {}_meta.json'
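
A similar fan-out can be sketched in Python with a thread pool. This is a rough equivalent under stated assumptions, not how ia or GNU parallel work internally: the names fetch_metadata and fetch_all are hypothetical, fetch_metadata uses only get_item and item.metadata from the Python interface, and whether the library is safe to call from several threads at once is assumed rather than verified.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_metadata(identifier):
    """Fetch one item's metadata via the internetarchive Python interface."""
    from internetarchive import get_item  # imported lazily; needs network access
    return get_item(identifier).metadata


def fetch_all(identifiers, fetch=fetch_metadata, max_workers=8):
    """Run fetch(identifier) concurrently and return {identifier: result}."""
    identifiers = list(identifiers)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so zip pairs each id with its result.
        return dict(zip(identifiers, pool.map(fetch, identifiers)))
```

Passing a different fetch callable makes it possible to try the helper without touching the network.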

Python module usage

Below is a brief overview of the internetarchive Python library. Please refer to the API documentation for more specific details.

Downloading from Python

The Internet Archive stores data in items. You can query the archive using an item identifier:

>>> from internetarchive import get_item
>>> item = get_item('stairs')
>>> print(item.metadata)

Items contain files. You can download the entire item:

>>> item.download()

or you can download just a particular file:

>>> f = item.get_file('glogo.png')
>>> f.download() #writes to disk
>>> f.download('/foo/bar/some_other_name.png')

You can iterate over files:

>>> for f in item.iter_files():
...     print(f.name, f.sha1)
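
Since each file exposes a sha1 value, a downloaded copy can be checked against it. The following is a minimal sketch using hashlib from the standard library. The helper names are hypothetical, and it assumes f.sha1 is a hexadecimal digest string.

```python
import hashlib


def sha1_of_file(path, chunk_size=1024 * 1024):
    """Compute the SHA-1 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_sha1(path, expected_sha1):
    """Return True if the file at path has the expected SHA-1 hex digest."""
    return sha1_of_file(path) == expected_sha1.lower()
```

After f.download(), a call such as matches_sha1(f.name, f.sha1) would report whether the local copy is intact, assuming the file was written to the current directory under its own name.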

Uploading from Python

You can use the IA’s S3-like interface to upload files to an item after configuring the internetarchive library.

>>> from internetarchive import get_item
>>> item = get_item('new_identifier')
>>> md = dict(mediatype='image', creator='Jake Johnson')
>>> item.upload('/path/to/image.jpg', metadata=md)

Item-level metadata must be supplied with the first file uploaded to an item.

You can upload additional files to an existing item:

>>> import internetarchive
>>> item = internetarchive.Item('existing_identifier')
>>> item.upload(['/path/to/image2.jpg', '/path/to/image3.jpg'])

You can also upload file-like objects:

>>> import io
>>> fh = io.BytesIO(b'hello world')  # io.BytesIO works on both Python 2 and 3
>>> fh.name = 'hello_world.txt'
>>> item.upload(fh)
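
If you upload in-memory data often, the pattern above can be wrapped in a small helper. This is a sketch assuming Python 3; the function name named_buffer is hypothetical and not part of internetarchive.

```python
import io


def named_buffer(name, data):
    """Wrap bytes (or text, encoded as UTF-8) in a file-like object with a name."""
    if isinstance(data, str):
        data = data.encode("utf-8")
    fh = io.BytesIO(data)
    fh.name = name  # file name attached to the in-memory object
    return fh


fh = named_buffer("hello_world.txt", "hello world")
print(fh.name, fh.getvalue())
```

The returned object could then be passed to item.upload(fh) as in the example above.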

Modifying Metadata from Python

You can modify metadata for existing items using the item.modify_metadata() method. This uses the IA Metadata API under the hood and requires your IAS3 credentials, so make sure you have configured the internetarchive library first.

>>> from internetarchive import get_item
>>> item = get_item('my_identifier')
>>> md = dict(blah='one', foo=['two', 'three'])
>>> item.modify_metadata(md)
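
Before sending a modification, it can be useful to work out which fields would actually change. The sketch below compares a desired dict against the current metadata. The function name changed_fields is hypothetical, and treating item.metadata as a plain dict is an assumption.

```python
def changed_fields(current, desired):
    """Return the entries of desired whose values differ from current."""
    return {key: value for key, value in desired.items() if current.get(key) != value}


# Hypothetical current and desired metadata, for illustration only.
current = {"blah": "one", "foo": ["two"]}
desired = {"blah": "one", "foo": ["two", "three"]}
print(changed_fields(current, desired))  # prints {'foo': ['two', 'three']}
```

With a real item, changed_fields(item.metadata, md) would give a dict that can be passed to item.modify_metadata() only when it is non-empty.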

Searching from Python

You can search for items using the archive.org advanced search engine:

>>> from internetarchive import search_items
>>> search = search_items('collection:nasa')
>>> print(search.num_found)
186911

You can iterate over your results:

>>> for result in search:
...     print(result['identifier'])
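
A search can return a very large number of results, so it is often convenient to stop after the first few. The sketch below does this with itertools.islice; it works on any iterable of dicts that have an 'identifier' key, which is what the loop above relies on. The function name first_identifiers is hypothetical.

```python
from itertools import islice


def first_identifiers(results, limit=10):
    """Return up to limit identifiers from an iterable of result dicts."""
    return [result["identifier"] for result in islice(results, limit)]


# Stand-in data; with the library this would be search_items('collection:nasa').
fake_results = ({"identifier": "item%d" % n} for n in range(1000))
print(first_identifiers(fake_results, limit=3))  # prints ['item0', 'item1', 'item2']
```

Because islice is lazy, only the first limit results are consumed from the underlying iterator.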

Download files

Source Distribution

internetarchive-0.8.5.tar.gz (48.2 kB), uploaded Jul 23, 2015

Hashes for internetarchive-0.8.5.tar.gz

Algorithm	Hash digest
SHA256	`2ba5e8db802953b1ac25a73c88d0955df2e4cd947bc664d94d6000003b91f14e`
MD5	`d23bc9461aed69d91af07671fce81f7b`
BLAKE2b-256	`3325b80d551da28d6b22f40a3f6e89cbadc9fb41b334baafcfe7d9dfc8a77130`