GuessIt - a library for guessing information from video files.

GuessIt

GuessIt is a Python library that extracts as much information as possible from a video file.

It has a very powerful filename matcher that can guess a lot of metadata about a video from its filename alone. This matcher works with both movies and TV show episodes.

For example, GuessIt can do the following:

$ guessit "Treme.1x03.Right.Place,.Wrong.Time.HDTV.XviD-NoTV.avi"
For: Treme.1x03.Right.Place,.Wrong.Time.HDTV.XviD-NoTV.avi
GuessIt found: {
    [1.00] "mimetype": "video/x-msvideo",
    [0.80] "episodeNumber": 3,
    [0.80] "videoCodec": "XviD",
    [1.00] "container": "avi",
    [1.00] "format": "HDTV",
    [0.70] "series": "Treme",
    [0.50] "title": "Right Place, Wrong Time",
    [0.80] "releaseGroup": "NoTV",
    [0.80] "season": 1,
    [1.00] "type": "episode"
}

Install

Installing GuessIt is simple with pip:

$ pip install guessit

or, with easy_install:

$ easy_install guessit

But you really shouldn’t do that.

You can now launch a demo:

$ guessit -d

and guess your own filename:

$ guessit "Breaking.Bad.S05E08.720p.MP4.BDRip.[KoTuWa].mkv"
For: Breaking.Bad.S05E08.720p.MP4.BDRip.[KoTuWa].mkv
GuessIt found: {
    [1.00] "mimetype": "video/x-matroska",
    [1.00] "episodeNumber": 8,
    [0.30] "container": "mkv",
    [1.00] "format": "BluRay",
    [0.70] "series": "Breaking Bad",
    [1.00] "releaseGroup": "KoTuWa",
    [1.00] "screenSize": "720p",
    [1.00] "season": 5,
    [1.00] "type": "episode"
}

Filename matcher

The filename matcher uses regular expressions and tree splitting to guess values from the input filename.

It is able to find many properties, such as title, year, series, episodeNumber, season, videoCodec, screenSize and language. Guessed values are cleaned up and given in a readable format, which may not match the raw filename.

DVDSCR will be guessed as format = DVD + other = Screener.

1920x1080 will be guessed as screenSize = 1080p.

DD5.1 will be guessed as audioCodec = DolbyDigital + audioChannels = 5.1.
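The clean-up described above can be pictured with a minimal sketch. It is only an illustration of the idea, not GuessIt's own code: the rule table, the `normalize` function and the set of known heights are assumptions made for this example, using the property names from the list below.

```python
import re

# Hypothetical rule table: (pattern, properties it implies).
# These rules mirror the three examples above; they are not GuessIt's own tables.
RULES = [
    (re.compile(r"\bDVDSCR\b", re.IGNORECASE),
     {"format": "DVD", "other": "Screener"}),
    (re.compile(r"\bDD5\.1\b", re.IGNORECASE),
     {"audioCodec": "DolbyDigital", "audioChannels": "5.1"}),
]
SCREEN_SIZE = re.compile(r"\b(\d{3,4})x(\d{3,4})\b")
# Assumption: heights that have a "<height>p" name in the screenSize value list.
KNOWN_HEIGHTS = {"360", "368", "480", "576", "720", "900", "1080"}


def normalize(name):
    """Return cleaned-up properties found in a raw filename."""
    found = {}
    for pattern, properties in RULES:
        if pattern.search(name):
            found.update(properties)
    match = SCREEN_SIZE.search(name)
    if match:
        width, height = match.groups()
        if height in KNOWN_HEIGHTS:
            # 1920x1080 is reported as 1080p rather than as the raw string.
            found["screenSize"] = height + "p"
        else:
            found["screenSize"] = "%sx%s" % (width, height)
    return found


print(normalize("Movie.1920x1080.DD5.1.mkv"))
```

The point of the sketch is that the guessed value is a normalized name chosen by the rule, so it does not have to appear literally in the filename.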

Here’s the exhaustive list of properties that guessit can find:

Main properties

  • type

    Type of the file.

    • unknown, movie, episode, moviesubtitle, episodesubtitle

  • title

    Title of movie or episode.

  • container

    Container of the file.

    • 3g2, wmv, webm, mp4, avi, mp4a, mpeg, sub, mka, m4v, ts, mkv, ra, rm, wma, ass, mpg, ram, 3gp, ogv, mov, ogm, asf, divx, ogg, ssa, qt, idx, nfo, wav, flv, 3gp2, iso, mk2, srt

  • date

    Date found in filename.

  • year

    Year of movie (or episode).

  • releaseGroup

    Name of (non)scene group that released the file.

  • website

    Name of website contained in the filename.

Episode properties

  • series

    Name of series.

  • season

    Season number.

  • episodeNumber

    Episode number.

  • episodeList

    List of episode numbers if several were found.

    • note: If several are found, episodeNumber is the first item of this list.

  • seasonList

    List of season numbers if several were found.

    • note: If several are found, season is the first item of this list.

  • episodeCount

    Total number of episodes.

  • seasonCount

    Total number of seasons.

  • episodeDetails

    Some details about the episode.

    • Bonus Oav Ova Omake Extras Unaired Special Pilot

  • episodeFormat

    Episode format of the series.

    • Minisode

  • part

    Part number of the episode.

  • version

    Version of the episode.

    • In anime fansub scene, new versions are released with tag <episode>v[0-9].
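The relationship between episodeNumber and episodeList noted above can be sketched as follows. The pattern and the `guess_episodes` function are hypothetical and handle only names of the form `S01E01E02`; GuessIt's real matcher covers many more notations.

```python
import re

# Hypothetical pattern: one season marker followed by one or more episode markers.
SEASON_EPISODES = re.compile(r"S(\d{1,2})((?:E\d{1,3})+)", re.IGNORECASE)


def guess_episodes(name):
    """Guess season and episode(s) from a name such as 'Show.S01E01E02.avi'."""
    match = SEASON_EPISODES.search(name)
    if match is None:
        return {}
    episodes = [int(number) for number in
                re.findall(r"E(\d{1,3})", match.group(2), re.IGNORECASE)]
    # episodeNumber is always the first episode that was found.
    guess = {"season": int(match.group(1)), "episodeNumber": episodes[0]}
    if len(episodes) > 1:
        # episodeList only appears when several episodes were found.
        guess["episodeList"] = episodes
    return guess


print(guess_episodes("Show.S01E01E02.HDTV.avi"))
print(guess_episodes("Breaking.Bad.S05E08.720p.mkv"))
```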

Video properties

  • format

    Format of the initial source.

    • HDTV WEB-DL TV VOD BluRay DVD WEBRip Workprint Telecine VHS DVB Telesync HD-DVD PPV Cam

  • screenSize

    Resolution of video.

    • 720p 1080p 1080i <width>x<height> 4K 360p 368p 480p 576p 900p

  • videoCodec

    Codec used for video.

    • h264 h265 DivX XviD Real Mpeg2

  • videoProfile

    Codec profile used for video.

    • 8bit 10bit HP BP MP XP Hi422P Hi444PP

  • videoApi

    API used for the video.

    • DXVA

Audio properties

  • audioChannels

    Number of channels for audio.

    • 1.0 2.0 5.1 7.1

  • audioCodec

    Codec used for audio.

    • DTS TrueHD DolbyDigital AAC AC3 MP3 Flac

  • audioProfile

    The codec profile used for audio.

    • LC HQ HD HE HDMA

Localization properties

  • country

    Country(ies) of content. Often found in series names, for instance Shameless (US).

    • [<babelfish.Country>] (This class equals name and iso code)

  • language

    Language(s) of the audio soundtrack.

    • [<babelfish.Language>] (This class equals name and iso code)

  • subtitleLanguage

    Language(s) of the subtitles.

    • [<babelfish.Language>] (This class equals name and iso code)

Other properties

  • bonusNumber

    Bonus number.

  • bonusTitle

    Bonus title.

  • cdNumber

    CD number.

  • cdNumberTotal

    Total number of CDs.

  • crc32

    CRC32 of the file.

  • idNumber

    Volume identifier (UUID).

  • edition

    Edition of the movie.

    • Special Edition, Collector Edition, Director's cut, Criterion Edition, Deluxe Edition

  • filmNumber

    Film number of this movie.

  • filmSeries

    Film series of this movie.

  • other

    Any other property will appear under this property.

    • Fansub, HR, HQ, Netflix, Screener, Unrated, HD, 3D, SyncFix, Bonus, WideScreen, Fastsub, R5, AudioFix, DDC, Trailer, Complete, Limited, Classic, Proper, DualAudio, LiNE

Other features

GuessIt also allows you to compute a whole lot of hashes from a file, namely all the ones you can find in the hashlib Python module (md5, sha1, …), but also the Media Player Classic hash that is used (amongst others) by OpenSubtitles and SMPlayer, as well as the ed2k hash.
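For readers curious about what these hashes involve, here is a standalone sketch that does not use GuessIt at all. The function names are made up for this example. The second function follows the hash as it is commonly documented for OpenSubtitles (file size plus the 64-bit sums of the first and last 64 KiB); whether GuessIt computes it in exactly this way is an assumption.

```python
import hashlib
import os
import struct

CHUNK = 64 * 1024  # this hash reads 64 KiB at each end of the file


def hashlib_digest(path, algorithm="md5"):
    """Hex digest of a whole file, for any algorithm name hashlib knows."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # Read in 1 MiB blocks so large videos are not loaded into memory.
        for block in iter(lambda: handle.read(1 << 20), b""):
            digest.update(block)
    return digest.hexdigest()


def mpc_hash(path):
    """File size + sum of 64-bit little-endian words of first and last 64 KiB."""
    size = os.path.getsize(path)
    if size < 2 * CHUNK:
        raise ValueError("file is too small for this hash")
    total = size
    with open(path, "rb") as handle:
        for offset in (0, size - CHUNK):
            handle.seek(offset)
            words = struct.unpack("<%dQ" % (CHUNK // 8), handle.read(CHUNK))
            total += sum(words)
    # Keep the low 64 bits and print them as 16 hexadecimal digits.
    return "%016x" % (total & 0xFFFFFFFFFFFFFFFF)
```

Both functions take a path to an existing file, for example `hashlib_digest("video.mkv", "sha1")` or `mpc_hash("video.mkv")`, where `video.mkv` is a placeholder name.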

If you have the ‘guess-language’ Python package installed, GuessIt can also analyze a subtitle file’s contents and detect which language it is written in.

If you have the ‘enzyme’ Python package installed, GuessIt can also read properties from the actual video file metadata.

Usage

guessit can be used from the command line:

$ guessit
Usage: guessit [options] file1 [file2...]

Options:
  -h, --help            show this help message and exit
  -P SHOW_PROPERTY, --show-property=SHOW_PROPERTY
                        Display the value of a single property (title, series,
                        videoCodec, year, type ...)

  Naming:
    -t TYPE, --type=TYPE
                        The suggested file type: movie, episode. If undefined,
                        type will be guessed.
    -n, --name-only     Parse files as name only. Disable folder parsing,
                        extension parsing, and file content analysis.
    -c, --split-camel   Split camel case part of filename.
    -Y, --date-year-first
                        If short date is found, consider the first digits as
                        the year.
    -D, --date-day-first
                        If short date is found, consider the second digits as
                        the day.
    -E, --episode-prefer-number
                        Guess "serie.213.avi" as the episodeNumber 213.
                        Without this option, it will be guessed as season 2,
                        episodeNumber 13
    -L ALLOWED_LANGUAGES, --allowed-languages=ALLOWED_LANGUAGES
                        List of allowed languages. Separate languages codes
                        with ";"
    -C ALLOWED_COUNTRIES, --allowed-countries=ALLOWED_COUNTRIES
                        List of allowed countries. Separate country codes with
                        ";"
    -S EXPECTED_SERIES, --expected-series=EXPECTED_SERIES
                        List of expected series to parse. Separate series
                        names with ";"
    -T EXPECTED_TITLE, --expected-title=EXPECTED_TITLE
                        List of expected titles to parse. Separate title names
                        with ";"
    -G EXPECTED_GROUP, --expected-group=EXPECTED_GROUP
                        List of expected groups to parse. Separate group names
                        with ";"
    --disabled-transformers=DISABLED_TRANSFORMERS
                        List of transformers to disable. Separate transformers
                        names with ";"

  Output:
    -v, --verbose       Display debug output
    -a, --advanced      Display advanced information for filename guesses, as
                        json output
    -y, --yaml          Display information for filename guesses as yaml
                        output (like unit-test)
    -f INPUT_FILE, --input-file=INPUT_FILE
                        Read filenames from an input file.
    -d, --demo          Run a few builtin tests instead of analyzing a file

  Information:
    -p, --properties    Display properties that can be guessed.
    -V, --values        Display property values that can be guessed.
    -s, --transformers  Display transformers that can be used.
    --version           Display the guessit version.

  guessit.io:
    -b, --bug           Submit a wrong detection to the guessit.io service

  Other features:
    -i INFO, --info=INFO
                        The desired information type: filename, video,
                        hash_mpc or a hash from python's hashlib module, such
                        as hash_md5, hash_sha1, ...; or a list of any of them,
                        comma-separated
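The effect of the -E, --episode-prefer-number option described in the help text can be illustrated with a small sketch. The `guess_number` function and its regular expression are hypothetical and only reproduce the documented behaviour for a bare three-digit number; they are not GuessIt's implementation.

```python
import re


def guess_number(name, episode_prefer_number=False):
    """Interpret a bare three-digit number in a name such as 'serie.213.avi'."""
    match = re.search(r"(?<!\d)(\d)(\d{2})(?!\d)", name)
    if match is None:
        return {}
    if episode_prefer_number:
        # With the option set: the whole number is the episode number.
        return {"episodeNumber": int(match.group(0))}
    # Default: the first digit is the season, the last two are the episode.
    return {"season": int(match.group(1)), "episodeNumber": int(match.group(2))}


print(guess_number("serie.213.avi"))
print(guess_number("serie.213.avi", episode_prefer_number=True))
```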

It can also be used as a Python module:

>>> from guessit import guess_file_info
>>> guess_file_info('Treme.1x03.Right.Place,.Wrong.Time.HDTV.XviD-NoTV.avi')
{u'mimetype': 'video/x-msvideo', u'episodeNumber': 3, u'videoCodec': u'XviD', u'container': u'avi', u'format': u'HDTV', u'series': u'Treme', u'title': u'Right Place, Wrong Time', u'releaseGroup': u'NoTV', u'season': 1, u'type': u'episode'}
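Since the result behaves like a dictionary, it can be used directly to build things such as a display name. The sketch below copies the values from the session above into a plain dict so that it runs without GuessIt installed; `episode_label` is a hypothetical helper, not part of the library.

```python
# Values copied from the interactive session above (u'' prefixes dropped).
guess = {
    "mimetype": "video/x-msvideo",
    "episodeNumber": 3,
    "videoCodec": "XviD",
    "container": "avi",
    "format": "HDTV",
    "series": "Treme",
    "title": "Right Place, Wrong Time",
    "releaseGroup": "NoTV",
    "season": 1,
    "type": "episode",
}


def episode_label(guess):
    """Build a readable label, falling back to the title for non-episodes."""
    if guess.get("type") != "episode":
        return guess.get("title", "")
    label = "%s - %dx%02d" % (guess["series"], guess["season"],
                              guess["episodeNumber"])
    if "title" in guess:
        label += " - " + guess["title"]
    return label


print(episode_label(guess))  # Treme - 1x03 - Right Place, Wrong Time
```

Using `guess.get(...)` for optional keys matters in practice, because a property only appears in the result when it was actually found.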

Support

The project website for GuessIt is hosted at ReadTheDocs. There you will also find the User guide and Developer documentation.

This project is hosted on GitHub: https://github.com/wackou/guessit

Please report issues and/or feature requests via the bug tracker.

You can also report issues using the command-line tool:

$ guessit --bug "filename.that.fails.avi"

Contribute

GuessIt is under active development, and contributions are more than welcome!

  1. Check for open issues or open a fresh issue to start a discussion around a feature idea or a bug. There is a Contributor Friendly tag for issues that should be ideal for people who are not very familiar with the codebase yet.

  2. Fork the repository on GitHub to start making your changes to the master branch (or branch off of it).

  3. Write a test which shows that the bug was fixed or that the feature works as expected.

  4. Send a pull request and bug the maintainer until it gets merged and published. :)

License

GuessIt is licensed under the LGPLv3 license.

History

0.8.1 (unreleased)

  • Nothing changed yet.

0.8 (2014-07-06)

  • New webservice that lets you use GuessIt just by sending a POST request to the http://guessit.io/guess url

  • Command-line util can now report bugs to the http://guessit.io/bugs service by specifying the -b or --bug flag

  • GuessIt can now use the Enzyme python package to read metadata from the actual video file instead of the filename

  • Finished transition to babelfish.Language and babelfish.Country

  • New property: duration, which returns the duration of the video in seconds. This requires the Enzyme package to work

  • New property: fileSize which returns the size of the file in bytes

  • Renamed property special to episodeDetails

  • Added support for Python 3.4

  • Optimization and bugfixes

0.7.1 (2014-03-03)

  • New property “special”: values can be trailer, pilot, unaired

  • New options for the guessit cmdline util: -y, --yaml outputs the result in yaml format and -n, --name-only analyzes the input as simple text (instead of filename)

  • Added properties formatters and validators

  • Removed support for python 3.2

  • A healthy amount of code cleanup/refactoring and fixes :)

0.7 (2014-01-29)

  • New plugin API that allows registering custom patterns / transformers

  • Uses Babelfish for language and country detection

  • Added Quality API to rate file quality from guessed property values

  • Better and more accurate overall detection

  • Added roman and word numeral detection

  • Added ‘videoProfile’ and ‘audioProfile’ property

  • Moved boolean properties to ‘other’ property value (‘is3D’ became ‘other’ = ‘3D’)

  • Added more possible values for various properties.

  • Added command line option to list available properties and values

  • Fixes for Python3 support

0.6.2 (2013-11-08)

  • Added support for nfo files

  • GuessIt can now output advanced information as json (‘-a’ on the command line)

  • Better language detection

  • Added new property: ‘is3D’

0.6.1 (2013-09-18)

  • New property “idNumber” that tries to identify a hash value or a serial number

  • The usual bugfixes

0.6 (2013-07-16)

  • Better packaging: unittests and doc included in source tarball

  • Fixes everywhere: unicode, release group detection, language detection, …

  • A few speed optimizations

0.5.4 (2013-02-11)

  • guessit can be installed as a system wide script (thanks @dplarson)

  • Enhanced logging facilities

  • Fixes for episode number and country detection

0.5.3 (2012-11-01)

  • GuessIt can now optionally act as a wrapper around the ‘guess-language’ python module, and thus provide detection of the natural language in which a body of text is written

  • Lots of fixes everywhere, mostly for properties and release group detection

0.5.2 (2012-10-02)

  • Much improved auto-detection of filetype

  • Fixed some issues with the detection of release groups

0.5.1 (2012-09-23)

  • now detects ‘country’ property; also detect ‘year’ property for series

  • more patterns and bugfixes

0.5 (2012-07-29)

  • Python3 compatibility

  • the usual assortment of bugfixes

0.4.2 (2012-05-19)

  • added Language.tmdb language code property for TheMovieDB

  • added ability to recognize list of episodes

  • bugfixes for Language.__nonzero__ and episode regexps

0.4.1 (2012-05-12)

  • bugfixes for unicode, paths on Windows, autodetection, and language issues

0.4 (2012-04-28)

  • much improved language detection, now also detect language variants

  • supports more video filetypes (thanks to Rob McMullen)

0.3.1 (2012-03-15)

  • fixed package installation from PyPI

  • better imports for the transformations (thanks Diaoul!)

  • some small language fixes

0.3 (2012-03-12)

  • fix to recognize 1080p format (thanks to Jonathan Lauwers)

0.3b2 (2012-03-02)

  • fixed the package installation

0.3b1 (2012-03-01)

  • refactored quite a bit, code is much cleaner now

  • fixed quite a few tests

  • re-vamped the documentation, wrote some more

0.2 (2011-05-27)

  • new parser/matcher completely replaced the old one

  • quite a few more unittests and fixes

0.2b1 (2011-05-20)

  • brand new parser/matcher that is much more flexible and powerful

  • lots of cleaning and a bunch of unittests

0.1 (2011-05-10)

  • fixed a few minor issues & heuristics

0.1b2 (2011-03-12)

  • Added PyPI trove classifiers

  • fixed version number in setup.py

0.1b1 (2011-03-12)

  • first pre-release version; imported from Smewt with a few enhancements already in there.

This version

0.9.0
