pycaption 0.3.4

Closed caption converter


|Build Status|

``pycaption`` is a caption reading/writing module. Use one of the given
Readers to read content into a CaptionSet object,
and then use one of the Writers to output the CaptionSet into
captions of your desired format.

Turn a caption into multiple caption outputs:


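    import pycaption.transcript
    from pycaption import CaptionConverter, SRTReader, SAMIWriter, DFXPWriter

    srt_caps = '''1
    00:00:09,209 --> 00:00:12,312
    This is an example SRT file,
    which, while extremely short,
    is still a valid SRT file.
    '''

    # Read the SRT content once, then write it out in several formats.
    converter = CaptionConverter()
    converter.read(srt_caps, SRTReader())
    print converter.write(SAMIWriter())
    print converter.write(DFXPWriter())
    print converter.write(pycaption.transcript.TranscriptWriter())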

Not sure what format the caption is in? Detect it:


    from pycaption import detect_format, SAMIWriter

    caps = '''1
    00:00:01,500 --> 00:00:12,345
    Small caption'''

    reader = detect_format(caps)
    if reader:
        print SAMIWriter().write(reader().read(caps))

Or if you expect to have only a subset of the supported input formats:


    from pycaption import SRTReader, DFXPReader, SCCReader, SAMIWriter

    caps = '''1
    00:00:01,500 --> 00:00:12,345
    Small caption'''

    if SRTReader().detect(caps):
        print SAMIWriter().write(SRTReader().read(caps))
    elif DFXPReader().detect(caps):
        print SAMIWriter().write(DFXPReader().read(caps))
    elif SCCReader().detect(caps):
        print SAMIWriter().write(SCCReader().read(caps))
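
To illustrate the kind of check a ``detect`` call might perform, here is a minimal sketch that guesses a format from well-known markers of each file type. The ``guess_caption_format`` helper and its heuristics are assumptions made for illustration; they are not pycaption's implementation and use only the standard library.

```python
import re

# An SRT timing line such as "00:00:01,500 --> 00:00:12,345".
SRT_TIMING = re.compile(
    r"^\s*\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}", re.MULTILINE
)


def guess_caption_format(content):
    """Return a format name for `content`, or None if nothing matches.

    Hypothetical helper for illustration only; not part of pycaption.
    """
    head = content.lstrip()
    lowered = head.lower()
    if head.startswith("WEBVTT"):              # WebVTT files open with this magic word
        return "webvtt"
    if head.startswith("Scenarist_SCC V1.0"):  # SCC files open with this header
        return "scc"
    if "<sami" in lowered:                     # SAMI root element
        return "sami"
    if re.search(r"<tt[\s>]", lowered):        # DFXP/TTML root element
        return "dfxp"
    if SRT_TIMING.search(content):
        return "srt"
    return None
```

With the SRT sample above, ``guess_caption_format(caps)`` returns ``'srt'``; unrecognised input returns ``None``, mirroring the ``if reader:`` guard in the earlier example.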

Supported Formats


Read: - DFXP/TTML - SAMI - SCC - SRT - WebVTT

Write: - DFXP/TTML - SAMI - SRT - Transcript - WebVTT

See the `examples
folder <>`__ for
example captions that currently can be read correctly.

Python Usage

Example: Convert from SAMI to DFXP


    from pycaption import SAMIReader, DFXPWriter

    sami = '''<SAMI><HEAD><TITLE>NOVA3213</TITLE><STYLE TYPE="text/css">
    <!--
    P { margin-left:  1pt;
        margin-right: 1pt;
        margin-bottom: 2pt;
        margin-top: 2pt;
        text-align: center;
        font-size: 10pt;
        font-family: Arial;
        font-weight: normal;
        font-style: normal;
        color: #ffffff; }

    .ENCC {Name: English; lang: en-US; SAMI_Type: CC;}
    .FRCC {Name: French; lang: fr-cc; SAMI_Type: CC;}

    --></STYLE></HEAD><BODY>
    <SYNC start="9209"><P class="ENCC">
           ( clock ticking )
    </P><P class="FRCC">
           FRENCH LINE 1!
    </P></SYNC>
    <SYNC start="12312"><P class="ENCC">&nbsp;</P></SYNC>
    <SYNC start="14848"><P class="ENCC">
             <span style="text-align:center;font-size:10">When <i>we</i> think</span><br/>
        of E equals m c-squared,
    </P><P class="FRCC">
           FRENCH LINE 2?
    </P></SYNC>
    </BODY></SAMI>'''

    print DFXPWriter().write(SAMIReader().read(sami))

Which will output the following:


    <?xml version="1.0" encoding="utf-8"?>
    <tt xml:lang="en" xmlns="" xmlns:tts="">
     <head>
      <styling>
       <style id="p" tts:color="#fff" tts:fontfamily="Arial" tts:fontsize="10pt" tts:textAlign="center"/>
      </styling>
     </head>
     <body>
      <div xml:lang="fr-cc">
       <p begin="00:00:09.209" end="00:00:14.848" style="p">
        FRENCH LINE 1!
       </p>
       <p begin="00:00:14.848" end="00:00:18.848" style="p">
        FRENCH LINE 2?
       </p>
      </div>
      <div xml:lang="en-US">
       <p begin="00:00:09.209" end="00:00:12.312" style="p">
        ( clock ticking )
       </p>
       <p begin="00:00:14.848" end="00:00:18.848" style="p">
        <span tts:fontsize="10" tts:textAlign="center">When</span> <span tts:fontStyle="italic">we</span> think<br/>
        of E equals m c-squared,
       </p>
      </div>
     </body>
    </tt>
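
The SAMI input expresses cue times as millisecond offsets (``start="9209"``), while the DFXP output uses ``HH:MM:SS.mmm`` clock times (``begin="00:00:09.209"``). The sketch below shows that conversion in isolation; ``ms_to_clock_time`` is a hypothetical helper name and is not taken from pycaption.

```python
def ms_to_clock_time(milliseconds):
    """Format a millisecond offset as an HH:MM:SS.mmm clock time."""
    seconds, millis = divmod(milliseconds, 1000)
    minutes, seconds = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    return "%02d:%02d:%02d.%03d" % (hours, minutes, seconds, millis)
```

Applied to the ``SYNC`` values in the sample, 9209, 12312 and 14848 become ``00:00:09.209``, ``00:00:12.312`` and ``00:00:14.848``, matching the ``begin`` and ``end`` attributes shown in the output.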


Different readers and writers are easy to add if you would like to:

- Read/Write a previously unsupported format
- Read/Write a supported format in a different way (more styling?)

Simply follow the format of a current Reader or Writer, and edit to your
heart's desire.
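
As a rough sketch of the shape a new writer could take, the example below mirrors the ``write(...)`` call used by the writers above. Everything in it is hypothetical: ``TabSeparatedWriter`` is an invented class, and the list of ``(start_ms, end_ms, text)`` tuples is a simplified stand-in for a real CaptionSet, whose actual structure should be taken from an existing Writer in the source.

```python
class TabSeparatedWriter(object):
    """Hypothetical writer that emits one tab-separated line per caption.

    The input here is a plain list of (start_ms, end_ms, text) tuples,
    a stand-in for pycaption's CaptionSet used only for illustration.
    """

    def write(self, captions):
        lines = []
        for start_ms, end_ms, text in captions:
            # Collapse line breaks and runs of whitespace inside a caption.
            flat_text = " ".join(text.split())
            lines.append("%d\t%d\t%s" % (start_ms, end_ms, flat_text))
        return "\n".join(lines)
```

Like the built-in writers, the sketch returns the whole document as a single string, so the caller decides where to save or print it.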

SAMI Reader / Writer :: `spec <>`__

Microsoft Synchronized Accessible Media Interchange. Supports multiple
languages.

Supported Styling: - text-align - italics - font-size - font-family -
color

If the SAMI file is not valid XML (e.g. it has unclosed tags), the reader
will still attempt to read it.

DFXP/TTML Reader / Writer :: `spec <>`__

The W3C standard. Supports multiple languages.

Supported Styling: - text-align - italics - font-size - font-family -
color

SRT Reader / Writer :: `spec <>`__

SubRip captions. If given multiple languages to write, the writer outputs
all of them, joined together by a 'MULTI-LANGUAGE SRT' line.

Supported Styling: - None

The reader assumes the input language is English. To change:


    pycaps = SRTReader().read(srt_content, lang='fr')
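
For readers unfamiliar with the format, each SRT cue carries a timing line such as ``00:00:09,209 --> 00:00:12,312``. The sketch below parses one such line into millisecond offsets using only the standard library; ``parse_srt_timing`` is a hypothetical helper and does not reflect how ``SRTReader`` is implemented.

```python
import re

# Hours, minutes, seconds and milliseconds on both sides of the arrow.
SRT_TIMING = re.compile(
    r"(\d+):(\d{2}):(\d{2}),(\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2}),(\d{3})"
)


def parse_srt_timing(line):
    """Return (start_ms, end_ms) for an SRT timing line.

    Raises ValueError if the line is not a timing line.
    """
    match = SRT_TIMING.match(line.strip())
    if match is None:
        raise ValueError("not an SRT timing line: %r" % line)
    h1, m1, s1, ms1, h2, m2, s2, ms2 = (int(g) for g in match.groups())
    start_ms = ((h1 * 60 + m1) * 60 + s1) * 1000 + ms1
    end_ms = ((h2 * 60 + m2) * 60 + s2) * 1000 + ms2
    return start_ms, end_ms
```

For the first example in this document, ``parse_srt_timing('00:00:09,209 --> 00:00:12,312')`` yields ``(9209, 12312)``.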

SCC Reader :: `spec <>`__

Scenarist Closed Caption format. Assumes Channel 1 input.

Supported Styling: - italics

By default, the SCC Reader does not simulate roll-up captions. To enable
this:


    pycaps = SCCReader().read(scc_content, simulate_roll_up=True)

The reader also assumes the input language is English. To change:


    pycaps = SCCReader().read(scc_content, lang='fr')

The reader also accepts an offset (measured in seconds) for the
timestamps. For example, if the SCC file is 45 seconds ahead:


    pycaps = SCCReader().read(scc_content, offset=45)

The SCC Reader handles both dropframe and non-dropframe captions, and
will auto-detect which format the captions are in.
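
As background on the two timecode styles: SCC timecodes conventionally mark drop-frame with a semicolon before the frame field (``00:01:00;02``) and non-drop-frame with a colon (``00:01:00:00``). The sketch below distinguishes the two and converts either to seconds, assuming 29.97 fps video. The helper names are hypothetical, and pycaption's own detection and arithmetic may differ.

```python
# Duration of one frame at 29.97 fps (30000/1001 frames per second).
FRAME_SECONDS = 1001.0 / 30000.0


def is_drop_frame(timecode):
    """Drop-frame timecodes use ';' before the frame field."""
    return ";" in timecode


def timecode_to_seconds(timecode):
    """Convert an HH:MM:SS:FF or HH:MM:SS;FF timecode to elapsed seconds."""
    hours, minutes, seconds, frames = (
        int(part) for part in timecode.replace(";", ":").split(":")
    )
    frame_count = ((hours * 60 + minutes) * 60 + seconds) * 30 + frames
    if is_drop_frame(timecode):
        # Drop-frame skips two frame numbers every minute,
        # except in minutes divisible by ten.
        total_minutes = hours * 60 + minutes
        frame_count -= 2 * (total_minutes - total_minutes // 10)
    return frame_count * FRAME_SECONDS
```

Under these assumptions a drop-frame ``01:00:00;00`` lands within a few milliseconds of one hour, while a non-drop-frame ``01:00:00:00`` corresponds to roughly 3603.6 seconds of real time, which is why telling the two apart matters.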

Transcript Writer

Text stripped of styling, arranged in sentences.

Supported Styling: - None

The transcript writer uses natural sentence boundary detection
algorithms to create the transcript.
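
To give a feel for what arranging captions into sentences involves, here is a deliberately naive stand-in that joins caption fragments and splits on sentence-ending punctuation. It is not the algorithm the transcript writer uses; ``captions_to_sentences`` is a hypothetical helper, and real sentence boundary detection copes with abbreviations and similar cases that this regex does not.

```python
import re


def captions_to_sentences(caption_texts):
    """Join caption fragments and split them into sentences (naively)."""
    # Collapse line breaks and repeated whitespace across fragments.
    text = " ".join(" ".join(caption_texts).split())
    # Split after '.', '!' or '?' when followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text) if s]
```

A sentence spread over several caption lines comes back as one string, and an empty list of captions yields an empty list.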

WebVTT Reader / Writer :: `spec <>`__

Web Video Text Tracks format.

Supported Styling: - None (yet)


This module is Copyright 2012 and is available under the `Apache
License, Version 2.0 <>`__.

.. |Build Status| image::
File Type Py Version Uploaded on Size
pycaption-0.3.4.tar.gz (md5) Source 2014-03-21 181KB