pycaption 0.4.6

Closed caption converter


|Build Status|

``pycaption`` is a caption reading/writing module. Use one of the given
Readers to read content into a CaptionSet object,
and then use one of the Writers to output the CaptionSet into
captions of your desired format.

Requires Python 2.7.

Turn a caption into multiple caption outputs:


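from pycaption import CaptionConverter, SRTReader, SAMIWriter, DFXPWriter
import pycaption.transcript

srt_caps = u'''1
00:00:09,209 --> 00:00:12,312
This is an example SRT file,
which, while extremely short,
is still a valid SRT file.
'''

# Read the SRT content once, then write it out in several formats.
converter = CaptionConverter()
converter.read(srt_caps, SRTReader())
print converter.write(SAMIWriter())
print converter.write(DFXPWriter())
print converter.write(pycaption.transcript.TranscriptWriter())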

Not sure what format the caption is in? Detect it:


from pycaption import detect_format, SAMIWriter

caps = u'''1
00:00:01,500 --> 00:00:12,345
Small caption'''

# detect_format gives back a reader class when the format is recognized.
reader = detect_format(caps)
if reader:
    print SAMIWriter().write(reader().read(caps))

Or if you expect to have only a subset of the supported input formats:


from pycaption import SRTReader, DFXPReader, SCCReader, SAMIWriter

caps = u'''1
00:00:01,500 --> 00:00:12,345
Small caption'''

if SRTReader().detect(caps):
    print SAMIWriter().write(SRTReader().read(caps))
elif DFXPReader().detect(caps):
    print SAMIWriter().write(DFXPReader().read(caps))
elif SCCReader().detect(caps):
    print SAMIWriter().write(SCCReader().read(caps))
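
As a rough illustration of what a format check involves, the sketch below tests whether text looks like SRT by searching for a timing line. The name `looks_like_srt` and the regular expression are assumptions made for this example; they are not pycaption's actual `detect` implementation.

```python
import re

# Hypothetical helper, not part of pycaption. An SRT timing line looks like
# "00:00:01,500 --> 00:00:12,345".
SRT_TIMING = re.compile(
    r'^\d{2}:\d{2}:\d{2},\d{3}\s+-->\s+\d{2}:\d{2}:\d{2},\d{3}\s*$',
    re.MULTILINE)


def looks_like_srt(content):
    """Return True if any line of content is an SRT timing line."""
    return SRT_TIMING.search(content) is not None


caps = u'''1
00:00:01,500 --> 00:00:12,345
Small caption'''

is_srt = looks_like_srt(caps)
is_srt_for_markup = looks_like_srt(u'<sami><body></body></sami>')
```

A check like this only inspects the shape of the text; a real reader may still fail on content that passes it.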

Supported Formats


Read: - DFXP/TTML - SAMI - SCC - SRT - WebVTT

Write: - DFXP/TTML - SAMI - SRT - Transcript - WebVTT

See the `examples
folder <https://github.com/pbs/pycaption/tree/master/examples/>`__ for
example captions that can currently be read correctly.

Python Usage

Example: Convert from SAMI to DFXP


from pycaption import SAMIReader, DFXPWriter

sami = u'''<sami><head><title>NOVA3213</title><style type="text/css">
<sync start="9209"><p class="ENCC">
( clock ticking )
</p><p class="FRCC">
<sync start="12312"><p class="ENCC"> </p></sync>
<sync start="14848"><p class="ENCC">

<span style="text-align:center;font-size:10">When <i>we</i> think</span>

of E equals m c-squared,
</p><p class="FRCC">
'''

print DFXPWriter().write(SAMIReader().read(sami))

Which will output the following:


<tt xml:lang="en" xmlns="http://www.w3.org/ns/ttml" xmlns:tts="http://www.w3.org/ns/ttml#styling">
<style id="p" tts:color="#fff" tts:fontfamily="Arial" tts:fontsize="10pt" tts:textalign="center"/>
<div xml:lang="fr-cc">
<p begin="00:00:09.209" end="00:00:14.848" style="p">
<p begin="00:00:14.848" end="00:00:18.848" style="p">
<div xml:lang="en-US">
<p begin="00:00:09.209" end="00:00:12.312" style="p">
( clock ticking )
<p begin="00:00:14.848" end="00:00:18.848" style="p">

<span tts:fontsize="10" tts:textalign="center">When</span> <span tts:fontstyle="italic">we</span> think

of E equals m c-squared,


Different readers and writers are easy to add if you would like to:

- Read/Write a previously unsupported format
- Read/Write a supported format in a different way (more styling?)

Simply follow the format of a current Reader or Writer, and edit to your
heart's desire.
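
The sketch below shows the general shape the usage examples rely on: a reader exposing `detect` and `read`, and a writer exposing `write`. The classes `PipeReader` and `PipeWriter`, the pipe-separated toy format, and the list of tuples standing in for a CaptionSet are all hypothetical; a real reader or writer would follow pycaption's own base classes and CaptionSet structure.

```python
# Hypothetical toy format: one caption per line, "start_ms|end_ms|text".
# A plain list of (start_ms, end_ms, text) tuples stands in for a CaptionSet.


class PipeReader(object):
    def detect(self, content):
        """Return True if the first non-empty line has numeric start and end fields."""
        lines = [line for line in content.splitlines() if line.strip()]
        if not lines:
            return False
        parts = lines[0].split(u'|', 2)
        return len(parts) == 3 and parts[0].isdigit() and parts[1].isdigit()

    def read(self, content):
        """Parse the toy format into a list of (start_ms, end_ms, text)."""
        captions = []
        for line in content.splitlines():
            if not line.strip():
                continue
            start, end, text = line.split(u'|', 2)
            captions.append((int(start), int(end), text))
        return captions


class PipeWriter(object):
    def write(self, captions):
        """Serialize the captions back into the toy format."""
        return u'\n'.join(u'%d|%d|%s' % caption for caption in captions)


content = u'9209|12312|( clock ticking )\n14848|18848|When we think'
reader = PipeReader()
captions = reader.read(content) if reader.detect(content) else []
round_trip = PipeWriter().write(captions)
```

Keeping detection, reading, and writing as separate methods is what lets any reader be paired with any writer.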

SAMI Reader / Writer :: `spec <http://msdn.microsoft.com/en-us/library/ms971327.aspx>`__

Microsoft Synchronized Accessible Media Interchange. Supports multiple
languages.

Supported Styling: - text-align - italics - font-size - font-family -

If the SAMI file is not valid XML (e.g. it has unclosed tags), the reader
will still attempt to read it.

DFXP/TTML Reader / Writer :: `spec <http://www.w3.org/TR/ttaf1-dfxp/>`__

The W3C standard. Supports multiple languages.

Supported Styling: - text-align - italics - font-size - font-family -

SRT Reader / Writer :: `spec <http://matroska.org/technical/specs/subtitles/srt.html>`__

SubRip captions. If given multiple languages to write, will output all
joined together by a 'MULTI-LANGUAGE SRT' line.

Supported Styling: - None

Assumes the input language is English. To change it:


pycaps = SRTReader().read(srt_content, lang='fr')
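
For reference, an SRT timing line uses `HH:MM:SS,mmm` timestamps. The sketch below converts one to milliseconds, the unit used by the SAMI `start` attributes in the usage example (`00:00:09,209` corresponds to `9209`). The helper name `srt_timestamp_to_ms` is an assumption for this example and is not a pycaption function.

```python
def srt_timestamp_to_ms(timestamp):
    """Convert an SRT timestamp such as '00:00:09,209' to milliseconds."""
    clock, millis = timestamp.strip().split(',')
    hours, minutes, seconds = (int(part) for part in clock.split(':'))
    return ((hours * 60 + minutes) * 60 + seconds) * 1000 + int(millis)


start_ms = srt_timestamp_to_ms('00:00:09,209')
end_ms = srt_timestamp_to_ms('00:00:12,312')
```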

SCC Reader :: `spec <http://www.theneitherworld.com/mcpoodle/SCC_TOOLS/DOCS/SCC_FORMAT.HTML>`__

Scenarist Closed Caption format. Assumes Channel 1 input.

Supported Styling: - italics

By default, the SCC Reader does not simulate roll-up captions. To enable
roll-up simulation:


pycaps = SCCReader().read(scc_content, simulate_roll_up=True)

The SCC Reader also assumes the input language is English. To change it:


pycaps = SCCReader().read(scc_content, lang='fr')

The SCC Reader also accepts an offset (measured in seconds) for the
timestamps. For example, if the SCC file is 45 seconds ahead of the
video:


pycaps = SCCReader().read(scc_content, offset=45)

The SCC Reader handles both dropframe and non-dropframe captions, and
will auto-detect which format the captions are in.
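
One common way to tell the two apart is the separator before the frame count: SMPTE drop-frame timecodes are conventionally written with a semicolon (`00:00:01;15`), non-drop-frame ones with a colon (`00:00:01:15`). The sketch below checks for that. Whether pycaption's auto-detection works exactly this way is an assumption, and `is_drop_frame` is a hypothetical helper, not a pycaption function.

```python
import re

# Hypothetical helper: HH:MM:SS, then ':' or ';', then a two-digit frame count.
TIMECODE = re.compile(r'^(\d{2}):(\d{2}):(\d{2})([:;])(\d{2})$')


def is_drop_frame(timecode):
    """Return True for a drop-frame timecode, False for non-drop-frame."""
    match = TIMECODE.match(timecode.strip())
    if match is None:
        raise ValueError('not a timecode: %r' % (timecode,))
    return match.group(4) == ';'


drop = is_drop_frame('00:00:01;15')
non_drop = is_drop_frame('00:00:01:15')
```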

Transcript Writer

Text stripped of styling, arranged in sentences.

Supported Styling: - None

The transcript writer uses natural sentence boundary detection
algorithms to create the transcript.
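
To illustrate the idea, the sketch below joins caption fragments and splits the result at sentence-ending punctuation. This naive regular expression is a stand-in chosen for brevity, not the algorithm the Transcript Writer uses, and `naive_transcript` is a hypothetical name.

```python
import re


def naive_transcript(fragments):
    """Join caption text fragments and return one sentence per line."""
    # Collapse all whitespace so line breaks inside captions disappear.
    text = u' '.join(u' '.join(fragments).split())
    # Split after '.', '!' or '?' when followed by whitespace.
    sentences = re.split(r'(?<=[.!?])\s+', text)
    return u'\n'.join(sentence for sentence in sentences if sentence)


transcript = naive_transcript([
    u'This is an example SRT file,',
    u'which, while extremely short,',
    u'is still a valid SRT file.',
    u'It has two sentences.',
])
```

A rule this simple breaks on abbreviations such as "e.g. this", which is why proper sentence boundary detection is worth having.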

WebVTT Reader / Writer :: `spec <http://dev.w3.org/html5/webvtt/>`__

Web Video Text Tracks format.

Supported Styling: - None (yet)


This module is Copyright 2012 and is available under the `Apache
License, Version 2.0 <http://www.apache.org/licenses/LICENSE-2.0>`__.

.. |Build Status| image::
File Type Py Version Uploaded on Size
pycaption-0.4.6.tar.gz (md5) Source 2015-07-03 182KB