
pycaption 0.4.6

Closed caption converter

py-caption
==========

|Build Status|

``pycaption`` is a caption reading/writing module. Use one of the given
Readers to read content into a CaptionSet object,
and then use one of the Writers to output the CaptionSet into
captions of your desired format.

Requires Python 2.7.

Turn a caption into multiple caption outputs:

::

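    from pycaption import CaptionConverter, SRTReader, SAMIWriter, DFXPWriter
    from pycaption.transcript import TranscriptWriter

    srt_caps = u'''1
    00:00:09,209 --> 00:00:12,312
    This is an example SRT file,
    which, while extremely short,
    is still a valid SRT file.
    '''

    # Read the SRT content once, then write it out in several formats
    converter = CaptionConverter()
    converter.read(srt_caps, SRTReader())
    print(converter.write(SAMIWriter()))
    print(converter.write(DFXPWriter()))
    print(converter.write(TranscriptWriter()))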

Not sure what format the caption is in? Detect it:

::

    from pycaption import detect_format, SAMIWriter

    caps = u'''1
    00:00:01,500 --> 00:00:12,345
    Small caption'''

    # detect_format gives back a reader class (falsy if nothing matched)
    reader = detect_format(caps)
    if reader:
        print(SAMIWriter().write(reader().read(caps)))

Or if you expect to have only a subset of the supported input formats:

::

    from pycaption import SRTReader, DFXPReader, SCCReader, SAMIWriter

    caps = u'''1
    00:00:01,500 --> 00:00:12,345
    Small caption'''

    # Only try the readers for the formats you expect
    if SRTReader().detect(caps):
        print(SAMIWriter().write(SRTReader().read(caps)))
    elif DFXPReader().detect(caps):
        print(SAMIWriter().write(DFXPReader().read(caps)))
    elif SCCReader().detect(caps):
        print(SAMIWriter().write(SCCReader().read(caps)))
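
Both approaches rely on each reader recognising its own format. As a rough, standalone illustration of first-match detection (not pycaption's actual ``detect()`` logic), the sketch below checks a few cheap content heuristics in order. The function name ``guess_format`` and every heuristic in it are assumptions made for this example.

```python
import re

# Hypothetical heuristics: each returns True if the text *looks like* a format.
# These are simplified guesses, not pycaption's real detection rules.
SRT_TIMING = re.compile(r"\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}")

HEURISTICS = [
    ("WebVTT", lambda text: text.lstrip().startswith("WEBVTT")),
    ("SCC", lambda text: text.lstrip().startswith("Scenarist_SCC V1.0")),
    ("SAMI", lambda text: "<sami" in text.lower()),
    ("DFXP", lambda text: "<tt" in text.lower()),
    ("SRT", lambda text: SRT_TIMING.search(text) is not None),
]


def guess_format(text):
    """Return the name of the first format whose heuristic matches, else None."""
    for name, looks_like in HEURISTICS:
        if looks_like(text):
            return name
    return None
```

Order matters in a first-match scheme: the more distinctive signatures (file headers, root tags) are checked before the looser SRT timing pattern.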

Supported Formats
-----------------

Read:

- DFXP/TTML
- SAMI
- SCC
- SRT
- WebVTT

Write:

- DFXP/TTML
- SAMI
- SRT
- Transcript
- WebVTT

See the `examples
folder <https://github.com/pbs/pycaption/tree/master/examples/>`__ for
example captions that currently can be read correctly.

Python Usage
------------

Example: Convert from SAMI to DFXP

::

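    from pycaption import SAMIReader, DFXPWriter

    sami = u'''<sami><head><title>NOVA3213</title><style type="text/css">
    <!--
    p { text-align: center; font-size: 10pt; font-family: Arial; color: #fff; }
    .ENCC {Name: English; lang: en-US; SAMI_Type: CC;}
    .FRCC {Name: French; lang: fr-cc; SAMI_Type: CC;}
    -->
    </style></head><body>
    <sync start="9209"><p class="ENCC">( clock ticking )</p><p class="FRCC">FRENCH LINE 1!</p></sync>
    <sync start="12312"><p class="ENCC">&nbsp;</p><p class="FRCC">&nbsp;</p></sync>
    <sync start="14848"><p class="ENCC">MAN:<br/>When we think<br/>of E equals m c-squared,</p><p class="FRCC">FRENCH LINE 2?</p></sync>
    </body></sami>'''

    print(DFXPWriter().write(SAMIReader().read(sami)))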

Which will output the following:

::

    <head>
    <styling>
    <style id="p" tts:color="#fff" tts:fontFamily="Arial" tts:fontSize="10pt" tts:textAlign="center"/>
    </styling>
    </head>
    <body>

    FRENCH LINE 1!

    FRENCH LINE 2?

    ( clock ticking )

    MAN:
    When we think
    of E equals m c-squared,

    </body>
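
In SAMI, a caption generally stays on screen until the next ``SYNC`` begins, and a ``SYNC`` holding only ``&nbsp;`` clears the screen. The standalone sketch below turns a list of ``(start_ms, text)`` sync points into timed cues under that rule. The function name ``syncs_to_cues`` and the 4-second default for the final cue are assumptions for illustration, not pycaption's behaviour.

```python
# Hypothetical default duration for the last cue, which has no following SYNC.
LAST_CUE_MS = 4000


def syncs_to_cues(syncs):
    """Convert (start_ms, text) sync points into (start_ms, end_ms, text) cues.

    Each cue ends where the next sync starts; blank syncs (e.g. &nbsp;)
    only end the previous cue and produce no cue themselves.
    """
    cues = []
    for i, (start, text) in enumerate(syncs):
        text = text.replace("&nbsp;", "").strip()
        if not text:
            continue  # a blank sync just clears the screen
        if i + 1 < len(syncs):
            end = syncs[i + 1][0]
        else:
            end = start + LAST_CUE_MS
        cues.append((start, end, text))
    return cues
```

With the English syncs from the example above, ``( clock ticking )`` runs from 9209 ms to 12312 ms, where the blank sync clears it.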


Extensibility
-------------

Different readers and writers are easy to add if you would like to:

- Read/Write a previously unsupported format
- Read/Write a supported format in a different way (more styling?)

Simply follow the structure of an existing Reader or Writer, and edit it
to your heart's desire.
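
As a rough sketch of that pattern, a writer is essentially an object whose ``write()`` method takes a caption set and returns a string. The example below is self-contained: ``FakeCaption``, ``FakeCaptionSet`` and ``PlainTextWriter`` are hypothetical stand-ins, and the ``get_languages()`` / ``get_captions(lang)`` / ``get_text()`` methods are assumed to mirror pycaption's ``CaptionSet`` and caption objects. Check the existing writers for the real interface before building on it.

```python
class FakeCaption(object):
    """Hypothetical stand-in for a caption: times in microseconds plus text."""

    def __init__(self, start, end, text):
        self.start = start
        self.end = end
        self.text = text

    def get_text(self):
        return self.text


class FakeCaptionSet(object):
    """Hypothetical stand-in mapping language codes to lists of captions."""

    def __init__(self, captions_by_lang):
        self._captions = captions_by_lang

    def get_languages(self):
        return list(self._captions)

    def get_captions(self, lang):
        return self._captions[lang]


class PlainTextWriter(object):
    """Hypothetical writer: one '[lang] text' line per caption."""

    def write(self, caption_set):
        lines = []
        for lang in sorted(caption_set.get_languages()):
            for caption in caption_set.get_captions(lang):
                lines.append("[%s] %s" % (lang, caption.get_text()))
        return "\n".join(lines)
```

Because the writer only relies on a few methods of the caption set, the same shape works for any output format: swap the body of ``write()`` for your own serialisation.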

SAMI Reader / Writer :: `spec <http: msdn.microsoft.com="" en-us="" library="" ms971327.aspx="">`__
----------------------------------------------------------------------------------------

Microsoft Synchronized Accessible Media Interchange. Supports multiple
languages.

Supported Styling:

- text-align
- italics
- font-size
- font-family
- color

If the SAMI file is not valid XML (e.g. it has unclosed tags), the reader
will still attempt to read it.
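
SAMI handles multiple languages by giving each ``<P>`` a class whose style rule declares a ``lang``. As an illustrative sketch only (a regular expression, not the CSS parsing pycaption actually performs), the class-to-language table can be pulled out of a style block like this. The helper name ``sami_class_languages`` and the class names ``ENCC`` and ``FRCC`` in the sample rules are assumptions.

```python
import re

# Matches rules such as ".ENCC {Name: English; lang: en-US; SAMI_Type: CC;}"
CLASS_RULE = re.compile(r"\.(\w+)\s*\{([^}]*)\}")
LANG_DECL = re.compile(r"lang\s*:\s*([\w-]+)", re.IGNORECASE)


def sami_class_languages(style_text):
    """Return a {class_name: lang} dict from the rules in a SAMI style block."""
    languages = {}
    for class_name, body in CLASS_RULE.findall(style_text):
        match = LANG_DECL.search(body)
        if match:
            languages[class_name] = match.group(1)
    return languages
```

A reader can then group each ``<P class="...">`` under the language its class maps to.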

DFXP/TTML Reader / Writer :: `spec <http: www.w3.org="" tr="" ttaf1-dfxp=""/>`__
-------------------------------------------------------------------

The W3C standard. Supports multiple languages.

Supported Styling:

- text-align
- italics
- font-size
- font-family
- color
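
In TTML, these styles are expressed as attributes in the ``tts:`` namespace. The mapping below is a hedged sketch of how the supported CSS-style properties line up with TTML styling attributes. The table and the helper name ``css_to_tts`` are hypothetical, not pycaption's internal code, and values are passed through unchanged.

```python
# CSS-style property names mapped to their TTML (DFXP) styling attributes.
CSS_TO_TTS = {
    "text-align": "tts:textAlign",
    "font-size": "tts:fontSize",
    "font-family": "tts:fontFamily",
    "font-style": "tts:fontStyle",  # italics are font-style: italic
    "color": "tts:color",
}


def css_to_tts(css_props):
    """Translate a {css_property: value} dict into TTML attributes.

    Properties with no entry in the table are dropped; values are copied as-is.
    """
    return dict(
        (CSS_TO_TTS[name], value)
        for name, value in css_props.items()
        if name in CSS_TO_TTS
    )
```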

SRT Reader / Writer :: `spec <http: matroska.org="" technical="" specs="" subtitles="" srt.html="">`__
----------------------------------------------------------------------------------------

SubRip captions. If given multiple languages to write, the writer outputs
all of them, joined together by a 'MULTI-LANGUAGE SRT' line.

Supported Styling:

- None

The reader assumes the input language is English. To change it:

::

    pycaps = SRTReader().read(srt_content, lang='fr')
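
For reference, each SRT cue is a sequence number, a ``HH:MM:SS,mmm --> HH:MM:SS,mmm`` timing line and the caption text, with a blank line between cues. A minimal, standalone sketch of writing cues in that layout follows. The helper names ``srt_timestamp`` and ``format_srt`` and the ``(start_us, end_us, text)`` input, with times in microseconds, are assumptions for illustration.

```python
def srt_timestamp(microseconds):
    """Format a time in microseconds as an SRT timestamp, HH:MM:SS,mmm."""
    milliseconds = microseconds // 1000
    hours, rest = divmod(milliseconds, 3600000)
    minutes, rest = divmod(rest, 60000)
    seconds, millis = divmod(rest, 1000)
    return "%02d:%02d:%02d,%03d" % (hours, minutes, seconds, millis)


def format_srt(cues):
    """Render (start_us, end_us, text) tuples as numbered SRT cues."""
    blocks = []
    for number, (start, end, text) in enumerate(cues, 1):
        blocks.append("%d\n%s --> %s\n%s\n" % (
            number, srt_timestamp(start), srt_timestamp(end), text))
    return "\n".join(blocks)
```

For example, ``format_srt([(9209000, 12312000, "This is an example SRT file,")])`` produces the same timing line as the first example in this README.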

SCC Reader :: `spec <http: www.theneitherworld.com="" mcpoodle="" scc_tools="" docs="" scc_format.html="">`__
-----------------------------------------------------------------------------------------------

Scenarist Closed Caption format. Assumes Channel 1 input.

Supported Styling:

- italics

By default, the SCC Reader does not simulate roll-up captions. To enable
roll-ups:

::

    pycaps = SCCReader().read(scc_content, simulate_roll_up=True)

The reader also assumes the input language is English. To change it:

::

    pycaps = SCCReader().read(scc_content, lang='fr')

You can also specify a timestamp offset, measured in seconds. For
example, if the SCC file is 45 seconds ahead of the video:

::

    pycaps = SCCReader().read(scc_content, offset=45)

The SCC Reader handles both dropframe and non-dropframe captions, and
will auto-detect which format the captions are in.
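
SCC timecodes conventionally mark drop-frame with a semicolon before the frame field (``HH:MM:SS;FF``) and non-drop-frame with a colon (``HH:MM:SS:FF``), which is one way a reader can tell them apart. The sketch below converts either form to seconds at the NTSC rate of 30000/1001 frames per second. It is a standalone illustration of the drop-frame arithmetic, not pycaption's code, and ``scc_timecode_to_seconds`` is a hypothetical name.

```python
from fractions import Fraction

# NTSC video runs at 30000/1001 (about 29.97) frames per second.
NTSC_FPS = Fraction(30000, 1001)


def scc_timecode_to_seconds(timecode):
    """Convert an SCC timecode to elapsed seconds (as a Fraction).

    'HH:MM:SS;FF' is treated as drop-frame, 'HH:MM:SS:FF' as non-drop-frame.
    """
    drop_frame = ";" in timecode
    hours, minutes, seconds, frames = (
        int(part) for part in timecode.replace(";", ":").split(":"))
    # Count frames as if every second had exactly 30 frame labels.
    frame_count = ((hours * 60 + minutes) * 60 + seconds) * 30 + frames
    if drop_frame:
        # Drop-frame skips frame labels 00 and 01 at the start of every
        # minute, except minutes divisible by 10.
        total_minutes = hours * 60 + minutes
        frame_count -= 2 * (total_minutes - total_minutes // 10)
    return frame_count / NTSC_FPS
```

With this arithmetic, one hour of non-drop-frame timecode (``01:00:00:00``) corresponds to 3603.6 seconds of real time, while the drop-frame ``01:00:00;00`` stays within a few milliseconds of the wall clock.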

Transcript Writer
-----------------

Text stripped of styling, arranged in sentences.

Supported Styling:

- None

The transcript writer uses natural sentence boundary detection
algorithms to create the transcript.
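
As a rough stand-in for that step, the sketch below joins caption texts, collapses whitespace and splits on sentence-ending punctuation with a regular expression. This naive splitter and the name ``naive_transcript`` are assumptions for illustration; real sentence boundary detection handles abbreviations and other cases that this regex does not.

```python
import re

# Naive boundary: ., ! or ? followed by whitespace. Real sentence
# tokenizers also handle abbreviations such as "Dr." and "e.g.".
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def naive_transcript(caption_texts):
    """Join caption texts and return one sentence per line."""
    text = " ".join(caption_texts)
    text = re.sub(r"\s+", " ", text).strip()
    return "\n".join(SENTENCE_END.split(text))
```

Joining first matters because caption lines break mid-sentence; the SRT example at the top of this README spreads one sentence over three lines.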

WebVTT Reader / Writer `spec <http: dev.w3.org="" html5="" webvtt=""/>`__
-----------------------------------------------------------------

Web Video Text Tracks format.

Supported Styling:

- None (yet)
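
A WebVTT file starts with a ``WEBVTT`` line, and its cue timings use a period before the milliseconds (``HH:MM:SS.mmm``) rather than SRT's comma. A minimal standalone sketch of writing unstyled cues follows. The helper names ``vtt_timestamp`` and ``format_webvtt`` and the ``(start_us, end_us, text)`` input in microseconds are assumptions for illustration.

```python
def vtt_timestamp(microseconds):
    """Format a time in microseconds as a WebVTT timestamp, HH:MM:SS.mmm."""
    milliseconds = microseconds // 1000
    hours, rest = divmod(milliseconds, 3600000)
    minutes, rest = divmod(rest, 60000)
    seconds, millis = divmod(rest, 1000)
    return "%02d:%02d:%02d.%03d" % (hours, minutes, seconds, millis)


def format_webvtt(cues):
    """Render (start_us, end_us, text) tuples as an unstyled WebVTT file."""
    lines = ["WEBVTT", ""]
    for start, end, text in cues:
        lines.append("%s --> %s" % (vtt_timestamp(start), vtt_timestamp(end)))
        lines.append(text)
        lines.append("")  # blank line ends each cue
    return "\n".join(lines)
```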


License
-------

This module is Copyright 2012 PBS.org and is available under the `Apache
License, Version 2.0 <http://www.apache.org/licenses/LICENSE-2.0>`__.

.. |Build Status| image:: https://travis-ci.org/pbs/pycaption.png?branch=master
   :target: https://travis-ci.org/pbs/pycaption