WebVTT Python Library
webvtt-py is a Python library (current version 0.5.1) for reading, writing, converting, and segmenting WebVTT caption files. It is actively maintained, with releases (often several a year) that add features, fix bugs, and keep up with Python version support.
Warnings
- breaking: Python 3.4, 3.5, and 3.6 are no longer supported as of version 0.5.0. The library now requires Python 3.7 or newer.
- deprecated: The `webvtt.read_buffer()` function was deprecated in version 0.5.0 in favor of `webvtt.from_buffer()`.
- gotcha: The `caption.text` attribute returns the cue text with HTML/WebVTT tags (like `<c.classname>`) removed. To retrieve the original text including all tags, use the `caption.raw_text` attribute. This behavior was introduced in version 0.3.3.
- gotcha: Since version 0.5.0, the parser is no longer strict and ignores malformed blocks. Invalid WebVTT syntax that raised errors in earlier versions may now be silently skipped or handled on a best-effort basis.
- gotcha: Before version 0.5.1, converting and saving to SRT format did not remove cue tags (e.g., `<c.colorE5E5E5>`), which is usually undesirable in SRT output.
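To make the difference between tagged and tag-free cue text concrete, here is a minimal sketch of stripping cue tags by hand. It may be useful as a workaround when saving SRT with a version older than 0.5.1. The `strip_cue_tags` helper and its regular expression are hypothetical, written only for this example; they are not part of webvtt-py and are not necessarily how the library removes tags.

```python
import re

# Hypothetical helper, not part of webvtt-py.
# Matches WebVTT-style cue tags such as <c.colorE5E5E5>, </c>, <v Anna>,
# <i>, or inline timestamps like <00:00:01.000>. A tag must start with a
# letter or digit, so a stray "<" in ordinary text is left alone.
CUE_TAG = re.compile(r"</?[A-Za-z0-9][^<>]*>")


def strip_cue_tags(raw_text: str) -> str:
    """Return the cue text with tags removed, keeping the enclosed text."""
    return CUE_TAG.sub("", raw_text)


raw = "<c.colorE5E5E5>Hello,</c> <v Anna>world!</v>"
plain = strip_cue_tags(raw)
print(plain)  # Hello, world!
```

With a caption object, the same helper could be applied to `caption.raw_text` before writing each cue to an SRT file.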
Install
pip install webvtt-py
Imports
- webvtt
import webvtt
Quickstart
import os

import webvtt

# Create a dummy VTT file for demonstration.
# The file must start with the WEBVTT header, and cues are separated by blank lines.
dummy_vtt_content = """WEBVTT

1
00:00:00.000 --> 00:00:03.000
Hello, world!

2
00:00:04.000 --> 00:00:07.000
This is a test caption.
"""
with open("example.vtt", "w", encoding="utf-8") as f:
    f.write(dummy_vtt_content)

# Read a WebVTT file and print each caption
vtt = webvtt.read('example.vtt')
print("--- Captions from example.vtt ---")
for caption in vtt:
    print(f"[{caption.start} --> {caption.end}] {caption.text}")

# Create a new WebVTT object and add captions programmatically
# by appending to its list of captions
new_vtt = webvtt.WebVTT()
new_vtt.captions.append(webvtt.Caption(start='00:00:01.000', end='00:00:05.000', text='First dynamic caption.'))
new_vtt.captions.append(webvtt.Caption(start='00:00:06.000', end='00:00:10.000', text='Second dynamic caption.'))

# Save the new WebVTT object to a file
new_vtt.save('output.vtt')
print("\nGenerated output.vtt with 2 captions.")

# Clean up dummy files
os.remove("example.vtt")
os.remove("output.vtt")
print("Cleaned up example.vtt and output.vtt")
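The quickstart prints `caption.start` and `caption.end` as timestamp strings such as `00:00:04.000`. When a cue's duration is needed, those strings have to be converted to numbers first. The following is a minimal standard-library sketch. The `timestamp_to_seconds` helper is hypothetical and not part of webvtt-py, and it assumes timestamps in `HH:MM:SS.mmm` or `MM:SS.mmm` form.

```python
def timestamp_to_seconds(timestamp: str) -> float:
    """Convert 'HH:MM:SS.mmm' or 'MM:SS.mmm' to a number of seconds.

    Hypothetical helper for illustration, not part of webvtt-py.
    """
    parts = timestamp.split(":")
    if len(parts) not in (2, 3):
        raise ValueError(f"Unexpected timestamp format: {timestamp!r}")
    seconds = float(parts[-1])   # 'SS.mmm'
    minutes = int(parts[-2])     # 'MM'
    hours = int(parts[0]) if len(parts) == 3 else 0
    return hours * 3600 + minutes * 60 + seconds


# Duration of the second cue from the quickstart file
start, end = "00:00:04.000", "00:00:07.000"
duration = timestamp_to_seconds(end) - timestamp_to_seconds(start)
print(f"{duration:.3f} seconds")  # 3.000 seconds
```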