Google Cloud Video Intelligence
The Google Cloud Video Intelligence API Python client library (current version 2.19.0) enables developers to analyze video content by detecting objects, scenes, activities, and transcribing speech. It provides capabilities to extract metadata, such as labels, shot changes, explicit content, and more, from videos stored in Google Cloud Storage or provided as data bytes. The library is actively maintained with frequent updates as part of the larger `google-cloud-python` ecosystem.
Warnings
- gotcha Authentication is critical. Ensure your environment is correctly authenticated, typically via Application Default Credentials. For local development, `gcloud auth application-default login` is recommended, or explicitly setting the `GOOGLE_APPLICATION_CREDENTIALS` environment variable to a service account key file path.
- gotcha Most video annotation operations are asynchronous and return a `google.api_core.operation.Operation` object. You must explicitly call `.result()` on this operation object and wait for completion to retrieve the actual API response; otherwise you are left holding the `Operation` wrapper itself, not the annotation results.
- gotcha The Video Intelligence API has different versions (e.g., `v1`, `v1p1beta1`). Ensure you import the correct version (e.g., `videointelligence_v1`) and use features available in that specific version. Beta features may not be stable or present in the stable API.
- gotcha For features like `LABEL_DETECTION` and `SHOT_CHANGE_DETECTION`, you can specify different underlying models (e.g., `builtin/stable`, `builtin/latest`). Google may update or deprecate these models, which could lead to changes in detection results over time if not explicitly pinned or monitored.
- gotcha The library's logging events (when enabled via `GOOGLE_SDK_PYTHON_LOGGING_SCOPE`) may contain sensitive information. Google may also refine the occurrence, level, and content of log messages without flagging such changes as breaking. Do not depend on the immutability of logging events or store sensitive data in logs without proper access restrictions.
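The second warning above is the one that most often trips people up. The control flow can be sketched without touching the network; everything below is a labeled stand-in, not library code (the real object is `google.api_core.operation.Operation`):

```python
class FakeOperation:
    """Stand-in for the small slice of google.api_core.operation.Operation
    used here; this class is illustrative, not part of any Google library."""

    def __init__(self, payload):
        self._payload = payload

    def result(self, timeout=None):
        # The real method blocks, polling the backend, until the
        # operation completes or the timeout elapses.
        return self._payload


operation = FakeOperation(payload={"annotation_results": ["labels..."]})

# Wrong: the return value of annotate_video is the operation, not the response.
print(type(operation).__name__)  # FakeOperation

# Right: call .result() and wait; use a generous timeout for long videos.
response = operation.result(timeout=600)
print(response["annotation_results"])
```

The same shape applies to every feature request in this library: submit, hold the operation, then block on `.result()`.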
Install
- pip
pip install google-cloud-videointelligence
Imports
- VideoIntelligenceServiceClient
from google.cloud import videointelligence_v1 as videointelligence
- Feature
from google.cloud.videointelligence_v1 import Feature
- LabelDetectionConfig
from google.cloud.videointelligence_v1 import LabelDetectionConfig
- LabelDetectionMode
from google.cloud.videointelligence_v1 import LabelDetectionMode
Quickstart
import os

from google.cloud import videointelligence_v1 as videointelligence

# Set GOOGLE_APPLICATION_CREDENTIALS environment variable or ensure gcloud is authenticated.
# For local development, run `gcloud auth application-default login`.
# os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = 'path/to/your/key.json'


def analyze_video_labels(gcs_uri):
    """Detects labels in the video specified by the GCS URI."""
    client = videointelligence.VideoIntelligenceServiceClient()
    features = [videointelligence.Feature.LABEL_DETECTION]

    # Optional: configure label detection mode for more granular control.
    config = videointelligence.LabelDetectionConfig(
        label_detection_mode=videointelligence.LabelDetectionMode.SHOT_AND_FRAME_MODE,
        stationary_camera=False,  # Set to True for footage from a stationary camera
    )
    video_context = videointelligence.VideoContext(label_detection_config=config)

    print(f'Processing video for label annotations: {gcs_uri}')
    operation = client.annotate_video(
        request={
            "input_uri": gcs_uri,
            "features": features,
            "video_context": video_context,
        }
    )

    # annotate_video is a long-running operation; block until it completes.
    print('\nWaiting for operation to complete...')
    result = operation.result(timeout=600)  # Adjust the timeout (seconds) as needed.
    print('\nFinished processing.')

    # A single video was submitted, so take the first annotation result.
    annotation_result = result.annotation_results[0]

    for shot_label in annotation_result.shot_label_annotations:
        print(f'Video shot label: {shot_label.entity.description} ({shot_label.entity.entity_id})')
        for segment in shot_label.segments:
            # In client library 2.x, time offsets are datetime.timedelta objects,
            # so use total_seconds() rather than the proto .seconds/.nanos fields.
            start_time = segment.segment.start_time_offset.total_seconds()
            end_time = segment.segment.end_time_offset.total_seconds()
            print(f'\tSegment: {start_time:.1f}s to {end_time:.1f}s (confidence: {segment.confidence:.2f})')

    for frame_label in annotation_result.frame_label_annotations:
        print(f'Video frame label: {frame_label.entity.description} ({frame_label.entity.entity_id})')
        for frame in frame_label.frames:
            time_offset = frame.time_offset.total_seconds()
            print(f'\tFrame: {time_offset:.1f}s (confidence: {frame.confidence:.2f})')


if __name__ == '__main__':
    # Replace with your own GCS video URI.
    # Public sample video from the Google Cloud documentation:
    video_uri = "gs://cloud-samples-data/video/chicago.mp4"
    analyze_video_labels(video_uri)
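The response parsing in the quickstart can be factored into a small, testable post-processing step. The helper below is a sketch written for this document (`top_shot_labels` and its threshold are names chosen here, not library API); it runs against stand-in data that mirrors the shape of the API's `VideoAnnotationResults`:

```python
from types import SimpleNamespace


def top_shot_labels(annotation_result, min_confidence=0.8):
    """Return (description, best_confidence) pairs for shot labels whose
    best segment meets the confidence threshold, sorted by confidence."""
    labels = []
    for shot_label in annotation_result.shot_label_annotations:
        best = max((s.confidence for s in shot_label.segments), default=0.0)
        if best >= min_confidence:
            labels.append((shot_label.entity.description, best))
    return sorted(labels, key=lambda pair: pair[1], reverse=True)


# Stand-in data mirroring the response structure, for illustration only.
fake_result = SimpleNamespace(
    shot_label_annotations=[
        SimpleNamespace(
            entity=SimpleNamespace(description="city"),
            segments=[SimpleNamespace(confidence=0.93)],
        ),
        SimpleNamespace(
            entity=SimpleNamespace(description="bird"),
            segments=[SimpleNamespace(confidence=0.41)],
        ),
    ]
)
print(top_shot_labels(fake_result))  # [('city', 0.93)]
```

Because the real `annotation_result` exposes the same attribute shape, the helper works unchanged on the object returned by `operation.result().annotation_results[0]`.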