TensorFlow Serving API
The `tensorflow-serving-api` library provides the Python client API for interacting with TensorFlow Serving, a flexible, high-performance serving system for machine learning models. Designed for production environments, TensorFlow Serving facilitates model deployment, versioning, and management, exposing both gRPC and HTTP/REST inference endpoints. The Python API primarily focuses on client-side gRPC communication. The current version is 2.19.1, and its releases typically align with the main TensorFlow project's release cadence.
Warnings
- breaking The `tensorflow-serving-api` version must be compatible with the `tensorflow_model_server` binary version (ideally the same release). Mismatched versions can lead to protobuf deserialization errors or other unexpected client/server communication failures.
- gotcha This library (`tensorflow-serving-api`) provides only the *client* API. It does not include the `tensorflow_model_server` itself, which is the actual server component that loads and serves your models. The server must be installed and run separately (typically via Docker or `apt-get`).
- gotcha Models must be exported in the TensorFlow `SavedModel` format and stored in a versioned directory structure (e.g., `model_name/1/`, `model_name/2/`) for the `tensorflow_model_server` to load them correctly. Incorrect directory structure is a common source of 'Model not found' or 'No versions of servable found' errors.
- gotcha When sending gRPC requests, input data (e.g., NumPy arrays) must be correctly converted into Protobuf `TensorProto` format. Incorrect type mapping or shape can lead to prediction errors on the server side.
- gotcha The documentation specifically for the `tensorflow-serving-api` Python client can be sparse. Many guides focus on the server setup or REST API, requiring users of the gRPC Python client to infer usage from examples or C++ API definitions.
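The versioned-directory gotcha above can be checked before starting the server. Below is a minimal stdlib-only sketch (the `your_model` name and the helper `find_servable_versions` are illustrative, not part of the library):

```python
import tempfile
from pathlib import Path

def find_servable_versions(model_base_path: str) -> list:
    """Return sorted integer versions TensorFlow Serving could load.

    The server expects <base>/<N>/saved_model.pb where <N> is an integer
    directory name; anything else is skipped and is often the source of
    'No versions of servable found' errors.
    """
    base = Path(model_base_path)
    if not base.is_dir():
        return []
    return sorted(
        int(child.name)
        for child in base.iterdir()
        if child.is_dir() and child.name.isdigit()
        and (child / "saved_model.pb").is_file()
    )

# Demonstrate with a throwaway layout: your_model/1/saved_model.pb
root = Path(tempfile.mkdtemp()) / "your_model"
(root / "1").mkdir(parents=True)
(root / "1" / "saved_model.pb").touch()
(root / "not_a_version").mkdir()
print(find_servable_versions(str(root)))  # → [1]
```

Running such a check in CI before deployment catches layout mistakes earlier than a failing server log.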
Install
- pip
pip install tensorflow-serving-api
Imports
- grpc
import grpc
- prediction_service_pb2_grpc
from tensorflow_serving.apis import prediction_service_pb2_grpc
- predict_pb2
from tensorflow_serving.apis import predict_pb2
- get_model_status_pb2
from tensorflow_serving.apis import get_model_status_pb2
- model_service_pb2_grpc
from tensorflow_serving.apis import model_service_pb2_grpc
- TensorProto
from tensorflow.core.framework import tensor_pb2
Quickstart
import grpc
import numpy as np
import tensorflow as tf
from tensorflow_serving.apis import prediction_service_pb2_grpc, predict_pb2
# NOTE: This quickstart assumes a TensorFlow Serving server is already running.
# For example, via Docker:
# docker run -p 8500:8500 --name tfserving_test \
#   --mount type=bind,source=/path/to/your/model,target=/models/your_model \
#   -e MODEL_NAME=your_model -t tensorflow/serving &
# Configuration for your model server
SERVER_ADDRESS = 'localhost:8500'
MODEL_NAME = 'your_model'
SIGNATURE_NAME = 'serving_default'
def make_prediction(input_data: np.ndarray):
    """Sends a prediction request to TensorFlow Serving via gRPC."""
    channel = grpc.insecure_channel(SERVER_ADDRESS)
    stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)
    request = predict_pb2.PredictRequest()
    request.model_spec.name = MODEL_NAME
    request.model_spec.signature_name = SIGNATURE_NAME
    # Convert the NumPy array to a TensorProto. tf.make_tensor_proto handles
    # the dtype and shape mapping; passing an explicit dtype avoids sending
    # accidental float64 inputs to a float32 model.
    tensor_proto = tf.make_tensor_proto(input_data, dtype=tf.float32)
    # Replace 'input_tensor_name' with your model's actual input name
    # (inspect it with: saved_model_cli show --dir /path/to/model/1 --all)
    request.inputs['input_tensor_name'].CopyFrom(tensor_proto)
    try:
        response = stub.Predict(request, timeout=10.0)  # 10-second timeout
        print("Prediction successful:")
        # Convert each output TensorProto back to a NumPy array
        for key, val in response.outputs.items():
            print(f"  Output '{key}': {tf.make_ndarray(val)}")
    except grpc.RpcError as e:
        print(f"Error making prediction: {e.code()} - {e.details()}")
if __name__ == '__main__':
    # Example usage: a simple 1x5 float array as input
    sample_input = np.array([[1.0, 2.0, 3.0, 4.0, 5.0]])
    make_prediction(sample_input)