TensorFlow Serving API

2.19.1 · active · verified Thu Apr 09

The `tensorflow-serving-api` library provides the Python client API for interacting with TensorFlow Serving, a flexible, high-performance serving system for machine learning models. Designed for production environments, TensorFlow Serving facilitates model deployment, versioning, and management, exposing both gRPC and HTTP/REST inference endpoints. The Python API primarily focuses on client-side gRPC communication. The current version is 2.19.1, and its releases typically align with the main TensorFlow project's release cadence.
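While the Python library speaks gRPC, the same server also exposes the REST predict endpoint mentioned above (port 8501 by default). A minimal sketch using only the standard library, assuming a model named `your_model` is being served locally; the model name and address are placeholders:

```python
import json
import urllib.request

# REST predict endpoint; host, port, and model name are placeholders.
url = "http://localhost:8501/v1/models/your_model:predict"

# The REST API accepts a JSON body in the "instances" (row) format.
payload = json.dumps({"instances": [[1.0, 2.0, 3.0, 4.0, 5.0]]}).encode("utf-8")

req = urllib.request.Request(
    url, data=payload, headers={"Content-Type": "application/json"}
)

# Uncomment once a server is running; the response carries a "predictions" key.
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read())["predictions"])
```

The gRPC endpoint (port 8500) is generally preferred for high-throughput clients; REST is convenient for quick checks and environments without protobuf tooling.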

Warnings

Install
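The client library is published on PyPI and depends on a matching `tensorflow` release, which it pulls in as a dependency. Pinning the client to your server's version keeps the protobuf definitions in sync:

```shell
# Install the gRPC client library (pulls in a matching tensorflow release)
pip install tensorflow-serving-api==2.19.1
```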

Imports

Quickstart

This quickstart demonstrates how to construct and send a gRPC prediction request to a running TensorFlow Serving instance using the `tensorflow-serving-api` library. It covers setting up the gRPC channel and stub, creating a `PredictRequest`, populating it with input data converted to `TensorProto` format, and handling the response. Ensure that a TensorFlow Serving server is running and accessible at the specified address and port, with your model loaded, before attempting to run this client code. Replace `your_model`, `input_tensor_name`, and `output_tensor_name` with your actual model's details.

import grpc
import numpy as np
import tensorflow as tf  # provides the make_tensor_proto / make_ndarray helpers
from tensorflow_serving.apis import prediction_service_pb2_grpc, predict_pb2

# NOTE: This quickstart assumes a TensorFlow Serving server is already running.
# For example, via Docker:
# docker run -p 8500:8500 --name tfserving_test \
#  --mount type=bind,source=/path/to/your/model,target=/models/your_model \
#  -e MODEL_NAME=your_model -t tensorflow/serving &

# Configuration for your model server
SERVER_ADDRESS = 'localhost:8500'
MODEL_NAME = 'your_model'
SIGNATURE_NAME = 'serving_default'

def make_prediction(input_data: np.ndarray):
    """Sends a prediction request to TensorFlow Serving via gRPC."""
    channel = grpc.insecure_channel(SERVER_ADDRESS)
    stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

    request = predict_pb2.PredictRequest()
    request.model_spec.name = MODEL_NAME
    request.model_spec.signature_name = SIGNATURE_NAME

    # Convert the NumPy array to a TensorProto; make_tensor_proto infers the
    # shape and encodes the values with the requested dtype.
    tensor_proto = tf.make_tensor_proto(input_data, dtype=tf.float32)

    # Replace 'input_tensor_name' with your model's actual input name
    request.inputs['input_tensor_name'].CopyFrom(tensor_proto)

    try:
        response = stub.Predict(request, timeout=10.0)  # 10-second timeout
        print("Prediction successful:")
        # make_ndarray converts each output TensorProto back into a NumPy
        # array, preserving its shape and dtype.
        for key, val in response.outputs.items():
            print(f"  Output '{key}': {tf.make_ndarray(val)}")

    except grpc.RpcError as e:
        print(f"Error making prediction: {e.code()} - {e.details()}")

if __name__ == '__main__':
    # Example usage: a simple 1x5 float array as input
    sample_input = np.array([[1.0, 2.0, 3.0, 4.0, 5.0]])
    make_prediction(sample_input)
