{"id":1742,"library":"tensorflow-serving-api","title":"TensorFlow Serving API","description":"The `tensorflow-serving-api` library provides the Python client API for interacting with TensorFlow Serving, a flexible, high-performance serving system for machine learning models. Designed for production environments, TensorFlow Serving facilitates model deployment, versioning, and management, exposing both gRPC and HTTP/REST inference endpoints. The Python API focuses primarily on client-side gRPC communication. The current version is 2.19.1, and its releases typically align with the main TensorFlow project's release cadence.","status":"active","version":"2.19.1","language":"en","source_language":"en","source_url":"https://github.com/tensorflow/serving","tags":["tensorflow","serving","machine-learning","grpc","rest-api","model-deployment","client","ai-inference"],"install":[{"cmd":"pip install tensorflow-serving-api","lang":"bash","label":"Install latest version"}],"dependencies":[{"reason":"Required for gRPC communication with the TensorFlow Serving server.","package":"grpcio","optional":false},{"reason":"Required for serializing and deserializing data for gRPC requests.","package":"protobuf","optional":false},{"reason":"Necessary for creating and exporting models in the SavedModel format, which TensorFlow Serving consumes. Not a direct dependency of `tensorflow-serving-api`, but essential for the ecosystem.","package":"tensorflow","optional":true}],"imports":[{"symbol":"grpc","correct":"import grpc"},{"symbol":"prediction_service_pb2_grpc","correct":"from tensorflow_serving.apis import prediction_service_pb2_grpc"},{"symbol":"predict_pb2","correct":"from tensorflow_serving.apis import predict_pb2"},{"symbol":"get_model_status_pb2","correct":"from tensorflow_serving.apis import get_model_status_pb2"},{"symbol":"model_service_pb2_grpc","correct":"from tensorflow_serving.apis import model_service_pb2_grpc"},{"note":"TensorProto is part of the core TensorFlow framework, not tensorflow_serving.apis; import the module and reference it as `tensor_pb2.TensorProto`.","wrong":"from tensorflow_serving.apis.tensor_pb2 import TensorProto","symbol":"TensorProto","correct":"from tensorflow.core.framework import tensor_pb2"}],"quickstart":{"code":"import grpc\nimport numpy as np\nfrom tensorflow_serving.apis import prediction_service_pb2_grpc, predict_pb2\nfrom tensorflow.python.framework import tensor_util  # make_tensor_proto / MakeNdarray\n\n# NOTE: This quickstart assumes a TensorFlow Serving server is already running.\n# For example, via Docker:\n# docker run -p 8500:8500 --name tfserving_test \\\n#   --mount type=bind,source=/path/to/your/model,target=/models/your_model \\\n#   -e MODEL_NAME=your_model -t tensorflow/serving &\n\n# Configuration for your model server\nSERVER_ADDRESS = 'localhost:8500'\nMODEL_NAME = 'your_model'\nSIGNATURE_NAME = 'serving_default'\n\ndef make_prediction(input_data: np.ndarray):\n    \"\"\"Sends a prediction request to TensorFlow Serving via gRPC.\"\"\"\n    channel = grpc.insecure_channel(SERVER_ADDRESS)\n    stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)\n\n    request = predict_pb2.PredictRequest()\n    request.model_spec.name = MODEL_NAME\n    request.model_spec.signature_name = SIGNATURE_NAME\n\n    # Convert the NumPy array to a TensorProto with an explicit dtype\n    tensor_proto = tensor_util.make_tensor_proto(input_data, dtype=np.float32)\n\n    # Replace 'input_tensor_name' with your model's input name\n    request.inputs['input_tensor_name'].CopyFrom(tensor_proto)\n\n    try:\n        response = stub.Predict(request, timeout=10.0)  # 10-second timeout\n        print(\"Prediction successful:\")\n        for key, val in response.outputs.items():\n            # Convert each output TensorProto back to a NumPy array\n            print(f\"  Output '{key}': {tensor_util.MakeNdarray(val)}\")\n    except grpc.RpcError as e:\n        print(f\"Error making prediction: {e.code()} - {e.details()}\")\n\nif __name__ == '__main__':\n    # Example usage: a simple 1x5 float array as input\n    sample_input = np.array([[1.0, 2.0, 3.0, 4.0, 5.0]])\n    make_prediction(sample_input)\n","lang":"python","description":"This quickstart demonstrates how to construct and send a gRPC prediction request to a running TensorFlow Serving instance using the `tensorflow-serving-api` library. It covers setting up the gRPC channel and stub, creating a `PredictRequest`, populating it with input data converted to `TensorProto` format via `make_tensor_proto`, and handling the response. Ensure that a TensorFlow Serving server is running and accessible at the specified address and port, with your model loaded, before running this client code. Replace `your_model` and `input_tensor_name` with your actual model's details."},"warnings":[{"fix":"Always align the `tensorflow-serving-api` Python package version with the version of the `tensorflow_model_server` binary (e.g., from the Docker image `tensorflow/serving:2.19.1`). Consult the official TensorFlow Serving GitHub releases for version information.","message":"Version compatibility between `tensorflow-serving-api` and the `tensorflow_model_server` binary is crucial. Mismatched versions can lead to protobuf deserialization errors or other unexpected client/server communication failures.","severity":"breaking","affected_versions":"All versions"},{"fix":"Ensure `tensorflow_model_server` is running and accessible before attempting to connect with this API. For example, using Docker: `docker run -p 8500:8500 --name tfserving_test -v \"$(pwd)/my_model_dir:/models/my_model\" -e MODEL_NAME=my_model -t tensorflow/serving`.","message":"This library (`tensorflow-serving-api`) provides only the *client* API. It does not include `tensorflow_model_server` itself, which is the server component that loads and serves your models. The server must be installed and run separately (typically via Docker or `apt-get`).","severity":"gotcha","affected_versions":"All versions"},{"fix":"When saving your model, place it in a subdirectory named with an integer version number (e.g., `tf.saved_model.save(model, '/path/to/models/my_model/1')`). The server automatically picks up the highest version.","message":"Models must be exported in the TensorFlow `SavedModel` format and stored in a versioned directory structure (e.g., `model_name/1/`, `model_name/2/`) for `tensorflow_model_server` to load them correctly. An incorrect directory structure is a common source of 'Model not found' or 'No versions of servable found' errors.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Use `tensorflow.python.framework.tensor_util.make_tensor_proto` (also exposed as `tf.make_tensor_proto`) with appropriate data types (e.g., `dtype=np.float32`) to ensure compatibility with your model's input signature.","message":"When sending gRPC requests, input data (e.g., NumPy arrays) must be correctly converted into Protobuf `TensorProto` format. Incorrect type mapping or shape can lead to prediction errors on the server side.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Refer to official TensorFlow Serving examples on GitHub (e.g., `tensorflow/serving/tensorflow_serving/example`) and community blogs for comprehensive usage patterns, particularly for advanced scenarios or specific data types.","message":"The documentation specifically for the `tensorflow-serving-api` Python client can be sparse. Many guides focus on server setup or the REST API, requiring users of the gRPC Python client to infer usage from examples or the protobuf API definitions.","severity":"gotcha","affected_versions":"All versions"}],"env_vars":null,"last_verified":"2026-04-09T00:00:00.000Z","next_check":"2026-07-08T00:00:00.000Z"}