Mooncake Transfer Engine

0.3.10.post1 · active · verified Thu Apr 16

Mooncake Transfer Engine is a Python binding (built with pybind11) for the core data transfer component of the Mooncake project. Mooncake itself is a KVCache-centric disaggregated architecture for Large Language Model (LLM) inference. The Transfer Engine provides a high-performance, unified interface for batched data movement across storage devices and network links, supporting protocols such as TCP, RDMA, CXL/shared memory, and NVMe over Fabrics (NVMe-oF). The project is actively maintained and has been integrated into LLM serving frameworks such as SGLang and vLLM.
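The batched-movement idea can be illustrated without the engine itself. The sketch below splits one contiguous region into fixed-size chunks and returns three index-aligned lists (local addresses, remote addresses, byte lengths), which is the shape a batched write call takes. The helper `plan_batch` is hypothetical, and the commented-out call assumes the binding exposes a batched write named `batch_transfer_sync_write`; verify that name and signature against your installed version.

```python
from typing import List, Tuple


def plan_batch(
    local_base: int, remote_base: int, total_length: int, chunk_size: int
) -> Tuple[List[int], List[int], List[int]]:
    """Hypothetical helper: split one contiguous transfer into chunk-sized requests.

    Returns parallel lists of local addresses, remote (peer) addresses and
    byte lengths. The last chunk is shorter when total_length is not a
    multiple of chunk_size.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if total_length < 0:
        raise ValueError("total_length must be non-negative")

    local_addrs: List[int] = []
    remote_addrs: List[int] = []
    lengths: List[int] = []
    for offset in range(0, total_length, chunk_size):
        # Entry i of every list describes the same request.
        local_addrs.append(local_base + offset)
        remote_addrs.append(remote_base + offset)
        lengths.append(min(chunk_size, total_length - offset))
    return local_addrs, remote_addrs, lengths


if __name__ == "__main__":
    # 10 bytes split into 4-byte chunks gives requests of 4, 4 and 2 bytes.
    local, remote, sizes = plan_batch(0x1000, 0x9000, 10, 4)
    print(local, remote, sizes)
    # Assumed batched call (check the method name in your mooncake version):
    # ret = engine.batch_transfer_sync_write(target_hostname, local, remote, sizes)
```

The invariant that matters is index alignment: the three lists must have equal length, and their i-th entries together describe one request.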

Quickstart

This quickstart demonstrates basic initialization of the Mooncake Transfer Engine: it creates a `TransferEngine` instance, initializes it with the network configuration, and allocates a NumPy buffer. No data is actually transferred. In a real distributed setting you would typically run a receiver and a sender, with `METADATA_SERVER` pointing to a coordination service such as etcd. Set `PROTOCOL` to 'rdma' on RDMA-capable networks for high-performance transfers.

```python
import os

import numpy as np

# In a real distributed setup, a metadata server (e.g., etcd) would be used.
# For a simple local demo, 'P2PHANDSHAKE' can be used.
METADATA_SERVER = os.environ.get('MC_METADATA_SERVER', 'P2PHANDSHAKE')
LOCAL_HOSTNAME = os.environ.get('MC_LOCAL_HOSTNAME', '127.0.0.1:12345')
PROTOCOL = os.environ.get('MC_PROTOCOL', 'tcp')  # Use 'rdma' for RDMA-capable networks
DEVICE_NAME = os.environ.get('MC_DEVICE_NAME', '')  # Empty string: auto-discover devices

try:
    from mooncake.engine import TransferEngine

    # Create a transfer engine instance
    engine = TransferEngine()

    # Initialize with the basic configuration. In a real deployment,
    # LOCAL_HOSTNAME is this node's reachable IP:port and METADATA_SERVER
    # points to an etcd cluster or a similar coordination service.
    ret = engine.initialize(
        LOCAL_HOSTNAME,
        METADATA_SERVER,
        PROTOCOL,
        DEVICE_NAME,
    )
    if ret != 0:  # initialize() returns 0 on success
        raise RuntimeError(f"TransferEngine.initialize() failed with code {ret}")

    # Allocate a zero-filled 1 MiB host buffer.
    # Note: GPU memory would need a different allocation method/context.
    client_buffer = np.zeros(1024 * 1024, dtype=np.uint8)
    buffer_address = client_buffer.ctypes.data  # raw start address of the buffer
    buffer_length = client_buffer.nbytes        # size in bytes

    print(f"TransferEngine initialized on {LOCAL_HOSTNAME} with {PROTOCOL} protocol.")
    print(f"Allocated buffer at address: {buffer_address}, length: {buffer_length} bytes.")

    # Example: register memory (optional, depending on protocol/usage)
    # engine.register_memory(buffer_address, buffer_length)

    # In a full setup, you would then perform transfer operations, e.g.:
    # engine.transfer_sync_write(target_hostname, buffer_address, peer_buffer_address, buffer_length)

    print("Mooncake Transfer Engine basic setup successful (no actual transfer performed).")

except ImportError:
    print("mooncake-transfer-engine is not installed or could not be imported.")
    print("Please ensure you installed the correct version for your CUDA environment.")
except Exception as e:
    print(f"An error occurred: {e}")
```
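The commented-out `transfer_sync_write` call needs `peer_buffer_address`, an address inside the receiver's memory, so the receiver has to hand that address to the sender somehow. The sketch below shows one hypothetical way: the receiver serializes a small JSON descriptor, and the sender parses it and bounds-checks a write before computing the peer address. `BufferDescriptor`, `peer_address_for`, the JSON layout, and the example address are illustrative assumptions, not part of the Mooncake API; what exactly belongs in `target_hostname` also depends on the metadata mode in use.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class BufferDescriptor:
    """Hypothetical record a receiver publishes so a sender can target its buffer."""

    target_hostname: str  # value the sender passes as target_hostname (assumption)
    address: int          # start address of the receiver's registered buffer
    length: int           # buffer size in bytes

    def to_json(self) -> str:
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, payload: str) -> "BufferDescriptor":
        data = json.loads(payload)
        return cls(str(data["target_hostname"]), int(data["address"]), int(data["length"]))


def peer_address_for(desc: BufferDescriptor, offset: int, nbytes: int) -> int:
    """Return the peer address for writing nbytes at offset; reject out-of-range writes."""
    if offset < 0 or nbytes < 0 or offset + nbytes > desc.length:
        raise ValueError(
            f"write of {nbytes} bytes at offset {offset} exceeds {desc.length}-byte buffer"
        )
    return desc.address + offset


if __name__ == "__main__":
    # Receiver side: publish the descriptor (hypothetical address, 1 MiB buffer).
    payload = BufferDescriptor("127.0.0.1:12345", 0x7F0000000000, 1024 * 1024).to_json()

    # Sender side: parse it and compute where a 4 KiB write at offset 8 KiB lands.
    desc = BufferDescriptor.from_json(payload)
    peer_addr = peer_address_for(desc, 8192, 4096)
    print(desc.target_hostname, hex(peer_addr))
    # engine.transfer_sync_write(desc.target_hostname, buffer_address, peer_addr, 4096)
```

How the JSON travels (HTTP, ZMQ, a shared file) is left to the application; the bounds check only guards against writing past the end of the receiver's buffer.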
