MLX Language Models

0.31.2 · active · verified Sat Apr 11

mlx-lm provides tools for loading, fine-tuning, and generating text with large language models (LLMs) on Apple Silicon, built on the MLX framework. Models are downloaded directly from the Hugging Face Hub. The library is actively developed with frequent patch releases; the current version is 0.31.2.

Warnings

Install
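mlx-lm is published on PyPI, so a standard pip install pulls in MLX and the other dependencies:

```shell
# Install from PyPI (MLX requires an Apple Silicon Mac)
pip install mlx-lm
```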

Imports
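The package exposes its main entry points at the top level. The quickstart below aliases the whole package, but the individual helpers can also be imported directly:

```python
import mlx_lm as lm  # alias style used in the quickstart below

# Or import the core helpers directly
from mlx_lm import load, generate, stream_generate
```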

Quickstart

This quickstart demonstrates how to load an MLX-optimized LLM and its tokenizer from the Hugging Face Hub and stream generated text token by token. Note that MLX targets Apple Silicon, so an Apple Silicon Mac is required.

import mlx_lm as lm
from mlx_lm.sample_utils import make_sampler

# Load a model and its tokenizer from the Hugging Face Hub
# (mlx-community variants are pre-converted and quantized for MLX)
# Replace 'mlx-community/Phi-3-mini-4k-instruct-8bit' with your desired model
model, tokenizer = lm.load("mlx-community/Phi-3-mini-4k-instruct-8bit")

# Define a prompt for text generation
prompt_text = "Write a short story about a cat who learns to fly:"

# Sampling parameters such as temperature are passed via a sampler
sampler = make_sampler(temp=0.7)

# stream_generate yields responses as tokens are produced;
# use lm.generate for a single blocking call instead
print("Generated text:")
for response in lm.stream_generate(
    model,
    tokenizer,
    prompt=prompt_text,
    max_tokens=200,
    sampler=sampler,
):
    print(response.text, end="", flush=True)
print()
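For instruct-tuned models such as the Phi-3 variant above, results are usually better when the user message is wrapped in the model's chat template before generation. A minimal sketch, assuming the tokenizer carries a chat template (true for most mlx-community instruct models):

```python
import mlx_lm as lm

model, tokenizer = lm.load("mlx-community/Phi-3-mini-4k-instruct-8bit")

# Wrap the user message in the model's chat template,
# appending the assistant turn marker so the model starts replying
messages = [{"role": "user", "content": "Write a short story about a cat who learns to fly."}]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

# generate returns the completed text as a string
text = lm.generate(model, tokenizer, prompt=prompt, max_tokens=200)
print(text)
```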
