Model2Vec

0.8.1 · active · verified Mon Apr 13

Model2Vec is a Python library for training and using state-of-the-art static embeddings for NLP tasks such as classification, clustering, and semantic search. Rather than running a transformer at inference time, it distills Sentence Transformer models from the Hugging Face Hub into small static models, which makes embedding generation fast and lightweight. The current version is 0.8.1, and it maintains an active release cadence, with updates typically landing monthly or bi-monthly.
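"Training" in Model2Vec mostly means distilling an existing Sentence Transformer into a static model. A minimal sketch using the library's `distill` helper; the source model name, PCA dimensionality, and output path are illustrative, and running this downloads the source model:

```python
from model2vec.distill import distill

# Distill a Sentence Transformer from the Hub into a static model.
# pca_dims controls the dimensionality of the resulting embeddings.
m2v_model = distill(model_name="BAAI/bge-base-en-v1.5", pca_dims=256)

# Save the distilled model locally for later loading.
m2v_model.save_pretrained("m2v_model")
```

The distilled model can then be loaded and used exactly like any pretrained static model.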

Warnings

Install
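The package is published on PyPI under the name `model2vec`. The `distill` extra, assumed here, pulls in the heavier dependencies needed only for distillation:

```shell
# Inference-only install (lightweight):
pip install model2vec

# With distillation support (pulls in transformer dependencies):
pip install 'model2vec[distill]'
```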

Imports
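The two entry points used in this document are the inference-time model class and, for distillation, the `distill` helper:

```python
# Inference: load and run a pretrained static embedding model.
from model2vec import StaticModel

# Distillation (requires the model2vec[distill] extra).
from model2vec.distill import distill
```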

Quickstart

Load a pretrained static model from the Hugging Face Hub and use it to encode a list of sentences into embeddings.

from model2vec import StaticModel

# Load a pretrained static model from the Hugging Face Hub.
model = StaticModel.from_pretrained("minishlab/potion-base-8M")

# Get embeddings for some text
sentences = [
    "This is a test sentence for model2vec.",
    "Another example sentence to demonstrate embedding."
]
embeddings = model.encode(sentences)

print(f"Embeddings shape: {embeddings.shape}")
# Shape is (num_sentences, embedding_dim), e.g. (2, 256) for this model.
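The embeddings come back as plain numpy arrays, so downstream tasks such as semantic search reduce to vector math. A minimal sketch of cosine-similarity ranking, using small dummy vectors in place of real model output:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Cosine similarity between each row of a and each row of b."""
    a_norm = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_norm = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a_norm @ b_norm.T

# Dummy 4-dim "embeddings" standing in for model.encode(...) output.
query = np.array([[1.0, 0.0, 1.0, 0.0]])
docs = np.array([
    [1.0, 0.0, 1.0, 0.0],  # same direction as the query -> similarity 1.0
    [0.0, 1.0, 0.0, 1.0],  # orthogonal to the query     -> similarity 0.0
])

sims = cosine_similarity(query, docs)
print(sims)  # [[1. 0.]]

# Rank documents by similarity to the query (highest first).
ranking = np.argsort(-sims[0])
print(ranking)  # [0 1]
```

With real embeddings, replace the dummy arrays with `model.encode(...)` output for the query and the documents.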
