InstructorEmbeddings

1.0.1 · active · verified Wed Apr 15

InstructorEmbeddings is a Python library for generating text embeddings with the INSTRUCTOR family of instruction-tuned models. It builds on `sentence-transformers` and Hugging Face `transformers` and exposes a small interface: each input text is paired with a natural-language instruction describing the task, and the model returns an embedding for that pair. The current version is 1.0.1, and the project is listed as active, with releases as new models or features are integrated. Note that the importable module is named `InstructorEmbedding` (singular).

Warnings

Install

Imports

Quickstart

Initialize an INSTRUCTOR model, provide a suitable instruction, and encode text to generate embeddings. The model will be downloaded on first use.

from InstructorEmbedding import INSTRUCTOR

# Initialize the model (downloads 'hkunlp/instructor-xl' on first use)
model = INSTRUCTOR('hkunlp/instructor-xl')

# Define the instruction and the sentence to embed
instruction = "Represent the document for retrieval:"
sentence = "This is a document about machine learning."

# Generate the embedding. encode() takes a list of [instruction, text] pairs,
# so several texts can be embedded in one call.
# For best results, choose an instruction that matches the task.
embeddings = model.encode([[instruction, sentence]])

print(f"Embedding shape: {embeddings.shape}")
print(f"First 5 dimensions: {embeddings[0, :5]}")
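
The quickstart embeds a single text. A common next step is to embed a query and several documents, then rank the documents by cosine similarity. The following is a minimal sketch of that idea, not part of the library: `build_pairs`, `cosine_similarity`, and `rank_documents` are hypothetical helper names, and the default query instruction wording is an illustrative assumption. The `encode` argument is any callable that accepts a list of `[instruction, text]` pairs and returns one vector per pair, which is the input shape `model.encode` takes in the quickstart above.

```python
import math
from typing import Callable, List, Sequence, Tuple


def build_pairs(instruction: str, texts: Sequence[str]) -> List[List[str]]:
    """Pair one instruction with every text: [[instruction, text], ...]."""
    return [[instruction, text] for text in texts]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_documents(
    encode: Callable[[List[List[str]]], Sequence[Sequence[float]]],
    query: str,
    documents: Sequence[str],
    # Assumed instruction wording, chosen for illustration only.
    query_instruction: str = "Represent the question for retrieving supporting documents:",
    doc_instruction: str = "Represent the document for retrieval:",
) -> List[Tuple[str, float]]:
    """Return (document, score) tuples sorted from most to least similar."""
    query_vec = encode(build_pairs(query_instruction, [query]))[0]
    doc_vecs = encode(build_pairs(doc_instruction, documents))
    scored = [
        (doc, cosine_similarity(query_vec, vec))
        for doc, vec in zip(documents, doc_vecs)
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

With the model from the quickstart loaded, this could be called as `rank_documents(model.encode, "What is machine learning?", docs)`, where `docs` is a list of strings. Passing the encoder in as an argument keeps the helpers independent of the model, so they can be exercised with a stand-in encoder before downloading any weights.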
