PyTorch Pretrained BERT
PyTorch version of Google AI's BERT model with a script to load Google pre-trained models. This library (version 0.6.2) was the predecessor to the 'transformers' library by Hugging Face, which now includes BERT and many other models. It is deprecated and no longer maintained; all users should migrate to the 'transformers' package for active development, bug fixes, and better performance.
pip install pytorch-pretrained-bert==0.6.2

Common errors
error AttributeError: 'BertModel' object has no attribute 'from_pretrained'
cause Importing the class directly from a submodule, e.g., `from pytorch_pretrained_bert.model import BertModel`.
fix Use the top-level import: `from pytorch_pretrained_bert import BertModel`.

error ModuleNotFoundError: No module named 'pytorch_pretrained_bert'
cause The library is not installed or pip install failed.
fix Run `pip install pytorch-pretrained-bert==0.6.2`. If you are offline, download the wheel from PyPI and install it locally.

error ImportError: cannot import name 'BertTokenizer' from 'pytorch_pretrained_bert'
cause Corrupted installation or version mismatch.
fix Reinstall: `pip uninstall pytorch-pretrained-bert && pip install pytorch-pretrained-bert==0.6.2`.

Warnings
deprecated pytorch-pretrained-bert is deprecated and no longer maintained. All models have been merged into the 'transformers' library; use 'transformers' for the latest features and security fixes.
fix Run: pip install transformers. Then replace imports: from transformers import BertTokenizer, BertModel.
breaking The model output API changed. In pytorch-pretrained-bert, model() returns a plain tuple (with BertModel: encoded_layers and pooled_output); in 'transformers', it returns a ModelOutput object whose last hidden state is outputs.last_hidden_state.
fix In 'transformers', use outputs.last_hidden_state (or outputs[0]). In pytorch-pretrained-bert, pass output_all_encoded_layers=False so the first tuple element is a single tensor (the last layer) rather than a list of all layers.
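Code that must run against both libraries can go through a small compatibility helper. A minimal sketch (the name `get_last_hidden_state` is ours, not part of either API):

```python
def get_last_hidden_state(outputs):
    """Return the last hidden state from either library's model output.

    pytorch-pretrained-bert returns a plain tuple (assumes the model was
    called with output_all_encoded_layers=False); transformers returns a
    ModelOutput object with a .last_hidden_state attribute.
    """
    if isinstance(outputs, tuple):
        return outputs[0]
    return outputs.last_hidden_state
```

The helper dispatches on the output type, so call sites stay identical during a gradual migration.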
gotcha Tokenizer.from_pretrained() downloads vocabulary files from S3. If the download fails (e.g., network issues), the resulting error can be unhelpful.
fix Pre-download the files and pass a local path, use the cache_dir argument of from_pretrained, or set the PYTORCH_PRETRAINED_BERT_CACHE environment variable to a local path.
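A sketch of pointing the cache at a local directory. The PYTORCH_PRETRAINED_BERT_CACHE variable is read when the library is imported, so it must be set first (the "./bert_cache" path is an example):

```python
import os

# Set BEFORE `import pytorch_pretrained_bert`, which reads this variable
# once at import time to decide where downloaded files are cached.
os.environ["PYTORCH_PRETRAINED_BERT_CACHE"] = "./bert_cache"

# Alternatively, pass cache_dir per call (shown as a comment here to avoid
# a network download):
# tokenizer = BertTokenizer.from_pretrained("bert-base-uncased",
#                                           cache_dir="./bert_cache")
```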
Install
pip install pytorch-pretrained-bert==0.6.2
(or, for the maintained successor: pip install transformers)

Imports
- BertModel
  wrong: from pytorch_pretrained_bert.model import BertModel
  correct: from pytorch_pretrained_bert import BertModel
- BertTokenizer
  wrong: from pytorch_pretrained_bert.tokenization import BertTokenizer
  correct: from pytorch_pretrained_bert import BertTokenizer
Quickstart
from pytorch_pretrained_bert import BertTokenizer, BertModel
import torch
# Load pre-trained model tokenizer (vocabulary)
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
# Encode text ([CLS]/[SEP] markers must be added manually in this library)
text = "[CLS] Who was Jim Henson ? [SEP]"
tokenized_text = tokenizer.tokenize(text)
indexed_tokens = tokenizer.convert_tokens_to_ids(tokenized_text)
# Convert to PyTorch tensors
tokens_tensor = torch.tensor([indexed_tokens])
# Load pre-trained model
model = BertModel.from_pretrained('bert-base-uncased')
model.eval()
# Predict hidden states; request only the last layer so the first return
# value is a single tensor rather than a list of all 12 layers
with torch.no_grad():
    encoded_layers, pooled_output = model(tokens_tensor,
                                          output_all_encoded_layers=False)
print(encoded_layers.shape)  # (batch_size, seq_len, hidden_size)
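tokenizer.tokenize in the quickstart uses greedy longest-match-first WordPiece under the hood. A minimal pure-Python sketch with a toy vocabulary (the function and vocab are illustrative, not the library's implementation):

```python
def wordpiece(word, vocab, unk="[UNK]"):
    """Split one word into WordPiece sub-tokens by greedy longest match.

    Continuation pieces (any piece not at the start of the word) are
    looked up with a "##" prefix, as in BERT's vocabulary files.
    """
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        match = None
        while start < end:
            piece = word[start:end]
            if start > 0:
                piece = "##" + piece
            if piece in vocab:
                match = piece
                break
            end -= 1
        if match is None:
            return [unk]  # no prefix matched: the whole word is unknown
        pieces.append(match)
        start = end
    return pieces

toy_vocab = {"jim", "henson", "##s", "who", "was"}
print(wordpiece("hensons", toy_vocab))  # ['henson', '##s']
```

The real tokenizer first lower-cases and splits on whitespace/punctuation, then applies this per-word step against the ~30k-entry bert-base-uncased vocabulary.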