PyTorch Pretrained BERT
PyTorch version of Google AI's BERT model with a script to load Google pre-trained models. This library (version 0.6.2) was the predecessor to the 'transformers' library by Hugging Face, which now includes BERT and many other models. It is deprecated and no longer maintained; all users should migrate to the 'transformers' package for active development, bug fixes, and better performance.
pip install pytorch-pretrained-bert==0.6.2

Common errors
error AttributeError: 'BertModel' object has no attribute 'from_pretrained'
cause Importing the class directly from a submodule, e.g., `from pytorch_pretrained_bert.model import BertModel`.
fix Use the top-level import: `from pytorch_pretrained_bert import BertModel`.

error ModuleNotFoundError: No module named 'pytorch_pretrained_bert'
cause The library is not installed or pip install failed.
fix Run `pip install pytorch-pretrained-bert==0.6.2`. If you are offline, download the wheel from PyPI and install it locally.

error ImportError: cannot import name 'BertTokenizer' from 'pytorch_pretrained_bert'
cause Corrupted installation or version mismatch.
fix Reinstall: `pip uninstall pytorch-pretrained-bert && pip install pytorch-pretrained-bert==0.6.2`.

Warnings
deprecated pytorch-pretrained-bert is deprecated and no longer maintained. All models have been merged into the 'transformers' library; use 'transformers' for the latest features and security fixes.
fix Run: pip install transformers. Then replace imports: from transformers import BertTokenizer, BertModel.
breaking The model output API changed. In pytorch-pretrained-bert, model() returns a plain tuple (with BertModel: encoded_layers and pooled_output); in 'transformers', it returns a ModelOutput object whose last hidden state is outputs.last_hidden_state.
fix In 'transformers', use outputs.last_hidden_state (or outputs[0]). In pytorch-pretrained-bert, pass output_all_encoded_layers=False so the first tuple element is a single tensor (the last layer) rather than a list of all layers.
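Code that must run against both libraries can go through a small compatibility helper. A minimal sketch (the name `get_last_hidden_state` is ours, not part of either API):

```python
def get_last_hidden_state(outputs):
    """Return the last hidden state from either library's model output.

    pytorch-pretrained-bert returns a plain tuple (assumes the model was
    called with output_all_encoded_layers=False); transformers returns a
    ModelOutput object with a .last_hidden_state attribute.
    """
    if isinstance(outputs, tuple):
        return outputs[0]
    return outputs.last_hidden_state
```

The helper dispatches on the output type, so call sites stay identical during a gradual migration.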
gotcha Tokenizer.from_pretrained() downloads vocabulary files from S3. If the download fails (e.g., network issues), the resulting error can be unhelpful.
fix Pre-download the files and pass a local path, use the cache_dir argument of from_pretrained, or set the PYTORCH_PRETRAINED_BERT_CACHE environment variable to a local path.
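A sketch of pointing the cache at a local directory. The PYTORCH_PRETRAINED_BERT_CACHE variable is read when the library is imported, so it must be set first (the "./bert_cache" path is an example):

```python
import os

# Set BEFORE `import pytorch_pretrained_bert`, which reads this variable
# once at import time to decide where downloaded files are cached.
os.environ["PYTORCH_PRETRAINED_BERT_CACHE"] = "./bert_cache"

# Alternatively, pass cache_dir per call (shown as a comment here to avoid
# a network download):
# tokenizer = BertTokenizer.from_pretrained("bert-base-uncased",
#                                           cache_dir="./bert_cache")
```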
Install
pip install pytorch-pretrained-bert==0.6.2
(or, for the maintained successor: pip install transformers)

Imports
- BertModel
  wrong: from pytorch_pretrained_bert.model import BertModel
  correct: from pytorch_pretrained_bert import BertModel
- BertTokenizer
  wrong: from pytorch_pretrained_bert.tokenization import BertTokenizer
  correct: from pytorch_pretrained_bert import BertTokenizer
Quickstart
from pytorch_pretrained_bert import BertTokenizer, BertModel
import torch
# Load pre-trained model tokenizer (vocabulary)
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
# Encode text ([CLS]/[SEP] markers must be added manually in this library)
text = "[CLS] Who was Jim Henson ? [SEP]"
tokenized_text = tokenizer.tokenize(text)
indexed_tokens = tokenizer.convert_tokens_to_ids(tokenized_text)
# Convert to PyTorch tensors
tokens_tensor = torch.tensor([indexed_tokens])
# Load pre-trained model
model = BertModel.from_pretrained('bert-base-uncased')
model.eval()
# Predict hidden states; request only the last layer so the first return
# value is a single tensor rather than a list of all 12 layers
with torch.no_grad():
    encoded_layers, pooled_output = model(tokens_tensor,
                                          output_all_encoded_layers=False)
print(encoded_layers.shape)  # (batch_size, seq_len, hidden_size)
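tokenizer.tokenize in the quickstart uses greedy longest-match-first WordPiece under the hood. A minimal pure-Python sketch with a toy vocabulary (the function and vocab are illustrative, not the library's implementation):

```python
def wordpiece(word, vocab, unk="[UNK]"):
    """Split one word into WordPiece sub-tokens by greedy longest match.

    Continuation pieces (any piece not at the start of the word) are
    looked up with a "##" prefix, as in BERT's vocabulary files.
    """
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        match = None
        while start < end:
            piece = word[start:end]
            if start > 0:
                piece = "##" + piece
            if piece in vocab:
                match = piece
                break
            end -= 1
        if match is None:
            return [unk]  # no prefix matched: the whole word is unknown
        pieces.append(match)
        start = end
    return pieces

toy_vocab = {"jim", "henson", "##s", "who", "was"}
print(wordpiece("hensons", toy_vocab))  # ['henson', '##s']
```

The real tokenizer first lower-cases and splits on whitespace/punctuation, then applies this per-word step against the ~30k-entry bert-base-uncased vocabulary.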