pgmpy: Probabilistic Graphical Models
pgmpy is a Python library for working with Probabilistic Graphical Models (PGMs). It provides implementations of models, inference algorithms, and learning algorithms for Bayesian Networks, Markov Networks, and other causal and probabilistic reasoning tasks. The current version is 1.1.0; minor releases appear periodically, and the 1.0.0 major release introduced significant breaking changes (see Warnings below).
Common errors
- AttributeError: module 'pgmpy.models' has no attribute 'BayesianModel'
  cause: Importing or using the `BayesianModel` class in pgmpy 1.0.0 or later.
  fix: The `BayesianModel` class was renamed to `BayesianNetwork`. Update your code to `from pgmpy.models import BayesianNetwork`.
- ValueError: The sum of the probability values for Variable [variable_name] is not 1.
  cause: When defining a `TabularCPD`, the probabilities for each state combination of the parent variables must sum to 1. This error indicates a mathematical inconsistency in your CPD definition.
  fix: Review the `values` array in your `TabularCPD` definition. Ensure that each column (representing one parent configuration) sums to 1, and double-check the order of the evidence variables and their cardinalities.
- ImportError: cannot import name 'VariableElimination' from 'pgmpy.inference'
  cause: The module path for exact inference algorithms changed; `VariableElimination` is no longer directly under `pgmpy.inference`.
  fix: Import `VariableElimination` from its specific submodule: `from pgmpy.inference.exact import VariableElimination`.
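The column-sum rule behind the `ValueError` above can be checked before a CPD ever reaches pgmpy. A minimal sketch in plain Python (no pgmpy required); the table mirrors the Grade CPD used in the Quickstart below, and the helper name is ours, not a pgmpy API:

```python
# Sketch: verify each column of a TabularCPD `values` table sums to 1
# before passing it to pgmpy. Rows are states of the child variable;
# columns are parent configurations.
values = [[0.3, 0.05, 0.9, 0.5],
          [0.4, 0.25, 0.08, 0.3],
          [0.3, 0.7, 0.02, 0.2]]

def check_columns_sum_to_one(values, tol=1e-9):
    """Return indices of columns whose probabilities do not sum to 1."""
    bad = []
    for col in range(len(values[0])):
        total = sum(row[col] for row in values)
        if abs(total - 1.0) > tol:
            bad.append(col)
    return bad

print(check_columns_sum_to_one(values))  # [] means every column is a valid distribution
```

An empty list means the table will pass pgmpy's sum check; any returned index points at the offending parent configuration.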
Warnings
- breaking: Major breaking changes were introduced in v1.0.0. Specifically, `BayesianModel` was renamed to `BayesianNetwork` and `MarkovModel` to `MarkovNetwork`. Code using the old class names fails with an `AttributeError`.
- gotcha: The order of evidence variables and their cardinalities is critical when defining a `TabularCPD`. A wrong ordering, or a mismatch between `evidence` and `evidence_card`, leads to incorrect probability distributions or errors.
- gotcha: Inference on large or densely connected graphical models can be computationally expensive and memory intensive, especially for exact methods like Variable Elimination. Consider approximate inference for such cases.
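The evidence-ordering gotcha comes down to how columns of `values` are matched to parent states: columns enumerate the parent configurations in the order the `evidence` list is given, with the last evidence variable varying fastest (the same order as `itertools.product`). A pure-Python sketch of that mapping, using the I/D parents from the Quickstart below:

```python
# Sketch of how TabularCPD columns map to parent states: the columns
# enumerate evidence assignments in list order, last variable fastest.
from itertools import product

evidence = ['I', 'D']
evidence_card = [2, 2]

columns = list(product(*(range(card) for card in evidence_card)))
for idx, states in enumerate(columns):
    assignment = ", ".join(f"{var}={s}" for var, s in zip(evidence, states))
    print(f"column {idx}: {assignment}")
# Swapping `evidence` to ['D', 'I'] without reordering the columns of
# `values` silently attaches the probabilities to the wrong parent states.
```

Printing this mapping next to your `values` table is a cheap way to catch a transposed CPD before inference produces quietly wrong posteriors.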
Install
- pip install pgmpy
Imports
- BayesianNetwork
from pgmpy.models import BayesianNetwork   # pgmpy >= 1.0.0
from pgmpy.models import BayesianModel     # pre-1.0 only; removed in 1.0.0
- MarkovNetwork
from pgmpy.models import MarkovNetwork     # pgmpy >= 1.0.0
from pgmpy.models import MarkovModel       # pre-1.0 only; removed in 1.0.0
- TabularCPD
from pgmpy.factors.discrete import TabularCPD
- VariableElimination
from pgmpy.inference import VariableElimination        # top-level import, used in the Quickstart below
from pgmpy.inference.exact import VariableElimination  # explicit submodule path
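Given the v1.0.0 renames above, code that must run against both old and new pgmpy installs can try the new name first and fall back to the old one. A defensive sketch; the `None` fallback is our convention, not anything pgmpy provides:

```python
# Sketch: version-compatible import of the network class. Tries the
# post-1.0 name first, aliases the pre-1.0 name if needed, and leaves
# the name as None when pgmpy is not installed at all.
try:
    from pgmpy.models import BayesianNetwork                    # pgmpy >= 1.0.0
except ImportError:
    try:
        from pgmpy.models import BayesianModel as BayesianNetwork  # pgmpy < 1.0.0
    except ImportError:
        BayesianNetwork = None  # pgmpy not installed; callers should raise a clear error
```

The rest of the codebase then refers only to `BayesianNetwork`, so the rename is absorbed in one place.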
Quickstart
from pgmpy.models import BayesianNetwork
from pgmpy.factors.discrete import TabularCPD
from pgmpy.inference import VariableElimination
# 1. Define the network structure
# D: Difficulty (Easy, Hard), I: Intelligence (Low, High)
# G: Grade (A, B, C), L: Letter (Good, Bad), S: SAT (Low, High)
model = BayesianNetwork([('D', 'G'), ('I', 'G'), ('G', 'L'), ('I', 'S')])
# 2. Define Conditional Probability Distributions (CPDs)
cpd_d = TabularCPD(variable='D', variable_card=2, values=[[0.6], [0.4]])
cpd_i = TabularCPD(variable='I', variable_card=2, values=[[0.7], [0.3]])
# G depends on I and D. Order of evidence: I, D, so columns are
# (I=0,D=0), (I=0,D=1), (I=1,D=0), (I=1,D=1); each column sums to 1.
cpd_g = TabularCPD(variable='G', variable_card=3,
                   values=[[0.3, 0.05, 0.9, 0.5],
                           [0.4, 0.25, 0.08, 0.3],
                           [0.3, 0.7, 0.02, 0.2]],
                   evidence=['I', 'D'], evidence_card=[2, 2])
cpd_l = TabularCPD(variable='L', variable_card=2,
                   values=[[0.1, 0.4, 0.99],
                           [0.9, 0.6, 0.01]],
                   evidence=['G'], evidence_card=[3])
cpd_s = TabularCPD(variable='S', variable_card=2,
                   values=[[0.95, 0.2],
                           [0.05, 0.8]],
                   evidence=['I'], evidence_card=[2])
# 3. Add CPDs to the model
model.add_cpds(cpd_d, cpd_i, cpd_g, cpd_l, cpd_s)
# 4. Validate the model (optional but good practice; check_model returns
#    True when valid and raises ValueError otherwise)
assert model.check_model(), "Model is not valid!"
# 5. Perform inference
inference = VariableElimination(model)
# Query for P(L | D=0, I=1)
result = inference.query(variables=['L'], evidence={'D': 0, 'I': 1})
print("P(L | D=0, I=1):")
print(result)
# Query for P(G | S=0)
result_g_s = inference.query(variables=['G'], evidence={'S': 0})
print("\nP(G | S=0):")
print(result_g_s)
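The first query can be cross-checked by hand: with D=0 and I=1 fixed, P(L) = Σ_G P(G | I=1, D=0) · P(L | G). A pure-Python enumeration sketch using the same tables as the CPDs above (the column index follows the evidence ordering noted there):

```python
# Cross-check P(L | D=0, I=1) by direct enumeration over G.
# With evidence=['I', 'D'] and evidence_card=[2, 2], the column for
# (I=1, D=0) is index 1*2 + 0 = 2 of cpd_g's values.
p_g_given_i1_d0 = [0.9, 0.08, 0.02]      # column 2 of cpd_g: P(G | I=1, D=0)
p_l_given_g = [[0.1, 0.4, 0.99],         # row L=0 of cpd_l
               [0.9, 0.6, 0.01]]         # row L=1 of cpd_l

p_l = [sum(p_g_given_i1_d0[g] * p_l_given_g[l][g] for g in range(3))
       for l in range(2)]
print(p_l)  # approximately [0.1418, 0.8582]
```

These numbers should match the `P(L | D=0, I=1)` factor printed by `VariableElimination` above, and they sum to 1 as a sanity check.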