Drain3 Log Template Miner
Drain3 is a Python library for mining log templates from raw log messages, designed for stream processing. It implements the Drain algorithm and is well suited to real-time log analysis. The library is actively maintained.
Warnings
- breaking The public entry point is the `TemplateMiner` class; there is no `Drain3` class. `TemplateMiner.add_log_message()` returns a result dict (with keys such as `cluster_id`, `change_type`, and `template_mined`), not a bare cluster ID. Code that imports `Drain3` or treats the return value as an ID will fail.
- gotcha Drain3 does not persist its state (the learned log templates) unless you pass a persistence handler (e.g. `FilePersistence`, `KafkaPersistence`, `RedisPersistence`) to `TemplateMiner`. Without one, a restart loses all learned templates and the miner starts from scratch, leading to reprocessing and inconsistent cluster IDs.
- gotcha The quality of log templates depends heavily on the `TemplateMinerConfig` parameters, especially `drain_sim_th` (similarity threshold) and `drain_depth` (parse-tree depth). Poor settings produce templates that are overly generic or overly specific, reducing the effectiveness of log parsing.
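In practice these parameters are usually set in a `drain3.ini` file loaded via `TemplateMinerConfig.load()`. A minimal sketch of such a file, following the section and key names documented in the drain3 README (the values and the masking rule are illustrative assumptions, not recommended defaults):

```ini
[DRAIN]
sim_th = 0.4
depth = 4
max_children = 100
max_clusters = 1024

[MASKING]
masking = [{"regex_pattern": "\\d+", "mask_with": "NUM"}]

[SNAPSHOT]
snapshot_interval_minutes = 10
compress_state = True
```

Masking rules replace variable parts (here, any digit run) with a placeholder before clustering, which typically yields cleaner templates than relying on similarity alone.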
Install
- pip install drain3
Imports
- TemplateMiner
from drain3 import TemplateMiner
- TemplateMinerConfig
from drain3.template_miner_config import TemplateMinerConfig
Quickstart
from drain3 import TemplateMiner
from drain3.template_miner_config import TemplateMinerConfig
# Configure Drain3. For production, consider loading settings from an ini file:
# config = TemplateMinerConfig()
# config.load("drain3.ini")
config = TemplateMinerConfig()
config.drain_sim_th = 0.4
config.drain_depth = 4
# To persist learned templates across restarts, pass a persistence handler:
# from drain3.file_persistence import FilePersistence
# template_miner = TemplateMiner(FilePersistence("drain3_state.bin"), config)
template_miner = TemplateMiner(config=config)
log_messages = [
    "081109 203619 143 INFO dfs.DataNode$PacketResponder: PacketResponder "
    "0 for block blk_3886504917409280145 terminating",
    "081109 203619 369 INFO dfs.DataNode$PacketResponder: PacketResponder "
    "0 for block blk_-6755409170280820986 terminating",
    "081109 203620 357 INFO dfs.DataNode$PacketResponder: PacketResponder "
    "2 for block blk_814013142207908518 terminating",
    "081109 203620 543 INFO dfs.DataNode$DataXceiver: Receiving block blk_-6755409170280820986 "
    "src: /10.250.9.141:50106 dest: /10.250.9.141:50010",
]
for log_message in log_messages:
    # add_log_message returns a result dict, not a bare cluster ID
    result = template_miner.add_log_message(log_message)
    print(f"Log: '{log_message}' -> Cluster ID: {result['cluster_id']}")
# With a persistence handler configured, snapshots are saved periodically;
# template_miner.save_state("shutdown") forces one explicitly.
print("\n--- Current Clusters ---")
for cluster in template_miner.drain.clusters:
    print(cluster)