DirtyJSON Python Decoder
dirtyjson is a Python JSON decoder (version 1.0.8, last updated Nov 2022) designed to extract data from malformed or 'dirty' JSON, which is often found embedded in JavaScript files. It tolerates common non-standard elements: single-quoted strings, line and block comments, dangling commas, unquoted keys, and hexadecimal/octal numbers. It only decodes, and it records the line and column position of parsed elements. Releases are infrequent, but it remains active for specialized parsing needs.
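To make the list of tolerated elements concrete, the sketch below feeds one small snippet per element to Python's strict standard-library `json` parser and records which ones it rejects. The snippet texts and the helper name `strict_rejections` are illustrative assumptions, not part of dirtyjson, and dirtyjson itself is not imported here.

```python
import json
import math

# One illustrative snippet per non-standard element named above.
dirty_snippets = {
    "single quotes": "{'a': 1}",
    "line comment": '{"a": 1} // note',
    "dangling comma": "[1, 2,]",
    "unquoted key": "{a: 1}",
    "hexadecimal number": '{"a": 0x1F}',
}


def strict_rejections(snippets):
    """Return the labels of snippets that json.loads refuses to parse."""
    rejected = []
    for label, text in snippets.items():
        try:
            json.loads(text)
        except json.JSONDecodeError:
            rejected.append(label)
    return rejected


rejected = strict_rejections(dirty_snippets)
print(rejected)  # every label appears: the strict parser rejects all five

# Python's json module, unlike the JSON specification, accepts NaN by default.
nan_value = json.loads("NaN")
print(math.isnan(nan_value))  # True
```

Inputs like these are the cases where a tolerant decoder such as dirtyjson is worth reaching for; strictly valid input can stay with the standard `json` module.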
Warnings
- gotcha dirtyjson is a decoder only; it does not provide encoding capabilities (dump/dumps). For writing JSON, use Python's standard `json` library or `simplejson`.
- gotcha Due to its flexibility in parsing malformed JSON, `dirtyjson` may not always produce the exact data structure or values that a user 'expects' from highly ambiguous input. Its primary goal is to extract data, not to strictly validate.
- gotcha Unlike the standard JSON specification, `dirtyjson` accepts `NaN`, `Infinity`, `-Infinity`, and hexadecimal/octal integers (`0x...`, `0o...`) as valid numbers. It also returns objects and arrays as `AttributedDict` and `AttributedList`, `dict`/`list`-compatible subclasses that store line/column metadata, so the results are not plain Python `dict` and `list` instances.
- gotcha Parsing 'dirty' or malformed JSON is inherently more complex and potentially less performant than parsing strict JSON. While `dirtyjson` is based on `simplejson`'s loader, it may not be suitable for high-performance or extremely high-reliability parsing environments compared to strict JSON parsers.
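Because dirtyjson only decodes, writing the cleaned data back out falls to the standard `json` module. The following is a minimal sketch of that step, assuming the parsed containers behave like `dict` and `list` subclasses as described above. The helpers `to_plain` and `dump_clean` are hypothetical names, and `PositionedDict`/`PositionedList` are stand-ins so the block runs without dirtyjson installed.

```python
import json


def to_plain(obj):
    """Recursively copy dict-like and list-like containers into plain dict/list."""
    if isinstance(obj, dict):
        return {key: to_plain(value) for key, value in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [to_plain(item) for item in obj]
    return obj


def dump_clean(obj, **kwargs):
    """Serialize decoded data as strict JSON text using the standard library."""
    return json.dumps(to_plain(obj), **kwargs)


# Stand-ins for position-aware containers (hypothetical, for illustration only).
class PositionedDict(dict):
    pass


class PositionedList(list):
    pass


parsed = PositionedDict(name="John Doe", data=PositionedList([1, 2]))
clean_text = dump_clean(parsed, sort_keys=True)
print(clean_text)  # {"data": [1, 2], "name": "John Doe"}
```

With dirtyjson installed, the same helper would presumably be called as `dump_clean(dirtyjson.loads(text))`. Passing `allow_nan=False` makes `json.dumps` raise `ValueError` if a `NaN` or `Infinity` value survived decoding, which keeps the output within the JSON specification.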
Install
pip install dirtyjson
Imports
- loads
import dirtyjson
dirtyjson.loads(json_string)
- load
import dirtyjson
with open('file.js') as f:
    data = dirtyjson.load(f)
Quickstart
import dirtyjson

dirty_json_string = """
{
    name: 'John Doe', // Unquoted key, single quotes, comment
    age: 30,
    email: "john@example.com", /* Block comment */
    'is_active': true,
    data: [
        1,
        2, // Dangling comma
    ]
}
"""

try:
    data = dirtyjson.loads(dirty_json_string)
    print("Parsed data:", data)

    # Accessing position attributes
    name_pos = data.attributes('name')
    print(f"'name' key starts at line {name_pos.key.line}, column {name_pos.key.column}")

    # Using standard dict/list operations
    print("Name:", data['name'])
    print("First data element:", data['data'][0])
except dirtyjson.Error as e:
    print(f"Error parsing dirty JSON: {e}")