DirtyJSON Python Decoder

1.0.8 · active · verified Thu Apr 09

dirtyjson is a Python JSON decoder (version 1.0.8, last updated Nov 2022) for extracting data from malformed or "dirty" JSON, such as the object literals often embedded in JavaScript files. It tolerates common non-standard constructs: single-quoted strings, line and block comments, dangling commas, unquoted keys, and hexadecimal or octal numbers. It only decodes and does not serialize, and it records the line and column position of each parsed element. Releases are infrequent, but the library remains active and suits specialized parsing needs.
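
To illustrate why a tolerant decoder is useful, the sketch below passes a small input containing several of the constructs listed above to the standard library's `json` module, which rejects it, and then to `dirtyjson.loads`. The sample string and the helper name `strict_parse_fails` are illustrative assumptions, not part of dirtyjson. The dirtyjson import is guarded so the snippet still runs where the package is not installed.

```python
import json

# Illustrative input (an assumption for this sketch): unquoted keys,
# single-quoted strings, a dangling comma, and a block comment.
SAMPLE = "{name: 'Ada', tags: ['a', 'b',], /* note */ id: 7}"


def strict_parse_fails(text):
    """Return True if the standard-library json module rejects the text."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return True
    return False


print("stdlib json rejects sample:", strict_parse_fails(SAMPLE))

try:
    import dirtyjson  # third-party package: pip install dirtyjson
except ImportError:
    dirtyjson = None

if dirtyjson is not None:
    # The result behaves like a normal dict, so it can be copied into one.
    parsed = dirtyjson.loads(SAMPLE)
    print("dirtyjson result:", dict(parsed))
```

Keeping the strict check in a separate helper makes it easy to try `json.loads` first and fall back to dirtyjson only for inputs that fail strict parsing.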

Warnings

Install

Imports

Quickstart

Demonstrates parsing a malformed JSON string with unquoted keys, single quotes, comments, and a dangling comma. It also shows how to access the positional attributes stored by dirtyjson and use the resulting AttributedDict as a standard dictionary.

import dirtyjson

dirty_json_string = """
{
    name: 'John Doe', // Unquoted key, single quotes, comment
    age: 30,
    email: "john@example.com", /* Block comment */
    'is_active': true,
    data: [
        1,
        2, // Dangling comma
    ]
}
"""

try:
    data = dirtyjson.loads(dirty_json_string)
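    print("Parsed data:", data)

    # attributes(key) returns the recorded source positions for that entry;
    # .key holds the line and column where the key itself begins
    name_pos = data.attributes('name')
    print(f"'name' key starts at line {name_pos.key.line}, column {name_pos.key.column}")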

    # Using standard dict/list operations
    print("Name:", data['name'])
    print("First data element:", data['data'][0])

except dirtyjson.Error as e:
    print(f"Error parsing dirty JSON: {e}")
