Gibberish Detector

0.1.1 · maintenance · verified Thu Apr 16

The `gibberish-detector` Python library (version 0.1.1) flags nonsensical strings using a Markov chain model of character transitions. It is a Python 3 adaptation of an earlier project. You first train a model on a corpus of 'good' text so that it learns character transition probabilities, then use that model to decide whether new input strings are gibberish. The library is maintained mainly as a utility for text validation and spam filtering, and updates are infrequent.
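
To make the train-then-score idea concrete, here is a minimal, self-contained sketch of character-bigram Markov scoring. It is an illustration only and is not the library's implementation: the function names, the add-one smoothing, the tiny corpus, and the `-3.0` threshold are all assumptions chosen for the example.

```python
import math

ALPHABET = "abcdefghijklmnopqrstuvwxyz "


def normalize(text):
    """Lowercase the text and keep only letters and spaces."""
    return [c for c in text.lower() if c in ALPHABET]


def train(corpus):
    """Learn log-probabilities of each character following another."""
    # Start every pair count at 1 (add-one smoothing) so that pairs never
    # seen in the corpus still get a small, nonzero probability.
    counts = {a: {b: 1 for b in ALPHABET} for a in ALPHABET}
    chars = normalize(corpus)
    for a, b in zip(chars, chars[1:]):
        counts[a][b] += 1
    model = {}
    for a, row in counts.items():
        total = sum(row.values())
        model[a] = {b: math.log(n / total) for b, n in row.items()}
    return model


def average_log_prob(model, text):
    """Average transition log-probability; lower means less like the corpus."""
    chars = normalize(text)
    pairs = list(zip(chars, chars[1:]))
    if not pairs:
        return float("-inf")
    return sum(model[a][b] for a, b in pairs) / len(pairs)


def looks_like_gibberish(model, text, threshold=-3.0):
    """Hypothetical decision rule: the threshold is arbitrary for this demo."""
    return average_log_prob(model, text) < threshold


# A toy corpus; a real model needs a large body of ordinary text.
corpus = "the quick brown fox jumps over the lazy dog " * 20
model = train(corpus)

for sample in ("the quick brown fox", "zxqjvkwq"):
    print(sample, average_log_prob(model, sample), looks_like_gibberish(model, sample))
```

Strings built from transitions the corpus contains score closer to zero, while strings made of unseen transitions score much lower. A usable threshold has to be tuned against the training corpus rather than fixed in advance.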

Quickstart

This quickstart shows how to import `detector` from `gibberish_detector` and use a trained model to detect gibberish. `create_from_model` needs a valid trained model file. The code below writes a dummy model file instead, so it is expected to fail at load or detection time; its purpose is only to illustrate the API calls. For real detection, first train a model with the `gibberish-detector train` command-line tool on a large text file of 'good' (non-gibberish) text, then point `create_from_model` at the generated model file.

import os
import tempfile

# NOTE: In a real scenario, you would train a model on a large text file.
# For this quickstart, we'll create a dummy model file for demonstration.
# A proper model training example would be:
#   gibberish-detector train examples/big.txt > big.model

# Create a dummy model file for demonstration purposes
# This content is NOT a valid gibberish-detector model and will likely fail.
# It's purely to show the API usage. A real model is a JSON file.
model_content = "{}"

with tempfile.NamedTemporaryFile(mode='w', delete=False, suffix='.model', encoding='utf-8') as tmp_model_file:
    tmp_model_file.write(model_content)
    model_path = tmp_model_file.name

try:
    from gibberish_detector import detector

    # Attempt to load from the dummy model file.
    # "{}" parses as valid JSON but has none of the data a trained model contains,
    # so loading or scoring is expected to fail (the exact exception may vary).
    # In a real application, ensure model_path points to a valid, trained model file.
    print(f"Attempting to load model from: {model_path}")
    my_detector = detector.create_from_model(model_path)

    # Example usage with a loaded detector
    print(f"'superman' is gibberish: {my_detector.is_gibberish('superman')}")
    print(f"'ertrjiloifdfyyoiu' is gibberish: {my_detector.is_gibberish('ertrjiloifdfyyoiu')}")

except Exception as e:
    print(f"Could not run quickstart due to an error. This is expected if the model_path is not a valid trained model. Error: {e}")
    print("To run properly, first train a model using the command line tool:")
    print("  gibberish-detector train <path_to_good_text_file> > your_model.model")
    print("Then, replace 'model_path' above with 'your_model.model'.")
finally:
    # Clean up the dummy model file
    if os.path.exists(model_path):
        os.remove(model_path)
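
Because a missing or malformed model file is the most likely failure, a small pre-flight check can give a clearer message before `create_from_model` is called. The sketch below uses only the standard library. `check_model_file` is a hypothetical helper, not part of `gibberish-detector`, and it assumes that a trained model is a non-empty JSON object; the quickstart comments say only that the model is a JSON file.

```python
import json
import os


def check_model_file(path):
    """Return a short description of the problem, or None if basic checks pass."""
    if not os.path.isfile(path):
        return "file does not exist"
    if os.path.getsize(path) == 0:
        return "file is empty (did training fail?)"
    try:
        with open(path, encoding="utf-8") as f:
            data = json.load(f)
    except ValueError as exc:  # covers JSONDecodeError and bad encodings
        return f"not valid JSON: {exc}"
    # Assumption: a trained model is a JSON object with at least one key.
    if not isinstance(data, dict) or not data:
        return "JSON parsed, but it is not a non-empty object"
    return None


problem = check_model_file("your_model.model")
if problem is not None:
    print(f"Model file looks unusable: {problem}")
```

Passing this check does not prove that the file is a valid model; it only rules out the most common mistakes before the library tries to load it.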
