{"id":8082,"library":"detect-delimiter","title":"Delimiter Detector","description":"The `detect-delimiter` Python library (version 0.1.1, last released in July 2018) provides a simple function for automatically identifying the delimiter used in ad hoc text formats such as CSV or TSV. It works primarily by counting character frequencies in the input string. The library exposes a single `detect()` function, which makes it straightforward for basic delimiter detection. No release has followed 0.1.1, so the project appears stable but no longer actively developed.","status":"maintenance","version":"0.1.1","language":"en","source_language":"en","source_url":null,"tags":["csv","tsv","delimiter","parser","file-format"],"install":[{"cmd":"pip install detect-delimiter","lang":"bash","label":"Install latest version"}],"dependencies":[],"imports":[{"symbol":"detect","correct":"from detect_delimiter import detect"}],"quickstart":{"code":"from detect_delimiter import detect\n\n# Example 1: Comma-separated data\ntext1 = \"apple,banana,cherry\"\ndelimiter1 = detect(text1)\nprint(f\"Delimiter for {text1!r}: {delimiter1!r}\")\n\n# Example 2: Tab-separated data (repr makes the tab visible as '\\t')\ntext2 = \"name\\tage\\tcity\"\ndelimiter2 = detect(text2)\nprint(f\"Delimiter for {text2!r}: {delimiter2!r}\")\n\n# Example 3: Semicolon-separated data with a custom whitelist\ntext3 = \"one;two;three\"\ndelimiter3 = detect(text3, whitelist=[';', ',', '|'])\nprint(f\"Delimiter for {text3!r}: {delimiter3!r}\")\n\n# Example 4: No common delimiter found, so the default value is returned\ntext4 = \"hello world\"\ndelimiter4 = detect(text4, default='NA')\nprint(f\"Delimiter for {text4!r}: {delimiter4!r}\")\n\n# Example 5: Period as delimiter, which is blacklisted by default\ntext5 = \"file.name.txt\"\ndelimiter5 = detect(text5)\nprint(f\"Delimiter for {text5!r}: {delimiter5!r}\")  # Expected: None ('.' is blacklisted by default)\n","lang":"python","description":"`detect()` is the library's only entry point. It takes the text as a string and optionally accepts `whitelist` (a list of characters to prioritize), `blacklist` (characters to ignore), and `default` (the value returned when no delimiter is found)."},"warnings":[{"fix":"Pass a `blacklist` that does not contain the characters you need, e.g. `detect(text, blacklist=[])` to clear the default exclusions, or use `detect(text, whitelist=['.'])` to check explicitly for a period.","message":"By default, `detect()` does not consider alphanumeric characters or the full stop (`.`) as delimiters. If a file actually uses one of them as a separator (e.g. a custom format delimited by `.`), it will be ignored.","severity":"gotcha","affected_versions":"0.1.1"},{"fix":"For delimiter detection that respects quoting and escaping, consider Python's built-in `csv.Sniffer` or a dedicated library such as `CleverCSV`.","message":"The library ignores CSV quoting rules: a delimiter inside a quoted field such as `\"field, with, commas\"` is counted like any other occurrence. Because detection relies on simple character frequency counts, it can return the wrong delimiter for malformed CSVs or when data fields contain characters that are also common delimiters.","severity":"gotcha","affected_versions":"0.1.1"},{"fix":"Files with multi-character delimiters need manual parsing or a custom solution; this library cannot detect them.","message":"The `detect-delimiter` library only detects single-character delimiters; multi-character delimiters (e.g. `##`, `|||`) are not supported.","severity":"gotcha","affected_versions":"0.1.1"}],"env_vars":null,"last_verified":"2026-04-16T00:00:00.000Z","next_check":"2026-07-15T00:00:00.000Z","problems":[{"fix":"For files that follow CSV quoting conventions, use Python's `csv.Sniffer` instead. If you still prefer `detect()`, set `whitelist` to only the delimiters you actually expect, e.g. `detect(text, whitelist=[';'])`.","cause":"`detect()` performs a basic character frequency count and does not understand CSV format rules such as quoted fields. If a common delimiter like a comma appears frequently inside quoted text, it may be chosen instead of the real delimiter.","error":"detect_delimiter doesn't consider quoting and escaping, so it can easily miss the correct separator when another character occurs more often inside quoted or escaped fields."},{"fix":"Pass a `whitelist` containing the characters you expect as delimiters (e.g. `detect(text, whitelist=['|', '~'])`), or adjust `blacklist` if the delimiter is being ignored (e.g. `detect(text, blacklist=[])`).","cause":"The expected delimiter may be missing from the default `whitelist` (`[',', ';', ':', '|', '\\t']`), or it may be blacklisted by default (alphanumeric characters and the period).","error":"None returned as the delimiter when an expected delimiter is clearly present in the text."},{"fix":"Narrow the candidates with the `whitelist` parameter, e.g. `detect(text, whitelist=[';', '|'])`. Highly ambiguous cases may need manual inspection or a more context-aware parsing library.","cause":"Frequency-based detection can be misled if a character that is *not* the true delimiter appears more often in the sample text, as when data fields in a semicolon-delimited file contain many commas.","error":"Incorrect delimiter detected (e.g., returns ',' but file is ';'-separated)."}]}