Comment Parser
comment-parser is a Python module that extracts comments from source code files. It supports common languages such as C, C++, Java, JavaScript, and Python, and handles both single-line and multi-line comment formats. The library is currently at version 1.2.5; recent releases have focused on setup stability and added typing information.
Common errors
- comment_parser.errors.UnsupportedError: The given mime type for the file is not supported.
  cause The library does not have a parser for the detected or provided MIME type, or `python-magic` (and `libmagic`) is not installed or configured to correctly deduce the MIME type.
  fix Explicitly provide the `mime` argument (e.g., `comment_parser.extract_comments(filename, mime='text/x-python')`). If the language is genuinely unsupported, consider using a different tool or contributing a parser. Ensure `python-magic` and `libmagic` (for Linux/Unix) are installed if relying on auto-detection.
- ModuleNotFoundError: No module named 'magic'
  cause The `python-magic` library, which `comment-parser` uses for automatic MIME type deduction on some operating systems, is not installed.
  fix Install the `python-magic` package: `pip install python-magic`. On Linux/Unix, also ensure the `libmagic` system library is installed (e.g., `sudo apt-get install libmagic-dev` or `sudo yum install file-devel`).
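The fix for `UnsupportedError` relies on passing `mime` explicitly. The following is a minimal sketch of one way to choose that value from the file extension, using only the standard library. The `EXTENSION_TO_MIME` table and both helper names are hypothetical and not part of comment-parser. Only `text/x-python` and `text/x-c` come from this page; the other MIME strings are assumptions to check against the parsers your installed version ships.

```python
import os

# Hypothetical lookup table (not part of comment-parser).
# 'text/x-python' and 'text/x-c' appear on this page; the rest are assumptions.
EXTENSION_TO_MIME = {
    ".py": "text/x-python",
    ".c": "text/x-c",
    ".go": "text/x-go",           # assumption
    ".rb": "text/x-ruby",         # assumption
    ".sh": "text/x-shellscript",  # assumption
}


def guess_mime(path):
    """Return a MIME string for path based on its extension, or None."""
    extension = os.path.splitext(path)[1].lower()
    return EXTENSION_TO_MIME.get(extension)


def extract_with_explicit_mime(path, extract):
    """Call extract(path, mime=...) with a MIME type chosen from the extension.

    `extract` is any callable shaped like
    comment_parser.extract_comments(filename, mime=...).
    """
    mime = guess_mime(path)
    if mime is None:
        raise ValueError(f"No MIME mapping for {path!r}; pass mime explicitly")
    return extract(path, mime=mime)
```

With the library installed, `extract_with_explicit_mime('example.c', comment_parser.extract_comments)` should skip MIME auto-detection, since the `mime` argument is always supplied.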
Warnings
- gotcha Automatic MIME type detection depends on two external pieces: the `python-magic` package and the `libmagic` system library. Without them, the parser may fail to identify file types correctly, leading to `UnsupportedError`.
- gotcha The library raises an `UnsupportedError` if it encounters a file or string with a MIME type that it does not have a parser for, or if MIME type deduction fails. While many common languages are supported (C, C++, Java, Python, JavaScript, HTML, XML, Go, Ruby, Shell), niche or custom language files may not be.
- gotcha This Python library `comment-parser` is distinct from a similarly named JavaScript library (`comment-parser` on NPM) and a Rust crate (`comment-parser` on crates.io). Features, APIs, and breaking changes from those other ecosystems do not apply here.
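Because auto-detection depends on `python-magic`, it can help to check for it before relying on it. The sketch below uses only the standard library; `magic_available` and `require_mime` are hypothetical helper names, and the check only confirms that a module named `magic` is importable, not that the `libmagic` system library behind it works.

```python
import importlib.util


def magic_available():
    """Return True if a top-level module named 'magic' can be found."""
    return importlib.util.find_spec("magic") is not None


def require_mime(mime=None):
    """Return mime unchanged, or raise if neither mime nor 'magic' is present."""
    if mime is None and not magic_available():
        raise RuntimeError(
            "python-magic is not installed; pass mime explicitly, "
            "e.g. mime='text/x-python'"
        )
    return mime
```

Calling `require_mime(mime)` before `extract_comments` turns a missing dependency into an early, descriptive error instead of a failure inside the parser.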
Install
- pip install comment-parser
Imports
- comment_parser
from comment_parser import comment_parser
- Comment
from comment_parser.parsers.common import Comment
Quickstart
import os
import tempfile
from comment_parser import comment_parser
# Example 1: Extract comments from a string
python_code = """
# This is a single-line Python comment
def example_function():
    '''
    This is a docstring, not a comment parsed by comment-parser by default.
    '''
    x = 10  # Inline comment
    # Another line
"""
try:
    # Explicitly specify the MIME type so no auto-detection is needed
    comments_from_str = comment_parser.extract_comments_from_str(python_code, mime='text/x-python')
    print("\n--- Comments from string ---")
    for comment in comments_from_str:
        # text(), line_number() and is_multiline() are methods on Comment
        print(f"[Line {comment.line_number()}] {comment.text()} (Multiline: {comment.is_multiline()})")
except Exception as e:
    print(f"Error parsing string: {e}")
# Example 2: Extract comments from a file
# Create a dummy file for demonstration
temp_dir = tempfile.gettempdir()
file_path = os.path.join(temp_dir, 'example.c')
c_code = """
/* This is a multi-line C comment
 * spanning several lines. */
#include <stdio.h>
// Single line C comment
int main() {
    printf("Hello, World!");
    return 0;
}
"""
with open(file_path, 'w') as f:
    f.write(c_code)
try:
    # Using extract_comments with a file path
    comments_from_file = comment_parser.extract_comments(file_path, mime='text/x-c')
    print("\n--- Comments from file ---")
    for comment in comments_from_file:
        # text(), line_number() and is_multiline() are methods on Comment
        print(f"[Line {comment.line_number()}] {comment.text()} (Multiline: {comment.is_multiline()})")
except Exception as e:
    print(f"Error parsing file: {e}")
finally:
    # Clean up the dummy file
    if os.path.exists(file_path):
        os.remove(file_path)
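Once comments are extracted, a common next step is to separate single-line from multi-line comments. The sketch below is one way to do that, not part of the library: `split_by_kind` and `_read` are hypothetical helpers. It assumes each comment object exposes `text`, `line_number`, and `is_multiline`, and it tolerates those being either methods or plain attributes.

```python
def _read(comment, name):
    """Read a member of a comment whether it is a method or a plain attribute."""
    member = getattr(comment, name)
    return member() if callable(member) else member


def split_by_kind(comments):
    """Group comments into (single_line, multi_line) lists of (line, text) tuples."""
    single, multi = [], []
    for comment in comments:
        entry = (_read(comment, "line_number"), _read(comment, "text").strip())
        if _read(comment, "is_multiline"):
            multi.append(entry)
        else:
            single.append(entry)
    return single, multi
```

For example, `single, multi = split_by_kind(comments_from_file)` would hand back two lists that can be reported or filtered separately.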