{"id":8973,"library":"emrvalidator","title":"EMR Validator","description":"EMR Validator (emrvalidator) is a Python library designed for comprehensive data validation of healthcare data. It allows users to define validation rules in an Excel-based schema and apply them to various data formats like CSV. The current version is 1.0.2, and it receives active maintenance with minor releases addressing bug fixes and enhancements.","status":"active","version":"1.0.2","language":"en","source_language":"en","source_url":"https://github.com/pwcindia/EMR-Validator-tool","tags":["healthcare","data validation","EMR","excel","csv"],"install":[{"cmd":"pip install emrvalidator","lang":"bash","label":"Install stable version"}],"dependencies":[{"reason":"Core data manipulation and I/O for CSV and Excel files.","package":"pandas"},{"reason":"Required for reading and writing Excel schema files (.xlsx).","package":"openpyxl"},{"reason":"Fundamental package for numerical computing, often a dependency of pandas.","package":"numpy"},{"reason":"Used for pretty-printing validation summaries and results to the console.","package":"tabulate"}],"imports":[{"symbol":"EMRValidator","correct":"from emrvalidator import EMRValidator"}],"quickstart":{"code":"import os\nimport pandas as pd\nfrom emrvalidator import EMRValidator\n\n# --- Dummy file creation for runnable example START ---\n# In a real scenario, you would have these files pre-existing.\nschema_data = {\n    \"Column Name\": [\"PatientID\", \"Name\", \"Age\", \"AdmissionDate\"],\n    \"Data Type\": [\"STRING\", \"STRING\", \"INTEGER\", \"DATETIME\"],\n    \"Is Mandatory\": [\"YES\", \"YES\", \"YES\", \"NO\"],\n    \"Allowed Values\": [\"\", \"\", \"\", \"\"],\n    \"Min Length\": [\"\", \"2\", \"0\", \"\"],\n    \"Max Length\": [\"\", \"50\", \"120\", \"\"],\n    \"Regex Pattern\": [\"\", \"\", \"\", \"\"]\n}\nschema_df = pd.DataFrame(schema_data)\n\n# Using tempfile for demonstration, replace with your actual file paths\nimport 
tempfile\ntemp_dir = tempfile.gettempdir()\nschema_path = os.path.join(temp_dir, \"registry_schema.xlsx\")\ndata_path = os.path.join(temp_dir, \"registry_data.csv\")\n\nwith pd.ExcelWriter(schema_path, engine='openpyxl') as writer:\n    schema_df.to_excel(writer, index=False, sheet_name='Sheet1')\n\ndata_csv_content = \"\"\"PatientID,Name,Age,AdmissionDate\nP001,Alice,30,2023-01-15\nP002,Bob,25,\nP003,Charlie,40,2024-03-20\n\"\"\"\nwith open(data_path, 'w') as f:\n    f.write(data_csv_content)\n# --- Dummy file creation for runnable example END ---\n\n# Initialize the EMRValidator\n# Replace 'schema_path' and 'data_path' with your actual file paths\nvalidator = EMRValidator(schema_path=schema_path, data_path=data_path)\n\n# Run the validation\nvalidation_result = validator.validate()\n\n# Get summary of validation\nsummary = validator.get_summary()\nprint(\"Validation Summary:\")\nprint(summary)\n\n# Get invalid records\ninvalid_records = validator.get_invalid_records()\nif not invalid_records.empty:\n    print(\"\\nInvalid Records:\")\n    print(invalid_records)\nelse:\n    print(\"\\nNo invalid records found.\")\n\n# Get validated records\nvalidated_records = validator.get_validated_records()\nif not validated_records.empty:\n    print(\"\\nValidated Records:\")\n    print(validated_records)\n\n# Clean up temporary files (optional, for demonstration)\nos.remove(schema_path)\nos.remove(data_path)\n","lang":"python","description":"This quickstart demonstrates how to set up and run a basic data validation using `EMRValidator`. It first creates dummy `registry_schema.xlsx` and `registry_data.csv` files in the system temporary directory so the example is runnable, then initializes `EMRValidator` with these paths, runs the validation, and prints the summary, invalid records, and validated records. 
In a real application, `schema_path` and `data_path` would point to your actual data files."},"warnings":[{"fix":"Refer to the official documentation or example `schema.xlsx` for the exact format required for schema definition. Ensure column names like 'Column Name', 'Data Type', 'Is Mandatory', 'Allowed Values', 'Min Length', 'Max Length', and 'Regex Pattern' are spelled correctly and present.","message":"The schema definition (e.g., `schema.xlsx`) must strictly adhere to the expected column headers and structure described in the documentation. Incorrect headers, missing mandatory columns, or deviations in format will lead to validation failures or `KeyError`.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Consult the documentation for the precise list of supported data types. Ensure your schema uses these exact keywords for the 'Data Type' column entries.","message":"Data type definitions in the schema (`Data Type` column) must use specific keywords recognized by the library (e.g., 'STRING', 'INTEGER', 'DECIMAL', 'DATETIME', 'BOOLEAN'). Mismatches between these keywords and actual data types or unrecognized keywords will cause validation errors.","severity":"gotcha","affected_versions":"All versions"}],"env_vars":null,"last_verified":"2026-04-16T00:00:00.000Z","next_check":"2026-07-15T00:00:00.000Z","problems":[{"fix":"Ensure the `schema_path` and `data_path` arguments provide correct, absolute or relative paths to the respective files. Double-check file names, extensions, and the current working directory.","cause":"The `data_path` or `schema_path` provided to `EMRValidator` does not point to an existing file, or the path is incorrect.","error":"FileNotFoundError: [Errno 2] No such file or directory: 'your_file.csv'"},{"fix":"Verify that your schema file includes all required column headers: 'Column Name', 'Data Type', 'Is Mandatory', 'Allowed Values', 'Min Length', 'Max Length', 'Regex Pattern'. 
Ensure correct spelling and casing for each header.","cause":"The schema file (e.g., `schema.xlsx`) is missing one of the mandatory column headers expected by the validator, or there's a typo in a column name.","error":"KeyError: 'Column Name'"},{"fix":"Install the library using `pip install emrvalidator`. If already installed, ensure you are running your script in the correct Python environment.","cause":"The `emrvalidator` library has not been installed in the current Python environment, or the environment where the script is run is different from where it was installed.","error":"ModuleNotFoundError: No module named 'emrvalidator'"}]}