{"library":"parse-latin","title":"Latin-script Natural Language Parser","description":"`parse-latin` is a JavaScript library for parsing natural language written in Latin-script languages into a Natural Language Concrete Syntax Tree (NLCST). It is currently at version 7.0.0. Patch and minor releases ship regularly, while major versions introduce breaking changes, often tied to ecosystem standards (such as going ESM-only) or core API adjustments. Its key differentiator is precise tokenization and structuring of text in diverse Latin-script languages, such as Old English, Icelandic, French, and German, with correct handling of complex punctuation, word boundaries, and sentence structure. Whereas `retext-latin` wraps it as a higher-level plugin, `parse-latin` exposes a lower-level API for building and manipulating syntax trees directly. It explicitly handles nuances such as hyphenated words, contractions (e.g., 'she’s'), and periods that do not end a sentence (e.g., in abbreviations), making it robust for detailed linguistic analysis and processing.","language":"javascript","status":"active","last_verified":"Sun Apr 19","install":{"commands":["npm install parse-latin"],"cli":null},"imports":["import { ParseLatin } from 'parse-latin'","import type { Root } from 'nlcst'","import type { Node } from 'unist'"],"auth":{"required":false,"env_vars":[]},"quickstart":{"code":"import {ParseLatin} from 'parse-latin'\n// `unist-util-inspect` is a separate package: npm install unist-util-inspect\nimport {inspect} from 'unist-util-inspect'\n\n// Initialize the parser\nconst parser = new ParseLatin()\n\n// Parse a multi-paragraph Latin-script sample text\nconst tree = parser.parse(`\n  A truly simple sentence for demonstration. This is another sentence,\n  featuring some hyphenated words like \"well-being\" and common abbreviations\n  such as \"e.g.\", \"i.e.\", and \"etc.\" Periods, question marks? And\n  exclamation points!\n  all mark sentence boundaries. Special characters\n  like © and ® are treated as symbols. What about numbers? 1, 2, 3.\n  Paragraphs are also recognized.\n\n  New paragraph starts here. It might contain quotes like \"Hello world!\"\n  or parenthetical expressions (like this one). The parser aims to correctly\n  segment words, punctuation, and sentences according to Latin-script rules.\n  For example, \"U.S.A.\" is a single word token.\n`)\n\n// Print the resulting NLCST tree\nconsole.log(inspect(tree))","lang":"typescript","description":"Demonstrates how to initialize `ParseLatin`, parse a sample Latin-script string into an NLCST syntax tree, and log the inspected tree structure.","tag":null,"tag_description":null,"last_tested":null,"results":[]},"compatibility":null}