UDAPI

0.5.2 · active · verified Thu Apr 16

UDAPI is a Python framework for processing Universal Dependencies (UD) data. It provides an API for reading, writing, and transforming treebanks in CoNLL-U format, and supports visualization, format conversion, querying, and transformation of dependency trees. The library is actively maintained; the current version is 0.5.2.
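To make the CoNLL-U format concrete, here is a minimal standard-library sketch of how one sentence block could be split into its ten tab-separated columns. It does not use UDAPI and is not how the library parses files; the names `CONLLU_FIELDS` and `parse_conllu_sentence` and the sample sentence are assumptions made for illustration.

```python
# Column names defined by the CoNLL-U format, in order.
CONLLU_FIELDS = ("id", "form", "lemma", "upos", "xpos",
                 "feats", "head", "deprel", "deps", "misc")


def parse_conllu_sentence(block):
    """Parse one CoNLL-U sentence block into (comments, tokens).

    Hypothetical helper for illustration only, not part of UDAPI.
    """
    comments, tokens = [], []
    for line in block.strip().splitlines():
        if line.startswith("#"):
            # Sentence-level comment such as "# text = ..."
            comments.append(line[1:].strip())
            continue
        values = line.split("\t")
        if len(values) != len(CONLLU_FIELDS):
            raise ValueError(f"expected 10 columns, got {len(values)}: {line!r}")
        tokens.append(dict(zip(CONLLU_FIELDS, values)))
    return comments, tokens


# Hand-written sample sentence (an assumption, not taken from a real treebank).
sample = "\n".join([
    "# text = Dogs bark.",
    "1\tDogs\tdog\tNOUN\t_\tNumber=Plur\t2\tnsubj\t_\t_",
    "2\tbark\tbark\tVERB\t_\t_\t0\troot\t_\tSpaceAfter=No",
    "3\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_",
])

comments, tokens = parse_conllu_sentence(sample)
for tok in tokens:
    print(tok["id"], tok["form"], tok["upos"], tok["head"], tok["deprel"])
```

In each row, `head` holds the ID of the parent word (0 for the root) and `deprel` names the dependency relation. A framework like UDAPI exposes this structure as tree objects instead of raw strings.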

Quickstart

This quickstart creates a `Document`, reads a plain-text sentence with `read.Text`, tokenizes it with `tokenize.Simple`, and writes the resulting (unparsed) document in CoNLL-U format with `write.Conllu`. The writer's standard output is captured in a string, which is then printed.

import io
from contextlib import redirect_stdout

from udapi.core.document import Document
from udapi.block.read.text import Text
from udapi.block.tokenize.simple import Simple
from udapi.block.write.conllu import Conllu

# Create a new document
doc = Document()

# Read raw text into the document
read_text_block = Text(string="This is a test sentence.")
read_text_block.apply_on_document(doc)

# Tokenize the sentence with the simple rule-based tokenizer
# (splits on whitespace and punctuation)
tokenize_block = Simple()
tokenize_block.apply_on_document(doc)

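# Write the document in CoNLL-U format. The writer is created and run
# while stdout is redirected, so its output lands in the string buffer.
buffer = io.StringIO()
with redirect_stdout(buffer):
    write_conllu_block = Conllu()
    write_conllu_block.apply_on_document(doc)
conllu_output = buffer.getvalue()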

print(conllu_output)
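To show what "unparsed" means for the output, the following standard-library sketch builds a CoNLL-U block in which only the ID and FORM columns (and, where relevant, `SpaceAfter=No` in MISC) are filled, with every other column left as `_`. It is an assumption-laden illustration, not UDAPI code: `to_unparsed_conllu`, its regex tokenization, and the exact comment lines are hypothetical, and the output of `write.Conllu` may differ in detail.

```python
import re


def to_unparsed_conllu(text, sent_id="1"):
    """Render text as a tokenized but unannotated CoNLL-U sentence.

    Hypothetical helper for illustration only, not part of UDAPI.
    """
    # Assumed tokenization rule: runs of word characters, or single
    # punctuation characters.
    matches = list(re.finditer(r"\w+|[^\w\s]", text))
    lines = [f"# sent_id = {sent_id}", f"# text = {text}"]
    for i, match in enumerate(matches, start=1):
        next_char = text[match.end():match.end() + 1]
        is_last = i == len(matches)
        # Mark tokens that are directly followed by another token.
        misc = "_" if is_last or next_char.isspace() else "SpaceAfter=No"
        # ID, FORM, then LEMMA..DEPS left unannotated, then MISC.
        columns = [str(i), match.group(), "_", "_", "_", "_", "_", "_", "_", misc]
        lines.append("\t".join(columns))
    # A CoNLL-U sentence ends with a blank line.
    return "\n".join(lines) + "\n\n"


unparsed = to_unparsed_conllu("This is a test sentence.")
print(unparsed, end="")
```

The lemma, part-of-speech, head, and relation columns stay empty until a tagger and parser fill them in, which is why the quickstart's result is described as unparsed.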
