Python tools for Universal Dependencies

0.2.7 · active · verified Thu Apr 16

udtools (version 0.2.7) is a suite of Python tools for working with Universal Dependencies (UD) data. It reads, writes, queries, and transforms CoNLL-U files, and it integrates with UDPipe. The library is actively maintained on an irregular release cadence and is aimed at linguistic research and the processing of dependency-parsed text.

Quickstart

This quickstart builds a CoNLLUDocument from a string, prints it, applies a transformation, and appends a new sentence programmatically. Everything happens in memory, so no input files are needed.

from udtools.conllu import CoNLLUDocument, Sentence, Token
from udtools.transform import collapse_compounds

# Create a sample CoNLL-U document from a string
conllu_string = """
# sent_id = 1
# text = This is an example.
1	This	this	PRON	DT	Number=Sing|PronType=Dem	4	nsubj	_	_
2	is	be	AUX	VBZ	Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin	4	cop	_	_
3	an	a	DET	DT	Definite=Ind|PronType=Art	4	det	_	_
4	example	example	NOUN	NN	Number=Sing	0	root	_	SpaceAfter=No
5	.	.	PUNCT	.	_	4	punct	_	_

"""
doc = CoNLLUDocument.from_string(conllu_string)
print("Original document:")
print(doc.to_string())

# Example transformation (collapse_compounds might not change this simple example)
collapsed_doc = collapse_compounds(doc)

print("\nDocument after collapse_compounds (no change for this simple example):")
print(collapsed_doc.to_string())

# Demonstrate adding a new sentence
new_sentence = Sentence()
new_sentence.tokens.append(Token(id="1", form="Hello", lemma="hello", upos="INTJ"))
new_sentence.tokens.append(Token(id="2", form=".", lemma=".", upos="PUNCT"))
doc.sentences.append(new_sentence)

print("\nDocument with a new sentence added:")
print(doc.to_string())
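
To make the ten tab-separated CoNLL-U columns in the sample concrete, here is a minimal standard-library sketch. It does not use udtools, and the helper names `parse_sentence` and `check_heads` are hypothetical, introduced only for illustration. It splits each token line of a five-token sentence (in the same format as the quickstart sample) into the ten standard fields, then checks two basic properties of the HEAD column: exactly one token has head 0, and every other head refers to an existing token ID. It does not check for cycles or validate any other column.

```python
# Standard-library sketch; it does not use udtools.
# parse_sentence and check_heads are hypothetical helper names.

# The ten CoNLL-U columns, in order.
FIELDS = ("id", "form", "lemma", "upos", "xpos",
          "feats", "head", "deprel", "deps", "misc")

SAMPLE = "\n".join([
    "# sent_id = 1",
    "# text = This is an example.",
    "1\tThis\tthis\tPRON\tDT\tNumber=Sing|PronType=Dem\t4\tnsubj\t_\t_",
    "2\tis\tbe\tAUX\tVBZ\tMood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin\t4\tcop\t_\t_",
    "3\tan\ta\tDET\tDT\tDefinite=Ind|PronType=Art\t4\tdet\t_\t_",
    "4\texample\texample\tNOUN\tNN\tNumber=Sing\t0\troot\t_\tSpaceAfter=No",
    "5\t.\t.\tPUNCT\t.\t_\t4\tpunct\t_\t_",
])


def parse_sentence(block):
    """Return one dict per token line in a single CoNLL-U sentence block."""
    tokens = []
    for line in block.splitlines():
        # Skip blank lines and comment lines such as "# sent_id = 1".
        if not line or line.startswith("#"):
            continue
        columns = line.split("\t")
        if len(columns) != len(FIELDS):
            raise ValueError(
                f"expected {len(FIELDS)} columns, got {len(columns)}: {line!r}"
            )
        tokens.append(dict(zip(FIELDS, columns)))
    return tokens


def check_heads(tokens):
    """Return a list of problems in the HEAD column (empty if none found)."""
    # Plain integer IDs are syntactic words; this skips multiword
    # ranges such as "1-2" and empty nodes such as "1.1".
    words = [t for t in tokens if t["id"].isdigit()]
    ids = {t["id"] for t in words}
    problems = []
    roots = [t["id"] for t in words if t["head"] == "0"]
    if len(roots) != 1:
        problems.append(f"expected exactly one root, found {len(roots)}")
    for t in words:
        if t["head"] != "0" and t["head"] not in ids:
            problems.append(f"token {t['id']} has unknown head {t['head']}")
    return problems


tokens = parse_sentence(SAMPLE)
print([t["form"] for t in tokens])  # ['This', 'is', 'an', 'example', '.']
print(check_heads(tokens))          # []
```

All fields are kept as strings, including `id` and `head`, so an underscore placeholder or a range ID never needs special casting.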
