CoNLL-U Parser

6.0.0 · active · verified Tue Apr 14

The `conllu` library (version 6.0.0) is a Python parser for the CoNLL-U format. It converts CoNLL-U formatted strings into nested Python structures: a list of sentences, each made up of dictionary-like tokens. CoNLL-U is frequently used as an output format by natural language processing tools. The library is actively maintained, has a moderate release cadence, and has no external dependencies.
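
To make the "nested structure" concrete, here is a minimal standard-library sketch of how one CoNLL-U token line maps onto a dictionary. It is not the library's implementation: `line_to_token` and `split_pairs` are hypothetical helpers written for illustration, and the sketch ignores cases a real parser must handle (multiword token ranges, the `deps` column, feature values without `=`).

```python
# The ten CoNLL-U columns, in the order defined by the format.
FIELDS = ["id", "form", "lemma", "upos", "xpos",
          "feats", "head", "deprel", "deps", "misc"]


def split_pairs(value):
    """Turn 'A=1|B=2' into {'A': '1', 'B': '2'}; '_' marks an empty column."""
    if value == "_":
        return None
    return dict(pair.split("=", 1) for pair in value.split("|"))


def line_to_token(line):
    """Hypothetical helper: map one tab-separated token line to a dict."""
    token = dict(zip(FIELDS, line.rstrip("\n").split("\t")))
    # FEATS and MISC hold key=value pairs, so they become nested dicts.
    token["feats"] = split_pairs(token["feats"])
    token["misc"] = split_pairs(token["misc"])
    return token


# Join the columns with explicit tabs so the example does not depend on
# tab characters surviving copy-paste.
line = "\t".join(["1", "The", "the", "DET", "DT",
                  "Definite=Def|PronType=Art", "4", "det", "_", "_"])
token = line_to_token(line)
print(token["form"], token["feats"])
```

In this sketch every other column stays a plain string; the library itself goes further, for example by converting `id` and `head` to integers.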

Warnings

Install

Imports

Quickstart

`parse` converts a CoNLL-U formatted string into a list of `TokenList` objects, one per sentence. Each token can be accessed like a dictionary keyed by field name, and sentence-level metadata (the `# key = value` comment lines) is available via the `.metadata` attribute.

from conllu import parse

data = """
# text = The quick brown fox jumps over the lazy dog.
1	The	the	DET	DT	Definite=Def|PronType=Art	4	det	_	_
2	quick	quick	ADJ	JJ	Degree=Pos	4	amod	_	_
3	brown	brown	ADJ	JJ	Degree=Pos	4	amod	_	_
4	fox	fox	NOUN	NN	Number=Sing	5	nsubj	_	_
5	jumps	jump	VERB	VBZ	Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin	0	root	_	_
6	over	over	ADP	IN	_	9	case	_	_
7	the	the	DET	DT	Definite=Def|PronType=Art	9	det	_	_
8	lazy	lazy	ADJ	JJ	Degree=Pos	9	amod	_	_
9	dog	dog	NOUN	NN	Number=Sing	5	nmod	_	SpaceAfter=No
10	.	.	PUNCT	.	_	5	punct	_	_
"""

sentences = parse(data)

# Accessing tokens and metadata
sentence = sentences[0]
print(f"Sentence text: {sentence.metadata.get('text')}")
for token in sentence:
    print(f"ID: {token['id']}, Form: {token['form']}, UPos: {token['upos']}")
