Linear TSV
Linear TSV is a Python library providing a simple, line-oriented, and portable tabular data format. Unlike standard CSV, it uses escape codes for newlines and tabs within field data, enabling robust processing with line-oriented shell tools. The format aligns with the TEXT serialization mode of Postgres and MySQL, making it ideal for reliable data exchange. It is currently at version 1.1.0 and maintains an active release cadence.
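The escaping idea described above can be illustrated with a minimal standard-library sketch. The helper names `escape_field` and `format_row` are hypothetical and are not part of the library's API; the escape set (backslash, tab, newline, carriage return) is assumed from the format description on this page.

```python
def escape_field(value):
    """Escape one field so it holds no raw tab, newline, or carriage return."""
    # Backslash is handled first; otherwise the backslashes introduced by the
    # later replacements would themselves be doubled.
    return (
        value.replace("\\", "\\\\")
        .replace("\t", "\\t")
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def format_row(fields):
    """Join escaped fields with raw tabs; the result is one physical line."""
    return "\t".join(escape_field(f) for f in fields)


# A row whose second field contains a newline and whose third contains a tab.
line = format_row(["note", "first line\nsecond line", "a\tb"])
print(line)
```

Because every record stays on one physical line and raw tabs appear only between fields, line-oriented tools can split on newlines and tabs without inspecting field content.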
Common errors
-
ValueError: need more than 1 value to unpack (or similar errors indicating incorrect column count)
cause This usually occurs when a line is not split into the expected number of columns, often because a field contains embedded newlines or tabs that were not escaped, or because the file is being read by a tool that does not understand the `linear-tsv` escape conventions.
fix Ensure the input adheres to the `linear-tsv` escape rules for newlines, tabs, carriage returns, and backslashes. If the file is meant to be `linear-tsv` compliant, parse it with `tsv.un()`. If it is a standard TSV without embedded delimiters, verify that your parser (e.g., `line.rstrip('\n').split('\t')` or `csv.reader(file, delimiter='\t')`) is configured correctly.
-
Data appears as single-column strings or contains unparsed escape sequences like '\\n' or '\\t'.
cause The data contains `linear-tsv` escape sequences (`\n`, `\t`, `\r`, `\\`) within fields, but it is being read by a generic text reader or CSV parser that does not decode them, so they remain in the output as literal backslash sequences.
fix Parse the data with `tsv.un()`, which decodes these escape sequences back into the characters they represent.
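Both errors above come down to splitting and decoding in the wrong way. The following is a rough standard-library sketch of the decoding step, not the library's implementation; `unescape_field` and `parse_line` are hypothetical names, and unknown sequences such as `\N` are deliberately left untouched.

```python
import re

# Escape sequences named in the format description.
_UNESCAPE = {"n": "\n", "t": "\t", "r": "\r", "\\": "\\"}
_ESCAPE_RE = re.compile(r"\\(.)")


def unescape_field(raw):
    # Decode in a single left-to-right pass, so an escaped backslash followed
    # by "n" stays a backslash plus "n" instead of becoming a newline.
    return _ESCAPE_RE.sub(lambda m: _UNESCAPE.get(m.group(1), m.group(0)), raw)


def parse_line(line, expected_columns=None):
    # Split on raw tabs first, then decode each field separately.
    fields = [unescape_field(f) for f in line.rstrip("\n").split("\t")]
    if expected_columns is not None and len(fields) != expected_columns:
        raise ValueError(f"expected {expected_columns} columns, got {len(fields)}")
    return fields


line = "val1a\\nwithnewline\tval1b\\twithtab\n"
naive = line.rstrip("\n").split("\t")            # escapes left undecoded
decoded = parse_line(line, expected_columns=2)   # escapes decoded
print(naive)
print(decoded)
```

Chained `str.replace` calls would not be safe here, because an escaped backslash followed by `n` would be decoded twice; the single-pass substitution avoids that.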
Warnings
- gotcha The `linear-tsv` library implements a specific TSV format where newlines (\n), tabs (\t), carriage returns (\r), and backslashes (\\) within field data *must* be escaped. Generic TSV parsers (e.g., `csv.reader` with `delimiter='\t'`) will not correctly interpret these escape sequences, leading to data corruption or incorrect parsing.
- gotcha The `linear-tsv` format does not natively include or interpret header rows. The `tsv.un()` function treats all lines as data rows.
- gotcha The format uses `\N` (backslash followed by capital N) to represent `NULL` values, akin to PostgreSQL/MySQL TEXT format. The `tsv.un()` function returns `\N` as a literal string.
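Since header rows and `\N` are not given special treatment according to the notes above, both can be handled after parsing. This is a minimal sketch under that assumption; `rows_to_records` is a hypothetical helper, and the input is hand-written to look like the lists of strings the parser is described as returning.

```python
def rows_to_records(rows, null_token="\\N"):
    """Treat the first row as the header and map the NULL token to None."""
    rows = iter(rows)
    header = next(rows, None)
    if header is None:
        return []  # empty input: no header, no records
    records = []
    for row in rows:
        values = [None if v == null_token else v for v in row]
        records.append(dict(zip(header, values)))
    return records


# Stand-in for parser output: first row is the header, last field is NULL.
parsed = [["col1", "col2"], ["val1a", "val1b"], ["val2a", "\\N"]]
records = rows_to_records(parsed)
print(records)
```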
Install
-
pip install linear-tsv
Imports
- un
wrong: from linear_tsv import un
correct:
import tsv
lists = tsv.un(data_stream)
Quickstart
import tsv
import io
# Example TSV data string with escaped newline and tab
data_string = "col1\tcol2\nval1a\\nwithnewline\tval1b\\twithtab\nval2a\t\\N"
# Use io.StringIO to simulate a file stream
data_stream = io.StringIO(data_string)
# Parse the TSV stream to a generator of lists of strings
parsed_data = list(tsv.un(data_stream))
# Print the parsed data
for row in parsed_data:
    print(row)
# Output:
# ['col1', 'col2']
# ['val1a\nwithnewline', 'val1b\twithtab']
# ['val2a', '\\N']