delphin.tsdb

Test Suite Database (TSDB) Primitives

Note

This module implements the basic, low-level functionality for working with TSDB databases. For higher-level views and uses of these databases, see delphin.itsdb. For complex queries of the databases, see delphin.tsql.

TSDB databases are plain-text file-based relational databases minimally consisting of a directory with a file, called relations, containing the database’s schema (see Schemas). Every relation, or table, in the database has its own file, which may be gzipped to save space. The relations have a simple format with columns delimited by @ and records delimited by newlines. This makes them easy to inspect at the command line with standard Unix tools such as cut and awk (but gzipped relations need to be decompressed or piped from a tool such as zcat).

This module handles the technical details of reading and writing TSDB databases, including:

  • parsing database schemas

  • transparently opening either the plain-text or gzipped relations on disk, as appropriate

  • escaping and unescaping reserved characters in the data

  • pairing columns with their schema descriptions

  • casting types (such as :integer, :date, etc.)

Additionally, this module provides very basic abstractions of databases and relations as the Database and Relation classes, respectively. These serve as base classes for the more featureful delphin.itsdb.TestSuite and delphin.itsdb.Table classes, but may be useful as they are for simple needs.

Module Constants

delphin.tsdb.SCHEMA_FILENAME

relations – The filename for the schema.

delphin.tsdb.FIELD_DELIMITER

@ – The character used to delimit fields (or columns) in a record.

delphin.tsdb.TSDB_CORE_FILES

The list of files used in “skeletons”. Includes:

  • item

  • analysis

  • phenomenon

  • parameter

  • set

  • item-phenomenon

  • item-set

delphin.tsdb.TSDB_CODED_ATTRIBUTES

The default values of specific fields. Includes:

  • i-wf = 1

  • i-difficulty = 1

  • polarity = -1

Fields without a special value given above get assigned one based on their datatype.
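As a rough illustration, the sketch below combines the coded attributes with a datatype-based fallback to produce a single formatted default. It assumes the fallback matches what format() does for None ('-1' for :integer columns, '' otherwise); CODED_DEFAULTS and default_for() are hypothetical names, not part of delphin.tsdb.

```python
# Hypothetical mirror of the coded defaults listed above (not PyDelphin code).
CODED_DEFAULTS = {'i-wf': 1, 'i-difficulty': 1, 'polarity': -1}


def default_for(name: str, datatype: str) -> str:
    """Return the formatted default value for a column (assumed behavior)."""
    if name in CODED_DEFAULTS:
        # Coded attributes take precedence over datatype-based defaults.
        return str(CODED_DEFAULTS[name])
    if datatype == ':integer':
        # Assumption: other :integer columns default to -1, as format() describes.
        return '-1'
    # Every other datatype falls back to an empty string.
    return ''


print(default_for('i-wf', ':integer'))          # 1
print(default_for('polarity', ':integer'))      # -1
print(repr(default_for('i-input', ':string')))  # ''
```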

Schemas

A TSDB database defines its schema in a file called relations. This file contains descriptions of each relation (table) and its fields (columns), including the datatypes and whether a column counts as a “key”. Key columns may be used when joining relations together. As an example, the first 9 lines of the run relation description are as follows:

run:
  run-id :integer :key                  # unique test run identifier
  run-comment :string                   # descriptive narrative
  platform :string                      # implementation platform (version)
  protocol :integer                     # [incr tsdb()] protocol version
  tsdb :string                          # tsdb(1) (version) used
  application :string                   # application (version) used
  environment :string                   # application-specific information
  grammar :string                       # grammar (version) used
  ...

See also

See the TsdbSchemaRfc wiki for a description of the format of relations files.

In PyDelphin, TSDB schemas are represented as dictionaries mapping each relation name to a list of Field objects.

class delphin.tsdb.Field(name, datatype, flags=None, comment=None)

A tuple describing a column in a TSDB database relation.

Parameters
  • name (str) – column name

  • datatype (str) – “:string”, “:integer”, “:date”, or “:float”

  • flags (list) – List of additional flags

  • comment (str) – description of the column

is_key

True if the column is a key in the database.

Type

bool

default

The default formatted value (see format()) when the value it describes is None.

Type

str

delphin.tsdb.read_schema(path)

Instantiate schema dict from a schema file given by path.

If path is a directory, use the relations file under path. If path is a file, use it directly as the schema’s path. Otherwise raise a TSDBSchemaError.
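To make the relations format concrete, here is a standalone sketch that parses schema text like the run excerpt above into a dictionary of Field-like tuples. It is not PyDelphin's parser: the Field class here is a simplified stand-in (flags are kept as a tuple rather than a list), and it assumes relation headers are unindented lines, field lines are indented, and comments start with #.

```python
from typing import Dict, List, NamedTuple, Optional, Tuple


class Field(NamedTuple):
    """Simplified stand-in for delphin.tsdb.Field (flags kept as a tuple)."""
    name: str
    datatype: str
    flags: Tuple[str, ...] = ()
    comment: Optional[str] = None

    @property
    def is_key(self) -> bool:
        return ':key' in self.flags


def parse_schema(text: str) -> Dict[str, List[Field]]:
    """Parse relations-file text into {relation name: [Field, ...]}."""
    schema: Dict[str, List[Field]] = {}
    fields: Optional[List[Field]] = None
    for line in text.splitlines():
        content, _, comment = line.partition('#')
        if not content.strip():
            continue  # skip blank and comment-only lines
        if not line[0].isspace():
            # An unindented line such as "run:" starts a new relation.
            fields = schema.setdefault(content.strip().rstrip(':'), [])
            continue
        if fields is None:
            raise ValueError(f'field defined outside a relation: {line!r}')
        # Indented lines look like "name :datatype [:flag ...]".
        name, datatype, *flags = content.split()
        fields.append(Field(name, datatype, tuple(flags), comment.strip() or None))
    return schema


RUN_EXCERPT = """\
run:
  run-id :integer :key                  # unique test run identifier
  run-comment :string                   # descriptive narrative
  platform :string                      # implementation platform (version)
"""

schema = parse_schema(RUN_EXCERPT)
print([f.name for f in schema['run']])  # ['run-id', 'run-comment', 'platform']
print(schema['run'][0].is_key)          # True
```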

delphin.tsdb.write_schema(path, schema)

Serialize schema and write it to the relations file at path.

If path is a directory, write to a relations file under path, otherwise write to the file path.
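The reverse direction can be sketched in a similar way: render each relation as a header line followed by indented field lines, then resolve the target path as described above. This is only an approximation of write_schema(); the Field stand-in, the helper names format_schema() and write_schema_text(), and the comment column width are assumptions chosen to resemble the run excerpt.

```python
import os
import tempfile
from typing import Dict, List, NamedTuple, Optional, Tuple


class Field(NamedTuple):
    """Simplified stand-in for delphin.tsdb.Field."""
    name: str
    datatype: str
    flags: Tuple[str, ...] = ()
    comment: Optional[str] = None


def format_schema(schema: Dict[str, List[Field]]) -> str:
    """Render a schema in the relations-file layout (column width assumed)."""
    blocks = []
    for relation, fields in schema.items():
        lines = [relation + ':']
        for field in fields:
            spec = ' '.join([field.name, field.datatype, *field.flags])
            if field.comment:
                spec = f'{spec:<37} # {field.comment}'
            lines.append('  ' + spec)
        blocks.append('\n'.join(lines))
    # Relations are separated by a blank line.
    return '\n\n'.join(blocks) + '\n'


def write_schema_text(path: str, schema: Dict[str, List[Field]]) -> str:
    """Write to <path>/relations if path is a directory, else to path itself."""
    if os.path.isdir(path):
        path = os.path.join(path, 'relations')
    with open(path, 'w', encoding='utf-8') as fh:
        fh.write(format_schema(schema))
    return path


schema = {'run': [Field('run-id', ':integer', (':key',), 'unique test run identifier'),
                  Field('run-comment', ':string', (), 'descriptive narrative')]}
with tempfile.TemporaryDirectory() as db:
    written = write_schema_text(db, schema)
    written_name = os.path.basename(written)
    with open(written, encoding='utf-8') as fh:
        text = fh.read()
print(text)
```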

delphin.tsdb.make_field_index(fields)

Create and return a mapping of field names to indices.

This mapping helps with looking up columns by their names.

Parameters

fields – iterable of Field objects

Examples

>>> fields = [tsdb.Field('i-id', ':integer'),
...           tsdb.Field('i-input', ':string')]
>>> tsdb.make_field_index(fields)
{'i-id': 0, 'i-input': 1}

Data Operations

Character Escaping and Unescaping

delphin.tsdb.escape(string)

Replace any special characters with their TSDB escape sequences. The characters and their escape sequences are:

@          ->  \s
(newline)  ->  \n
\          ->  \\

Also see unescape().

Parameters

string – string to escape

Returns

The escaped string

delphin.tsdb.unescape(string)

Replace TSDB escape sequences with the regular equivalents.

Also see escape().

Parameters

string (str) – TSDB-escaped string

Returns

The string with escape sequences replaced
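The one subtle point in unescaping is order: a chain of str.replace() calls mishandles an escaped backslash followed by s, turning it into @ whichever replacement runs first. The standalone sketch below, which is not PyDelphin's code, maps each character independently when escaping and scans left to right with a regular expression when unescaping; leaving unknown escape sequences unchanged is an assumption.

```python
import re

# Escape table taken from the list above.
_ESCAPES = {'@': r'\s', '\n': r'\n', '\\': '\\\\'}
_UNESCAPES = {'s': '@', 'n': '\n', '\\': '\\'}


def escape(string: str) -> str:
    # Map each character independently, so the backslashes introduced for
    # one character are never escaped a second time.
    return ''.join(_ESCAPES.get(char, char) for char in string)


def unescape(string: str) -> str:
    # Scan left to right so a doubled backslash followed by "s" becomes a
    # literal backslash plus "s", not "@". Unknown sequences are left
    # unchanged (an assumption of this sketch).
    return re.sub(r'\\(.)',
                  lambda m: _UNESCAPES.get(m.group(1), m.group(0)),
                  string)


raw = 'a@b\nc\\d'
encoded = escape(raw)
print(encoded)                   # a\sb\nc\\d
print(unescape(encoded) == raw)  # True
```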

Record Splitting and Joining

delphin.tsdb.split(line, fields=None)

Split a raw line from a relation into a list of column values.

Decoding involves splitting the line by the field delimiter and unescaping special characters. The column value for empty fields is None.

If fields is given, cast each column value into its datatype, otherwise the value is returned as a string.

Parameters
  • line – raw line from a TSDB relation file.

  • fields – iterable of Field objects

Returns

A list of column values.

delphin.tsdb.join(values, fields=None)

Join a list of column values into a string for a relation file.

Encoding involves escaping special characters for each value, then joining the values into a single string with the field delimiter. If fields is given, None values will be replaced with the default value for their datatype.

For creating a record from a mapping of column names to values, see make_record().

Parameters
  • values – list of column values

  • fields – iterable of Field objects

Returns

A TSDB-encoded string
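Splitting and joining can be sketched together, since one is the inverse of the other for string values. This standalone version skips type casting and replaces the fields parameter with a hypothetical defaults sequence (one formatted default per column); without defaults, None is written as an empty string, which is an assumption.

```python
import re
from typing import List, Optional, Sequence

FIELD_DELIMITER = '@'
_ESCAPES = {'@': r'\s', '\n': r'\n', '\\': '\\\\'}
_UNESCAPES = {'s': '@', 'n': '\n', '\\': '\\'}


def _unescape(col: str) -> str:
    return re.sub(r'\\(.)', lambda m: _UNESCAPES.get(m.group(1), m.group(0)), col)


def split_line(line: str) -> List[Optional[str]]:
    """Decode one raw record; empty columns become None (no type casting)."""
    if line.endswith('\n'):
        line = line[:-1]
    return [_unescape(col) if col else None for col in line.split(FIELD_DELIMITER)]


def join_values(values: Sequence[object],
                defaults: Optional[Sequence[str]] = None) -> str:
    """Encode values; None becomes the matching default, or '' without one."""
    cols = []
    for i, value in enumerate(values):
        if value is None:
            value = defaults[i] if defaults is not None else ''
        cols.append(''.join(_ESCAPES.get(c, c) for c in str(value)))
    return FIELD_DELIMITER.join(cols)


record = split_line('1@@Abrams slept.@a\\sb\n')
print(record)               # ['1', None, 'Abrams slept.', 'a@b']
print(join_values(record))  # 1@@Abrams slept.@a\sb
print(join_values([None, 'x'], defaults=['-1', '']))  # -1@x
```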

delphin.tsdb.make_record(colmap, fields)

Create a record tuple from a mapping of column names to values.

This function is useful when colmap is either a subset or superset of the columns defined for a relation (as determined by fields). That is, it selects the relevant column values and fills in the missing ones with None. fields is also responsible for determining the column order.

Parameters
  • colmap – mapping of column names to values

  • fields – iterable of Field objects

Returns

A tuple of column values
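The selection and padding behavior described above amounts to a projection over the schema's field order. In this sketch, plain field names stand in for Field objects, and make_record_sketch() is a hypothetical name rather than the library function.

```python
from typing import Any, Iterable, Mapping, Optional, Tuple


def make_record_sketch(colmap: Mapping[str, Any],
                       field_names: Iterable[str]) -> Tuple[Optional[Any], ...]:
    # The schema's field order decides the column order; keys in colmap
    # that are not fields are dropped, and absent fields become None.
    return tuple(colmap.get(name) for name in field_names)


fields = ['i-id', 'i-origin', 'i-input']
row = make_record_sketch({'i-input': 'Abrams slept.', 'i-id': 10, 'note': 'x'}, fields)
print(row)  # (10, None, 'Abrams slept.')
```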

Datatype Conversion

delphin.tsdb.cast(datatype, raw_value)

Cast TSDB field raw_value into datatype.

If raw_value is None or an empty string (‘’), None will be returned, regardless of the datatype. However, when datatype is :integer and raw_value is ‘-1’ (the default value for most :integer columns), the integer -1 is returned rather than None. Because format() renders None as ‘-1’ for :integer columns, a missing integer value does not round-trip back to None. This means that cast() is the inverse of format() except for integer values of -1, some date formats, and coded defaults.

Supported datatypes:

TSDB datatype    Python type
:integer         int
:string          str
:float           float
:date            datetime.datetime

Casting the :integer, :string, and :float types is trivial, but for :date TSDB uses a non-standard date format. This format generally follows the DD-MM-YY pattern, where the month may be a number or an abbreviated month name (e.g., sep), optionally followed by a time (with no timezone or UTC offset allowed). The day of the month may be left unspecified, in which case 01 is used. Years may have 2 or 4 digits. For 2-digit years, 19 is prepended if the 2-digit year is greater than or equal to 93 (the year of the first TSNLP publications and the earliest test suites); otherwise 20 is prepended, so users are advised to switch to 4-digit years by the year 2093 at the latest. The more universal YYYY-MM-DD format is also allowed, but it must use a 4-digit year to disambiguate it from the DD-MM-YY pattern.

Examples

>>> tsdb.cast(':integer', '15')
15
>>> tsdb.cast(':float', '2.05e-3')
0.00205
>>> tsdb.cast(':string', 'Abrams slept.')
'Abrams slept.'
>>> tsdb.cast(':date', '10-6-2002')
datetime.datetime(2002, 6, 10, 0, 0)
>>> tsdb.cast(':date', '8-sep-1999')
datetime.datetime(1999, 9, 8, 0, 0)
>>> tsdb.cast(':date', 'apr-95')
datetime.datetime(1995, 4, 1, 0, 0)
>>> tsdb.cast(':date', '01-dec-02 (15:31:01)')
datetime.datetime(2002, 12, 1, 15, 31, 1)
>>> tsdb.cast(':date', '2008-10-12 10:51')
datetime.datetime(2008, 10, 12, 10, 51)
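The date rules are the least obvious part of cast(), so here is a standalone sketch of them using the re and datetime modules rather than PyDelphin. It assumes three-letter English month abbreviations and a time written as HH:MM or HH:MM:SS, optionally in parentheses; parse_tsdb_date() is a hypothetical name, and the real parser may accept other variants. It does reproduce the doctest results above.

```python
import re
from datetime import datetime

_MONTHS = {abbrev: number for number, abbrev in enumerate(
    ['jan', 'feb', 'mar', 'apr', 'may', 'jun',
     'jul', 'aug', 'sep', 'oct', 'nov', 'dec'], start=1)}

# Optional time of day, e.g. " 10:51" or " (15:31:01)".
_TIME = r'(?:\s*\(?(?P<h>\d{1,2}):(?P<mi>\d{2})(?::(?P<s>\d{2}))?\)?)?'
# YYYY-MM-DD (a 4-digit year is required).
_ISO = re.compile(r'(?P<y>\d{4})-(?P<m>\d{1,2})-(?P<d>\d{1,2})' + _TIME)
# [DD-]MM-YY(YY), where MM may be a number or a month abbreviation.
_TSDB = re.compile(
    r'(?:(?P<d>\d{1,2})-)?(?P<m>\d{1,2}|[a-z]{3})-(?P<y>\d{4}|\d{2})' + _TIME,
    re.IGNORECASE)


def parse_tsdb_date(raw: str) -> datetime:
    text = raw.strip()
    match = _ISO.fullmatch(text) or _TSDB.fullmatch(text)
    if match is None:
        raise ValueError(f'unrecognized TSDB date: {raw!r}')
    year = int(match.group('y'))
    if len(match.group('y')) == 2:
        # 93-99 -> 1993-1999; 00-92 -> 2000-2092.
        year += 1900 if year >= 93 else 2000
    month_text = match.group('m').lower()
    if month_text.isdigit():
        month = int(month_text)
    elif month_text in _MONTHS:
        month = _MONTHS[month_text]
    else:
        raise ValueError(f'unknown month in {raw!r}')
    day = int(match.group('d') or 1)  # a missing day defaults to the 1st
    hour, minute, second = (int(match.group(g) or 0) for g in ('h', 'mi', 's'))
    return datetime(year, month, day, hour, minute, second)


print(parse_tsdb_date('apr-95'))                # 1995-04-01 00:00:00
print(parse_tsdb_date('01-dec-02 (15:31:01)'))  # 2002-12-01 15:31:01
```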

delphin.tsdb.format(datatype, value, default=None)

Format a column value based on its field.

If value is None then default is returned if it is given (i.e., not None). If default is None, ‘-1’ is returned if datatype is ‘:integer’, otherwise an empty string (‘’) is returned.

If datatype is ‘:date’ and value is a datetime.datetime object, then a TSDB-compatible date string is returned, giving the day, the abbreviated month name, and the 4-digit year (e.g., 8-sep-1999).

In all other cases, value is cast directly to a string and returned.

Examples

>>> tsdb.format(':integer', 42)
'42'
>>> tsdb.format(':integer', None)
'-1'
>>> tsdb.format(':integer', None, default='1')
'1'
>>> tsdb.format(':date', datetime.datetime(1999,9,8))
'8-sep-1999'
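A standalone sketch of these formatting rules follows; it reproduces the examples above. format_value() is a hypothetical name, and, as an assumption of this sketch only, any time of day on a datetime value is ignored, which the real function may handle differently.

```python
from datetime import datetime
from typing import Any, Optional

_MONTH_ABBREVS = ['jan', 'feb', 'mar', 'apr', 'may', 'jun',
                  'jul', 'aug', 'sep', 'oct', 'nov', 'dec']


def format_value(datatype: str, value: Any, default: Optional[str] = None) -> str:
    if value is None:
        # An explicit default wins; otherwise fall back on the datatype.
        if default is not None:
            return default
        return '-1' if datatype == ':integer' else ''
    if datatype == ':date' and isinstance(value, datetime):
        # Assumption: the time of day is ignored in this sketch.
        return f'{value.day}-{_MONTH_ABBREVS[value.month - 1]}-{value.year}'
    return str(value)


print(format_value(':date', datetime(1999, 9, 8)))  # 8-sep-1999
```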

File and Directory Operations

Paths

delphin.tsdb.is_database_directory(path)

Return True if path is a valid TSDB database directory.

A path is a valid database directory if it is a directory containing a schema file. This is a simple test; the schema file itself is not checked for validity.

delphin.tsdb.get_path(dir, name)

Determine whether the file path should end in .gz and return that path.

A .gz path is preferred only if it exists and is newer than any regular text file path.

Parameters
  • dir – TSDB database directory

  • name – name of a file in the database

Raises

TSDBError – when neither the .gz nor the text file exists.
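The preference rule above can be expressed with file modification times. In this sketch, choose_relation_path() is a hypothetical name, "newer" is taken to mean a strictly later modification time, and a built-in FileNotFoundError stands in for TSDBError.

```python
import os
import tempfile


def choose_relation_path(directory: str, name: str) -> str:
    plain = os.path.join(directory, name)
    gz = plain + '.gz'
    plain_exists = os.path.isfile(plain)
    if os.path.isfile(gz) and (
            not plain_exists or os.path.getmtime(gz) > os.path.getmtime(plain)):
        return gz  # the compressed file exists and is newer
    if plain_exists:
        return plain
    # PyDelphin raises TSDBError here; this sketch uses a built-in exception.
    raise FileNotFoundError(f'neither {plain} nor {gz} exists')


with tempfile.TemporaryDirectory() as db:
    plain_path = os.path.join(db, 'item')
    for path in (plain_path, plain_path + '.gz'):
        open(path, 'w').close()
    os.utime(plain_path, (1000, 1000))          # older plain-text file
    os.utime(plain_path + '.gz', (2000, 2000))  # newer gzipped file
    newer_gz = os.path.basename(choose_relation_path(db, 'item'))
    os.utime(plain_path + '.gz', (500, 500))    # now the gzipped file is older
    older_gz = os.path.basename(choose_relation_path(db, 'item'))

print(newer_gz, older_gz)  # item.gz item
```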

Relation File Access

delphin.tsdb.open(dir, name, encoding=None)

Open a TSDB database file.

Unlike a normal open() call, this function takes a base directory dir and a filename name and determines whether the plain text dir/name or compressed dir/name.gz file is opened. Furthermore, this function only opens files in read-only text mode. For writing database files, see write().

Parameters
  • dir – path to the database directory

  • name – name of the file to open

  • encoding – character encoding of the file

Example

>>> sentences = []
>>> with tsdb.open('my-profile', 'item') as item:
...     for line in item:
...         sentences.append(tsdb.split(line)[6])
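Transparent access to compressed relations can be approximated with the standard gzip module, whose open() accepts a text mode just like the built-in open(). This is a sketch, not the library's code: open_relation() is a hypothetical name, it reuses the newer-.gz preference described under get_path(), and it splits lines without unescaping.

```python
import gzip
import os
import tempfile
from typing import IO


def open_relation(directory: str, name: str, encoding: str = 'utf-8') -> IO[str]:
    """Open dir/name or dir/name.gz read-only in text mode."""
    plain = os.path.join(directory, name)
    gz = plain + '.gz'
    if os.path.isfile(gz) and (
            not os.path.isfile(plain)
            or os.path.getmtime(gz) > os.path.getmtime(plain)):
        return gzip.open(gz, mode='rt', encoding=encoding)
    return open(plain, mode='r', encoding=encoding)


with tempfile.TemporaryDirectory() as db:
    # Only a gzipped relation exists, so it is opened transparently.
    with gzip.open(os.path.join(db, 'item.gz'), 'wt', encoding='utf-8') as fh:
        fh.write('10@Abrams slept.\n20@Abrams barked.\n')
    with open_relation(db, 'item') as item:
        inputs = [line.rstrip('\n').split('@')[1] for line in item]

print(inputs)  # ['Abrams slept.', 'Abrams barked.']
```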

delphin.tsdb.write(dir, name, records, fields=None, append=False, gzip=False, encoding='utf-8')

Write records to relation name in the database at dir.

The simplest way to write data to a file would be something like the following:

>>> with open(os.path.join(db.path, 'item'), 'w') as fh:
...     print('\n'.join(map(tsdb.join, db['item'])), file=fh)

This function improves on that method by doing the following:

  • Determining the path from the gzip parameter and existing files

  • Writing plain text or compressed data, as appropriate

  • Appending or overwriting data, as requested

  • Using the schema information to format fields

  • Writing to a temporary file then copying when done; this prevents accidental data loss when overwriting a file that is being read

  • Deleting any alternative (compressed or plain text) file to avoid having inconsistent files (e.g., delete any existing item when writing item.gz)
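Two of the safeguards listed above, writing to a temporary file first and deleting the alternative file afterwards, can be sketched as follows. This is not PyDelphin's implementation: write_relation() is a hypothetical helper that takes pre-encoded lines, ignores append and schema formatting, and moves the finished file into place with os.replace() (an atomic rename on the same filesystem) instead of copying it.

```python
import gzip
import os
import tempfile
from typing import Iterable


def write_relation(directory: str, name: str, lines: Iterable[str],
                   compress: bool = False, encoding: str = 'utf-8') -> str:
    target = os.path.join(directory, name + ('.gz' if compress else ''))
    alternative = os.path.join(directory, name if compress else name + '.gz')
    # Write into a temporary file in the same directory first...
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=f'.{name}-')
    os.close(fd)
    try:
        opener = gzip.open if compress else open
        with opener(tmp, 'wt', encoding=encoding) as fh:
            for line in lines:
                fh.write(line + '\n')
        # ...then move it over the target in one step.
        os.replace(tmp, target)
    except BaseException:
        os.remove(tmp)
        raise
    # Remove the other variant so the directory stays consistent.
    if os.path.exists(alternative):
        os.remove(alternative)
    return target


with tempfile.TemporaryDirectory() as db:
    open(os.path.join(db, 'item'), 'w').close()  # stale plain-text copy
    target = write_relation(db, 'item', ['10@Abrams slept.'], compress=True)
    remaining = sorted(os.listdir(db))
    with gzip.open(target, 'rt', encoding='utf-8') as fh:
        content = fh.read()

print(remaining, repr(content))  # ['item.gz'] '10@Abrams slept.\n'
```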

Note that append cannot be used with gzip or with an existing gzipped file; in such cases a NotImplementedError is raised. This may be allowed in the future, but since appending to a gzipped file generally results in inefficient compression, it is better to append to plain text and compress when done.

Parameters
  • dir – path to the database directory

  • name – name of the relation to write

  • records – iterable of records to write

  • fields – iterable of Field objects, optional if dir points to an existing test suite directory

  • append – if True, append to rather than overwrite the file

  • gzip – if True and the file is not empty, compress the file with gzip; if False, do not compress

  • encoding – character encoding of the file

Example

>>> tsdb.write('my-profile',
...            'item',
...            item_records,
...            schema['item'])

Database Directories

delphin.tsdb.initialize_database(path, schema, files=False)

Initialize a bare database directory at path.

Initialization creates the directory at path if it does not exist, writes the schema, and deletes any existing files defined by the schema.

Warning

If path points to an existing directory, all relation files defined by the schema will be overwritten or deleted.

Parameters
  • path – the path to the destination database directory

  • schema – the destination database schema

  • files – if True, create an empty file for every relation in schema
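The initialization steps can be sketched with the os module. For brevity, this hypothetical initialize_db() takes already-serialized schema text and an explicit list of relation names instead of a schema dictionary; removing both the plain and the .gz variant of each relation is an assumption of the sketch.

```python
import os
import tempfile
from typing import Iterable


def initialize_db(path: str, schema_text: str, relation_names: Iterable[str],
                  files: bool = False) -> None:
    """Create path, write its schema, and reset the listed relation files."""
    os.makedirs(path, exist_ok=True)
    with open(os.path.join(path, 'relations'), 'w', encoding='utf-8') as fh:
        fh.write(schema_text)
    for name in relation_names:
        # Remove both the plain-text and the gzipped variant, if present.
        for candidate in (name, name + '.gz'):
            candidate_path = os.path.join(path, candidate)
            if os.path.exists(candidate_path):
                os.remove(candidate_path)
        if files:
            # Optionally leave an empty plain-text file for each relation.
            open(os.path.join(path, name), 'w').close()


SCHEMA_TEXT = 'item:\n  i-id :integer :key\n  i-input :string\n'
with tempfile.TemporaryDirectory() as parent:
    db = os.path.join(parent, 'new-profile')
    os.makedirs(db)
    with open(os.path.join(db, 'item.gz'), 'w') as fh:
        fh.write('stale data')
    initialize_db(db, SCHEMA_TEXT, ['item'], files=True)
    listing = sorted(os.listdir(db))
    item_size = os.path.getsize(os.path.join(db, 'item'))

print(listing, item_size)  # ['item', 'relations'] 0
```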

delphin.tsdb.write_database(db, path, names=None, schema=None, gzip=False, encoding='utf-8')

Write TSDB database db to path.

If path is an existing file (not a directory), a TSDBError is raised. If path is an existing directory, the files for all relations in the destination schema will be cleared. Every relation name in names must exist in the destination schema. If schema is given (even if it is the same as for db), every record will be remade (using make_record()) using the schema, and columns may be dropped or None values inserted as necessary, but no more sophisticated changes will be made.

Warning

If path points to an existing directory, all relation files defined by the schema will be overwritten or deleted.

Parameters
  • db – Database containing data to write

  • path – the path to the destination database directory

  • names – list of names of relations to write; if None use all relations in the destination schema

  • schema – the destination database schema; if None use the schema of db

  • gzip – if True, compress all non-empty files; if False, do not compress

  • encoding – character encoding for the database files

Basic Database Class

class delphin.tsdb.Database(path, autocast=False, encoding='utf-8')

A basic abstraction of a TSDB database.

This class manages basic access into a TSDB database by loading its schema and allowing for named access to relation data.

Warning

Named access to relation data returns a generator iterator over an open file. Calling generator.close() or using an idiom like contextlib.closing() ensures that the file descriptor gets closed.

Parameters
  • path – path to the database directory

  • autocast – if True, automatically cast column values to their datatypes

  • encoding – character encoding of the database files

Example

>>> db = tsdb.Database('my-profile')
>>> items = db['item']
>>> first_record = next(items)
>>> items.close()
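The warning above follows from how Python generators work: a `with` block inside a generator only exits when the generator is exhausted or closed. The standalone sketch below uses a hypothetical iter_records() generator (no unescaping or casting) to show contextlib.closing() closing the file even though only one record was read.

```python
import contextlib
import inspect
import os
import tempfile
from typing import Iterator, List, Optional


def iter_records(path: str) -> Iterator[List[Optional[str]]]:
    """Lazily yield records; the file stays open until the generator ends."""
    with open(path, encoding='utf-8') as fh:
        for line in fh:
            yield [col or None for col in line.rstrip('\n').split('@')]


with tempfile.TemporaryDirectory() as db:
    path = os.path.join(db, 'item')
    with open(path, 'w', encoding='utf-8') as fh:
        fh.write('10@Abrams slept.\n20@Abrams barked.\n')
    # closing() calls records.close() on exit, which runs the generator's
    # own `with` cleanup and closes the file even though we stopped early.
    with contextlib.closing(iter_records(path)) as records:
        first = next(records)
    state = inspect.getgeneratorstate(records)

print(first, state)  # ['10', 'Abrams slept.'] GEN_CLOSED
```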

schema

The schema for the database.

autocast

Whether to automatically cast column values to their datatypes.

encoding

The character encoding of database files.

property path

The database directory’s path.

select_from(name, columns=None, cast=False)

Yield values for columns from relation name.
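Column selection by name can be sketched as a projection through a name-to-position index like the one make_field_index() builds. select_columns() is a hypothetical helper operating on in-memory records; it does not read files or cast values, so it only approximates what select_from() does.

```python
from typing import Iterable, Iterator, Optional, Sequence, Tuple


def select_columns(records: Iterable[Sequence[Optional[str]]],
                   field_names: Sequence[str],
                   columns: Optional[Sequence[str]] = None
                   ) -> Iterator[Tuple[Optional[str], ...]]:
    # Map names to positions once, like make_field_index() does.
    index = {name: i for i, name in enumerate(field_names)}
    # With no explicit columns, yield every column in schema order.
    wanted = field_names if columns is None else columns
    positions = [index[c] for c in wanted]  # KeyError for unknown columns
    for record in records:
        yield tuple(record[i] for i in positions)


rows = [('10', None, 'Abrams slept.'), ('20', None, 'Abrams barked.')]
names = ['i-id', 'i-origin', 'i-input']
print(list(select_columns(rows, names, ['i-input', 'i-id'])))
# [('Abrams slept.', '10'), ('Abrams barked.', '20')]
```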

Exceptions

exception delphin.tsdb.TSDBSchemaError(*args, **kwargs)

Bases: delphin.tsdb.TSDBError

Raised when there is an error processing a TSDB schema.

exception delphin.tsdb.TSDBError(*args, **kwargs)

Bases: delphin.exceptions.PyDelphinException

Raised when encountering invalid TSDB databases.

exception delphin.tsdb.TSDBWarning(*args, **kwargs)

Bases: delphin.exceptions.PyDelphinWarning

Raised when encountering possibly invalid TSDB data.